MySQL++ is not “thread safe” in any meaningful sense. MySQL++ contains very little code that actively prevents trouble with threads, and all of it is optional. We have done some work in MySQL++ to make thread safety achievable, but it doesn’t come for free.
The main reason for this is that MySQL++ is generally I/O-bound, not processor-bound. That is, if your program’s bottleneck is MySQL++, the ultimate cause is usually the I/O overhead of using a client-server database. Doubling the number of threads will just let your program get back to waiting for I/O twice as fast. Since threads are evil and generally can’t help MySQL++, the only optional thread awareness features we turn on in the shipping version of MySQL++ are those few that have no practical negative consequences. Everything else is up to you, the programmer, to evaluate and enable as and when you need it.
We’re going to assume that you are reading this chapter because you find yourself needing to use threads for some other reason than to speed up MySQL access. Our purpose here is limited to setting down the rules for avoiding problems with MySQL++ in a multi-threaded program. We won’t go into the broader issues of thread safety outside the scope of MySQL++. You will need a grounding in threads in general to get the full value of this advice.
Before you can safely use MySQL++ with threads, there are several things you must do to get a thread-aware build:
Build MySQL++ itself with thread awareness turned on.
On Linux, Cygwin and Unix (OS X, *BSD, Solaris...),
pass the --enable-thread-check
flag to the configure
script. Beware, this
is only a request to the configure
script
to look for thread support on your system, not a requirement
to do or die: if the script doesn’t find what it needs
to do threading, MySQL++ will just get built without thread
support. See README-Unix.txt
for more
details.
On Windows, if you use the Visual C++ project files or the MinGW Makefile that comes with the MySQL++ distribution, threading is always turned on, due to the nature of Windows.
If you build MySQL++ in some other way, such as with Dev-Cpp (based on MinGW) you’re on your own to enable thread awareness.
Link your program to a thread-aware build of the MySQL C API library.
If you use a binary distribution of MySQL on
Unixy systems (including Cygwin) you usually get
two different versions of the MySQL C API library,
one with thread support and one without. These are
typically called libmysqlclient
and
libmysqlclient_r
, the latter being the
thread-safe one. (The “_r
”
means reentrant.)
If you’re using the Windows binary distribution
of MySQL, you should have only one version of the C
API library, which should be thread-aware. If you have
two, you probably just have separate debug and optimized
builds. See README-Visual-C++.txt
or
README-MinGW.txt
for details.
If you build MySQL from source, you might only get one version of the MySQL C API library, and it can have thread awareness or not, depending on your configuration choices.
Enable threading in your program’s build options.
This is different for every platform, but it’s usually the case that you don’t get thread-aware builds by default. Depending on the platform, you might need to change compiler options, linker options, or both. See your development environment’s documentation, or study how MySQL++ itself turns on thread-aware build options when requested.
The MySQL C API underpinning MySQL++ does not allow multiple concurrent queries on a single connection. You can run into this problem in a single-threaded program, too, which is why we cover the details elsewhere, in Section 3.16, “Concurrent Queries on a Connection”. It’s a thornier problem when using threads, though.
The simple fix is to just create a separarate Connection object for each thread that needs to make database queries. This works well if you have a small number of threads that need to make queries, and each thread uses its connection often enough that the server doesn’t time out waiting for queries.
If you have lots of threads or the frequency of queries is
low, the connection management overhead will be excessive. To avoid
that, we created the ConnectionPool
class. It manages a pool of Connection
objects like library books: a thread checks one out, uses
it, and then returns it to the pool as soon as it’s
done with it. This keeps the number of active connections
low. We suggest that you keep each connection’s
use limited to a single variable scope for RAII reasons;
we created a little helper called ScopedConnection to make that easy.
ConnectionPool
has three
methods that you need to override in a subclass to
make it concrete: create()
,
destroy()
, and
max_idle_time()
. These overrides let
the base class delegate operations it can’t successfully do
itself to its subclass. The ConnectionPool
can’t know how to create()
the Connection
objects, because that
depends on how your program gets login parameters, server
information, etc. ConnectionPool
also makes the subclass destroy()
the Connection
objects it created; it
could assume that they’re simply allocated on the heap
with new
, but it can’t be sure,
so the base class delegates destruction, too. Finally, the base
class can’t know which connection idle timeout policy
would make the most sense to the client, so it asks its subclass
via the max_idle_time()
method.
ConnectionPool
also allows you to
override release()
, if needed. For simple
uses, it’s not necessary to override this.
In designing your ConnectionPool
derivative, you might consider making it a Singleton,
since there should only be one pool in a program.
Another thing you might consider doing is passing a
ReconnectOption object to
Connection::set_option()
in your
create()
override before returning the
new Connection
pointer. This will cause
the underlying MySQL C API to try to reconnect to the database
server if a query fails because the connection was dropped
by the server. This can happen if the DB server is allowed to
restart out from under your application. In many applications,
this isn’t allowed, or if it does happen, you might want
your code to be able to detect it, so MySQL++ doesn’t set
this option for you automatically.
Here is an example showing how to use connection pools with threads:
#include "cmdline.h" #include "threads.h" #include <iostream> using namespace std; #if defined(HAVE_THREADS) // Define a concrete ConnectionPool derivative. Takes connection // parameters as inputs to its ctor, which it uses to create the // connections we're called upon to make. Note that we also declare // a global pointer to an object of this type, which we create soon // after startup; this should be a common usage pattern, as what use // are multiple pools? class SimpleConnectionPool : public mysqlpp::ConnectionPool { public: // The object's only constructor SimpleConnectionPool(mysqlpp::examples::CommandLine& cl) : conns_in_use_(0), db_(mysqlpp::examples::db_name), server_(cl.server()), user_(cl.user()), password_(cl.pass()) { } // The destructor. We _must_ call ConnectionPool::clear() here, // because our superclass can't do it for us. ~SimpleConnectionPool() { clear(); } // Do a simple form of in-use connection limiting: wait to return // a connection until there are a reasonably low number in use // already. Can't do this in create() because we're interested in // connections actually in use, not those created. Also note that // we keep our own count; ConnectionPool::size() isn't the same! mysqlpp::Connection* grab() { while (conns_in_use_ > 8) { cout.put('R'); cout.flush(); // indicate waiting for release sleep(1); } ++conns_in_use_; return mysqlpp::ConnectionPool::grab(); } // Other half of in-use conn count limit void release(const mysqlpp::Connection* pc) { mysqlpp::ConnectionPool::release(pc); --conns_in_use_; } protected: // Superclass overrides mysqlpp::Connection* create() { // Create connection using the parameters we were passed upon // creation. This could be something much more complex, but for // the purposes of the example, this suffices. cout.put('C'); cout.flush(); // indicate connection creation return new mysqlpp::Connection( db_.empty() ? 0 : db_.c_str(), server_.empty() ? 0 : server_.c_str(), user_.empty() ? 0 : user_.c_str(), password_.empty() ? "" : password_.c_str()); } void destroy(mysqlpp::Connection* cp) { // Our superclass can't know how we created the Connection, so // it delegates destruction to us, to be safe. cout.put('D'); cout.flush(); // indicate connection destruction delete cp; } unsigned int max_idle_time() { // Set our idle time at an example-friendly 3 seconds. A real // pool would return some fraction of the server's connection // idle timeout instead. return 3; } private: // Number of connections currently in use unsigned int conns_in_use_; // Our connection parameters std::string db_, server_, user_, password_; }; SimpleConnectionPool* poolptr = 0; static thread_return_t CALLBACK_SPECIFIER worker_thread(thread_arg_t running_flag) { // Ask the underlying C API to allocate any per-thread resources it // needs, in case it hasn't happened already. In this particular // program, it's almost guaranteed that the safe_grab() call below // will create a new connection the first time through, and thus // allocate these resources implicitly, but there's a nonzero chance // that this won't happen. Anyway, this is an example program, // meant to show good style, so we take the high road and ensure the // resources are allocated before we do any queries. mysqlpp::Connection::thread_start(); cout.put('S'); cout.flush(); // indicate thread started // Pull data from the sample table a bunch of times, releasing the // connection we use each time. for (size_t i = 0; i < 6; ++i) { // Go get a free connection from the pool, or create a new one // if there are no free conns yet. Uses safe_grab() to get a // connection from the pool that will be automatically returned // to the pool when this loop iteration finishes. mysqlpp::ScopedConnection cp(*poolptr, true); if (!cp) { cerr << "Failed to get a connection from the pool!" << endl; break; } // Pull a copy of the sample stock table and print a dot for // each row in the result set. mysqlpp::Query query(cp->query("select * from stock")); mysqlpp::StoreQueryResult res = query.store(); for (size_t j = 0; j < res.num_rows(); ++j) { cout.put('.'); } // Delay 1-4 seconds before doing it again. Because this can // delay longer than the idle timeout, we'll occasionally force // the creation of a new connection on the next loop. sleep(rand() % 4 + 1); } // Tell main() that this thread is no longer running *reinterpret_cast<bool*>(running_flag) = false; cout.put('E'); cout.flush(); // indicate thread ended // Release the per-thread resources before we exit mysqlpp::Connection::thread_end(); return 0; } #endif int main(int argc, char *argv[]) { #if defined(HAVE_THREADS) // Get database access parameters from command line mysqlpp::examples::CommandLine cmdline(argc, argv); if (!cmdline) { return 1; } // Create the pool and grab a connection. We do it partly to test // that the parameters are good before we start doing real work, and // partly because we need a Connection object to call thread_aware() // on to check that it's okay to start doing that real work. This // latter check should never fail on Windows, but will fail on most // other systems unless you take positive steps to build with thread // awareness turned on. See README-*.txt for your platform. poolptr = new SimpleConnectionPool(cmdline); try { mysqlpp::ScopedConnection cp(*poolptr, true); if (!cp->thread_aware()) { cerr << "MySQL++ wasn't built with thread awareness! " << argv[0] << " can't run without it." << endl; return 1; } } catch (mysqlpp::Exception& e) { cerr << "Failed to set up initial pooled connection: " << e.what() << endl; return 1; } // Setup complete. Now let's spin some threads... cout << endl << "Pool created and working correctly. Now to do " "some real work..." << endl; srand((unsigned int)time(0)); bool running[] = { true, true, true, true, true, true, true, true, true, true, true, true, true, true }; const size_t num_threads = sizeof(running) / sizeof(running[0]); size_t i; for (i = 0; i < num_threads; ++i) { if (int err = create_thread(worker_thread, running + i)) { cerr << "Failed to create thread " << i << ": error code " << err << endl; return 1; } } // Test the 'running' flags every second until we find that they're // all turned off, indicating that all threads are stopped. cout.put('W'); cout.flush(); // indicate waiting for completion do { sleep(1); i = 0; while (i < num_threads && !running[i]) ++i; } while (i < num_threads); cout << endl << "All threads stopped!" << endl; // Shut it all down... delete poolptr; cout << endl; #else (void)argc; // warning squisher cout << argv[0] << " requires that threads be enabled!" << endl; #endif return 0; }
The example works with both Windows native threads and with POSIX threads.[18] Because thread-enabled builds are only the default on Windows, it’s quite possible for this program to do nothing on other platforms. See above for instructions on enabling a thread-aware build.
If you write your code without checks for thread support
like you see in the code above and link it to a build of MySQL++
that isn’t thread-aware, it will still try to run. The
threading mechanisms fall back to a single-threaded mode when
threads aren’t available. A particular danger is that the
mutex lock mechanism used to keep the pool’s internal data
consistent while multiple threads access it will just quietly
become a no-op if MySQL++ is built without thread support. We do
it this way because we don’t want to make thread support
a MySQL++ prerequisite. And, although it would be of limited
value, this lets you use ConnectionPool
in single-threaded programs.
You might wonder why we don’t just work around this weakness in the C API transparently in MySQL++ instead of suggesting design guidelines to avoid it. We’d like to do just that, but how?
If you consider just the threaded case, you could argue for the use of mutexes to protect a connection from trying to execute two queries at once. The cure is worse than the disease: it turns a design error into a performance sap, as the second thread is blocked indefinitely waiting for the connection to free up. Much better to let the program get the “Commands out of sync” error, which will guide you to this section of the manual, which tells you how to avoid the error with a better design.
Another option would be to bury
ConnectionPool
functionality within MySQL++
itself, so the library could create new connections at need.
That’s no good because the above example is the most complex
in MySQL++, so if it were mandatory to use connection pools, the
whole library would be that much more complex to use. The whole
point of MySQL++ is to make using the database easier. MySQL++
offers the connection pool mechanism for those that really need it,
but an option it must remain.
Connection
has several thread-related
static methods you might care about when using MySQL++ with
threads.
You can call
Connection::thread_aware()
to
determine whether MySQL++ and the underlying C API library were
both built to be thread-aware. I want to stress that thread
awareness is not the same thing as thread
safety: it’s still up to you to make
your code thread-safe. If this method returns true, it just means
it’s possible to achieve thread-safety,
not that you actually have it.
If your program’s connection-management strategy
allows a thread to use a Connection
object that another thread created, you need to know
about Connection::thread_start()
.
This function sets up per-thread resources needed to make MySQL
server calls. You don’t need to call it when you use the
simple Connection
-per-thread strategy,
because this function is implicitly called the first time you
create a Connection
in a thread. It’s
not harmful to call this function from a thread that previously
created a Connection
, just unnecessary. The
only time it’s necessary is when a thread can make calls
to the database server on a Connection
that another thread created and that thread hasn’t already
created a Connection
itself.
If you use ConnectionPool
, you should
call thread_start()
at the start of each
worker thread because you probably can’t reliably predict
whether your grab()
call will create a new
Connection
or will return one previously
returned to the pool from another thread. It’s possible
to conceive of situations where you can guarantee that each pool
user always creates a fresh Connection
the
first time it calls grab()
, but thread
programming is complex enough that it’s best to take the
safe path and always call thread_start()
early in each worker thread.
Finally, there’s the complementary method,
Connection::thread_end()
. Strictly
speaking, it’s not necessary to call
this. The per-thread memory allocated by the C API is small,
it doesn’t grow over time, and a typical thread is going
to need this memory for its entire run time. Memory debuggers
aren’t smart enough to know all this, though, so they will
gripe about a memory leak unless you call this from each thread
that uses MySQL++ before that thread exits.
Although its name suggests otherwise,
Connection::thread_id()
has nothing to
do with anything in this chapter.
We’re in the process of making it safer to share MySQL++’s data structures across threads. Although things are getting better, it’s highly doubtful that all problems with this are now fixed. By way of illustration, allow me explain one aspect of this problem and how we solved it in MySQL++ 3.0.0.
When you issue a database query that returns rows, you
also get information about the columns in each row. Since the
column information is the same for each row in the result set,
older versions of MySQL++ kept this information in the result
set object, and each Row kept
a pointer back to the result set object that created it so it
could access this common data at need. This was fine as long as
each result set object outlived the Row
objects it returned. It required uncommon usage patterns to run
into trouble in this area in a single-threaded program, but in
a multi-threaded program it was easy. For example, there’s
frequently a desire to let one connection do the queries, and other
threads process the results. You can see how avoiding lifetime
problems here would require a careful locking strategy.
We got around this in MySQL++ v3.0 by giving these shared data structures a lifetime independent of the result set object that intitially creates it. These shared data structures stick around until the last object needing them gets destroyed.
Although this is now a solved problem, I bring it up because there are likely other similar lifetime and sequencing problems waiting to be discovered inside MySQL++. If you would like to help us find these, by all means, share data between threads willy-nilly. We welcome your crash reports on the MySQL++ mailing list. But if you’d prefer to avoid problems, it’s better to keep all data about a query within a single thread. Between this and the advice in prior sections, you should be able to use threads with MySQL++ without trouble.
[18] The file
examples/threads.h
contains a few macros and
such to abstract away the differences between the two threading
models.