Coming up for air
I’ve been too busy lately with real work to keep up with the Ruby spider project, but I’m hoping to get back to it real soon.
This is a vent/rant/gratitude post, based on what I’ve spent my last 2 weeks dealing with.
I’ll start with the gratitude, but it’s all downhill from there. I am immensely thankful to the developers responsible for the following tools, which, IMO, no Linux developer should be without:
- Valgrind - helped me track down a bunch of obvious memory errors in the code
- dmalloc - helped me track down the ones Valgrind couldn’t
- gdb - the more I use gdb, the more effective a tool it becomes for me. Why aren’t you using it? If you’re programming anything at all with GCC, at least learn to get a backtrace and inspect variables.
- distcc - chasing memory issues into base classes led to full recompilation more often than I would have liked. distcc let me put an extra 2 idle machines to work compiling the code.
What started as a rather curious crash in our dev branch a couple of weeks ago turned into a major audit of how we allocate, use and free memory in our codebase. When I did get to the bottom of the crash, it turned out not to be a memory issue, but a library dependency inherited from libstdc++.
So here’s a little nugget for you right off the bat:
If you use STL containers (or any code which uses STL’s default allocator) in a dynamic library which has been compiled with -MT, you must link any executable that dlopen’s said library with libpthread.
In hindsight this sounds fairly obvious, right? The problem seems to arise because GNU libstdc++ uses a multi-threaded allocator which depends on libpthread, yet the pthread_* functions are all weak references, which resolve to NULL if the actual functions can’t be found.
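To make the weak-reference behaviour concrete, here is a minimal sketch, assuming GCC (or Clang) on Linux. The symbol hypothetical_mutex_lock is made up for illustration and deliberately defined nowhere; it stands in for the pthread_* functions and is not what libstdc++ actually declares. Because it is marked weak, the link succeeds and its address is simply null at runtime, so an unguarded call would segfault rather than fail at link time.

```cpp
#include <cstdio>

// Hypothetical symbol, used only for illustration: declared weak and
// defined nowhere, so the linker resolves its address to null instead
// of failing with "undefined reference".
extern "C" int hypothetical_mutex_lock(void*) __attribute__((weak));

// True only if some linked library actually provides the symbol.
bool lock_available() {
    return hypothetical_mutex_lock != nullptr;
}

// Guarded call: returns -1 rather than jumping through a null pointer.
int try_lock(void* mutex) {
    if (!lock_available()) {
        std::fprintf(stderr, "hypothetical_mutex_lock is unresolved\n");
        return -1;
    }
    return hypothetical_mutex_lock(mutex);
}
```

Code that skips the lock_available() style of check and calls straight through the weak reference is the kind that crashes at address 0.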
There are a few reasons why this issue was less than transparent.
- The error presented as a segfault in mt_allocator.h, rather than an undefined reference, which is usually an immediate sign that a library hasn’t been linked in.
- Our build system uses autoconf/automake, and we have it configured in a way which obscures the fact that we compile all dynamic libraries using -MT (although this is obvious when you watch the compiler output).
- The test harness we experienced the issue in was compiled without -MT, although it shared identical AM_CPPFLAGS settings with the library in question.
In the end, I found myself here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25409, and sure enough, adding -lpthread solved my problem. It sounds like they may have patched the mainline of libstdc++ to complain more about weakref’d symbols, which would be a good thing.
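For reference, the shape of the fix looks roughly like the sketch below. The file name host.cpp, the load_plugin helper and the plugin path are all assumptions for illustration, not our actual harness; the only part taken from the story above is that the executable doing the dlopen gets -lpthread on its link line.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Build (assumed file name host.cpp):
//   g++ host.cpp -o host -ldl -lpthread
// Linking the *executable* against libpthread means the pthread_*
// weak references inside the plugin resolve to real functions once
// the plugin is loaded.

// Hypothetical helper: opens a plugin and returns its handle,
// or nullptr (after printing the loader's error) on failure.
void* load_plugin(const char* path) {
    void* handle = dlopen(path, RTLD_NOW);
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    }
    return handle;
}
```

Running ldd on the resulting executable is a quick way to check which libraries it actually pulls in.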
So my beef is not so much with libstdc++ as with how we set up the build system. Having said that, this code used to work fine, so I suspect we’ve tripped over something that recently changed, either in our source or in the library.
So that’s some gratitude and some venting. This is getting long… the rant will get its own post I think.