Category Archives: c/c++


A while back, I wrote about cooperative threading libraries for C++. Coroutines are a closely related concept—coroutines and cooperative threads can be expressed/implemented in terms of each other.

Conspicuously absent from that coverage is Boost.Coroutine, which I’ll discuss here. The problem with Boost.Coroutine is that it was incomplete, and—last I checked—far from complete. I had spent some time trying to work with the author through its non-starter issues, as I was looking forward to using it in conjunction with Boost.Asio (this was one of Boost.Coroutine’s primary objectives), but the author has not had the time to take his work to the Boost formal review stage.

Cooperative threads for C/C++

Event-based programming is a PITA. In C/C++, there are a number of libraries that provide cooperative threads (a.k.a. fibers or lightweight threads or…), which can make single-kernel-threaded asynchronous programming less painful.

I’ve been using the solid, fast, and simple State Threads library. I haven’t used Pth, nor do I have a good sense of the performance difference, but that appears to be the only other option worth considering. There exist other libraries, and you can find links to most of them from the Pth links page, but they aren’t as useful out of the box. To mention a few:

  • libcoro, libcoroutine, and libpcl (which supersedes coro) don’t provide IO, only basic coroutine primitives, and don’t appear to be widley used or supported.
  • libtask provides IO, but the implementation uses the slower ucontext.h primitives, the IO functions aren’t very rich (they don’t appear to offer timeouts), and again the library doesn’t appear to be widely used or supported.

Pitfalls: tooling, composition, memory

Using something like this in your next project is always a bit of a risk. Your code will almost certainly be more natural to express and easier to maintain, but you may encounter incompatibilities with other systems/libraries/tools. For instance, one major issue is that ST is (or at least has been at some point in the past) incompatible with system threads. I haven’t stress-tested this, but mixing the two hasn’t produced any problems for me yet. (The problem, if you’re curious, appears to be due to both pthreads and ST attempting to use the same location to store thread-local data.) Other tools that I’ve found to not play nicely with ST include Valgrind and various CPU profilers (gprof, Google perftools, etc.). gdb works fine, though.

A downside to using cooperative threads instead of events is that you can’t compose together multiple asynchronous operations without using multiple stacks. E.g., if I wanted to listen for the first of n sockets to yield something to read, and I wanted to do this using only the provided primitive IO functions, I’d need to perform a read from each of n threads (and have the winner send interrupts to the losers). To do this more efficiently, I’d need to write my own IO functions, ones that cannot be expressed using the threaded abstraction, but must instead fall back to raw asynchronous operations.

In practice, I haven’t had to do much arbitrary composition. Composing IO operations with sleeps (in other words, adding timeouts to IO operations) is important, but that tends to be built in to the IO functions already provided by the threading library. Composing together inter-thread signals and IO operations is also important (e.g. for interrupts), but interrupts also tend to be already provided by the library.

Another downside is that events may be more scalable/flexible in terms of memory consumption—this is one of the main reasons why (system) threads are regarded as generally less scalable than events. The stack requires contiguous memory, whereas event state objects are dynamically heap-allocated, so you needn’t to worry about providing enough room in advance for each “stack.” Conservative over-estimates occupy more space than necessary, so when juggling hundreds of thousands of connections, you may exhaust main memory and incur excessive swapping (thrashing).

Variations: Tamer, Capriccio

Tamer is a preprocessor that lets you annotate functions that are expected to perform asynchronous operations as “tamed”. You also annotate which “stack” variables are used across these calls. Tamer transforms these tamed functions into objects with the appropriate callbacks, and it changes the “stack” variables into fields so that they persist across callbacks.

The Tamer approach lets you compose events without the stack expense, but every time you enter/exit a tamed function, a state object is dynamically allocated/freed. For another approach, Capriccio is a system from Berkeley that introduces a simple idea: use linked stacks, or dynamically growable stacks. I hope we eventually see this concept used in a maintained system that actually builds and is usable on modern platforms.

Also, Tamer’s annotation requirement is both a blessing and a curse: annotations clearly identify at what parts of your program you may be preempted, but of course this reduces flexibility, since you can’t (say) embed an asynchronous operation into certain places, such as inside an streambuf::underflow() (this is the method that pulls more data from the streambuf’s underlying stream source).

In an ideal world, you won’t need to annotate functions, but the fact that a function is asynchronous should be inferred by tools (starting with a root set of primitive asynchronous operations), so that your IDE can highlight which functions are asynchronous. This grants you the freedom to change what’s asynchronous and what isn’t without propagating lots of annotation changes throughout your code, but still remain cognizant of where you might be preempted and where you definitely won’t. This would only work given enough static assumptions, though—hiding asynchronous IO operations behind any form of dynamic dispatch/polymorphism would preclude an analysis that is both accurate and useful.

C++ physical design

A quote from a thread on the Boost user list describing C++ physical design as a trade-off against other, better-established qualities of code:

Whether or not precompiled headers lead to faster builds is besides the point, because they increase coupling.

The point of good physical design is to reduce coupling. Reduced coupling has many benefits beyond its effect on build times, in fact it often leads to slower “full rebuilds”. Also, good physical design sometimes requires compromises with logical design, type safety, or even performance.

The problem is that bad physical design has no measurable effects on anything until it’s too late. Once it becomes a problem, reducing physical coupling is very difficult. Boost with its header only nature is beyond that point, IMO.

Emil Dotchevski
Reverge Studios, Inc.


When working in C++, I occasionally find myself pining for a REPL like those for Python/Haskell/Scheme/[your language here]. Given the size and complexity of the language, I find it useful at times to explore certain nooks and crannies, and REPLs can be a quick way to try things out. It turns out that such a beast exists, and its name is CINT.

The craziest part about CINT is that this is a stand-alone interpreter, i.e. a full implementation of C++. Well, not quite full, but it implements a large subset of C++. CINT also implements a subset of system libraries from POSIX, Win32, and even the STL, but certainly not everything/with limitations. If you’re curious, check out the speeds on the Computer Language Benchmarks Game. This thing was clearly not built for speed.

Putting aside the sheer coolness factor, I haven’t actually used CINT much, and part of this is because (I believe) C++’s design is not amenable to interactive use. CINT does add a few language extensions which make it slightly more usable, including limited “type inference” and looser distinguishing of . from ->. However, the C++ test cases I’ve been coming up with are almost always at least several lines of code consisting of some function/class/template declarations—clumsier to input using readline.

There also exists a REPL for C: Evan Martin’s C-REPL. Unlike CINT, it’s a simpler program that uses incremental compilation tricks. Basically, every time you enter a line, C-REPL generates a source file, which then gets compiled into a shared library, which in turn is dynamically linked with and executed. As such, C-REPL supports the full language, system libraries, anything. And it’s fast. But it only works with C.

C trigraphs. WTF.

I added a “loud” but seemingly innocuous print statement to debug a C++ program:

cout << "WAITING!?!?!?!??!" << endl;

On compilation, I got:

$ g++ -Wall -c -o main.o
In file included from main.lzz:12,
warning: trigraph ??! ignored, use -trigraphs to enable

Turns out I used just the right combo of ?? and ! in my program. I had run into C trigraphs (gcc documentation).