

Following Up On The End Of The World

Being the end of the world and all, I figure I should go into a bit more detail, especially as omnifarious went as far as commenting on this life-altering situation.

He's unfortunately correct about a shared-everything concurrency model being too hard for most people, mainly because the average programmer has a lizard's brain. There's not much I can do about that. On that front, the issue might lie with operating systems rather than languages. We can fake it in our Erlang and Newsqueak runtimes, but really, we can only pile so many schedulers on top of each other and still convince ourselves that we make sense. That theme comes back later in this post...

omnifarious's other complaint about threads is that they introduce latency, but I think he's got it backward. Communication introduces latency. Threads let the operating system reduce the overall latency by letting other threads run whenever possible, instead of everything being stuck. But if you want to avoid the latency of a specific request, then you have to avoid communication, not threads. Now, the thing with a shared-everything model is that it's kind of promiscuous: not only is it tempting to poke around in memory that you shouldn't, but sometimes you even do it by accident, when multiple threads touch things that are on the same cache line (better allocators help with that, but you still have to be careful). More points in the "too hard for most people" column.
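To make the cache line accident concrete, here is a minimal C++ sketch of one way to keep per-thread data apart. It assumes a 64-byte cache line (common on x86-64, but not universal), and the names `PaddedCounter` and `count_in_parallel` are made up for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines. Check your target before relying on this.
constexpr std::size_t kCacheLine = 64;
constexpr std::size_t kThreads = 4;

// Each counter is aligned (and therefore padded) to a full cache line, so
// threads bumping neighbouring counters do not fight over the same line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};

static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");

// Runs kThreads workers, each incrementing only its own counter
// `iterations` times, and returns the sum of all counters.
long count_in_parallel(long iterations) {
    PaddedCounter counters[kThreads];
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < kThreads; ++i) {
        workers.emplace_back([&counters, i, iterations] {
            for (long n = 0; n < iterations; ++n)
                counters[i].value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();

    long total = 0;
    for (auto& c : counters) total += c.value.load();
    return total;
}
```

Without the `alignas`, the four counters would likely sit in one cache line and the threads would slow each other down even though they never touch the same variable.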

His analogy of memcached with NUMA is also to the point. While memcached is at the cluster end of the spectrum, at the other end, there is a similar phenomenon with SMP systems that aren't all that symmetrical; multi-cores add another layer, and hyper-threading yet another. All of this should emphasize how complicated it is to write a scheduler that does a good job of using all this properly, and I'm not particularly thrilled at the idea of having to do it myself when there are a number of rather clever people trying to do it in the kernel.

What really won me over to threading is the implicit I/O. I got screwed over by paging, so I fought back (wasn't going to let myself be pushed around like that!), summoning the evil powers of mlockall(). That's where it struck me that I was forfeiting virtual memory, at this point, and figured that there had to be some way that sucked less. To use multiple cores, I was already going to have to use threads (assuming workloads that need a higher level of integration than processes), so I was already exposed to sharing and synchronization, and as I was working things out, it got clearer that this was one of those things where the worst is getting from one thread to more than one. I was already in it, why not go all the way?

One of the things that didn't appeal to me in threads was getting preempted. It turns out that when you're not too greedy, you get rewarded! A single-threaded, event-driven program is very busy, because it always finds something interesting to do, and when it's really busy, it tends to exhaust its time slice. With a blocking I/O, thread-per-request design, most servers do not overrun their time slice before running into another blocking point. So in practice, the state machine that I tried so hard to implement in user-space works itself out, if I don't eat all the virtual memory space with huge stacks. With futexes, synchronization is really only expensive in case of contention, so that on a single-processor machine, it's actually just fine too! Seems ironic, but none of it would be useful without futexes and a good scheduler, both of which we only recently got.
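On the stack size point, here is a rough sketch of asking for a smaller stack with POSIX threads, written as C++. The handler and the function name `run_with_small_stack` are hypothetical, and the right size depends entirely on how deep your request handling goes.

```cpp
#include <pthread.h>
#include <cstddef>

// Hypothetical request handler: a real one would block in read()/write()
// on a socket. This one only records that it ran.
static void* handle_request(void* arg) {
    *static_cast<bool*>(arg) = true;
    return nullptr;
}

// Starts one thread with an explicit stack size and waits for it.
// Returns true if the thread was created and the handler ran.
bool run_with_small_stack(std::size_t stack_bytes) {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return false;

    bool handled = false;
    pthread_t tid;
    // Fails with EINVAL if stack_bytes is below PTHREAD_STACK_MIN.
    int rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0) rc = pthread_create(&tid, &attr, handle_request, &handled);
    pthread_attr_destroy(&attr);
    if (rc != 0) return false;

    pthread_join(tid, nullptr);
    return handled;
}
```

Calling `run_with_small_stack(256 * 1024)` gives the thread a 256 KiB stack instead of the platform default, which is often several megabytes.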

There's still the case of CPU-intensive work, which could introduce thrashing between threads and reduce throughput. I haven't figured out the best way to do this yet, but it could be kept under control with something like a semaphore, perhaps? Have it set to the maximum number of CPU-intensive tasks you want going, have them wait on it before doing work, and post it when they're done (or when there's a good moment to yield)...
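A minimal sketch of that semaphore idea in C++ follows. The `Semaphore` class is a plain mutex-and-condition-variable one written for the example, and `run_cpu_tasks` with its busy loop is a stand-in for real work, not a recommendation.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore built from a mutex and a condition variable.
class Semaphore {
public:
    explicit Semaphore(int permits) : permits_(permits) {}

    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return permits_ > 0; });
        --permits_;
    }

    void post() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++permits_;
        }
        cv_.notify_one();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int permits_;
};

// Starts `tasks` threads that each do a burst of CPU work, letting at most
// `max_cpu_tasks` of them compute at once. Returns the highest number of
// tasks seen inside the CPU-bound section at the same time.
int run_cpu_tasks(int tasks, int max_cpu_tasks) {
    Semaphore cpu_slots(max_cpu_tasks);
    std::atomic<int> running{0};
    std::atomic<int> peak{0};
    std::vector<std::thread> workers;

    for (int i = 0; i < tasks; ++i) {
        workers.emplace_back([&] {
            cpu_slots.wait();                 // wait for a CPU slot
            int now = ++running;
            int seen = peak.load();
            while (now > seen && !peak.compare_exchange_weak(seen, now)) {}

            volatile unsigned long sink = 0;  // placeholder CPU-bound work
            for (unsigned long n = 0; n < 200000; ++n) sink = sink + n;

            --running;
            cpu_slots.post();                 // give the slot back
        });
    }
    for (auto& w : workers) w.join();
    return peak.load();
}
```

With `run_cpu_tasks(8, 2)`, eight threads exist but no more than two are ever in the compute section together; the others sleep in `wait()` rather than competing for the CPU.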

omnifarious is right about being careful about learning from what others have done. Clever use of shared_ptr and immutable data can be used as a form of RCU, and immutable data in general tends to make good friends with being replicated (safely) in many places.
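Here is a rough C++ sketch of the shared_ptr-plus-immutable-data idea. It is RCU-like rather than real RCU: readers take a brief lock to copy a pointer, and the `ConfigHolder` and `Config` names are invented for the example.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

using Config = std::map<std::string, std::string>;

class ConfigHolder {
public:
    ConfigHolder() : current_(std::make_shared<const Config>()) {}

    // Readers get a reference-counted, immutable snapshot. The lock is held
    // only long enough to copy the pointer, never while reading the data.
    std::shared_ptr<const Config> snapshot() const {
        std::lock_guard<std::mutex> lock(ptr_mutex_);
        return current_;
    }

    // Writers copy the current snapshot, modify the copy, then publish it.
    // Readers still holding the old snapshot keep using it undisturbed.
    void set(const std::string& key, const std::string& value) {
        std::lock_guard<std::mutex> one_writer(write_mutex_);
        auto next = std::make_shared<Config>(*snapshot());
        (*next)[key] = value;

        std::shared_ptr<const Config> fresh = std::move(next);
        {
            std::lock_guard<std::mutex> lock(ptr_mutex_);
            current_.swap(fresh);
        }
        // `fresh` now holds the old snapshot; it is freed once the last
        // reader drops its reference.
    }

private:
    mutable std::mutex ptr_mutex_;   // guards the pointer only
    std::mutex write_mutex_;         // serializes writers
    std::shared_ptr<const Config> current_;
};
```

A reader that called `snapshot()` before a `set()` keeps seeing the old map until it asks for a new snapshot, which is the property that makes this safe to hand out across threads.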

One of the great ironies of this, in my opinion, is that Java got NIO almost just in time for it to be obsolete, while we had been doing this in C and C++ since, well, almost forever. Sun has this trick for being right, yet doing it wrong, it's amazing!


( 9 comments — Leave a comment )
May. 19th, 2008 07:32 am (UTC)

Claudia Vieira

Edited at 2008-05-19 07:34 am (UTC)
May. 19th, 2008 03:50 pm (UTC)
You can't just talk about the performance characteristics and slur over the incredible difficulty of programming with threads - only very rare apps need to push every last jot of performance out of the system, but everybody has to deal with the overly-difficult programming model. People just aren't going to give up C++ any time soon, so there's got to be a better way than "share everything explicitly and let the programmer worry about locking", without switching to Erlang or something. Most threading packages provide a thread-specific data construct, but it's heavy-weight and takes work to set up. Is there any way short of a compiler extension to make plain old heap-allocated data visible only to the thread that allocates it, and explicitly declare the shared data?
May. 19th, 2008 07:05 pm (UTC)
Note that what I am thinking about is not something for the common case, I am specifically thinking of applications that have to drink from the fire hose, with GigE interfaces to saturate, and so on. When I was promoting fewer threads, I was doing so because that was the way to get highest throughput, not necessarily because it was easier on the developer.

I agree with you that a shared-everything model isn't something that the average application programmer is able to deal with. The worst is that they often think that they are able to deal with it, and then... Whoa.

The mechanism for shared-nothing concurrency and various means of IPC (sockets/pipes, shared memory) have existed in Unix since, well, forever. But where are the users? Now, we gave them nuclear chainsaws, and they were all over it. What do you make of that?
May. 19th, 2008 07:08 pm (UTC)
Nothing is shared unless you declare it global and/or pass pointers to it around to different parts of your program. Sure, everything is there in your virtual address space but a module in your program is not going to read/write to random parts of memory just for fun; it will only access stuff through pointers that you explicitly passed to that module.

You're already avoiding global variables, defining simple interfaces between modules and making sure that the ownership relations between all your objects are crystal-clear in your single-threaded programs. Keep doing all that in a multi-threaded program and I think you'll find that there are very few places that will require synchronization at all, and that 99% of them will be a leaf in your call tree where no deadlock is even possible.
May. 19th, 2008 07:33 pm (UTC)
So, what you're saying, in summary, is that we're all doomed?
May. 19th, 2008 08:00 pm (UTC)
Not at all. I'm saying that anyone who already knows how to write a well-structured program will not be overly burdened by the little bit of extra work that needs to be done in a multi-threaded environment.

I suppose you could say that someone who was already doomed is going to suffer in all new and interesting ways with hard to reproduce bugs, but that would've been true if they had chosen an event-driven structure anyway.
May. 19th, 2008 10:14 pm (UTC)
I'm saying the latter. These days, I'm pointing at web browsers as a platform, if only because it limits the damage. I'm hoping for something like JavaScript to take over the world, because you can turn it into a functional language without people screaming at the parentheses...
May. 20th, 2008 12:28 am (UTC)

Are you trying to reward him for making posts about arcane technical matters most people wouldn't care a fig about? :-)

May. 20th, 2008 03:08 pm (UTC)
I didn't think of it that way.

I just figured that I should make pertinent comments. And since I am not sure what the post was about, I defaulted to the always pertinent scantily clad ladies...