Efficient I/O: working smart with slow byte movers


These days I feel deeply puzzled (not to say troubled) by a few aspects of our contemporary software architecture & development machinery.

There’s a lot of talk about efficient runtime & programming models in the context of contemporary operating systems’ support for so-called “non-blocking I/O” (read: vert.x, node.js, netty & the like), which basically means “tell me what you’re interested in and where, then I’ll call you back so you don’t have to stand there waiting for a slow guy like me”.

Actually, some smart people got the topic right, and in a clever way, a long time before this trend got hot: the Erlang guys. They built support for a few key features into the BEAM virtual machine and into the language:

  • Low-overhead, lightweight “processes” (i.e. non-OS threads)
  • Message-passing and queued message processing
  • Hot module swap hooks for continued operations (this actually has little to do with I/O, but it’s such a smart thing I couldn’t possibly leave it out)

Well, it turns out the first two enable a clever way to manage I/O efficiently, effectively decoupling the fast computation work from the potentially very slow grunt work of moving bytes over the network and to disk:

  • Many fast guys (i.e. the lightweight threads) do the “thinking” (computation) work, cooperating through async message-passing
  • A special guy (i.e. I/O process) handles slow byte-moving on a ticketing (i.e. still async messaging) discipline
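To make the ticketing idea concrete, here is a minimal Java sketch of it, using plain OS threads rather than Erlang processes. Everything in it is an assumption made for illustration: the class name, the `IoRequest` ticket, and `slowWrite`, which only sleeps to stand in for real network or disk work. The caller plays the "thinking" side and a single thread plays the I/O guy; any number of compute threads could post to the same mailbox.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

final class IoMailboxSketch {
    /** A "ticket": the data to move plus a future completed once the slow work is done. */
    static final class IoRequest {
        final String payload;
        final CompletableFuture<Integer> reply = new CompletableFuture<>();
        IoRequest(String payload) { this.payload = payload; }
    }

    /** Hypothetical stand-in for slow disk or network work: it only sleeps. */
    static int slowWrite(String payload) throws InterruptedException {
        Thread.sleep(10);
        return payload.length(); // pretend this many bytes were written
    }

    static List<Integer> run(List<String> inputs) {
        BlockingQueue<IoRequest> mailbox = new LinkedBlockingQueue<>();

        // The single "I/O guy": drains its mailbox and answers each ticket in turn.
        Thread io = new Thread(() -> {
            try {
                while (true) {
                    IoRequest req = mailbox.take();
                    req.reply.complete(slowWrite(req.payload));
                }
            } catch (InterruptedException stop) {
                // interrupted by run(): no more tickets to serve
            }
        });
        io.setDaemon(true);
        io.start();

        // The "thinking" side: do the computation, post a ticket, move on without waiting.
        List<IoRequest> tickets = new ArrayList<>();
        for (String in : inputs) {
            IoRequest req = new IoRequest(in.trim());
            mailbox.add(req);
            tickets.add(req);
        }

        // Only now collect the replies, in the order the tickets were posted.
        List<Integer> written = new ArrayList<>();
        for (IoRequest t : tickets) written.add(t.reply.join());
        io.interrupt();
        return written;
    }
}
```

Calling `IoMailboxSketch.run(Arrays.asList(" ab ", "cde"))` returns the byte counts for the trimmed inputs; the point is only that posting a ticket never waits for the slow part.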

The only disadvantage of this approach in its BEAM/Erlang incarnation is, as far as I know (but I’m no expert in that world, so please point it out if that’s not the case), that message-passing means copying information. I believe this is done defensively, in order to keep lightweight processes completely insulated from one another.
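The copy-for-isolation trade-off can be shown in a few lines of Java. This is only an analogy under my own assumptions (the `CopyOnSendSketch` class and its mailbox are invented for illustration) and says nothing about how the Erlang runtime implements it: the sender pays for a copy on every send, and in exchange the receiver can never observe the sender's later mutations.

```java
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class CopyOnSendSketch {
    private final BlockingQueue<byte[]> mailbox = new LinkedBlockingQueue<>();

    /** Enqueues a private copy, so the receiver never shares memory with the sender. */
    void send(byte[] message) {
        mailbox.add(Arrays.copyOf(message, message.length));
    }

    /** Returns the next message, or null if the mailbox is empty. */
    byte[] receive() {
        return mailbox.poll();
    }

    public static void main(String[] args) {
        CopyOnSendSketch box = new CopyOnSendSketch();
        byte[] data = {1, 2, 3};
        box.send(data);
        data[0] = 42; // the sender reuses its buffer after sending
        System.out.println(Arrays.toString(box.receive())); // prints [1, 2, 3]
    }
}
```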

In the more prosaic world of ours, daily JVM (or worse) dishwashers as we are, things are a bit less clever:

  • Early Java runtimes mostly used “green threads” (lightweight threads scheduled by the VM), not OS threads
  • Native OS threads then started to perform better, and in any case they worked better with the natively-implemented blocking I/O of that time (Java didn’t have any other kind), so JVMs began mapping Java threads to native (OS) threads
  • Nowadays the “non-blocking I/O plus green threads (or fibers)” combo, with the fibers scheduled on one OS thread per core (e.g. see the M:N model in http://en.wikipedia.org/wiki/Thread_(computer_science)#Models), would perform best; still, we are stuck with async I/O and OS threads only
  • As a result, event-driven I/O programming became hot again as a way to bridge the gap
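As a rough picture of the "many units of work on one OS thread per core" arrangement, here is a sketch using a plain fixed-size thread pool. It is an approximation and not real fibers: a task that blocks here blocks its whole OS thread, which is precisely the gap that fibers are meant to close. The class and method names are mine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ManyTasksFewThreadsSketch {
    /** Runs taskCount small tasks (M) on one OS thread per core (N) and sums their results. */
    static long sumOfSquares(int taskCount) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 1; i <= taskCount; i++) {
                final long n = i;
                futures.add(pool.submit(() -> n * n)); // a tiny, non-blocking computation
            }
            long total = 0;
            for (Future<Long> f : futures) total += f.get();
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With purely computational tasks like these the pool keeps every core busy without oversubscribing it; as soon as a task performs blocking I/O, the model degrades unless the I/O itself is non-blocking.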

Unfortunately, event-driven programming suffers from a few shortcomings (less so in functional languages, but still) from a readability and maintainability perspective. It basically mandates one of the following:

  • Breaking your program flow into callbacks, losing your stack in exchange for a new, contemporary code flatland of logic pieces spread everywhere and called in an ever-changing sequence (not to mention using global state to share information between them)
  • If you have them in a convenient enough form, nesting closures (and adding their accidental complexity) until you get mad and lost in either code indent or lambda-fest (or both)
  • Using non-blocking control flow variables and attached listeners (promises) even when you would happily trade them for simpler code
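The difference between the last two styles is easier to see side by side. In the sketch below, `fetchUser` and `fetchGreeting` are hypothetical callback-style operations that complete immediately so the example stays self-contained; real ones would call back later, from another thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

final class CallbackStylesSketch {
    // Hypothetical async operations; they invoke the callback right away for simplicity.
    static void fetchUser(int id, Consumer<String> callback) {
        callback.accept("user" + id);
    }

    static void fetchGreeting(String user, Consumer<String> callback) {
        callback.accept("hello, " + user);
    }

    /** Nested closures: every further step adds another level of indentation. */
    static void greetNested(int id, Consumer<String> done) {
        fetchUser(id, user ->
            fetchGreeting(user, greeting ->
                done.accept(greeting.toUpperCase())));
    }

    /** Promises: flatter, but the flow is still expressed as attached listeners. */
    static CompletableFuture<String> greetPromise(int id) {
        CompletableFuture<String> user = new CompletableFuture<>();
        fetchUser(id, user::complete);
        return user
            .thenCompose(u -> {
                CompletableFuture<String> greeting = new CompletableFuture<>();
                fetchGreeting(u, greeting::complete);
                return greeting;
            })
            .thenApply(String::toUpperCase);
    }
}
```

Neither version reads like the three sequential statements the logic really is, and that is the whole complaint.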

Now let’s see if Java I/O has been implemented in a pluggable fashion, like SPI (Service Provider Interface) or such already used for e.g. the crypto stuff, so that we can now implement something more in the vein of current times… Er… No.

Still, hope won’t die easily. The JVM has what is called “bytecode instrumentation”, which basically means the ability to plug in code that manipulates compiled classes at load time. A few clever JVM guys (http://www.malhar.net/sriram/kilim/, http://blog.paralleluniverse.co/post/49445260575/quasar-pulsar, http://www.matthiasmann.de/content/view/24/26/) started putting together an interesting toolkit for efficient concurrent and I/O programming on the JVM:

  • A low-level library with interfaces that define and mark implementations (and client code) of lightweight threads or fibers, usually through special marker exceptions or annotations
  • A bytecode instrumenter
  • Higher-level libraries that define concurrent constructs borrowed from other runtimes, like Erlang and Go, and implement them on top of the above two
  • Quasar (for JVM) and Pulsar (for Clojure specifically) also provide NIO-compatible network and file channels that turn Java NIO’s async I/O into sequential fiber-blocking calls
  • Quasar (for JVM) and Pulsar (for Clojure specifically) also provide utility classes that help turn async callbacks into sequential fiber-blocking calls
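The last item, turning a callback into a sequential call, can be sketched without any fiber library. The version below blocks an ordinary OS thread, which is exactly the cost that fibers avoid, so treat it as an illustration of the shape of the adapter and not of Quasar's actual utility classes. `readAsync` is a hypothetical callback-style API invented for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class BlockingAdapterSketch {
    /** Hypothetical callback-style API: delivers its result later, on another thread. */
    static void readAsync(String key, Consumer<String> onDone) {
        CompletableFuture.runAsync(() -> onDone.accept("value-of-" + key));
    }

    /** Sequential facade: registers the callback, then parks the caller until it fires. */
    static String read(String key) {
        CompletableFuture<String> result = new CompletableFuture<>();
        readAsync(key, result::complete);
        try {
            return result.get(5, TimeUnit.SECONDS); // here a thread waits; a fiber library would park a fiber
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Client code can then simply write `String v = BlockingAdapterSketch.read("a");` and keep its stack and its sequential flow.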

Unfortunately the instrumentation approach makes it difficult to change existing (i.e. unmarked) code; for this reason you can’t expect, for example, to turn Quasar’s channels into Java streams and run traditional blocking I/O code on them.
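For reference, the JVM's load-time hook is the `java.lang.instrument` API. The sketch below is a deliberately empty instrumenter under my own assumptions: the `MARKED_PREFIX` package is hypothetical, and selecting classes by package is a simplification of the marker exceptions and annotations mentioned above, not how any of these libraries actually decides. It only shows why code outside the marked set comes out of the agent untouched.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class MarkedOnlyAgentSketch implements ClassFileTransformer {
    /** Hypothetical marker: only classes under this package are candidates for rewriting. */
    static final String MARKED_PREFIX = "com/example/fibers/";

    /** Names of the classes this transformer would have rewritten. */
    final List<String> candidates = Collections.synchronizedList(new ArrayList<>());

    /** Entry point when the JVM starts with -javaagent (the jar needs a Premain-Class manifest entry). */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new MarkedOnlyAgentSketch());
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        if (className == null || !className.startsWith(MARKED_PREFIX)) {
            return null; // unmarked code, such as java/io/..., is left as it was
        }
        candidates.add(className);
        // A real instrumenter would rewrite classfileBuffer here, usually with a bytecode library.
        return null; // returning null tells the JVM "no change"
    }
}
```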

I’m especially looking into the latter (Quasar and Pulsar) to build high-performance, concurrent, I/O-based servers for the JVM. I think the next challenges will be to rebuild JDK-like I/O facilities on top of them and to wrap existing async toolkits in fiber-blocking APIs.

If only we had a smarter, high-performance runtime with the strong points of both the JVM and BEAM, a well-documented one that would allow both traditional data-sharing concurrency and share-nothing message-passing, both lightweight and OS threads, and hot code management and replacement of course… Building it could possibly be one of the most fundamental, and most rewarding, challenges of our software architecture times.
