XML as a canary in the mine: can Intel ISPC help stagnant C get its mojo back?

Posted on March 30, 2018 by Rick Jelliffe

JSON is the best thing that ever happened to XML. It whack-a-moles most of the bright new ideas people have for shoehorning XML into applications XML is not a good fit for. Hooray. (Which is not to say JSON is an unmitigated success!) XML is getting used for what it is good for, which is largely what SGML used to be good for but too complex for. JSON is popular because of convenience (which I do not dispute, especially given XML’s confusing variety of APIs on different platforms) and performance. The poor performance is often regarded as intrinsic to XML’s relative complexity. But I think there is another major cause: modern programming languages do not make it possible to utilize the capacity of modern CPUs well. XML parsing and processing is trapped, handled one thing at a time, like sheep being counted in a race.

I am really impressed with Intel’s ISPC: this is a compiler and language for exposing more of the capabilities of modern CPUs. It is a dialect of C with its own standard library. They have not yet made other libraries available, so my guess is Intel sees its sweet spot as compiling high-efficiency function libraries that you then plug into your C or C++ code.

ISPC is the Intel SPMD Program Compiler (open source, on GitHub), and SPMD stands for Single Program, Multiple Data. The basic idea is to trim and extend C to support the major optimizations available in modern (i.e. Intel) CPUs: multiple cores, SIMD data and instructions, cache and prefetch instructions, and alternative layout options for structs to support better cacheline access patterns. Of course, many of these are also available through “intrinsics” libraries (or other features such as u32x4 datatypes in Rust), but this is the first time I have seen a solid effort produce a plausible and cohesive result. (You can supply different targets for which CPU SIMD features you want to support: SSE2, SSE4, AVX etc.) ISPC’s best benchmark case shows a 240x speed improvement over single-threaded C on a bare machine: impressive in the modern cloudy world, where we often need to optimize latency more than throughput rate.
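To make this concrete, here is a minimal sketch of an ISPC kernel (my own illustrative example, not Intel’s benchmark code). An export function is callable from ordinary C or C++; uniform values are shared across the whole gang of program instances, plain (“varying”) values have one instance per SIMD lane, and foreach spreads iterations across those lanes:

    // simple.ispc -- a minimal sketch, not Intel's benchmark code
    export void scale(uniform float vin[], uniform float vout[],
                      uniform int count) {
        // Iterations are mapped across the gang's SIMD lanes; the body
        // compiles to vector instructions for the chosen target.
        foreach (i = 0 ... count) {
            float v = vin[i];      // varying: one value per lane
            vout[i] = v * 2.0f;
        }
    }

You compile it with something like ispc --target=avx2 simple.ispc -o simple.o -h simple.h, then link and call scale() from plain C or C++: exactly the plug-in-function-library pattern I am guessing Intel has in mind.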

If you understand, say, GPU programming, you might recognize that ISPC uses a similar parallel model based on generating masks for conditionals, but at the much smaller grain of the SIMD array size rather than the larger size of the GPU warp.
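As an illustration (again my own fragment, not from the ISPC documentation), a conditional inside a foreach does not have to branch: where the lanes of the gang disagree, the compiler emits masked operations so that both sides of the conditional can execute, with only the appropriate lanes storing results, just as a GPU does across a warp:

    // Sketch: divergent control flow becomes a lane mask, not a branch.
    export void clamp_negatives(uniform float x[], uniform int n) {
        foreach (i = 0 ... n) {
            if (x[i] < 0.0f) {
                // Runs under a mask: only lanes where the condition
                // holds actually write the result back.
                x[i] = 0.0f;
            }
        }
    }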

I think there are basically four thriving classes of computer languages (of course they mix):

  • “I wanna be LISP when I grow up”. General purpose languages where execution efficiency is not the primary concern. These inevitably converge on LISP’s feature set, modern JavaScript being a prime example: garbage collection, evaluation of generated text as programs, nested and functional structures, lambdas, etc.
  • The Children of Simula. Languages where modelling (i.e. finding syntactic ways to express complex things more easily) is important. These take up Simula’s mantle: Java, C++, Objective C, C# and so on: languages that have classes, agents, generics, annotations, iterators, modules, and so on.
  • “Everything is a …” Languages which explore one particular idea to its limit: in Lua everything is a table, in Erlang everything is a process, in SQL everything is a relational table, in XSLT everything is XML, in OmniMark everything is a shelf (keyed stack of arrays of streamable variables), in assembler everything is a particular CPU, in a spreadsheet everything is in a cell, in Prolog everything is a fact, and so on. (Of course, in LISP, everything is a list too…)

My bugbear is a language that pretends to be of the second class but is actually of the third. This is where the language designer (it is excusable in an academic) has some brilliant unifying feature and wants to squeeze everything into it: “you don’t need X, Y, Z, let alone A, B, C, because you can do them with lazy polymorphic, intensional, hyper-transitive monads” or whatever. Scala, Rust and Pony often smack of this, to me. For the developer, it means an abrupt switch of mentality from fluency to bafflement.

  • “Don’t tell me no”. This is C (and Ada too). According to Dennis Ritchie: “BCPL, B, and C all fit firmly in the traditional procedural family typified by Fortran and Algol 60. ... They are ‘close to the machine’ in that the abstractions they introduce are readily grounded in the concrete data types and operations supplied by conventional computers, … At the same time, their abstractions lie at a sufficiently high level that, with care, portability between machines can be achieved.”

But I don’t think C is actually thriving: I think it has been dead for 25 years, because it simply is no longer grounded in the concrete data types and operations of conventional machines. SIMD. Multicore. Multiple levels of caches. GPUs. And it fails to provide abstractions that give portability between machines, given that the main variants of machines now are different x86 systems and GPUs.

My first professional job was to write a cross-assembler in PDP-11 assembler language. This is the same CPU that C was developed for. What C does is abstract the PDP-11 (specifically, the PDP-11/20), and this translates well to similar CPUs: the VAX, 6800, 68000, SPARCs, and (with some memory-model fudges) the 8086 too. But that technology is 45 years old now.

Let’s look at XML parsing in particular. Do you know that modern Intel processors have instructions specifically designed to allow faster parsing of XML? For 10 years, starting with SSE4.2, the String and Text Processing Instructions have been in production. But (tell me if anything has changed) will your compiler generate them? Will your Java JIT compiler generate them? The answer is no.

Actually, the answer is more like “if you have called a library function that has optimized versions for different CPUs, the library may use them”. And there is more chance for some numeric operations, since most of the SIMD instructions are targeted at those uses. But that does not change the basic answer. Your own compiled code will probably not make use of SIMD instructions and the like unless you have specifically called intrinsic functions (no good for Java etc.), or put in bits of assembler, or happened to lay out your code in some way that LLVM (or whatever compiler backend you are using) happens to recognize, so it can do something.
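To show what the intrinsics escape hatch looks like, here is a hedged sketch in plain C (my own code and function name, not taken from any real parser). It uses SSE4.2’s PCMPISTRI instruction to find the next XML-significant byte 16 bytes at a time, and it assumes a NUL-terminated buffer that is safe to over-read in 16-byte chunks, which a real parser would guarantee by padding its buffers:

    /* An illustrative sketch; compile with -msse4.2. */
    #include <nmmintrin.h>   /* SSE4.2 string and text intrinsics */
    #include <stddef.h>
    #include <string.h>

    /* PCMPISTRI control: unsigned bytes, "equal any" aggregation,
       lowest matching index. Must be a compile-time constant. */
    enum { MODE = _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY
                  | _SIDD_LEAST_SIGNIFICANT };

    /* Offset of the first '<', '&' or '>' in s, or of the final NUL. */
    static size_t next_markup(const char *s)
    {
        /* The needle set, padded to 16 bytes; PCMPISTRI treats both
           operands as implicitly NUL-terminated. */
        const __m128i needles =
            _mm_loadu_si128((const __m128i *)"<&>\0\0\0\0\0\0\0\0\0\0\0\0");
        size_t off = 0;
        for (;;) {
            __m128i chunk = _mm_loadu_si128((const __m128i *)(s + off));
            if (_mm_cmpistrc(needles, chunk, MODE))    /* any match? */
                return off + _mm_cmpistri(needles, chunk, MODE);
            if (_mm_cmpistrz(needles, chunk, MODE))    /* chunk has a NUL */
                return off + strlen(s + off);
            off += 16;
        }
    }

The scalar equivalent is strcspn(s, "<&>"), and a good libc may well use this very trick inside strcspn: that is the “optimized library function” loophole mentioned above, and it does nothing for the great bulk of parsing code that is not a libc hot spot.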

Now notice that in the first three language classes, the emphasis is really that optimization is a distraction: you select good data structures and algorithms, and it is the compiler’s job to be ever smarter at figuring out what is good for the CPU.

My problem is that this attitude is also what has guided C’s development over the last few decades, and consequently C as a language has stagnated. There has been development by adding ideas from the other classes of language: that is what C++ is. But C’s core genius is being close to the machine while providing an abstraction that allows porting between common systems, and it simply has not kept up.

Language designers, and the C Programming Language Committee, should look seriously at ISPC. The rise of JSON is not only because it fits web application developers’ requirements for a format for sending data structures over the wire better than XML does. It is also because XML parser writers and other library and application developers are simply not in a position (due to where we are in the food chain) to utilize the instructions in CPUs, available for the last decade, that improve text processing and parsing. Languages need to expose them as first-class abstractions, data types and control structures: ISPC’s gangs, its SoA (structure-of-arrays) layouts and so on really hit the nail on the head better than anything I have seen yet, as the sketch below suggests.
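For instance (my own example, not from the ISPC manual), ISPC’s soa<N> qualifier turns an array of structs into interleaved structure-of-arrays storage, so that reading one field across the gang becomes a contiguous vector load rather than a strided gather:

    // Sketch of ISPC's structure-of-arrays qualifier.
    struct Point { float x, y, z; };

    // soa<8> lays the array out as x[8], y[8], z[8], x[8], ... so that
    // pts[i].x across eight program instances is one contiguous load.
    uniform float total_x(soa<8> Point pts[], uniform int n) {
        float partial = 0.0f;           // per-lane partial sums
        foreach (i = 0 ... n) {
            partial += pts[i].x;
        }
        return reduce_add(partial);     // stdlib cross-lane reduction
    }

This is the cacheline-friendly struct layout option mentioned earlier, expressed as a type qualifier rather than as hand-rolled pointer arithmetic.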

I can see that the other classes of programming languages have some excuse for being similarly deficient: it is not their shtick. But it clearly was C’s shtick, and the reason for C’s success, and C’s stagnation is positively getting in the way now.