What's in Java 13-16 for XML Developers?

Java SE 16 came out in March. I like this "cadence", as Oracle calls its non-chaotic, predictable, small-scale release cycle. As with 9, 10, and 11, here is a quick summary of the biggest changes that relate to XML development.

There are a few general-purpose features of interest to any Java developer: switch expressions, records, and pattern matching for instanceof, which are catch-ups with competitor languages, plus a changing of the guard for garbage collectors. But I think the additions of direct interest to XML developers are these two: text blocks and Vectors.

Text Blocks

Text blocks are to Java what CDATA marked sections are to XML: they provide a mode with different and fewer delimiter mappings, so you can directly embed text that would otherwise require extra escaping and be difficult to read or write.

A text block is just syntactic sugar: it produces a normal Java String, so all the usual String methods and operators work on it.

The delimiter used to enter this mode is three double quotes, """, as in Groovy, Kotlin, Python, Scala, and Swift, rather than, say, the backtick ` of JavaScript and Go.

Such as

          System.out.println( """
              <x> (abc * d; exit; "stinkhorn" fungus
              in football shape</x>
              """ );

Which is equivalent to

          System.out.println( "<x> (abc * d; exit; \"stinkhorn\" fungus\nin football shape</x>\n" );

It does not have to be markup; it can be any text. But it obviously has a prime use for embedded markup.

In a fashion reminiscent of SGML's rules for removing certain newlines "attributable to markup", incidental whitespace is removed: the common leading whitespace is stripped from each line (determined by the least-indented line, including the line holding the closing delimiter), and line terminators are normalized to \n. So you can auto-indent your code without adding whitespace to the text block.
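A minimal sketch of the stripping rule (the class and element names here are just illustrative): the column of the closing delimiter sets how much leading whitespace is removed from every line.

```java
public class TextBlockIndent {
    // The closing delimiter's indentation determines the common
    // leading whitespace stripped from all lines of the block.
    static String xml() {
        return """
                <greeting>
                  hello
                </greeting>
                """;
    }

    public static void main(String[] args) {
        System.out.print(xml());
        // prints:
        // <greeting>
        //   hello
        // </greeting>
    }
}
```

The two extra spaces before "hello" survive, because only the whitespace common to every line (relative to the closing delimiter) is considered incidental.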

As in some other programming languages, a \ at the very end of a line suppresses that newline, joining the line with the next one; conversely, the \s escape preserves trailing spaces that would otherwise be stripped.
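A small sketch of the line-joining behaviour (the class name is just illustrative):

```java
public class TextBlockJoin {
    // A trailing backslash suppresses the newline, so the two
    // source lines become one line in the resulting String.
    static String joined() {
        return """
                Hello, \
                world
                """;
    }

    public static void main(String[] args) {
        System.out.print(joined());   // Hello, world
    }
}
```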

Text blocks were previewed in Java 13 and 14, and are fully baked as a standard feature from version 15.


Vectors

Vectors are a way to better support SIMD (Single Instruction, Multiple Data) operations. The Vector API was introduced as an incubator feature in release 16, so it is not hard-baked yet.

A Vector lets you gather, say, a run of a larger array of primitive data (bytes, ints, etc.) into a fixed-size chunk (e.g. 128 bits) and then run operations on it lane-wise. If you stick to simple operations, the JIT compiler can convert this to SIMD instructions.
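A rough sketch of the idea in plain Java. (The incubating API itself lives in jdk.incubator.vector and needs --add-modules jdk.incubator.vector on JDK 16; with it, the chunked loop body below becomes an explicit IntVector.fromArray(...).add(...).intoArray(...).) Here the array is processed in fixed-width chunks of simple element-wise operations, which is exactly the shape the compiler can turn into SIMD instructions:

```java
public class LanewiseAdd {
    static final int LANES = 4; // e.g. four ints fit a 128-bit register

    // c[i] = a[i] + b[i], in LANES-wide chunks plus a scalar tail.
    static int[] add(int[] a, int[] b) {
        int[] c = new int[a.length];
        int i = 0;
        int upper = a.length - (a.length % LANES);
        for (; i < upper; i += LANES) {
            // Simple, branch-free work over a fixed-width chunk:
            // the pattern the Vector API lets you state explicitly.
            for (int lane = 0; lane < LANES; lane++) {
                c[i + lane] = a[i + lane] + b[i + lane];
            }
        }
        for (; i < a.length; i++) { // leftover elements, one at a time
            c[i] = a[i] + b[i];
        }
        return c;
    }

    public static void main(String[] args) {
        int[] r = add(new int[]{1, 2, 3, 4, 5},
                      new int[]{10, 20, 30, 40, 50});
        System.out.println(java.util.Arrays.toString(r));
        // [11, 22, 33, 44, 55]
    }
}
```

The point of the API is that, instead of hoping the JIT auto-vectorizes a loop like this, you write the chunked operations directly and portably, and the compiler maps them to whatever vector width the CPU has.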

So Vectors are an admission that you cannot treat parallelization of a program as just an optimization (GPUs such as NVIDIA's are a prime example of this): instead you have to have parallelization in mind when you design your code, and then try to write natural code using this feature.

Why is this relevant to XML developers? It should be very relevant to people who write parsers or XPath processors, and indeed to anyone doing non-parsing text processing on documents. For example, it opens the door to optimized UTF-8 to UTF-16 conversion.

Will programming languages morph into and replace markup languages?

Over 30 years ago I worked at TI, supporting their expert systems products. They had a CPU that ran LISP natively (i.e. it had a type-tagged instruction set): the TI Explorer, a commercialization based on the chips from LISP Machines. In those days, TI made computers too.

It was a really interesting system (almost zero reboot time!), but it arrived at the tail-end of the initial AI hype cycle, just as expert systems were dying (because of the knowledge-capture problem) and just before neural nets resurged to their current prominence, where when people say AI they mean neural-net and other trained or self-trained systems only.

Neural nets were pooh-poohed then, but actually thrived under the guise of the modulation schemes used for data communications: determining which point in a constellation had been received (in phase-shift keying or quadrature amplitude modulation, for example).

And expert systems have not gone away: in fact, Schematron is an expert system of the simplest type. More complex expert systems can use higher-order logic, unification, and so on, but I have never seen much applicability of those for validation.

Getting to my point: back then I spoke with one of the big luminaries, who said he thought that AI/LISP would die as a thing in its own right, but its ideas would be everywhere under the hood. He thought AI would win by losing. The examples he gave included classes and objects, garbage collection, arbitrary-precision arithmetic, and functional programming, so he was mostly right! Of course, now he could add generics and lambdas.

Now that most major general-purpose languages have, in effect, CDATA sections (via text blocks), and also have, in effect, processing instructions or PI-typed attributes (don't worry: it was an SGML thing) via annotations, and also support Unicode, are we seeing the same thing as with AI/LISP? Scala has taken it to the greatest extreme, with its extensible syntax: you can embed semi-XML directly and have the Scala parser parse it!

As programming languages get more of the kinds of features that only markup languages previously had, even if just syntactical sugar, does this mean that we will need markup languages less and less?

I would say no, currently, for the technical reason that I think embeddable islands of markup are not enough. However, if you had a bi-modal system that allowed free interleaving of program code and markup in a stack, then that would, I think, be a big challenge. (I suppose this is what syntaxes like early PHP allowed: just plonk in JavaScript with some special delimiter and away you go.)