Serialization format for your new webscale microservice architecture in the cloud.

Wanna look like you think different? Why use XML and JSON yet again? That’s too mainstream. Let’s invent our own format, like no one else does!

Escaping and human-readability

You know what sucks about these language-independent data formats? They suck. They are not really human-readable, are hard to serialize data with, and not that easy to write correct parsers for, either. Also, too mainstream.

Let’s assume there are two computers -- A and B. They only know what types of data they are exchanging and what data to expect from each other. Humans debugging the software running on these two computers need to be able to read what’s being transferred back and forth, without having to use any tools (for pretty-printing or parsing).

(De)serialization on the software side should be as simple and as fast as possible. It should be straightforward to write a parser/serializer in awk, to filter lines with grep, and to replace values with sed.

The solution

<data>  ::= <line> | <line> <data>
<line>  ::= <key> [\t] <value> [\n]
<key>   ::= [^\t\n]+
<value> ::= [^\n]*

Only UTF-8 is allowed.

And that’s it! As simple as possible. No escaping bullshit. One line, one key, one value! Highly readable!

If you really want newlines in your value, use lists instead.

You can have lists like [1, 2, 3] or ["<I'm\tbored>", "\"Привет!\"", "No."]: just repeat the key, one line per element:

numbers	1
numbers	2
numbers	3

what	<I'm	bored>
what	"Привет!"
what	No.

I WANT NESTED OBJECTS!11

Just as with lists, instead of forcing “objects” on our format itself, let’s put them on a different layer.

<key>              ::= [^\t\n]+

<nested-key>       ::= <nested-key-level> | <nested-key-level> [ ] <nested-key>
<nested-key-level> ::= [^ \t\n]+

So, a nested key is a key in which the space character separates the levels of an object. Let’s see what a list of objects could look like:

human	
human age	45
human name	John Snow
human	
human name	William Budd
human age	68
human	
human age	57
human name	Yoseph Thomas Clover

This data forms a list of objects:

[ Human{name: "John Snow", age: 45},
  Human{name: "William Budd", age: 68},
  Human{name: "Yoseph Thomas Clover", age: 57}
] :: [Human]

Each object starts with a human\t line here; it’s like having key="human" and value="".
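
To make the “different layer” concrete, here is a minimal sketch of how already-split key/value pairs could be folded into a list of structs. The names struct human, struct human_list and human_feed, and the fixed capacities, are assumptions made for illustration, not part of the format.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HUMANS 16   /* assumption: a small fixed capacity keeps the sketch short */
#define MAX_NAME   64   /* assumption: longer names are silently truncated */

struct human {
    char name[MAX_NAME];
    int  age;
};

struct human_list {
    struct human items[MAX_HUMANS];
    size_t       count;
};

/*
 * Feed one already-split key/value pair into the list.
 * The key "human" opens a new object; "human name" and "human age"
 * fill in the most recently opened one.
 * Returns 0 on success, -1 on an unknown key, a field that arrives
 * before any object was opened, or a full list.
 */
int human_feed(struct human_list *list, const char *key, const char *value)
{
    assert(list && key && value);

    if (strcmp(key, "human") == 0) {
        if (list->count == MAX_HUMANS)
            return -1;
        memset(&list->items[list->count++], 0, sizeof(struct human));
        return 0;
    }
    if (list->count == 0)
        return -1;

    struct human *h = &list->items[list->count - 1];
    if (strcmp(key, "human name") == 0) {
        snprintf(h->name, sizeof h->name, "%s", value);
        return 0;
    }
    if (strcmp(key, "human age") == 0) {
        h->age = atoi(value);   /* no validation: garbage in, 0 out */
        return 0;
    }
    return -1;
}
```

Field order inside an object does not matter here, which is why the second human above can list the name before the age.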

But that format is not for “BigData®”

It’s not. But you don’t have “big data” either. If you are concerned that much about the number of bytes being transferred, use compression.

How is it simple and fast?

(Simplified) parser in C.
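
A minimal sketch of such a parser, not a definitive implementation. The names parse_line, parse_stream and kv_handler, and the 4096-byte line limit, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: invoked once per successfully parsed line. */
typedef void (*kv_handler)(const char *key, const char *value, void *ctx);

/*
 * Parse one line in place: strip the trailing newline, then split at
 * the FIRST tab. Any further tabs belong to the value.
 * Returns 0 on success, -1 if the line has no tab or the key is empty.
 */
int parse_line(char *line, char **key, char **value)
{
    assert(line && key && value);

    line[strcspn(line, "\n")] = '\0';   /* cut at the newline, if any */

    char *tab = strchr(line, '\t');
    if (tab == NULL || tab == line)
        return -1;

    *tab = '\0';                        /* terminate the key */
    *key = line;
    *value = tab + 1;                   /* may be an empty string */
    return 0;
}

/*
 * Read a whole stream and call the handler for every valid line.
 * Returns the number of malformed lines that were skipped.
 */
int parse_stream(FILE *in, kv_handler handler, void *ctx)
{
    char buf[4096];   /* assumption: every line fits in this buffer */
    int bad = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        char *key, *value;
        if (parse_line(buf, &key, &value) == 0)
            handler(key, value, ctx);
        else
            bad++;
    }
    return bad;
}
```

Calling parse_stream(stdin, handler, NULL) with a handler that prints its arguments is enough to eyeball a file.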

If you care much about the speed of comparison between the key you’ve got and the ones you defined (to deserialize into a structure field), don’t worry. strcmp is really fast.

(Simplified) serialization in C.
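
A minimal sketch of the serializing side, under the assumption that the caller hands over a buffer and writes it out later. The name format_pair is hypothetical. Since there is no escaping, the only honest thing to do with an unrepresentable pair is to refuse it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one "key<TAB>value<NEWLINE>" line into buf.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the pair cannot be represented in the format (empty key,
 * tab or newline in the key, newline in the value) or does not fit.
 * An empty value is accepted, as in the "human" lines above.
 */
int format_pair(char *buf, size_t size, const char *key, const char *value)
{
    assert(buf && key && value);

    if (key[0] == '\0' || strpbrk(key, "\t\n") != NULL)
        return -1;
    if (strchr(value, '\n') != NULL)
        return -1;

    int n = snprintf(buf, size, "%s\t%s\n", key, value);
    if (n < 0 || (size_t)n >= size)
        return -1;      /* encoding error or truncated output */
    return n;
}
```

A list is then just a loop that calls format_pair with the same key for every element.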

Conclusion

Wow.

Last update: January 10, 2020 04:32PM

simple serialization hipster why not a javascript library I need a smoothie

anonymous
September 30, 2020 10:04PM
I use something like it for the input of a program of mine, except that for nested objects I indent a line with a leading tab.