
Serialization format for your new webscale microservice architecture in the cloud.

Wanna look like you think different? Why use XML and JSON yet again? That’s too mainstream, let’s invent our own, like no one else does!

Escaping and human-readability

You know what sucks about these language-independent data formats? They suck. They are not really human-readable, it is hard to serialize data with them, and it is not that easy to write correct parsers for them, either. Also, too mainstream.

Let’s assume there are two computers – A and B. They only know what types of data they are exchanging and what data to expect from each other. Humans debugging the software running on these two computers need to be able to read what’s being transferred back and forth, without having to use any tools (for pretty-printing or parsing).

(De)serialization on the software side should be as simple and as fast as possible. It should be straightforward to write a parser/serializer in awk, to filter lines with grep, and to replace values with sed.

The solution

<data>  ::= <line> | <line> <data>
<line>  ::= <key> [\t] <value> [\n]
<key>   ::= [^\t\n]+
<value> ::= [^\n]*

Only UTF-8 is allowed.

And that’s it! As simple as possible. No escaping bullshit. One line, one key, one value! Highly readable!

If you really want newlines in your value, use lists instead.

You can have lists like [1, 2, 3] or ["<I'm\tbored>", "\"Привет!\"", "No."], just repeat the key once per element:

numbers 1
numbers 2
numbers 3

what    <I'm    bored>
what    "Привет!"
what    No.
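
Reading such a list back is a matter of picking every line that carries the key. A minimal sketch in C, assuming the values are base-10 integers; the name kv_collect_longs is made up for this example and is not part of the format:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Collect the values of every "key\tvalue" line whose key equals `key`,
 * parsed as base-10 integers, into out[0..cap). The input is not modified.
 * Returns the number of matching lines; only the first `cap` are stored.
 */
static size_t kv_collect_longs(const char *data, const char *key,
                               long *out, size_t cap)
{
    size_t klen = strlen(key);
    size_t n = 0;
    const char *line = data;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');

        /* A match is the key followed immediately by the tab separator. */
        if (strncmp(line, key, klen) == 0 && line[klen] == '\t') {
            if (n < cap)
                out[n] = strtol(line + klen + 1, NULL, 10);
            n++;
        }

        if (end == NULL)        /* last line had no trailing newline */
            break;
        line = end + 1;
    }
    return n;
}
```

Called with the data above and the key "numbers", it fills the array with the three numbers in the order they appear. Lines with other keys, and blank lines, are simply passed over.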

I WANT NESTED OBJECTS!11

Just as with lists, instead of forcing “objects” on our format itself, let’s put it on a different layer.

<key>              ::= [^\t\n]+

<nested-key>       ::= <nested-key-level> | <nested-key-level> [ ] <nested-key>
<nested-key-level> ::= [^ \t\n]+

So a nested key is a key in which a space character separates the levels of an object. Let’s see what a list of objects could look like:

human   
human age   45
human name  John Snow
human   
human name  William Budd
human age   68
human   
human age   57
human name  Yoseph Thomas Clover

This data forms a list of objects:

[ Human{name: "John Snow", age: 45},
  Human{name: "William Budd", age: 68},
  Human{name: "Yoseph Thomas Clover", age: 57}
] :: [Human]

Each object starts with a human\t line here; it’s like having key="human" and value="".
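
Rebuilding that list can be sketched in C as a function that is fed one key/value pair at a time. This assumes the lines have already been split at the tab; struct human, struct human_list, MAX_HUMANS and human_list_feed are hypothetical names invented for this example:

```c
#include <stdlib.h>
#include <string.h>

#define MAX_HUMANS 16   /* arbitrary cap, just for the sketch */

struct human {
    char name[64];
    long age;
};

struct human_list {
    struct human items[MAX_HUMANS];
    size_t count;
};

/*
 * Feed one already-split key/value pair into the list.
 * Returns 0 on success, -1 if the pair does not fit the expected shape.
 */
static int human_list_feed(struct human_list *list,
                           const char *key, const char *value)
{
    struct human *cur;
    const char *field;

    if (strcmp(key, "human") == 0) {      /* "human\t" opens a new object */
        if (*value != '\0' || list->count == MAX_HUMANS)
            return -1;
        memset(&list->items[list->count++], 0, sizeof list->items[0]);
        return 0;
    }

    if (strncmp(key, "human ", 6) != 0 || list->count == 0)
        return -1;                        /* not a field of an open object */
    field = key + 6;                      /* second level of the nested key */
    cur = &list->items[list->count - 1];

    if (strcmp(field, "name") == 0) {
        /* memset above left the final byte 0, so the result stays terminated */
        strncpy(cur->name, value, sizeof cur->name - 1);
        return 0;
    }
    if (strcmp(field, "age") == 0) {
        cur->age = strtol(value, NULL, 10);
        return 0;
    }
    return -1;                            /* unknown field */
}
```

Because every field line applies to the most recently opened object, the order of name and age inside an object does not matter, which matches the sample data above.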

But that format is not for “BigData®”

It’s not. But you don’t have “big data” either. If you are that concerned about the number of bytes being transferred, use compression.

How is it simple and fast?

(Simplified) parser in C.
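
A minimal sketch of what such a parser might look like, not necessarily the author's own code. It assumes the whole input sits in one writable, NUL-terminated buffer and that blank lines merely separate groups; kv_next is a name made up for this example:

```c
#include <stddef.h>
#include <string.h>

/*
 * Pull the next "key\tvalue" line out of a NUL-terminated buffer.
 * *cursor must point at the unread part of the buffer and is advanced past
 * the line. The buffer is modified in place: the delimiting tab and newline
 * are overwritten with '\0', so *key and *value point straight into it.
 *
 * Returns 1 when a line was read, 0 at end of input, and -1 on a line that
 * has no tab or an empty key.
 */
static int kv_next(char **cursor, char **key, char **value)
{
    char *line = *cursor;
    char *end, *tab;

    /* Assumption: blank lines only separate groups, so skip them. */
    while (*line == '\n')
        line++;
    if (*line == '\0') {
        *cursor = line;
        return 0;
    }

    end = strchr(line, '\n');
    if (end != NULL) {
        *end = '\0';
        *cursor = end + 1;
    } else {
        *cursor = line + strlen(line);  /* last line without trailing newline */
    }

    tab = strchr(line, '\t');           /* the first tab ends the key */
    if (tab == NULL || tab == line)
        return -1;
    *tab = '\0';

    *key = line;
    *value = tab + 1;                   /* may be "", later tabs stay in it */
    return 1;
}
```

The caller loops while kv_next returns 1 and does whatever it wants with each pair. No allocation, no escaping, no lookahead: two strchr calls per line.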

If you worry about the speed of comparing the key you’ve got against the ones you defined (to deserialize into a structure field), don’t. strcmp is really fast.
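
The key-to-field dispatch really can be a plain chain of strcmp calls. A small sketch, assuming a struct with the name and age fields from the earlier example; struct human and human_set_field are hypothetical names for illustration:

```c
#include <stdlib.h>
#include <string.h>

struct human {
    char name[64];
    long age;
};

/*
 * Store one key/value pair into the matching field of *h.
 * Returns 1 if the key was recognised, 0 otherwise.
 */
static int human_set_field(struct human *h, const char *key, const char *value)
{
    if (strcmp(key, "name") == 0) {
        strncpy(h->name, value, sizeof h->name - 1);
        h->name[sizeof h->name - 1] = '\0';   /* strncpy may not terminate */
        return 1;
    }
    if (strcmp(key, "age") == 0) {
        h->age = strtol(value, NULL, 10);
        return 1;
    }
    return 0;   /* unknown key: the caller decides whether to ignore it */
}
```

With a handful of known keys, each comparison usually bails out within the first few bytes, so there is little to gain from hashing or a lookup table.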

(Simplified) serialization in C.
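
A minimal sketch of what the serializing side might look like, again not necessarily the author's own code. It formats one line into a caller-supplied buffer and refuses anything the grammar cannot represent; kv_format is a name made up for this example:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one "key\tvalue\n" line into dst (capacity cap, including the
 * terminating '\0').
 * Returns the number of bytes written, excluding the '\0', or -1 if the
 * pair cannot be represented (empty key, tab or newline in the key,
 * newline in the value) or does not fit into dst.
 */
static int kv_format(char *dst, size_t cap, const char *key, const char *value)
{
    int n;

    if (*key == '\0' || strpbrk(key, "\t\n") != NULL)
        return -1;
    if (strchr(value, '\n') != NULL)
        return -1;

    n = snprintf(dst, cap, "%s\t%s\n", key, value);
    if (n < 0 || (size_t)n >= cap)
        return -1;                      /* output error or truncated */
    return n;
}
```

Since there is no escaping, serialization is one format string plus a check that the value has no newline in it. A list or a nested object is just several calls in a row.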

Conclusion

Wow.

Published on December 01, 2014 01:08PM

simple serialization hipster why not a javascript library I need a smoothie
