Serialization format for your new webscale microservice architecture in the cloud.
Wanna look like you think different? Why use all these XMLs and JSONs yet again? That’s too mainstream; let’s invent our own, like no one else does!
Escaping and human-readability
You know what sucks about these language-independent data formats? They suck. They are not really human-readable, are hard to serialize data with, and not that easy to write correct parsers for, either. Also, too mainstream.
Let’s assume there are two computers, A and B. They only know what types of
data they are exchanging and what data to expect from each other. Humans debugging
the software running on these two computers need to be able to read what’s being
transferred back and forth, without having to use any tools (for pretty-printing
or parsing).
(De)serialization on the software side should be as simple and as fast
as possible. It should be straightforward to write a parser/serializer in awk,
to use grep to filter stuff, and to use sed to replace values, etc.
The solution
<data> ::= <line> | <line> <data>
<line> ::= <key> [\t] <value> [\n]
<key> ::= [^\t\n]+
<value> ::= [^\n]*
Only UTF-8 is allowed.
And that’s it! As simple as possible. No escaping bullshit. One line, one key, one value! Highly readable!
If you really want newlines in your value, use lists instead.
You can have lists like [1, 2, 3] or ["<I'm\tbored>", "\"Привет!\"", "No."]. Just repeat the key, one line per element:
numbers 1
numbers 2
numbers 3
what <I'm bored>
what "Привет!"
what No.
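Turning those repeated keys back into a list takes one pass over the buffer. Here is a minimal sketch in C; the helper name collect_list and its signature are made up for illustration, and it assumes the whole input already sits in one writable, NUL-terminated buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: gathers every value whose key equals `key`.
 * `data` is a writable buffer of '\n'-separated lines and is modified in
 * place (tabs and newlines are overwritten with NUL bytes).
 * At most `max` pointers are stored in `out`; the count is returned.
 * Lines without a tab are skipped. */
size_t collect_list(char *data, const char *key, char **out, size_t max)
{
    assert(data != NULL && key != NULL && out != NULL);

    size_t n = 0;
    char *line = data;
    while (*line != '\0' && n < max) {
        char *end = strchr(line, '\n');
        char *next = end ? end + 1 : line + strlen(line);
        if (end)
            *end = '\0';                /* terminate the current line */

        char *tab = strchr(line, '\t'); /* the first tab ends the key */
        if (tab != NULL) {
            *tab = '\0';
            if (strcmp(line, key) == 0)
                out[n++] = tab + 1;     /* value lives inside `data` */
        }
        line = next;
    }
    return n;
}
```

Calling it with the key "numbers" on the data above would fill `out` with pointers to "1", "2" and "3", in file order. Nothing is copied, so the pointers are only valid while the buffer is.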
I WANT NESTED OBJECTS!11
Just as with lists, instead of forcing “objects” on our format itself, let’s put it on a different layer.
<key> ::= [^\t\n]+
<nested-key> ::= <nested-key-level> | <nested-key-level> [ ] <nested-key>
<nested-key-level> ::= [^ \t\n]+
So a nested key is a key in which a space character separates the levels of an object; the key and the value are still separated by a tab. Let’s see what a list of objects could look like:
human
human age 45
human name John Snow
human
human name William Budd
human age 68
human
human age 57
human name Yoseph Thomas Clover
This data forms a list of objects:
[ Human{name: "John Snow", age: 45},
Human{name: "William Budd", age: 68},
Human{name: "Yoseph Thomas Clover", age: 57}
] :: [Human]
Each object starts with a human\t line here; it’s like having key="human" and value="".
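To make the layering concrete, here is a rough sketch in C of reading such a list into structs. Everything named here (struct human, parse_humans) is hypothetical, and the sketch assumes the input is one writable, NUL-terminated buffer with a tab between key and value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical target structure for the example data. */
struct human {
    const char *name;   /* points into the parsed buffer */
    long age;
};

/* Hypothetical helper: a line with key "human" starts a new object,
 * "human name" and "human age" fill in the most recent one.
 * `data` is modified in place; returns the number of objects stored. */
size_t parse_humans(char *data, struct human *out, size_t max)
{
    assert(data != NULL && out != NULL);

    size_t n = 0;
    char *line = data;
    while (*line != '\0') {
        char *end = strchr(line, '\n');
        char *next = end ? end + 1 : line + strlen(line);
        if (end)
            *end = '\0';

        char *tab = strchr(line, '\t');
        if (tab != NULL) {
            *tab = '\0';
            const char *value = tab + 1;
            if (strcmp(line, "human") == 0) {
                if (n == max)
                    break;              /* no room for another object */
                out[n].name = "";
                out[n].age = 0;
                n++;
            } else if (n > 0 && strcmp(line, "human name") == 0) {
                out[n - 1].name = value;
            } else if (n > 0 && strcmp(line, "human age") == 0) {
                out[n - 1].age = strtol(value, NULL, 10);
            }
        }
        line = next;
    }
    return n;
}
```

Field order inside an object does not matter, which is why the example data can put age before name for one human and after it for another.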
But that format is not for “BigData®”
It’s not. But you don’t have “big data” either. If you are concerned that much about the number of bytes being transferred, use compression.
How is it simple and fast?
(Simplified) parser in C.
- key = readline(...). Read a stream of bytes until '\n' is encountered.
- value = strchr(key, '\t') + 1
- value[-1] = '\0'
- loop
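Spelled out as real C, the steps above might look like the sketch below. The name split_line is made up, and the error handling (rejecting lines with no tab or an empty key) is an assumption on top of the simplified steps, which just trust the input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits one line of the format in place.
 * `line` must be writable and NUL-terminated; a trailing '\n' is removed.
 * On success *key and *value point into `line` and 0 is returned.
 * Returns -1 if the line has no tab or the key is empty. */
int split_line(char *line, char **key, char **value)
{
    assert(line != NULL && key != NULL && value != NULL);

    line[strcspn(line, "\n")] = '\0';   /* drop the line terminator, if any */

    char *tab = strchr(line, '\t');     /* the first tab ends the key */
    if (tab == NULL || tab == line)
        return -1;

    *tab = '\0';                        /* same effect as value[-1] = '\0' */
    *key = line;
    *value = tab + 1;                   /* may itself contain more tabs */
    return 0;
}
```

The loop around it is whatever line reader you already have, for example fgets into a fixed buffer or POSIX getline.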
If you care much about the speed of comparison between the key you’ve got and the ones you defined
(to deserialize into a structure field), don’t worry. strcmp is really fast.
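One way to do that comparison is a plain table of known keys walked with strcmp. This is only a sketch: the enum, the table and lookup_field are invented for the human example above, and a linear scan is assumed to be good enough for a handful of keys.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical field identifiers for the "human" object. */
enum field { FIELD_UNKNOWN = -1, FIELD_NAME, FIELD_AGE, FIELD_COUNT };

/* Keys in the same order as the enum values above. */
static const char *const field_keys[FIELD_COUNT] = {
    "human name",
    "human age",
};

/* Hypothetical helper: maps a key to its field, or FIELD_UNKNOWN. */
enum field lookup_field(const char *key)
{
    assert(key != NULL);

    for (int i = 0; i < FIELD_COUNT; i++) {
        if (strcmp(key, field_keys[i]) == 0)
            return (enum field)i;
    }
    return FIELD_UNKNOWN;
}
```

A switch on the returned value then decides which structure field the value goes into.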
(Simplified) serialization in C.
- fprintf(stream, "%s\t%s\n", key, value)
- loop
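Since there is no escaping, the only thing a careful serializer has to do is refuse input the format cannot represent. A minimal sketch, assuming a hypothetical format_pair that writes into a caller-supplied buffer instead of straight to a stream:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats one "key<TAB>value<LF>" line into `buf`.
 * Returns the line length, or -1 if the key is empty, the key contains
 * a tab or newline, the value contains a newline, or `buf` is too small. */
int format_pair(char *buf, size_t size, const char *key, const char *value)
{
    assert(buf != NULL && key != NULL && value != NULL);

    if (key[0] == '\0' || strpbrk(key, "\t\n") != NULL)
        return -1;                      /* key would break the line layout */
    if (strchr(value, '\n') != NULL)
        return -1;                      /* use a list for multi-line values */

    int len = snprintf(buf, size, "%s\t%s\n", key, value);
    if (len < 0 || (size_t)len >= size)
        return -1;                      /* encoding error or truncated */
    return len;
}
```

The result can be handed to fputs or fwrite as is. An empty value is accepted on purpose, so object markers like human\t can be written too.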
Conclusion
Wow.
Last update: January 10, 2020 04:32PM