Begrudgingly choosing CBOR over MessagePack
In my pursuit of a sharable programming language, I need a suitable serialization format that's (1) performant to send/store and (2) easy to understand/implement.
Serialized scrapscript expressions are called "flat scraps".
In a previous essay, I crammed scrapscript into MessagePack.
At the time, I didn't realize that Max Bernstein had already written an entire serializer in ~100 LOC. Mind blown!
After further experiments, I somehow convinced myself that this serialization format also needed to (3) commit to existing standards, (4) work overtime as a pseudo-IR, and (5) run on relatively crappy hardware.
I'm still in the discovery phase here. Max's format remains very attractive after witnessing the weight of popular CBOR and msgpack implementations.
Meanwhile, Peter Saxton (EYG) sent me a friendly email recommending CBOR as a potential alternative to MessagePack.
Uh oh -- competing standards? Marginal design tradeoffs? Open-source woes? Buckle up!
Drama is fun, but not relevant to this essay. This post provides a solid summary of CBOR's history. This comment links to the most contentious public spats.
Which is cooler?
Obviously MessagePack is what cool kids would use.
Compare the subheadings on each landing page:
- msgpack: "It's like JSON. but fast and small."
- CBOR: "RFC 8949 Concise Binary Object Representation"
One of these formats is wearing a damn necktie.
Everything about CBOR is uncool. It was designed by a committee. It reeks of RFCs. Acronyms are lame. Saying "SEE-BORE" is like licking a nickel. One of the authors is "Carsten Bormann", which makes the name feel masturbatory.
CBOR was inspired by MessagePack. MessagePack was developed and promoted by Sadayuki Furuhashi ("frsyuki").
-- RFC 8949
Loyalty to an "original" brand has merit. Instead of chasing mere incremental improvements, you can support creators who synthesize value from nearly nothing. To favor derivative work can feel like choosing bureaucracy over personal expression.
Which is more efficient?
But many people don't care about "coolness" -- they want compression and speed and performance.
In these benchmarks, the author compared performance between two popular Go libraries. From these tests, it appears that the CBOR library encodes/decodes ~200% faster.
Which is simpler?
But efficiency isn't everything. I usually choose conceptual simplicity over performance. Like most makers, I resent depending on forces I cannot understand.
To measure complexity, you can often use documentation length as a proxy. MessagePack is just a markdown file. The CBOR spec has its own gravitational field.
However, upon further scrutiny, I've found that the documentation sniff-test misled me. This HN comment shares my latest thoughts:
Yeah, I skipped all the drama, read the spec and implemented an encoder/decoder. CBOR is just how MessagePack-like format should have been done from the beginning: it's technically superior in a sense that it's neat and simple, replacing many specialized rules with one generalization.
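At its top level, MessagePack defines a bunch of types, each with its own format bytes: integers, floats, arrays, extensions, etc. CBOR folds these into a handful of major types that share one header encoding, and handles extensions with "tags"; this pattern seems much easier to explain and implement.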
This person posits that CBOR's tags are poorly designed. Although I respectfully disagree with his conclusions, I think he makes some great points.
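To make the "one generalization" point concrete, here is a minimal sketch of a CBOR encoder built around a single header function, following the head layout in RFC 8949 (a 3-bit major type plus an argument). The function names `encode_head` and `encode` are my own, floats and tags are left out, and this is an illustration rather than a complete or canonical implementation.

```python
import struct

# CBOR major types 0-5 (RFC 8949, section 3.1).
UINT, NEGINT, BYTES, TEXT, ARRAY, MAP = range(6)


def encode_head(major: int, arg: int) -> bytes:
    """Encode a head: 3-bit major type, then the argument in its shortest form."""
    if arg < 0:
        raise ValueError("argument must be non-negative")
    top = major << 5
    if arg < 24:
        return bytes([top | arg])  # argument fits in the initial byte
    if arg < 1 << 8:
        return bytes([top | 24, arg])
    if arg < 1 << 16:
        return bytes([top | 25]) + struct.pack(">H", arg)
    if arg < 1 << 32:
        return bytes([top | 26]) + struct.pack(">I", arg)
    if arg < 1 << 64:
        return bytes([top | 27]) + struct.pack(">Q", arg)
    raise ValueError("argument does not fit in 64 bits")


def encode(value) -> bytes:
    """Encode a small subset of Python values (no floats, no tags)."""
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return b"\xf5" if value else b"\xf4"
    if value is None:
        return b"\xf6"
    if isinstance(value, int):
        if value >= 0:
            return encode_head(UINT, value)
        return encode_head(NEGINT, -1 - value)
    if isinstance(value, bytes):
        return encode_head(BYTES, len(value)) + value
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return encode_head(TEXT, len(raw)) + raw
    if isinstance(value, (list, tuple)):
        return encode_head(ARRAY, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        body = b"".join(encode(k) + encode(v) for k, v in value.items())
        return encode_head(MAP, len(value)) + body
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

Every branch reuses the same head: integers put their value in the argument, while strings, arrays, and maps put their length there. For example, `encode({"a": 1, "b": [2, 3]}).hex()` gives `a26161016162820203`, which matches the corresponding example in Appendix A of RFC 8949.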
Which is more popular?
"Serialization" often connotes "communication". Bandwidth is expensive, so formats like MessagePack and CBOR make obvious candidates for computer protocols.
Metcalfe's Law states that the value of a network grows roughly with the square of its number of users, so a protocol's popularity matters a lot.
The data speak for themselves:
3.1K☆ | C | msgpack/msgpack-c |
2.4K☆ | Go | vmihailenco/msgpack |
1.9K☆ | Python | msgpack/msgpack-python |
1.8K☆ | Go | tinylib/msgp |
1.4K☆ | Java | msgpack/msgpack-java |
1.4K☆ | JS | msgpack/msgpack-javascript |
1.2K☆ | Rust | 3Hren/msgpack-rust |
1.0K☆ | JS | kawanet/msgpack-lite |
837☆ | C# | msgpack/msgpack-cli |
806☆ | Go | fxamacker/cbor |
784☆ | PHP | msgpack/msgpack-php |
764☆ | Ruby | msgpack/msgpack-ruby |
529☆ | JS | kriszyp/msgpackr |
519☆ | C | intel/tinycbor |
364☆ | JS | hildjj/node-cbor |
354☆ | C | PJK/libcbor |
320☆ | JS | paroga/cbor-js |
311☆ | JS | msgpack/msgpack-node |
303☆ | JS | kriszyp/cbor-x |
300☆ | Rust | pyfisch/cbor |
284☆ | Rust | enarx/ciborium |
243☆ | Python | agronholm/cbor2 |
214☆ | C# | peteroupc/CBOR |
210☆ | Erlang | msgpack/msgpack-erlang |
196☆ | Haskell | well-typed/cborg |
142☆ | Swift | valpackett/SwiftCBOR |
138☆ | Haskell | msgpack/msgpack-haskell |
119☆ | Java | c-rack/cbor-java |
Which is better?
For my particular use-case, CBOR totally wins. Scrapscript expressions feel great inside CBOR's extension tags.
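As a rough illustration of what "inside a tag" means at the byte level: a CBOR tag is just a head with major type 6, followed by exactly one data item. The sketch below wraps a text string in a tag. The tag number 60000 and the idea of tagging a variable name are made up for the example; they are not scrapscript's actual encoding and have not been checked against the IANA tag registry.

```python
import struct


def encode_head(major: int, arg: int) -> bytes:
    """CBOR head: 3-bit major type, then the argument in its shortest form."""
    if arg < 0:
        raise ValueError("argument must be non-negative")
    top = major << 5
    if arg < 24:
        return bytes([top | arg])
    for info, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if arg < 1 << (8 * struct.calcsize(fmt)):
            return bytes([top | info]) + struct.pack(fmt, arg)
    raise ValueError("argument does not fit in 64 bits")


def text(s: str) -> bytes:
    """Major type 3: UTF-8 text string, length in the head."""
    raw = s.encode("utf-8")
    return encode_head(3, len(raw)) + raw


def tag(number: int, item: bytes) -> bytes:
    """Major type 6: a tag number followed by one already-encoded data item."""
    return encode_head(6, number) + item


# Hypothetical tag number, chosen only for this example.
HYPOTHETICAL_VAR_TAG = 60000

flat = tag(HYPOTHETICAL_VAR_TAG, text("x"))
print(flat.hex())
```

A decoder that does not recognize the tag can still parse the enclosed item as ordinary CBOR, which is a large part of why tags are convenient for language-specific node types.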
I prefer CBOR, but I don't like that I prefer CBOR. It irks me to use technology with political baggage.
Anyway, MessagePack and CBOR are vast improvements over JSON. As long as MessagePack retains its popularity advantage, both options seem reasonable.
Protocols are important. Communicate with caution.