taylor.town

Begrudgingly choosing CBOR over MessagePack

In my pursuit of a sharable programming language, I need a suitable serialization format that's (1) performant to send/store and (2) easy to understand/implement.

Serialized scrapscript expressions are called "flat scraps".

In a previous essay, I crammed scrapscript into MessagePack.

At the time, I didn't realize that Max Bernstein had already written an entire serializer in ~100 LOC. Mind blown!

After further experiments, I somehow convinced myself that this serialization format also needed to (3) commit to existing standards, (4) work overtime as a pseudo-IR, and (5) run on relatively crappy hardware.

I'm still in the discovery phase here. Max's format remains very attractive after witnessing the weight of popular CBOR and msgpack implementations.

Meanwhile, Peter Saxton (EYG) sent me a friendly email recommending CBOR as a potential alternative to MessagePack.

Uh oh -- competing standards? Marginal design tradeoffs? Open-source woes? Buckle up!

Drama is fun, but not relevant to this essay. This post provides a solid summary of CBOR's history. This comment links to the most contentious public spats.

Which is cooler?

Obviously MessagePack is what cool kids would use.

Compare the subheadings on each landing page:

One of these formats is wearing a damn necktie.

Everything about CBOR is uncool. It was designed by a committee. It reeks of RFCs. Acronyms are lame. Saying "SEE-BORE" is like licking a nickel. One of the authors is "Carsten Bormann", which makes the name feel masturbatory.

CBOR was inspired by MessagePack. MessagePack was developed and promoted by Sadayuki Furuhashi ("frsyuki").

-- RFC 8949

Loyalty to an "original" brand has merit. Instead of chasing mere incremental improvements, you can support creators who synthesize value from nearly nothing. To favor derivative work can feel like choosing bureaucracy over personal expression.

Which is more efficient?

But many people don't care about "coolness" -- they want compression and speed and performance.

In these benchmarks, the author compared performance between two popular Go libraries. From these tests, it appears that the CBOR library encodes/decodes ~200% faster.

Following bar chart show the time taken to encode basic data types - nil, int64, uint64, bool, bytes, string, array and map. This benchmark was done in go-language, using msgpack package and gson package. Source code is available here, here and here.

Which is simpler?

But efficiency isn't everything. I usually choose conceptual simplicity over performance. Like most makers, I resent depending on forces I cannot understand.

To measure complexity, you can often use documentation length as a proxy. MessagePack is just a markdown file. The CBOR spec has its own gravitational field.

However, upon further scrutiny, I've found that the documentation sniff-test misled me. This HN comment shares my latest thoughts:

Yeah, I skipped all the drama, read the spec and implemented an encoder/decoder. CBOR is just how MessagePack-like format should have been done from the beginning: it's technically superior in a sense that it's neat and simple, replacing many specialized rules with one generalization.

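At its top level, MessagePack defines a bunch of types, each with its own family of format bytes: integers, floats, arrays, extensions, etc. CBOR encodes everything through one header scheme (a 3-bit major type plus a length-or-value argument) and handles extensions with "tags"; this pattern seems much easier to explain and implement.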
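To make the "one generalization" concrete, here's a minimal sketch of a CBOR encoder in Python. It is not Max's serializer and not any library's API; the function names `encode_head` and `encode` are my own, and it skips floats, tags, and indefinite-length items. The point is that a single head-encoding rule covers integers, byte strings, text, arrays, and maps.

```python
import struct


def encode_head(major: int, arg: int) -> bytes:
    """Encode a CBOR head: a 3-bit major type plus an unsigned argument."""
    if arg < 0:
        raise ValueError("argument must be non-negative")
    prefix = major << 5
    if arg < 24:  # small arguments live inside the initial byte
        return bytes([prefix | arg])
    if arg < 1 << 8:
        return bytes([prefix | 24, arg])
    if arg < 1 << 16:
        return bytes([prefix | 25]) + struct.pack(">H", arg)
    if arg < 1 << 32:
        return bytes([prefix | 26]) + struct.pack(">I", arg)
    if arg < 1 << 64:
        return bytes([prefix | 27]) + struct.pack(">Q", arg)
    raise ValueError("argument does not fit in 64 bits")


def encode(obj) -> bytes:
    """Encode a small subset of Python values as CBOR."""
    if isinstance(obj, bool):  # check before int: bool is a subclass of int
        return b"\xf5" if obj else b"\xf4"
    if obj is None:
        return b"\xf6"
    if isinstance(obj, int):
        # major type 0 = unsigned integer, major type 1 = negative integer
        return encode_head(0, obj) if obj >= 0 else encode_head(1, -1 - obj)
    if isinstance(obj, bytes):
        return encode_head(2, len(obj)) + obj
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return encode_head(3, len(data)) + data
    if isinstance(obj, list):
        return encode_head(4, len(obj)) + b"".join(encode(x) for x in obj)
    if isinstance(obj, dict):
        pairs = b"".join(encode(k) + encode(v) for k, v in obj.items())
        return encode_head(5, len(obj)) + pairs
    raise TypeError(f"unsupported type: {type(obj).__name__}")


print(encode([1, 2, 3]).hex())  # 83010203
```

Every branch except the three simple values goes through `encode_head`. In MessagePack, the equivalent code needs a separate table of prefix bytes for each type and size.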

This person posits that CBOR's tags are poorly designed. Although I respectfully disagree with his conclusions, I think he makes some great points.

"Serialization" often connotes "communication". Bandwidth is expensive, so formats like MessagePack and CBOR make obvious candidates for computer protocols.

Metcalfe's Law states that the value of a network grows with the square of its number of users. By the same logic, a protocol gets more valuable as it gets more popular.

The data speak for themselves:

3.1K☆ C msgpack/msgpack-c
2.4K☆ Go vmihailenco/msgpack
1.9K☆ Python msgpack/msgpack-python
1.8K☆ Go tinylib/msgp
1.4K☆ Java msgpack/msgpack-java
1.4K☆ JS msgpack/msgpack-javascript
1.2K☆ Rust 3Hren/msgpack-rust
1.0K☆ JS kawanet/msgpack-lite
837☆ C# msgpack/msgpack-cli
806☆ Go fxamacker/cbor
784☆ PHP msgpack/msgpack-php
764☆ Ruby msgpack/msgpack-ruby
529☆ JS kriszyp/msgpackr
519☆ C intel/tinycbor
364☆ JS hildjj/node-cbor
354☆ C PJK/libcbor
320☆ JS paroga/cbor-js
311☆ JS msgpack/msgpack-node
303☆ JS kriszyp/cbor-x
300☆ Rust pyfisch/cbor
284☆ Rust enarx/ciborium
243☆ Python agronholm/cbor2
214☆ C# peteroupc/CBOR
210☆ Erlang msgpack/msgpack-erlang
196☆ Haskell well-typed/cborg
142☆ Swift valpackett/SwiftCBOR
138☆ Haskell msgpack/msgpack-haskell
119☆ Java c-rack/cbor-java

Which is better?

For my particular use-case, CBOR totally wins. Scrapscript expressions feel great inside CBOR's extension tags.
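As a rough illustration of what that could look like, here's a sketch that wraps two expression nodes in CBOR tags (major type 6). The tag numbers `TAG_VAR` and `TAG_APPLY` and the node shapes are made up for this example; they are not registered anywhere and are not the actual flat-scrap layout.

```python
import struct

# Hypothetical tag numbers, invented for this sketch only.
TAG_VAR = 60001    # wraps a text string: a variable name
TAG_APPLY = 60002  # wraps a two-item array: [function, argument]


def head(major: int, arg: int) -> bytes:
    """Encode a CBOR head: a 3-bit major type plus an unsigned argument."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for info, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if arg < 1 << (8 * struct.calcsize(fmt)):
            return bytes([(major << 5) | info]) + struct.pack(fmt, arg)
    raise ValueError("argument does not fit in 64 bits")


def text(s: str) -> bytes:
    data = s.encode("utf-8")
    return head(3, len(data)) + data  # major type 3 = text string


def array(items: list) -> bytes:
    return head(4, len(items)) + b"".join(items)  # major type 4 = array


def tag(number: int, item: bytes) -> bytes:
    return head(6, number) + item  # major type 6 = tag wrapping one item


def var(name: str) -> bytes:
    return tag(TAG_VAR, text(name))


def apply(fn: bytes, arg: bytes) -> bytes:
    return tag(TAG_APPLY, array([fn, arg]))


flat = apply(var("f"), var("x"))
print(flat.hex())  # d9ea6282d9ea616166d9ea616178
```

A decoder that doesn't know these tags can still walk the structure, because every tag wraps an ordinary CBOR item. That's most of what I want from a pseudo-IR.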

I prefer CBOR, but I don't like that I prefer CBOR. It irks me to use technology with political baggage.

Anyway, MessagePack and CBOR are vast improvements over JSON. As long as MessagePack retains its popularity advantage, both options seem reasonable.

Protocols are important. Communicate with caution.