about now spam rss

Compressing Codebase Collocates

A collocate is a series of words that frequently occur together.

Fill in the blanks:

Theoretical Information

The following equation defines “surprisal”:

Collocates suggest improvements to a message’s average information content.

Replacing collocates with simpler substitutes is called “compression”.


Programmers compress code via abstractions. They pull out patterns and turn them into reusable templates.

Bad abstractions create distracting compression artifacts in the codebase. Beware programming patterns that are unrelated to the movement of data. Much of modern softare is built with useless superstitions.

Even “good” abstractions aren’t free. Templates must be designed and defined and applied and remembered and maintained. If an abstraction isn’t compressing your code, it’s mere noise in your signal.

Carefully consider opportunity costs. Each abstraction creates artificial boundaries where other compression strategies might fare better.

Decompress, Recompress

The most reliable way to escape local compression maxima is to decompress everything and then recompress.

In my personal experience, most codebases can easily shrink a hundredfold and speed up a thousandfold. Sometimes much more.

To decompress a codebase, inline its paths of execution. For example, rewrite each endpoint of a webserver with only standard library functions and simple database drivers. One can repeat the decompression process all the way to bedrock machine code, but most programs accrue diminishing returns before that point.

To compress a codebase, recursively replace collocates with equivalent “zero-cost” abstractions. Don’t try to outsmart yourself – prioritize infrastructure for the most egregious repetition frictions of digital desire paths.

Brevity is a good indicator of compression, but sometimes counterproductive. When people rely on overly-specific mental models, the medicine becomes poison. Moving structures from git to wiki scatters information rather than compressing it. Onboarding processes are components of codebases. Secret truths are not mutual information.

Above all else, don’t force it. Obvious improvements become invisible when implemented well. Decompress, recompress. Breathe. Decompress, recompress. Breathe. Decompress, recompress.