100,000,000 CROWPOWER and no horses on the moon
tl;dr: Humans have no damn clue how to measure intelligence.
Raising Water
Water is wet (and heavy). Because it's wet/heavy, it tends to flow downhill (and underground).
To continue living, humans often wet themselves ("drinking"/"bathing") and their plants ("irrigation"). But many humans live uphill (and aboveground) -- to maintain wetness, they raise water to their homes/farms.
Once you carry your own water, you will learn the value of every drop.
But water is heavy (and wet), so humans built machines ("horse mills") and forced horses to raise water.
Horses (and humans) are made of meat. Meat is great, but it's prone to disease, exhaustion, distraction, etc. Ever cleverer, humans built non-meat machines ("steam engines") and forced water to raise water.
Horse Numbers
So that an engine which will raise as much water as two horses, working together at one time in such a work, can do, and for which there must be constantly kept ten or twelve horses for doing the same. Then I say, such an engine may be made large enough to do the work required in employing eight, ten, fifteen, or twenty horses to be constantly maintained and kept for doing such a work…
-- Thomas Savery, The Miner's Friend (1702)
Horses can do work, i.e. exert force over distance. Work over time is "power".
To explain his steam engine to other humans, James Watt defined "1 horsepower" as "33,000 foot-pounds per minute", which approximates a typical horse's work on a typical mill.
The "foot-pound" is the worst unit of energy. Be careful not to confuse it with the "pound-foot", which is a unit of torque.
Horse numbers are convenient at horse-scale, but cumbersome in calculations for telegraphy and rocketry, so scientists/engineers literally removed horses from the equation. Humans now measure power in "Watts" -- named after the human who named the measurement after horses. 1 horsepower equals ~746 watts.
One SpaceX Starship exceeds 100 million horsepower, but 100 million horses probably can't pull a sleigh into orbit. Horse-force is not thrust, and Earth's ~60 million total horses are not enough.
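The horse-to-watt arithmetic is mundane enough to script. A quick sketch (the 100-million-horsepower Starship figure is the essay's own round number):

```python
# Watt's definition: 1 hp = 33,000 foot-pounds per minute.
FT_LB_IN_JOULES = 1.3558179483  # one foot-pound of energy, in joules
HP_FT_LB_PER_MIN = 33_000

def hp_to_watts(hp: float) -> float:
    """Horsepower -> watts (joules per second)."""
    return hp * HP_FT_LB_PER_MIN * FT_LB_IN_JOULES / 60.0

print(hp_to_watts(1))            # ~745.7 W
print(hp_to_watts(100_000_000))  # Starship-scale: ~7.46e10 W, i.e. ~75 GW
```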
Indeed -- scientists have yet to discover even a single horse living on the moon. Terra Luna's scant fossil record suggests that horses may have never even established a stable population beyond Earth's atmosphere. Biologists blame the moon's unforgiving atmosphere; physicists blame the tyranny of the rocket equation. Either way, the moon seems safe from equine invasion.
Microwave ovens run at roughly one horsepower. This sounds like nonsense unless you're familiar with math, energy, work, dimensional analysis, electromagnetism, radiation, dielectric heating, magnetron design, and thermodynamics.
Well, it sounds like nonsense until you microwave your hundredth frozen burrito and it becomes mundane magic. We learned to measure energy, then capture it, store it, and harness it.
One Intelligence, Please
But humans still have no damn clue what "intelligence" is. We can't measure it, can't capture it, can't store it, and rarely use it.
Sometimes intelligence smells like "cognitive horsepower", i.e. some people/machines seem to have better overall engines for doing brilliant thinky-things. "g-factor" researchers show that many positive cognitive traits tend to correlate with each other. But the world also creates counterexamples like AlphaGo and Kim Peek -- non-generalizable brilliance.
IQ demonstrates intelligence in the same way that horse races demonstrate horsepower.
We can't define intelligence, yet we desperately want it -- and pay handsomely for it. Institutions approximate cognitive horsepower (if it exists) via crude proxies:
- headcount & "man" hours/months
- age & total years of experience
- processing power (e.g. CPUs, GPUs, clock speed)
- portfolios & selected works
- standardized tests (e.g. SAT, IQ, ARC)
- reputation/klout/endorsements
It's unclear how these measures compare and interact. If I were to get a heart transplant tomorrow, should I prefer 5 medical students over 1 expert? Should I prefer 2 Harvard grads over 3 UCR grads? A human child or 10,000 crows?
Such comparisons sound like nonsense; we lack equations to convert absurdity into understanding. We want to convert cognition into mundane magic. We need crowpower.
Crows are a good unit of measurement. They're cute (awww), smart (whatever that means), portable (~500g), and consistent/fungible (no 10x crows).
Crowpower
Scientific revolutions are punctuated by paradigm shifts. These shifts often occur when thought-experiments crash into new mathematical tooling: Schrödinger's cat, Newton's cannonball, Hilbert's hotel, Bell's spaceship, Maxwell's demon, Mermin's device, Zeno's race, Heisenberg's microscope, Galileo's ship, Savery's horse, Turing's machine, etc.
In each case, mature mathematics hit the limits of human intuition. Consider crowpower a catalyst.
Difficulty
We don't know what it means to cognitively "raise water". We lack the tools to quantify (or estimate) intellectual work. Consider the following tasks:
- bake a potato
- bake a croissant
- beat Super Mario Bros. (any%)
- beat Super Mario Bros. (WR)
- beat full NES catalog (any%)
- draw a card
- draw an owl
- BB(3)
- BB(27)
- BB(745)
- fold a particular protein
- fold any protein
- prove sqrt(2) is irrational
- prove Fermat's Last Theorem
We intuitively understand these as "challenges", but it's hard to explain how or why they're challenging. Concepts like computational complexity, logical depth, learnability, Kolmogorov complexity, etc. could be different parts of the same elephant.
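One part of that elephant can at least be poked. Kolmogorov complexity is uncomputable, but compressed length gives a crude upper bound on it; a toy sketch using zlib:

```python
import random
import zlib

def crude_complexity(s: bytes) -> int:
    """Compressed length: a crude upper bound on Kolmogorov complexity."""
    return len(zlib.compress(s, 9))

boring = b"ab" * 5000  # highly regular: a short description exists
random.seed(0)
noisy = bytes(random.randrange(256) for _ in range(10_000))  # near-incompressible

print(crude_complexity(boring))  # tiny
print(crude_complexity(noisy))   # roughly the raw length
```

Same length in, wildly different "descriptions" out -- a hint that difficulty and compressibility are related.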
There are no horses on the moon -- could 100 million crows solve Fermat's Last Theorem?
FLT was conjectured in 1637. Despite countless attempts, it went unproven until Andrew Wiles produced a proof in 1994. This was absurdly difficult; many rank Wiles's FLT proof among the greatest feats in mathematical history.
Units
100M crows might not be able to prove FLT, but could 100 clones of Adam Sandler do it?
I know very little about Adam Sandler -- he could totally be as smart as Andrew Wiles. I specifically chose a comedic actor who plays an average Joe.
I fully expect that comparing Sandler to Wiles is like comparing a 10-watt heater to an 11-watt blender. Our mature theory of wattage cleanly separates rotational from thermal energy; nobody blames their heater for frothing milk poorly.
Here are some crude units-of-measurement to consider:
1 crow < 1 gump < 1 joe < 1 wile < 1 oz < 1 hal
I shouldn't need to tell you that rhetoric like this is dangerous. Don't take this too seriously. Be kind to each other.
Since nobody knows how human intelligence scales, "oz" (superhuman intelligence) purposefully ambiguates Oz and Ozymandias. Of course this also ambiguates the accepted abbreviation for "ounces", but this is the best I can do with my limited joepower.
It took 1 wile to prove FLT. It remains unclear how many joes it would take to perform the same feat. Here are some common responses to this thought-experiment:
- "1 joe cannot be compared to 1 wile. G-factor is misguided; intelligence is not a one-dimensional phenomenon."
- "1 joe is functionally equivalent to 1 wile, but needs more time to complete the same task. It might take 100 joes to prove FLT in a similar timeframe."
- "1 joe is functionally equivalent to 1 wile, but doesn't have the memory/stack-depth to complete the same task. It might take 100 joes to hold FLT in their heads."
- "1 joe fundamentally lacks some mental machinery present in 1 wile. There is no reasonable number of joes that could prove FLT."
We still have no damn clue what we're measuring.
OpenAI's GPT models might illuminate our fragile human hierarchies. Is GPT-4 closer to 99 gumps or 0.8 joes?
We weigh horses because we don't know how to test strength. In this world, nobody can distinguish a strong horse from a fat horse.
Scaling
100 duck-sized horses are not equivalent to 1 horse-sized duck. 100 1MHz processors are not equivalent to 1 100MHz processor.
Neil J. Gunther's Universal Scalability Law formulates this phenomenon:
C(N) = N / (1 + α(N-1) + βN(N-1))
C : capacity or throughput
N : number of processors, threads, or nodes
α : contention coefficient (serialization)
β : coherency coefficient (crosstalk)
Note that β (i.e. "communication overhead") eventually dominates parallelization gains. As team size increases, the cost of talking can exceed the value of the work.
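A quick sketch of the USL in Python, with invented coefficients (not measured from any actual murder), shows the crosstalk term dragging capacity back down:

```python
def usl_capacity(n: int, alpha: float, beta: float) -> float:
    """Universal Scalability Law: relative capacity C(N) of n workers."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))

# Coefficients below are invented for illustration.
ALPHA, BETA = 0.02, 0.0001  # contention, crosstalk

for n in (1, 10, 100, 1_000, 10_000):
    print(n, round(usl_capacity(n, ALPHA, BETA), 2))
# Capacity climbs, peaks near n = sqrt((1 - alpha) / beta) ≈ 99 workers,
# then collapses as crosstalk dominates.
```

Past the peak, adding another crow makes the whole flock slower.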
Even if 100M crows could be motivated to prove FLT, the bandwidth of crow speech is probably insufficient.
Coordination is hard. Humans build tools like traffic signs and punch clocks and SMS to more efficiently communicate across spacetime. Likewise, crow communication could be augmented with specialized tools/devices. Imagine millions of crows wearing the cutest little VR headsets -- each bird working on their own microscopic math mini-game in exchange for grapes or whatever crows eat.
We also don't know how to measure motivation. How many kilowatt-hours (a proxy for economic value) would it take to incentivize a crow to solve equations? How many kW-hours would it take to make those crows flip burgers?
But we've got too many variables on the table -- let's assume all crows are telepathic and cooperative. When β is zero, USL is equivalent to Amdahl's Law.

α represents contention. This variable depends entirely on the problem (e.g. proving FLT) and solution (e.g. proof strategy/algorithm). Information "assembly lines" cannot be parallelized -- some work/processing/computing cannot begin until intermediate results are completed.
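Pinning β to zero makes the equivalence easy to check numerically (a sketch; the coefficient values are arbitrary):

```python
def usl(n: int, alpha: float, beta: float) -> float:
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))

def amdahl(n: int, serial_fraction: float) -> float:
    """Amdahl's Law: speedup when a fixed fraction of the work is serial."""
    return 1 / (serial_fraction + (1 - serial_fraction) / n)

# With beta = 0 (telepathic crows, zero crosstalk) the two laws coincide,
# and alpha plays the role of the serial fraction.
for n in (1, 2, 8, 64, 1_000_000):
    assert abs(usl(n, 0.05, 0.0) - amdahl(n, 0.05)) < 1e-9

print(amdahl(1_000_000, 0.05))  # creeps toward the hard ceiling of 1/0.05 = 20
```

Even a million perfectly coordinated crows never exceed a 1/α speedup.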
In some sense, all difficult problems are difficult because they are sequential. In ten coin flips, it is easy to get any head, but hard to get all heads.
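The coin arithmetic is tiny but instructive:

```python
# Ten fair flips: one success is nearly free; ten-in-a-row is a thousandfold rarer.
p_all_heads = 0.5 ** 10      # every flip must cooperate, in sequence
p_any_head = 1 - 0.5 ** 10   # only one flip ever has to cooperate

print(p_all_heads)  # 0.0009765625
print(p_any_head)   # 0.9990234375
```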
Automated theorem-proving is hard. Because FLT was remarkably difficult, the proof is probably resistant to highly-parallel strategies. 100M crows can only prove FLT if they have enough compute/memory to complete its most difficult subsequence.
Emergence
With enough training and error-correction, an average crow could emulate a transistor. A sizable murder of crows could emulate a Commodore 64, an Intel i9, an Nvidia RTX 5070, a human brain, etc.
If you believe that a crow can emulate a transistor, it would only take a few thousand crows to build a CPU. With enough patience and mechanical prowess, crows could summarize PDFs and write novels.
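A sketch of the idea, with each hypothetical crow standing in for one NAND gate. NAND is functionally complete, so patient crows can compose any circuit:

```python
# Toy model: each hypothetical crow emulates one NAND gate.
def crow_nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1

# NAND is functionally complete, so crow gates compose into anything.
def crow_xor(a: int, b: int) -> int:
    n = crow_nand(a, b)
    return crow_nand(crow_nand(a, n), crow_nand(b, n))

def crow_full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One bit of binary addition, built purely from crow NANDs."""
    partial = crow_xor(a, b)
    total = crow_xor(partial, carry_in)
    carry_out = crow_nand(crow_nand(a, b), crow_nand(partial, carry_in))
    return total, carry_out

print(crow_full_adder(1, 1, 1))  # (1, 1), i.e. 1 + 1 + 1 = 0b11
```

Chain enough full adders and the murder is doing arbitrary-width arithmetic, one grape per gate-flip.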
The Chinese Room Argument is discussed ad nauseam -- few folks would consider individual crow transistors/neurons as "intelligent" despite their emergent behavior. But it's unclear how much intelligence (if any) each crow can contribute to a collective.
There is only one way to make salt; salt molecules cannot be "more salty" or "less salty". But there are infinite ways to make pepper -- a messy blend of biomolecules created by messy genomes created by messy selection pressures.
If intelligence is like salt, then crows are very expensive (and cute) transistors. If intelligence is like pepper, a murder could someday be President of the United States.
Phase-Changes
Many people view intelligence as a sudden "waking up" phenomenon. Ice melts; water boils. In this lens, evolution produced smarter ape architectures until a "phase-change" happened and Homo sapiens took center stage.
Whenever I glimpse phase-changes, I reach for universality in my mathematical toolbox.
It's hard to take this idea seriously if you've ever experienced childhood. Humans slowly grow intelligent. Even milestones like object-permanence and walking and literacy become gradual under scrutiny.
But ideas also "click" into place. It's hard to "unsee" ambiguous illusions.
It's difficult to simultaneously understand why sqrt(2) is irrational and not understand it -- intelligence may be gradual, but the experience is sudden/frenetic.
But along some orthogonal axis, we've taught robots object-permanence and walking and literacy, but it's "not real general intelligence". It's "just AlexNet" or "just PID" or "just stochastic parrots" -- until AI performs some magic phase-change, many folks won't admit it into the Cognition Club; it's merely "artificial" intelligence until it's "synthetic" intelligence.
But if the Cognition Club is real, why is it so hard to describe its minimal entry requirements? How many crows would it take to make it into the club? How did a dead parrot obliterate the Turing Test?
Generality
Humans that excel at any subject tend to excel at all subjects. Researchers call this phenomenon "g-factor" or g.
This model complements s-factors and contrasts with theories of multiple intelligences.
But if 10K crows could comfortably beat every Nintendo game, would you trust that same murder to file your taxes?
Video games don't span the full gamut of human knowledge/ability, but they're arguably the most objective available measure of general problem-solving ability.
Many video games are harder than college-level courses. Whirlitzer of Wisdom involves lunar cartography.
Video games form an objective (albeit anthropocentric (and ethnocentric)) hierarchy for g:
- World 1-1
- Super Mario Bros.
- all NES "platformers"
- full NES catalog
- full SNES catalog
- all Nintendo games
- all video games
Typewriter monkeys could beat Super Mario Bros. given enough time, so this measure needs additional parameters. Because game-completion times can range from minutes to days, a reasonable time constraint might be "no more than 100x slower than current glitchless any% WR". For zero-shot attempts, it might be wise to allow ~10 lives/restarts within the total allocated time limit.
Let's try an example. Suppose you want to hire crows to beat racing games. Murder A beats 40% of Mario Kart installments. Murder B beats 100% of first-person shooters. Murder C beats 5% of all games. Which murder do you hire?
Intuitively, games are more similar when insights/learnings transfer, i.e. learning game A reduces the learning effort of game B. We'd expect the average "learning distance" between racing games to be smaller than the distance between all games. If learning distance is independent of players, we can arrange all these games in a high-dimensional "gamespace".
Extrapolating from this framework, g-factor does not measure the competence of players -- it measures the compactness (or maybe compressibility) of a gamespace region. The existence of a g-factor merely suggests that human school subjects are not so dissimilar: French, English, music, physics, mathematics, etc.
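One hedged way to operationalize "learning distance" is normalized compression distance. The "games" below are toy strings of made-up mechanics, not real games:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 = similar, near 1 = unrelated."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Toy stand-ins for games; shared "mechanics" compress well together.
mario_kart_8 = b"steer drift boost item-box rubber-band " * 50
mario_kart_64 = b"steer drift boost item-box shortcut " * 50
doom = b"strafe rocket keycard gib circle-strafe " * 50

print(ncd(mario_kart_8, mario_kart_64))  # small: insights transfer
print(ncd(mario_kart_8, doom))           # larger: little shared structure
```

Regions of gamespace that compress well together are, in this framing, exactly where g shows up.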
But what in the precise h*ck is gamespace? Surprise -- if our games are arranged by learning distance, then gamespace simply contains all learnable problems.
There are multiple ways to define "learnability": statistical learning theory, algorithmic learning theory, computational learning theory, etc.
Supersimulators
Learnable problems are a subset of computable problems.
The Church-Turing Thesis asserts that computable functions are precisely those that can be computed by a Turing Machine (TM) and anything that can simulate a TM. A system is Turing-complete (universal) if it can simulate any TM.
Many systems are unexpectedly Turing-complete, e.g. Dwarf Fortress, Minecraft, Conway's Game of Life, Magic: The Gathering.
Humans simulate computers, simulate conversations, simulate copulation, and simulate creatures.
To "think" is to simulate oneself. Memories simulate the past; dreams simulate the future. The hard problem of consciousness -- why subjective experience exists -- might be a mere side effect of simulating simulation itself.
If epiphenomenalists are correct, consciousness might be an unnecessary side effect of intelligence.
If true, all universal simulators ("supersimulators"?) are members of the Cognition Club. Simulation-depth might be a useful metric.
No two supersimulations are alike -- only a bat can be a bat, and only you can be you.
Learning is arguably an act of simulation: players sample examples, then predict (i.e. simulate) results. Difficult games demand more training; bad players require more training.
Andrew Wiles required 41 years of post-training to prove Fermat's Last Theorem. As it stands, 100M crows face fierce competition.
Things We Measure
We measure things we care about; we make units for things we measure. We made horsepower to sell steam engines. We made Watts to harness the [literal] power of electricity.
We needed energy beyond horses; we need cognition beyond crows, but we cannot measure intelligence. We think we know what intelligence looks like, but we have no clue when/how/why it happens.
Humans tend to confuse properties with processes. Illness isn't a divine curse; it's wild emergent behavior culminating from the struggle of countless organisms to survive a little longer. A frog is not a thing that hops; a frog is the phenomenon of frogging.
Measure difficulty. Measure motivation. Measure contention. Measure crosstalk. Measure collaboration. Measure gamespace. Measure compression. Measure learning. Measure simulation-depth. Measure everything. Measure anything.
Humans need intelligence. We get the units we deserve.