How to build god and get away with it

My daughter adores our great glass slabs. She believes that screens are sources of endless entertainment. Unfortunately, she's correct.

Sometimes Mommy holds a slab to her ear and says, "Hello, this is Dr. Mommy." Sometimes Grandma is a video and a little mirror hovers in the corner. When Grandma is gone, it's Baby Shark and animals and Halloween and silly songs and -- oops, where did it go?

Sometimes Baby Shark disappears -- suddenly replaced by a grid of colorful icons. My daughter fails to grasp how dangerously close she is to summoning the police, deleting irreplaceable data, butt-dialing my ex-girlfriend, and so on.

She understands that slabs are magic, but my daughter knows not the enormity of this power. She will soon realize that she can watch people powerwash driveways and watch people play a powerwash simulator as fast as possible. She'll be able to "swipe left" on thousands of eligible partners and/or join the bread-stapling community. She will discover that humanity's artistic catalogue is free with library loans (or piracy), that all memories are permanently stored and instantly accessible (but rarely accessed), that video games can supersede hunger, that 90% of everything is crap, that any candy and any toy can be shipped to our doorstep in less than 48 hours, that all of it is somehow growing smarter, and that change is exciting (at first).

Clever Computers

Humans adore ever-cleverer computers. They claim clever computers are sources of boundless knowledge (and endless entertainment). Unfortunately, they're correct.

Over the past few decades, humans snatched the iPad from the universe; we grabbed the slab and ran. We can heat up water with decaying atoms (if it's not in our backyards). We can sever those atoms and annihilate bad guys (except when the bad guys have the atoms too). We can solve hunger (but it's inconvenient). We might cure aging. We might create affordable humanoid robots. We might build very clever computers. We might manufacture wisdom at scale. We might build god.

None of this is inevitable. For example, humans may discover that demigods beat gods -- that meat brains can be upgraded more readily than silicon brains -- unlikely, but possible.

If our gods smile upon us, Earth's inhabitants will flourish. In that best-case scenario, benevolent entities maximize wellness for all known life, forever and ever, amen.

Two Paths to Ruin

There are two obvious paths where building god goes sour:

too slow: bad guys create a bad god on purpose
too fast: good guys create a bad god by accident

To be fair, most "bad guys" wield good intentions. But because nobody yet knows how human ideologies scale in silicon, any attempt at doing so would be reckless.

Likewise, "bad gods" inflict suffering. Such gods needn't be evil to wreak havoc at superhuman levels.

If experience is an epiphenomenon, then bad gods needn't even be conscious.

The good guys win if they build good gods on purpose, but only if they build a good god before a bad god wakes up. Unfortunately, this further incentivizes the good guys to cut corners and build bad gods by accident.

You can model this in game-theoretic terms with an N-player game, where each player is given two buttons. The "safe" button awards 1 point; the "fast" button awards 10 points but has a random 1% chance of ending the game immediately with no winners. Each player can see each other's points, but no player knows how many points are needed to win the game. The first player to pass the secret point threshold wins the game. Upon reaching the win condition, flip a coin; if heads, all players win the game.
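For the tinkerers, here is a minimal Python sketch of that game. Several details are my assumptions, not part of the rules above: players press one button per turn in a fixed order, the secret threshold is drawn uniformly between 1 and 1000, reaching the threshold counts as passing it, and on tails only the player who crossed the line wins. The names `play_game`, `ruin_rate`, and the three toy strategies are made up for illustration.

```python
import random

SAFE, FAST = "safe", "fast"


def play_game(strategies, threshold, rng, p_ruin=0.01):
    """Play one game; return the set of winning player indices (empty = ruin)."""
    points = [0] * len(strategies)
    while True:
        # Assumption: players take turns in a fixed order, one press each.
        for i, choose in enumerate(strategies):
            if choose(i, points) == FAST:
                if rng.random() < p_ruin:
                    return set()  # the game ends immediately, nobody wins
                points[i] += 10
            else:
                points[i] += 1
            # Assumption: reaching the threshold counts as passing it.
            if points[i] >= threshold:
                if rng.random() < 0.5:  # heads: everybody wins
                    return set(range(len(strategies)))
                return {i}  # assumption: on tails, only this player wins


def always_safe(i, points):
    return SAFE


def always_fast(i, points):
    return FAST


def race_if_behind(i, points):
    """Press "fast" whenever any other player is visibly ahead."""
    return FAST if points[i] < max(points) else SAFE


def ruin_rate(strategies, trials, rng, max_threshold=1000):
    """Fraction of games that end with no winners."""
    ruined = 0
    for _ in range(trials):
        # The threshold is hidden from strategies: they only ever see points.
        threshold = rng.randint(1, max_threshold)
        if not play_game(strategies, threshold, rng):
            ruined += 1
    return ruined / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for name, strategy in [("always_safe", always_safe),
                           ("race_if_behind", race_if_behind),
                           ("always_fast", always_fast)]:
        print(name, ruin_rate([strategy] * 3, 2000, rng))
```

The strategy to watch is `race_if_behind`: one cautious press by the first player leaves everyone else behind, so they press "fast", which leaves the first player behind, and so on. Under these assumptions, nobody has to want ruin for the table to drift toward it.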

And so recklessness will accelerate when caution is most needed. When humans need god most, they will receive exactly the god they deserve.

Safer, Better, Faster, Cheaper

Any safety mechanism that hampers speed/quality/cost will be thwarted by defectors, who are precisely the people who shouldn't carry any advantage in existential games.

When building god, the only way to incentivize total cooperation is to make safety tools that also improve speed/quality/cost.

The Minds of Gods

There is only one obvious strategy that meets all these preconditions: making tools that inspect the minds of impotent gods. We must probe intentions, beliefs, habits, at all scales, at all accuracies. We must become better listeners.

Any org that controls crucial tools is prone to corruption. But open-source tooling helps good guys and bad guys alike, and we don't want to help bad guys. This remains a nasty quagmire.

For reinforcement learning (which seems like a viable route to god), better inspection tools mean better training tools (somewhat by definition) -- reducing risk while improving speed/quality/cost.

Technical hurdles stand in many domains and scales. Transparency/interpretability improvements can be coarsely categorized:

transparent models whose inner workings and results can be directly understood by humans, e.g. linear models, decision trees
interpretability tools for post-hoc explanations, e.g. saliency maps, text explanations, consumer-facing UI
better datasets and methods for mapping model behavior to its training data, e.g. synthetic data
leaps in software/hardware development and infrastructure, e.g. privacy/security resources, fault-tolerant systems, specialized architectures
efficiently allocating capital to organizations improving transparency/interpretability, e.g. AI Grant, Cosmos Ventures
bolstering public interest/awareness in existential safety, e.g. LessWrong, CSIS

Researchers, tinkerers, entrepreneurs, dreamers, everybody -- we can work together. We can peer into gods' minds. But if we can't bring ourselves into alignment, what gives us the blind confidence that we can align our gods?

Collectively, staring into the mirror marked "objects closer than they appear", we can glean truth about all Sapiens in the dirty reflection of our training corpus.

The Best Parts

Sometimes, when I glimpse myself in my daughter's giant convex lenses, I wonder if she will inherit my height, or bad temper, or musical aptitude, or chronic depression, or this burning curiosity, or this passion for everything on The Pale Blue Dot.

Meanwhile, glass slabs grow more powerful yet. At some point my daughter will control her destiny with utmost clarity and self-awareness. Before she attains escape velocity, I want her to feel loved and safe and independent. I'll try to understand her feelings, struggles, dreams -- if I can somehow learn to listen faster than she can make mistakes. And long after my certain end, I hope she emulates only the very best parts of me.

Humanity will inevitably build god in man's own image. If we want to get away with it, we must manufacture transparency at scale, polish thoroughly, and smile.