
Monomode & Multimode in Augmented Reality

Related AR/VR essays: Apple Will Win The AR/VR Wars, AR Demands Peripherals, Bananas Will Become Smartphones, Claim a Domain in the Wet Web, Tools and Techniques for AR/VR Media, AR Interoperability Opportunities

Humans use two main modes of attention:

There will be two corresponding modes of AR/VR: multimode and monomode.


Notifications, HUDs, and PIP windows will exist in both modes.

These will appear close to your face so that they take precedence over the augmented world around you.

Text, lists, calendars, and legacy apps will survive as “flat” media.


AR begins with transparent displays over your eyes. This is useful for displaying notifications/HUDs/PIPs in front of your face. Remember Google Glass?

Magic happens when you strap cameras to your face. Computer vision algorithms can identify and track objects in your field-of-view.

But people don’t like heavy things on their faces. There will be pressure to limit hardware.

Object identification is CPU/GPU intensive. AR OSes will perform object identification on behalf of installed extensions. Gaze detection can be used to prioritize running software, but the OS can’t let each app run amok with its own computer vision stack.

The OS will identify objects and summon extensions in context. In multimode, extensions cooperate, each doing a minimal amount of contextual work.

For example, when you look at a carrot, the OS can ask your extensions about carrots. Your extensions can respond with annotations/shaders to render over the carrot in your view. If multiple extensions respond, the OS can choose to automatically display the results or collapse them in a menu. Chefs and botanists would likely install different extensions, and therefore see different results for carrots.
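The carrot flow above could be sketched roughly as follows. This is only an illustration, not any real AR OS API: `Extension`, `Annotation`, `dispatch`, and the `auto_display_limit` threshold are all hypothetical names, and the "display a few, otherwise collapse into a menu" policy is an assumption about how an OS might choose.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Annotation:
    extension: str  # name of the extension that produced this result
    text: str       # what the OS would render over the object


@dataclass
class Extension:
    name: str
    # Hypothetical callback: receives an object label from the OS and
    # returns annotation text, or None if it has nothing to say.
    handler: Callable[[str], Optional[str]]


def dispatch(label: str, extensions: list, auto_display_limit: int = 1) -> dict:
    """Ask every installed extension about one OS-identified object."""
    results = []
    for ext in extensions:
        text = ext.handler(label)
        if text is not None:
            results.append(Annotation(ext.name, text))

    # Assumed policy: nothing to show, show directly, or collapse into a menu.
    if not results:
        mode = "none"
    elif len(results) <= auto_display_limit:
        mode = "display"
    else:
        mode = "menu"
    return {"mode": mode, "annotations": results}


# A chef and a botanist would install different extensions.
chef = Extension("recipes", lambda label: "Show carrot recipes" if label == "carrot" else None)
botanist = Extension("taxonomy", lambda label: "Daucus carota" if label == "carrot" else None)
```

With only `chef` installed, `dispatch("carrot", [chef])` yields a single annotation to display; with both installed, the results are collapsed into a menu. The extensions never touch camera data: they only receive a label the OS has already computed.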

Possible multimode extensions:


Multimode offers ubiquitous/diffuse interaction, but people need a monomode for focused/complex experiences.

Some apps demand “full-screen”. Monomode can dedicate lower-level graphics APIs and extra computing resources to a single experience.

An OS can give limited data to extensions, but some apps will require direct access to image/IR data. Apps with complex inputs will likely have complex outputs. To keep the OS simple, many input APIs will only be accessible in monomode.
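One way to picture this split is a capability table keyed by mode. The sketch below is hypothetical: the capability names (`object_labels`, `raw_camera`, `raw_ir`) and the `request_capability` function are invented for illustration, and a real OS would define its own.

```python
from enum import Enum


class Mode(Enum):
    MULTIMODE = "multimode"
    MONOMODE = "monomode"


# Hypothetical capability names. Multimode extensions only see
# OS-computed labels; monomode apps may also read raw sensor data.
ALLOWED = {
    Mode.MULTIMODE: {"object_labels"},
    Mode.MONOMODE: {"object_labels", "raw_camera", "raw_ir"},
}


def request_capability(mode: Mode, capability: str) -> bool:
    """Return True if software running in `mode` may use `capability`."""
    return capability in ALLOWED[mode]
```

Under these assumptions, `request_capability(Mode.MULTIMODE, "raw_camera")` is refused while the same request in monomode is granted, which keeps the multimode surface small.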

Because monomode apps may obstruct your vision, the OS will virtually “cover” your face in public. This will prevent accidental staring between strangers.

Possible monomode apps:

The Multimode App

Each AR/VR headset will have one multimode app (with extensions) and many monomode apps. The multimode app will behave like a modern web browser among many mobile apps.

If people use many AR/VR apps, the apps will be collected on a “home screen”. Otherwise, people can switch apps via a physical digital-crown carousel or an “alt-tab” gesture.

Wet Stuff

The social dynamics of wet stuff (public virtual objects) will evolve unpredictably.

People and organizations will post signs, instructions, statuses (e.g. wait times), art, etc.

If your friend wants to display digital art on their wall, will you see it in multimode with extensions? Without extensions? What about monomode? Will you see it automatically, or will you have to “activate” it? Will businesses and advertisers have the same power to usurp your vision? Will one organization monopolize virtual objects, or will we build a decentralized protocol?

Now is the time to choose our AR interop protocols.