
VisionOS Dev: Taylor's Opinionated and Already-Outdated Guide (2024)

There is no substitute for project-based learning! Briefly skim this guide, and go try to build something with it. Come back whenever you find a lapse in understanding.

Apple recently announced VisionOS 2.0. Some of the updates are not covered in this guide.

All writings on taylor.town (including this guide) are available in plaintext, i.e. taylor.town/visionos.txt. To download for offline access, try curl https://taylor.town/visionos.txt > taylor-town-visionos.md.

This guide is still under construction! Please email me at [email protected] if you have any questions or suggestions.

Scaffolding Your Project

Here is what a typical VisionOS app looks like:

@Observable
@MainActor
public class AppState {...}

@main
@MainActor
struct MudPieApp: App {...}

struct ContentView: View {...}
#Preview(windowStyle: .automatic) {...}

struct ImmersiveView: View {...}
#Preview(immersionStyle: .mixed) {...}

#Preview

Use #Preview macros to create default simulator views in Xcode.

#Preview( "Content", traits: .fixedLayout(width: 100, height: 100)) {
    ContentView()
}
#Preview("Volume", windowStyle: .volumetric) {
   ContentView()
}
#Preview("Mixed immersive space", immersionStyle: .mixed) {
   ContentView()
}
#Preview("Mixed immersive space", immersionStyle: .mixed) {
   ContentView()
} cameras: {
   PreviewCamera(from: .front)
   PreviewCamera(from: .top, zoom: 2)
   PreviewCamera(from: .leading, zoom: 0.5, name: "close up")
}

AppState

@Observable
@MainActor
public class AppState {
  var session = ARKitSession()
  var worldTracking = WorldTrackingProvider()
  var handTracking = HandTrackingProvider()
  var myEntity = Entity()
  // ...more members
  init() {
    Task {/* request tracking privs */}
    Task {/* load entities */}
    Task {/* etc */}
  }
}

Further reading: Task, TaskGroup

This is a singleton class. If you keep your views pure/memoryless, it becomes the sole place where all state is stored across your app. Updates to the instance propagate to all your views via Swift framework magic.

Most frontend development involves complex state management. I highly recommend adopting Elm's architecture paradigm into your mental toolkit.

But this mental model will only take you so far. As you'll find out later, entities force you to break the classic MVC paradigm by storing "view" details in your model. The shape/position/texture/etc. of your entities are mutable, and references to these entities must be persisted somewhere.

If you're a web-dev like me, working with entity hierarchies feels like direct DOM manipulation.

Don't be afraid! It's okay. Just exercise caution with your "sources of truth". For example, an entity's current color is a "true" value, not some copied value in a corresponding color picker.
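
For instance, here's a minimal sketch of a color picker that reads and writes through the entity itself instead of caching its own copy. It assumes a hypothetical sphere: ModelEntity stored on AppState; the names are illustrative, not from any sample project.

struct SphereColorPicker: View {
  @Environment(AppState.self) private var appState
  var body: some View {
    ColorPicker("Sphere color", selection: Binding(
      get: {
        // read the "true" color straight off the entity's material
        let material = appState.sphere.model?.materials.first as? SimpleMaterial
        return Color(uiColor: material?.color.tint ?? .white)
      },
      set: { newColor in
        // write the new color straight back onto the entity
        appState.sphere.model?.materials = [SimpleMaterial(color: UIColor(newColor), isMetallic: false)]
      }
    ))
  }
}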

Other bits of unsolicited advice:

MyApp

@main
@MainActor
struct MyApp: App {
  @State private var appState = AppState()
  @State private var immersionStyle: ImmersionStyle = .automatic
  var body: some SwiftUI.Scene {
    WindowGroup { ContentView().environment(appState) }
    ImmersiveSpace(id: "MY_SPACE") { ImmersiveView().environment(appState) }
      .immersionStyle(selection: $immersionStyle, in: .automatic)
  }
}

There's a lot to unpack here! For now, just note the general structure:

ContentView

struct ContentView: View {
  @Environment(AppState.self) private var appState
  var body: some View {
    // windows, volumes, etc.
  }
}
#Preview(windowStyle: .automatic) {
  ContentView().environment(AppState())
}

Put all your bounded, position-agnostic views here. The initial placement of windows/volumes can be suggested, but the size and position of the content is ultimately determined by VisionOS and its user.

ImmersiveView

struct ImmersiveView: View {
  @Environment(AppState.self) var appState
  var body: some View {
    RealityView { content, attachments in
      // add attachments and MyApp entities to `content`
    } update: { content, attachments in
      // runs when ImmersiveView properties change
    } attachments: {
      Attachment(id: "MY_ATTACH) {/* SwiftUI window/volume that will be converted into an entity */}
    }
    .gesture(/* listen for gesture targeted at entity */)
    // ...more gestures
  }
}
#Preview(immersionStyle: .mixed) {
  ImmersiveView().environment(AppState())
}

Whoa! Lots of stuff going on here. Much of this guide will be exploring the details here, so don't overwhelm yourself.

Here's the gist:

Configuration

Here's a bunch of Info.plist junk you may need:

Display Containers

3D windows are called "volumes". They behave just like windows, but with a few quirks and restrictions.

2D windows can have limited 3D depth, e.g. hovering elements. You can also display 3D content/models floating inside/outside of 2D windows.
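
Here's a minimal sketch of both ideas inside an ordinary 2D window (Model3D is covered later; "MY_MODEL" is a placeholder asset name):

VStack {
  Text("Hovering card")
    .padding()
    .glassBackgroundEffect()
    .offset(z: 30)              // lift the card slightly toward the viewer
  Model3D(named: "MY_MODEL")    // 3D model floating inside the 2D window
    .frame(depth: 200)
}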

You can use SwiftUI views (e.g. buttons, sliders) inside of 3D immersive spaces as "attachments".

Scenes

@main
@MainActor
struct MyApp: App {
  @State private var appState = AppState()
  @State private var immersionStyle: ImmersionStyle = .automatic
  var body: some SwiftUI.Scene {
    WindowGroup { ContentView().environment(appState) }
    ImmersiveSpace(id: "MY_SPACE") { ImmersiveView().environment(appState) }
      .immersionStyle(selection: $immersionStyle, in: .automatic)
  }
}

The default behavior of your scene is largely determined by Info.plist junk:

It is unclear if .immersionStyle(selection: ..., in: .full) will override UISceneInitialImmersionStyle. I'll report back here with some experimental findings.

Immersive spaces are somewhat like fullscreen experiences for VisionOS. When opening an ImmersiveSpace, VisionOS hides the content (e.g. Windows) of other apps.

struct OpenImmersiveSpaceButton: View {
  var solarSystem: SolarSystem
  @Environment(\.openImmersiveSpace) private var openImmersiveSpace
  var body: some View {
    Button("Open Immersive Space") {
      Task {
        let result = await openImmersiveSpace(id: "MY_SPACE")
        if case .error = result {
          print("An error occurred")
        }
      }
    }
  }
}
struct DismissImmersiveSpaceButton: View {
  @Environment(\.dismissImmersiveSpace) private var dismissImmersiveSpace
  var body: some View {
    Button("Close Solar System") {
      Task { await dismissImmersiveSpace() }
    }
  }
}

Use the environment's openImmersiveSpace(id) and dismissImmersiveSpace() to switch between the immersive spaces in your scene. Only one immersive space can be open at a time; you must dismiss any open spaces before opening new ones. VisionOS will not automatically replace the space for you -- the call will return an error result instead.
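
So switching spaces means dismissing before opening. A minimal sketch (the space id "SPACE_B" is hypothetical):

struct SwitchSpaceButton: View {
  @Environment(\.openImmersiveSpace) private var openImmersiveSpace
  @Environment(\.dismissImmersiveSpace) private var dismissImmersiveSpace
  var body: some View {
    Button("Switch to Space B") {
      Task {
        await dismissImmersiveSpace() // close whatever space is open (if any)
        let result = await openImmersiveSpace(id: "SPACE_B")
        if case .error = result {
          print("Could not open SPACE_B")
        }
      }
    }
  }
}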

In a fully immersive experience, visionOS sets a 1.5-meter boundary from the user's head. Moving outside this zone stops the immersive experience and switches to passthrough video to prevent collisions.

The user can manually dismiss an immersive space by double-tapping the digital crown on the AVP headset, so dismissImmersiveSpace isn't technically required. Double-tapping takes you "back" to all your previously-opened windows.

Your scene is the primary place to hang your anchors:

To add content to the view’s scene, you first create and add one or more AnchorEntity instances to the scene’s anchors collection. Anchors tell RealityKit how to pin virtual content to real world objects, like flat surfaces or images. You then add a hierarchy of other Entity instances to each anchor to indicate the geometry and behaviors that RealityKit should render at a given anchor point.

You'll probably want to hang virtual content (entities) on these anchors. Although physics/collision primitives handle many interactive cases, you'll probably need more specific tools to determine whether your entities meet spatial conditions.

For example, to pop bubbles with a laser, you can raycast a line away from a hand anchor to find bubble entities anchored in the scene.
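
A hedged sketch of that raycast, assuming scene is the RealityKit Scene and handEntity is an entity pinned to a hand anchor (both names are hypothetical):

// Cast a ray "forward" from the hand and pop the first bubble it hits.
let origin = handEntity.position(relativeTo: nil)
let direction = handEntity.convert(direction: [0, 0, -1], to: nil)
let hits = scene.raycast(origin: origin, direction: direction, length: 5, query: .nearest, relativeTo: nil)
if let bubble = hits.first?.entity, bubble.name.hasPrefix("bubble") {
  bubble.removeFromParent() // pop!
}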

Lastly, your scene also provides message buses for coordinating pub/sub events:
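
For example, you can subscribe to RealityKit scene events (collisions, scene updates, etc.) from inside a RealityView. A minimal sketch, where collisionSub is a hypothetical stored EventSubscription that keeps the subscription alive:

RealityView { content in
  // ...add entities with CollisionComponents...
  collisionSub = content.subscribe(to: CollisionEvents.Began.self) { event in
    print("\(event.entityA.name) hit \(event.entityB.name)")
  }
}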

Further reading: Scene, ScenePhase

Windows

@main
struct MyApp: App {
  var body: some Scene {
    WindowGroup {
      ContentView()
        .frame(
          minWidth: 100, maxWidth: 400,
          minHeight: 100, maxHeight: 400)
    }
    .windowStyle(.plain)
    .windowResizability(.contentSize)
    .defaultSize(CGSize(width: 600, height: 400))
  }
}

To programmatically open a window:

struct NewViewerButton: View {
  @Environment(\.openWindow) private var openWindow
  var body: some View {
    Button("Open Window") {
      openWindow(id: "MY_WINDOW")
    }
  }
}

You can also programmatically check whether multiple windows are allowed.
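
For example, here's the same button gated on the supportsMultipleWindows environment value:

struct NewViewerButton: View {
  @Environment(\.supportsMultipleWindows) private var supportsMultipleWindows
  @Environment(\.openWindow) private var openWindow
  var body: some View {
    Button("Open Window") {
      openWindow(id: "MY_WINDOW")
    }
    .disabled(!supportsMultipleWindows) // disable when the platform only allows one window
  }
}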

Volumes

WindowGroup(id: "MY_BALL") { Ball() }
.windowStyle(.volumetric)
.defaultSize(width: 1, height: 1, depth: 1, in: .meters)

Although you can specify a volume’s size in points, it’s typically better to use physical units, like the above code which specifies a size in meters. This is because the system renders a volume with fixed scaling rather than dynamic scaling, unlike a regular window, which means the volume appears more like a physical object than a user interface. For information about the different kinds of scaling, see Spatial layout.

A volume never changes size after creation. Developers have no API for resizing volumes in visionOS 1.x, and users cannot resize a volume at runtime.

Actually, it looks like users will be able to resize volumes in VisionOS 2.0.

3D Immersive Space

struct ImmersiveView: View {
  @Environment(AppState.self) var appState
  var body: some View {
    RealityView { content, attachments in
      // add attachments and MyApp entities to `content`
    } update: { content, attachments in
      // runs when ImmersiveView properties change
    } attachments: {
      Attachment(id: "MY_ATTACH) {/* SwiftUI window/volume that will be converted into an entity */}
    }
    .gesture(/* listen for gesture targeted at entity */)
    // ...more gestures
  }
}

You can change the immersion style at runtime, but styles are restricted via the in parameter in .immersionStyle(selection:in:):

You can change the immersion style after presenting the immersive space by changing the modifier’s selection input, although you can only use one of the values that you specify in the modifier’s second parameter.

It's very easy to confuse the direction of the z-axis, because larger values go "behind" the user. Also note that RealityKit's y-axis is opposite to SwiftUI's y-axis.

RealityView { content in
  let sphere = ModelEntity(mesh: .generateSphere(radius: 0.1))
  sphere.transform.translation = [0, 1, -2] // "absolutely" position (relative to origin)
  content.add(sphere)
}

RealityView.content is flexibly-sized based on your surroundings. You may need to convert between coordinate spaces.

The origin of an ImmersiveSpace defaults to the user's feet (unclear what happens when lying down), but the origin can change, e.g. when multiple Spatial Personas enter a SharePlay activity.

You can use the .immersiveContentBrightness(_) scene modifier to suggest brightness levels for immersive spaces. You can use the bright/dim/dark presets, or choose a linear brightness ratio between a standard dynamic range white video frame and the content background.
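
For example, a minimal sketch that dims passthrough while the space from earlier is open:

ImmersiveSpace(id: "MY_SPACE") { ImmersiveView() }
  .immersiveContentBrightness(.dim) // presets: .bright, .dim, .dark; a custom linear ratio is also described in the docs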

SwiftUI coordinate spaces differ from RealityKit coordinate spaces. This is how the SwiftUI ones work:

All geometric properties of a view, including size, position, and transform, are defined within the local coordinate space of the view’s parent.

Use content.convert(...) to switch between RealityCoordinateSpaces and SwiftUI CoordinateSpaces.

You'll be painfully reminded of this whenever you implement gestures:

.gesture(
  DragGesture()
  .targetedToAnyEntity()
  .onEnded { x in
    appState.holding!.transform.translation = x.convert(
      x.gestureValue.translation3D,
      from: .local,
      to: .scene)
  }
)

Quaternion? SIMD3? Matrix4x4? The interactions between different API representations can become painful, and I don't yet have any tidy suggestions. Please email me at [email protected] if you have any good mental models or advice for juggling these values.

Displaying Things

SwiftUI Views

SwiftUI offers components like Button, DatePicker, Image, VStack, etc.

Apple's official tutorials are a great starting point:

To embed 3D content into a window, use Model3D:

struct ContentView: View {
  var body: some View {
    VStack {

      // load from file
      Model3D(named: "MY_MODEL")
        .frame(width: 300, height: 600)

      // load from url with placeholder
      Model3D(url: URL(string: "https://example.com/elephant.usdz")!) { model in
          model
            .resizable()
            .aspectRatio(contentMode: .fit)
      } placeholder: {
          Color.green
      }
      .frame(width: 300, height: 600)

      // load from url with phases (good for progress bar, etc)
      Model3D(url: URL(string: "https://example.com/elephant.usdz")!) { phase in
        if let model = phase.model {
            model
        } else if phase.error != nil {
            Color.red
        } else {
            Color.blue
        }
      }

    }
  }
}

RealityView

struct ImmersiveView: View {
  @State private var appState = AppState()
  var body: some View {
    RealityView { content, attachments in

      let mySphere = ModelEntity(mesh: .generateSphere(radius: 0.1))
      content.add(mySphere)

      if let myAttach = attachments.entity(for: "MY_ATTACH") {
        content.add(myAttach)
      }

    } attachments: {
      Attachment(id: "MY_ATTACH) {
        VStack {
          Button("RED") { appState.myColor = .red }
        }
      }
    }
  }
}

In this example, we added a SwiftUI attachment and basic sphere to our scene. I hope you are entertained.

Entities

struct ImmersiveView: View {
  var body: some View {
    RealityView { content in
      if let elephant = try? await ModelEntity(named: "elephant") {
        content.add(elephant) // entity appears at content origin by default
      }
      Task {/* do extra entity stuff in the background */}
    }
  }
}

And now it's finally time to address the elephant in the scene. Let's explore 𝔢𝔫𝔱𝔦𝔱𝔦𝔢𝔰.

To first give you an idea of what you're dealing with, here all the base entities at your disposal: AnchorEntity, BodyTrackedEntity, DirectionalLight, ModelEntity, PerspectiveCamera, PointLight, SpotLight, TriggerVolume, ViewAttachmentEntity

Hierarchies

Various AnchorEntities will be provided to you, e.g. hand anchors, anchors from tracked objects, anchors from people's faces, etc.

Use ModelEntity to display 3D models in your scene -- attach them to anchor entities (or descendants thereof), e.g. myAnchor.addChild(myModel). These entity hierarchies become useful when you want to search/manipulate the entities as a group.

To organize your hierarchy, you can use the following methods on any Entity class: id, scene, findEntity, isActive, anchor, isAnchored, parent, setParent, removeFromParent, children, addChild, removeChild, isEnabledInHierarchy

Enabling/Disabling

To stop simulating/rendering an entity, set entity.isEnabled = false. Reenable it with entity.isEnabled = true. You can also check whether any parent entity is disabled via isActive, which is only true when an entity is anchored and all ancestors are enabled.

Loading Entities

There are myriad ways to load these entities into your scene. Here are some examples from the SwiftSplash demo project:

await withTaskGroup(of: LoadResult.self) { taskGroup in
  for piece in pieces {
    taskGroup.addTask {
      do {
        guard let pieceEntity = try await self.loadFromRCPro(named: piece.key.rawValue,
                             fromSceneNamed: piece.sceneName) else {
          fatalError("Attempted to load piece entity \(piece.name) but failed.")
        }
        return LoadResult(entity: pieceEntity, key: piece.key.rawValue)
      } catch {
        fatalError("Attempted to load \(piece.name) but failed: \(error.localizedDescription)")
      }
    }
  }
  for await result in taskGroup {
    if let pieceKey = pieces.filter({ piece in
      piece.key.rawValue == result.key
    }).first {
      self.add(template: result.entity, for: pieceKey.key)
      setupConnectible(entity: result.entity)
      result.entity.generateCollisionShapes(recursive: true)
      result.entity.setUpAnimationVisibility()
    }
    // ...
  }
}

Here's another loading example from the Diorama demo project:

To load a Reality Composer Pro scene, use load(named:in:), passing the name of the scene you want to load and the project’s bundle. Reality Composer Pro Swift packages define a constant that provides ready access to its bundle. The constant is the name of the Reality Composer Pro project with “Bundle” appended to the end. In this case, the project is called RealityKitContent, so the constant is called RealityKitContentBundle. Here’s how Diorama loads the map table in the RealityView initializer:

let entity = try await Entity.load(named: "DioramaAssembled", in: RealityKitContent.RealityKitContentBundle)

Here's another example, using .generateSphere to construct a ModelEntity from basic geometry:

struct SphereView: View {
  @State private var scale = false
  var body: some View {
    RealityView { content in
      let model = ModelEntity(
        mesh: .generateSphere(radius: 0.1),
        materials: [SimpleMaterial(color: .white, isMetallic: true)])
      model.components.set(InputTargetComponent())
      model.components.set(CollisionComponent(shapes: [.generateSphere(radius: 0.1)]))
      content.add(model)
    } update: { content in
      if let model = content.entities.first {
        model.transform.scale = scale ? [1.2, 1.2, 1.2] : [1.0, 1.0, 1.0]
      }
    }
    .gesture(TapGesture().targetedToAnyEntity().onEnded { _ in
      scale.toggle()
    })
  }
}

Ignore the component and gesture stuff… for now.

Transforms

To manipulate the position/rotation/etc. of an Entity, you'll want to use its Transform component. Here are its methods:

let sphere = ModelEntity(mesh: .generateSphere(radius: 0.1))
sphere.transform.translation = [0, 1, -2] // relative to anchor
myAnchor.addChild(sphere)

Gestures will also give you translations to convert. Here's how you might want to move an entity based on the final drag position of a gesture:

.gesture(
  DragGesture()
  .targetedToAnyEntity()
  .onEnded { x in
    appState.myEntity!.transform.translation += x.convert(
      x.gestureValue.translation3D,
      from: .local,
      to: .scene)
  }
)

Physics

let sphere = ModelEntity(
      mesh: MeshResource.generateSphere(radius: 0.1),
      materials: [SimpleMaterial(color: .red, isMetallic: false)],
      collisionShape: .generateSphere(radius: 0.1),
      mass: 1.0)

Straight from the Apple docs:

Models respond to physics simulations because they conform to the HasPhysics protocol. You give them mass and other physical properties with a PhysicsBodyComponent instance, and then apply forces or impulses. The simulator uses a PhysicsMotionComponent to manage the linear and angular velocity of the object. Alternatively, you can selectively circumvent the simulation to control position and velocity yourself. Do this for a given model by setting its physics body mode to PhysicsBodyMode.kinematic.

Models can also collide with one another, and with other entities that conform to the HasCollision protocol. The CollisionComponent provides parameters that let you manage which models collide with each other. It also lets you control the collision shape, which for performance reasons, is typically simpler than the visual geometry.

In other words, give your entities some mass and then use methods to apply forces, torque, impulses, etc. You can use downward force to simulate gravity, and an upward impulse to simulate a throw.
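
A minimal sketch, using the sphere defined above (it already has mass and a collision shape):

sphere.physicsBody?.mode = .dynamic                    // let the simulation drive the entity
sphere.addForce([0, -9.8, 0], relativeTo: nil)         // steady downward force, i.e. gravity
sphere.applyLinearImpulse([0, 2, -1], relativeTo: nil) // one-time upward/forward shove, i.e. a throw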

For performance reasons, the shapes that collide with each other are not the rendered shapes. This is exactly why things in video games appear to phase through each other when they get close! As your shapes get more complex, this effect can become more apparent. You can imagine that simple cubes and spheres may not capture the nuance of a high-speed collision between two horses. So you'll want to create the simplest mesh that produces the illusion of good physics.

The following ShapeResource functions are useful for defining collision shapes: .offsetBy, .bounds, .generateBox, .generateSphere, .generateCapsule, .generateConvex, .generateStaticMesh

You might find generateCollisionShapes useful when working with hierarchies.

For more complex stuff, you'll want MeshResource.

And once all this physical stuff is set up, VisionOS will automatically update the entities' Transform components over time.

You can create arbitrary surfaces to interact with, or find/use surfaces in your surroundings. More on that later.

Further reading: scene hierarchies for efficient physics simulations, handling different sized objects, controlling collisions

Anchors

RealityView { content in
  // Anchor a box to any horizontal surface (e.g. a table or the floor).
  let planeAnchor = AnchorEntity(.plane(.horizontal, classification: .any, minimumBounds: [0.2, 0.2]))
  planeAnchor.addChild(ModelEntity(mesh: .generateBox(size: 0.1)))
  content.add(planeAnchor)

  // Anchor text to a reference image ("EXAMPLE" in a "MY_IMAGES" resource group in your asset catalog).
  let imageAnchor = AnchorEntity(.image(group: "MY_IMAGES", name: "EXAMPLE"))
  imageAnchor.addChild(ModelEntity(mesh: .generateText("Informative Text", extrusionDepth: 0.05, font: .systemFont(ofSize: 0.1))))
  content.add(imageAnchor)
}

RealityKit places anchors based on the anchoring component’s target property. For example, configure an anchor entity to find a detected horizontal surface like a table or floor. RealityKit will place that anchor once it detects an appropriate horizontal plane.

Some anchor entities may not show up in your scene if RealityKit fails to detect an appropriate place for them, e.g. an anchor entity with an image target won’t show up until RealityKit detects the specified image in the real world.

Add anchor entities directly to your scene’s anchors collection or anywhere else in the scene hierarchy by adding them to the children collection of another entity in your scene. AnchorEntity is a subclass of Entity, so you can make an anchor entity a child of any other entity.

RealityKit may move anchor entities as the scene updates, changing their location and rotation relative to their parent, even if your code never modifies its .transform property.

Sensing Stuff

ImmersiveSpace(id: "MY_SPACE") {
 ImmersiveView()
  .task {
    let planeData = PlaneDetectionProvider(alignments: [.horizontal])
    if PlaneDetectionProvider.isSupported {
      do {
        try await session.run([planeData])
        for await update in planeData.anchorUpdates {
          // ...
        }
      } catch {
        print("ARKit session error \(error)")
      }
    }
  }
}
.immersionStyle(selection: $immersionState, in: .mixed)

The VisionOS ARKit guide is very good:

ARKit in visionOS offers a new set of sensing capabilities. You adopt these capabilities individually in your app, using data providers to deliver updates asynchronously. The available capabilities include:

For each of these APIs, ARKit exposes anchors (hand anchors, plane anchors, world anchors, etc.) corresponding to real-world phenomena. You will likely either (A) position your own entities from these ARKit anchors or (B) record changes in these anchors to apply elsewhere.
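
A hedged sketch of option (A), mirroring ARKit plane anchors as entities. It assumes planeData is the running PlaneDetectionProvider from earlier and rootEntity is an entity already added to your RealityView content:

var planeEntities: [UUID: Entity] = [:]
for await update in planeData.anchorUpdates {
  switch update.event {
  case .added:
    let entity = Entity()
    entity.transform = Transform(matrix: update.anchor.originFromAnchorTransform)
    planeEntities[update.anchor.id] = entity
    rootEntity.addChild(entity)
  case .updated:
    planeEntities[update.anchor.id]?.transform = Transform(matrix: update.anchor.originFromAnchorTransform)
  case .removed:
    planeEntities[update.anchor.id]?.removeFromParent()
    planeEntities[update.anchor.id] = nil
  }
}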

Bonus: body/face tracking

ARKit sessions will not provide anchors (or updates) unless your app is in an open immersive space (a Full Space) with other apps hidden. You have to call the \.openImmersiveSpace action before ARKitSession.run() will deliver data. View.task is a great way to run an async task when the ImmersiveView appears.

It's important to note that ARKit hand-tracking is completely separate from RealityView .gesture(...) methods.

Gestures

In SwiftUI, you can attach gestures to specific views:

VStack {
    Image(systemName: "star.fill")
    .gesture(
      TapGesture().onEnded {
        print("TAPPED IMAGE")
      }
    )
}
.gesture(
  TapGesture().onEnded {
    print("TAPPED VSTACK")
  }
)

In RealityKit, your gestures live at the RealityView level and will be targeted to particular entities within the scene:

RealityView {/* ... */}
.gesture(
  TapGesture()
  .targetedToAnyEntity()
  .onEnded {
    print("TAPPED SOMETHING")
  }
)
.gesture(
  TapGesture()
  .targetedToEntity(myFavoriteDolphin)
  .onEnded {
    print("TAPPED HENRIETTE")
  }
)
.gesture(
  TapGesture()
  .targetedToEntity(where: myDolphins)
  .onEnded {
    print("TAPPED DOLPHIN")
  }
)

Here are your three entity filters:

Use QueryPredicate to target entity subtypes. You can also reuse these queries to find entities within a scene.
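
A hedged sketch of both uses, spelling out the myDolphins predicate from the example above with a hypothetical DolphinComponent tag:

struct DolphinComponent: Component {}

// the gesture filter used above, spelled out as a predicate:
let myDolphins: QueryPredicate<Entity> = .has(DolphinComponent.self)

// reuse the same predicate to find matching entities in a scene:
if let scene = myEntity.scene {
  for dolphin in scene.performQuery(EntityQuery(where: myDolphins)) {
    print("found \(dolphin.name)")
  }
}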

You can specifically filter "pinches" with .handActivationBehavior. For anything besides pinching, you'll need to use a hand skeleton to figure out what's going on. More on that later.

InputTargetComponent

myEntity.components.set(InputTargetComponent())
myEntity.components.set(CollisionComponent(shapes: [.generateSphere(radius: 0.1)]))

Gestures only work on entities with collision components.

Seriously, this is pretty important: gestures only work on entities with collision components. InputTargetComponent ain't enough.

Luckily, InputTargetComponent is hierarchical, so any descendants with collision components can be targeted for gestures. You can "hide" a descendant entity from gestures by adding a disabled input component:

var inputComponent = InputTargetComponent()
inputComponent.isEnabled = false
myDescendantEntity.components.set(inputComponent)

Input Types

There are two types of input in VisionOS to use in your components, e.g. InputTargetComponent(allowedInputTypes: .indirect):

And this naturally brings us to gesture types. Everything works as you'd expect, except the "direct" input makes things weird. SpatialTapGesture is .direct, while all others have .direct and .indirect handling. But almost everything will be .indirect, because we're civilized, dammit.

I don't actually have a strong opinion on this, I'm just performing a comedic bit.

Gesture Types

Gesture Hand Poses

DragGesture()
.targetedToAnyEntity()
.onEnded { gesture in
  appState.myEntity?.setPosition(
    SIMD3<Float>(gesture.gestureValue.inputDevicePose3D!.position.vector),
    relativeTo: nil)
}

For some gestures, you can observe hand orientation changes outside of ARKit hand-tracking sessions.

Gesture Ambiguity

Because multiple tap gestures on the same RealityView execute with a different number of taps, multiple gestures may be called at once. If a player double taps an entity, for example, both the single tap and the double tap gesture code get called, and the app has to determine which one to execute. Swift Splash makes this determination by using a Boolean state variable. If a player single taps, it sets that variable — called shouldSingleTap — to true. Then it waits for a period of time before executing the rest of its code. If shouldSingleTap gets set to false while it’s waiting, the code doesn’t execute. When SwiftSplash detects a double tap gesture, it sets shouldSingleTap to false, preventing the single-tap code from firing when it executes the double-tap code.

struct TapGestureView: View {
  @State private var shouldSingleTap = false
  var body: some View {
    RealityView {/* ... */}
    .gesture(
      TapGesture(count: 1)
      .targetedToAnyEntity()
      .onEnded { _ in
        Task {
          self.shouldSingleTap = true
          try? await Task.sleep(for: .milliseconds(300)) // wait briefly for a possible second tap
          if self.shouldSingleTap {
            print("SINGLE TAP")
          }
        }
      }
    )
    .simultaneousGesture(
      TapGesture(count: 2)
      .targetedToAnyEntity()
      .onEnded { _ in
        self.shouldSingleTap = false
        print("DOUBLE TAP")
      }
    )
  }
}

Hand Shapes

for await update in self.handTracking.anchorUpdates {
  switch update.event {
  case .added, .updated:
    let hand = update.anchor
    let pose = Pose3D(hand.originFromAnchorTransform)
    guard let pose else { continue }
    let isPalmUp = pose.isApproximatelyEqual(
      to: Pose3D(
        position: pose.position,
        target: Point3D(x:0,y:0,z:0),
        up: Vector3D(x:0,y:1,z:0)
      )
    )
    let isRingIn = hand.handSkeleton?.joint(.ringFingerTip).anchorFromJointTransform.columns.3[SIMD3(0,1,2)].z ?? 0 < -0.1
    let isRingOut = hand.handSkeleton?.joint(.ringFingerTip).anchorFromJointTransform.columns.3[SIMD3(0,1,2)].z ?? 0 > 0.1
    let isPointerOut = hand.handSkeleton?.joint(.indexFingerTip).anchorFromJointTransform.columns.3[SIMD3(0,1,2)].z ?? 0 > 0.1
    if isPalmUp && isRingOut {
      print("LOOKING AT PALM")
    }
    if isPointerOut && isRingIn {
      print("SHOOTING")
    }
  default:
    break
  }
}

I really hope that Apple produces a nicer API for hand shape recognition in the future. As of now, it's still pretty low-level, and hands are naturally complex.

To reduce complexity of hand recognition, I focus on four dimensions and ignore everything else:

  1. palm orientation
  2. index finger tip
  3. ring finger tip
  4. thumb tip

Because of the musculature of the human hand, the ring finger is a very good proxy for the middle finger and pinky, i.e. it's difficult to extend your ring finger by itself.

On Your Own

As you learn new APIs, two particular Xcode shortcuts are extremely helpful for accessing documentation:

Starter Projects

VisionOS:

RealityKit:

ARKit:

Videos

WWDC23:

WWDC24: