Welcome

Spatial Design and CoreML for the Apple Vision Pro


Figure 1: The Apple Vision Pro. Source: Apple Newsroom

Thank you for joining our session, hosted by Debbie Yuen (USC) and Mark Ramos (NYU), and made possible through the generosity of NVIDIA, Apple, and SIGGRAPH! In this session, we focus on spatial user interfaces (UI) and graphics for three-dimensional environments. We explore designing intuitive and predictive 3D user interfaces with machine learning. Before we get started with our lecture and hands-on lab, we would love to provide a brief introduction to the Apple Vision Pro, what you can expect from this course, and the materials you will need.

Getting Started with the Apple Vision Pro

We have a total of 25 Apple Vision Pro headsets, placed at every other computer. Please gather in groups to collaborate on the programming exercises and take turns sharing the Apple Vision Pro. For the best experience, we recommend remaining seated and removing jewelry that may come into contact with the Apple Vision Pro while it is worn. If you experience any discomfort, please alert the instructors immediately.

💡

Note: To read more about how to adjust the fit of the Apple Vision Pro, complete setup, and navigate applications, please visit Getting started with the Apple Vision Pro!

When the Apple Vision Pro is not in use, please place it carefully back in its case along with its associated cords and materials. Before leaving the lab session, please double-check that the Apple Vision Pro is stored properly and its charging cables are attached.

Why the Apple Vision Pro?

Apple has a reputation for setting the standard for UI/UX and great user experiences. In combination with advanced technology for motion gestures, eye tracking, speech recognition, and object detection, we believe that exploring the space of multimodal 3D UI/UX with the Apple Vision Pro would be not only fun but also valuable to many.

As we explore the world of predictive UI together, we challenge you to innovate and imagine what good user experience looks like in a spatial environment. How might we integrate AI/ML tools to design interfaces that support users in navigating 3D environments, particularly in passthrough mode?

When we think of UI/UX and product design, we might think of designing 2D interfaces with platforms such as Adobe XD, Figma, and Bezi. If you are interested in focusing on design without programming, we will provide design resources for the Apple Vision Pro. In this session, due to time constraints, we will focus on developing interfaces with SwiftUI. Working with code will allow us to create advanced and custom interfaces while giving us the opportunity to look at both front-end interaction design and back-end logic.
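If SwiftUI is new to you, the sketch below shows the declarative style we will use and the front-end/back-end split mentioned above: a view that renders state, and a small observable model that holds the logic. The type names are illustrative and not part of the course project.

```swift
import SwiftUI
import Observation

// Back-end logic: a small observable model (names are placeholders).
@Observable
class CounterModel {
    var count = 0
    func increment() { count += 1 }
}

// Front-end: a SwiftUI view that renders the model's state.
struct CounterView: View {
    @State private var model = CounterModel()

    var body: some View {
        VStack(spacing: 16) {
            Text("Count: \(model.count)")
                .font(.largeTitle)
            Button("Increment") { model.increment() }
        }
        .padding()
    }
}
```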

What about 3D UI with game engines such as Unreal or Unity? Unreal and Unity are both great tools for developing cross-platform spatial experiences. In this session, we particularly want to focus on the tech stack with CreateML, the Vision Framework, Reality Composer, and CoreML, all of which integrate well with Xcode and can be quickly built to the Apple Vision Pro. We will provide resources on developing for the Apple Vision Pro with Unity and Unreal as well.

Thank you again for joining our session. We hope you have fun working with the Apple Vision Pro and are able to learn something valuable from us.

Course Materials

| Materials | Description | Links |
| --- | --- | --- |
| Lecture Slides | Shareable link to the session's presentation | Keynote Slides |
| GitHub Project | Open-source visionOS project on SwiftUI, CoreML, and the Vision Framework | GitHub Repository |
| Reality Composer (iOS) | iOS mobile app for object detection | Website |
| Demo App: Squiggly | visionOS project built to the Apple Vision Pro as a prototype | Apple Vision Pro |
| Research Paper | Background and motivation on this area | Research Paper |

If you are working on your own Mac computers and iPhones, we will be using the following software today. Please keep in mind that your Mac must be running macOS Sequoia on an Apple silicon chip (M1, M2, M3, or M4).

| Materials | Description | Links |
| --- | --- | --- |
| Reality Composer Pro (Mac) | Mac app for composing and previewing 3D scenes and RealityKit content | Website |
| CreateML | Apple's Mac app for training custom machine learning models | Website |
| Xcode 16.3 | Apple's IDE for building and running visionOS applications | Website |

Agenda

5 minutes: Welcome and Introductions
Welcome, overview of the course, instructor introductions, and course motivation.

10 minutes: Introduction to Swift and visionOS
An overview of visionOS and spatial computing. We set up a new visionOS project in Xcode and get the Apple Vision Pro simulator working. Then, we briefly cover Swift, SwiftUI, RealityKit, and Reality Composer for visionOS. Students experiment with creating views, UI, and windows; a minimal starting point is sketched below.
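As a reference for this exercise, here is a minimal visionOS app entry point with a single window. The app and view names are placeholders for whatever you name your project.

```swift
import SwiftUI

// Minimal visionOS app: one window group containing a SwiftUI view.
// "SpatialDemoApp" and "ContentView" are placeholder names.
@main
struct SpatialDemoApp: App {
    var body: some Scene {
        WindowGroup {
            ContentView()
        }
    }
}

struct ContentView: View {
    var body: some View {
        Text("Welcome to spatial computing")
            .font(.title)
            .padding()
    }
}
```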

20 minutes: CoreML with SwiftUI
Overview of Apple's CoreML and how to use its models to integrate machine learning into visionOS applications. Attendees will set up and add their CoreML files to Xcode and explore hand- and eye-tracking gestures for the Apple Vision Pro; a generic classification sketch follows below.
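To preview the pattern we will use, here is a minimal, generic sketch of running an image classifier through the Vision framework. `MyClassifier` is a placeholder for whichever .mlmodel file you add to the Xcode project (Xcode generates a Swift class with the model's name); it is not a model shipped with the course.

```swift
import CoreML
import Vision
import CoreGraphics

// Generic sketch: run a Core ML image classifier via the Vision framework.
// "MyClassifier" is a placeholder for the Swift class Xcode generates
// from whatever .mlmodel file you add to the project.
func classify(_ image: CGImage) throws -> [VNClassificationObservation] {
    let configuration = MLModelConfiguration()
    let coreMLModel = try MyClassifier(configuration: configuration).model
    let visionModel = try VNCoreMLModel(for: coreMLModel)

    let request = VNCoreMLRequest(model: visionModel)
    let handler = VNImageRequestHandler(cgImage: image)
    try handler.perform([request])

    // Vision returns classification observations sorted by confidence.
    return request.results as? [VNClassificationObservation] ?? []
}
```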

20 minutes: RealityKit and Reality Composer
Design an interactive spatial UI using RealityKit, SwiftUI, and Reality Composer. Attendees may bring their own 3D models to use in their applications and further explore the Vision framework. A RealityView sketch is shown below.
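As a starting point, the sketch below loads an entity from a Reality Composer Pro package inside a RealityView and attaches a spatial tap gesture. It assumes the Xcode visionOS template's `RealityKitContent` package and a scene named "Scene"; your project's package and scene names may differ.

```swift
import SwiftUI
import RealityKit
import RealityKitContent  // Swift package generated by the Xcode visionOS template

struct ModelView: View {
    var body: some View {
        RealityView { content in
            // "Scene" is the default scene name in the template's
            // Reality Composer Pro package; replace it with your own.
            if let entity = try? await Entity(named: "Scene", in: realityKitContentBundle) {
                content.add(entity)
            }
        }
        // A spatial tap gesture targeted at any entity in the scene.
        // (An entity needs collision and input-target components to receive taps.)
        .gesture(
            TapGesture()
                .targetedToAnyEntity()
                .onEnded { value in
                    // Nudge the tapped entity up by 5 cm as a simple interaction.
                    value.entity.position.y += 0.05
                }
        )
    }
}
```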

5 minutes: Closing Remarks
Review, questions, and discussion.

Questions

What if I don't have access to an Apple Vision Pro?

Object detection, gestures, and eye tracking all depend on the Apple Vision Pro hardware. Alternatively, you may build to the visionOS simulator in Xcode. Either way, building for visionOS requires a Mac.

How might I build the Squiggly App on my own Mac?

Visit github.com to see the instructions on how to set up the Squiggly App on your personal Mac. You will need a GitHub account, Git, and Git LFS. If you would like to access the Hugging Face models, you will also need a personal access token.

Where can I access the machine learning models and datasets?

For this lab course, the models are compiled as a submodule folder from github.com. However, you may find the full set at their respective repositories. For object detection, the .referenceobject files are found in the Reference Objects folder; all reference objects trained for this course are available at huggingface.co. The pretrained models for MNIST, FastViT, and the Vision Framework can be found at their respective websites and repositories. For the squigglydataset, you may use and contribute to the dataset at huggingface.co.

Where can I find 3D assets for visionOS projects?

Reality Composer Pro for Mac has great 3D models in .usdz format. Alternatively, you may use your own 3D model and convert it to .usdz format. If you are using free models found online for commercial use or AI training purposes, please remember to read the licensing!

Xcode can't find my Apple Vision Pro device. How can I build to the Apple Vision Pro from my MacBook?

Make sure both the computer and the Apple Vision Pro are on the same Wi-Fi network, and ensure that Developer Mode on the Apple Vision Pro is turned on. If your Apple Vision Pro has a Developer Strap, make sure the cable is connected properly between the computer and the Vision Pro. In the Xcode toolbar, select Manage Run Destinations and click on the Apple Vision Pro.

What if I have ideas on how to improve the Squiggly App?

Create a fork of the Squiggly App and send us a pull request!

Does the Squiggly App run on visionOS 1.0 or visionOS 26?

The Squiggly App runs only on visionOS 2.0 and does not work on visionOS 1.0. In the upcoming months, we expect to upgrade it to visionOS 26.

Is it a requirement to use CreateML and Reality Composer Pro?

No. You are not required to use CreateML to create new models, nor are you required to use Reality Composer Pro as your primary 3D modeling software. The 3D models in Squiggly were exported from Blender, converted to .usdz format, and then brought into Reality Composer Pro. The Squiggly App relies on RealityView, which depends on the corresponding Reality Composer projects.

What object detection application did you use to capture the 3D objects?

The Reality Composer iOS app was used to create 3D scans of physical objects. The models were then processed on the Mac with Reality Composer Pro. The Reality Composer iOS app is compatible only with iPhone Pro devices.

How can I reach the instructors?

You may contact Debbie at her personal email, deborahyuen@berkeley.edu, or her work email, yuend@usc.edu, and Mark at __.