Object Detection
Getting Started
Step 1: CombinedRealityView.swift
Navigate to squiggly/Views/Main/CombinedRealityView.swift. This is where the code for the immersive space lives.
Step 2: Object Tab in Toolbar
On the Apple Vision Pro, you will find the toolbar on the left. In this window, we have three toolbars: the left toolbar, the bottom toolbar, and the top ornament. In the left toolbar, select the Objects tab to open up the section.
Step 3: Select Objects to Detect
Once you are in the tab, you will find a NavigationSplitView. Try cycling through the objects; you will see that we trained 3 crayon boxes and some of their individual crayons. For our course today, we will focus only on the Crayon Box. Please deselect all objects except for Crayonbox 3_raw_ObjectMaskOn.
Step 4: Detect Crayon Box
Then click the green Play button in the bottom toolbar. The bottom toolbar will change into a different UI as you enter the immersive space. Now, find your crayon box on your table. When it is detected, you should see a yellow overlay along with a UI overlay.
The box changes shape when it is opened, so for best results, cover the face of the crayon box with the tabs.
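Under the hood, visionOS finds reference objects through ARKit's object tracking API. Below is a minimal sketch of that flow, assuming you have already loaded a ReferenceObject for the crayon box; the function name and yellow placeholder entity are illustrative, not Squiggly's exact implementation.

import ARKit
import RealityKit
import SwiftUI

// Minimal sketch: track one reference object and keep a placeholder entity
// pinned to it. `crayonBox` is a ReferenceObject loaded elsewhere (see the
// Reference Objects section below).
@MainActor
func runObjectTracking(for crayonBox: ReferenceObject, in content: RealityViewContent) async throws {
    let session = ARKitSession()
    let tracking = ObjectTrackingProvider(referenceObjects: [crayonBox])
    try await session.run([tracking])

    // Simple stand-in for the yellow overlay shown when the box is detected.
    let overlay = ModelEntity(mesh: .generateBox(size: 0.02),
                              materials: [SimpleMaterial(color: .yellow, isMetallic: false)])
    overlay.isEnabled = false
    content.add(overlay)

    for await update in tracking.anchorUpdates {
        switch update.event {
        case .added, .updated:
            // Move the overlay to the detected object's pose.
            overlay.transform = Transform(matrix: update.anchor.originFromAnchorTransform)
            overlay.isEnabled = update.anchor.isTracked
        case .removed:
            overlay.isEnabled = false
        }
    }
}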
Overview
An image classifier is a machine learning model that recognizes images. When you give it an image, it responds with a category label for that image. You train an image classifier by showing it many examples of images you’ve already labeled. For example, you can train an image classifier to recognize animals by gathering photos of elephants, giraffes, lions, and so on. After the image classifier finishes training, you assess its accuracy and, if it performs well enough, save it as a Core ML model file. You then import the model file into your Xcode project to use the image classifier in your app. Create ML is the tool we will use to support us through this process.
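To make that workflow concrete, here is a minimal sketch using the CreateML framework in a macOS playground; the folder paths and file names are assumptions, and the Create ML app provides the same steps with no code at all.

import CreateML
import Foundation

// Minimal sketch: train an image classifier from a folder of labeled images,
// where each subfolder of TrainingImages is named after a label.
// The paths below are placeholders.
let trainingData = URL(fileURLWithPath: "/path/to/TrainingImages")
let classifier = try MLImageClassifier(
    trainingData: .labeledDirectories(at: trainingData)
)

// Check how well the model generalizes before exporting it.
print("Validation error: \(classifier.validationMetrics.classificationError)")

// Save the trained model as a Core ML file that Xcode can import.
try classifier.write(to: URL(fileURLWithPath: "/path/to/SquigglyClassifier.mlmodel"))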
Reference Objects
In this section, we share how this process was implemented. First, we started off by picking a 3D model. In our case, we got 24-count crayon boxes from Target. Luckily, they were on sale for $0.50 (USD)!
There are a number of different ways we can gather data on our crayon box.
- If you are on a Pro-model iPhone (iPhone 12 Pro through 16 Pro), you may use Apple's Reality Composer iOS app or download their Object Capture application.
- If you do not have an iPhone, you may take a minimum of 10 photos of your item, though it is recommended to take between 50 and 100 photos for each object. Then, in Reality Composer Pro, we can compile the images and create a 3D model.
- Alternatively, we can use other applications such as PolyCam, Luma3D, and Abound.
Other than our crayon box, we experimented with training individual crayons, the Arduino UNO, stuffed animals, the Raspberry Pi Pico, and whiteboard Expo markers.
If you are working with a set of images, you may open up Reality Composer Pro on the Mac and select Create New Object Capture Model. From there, a 3D model in .usdz format will be created for you. You may also use existing 3D models; just make sure they are in .usdz format.
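If you prefer to script this step instead of using the Reality Composer Pro UI, RealityKit's Object Capture API (PhotogrammetrySession) on a Mac performs the same photos-to-.usdz reconstruction. Here is a minimal sketch, assuming a folder of photos of the crayon box; the paths are placeholders.

import Foundation
import RealityKit

// Minimal sketch of turning a folder of photos into a .usdz model with
// RealityKit's Object Capture API on macOS. Paths are placeholders.
let photosFolder = URL(fileURLWithPath: "/path/to/CrayonBoxPhotos")
let outputModel = URL(fileURLWithPath: "/path/to/CrayonBox.usdz")

let session = try PhotogrammetrySession(input: photosFolder)

// Watch the output stream for progress, errors, and completion.
Task {
    for try await output in session.outputs {
        switch output {
        case .processingComplete:
            print("Finished writing \(outputModel.lastPathComponent)")
        case .requestError(_, let error):
            print("Reconstruction failed: \(error)")
        default:
            break
        }
    }
}

// Request a reduced-detail model, which is usually enough for object tracking.
// In a real tool, keep the process alive until .processingComplete arrives.
try session.process(requests: [.modelFile(url: outputModel, detail: .reduced)])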
Next, we can bring our .usdz file into Create ML. It is important to make sure the object's dimensions are correct and specified in centimeters (cm). If the dimensions are incorrect, you may modify them in 3D modeling software such as Blender.
Finally, you can start training! It is expected to take some time. Once it is complete, you may export the result as a reference object in .referenceobject format. Once you have your reference object, you may place the reference objects into your Xcode project under a folder named Reference Objects.
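For reference, here is a minimal sketch of how those .referenceobject files could be loaded at runtime, assuming they are bundled in a folder reference named Reference Objects; the helper function is hypothetical and not necessarily how Squiggly loads them.

import ARKit
import Foundation

// Hypothetical helper: load every .referenceobject file from the bundled
// "Reference Objects" folder so they can be handed to ObjectTrackingProvider.
func loadReferenceObjects() async -> [ReferenceObject] {
    let urls = Bundle.main.urls(forResourcesWithExtension: "referenceobject",
                                subdirectory: "Reference Objects") ?? []
    var objects: [ReferenceObject] = []
    for url in urls {
        do {
            // ReferenceObject(from:) decodes the file exported by Create ML.
            objects.append(try await ReferenceObject(from: url))
        } catch {
            print("Failed to load \(url.lastPathComponent): \(error)")
        }
    }
    return objects
}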
The reference objects selected for this project were a bit small. Creating a 3D model of small objects such as crayons and markers was difficult, especially with the Object Capture application. Taking individual photos of the crayons and then processing the photos in Reality Composer Pro was also difficult.
After creating the reference objects, we can add them to our Reference Objects folder. From there, we will see our reference objects appear in the UI.
Local vs Cloud
All models used in the Squiggly app run locally. For our learning purposes, we wanted to make sure the projects could run even without WiFi.
You can run machine learning models either locally or in the cloud. When running locally, the Core ML model is downloaded to your device, allowing the app to work completely offline with low and consistent latency. All images stay on the headset, which means your data never leaves the device, making this option ideal for real-time interactions, privacy, and demos. However, the model size is limited by the app package and the computing power of the device.
import Vision

// Classify the snapshot with Vision's built-in image classifier.
// `cgImage` is the captured snapshot, and `file.observations` is where
// Squiggly keeps each label's confidence score.
let request = VNClassifyImageRequest()
let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
do {
    try handler.perform([request])
    if let results = request.results as? [VNClassificationObservation] {
        for classification in results {
            file.observations[classification.identifier] = classification.confidence
        }
    }
} catch {
    print("Vision error: \(error)")
}
When running in the cloud, Squiggly sends snapshots to a Hugging Face endpoint where the model is hosted. This allows you to use larger or frequently updated models without having to ship a new app build. It does require an internet connection, and performance depends on network speed. Cloud inference is useful for rapid iteration, A/B testing, or when using architectures too large for on-device execution. By default, Squiggly runs locally, but if confidence in a classification is low or the local model isn’t available, it can automatically fall back to the Hugging Face cloud option. You can also manually switch between local and cloud inference in the app’s settings.
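To illustrate the cloud path, here is a minimal sketch of posting a snapshot's JPEG data to the hosted model from Swift; the endpoint URL, token placeholder, and response shape are assumptions about a standard Hugging Face image-classification endpoint, not Squiggly's exact networking code.

import Foundation

// Hypothetical cloud fallback: POST the snapshot bytes to a hosted endpoint
// and decode label/score pairs from the JSON response.
struct CloudPrediction: Decodable {
    let label: String
    let score: Double
}

func classifyInCloud(jpegData: Data) async throws -> [CloudPrediction] {
    // URL and token are placeholders for the hosted model.
    let endpoint = URL(string: "https://api-inference.huggingface.co/models/debbieyuen/squiggly")!
    var request = URLRequest(url: endpoint)
    request.httpMethod = "POST"
    request.setValue("Bearer <HF_TOKEN>", forHTTPHeaderField: "Authorization")
    request.httpBody = jpegData

    let (data, _) = try await URLSession.shared.data(for: request)
    return try JSONDecoder().decode([CloudPrediction].self, from: data)
}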
Here is an example of running a PyTorch model from Hugging Face. If you are running .mlmodel files, you can’t load them with transformers. Instead, run them with Core ML, which means Swift (iOS/macOS) or Python via coremltools. Read more here.
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

model_name_or_path = "debbieyuen/squiggly"  # your model on Hugging Face
device = "cpu"  # or "cuda" if you have a GPU

# Load the preprocessing pipeline and the classification model.
processor = AutoImageProcessor.from_pretrained(model_name_or_path)
model = AutoModelForImageClassification.from_pretrained(model_name_or_path).to(device)

# Preprocess a test image and run a forward pass.
image = Image.open("test.png")
inputs = processor(images=image, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to probabilities and look up the predicted label.
predictions = outputs.logits.softmax(dim=-1)
predicted_class = predictions.argmax(dim=-1).item()
print(model.config.id2label[predicted_class])
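For the Core ML route, here is a minimal sketch in Swift that classifies a CGImage with a custom model through Vision; the model file name SquigglyClassifier is an assumption.

import CoreML
import Vision

// Minimal sketch: run a bundled, compiled Core ML model (.mlmodelc) on a
// CGImage through Vision. The model name is a placeholder.
func classifyLocally(cgImage: CGImage) throws {
    let modelURL = Bundle.main.url(forResource: "SquigglyClassifier", withExtension: "mlmodelc")!
    let visionModel = try VNCoreMLModel(for: MLModel(contentsOf: modelURL))

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        guard let results = request.results as? [VNClassificationObservation] else { return }
        for observation in results.prefix(3) {
            // Print the top labels with their confidence scores.
            print(observation.identifier, observation.confidence)
        }
    }

    try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
}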