Exploring Open-Vocabulary Scene Understanding in XR... anyone tried it?

Hey everyone! I wanted to share our recent work, OpenMaskXR, which focuses on open-vocabulary scene understanding in extended reality. We demonstrate how commodity XR headsets can identify object instances from natural-language user queries, going beyond classification over a fixed set of object categories and enabling more dynamic interactions. You can check out our video here: https://youtu.be/rDraLkbDRW0. I’m curious if anyone has seen similar advancements in the industry. Happy to answer any questions!
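For anyone unfamiliar with the open-vocabulary idea, here is a minimal sketch of how this kind of query-to-instance matching is typically done: text queries and per-instance embeddings live in a shared vision-language embedding space (CLIP-style), and instances are ranked by cosine similarity. This is not the OpenMaskXR pipeline itself; the `instance_embeddings` tensor below is a random placeholder standing in for embeddings you would precompute from segmented instances, and `query_instances` is a hypothetical helper.

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder: per-instance embeddings you would normally precompute offline
# from masked instance crops, shape (num_instances, 512), L2-normalised.
instance_embeddings = torch.randn(20, 512, device=device)
instance_embeddings /= instance_embeddings.norm(dim=-1, keepdim=True)

def query_instances(prompt: str, top_k: int = 3):
    """Rank scene instances by cosine similarity to a free-form text query."""
    with torch.no_grad():
        tokens = clip.tokenize([prompt]).to(device)
        text_emb = model.encode_text(tokens).float()
        text_emb /= text_emb.norm(dim=-1, keepdim=True)
    scores = (instance_embeddings @ text_emb.T).squeeze(-1)
    return scores.topk(top_k)

values, indices = query_instances("a comfy armchair near the window")
print(indices.tolist(), values.tolist())
```

Because the query is just embedded text, the same scene can be searched with arbitrary phrases rather than a fixed label set, which is what "open-vocabulary" refers to here.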

Very interesting! I opened two issues on GitHub regarding the app and the WebXR client so I can test it more thoroughly. Thanks for sharing.

This sounds like a game changer for XR development. How do you handle the natural language processing aspect?

I love the idea of making computing invisible in XR. It’s about time we rethink UI. Any examples already in the industry?

What hardware do you recommend for testing OpenMaskXR? I’m curious if it works with lower-end devices.

I’m looking forward to trying this out. Any tips for getting started with your setup?