Are the current AI glasses just a step towards AR + AI glasses?

Voss · November 29, 2024, 10:21am

Opinion: The current “AI Glasses” are awkward and will inevitably transition to AR + AI Glasses!

Here is another blog by our guest author Axel Wong. His old blog post about Meta Orion AR Glasses is very popular, with over 143,000 people reading it. This time he gives a breakdown of the current AI Glasses hype with countless companies investing in this type of glasses with camera and audio input for multimodal AI models — but without a display, without multimodal output. Enjoy!

____

In the past year, the news that Meta’s Ray-Ban glasses sold over one million units has excited many companies. The so-called “AI glasses,” seen as a promising new product category, have been hyped up once again. On the streets of the U.S., it’s not uncommon to spot people wearing these glasses, which are equipped with dual cameras and audio functionality.

A large number of companies, including Baidu and Xiaomi, have rushed into this space, even attracting entrants from unexpected industries like power bank manufacturers. Rumor has it that Apple and Samsung are also eager to join the race. This sudden surge of enthusiasm reminds me of the smart speaker craze from years ago. Back then, over a hundred companies in Shenzhen were making smart speakers — but as we all know, most of them eventually stopped.

Ray-Ban AI Glasses

At its core, what we call “AI glasses” today are essentially glasses equipped with audio and camera capabilities. Bose was among the first to introduce audio-enabled glasses with its so-called BoseAR, which was essentially a pair of headphones in the form of sunglasses. Around the same time, Snap released its first-generation Spectacles, which allowed users to record short videos. I bought both out of curiosity at the time—but predictably, they’ve long since disappeared into a corner, gathering dust.

Clearly, the concept of adding sensors to eyewear isn’t new. So why do “AI glasses” suddenly seem fresh again? The answer is simple: large language models (LLMs) have entered the picture. The current buzz revolves around the idea of using LLMs on smartphones (usually via an app, like ByteDance’s Doubao) to “empower” these devices. You might think it’s just cameras and speakers, but no—this is AI-powered smart hardware!

OK, now that we’ve laid the groundwork, let’s get to the conclusion: In my opinion, today’s so-called “AI glasses” will inevitably transition (in the short term) to AR glasses. That means evolving from “audio + cameras” to “audio + cameras + near-eye displays.”

The Moment You’re Forced to Pull Out Your Phone, AI Glasses Lose the Game

This isn’t a criticism stemming from years of working in XR, nor is it a forced negative view of AI glasses. The issue lies in product logic: AI glasses without a display are fundamentally awkward and lack coherence. (For an analysis of why Ray-Ban glasses sell well, see the end of this article.)

For any product, the three most critical aspects are scenarios, scenarios, and scenarios.

AI Glasses with on-device AI processing

Let’s examine the scenarios for “AI glasses.” Take Baidu’s Xiaodu AI Glasses as an example. According to reports, they offer:

First-person video recording,
On-the-go Q&A,
Calorie recognition,
Object identification,
Visual translation,
Intelligent reminders.

When summarized, these features boil down to two core functionalities:

Recognition + audio prompts for information (on-the-go Q&A, calorie recognition, object identification, visual translation, intelligent reminders).
First-person video recording.

Let’s step back for a moment. How do we typically interact with AI today? Most of the time, it’s through a smartphone. The truth is, all the functions mentioned above can already be fully achieved with a smartphone screen and camera. AI glasses merely relocate the phone’s audio and camera capabilities to your head. Their biggest advantage is that you don’t need to take your phone out, which can be convenient in certain scenarios—such as when your hands are occupied (e.g., cycling or driving) and you need navigation or recording.

Now let’s consider the typical interaction flow between a user and AI on a phone. For example, when you want to know something, you ask the AI, and it responds with a long block of text, like this:

This is from Doubao; the Q&A itself is unrelated to this article, and only half the response is shown.

As you can see, the response is full of text. Most of the time, we don’t have the patience to listen to the AI read the entire thing aloud. That’s because the brain processes text or visual information far more efficiently than audio. Often, we just skim through the text, grasp the key points, and immediately move on to the next question.

Now, if we translate this scenario to AI glasses, problems arise. Imagine you’re walking down the street wearing AI glasses. You ask a question, and the AI responds with a long-winded explanation. You may not remember or even care to listen to the entire response. By the time the AI finishes speaking, your attention or location may have shifted. Frustrated, you’ll end up pulling out your phone to read the full text instead.

Moreover, there’s the issue of interaction itself: audio is inherently a “laggy” form of interaction. Anyone familiar with real-time interpretation, smart speakers, or in-car voice assistants will know this. You have to finish an entire sentence for the AI to process it and respond. The response might often be incorrect or irrelevant—like answering a completely different question.

(For more on this issue, see my earlier article: “The Media and Big Thinkers Are Hyping a New AI+AR ‘Unicorn,’ But I Think It’s Better Suited for Street Fortune-Telling.”)

This means there’s a high likelihood that:

You spend a long time talking to the AI, and it doesn’t understand you.
You find the response too slow, so you pull out your phone to type the command yourself.
You feel the AI is rambling, so you take out your phone to skim the full text.
Privacy concerns arise—you wouldn’t want to use voice commands to ask the AI to send a flirty message to your girlfriend in a public place.

In the end, the moment you’re forced to pull out your phone, the significance of AI glasses drops to almost zero.

After Audio, Let’s Talk About Cameras

A person holding a phone up in the air to take a picture

Admittedly, having a camera on your head provides a more elegant option for taking photos. Personally, I’m not a fan of taking pictures, for two main reasons: first, pulling out a phone to take a picture feels awkward and inelegant to me; second, it often seems disrespectful to the person speaking to you (for example, even if you’re using your phone to record what they’re saying, it can still come across as rude).

But I wonder how many people who use glasses for photography are genuinely taking photos in their daily lives. When photographing people, objects, or scenery, you typically need to rely on the framing guidelines provided by a phone’s viewfinder. Often, you might need to crouch or adjust the angle to capture the perfect shot—something that AI glasses in their current form are almost incapable of doing. And let’s not forget that the camera quality of AI glasses is inevitably far inferior to that of a smartphone.

Of course, many might argue that these glasses are mainly designed for first-person video recording or quick snapshots. To that, I can only say: if you have absolutely no expectations for the quality of your footage and just want to casually capture something, then yes, AI glasses could be somewhat useful. However, the discomfort of “not being able to see what you’re recording while you’re recording it” is likely to bother most people. And in the vast majority of cases, these functions can be completely replaced by a smartphone.

It all comes back to the same point: the moment you’re forced to pull out your phone, the significance of AI glasses drops to almost zero.

AI + AR Will Streamline the Entire Product Logic

Why do I say that AI glasses will inevitably transition to AR glasses in the near future?

To make it easier to understand, let’s stop calling them “AR glasses” for now. Instead, think of them as “Siri with near-eye displays for text and images” (I’ll call this “Piri”). This term captures the core concept better.

Let’s go back to Baidu’s AI glasses as an example. Looking at their own promotional materials—take a close look at these images—anyone unfamiliar with the product might think these are advertisements for AR glasses. (They even include thoughtfully designed AR-style UI elements. )

Frames from Baidu’s promo video for the Xiaodu AI Glasses

From these images alone, it’s clear that once near-eye display functionality allows AI-provided information to be presented directly—even if it’s just monochrome text—the entire product logic suddenly makes sense.

Let’s revisit the scenarios we discussed earlier:

Recognition + audio prompts for information: With near-eye displays, text information can now appear directly in the user’s view, making it instantly readable. What used to take minutes to listen to can now be grasped in seconds. Additionally, AI could automatically generate memos that float in your field of view, ready to be checked at any time (ideally disappearing after a short period).

Translation functionality also becomes more convenient for the wearer. While it’s not perfect (you can’t guarantee the other person is also wearing similar glasses), the vision of widespread AR adoption is precisely what the industry is striving for, right?

Photography: A simple viewfinder on the side could let users see what they’re capturing. This provides guidance and resolves the issue of blindly taking photos or videos.

This type of product doesn’t have to stick to the traditional shape of ordinary glasses. Monochrome waveguides could easily handle the basic functionality of Baidu’s AI glasses. Moreover, combining them with traditional optical systems (such as BB/BM/BP geometrical optics) could open up entirely new scenarios—like virtual companions (imagine a virtual Xiao Zhan accompanying you to watch a movie) or interactive training (a virtual tutor practicing a foreign language with you face-to-face). These are scenarios that display-limited waveguides struggle to achieve effectively.

AI Powers AR But Can’t Solve All Optical Challenges

While AI’s capabilities enhance the potential of AR glasses, they can’t unify the variety of optical solutions in AR glasses. For instance, AI cannot improve the display quality of certain optical designs, like waveguides. However, it can add more functionality to existing AR products:

For waveguide-based glasses, AI could resolve the lack of compelling use cases, turning them into more practical tools.
For BB-style large-screen AR glasses, AI might not only enrich their features but also address their current dilemma: difficulty justifying a high price tag (it’s almost like selling at a loss just to gain attention).

Additionally, this combination might spur the development of entirely new optical systems, potentially leading to innovative product categories.

Here’s an old concept model from 2018 (apologies for the rough design).

From this image, you can see how this type of product fundamentally differs from today’s large-screen AR glasses. The latter, positioned as “portable large screens,” are more akin to plug-and-play ‘glasses-shaped monitors.’ In contrast, AI + AR glasses would emphasize the practicality and usability of the app ecosystem. These two types of devices have completely different design and development philosophies.

This is also why current waveguide + microLED glasses haven’t gained widespread acceptance. Most of them are simply following the design philosophy of large-screen glasses, stacking hardware to achieve near-eye displayswithout thoroughly refining the app ecosystem. Some even fail to deliver decent hardware performance.

The Path Forward: AI Glasses Transitioning to AI + AR Glasses

Looking ahead, we can predict that companies making AI glasses today will face mixed market feedback:

Those that entered the space blindly, without understanding the product’s core value, will likely abandon it altogether.
Companies serious about developing a viable product will eventually incorporate display functionality, transitioning to AI + AR glasses.

Blindly following trends is meaningless and often leads to dead ends. But for those willing to innovate, AI + AR is the natural evolution of AI glasses.

Blindly Following Trends Is Pointless and Often Leads to Pitfalls

That brings us back to the question: Why have Ray-Ban’s smart glasses sold so well?

Ray-Ban AI Glasses

In my opinion, the success of Ray-Ban’s smart glasses lies in a pragmatic commercial strategy. Let’s break it down:

The strong brand appeal of Ray-Ban:Ray-Ban is a well-established mid-to-high-end eyewear brand with strong recognition in the consumer market, especially in the United States.
Extensive offline retail channels:AI glasses are hardware products and a new category, which makes them hard to sell online alone. Ray-Ban’s robust offline retail network allows users to try the glasses in-store, significantly increasing the likelihood of a purchase.
Reasonable pricing:The price of Meta’s smart glasses is comparable to that of regular Ray-Ban sunglasses. For consumers who were already planning to spend this much on sunglasses, adding a few trendy features makes it an easy upgrade.
Practical applications for certain users:Some users genuinely benefit from first-person video recording, such as livestreamers who wear the glasses for hands-free filming or visually impaired individuals who use apps like Be My Eyes.
Most importantly: Even if users stop using the AI or camera features after a month or two, the glasses remain a stylish and functional product. They are something you can confidently wear out in public—or even enjoy wearing daily—purely as eyewear.

In summary, the success of Meta’s Ray-Ban smart glasses has little, if anything, to do with AI or AR. It may not even have much to do with their functionality. Instead, it’s a combination of brand strength and a well-thought-out product positioning. It’s also worth noting that Meta only achieved this after two product iterations; the first-generation Ray-Ban Stories had lackluster sales.

Be My Eyes app, now available on Ray-Ban glasses

For example, the Be My Eyes app, now available on Ray-Ban glasses, allows visually impaired individuals to connect with a network of 8.1 million volunteers. These volunteers can use the glasses’ camera feed to view the wearer’s surroundings and provide instructions via audio.

Lessons for AI Glasses in the Chinese Market

It’s clear that some Chinese companies are trying to replicate this model by partnering with eyewear brands like Boshi or Bolon. However, this approach may not be enough because the Chinese consumer market is vastly different from the U.S.. How many people in China are willing to spend over a thousand yuan on a pair of sunglasses? Not many. Personally, I wouldn’t.

If companies want to make the AI features compelling enough for consumers to buy, the next logical step is to transition to AI + AR.

_____

Meta’s Next Steps: Toward AI + AR Glasses

In my article “Meta AR Glasses Optics Breakdown: Where Did $10,000 Go?”, I mentioned rumors that Meta plans to release glasses with waveguide technology in 2025. The optical design is said to use 2D reflective (array) waveguides paired with LCoS projectors.

While this optical design is likely a transitional step, the evolution from “audio + cameras” to “audio + cameras + near-eye displays” is a sound and logical progression for AI glasses.

A Final Note: The Risks of Blindly Following Trends

The consequences of blindly copying others are often dire. Take Apple’s Vision Pro as an example. When it was first released last year, I predicted it would fail (see my article “Vision Pro Is Not a Savior but Apple’s Cry for Help”).

The core question that every product must answer remains the same: What are people going to use this for?

Vision Pro’s biggest issue isn’t its hardware—it’s the severe lack of content. VR has always been heavily reliant on PC/console gaming ecosystems. Even with Vision Pro’s impressive hardware specifications, it’s essentially useless without content. For the companies that are still copying Vision Pro (I know of several), what’s the point if you don’t have a robust content ecosystem?

Darwin · November 29, 2024, 10:21am

You can solve the long AI answers by just asking it to give you shorter responses, like saying ‘Give me a quick answer’

Sonny · November 29, 2024, 10:21am

True, but if you shorten it too much, you might miss some details.

Adair · November 29, 2024, 10:21am

But can’t you always just ask it to ‘tell me more’ if you want more details later?

Colby · November 29, 2024, 10:21am

Yeah, that’s true. But in the example I saw in a blog, the response was like 20 seconds. That’s not super long, right? It’s about balance. If you’re busy or stressed, you might not want to keep asking for repeats. And without a screen, it’s tricky to go back to what you missed. You’d have to ask the AI to repeat from a certain part.

Hollis · November 29, 2024, 10:21am

I get your point. Makes sense.

Wes · November 29, 2024, 10:21am

I saw a presentation by Bernard Kress from Google. He said the hardware for smart glasses is ready; now it’s all about creating good user experiences before launching.

Joss · November 29, 2024, 10:21am

Good point! I responded to a ‘which glasses’ thread the other day and mentioned we’re still figuring out the UI and overall experience. Everyone seems to want something different. Personally, I prefer a minimal display unless I need specific info. For AI, I’d love something that works across devices—phone, PC, and smart speakers. Right now, every product has its own system. For example, I just saw Viture announce a neckband with its own AI. Feels like we’ll have too many systems to deal with soon!

Scout · November 29, 2024, 10:21am

Check this out—someone is working on an OS for AI agents: Former Android leaders are building an ‘operating system for AI agents’ - The Verge

Marlow · November 29, 2024, 10:21am

That looks promising. Let’s see how it goes.

Kiran · November 29, 2024, 10:21am

So are you saying that even simple AR glasses with basic displays would be more practical than these AI-only glasses?

Finlo · November 29, 2024, 10:21am

It’s always about the software or experience. Meta bought Beat Saber because it boosted sales for their hardware. If they create the right apps or games, they could sell out new AR glasses easily, even without AI.