Apple Enters the AI Realm

PLUS: OpenAI's Voice Engine Clones Any Voice


Hello readers,

Welcome to another edition of This Week in the Future! Apple researchers introduced ReALM, an on-device language model that competes with GPT-4. Plus, OpenAI unveiled Voice Engine to clone any voice with a single 15-second sample.

Let’s get into it!

Apple Enters the AI Realm

Image generated by DALL·E 3

In a new research paper, Apple engineers introduced ReALM (Reference Resolution As Language Modeling). Simply put, ReALM sees what’s on your screen and can perform actions autonomously. It works by converting everything it sees and hears into text. Working purely with text makes it much more efficient and less compute-hungry, which matters because it’s intended to run on-device. It will likely be integrated with Siri.

Furthermore, Apple claims that this new system can match and even outperform GPT-4, especially in understanding complex requests or instructions based on what's happening on your screen. For example, you could be browsing a website and tell Siri to "call the business" you just saw without needing to be more specific. Siri would understand the context and know which number to call based only on your current screen.
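For the curious, here is a rough idea of what "turning the screen into text" could look like in code. This is a minimal Python sketch for illustration only, not Apple's implementation: the `ScreenEntity` type, the `screen_to_text` and `build_prompt` functions, the `line_tolerance` parameter, and the sample data are all hypothetical. It orders on-screen items top to bottom and left to right, then writes them out as plain text that could be handed to a language model together with the spoken request.

```python
from dataclasses import dataclass


@dataclass
class ScreenEntity:
    """A hypothetical on-screen item: its text, a type label, and its position."""
    text: str
    kind: str
    x: float
    y: float


def screen_to_text(entities, line_tolerance=10.0):
    """Serialize entities into text, top to bottom, then left to right.

    Entities whose y position is within line_tolerance of the first entity
    on the current line are treated as sharing that line.
    """
    ordered = sorted(entities, key=lambda e: (e.y, e.x))
    lines, current, current_y = [], [], None
    for entity in ordered:
        # Start a new line when this entity sits clearly below the current one.
        if current and abs(entity.y - current_y) > line_tolerance:
            lines.append(current)
            current = []
        if not current:
            current_y = entity.y
        current.append(entity)
    if current:
        lines.append(current)
    # Within a line, order left to right and separate items with tabs.
    return "\n".join(
        "\t".join(f"[{e.kind}] {e.text}" for e in sorted(line, key=lambda e: e.x))
        for line in lines
    )


def build_prompt(entities, request):
    """Combine the textual screen and the user's request into one prompt."""
    return (
        f"Screen:\n{screen_to_text(entities)}\n\n"
        f"Request: {request}\n"
        "Which on-screen entity does the request refer to?"
    )


# Made-up example screen for a business listing.
screen = [
    ScreenEntity("555-0100", "phone", x=120, y=200),
    ScreenEntity("Joe's Pizza", "business", x=10, y=50),
    ScreenEntity("Call us:", "label", x=10, y=202),
]
print(build_prompt(screen, "call the business"))
```

Once the screen is plain text, resolving "call the business" becomes an ordinary language task: the model only has to pick the right line, which is part of why a small on-device model is plausible here.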

Why This Matters

Apple has been waiting patiently in the shadows for the opportune moment to strike, and with a robust AI strategy expected to be announced at WWDC, ReALM is our best glimpse yet at Apple’s AI future. Expect a long-overdue update to Siri and at least one “Apple special” we haven’t seen before in the consumer AI space.

OpenAI’s Voice Engine

Image generated by DALL·E 3

OpenAI has released a preview of Voice Engine, which can clone a voice from a single 15-second audio sample. OpenAI is treading carefully and has yet to decide whether and how it will deploy the technology at scale. Positive applications highlighted include reading assistance and translation. Interestingly, OpenAI issued recommendations for how society should adapt to the consequences of widely available voice cloning technology (while being the originator of said consequences). They include:

  • Phasing out voice-based authentication for security

  • Making the public aware that everything they hear might be fake

Our Take

Translation is the most promising use case. Then again, subtitles never hurt anyone, right? After all, it could be argued that voice cloning has few worthwhile applications and plenty of dangerous ones, which is why OpenAI has been keeping this under wraps since late 2022. That being said, the demos are impressive.

🔥 Rapid Fire Inferno

📖 What We’re Reading

Generative AI for the Public Sector: The Journey to Scale (link)

“Generative artificial intelligence has the potential to make governments much more efficient and effective. The impact of GenAI on the public sector will be significant. For instance, in our first article in this series, we revealed that the potential productivity improvements from GenAI could be worth $1.75 trillion per year by 2033 globally across all levels of government.”

Source: Boston Consulting Group

💻️ AI Tools and Platforms

  • Retell AI → Conversational voice API for LLMs

  • CodeRabbit → AI-driven code reviews for teams

  • Ellipsis → AI dev tool for pull requests and comments

  • Keywords AI → DevOps platform for AI applications

  • Hailo → The world’s best edge AI processors

MaxAI.me - Outsmart Most People with 1-Click AI

MaxAI.me best AI features:

  • Chat with GPT-4, Claude 3, Gemini 1.5.

  • Perfect your writing anywhere.

  • Save 90% of your reading & watching time with AI summary.

  • Reply 10x faster on email & social media.