Don't Be Foo1ed

OpenAI o1


Hello readers,

Welcome to another edition of This Week in the Future! Slow and steady wins the AI race. OpenAI released o1, a new ‘reasoning’ model that takes its time to answer your prompt. Does it live up to the hype? Let’s find out!

Don’t Be Foo1ed

OpenAI has released o1, a new model said to be capable of more complex reasoning. Because o1 ‘thinks’ before answering, its responses are slower, and it walks through its reasoning step by step when it responds. o1 is reportedly the model previously code-named Strawberry, which in turn is believed to be the project earlier reported as Q*.

So, does it live up to the hype? No. In Sam Altman’s own words, the model is “still flawed, still limited, and it seems more impressive on first use than it does after you spend more time with it.” In the demos on OpenAI’s website (shot in an artsy documentary style to give off an illusion of authenticity), most of the prompts given to o1 seem rather basic, and there’s little I haven’t already seen GPT-4 do.

Case in point: o1 was given this prompt:

How many r's in are in strawberry

o1 correctly answers 3, while GPT-4o answers 2. But if you phrase the question the way a normal person would, GPT-4o also answers 3:

How often does the letter 'r' appear in the word 'strawberry'?
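For reference, the ground truth here is trivial to check deterministically. A minimal Python sketch (my own illustration, not anything from OpenAI’s demos) that counts the letter and lists where it appears:

```python
# Count occurrences of 'r' in "strawberry" and record their 0-based positions.
word = "strawberry"

r_count = word.count("r")
positions = [i for i, ch in enumerate(word) if ch == "r"]

print(r_count)    # 3
print(positions)  # [2, 7, 8]
```

A one-line string method gets this right every time, which is part of why a model getting it right is a low bar for “reasoning.”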

It was also said that GPT-4 and other models would struggle with this prompt:

Assume laws of physics on Earth. A small strawberry is put into a normal cup and the cup is placed upside down on a table. Someone then takes the cup and puts it inside the microwave. Where is the strawberry now? Explain your reasoning step by step

o1 correctly answers that the strawberry is on the table. However, so does GPT-4o.

Each demo followed the same formula: show a basic example; don’t show what GPT-4 would have done (just say it would struggle); when o1 completes the task, proclaim it an incredible display of reasoning; then end on a vague, specious projection about o1’s utility and implications.

To be fair, it’s cool that o1 is less likely to ignore parts of your prompt, since it takes its time. Based on benchmarks, the model does seem to perform significantly better in certain domains. What’s not remotely clear is how any of this solves the shortcomings of LLMs in real-world deployments. As far as I can tell, o1 doesn’t improve the capability-to-cost ratio enough to address the ROI problem.

OpenAI probably needed to release something since they’re having to raise money again, including $5 billion in debt from banks (a red flag). It’s also looking like LLMs will always hallucinate. At least, that’s the conclusion of a recent paper, which adds that this is something “we need to live with.” The tech industry used to be confident that hallucinations would be solved with scaling, but now we’re being told that we have to accept it. Will enterprises accept it? 🤔 One could say humans hallucinate too, except if a human starts spewing falsehoods, we don’t give them tons of power … oh wait.

🔥 Rapid Fire

Synthflow: Build AI voice assistants to manage inbound and outbound calls

Keep your business running 24/7 with genAI. Synthflow’s simple no-code builder lets you set up human-sounding AI voice assistants that can handle call center tasks: real-time appointment booking, lead qualification, answering FAQs, transferring between agents, and more. White label included. Pay as little as $0.08 per minute of conversation. CRM integrations with HubSpot, GoHighLevel, Zoho, and more. Start for free or let us build your AI receptionist.

📖 What We’re Reading

“The excitement around generative AI (gen AI) and its massive potential value has energized organizations to rethink their approaches to business itself. Organizations are looking to seize a range of opportunities, from creating new medicines to enabling intelligent agents that run entire processes to increasing productivity for all workers. A raft of new risks and considerations, of course, go hand in hand with these developments. At the center of it all is data.”

Source: McKinsey

💻️ AI Tools and Platforms

  • Encord → AI data curation and labeling

  • Fiddler AI → Enterprise AI observability

  • Siena → AI customer experience agent

  • Cresta → Generative AI for contact centers

  • Second → AI for codebase maintenance