Don't Be Foo1ed
OpenAI o1
Hello readers,
Welcome to another edition of This Week in the Future! Slow and steady wins the AI race. OpenAI released o1, a new ‘reasoning’ model that takes its time to answer your prompt. Does it live up to the hype? Let’s find out!
Don’t Be Foo1ed
OpenAI has released a new model called o1 that is said to be capable of more complex reasoning. o1 ‘thinks’ before giving an answer, so response times are slower, and it explains its reasoning step by step when it responds. This is the model previously known by the code name Strawberry, and before that reported as Q*.
So, does it live up to the hype? No. In Sam Altman’s own words, the model is “still flawed, still limited, and it seems more impressive on first use than it does after you spend more time with it.” In the demos on OpenAI’s website (which are made in an artsy documentary style to give the illusion of authenticity), most of the prompts given to o1 seem rather basic, and there’s not much I haven’t seen GPT-4 do.
Case in point, o1 was given this prompt:
How many r's in are in strawberry
o1 correctly answers 3, but GPT-4o answers 2. However, if you ask the question like a normal person, GPT-4o also answers 3:
How often does the letter 'r' appear in the word 'strawberry'?
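For reference, the correct answer is easy to check outside of any model. Here is a minimal Python sketch; the helper name `count_letter` is made up for illustration and is not something from OpenAI’s demo.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


# The word from both prompts above.
print(count_letter("strawberry", "r"))  # prints 3
```

A one-line string count gets this right every time, which is worth keeping in mind when the same task is presented as a showcase for reasoning.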
It was also said that GPT-4 and other models would struggle with this prompt:
Assume laws of physics on Earth. A small strawberry is put into a normal cup and the cup is placed upside down on a table. Someone then takes the cup and puts it inside the microwave. Where is the strawberry now? Explain your reasoning step by step
o1 correctly answers that the strawberry is on the table. However, so does GPT-4o.
Each demo followed the same formula: show a basic example; say that GPT-4 would struggle with it without showing what GPT-4 actually does; when o1 completes the task, proclaim it an incredible display of reasoning; and then end on a vague and specious projection about o1’s utility and implications.
To be fair, it’s cool that o1 is less likely to ignore parts of your prompt since it takes its time. The model also does seem to perform significantly better in certain domains based on benchmarks. What’s not remotely clear is how any of this solves the shortcomings of LLMs in real-world deployments. o1 does not represent enough of an improvement to the capability-cost ratio to address the ROI issue as far as I can tell.
OpenAI probably needed to release something since they’re having to raise money again, including $5 billion in debt from banks (a red flag). It’s also looking like LLMs will always hallucinate. At least, that’s the conclusion of a recent paper, which adds that this is something “we need to live with.” The tech industry used to be confident that hallucinations would be solved with scaling, but now we’re being told that we have to accept it. Will enterprises accept it? 🤔 One could say humans hallucinate too, except if a human starts spewing falsehoods, we don’t give them tons of power … oh wait.
🔥 Rapid Fire
Google introduces DataGemma for connecting LLMs to real-world data
Apple announces iPhone 16 Pro with Apple Intelligence features
Chai Discovery introduces Chai-1 model for biomolecular interactions
Arcee AI launches Arcee-SuperNova, an enterprise ChatGPT alternative
Anthropic adds Workspaces in Anthropic API Console for developers
Mistral AI releases its first multimodal AI model Pixtral 12B
Palantir and BP extend strategic partnership with new AI capabilities
Oracle adds new AI capabilities and 50+ AI agents to Fusion Cloud
Salesforce launches Industries AI: 100+ out-of-the-box AI capabilities
Deloitte launches AI Factory as a Service powered by NVIDIA and Oracle
NVIDIA and Oracle to accelerate AI and data processing for enterprises
IBM launches new services to help Oracle clients extend generative AI
Army implements generative AI platform to cArmy cloud environment
Anduril unveils Barracuda-M family of autonomous cruise missiles
Meta trains AI on all public Facebook and Instagram posts since 2007
📖 What We’re Reading
“The excitement around generative AI (gen AI) and its massive potential value has energized organizations to rethink their approaches to business itself. Organizations are looking to seize a range of opportunities, from creating new medicines to enabling intelligent agents that run entire processes to increasing productivity for all workers. A raft of new risks and considerations, of course, go hand in hand with these developments. At the center of it all is data.”
💻️ AI Tools and Platforms
Encord → AI data curation and labeling
Fiddler AI → Enterprise AI observability
Siena → AI customer experience agent
Cresta → Generative AI for contact centers
Second → AI for codebase maintenance