AGI Has Been Achieved
Wait, No It Hasn't
Hello readers,
Welcome to the AI For All newsletter! I hope you had a better New Year’s than the thousands who were tricked by AI into attending a non-existent fireworks display. OpenAI announced o3, its new frontier ‘reasoning’ model, Sam Altman continued to overhype AI while the media fanned the flames, the plot thickened in the Suchir Balaji case, Meta continued to be a horrible company, and fresh billions were thrown onto the pyre to build new AI data centers for some reason. Am I bitter? Let’s find out!
AGI Has (Not) Been Achieved
Buckle in, folks, we’ve got a lot to cover. AI is the industry that never sleeps or turns a profit, and while I was away for two weeks, apparently AGI was achieved, and we’re well on our way to ASI (artificial superintelligence). No, not really, but it seems there was a big fuss over the final announcement of the 12 Days of OpenAI series: o3. Skipping over o2, OpenAI treated us to a demo of o3 that was conspicuously short on demonstrations. Instead, we were mostly told how o3 performed on a particular benchmark. What was this benchmark? Well, that’s what people are freaking out about.
The AI brosphere on X (which is shamefully made up of actual AI researchers and ML engineers scorching their reputations) lost their collective minds over o3, claiming that AGI had been achieved. We even had Emad Mostaque, a former hedge fund manager and Christmas elf, declaring the need for a completely new economic system. What could have produced such an avalanche of hype-induced mouth foam?
o3 performed exceptionally well on the ARC-AGI benchmark devised by François Chollet. In a blog post, Chollet analyzed o3’s breakthrough scores: 75.7% in the standard configuration and 87.5% in the high-compute configuration. For context, o1 topped out at 32%, GPT-4o scored 5%, and GPT-3 scored 0%. This would appear to be a step change in AI capability. So, why don’t I care? And why is o3 not AGI (or even close)?
First, Chollet himself writes, “It is important to note that ARC-AGI is not an acid test for AGI. Passing ARC-AGI does not equate to achieving AGI, and, as a matter of fact, I don't think o3 is AGI yet. o3 still fails on some very easy tasks, indicating fundamental differences with human intelligence. Furthermore, early data points suggest that the upcoming ARC-AGI-2 benchmark will still pose a significant challenge to o3, potentially reducing its score to under 30% even at high compute (while a smart human would still be able to score over 95% with no training).”
Second, there’s some controversy around the demo itself. Since there’s a lot of confusion out there, I am going to enlist the help of AI veteran Gary Marcus, who does a good job of listing all of the problems with OpenAI’s demo. If you don’t know who Gary Marcus is, he’s sort of like the Addison DeWitt of the AI field — no one likes him, but he’s right. However, there is yet another glaring problem with o3: the cost.
How does o3 work? Chollet theorizes that o3’s core mechanism is “natural language program search and execution in token space,” similar to AlphaZero. Massive compute is allocated to this search at test time, which is what makes o3 so expensive to run. On a related note, The Wall Street Journal reported that Orion (GPT-5) is experiencing delays, costs are soaring, and OpenAI is seriously struggling to make the model “smart enough” as training data dries up, a reality that even Elon Musk has had to concede, though he is foolishly banking on synthetic data as an alternative.
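Chollet’s description is abstract, but the general pattern behind test-time search (sample many candidate solutions, score them, keep the best) can be sketched in a few lines. To be clear, this is a toy illustration of best-of-N search, not OpenAI’s actual mechanism; the `generate` and `score` functions below are hypothetical stand-ins for an LLM sampler and a verifier.

```python
def best_of_n_search(generate, score, n):
    """Toy test-time search: sample n candidates, return the highest-scoring one.

    In the real setting, `generate` would be an expensive LLM call and
    `score` a verifier; spending more compute means raising n.
    """
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)

# Deterministic demo: "solutions" are numbers, the best one is closest to 42.
guesses = iter([10, 73, 41, 99, 42, 7])
answer = best_of_n_search(
    generate=lambda: next(guesses),
    score=lambda x: -abs(x - 42),
    n=6,
)
# answer == 42
```

The point of the sketch is the cost structure: quality scales with n, and every extra candidate is another full model invocation, which is why this approach burns through compute at inference time rather than (only) at training time.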
Sam Altman recently said that OpenAI is losing money on its new $200-a-month ChatGPT Pro plan, which comes as a shock to no one. OpenAI hemorrhages money with all of its products, which makes it all the more absurd that OpenAI and Microsoft are defining AGI as a system that can generate $100 billion in profits. By that definition, I don’t think we have to worry about AGI any time soon (or ever).
In a laughable blog post titled Reflections, Sam Altman again bluffed about AGI. “We are now confident we know how to build AGI. We are beginning to turn our aim beyond that, to superintelligence.” The credulous tech and business media that Altman so effortlessly manipulates lapped up his every word. Not only will Bloomberg give Altman a softball interview to launder his credibility to the public, they’ll happily offer him answers to their few hard questions, like flattering theories about why he was fired.
I also found this sentence funny: “Given the possibilities of our work, OpenAI cannot be a normal company.” A normal company? You mean a profitable, sustainable company with a reliable product? When is the media going to stop being held hostage by these non-technical CEOs writing fan fiction? For the last time, there is no Santa Claus, there is no Easter Bunny, there is no AGI, and there is no Queen of England.
How one defines AGI is important. I will again enlist the help of Gary Marcus, who outlines all of the different types of goalpost moving we can expect to see in 2025. Warning: the blog I am linking to contains a screenshot of a post from a chronically online AI edgelord — reader discretion is advised.
Remember when tech companies touted how progressive they were? Well, now they’re lining up to curry favor with Donald Trump through a series of bribes, er, donations, because he’s pro-AI apparently, announcing a $20 billion plan to build new data centers in America. The investment comes courtesy of non-American Hussain Sajwani, an Emirati billionaire who stands to gain what, exactly?
AWS is investing $11 billion to expand data centers in Georgia, and Microsoft is on track to invest $80 billion to build AI data centers around the world. Why are we building more data centers when we know that scaling doesn’t work? Well, you see, scaling pretraining doesn’t work. Now, we’re scaling test-time compute. 🤦‍♂️
The Suchir Balaji saga had a few new developments. Balaji’s suicide was covered in the previous newsletter — here’s the article about it to refresh your memory. Balaji’s parents believe it was murder and worked with an “independent journalist” (that they have since renounced) who visited Balaji’s apartment and claims there were signs of a struggle and that Balaji’s backup drive of his OpenAI testimony was missing.
But worry not, Inspector Elon and his pet sidekick Vivek Ramaswamy are on the case! Elon, whose record of job hopping would make Frank Abagnale blush, was once rumored to be up for Speaker of the House. A man with the speech pattern of an awkward 15-year-old boy addressing a room of geriatrics (with the occasional cough emanating from the back row) would surely make for some uncomfortable comedy. Sadly, we won’t get to experience such secondhand embarrassment.
Lastly, Mark Zuckerberg, a man bereft of good ideas (and morals), plans to roll out AI profiles on Facebook and Instagram to juice up Meta’s engagement numbers (which is cheating). Meta actually already had AI profiles on its platforms, which everyone hated and which have since been removed for being “creepy and unnecessary,” kind of like Zuck himself. Seriously, you should not give this man the benefit of the doubt.
🔥 Rapid Fire
OpenAI fails to deliver opt-out tool for creators it promised by 2025
Microsoft rolls back Bing Image Creator after signs of degraded quality
Apple urged to fix inaccurate AI news alerts and summarization feature
ChatGPT and Meta AI become accomplices in terror attacks on U.S. soil
IRS deploys AI tools to combat fraud schemes amid bias concerns
Research: evaluating LLMs’ capability to launch phishing campaigns
Alibaba releases QVQ-72B-Preview open-weight multimodal model
Samsung unveils AI developments at CES and invests in Rainbow Robotics
NVIDIA unveils AI developments at CES while Jensen Huang overhypes
Try Artisan’s All-in-one Outbound Sales Platform & AI BDR
Ava automates your entire outbound demand generation so you can get leads delivered to your inbox on autopilot. She operates within the Artisan platform, which consolidates every tool you need for outbound:
300M+ High-Quality B2B Prospects, including E-Commerce and Local Business Leads
Automated Lead Enrichment With 10+ Data Sources
Full Email Deliverability Management
Multi-Channel Outreach Across Email & LinkedIn
Human-Level Personalization
📖 What We’re Reading
“About half of all organizations that are experimenting with GenAI are developing solutions exclusively in-house, without the help of partners – and that decision could limit potential benefits and slow progress toward their goals. According to a recent survey of 270 organizations across 15 sectors at various stages of their GenAI journeys, organizations that use a combination of in-house teams and external vendors to develop and deploy GenAI solutions are more satisfied and productive than those that try to go it alone. Meanwhile, those that rely entirely on in-house teams report less cost savings. Yet even among organizations that use external vendors, many say that they don't adequately understand how these partnerships impact their GenAI performance.”