🤖 Oops! Even the Smartest AI Just Failed a Kid's Pattern Test 🧩

Credit: midjourney

Good Morning from San Francisco,

🎯 A new test just exposed AI's embarrassing blind spot. Even the smartest models stumble when matching simple patterns that kids grasp easily. The Arc Prize Foundation's latest benchmark caught AI giants like OpenAI flat-footed. Humans score 60%. AI models? A pitiful 1-4%. 🤔

👨‍💼 OpenAI's deck chairs shift as Sam Altman steps back from daily duties. He's handing the reins to COO Brad Lightcap while he tinkers with new tech. Fresh faces fill key roles after recent high-profile exits.

The message? OpenAI wants to sprint ahead without tripping over its own feet. 🏃‍♂️

Stay curious,

Marcus Schuler


AI's Smartest Models Fail New Intelligence Test

Credit: ARC

The Arc Prize Foundation just dropped a new intelligence test that's making AI models look like elementary school students who forgot to study. And not in a good way.

Their latest benchmark, ARC-AGI-2, is humbling even the most advanced AI systems. We're talking about scores of 1-4% from models that usually ace these tests. Even humans - regular folks, not AI experts - are scoring 60% on average. Ouch.

The test works by showing AI models patterns of colored squares and asking them to figure out what comes next. Simple enough for humans, apparently impossible for machines.
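To make the idea concrete, here is a toy sketch of this kind of puzzle. It is not an actual ARC-AGI-2 task: the grids, the three candidate rules, and the function names are all invented for illustration, and real tasks are far harder than picking from a hand-written list. The point is only the shape of the problem: a few example pairs, a hidden rule, and a new input to solve.

```python
# Toy ARC-style puzzle (illustrative only, not a real ARC-AGI-2 task).
# A grid is a list of rows; each integer stands for a color.

def flip_horizontal(grid):
    """Mirror every row left to right."""
    return [row[::-1] for row in grid]

def flip_vertical(grid):
    """Mirror the grid top to bottom."""
    return [row[:] for row in grid[::-1]]

def transpose(grid):
    """Swap rows and columns."""
    return [list(col) for col in zip(*grid)]

# Hypothetical candidate rules; a real solver has no such ready-made list.
CANDIDATES = {
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}

def infer_rule(train_pairs):
    """Return the name of the first candidate that explains every example pair."""
    for name, fn in CANDIDATES.items():
        if all(fn(inp) == out for inp, out in train_pairs):
            return name
    return None

# Two worked examples, then one unseen input to solve.
train_pairs = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 3, 0], [0, 4, 0]], [[0, 3, 3], [0, 4, 0]]),
]
rule = infer_rule(train_pairs)
test_input = [[5, 0, 0], [0, 6, 0]]
prediction = CANDIDATES[rule](test_input)
print(rule, prediction)
```

A person spots the mirror rule at a glance. The benchmark's premise is that the rule changes from task to task, so no fixed list of candidates like the one above is enough.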

François Chollet, the foundation's co-founder, claims this test fixes the problems with their previous version. The old test let AI models basically cheat by throwing massive computing power at the problems. Not anymore.

Credit: ARC

Take OpenAI's fancy o3 model. It crushed the old test with a 75.7% score, matching human performance. But on the new test? A measly 4% score, even after burning through $200 worth of computing power per problem. That's like bringing a supercomputer to a math test and still failing spectacularly.

The foundation isn't just pointing out problems - they're offering solutions. They've launched a contest challenging developers to reach 85% accuracy while spending less than 50 cents per task. It's like asking someone to build a Ferrari with bicycle parts and a shoestring budget.
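For a sense of how far away that bar is, here is a rough back-of-the-envelope check using only the figures quoted in this piece (85% accuracy, under 50 cents per task, and o3's 4% at $200 per problem). The function names are made up for this sketch, and the contest's real scoring rules may differ.

```python
# Contest thresholds as quoted in the article.
TARGET_ACCURACY = 0.85      # 85% of tasks solved
MAX_COST_PER_TASK = 0.50    # dollars; the article says "less than 50 cents"

def meets_contest_bar(accuracy, cost_per_task):
    """True only if both the accuracy and the cost condition hold."""
    return accuracy >= TARGET_ACCURACY and cost_per_task < MAX_COST_PER_TASK

def gap_report(accuracy, cost_per_task):
    """Summarize how far a result sits from the contest thresholds."""
    return {
        "accuracy_shortfall_points": round((TARGET_ACCURACY - accuracy) * 100, 1),
        "cost_overrun_factor": cost_per_task / MAX_COST_PER_TASK,
    }

# o3's reported result on the new test: 4% at $200 per problem.
o3 = gap_report(accuracy=0.04, cost_per_task=200.0)
print(meets_contest_bar(0.04, 200.0), o3)
```

On those numbers, o3 sits 81 percentage points short on accuracy and roughly 400 times over the cost limit.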

The timing is perfect. The tech industry has been calling for better ways to measure AI progress, especially when it comes to traits like creativity and adaptability. Current benchmarks are starting to feel like standardized tests that students have memorized the answers to.

Why this matters:

  • The gap between AI hype and reality remains massive - even our "smartest" AI models can't match basic human pattern recognition
  • The industry's obsession with raw computing power may be leading us down the wrong path - true intelligence requires efficiency, not just brute force



OpenAI Reshuffles Top Ranks as Altman Steps Back from Daily Grind

OpenAI COO Brad Lightcap / Credit: OpenAI

OpenAI is shaking up its executive suite. CEO Sam Altman wants to spend more time tinkering with AI and less time running the company.

COO Brad Lightcap emerges as OpenAI's new operational mastermind. He'll handle the mundane but crucial tasks of international expansion and keeping Microsoft happy. Meanwhile, Altman can focus on what he does best: dreaming up the next ChatGPT.

Mark Chen steps into the role of Chief Research Officer, bridging the gap between wild AI experiments and actual products people can use. Julia Villagra takes the helm as Chief People Officer, presumably to ensure the company's rapid growth doesn't turn its culture into Silicon Valley soup.

The reshuffling comes after several high-profile departures, including former Chief Scientist Ilya Sutskever and CTO Mira Murati. Some ex-employees have raised eyebrows about OpenAI's commitment to safe AI development. But Altman insists that real-world product testing makes their research stronger.

Why this matters:

  • OpenAI is growing up - trading its startup chaos for corporate structure
  • The company wants to move faster without breaking things (or humanity)



AI Photo of the Day

Credit: midjourney
Prompt:
a racing car in Paris 1930, white background text : minuit 37

DeepSeek Disrupts China's AI Race with Free Tech and Lean Teams

Photo by Solen Feyissa / Unsplash

DeepSeek just reshaped China's AI landscape twice in two months. First, their R1 model crushed rivals in January. Now, they've released V3 as open source - a 641GB powerhouse anyone can use.

The impact hits hard. China's leading AI startups scramble to survive. 01.ai dropped its own tech to resell DeepSeek's. Baichuan fled to healthcare. Moonshot cut marketing for its Kimi chatbot to focus on core tech.

Zhipu feels the squeeze most. They burned $276 million last year on $41 million in sales. Now they rush toward an IPO while explaining why they need 800 people when DeepSeek runs laps around everyone with 160.

DeepSeek's edge? They focus on research, not quick profits. Their "mixture of experts" approach splits big problems into smaller chunks. It's clever but tricky - creating work for middlemen like 01.ai and Baidu who help companies use it.
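For the curious, here is a minimal sketch of how mixture-of-experts routing is commonly described: a gate scores a set of specialist sub-networks, only the top few run, and their outputs are blended. This is a generic toy, not DeepSeek's code. The four "experts" are plain functions, and the gate scores are hard-coded assumptions; in a real model a learned layer computes them from the input.

```python
import math

def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_scores, experts, x, top_k=2):
    """Run x through the top_k highest-scoring experts and blend their outputs."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    weight_sum = sum(probs[i] for i in chosen)
    # Renormalize so the weights of the chosen experts sum to 1.
    output = sum(probs[i] / weight_sum * experts[i](x) for i in chosen)
    return output, chosen

# Stand-in experts; real experts are neural sub-networks.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2, lambda x: -x]
gate_scores = [0.1, 2.0, 1.5, -1.0]  # hypothetical scores for one input

output, chosen = route(gate_scores, experts, x=3.0)
print(chosen, round(output, 3))
```

Only two of the four experts do any work for this input. That is the efficiency argument in miniature: most of the model sits idle on any given request.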

The new V3 model shows their reach. Hours after release, developers got it running on Mac Studios at 20 tokens per second. Compress it to 352GB, and it fits on a $9,499 consumer machine. That's pricey, but it beats needing a data center.

OpenRouter plugged V3 into their API fast. Early tests impress: It ranks second in its class on coding benchmarks with a 55% score.

Why this matters:

  • DeepSeek proves focus beats force: Their lean team of 160 outperforms Zhipu's 800-person army
  • The AI power shift accelerates: When top models run on Macs, labs lose their monopoly on innovation



Better prompting...


Today: Sustainable problem-solving ♻️🗑️🙈


Develop a practical plan to reduce plastic waste in the city.
Your framework:

  • 3 technical solutions
  • 3 ideas for citizen participation
  • Target date: 12 months
  • Budget: 500,000 euros
  • Pilot city: 100,000 residents
  • Focus: Quick implementation, measurable impact

Why this matters: A good prompt needs three key elements. These help anyone tasked with solving a problem jump right in.

  1. A well-structured prompt starts with a concrete goal.
  2. Next come measurable parameters like time, money, or available resources.
  3. The third part describes the desired outcome.

Clear boundaries force creative thinking within realistic possibilities. They prevent wishlist thinking and theoretical concepts. Instead, they generate actionable solutions.

The timeframe plays a crucial role. It forces you to identify the most powerful levers. Limit the time, and you'll get quick initial results. These can then be improved step by step.
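If you write prompts like this often, the three elements can be captured in a small helper. The sketch below is one possible way to do it; the function name and its arguments are invented for illustration, and the example simply rebuilds today's plastic-waste prompt.

```python
def build_prompt(goal, deliverables, parameters):
    """Assemble a prompt from a concrete goal, desired outputs, and measurable limits."""
    lines = [goal, "Your framework:", ""]
    # Desired outcome: what the answer should contain.
    lines += [f"  • {item}" for item in deliverables]
    # Measurable parameters: time, money, scale, focus.
    lines += [f"  • {name}: {value}" for name, value in parameters.items()]
    return "\n".join(lines)

prompt = build_prompt(
    goal="Develop a practical plan to reduce plastic waste in the city.",
    deliverables=["3 technical solutions", "3 ideas for citizen participation"],
    parameters={
        "Target date": "12 months",
        "Budget": "500,000 euros",
        "Pilot city": "100,000 residents",
        "Focus": "Quick implementation, measurable impact",
    },
)
print(prompt)
```

Swapping the goal and the numbers gives a new prompt with the same shape, which makes it easy to compare answers across problems.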


AI & Tech News

Big Models Go Small: Qwen's 32B Beats the Giants

Chinese AI-maker Qwen released a new 32-billion-parameter model that runs on high-end laptops while matching data center performance. Their Qwen2.5-VL-32B beats top models on image tasks and math problems, yet needs just 64GB of RAM - enough space left for Chrome and coding tools to run alongside it.
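A quick sanity check on that memory figure, as a back-of-the-envelope sketch. It counts model weights only and ignores everything else a running model needs. The bit widths are assumptions for illustration; this item does not say which precision was used.

```python
def weight_memory_gb(num_parameters, bits_per_weight):
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_parameters * bits_per_weight / 8 / 1e9

PARAMS_32B = 32e9  # 32 billion parameters

# Assumed precisions; lower-bit formats trade some quality for size.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(PARAMS_32B, bits):.0f} GB")
```

At 16 bits per weight the weights alone would fill all 64GB, so having room left for Chrome would suggest the laptop runs were using a smaller, lower-precision version of the model.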

Data Center Gold Rush in China Hits Reality Check

Alibaba's chairman Joe Tsai sees trouble brewing in AI infrastructure. Companies are building massive data centers without customers lined up, while Chinese startup DeepSeek shows how to build top AI models on a budget.

PsiQuantum Seeks $750M for Quantum Leap

PsiQuantum aims to raise $750 million to build quantum chips using standard semiconductor tech. BlackRock leads the round, valuing the startup at $6 billion - big money to turn fiber-optic manufacturing into quantum computers that could solve problems current machines can't touch.

OpenAI's Voice Assistant Learns to Listen

OpenAI tweaked its voice assistant to shut up and listen better. The update lets users take thinking pauses without getting cut off, while paying customers get an AI that's more direct and engaging - just as startup Sesame and Amazon's new Alexa threaten to steal the spotlight.

Type, Design, Buy: Arcade's AI Makes Custom Rugs

AI startup Arcade expands from jewelry to rugs. Users upload room photos, type what they want, and AI creates matching designs that manufacturers turn into real products - starting at $400 for wool rugs.

Text-to-Code Tools Net n8n $60M

Workflow startup n8n raised $60 million after adding AI tools that turn text into code. Revenue jumped 5x since the Berlin company let users write automation commands in plain English instead of a programming language.

AI Startup Caught Padding Customer List

AI startup 11x faked customer logos and inflated revenue numbers, TechCrunch reports. The Andreessen Horowitz-backed company put ZoomInfo and Airtable logos on its website without permission, while counting canceled trial contracts as full-year revenue.

From Local Photos to Global AI: How LetzAI Scaled Up

LetzAI turned a simple idea - letting locals create AI images of Luxembourg - into a global platform where brands like PUMA and Sloggi now generate custom product shots. The startup skipped the usual GPU hardware headaches by renting NVIDIA H100s from Gcore, letting them focus on what matters: teaching AI to dress virtual influencers in soccer jerseys that don't look like they were painted on by a toddler.

AI Giants Flip on Regulation Under Trump

According to a report by the New York Times, tech leaders who demanded AI rules in 2023 now lobby Trump to block them. Meta, Google and OpenAI switched sides after warning Congress about AI risks - now asking the White House to stop state regulations and declare their use of copyrighted material legal.


🚀 AI Profiles: The Companies Defining Tomorrow

🤖 Robot Army: This Startup Ships Your Online Orders with Zero Humans 📦

From Stanford dropout to FedEx ally: How Nimble's $1B bet on AI warehouse bots is changing retail forever 🚀

Nimble Robotics builds AI-powered robots that pick and pack e-commerce orders in warehouses. Its robots handle everything from storage to shipping, running automated fulfillment 24/7.

The founders 🎓 Simon Kalouche quit his Stanford AI Lab PhD to start Nimble in 2017. He recruited top minds from Stanford and Carnegie Mellon to crack warehouse automation. Now 200+ employees work from San Francisco HQ, teaching robots to pick items faster than humans.

The product 🦾 Smart robots grab millions of products using computer vision and AI. They work nonstop with 99% accuracy across fashion, tech, and beauty items. Plugs into warehouse systems instantly - no coding needed. Cuts fulfillment costs 40% while speeding up shipping.

The competition 🥊 RightHand Robotics and Covariant chase the same dream. But Nimble runs its own robot warehouses instead of just selling bots. Big players like Knapp and Swisslog scramble to catch up. Amazon's in-house robotics team lurks in the background.

Financing 💰 FedEx led the Series C in 2024 at a $1B valuation, pushing total funding to $221M. Deal lets FedEx use Nimble's tech. DNS Capital, GSR Ventures, and Accel backed earlier rounds. Current value: $1B.

The future ⭐️⭐️⭐️⭐️ Robot warehouses coming to every major US city. FedEx partnership speeds growth. Millions of successful picks make the AI smarter daily. Challenge: Scale fast while keeping robots running smooth. Next stop: IPO or buyout by shipping giant.

🤖 📦 💫 🚀 💪
