Google beats OpenAI on math 9-to-1, chatbot hacking gets sophisticated, and robots are feeding San Francisco

☀️ TRENDING AI NEWS

🤖 Google's math AI just beat OpenAI's benchmark score by a 9-to-1 margin
⚠️ Hackers are finding new ways to exploit AI chatbot 'personalities' and bypass safety guardrails
🤖 Robots are now cooking and serving meals for a San Francisco nonprofit in the Tenderloin
🏢 Scotland's 'green datacentre' policy was written before ChatGPT existed - and it shows

Something quietly happened in the AI math race this week - and the margin is hard to ignore. While everyone was still processing OpenAI's geometry conjecture breakthrough (which we covered earlier this week, by the way), Google went ahead and built a 9-to-1 scoring lead on the same benchmark. That's not a narrow edge. That's a statement. Today we've also got hackers evolving their chatbot attack playbook, robots making burritos in one of San Francisco's toughest neighborhoods, and a policy loophole that could quietly let AI datacentres pump out carbon emissions while calling themselves 'green'. Let's get into it.

🤓 AI Trivia

AI is getting increasingly good at mathematics - but what is the name of the famous unsolved problem collection that has long served as a benchmark for mathematical AI progress?

📐 The Millennium Prize Problems
📐 The Hilbert Problems
📐 The Landau Problems
📐 The IMO Shortlist

The answer is hiding near the bottom of today's newsletter... keep scrolling. 👇

🤖 Google Tops OpenAI on Math - by a Landslide

Earlier this week we flagged that OpenAI cracked an 80-year-old geometry conjecture using its reasoning model - genuinely impressive stuff. But Google apparently wasn't impressed. The Rundown AI reports that Google's math AI is now outperforming OpenAI's on the same benchmark by a 9-to-1 margin - one of the starkest performance gaps we've seen between the two rivals on a single task.

The Gap Is Getting Hard to Explain Away

This isn't a subtle benchmark edge that lives in a footnote. A 9-to-1 scoring ratio suggests a fundamentally different level of mathematical reasoning - not just incremental tuning. For anyone tracking the AI math race, this shifts the leaderboard in a pretty dramatic way.

The timing matters too. Both labs have been racing to demonstrate superhuman performance on structured reasoning tasks - math being the clearest proving ground. Google pulling this far ahead suggests its approach to formal reasoning may be maturing faster than OpenAI's.

Read the full story →

⚠️ Hackers Are Getting Better at Breaking Chatbots

The days of simple 'jailbreak' prompts are fading. According to The Verge, hackers are now exploiting something more subtle - the personalities built into AI chatbots. Early chatbot hacking required almost no technical skill: you'd just ask the model to 'pretend' it had no restrictions, and it would often comply. That loophole has largely been closed. But a more sophisticated generation of attacks has taken its place.

From Prompt Tricks to Psychological Manipulation

Modern attacks exploit the behavioral and tonal characteristics that AI companies deliberately build into their models. Because chatbots are designed to be agreeable, helpful, and contextually flexible, skilled attackers can push those tendencies in directions developers didn't intend. Think of it less like picking a lock and more like social engineering - but targeting a machine's designed disposition rather than a human's emotions.

The implications for AI security are significant. As chatbots get deployed in higher-stakes environments - customer service, healthcare, legal tools - the attack surface grows. And as The Verge notes, even Google is navigating this in real time. There's no solved playbook yet.

Read the full story →

🤖 Robots Are Cooking for San Francisco's Most Vulnerable Residents

Here's an AI story with actual stakes: a nonprofit operating in San Francisco's Tenderloin - one of the city's most underserved neighborhoods - has turned to robotic meal prep technology to compensate for a chronic shortage of human volunteers. Wired reports that the robots are now handling food preparation tasks that would otherwise go unfilled.

Where Robotics Actually Fills a Gap

This is a meaningful contrast to most robotics coverage, which tends to focus on factory automation or tech demos. Here, the deployment is filling a genuine human need in a resource-constrained environment. Volunteers are hard to recruit consistently, and the nonprofit couldn't maintain meal output without a reliable alternative.

It raises interesting questions about where robotic assistance makes the most immediate sense - not necessarily the highest-margin use cases, but the highest-need ones. For community organizations stretched thin, this kind of practical application could be quietly transformative.

Read the full story →

🏢 Scotland's 'Green' Datacentres Have a Carbon-Sized Blind Spot

Scotland has been actively courting datacentre investment by offering 'green datacentre' status to qualifying facilities - but there's a catch buried in the policy. According to The Guardian, the definition of 'green' was written in 2022 - before ChatGPT launched and before the energy demands of modern AI infrastructure were anywhere near current levels.

A 2022 Definition in a 2026 Energy Landscape

Action to Protect Rural Scotland analyzed the policy and found that the outdated framework could allow a massive volume of carbon emissions to go uncounted. AI workloads have fundamentally changed the energy profile of modern datacentres - a facility that would have qualified as 'green' by 2022 standards might now be running significantly more power-hungry GPU clusters.

The broader pattern here isn't unique to Scotland - governments globally are finding that AI policy written even two or three years ago is already obsolete. The gap between regulatory frameworks and actual technological reality is widening fast.

Read the full story →

🎬 Cannes Is Having Its AI Reckoning Moment

The split in Hollywood over AI just played out under Mediterranean sunshine. The Guardian reports that this year's Cannes featured a genuine fault line between filmmakers - with director Darren Aronofsky speaking at an 'AI for Talent' summit about expanding the cinematic toolbox, while Guillermo del Toro declared he would 'rather die' than use the technology in his films.

Two Visions, One Red Carpet

The contrast couldn't be more vivid. Aronofsky's framing - AI as an additive tool for storytellers - sits directly opposite del Toro's position, which reflects a deeper concern about creative authenticity and the displacement of human craft. Neither position is niche: both represent large, credible factions within the film industry.

What makes Cannes a useful bellwether here is that it's one of the few places where the creative and commercial sides of filmmaking collide in public. The fact that this debate is happening on the Croisette suggests it's moved well past internal studio conversations and into genuine cultural conflict.

Read the full story →

🌎 Trivia Reveal

The answer is The Hilbert Problems! David Hilbert's famous list of 23 unsolved problems, presented in 1900, became one of the most influential benchmarks in mathematical history - and several remain unsolved today. The Millennium Prize Problems (7 problems worth $1M each) and the IMO Shortlist are also used to test AI math ability, but it's Hilbert's list that defined a century of mathematical ambition. For the record, the Landau Problems are a real set of four unsolved conjectures in number theory - so that one was a genuine trap. 😅

💬 Quick Question

The Cannes AI debate got me thinking - when you use AI for creative work (writing, design, video, anything), does it feel like a tool you're in control of, or does it start to feel like it's doing the creating? Hit reply and tell me how you think about it - I read every response, and this one genuinely has me curious.

That's it for today - thanks for reading. For more daily AI coverage, head to dailyinference.com and we'll see you tomorrow. 👋

Daily Inference - AI Daily News 🤖