☀️ TRENDING AI NEWS

  • 🤖 Mistral AI launches Medium 3.5 with a 77.6% SWE-Bench Verified score and async cloud coding agents

  • 🏢 The Academy officially bans AI-generated actors and scripts from Oscar eligibility

  • ⚖️ Musk v. Altman trial enters week two - court exhibits reveal Jensen Huang gifted OpenAI a supercomputer

  • 🛠️ Microsoft launches a dedicated Legal Agent inside Word for contract review and negotiation history

Three things happened in the last 48 hours that each tell a different story about where AI is heading - and together, they paint a pretty clear picture of the moment we're in right now.

A European AI lab just released a coding model that rivals the best in the world. Hollywood drew a hard line in the sand. And a courtroom in San Francisco is slowly pulling back the curtain on how the most powerful AI company on earth actually got started. Let's get into it.

🤓 AI Trivia

What does "SWE-Bench Verified" actually measure in AI model evaluations?

  • 💻 A model's ability to write original software from scratch

  • 🐛 A model's ability to resolve real-world GitHub software issues

  • 📊 A model's score on academic computer science exams

  • ⚙️ A model's speed at compiling and running code

The answer is hiding near the bottom of today's newsletter... keep scrolling. 👇

🤖 Mistral Launches Medium 3.5 - and It Can Code While You Sleep

Mistral AI just made a serious move in the coding agent space. The Paris-based lab released Mistral Medium 3.5, a 128B-parameter flagship model that scores 77.6% on SWE-Bench Verified - one of the most demanding real-world coding benchmarks in the industry. That puts it in genuinely competitive territory with the top models from OpenAI and Anthropic.

Async Agents That Run in the Background

But the benchmark isn't even the most interesting part. Mistral also launched Remote Agents in Vibe - cloud-based coding sessions that run asynchronously. You kick off a task, close your laptop, and the agent keeps working. The new "Work mode" in Le Chat brings this capability directly into Mistral's consumer interface.

For developers building with AI agents, this is a meaningful shift. You're no longer constrained by keeping a browser tab open or managing local compute. It's the kind of async, hands-off workflow that enterprise teams have been asking for - and Mistral is offering it at a competitive price point.
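The workflow described above boils down to a submit-then-poll pattern: the client hands off a task, gets back a job id, and can disconnect until it checks in later. Here's a minimal in-process sketch of that pattern - every name in it (the `JobRunner` class, the `refactor_task` stand-in) is a hypothetical illustration, not Mistral's actual API.

```python
# Minimal sketch of the submit-then-poll pattern behind async coding agents.
# All names here are hypothetical illustrations, not Mistral's actual API.
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobRunner:
    """Runs tasks in the background; callers poll by job id instead of waiting."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._jobs = {}  # job_id -> Future

    def submit(self, fn, *args):
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = self._pool.submit(fn, *args)
        return job_id  # caller can disconnect and come back later

    def status(self, job_id):
        return "done" if self._jobs[job_id].done() else "running"

    def result(self, job_id):
        return self._jobs[job_id].result()  # blocks only if still running


def refactor_task(module_name):
    time.sleep(0.1)  # stand-in for a long-running agent session
    return f"refactored {module_name}"


runner = JobRunner()
job = runner.submit(refactor_task, "billing.py")
while runner.status(job) != "done":
    time.sleep(0.05)
print(runner.result(job))
```

In a hosted service the job store and workers live server-side, which is what lets you close the laptop - but the client-facing shape (submit, poll, fetch result) is the same.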

If you're prototyping fast and need a site live before your agent is done running, 60sec.site lets you spin up an AI-built website in under a minute - worth keeping in your toolkit alongside your coding agent setup.

🎬 The Oscars Just Drew a Hard Line on AI

The Academy of Motion Picture Arts and Sciences has officially ruled that AI-generated actors and scripts are no longer eligible for Oscars. It's one of the clearest and most high-profile institutional boundaries the entertainment industry has set so far - and it lands at a moment when the conversation around AI and creative work is getting louder by the week.

Hollywood Defines What "Human" Means for Awards Season

This isn't just symbolic. It sets a precedent that ripples through contracts, union negotiations, and how studios think about deploying AI in production. The rule specifically calls out AI-generated performances - so a fully synthetic actor, regardless of how convincing, cannot win Best Actor. The same applies to scripts generated by AI without substantial human authorship.

The timing is notable. The SAG-AFTRA strikes of 2023 were partly about this exact issue, and now the Academy is formalizing what Hollywood's creative community fought for. Whether this holds as AI-generated content becomes harder to detect is a separate - and very open - question.

⚖️ The Musk v. Altman Trial Is Rewriting OpenAI's Origin Story

The courtroom drama between Elon Musk and OpenAI took some genuinely revealing turns last week. Musk spent the better part of three days on the witness stand arguing that Sam Altman and Greg Brockman deceived him when building OpenAI - and that the company's conversion to a for-profit model betrayed the nonprofit mission he signed up for.

What the Exhibits Actually Show

The court exhibits are where things get fascinating. Documents from before OpenAI even had a name are now public. Nvidia CEO Jensen Huang gifted OpenAI an in-demand supercomputer in the early days. Musk appears to have drafted significant founding documents himself. And perhaps most awkwardly, Musk admitted on the stand that xAI has been distilling OpenAI's models - which is either a legal liability or a strategic admission, depending on how the judge reads it.

The broader picture emerging from the trial is that OpenAI's founding was messier, more personal, and more contested than the clean origin story the company has presented publicly. By most legal analysts' readings, Musk won't win this case - but the courtroom is producing a remarkable historical record regardless.

🛠️ Microsoft Puts an AI Lawyer Inside Word

Microsoft is launching a Legal Agent directly inside Word - and it's specifically designed to make legal teams trust it. That last part is doing a lot of work, and Microsoft seems to know it.

Structured Workflows, Not a General-Purpose Chatbot

The agent handles contract reviews, tracks negotiation history across document versions, and manages complex multi-party documents. What's notable is the design philosophy: rather than a general AI model interpreting freeform commands, it follows structured workflows shaped by real legal practice. Microsoft is explicitly positioning this as more predictable and auditable than a general chatbot - which is exactly what legal teams need to actually deploy it.

For anyone following legal technology and AI agents in enterprise settings, this is worth watching closely. Legal is one of the highest-stakes environments for AI deployment - and if Microsoft can get buy-in from law firms and in-house counsel, it signals a lot about where enterprise AI adoption is headed.

⚠️ The Claude Subscription Scam You Need to Know About

Here's one that's flying under the radar but affecting real people. The Guardian reports that users subscribing to the Claude chatbot are discovering unexpected $200 charges for gift cards appearing on their credit card bills - charges they never authorized. One family profiled in the story signed up for a standard $20/month plan and later found two separate $200 payments.

Subscription Billing That Doesn't Match What You Signed Up For

This appears to be a pattern affecting multiple users, not an isolated incident. The mechanism involves gift card charges being added to subscriptions in ways that aren't clearly communicated at signup. For a sector that's asking users to trust AI with their most sensitive tasks, chatbot safety and transparent billing practices matter enormously.

If you or someone you know is subscribed to any AI chatbot service, it's worth auditing your credit card statements for unexpected charges. This is also a reminder to check the billing details before subscribing to any AI service - not just Claude.

🔬 AI Solves a 300-Year-Old Art History Mystery

Here's a palate cleanser from the Guardian. Researchers using AI analysis have discovered that a Hans Holbein portrait long considered to be of an unknown woman may actually be Anne Boleyn - Henry VIII's famously doomed second wife. Separately, the sketch traditionally labeled as Anne Boleyn may have been misidentified centuries ago.

The researchers believe the two portraits were incorrectly inscribed in the 1700s, leading to a centuries-long mix-up. AI analysis of facial features, stylistic elements, and historical context helped untangle what human scholars had assumed was settled. It's a nice reminder that AI applications in historical research extend well beyond the obvious use cases - and sometimes the most interesting discoveries come from the least expected directions.

🌎 Trivia Reveal

The answer is: 🐛 A model's ability to resolve real-world GitHub software issues. SWE-Bench Verified tests whether AI models can actually fix bugs from real open-source GitHub repositories - not write code from scratch or pass theoretical exams. It's considered one of the most practical and difficult coding benchmarks because it mirrors what software engineers actually do day-to-day. Mistral Medium 3.5's 77.6% score means it successfully resolved more than three-quarters of those real-world issues.
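To put that percentage in concrete terms: SWE-Bench Verified is a fixed set of 500 human-validated GitHub issues, so a model's score maps directly to a count of resolved tasks. A quick back-of-the-envelope check:

```python
# SWE-Bench Verified contains 500 human-validated GitHub issues.
# A 77.6% resolution rate therefore corresponds to roughly:
total_tasks = 500
score = 0.776
resolved = round(total_tasks * score)
print(resolved)  # -> 388 issues resolved
```

In other words, the model fixed on the order of 388 of the 500 real-world bugs in the benchmark.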

💬 Quick Question

The Oscars' decision to ban AI-generated actors and scripts is one of the clearest institutional lines drawn so far - but is it the right call? Hit reply and tell me: do you think AI-assisted creative work should be eligible for awards, or should human authorship be a hard requirement? I read every response, and I'm genuinely curious where people land on this one.

That's all for today - see you tomorrow with more from the AI frontier. If you want to browse past coverage, the full Daily Inference archive is always one click away. And if someone forwarded this to you, you can subscribe and read everything at dailyinference.com.
