🤖 Daily Inference
Tuesday, February 24, 2026
Good morning! The AI hardware race is heating up in unexpected directions - a startup just replaced programmable GPUs with fixed-function chips and hit a jaw-dropping 17,000 tokens per second. Meanwhile, Samsung is reshaping how Galaxy users search the web, Google researchers found a way to cut inference costs in half, and a disturbing story out of Canada is forcing a hard conversation about AI chatbot safety. Let's get into it.
⚡ Taalas Is Ditching GPUs - And Hitting 17,000 Tokens Per Second
What if the path to faster, cheaper AI inference isn't better GPUs - but no GPUs at all? That's the bet Taalas is making. The startup is replacing programmable graphics processors with purpose-built, hardwired AI chips designed exclusively for inference workloads. The result: 17,000 tokens per second - a figure that would make most GPU clusters jealous.
The core insight behind Taalas's approach is that general-purpose GPUs carry substantial overhead - they're designed to handle many different types of computation, so much of that silicon goes to waste when you're only running inference. By hardwiring the chip specifically for AI token generation, Taalas eliminates that inefficiency and focuses every transistor on the task at hand. The tradeoff? You lose flexibility, but you gain raw speed and potentially much lower power consumption per token.
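To make that economics argument concrete, here's a minimal back-of-envelope sketch in Python. Only the 17,000 tokens-per-second figure comes from Taalas's claim; the GPU baseline, power draws, and electricity price are assumptions invented for illustration, not published specs.

```python
# Back-of-envelope: how throughput and power draw translate into
# electricity cost per token. All figures except the 17,000 tok/s
# claim are illustrative assumptions, not published specs.

ELECTRICITY_USD_PER_KWH = 0.08  # assumed data-center power price

def energy_cost_per_million_tokens(tokens_per_second: float, watts: float) -> float:
    """Electricity cost (USD) to generate one million tokens."""
    seconds = 1_000_000 / tokens_per_second
    kilowatt_hours = watts * seconds / 3_600_000  # watt-seconds -> kWh
    return kilowatt_hours * ELECTRICITY_USD_PER_KWH

# Hypothetical comparison: a 2 kW GPU server at 1,500 tok/s vs. a
# 500 W fixed-function chip at the claimed 17,000 tok/s.
print(energy_cost_per_million_tokens(1_500, 2_000))   # ~$0.030 per 1M tokens
print(energy_cost_per_million_tokens(17_000, 500))    # ~$0.0007 per 1M tokens
```

Even with these made-up baseline numbers, an order-of-magnitude throughput gain at lower power compounds into a roughly 45x difference in energy cost per token - which is the whole pitch.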
This matters because inference - the act of actually running an AI model to generate responses - is quickly becoming the dominant cost in AI deployment. Training happens once; inference happens billions of times a day. If Taalas can deliver on its promise of "ubiquitous inference," it could dramatically lower the cost of running AI at scale, opening the door to more powerful AI in edge devices, mobile hardware, and real-time applications. Keep an eye on the AI hardware space - it's moving fast.
🛠️ Samsung Is Adding Perplexity to Galaxy AI
Samsung is shaking up its Galaxy AI ecosystem with a notable new addition: Perplexity is coming to Galaxy devices. According to The Verge, Samsung is integrating the AI-powered search engine into its Galaxy AI suite, giving users a conversational, citation-backed alternative to traditional web search directly within their smartphone experience.
This is a meaningful move in the ongoing battle to redefine how people search the internet on mobile. Perplexity has positioned itself as an "answer engine" - rather than returning a list of links, it synthesizes information from across the web and presents a direct response with sourced citations. Embedding that capability natively into a major Android device lineup like Galaxy could expose millions of users to AI-first search for the first time.
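As a shape-of-the-output illustration only - the types, field names, and URL below are invented for this sketch, not Perplexity's or Samsung's actual API - the difference between the two search paradigms looks roughly like this:

```python
# Hypothetical sketch contrasting the two result shapes; these types
# are invented for illustration, not Perplexity's or Samsung's API.
from dataclasses import dataclass

@dataclass
class LinkResult:
    """Traditional search: a ranked link the user must open and read."""
    title: str
    url: str

@dataclass
class SynthesizedAnswer:
    """Answer-engine search: one direct response, with the sources it
    was synthesized from attached as citations."""
    text: str
    citations: list[LinkResult]

answer = SynthesizedAnswer(
    text="Galaxy AI is Samsung's on-device and cloud AI feature suite.",
    citations=[LinkResult("Samsung Galaxy AI", "https://example.com/galaxy-ai")],
)
```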
For Samsung, this continues a broader pattern of layering third-party AI capabilities into Galaxy AI alongside its own on-device models. For Perplexity, it's a major distribution win that could accelerate adoption beyond its core tech-savvy user base. It also raises the stakes for Google, whose search dominance on Android has long been protected by default status - a position that AI-powered alternatives are increasingly challenging. This is one of the more consequential tech partnerships we've seen this month.
🚀 Google's "Deep-Thinking Ratio" Cuts Inference Costs by Half
Researchers at Google have proposed a clever new technique for improving large language model performance without the usual cost penalties: the "Deep-Thinking Ratio." According to MarkTechPost, the approach involves strategically controlling how much "deep thinking" - extended chain-of-thought reasoning - an LLM applies to a given query, rather than defaulting to maximum reasoning for every single prompt.
The intuition is straightforward: not every question requires the same depth of reasoning. Asking an AI what the capital of France is doesn't need the same compute budget as solving a multi-step math problem. Google's research proposes a ratio that dynamically allocates reasoning depth based on the complexity of the task, improving accuracy on hard problems while dramatically reducing compute on easier ones. The net result, according to the research, is total inference costs cut by roughly half while maintaining or improving accuracy.
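The paper's exact mechanism isn't detailed in the coverage, so treat the following Python sketch as an illustration of the general idea only - the keyword heuristic and budget scaling are invented, not Google's method.

```python
# Illustrative sketch of complexity-aware reasoning allocation.
# The difficulty heuristic and budget scaling below are invented for
# illustration; they are not Google's actual Deep-Thinking Ratio method.

def estimate_complexity(prompt: str) -> float:
    """Crude difficulty proxy in [0, 1]. A real system might use a small
    classifier or the model's own uncertainty instead of keywords."""
    hard_markers = ("prove", "derive", "step by step", "optimize", "multi-step")
    hits = sum(marker in prompt.lower() for marker in hard_markers)
    return min(0.2 + 0.2 * hits, 1.0)

def reasoning_budget(prompt: str, max_thinking_tokens: int = 8_192) -> int:
    """Scale the chain-of-thought token budget by estimated difficulty,
    so simple lookups skip most of the 'deep thinking' compute."""
    return int(estimate_complexity(prompt) * max_thinking_tokens)

print(reasoning_budget("What is the capital of France?"))       # 1638 (shallow)
print(reasoning_budget("Prove the bound and derive the rate."))  # 4915 (deeper)
```

The design point worth noting: the router can be far cheaper than the model itself, so the overhead of deciding how hard to think is negligible next to the reasoning tokens it saves.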
This kind of efficiency research is becoming increasingly critical as AI deployments scale. The economics of running frontier models at billions of queries per day are eye-watering, and any technique that halves that cost without sacrificing quality is enormously valuable. If this approach holds up under broader testing, it could have wide-ranging implications for how language models are deployed commercially - and for how competitive the inference market becomes.
⚠️ ChatGPT Was Talking to the Suspect in a School Shooting
One of the most disturbing AI stories in recent memory surfaced this week. The suspect in the Tumbler Ridge school shooting in Canada had described violent scenarios to ChatGPT before the attack, according to reporting from both The Verge and TechCrunch. The revelations have prompted a deeply uncomfortable question: what responsibility does an AI company have when its chatbot receives messages that may foreshadow real-world violence?
TechCrunch reports that OpenAI internally debated whether to contact police after learning about the conversations. The case highlights the profound tension AI companies face: their products are designed to be helpful and non-judgmental conversational partners, but that same openness can make them inadvertent confidants for people planning harm. How should an AI system - or the company behind it - respond when conversations cross into territory that suggests imminent danger?
This story is likely to accelerate policy debates around chatbot safety and mandatory reporting obligations for AI platforms. It also raises questions about what guardrails are realistic - and where the line sits between privacy, free expression, and public safety. There are no easy answers here, but this case will almost certainly be cited in AI regulation discussions for years to come. We've covered related AI safety concerns in previous issues.
🏢 The Met Police Is Using Palantir AI to Flag Officer Misconduct
London's Metropolitan Police is using AI tools supplied by Palantir to flag potential officer misconduct, according to The Guardian. The system is designed to identify patterns of behavior among officers that might indicate wrongdoing before it escalates - a kind of predictive monitoring system turned inward on the police force itself rather than on the public.
Palantir's involvement in UK public sector AI is not new, but this particular application is notable. The Met has faced heavy scrutiny over institutional misconduct in recent years, and using data analytics to proactively surface warning signs is a marked operational shift. The AI tools reportedly analyze a range of data points related to officer activity to generate alerts for supervisors.
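The Guardian's reporting doesn't describe the system's internals, so here is a purely hypothetical sketch of what pattern-based flagging can look like - the metric, threshold, and z-score method are invented for illustration and say nothing about how Palantir's system actually works.

```python
# Purely hypothetical sketch of statistical early-warning flagging.
# The metric, threshold, and z-score method are invented for illustration
# and say nothing about how Palantir's actual system works.
from statistics import mean, stdev

def flag_for_review(metric_by_officer: dict[str, float],
                    z_threshold: float = 2.5) -> list[str]:
    """Flag IDs whose metric (e.g., complaints logged) sits more than
    z_threshold standard deviations above the force-wide average."""
    values = list(metric_by_officer.values())
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [oid for oid, v in metric_by_officer.items()
            if (v - mu) / sigma > z_threshold]

complaints = {"A101": 1, "A102": 0, "A103": 2, "A104": 1, "A105": 0,
              "A106": 1, "A107": 2, "A108": 0, "A109": 1, "A110": 14}
print(flag_for_review(complaints))  # ['A110']
```

Even this toy version shows where the fairness questions come from: a single crude metric and a fixed threshold will inevitably misfire on officers whose assignments naturally generate more complaints.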
The development is likely to draw attention from civil liberties groups and police reform advocates alike - raising questions about accuracy, fairness, and the consequences for officers flagged incorrectly. It also fits into a broader global trend of public sector AI deployments that are increasingly consequential for individuals' lives and careers. When AI influences who gets investigated for misconduct, the stakes for getting it right are extremely high. This is also worth reading alongside our earlier coverage of Coventry Council's Palantir contract review for more context on Palantir's UK footprint.
🌾 US Farmers Are Turning Down Millions for Data Center Land
Here's a story that puts the AI infrastructure boom in very human terms. According to The Guardian, US farmers are rejecting multimillion-dollar bids for their land from data center developers - with some saying flatly, "I'm not for sale." As tech companies race to build the physical infrastructure needed to power AI, they're running into something algorithms can't easily optimize away: the values and identities of rural landowners.
Data centers require enormous tracts of flat land, stable power supplies, and access to water for cooling - characteristics that make agricultural land in the US Midwest and rural South particularly attractive. The bids being turned down in some cases represent life-changing sums of money. Yet many farmers are declining, citing concerns about the environmental impact on their communities, the loss of agricultural heritage, and distrust of big tech's motives.
This tension is only going to intensify. The AI infrastructure buildout requires massive physical expansion, and the communities being asked to host that expansion don't always share the enthusiasm of Silicon Valley investors. This story is a reminder that environmental concerns and community consent are real constraints on how fast the AI industry can grow - no matter how many billions are on the table. On a lighter note: if you're building an AI-powered business and need a fast, professional web presence, check out 60sec.site - an AI website builder that can get you online in under a minute.
💬 What Do You Think?
The ChatGPT and school shooting story raises a question that I don't think has a clean answer: Should AI companies be legally required to report conversations that suggest imminent violence - even if it means breaking user privacy? Where would you draw the line? Hit reply and let me know - I read every response and I'm genuinely curious how you see this one.
That's it for today! From hardwired chips pushing the boundaries of what's physically possible to AI chatbots at the center of a real-world tragedy, the range of today's stories is a reminder of just how many dimensions this technology touches. Thanks for reading - if you found this useful, forward it to a friend or colleague who'd appreciate it. See you tomorrow.
- The Daily Inference team | dailyinference.com