What we're reading, in the order we're reading it.
Anthony and Harry's working stream — links, charts, tweets, and short takes. The unprocessed inputs behind the briefings.
(via DEV) The bitter lesson of misuse detection
TL;DR: We wanted to benchmark supervision systems available on the market—they performed poorly. Out of curiosity, we naively asked a frontier LLM to…
(via DEV) Evaluating and monitoring for AI scheming
As AI models become more sophisticated, a key concern is the potential for “deceptive alignment” or “scheming”. This is the risk of an AI system beco…
(via DEV) Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models
A Blog post by Project-Numina on Hugging Face
(via DEV) Nvidia becomes world’s first $4tn company
Shares in the chip-maker have surged in value as investment in AI continues to gather pace.
(via DEV) GitHub – snap-stanford/Biomni: Biomni: a general-purpose biomedical AI agent
Biomni: a general-purpose biomedical AI agent. Contribute to snap-stanford/Biomni development by creating an account on GitHub.
(via DEV) A robot might perform your next surgery
The robot performed with the expertise of a skilled human surgeon, researchers at Johns Hopkins University said
(via DEV) Exclusive: OpenAI to release web browser in challenge to Google Chrome
OpenAI is close to releasing an AI-powered web browser that will challenge Alphabet’s market-dominating Google Chrome, three people familiar with the matter told Reuters.
(via DEV) Creating custom kernels for the AMD MI300
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
(via DEV) Chipmaker Nvidia becomes most valuable company in the world at $4 trillion
The poster child of the AI boom, Nvidia has grown into the largest company on Wall Street, surpassing Microsoft, Apple, Amazon and Google.
(via DEV) What’s worse, spies or schemers?
Here are two problems you’ll face if you’re an AI company building and using powerful AI: …
(via DEV) Upskill your LLMs With Gradio MCP Servers
(via DEV) Generative AI, not ad tech, is the new antitrust battleground for Google
The latest EU complaint from independent publishers marks the third potential major antitrust battle currently facing Google.
(via DEV) Grok, Elon Musk’s AI Chatbot, Shares Antisemitic Posts on X
The artificial intelligence chatbot, which has a dedicated account on X, praised Hitler after fielding a query about a user’s comments on the Texas flood.
(via DEV) How to Train Your LLM Web Agent: A Statistical Diagnosis
A Blog post by Emiliano Penaloza on Hugging Face
(via DEV) LLMs are Capable of Misaligned Behavior Under Explicit Prohibition and Surveillance
Abstract In this paper, LLMs are tasked with completing an impossible quiz, while they are in a sandbox, monitored, told about these measures and ins…
(via DEV) Subversion via Focal Points: Investigating Collusion in LLM Monitoring
I released a new paper on collusion and Schelling coordination between language models: “Subversion via Focal Points: Investigating Collusion in LLM…
(via DEV) Study could lead to LLMs that are better at complex reasoning
To improve adaptability of large language models to challenging tasks that require reasoning, MIT researchers found strategically applying a method known as test-time training with task-specific examples can boost the...
(via DEV) AI Safety at the Frontier: Paper Highlights, June ’25
tl;dr Paper of the month: • Emergent misalignment arises across many models when training on incorrect data and is largely driven by a single “toxic…
(via DEV) The Era of Exploration
This post explores the idea that the next breakthroughs in AI may hinge more on how we collect experience through exploration, and less on how many parameters and data points...
(via DEV) Apple’s newest AI study unlocks street view for blind users
SceneScout combines Apple Maps with a multimodal LLM to provide interactive, AI-generated descriptions of street view images.
(via DEV) How a big shift in training LLMs led to a capability explosion
Reinforcement learning, explained with a minimum of math and jargon.
(via DEV) AI could be about to completely change the way we do mathematics
Computers can help ensure that mathematical proofs are correct, but translating traditional maths into a machine-readable format is an arduous task. Now, the latest generation of artificial intelligence models is...
(via DEV) Forget the hype — real AI agents solve bounded problems, not open-world fantasies
Event-driven multi-agent systems are a practical architecture for working with imperfect tools in a structured way.
(via DEV) ChatGPT is pushing people towards mania, psychosis and death
Record numbers of people are turning to AI chatbots for therapy, reports Anthony Cuthbertson. But recent incidents have uncovered some deeply worrying blindspots of a technology out of control
(via DEV) The AI Birthday Letter That Blew Me Away
Google is ushering in an era of custom chatbots.
(via DEV) How AI May Treat Chronic Pain Without Medication
USC and UCLA researchers have created a wireless device that uses AI to decode pain levels from brain activity and customize spinal cord stimulation to treat chronic pain.
(via DEV) Early Signs of Steganographic Capabilities in Frontier LLMs
One key strategy for preventing bad outcomes from misuse or misalignment is model monitoring. However, one way that monitoring can fail is if LLMs us…
(via DEV) Sakana AI’s TreeQuest: Deploy multi-model teams that outperform individual LLMs by 30%
Sakana AI’s new inference-time scaling technique uses Monte-Carlo Tree Search to orchestrate multiple LLMs to collaborate on complex tasks.
(via DEV) Microsoft: We Need A Lot Less Employees. And a Lot More AI Infrastructure.
The tech giant’s massive workforce reductions coincide with record-breaking AI investments, signaling a fundamental shift in how Microsoft views its future workforce Microsoft’s strateg…
(via DEV) Thought Anchors: Which LLM Reasoning Steps Matter?
This post is adapted from our recent arXiv paper. Paul Bogdan and Uzay Macar are co-first authors on this work. …
(via DEV) AI Task Length Horizons in Offensive Cybersecurity
This is a rough research note where the primary objective was my own learning. I am sharing it because I’d love feedback and I thought the results we…