News Feed - CO/AI

Oct 12, 2025

Joanna · link

The bitter lesson of misuse detection (via DEV)

TL;DR: We wanted to benchmark supervision systems available on the market—they performed poorly. Out of curiosity, we naively asked a frontier LLM to…

Oct 12, 2025

Joanna · link

Evaluating and monitoring for AI scheming (via DEV)

As AI models become more sophisticated, a key concern is the potential for “deceptive alignment” or “scheming”. This is the risk of an AI system beco…

Oct 12, 2025

Joanna · link

Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models (via DEV)

A Blog post by Project-Numina on Hugging Face

Oct 12, 2025

Joanna · link

Computer Scientists Figure Out How to Prove Lies (via DEV)

Oct 12, 2025

Joanna · link

Nvidia becomes world’s first $4tn company (via DEV)

Shares in the chip-maker have surged in value as investment in AI continues to gather pace.

Oct 12, 2025

Joanna · link

GitHub – snap-stanford/Biomni: Biomni: a general-purpose biomedical AI agent (via DEV)

Biomni: a general-purpose biomedical AI agent. Contribute to snap-stanford/Biomni development by creating an account on GitHub.

Oct 12, 2025

Joanna · link

A robot might perform your next surgery (via DEV)

The robot performed with the expertise of a skilled human surgeon, researchers at Johns Hopkins University said

Oct 12, 2025

Joanna · link

Exclusive: OpenAI to release web browser in challenge to Google Chrome (via DEV)

OpenAI is close to releasing an AI-powered web browser that will challenge Alphabet’s market-dominating Google Chrome, three people familiar with the matter told Reuters.

Oct 12, 2025

Joanna · link

Creating custom kernels for the AMD MI300 (via DEV)

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Oct 12, 2025

Joanna · link

Chipmaker Nvidia becomes most valuable company in the world at $4 trillion (via DEV)

The poster child of the AI boom, Nvidia has grown into the largest company on Wall Street, surpassing Microsoft, Apple, Amazon and Google.

Oct 12, 2025

Joanna · link

What’s worse, spies or schemers? (via DEV)

Here are two problems you’ll face if you’re an AI company building and using powerful AI: …

Oct 12, 2025

Joanna · link

Upskill your LLMs With Gradio MCP Servers (via DEV)

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Oct 12, 2025

Joanna · link

Generative AI, not ad tech, is the new antitrust battleground for Google (via DEV)

The latest EU complaint from independent publishers marks the third potential major antitrust battle currently facing Google.

Oct 12, 2025

Joanna · link

Grok, Elon Musk’s AI Chatbot, Shares Antisemitic Posts on X (via DEV)

The artificial intelligence chatbot, which has a dedicated account on X, praised Hitler after fielding a query about a user’s comments on the Texas flood.

Oct 12, 2025

Joanna · link

Supabase MCP can leak your entire SQL database (via DEV)

Oct 12, 2025

Joanna · link

The Tradeoffs of SSMs and Transformers (via DEV)

Oct 12, 2025

Joanna · link

How to Train Your LLM Web Agent: A Statistical Diagnosis (via DEV)

A Blog post by Emiliano Penaloza on Hugging Face

Oct 12, 2025

Joanna · link

LLMs are Capable of Misaligned Behavior Under Explicit Prohibition and Surveillance (via DEV)

Abstract: In this paper, LLMs are tasked with completing an impossible quiz, while they are in a sandbox, monitored, told about these measures and ins…

Oct 12, 2025

Joanna · link

Subversion via Focal Points: Investigating Collusion in LLM Monitoring (via DEV)

I released a new paper on collusion and Schelling coordination between language models: “Subversion via Focal Points: Investigating Collusion in LLM…

Oct 12, 2025

Joanna · link

Study could lead to LLMs that are better at complex reasoning (via DEV)

To improve adaptability of large language models to challenging tasks that require reasoning, MIT researchers found strategically applying a method known as test-time training with task-specific examples can boost the...

Oct 12, 2025

Joanna · link

AI Safety at the Frontier: Paper Highlights, June ’25 (via DEV)

tl;dr Paper of the month: • Emergent misalignment arises across many models when training on incorrect data and is largely driven by a single “toxic…

Oct 12, 2025

Joanna · link

The Era of Exploration (via DEV)

This post explores the idea that the next breakthroughs in AI may hinge more on how we collect experience through exploration, and less on how many parameters and data points...

Oct 12, 2025

Joanna · link

Apple’s newest AI study unlocks street view for blind users (via DEV)

SceneScout combines Apple Maps with a multimodal LLM to provide interactive, AI-generated descriptions of street view images.

Oct 12, 2025

Joanna · link

Mercury: Ultra-Fast Language Models Based on Diffusion (via DEV)

Oct 12, 2025

Joanna · link

How a big shift in training LLMs led to a capability explosion (via DEV)

Reinforcement learning, explained with a minimum of math and jargon.

Oct 12, 2025

Joanna · link

Anthropic downloaded over 7M pirated books to train Claude, a judge said (via DEV)

Oct 12, 2025

Joanna · link

AI could be about to completely change the way we do mathematics (via DEV)

Computers can help ensure that mathematical proofs are correct, but translating traditional maths into a machine-readable format is an arduous task. Now, the latest generation of artificial intelligence models is...

Oct 12, 2025

Joanna · link

Evaluating the factuality of verifiable claims in long-form text generation (via DEV)

Oct 12, 2025

Joanna · link

Forget the hype — real AI agents solve bounded problems, not open-world fantasies (via DEV)

Event-driven multi-agent systems are a practical architecture for working with imperfect tools in a structured way.

Oct 12, 2025

Joanna · link

Overclocking LLM Reasoning: Monitoring and Controlling LLM Thinking Path Lengths (via DEV)

Oct 12, 2025

Joanna · link

ChatGPT is pushing people towards mania, psychosis and death (via DEV)

Record numbers of people are turning to AI chatbots for therapy, reports Anthony Cuthbertson. But recent incidents have uncovered some deeply worrying blindspots of a technology out of control

Oct 12, 2025

Joanna · link

Large Language Models Are Improving Exponentially (via DEV)

Oct 12, 2025

Joanna · link

The AI Birthday Letter That Blew Me Away (via DEV)

Google is ushering in an era of custom chatbots.

Oct 12, 2025

Joanna · link

How AI May Treat Chronic Pain Without Medication (via DEV)

USC and UCLA researchers have created a wireless device that uses AI to decode pain levels from brain activity and customize spinal cord stimulation to treat chronic pain.

Oct 12, 2025

Joanna · link

VLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention (via DEV)

Oct 12, 2025

Joanna · link

Early Signs of Steganographic Capabilities in Frontier LLMs (via DEV)

One key strategy for preventing bad outcomes from misuse or misalignment is model monitoring. However, one way that monitoring can fail is if LLMs us…

Oct 12, 2025

Joanna · link

Sakana AI’s TreeQuest: Deploy multi-model teams that outperform individual LLMs by 30% (via DEV)

Sakana AI’s new inference-time scaling technique uses Monte-Carlo Tree Search to orchestrate multiple LLMs to collaborate on complex tasks.

Oct 12, 2025

Joanna · link

Microsoft: We Need A Lot Less Employees. And a Lot More AI Infrastructure. (via DEV)

The tech giant’s massive workforce reductions coincide with record-breaking AI investments, signaling a fundamental shift in how Microsoft views its future workforce. Microsoft’s strateg…

Oct 12, 2025

Joanna · link

Thought Anchors: Which LLM Reasoning Steps Matter? (via DEV)

This post is adapted from our recent arXiv paper. Paul Bogdan and Uzay Macar are co-first authors on this work. …

Oct 12, 2025

Joanna · link

AI Task Length Horizons in Offensive Cybersecurity (via DEV)

This is a rough research note where the primary objective was my own learning. I am sharing it because I’d love feedback and I thought the results we…

What we're reading, in the order we're reading it.

All Signal. No Noise.
