Google’s new AI agent outperforms OpenAI and Perplexity on research benchmarks

Google researchers have developed Test-Time Diffusion Deep Researcher (TTD-DR), a new AI framework that outperforms leading research agents from OpenAI, Perplexity, and others on key benchmarks. The system mimics human writing processes by using diffusion mechanisms and evolutionary algorithms to iteratively refine research reports, potentially powering a new generation of enterprise research assistants for complex business tasks like competitive analysis and market entry reports.

The big picture: Unlike current AI research agents that follow rigid linear processes, TTD-DR treats report creation as a diffusion process where an initial “noisy” draft is progressively refined into a polished final report.

  • The framework addresses fundamental limitations in existing deep research agents, which often lose global context and miss critical connections between information pieces.
  • As the researchers note, this “indicates a fundamental limitation in current DR agent work and highlights the need for a more cohesive, purpose-built framework for DR agents that imitates or surpasses human research capabilities.”

How it works: TTD-DR operates through two core mechanisms that work together to produce comprehensive research reports.

  • “Denoising with Retrieval” starts with a preliminary draft and iteratively improves it by formulating new search queries, retrieving external information, and integrating findings to correct inaccuracies and add detail.
  • “Self-Evolution” ensures each component—the planner, question generator, and answer synthesizer—independently optimizes its performance, making “report denoising more effective,” according to co-author Rujun Han, a research scientist at Google.
  • The system was built using Google’s Agent Development Kit with Gemini 2.5 Pro as the core language model, though other models can be substituted.
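To make the two mechanisms concrete, here is a minimal toy sketch of the loop described above. It is not Google's implementation and does not use the Agent Development Kit or Gemini: every function name, the in-memory `TOY_INDEX` standing in for web search, and the length-based fitness score are assumptions made purely for illustration. Self-evolution in particular is reduced here to picking the best of a few candidate variants.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy stand-in for web search. All entries are made up for illustration.
TOY_INDEX: Dict[str, str] = {
    "pricing": "Snippet about pricing.",
    "competitors": "Snippet about competitors.",
    "regulation": "Snippet about regulation.",
}


def plan(question: str) -> List[str]:
    """Hypothetical planner: decide which sections the report should cover."""
    return ["pricing", "competitors", "regulation"]


def initial_draft(outline: List[str]) -> Dict[str, Optional[str]]:
    """The 'noisy' first draft: every section is still an empty placeholder."""
    return {section: None for section in outline}


def next_query(draft: Dict[str, Optional[str]]) -> Optional[str]:
    """Hypothetical question generator: target the first unsupported section."""
    for section, text in draft.items():
        if text is None:
            return section
    return None  # nothing left to refine


def retrieve(query: str) -> str:
    """Look up evidence; a real agent would call a search tool here."""
    return TOY_INDEX.get(query, "")


def self_evolve(candidates: List[str], fitness: Callable[[str], float]) -> str:
    """Simplified self-evolution: keep the fittest of several variants."""
    return max(candidates, key=fitness)


def synthesize(query: str, evidence: str) -> str:
    """Hypothetical answer synthesizer: propose variants, then select one."""
    variants = [evidence, f"{query.capitalize()}: {evidence}"]
    # Assumption: longer text counts as 'fitter'. A real system would need
    # a far better quality signal than string length.
    return self_evolve(variants, fitness=len)


def denoise_with_retrieval(
    question: str, max_steps: int = 5
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Iteratively refine the draft: query, retrieve, integrate, repeat."""
    draft = initial_draft(plan(question))
    history: List[str] = []
    for _ in range(max_steps):
        query = next_query(draft)
        if query is None:
            break  # the draft is fully 'denoised'
        draft[query] = synthesize(query, retrieve(query))
        history.append(query)
    return draft, history


if __name__ == "__main__":
    report, queries = denoise_with_retrieval("Should we enter market X?")
    for section, text in report.items():
        print(f"{section}: {text}")
```

The point of the sketch is the control flow: the draft itself drives which question is asked next, so each retrieval step is chosen with the whole report in view rather than following a fixed linear plan.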

In plain English: Think of how a human researcher works—they start with a rough outline, write an initial draft, then repeatedly search for new information and revise their work until it’s polished. TTD-DR mimics this process using the same technique that AI image generators use to create pictures from random noise, but instead of images, it’s refining research reports. Each revision cycle makes the report more accurate and comprehensive.

Performance results: TTD-DR consistently outperformed commercial and open-source competitors across multiple benchmarks testing both long-form report generation and multi-hop reasoning tasks.

  • In side-by-side comparisons with OpenAI Deep Research on long-form reports, TTD-DR achieved win rates of 69.1% and 74.5% on two different datasets.
  • The system surpassed OpenAI’s offering on three separate multi-hop reasoning benchmarks with performance gains of 4.8%, 7.7%, and 1.7%.
  • Testing also included comparisons against Perplexity Deep Research, Grok DeepSearch, and the open-source GPT-Researcher.

Enterprise applications: The framework targets high-value business use cases that standard retrieval-augmented generation (RAG) systems struggle with.

  • According to the paper’s authors, real-world business applications were the primary target in developing the system.
  • The resulting research companion is “capable of generating helpful and comprehensive reports for complex research questions across diverse industry domains, including finance, biomedical, recreation, and technology.”
  • Han notes that the model was evaluated on helpfulness, which includes fluency and coherence, so the reported gains reflect the system’s ability to produce well-structured business documents.

What’s next: While current research focuses on text-based reports using web search, the framework is designed for broad adaptability across enterprise tasks.

  • Han confirmed plans to extend the work to incorporate more tools for complex enterprise applications.
  • A similar “test-time diffusion” process could generate complex software code, create detailed financial models, or design multi-stage marketing campaigns.
  • “All of these tools can be naturally incorporated in our framework,” Han said, suggesting this draft-centric approach could become a foundational architecture for a range of complex, multi-step AI agents.