Employee of the Month: Salesforce’s CoAct-1 hybrid AI agent achieves 60% task success rate

Salesforce researchers have developed CoAct-1, a new computer-use AI agent that combines traditional point-and-click navigation with code execution to automate complex tasks. The hybrid system achieved a 60.76% success rate on the OSWorld benchmark while requiring significantly fewer steps than purely GUI-based agents, which could ease the brittleness that plagues current automation tools.

How it works: CoAct-1 operates as a three-agent team that strategically chooses between coding and clicking based on the task at hand.

  • The Orchestrator acts as project manager, analyzing user goals and delegating subtasks to either the Programmer or GUI Operator based on which approach would be most effective.
  • The Programmer writes and executes Python or Bash scripts for backend operations like file management and data processing.
  • The GUI Operator handles visual interface tasks that require point-and-click navigation.
  • After each subtask completion, agents report back to the Orchestrator with summaries and screenshots for the next decision.
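
The delegation loop described above can be sketched roughly as follows. This is a minimal illustration, not Salesforce's implementation: the `Report` shape, the stub worker functions, and the keyword-based routing rule are all hypothetical stand-ins for what would be model-driven agents in the real system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Report:
    """What a worker hands back to the Orchestrator (hypothetical shape)."""
    agent: str
    summary: str
    screenshot: str  # placeholder for a screenshot path


def programmer(subtask: str) -> Report:
    # Stand-in for an agent that would write and run Python or Bash.
    return Report("programmer", f"ran script for: {subtask}", "after_script.png")


def gui_operator(subtask: str) -> Report:
    # Stand-in for an agent that would click through the interface.
    return Report("gui_operator", f"clicked through: {subtask}", "after_clicks.png")


# Assumption: a keyword check stands in for the Orchestrator's judgement
# about which approach suits a subtask.
SCRIPTABLE_HINTS = ("file", "folder", "rename", "csv", "archive")


def choose_agent(subtask: str) -> Callable[[str], Report]:
    text = subtask.lower()
    if any(hint in text for hint in SCRIPTABLE_HINTS):
        return programmer
    return gui_operator


def orchestrate(subtasks: List[str]) -> List[Report]:
    """Delegate each subtask and collect the report that comes back."""
    reports = []
    for subtask in subtasks:
        worker = choose_agent(subtask)
        reports.append(worker(subtask))  # report informs the next decision
    return reports


if __name__ == "__main__":
    for report in orchestrate(["rename files in Downloads", "open the Settings dialog"]):
        print(report.agent, "->", report.summary)
```

In this sketch the list of subtasks is fixed up front; the article describes the Orchestrator deciding the next step after each report, which a fuller version would model by planning inside the loop.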

Why this matters: Current GUI-based agents often fail on complex, multi-step workflows due to accumulated errors from precise clicking sequences.

  • “A single mis-click or misunderstood UI element can derail the entire task,” the researchers noted in their paper.
  • Tasks requiring more actions are statistically more likely to fail, making step reduction crucial for reliability.
  • CoAct-1 solves tasks in an average of 10.15 steps compared to 15.22 steps for leading GUI-only agents like GTA-1.
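
One way to see why step count matters is a back-of-the-envelope model. The sketch below assumes every step succeeds independently with the same probability, which is a simplification not taken from the paper, and the 0.97 per-step figure is purely illustrative. It is not an attempt to reproduce the benchmark results.

```python
def task_success_probability(per_step_success: float, steps: float) -> float:
    """Chance that every step succeeds, assuming independent, equally reliable steps."""
    return per_step_success ** steps


if __name__ == "__main__":
    per_step = 0.97  # illustrative assumption, not a measured value
    # Average step counts reported in the article.
    for label, steps in [("CoAct-1", 10.15), ("GTA-1", 15.22)]:
        chance = task_success_probability(per_step, steps)
        print(f"{label}: {steps} steps -> {chance:.1%} chance of no failed step")
```

Under these assumptions the shorter trajectory always comes out ahead, because each extra step multiplies in another chance to fail.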

The enterprise opportunity: Ran Xu, co-author and Director of Applied AI Research at Salesforce, sees immediate applications in customer support environments.

  • “A service support agent uses many different tools — general tools such as Salesforce, industry-specific tools such as EPIC for healthcare, and a lot of customized tools — to investigate a customer request and formulate a response,” Xu explained.
  • The technology could automate sales prospecting, bookkeeping, customer segmentation, and campaign asset generation where full API access isn’t available.
  • Many enterprise tools lack APIs, making this hybrid approach particularly valuable for real-world automation.

Security and oversight challenges: The system’s ability to execute code raises important safety considerations for enterprise deployment.

  • “Access control and sandboxing is the key,” Xu emphasized, noting that humans must “understand the implication and give the AI access for safety.”
  • For mission-critical operations, “some may always need human approval,” suggesting a human-in-the-loop approach for high-stakes tasks.
  • The path to enterprise robustness involves training agents with feedback in realistic, simulated environments before live deployment.
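
A human-in-the-loop gate of the kind Xu describes might look something like the sketch below. The action categories and the `approve` callback are hypothetical; a real deployment would pair this with sandboxing and access control, which are outside the scope of this example.

```python
from typing import Callable

# Assumption: a hand-picked set of action kinds treated as high-stakes.
HIGH_STAKES = {"delete", "payment", "send_email"}


def run_action(kind: str, action: Callable[[], str], approve: Callable[[str], bool]) -> str:
    """Run the action, asking a human first when its kind is high-stakes."""
    if kind in HIGH_STAKES and not approve(kind):
        return f"blocked: {kind} needs human approval"
    return action()


if __name__ == "__main__":
    deny_all = lambda kind: False  # stands in for a reviewer who rejects everything
    print(run_action("read", lambda: "report fetched", deny_all))
    print(run_action("delete", lambda: "records deleted", deny_all))
```

Routine actions pass straight through, while anything in the high-stakes set runs only after the callback returns `True`.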

The competitive advantage: CoAct-1’s efficiency gains were most pronounced in OS-level tasks and multi-application workflows where programmatic control offers clear benefits.

  • For example, finding image files across complex folder structures, resizing them, and creating archives can be accomplished with a single robust script rather than brittle GUI sequences.
  • Other agents, such as OpenAI’s CUA 4o, required fewer steps on average, but their overall success rates were much lower than CoAct-1’s.
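
The image example above hints at what such a script could look like. The following is a minimal sketch using only the Python standard library, not the code CoAct-1 actually generates. It covers the find and archive steps; resizing is left as an optional `process` hook, since it would need an image library such as Pillow. The function name and suffix list are assumptions made for illustration.

```python
import zipfile
from pathlib import Path
from typing import Callable, Optional

# Assumption: which file extensions count as images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif"}


def archive_images(
    root: Path,
    archive: Path,
    process: Optional[Callable[[Path], None]] = None,
) -> int:
    """Find images under root, optionally process each one, and zip them.

    Returns the number of images written to the archive.
    """
    # Collect matches before creating the archive, so the walk is stable.
    images = sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as bundle:
        for image in images:
            if process is not None:
                process(image)  # e.g. an in-place resize step
            # Store paths relative to root so the folder layout is preserved.
            bundle.write(image, arcname=image.relative_to(root).as_posix())
    return len(images)
```

Calling `archive_images(Path("photos"), Path("photos.zip"))` would handle an arbitrarily nested folder tree in one pass, which is the kind of task where a GUI agent would otherwise need many separate clicks.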
