Salesforce researchers have developed CoAct-1, a new computer-use AI agent that combines traditional point-and-click navigation with code execution to automate complex tasks. The hybrid system achieved a 60.76% success rate on the OSWorld benchmark while requiring significantly fewer steps than purely GUI-based agents, which could ease the brittleness that plagues current automation tools.
How it works: CoAct-1 operates as a three-agent team that strategically chooses between coding and clicking based on the task at hand.
- The Orchestrator acts as project manager, analyzing user goals and delegating subtasks to either the Programmer or GUI Operator based on which approach would be most effective.
- The Programmer writes and executes Python or Bash scripts for backend operations like file management and data processing.
- The GUI Operator handles visual interface tasks that require mouse clicks and on-screen navigation.
- After each subtask completion, agents report back to the Orchestrator with summaries and screenshots for the next decision.
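The delegation loop described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' implementation: the names `Subtask`, `Report`, `programmer`, `gui_operator`, and `orchestrate` are all hypothetical, the workers are stubs that only return text, and the routing decision is assumed to be precomputed in a `kind` field rather than made by a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Report:
    """What a worker hands back to the Orchestrator after a subtask."""
    worker: str
    summary: str
    screenshot: str  # placeholder for image data, e.g. a file path


@dataclass
class Subtask:
    description: str
    kind: str  # assumed label: "code" for scriptable work, "gui" for visual work


def programmer(task: Subtask) -> Report:
    # Stub for an agent that would write and run a Python or Bash script.
    return Report("programmer", f"ran script for: {task.description}", "after_script.png")


def gui_operator(task: Subtask) -> Report:
    # Stub for an agent that would click through the interface.
    return Report("gui_operator", f"clicked through: {task.description}", "after_clicks.png")


# Hypothetical routing table from subtask kind to worker.
WORKERS: Dict[str, Callable[[Subtask], Report]] = {
    "code": programmer,
    "gui": gui_operator,
}


def orchestrate(subtasks: List[Subtask]) -> List[Report]:
    """Delegate each subtask in order and collect the worker's report."""
    history: List[Report] = []
    for task in subtasks:
        report = WORKERS[task.kind](task)
        # A real orchestrator would read this report before choosing the next step.
        history.append(report)
    return history


plan = [
    Subtask("rename all CSV files in a data folder", "code"),
    Subtask("toggle dark mode in the settings dialog", "gui"),
]
reports = orchestrate(plan)
for r in reports:
    print(r.worker, "->", r.summary)
```

In this sketch the plan is fixed up front for simplicity; in the system as described, the Orchestrator decides each next subtask only after seeing the previous summary and screenshot.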
Why this matters: Current GUI-based agents often fail on complex, multi-step workflows because errors accumulate across long sequences of precise clicks.
- “A single mis-click or misunderstood UI element can derail the entire task,” the researchers noted in their paper.
- Tasks requiring more actions are statistically more likely to fail, making step reduction crucial for reliability.
- CoAct-1 solves tasks in an average of 10.15 steps compared to 15.22 steps for leading GUI-only agents like GTA-1.
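To make the link between step count and reliability concrete, the short calculation below works out the relative step reduction from the two averages quoted above. It then applies a deliberately simple model in which every step succeeds independently with the same probability. That model and the 0.98 per-step figure are assumptions chosen for illustration only; they do not come from the paper.

```python
# Average steps per task, as reported in the article.
coact_steps = 10.15
gta_steps = 15.22

# Relative reduction in steps versus the GUI-only baseline.
reduction = (gta_steps - coact_steps) / gta_steps
print(f"step reduction: {reduction:.1%}")


def task_success(per_step_success: float, steps: float) -> float:
    """Chance of finishing a task if every step must succeed independently."""
    return per_step_success ** steps


# Assumed per-step success rate, for illustration only.
p = 0.98
print(f"modeled success at {coact_steps} steps: {task_success(p, coact_steps):.3f}")
print(f"modeled success at {gta_steps} steps: {task_success(p, gta_steps):.3f}")
```

Under this toy model, any per-step success rate below 1 makes the shorter trajectory more likely to finish, which is the intuition behind treating step reduction as a reliability lever.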
The enterprise opportunity: Ran Xu, co-author and Director of Applied AI Research at Salesforce, sees immediate applications in customer support environments.
- “A service support agent uses many different tools — general tools such as Salesforce, industry-specific tools such as EPIC for healthcare, and a lot of customized tools — to investigate a customer request and formulate a response,” Xu explained.
- The technology could automate sales prospecting, bookkeeping, customer segmentation, and campaign asset generation where full API access isn’t available.
- Many enterprise tools lack APIs, making this hybrid approach particularly valuable for real-world automation.
Security and oversight challenges: The system’s ability to execute code raises important safety considerations for enterprise deployment.
- “Access control and sandboxing is the key,” Xu emphasized, noting that humans must “understand the implication and give the AI access for safety.”
- For mission-critical operations, “some may always need human approval,” suggesting a human-in-the-loop approach for high-stakes tasks.
- The path to enterprise robustness involves training agents with feedback in realistic, simulated environments before live deployment.
The competitive advantage: CoAct-1’s efficiency gains were most pronounced in OS-level tasks and multi-application workflows where programmatic control offers clear benefits.
- For example, finding image files across complex folder structures, resizing them, and creating archives can be accomplished with a single robust script rather than brittle GUI sequences.
- Although some other agents, such as OpenAI’s CUA 4o, required fewer steps on average, their overall success rates were much lower than CoAct-1’s.
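As a rough idea of what the image-archiving script mentioned above might look like, the sketch below walks a folder tree, passes each image through a processing hook, and writes the results to a zip archive. It uses only the Python standard library, so the resizing step is represented by a hypothetical placeholder, `copy_unchanged`, that simply copies the file; a real script would plug in an imaging library at that point. The function names and the list of file extensions are assumptions, not details from the paper.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path
from typing import Callable, List

# Assumed set of extensions that count as images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif"}


def find_images(root: Path) -> List[Path]:
    """Recursively collect image files under root, sorted for stable output."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def copy_unchanged(src: Path, dst: Path) -> None:
    # Placeholder for resizing: a real script would call an imaging library here.
    shutil.copyfile(src, dst)


def archive_images(
    root: Path,
    archive: Path,
    process: Callable[[Path, Path], None] = copy_unchanged,
) -> List[str]:
    """Process every image under root and store the results in a zip archive."""
    names: List[str] = []
    with tempfile.TemporaryDirectory() as tmp, zipfile.ZipFile(archive, "w") as zf:
        for i, src in enumerate(find_images(root)):
            staged = Path(tmp) / f"{i}{src.suffix}"  # scratch copy for the hook
            process(src, staged)
            arcname = src.relative_to(root).as_posix()  # keep the folder layout
            zf.write(staged, arcname)
            names.append(arcname)
    return names
```

Calling `archive_images(Path("photos"), Path("images.zip"))` would bundle every image under a `photos` folder in one pass, with no dependence on window positions or button locations.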
Salesforce’s new CoAct-1 agents don’t just point and click; they write code to accomplish tasks faster and with higher success rates.