News report:
Reddit users were the first to notice that Claude suddenly became sharper and more capable, and now we know why: Anthropic has made significant upgrades to its AI models, including an enhanced Claude 3.5 Sonnet and much-needed upgrades to its lightweight Haiku model.
The bigger news is that the upgraded Claude 3.5 Sonnet can now operate a computer the way a person does: moving the cursor, scrolling pages, and clicking buttons. In a video demonstration, Anthropic researcher Sam Ringer showed Claude filling out a form on an external website by scrolling through a spreadsheet, looking up company information in a CRM, and then working out what each field in the form required and completing it.
Anthropic stated in its announcement, “Available today on the API, developers can direct Claude to use computers the way people do: by looking at a screen, moving a cursor, clicking buttons, and typing text. Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta.”
In the official announcement earlier today, Anthropic also said, “We’re releasing computer use early for feedback from developers, and expect the capability to improve rapidly over time.”
Anthropic (or possibly one of their AIs pressing the buttons?) seems to have released the model before they made the announcement. For the past few hours, the Claude and Anthropic subreddits have been filled with people trying to figure out what happened because their AI is performing so well: Users report that it is faster, more accurate, and surprisingly, it no longer apologizes.
“Claude is back and much improved. It just responds to you as if it truly understands your intent rather than giving lifeless responses,” said a user named NextGenA in a Reddit post. “I was stuck on a coding issue for hours with o1 Mini and o1 Preview, gradually getting worse responses. Submitted the same prompt to Claude, and it had no problems instantly,” commented Roth_Skyfire in another post.
They are right. According to Anthropic’s report, Claude 3.5 Sonnet’s score on the SWE-bench Verified coding benchmark rose from 33.4% to 49.0%, surpassing competitors such as OpenAI’s o1-preview. That is not a minor improvement, and the other benchmarks Anthropic reported also show the new Claude 3.5 Sonnet ahead of the original model.
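To put that jump in perspective, here is a quick back-of-the-envelope calculation using only the two SWE-bench Verified scores quoted above. The function name `improvement` is just an illustrative helper, not anything published by Anthropic.

```python
def improvement(before: float, after: float) -> tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent)."""
    points = after - before
    relative = points / before * 100
    return points, relative


# Scores quoted in the article: 33.4% before the upgrade, 49.0% after.
points, relative = improvement(33.4, 49.0)
print(f"{points:.1f} percentage points, {relative:.1f}% relative")
# prints "15.6 percentage points, 46.7% relative"
```

In other words, the upgraded model solves roughly half again as many of the benchmark's tasks as the original did.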
But here’s where things get really interesting. The upgraded Sonnet is not only smarter; it can now control your computer. Anthropic calls this new feature “computer use,” and it is currently in beta. You give Claude access to a desktop and a task to perform, and the AI then operates the machine much as a human would: moving the cursor, clicking buttons, typing commands, and filling out forms and text fields.
However, the feature is available only through the API, so end users won’t be able to try it directly in the short term.
Anthropic has trained Claude to interpret what is happening on your screen visually. Developers can instruct it to perform tasks like filling out forms, browsing websites, or even using software applications. It’s like having your AI sitting at your computer, doing work for you, except it doesn’t get tired and (hopefully) doesn’t make as many mistakes as we humans do.
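For developers wondering what this looks like in practice, the general pattern is a loop: capture the screen, ask the model for the next action, carry it out, and repeat. The sketch below illustrates that loop only. It does not call Anthropic's API, and every name in it (`Action`, `run_agent`, `scripted_model`) is a hypothetical stand-in rather than part of any real SDK.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step the model asks for (hypothetical shape, for illustration)."""
    kind: str          # "click", "type", "scroll", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    take_screenshot: Callable[[], bytes],
    choose_action: Callable[[str, bytes, List[Action]], Action],
    perform: Callable[[Action], None],
    task: str,
    max_steps: int = 20,
) -> List[Action]:
    """Observe the screen, let the model pick an action, execute it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):          # hard cap so a confused model cannot loop forever
        screenshot = take_screenshot()
        action = choose_action(task, screenshot, history)
        history.append(action)
        if action.kind == "done":       # the model signals that the task is finished
            break
        perform(action)
    return history


def scripted_model(script: List[Action]):
    """Stand-in for the model: replays a fixed list of actions, then says 'done'."""
    steps = iter(script)

    def choose(task: str, screenshot: bytes, history: List[Action]) -> Action:
        return next(steps, Action("done"))

    return choose


if __name__ == "__main__":
    script = [Action("click", 120, 340), Action("type", text="Acme Corp")]
    log = run_agent(lambda: b"", scripted_model(script), print, "fill out the form")
    print([a.kind for a in log])
```

In a real integration, `choose_action` would send the screenshot to the model and `perform` would drive the mouse and keyboard. The step cap is a deliberate choice here, given the reliability issues the beta still has.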
The feature is still in a testing phase because it runs into some basic problems; scrolling and zooming, for example, give it trouble. That’s why Anthropic is monitoring things closely, storing screenshots for at least 30 days, and running security checks for suspicious behavior.
The company’s caution is justified. A few months ago, Microsoft introduced a feature called “Recall” that took screenshots of users’ Copilot+ PCs to make its AI more useful and relevant. The backlash was so strong that Microsoft had to postpone the rollout after Recall was labeled “spyware” and authorities started looking into it.
But Anthropic is made up of good people, and they promise to be different. The research team stated, “We find that the updated Claude 3.5 Sonnet, including its new computer use skill, remains at AI Safety Level 2, meaning it does not require higher security and safety measures than what we currently implement.”
Companies like Replit are already integrating Claude’s computer use capabilities to help automate the evaluation of applications, while The Browser Company is testing its ability to streamline web-based workflows. These early adopters are exploring ways to have Claude handle tasks that typically require dozens or even hundreds of manual steps.
Additionally, Anthropic says its budget-friendly model, Claude 3.5 Haiku, now performs on par with the previous flagship, Claude 3 Opus, on many evaluations. It costs only a fraction of what Opus does to run, and its latency is much lower, making it more accessible without sacrificing too much performance.
Claude 3.5 Haiku is set to launch in November.