Google has introduced the Gemini 2.5 Computer Use model, a specialized version of Gemini 2.5 Pro designed to give AI agents direct control over apps and browsers. Instead of relying only on structured APIs, the new model interacts with graphical interfaces the way a human does: clicking, typing, scrolling, and filling in forms.

The model works through a continuous loop in the Gemini API’s computer_use tool, analyzing screenshots and recent actions before producing the next UI command. It can request user confirmation for sensitive tasks such as purchases, adding a layer of safety.
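
To make that loop concrete, here is a minimal Python sketch of the pattern. It does not call the Gemini SDK: `Action`, `run_agent`, and the four callbacks (`propose_action`, `take_screenshot`, `execute`, `confirm`) are hypothetical names invented for illustration, and the real request and response shapes of the computer_use tool may differ.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One UI command proposed by the model (hypothetical shape, not the real API)."""
    name: str                         # e.g. "click", "type", or "done"
    args: dict
    needs_confirmation: bool = False  # set for sensitive steps such as purchases


def run_agent(
    goal: str,
    propose_action: Callable[[str, bytes, List[Action]], Action],
    take_screenshot: Callable[[], bytes],
    execute: Callable[[Action], None],
    confirm: Callable[[Action], bool],
    max_steps: int = 20,
    history_size: int = 5,
) -> List[Action]:
    """Repeat screenshot -> proposed action -> execute until done or stopped."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        # The model sees the goal, the current screen, and the most recent actions.
        action = propose_action(goal, screenshot, history[-history_size:])
        if action.name == "done":
            break
        # Sensitive steps are only executed if the user approves them.
        if action.needs_confirmation and not confirm(action):
            break
        execute(action)
        history.append(action)
    return history


if __name__ == "__main__":
    # Stand-in for the model: replays a fixed script instead of reading the screen.
    script = iter([
        Action("click", {"x": 120, "y": 48}),
        Action("purchase", {"item": "ticket"}, needs_confirmation=True),
        Action("done", {}),
    ])
    performed = run_agent(
        goal="Buy a ticket",
        propose_action=lambda goal, shot, hist: next(script),
        take_screenshot=lambda: b"",   # a real client would capture the screen
        execute=lambda action: None,   # a real client would drive the browser
        confirm=lambda action: False,  # the user declines the purchase
    )
    print([a.name for a in performed])
```

In this sketch the step cap and the confirmation callback sit in the client code, so the caller decides when the loop stops regardless of what the model proposes.
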
Google claims the model outperforms alternatives in both speed and accuracy, particularly in browser benchmarks, while also showing early promise in mobile app control. To prevent misuse, it comes with strict safeguards like per-step safety checks and developer-defined system instructions.
Gemini 2.5 Computer Use is now in public preview via Google AI Studio and Vertex AI.
