With Computer Use, the Mountain View company equips its model with the ability to use Chrome like a real human user. Clicking, filling out forms, scrolling, dragging and dropping… it does it all.
The era of passive assistants is coming to an end. Google unveils Gemini 2.5 Computer Use, an AI capable of directly interacting with the web. Gemini can now navigate, fill out forms, test interfaces, or book hotels… all while manipulating Chrome as you would with your mouse and keyboard.
The AI Moves From Text To Action With Gemini 2.5 Computer Use
Google continues to push the boundaries of its Gemini model. After versions that could read, write, and analyze, Gemini 2.5 Computer Use is a major update that gives the AI “hands”. Google’s strategy focuses on mastering the Chrome browser, a smart choice since the web is already the most universal working environment.
Gemini 2.5 Computer Use thus interacts directly with websites, not through their APIs but via their user interfaces. This is Google’s direct response to OpenAI (ChatGPT Agent) and Anthropic (Claude 3.5 Sonnet), which are already testing agents capable of acting online.
Our new Gemini 2.5 Computer Use model can navigate browsers just like you do. 🌐
It builds on Gemini’s visual understanding and reasoning capabilities to power agents that can click, scroll and type for you online – setting a new standard on multiple benchmarks, with faster…
— Google DeepMind (@GoogleDeepMind), October 7, 2025
Unlike ChatGPT Agent or Claude, Gemini 2.5 only controls the browser, not the operating system. Google justifies this choice on security and reliability grounds: a restricted yet stable environment is preferred over complete access to your computer.
Additionally, Google claims that Gemini 2.5 outperforms competitors on several web and mobile benchmarks, such as Online-Mind2Web and WebVoyager, while exhibiting lower latency. These figures are difficult to verify, but they point to extensive optimization for autonomous browsing.
How Does It Work In Practice?
Currently, the AI can perform 13 standard actions. It can open a web page, type text, click buttons, scroll through pages, drag and drop, submit forms, etc. This range covers most web needs, such as online shopping, filling out forms, interface testing, product comparisons… all without ever directly accessing your system.
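That action vocabulary can be pictured as a small, closed set the model chooses from at each step. A minimal sketch, assuming illustrative action names (these mirror the kinds of actions the article lists, not Google’s official identifiers):

```python
from enum import Enum

class BrowserAction(Enum):
    """Illustrative subset of a browser-control agent's action vocabulary.

    The names below are assumptions for illustration: they cover six of
    the thirteen actions described (open a page, type, click, scroll,
    drag and drop, submit a form), not the official API surface.
    """
    OPEN_PAGE = "open_page"
    TYPE_TEXT = "type_text"
    CLICK = "click"
    SCROLL = "scroll"
    DRAG_AND_DROP = "drag_and_drop"
    SUBMIT_FORM = "submit_form"

# The agent never runs arbitrary commands on the system: every step it
# takes must be one of these enumerated, browser-only actions.
print(len(BrowserAction))  # 6
```

Keeping the action set closed is what makes the browser-only restriction enforceable: anything outside the enumeration is simply not executable.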
Gemini 2.5 Computer Use blends natural language understanding with computer vision. When a user makes a request, for example “book a hotel in Toulouse for this weekend”, the AI analyzes it, opens Chrome, and takes a screenshot to observe the page.
Then, it determines the next action (clicking, filling out, confirming…) and verifies the result with another screen capture. This cycle repeats until the task is completed.
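The cycle described above can be sketched as a loop. Everything here is a hypothetical stand-in (a real agent would call a vision-language model for `decide` and a browser driver for `act` and `screenshot`); the point is the shape of the observe-decide-act-verify loop, with a step cap as a safety net:

```python
def run_agent(task, decide, act, screenshot, max_steps=10):
    """Repeat observe/decide/act until the model signals completion."""
    for _ in range(max_steps):
        observation = screenshot()          # capture the current page
        action = decide(task, observation)  # model picks the next action
        if action == "done":                # model judges the task complete
            return True
        act(action)                         # execute click/type/scroll...
    return False                            # safety cap reached

# Toy demo: a fake "browser" whose decide function finishes after
# two actions have been executed.
state = {"steps": 0}
done = run_agent(
    task="book a hotel",
    decide=lambda t, obs: "done" if state["steps"] >= 2 else "click",
    act=lambda a: state.update(steps=state["steps"] + 1),
    screenshot=lambda: f"screen after {state['steps']} actions",
)
print(done)  # True
```

Note that verification falls out of the loop structure itself: the screenshot at the top of each iteration doubles as the check on the previous action's result.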
Google is already testing Gemini 2.5 Computer Use in various projects, including AI Mode and Project Mariner, where the AI performs autonomous actions in the browser. There are also demos where it plays 2048 and browses Hacker News to spot trending topics. The AI also fills out complex forms.
Official videos are sped up (3x), yet the sequences remain impressive: once a task is initiated, no human intervention is required.