Gemini 2.5 Computer Use — Preview Access, Benchmarks, Loop & Safety
Interactive overview with facts, quick quiz, and handy links
Availability: The model is in public preview via the Gemini API in Google AI Studio and Vertex AI.
What it does: Through the computer_use tool, the model completes tasks by analyzing a screenshot and emitting UI actions (click, type, scroll, drag, fill in forms and dropdowns, operate behind logins). The loop repeats until the task is complete or the run is terminated.
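On the client side, emitted UI actions have to be mapped onto something that actually drives the browser. The sketch below shows one way to do that with a small dispatcher. Everything in it is an assumption for illustration: `UIAction`, `FakeBrowser`, and the argument names are hypothetical and are not the Gemini API's actual schema, and a real client would call a browser automation library instead of recording strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class UIAction:
    """Hypothetical shape for one model-proposed action (not the real API schema)."""
    name: str
    args: Dict[str, object] = field(default_factory=dict)


class FakeBrowser:
    """Stand-in for a real browser driver; it only records what it was asked to do."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x}, {y})")

    def type_text(self, text: str) -> None:
        self.log.append(f"type({text!r})")

    def scroll(self, dy: int) -> None:
        self.log.append(f"scroll({dy})")


def dispatch(browser: FakeBrowser, action: UIAction) -> None:
    """Route an action to the matching browser method, rejecting unknown names."""
    handlers: Dict[str, Callable[..., None]] = {
        "click": browser.click,
        "type": browser.type_text,
        "scroll": browser.scroll,
    }
    handler = handlers.get(action.name)
    if handler is None:
        raise ValueError(f"unsupported action: {action.name}")
    handler(**action.args)


browser = FakeBrowser()
proposed = [
    UIAction("click", {"x": 120, "y": 48}),
    UIAction("type", {"text": "hello"}),
    UIAction("scroll", {"dy": 300}),
]
for action in proposed:
    dispatch(browser, action)
print(browser.log)
```

Rejecting unknown action names up front keeps the client from silently ignoring something the model asked for.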
Benchmarks & latency: Reported results include 70%+ accuracy with ~225s latency on the Browserbase harness for Online‑Mind2Web, with leading results also reported on WebVoyager and AndroidWorld. The source announcement provides the benchmark table and scatterplot.
Where to learn more: See the Google DeepMind announcement for the model flow, safety notes, demos, and evaluation references.
70%+: Accuracy on Online‑Mind2Web (Browserbase harness)
≈ 225s: Latency reported for the same harness
Leads: Online‑Mind2Web • WebVoyager • AndroidWorld
Preview: Accessible in AI Studio & Vertex AI
1) Inputs: User task, screenshot of the UI, recent actions, optional allow/deny list for functions.
2) Propose Action: Model returns a function call (click, type, scroll). Some actions may request user confirmation.
3) Execute: Client code runs the action in the browser or app.
4) Update Context: New screenshot and current URL are sent back to continue or finish the loop.
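The four steps above can be sketched as a plain loop. This is a minimal illustration under stated assumptions, not the actual client code: the model is replaced by a scripted stub, and `Observation`, `ProposedAction`, `run_loop`, `excluded`, and the `needs_confirmation` flag are hypothetical names chosen for this example. It shows where the confirmation check and the function deny list could sit relative to execution.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass
class Observation:
    """What the client sends back each turn: a screenshot and the current URL."""
    screenshot: bytes
    url: str


@dataclass
class ProposedAction:
    """Hypothetical action record; needs_confirmation marks sensitive steps."""
    name: str
    needs_confirmation: bool = False


def run_loop(
    task: str,
    propose: Callable[[str, Observation, List[str], FrozenSet[str]], Optional[ProposedAction]],
    execute: Callable[[ProposedAction], None],
    observe: Callable[[], Observation],
    confirm: Callable[[ProposedAction], bool],
    excluded: FrozenSet[str] = frozenset(),
    max_steps: int = 10,
) -> List[str]:
    history: List[str] = []
    obs = observe()
    for _ in range(max_steps):
        # Steps 1-2: send task, observation, and history; receive a proposal.
        action = propose(task, obs, history, excluded)
        if action is None:  # the model signals that the task is finished
            break
        if action.name in excluded:  # deny list enforced on the client too
            history.append(f"refused:{action.name}")
            break
        if action.needs_confirmation and not confirm(action):
            history.append(f"declined:{action.name}")
            break
        execute(action)  # Step 3: run the action in the browser or app
        history.append(action.name)
        obs = observe()  # Step 4: capture the new screenshot and URL
    return history


def make_scripted_model(script: List[ProposedAction]):
    """Stub standing in for the model: replays a fixed list of proposals."""
    remaining = list(script)

    def propose(task, obs, history, excluded):
        return remaining.pop(0) if remaining else None

    return propose


script = [
    ProposedAction("click"),
    ProposedAction("type"),
    ProposedAction("submit_payment", needs_confirmation=True),
]
executed: List[str] = []
history = run_loop(
    task="fill in the signup form",
    propose=make_scripted_model(script),
    execute=lambda a: executed.append(a.name),
    observe=lambda: Observation(screenshot=b"", url="https://example.com"),
    confirm=lambda a: False,  # the user declines every sensitive action
)
print(history)
```

The `max_steps` cap is a design choice in this sketch so that a run always terminates, even if the stubbed model never signals completion.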
“From https://tinyurl.com/pet-care-signup, get all details for any pet with a California residency and add them as a guest in my spa CRM at https://pet-luxe-spa.web.app/. Then, set up a follow up visit appointment with the specialist Anima Lavar for October 10th anytime after 8am. The reason for the visit is the same as their requested treatment.”
“My art club brainstormed tasks ahead of our fair. The board is chaotic and I need your help organizing the tasks into some categories I created. Go to sticky-note-jam.web.app and ensure notes are clearly in the right sections. Drag them there if not.”
Quick Quiz
Q1. Which Gemini API tool emits the UI actions?
Q2. Reported Online‑Mind2Web accuracy band?
Q3. What is required for certain sensitive actions?
This section covered preview access, the action loop, referenced benchmarks and latency, demo prompts, safety controls, and links for further reading.