gemini-computer-use) enables AI-powered browser automation through visual understanding and multi-turn interactions.
Overview
Gemini Computer Use leverages Google’s multimodal AI capabilities to:
- Process and understand web page screenshots
- Plan and execute multi-step browser interactions
- Handle complex visual layouts and dynamic content
- Integrate with Google’s AI ecosystem
Supported Models
| Model | Model ID | Best For |
|---|---|---|
| Gemini 2.5 Computer Use | gemini-2.5-computer-use-preview-10-2025 | Screenshot-based automation (default) |
Code Example
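As a minimal sketch of what a task request might contain, the snippet below assembles a payload from the parameters listed under Configuration Options. The `prompt` key, the `build_task_payload` helper, and the example URL are assumptions for illustration only; this page does not name the task-description field or the endpoint, so no HTTP call is shown.

```python
import json


def build_task_payload(prompt, url, max_steps=20):
    """Assemble a request body for a Gemini Computer Use task.

    The keys agent, model, url, and max_steps come from the Configuration
    Options table. The "prompt" key is an assumed name for the task text.
    """
    return {
        "agent": "gemini-computer-use",
        "model": "gemini-2.5-computer-use-preview-10-2025",
        "url": url,
        "max_steps": max_steps,
        "prompt": prompt,  # assumption: field name not confirmed by this page
    }


payload = build_task_payload(
    prompt="Open the pricing page and report the cost of the Pro plan.",
    url="https://example.com",
    max_steps=15,
)

# Serialize the payload the way it would be sent as a JSON request body.
print(json.dumps(payload, indent=2))
```

Keeping payload construction in one helper makes it easy to change the default model or step limit in a single place.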
Configuration Options
| Parameter | Type | Description |
|---|---|---|
| agent | string | Must be gemini-computer-use |
| model | string | Gemini model to use (default: gemini-2.5-computer-use-preview-10-2025) |
| url | string | Starting URL for the task |
| max_steps | integer | Maximum actions the agent can take |
| output_schema | object | JSON Schema for structured output |
| secret_values | object | Secure credentials (see Secret Values) |
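
The types in the table above can be checked locally before a request is sent. The following is a rough sketch, not part of any official client: `check_options` and `EXPECTED_TYPES` are hypothetical names, and the example `output_schema` is an ordinary JSON Schema object invented for illustration.

```python
# Expected Python types for each parameter in the Configuration Options table.
EXPECTED_TYPES = {
    "agent": str,
    "model": str,
    "url": str,
    "max_steps": int,
    "output_schema": dict,
    "secret_values": dict,
}


def check_options(options):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, value in options.items():
        expected = EXPECTED_TYPES.get(name)
        if expected is None:
            problems.append(f"unknown parameter: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{name} should be of type {expected.__name__}")
    if options.get("agent") != "gemini-computer-use":
        problems.append("agent must be 'gemini-computer-use'")
    return problems


options = {
    "agent": "gemini-computer-use",
    "url": "https://example.com/products",
    "max_steps": 10,
    # A JSON Schema describing the structured result the task should return.
    "output_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "price": {"type": "number"},
        },
        "required": ["title", "price"],
    },
}

print(check_options(options))
```

A check like this catches simple mistakes, such as passing `max_steps` as a string, before any request is made.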
Secure Credentials with Secret Values
Gemini Computer Use fully supports secret values for secure credential handling. Secrets are never exposed to the AI model.
Best Practices
- gemini-2.5-computer-use-preview-10-2025 is the default model and is optimized for screenshot-based automation
- Leverage structured output with output_schema for reliable data extraction
- Provide clear, specific prompts describing the exact task to complete

