Google’s Gemini Computer Use agent (gemini-computer-use) enables AI-powered browser automation through visual understanding and multi-turn interactions.

Overview

Gemini Computer Use leverages Google’s multimodal AI capabilities to:
  • Process and understand web page screenshots
  • Plan and execute multi-step browser interactions
  • Handle complex visual layouts and dynamic content
  • Integrate with Google’s AI ecosystem

Supported Models

| Model | Model ID | Best For |
| --- | --- | --- |
| Gemini 2.5 Computer Use | gemini-2.5-computer-use-preview-10-2025 | Screenshot-based automation (default) |

Code Example

import Anchorbrowser from 'anchorbrowser';

const anchorClient = new Anchorbrowser({
  apiKey: process.env.ANCHORBROWSER_API_KEY
});

const response = await anchorClient.agent.task(
  'Search for the latest AI news and summarize the top 3 articles',
  {
    taskOptions: {
      url: 'https://news.google.com',
      agent: 'gemini-computer-use',
      // model: 'gemini-2.5-computer-use-preview-10-2025',  // Default model
      maxSteps: 25,
      outputSchema: {
        type: 'object',
        properties: {
          articles: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                title: { type: 'string' },
                summary: { type: 'string' },
                source: { type: 'string' }
              }
            }
          }
        }
      }
    }
  }
);

console.log(response);
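
The example above only logs the response. Before using the result downstream, it can help to check that it matches the requested schema. The following is a minimal sketch, assuming the structured result has already been pulled out of `response` as a plain value (the response envelope is not documented on this page). `Article`, `ArticleResult`, and `isArticleResult` are hypothetical helpers, not part of the SDK, and the check is stricter than the schema above because it requires all three fields to be strings.

```typescript
// Hypothetical types mirroring the outputSchema in the example above.
interface Article {
  title: string;
  summary: string;
  source: string;
}

interface ArticleResult {
  articles: Article[];
}

// True for plain objects; excludes null and arrays.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Assumption: every article must carry all three string fields.
function isArticle(value: unknown): value is Article {
  return (
    isRecord(value) &&
    typeof value.title === 'string' &&
    typeof value.summary === 'string' &&
    typeof value.source === 'string'
  );
}

// Checks the top-level shape: an object with an `articles` array of valid articles.
function isArticleResult(value: unknown): value is ArticleResult {
  if (!isRecord(value)) {
    return false;
  }
  const articles = value.articles;
  return Array.isArray(articles) && articles.every((item) => isArticle(item));
}
```

A caller could run `isArticleResult` on the extracted result and fall back to a retry or an error message when it returns `false`.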

Configuration Options

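| Parameter | Type | Description |
| --- | --- | --- |
| agent | string | Must be gemini-computer-use |
| model | string | Gemini model to use (default: gemini-2.5-computer-use-preview-10-2025) |
| url | string | Starting URL for the task |
| max_steps | integer | Maximum number of actions the agent can take (written as maxSteps in the JavaScript example above) |
| output_schema | object | JSON Schema for structured output (written as outputSchema in the JavaScript example above) |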
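
One way to keep these parameters consistent is to build the options object through a small helper. The sketch below is hypothetical: `GeminiTaskOptions` and `buildGeminiTaskOptions` are illustrative names, not part of the SDK, and the camelCase keys follow the JavaScript example on this page. It pins `agent`, leaves `model` unset so the documented default applies, and rejects an obviously invalid `url` or `maxSteps` before any request is made. The validation rules themselves are assumptions, not documented constraints.

```typescript
// Hypothetical option shape; keys follow the camelCase used in the JavaScript example.
interface GeminiTaskOptions {
  agent: 'gemini-computer-use';
  url: string;
  model?: string;
  maxSteps?: number;
  outputSchema?: Record<string, unknown>;
}

type GeminiTaskInput = Omit<GeminiTaskOptions, 'agent'>;

function buildGeminiTaskOptions(input: GeminiTaskInput): GeminiTaskOptions {
  // Assumption: the starting URL should be an absolute http(s) URL.
  if (!/^https?:\/\/\S+$/.test(input.url)) {
    throw new TypeError('url must be an absolute http(s) URL');
  }

  // Assumption: a step limit, when given, should be a positive integer.
  if (input.maxSteps !== undefined && (!Number.isInteger(input.maxSteps) || input.maxSteps < 1)) {
    throw new RangeError('maxSteps must be a positive integer');
  }

  // The agent name is fixed; everything else is passed through unchanged.
  return { ...input, agent: 'gemini-computer-use' };
}
```

The returned object can then be passed as `taskOptions` in a call like the one in the Code Example section.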

Best Practices

  • gemini-2.5-computer-use-preview-10-2025 is the default model and is optimized for screenshot-based automation
  • Use output_schema (outputSchema in the JavaScript example) to request structured output for reliable data extraction
  • Provide clear, specific prompts describing the exact task to complete
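
To illustrate the last two points, the sketch below pairs a specific prompt with a tighter schema. It assumes that standard JSON Schema keywords such as `required`, `minItems`, and `maxItems` are honored for structured output, which this page does not confirm. `specificTask` and `outputSchema` are illustrative names.

```typescript
// A vague prompt such as 'Find some AI news' leaves the count, the source,
// and the output fields up to the agent.

// A specific prompt states where to look, how many items, and which fields to return.
const specificTask =
  'On the current page, find the 3 most recent articles about AI. ' +
  'For each one, return its headline, a one-sentence summary, and the publisher name.';

// Assumption: standard JSON Schema keywords are respected by the agent.
const outputSchema = {
  type: 'object',
  required: ['articles'],
  properties: {
    articles: {
      type: 'array',
      minItems: 3,
      maxItems: 3,
      items: {
        type: 'object',
        required: ['title', 'summary', 'source'],
        properties: {
          title: { type: 'string' },
          summary: { type: 'string' },
          source: { type: 'string' },
        },
      },
    },
  },
};
```

Both values slot into the call shown in the Code Example section: the prompt as the first argument and the schema as `outputSchema` inside `taskOptions`.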