Google’s Gemini Computer Use agent (gemini-computer-use) enables AI-powered browser automation through visual understanding and multi-turn interactions.

Overview

Gemini Computer Use leverages Google’s multimodal AI capabilities to:
  • Process and understand web page screenshots
  • Plan and execute multi-step browser interactions
  • Handle complex visual layouts and dynamic content
  • Integrate with Google’s AI ecosystem

Supported Models

| Model | Model ID | Best For |
| --- | --- | --- |
| Gemini 2.5 Computer Use | gemini-2.5-computer-use-preview-10-2025 | Screenshot-based automation (default) |

Code Example

import Anchorbrowser from 'anchorbrowser';

const anchorClient = new Anchorbrowser({
  apiKey: process.env.ANCHORBROWSER_API_KEY
});

const response = await anchorClient.agent.task(
  'Search for the latest AI news and summarize the top 3 articles',
  {
    taskOptions: {
      url: 'https://news.google.com',
      agent: 'gemini-computer-use',
      // model: 'gemini-2.5-computer-use-preview-10-2025',  // Default model
      maxSteps: 25,
      outputSchema: {
        type: 'object',
        properties: {
          articles: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                title: { type: 'string' },
                summary: { type: 'string' },
                source: { type: 'string' }
              }
            }
          }
        }
      }
    }
  }
);

console.log(response);
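
Because the task requests structured output, it can help to check the returned data against the expected shape before using it. The following is a minimal sketch, not part of the SDK: the `Article` and `ArticlesResult` types and the `parseArticlesResult` helper are hypothetical names that mirror the `outputSchema` above. Where the structured result sits inside `response`, and whether it arrives as an object or a JSON string, is an assumption to verify against a real response.

```typescript
// Hypothetical types mirroring the outputSchema in the example above.
interface Article {
  title: string;
  summary: string;
  source: string;
}

interface ArticlesResult {
  articles: Article[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function isArticle(value: unknown): value is Article {
  return (
    isRecord(value) &&
    typeof value['title'] === 'string' &&
    typeof value['summary'] === 'string' &&
    typeof value['source'] === 'string'
  );
}

// Assumption: the structured result may be a parsed object or a JSON string.
// Returns null when the data does not match the expected shape.
function parseArticlesResult(raw: unknown): ArticlesResult | null {
  let candidate: unknown = raw;
  if (typeof raw === 'string') {
    try {
      candidate = JSON.parse(raw);
    } catch {
      return null; // not valid JSON
    }
  }
  if (!isRecord(candidate)) return null;

  const articles: unknown = candidate['articles'];
  if (!Array.isArray(articles)) return null;

  // Keep the result only if every item matches the Article shape.
  const items: unknown[] = articles;
  const valid = items.filter(isArticle);
  return valid.length === items.length ? { articles: valid } : null;
}
```

Pass whichever part of `response` holds the structured output to `parseArticlesResult` and handle a `null` return as a failed extraction.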

Configuration Options

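| Parameter | Type | Description |
| --- | --- | --- |
| agent | string | Must be gemini-computer-use |
| model | string | Gemini model to use (default: gemini-2.5-computer-use-preview-10-2025) |
| url | string | Starting URL for the task |
| max_steps | integer | Maximum actions the agent can take |
| output_schema | object | JSON Schema for structured output |
| secret_values | object | Secure credentials (see Secret Values) |

The TypeScript example above passes these inside taskOptions with camelCase keys: maxSteps, outputSchema, and secretValues.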
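
The parameters above can be captured in a small typed helper so that mistakes surface before a task is submitted. This is a sketch under stated assumptions: `GeminiTaskOptions` and `buildGeminiTaskOptions` are hypothetical names, not SDK exports, and the camelCase keys follow the TypeScript example on this page rather than a published type definition.

```typescript
// Hypothetical type mirroring the Configuration Options table,
// using the camelCase keys from the TypeScript example.
interface GeminiTaskOptions {
  agent: 'gemini-computer-use';
  model?: string;
  url?: string;
  maxSteps?: number;
  outputSchema?: Record<string, unknown>;
  secretValues?: Record<string, string>;
}

// Fills in the required agent name and rejects a non-positive or
// non-integer maxSteps before the options reach the API.
function buildGeminiTaskOptions(
  options: Omit<GeminiTaskOptions, 'agent'>
): GeminiTaskOptions {
  const maxSteps = options.maxSteps;
  if (
    maxSteps !== undefined &&
    (!isFinite(maxSteps) || Math.floor(maxSteps) !== maxSteps || maxSteps < 1)
  ) {
    throw new RangeError('maxSteps must be a positive integer');
  }
  return { ...options, agent: 'gemini-computer-use' };
}
```

The returned object can then be passed as `taskOptions` in a call like the one shown under Code Example.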

Secure Credentials with Secret Values

Gemini Computer Use fully supports secret values for secure credential handling. Secrets are never exposed to the AI model.
const response = await anchorClient.agent.task(
  'Login to the dashboard and download my latest report',
  {
    taskOptions: {
      url: 'https://app.example.com/login',
      agent: 'gemini-computer-use',
      secretValues: {
        EMAIL: process.env.APP_EMAIL,
        PASSWORD: process.env.APP_PASSWORD
      }
    }
  }
);
Learn more about domain-scoped secrets and TOTP support.
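
Since the example reads credentials from environment variables, a missing variable would otherwise be passed along as `undefined`. The sketch below shows one way to fail fast instead. `collectSecrets` is a hypothetical helper, not part of the SDK, and it takes the environment as a parameter so it can be exercised without real credentials.

```typescript
type Env = Record<string, string | undefined>;

// Builds a secretValues object from environment variables.
// mapping: secret name -> environment variable name.
// Throws if any variable is unset or empty; the error lists only
// variable names, never their values.
function collectSecrets(
  env: Env,
  mapping: Record<string, string>
): Record<string, string> {
  const secrets: Record<string, string> = {};
  const missing: string[] = [];

  for (const secretName of Object.keys(mapping)) {
    const envName = mapping[secretName];
    const value = env[envName];
    if (value === undefined || value === '') {
      missing.push(envName);
    } else {
      secrets[secretName] = value;
    }
  }

  if (missing.length > 0) {
    throw new Error('Missing environment variables: ' + missing.join(', '));
  }
  return secrets;
}
```

In the login example, `secretValues` could then be set to `collectSecrets(process.env, { EMAIL: 'APP_EMAIL', PASSWORD: 'APP_PASSWORD' })`.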

Best Practices

  • gemini-2.5-computer-use-preview-10-2025 is the default model and is optimized for screenshot-based automation
  • Use structured output with output_schema (outputSchema in the TypeScript example) for reliable data extraction
  • Provide clear, specific prompts describing the exact task to complete