OS-level control provides direct access to operating system primitives like mouse movements, keyboard input, and screen interactions within your browser sessions. This approach offers more precise control than traditional web automation methods and is particularly powerful when combined with AI agents and vision-based models.
// Take a screenshot of the current browser state
const response = await axios.get(
  `https://api.anchorbrowser.io/v1/sessions/${sessionId}/screenshot`,
  {
    headers: { 'anchor-api-key': 'your-api-key' },
    responseType: 'arraybuffer'
  }
);
const imageBuffer = response.data;

// Process the screenshot with a vision AI model
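The same primitives are available for mouse and keyboard input. Below is a minimal Python sketch that calls the in-container OS-control endpoints used by the computer use integration further down; the base URL and payload shapes mirror that class and assume the code runs inside the browser session's container.

import requests

# Base URL of the in-container OS-level control API (assumed to be reachable
# from code running inside the session's container)
OS_CONTROL_URL = "http://localhost:8484/api/os-control"

# Click at absolute screen coordinates with the left mouse button
requests.post(f"{OS_CONTROL_URL}/mouse/click", json={"x": 400, "y": 300, "button": "left"})

# Type text with a small per-keystroke delay for more human-like input
requests.post(f"{OS_CONTROL_URL}/keyboard/type", json={"text": "hello world", "delay": 30})

# Press a keyboard shortcut, briefly holding the keys down together
requests.post(f"{OS_CONTROL_URL}/keyboard/shortcut", json={"keys": ["Ctrl", "l"], "holdTime": 100})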
AnchorBrowser includes an integrated OpenAI Computer Use agent that leverages OS-level control for enhanced AI interactions. This agent can perform complex tasks by combining vision models with precise OS-level operations.
import base64
from typing import List, Optional

import requests

# BasePlaywrightComputer is the Playwright-backed Computer base class from
# OpenAI's computer-use sample code; import it from wherever it lives in
# your project.

class AnchorBrowser(BasePlaywrightComputer):
    """
    Computer implementation for Anchor browser (https://anchorbrowser.io).
    Uses OS-level control endpoints for browser automation within the container.

    IMPORTANT: The `goto` and navigation tools are already implemented and
    recommended when using the Anchor computer to help the agent navigate
    more effectively.
    """

    def __init__(self, width: int = 1024, height: int = 900, session_id: Optional[str] = None):
        """Initialize the Anchor browser session."""
        super().__init__()
        self.dimensions = (width, height)
        self.session_id = session_id
        self.base_url = "http://localhost:8484/api/os-control"

    def screenshot(self) -> str:
        """
        Capture a screenshot using the OS-level control API.

        Returns:
            str: A base64-encoded string of the screenshot for AI model consumption.
        """
        try:
            response = requests.get(f"{self.base_url}/screenshot")
            if not response.ok:
                print("OS-level screenshot failed, falling back to standard screenshot")
                return super().screenshot()
            # The OS-level API returns binary PNG data; encode it for AI models
            return base64.b64encode(response.content).decode('utf-8')
        except Exception as error:
            print(f"OS-level screenshot failed, falling back: {error}")
            return super().screenshot()

    def click(self, x: int, y: int, button: str = "left") -> None:
        """
        Click at the specified coordinates using OS-level control.

        Args:
            x: The x-coordinate to click.
            y: The y-coordinate to click.
            button: The mouse button to use ('left', 'right').
        """
        try:
            response = requests.post(
                f"{self.base_url}/mouse/click",
                json={"x": x, "y": y, "button": button}
            )
            if not response.ok:
                print("OS-level click failed, falling back to standard click")
                super().click(x, y, button)
        except Exception as error:
            print(f"OS-level click failed, falling back: {error}")
            super().click(x, y, button)

    def type(self, text: str) -> None:
        """Type text using OS-level control with realistic delays."""
        try:
            response = requests.post(
                f"{self.base_url}/keyboard/type",
                json={"text": text, "delay": 30}
            )
            if not response.ok:
                print("OS-level type failed, falling back to standard type")
                super().type(text)
        except Exception as error:
            print(f"OS-level type failed, falling back: {error}")
            super().type(text)

    def keypress(self, keys: List[str]) -> None:
        """
        Press a keyboard shortcut using OS-level control.

        Args:
            keys: List of keys to press simultaneously (e.g., ['Ctrl', 'c']).
        """
        try:
            response = requests.post(
                f"{self.base_url}/keyboard/shortcut",
                json={"keys": keys, "holdTime": 100}
            )
            if not response.ok:
                print("OS-level keyboard shortcut failed, falling back")
                # Fall back to the standard Playwright implementation
                for key in keys:
                    self._page.keyboard.down(key)
                for key in reversed(keys):
                    self._page.keyboard.up(key)
        except Exception as error:
            print(f"OS-level keyboard shortcut failed: {error}")
The integrated computer use agent works seamlessly with OpenAI’s vision models:
# Initialize the agent with your session
agent = AnchorBrowser(width=1440, height=900, session_id="your-session-id")

# The agent can now:

# 1. Take screenshots for AI analysis
screenshot = agent.screenshot()

# 2. Perform precise clicks based on AI vision
agent.click(x=400, y=300)

# 3. Type text naturally
agent.type("Hello from AI agent!")

# 4. Execute keyboard shortcuts
agent.keypress(['Ctrl', 'l'])  # Focus address bar
agent.keypress(['Ctrl', 'f'])  # Open search
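In a typical agent loop, each screenshot is sent to the vision model and the model's suggested action is mapped back onto these methods. The dispatcher below is a hypothetical sketch: the action dictionary shape ('type', 'x', 'y', 'text', 'keys') is assumed for illustration rather than taken from OpenAI's exact schema.

def execute_action(agent: AnchorBrowser, action: dict) -> None:
    """Map a model-suggested action onto the agent's OS-level methods.

    The action schema here (a plain dict with a 'type' field) is an
    illustrative assumption; adapt it to whatever your model loop emits.
    """
    kind = action.get("type")
    if kind == "click":
        agent.click(x=action["x"], y=action["y"], button=action.get("button", "left"))
    elif kind == "type":
        agent.type(action["text"])
    elif kind == "keypress":
        agent.keypress(action["keys"])
    elif kind == "screenshot":
        # The returned base64 image can be sent back to the model
        action["result"] = agent.screenshot()
    else:
        raise ValueError(f"Unsupported action type: {kind}")

# Example: perform a click suggested by the model
execute_action(agent, {"type": "click", "x": 640, "y": 360})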
The computer use integration provides automatic fallbacks to standard browser automation if OS-level operations aren’t available, ensuring reliability across different environments.
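If you want to surface that behavior explicitly, you can probe the OS-control API once at startup and log which path the session will use. This is a sketch built on the same localhost screenshot endpoint the class above calls, not a dedicated health-check API.

import requests

def os_control_available(base_url: str = "http://localhost:8484/api/os-control") -> bool:
    """Return True if the OS-level control API responds; otherwise the agent's
    methods will fall back to standard Playwright automation."""
    try:
        return requests.get(f"{base_url}/screenshot", timeout=5).ok
    except requests.RequestException:
        return False

if os_control_available():
    print("Using OS-level control for this session")
else:
    print("OS-level control unavailable; using DOM-based automation fallbacks")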
Headful Sessions Only: OS-level control requires a visible desktop environment
Performance Impact: Screenshots and precise positioning may be slower than DOM-based automation
OS-level control opens up powerful possibilities for AI-driven browser automation, enabling more natural and effective interactions that mirror human behavior while providing the precision needed for reliable automation workflows.