OS-Level Control
OS-level control provides direct access to operating system primitives like mouse movements, keyboard input, and screen interactions within your browser sessions. This approach offers more precise control than traditional web automation methods and is particularly powerful when combined with AI agents and vision-based models.
Why OS-Level Control?
Superior AI Agent Performance
Vision-based AI models perform significantly better when they can interact with the browser using the same primitives humans use:
- Keyboard shortcuts work naturally: Ctrl+F for searching, Ctrl+L for browser navbar interaction, Ctrl+T for new tabs
- OS-level UI elements: Dropdowns, context menus, and system dialogs that aren’t part of the webpage DOM
- Visual coordinate targeting: AI agents can directly click on elements they see in screenshots
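Visual coordinate targeting only works if the point the model picked in a screenshot is translated to the right pixel on screen. The sketch below assumes the screenshot handed to the model may have been downscaled; the function name and the size tuples are illustrative and not part of any Anchor API.

```python
def to_screen_coords(x, y, shot_size, screen_size):
    """Map a point in a (possibly downscaled) screenshot to screen pixels."""
    shot_w, shot_h = shot_size
    screen_w, screen_h = screen_size
    if not (0 <= x < shot_w and 0 <= y < shot_h):
        raise ValueError("point lies outside the screenshot")
    # Scale each axis independently, then round to the nearest whole pixel.
    sx = round(x * screen_w / shot_w)
    sy = round(y * screen_h / shot_h)
    # Clamp so rounding never lands past the last pixel on either axis.
    return min(sx, screen_w - 1), min(sy, screen_h - 1)


# A click chosen at (400, 300) in an 800x600 screenshot of a 1600x1200 screen.
print(to_screen_coords(400, 300, (800, 600), (1600, 1200)))  # (800, 600)
```

If the screenshot is always captured at native resolution, the mapping is the identity and this step can be skipped.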
Core Capabilities - Beyond Traditional Web Automation
Control mouse interactions with pixel-level precision.
Basic Click
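As a rough illustration of what a click command might look like on the wire, the sketch below builds an HTTP request without sending it. The base URL, path, header name, and JSON field names are all placeholders assumed for this example; the real values have to come from the API reference.

```python
import json
import urllib.request

# Placeholder base URL (assumption): substitute the documented endpoint.
BASE_URL = "https://api.example.invalid/v1"


def build_click_request(session_id, x, y, button="left", api_key="YOUR_API_KEY"):
    """Build, but do not send, a POST request describing a single click."""
    if x < 0 or y < 0:
        raise ValueError("coordinates must be non-negative pixels")
    # Field names "x", "y", and "button" are assumed for illustration.
    body = json.dumps({"x": x, "y": y, "button": button}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/sessions/{session_id}/mouse/click",  # hypothetical path
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


req = build_click_request("session-123", 640, 360)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it; that call is left out here because the endpoint above is not real.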
Advanced Mouse Control
Drag and Drop
Perform complex drag and drop operations in a single command.
Keyboard Input
Send text and keyboard shortcuts with human-like timing.
Text Input
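To make "human-like timing" concrete, here is a minimal sketch of one way to schedule per-keystroke delays with random jitter. It is not a description of how Anchor implements typing; the function name, default delays, and the longer pause after whitespace are assumptions chosen for illustration.

```python
import random


def keystroke_schedule(text, base_ms=80, jitter_ms=40, seed=None):
    """Return (character, delay_ms) pairs with jittered inter-key delays."""
    rng = random.Random(seed)  # seedable so a run can be reproduced
    schedule = []
    for ch in text:
        delay = base_ms + rng.uniform(-jitter_ms, jitter_ms)
        # Pause a little longer after whitespace, as between words.
        if ch.isspace():
            delay += base_ms
        schedule.append((ch, max(0.0, delay)))
    return schedule


for ch, delay in keystroke_schedule("hi there", seed=1):
    print(repr(ch), f"{delay:.0f} ms")
```

A caller would sleep for each delay before sending the corresponding key event.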
Keyboard Shortcuts
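Shortcuts such as Ctrl+F or Ctrl+L are usually sent as a list of keys pressed together. The sketch below shows one possible way to turn a shortcut string into such a list; the canonical key names ("Control", "Meta", and so on) are an assumption and may differ from what the API expects.

```python
# Assumed canonical modifier names; check the API reference for the real ones.
MODIFIER_ALIASES = {
    "ctrl": "Control",
    "control": "Control",
    "cmd": "Meta",
    "meta": "Meta",
    "alt": "Alt",
    "shift": "Shift",
}


def parse_shortcut(combo):
    """Split a string like 'Ctrl+Shift+T' into an ordered list of key names."""
    parts = [p.strip() for p in combo.split("+") if p.strip()]
    if not parts:
        raise ValueError("empty shortcut")
    *modifiers, key = parts
    keys = []
    for mod in modifiers:
        name = MODIFIER_ALIASES.get(mod.lower())
        if name is None:
            raise ValueError(f"unknown modifier: {mod}")
        if name not in keys:  # ignore repeated modifiers
            keys.append(name)
    # Single characters are lowercased; named keys such as 'Enter' are kept.
    keys.append(key.lower() if len(key) == 1 else key)
    return keys


print(parse_shortcut("Ctrl+F"))        # ['Control', 'f']
print(parse_shortcut("Ctrl+Shift+T"))  # ['Control', 'Shift', 't']
```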
Scrolling
Control page scrolling with precision.
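One way to keep scrolling predictable is to break a long scroll into fixed-size wheel deltas and send them one at a time. The helper below is a sketch of that idea only; the default step of 120 units and the function name are assumptions, not values taken from the API.

```python
def scroll_steps(total_dy, step=120):
    """Split a vertical scroll distance into wheel deltas of at most `step`."""
    if step <= 0:
        raise ValueError("step must be positive")
    direction = 1 if total_dy >= 0 else -1
    remaining = abs(total_dy)
    deltas = []
    while remaining > 0:
        chunk = min(step, remaining)
        deltas.append(direction * chunk)  # sign encodes scroll direction
        remaining -= chunk
    return deltas


print(scroll_steps(300))        # [120, 120, 60]
print(scroll_steps(-250, 100))  # [-100, -100, -50]
```

Each delta would then be passed to whatever scroll command the session exposes, optionally with a screenshot between steps to check progress.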
Screenshots
Capture visual state for AI analysis.
Clipboard Operations
Manage clipboard content programmatically.
Reading Clipboard
Setting Clipboard
Navigation
Direct URL navigation at the OS level on the currently selected tab.
AI Agent Integration Patterns
OpenAI Computer Use Integration
Anchor includes an integrated OpenAI Computer Use agent that leverages OS-level control for enhanced AI interactions. This agent can perform complex tasks by combining vision models with precise OS-level operations.
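Conceptually, such an agent runs a loop: take a screenshot, ask the model for the next action, and execute that action through OS-level control. The sketch below covers only the last step. `RecordingController` is a hypothetical stand-in that records calls instead of touching a browser, and the action dictionaries are assumed to follow the general shape of computer-use actions (a `type` plus fields such as `x`, `y`, `text`, or `keys`); the integrated agent's actual interface may differ.

```python
class RecordingController:
    """Hypothetical stand-in for an OS-level control client; records calls only."""

    def __init__(self):
        self.calls = []

    def click(self, x, y, button="left"):
        self.calls.append(("click", x, y, button))

    def type_text(self, text):
        self.calls.append(("type", text))

    def press_keys(self, keys):
        self.calls.append(("keypress", tuple(keys)))

    def scroll(self, x, y, dx, dy):
        self.calls.append(("scroll", x, y, dx, dy))


def dispatch(action, controller):
    """Route one model-proposed action (a dict) to the matching controller call."""
    kind = action.get("type")
    if kind == "click":
        controller.click(action["x"], action["y"], action.get("button", "left"))
    elif kind == "type":
        controller.type_text(action["text"])
    elif kind == "keypress":
        controller.press_keys(action["keys"])
    elif kind == "scroll":
        controller.scroll(
            action["x"], action["y"],
            action.get("scroll_x", 0), action.get("scroll_y", 0),
        )
    else:
        # Fail loudly rather than silently ignoring an unsupported action.
        raise ValueError(f"unsupported action type: {kind!r}")


controller = RecordingController()
dispatch({"type": "click", "x": 120, "y": 48}, controller)
dispatch({"type": "type", "text": "hello"}, controller)
print(controller.calls)
```

Swapping the recording controller for a real client keeps the dispatch logic unchanged and makes the loop easy to test offline.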
Usage with OpenAI Models
The integrated computer use agent works seamlessly with OpenAI’s vision models.
Limitations and Considerations
Session Requirements
- Headful Sessions Only: OS-level control requires a visible desktop environment
- Performance Impact: Screenshots and precise positioning may be slower than DOM-based automation
OS-level control opens up powerful possibilities for AI-driven browser automation, enabling more natural and effective interactions that mirror human behavior while providing the precision needed for reliable automation workflows.