
Computer Use

Computer use gives Claude a headless desktop environment inside the sandbox. Claude can take screenshots, click, type, scroll, and browse the web — all through MCP tools. You can watch live via VNC.

Computer use requires:

  • A sandbox with computer_use: true (Docker or Libvirt backend, not macOS)
  • The review section configured for Mini Apps (needed for the VNC viewer)

Example configuration for the Docker backend:

contexts:
  browser-tasks:
    directory: /home/you/Documents/browser-project
    description: "Browser automation"
    allowed_tools:
      - LSP
      - AskUserQuestion
    sandbox:
      backend: docker
      computer_use: true
review:
  tunnel: cloudflared  # needed for VNC Mini App
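
As an illustration of the two prerequisites, here is a minimal Python sketch that checks an already-parsed configuration. It is not part of the project: the function name `check_computer_use` is hypothetical, and it assumes that `sandbox` is nested under each context and that `review` is a top-level section.

```python
# Hypothetical helper, not part of the project: checks the two prerequisites
# against a configuration that has already been parsed into a dict.
SUPPORTED_BACKENDS = {"docker", "libvirt"}  # macOS is not supported for computer use


def check_computer_use(config: dict, context_name: str) -> list[str]:
    """Return a list of problems; an empty list means the prerequisites are met."""
    context = config.get("contexts", {}).get(context_name)
    if context is None:
        return [f"context {context_name!r} is not defined"]

    problems = []
    sandbox = context.get("sandbox") or {}  # assumption: sandbox lives under the context
    if sandbox.get("computer_use") is not True:
        problems.append("sandbox.computer_use is not set to true")
    if sandbox.get("backend") not in SUPPORTED_BACKENDS:
        problems.append("sandbox.backend must be docker or libvirt")
    if not config.get("review"):  # assumption: review is a top-level section
        problems.append("review section is missing (needed for the VNC viewer)")
    return problems


example_config = {
    "contexts": {
        "browser-tasks": {
            "directory": "/home/you/Documents/browser-project",
            "description": "Browser automation",
            "allowed_tools": ["LSP", "AskUserQuestion"],
            "sandbox": {"backend": "docker", "computer_use": True},
        }
    },
    "review": {"tunnel": "cloudflared"},
}

print(check_computer_use(example_config, "browser-tasks"))
```

An empty list means both prerequisites hold for that context. Any other result names the missing piece.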

The computer-use Docker image (openshrimp-computer-use) extends the base image with a Wayland compositor, Chromium, and a terminal.

Example configuration for the Libvirt backend:

contexts:
  browser-tasks:
    directory: /home/you/Documents/browser-project
    description: "Browser automation"
    allowed_tools:
      - LSP
      - AskUserQuestion
    sandbox:
      backend: libvirt
      computer_use: true

The sandbox runs a headless 1280x720 Wayland desktop with:

  • labwc — lightweight Wayland compositor
  • Chromium — web browser
  • foot — terminal emulator
  • wayvnc — VNC server for live viewing

When computer use is enabled, these MCP tools are registered automatically:

| Tool | Description |
| --- | --- |
| computer_screenshot | Take a PNG screenshot (1280x720). Sent to Telegram and returned for Claude to analyze. |
| computer_click | Click at (x, y) coordinates. Supports left, right, and middle buttons. |
| computer_type | Type text character by character. |
| computer_key | Press a key or key combo (e.g. ctrl+a, alt+F4, super+d). |
| computer_scroll | Scroll at (x, y) in a direction (up/down/left/right). |
| computer_toplevel | Focus a window by name (case-insensitive substring match). |
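
To make the input shapes of these tools concrete, the following Python sketch validates a click and splits a key combo into modifiers and a final key. It only illustrates the constraints described above (a 1280x720 screen, three mouse buttons, `+`-separated combos). The function names, the argument names `x`, `y`, and `button`, and the accepted modifier set are assumptions, not the tools' actual schema.

```python
# Illustrative only: argument names and the modifier set are assumptions.
SCREEN_WIDTH, SCREEN_HEIGHT = 1280, 720
BUTTONS = {"left", "right", "middle"}
MODIFIERS = {"ctrl", "alt", "shift", "super"}


def validate_click(x: int, y: int, button: str = "left") -> dict:
    """Check that a click targets a known button and a point on the screen."""
    if button not in BUTTONS:
        raise ValueError(f"unknown button: {button!r}")
    if not (0 <= x < SCREEN_WIDTH and 0 <= y < SCREEN_HEIGHT):
        raise ValueError(f"({x}, {y}) is outside {SCREEN_WIDTH}x{SCREEN_HEIGHT}")
    return {"x": x, "y": y, "button": button}


def parse_key_combo(combo: str) -> tuple[list[str], str]:
    """Split a combo such as 'ctrl+a' into (['ctrl'], 'a')."""
    parts = [part.strip() for part in combo.split("+")]
    if any(not part for part in parts):
        raise ValueError(f"malformed key combo: {combo!r}")
    *modifiers, key = parts
    modifiers = [m.lower() for m in modifiers]  # modifiers are case-insensitive here
    unknown = [m for m in modifiers if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s): {unknown}")
    return modifiers, key  # the final key keeps its case, e.g. 'F4'


print(validate_click(640, 360))
print(parse_key_combo("alt+F4"))
```

This sketch does not handle a literal `+` key. How the real tool treats that case is not covered here.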

Claude follows a screenshot-act loop:

  1. Take a screenshot to see the current state
  2. Decide what to do (click a button, type text, etc.)
  3. Perform the action
  4. Take another screenshot to verify the result

Screenshots are automatically sent to your Telegram chat so you can see what Claude sees.
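
The screenshot-act loop above can be sketched in Python as follows. This is a schematic, not the project's implementation: `take_screenshot`, `decide`, and `perform` are hypothetical callables standing in for the screenshot tool, Claude's decision, and the action tools, and `max_steps` is an added safety limit.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Action:
    """A pending tool call, e.g. Action('computer_click', {'x': 10, 'y': 20})."""
    tool: str
    args: dict = field(default_factory=dict)


def screenshot_act_loop(
    take_screenshot: Callable[[], Any],
    decide: Callable[[Any], Optional[Action]],
    perform: Callable[[Action], None],
    max_steps: int = 10,
) -> int:
    """Run the loop until decide() returns None; return the number of actions."""
    steps = 0
    screenshot = take_screenshot()       # 1. see the current state
    while steps < max_steps:
        action = decide(screenshot)      # 2. decide what to do
        if action is None:               # nothing left to do
            break
        perform(action)                  # 3. perform the action
        steps += 1
        screenshot = take_screenshot()   # 4. verify; also the input to the next round
    return steps


# Tiny fake desktop: the "screenshot" is just the number of clicks so far.
state = {"clicks": 0, "shots": 0}


def fake_screenshot() -> int:
    state["shots"] += 1
    return state["clicks"]


def fake_decide(clicks_seen: int) -> Optional[Action]:
    return Action("computer_click", {"x": 10, "y": 20}) if clicks_seen < 2 else None


def fake_perform(action: Action) -> None:
    state["clicks"] += 1


performed = screenshot_act_loop(fake_screenshot, fake_decide, fake_perform)
print(performed, state)
```

The verifying screenshot in step 4 doubles as the starting screenshot for the next round, so each action costs one extra screenshot rather than two.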

Use the /vnc command to open the VNC viewer Mini App in Telegram. This gives you a live view of the desktop as Claude interacts with it.

/vnc

The VNC viewer uses noVNC and connects through a WebSocket proxy to the sandbox’s VNC server.

Docker backend:

  • Screenshots via grim (Wayland screenshot tool)
  • Input via wlrctl (Wayland input simulation)
  • Window focus via wlrctl
  • VNC exposed on a dynamic port

Libvirt backend:

  • Screenshots via the libvirt domain screenshot API
  • Input via QMP (QEMU Machine Protocol): mouse events and key presses
  • Window focus is not directly supported (use Alt+Tab or similar key combos)
  • VNC port auto-assigned from QEMU’s VNC server

Tips:

  • Claude works best when you describe what you want it to do on the screen rather than giving pixel coordinates
  • For web tasks, you can ask Claude to open Chromium and navigate to a URL
  • Screenshots are 1280x720, which is the desktop resolution Claude interacts with
  • If Claude gets stuck, you can connect via VNC and interact manually
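
For the QMP input path listed above, a click at a screenshot coordinate could be expressed roughly as in this Python sketch. It uses QMP's `input-send-event` command with absolute pointer events, whose axis values span 0 to 0x7fff. Whether the project builds its messages this way is an assumption, and the helper names are hypothetical. Absolute events also assume the guest has an absolute pointing device such as a tablet.

```python
import json

SCREEN_WIDTH, SCREEN_HEIGHT = 1280, 720
QMP_ABS_MAX = 0x7FFF  # QEMU absolute pointer axes range from 0 to 0x7fff


def to_abs(pixel: int, size: int) -> int:
    """Scale a pixel coordinate to QEMU's absolute axis range, clamping to the screen."""
    pixel = min(max(pixel, 0), size - 1)
    return pixel * QMP_ABS_MAX // (size - 1)


def send_event(events: list[dict]) -> dict:
    """Wrap input events in a QMP input-send-event command."""
    return {"execute": "input-send-event", "arguments": {"events": events}}


def click_commands(x: int, y: int, button: str = "left") -> list[dict]:
    """Build the move, press, and release commands for one click (hypothetical helper)."""
    move = send_event([
        {"type": "abs", "data": {"axis": "x", "value": to_abs(x, SCREEN_WIDTH)}},
        {"type": "abs", "data": {"axis": "y", "value": to_abs(y, SCREEN_HEIGHT)}},
    ])
    press = send_event([{"type": "btn", "data": {"down": True, "button": button}}])
    release = send_event([{"type": "btn", "data": {"down": False, "button": button}}])
    return [move, press, release]


# Print the JSON that would be written to the QMP socket, one command per line.
for command in click_commands(640, 360):
    print(json.dumps(command))
```

The sketch only builds the messages. Sending them requires an open QMP connection that has completed the `qmp_capabilities` handshake.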