Repository inventory

patrickporto/desktop-agent

Skills indexed from this repository, with install-style signals scoped to the repo.

1 skill · 0 GitHub stars · 0 weekly installs · Python

Overview

This skill exposes desktop automation capabilities: controlling the mouse and keyboard, taking screenshots, performing OCR, and managing applications. It provides a CLI that returns structured JSON, letting AI agents perform reliable, auditable desktop actions across Windows, macOS, and Linux. The skill emphasizes a safe workflow: observe first, then act.

How this skill works

Commands are invoked via a single CLI pattern (uvx desktop-agent <category> <command> ...) and return JSON responses with success, data, and error fields. Categories include mouse, keyboard, screen, message, and app; actions range from moving/clicking the mouse and typing keys to locating images, reading text with OCR, taking screenshots, and opening or focusing applications. The skill uses PyAutoGUI-style operations and supports options like durations, confidence thresholds, regions, and window targeting.
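The invocation pattern and response shape described above can be sketched as a thin Python wrapper. This is a minimal illustration, not the skill's own code: the helper names (build_command, parse_response, run_agent) are hypothetical, and the only details taken from the description are the `uvx desktop-agent <category> <command>` pattern and the success, data, and error fields.

```python
import json
import subprocess


def build_command(category, command, *args):
    """Assemble the argv list for: uvx desktop-agent <category> <command> ..."""
    return ["uvx", "desktop-agent", category, command, *map(str, args)]


def parse_response(stdout):
    """Parse the CLI's JSON output into a dict with success, data, and error keys."""
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError as exc:
        return {"success": False, "data": None, "error": f"invalid JSON: {exc}"}
    if not isinstance(payload, dict):
        return {"success": False, "data": None, "error": "unexpected JSON shape"}
    return {
        "success": bool(payload.get("success")),
        "data": payload.get("data"),
        "error": payload.get("error"),
    }


def run_agent(category, command, *args, timeout=30):
    """Run one command and return the parsed response (requires uvx on PATH)."""
    proc = subprocess.run(
        build_command(category, command, *args),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return parse_response(proc.stdout)
```

With this in place, run_agent("screen", "size") would issue `uvx desktop-agent screen size`, assuming that subcommand spelling matches the CLI. Treating unparseable output as a failed response keeps callers on a single code path for error handling.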

When to use it

Automating repetitive desktop tasks: form filling, copy/paste, or batch UI interactions.
Testing GUI workflows or validating UI presence via image or text recognition.
Taking screenshots for logging, analysis, or evidence of automated steps.
Opening, focusing, or listing application windows before sending input.
Showing dialogs to ask for user confirmation before destructive actions.

Best practices

Observe the environment first: run screen size, app list, or locate commands before acting.
Use window-targeted or regional screenshots to speed up image searches on large displays.
Add delays between steps and prefer animated mouse moves with short durations for reliability.
Validate image files exist and set an appropriate confidence (0.7–0.9) when using locate.
Handle failures gracefully: check each command's JSON output and confirm that an error is recoverable before retrying.
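The advice on delays and graceful failure handling could look like the retry helper below. It is a sketch under stated assumptions: run is any zero-argument callable that returns a dict with success, data, and error keys, and is_recoverable is a hypothetical predicate supplied by the caller, since the skill's actual error messages are not documented here.

```python
import time


def run_with_retry(run, is_recoverable, attempts=3, delay=0.5, sleep=time.sleep):
    """Call run() until it reports success, pausing between attempts.

    run: zero-argument callable returning a dict with success/data/error keys.
    is_recoverable: hypothetical predicate deciding whether an error is worth retrying.
    """
    result = {"success": False, "data": None, "error": "not attempted"}
    for attempt in range(1, attempts + 1):
        result = run()
        if result.get("success"):
            return result
        # Stop early on errors that a retry cannot fix (e.g. a missing image file).
        if not is_recoverable(result.get("error")):
            return result
        if attempt < attempts:
            sleep(delay)
    return result
```

Passing sleep as a parameter keeps the helper easy to test without real waiting. The default delay of 0.5 seconds is an arbitrary starting point, not a value recommended by the skill.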

Example use cases

Open Notepad, focus the window, and type a template message using keyboard write.
Locate a Save button image in an application window and click its center using locate-center then mouse click.
Capture an active-window screenshot for diagnostics and store the filename in automation logs.
Fill a multi-field form by clicking each field and typing text, using tab navigation only when field order is stable.
Copy selected text from one app and paste it into another using keyboard hotkeys and mouse clicks to focus targets.
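The locate-then-click use case follows the observe-first pattern and might be sketched as follows. Several details are assumptions rather than documented behavior: that locate-center sits under the screen category, that it returns data shaped like {"x": ..., "y": ...}, and that mouse click accepts coordinates as positional arguments. The run parameter is a hypothetical callable standing in for whatever executes the CLI.

```python
def click_image(run, image_path):
    """Locate an image on screen and click its center.

    run: callable(category, command, *args) -> dict with success/data/error keys.
    Assumed (not confirmed by the source): locate-center lives under the
    "screen" category and returns data shaped like {"x": ..., "y": ...}.
    """
    found = run("screen", "locate-center", image_path)
    if not found.get("success"):
        return found  # Observation failed, so do not act.
    data = found.get("data") or {}
    if "x" not in data or "y" not in data:
        return {"success": False, "data": data, "error": "no coordinates in response"}
    return run("mouse", "click", data["x"], data["y"])
```

The function never clicks unless the locate step succeeded and returned usable coordinates, so a missing Save button results in a reported failure instead of a stray click.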

FAQ

How do I keep coordinates reliable across different screen sizes?

Query screen size first and use window-specific or relative offsets rather than hard-coded absolute coordinates.
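One way to avoid hard-coded coordinates is to store positions as fractions of the screen or window and convert them once the size is known. The helper below is a hypothetical illustration. It assumes the width and height come from a prior size query, whose exact output format is not shown here.

```python
def to_absolute(rel_x, rel_y, width, height):
    """Map fractions in [0, 1] to pixel coordinates inside a width x height area."""
    if not (0.0 <= rel_x <= 1.0 and 0.0 <= rel_y <= 1.0):
        raise ValueError("relative coordinates must be between 0 and 1")
    # Clamp to the last valid pixel so 1.0 does not land off-screen.
    x = min(int(rel_x * width), width - 1)
    y = min(int(rel_y * height), height - 1)
    return x, y


# The same relative target resolves correctly on two display sizes.
print(to_absolute(0.5, 0.5, 1920, 1080))  # (960, 540)
print(to_absolute(0.5, 0.5, 2560, 1440))  # (1280, 720)
```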

What if an image locate command fails?

Verify the image path, reduce the search region, adjust confidence, and confirm the target window is active before retrying.
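Some of these checks can run before the retry is even attempted. The sketch below is a hypothetical pre-flight helper. The region layout (left, top, width, height) and the screen tuple (width, height) are assumptions chosen for illustration, not the CLI's documented argument format.

```python
from pathlib import Path


def locate_preflight(image_path, confidence, region=None, screen=None):
    """Return a list of problems to fix before retrying an image search.

    region: optional (left, top, width, height); screen: optional (width, height).
    """
    problems = []
    if not Path(image_path).is_file():
        problems.append(f"image not found: {image_path}")
    if not 0.0 < confidence <= 1.0:
        problems.append("confidence must be in (0, 1]")
    if region is not None and screen is not None:
        left, top, width, height = region
        if left < 0 or top < 0 or left + width > screen[0] or top + height > screen[1]:
            problems.append("region extends beyond the screen")
    return problems
```

An empty list means the inputs look sane and the failure is more likely caused by the screen contents, such as the target window not being active.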

1 skill

desktop-agent

Automation

This skill enables desktop automation by controlling mouse, keyboard, and screen for efficient UI tasks across apps.

CLI · Productivity · Python · Scripting · +2 more