Introduction
UseDesktop is infrastructure for computer-use agents (CUAs). You can create verifiable RL environments for CUAs, run evals, train models, compare model runs, and export evidence-backed workflow packages.
What you can do
Build resettable RL environments and task packages from real computer-use workflows.
Run CUA models against tasks with grader contracts, traces, scores, and pass@k results.
Use verified workflow packages as SFT/RL data and compare model improvements.
Package environments, tasks, graders, model runs, audits, and provenance for review.
Start here
Quickstart
Build the first reviewable package: one workflow, one task, one grader, and one model run.
Concepts in 5 minutes
Understand environment, task, grader, run, audit, and export without reading every page.
Artifact schema
Use the portable manifest that connects capture, evals, training, and export.
Build path
Capture workflow
Record the source trace, screenshots, artifacts, decisions, and final outcome.
Create task
Convert a workflow into a prompt, seed, constraints, expected state, and difficulty target.
Write grader
Score final state, process evidence, violations, and known reward-hacking paths.
Run model
Collect traces, scores, rewards, pass@k results, and failure evidence across models.
Export package
Package manifests, artifacts, audits, and provenance for evals or customer review.
Publish eval
Turn a package into a public evidence page with inspectable grader and run records.
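As a rough illustration of the "Write grader" step, the sketch below scores a final app state against success checks and records constraint violations. It borrows the check and constraint names from the package example on this page, but everything else (`GradeResult`, `grade`, and the state keys `price`, `published`, `inventory`) is hypothetical and chosen for illustration. It is not UseDesktop's grader API.

```python
from dataclasses import dataclass, field


@dataclass
class GradeResult:
    """Outcome of one grading pass (hypothetical shape)."""
    passed: bool
    score: float
    violations: list[str] = field(default_factory=list)


def grade(initial: dict, final: dict) -> GradeResult:
    """Score a final app state against success checks and constraints."""
    # Success checks: every one must hold for the attempt to pass.
    checks = {
        "price_field_equals_29900": final.get("price") == 29900,
        "publish_state_true": final.get("published") is True,
    }
    # Constraint checks: record a violation if protected state changed.
    violations = []
    if final.get("inventory") != initial.get("inventory"):
        violations.append("do_not_change_inventory")

    # Score is the fraction of success checks met; passing also
    # requires that no constraint was violated.
    score = sum(checks.values()) / len(checks)
    passed = all(checks.values()) and not violations
    return GradeResult(passed=passed, score=score, violations=violations)


initial = {"price": 35000, "published": False, "inventory": 12}
good = grade(initial, {"price": 29900, "published": True, "inventory": 12})
hacked = grade(initial, {"price": 29900, "published": True, "inventory": 0})
print(good.passed, hacked.passed, hacked.violations)
```

Keeping violations separate from the score means an attempt that reaches the right final state by breaking a constraint still fails, which is one simple way to close a known reward-hacking path.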
Package shape
The public eval pages are the human-readable layer. The export is the machine-readable contract, which should be runnable locally, on RunPod or AWS, or in customer infrastructure.
{
  "environment": {
    "id": "korean-commerce-admin",
    "reset": "seeded_state_v1",
    "action_space": ["click", "type", "key", "scroll"]
  },
  "task": {
    "prompt": "Change the listed price to 29,900 KRW and publish.",
    "constraints": ["do_not_change_inventory", "save_required"]
  },
  "grader": {
    "type": "state_and_process_v1",
    "success": ["price_field_equals_29900", "publish_state_true"]
  },
  "evidence": {
    "runs": ["pass@1", "pass@3", "pass@5"],
    "audits": ["verifier_fp", "verifier_fn", "known_loopholes"]
  }
}
Quality story
A workflow package should not only look realistic. It should carry evidence: task solvability, ambiguity checks, verifier false-positive and false-negative audits, model pass@k distributions, failure traces, and contamination notes.
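Two of the evidence types listed above reduce to short calculations. The sketch below uses a widely used combinatorial pass@k estimator, 1 - C(n-c, k) / C(n, k), and computes verifier false-positive and false-negative rates from a set of human-audited runs. Whether UseDesktop computes these figures this way is an assumption, and the function names `pass_at_k` and `verifier_error_rates` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n attempts of which c passed.

    Uses the combinatorial estimator 1 - C(n-c, k) / C(n, k).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k attempts failed,
    # so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def verifier_error_rates(labels: list[tuple[bool, bool]]) -> tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate).

    Each pair is (grader_verdict, human_verdict) for one audited run.
    """
    fp = sum(1 for g, h in labels if g and not h)
    fn = sum(1 for g, h in labels if not g and h)
    negatives = sum(1 for _, h in labels if not h)
    positives = sum(1 for _, h in labels if h)
    # Guard against empty classes in small audit samples.
    fp_rate = fp / negatives if negatives else 0.0
    fn_rate = fn / positives if positives else 0.0
    return fp_rate, fn_rate


# 10 attempts on one task, 3 of which the grader passed.
for k in (1, 3, 5):
    print(f"pass@{k} = {pass_at_k(10, 3, k):.3f}")
```

A high pass@k only counts as evidence if the verifier's false-positive rate is low, which is why the two figures belong in the same package.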
Environment
The app state and runtime boundary the model is placed inside.
Task
The goal, constraints, start condition, and expected outcome.
Grader
The scoring function that turns a model attempt into quantitative evidence.
Run
A model attempt with trace, score, reward, verdict, and failure notes.
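To tie the definitions above back to the package shape, a consumer of an export might check that each section is present before trying to run it. The following is a minimal sketch: the required keys are simply copied from the example manifest on this page, and whether the real artifact schema requires exactly these fields is an assumption. `REQUIRED` and `missing_fields` are hypothetical names.

```python
import json

# Required keys per section, taken from the example manifest on this page.
# Whether the real schema requires exactly these keys is an assumption.
REQUIRED = {
    "environment": ["id", "reset", "action_space"],
    "task": ["prompt", "constraints"],
    "grader": ["type", "success"],
    "evidence": ["runs", "audits"],
}


def missing_fields(manifest: dict) -> list[str]:
    """Return dotted paths of required fields absent from the manifest."""
    missing = []
    for section, keys in REQUIRED.items():
        body = manifest.get(section)
        if not isinstance(body, dict):
            # The whole section is absent or malformed.
            missing.append(section)
            continue
        missing.extend(f"{section}.{key}" for key in keys if key not in body)
    return missing


# A manifest that has no evidence section yet.
manifest = json.loads("""
{
  "environment": {"id": "korean-commerce-admin", "reset": "seeded_state_v1",
                  "action_space": ["click", "type", "key", "scroll"]},
  "task": {"prompt": "Change the listed price to 29,900 KRW and publish.",
           "constraints": ["do_not_change_inventory", "save_required"]},
  "grader": {"type": "state_and_process_v1",
             "success": ["price_field_equals_29900", "publish_state_true"]}
}
""")
print(missing_fields(manifest))  # prints ['evidence']
```

Reporting every missing path at once, rather than failing on the first, makes it easier to review an incomplete package in a single pass.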