concept

RL environments.

An environment is the world the computer-use agent acts in. For UseDesktop, that usually means a resettable workflow app, a deterministic seed, an observation contract, an action space, and hooks that let a grader inspect the final state.

What counts as an environment

The environment can be a mock app, a workflow twin, or a container-backed runtime. What matters is that it can be reset, instrumented, versioned, and scored. A screen recording alone is not an environment because it cannot accept new model actions.

Resettable state

Every rollout starts from the same known seed, so pass@k estimates and verifier audits are comparable across runs.
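A minimal sketch of what seeded reset can look like, assuming a toy in-memory app. `MockOrderApp`, its `orders` field, and the `snapshot` method are hypothetical names for illustration, not part of any UseDesktop API. The point is only that the same seed always rebuilds the same starting state.

```python
import random
from dataclasses import dataclass, field


@dataclass
class MockOrderApp:
    """Hypothetical stand-in for a resettable workflow app."""
    orders: list = field(default_factory=list)

    def reset(self, seed: str) -> dict:
        # A private RNG keyed by the seed keeps resets independent of
        # any global random state touched elsewhere in the process.
        rng = random.Random(seed)
        self.orders = [
            {"order_id": i, "quantity": rng.randint(1, 5), "status": "pending"}
            for i in range(3)
        ]
        return self.snapshot()

    def snapshot(self) -> dict:
        # Copy each order so callers cannot mutate the live state.
        return {"orders": [dict(order) for order in self.orders]}


app = MockOrderApp()
first = app.reset("kca-seed-042")
app.orders[0]["status"] = "shipped"  # simulate agent activity during a rollout
second = app.reset("kca-seed-042")
assert first == second  # same seed, same starting state
```

Because the state is rebuilt from the seed rather than patched, leftovers from a previous rollout cannot leak into the next one.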

Observable surface

The agent may see screenshots, accessibility trees, DOM state, files, or app state.

Action boundary

The environment declares allowed actions such as click, type, key, scroll, file write, or API call.
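One way to enforce the boundary is to check every proposed action against the declared set before it reaches the app. The sketch below assumes actions arrive as dictionaries with a `type` key; that shape, `validate_action`, and `ActionNotAllowed` are illustrative assumptions.

```python
# Declared action space for this environment (assumed for illustration).
ALLOWED_ACTIONS = frozenset({"click", "type", "key", "scroll"})


class ActionNotAllowed(ValueError):
    """Raised when the agent emits an action outside the declared space."""


def validate_action(action: dict, allowed: frozenset = ALLOWED_ACTIONS) -> dict:
    """Return the action unchanged if its type is declared, else raise."""
    kind = action.get("type")
    if kind not in allowed:
        raise ActionNotAllowed(
            f"action {kind!r} is outside the declared action space"
        )
    return action


accepted = validate_action({"type": "click", "x": 10, "y": 20})

try:
    validate_action({"type": "file_write", "path": "report.csv"})
    rejected = False
except ActionNotAllowed:
    rejected = True
```

Rejecting undeclared actions at the boundary, rather than silently ignoring them, keeps the event log honest about what the model attempted.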

Evidence hooks

The runtime exposes final state, event logs, screenshots, and artifacts to the grader.
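As a rough illustration of how a grader might consume these hooks, the sketch below bundles the evidence into one object and checks a single outcome. The `Evidence` class, the `grade_order_shipped` function, and the order-shipping task are hypothetical; a real grader contract would be defined by the environment.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """Hypothetical bundle of what the runtime hands to the grader."""
    final_state: dict
    event_log: list = field(default_factory=list)
    screenshots: list = field(default_factory=list)


def grade_order_shipped(evidence: Evidence, order_id: int) -> dict:
    """Pass only if the order ended up shipped and the run is auditable."""
    reasons = []
    orders = {o["order_id"]: o for o in evidence.final_state.get("orders", [])}
    order = orders.get(order_id)
    if order is None:
        reasons.append(f"order {order_id} not found in final state")
    elif order["status"] != "shipped":
        reasons.append(f"order {order_id} has status {order['status']!r}")
    if not evidence.event_log:
        reasons.append("event log is empty, so the outcome cannot be audited")
    return {"passed": not reasons, "reasons": reasons}


evidence = Evidence(
    final_state={"orders": [{"order_id": 1, "status": "shipped"}]},
    event_log=[{"step": 0, "action": "click", "target": "ship-button"}],
)
result = grade_order_shipped(evidence, order_id=1)
```

Grading against final state rather than the action sequence lets different valid trajectories pass, while the event log remains available for audits.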

Environment record

{
  "id": "korean-commerce-admin",
  "kind": "mock_app",
  "source": "real operator workflow pattern",
  "reset": {
    "strategy": "seeded_state_v1",
    "seed": "kca-seed-042"
  },
  "observation": ["screenshot", "visible_ui_state"],
  "action_space": ["click", "type", "key", "scroll"],
  "grader_hooks": ["state_snapshot", "event_log", "screenshot_sequence"]
}
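A record like this can be checked before any rollout is scheduled. The following sketch parses it and confirms the fields the sections above rely on. The set of required keys is an assumption drawn from this example, not a published schema, and `load_environment_record` is a hypothetical helper.

```python
import json

# Keys assumed to be mandatory, based on the example record above.
REQUIRED_KEYS = {"id", "kind", "reset", "observation", "action_space", "grader_hooks"}

RECORD_TEXT = """
{
  "id": "korean-commerce-admin",
  "kind": "mock_app",
  "source": "real operator workflow pattern",
  "reset": {"strategy": "seeded_state_v1", "seed": "kca-seed-042"},
  "observation": ["screenshot", "visible_ui_state"],
  "action_space": ["click", "type", "key", "scroll"],
  "grader_hooks": ["state_snapshot", "event_log", "screenshot_sequence"]
}
"""


def load_environment_record(text: str) -> dict:
    """Parse a record and fail early if reproducibility fields are missing."""
    record = json.loads(text)
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"environment record is missing keys: {sorted(missing)}")
    if "seed" not in record["reset"]:
        raise ValueError("reset.seed is required for reproducible rollouts")
    return record


record = load_environment_record(RECORD_TEXT)
```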

Versioning

Bump the environment version whenever reset state, UI layout, action semantics, grader hooks, or source workflow assumptions change. UI-only changes still count if they alter model behavior or change how brittle the verifier is.
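One possible way to notice when a version bump is due is to fingerprint the parts of the record that affect behavior and compare fingerprints between releases. This is a sketch under assumptions: `VERSIONED_FIELDS` and `environment_fingerprint` are hypothetical, and UI layout is not captured by the example record, so a layout hash would have to be added as its own field for this check to cover it.

```python
import hashlib
import json

# Fields assumed to affect rollout behavior or grading.
VERSIONED_FIELDS = ("reset", "observation", "action_space", "grader_hooks", "source")


def environment_fingerprint(record: dict) -> str:
    """Hash the version-relevant fields in a key-order-independent way."""
    relevant = {key: record.get(key) for key in VERSIONED_FIELDS}
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


base = {
    "id": "korean-commerce-admin",
    "source": "real operator workflow pattern",
    "reset": {"strategy": "seeded_state_v1", "seed": "kca-seed-042"},
    "observation": ["screenshot", "visible_ui_state"],
    "action_space": ["click", "type", "key", "scroll"],
    "grader_hooks": ["state_snapshot", "event_log", "screenshot_sequence"],
}
# Widening the action space changes action semantics, so it should register.
widened = {**base, "action_space": base["action_space"] + ["file_write"]}
# Renaming the record alone does not touch any versioned field.
renamed = {**base, "id": "kca-admin"}

needs_bump = environment_fingerprint(base) != environment_fingerprint(widened)
rename_only = environment_fingerprint(base) == environment_fingerprint(renamed)
```

A differing fingerprint is a prompt to review the change, not a substitute for judgment about whether results remain comparable.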

The environment page should let a reader inspect everything that bears on quality: source workflow, reset behavior, action space, grader contract, model results, and known failure modes.