
Create a task.

A task is the unit of work a model attempts. It should preserve the goal of the real workflow while being constrained enough for a grader to score.

Task shape

task:
  prompt: user-visible goal
  environment: runtime + seed
  constraints: forbidden mutations and required checks
  expected_state: inspectable success condition
  grader_inputs: state, trace, screenshot, artifact hooks
  difficulty_target: not too easy, not too sparse
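
As a rough illustration, the shape above could be mirrored in code as a small container. This is only a sketch: the `Task` class, its field types, the defaults, and the `missing_fields` helper are assumptions made for this example, not part of any documented API.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical container mirroring the fields in the task shape."""

    prompt: str            # user-visible goal
    environment: dict      # runtime + seed
    constraints: list      # forbidden mutations and required checks
    expected_state: dict   # inspectable success condition
    # Hooks the grader may read; the default list is an assumption.
    grader_inputs: list = field(
        default_factory=lambda: ["state", "trace", "screenshot", "artifact"]
    )
    # Free-form label; "middle" stands for "not too easy, not too sparse".
    difficulty_target: str = "middle"

    def missing_fields(self) -> list:
        """Return the names of required fields that were left empty."""
        required = ("prompt", "environment", "constraints", "expected_state")
        return [name for name in required if not getattr(self, name)]


# Illustrative values only.
draft = Task(
    prompt="",
    environment={"runtime": "sandbox", "seed": 1},
    constraints=["do not modify existing records"],
    expected_state={"report_saved": True},
)
print(draft.missing_fields())
```

Checking for empty fields before a task is used is one cheap way to catch a definition that a grader could not score.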

Write the prompt

Name the user goal

Tell the model what work to complete, not how the internal verifier is implemented.

Include constraints

State what must not change, what must be saved, and which shortcuts are invalid.
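
One way to keep prompt constraints honest is to make each of them checkable against the environment state. The sketch below assumes state can be captured as a flat dictionary before and after the attempt; `check_constraints` and its parameter names are hypothetical.

```python
def check_constraints(initial, final, must_not_change, must_be_saved):
    """Compare two state snapshots and list every constraint violation.

    initial, final   -- dict snapshots of the environment state (assumed flat)
    must_not_change  -- keys whose values have to be identical in both snapshots
    must_be_saved    -- keys that have to exist in the final snapshot
    """
    violations = []
    for key in must_not_change:
        if initial.get(key) != final.get(key):
            violations.append(f"forbidden mutation: {key}")
    for key in must_be_saved:
        if key not in final:
            violations.append(f"missing required output: {key}")
    return violations


# Illustrative snapshots: the model changed a protected setting and
# never saved the report it was asked to produce.
before = {"billing_plan": "team", "draft": "v1"}
after = {"billing_plan": "free", "draft": "v2"}
print(check_constraints(before, after, ["billing_plan"], ["report"]))
```

If a constraint cannot be expressed as a check like this, it is probably too vague to state in the prompt.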

Avoid hidden ambiguity

If two final states could both look reasonable, split the task or make the expected state explicit.
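
A simple way to probe for this kind of ambiguity is to write down the plausible final states and count how many the expected state accepts. The sketch below assumes states are flat dictionaries and that the expected state is a partial match on keys; `matches` and `accepted_states` are hypothetical helpers.

```python
def matches(expected, state):
    """True when every key in the expected state has the same value in state."""
    return all(state.get(key) == value for key, value in expected.items())


def accepted_states(expected, candidates):
    """Return the candidate final states that the expected state would accept."""
    return [state for state in candidates if matches(expected, state)]


# Two plausible outcomes of "clean up the duplicate contact".
candidates = [
    {"contact_count": 1, "kept": "newest"},
    {"contact_count": 1, "kept": "oldest"},
]

loose = {"contact_count": 1}
explicit = {"contact_count": 1, "kept": "newest"}

print(len(accepted_states(loose, candidates)))     # both outcomes pass
print(len(accepted_states(explicit, candidates)))  # only one outcome passes
```

When more than one meaningfully different state passes, either the expected state needs another field or the task should be split.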

Difficulty target

The useful middle is where strong models fail in repeated patterns but still reach meaningful intermediate states. A task that is too easy gives little learning signal. A task that is too sparse gives the grader almost nothing to reward.
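
To make the target concrete, a task can be placed in one of these bands from the scores of repeated attempts. This is a minimal sketch: it assumes the grader returns a score between 0 and 1 per attempt, and the thresholds and the `classify_difficulty` name are arbitrary choices made for illustration, not recommended values.

```python
from statistics import mean


def classify_difficulty(scores, pass_mark=1.0, easy_rate=0.9, sparse_credit=0.1):
    """Place a task in a difficulty band from repeated attempt scores.

    scores        -- grader scores in [0, 1], one per attempt (assumed)
    pass_mark     -- score that counts as a full pass
    easy_rate     -- pass rate at or above which the task is "too easy"
    sparse_credit -- mean score at or below which the task is "too sparse"
    """
    if not scores:
        raise ValueError("need at least one attempt score")
    pass_rate = sum(score >= pass_mark for score in scores) / len(scores)
    if pass_rate >= easy_rate:
        return "too easy"
    if mean(scores) <= sparse_credit:
        return "too sparse"
    return "useful middle"


print(classify_difficulty([1.0, 1.0, 1.0, 1.0]))   # every attempt passes
print(classify_difficulty([0.0, 0.0, 0.1, 0.0]))   # almost no partial credit
print(classify_difficulty([0.4, 0.6, 0.5, 1.0]))   # mostly partial progress
```

The sparse check relies on partial credit, so it is only informative when the grader rewards intermediate states rather than scoring pass or fail.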