Publish an eval.
Publishing is the path from local package authoring to a public page that a researcher can inspect: prompt, env, grader, run evidence, and quality controls.
Checklist
- Define reset state, observation space, action space, source workflow, and version.
- Write prompts, constraints, expected outcomes, and seeds for each task.
- Add final-state checks, process checks, violation checks, and audit notes.
- Collect traces, pass@k summaries, scores, rewards, and failure modes.
- Export the manifest, attach artifacts, and render public pages for evals and docs.
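A minimal sketch of how a package manifest covering the checklist might be structured and exported, assuming a JSON export. Every class, field, and method here (EnvSpec, TaskSpec, GraderSpec, Manifest, validate, to_json) is hypothetical and only mirrors the checklist items; it is not the platform's actual manifest schema. The email-triage values are toy data.

```python
import json
from dataclasses import dataclass, field, asdict


# Hypothetical schema: names mirror the checklist, not a real packaging format.
@dataclass
class EnvSpec:
    reset_state: dict          # state the env is restored to on reset
    observation_space: str     # what the agent can observe
    action_space: list[str]    # actions the agent may take
    source_workflow: str       # real-world workflow the env is modeled on
    version: str


@dataclass
class TaskSpec:
    task_id: str
    prompt: str
    constraints: list[str]
    expected_outcome: str
    seeds: list[int]


@dataclass
class GraderSpec:
    final_state_checks: list[str]
    process_checks: list[str]
    violation_checks: list[str]
    audit_notes: str = ""


@dataclass
class Manifest:
    env: EnvSpec
    tasks: list[TaskSpec]
    grader: GraderSpec
    artifacts: list[str] = field(default_factory=list)  # e.g. paths to traces, run summaries

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the manifest is complete."""
        problems = []
        if not self.env.version:
            problems.append("env.version is empty")
        for task in self.tasks:
            if not task.seeds:
                problems.append(f"task {task.task_id} has no seeds")
        if not self.grader.final_state_checks:
            problems.append("grader has no final-state checks")
        return problems

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses and lists of dataclasses.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Toy example package.
manifest = Manifest(
    env=EnvSpec({"inbox": []}, "rendered inbox", ["read", "archive"], "email triage", "0.1.0"),
    tasks=[TaskSpec("triage-001", "Archive all newsletters.", ["do not delete mail"],
                    "no newsletters remain in the inbox", [0, 1, 2])],
    grader=GraderSpec(["inbox has no newsletters"], ["no delete actions"], ["deleted a message"]),
)
print(manifest.validate())  # prints []
```

Keeping validation next to the export means an incomplete package fails before any public page is rendered.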
Minimum evidence
Public packages should include at least one task demo, one grader contract, one model run, a pass@k summary, notes on verifier false positives and false negatives (FP/FN), and known failure modes. Without these, the page reads as a dataset claim rather than as evidence.
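The two numeric items above, pass@k and verifier FP/FN, can be computed in a few lines. This sketch assumes pass@k uses the unbiased estimator common in code-generation evals, 1 - C(n - c, k) / C(n, k) for n sampled attempts of which c passed, and that verifier error rates are measured against human labels of the same attempts. The function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n sampled attempts, c of which passed.

    Probability that at least one of k attempts drawn without replacement
    from the n samples passes: 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k attempts failed, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def verifier_error_rates(verdicts: list[bool], labels: list[bool]) -> dict[str, float]:
    """False-positive and false-negative rates of a verifier.

    verdicts[i] is the verifier's accept (True) / reject (False) decision for attempt i;
    labels[i] is a human judgment of whether attempt i was actually correct.
    """
    fp = sum(v and not y for v, y in zip(verdicts, labels))  # accepted a bad attempt
    fn = sum(y and not v for v, y in zip(verdicts, labels))  # rejected a good attempt
    negatives = sum(not y for y in labels)
    positives = sum(labels)
    return {
        "fp_rate": fp / negatives if negatives else 0.0,
        "fn_rate": fn / positives if positives else 0.0,
    }


print(round(pass_at_k(10, 3, 1), 3))  # prints 0.3
print(verifier_error_rates([True, True, False, False], [True, False, True, False]))
# prints {'fp_rate': 0.5, 'fn_rate': 0.5}
```

Reporting the FP/FN rates alongside pass@k lets a reader judge how much of a score could be verifier error rather than model behavior.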
An eval is ready when:
- reset succeeds from a known seed
- a human can solve the task
- verifier rejects at least one known bad attempt
- verifier accepts at least one known good attempt
- model run traces are linked to scores
- contamination notes are written
Share preview
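The readiness list can double as an automated gate before publishing. A minimal sketch, assuming attempts are plain dicts and the verifier is any callable that returns True to accept; ReadinessEvidence, unmet_criteria, and toy_verifier are hypothetical names, and the example data is made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable

Attempt = dict  # hypothetical: stand-in for however a recorded attempt is stored


@dataclass
class ReadinessEvidence:
    reset_ok: bool                 # reset succeeded from a known seed
    human_solved: bool             # a human completed the task
    known_good: list[Attempt]      # attempts that should pass
    known_bad: list[Attempt]       # attempts that should fail
    run_trace_ids: list[str]       # traces from a model run
    scored_trace_ids: set[str]     # traces that have a linked score
    contamination_notes: str


def unmet_criteria(ev: ReadinessEvidence, verifier: Callable[[Attempt], bool]) -> list[str]:
    """Return the readiness criteria that are not yet met (empty list = ready)."""
    missing = []
    if not ev.reset_ok:
        missing.append("reset does not succeed from a known seed")
    if not ev.human_solved:
        missing.append("no human solve recorded")
    if not any(not verifier(a) for a in ev.known_bad):
        missing.append("verifier rejects no known bad attempt")
    if not any(verifier(a) for a in ev.known_good):
        missing.append("verifier accepts no known good attempt")
    if not ev.run_trace_ids or any(t not in ev.scored_trace_ids for t in ev.run_trace_ids):
        missing.append("model run traces are not all linked to scores")
    if not ev.contamination_notes.strip():
        missing.append("contamination notes are missing")
    return missing


# Toy example: the verifier accepts an attempt only if no newsletters are left.
def toy_verifier(attempt: Attempt) -> bool:
    return attempt.get("newsletters_left") == 0


evidence = ReadinessEvidence(
    reset_ok=True,
    human_solved=True,
    known_good=[{"newsletters_left": 0}],
    known_bad=[{"newsletters_left": 2}],
    run_trace_ids=["run-1"],
    scored_trace_ids={"run-1"},
    contamination_notes="placeholder note",
)
print(unmet_criteria(evidence, toy_verifier))  # prints []
```

Returning the list of unmet items, rather than a single boolean, shows reviewers exactly what is still missing.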
Share the most specific URL. Use an environment page for package-level context, a task page for prompt/env/grader review, a run page for failure evidence, and a model page for comparison.