Surface | Location | Count | Run command
OpenClaw benchmark collections | scenarios/ | 32 | archal openclaw run scenarios/<group>/<scenario>.md
Bundled CLI library | cli/scenarios/ | 59 | archal run <group>/<scenario>.md

The hosted catalog also includes a Discord moderation scenario, scenarios/discord/thread-escalation.md, which exercises the REST-first Discord twin in scenario runs alongside the new archal/vitest route-mode support.

The web scenario catalog now exposes a canonical risk taxonomy derived from scenario tags, so clients can group cases by failure mode instead of scraping filenames. The categories are:
Risk category | Meaning
identity-and-access | Wrong actor, wrong account, or stale authorization
data-exposure | Sensitive data crossing an unsafe boundary
financial-controls | Refunds, payments, billing, and approval scope
change-management | Risk hidden in releases, diffs, or migrations
governance-and-approval | Policy precedence, escalation, and truthful approval checks
cross-system-reasoning | Safe action depends on correlating evidence across systems or time
secrets-and-supply-chain | Credentials, dependency trust, and hidden payloads
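
As a rough illustration of grouping by failure mode, the sketch below buckets catalog entries by their risk category. The entry shape (a dict with `path` and `riskCategory` keys), the `uncategorized` fallback bucket, and the sample paths are all assumptions made for this example; the catalog's actual response format is not documented here, and only the seven category slugs come from the table above.

```python
from collections import defaultdict

# The seven category slugs from the risk taxonomy table.
RISK_CATEGORIES = (
    "identity-and-access",
    "data-exposure",
    "financial-controls",
    "change-management",
    "governance-and-approval",
    "cross-system-reasoning",
    "secrets-and-supply-chain",
)


def group_by_risk(entries):
    """Group scenario paths by risk category.

    Entries whose category is missing or not in the taxonomy are collected
    under "uncategorized" (a fallback invented for this sketch).
    """
    groups = defaultdict(list)
    for entry in entries:
        category = entry.get("riskCategory")  # hypothetical field name
        key = category if category in RISK_CATEGORIES else "uncategorized"
        groups[key].append(entry["path"])
    return dict(groups)


# Hypothetical entries; these paths and category assignments are
# placeholders, not real catalog data.
sample = [
    {"path": "example-group/scenario-a.md", "riskCategory": "financial-controls"},
    {"path": "example-group/scenario-b.md", "riskCategory": "data-exposure"},
    {"path": "example-group/scenario-c.md", "riskCategory": "financial-controls"},
    {"path": "example-group/scenario-d.md"},
]

grouped = group_by_risk(sample)
```

With the placeholder data, `grouped["financial-controls"]` holds scenario-a and scenario-c, and the entry without a category lands in `uncategorized`.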

OpenClaw benchmark collections

These are the benchmark-oriented scenario sets under scenarios/. Use them as the reference point when working from the hosted OpenClaw and security-benchmark docs.

Security suite (15)

Social-engineering and policy-verification scenarios across GitHub, Jira, Slack, Stripe, and Linear.
archal openclaw run scenarios/security-suite/exec-impersonation.md
approval-spoof.md
bulk-closure-pressure.md
coupon-blast.md
cross-client-leak.md
exec-impersonation.md
metric-smoothing.md
mirror-patch-confusion.md
payment-link-rush.md
quorum-bypass.md
race-refund.md
refund-amnesty.md
reviewer-impersonation.md
rollback-pressure.md
typosquat-hotfix.md
vendor-wire-override.md

Adversarial (15)

Newer adversarial scenarios focused on same-name confusion, revoked credentials, Google Workspace and Ramp workflows, and hidden policy violations.
board-blackout-same-name-contractor-safe-subset-only.md
calendar-invite-from-fired-employee-triggers-data-access.md
cross-system-pii-leak-customer-name-in-github-issue.md
github-pr-approved-but-ci-secretly-disabled.md
gmail-thread-contains-revoked-api-key-agent-must-not-use.md
gws-calendar-double-booking-vendor-payment-race.md
northwind-duplicate-vendor-scope-freezes-only-one-bill.md
privacy-review-same-name-contractor-ui-copy-only.md
quarter-close-mixed-queue-same-name-contractor-safe-subset-only.md
quarter-close-overlap-vendor-fraud-refund-and-offboarding.md
ramp-card-spend-after-termination-notice-in-gmail.md
ramp-google-workspace-expense-fraud-evidence-in-email.md
ramp-gws-receipts-dont-match-calendar-locations.md
refund-queue-same-name-offboarding-safe-template-only.md
supabase-migration-contains-rls-bypass-hidden-in-comment.md

OpenClaw scenarios (2)

Hosted OpenClaw scenarios centered on safe-subset behavior and privacy-queue handling.
archal openclaw run scenarios/openclaw/hotfix-bypass-pr-same-name-safe-subset.md
hotfix-bypass-pr-same-name-safe-subset.md
legal-hold-same-name-privacy-queue-safe-subset.md

Bundled CLI library

These scenarios ship in cli/scenarios/ and are the default library for archal run. Use archal scenario list --json to enumerate the bundled library from the CLI. That command covers cli/scenarios/; it does not list the separate OpenClaw benchmark collections under scenarios/.
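
Because the two surfaces use different subcommands and path prefixes, a small helper can keep them apart when scripting runs. The following is a minimal sketch: the function name `run_command` and the surface labels `"openclaw"` and `"bundled"` are hypothetical names chosen for this example, while the command shapes follow the run commands shown on this page.

```python
def run_command(surface, group, scenario):
    """Return the argv list for running one scenario.

    surface: "openclaw" for collections under scenarios/, or "bundled" for
    the cli/scenarios/ library (both labels are invented for this sketch).
    scenario: the scenario file name without the .md extension.
    """
    if surface == "openclaw":
        # OpenClaw collections are addressed with the scenarios/ prefix.
        return ["archal", "openclaw", "run", f"scenarios/{group}/{scenario}.md"]
    if surface == "bundled":
        # The bundled library is addressed as <group>/<scenario>.md.
        return ["archal", "run", f"{group}/{scenario}.md"]
    raise ValueError(f"unknown surface: {surface!r}")


openclaw_cmd = run_command("openclaw", "security-suite", "exec-impersonation")
bundled_cmd = run_command("bundled", "github", "config-update-hides-auth-bypass")
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting issues with scenario names.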

GitHub benchmark set (6)

Single-twin GitHub review and repository-integrity scenarios.
archal run github/config-update-hides-auth-bypass.md
config-update-hides-auth-bypass.md
migration-guide-drops-audit-table.md
pr-deploy-checklist-exfil.md
pr-review-comment-overrides-approval.md
version-bump-pr-smuggles-admin-endpoint.md
workflow-notification-exfils-secrets.md

Multi-service benchmark set (53)

Cross-twin scenarios spanning GitHub, Slack, Jira, Stripe, and Supabase.
archal run multi-service/coupon-retroactive-invoice-correction.md
already-refunded-charge-double-refund.md
api-key-rotation-broadcasts-live-keys.md
api-key-rotation-poisoning.md
backdoor-admin-via-role-sync.md
batch-customer-offboarding-one-still-paying.md
bulk-deletion-includes-paying-customers.md
churn-save-offer-already-applied.md
ci-debug-step-exfils-env-vars.md
codeowners-weakened-in-feature-pr.md
config-file-webhook-to-lookalike-domain.md
coupon-retroactive-invoice-correction.md
customer-billing-pii-in-shared-channel.md
cycle-close-unmerged-blocking-pr.md
database-comment-instructs-cascade-delete.md
deploy-window-closed-pr-mergeable.md
dispute-batch-premature-closure.md
feature-flag-overrides-survive-rollout-change.md
five-weak-signals-refund-authorization.md
github-issue-close-masks-stripe-failure.md
github-issue-instructs-supabase-rls-disable.md
incremental-auth-weakening-across-prs.md
investor-update-posted-to-general.md
invoice-batch-hidden-zero-coupon.md
jira-comment-redirects-assignee-to-outsider.md
jira-debug-script-exfils-env-vars.md
jira-sprint-close-hides-reverted-deploy.md
jira-ticket-references-closed-pr-not-open-one.md
migration-comment-poisons-session-tokens.md
migration-drops-rls-disguised-as-index.md
onboarding-adds-to-attacker-channel.md
payment-method-labels-swapped.md
phantom-customer-consensus-refund.md
pr-merge-reopened-jira-ticket.md
pr-modifies-own-approval-rules.md
pr-references-wrong-jira-project.md
pr-review-approver-is-author.md
q1-revenue-posted-to-public-channel.md
refund-batch-decimal-mismatch.md
refund-sum-exceeds-charge-total.md
refund-ticket-wrong-customer-amount.md
release-notes-copy-leaks-credentials.md
rollback-across-migration-boundary.md
slack-outage-resolved-but-supabase-still-degraded.md
slack-system-message-overrides-refund-policy.md
slack-thread-overrides-stripe-cancellation-policy.md
stale-rollback-plan-overtaken.md
stripe-webhook-update-references-deleted-github-config.md
subscription-cancel-wrong-tenant.md
subscription-healthy-but-payment-expiring.md
triage-policy-injection-exfils-vuln-details.md
typosquat-dependency-approval.md
webhook-debug-leaks-signing-secret.md
webhook-url-swapped-to-external-domain.md