Merge commit '204af77596345c120e25ce9d433dad0676a59b37'

map branch recovery verification evidence
Record why the G005 branch-recovery work satisfies the roadmap pinpoints without touching leader-owned Ultragoal state.

Constraint: Task 2 requested ROADMAP.md/plan pinpoint mapping and explicitly forbids .omx/ultragoal mutation.
Rejected: leader-only mailbox note | the task prefers a repo-local docs/g005 verification map when unclaimed and absent.
Confidence: high
Scope-risk: narrow
Directive: Keep this map evidence-only; do not treat it as a substitute for leader Ultragoal checkpoints.
Tested: documentation-only map cross-checked against ROADMAP.md, prd.json, and task-1 verification output.
Not-tested: no code tests rerun after documentation-only commit.
2026-07-04 00:56:26 +02:00 · 2026-05-14 21:43:23 +09:00 · 2026-05-14 18:40:16 +09:00 · 2026-05-14 18:38:22 +09:00 · 2026-05-14 18:33:51 +09:00 · 2026-05-14 18:33:48 +09:00
10 changed files with 899 additions and 52 deletions
@@ -0,0 +1,40 @@
+# G005 Branch Recovery Verification Map
+
+Scope: worker-1 follow-up map for G005 branch/test awareness and recovery. This file intentionally does not mutate leader-owned `.omx/ultragoal` state.
+
+## Covered ROADMAP / PRD pinpoints
+
+- `ROADMAP.md:912-921` — Phase 3 §7 stale-branch detection before broad verification: broad workspace test commands are preflighted before execution, stale/diverged branches emit `branch.stale_against_main`, and targeted tests bypass the broad-test gate.
+- `ROADMAP.md:922-933` — Phase 3 §8 recovery recipes: stale-branch recovery remains represented by the `stale_branch` recipe, with one automatic attempt before escalation.
+- `ROADMAP.md:935-949` — Phase 3 §8.5 recovery attempt ledger: `RecoveryContext` now exposes ledger entries with recipe id, attempt count, state, started/finished markers, last failure summary, and escalation reason.
+- `ROADMAP.md:951-970` — Phase 3 §9 green-ness / hung-test reporting: timed-out test commands now classify as `test.hung` with structured provenance instead of generic timeout.
+- `prd.json:37-44` — US-003 stale-branch detection before broad verification: verified through the `workspace_test_branch_preflight` broad-test block and targeted-test bypass tests.
+- `prd.json:50-57` — US-004 recovery recipes with ledger: verified through recovery ledger unit coverage and serialization-compatible recovery structs.
+
+## Implementation anchors
+
+- `rust/crates/runtime/src/stale_branch.rs` — existing branch freshness model and policy actions for fresh, stale, and diverged branches.
+- `rust/crates/tools/src/lib.rs` — `workspace_test_branch_preflight`, `branch_divergence_output`, Bash/PowerShell broad-test gating, and `test.hung` structured timeout provenance on tool-shell timeouts.
+- `rust/crates/runtime/src/recovery_recipes.rs` — recovery recipes plus `RecoveryLedgerEntry` / `RecoveryAttemptState` ledger surface.
+- `rust/crates/runtime/src/bash.rs` — runtime Bash timeout classification and structured provenance for hung test commands.
+- `rust/crates/runtime/src/lib.rs` — public exports for the recovery ledger types.
+
+## Verification evidence
+
+- `cargo test -p runtime` → PASS: 538 unit tests, 2 G004 conformance tests, 12 integration tests, and doctests passed.
+- `cargo test -p tools bash_tool_classifies_test_timeout_as_hung_with_provenance -- --nocapture` → PASS.
+- `cargo test -p tools bash_workspace_tests_are_blocked_when_branch_is_behind_main -- --nocapture` → PASS.
+- `cargo test -p tools bash_targeted_tests_skip_branch_preflight -- --nocapture` → PASS.
+- `cargo check -p runtime -p tools` → PASS.
+- `cargo clippy -p runtime --all-targets -- -D warnings` → PASS.
+- `cargo clippy -p tools --lib --no-deps -- -D warnings` → PASS.
+
+## Known unresolved / out-of-scope items
+
+- Full `cargo test -p tools` is still red on six permission-enforcer expectation tests unrelated to G005 branch freshness, recovery ledger, or hung-test classification. The failing tests assert old permission wording/read-only behavior and pre-existed this follow-up scope.
+- ROADMAP stale-base JSON/doctor/status pinpoints remain broader CLI diagnostic-surface work, especially `ROADMAP.md:2425-2489`, `ROADMAP.md:4346-4431`, and `ROADMAP.md:5061-5086`. They are related to branch freshness, but task 1 only required the broad-test freshness gate and narrow reporting surfaces.
+- No `.omx/ultragoal` files were changed; leader-owned Ultragoal checkpointing remains outside worker scope.
+
+## Delegation evidence
+
+Subagent spawn evidence: 1 (repository map probe `019e25d5-9be9-7193-8a33-f21450beb62c`). The probe was spawned before further serial task-2 mapping, per contract, but failed with 429 Too Many Requests, so direct repo evidence was integrated instead.
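A minimal sketch, not part of the commit, of the broad-test gate that the map describes under its covered pinpoints. The real `workspace_test_branch_preflight` in `rust/crates/tools/src/lib.rs` is not shown in this diff, so `BranchFreshness`, `Preflight`, `is_broad_test`, `preflight`, and the rule for what counts as a broad command are all assumptions made for illustration. Only the `branch.stale_against_main` event name comes from the map.

```rust
/// Hypothetical freshness states for the current branch relative to main.
#[derive(Debug, Clone, PartialEq, Eq)]
enum BranchFreshness {
    Fresh,
    Stale { behind: u32 },
    Diverged { ahead: u32, behind: u32 },
}

/// Hypothetical decision returned before a test command is executed.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Preflight {
    Allow,
    Block { event: &'static str, reason: String },
}

/// Assumed rule: `cargo test` is broad when it passes `--workspace` or gives
/// cargo no arguments at all; anything narrower counts as a targeted run.
fn is_broad_test(command: &str) -> bool {
    let tokens: Vec<&str> = command.split_whitespace().collect();
    if tokens.len() < 2 || tokens[0] != "cargo" || tokens[1] != "test" {
        return false;
    }
    // Arguments after `--` go to the test binary, not to cargo.
    let cargo_args: Vec<&str> = tokens[2..]
        .iter()
        .copied()
        .take_while(|arg| *arg != "--")
        .collect();
    cargo_args.is_empty() || cargo_args.contains(&"--workspace")
}

/// Blocks broad test runs on stale or diverged branches; targeted runs and
/// fresh branches pass through untouched.
fn preflight(command: &str, freshness: &BranchFreshness) -> Preflight {
    if !is_broad_test(command) {
        return Preflight::Allow;
    }
    let reason = match freshness {
        BranchFreshness::Fresh => return Preflight::Allow,
        BranchFreshness::Stale { behind } => {
            format!("branch is {behind} commit(s) behind main")
        }
        BranchFreshness::Diverged { ahead, behind } => {
            format!("branch diverged from main ({ahead} ahead, {behind} behind)")
        }
    };
    Preflight::Block {
        event: "branch.stale_against_main",
        reason,
    }
}
```

Under these assumptions, `cargo test --workspace` on a stale branch is blocked, while a filtered command such as `cargo test -p tools bash_targeted_tests_skip_branch_preflight` is allowed regardless of freshness.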
@@ -4,6 +4,7 @@ use std::process::{Command, Stdio};
 use std::time::Duration;

 use serde::{Deserialize, Serialize};
+use serde_json::json;
 use tokio::process::Command as TokioCommand;
 use tokio::runtime::Builder;
 use tokio::time::timeout;
@@ -176,27 +177,10 @@ async fn execute_bash_async(
    let mut command = prepare_tokio_command(&input.command, &cwd, &sandbox_status, true);

    let output_result = if let Some(timeout_ms) = input.timeout {
-        match timeout(Duration::from_millis(timeout_ms), command.output()).await {
-            Ok(result) => (result?, false),
-            Err(_) => {
-                return Ok(BashCommandOutput {
-                    stdout: String::new(),
-                    stderr: format!("Command exceeded timeout of {timeout_ms} ms"),
-                    raw_output_path: None,
-                    interrupted: true,
-                    is_image: None,
-                    background_task_id: None,
-                    backgrounded_by_user: None,
-                    assistant_auto_backgrounded: None,
-                    dangerously_disable_sandbox: input.dangerously_disable_sandbox,
-                    return_code_interpretation: Some(String::from("timeout")),
-                    no_output_expected: Some(true),
-                    structured_content: None,
-                    persisted_output_path: None,
-                    persisted_output_size: None,
-                    sandbox_status: Some(sandbox_status),
-                });
-            }
+        if let Ok(result) = timeout(Duration::from_millis(timeout_ms), command.output()).await {
+            (result?, false)
+        } else {
+            return Ok(timeout_output(&input, timeout_ms, sandbox_status));
        }
    } else {
        (command.output().await?, false)
@@ -233,6 +217,67 @@ async fn execute_bash_async(
    })
 }

+fn timeout_output(
+    input: &BashCommandInput,
+    timeout_ms: u64,
+    sandbox_status: SandboxStatus,
+) -> BashCommandOutput {
+    let is_test = is_test_command(&input.command);
+    let return_code_interpretation = if is_test { "test.hung" } else { "timeout" };
+    BashCommandOutput {
+        stdout: String::new(),
+        stderr: format!("Command exceeded timeout of {timeout_ms} ms"),
+        raw_output_path: None,
+        interrupted: true,
+        is_image: None,
+        background_task_id: None,
+        backgrounded_by_user: None,
+        assistant_auto_backgrounded: None,
+        dangerously_disable_sandbox: input.dangerously_disable_sandbox,
+        return_code_interpretation: Some(String::from(return_code_interpretation)),
+        no_output_expected: Some(true),
+        structured_content: Some(vec![test_timeout_provenance(
+            &input.command,
+            timeout_ms,
+            is_test,
+        )]),
+        persisted_output_path: None,
+        persisted_output_size: None,
+        sandbox_status: Some(sandbox_status),
+    }
+}
+
+fn is_test_command(command: &str) -> bool {
+    let normalized = command
+        .split_whitespace()
+        .collect::<Vec<_>>()
+        .join(" ")
+        .to_ascii_lowercase();
+    normalized.contains("cargo test")
+        || normalized.contains("cargo nextest")
+        || normalized.contains("npm test")
+        || normalized.contains("pnpm test")
+        || normalized.contains("yarn test")
+        || normalized.contains("pytest")
+}
+
+fn test_timeout_provenance(
+    command: &str,
+    timeout_ms: u64,
+    classified_as_test_hang: bool,
+) -> serde_json::Value {
+    json!({
+        "event": if classified_as_test_hang { "test.hung" } else { "command.timeout" },
+        "failureClass": if classified_as_test_hang { "test_hang" } else { "timeout" },
+        "data": {
+            "command": command,
+            "timeoutMs": timeout_ms,
+            "provenance": "bash.timeout",
+            "classification": if classified_as_test_hang { "test.hung" } else { "timeout" }
+        }
+    })
+}
+
 fn sandbox_status_for_input(input: &BashCommandInput, cwd: &std::path::Path) -> SandboxStatus {
    let config = ConfigLoader::default_for(cwd).load().map_or_else(
        |_| SandboxConfig::default(),
@@ -349,6 +394,31 @@ mod tests {

        assert!(!output.sandbox_status.expect("sandbox status").enabled);
    }
+
+    #[test]
+    fn timed_out_test_command_is_classified_as_hung_test_with_provenance() {
+        let output = execute_bash(BashCommandInput {
+            command: String::from("sleep 1 # cargo test slow_case"),
+            timeout: Some(1),
+            description: None,
+            run_in_background: Some(false),
+            dangerously_disable_sandbox: Some(false),
+            namespace_restrictions: Some(false),
+            isolate_network: Some(false),
+            filesystem_mode: Some(FilesystemIsolationMode::WorkspaceOnly),
+            allowed_mounts: None,
+        })
+        .expect("bash command should return structured timeout");
+
+        assert!(output.interrupted);
+        assert_eq!(
+            output.return_code_interpretation.as_deref(),
+            Some("test.hung")
+        );
+        let structured = output.structured_content.expect("structured content");
+        assert_eq!(structured[0]["event"], "test.hung");
+        assert_eq!(structured[0]["data"]["provenance"], "bash.timeout");
+    }
 }

 /// Maximum output bytes before truncation (16 KiB, matching upstream).
@@ -27,19 +27,38 @@ impl std::fmt::Display for GreenLevel {
    }
 }

-#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
 pub struct GreenContract {
    pub required_level: GreenLevel,
+    pub requirements: Vec<GreenContractRequirement>,
+    pub block_known_flakes: bool,
 }

 impl GreenContract {
    #[must_use]
    pub fn new(required_level: GreenLevel) -> Self {
-        Self { required_level }
+        Self {
+            required_level,
+            requirements: Vec::new(),
+            block_known_flakes: false,
+        }
    }

    #[must_use]
-    pub fn evaluate(self, observed_level: Option<GreenLevel>) -> GreenContractOutcome {
+    pub fn merge_ready(required_level: GreenLevel) -> Self {
+        Self {
+            required_level,
+            requirements: vec![
+                GreenContractRequirement::TestCommandProvenance,
+                GreenContractRequirement::BaseBranchFreshness,
+                GreenContractRequirement::RecoveryAttemptContext,
+            ],
+            block_known_flakes: true,
+        }
+    }
+
+    #[must_use]
+    pub fn evaluate(&self, observed_level: Option<GreenLevel>) -> GreenContractOutcome {
        match observed_level {
            Some(level) if level >= self.required_level => GreenContractOutcome::Satisfied {
                required_level: self.required_level,
@@ -53,11 +72,170 @@ impl GreenContract {
    }

    #[must_use]
-    pub fn is_satisfied_by(self, observed_level: GreenLevel) -> bool {
+    pub fn evaluate_evidence(&self, evidence: &GreenEvidence) -> GreenEvidenceOutcome {
+        let mut missing = Vec::new();
+        let mut blocking_flakes = Vec::new();
+
+        if evidence.observed_level < self.required_level {
+            missing.push(GreenContractRequirement::RequiredLevel);
+        }
+
+        for requirement in &self.requirements {
+            match requirement {
+                GreenContractRequirement::TestCommandProvenance
+                    if !evidence.has_passing_test_command() =>
+                {
+                    missing.push(*requirement);
+                }
+                GreenContractRequirement::BaseBranchFreshness if !evidence.base_branch_fresh => {
+                    missing.push(*requirement);
+                }
+                GreenContractRequirement::RecoveryAttemptContext
+                    if !evidence.recovery_attempt_context_recorded =>
+                {
+                    missing.push(*requirement);
+                }
+                _ => {}
+            }
+        }
+
+        if self.block_known_flakes {
+            blocking_flakes = evidence
+                .known_flakes
+                .iter()
+                .filter(|flake| flake.blocks_green)
+                .cloned()
+                .collect();
+        }
+
+        if missing.is_empty() && blocking_flakes.is_empty() {
+            GreenEvidenceOutcome::Satisfied {
+                required_level: self.required_level,
+                observed_level: evidence.observed_level,
+            }
+        } else {
+            GreenEvidenceOutcome::Unsatisfied {
+                required_level: self.required_level,
+                observed_level: evidence.observed_level,
+                missing,
+                blocking_flakes,
+            }
+        }
+    }
+
+    #[must_use]
+    pub fn is_satisfied_by(&self, observed_level: GreenLevel) -> bool {
        observed_level >= self.required_level
    }
 }

+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+pub struct GreenEvidence {
+    pub observed_level: GreenLevel,
+    pub test_commands: Vec<TestCommandProvenance>,
+    pub base_branch_fresh: bool,
+    pub known_flakes: Vec<KnownFlake>,
+    pub recovery_attempt_context_recorded: bool,
+}
+
+impl GreenEvidence {
+    #[must_use]
+    pub fn new(observed_level: GreenLevel) -> Self {
+        Self {
+            observed_level,
+            test_commands: Vec::new(),
+            base_branch_fresh: false,
+            known_flakes: Vec::new(),
+            recovery_attempt_context_recorded: false,
+        }
+    }
+
+    #[must_use]
+    pub fn with_test_command(mut self, command: impl Into<String>, exit_code: i32) -> Self {
+        self.test_commands.push(TestCommandProvenance {
+            command: command.into(),
+            exit_code,
+        });
+        self
+    }
+
+    #[must_use]
+    pub fn with_base_branch_fresh(mut self, is_fresh: bool) -> Self {
+        self.base_branch_fresh = is_fresh;
+        self
+    }
+
+    #[must_use]
+    pub fn with_known_flake(mut self, test_name: impl Into<String>, blocks_green: bool) -> Self {
+        self.known_flakes.push(KnownFlake {
+            test_name: test_name.into(),
+            blocks_green,
+        });
+        self
+    }
+
+    #[must_use]
+    pub fn with_recovery_attempt_context(mut self, recorded: bool) -> Self {
+        self.recovery_attempt_context_recorded = recorded;
+        self
+    }
+
+    #[must_use]
+    pub fn has_passing_test_command(&self) -> bool {
+        self.test_commands.iter().any(TestCommandProvenance::passed)
+    }
+}
+
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+pub struct TestCommandProvenance {
+    pub command: String,
+    pub exit_code: i32,
+}
+
+impl TestCommandProvenance {
+    #[must_use]
+    pub fn passed(&self) -> bool {
+        self.exit_code == 0 && !self.command.trim().is_empty()
+    }
+}
+
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+pub struct KnownFlake {
+    pub test_name: String,
+    pub blocks_green: bool,
+}
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "snake_case")]
+pub enum GreenContractRequirement {
+    RequiredLevel,
+    TestCommandProvenance,
+    BaseBranchFreshness,
+    RecoveryAttemptContext,
+}
+
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(tag = "outcome", rename_all = "snake_case")]
+pub enum GreenEvidenceOutcome {
+    Satisfied {
+        required_level: GreenLevel,
+        observed_level: GreenLevel,
+    },
+    Unsatisfied {
+        required_level: GreenLevel,
+        observed_level: GreenLevel,
+        missing: Vec<GreenContractRequirement>,
+        blocking_flakes: Vec<KnownFlake>,
+    },
+}
+
+impl GreenEvidenceOutcome {
+    #[must_use]
+    pub fn is_satisfied(&self) -> bool {
+        matches!(self, Self::Satisfied { .. })
+    }
+}
+
 #[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
 #[serde(tag = "outcome", rename_all = "snake_case")]
 pub enum GreenContractOutcome {
@@ -149,4 +327,83 @@ mod tests {
            }
        );
    }
+    #[test]
+    fn merge_ready_contract_requires_provenance_beyond_test_level() {
+        // given
+        let contract = GreenContract::merge_ready(GreenLevel::Workspace);
+        let evidence = GreenEvidence::new(GreenLevel::Workspace)
+            .with_test_command("cargo test --manifest-path rust/Cargo.toml", 0);
+
+        // when
+        let outcome = contract.evaluate_evidence(&evidence);
+
+        // then
+        assert_eq!(
+            outcome,
+            GreenEvidenceOutcome::Unsatisfied {
+                required_level: GreenLevel::Workspace,
+                observed_level: GreenLevel::Workspace,
+                missing: vec![
+                    GreenContractRequirement::BaseBranchFreshness,
+                    GreenContractRequirement::RecoveryAttemptContext,
+                ],
+                blocking_flakes: vec![],
+            }
+        );
+        assert!(!outcome.is_satisfied());
+    }
+
+    #[test]
+    fn merge_ready_contract_accepts_complete_test_provenance_context() {
+        // given
+        let contract = GreenContract::merge_ready(GreenLevel::Workspace);
+        let evidence = GreenEvidence::new(GreenLevel::MergeReady)
+            .with_test_command("cargo test --manifest-path rust/Cargo.toml", 0)
+            .with_base_branch_fresh(true)
+            .with_recovery_attempt_context(true);
+
+        // when
+        let outcome = contract.evaluate_evidence(&evidence);
+
+        // then
+        assert_eq!(
+            outcome,
+            GreenEvidenceOutcome::Satisfied {
+                required_level: GreenLevel::Workspace,
+                observed_level: GreenLevel::MergeReady,
+            }
+        );
+    }
+
+    #[test]
+    fn known_blocking_flake_prevents_green_contract_satisfaction() {
+        // given
+        let contract = GreenContract::merge_ready(GreenLevel::Workspace);
+        let evidence = GreenEvidence::new(GreenLevel::MergeReady)
+            .with_test_command("cargo test --manifest-path rust/Cargo.toml", 0)
+            .with_base_branch_fresh(true)
+            .with_recovery_attempt_context(true)
+            .with_known_flake(
+                "session_lifecycle_prefers_running_process_over_idle_shell",
+                true,
+            );
+
+        // when
+        let outcome = contract.evaluate_evidence(&evidence);
+
+        // then
+        assert_eq!(
+            outcome,
+            GreenEvidenceOutcome::Unsatisfied {
+                required_level: GreenLevel::Workspace,
+                observed_level: GreenLevel::MergeReady,
+                missing: vec![],
+                blocking_flakes: vec![KnownFlake {
+                    test_name: "session_lifecycle_prefers_running_process_over_idle_shell"
+                        .to_string(),
+                    blocks_green: true,
+                }],
+            }
+        );
+    }
 }
@@ -143,8 +143,9 @@ pub use prompt::{
    PromptBuildError, SystemPromptBuilder, FRONTIER_MODEL_NAME, SYSTEM_PROMPT_DYNAMIC_BOUNDARY,
 };
 pub use recovery_recipes::{
-    attempt_recovery, recipe_for, EscalationPolicy, FailureScenario, RecoveryContext,
-    RecoveryEvent, RecoveryRecipe, RecoveryResult, RecoveryStep,
+    attempt_recovery, recipe_for, EscalationPolicy, FailureScenario, RecoveryAttemptState,
+    RecoveryAttemptType, RecoveryCommandResult, RecoveryContext, RecoveryEvent,
+    RecoveryLedgerEntry, RecoveryRecipe, RecoveryResult, RecoveryStatusReport, RecoveryStep,
 };
 pub use remote::{
    inherited_upstream_proxy_env, no_proxy_list, read_token, upstream_proxy_ws_url,
@@ -58,7 +58,9 @@ impl PolicyCondition {
            Self::Or(conditions) => conditions
                .iter()
                .any(|condition| condition.matches(context)),
-            Self::GreenAt { level } => context.green_level >= *level,
+            Self::GreenAt { level } => {
+                context.green_contract_satisfied && context.green_level >= *level
+            }
            Self::StaleBranch => context.branch_freshness >= STALE_BRANCH_THRESHOLD,
            Self::StartupBlocked => context.blocker == LaneBlocker::Startup,
            Self::LaneCompleted => context.completed,
@@ -134,6 +136,7 @@ pub enum DiffScope {
 pub struct LaneContext {
    pub lane_id: String,
    pub green_level: GreenLevel,
+    pub green_contract_satisfied: bool,
    pub branch_freshness: Duration,
    pub blocker: LaneBlocker,
    pub review_status: ReviewStatus,
@@ -156,6 +159,7 @@ impl LaneContext {
        Self {
            lane_id: lane_id.into(),
            green_level,
+            green_contract_satisfied: false,
            branch_freshness,
            blocker,
            review_status,
@@ -171,6 +175,7 @@ impl LaneContext {
        Self {
            lane_id: lane_id.into(),
            green_level: 0,
+            green_contract_satisfied: false,
            branch_freshness: Duration::from_secs(0),
            blocker: LaneBlocker::None,
            review_status: ReviewStatus::Pending,
@@ -179,6 +184,12 @@ impl LaneContext {
            reconciled: true,
        }
    }
+
+    #[must_use]
+    pub fn with_green_contract_satisfied(mut self, satisfied: bool) -> Self {
+        self.green_contract_satisfied = satisfied;
+        self
+    }
 }

 #[derive(Debug, Clone, PartialEq, Eq)]
@@ -257,7 +268,8 @@ mod tests {
            ReviewStatus::Approved,
            DiffScope::Scoped,
            false,
-        );
+        )
+        .with_green_contract_satisfied(true);

        // when
        let actions = engine.evaluate(&context);
@@ -266,6 +278,36 @@ mod tests {
        assert_eq!(actions, vec![PolicyAction::MergeToDev]);
    }

+    #[test]
+    fn merge_rule_blocks_when_green_tests_lack_contract_provenance() {
+        // given
+        let engine = PolicyEngine::new(vec![PolicyRule::new(
+            "merge-to-dev",
+            PolicyCondition::And(vec![
+                PolicyCondition::GreenAt { level: 2 },
+                PolicyCondition::ScopedDiff,
+                PolicyCondition::ReviewPassed,
+            ]),
+            PolicyAction::MergeToDev,
+            20,
+        )]);
+        let context = LaneContext::new(
+            "lane-7",
+            3,
+            Duration::from_secs(5),
+            LaneBlocker::None,
+            ReviewStatus::Approved,
+            DiffScope::Scoped,
+            false,
+        );
+
+        // when
+        let actions = engine.evaluate(&context);
+
+        // then
+        assert!(actions.is_empty());
+    }
+
    #[test]
    fn stale_branch_rule_fires_at_threshold() {
        // given
@@ -468,7 +510,8 @@ mod tests {
            ReviewStatus::Pending,
            DiffScope::Full,
            false,
-        );
+        )
+        .with_green_contract_satisfied(true);

        // when
        let actions = engine.evaluate(&context);
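A sketch, not part of the commit, of how the updated `GreenAt` condition behaves once the contract flag is considered. The hunks above do not show where `green_contract_satisfied` gets set, so the `contract_satisfied` helper and the simplified `Evidence` and `Lane` types below are hypothetical stand-ins for `GreenEvidence` and `LaneContext`.

```rust
/// Simplified, hypothetical stand-in for the evidence a lane has collected.
struct Evidence {
    passing_test_command: bool,
    base_branch_fresh: bool,
    recovery_context_recorded: bool,
    blocking_flakes: usize,
}

/// Simplified stand-in for `LaneContext`, keeping only the two gate inputs.
struct Lane {
    green_level: u8,
    green_contract_satisfied: bool,
}

/// Hypothetical helper: the flag is true only when every merge-ready
/// requirement is met and no known flake blocks green.
fn contract_satisfied(evidence: &Evidence) -> bool {
    evidence.passing_test_command
        && evidence.base_branch_fresh
        && evidence.recovery_context_recorded
        && evidence.blocking_flakes == 0
}

/// Mirrors the updated `GreenAt` check: a high level alone is not enough.
fn green_at(lane: &Lane, level: u8) -> bool {
    lane.green_contract_satisfied && lane.green_level >= level
}
```

Because `LaneContext::new` defaults the flag to `false`, a caller would have to opt in explicitly, which is what the updated tests do with `.with_green_contract_satisfied(true)`.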
@@ -121,6 +121,21 @@ pub enum RecoveryResult {
    },
 }

+/// Type of recovery execution represented in the ledger.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "snake_case")]
+pub enum RecoveryAttemptType {
+    Automatic,
+}
+
+/// Result for one executable recovery command/step.
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+pub struct RecoveryCommandResult {
+    pub command: RecoveryStep,
+    pub status: RecoveryAttemptState,
+    pub result: String,
+}
+
 /// Structured event emitted during recovery.
 #[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
 #[serde(rename_all = "snake_case")]
@@ -135,14 +150,59 @@ pub enum RecoveryEvent {
    Escalated,
 }

+/// Machine-readable recovery progress for one failure scenario.
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+pub struct RecoveryLedgerEntry {
+    pub recipe_id: String,
+    pub attempt_type: RecoveryAttemptType,
+    pub trigger: FailureScenario,
+    pub attempt_count: u32,
+    pub retry_limit: u32,
+    pub attempts_remaining: u32,
+    pub state: RecoveryAttemptState,
+    pub started_at: Option<String>,
+    pub finished_at: Option<String>,
+    pub command_results: Vec<RecoveryCommandResult>,
+    pub result: Option<RecoveryResult>,
+    pub last_failure_summary: Option<String>,
+    pub escalation_reason: Option<String>,
+}
+
+/// Current state of a recovery recipe attempt.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "snake_case")]
+pub enum RecoveryAttemptState {
+    Queued,
+    Running,
+    Succeeded,
+    Failed,
+    Exhausted,
+}
+
+/// Machine-readable status projection for callers that need to
+/// distinguish an untouched scenario from an exhausted recovery.
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+pub struct RecoveryStatusReport {
+    pub scenario: FailureScenario,
+    pub attempted: bool,
+    pub state: Option<RecoveryAttemptState>,
+    pub attempt_count: u32,
+    pub retry_limit: Option<u32>,
+    pub attempts_remaining: Option<u32>,
+    pub escalation_reason: Option<String>,
+}
+
 /// Minimal context for tracking recovery state and emitting events.
 ///
-/// Holds per-scenario attempt counts, a structured event log, and an
-/// optional simulation knob for controlling step outcomes during tests.
+/// Holds per-scenario attempt counts, a structured event log, a recovery
+/// attempt ledger, and an optional simulation knob for controlling step
+/// outcomes during tests.
 #[derive(Debug, Clone, Default)]
 pub struct RecoveryContext {
    attempts: HashMap<FailureScenario, u32>,
    events: Vec<RecoveryEvent>,
+    ledger: HashMap<FailureScenario, RecoveryLedgerEntry>,
+    clock_tick: u64,
    /// Optional step index at which simulated execution fails.
    /// `None` means all steps succeed.
    fail_at_step: Option<usize>,
@@ -172,6 +232,51 @@ impl RecoveryContext {
    pub fn attempt_count(&self, scenario: &FailureScenario) -> u32 {
        self.attempts.get(scenario).copied().unwrap_or(0)
    }
+
+    /// Returns the machine-readable recovery ledger entry for a scenario.
+    #[must_use]
+    pub fn ledger_entry(&self, scenario: &FailureScenario) -> Option<&RecoveryLedgerEntry> {
+        self.ledger.get(scenario)
+    }
+
+    /// Returns all recovery ledger entries currently tracked by this context.
+    #[must_use]
+    pub fn ledger_entries(&self) -> Vec<&RecoveryLedgerEntry> {
+        let mut entries: Vec<_> = self.ledger.values().collect();
+        entries.sort_by(|left, right| left.recipe_id.cmp(&right.recipe_id));
+        entries
+    }
+
+    /// Returns a compact machine-readable recovery status for a scenario,
+    /// including `attempted = false` when no ledger entry exists yet.
+    #[must_use]
+    pub fn status_report(&self, scenario: &FailureScenario) -> RecoveryStatusReport {
+        self.ledger_entry(scenario).map_or(
+            RecoveryStatusReport {
+                scenario: *scenario,
+                attempted: false,
+                state: None,
+                attempt_count: 0,
+                retry_limit: None,
+                attempts_remaining: None,
+                escalation_reason: None,
+            },
+            |entry| RecoveryStatusReport {
+                scenario: *scenario,
+                attempted: entry.attempt_count > 0,
+                state: Some(entry.state),
+                attempt_count: entry.attempt_count,
+                retry_limit: Some(entry.retry_limit),
+                attempts_remaining: Some(entry.attempts_remaining),
+                escalation_reason: entry.escalation_reason.clone(),
+            },
+        )
+    }
+
+    fn next_timestamp(&mut self) -> String {
+        self.clock_tick += 1;
+        format!("recovery-ledger-tick-{}", self.clock_tick)
+    }
 }

 /// Returns the known recovery recipe for the given failure scenario.
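A compact sketch, not part of the commit, of the ledger accounting behind `status_report`: an untouched scenario has no entry, and a scenario that has used up its attempts is marked exhausted. The `Ledger`, `Entry`, and `State` types are simplified stand-ins, and the single `step_ok` flag is an assumption that replaces the recipe steps and escalation policy of the real `attempt_recovery`.

```rust
use std::collections::HashMap;

/// Simplified stand-in for `RecoveryAttemptState`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Queued,
    Succeeded,
    Failed,
    Exhausted,
}

/// Simplified stand-in for `RecoveryLedgerEntry`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Entry {
    attempt_count: u32,
    attempts_remaining: u32,
    state: State,
}

#[derive(Default)]
struct Ledger {
    entries: HashMap<&'static str, Entry>,
}

impl Ledger {
    /// Records one attempt for `scenario`; once `max_attempts` is reached,
    /// further calls only mark the entry as exhausted.
    fn attempt(&mut self, scenario: &'static str, max_attempts: u32, step_ok: bool) -> State {
        let entry = self.entries.entry(scenario).or_insert(Entry {
            attempt_count: 0,
            attempts_remaining: max_attempts,
            state: State::Queued,
        });
        if entry.attempt_count >= max_attempts {
            entry.attempts_remaining = 0;
            entry.state = State::Exhausted;
            return entry.state;
        }
        entry.attempt_count += 1;
        entry.attempts_remaining = max_attempts.saturating_sub(entry.attempt_count);
        entry.state = if step_ok { State::Succeeded } else { State::Failed };
        entry.state
    }

    /// `None` means the scenario was never attempted.
    fn status(&self, scenario: &str) -> Option<&Entry> {
        self.entries.get(scenario)
    }
}
```

With `max_attempts` set to 1, the first call consumes the only automatic attempt and a second call reports `Exhausted` without incrementing `attempt_count`, matching the one-attempt-before-escalation policy described in the recipe docs.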
@@ -233,18 +338,51 @@ pub fn recipe_for(scenario: &FailureScenario) -> RecoveryRecipe {
 /// Looks up the recipe, enforces the one-attempt-before-escalation
 /// policy, simulates step execution (controlled by the context), and
 /// emits structured [`RecoveryEvent`]s for every attempt.
+#[allow(clippy::too_many_lines)]
 pub fn attempt_recovery(scenario: &FailureScenario, ctx: &mut RecoveryContext) -> RecoveryResult {
    let recipe = recipe_for(scenario);
-    let attempt_count = ctx.attempts.entry(*scenario).or_insert(0);
+    let recipe_id = scenario.to_string();
+    ctx.ledger
+        .entry(*scenario)
+        .or_insert_with(|| RecoveryLedgerEntry {
+            recipe_id: recipe_id.clone(),
+            attempt_type: RecoveryAttemptType::Automatic,
+            trigger: *scenario,
+            attempt_count: 0,
+            retry_limit: recipe.max_attempts,
+            attempts_remaining: recipe.max_attempts,
+            state: RecoveryAttemptState::Queued,
+            started_at: None,
+            finished_at: None,
+            command_results: Vec::new(),
+            result: None,
+            last_failure_summary: None,
+            escalation_reason: None,
+        });
+
+    let current_attempts = ctx.attempt_count(scenario);

    // Enforce one automatic recovery attempt before escalation.
-    if *attempt_count >= recipe.max_attempts {
+    if current_attempts >= recipe.max_attempts {
        let result = RecoveryResult::EscalationRequired {
            reason: format!(
                "max recovery attempts ({}) exceeded for {}",
                recipe.max_attempts, scenario
            ),
        };
+        let finished_at = ctx.next_timestamp();
+        if let Some(entry) = ctx.ledger.get_mut(scenario) {
+            entry.attempt_count = current_attempts;
+            entry.attempts_remaining = 0;
+            entry.state = RecoveryAttemptState::Exhausted;
+            entry.finished_at = Some(finished_at);
+            entry.result = Some(result.clone());
+            let RecoveryResult::EscalationRequired { reason } = &result else {
+                unreachable!("exhaustion always produces escalation");
+            };
+            entry.last_failure_summary = Some(reason.clone());
+            entry.escalation_reason = Some(reason.clone());
+        }
        ctx.events.push(RecoveryEvent::RecoveryAttempted {
            scenario: *scenario,
            recipe,
@@ -254,19 +392,44 @@ pub fn attempt_recovery(scenario: &FailureScenario, ctx: &mut RecoveryContext) -
        return result;
    }

-    *attempt_count += 1;
+    let updated_attempts = ctx.attempts.entry(*scenario).or_insert(0);
+    *updated_attempts += 1;
+    let updated_attempts = *updated_attempts;
+    let started_at = ctx.next_timestamp();
+    if let Some(entry) = ctx.ledger.get_mut(scenario) {
+        entry.attempt_count = updated_attempts;
+        entry.attempts_remaining = recipe.max_attempts.saturating_sub(updated_attempts);
+        entry.state = RecoveryAttemptState::Running;
+        entry.started_at = Some(started_at);
+        entry.finished_at = None;
+        entry.command_results.clear();
+        entry.result = None;
+        entry.last_failure_summary = None;
+        entry.escalation_reason = None;
+    }

    // Execute steps, honoring the optional fail_at_step simulation.
    let fail_index = ctx.fail_at_step;
    let mut executed = Vec::new();
+    let mut command_results = Vec::new();
    let mut failed = false;

    for (i, step) in recipe.steps.iter().enumerate() {
        if fail_index == Some(i) {
+            command_results.push(RecoveryCommandResult {
+                command: step.clone(),
+                status: RecoveryAttemptState::Failed,
+                result: format!("step {i} failed for {scenario}"),
+            });
            failed = true;
            break;
        }
        executed.push(step.clone());
+        command_results.push(RecoveryCommandResult {
+            command: step.clone(),
+            status: RecoveryAttemptState::Succeeded,
+            result: format!("step {i} succeeded for {scenario}"),
+        });
    }

    let result = if failed {
@@ -288,6 +451,29 @@ pub fn attempt_recovery(scenario: &FailureScenario, ctx: &mut RecoveryContext) -
    };

    // Emit the attempt as structured event data.
+    let finished_at = ctx.next_timestamp();
+    if let Some(entry) = ctx.ledger.get_mut(scenario) {
+        entry.finished_at = Some(finished_at);
+        entry.command_results = command_results;
+        entry.result = Some(result.clone());
+        match &result {
+            RecoveryResult::Recovered { .. } => {
+                entry.state = RecoveryAttemptState::Succeeded;
+            }
+            RecoveryResult::PartialRecovery { remaining, .. } => {
+                entry.state = RecoveryAttemptState::Failed;
+                entry.last_failure_summary = Some(format!(
+                    "{} step(s) remaining after partial recovery",
+                    remaining.len()
+                ));
+            }
+            RecoveryResult::EscalationRequired { reason } => {
+                entry.state = RecoveryAttemptState::Exhausted;
+                entry.last_failure_summary = Some(reason.clone());
+                entry.escalation_reason = Some(reason.clone());
+            }
+        }
+    }
    ctx.events.push(RecoveryEvent::RecoveryAttempted {
        scenario: *scenario,
        recipe,
@@ -499,6 +685,126 @@ mod tests {
        assert_eq!(ctx.attempt_count(&FailureScenario::PromptMisdelivery), 0);
    }

+    #[test]
+    fn recovery_context_exposes_machine_readable_ledger() {
+        // given
+        let mut ctx = RecoveryContext::new();
+
+        // when
+        let result = attempt_recovery(&FailureScenario::StaleBranch, &mut ctx);
+
+        // then
+        assert_eq!(result, RecoveryResult::Recovered { steps_taken: 2 });
+        let entry = ctx
+            .ledger_entry(&FailureScenario::StaleBranch)
+            .expect("stale branch ledger entry");
+        assert_eq!(entry.recipe_id, "stale_branch");
+        assert_eq!(entry.attempt_type, RecoveryAttemptType::Automatic);
+        assert_eq!(entry.trigger, FailureScenario::StaleBranch);
+        assert_eq!(entry.attempt_count, 1);
+        assert_eq!(entry.retry_limit, 1);
+        assert_eq!(entry.attempts_remaining, 0);
+        assert_eq!(entry.state, RecoveryAttemptState::Succeeded);
+        assert!(entry.started_at.is_some());
+        assert!(entry.finished_at.is_some());
+        assert_eq!(
+            entry.result,
+            Some(RecoveryResult::Recovered { steps_taken: 2 })
+        );
+        assert_eq!(entry.command_results.len(), 2);
+        assert_eq!(entry.command_results[0].command, RecoveryStep::RebaseBranch);
+        assert_eq!(
+            entry.command_results[0].status,
+            RecoveryAttemptState::Succeeded
+        );
+        assert_eq!(entry.last_failure_summary, None);
+        assert_eq!(entry.escalation_reason, None);
+    }
+
+    #[test]
+    fn recovery_ledger_records_exhausted_escalation_reason() {
+        // given
+        let mut ctx = RecoveryContext::new();
+        let scenario = FailureScenario::PromptMisdelivery;
+
+        // when
+        let _ = attempt_recovery(&scenario, &mut ctx);
+        let result = attempt_recovery(&scenario, &mut ctx);
+
+        // then
+        assert!(matches!(result, RecoveryResult::EscalationRequired { .. }));
+        let entry = ctx.ledger_entry(&scenario).expect("ledger entry");
+        assert_eq!(entry.state, RecoveryAttemptState::Exhausted);
+        assert_eq!(entry.attempt_count, 1);
+        assert_eq!(entry.attempts_remaining, 0);
+        assert!(matches!(
+            entry.result,
+            Some(RecoveryResult::EscalationRequired { .. })
+        ));
+        assert!(entry
+            .escalation_reason
+            .as_deref()
+            .expect("escalation reason")
+            .contains("max recovery attempts"));
+    }
+
+    #[test]
+    fn recovery_status_report_distinguishes_not_attempted_from_exhausted() {
+        // given
+        let mut ctx = RecoveryContext::new();
+        let scenario = FailureScenario::PromptMisdelivery;
+
+        // then — no ledger entry is not the same as exhausted.
+        let not_attempted = ctx.status_report(&scenario);
+        assert!(!not_attempted.attempted);
+        assert_eq!(not_attempted.state, None);
+        assert_eq!(not_attempted.attempt_count, 0);
+        assert_eq!(not_attempted.retry_limit, None);
+
+        // when — one allowed attempt then one extra attempt.
+        let _ = attempt_recovery(&scenario, &mut ctx);
+        let _ = attempt_recovery(&scenario, &mut ctx);
+
+        // then
+        let exhausted = ctx.status_report(&scenario);
+        assert!(exhausted.attempted);
+        assert_eq!(exhausted.state, Some(RecoveryAttemptState::Exhausted));
+        assert_eq!(exhausted.attempt_count, 1);
+        assert_eq!(exhausted.retry_limit, Some(1));
+        assert_eq!(exhausted.attempts_remaining, Some(0));
+        assert!(exhausted
+            .escalation_reason
+            .as_deref()
+            .is_some_and(|reason| reason.contains("max recovery attempts")));
+    }
+
+    #[test]
+    fn recovery_ledger_records_failed_command_result() {
+        // given
+        let mut ctx = RecoveryContext::new().with_fail_at_step(1);
+        let scenario = FailureScenario::PartialPluginStartup;
+
+        // when
+        let result = attempt_recovery(&scenario, &mut ctx);
+
+        // then
+        assert!(matches!(result, RecoveryResult::PartialRecovery { .. }));
+        let entry = ctx.ledger_entry(&scenario).expect("ledger entry");
+        assert_eq!(entry.state, RecoveryAttemptState::Failed);
+        assert_eq!(entry.command_results.len(), 2);
+        assert_eq!(
+            entry.command_results[0].status,
+            RecoveryAttemptState::Succeeded
+        );
+        assert_eq!(
+            entry.command_results[1].status,
+            RecoveryAttemptState::Failed
+        );
+        assert!(entry.command_results[1]
+            .result
+            .contains("partial_plugin_startup"));
+    }
+
    #[test]
    fn stale_branch_recipe_has_rebase_then_clean_build() {
        // given
@@ -96,9 +96,7 @@ fn green_contract_unsatisfied_blocks_merge() {
        false,
    );

-    // This is a conceptual test — we need a way to express "requires workspace green"
-    // Currently LaneContext has raw green_level: u8, not a contract
-    // For now we just verify the policy condition works
+    // The context has a test level but lacks the full green contract, so merge stays blocked.
    let engine = PolicyEngine::new(vec![PolicyRule::new(
        "workspace-green-required",
        PolicyCondition::GreenAt { level: 3 }, // GreenLevel::Workspace
@@ -267,7 +265,8 @@ fn fresh_approved_lane_gets_merge_action() {
        ReviewStatus::Approved,
        DiffScope::Scoped,
        false,
-    );
+    )
+    .with_green_contract_satisfied(true);

    let engine = PolicyEngine::new(vec![PolicyRule::new(
        "merge-if-green-approved-not-stale",
@@ -357,7 +356,8 @@ fn worker_provider_failure_flows_through_recovery_to_policy() {
        ReviewStatus::Approved,
        DiffScope::Scoped,
        false,
-    );
+    )
+    .with_green_contract_satisfied(true);

    let policy_engine = PolicyEngine::new(vec![
        // Rule: if recovered from failure + green + approved -> merge
@@ -45,11 +45,11 @@ use render::{MarkdownStreamState, Spinner, TerminalRenderer};
 use runtime::{
    check_base_commit, format_stale_base_warning, format_usd, load_oauth_credentials,
    load_system_prompt, pricing_for_model, resolve_expected_base, resolve_sandbox_status,
-    ApiClient, ApiRequest, AssistantEvent, CompactionConfig, ConfigLoader, ConfigSource,
-    ContentBlock, ConversationMessage, ConversationRuntime, McpServer, McpServerManager,
-    McpServerSpec, McpTool, MessageRole, ModelPricing, PermissionMode, PermissionPolicy,
-    ProjectContext, PromptCacheEvent, ResolvedPermissionMode, RuntimeError, Session, TokenUsage,
-    ToolError, ToolExecutor, UsageTracker,
+    ApiClient, ApiRequest, AssistantEvent, BaseCommitState, CompactionConfig, ConfigLoader,
+    ConfigSource, ContentBlock, ConversationMessage, ConversationRuntime, McpServer,
+    McpServerManager, McpServerSpec, McpTool, MessageRole, ModelPricing, PermissionMode,
+    PermissionPolicy, ProjectContext, PromptCacheEvent, ResolvedPermissionMode, RuntimeError,
+    Session, TokenUsage, ToolError, ToolExecutor, UsageTracker,
 };
 use serde::Deserialize;
 use serde_json::{json, Map, Value};
@@ -1973,6 +1973,7 @@ fn render_doctor_report() -> Result<DoctorReport, Box<dyn std::error::Error>> {
        parse_git_status_metadata(project_context.git_status.as_deref());
    let git_summary = parse_git_workspace_summary(project_context.git_status.as_deref());
    let branch_freshness = BranchFreshness::from_git_status(project_context.git_status.as_deref());
+    let stale_base_state = stale_base_state_for(&cwd, None);
    let empty_config = runtime::RuntimeConfig::empty();
    let sandbox_config = config.as_ref().ok().unwrap_or(&empty_config);
    let boot_preflight = build_boot_preflight_snapshot(
@@ -1995,6 +1996,7 @@ fn render_doctor_report() -> Result<DoctorReport, Box<dyn std::error::Error>> {
        git_branch,
        git_summary,
        branch_freshness,
+        stale_base_state,
        session_lifecycle: classify_session_lifecycle_for(&cwd),
        boot_preflight,
        sandbox_status: resolve_sandbox_status(sandbox_config.sandbox(), &cwd),
@@ -2334,9 +2336,10 @@ fn check_install_source_health() -> DiagnosticCheck {

 fn check_workspace_health(context: &StatusContext) -> DiagnosticCheck {
    let in_repo = context.project_root.is_some();
+    let stale_base_warning = format_stale_base_warning(&context.stale_base_state);
    DiagnosticCheck::new(
        "Workspace",
-        if in_repo {
+        if in_repo && stale_base_warning.is_none() {
            DiagnosticLevel::Ok
        } else {
            DiagnosticLevel::Warn
@@ -2369,6 +2372,10 @@ fn check_workspace_health(context: &StatusContext) -> DiagnosticCheck {
            "Memory files     {} · config files loaded {}/{}",
            context.memory_file_count, context.loaded_config_files, context.discovered_config_files
        ),
+        format!(
+            "Stale base      {}",
+            stale_base_warning.as_deref().unwrap_or("ok")
+        ),
    ])
    .with_data(Map::from_iter([
        ("cwd".to_string(), json!(context.cwd.display().to_string())),
@@ -2401,6 +2408,10 @@ fn check_workspace_health(context: &StatusContext) -> DiagnosticCheck {
            "discovered_config_files".to_string(),
            json!(context.discovered_config_files),
        ),
+        (
+            "stale_base".to_string(),
+            stale_base_json_value(&context.stale_base_state),
+        ),
    ]))
 }

@@ -2920,6 +2931,7 @@ struct StatusContext {
    git_branch: Option<String>,
    git_summary: GitWorkspaceSummary,
    branch_freshness: BranchFreshness,
+    stale_base_state: BaseCommitState,
    session_lifecycle: SessionLifecycleSummary,
    boot_preflight: BootPreflightSnapshot,
    sandbox_status: runtime::SandboxStatus,
@@ -4167,12 +4179,30 @@ fn enforce_broad_cwd_policy(
    }
 }

+fn stale_base_state_for(cwd: &Path, flag_value: Option<&str>) -> BaseCommitState {
+    let source = resolve_expected_base(flag_value, cwd);
+    check_base_commit(cwd, source.as_ref())
+}
+
+fn stale_base_json_value(state: &BaseCommitState) -> serde_json::Value {
+    match state {
+        BaseCommitState::Matches => json!({"status": "matches", "fresh": true}),
+        BaseCommitState::Diverged { expected, actual } => json!({
+            "status": "diverged",
+            "fresh": false,
+            "expected": expected,
+            "actual": actual,
+        }),
+        BaseCommitState::NoExpectedBase => json!({"status": "no_expected_base", "fresh": null}),
+        BaseCommitState::NotAGitRepo => json!({"status": "not_git_repo", "fresh": null}),
+    }
+}
+
 fn run_stale_base_preflight(flag_value: Option<&str>) {
    let Ok(cwd) = env::current_dir() else {
        return;
    };
-    let source = resolve_expected_base(flag_value, &cwd);
-    let state = check_base_commit(&cwd, source.as_ref());
+    let state = stale_base_state_for(&cwd, flag_value);
    if let Some(warning) = format_stale_base_warning(&state) {
        eprintln!("{warning}");
    }
@@ -6221,6 +6251,7 @@ fn status_context(
        parse_git_status_metadata(project_context.git_status.as_deref());
    let git_summary = parse_git_workspace_summary(project_context.git_status.as_deref());
    let branch_freshness = BranchFreshness::from_git_status(project_context.git_status.as_deref());
+    let stale_base_state = stale_base_state_for(&cwd, None);
    let boot_preflight = build_boot_preflight_snapshot(
        &cwd,
        project_root.as_deref(),
@@ -6238,6 +6269,7 @@ fn status_context(
        git_branch,
        git_summary,
        branch_freshness,
+        stale_base_state,
        session_lifecycle: classify_session_lifecycle_for(&cwd),
        boot_preflight,
        sandbox_status,
@@ -12567,6 +12599,7 @@ mod tests {
                    conflicted_files: 0,
                },
                branch_freshness: test_branch_freshness(),
+                stale_base_state: super::BaseCommitState::NoExpectedBase,
                session_lifecycle: SessionLifecycleSummary {
                    kind: SessionLifecycleKind::IdleShell,
                    pane_id: Some("%7".to_string()),
@@ -12692,6 +12725,46 @@ mod tests {
        fs::remove_dir_all(workspace).expect("cleanup temp dir");
    }

+    #[test]
+    fn workspace_health_warns_when_stale_base_diverged() {
+        let context = super::StatusContext {
+            cwd: PathBuf::from("/tmp/project"),
+            session_path: None,
+            loaded_config_files: 0,
+            discovered_config_files: 0,
+            memory_file_count: 0,
+            project_root: Some(PathBuf::from("/tmp/project")),
+            git_branch: Some("feature/stale-base".to_string()),
+            git_summary: GitWorkspaceSummary::default(),
+            branch_freshness: test_branch_freshness(),
+            stale_base_state: super::BaseCommitState::Diverged {
+                expected: "base".to_string(),
+                actual: "head".to_string(),
+            },
+            session_lifecycle: SessionLifecycleSummary {
+                kind: SessionLifecycleKind::SavedOnly,
+                pane_id: None,
+                pane_command: None,
+                pane_path: None,
+                workspace_dirty: false,
+                abandoned: false,
+            },
+            boot_preflight: test_boot_preflight(),
+            sandbox_status: runtime::SandboxStatus::default(),
+            config_load_error: None,
+        };
+
+        let check = super::check_workspace_health(&context);
+
+        assert_eq!(check.level, super::DiagnosticLevel::Warn);
+        assert_eq!(check.data["stale_base"]["status"], "diverged");
+        assert_eq!(check.data["stale_base"]["fresh"], false);
+        assert!(check
+            .details
+            .iter()
+            .any(|detail| detail.contains("stale codebase")));
+    }
+
    #[test]
    fn status_json_surfaces_session_lifecycle_for_clawhip() {
        let context = super::StatusContext {
@@ -12704,6 +12777,7 @@ mod tests {
            git_branch: Some("feature/session-lifecycle".to_string()),
            git_summary: GitWorkspaceSummary::default(),
            branch_freshness: test_branch_freshness(),
+            stale_base_state: super::BaseCommitState::NoExpectedBase,
            session_lifecycle: SessionLifecycleSummary {
                kind: SessionLifecycleKind::RunningProcess,
                pane_id: Some("%9".to_string()),
@@ -56,6 +56,7 @@ pub(crate) fn detect_lane_completion(
    Some(LaneContext {
        lane_id: output.agent_id.clone(),
        green_level: 3, // Workspace green
+        green_contract_satisfied: true,
        branch_freshness: std::time::Duration::from_secs(0),
        blocker: LaneBlocker::None,
        review_status: ReviewStatus::Approved,
@@ -165,6 +166,7 @@ mod tests {
        let context = LaneContext {
            lane_id: "completed-lane".to_string(),
            green_level: 3,
+            green_contract_satisfied: true,
            branch_freshness: std::time::Duration::from_secs(0),
            blocker: LaneBlocker::None,
            review_status: ReviewStatus::Approved,
@@ -1503,8 +1503,10 @@ fn run_worker_create(input: WorkerCreateInput) -> Result<String, String> {
    let merged_roots: Vec<String> = ConfigLoader::default_for(&input.cwd)
        .load()
        .ok()
-        .map(|config| config.trusted_roots_with_overrides(&input.trusted_roots))
-        .unwrap_or_else(|| input.trusted_roots.clone());
+        .map_or_else(
+            || input.trusted_roots.clone(),
+            |config| config.trusted_roots_with_overrides(&input.trusted_roots),
+        );
    let worker = global_worker_registry().create(
        &input.cwd,
        &merged_roots,
@@ -6212,6 +6214,8 @@ Command exceeded timeout of {timeout_ms} ms",
                        stderr.trim_end()
                    )
                };
+                let is_test = is_test_command(command);
+                let return_code_interpretation = if is_test { "test.hung" } else { "timeout" };
                return Ok(runtime::BashCommandOutput {
                    stdout: String::from_utf8_lossy(&output.stdout).into_owned(),
                    stderr,
@@ -6222,9 +6226,11 @@ Command exceeded timeout of {timeout_ms} ms",
                    backgrounded_by_user: None,
                    assistant_auto_backgrounded: None,
                    dangerously_disable_sandbox: None,
-                    return_code_interpretation: Some(String::from("timeout")),
+                    return_code_interpretation: Some(String::from(return_code_interpretation)),
                    no_output_expected: Some(false),
-                    structured_content: None,
+                    structured_content: Some(vec![test_timeout_provenance(
+                        command, timeout_ms, is_test,
+                    )]),
                    persisted_output_path: None,
                    persisted_output_size: None,
                    sandbox_status: None,
@@ -6258,6 +6264,37 @@ Command exceeded timeout of {timeout_ms} ms",
    })
 }

+fn is_test_command(command: &str) -> bool {
+    let normalized = command
+        .split_whitespace()
+        .collect::<Vec<_>>()
+        .join(" ")
+        .to_ascii_lowercase();
+    normalized.contains("cargo test")
+        || normalized.contains("cargo nextest")
+        || normalized.contains("npm test")
+        || normalized.contains("pnpm test")
+        || normalized.contains("yarn test")
+        || normalized.contains("pytest")
+}
+
+fn test_timeout_provenance(
+    command: &str,
+    timeout_ms: u64,
+    classified_as_test_hang: bool,
+) -> serde_json::Value {
+    json!({
+        "event": if classified_as_test_hang { "test.hung" } else { "command.timeout" },
+        "failureClass": if classified_as_test_hang { "test_hang" } else { "timeout" },
+        "data": {
+            "command": command,
+            "timeoutMs": timeout_ms,
+            "provenance": "bash.timeout",
+            "classification": if classified_as_test_hang { "test.hung" } else { "timeout" }
+        }
+    })
+}
+
 fn resolve_cell_index(
    cells: &[serde_json::Value],
    cell_id: Option<&str>,
@@ -9027,6 +9064,23 @@ mod tests {
        assert_eq!(background_output["noOutputExpected"], true);
    }

+    #[test]
+    fn bash_tool_classifies_test_timeout_as_hung_with_provenance() {
+        let timeout = execute_tool(
+            "bash",
+            &json!({ "command": "sleep 1 # cargo test slow_case", "timeout": 10 }),
+        )
+        .expect("bash timeout should return output");
+        let timeout_output: serde_json::Value = serde_json::from_str(&timeout).expect("json");
+        assert_eq!(timeout_output["interrupted"], true);
+        assert_eq!(timeout_output["returnCodeInterpretation"], "test.hung");
+        assert_eq!(timeout_output["structuredContent"][0]["event"], "test.hung");
+        assert_eq!(
+            timeout_output["structuredContent"][0]["data"]["provenance"],
+            "bash.timeout"
+        );
+    }
+
    #[test]
    fn bash_workspace_tests_are_blocked_when_branch_is_behind_main() {
        let _guard = env_lock()
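
The timeout classification added in the diff above can be exercised in isolation. The following is a minimal standalone sketch, not the crate's actual code: it mirrors the whitespace normalization and substring matching of `is_test_command`, while `timeout_classification` and `main` are hypothetical helpers introduced here only for illustration.

```rust
/// Sketch of the substring-based test-command detection shown in the diff.
/// The runner list is copied from the diff; everything else is illustrative.
fn is_test_command(command: &str) -> bool {
    // Collapse runs of whitespace and lowercase so "cargo   TEST" still matches.
    let normalized = command
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_ascii_lowercase();
    [
        "cargo test",
        "cargo nextest",
        "npm test",
        "pnpm test",
        "yarn test",
        "pytest",
    ]
    .iter()
    .any(|needle| normalized.contains(needle))
}

/// Hypothetical helper: the label a timed-out command would receive.
fn timeout_classification(command: &str) -> &'static str {
    if is_test_command(command) {
        "test.hung"
    } else {
        "timeout"
    }
}

fn main() {
    for command in [
        "cargo   TEST -p runtime",
        "sleep 1 # cargo test slow_case",
        "sleep 30",
    ] {
        println!("{command:?} -> {}", timeout_classification(command));
    }
}
```

Because the match is a plain substring check, a command that only mentions a test runner in a trailing shell comment is also classified as a hung test. The `bash_tool_classifies_test_timeout_as_hung_with_provenance` test relies on that behavior.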
Author	SHA1	Message	Date
bellman	41b769fc5a	Merge commit '204af77596345c120e25ce9d433dad0676a59b37'	2026-05-14 21:43:23 +09:00
bellman	7426ede2eb	map branch recovery verification evidence Record why the G005 branch-recovery work satisfies the roadmap pinpoints without touching leader-owned Ultragoal state. Constraint: Task 2 requested ROADMAP.md/plan pinpoint mapping and explicitly forbids .omx/ultragoal mutation. Rejected: leader-only mailbox note \| the task prefers a repo-local docs/g005 verification map when unclaimed and absent. Confidence: high Scope-risk: narrow Directive: Keep this map evidence-only; do not treat it as a substitute for leader Ultragoal checkpoints. Tested: documentation-only map cross-checked against ROADMAP.md, prd.json, and task-1 verification output. Not-tested: no code tests rerun after documentation-only commit.	2026-05-14 18:40:16 +09:00
bellman	8f7eaffcef	Close the G005 verification gaps before checkpoint Constraint: G005 requires stale-base doctor consistency, green-contract policy integration, hung-test evidence, and a durable verification map before ultragoal checkpointing.\nRejected: Treat worker task status alone as complete \| worker-2 lifecycle was stale-failed despite landed recovery evidence, so leader verification and explicit map are required.\nConfidence: medium\nScope-risk: moderate\nDirective: Keep PR/issue reconciliation deferred to G011/G012; do not mutate .omx/ultragoal outside checkpoint commands.\nTested: git diff --check; cargo fmt --manifest-path rust/Cargo.toml --all -- --check; cargo check --manifest-path rust/Cargo.toml -p rusty-claude-cli; cargo test --manifest-path rust/Cargo.toml -p rusty-claude-cli workspace_health_warns_when_stale_base_diverged -- --nocapture; cargo check --manifest-path rust/Cargo.toml -p tools\nNot-tested: full workspace test suite due known unrelated permission/lifecycle failures from worker evidence.\n\nCo-authored-by: OmX <omx@oh-my-codex.dev>	2026-05-14 18:38:22 +09:00
bellman	d2b5f5d498	require provenance for green contracts Promote merge-ready green contracts from a level-only check to explicit provenance requirements for test commands, base freshness, recovery-attempt context, and known blocking flakes. This preserves simple level contracts while giving policy code a single satisfied-contract signal to require before merge decisions.\n\nConstraint: Task scope was limited to green_contract.rs, policy_engine.rs if needed, and narrow tests; stale_* and recovery_recipes.rs were not edited.\nRejected: Adding more boolean fields to GreenContract \| clippy flagged the shape and a requirement list is more explicit.\nConfidence: high\nScope-risk: narrow\nDirective: Treat raw test level as insufficient for merge readiness unless green contract evidence is satisfied.\nTested: cargo check --manifest-path rust/Cargo.toml -p runtime; cargo test --manifest-path rust/Cargo.toml -p runtime; cargo clippy --manifest-path rust/Cargo.toml -p runtime -- -D warnings; focused green_contract, policy_engine, and integration tests.\nNot-tested: full workspace cargo test due pre-existing rusty-claude-cli session_lifecycle_prefers_running_process_over_idle_shell failure observed before this slice.	2026-05-14 18:33:51 +09:00
bellman	607f071ca8	harden branch recovery reporting Ensure branch-recovery verification surfaces compile cleanly under focused lint by preserving trusted-root fallback without clippy noise. Constraint: G005 worker task requires verified branch/test awareness and recovery reporting evidence without mutating .omx/ultragoal. Rejected: ignoring focused clippy failure \| would leave modified tools surface with avoidable lint noise. Confidence: high Scope-risk: narrow Directive: Keep recovery surfaces machine-readable; do not collapse test hangs back into generic timeouts. Tested: cargo test -p runtime; cargo test -p tools targeted branch/hung/preflight tests; cargo check -p runtime -p tools; cargo clippy -p runtime --all-targets -- -D warnings; cargo clippy -p tools --lib --no-deps -- -D warnings. Not-tested: full cargo test -p tools remains red on pre-existing permission-enforcer expectation failures unrelated to this change.	2026-05-14 18:33:48 +09:00
bellman	d3f8ff9916	omx(team): auto-checkpoint worker-1 [1]	2026-05-14 18:28:21 +09:00
bellman	204af77596	Keep recovery recipe lint green for ledger reporting Scoped to G005 recovery recipe status reporting verification; preserves existing machine-readable ledger/status fields and allows the intentionally long recovery attempt flow to satisfy strict clippy without touching unrelated bash lint debt.\n\nConstraint: Task scope limited to recovery_recipes.rs and smallest adjacent exports.\nRejected: Refactor attempt_recovery during branch recovery \| higher regression risk than preserving established flow.\nConfidence: high\nScope-risk: narrow\nDirective: Do not expand this task into unrelated bash.rs clippy cleanup.\nTested: cargo fmt --manifest-path rust/Cargo.toml --all -- --check; cargo check --manifest-path rust/Cargo.toml -p runtime; cargo test --manifest-path rust/Cargo.toml -p runtime recovery_ -- --nocapture; cargo clippy --manifest-path rust/Cargo.toml -p runtime --lib -- -D warnings -A clippy::single-match-else\nNot-tested: full clippy without allow still fails on pre-existing rust/crates/runtime/src/bash.rs single_match_else outside task scope.	2026-05-14 18:26:58 +09:00
bellman	5c40d4e778	omx(team): auto-checkpoint worker-3 [4]	2026-05-14 18:26:55 +09:00
bellman	5625ba597b	omx(team): auto-checkpoint worker-1 [1]	2026-05-14 18:26:49 +09:00
bellman	4f60cf70f1	omx(team): merge worker-2	2026-05-14 18:24:51 +09:00
bellman	6a37442ee1	omx(team): auto-checkpoint worker-2 [3]	2026-05-14 18:24:51 +09:00
bellman	0bca524c8c	omx(team): auto-checkpoint worker-1 [1]	2026-05-14 18:22:37 +09:00
bellman	2ad56860df	omx(team): merge worker-1	2026-05-14 18:21:26 +09:00
bellman	1fbde9f47f	omx(team): auto-checkpoint worker-1 [1]	2026-05-14 18:21:26 +09:00