Compare commits

14 Commits

Author SHA1 Message Date
bellman 41b769fc5a Merge commit '204af77596345c120e25ce9d433dad0676a59b37' 2026-05-14 21:43:23 +09:00
bellman 7426ede2eb map branch recovery verification evidence
Record why the G005 branch-recovery work satisfies the roadmap pinpoints without touching leader-owned Ultragoal state.

Constraint: Task 2 requested ROADMAP.md/plan pinpoint mapping and explicitly forbids .omx/ultragoal mutation.

Rejected: leader-only mailbox note | the task prefers a repo-local docs/g005 verification map when unclaimed and absent.

Confidence: high

Scope-risk: narrow

Directive: Keep this map evidence-only; do not treat it as a substitute for leader Ultragoal checkpoints.

Tested: documentation-only map cross-checked against ROADMAP.md, prd.json, and task-1 verification output.

Not-tested: no code tests rerun after documentation-only commit.
2026-05-14 18:40:16 +09:00
bellman 8f7eaffcef Close the G005 verification gaps before checkpoint
Constraint: G005 requires stale-base doctor consistency, green-contract policy integration, hung-test evidence, and a durable verification map before ultragoal checkpointing.
Rejected: Treat worker task status alone as complete | worker-2 lifecycle was stale-failed despite landed recovery evidence, so leader verification and explicit map are required.
Confidence: medium
Scope-risk: moderate
Directive: Keep PR/issue reconciliation deferred to G011/G012; do not mutate .omx/ultragoal outside checkpoint commands.
Tested: git diff --check; cargo fmt --manifest-path rust/Cargo.toml --all -- --check; cargo check --manifest-path rust/Cargo.toml -p rusty-claude-cli; cargo test --manifest-path rust/Cargo.toml -p rusty-claude-cli workspace_health_warns_when_stale_base_diverged -- --nocapture; cargo check --manifest-path rust/Cargo.toml -p tools
Not-tested: full workspace test suite due known unrelated permission/lifecycle failures from worker evidence.

Co-authored-by: OmX <omx@oh-my-codex.dev>
2026-05-14 18:38:22 +09:00
bellman d2b5f5d498 require provenance for green contracts
Promote merge-ready green contracts from a level-only check to explicit provenance requirements for test commands, base freshness, recovery-attempt context, and known blocking flakes. This preserves simple level contracts while giving policy code a single satisfied-contract signal to require before merge decisions.

Constraint: Task scope was limited to green_contract.rs, policy_engine.rs if needed, and narrow tests; stale_* and recovery_recipes.rs were not edited.
Rejected: Adding more boolean fields to GreenContract | clippy flagged the shape and a requirement list is more explicit.
Confidence: high
Scope-risk: narrow
Directive: Treat raw test level as insufficient for merge readiness unless green contract evidence is satisfied.
Tested: cargo check --manifest-path rust/Cargo.toml -p runtime; cargo test --manifest-path rust/Cargo.toml -p runtime; cargo clippy --manifest-path rust/Cargo.toml -p runtime -- -D warnings; focused green_contract, policy_engine, and integration tests.
Not-tested: full workspace cargo test due pre-existing rusty-claude-cli session_lifecycle_prefers_running_process_over_idle_shell failure observed before this slice.
2026-05-14 18:33:51 +09:00
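The provenance rule described in the commit message above can be sketched in isolation. This is a simplified illustration, not the code from `green_contract.rs`: the `Requirement` and `Evidence` types, the numeric `u8` level, and the helper names `missing_requirements` and `merge_ready` are all assumptions made for this sketch.

```rust
/// Requirements a merge-ready contract may report as missing (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Requirement {
    RequiredLevel,
    TestCommandProvenance,
    BaseBranchFreshness,
    RecoveryAttemptContext,
}

/// Hypothetical evidence bundle; the numeric level is an assumption of this sketch.
struct Evidence {
    observed_level: u8,
    passing_test_commands: Vec<String>,
    base_branch_fresh: bool,
    recovery_attempt_context_recorded: bool,
    blocking_flakes: Vec<String>,
}

/// Lists every requirement the evidence fails to meet, in a fixed order.
fn missing_requirements(required_level: u8, evidence: &Evidence) -> Vec<Requirement> {
    let mut missing = Vec::new();
    if evidence.observed_level < required_level {
        missing.push(Requirement::RequiredLevel);
    }
    if evidence.passing_test_commands.is_empty() {
        missing.push(Requirement::TestCommandProvenance);
    }
    if !evidence.base_branch_fresh {
        missing.push(Requirement::BaseBranchFreshness);
    }
    if !evidence.recovery_attempt_context_recorded {
        missing.push(Requirement::RecoveryAttemptContext);
    }
    missing
}

/// A raw test level alone is not enough: nothing may be missing and
/// no known flake may be marked as blocking.
fn merge_ready(required_level: u8, evidence: &Evidence) -> bool {
    missing_requirements(required_level, evidence).is_empty()
        && evidence.blocking_flakes.is_empty()
}
```

With a sufficient level and a passing test command but no freshness or recovery context, `missing_requirements` reports the two absent items and `merge_ready` returns `false`.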
bellman 607f071ca8 harden branch recovery reporting
Ensure branch-recovery verification surfaces compile cleanly under focused lint by preserving trusted-root fallback without clippy noise.

Constraint: G005 worker task requires verified branch/test awareness and recovery reporting evidence without mutating .omx/ultragoal.

Rejected: ignoring focused clippy failure | would leave modified tools surface with avoidable lint noise.

Confidence: high

Scope-risk: narrow

Directive: Keep recovery surfaces machine-readable; do not collapse test hangs back into generic timeouts.

Tested: cargo test -p runtime; cargo test -p tools targeted branch/hung/preflight tests; cargo check -p runtime -p tools; cargo clippy -p runtime --all-targets -- -D warnings; cargo clippy -p tools --lib --no-deps -- -D warnings.

Not-tested: full cargo test -p tools remains red on pre-existing permission-enforcer expectation failures unrelated to this change.
2026-05-14 18:33:48 +09:00
bellman d3f8ff9916 omx(team): auto-checkpoint worker-1 [1] 2026-05-14 18:28:21 +09:00
bellman 204af77596 Keep recovery recipe lint green for ledger reporting
Scoped to G005 recovery recipe status reporting verification; preserves existing machine-readable ledger/status fields and allows the intentionally long recovery attempt flow to satisfy strict clippy without touching unrelated bash lint debt.\n\nConstraint: Task scope limited to recovery_recipes.rs and smallest adjacent exports.\nRejected: Refactor attempt_recovery during branch recovery | higher regression risk than preserving established flow.\nConfidence: high\nScope-risk: narrow\nDirective: Do not expand this task into unrelated bash.rs clippy cleanup.\nTested: cargo fmt --manifest-path rust/Cargo.toml --all -- --check; cargo check --manifest-path rust/Cargo.toml -p runtime; cargo test --manifest-path rust/Cargo.toml -p runtime recovery_ -- --nocapture; cargo clippy --manifest-path rust/Cargo.toml -p runtime --lib -- -D warnings -A clippy::single-match-else\nNot-tested: full clippy without allow still fails on pre-existing rust/crates/runtime/src/bash.rs single_match_else outside task scope.
2026-05-14 18:26:58 +09:00
bellman 5c40d4e778 omx(team): auto-checkpoint worker-3 [4] 2026-05-14 18:26:55 +09:00
bellman 5625ba597b omx(team): auto-checkpoint worker-1 [1] 2026-05-14 18:26:49 +09:00
bellman 4f60cf70f1 omx(team): merge worker-2 2026-05-14 18:24:51 +09:00
bellman 6a37442ee1 omx(team): auto-checkpoint worker-2 [3] 2026-05-14 18:24:51 +09:00
bellman 0bca524c8c omx(team): auto-checkpoint worker-1 [1] 2026-05-14 18:22:37 +09:00
bellman 2ad56860df omx(team): merge worker-1 2026-05-14 18:21:26 +09:00
bellman 1fbde9f47f omx(team): auto-checkpoint worker-1 [1] 2026-05-14 18:21:26 +09:00
10 changed files with 899 additions and 52 deletions
@@ -0,0 +1,40 @@
# G005 Branch Recovery Verification Map
Scope: worker-1 follow-up map for G005 branch/test awareness and recovery. This file intentionally does not mutate leader-owned `.omx/ultragoal` state.
## Covered ROADMAP / PRD pinpoints
- `ROADMAP.md:912-921` — Phase 3 §7 stale-branch detection before broad verification: broad workspace test commands are preflighted before execution, stale/diverged branches emit `branch.stale_against_main`, and targeted tests bypass the broad-test gate.
- `ROADMAP.md:922-933` — Phase 3 §8 recovery recipes: stale-branch recovery remains represented by the `stale_branch` recipe, with one automatic attempt before escalation.
- `ROADMAP.md:935-949` — Phase 3 §8.5 recovery attempt ledger: `RecoveryContext` now exposes ledger entries with recipe id, attempt count, state, started/finished markers, last failure summary, and escalation reason.
- `ROADMAP.md:951-970` — Phase 3 §9 green-ness / hung-test reporting: timed-out test commands now classify as `test.hung` with structured provenance instead of generic timeout.
- `prd.json:37-44` — US-003 stale-branch detection before broad verification: verified through the `workspace_test_branch_preflight` broad-test block and targeted-test bypass tests.
- `prd.json:50-57` — US-004 recovery recipes with ledger: verified through recovery ledger unit coverage and serialization-compatible recovery structs.
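
The hung-test classification named in the pinpoints above can be illustrated with a small standalone sketch. It mirrors the substring approach of `is_test_command` in this change set; the `normalize` and `classify_timeout` helpers are hypothetical names introduced only for this example.

```rust
/// Collapse whitespace runs and lowercase the command so that
/// `cargo   TEST` and `cargo test` compare equal.
fn normalize(command: &str) -> String {
    command
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_ascii_lowercase()
}

/// Substring match against a fixed list of test runners.
/// A plain substring match also fires on comments or arguments that
/// merely mention a runner, e.g. `sleep 1 # cargo test slow_case`.
fn is_test_command(command: &str) -> bool {
    const RUNNERS: [&str; 6] = [
        "cargo test",
        "cargo nextest",
        "npm test",
        "pnpm test",
        "yarn test",
        "pytest",
    ];
    let normalized = normalize(command);
    RUNNERS.iter().any(|runner| normalized.contains(runner))
}

/// Label for a command that exceeded its timeout (hypothetical helper).
fn classify_timeout(command: &str) -> &'static str {
    if is_test_command(command) {
        "test.hung"
    } else {
        "timeout"
    }
}
```

Because the match is purely textual, a shell comment that mentions a runner is enough to trigger the `test.hung` label, which is the behavior the runtime unit test in this diff relies on.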
## Implementation anchors
- `rust/crates/runtime/src/stale_branch.rs` — existing branch freshness model and policy actions for fresh, stale, and diverged branches.
- `rust/crates/tools/src/lib.rs` — `workspace_test_branch_preflight`, `branch_divergence_output`, Bash/PowerShell broad-test gating, and `test.hung` structured timeout provenance on tool-shell timeouts.
- `rust/crates/runtime/src/recovery_recipes.rs` — recovery recipes plus `RecoveryLedgerEntry` / `RecoveryAttemptState` ledger surface.
- `rust/crates/runtime/src/bash.rs` — runtime Bash timeout classification and structured provenance for hung test commands.
- `rust/crates/runtime/src/lib.rs` — public exports for the recovery ledger types.
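
As a rough illustration of the ledger surface anchored above, the sketch below tracks attempts per recipe and flips to an exhausted state once the retry limit is reached. It is a reduced model, not the `RecoveryLedgerEntry` from `recovery_recipes.rs`: the `Ledger`, `LedgerEntry`, and `AttemptState` types, the string recipe key, and the `step_ok` flag standing in for real step execution are all assumptions.

```rust
use std::collections::HashMap;

/// Reduced set of attempt states for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AttemptState {
    Queued,
    Succeeded,
    Failed,
    Exhausted,
}

/// Hypothetical, trimmed-down ledger entry.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq)]
struct LedgerEntry {
    recipe_id: String,
    attempt_count: u32,
    retry_limit: u32,
    attempts_remaining: u32,
    state: AttemptState,
    escalation_reason: Option<String>,
}

/// Ledger keyed by recipe id (the real code keys by failure scenario).
#[derive(Default)]
struct Ledger {
    entries: HashMap<String, LedgerEntry>,
}

impl Ledger {
    /// Records one recovery attempt. `step_ok` simulates whether the
    /// recipe steps succeeded; no real recovery work happens here.
    fn attempt(&mut self, recipe_id: &str, retry_limit: u32, step_ok: bool) -> AttemptState {
        let entry = self
            .entries
            .entry(recipe_id.to_string())
            .or_insert_with(|| LedgerEntry {
                recipe_id: recipe_id.to_string(),
                attempt_count: 0,
                retry_limit,
                attempts_remaining: retry_limit,
                state: AttemptState::Queued,
                escalation_reason: None,
            });

        // Once the limit is reached, escalate instead of retrying.
        if entry.attempt_count >= entry.retry_limit {
            entry.attempts_remaining = 0;
            entry.state = AttemptState::Exhausted;
            entry.escalation_reason = Some(format!(
                "max recovery attempts ({}) exceeded for {}",
                entry.retry_limit, recipe_id
            ));
            return entry.state;
        }

        entry.attempt_count += 1;
        entry.attempts_remaining = entry.retry_limit.saturating_sub(entry.attempt_count);
        entry.state = if step_ok {
            AttemptState::Succeeded
        } else {
            AttemptState::Failed
        };
        entry.state
    }
}
```

With a retry limit of 1, the first call records a normal attempt and the second call leaves the attempt count unchanged while recording the escalation reason.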
## Verification evidence
- `cargo test -p runtime` → PASS: 538 unit tests, 2 G004 conformance tests, 12 integration tests, and doctests passed.
- `cargo test -p tools bash_tool_classifies_test_timeout_as_hung_with_provenance -- --nocapture` → PASS.
- `cargo test -p tools bash_workspace_tests_are_blocked_when_branch_is_behind_main -- --nocapture` → PASS.
- `cargo test -p tools bash_targeted_tests_skip_branch_preflight -- --nocapture` → PASS.
- `cargo check -p runtime -p tools` → PASS.
- `cargo clippy -p runtime --all-targets -- -D warnings` → PASS.
- `cargo clippy -p tools --lib --no-deps -- -D warnings` → PASS.
## Known unresolved / out-of-scope items
- Full `cargo test -p tools` is still red on six permission-enforcer expectation tests unrelated to G005 branch freshness, recovery ledger, or hung-test classification. The failing tests assert old permission wording/read-only behavior and pre-existed this follow-up scope.
- ROADMAP stale-base JSON/doctor/status pinpoints remain broader CLI diagnostic-surface work, especially `ROADMAP.md:2425-2489`, `ROADMAP.md:4346-4431`, and `ROADMAP.md:5061-5086`. They are related to branch freshness, but task 1 only required the broad-test freshness gate and narrow reporting surfaces.
- No `.omx/ultragoal` files were changed; leader-owned Ultragoal checkpointing remains outside worker scope.
## Delegation evidence
Subagent spawn evidence: 1, Repository map probe `019e25d5-9be9-7193-8a33-f21450beb62c`; spawned before further serial task-2 mapping per contract, but errored with 429 Too Many Requests, so direct repo evidence was integrated instead.
+91 -21
@@ -4,6 +4,7 @@ use std::process::{Command, Stdio};
use std::time::Duration;
use serde::{Deserialize, Serialize};
use serde_json::json;
use tokio::process::Command as TokioCommand;
use tokio::runtime::Builder;
use tokio::time::timeout;
@@ -176,27 +177,10 @@ async fn execute_bash_async(
let mut command = prepare_tokio_command(&input.command, &cwd, &sandbox_status, true);
let output_result = if let Some(timeout_ms) = input.timeout {
match timeout(Duration::from_millis(timeout_ms), command.output()).await {
Ok(result) => (result?, false),
Err(_) => {
return Ok(BashCommandOutput {
stdout: String::new(),
stderr: format!("Command exceeded timeout of {timeout_ms} ms"),
raw_output_path: None,
interrupted: true,
is_image: None,
background_task_id: None,
backgrounded_by_user: None,
assistant_auto_backgrounded: None,
dangerously_disable_sandbox: input.dangerously_disable_sandbox,
return_code_interpretation: Some(String::from("timeout")),
no_output_expected: Some(true),
structured_content: None,
persisted_output_path: None,
persisted_output_size: None,
sandbox_status: Some(sandbox_status),
});
}
if let Ok(result) = timeout(Duration::from_millis(timeout_ms), command.output()).await {
(result?, false)
} else {
return Ok(timeout_output(&input, timeout_ms, sandbox_status));
}
} else {
(command.output().await?, false)
@@ -233,6 +217,67 @@ async fn execute_bash_async(
})
}
fn timeout_output(
input: &BashCommandInput,
timeout_ms: u64,
sandbox_status: SandboxStatus,
) -> BashCommandOutput {
let is_test = is_test_command(&input.command);
let return_code_interpretation = if is_test { "test.hung" } else { "timeout" };
BashCommandOutput {
stdout: String::new(),
stderr: format!("Command exceeded timeout of {timeout_ms} ms"),
raw_output_path: None,
interrupted: true,
is_image: None,
background_task_id: None,
backgrounded_by_user: None,
assistant_auto_backgrounded: None,
dangerously_disable_sandbox: input.dangerously_disable_sandbox,
return_code_interpretation: Some(String::from(return_code_interpretation)),
no_output_expected: Some(true),
structured_content: Some(vec![test_timeout_provenance(
&input.command,
timeout_ms,
is_test,
)]),
persisted_output_path: None,
persisted_output_size: None,
sandbox_status: Some(sandbox_status),
}
}
fn is_test_command(command: &str) -> bool {
let normalized = command
.split_whitespace()
.collect::<Vec<_>>()
.join(" ")
.to_ascii_lowercase();
normalized.contains("cargo test")
|| normalized.contains("cargo nextest")
|| normalized.contains("npm test")
|| normalized.contains("pnpm test")
|| normalized.contains("yarn test")
|| normalized.contains("pytest")
}
fn test_timeout_provenance(
command: &str,
timeout_ms: u64,
classified_as_test_hang: bool,
) -> serde_json::Value {
json!({
"event": if classified_as_test_hang { "test.hung" } else { "command.timeout" },
"failureClass": if classified_as_test_hang { "test_hang" } else { "timeout" },
"data": {
"command": command,
"timeoutMs": timeout_ms,
"provenance": "bash.timeout",
"classification": if classified_as_test_hang { "test.hung" } else { "timeout" }
}
})
}
fn sandbox_status_for_input(input: &BashCommandInput, cwd: &std::path::Path) -> SandboxStatus {
let config = ConfigLoader::default_for(cwd).load().map_or_else(
|_| SandboxConfig::default(),
@@ -349,6 +394,31 @@ mod tests {
assert!(!output.sandbox_status.expect("sandbox status").enabled);
}
#[test]
fn timed_out_test_command_is_classified_as_hung_test_with_provenance() {
let output = execute_bash(BashCommandInput {
command: String::from("sleep 1 # cargo test slow_case"),
timeout: Some(1),
description: None,
run_in_background: Some(false),
dangerously_disable_sandbox: Some(false),
namespace_restrictions: Some(false),
isolate_network: Some(false),
filesystem_mode: Some(FilesystemIsolationMode::WorkspaceOnly),
allowed_mounts: None,
})
.expect("bash command should return structured timeout");
assert!(output.interrupted);
assert_eq!(
output.return_code_interpretation.as_deref(),
Some("test.hung")
);
let structured = output.structured_content.expect("structured content");
assert_eq!(structured[0]["event"], "test.hung");
assert_eq!(structured[0]["data"]["provenance"], "bash.timeout");
}
}
/// Maximum output bytes before truncation (16 KiB, matching upstream).
+261 -4
@@ -27,19 +27,38 @@ impl std::fmt::Display for GreenLevel {
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct GreenContract {
pub required_level: GreenLevel,
pub requirements: Vec<GreenContractRequirement>,
pub block_known_flakes: bool,
}
impl GreenContract {
#[must_use]
pub fn new(required_level: GreenLevel) -> Self {
Self { required_level }
Self {
required_level,
requirements: Vec::new(),
block_known_flakes: false,
}
}
#[must_use]
pub fn evaluate(self, observed_level: Option<GreenLevel>) -> GreenContractOutcome {
pub fn merge_ready(required_level: GreenLevel) -> Self {
Self {
required_level,
requirements: vec![
GreenContractRequirement::TestCommandProvenance,
GreenContractRequirement::BaseBranchFreshness,
GreenContractRequirement::RecoveryAttemptContext,
],
block_known_flakes: true,
}
}
#[must_use]
pub fn evaluate(&self, observed_level: Option<GreenLevel>) -> GreenContractOutcome {
match observed_level {
Some(level) if level >= self.required_level => GreenContractOutcome::Satisfied {
required_level: self.required_level,
@@ -53,11 +72,170 @@ impl GreenContract {
}
#[must_use]
pub fn is_satisfied_by(self, observed_level: GreenLevel) -> bool {
pub fn evaluate_evidence(&self, evidence: &GreenEvidence) -> GreenEvidenceOutcome {
let mut missing = Vec::new();
let mut blocking_flakes = Vec::new();
if evidence.observed_level < self.required_level {
missing.push(GreenContractRequirement::RequiredLevel);
}
for requirement in &self.requirements {
match requirement {
GreenContractRequirement::TestCommandProvenance
if !evidence.has_passing_test_command() =>
{
missing.push(*requirement);
}
GreenContractRequirement::BaseBranchFreshness if !evidence.base_branch_fresh => {
missing.push(*requirement);
}
GreenContractRequirement::RecoveryAttemptContext
if !evidence.recovery_attempt_context_recorded =>
{
missing.push(*requirement);
}
_ => {}
}
}
if self.block_known_flakes {
blocking_flakes = evidence
.known_flakes
.iter()
.filter(|flake| flake.blocks_green)
.cloned()
.collect();
}
if missing.is_empty() && blocking_flakes.is_empty() {
GreenEvidenceOutcome::Satisfied {
required_level: self.required_level,
observed_level: evidence.observed_level,
}
} else {
GreenEvidenceOutcome::Unsatisfied {
required_level: self.required_level,
observed_level: evidence.observed_level,
missing,
blocking_flakes,
}
}
}
#[must_use]
pub fn is_satisfied_by(&self, observed_level: GreenLevel) -> bool {
observed_level >= self.required_level
}
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct GreenEvidence {
pub observed_level: GreenLevel,
pub test_commands: Vec<TestCommandProvenance>,
pub base_branch_fresh: bool,
pub known_flakes: Vec<KnownFlake>,
pub recovery_attempt_context_recorded: bool,
}
impl GreenEvidence {
#[must_use]
pub fn new(observed_level: GreenLevel) -> Self {
Self {
observed_level,
test_commands: Vec::new(),
base_branch_fresh: false,
known_flakes: Vec::new(),
recovery_attempt_context_recorded: false,
}
}
#[must_use]
pub fn with_test_command(mut self, command: impl Into<String>, exit_code: i32) -> Self {
self.test_commands.push(TestCommandProvenance {
command: command.into(),
exit_code,
});
self
}
#[must_use]
pub fn with_base_branch_fresh(mut self, is_fresh: bool) -> Self {
self.base_branch_fresh = is_fresh;
self
}
#[must_use]
pub fn with_known_flake(mut self, test_name: impl Into<String>, blocks_green: bool) -> Self {
self.known_flakes.push(KnownFlake {
test_name: test_name.into(),
blocks_green,
});
self
}
#[must_use]
pub fn with_recovery_attempt_context(mut self, recorded: bool) -> Self {
self.recovery_attempt_context_recorded = recorded;
self
}
#[must_use]
pub fn has_passing_test_command(&self) -> bool {
self.test_commands.iter().any(TestCommandProvenance::passed)
}
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct TestCommandProvenance {
pub command: String,
pub exit_code: i32,
}
impl TestCommandProvenance {
#[must_use]
pub fn passed(&self) -> bool {
self.exit_code == 0 && !self.command.trim().is_empty()
}
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct KnownFlake {
pub test_name: String,
pub blocks_green: bool,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum GreenContractRequirement {
RequiredLevel,
TestCommandProvenance,
BaseBranchFreshness,
RecoveryAttemptContext,
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[serde(tag = "outcome", rename_all = "snake_case")]
pub enum GreenEvidenceOutcome {
Satisfied {
required_level: GreenLevel,
observed_level: GreenLevel,
},
Unsatisfied {
required_level: GreenLevel,
observed_level: GreenLevel,
missing: Vec<GreenContractRequirement>,
blocking_flakes: Vec<KnownFlake>,
},
}
impl GreenEvidenceOutcome {
#[must_use]
pub fn is_satisfied(&self) -> bool {
matches!(self, Self::Satisfied { .. })
}
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[serde(tag = "outcome", rename_all = "snake_case")]
pub enum GreenContractOutcome {
@@ -149,4 +327,83 @@ mod tests {
}
);
}
#[test]
fn merge_ready_contract_requires_provenance_beyond_test_level() {
// given
let contract = GreenContract::merge_ready(GreenLevel::Workspace);
let evidence = GreenEvidence::new(GreenLevel::Workspace)
.with_test_command("cargo test --manifest-path rust/Cargo.toml", 0);
// when
let outcome = contract.evaluate_evidence(&evidence);
// then
assert_eq!(
outcome,
GreenEvidenceOutcome::Unsatisfied {
required_level: GreenLevel::Workspace,
observed_level: GreenLevel::Workspace,
missing: vec![
GreenContractRequirement::BaseBranchFreshness,
GreenContractRequirement::RecoveryAttemptContext,
],
blocking_flakes: vec![],
}
);
assert!(!outcome.is_satisfied());
}
#[test]
fn merge_ready_contract_accepts_complete_test_provenance_context() {
// given
let contract = GreenContract::merge_ready(GreenLevel::Workspace);
let evidence = GreenEvidence::new(GreenLevel::MergeReady)
.with_test_command("cargo test --manifest-path rust/Cargo.toml", 0)
.with_base_branch_fresh(true)
.with_recovery_attempt_context(true);
// when
let outcome = contract.evaluate_evidence(&evidence);
// then
assert_eq!(
outcome,
GreenEvidenceOutcome::Satisfied {
required_level: GreenLevel::Workspace,
observed_level: GreenLevel::MergeReady,
}
);
}
#[test]
fn known_blocking_flake_prevents_green_contract_satisfaction() {
// given
let contract = GreenContract::merge_ready(GreenLevel::Workspace);
let evidence = GreenEvidence::new(GreenLevel::MergeReady)
.with_test_command("cargo test --manifest-path rust/Cargo.toml", 0)
.with_base_branch_fresh(true)
.with_recovery_attempt_context(true)
.with_known_flake(
"session_lifecycle_prefers_running_process_over_idle_shell",
true,
);
// when
let outcome = contract.evaluate_evidence(&evidence);
// then
assert_eq!(
outcome,
GreenEvidenceOutcome::Unsatisfied {
required_level: GreenLevel::Workspace,
observed_level: GreenLevel::MergeReady,
missing: vec![],
blocking_flakes: vec![KnownFlake {
test_name: "session_lifecycle_prefers_running_process_over_idle_shell"
.to_string(),
blocks_green: true,
}],
}
);
}
}
+3 -2
@@ -143,8 +143,9 @@ pub use prompt::{
PromptBuildError, SystemPromptBuilder, FRONTIER_MODEL_NAME, SYSTEM_PROMPT_DYNAMIC_BOUNDARY,
};
pub use recovery_recipes::{
attempt_recovery, recipe_for, EscalationPolicy, FailureScenario, RecoveryContext,
RecoveryEvent, RecoveryRecipe, RecoveryResult, RecoveryStep,
attempt_recovery, recipe_for, EscalationPolicy, FailureScenario, RecoveryAttemptState,
RecoveryAttemptType, RecoveryCommandResult, RecoveryContext, RecoveryEvent,
RecoveryLedgerEntry, RecoveryRecipe, RecoveryResult, RecoveryStatusReport, RecoveryStep,
};
pub use remote::{
inherited_upstream_proxy_env, no_proxy_list, read_token, upstream_proxy_ws_url,
+46 -3
@@ -58,7 +58,9 @@ impl PolicyCondition {
Self::Or(conditions) => conditions
.iter()
.any(|condition| condition.matches(context)),
Self::GreenAt { level } => context.green_level >= *level,
Self::GreenAt { level } => {
context.green_contract_satisfied && context.green_level >= *level
}
Self::StaleBranch => context.branch_freshness >= STALE_BRANCH_THRESHOLD,
Self::StartupBlocked => context.blocker == LaneBlocker::Startup,
Self::LaneCompleted => context.completed,
@@ -134,6 +136,7 @@ pub enum DiffScope {
pub struct LaneContext {
pub lane_id: String,
pub green_level: GreenLevel,
pub green_contract_satisfied: bool,
pub branch_freshness: Duration,
pub blocker: LaneBlocker,
pub review_status: ReviewStatus,
@@ -156,6 +159,7 @@ impl LaneContext {
Self {
lane_id: lane_id.into(),
green_level,
green_contract_satisfied: false,
branch_freshness,
blocker,
review_status,
@@ -171,6 +175,7 @@ impl LaneContext {
Self {
lane_id: lane_id.into(),
green_level: 0,
green_contract_satisfied: false,
branch_freshness: Duration::from_secs(0),
blocker: LaneBlocker::None,
review_status: ReviewStatus::Pending,
@@ -179,6 +184,12 @@ impl LaneContext {
reconciled: true,
}
}
#[must_use]
pub fn with_green_contract_satisfied(mut self, satisfied: bool) -> Self {
self.green_contract_satisfied = satisfied;
self
}
}
#[derive(Debug, Clone, PartialEq, Eq)]
@@ -257,7 +268,8 @@ mod tests {
ReviewStatus::Approved,
DiffScope::Scoped,
false,
);
)
.with_green_contract_satisfied(true);
// when
let actions = engine.evaluate(&context);
@@ -266,6 +278,36 @@ mod tests {
assert_eq!(actions, vec![PolicyAction::MergeToDev]);
}
#[test]
fn merge_rule_blocks_when_green_tests_lack_contract_provenance() {
// given
let engine = PolicyEngine::new(vec![PolicyRule::new(
"merge-to-dev",
PolicyCondition::And(vec![
PolicyCondition::GreenAt { level: 2 },
PolicyCondition::ScopedDiff,
PolicyCondition::ReviewPassed,
]),
PolicyAction::MergeToDev,
20,
)]);
let context = LaneContext::new(
"lane-7",
3,
Duration::from_secs(5),
LaneBlocker::None,
ReviewStatus::Approved,
DiffScope::Scoped,
false,
);
// when
let actions = engine.evaluate(&context);
// then
assert!(actions.is_empty());
}
#[test]
fn stale_branch_rule_fires_at_threshold() {
// given
@@ -468,7 +510,8 @@ mod tests {
ReviewStatus::Pending,
DiffScope::Full,
false,
);
)
.with_green_contract_satisfied(true);
// when
let actions = engine.evaluate(&context);
+311 -5
@@ -121,6 +121,21 @@ pub enum RecoveryResult {
},
}
/// Type of recovery execution represented in the ledger.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum RecoveryAttemptType {
Automatic,
}
/// Result for one executable recovery command/step.
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct RecoveryCommandResult {
pub command: RecoveryStep,
pub status: RecoveryAttemptState,
pub result: String,
}
/// Structured event emitted during recovery.
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
@@ -135,14 +150,59 @@ pub enum RecoveryEvent {
Escalated,
}
/// Machine-readable recovery progress for one failure scenario.
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct RecoveryLedgerEntry {
pub recipe_id: String,
pub attempt_type: RecoveryAttemptType,
pub trigger: FailureScenario,
pub attempt_count: u32,
pub retry_limit: u32,
pub attempts_remaining: u32,
pub state: RecoveryAttemptState,
pub started_at: Option<String>,
pub finished_at: Option<String>,
pub command_results: Vec<RecoveryCommandResult>,
pub result: Option<RecoveryResult>,
pub last_failure_summary: Option<String>,
pub escalation_reason: Option<String>,
}
/// Current state of a recovery recipe attempt.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum RecoveryAttemptState {
Queued,
Running,
Succeeded,
Failed,
Exhausted,
}
/// Machine-readable status projection for callers that need to
/// distinguish an untouched scenario from an exhausted recovery.
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct RecoveryStatusReport {
pub scenario: FailureScenario,
pub attempted: bool,
pub state: Option<RecoveryAttemptState>,
pub attempt_count: u32,
pub retry_limit: Option<u32>,
pub attempts_remaining: Option<u32>,
pub escalation_reason: Option<String>,
}
/// Minimal context for tracking recovery state and emitting events.
///
/// Holds per-scenario attempt counts, a structured event log, and an
/// optional simulation knob for controlling step outcomes during tests.
/// Holds per-scenario attempt counts, a structured event log, a recovery
/// attempt ledger, and an optional simulation knob for controlling step
/// outcomes during tests.
#[derive(Debug, Clone, Default)]
pub struct RecoveryContext {
attempts: HashMap<FailureScenario, u32>,
events: Vec<RecoveryEvent>,
ledger: HashMap<FailureScenario, RecoveryLedgerEntry>,
clock_tick: u64,
/// Optional step index at which simulated execution fails.
/// `None` means all steps succeed.
fail_at_step: Option<usize>,
@@ -172,6 +232,51 @@ impl RecoveryContext {
pub fn attempt_count(&self, scenario: &FailureScenario) -> u32 {
self.attempts.get(scenario).copied().unwrap_or(0)
}
/// Returns the machine-readable recovery ledger entry for a scenario.
#[must_use]
pub fn ledger_entry(&self, scenario: &FailureScenario) -> Option<&RecoveryLedgerEntry> {
self.ledger.get(scenario)
}
/// Returns all recovery ledger entries currently tracked by this context.
#[must_use]
pub fn ledger_entries(&self) -> Vec<&RecoveryLedgerEntry> {
let mut entries: Vec<_> = self.ledger.values().collect();
entries.sort_by(|left, right| left.recipe_id.cmp(&right.recipe_id));
entries
}
/// Returns a compact machine-readable recovery status for a scenario,
/// including `attempted = false` when no ledger entry exists yet.
#[must_use]
pub fn status_report(&self, scenario: &FailureScenario) -> RecoveryStatusReport {
self.ledger_entry(scenario).map_or(
RecoveryStatusReport {
scenario: *scenario,
attempted: false,
state: None,
attempt_count: 0,
retry_limit: None,
attempts_remaining: None,
escalation_reason: None,
},
|entry| RecoveryStatusReport {
scenario: *scenario,
attempted: entry.attempt_count > 0,
state: Some(entry.state),
attempt_count: entry.attempt_count,
retry_limit: Some(entry.retry_limit),
attempts_remaining: Some(entry.attempts_remaining),
escalation_reason: entry.escalation_reason.clone(),
},
)
}
fn next_timestamp(&mut self) -> String {
self.clock_tick += 1;
format!("recovery-ledger-tick-{}", self.clock_tick)
}
}
/// Returns the known recovery recipe for the given failure scenario.
@@ -233,18 +338,51 @@ pub fn recipe_for(scenario: &FailureScenario) -> RecoveryRecipe {
/// Looks up the recipe, enforces the one-attempt-before-escalation
/// policy, simulates step execution (controlled by the context), and
/// emits structured [`RecoveryEvent`]s for every attempt.
#[allow(clippy::too_many_lines)]
pub fn attempt_recovery(scenario: &FailureScenario, ctx: &mut RecoveryContext) -> RecoveryResult {
let recipe = recipe_for(scenario);
let attempt_count = ctx.attempts.entry(*scenario).or_insert(0);
let recipe_id = scenario.to_string();
ctx.ledger
.entry(*scenario)
.or_insert_with(|| RecoveryLedgerEntry {
recipe_id: recipe_id.clone(),
attempt_type: RecoveryAttemptType::Automatic,
trigger: *scenario,
attempt_count: 0,
retry_limit: recipe.max_attempts,
attempts_remaining: recipe.max_attempts,
state: RecoveryAttemptState::Queued,
started_at: None,
finished_at: None,
command_results: Vec::new(),
result: None,
last_failure_summary: None,
escalation_reason: None,
});
let current_attempts = ctx.attempt_count(scenario);
// Enforce one automatic recovery attempt before escalation.
if *attempt_count >= recipe.max_attempts {
if current_attempts >= recipe.max_attempts {
let result = RecoveryResult::EscalationRequired {
reason: format!(
"max recovery attempts ({}) exceeded for {}",
recipe.max_attempts, scenario
),
};
let finished_at = ctx.next_timestamp();
if let Some(entry) = ctx.ledger.get_mut(scenario) {
entry.attempt_count = current_attempts;
entry.attempts_remaining = 0;
entry.state = RecoveryAttemptState::Exhausted;
entry.finished_at = Some(finished_at);
entry.result = Some(result.clone());
let RecoveryResult::EscalationRequired { reason } = &result else {
unreachable!("exhaustion always produces escalation");
};
entry.last_failure_summary = Some(reason.clone());
entry.escalation_reason = Some(reason.clone());
}
ctx.events.push(RecoveryEvent::RecoveryAttempted {
scenario: *scenario,
recipe,
@@ -254,19 +392,44 @@ pub fn attempt_recovery(scenario: &FailureScenario, ctx: &mut RecoveryContext) -
return result;
}
*attempt_count += 1;
let updated_attempts = ctx.attempts.entry(*scenario).or_insert(0);
*updated_attempts += 1;
let updated_attempts = *updated_attempts;
let started_at = ctx.next_timestamp();
if let Some(entry) = ctx.ledger.get_mut(scenario) {
entry.attempt_count = updated_attempts;
entry.attempts_remaining = recipe.max_attempts.saturating_sub(updated_attempts);
entry.state = RecoveryAttemptState::Running;
entry.started_at = Some(started_at);
entry.finished_at = None;
entry.command_results.clear();
entry.result = None;
entry.last_failure_summary = None;
entry.escalation_reason = None;
}
// Execute steps, honoring the optional fail_at_step simulation.
let fail_index = ctx.fail_at_step;
let mut executed = Vec::new();
let mut command_results = Vec::new();
let mut failed = false;
for (i, step) in recipe.steps.iter().enumerate() {
if fail_index == Some(i) {
command_results.push(RecoveryCommandResult {
command: step.clone(),
status: RecoveryAttemptState::Failed,
result: format!("step {i} failed for {scenario}"),
});
failed = true;
break;
}
executed.push(step.clone());
command_results.push(RecoveryCommandResult {
command: step.clone(),
status: RecoveryAttemptState::Succeeded,
result: format!("step {i} succeeded for {scenario}"),
});
}
let result = if failed {
@@ -288,6 +451,29 @@ pub fn attempt_recovery(scenario: &FailureScenario, ctx: &mut RecoveryContext) -
};
// Emit the attempt as structured event data.
let finished_at = ctx.next_timestamp();
if let Some(entry) = ctx.ledger.get_mut(scenario) {
entry.finished_at = Some(finished_at);
entry.command_results = command_results;
entry.result = Some(result.clone());
match &result {
RecoveryResult::Recovered { .. } => {
entry.state = RecoveryAttemptState::Succeeded;
}
RecoveryResult::PartialRecovery { remaining, .. } => {
entry.state = RecoveryAttemptState::Failed;
entry.last_failure_summary = Some(format!(
"{} step(s) remaining after partial recovery",
remaining.len()
));
}
RecoveryResult::EscalationRequired { reason } => {
entry.state = RecoveryAttemptState::Exhausted;
entry.last_failure_summary = Some(reason.clone());
entry.escalation_reason = Some(reason.clone());
}
}
}
ctx.events.push(RecoveryEvent::RecoveryAttempted {
scenario: *scenario,
recipe,
@@ -499,6 +685,126 @@ mod tests {
assert_eq!(ctx.attempt_count(&FailureScenario::PromptMisdelivery), 0);
}
#[test]
fn recovery_context_exposes_machine_readable_ledger() {
// given
let mut ctx = RecoveryContext::new();
// when
let result = attempt_recovery(&FailureScenario::StaleBranch, &mut ctx);
// then
assert_eq!(result, RecoveryResult::Recovered { steps_taken: 2 });
let entry = ctx
.ledger_entry(&FailureScenario::StaleBranch)
.expect("stale branch ledger entry");
assert_eq!(entry.recipe_id, "stale_branch");
assert_eq!(entry.attempt_type, RecoveryAttemptType::Automatic);
assert_eq!(entry.trigger, FailureScenario::StaleBranch);
assert_eq!(entry.attempt_count, 1);
assert_eq!(entry.retry_limit, 1);
assert_eq!(entry.attempts_remaining, 0);
assert_eq!(entry.state, RecoveryAttemptState::Succeeded);
assert!(entry.started_at.is_some());
assert!(entry.finished_at.is_some());
assert_eq!(
entry.result,
Some(RecoveryResult::Recovered { steps_taken: 2 })
);
assert_eq!(entry.command_results.len(), 2);
assert_eq!(entry.command_results[0].command, RecoveryStep::RebaseBranch);
assert_eq!(
entry.command_results[0].status,
RecoveryAttemptState::Succeeded
);
assert_eq!(entry.last_failure_summary, None);
assert_eq!(entry.escalation_reason, None);
}
#[test]
fn recovery_ledger_records_exhausted_escalation_reason() {
// given
let mut ctx = RecoveryContext::new();
let scenario = FailureScenario::PromptMisdelivery;
// when
let _ = attempt_recovery(&scenario, &mut ctx);
let result = attempt_recovery(&scenario, &mut ctx);
// then
assert!(matches!(result, RecoveryResult::EscalationRequired { .. }));
let entry = ctx.ledger_entry(&scenario).expect("ledger entry");
assert_eq!(entry.state, RecoveryAttemptState::Exhausted);
assert_eq!(entry.attempt_count, 1);
assert_eq!(entry.attempts_remaining, 0);
assert!(matches!(
entry.result,
Some(RecoveryResult::EscalationRequired { .. })
));
assert!(entry
.escalation_reason
.as_deref()
.expect("escalation reason")
.contains("max recovery attempts"));
}
#[test]
fn recovery_status_report_distinguishes_not_attempted_from_exhausted() {
// given
let mut ctx = RecoveryContext::new();
let scenario = FailureScenario::PromptMisdelivery;
// then — no ledger entry is not the same as exhausted.
let not_attempted = ctx.status_report(&scenario);
assert!(!not_attempted.attempted);
assert_eq!(not_attempted.state, None);
assert_eq!(not_attempted.attempt_count, 0);
assert_eq!(not_attempted.retry_limit, None);
// when — one allowed attempt then one extra attempt.
let _ = attempt_recovery(&scenario, &mut ctx);
let _ = attempt_recovery(&scenario, &mut ctx);
// then
let exhausted = ctx.status_report(&scenario);
assert!(exhausted.attempted);
assert_eq!(exhausted.state, Some(RecoveryAttemptState::Exhausted));
assert_eq!(exhausted.attempt_count, 1);
assert_eq!(exhausted.retry_limit, Some(1));
assert_eq!(exhausted.attempts_remaining, Some(0));
assert!(exhausted
.escalation_reason
.as_deref()
.is_some_and(|reason| reason.contains("max recovery attempts")));
}
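The distinction these tests check (no ledger entry versus an exhausted entry) can be illustrated with a much smaller model. This is a minimal sketch, not the crate's `RecoveryContext`: `AttemptLedger`, `AttemptState`, and both methods are hypothetical names, and it assumes the retry limit is passed in by the caller rather than read from a recipe.

```rust
use std::collections::HashMap;

/// Hypothetical two-state model; the real ledger in the diff tracks more states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AttemptState {
    Succeeded,
    Exhausted,
}

/// Hypothetical ledger keyed by scenario name: (attempt_count, state).
#[derive(Debug, Default)]
struct AttemptLedger {
    entries: HashMap<String, (u32, AttemptState)>,
}

impl AttemptLedger {
    /// Record one attempt. Once the limit is reached, the count stops
    /// growing and the entry is marked exhausted instead.
    fn attempt(&mut self, scenario: &str, retry_limit: u32) -> AttemptState {
        let entry = self
            .entries
            .entry(scenario.to_string())
            .or_insert((0, AttemptState::Succeeded));
        if entry.0 >= retry_limit {
            entry.1 = AttemptState::Exhausted;
        } else {
            entry.0 += 1;
            entry.1 = AttemptState::Succeeded;
        }
        entry.1
    }

    /// `None` means "never attempted", which is distinct from exhausted.
    /// Returns (attempt_count, attempts_remaining, state) otherwise.
    fn status(&self, scenario: &str, retry_limit: u32) -> Option<(u32, u32, AttemptState)> {
        self.entries
            .get(scenario)
            .map(|(count, state)| (*count, retry_limit.saturating_sub(*count), *state))
    }
}
```

As in the tests above, a second call with a limit of 1 leaves the attempt count at 1 and flips the state to exhausted, while a scenario that was never attempted reports `None`.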
#[test]
fn recovery_ledger_records_failed_command_result() {
// given
let mut ctx = RecoveryContext::new().with_fail_at_step(1);
let scenario = FailureScenario::PartialPluginStartup;
// when
let result = attempt_recovery(&scenario, &mut ctx);
// then
assert!(matches!(result, RecoveryResult::PartialRecovery { .. }));
let entry = ctx.ledger_entry(&scenario).expect("ledger entry");
assert_eq!(entry.state, RecoveryAttemptState::Failed);
assert_eq!(entry.command_results.len(), 2);
assert_eq!(
entry.command_results[0].status,
RecoveryAttemptState::Succeeded
);
assert_eq!(
entry.command_results[1].status,
RecoveryAttemptState::Failed
);
assert!(entry.command_results[1]
.result
.contains("partial_plugin_startup"));
}
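The failed-step test relies on the step loop recording one result per step up to and including the failing one, then stopping. A minimal sketch of that loop shape, assuming steps are plain strings; `StepStatus` and `run_steps` are hypothetical names and do not appear in the diff:

```rust
/// Hypothetical per-step outcome, standing in for the ledger's command status.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StepStatus {
    Succeeded,
    Failed,
}

/// Walk the steps in order. If `fail_at` names an index, that step is
/// recorded as failed and no later step is executed or recorded.
fn run_steps(steps: &[&str], fail_at: Option<usize>) -> Vec<(String, StepStatus)> {
    let mut results = Vec::new();
    for (i, step) in steps.iter().enumerate() {
        if fail_at == Some(i) {
            results.push(((*step).to_string(), StepStatus::Failed));
            break;
        }
        results.push(((*step).to_string(), StepStatus::Succeeded));
    }
    results
}
```

With a simulated failure at index 1, the result list has exactly two entries (one succeeded, one failed), which matches the length the test asserts on `command_results`.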
#[test]
fn stale_branch_recipe_has_rebase_then_clean_build() {
// given
@@ -96,9 +96,7 @@ fn green_contract_unsatisfied_blocks_merge() {
false,
);
// This is a conceptual test — we need a way to express "requires workspace green"
// Currently LaneContext has raw green_level: u8, not a contract
// For now we just verify the policy condition works
// The context has a test level but lacks the full green contract, so merge stays blocked.
let engine = PolicyEngine::new(vec![PolicyRule::new(
"workspace-green-required",
PolicyCondition::GreenAt { level: 3 }, // GreenLevel::Workspace
@@ -267,7 +265,8 @@ fn fresh_approved_lane_gets_merge_action() {
ReviewStatus::Approved,
DiffScope::Scoped,
false,
);
)
.with_green_contract_satisfied(true);
let engine = PolicyEngine::new(vec![PolicyRule::new(
"merge-if-green-approved-not-stale",
@@ -357,7 +356,8 @@ fn worker_provider_failure_flows_through_recovery_to_policy() {
ReviewStatus::Approved,
DiffScope::Scoped,
false,
);
)
.with_green_contract_satisfied(true);
let policy_engine = PolicyEngine::new(vec![
// Rule: if recovered from failure + green + approved -> merge
+82 -8
@@ -45,11 +45,11 @@ use render::{MarkdownStreamState, Spinner, TerminalRenderer};
use runtime::{
check_base_commit, format_stale_base_warning, format_usd, load_oauth_credentials,
load_system_prompt, pricing_for_model, resolve_expected_base, resolve_sandbox_status,
ApiClient, ApiRequest, AssistantEvent, CompactionConfig, ConfigLoader, ConfigSource,
ContentBlock, ConversationMessage, ConversationRuntime, McpServer, McpServerManager,
McpServerSpec, McpTool, MessageRole, ModelPricing, PermissionMode, PermissionPolicy,
ProjectContext, PromptCacheEvent, ResolvedPermissionMode, RuntimeError, Session, TokenUsage,
ToolError, ToolExecutor, UsageTracker,
ApiClient, ApiRequest, AssistantEvent, BaseCommitState, CompactionConfig, ConfigLoader,
ConfigSource, ContentBlock, ConversationMessage, ConversationRuntime, McpServer,
McpServerManager, McpServerSpec, McpTool, MessageRole, ModelPricing, PermissionMode,
PermissionPolicy, ProjectContext, PromptCacheEvent, ResolvedPermissionMode, RuntimeError,
Session, TokenUsage, ToolError, ToolExecutor, UsageTracker,
};
use serde::Deserialize;
use serde_json::{json, Map, Value};
@@ -1973,6 +1973,7 @@ fn render_doctor_report() -> Result<DoctorReport, Box<dyn std::error::Error>> {
parse_git_status_metadata(project_context.git_status.as_deref());
let git_summary = parse_git_workspace_summary(project_context.git_status.as_deref());
let branch_freshness = BranchFreshness::from_git_status(project_context.git_status.as_deref());
let stale_base_state = stale_base_state_for(&cwd, None);
let empty_config = runtime::RuntimeConfig::empty();
let sandbox_config = config.as_ref().ok().unwrap_or(&empty_config);
let boot_preflight = build_boot_preflight_snapshot(
@@ -1995,6 +1996,7 @@ fn render_doctor_report() -> Result<DoctorReport, Box<dyn std::error::Error>> {
git_branch,
git_summary,
branch_freshness,
stale_base_state,
session_lifecycle: classify_session_lifecycle_for(&cwd),
boot_preflight,
sandbox_status: resolve_sandbox_status(sandbox_config.sandbox(), &cwd),
@@ -2334,9 +2336,10 @@ fn check_install_source_health() -> DiagnosticCheck {
fn check_workspace_health(context: &StatusContext) -> DiagnosticCheck {
let in_repo = context.project_root.is_some();
let stale_base_warning = format_stale_base_warning(&context.stale_base_state);
DiagnosticCheck::new(
"Workspace",
if in_repo {
if in_repo && stale_base_warning.is_none() {
DiagnosticLevel::Ok
} else {
DiagnosticLevel::Warn
@@ -2369,6 +2372,10 @@ fn check_workspace_health(context: &StatusContext) -> DiagnosticCheck {
"Memory files {} · config files loaded {}/{}",
context.memory_file_count, context.loaded_config_files, context.discovered_config_files
),
format!(
"Stale base {}",
stale_base_warning.as_deref().unwrap_or("ok")
),
])
.with_data(Map::from_iter([
("cwd".to_string(), json!(context.cwd.display().to_string())),
@@ -2401,6 +2408,10 @@ fn check_workspace_health(context: &StatusContext) -> DiagnosticCheck {
"discovered_config_files".to_string(),
json!(context.discovered_config_files),
),
(
"stale_base".to_string(),
stale_base_json_value(&context.stale_base_state),
),
]))
}
@@ -2920,6 +2931,7 @@ struct StatusContext {
git_branch: Option<String>,
git_summary: GitWorkspaceSummary,
branch_freshness: BranchFreshness,
stale_base_state: BaseCommitState,
session_lifecycle: SessionLifecycleSummary,
boot_preflight: BootPreflightSnapshot,
sandbox_status: runtime::SandboxStatus,
@@ -4167,12 +4179,30 @@ fn enforce_broad_cwd_policy(
}
}
fn stale_base_state_for(cwd: &Path, flag_value: Option<&str>) -> BaseCommitState {
let source = resolve_expected_base(flag_value, cwd);
check_base_commit(cwd, source.as_ref())
}
fn stale_base_json_value(state: &BaseCommitState) -> serde_json::Value {
match state {
BaseCommitState::Matches => json!({"status": "matches", "fresh": true}),
BaseCommitState::Diverged { expected, actual } => json!({
"status": "diverged",
"fresh": false,
"expected": expected,
"actual": actual,
}),
BaseCommitState::NoExpectedBase => json!({"status": "no_expected_base", "fresh": null}),
BaseCommitState::NotAGitRepo => json!({"status": "not_git_repo", "fresh": null}),
}
}
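The mapping in `stale_base_json_value` is three-valued: `fresh` is `true`, `false`, or `null` when freshness cannot be determined. A dependency-free sketch of the same idea, using `Option<bool>` in place of JSON `null`; `BaseState` and `status_and_freshness` are hypothetical names that only mirror the four variants shown in the diff:

```rust
/// Hypothetical mirror of the four `BaseCommitState` variants used in the diff.
#[derive(Debug, Clone, PartialEq, Eq)]
enum BaseState {
    Matches,
    Diverged { expected: String, actual: String },
    NoExpectedBase,
    NotAGitRepo,
}

/// Returns (status label, freshness). A `None` freshness corresponds to the
/// JSON `null` emitted when there is nothing to compare against.
fn status_and_freshness(state: &BaseState) -> (&'static str, Option<bool>) {
    match state {
        BaseState::Matches => ("matches", Some(true)),
        BaseState::Diverged { .. } => ("diverged", Some(false)),
        BaseState::NoExpectedBase => ("no_expected_base", None),
        BaseState::NotAGitRepo => ("not_git_repo", None),
    }
}
```

Keeping "unknown" separate from "stale" lets a consumer tell a missing expected base apart from a confirmed divergence.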
fn run_stale_base_preflight(flag_value: Option<&str>) {
let Ok(cwd) = env::current_dir() else {
return;
};
let source = resolve_expected_base(flag_value, &cwd);
let state = check_base_commit(&cwd, source.as_ref());
let state = stale_base_state_for(&cwd, flag_value);
if let Some(warning) = format_stale_base_warning(&state) {
eprintln!("{warning}");
}
@@ -6221,6 +6251,7 @@ fn status_context(
parse_git_status_metadata(project_context.git_status.as_deref());
let git_summary = parse_git_workspace_summary(project_context.git_status.as_deref());
let branch_freshness = BranchFreshness::from_git_status(project_context.git_status.as_deref());
let stale_base_state = stale_base_state_for(&cwd, None);
let boot_preflight = build_boot_preflight_snapshot(
&cwd,
project_root.as_deref(),
@@ -6238,6 +6269,7 @@ fn status_context(
git_branch,
git_summary,
branch_freshness,
stale_base_state,
session_lifecycle: classify_session_lifecycle_for(&cwd),
boot_preflight,
sandbox_status,
@@ -12567,6 +12599,7 @@ mod tests {
conflicted_files: 0,
},
branch_freshness: test_branch_freshness(),
stale_base_state: super::BaseCommitState::NoExpectedBase,
session_lifecycle: SessionLifecycleSummary {
kind: SessionLifecycleKind::IdleShell,
pane_id: Some("%7".to_string()),
@@ -12692,6 +12725,46 @@ mod tests {
fs::remove_dir_all(workspace).expect("cleanup temp dir");
}
#[test]
fn workspace_health_warns_when_stale_base_diverged() {
let context = super::StatusContext {
cwd: PathBuf::from("/tmp/project"),
session_path: None,
loaded_config_files: 0,
discovered_config_files: 0,
memory_file_count: 0,
project_root: Some(PathBuf::from("/tmp/project")),
git_branch: Some("feature/stale-base".to_string()),
git_summary: GitWorkspaceSummary::default(),
branch_freshness: test_branch_freshness(),
stale_base_state: super::BaseCommitState::Diverged {
expected: "base".to_string(),
actual: "head".to_string(),
},
session_lifecycle: SessionLifecycleSummary {
kind: SessionLifecycleKind::SavedOnly,
pane_id: None,
pane_command: None,
pane_path: None,
workspace_dirty: false,
abandoned: false,
},
boot_preflight: test_boot_preflight(),
sandbox_status: runtime::SandboxStatus::default(),
config_load_error: None,
};
let check = super::check_workspace_health(&context);
assert_eq!(check.level, super::DiagnosticLevel::Warn);
assert_eq!(check.data["stale_base"]["status"], "diverged");
assert_eq!(check.data["stale_base"]["fresh"], false);
assert!(check
.details
.iter()
.any(|detail| detail.contains("stale codebase")));
}
#[test]
fn status_json_surfaces_session_lifecycle_for_clawhip() {
let context = super::StatusContext {
@@ -12704,6 +12777,7 @@ mod tests {
git_branch: Some("feature/session-lifecycle".to_string()),
git_summary: GitWorkspaceSummary::default(),
branch_freshness: test_branch_freshness(),
stale_base_state: super::BaseCommitState::NoExpectedBase,
session_lifecycle: SessionLifecycleSummary {
kind: SessionLifecycleKind::RunningProcess,
pane_id: Some("%9".to_string()),
+2
@@ -56,6 +56,7 @@ pub(crate) fn detect_lane_completion(
Some(LaneContext {
lane_id: output.agent_id.clone(),
green_level: 3, // Workspace green
green_contract_satisfied: true,
branch_freshness: std::time::Duration::from_secs(0),
blocker: LaneBlocker::None,
review_status: ReviewStatus::Approved,
@@ -165,6 +166,7 @@ mod tests {
let context = LaneContext {
lane_id: "completed-lane".to_string(),
green_level: 3,
green_contract_satisfied: true,
branch_freshness: std::time::Duration::from_secs(0),
blocker: LaneBlocker::None,
review_status: ReviewStatus::Approved,
+58 -4
@@ -1503,8 +1503,10 @@ fn run_worker_create(input: WorkerCreateInput) -> Result<String, String> {
let merged_roots: Vec<String> = ConfigLoader::default_for(&input.cwd)
.load()
.ok()
.map(|config| config.trusted_roots_with_overrides(&input.trusted_roots))
.unwrap_or_else(|| input.trusted_roots.clone());
.map_or_else(
|| input.trusted_roots.clone(),
|config| config.trusted_roots_with_overrides(&input.trusted_roots),
);
let worker = global_worker_registry().create(
&input.cwd,
&merged_roots,
@@ -6212,6 +6214,8 @@ Command exceeded timeout of {timeout_ms} ms",
stderr.trim_end()
)
};
let is_test = is_test_command(command);
let return_code_interpretation = if is_test { "test.hung" } else { "timeout" };
return Ok(runtime::BashCommandOutput {
stdout: String::from_utf8_lossy(&output.stdout).into_owned(),
stderr,
@@ -6222,9 +6226,11 @@ Command exceeded timeout of {timeout_ms} ms",
backgrounded_by_user: None,
assistant_auto_backgrounded: None,
dangerously_disable_sandbox: None,
return_code_interpretation: Some(String::from("timeout")),
return_code_interpretation: Some(String::from(return_code_interpretation)),
no_output_expected: Some(false),
structured_content: None,
structured_content: Some(vec![test_timeout_provenance(
command, timeout_ms, is_test,
)]),
persisted_output_path: None,
persisted_output_size: None,
sandbox_status: None,
@@ -6258,6 +6264,37 @@ Command exceeded timeout of {timeout_ms} ms",
})
}
fn is_test_command(command: &str) -> bool {
let normalized = command
.split_whitespace()
.collect::<Vec<_>>()
.join(" ")
.to_ascii_lowercase();
normalized.contains("cargo test")
|| normalized.contains("cargo nextest")
|| normalized.contains("npm test")
|| normalized.contains("pnpm test")
|| normalized.contains("yarn test")
|| normalized.contains("pytest")
}
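Because `is_test_command` is a substring match on the normalized command, it also fires on text inside a shell comment, which the timeout test in this change depends on. A standalone sketch of the same check; `looks_like_test_command` is a hypothetical name, and the marker list is copied from the function above:

```rust
/// Hypothetical stand-in for `is_test_command`: collapse runs of whitespace,
/// lowercase, then look for a known test-runner invocation anywhere in the string.
fn looks_like_test_command(command: &str) -> bool {
    const MARKERS: [&str; 6] = [
        "cargo test",
        "cargo nextest",
        "npm test",
        "pnpm test",
        "yarn test",
        "pytest",
    ];
    let normalized = command
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_ascii_lowercase();
    MARKERS.iter().any(|marker| normalized.contains(*marker))
}
```

The trade-off is breadth over precision: extra spaces and mixed case are tolerated, but any command that merely mentions a test runner is classified as a test run.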
fn test_timeout_provenance(
command: &str,
timeout_ms: u64,
classified_as_test_hang: bool,
) -> serde_json::Value {
json!({
"event": if classified_as_test_hang { "test.hung" } else { "command.timeout" },
"failureClass": if classified_as_test_hang { "test_hang" } else { "timeout" },
"data": {
"command": command,
"timeoutMs": timeout_ms,
"provenance": "bash.timeout",
"classification": if classified_as_test_hang { "test.hung" } else { "timeout" }
}
})
}
fn resolve_cell_index(
cells: &[serde_json::Value],
cell_id: Option<&str>,
@@ -9027,6 +9064,23 @@ mod tests {
assert_eq!(background_output["noOutputExpected"], true);
}
#[test]
fn bash_tool_classifies_test_timeout_as_hung_with_provenance() {
let timeout = execute_tool(
"bash",
&json!({ "command": "sleep 1 # cargo test slow_case", "timeout": 10 }),
)
.expect("bash timeout should return output");
let timeout_output: serde_json::Value = serde_json::from_str(&timeout).expect("json");
assert_eq!(timeout_output["interrupted"], true);
assert_eq!(timeout_output["returnCodeInterpretation"], "test.hung");
assert_eq!(timeout_output["structuredContent"][0]["event"], "test.hung");
assert_eq!(
timeout_output["structuredContent"][0]["data"]["provenance"],
"bash.timeout"
);
}
#[test]
fn bash_workspace_tests_are_blocked_when_branch_is_behind_main() {
let _guard = env_lock()