mirror of
https://github.com/instructkr/claw-code.git
synced 2026-07-04 00:56:26 +02:00
Compare commits
14 Commits
879962b826
...
41b769fc5a
| Author | SHA1 | Date | |
|---|---|---|---|
| 41b769fc5a | |||
| 7426ede2eb | |||
| 8f7eaffcef | |||
| d2b5f5d498 | |||
| 607f071ca8 | |||
| d3f8ff9916 | |||
| 204af77596 | |||
| 5c40d4e778 | |||
| 5625ba597b | |||
| 4f60cf70f1 | |||
| 6a37442ee1 | |||
| 0bca524c8c | |||
| 2ad56860df | |||
| 1fbde9f47f |
@@ -0,0 +1,40 @@
|
||||
# G005 Branch Recovery Verification Map
|
||||
|
||||
Scope: worker-1 follow-up map for G005 branch/test awareness and recovery. This file intentionally does not mutate leader-owned `.omx/ultragoal` state.
|
||||
|
||||
## Covered ROADMAP / PRD pinpoints
|
||||
|
||||
- `ROADMAP.md:912-921` — Phase 3 §7 stale-branch detection before broad verification: broad workspace test commands are preflighted before execution, stale/diverged branches emit `branch.stale_against_main`, and targeted tests bypass the broad-test gate.
|
||||
- `ROADMAP.md:922-933` — Phase 3 §8 recovery recipes: stale-branch recovery remains represented by the `stale_branch` recipe, with one automatic attempt before escalation.
|
||||
- `ROADMAP.md:935-949` — Phase 3 §8.5 recovery attempt ledger: `RecoveryContext` now exposes ledger entries with recipe id, attempt count, state, started/finished markers, last failure summary, and escalation reason.
|
||||
- `ROADMAP.md:951-970` — Phase 3 §9 green-ness / hung-test reporting: timed-out test commands now classify as `test.hung` with structured provenance instead of generic timeout.
|
||||
- `prd.json:37-44` — US-003 stale-branch detection before broad verification: verified through the `workspace_test_branch_preflight` broad-test block and targeted-test bypass tests.
|
||||
- `prd.json:50-57` — US-004 recovery recipes with ledger: verified through recovery ledger unit coverage and serialization-compatible recovery structs.
|
||||
|
||||
## Implementation anchors
|
||||
|
||||
- `rust/crates/runtime/src/stale_branch.rs` — existing branch freshness model and policy actions for fresh, stale, and diverged branches.
|
||||
- `rust/crates/tools/src/lib.rs` — `workspace_test_branch_preflight`, `branch_divergence_output`, Bash/PowerShell broad-test gating, and `test.hung` structured timeout provenance on tool-shell timeouts.
|
||||
- `rust/crates/runtime/src/recovery_recipes.rs` — recovery recipes plus `RecoveryLedgerEntry` / `RecoveryAttemptState` ledger surface.
|
||||
- `rust/crates/runtime/src/bash.rs` — runtime Bash timeout classification and structured provenance for hung test commands.
|
||||
- `rust/crates/runtime/src/lib.rs` — public exports for the recovery ledger types.
|
||||
|
||||
## Verification evidence
|
||||
|
||||
- `cargo test -p runtime` → PASS: 538 unit tests, 2 G004 conformance tests, 12 integration tests, and doctests passed.
|
||||
- `cargo test -p tools bash_tool_classifies_test_timeout_as_hung_with_provenance -- --nocapture` → PASS.
|
||||
- `cargo test -p tools bash_workspace_tests_are_blocked_when_branch_is_behind_main -- --nocapture` → PASS.
|
||||
- `cargo test -p tools bash_targeted_tests_skip_branch_preflight -- --nocapture` → PASS.
|
||||
- `cargo check -p runtime -p tools` → PASS.
|
||||
- `cargo clippy -p runtime --all-targets -- -D warnings` → PASS.
|
||||
- `cargo clippy -p tools --lib --no-deps -- -D warnings` → PASS.
|
||||
|
||||
## Known unresolved / out-of-scope items
|
||||
|
||||
- Full `cargo test -p tools` is still red on six permission-enforcer expectation tests unrelated to G005 branch freshness, recovery ledger, or hung-test classification. The failing tests assert old permission wording/read-only behavior and pre-existed this follow-up scope.
|
||||
- ROADMAP stale-base JSON/doctor/status pinpoints remain broader CLI diagnostic-surface work, especially `ROADMAP.md:2425-2489`, `ROADMAP.md:4346-4431`, and `ROADMAP.md:5061-5086`. They are related to branch freshness, but task 1 only required the broad-test freshness gate and narrow reporting surfaces.
|
||||
- No `.omx/ultragoal` files were changed; leader-owned Ultragoal checkpointing remains outside worker scope.
|
||||
|
||||
## Delegation evidence
|
||||
|
||||
Subagent spawn evidence: 1, Repository map probe `019e25d5-9be9-7193-8a33-f21450beb62c`; spawned before further serial task-2 mapping per contract, but errored with 429 Too Many Requests, so direct repo evidence was integrated instead.
|
||||
@@ -4,6 +4,7 @@ use std::process::{Command, Stdio};
|
||||
use std::time::Duration;
|
||||
|
||||
use serde::{Deserialize, Serialize};
|
||||
use serde_json::json;
|
||||
use tokio::process::Command as TokioCommand;
|
||||
use tokio::runtime::Builder;
|
||||
use tokio::time::timeout;
|
||||
@@ -176,27 +177,10 @@ async fn execute_bash_async(
|
||||
let mut command = prepare_tokio_command(&input.command, &cwd, &sandbox_status, true);
|
||||
|
||||
let output_result = if let Some(timeout_ms) = input.timeout {
|
||||
match timeout(Duration::from_millis(timeout_ms), command.output()).await {
|
||||
Ok(result) => (result?, false),
|
||||
Err(_) => {
|
||||
return Ok(BashCommandOutput {
|
||||
stdout: String::new(),
|
||||
stderr: format!("Command exceeded timeout of {timeout_ms} ms"),
|
||||
raw_output_path: None,
|
||||
interrupted: true,
|
||||
is_image: None,
|
||||
background_task_id: None,
|
||||
backgrounded_by_user: None,
|
||||
assistant_auto_backgrounded: None,
|
||||
dangerously_disable_sandbox: input.dangerously_disable_sandbox,
|
||||
return_code_interpretation: Some(String::from("timeout")),
|
||||
no_output_expected: Some(true),
|
||||
structured_content: None,
|
||||
persisted_output_path: None,
|
||||
persisted_output_size: None,
|
||||
sandbox_status: Some(sandbox_status),
|
||||
});
|
||||
}
|
||||
if let Ok(result) = timeout(Duration::from_millis(timeout_ms), command.output()).await {
|
||||
(result?, false)
|
||||
} else {
|
||||
return Ok(timeout_output(&input, timeout_ms, sandbox_status));
|
||||
}
|
||||
} else {
|
||||
(command.output().await?, false)
|
||||
@@ -233,6 +217,67 @@ async fn execute_bash_async(
|
||||
})
|
||||
}
|
||||
|
||||
fn timeout_output(
|
||||
input: &BashCommandInput,
|
||||
timeout_ms: u64,
|
||||
sandbox_status: SandboxStatus,
|
||||
) -> BashCommandOutput {
|
||||
let is_test = is_test_command(&input.command);
|
||||
let return_code_interpretation = if is_test { "test.hung" } else { "timeout" };
|
||||
BashCommandOutput {
|
||||
stdout: String::new(),
|
||||
stderr: format!("Command exceeded timeout of {timeout_ms} ms"),
|
||||
raw_output_path: None,
|
||||
interrupted: true,
|
||||
is_image: None,
|
||||
background_task_id: None,
|
||||
backgrounded_by_user: None,
|
||||
assistant_auto_backgrounded: None,
|
||||
dangerously_disable_sandbox: input.dangerously_disable_sandbox,
|
||||
return_code_interpretation: Some(String::from(return_code_interpretation)),
|
||||
no_output_expected: Some(true),
|
||||
structured_content: Some(vec![test_timeout_provenance(
|
||||
&input.command,
|
||||
timeout_ms,
|
||||
is_test,
|
||||
)]),
|
||||
persisted_output_path: None,
|
||||
persisted_output_size: None,
|
||||
sandbox_status: Some(sandbox_status),
|
||||
}
|
||||
}
|
||||
|
||||
fn is_test_command(command: &str) -> bool {
|
||||
let normalized = command
|
||||
.split_whitespace()
|
||||
.collect::<Vec<_>>()
|
||||
.join(" ")
|
||||
.to_ascii_lowercase();
|
||||
normalized.contains("cargo test")
|
||||
|| normalized.contains("cargo nextest")
|
||||
|| normalized.contains("npm test")
|
||||
|| normalized.contains("pnpm test")
|
||||
|| normalized.contains("yarn test")
|
||||
|| normalized.contains("pytest")
|
||||
}
|
||||
|
||||
fn test_timeout_provenance(
|
||||
command: &str,
|
||||
timeout_ms: u64,
|
||||
classified_as_test_hang: bool,
|
||||
) -> serde_json::Value {
|
||||
json!({
|
||||
"event": if classified_as_test_hang { "test.hung" } else { "command.timeout" },
|
||||
"failureClass": if classified_as_test_hang { "test_hang" } else { "timeout" },
|
||||
"data": {
|
||||
"command": command,
|
||||
"timeoutMs": timeout_ms,
|
||||
"provenance": "bash.timeout",
|
||||
"classification": if classified_as_test_hang { "test.hung" } else { "timeout" }
|
||||
}
|
||||
})
|
||||
}
|
||||
|
||||
fn sandbox_status_for_input(input: &BashCommandInput, cwd: &std::path::Path) -> SandboxStatus {
|
||||
let config = ConfigLoader::default_for(cwd).load().map_or_else(
|
||||
|_| SandboxConfig::default(),
|
||||
@@ -349,6 +394,31 @@ mod tests {
|
||||
|
||||
assert!(!output.sandbox_status.expect("sandbox status").enabled);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn timed_out_test_command_is_classified_as_hung_test_with_provenance() {
|
||||
let output = execute_bash(BashCommandInput {
|
||||
command: String::from("sleep 1 # cargo test slow_case"),
|
||||
timeout: Some(1),
|
||||
description: None,
|
||||
run_in_background: Some(false),
|
||||
dangerously_disable_sandbox: Some(false),
|
||||
namespace_restrictions: Some(false),
|
||||
isolate_network: Some(false),
|
||||
filesystem_mode: Some(FilesystemIsolationMode::WorkspaceOnly),
|
||||
allowed_mounts: None,
|
||||
})
|
||||
.expect("bash command should return structured timeout");
|
||||
|
||||
assert!(output.interrupted);
|
||||
assert_eq!(
|
||||
output.return_code_interpretation.as_deref(),
|
||||
Some("test.hung")
|
||||
);
|
||||
let structured = output.structured_content.expect("structured content");
|
||||
assert_eq!(structured[0]["event"], "test.hung");
|
||||
assert_eq!(structured[0]["data"]["provenance"], "bash.timeout");
|
||||
}
|
||||
}
|
||||
|
||||
/// Maximum output bytes before truncation (16 KiB, matching upstream).
|
||||
|
||||
@@ -27,19 +27,38 @@ impl std::fmt::Display for GreenLevel {
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
|
||||
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
|
||||
pub struct GreenContract {
|
||||
pub required_level: GreenLevel,
|
||||
pub requirements: Vec<GreenContractRequirement>,
|
||||
pub block_known_flakes: bool,
|
||||
}
|
||||
|
||||
impl GreenContract {
|
||||
#[must_use]
|
||||
pub fn new(required_level: GreenLevel) -> Self {
|
||||
Self { required_level }
|
||||
Self {
|
||||
required_level,
|
||||
requirements: Vec::new(),
|
||||
block_known_flakes: false,
|
||||
}
|
||||
}
|
||||
|
||||
#[must_use]
|
||||
pub fn evaluate(self, observed_level: Option<GreenLevel>) -> GreenContractOutcome {
|
||||
pub fn merge_ready(required_level: GreenLevel) -> Self {
|
||||
Self {
|
||||
required_level,
|
||||
requirements: vec![
|
||||
GreenContractRequirement::TestCommandProvenance,
|
||||
GreenContractRequirement::BaseBranchFreshness,
|
||||
GreenContractRequirement::RecoveryAttemptContext,
|
||||
],
|
||||
block_known_flakes: true,
|
||||
}
|
||||
}
|
||||
|
||||
#[must_use]
|
||||
pub fn evaluate(&self, observed_level: Option<GreenLevel>) -> GreenContractOutcome {
|
||||
match observed_level {
|
||||
Some(level) if level >= self.required_level => GreenContractOutcome::Satisfied {
|
||||
required_level: self.required_level,
|
||||
@@ -53,11 +72,170 @@ impl GreenContract {
|
||||
}
|
||||
|
||||
#[must_use]
|
||||
pub fn is_satisfied_by(self, observed_level: GreenLevel) -> bool {
|
||||
pub fn evaluate_evidence(&self, evidence: &GreenEvidence) -> GreenEvidenceOutcome {
|
||||
let mut missing = Vec::new();
|
||||
let mut blocking_flakes = Vec::new();
|
||||
|
||||
if evidence.observed_level < self.required_level {
|
||||
missing.push(GreenContractRequirement::RequiredLevel);
|
||||
}
|
||||
|
||||
for requirement in &self.requirements {
|
||||
match requirement {
|
||||
GreenContractRequirement::TestCommandProvenance
|
||||
if !evidence.has_passing_test_command() =>
|
||||
{
|
||||
missing.push(*requirement);
|
||||
}
|
||||
GreenContractRequirement::BaseBranchFreshness if !evidence.base_branch_fresh => {
|
||||
missing.push(*requirement);
|
||||
}
|
||||
GreenContractRequirement::RecoveryAttemptContext
|
||||
if !evidence.recovery_attempt_context_recorded =>
|
||||
{
|
||||
missing.push(*requirement);
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
|
||||
if self.block_known_flakes {
|
||||
blocking_flakes = evidence
|
||||
.known_flakes
|
||||
.iter()
|
||||
.filter(|flake| flake.blocks_green)
|
||||
.cloned()
|
||||
.collect();
|
||||
}
|
||||
|
||||
if missing.is_empty() && blocking_flakes.is_empty() {
|
||||
GreenEvidenceOutcome::Satisfied {
|
||||
required_level: self.required_level,
|
||||
observed_level: evidence.observed_level,
|
||||
}
|
||||
} else {
|
||||
GreenEvidenceOutcome::Unsatisfied {
|
||||
required_level: self.required_level,
|
||||
observed_level: evidence.observed_level,
|
||||
missing,
|
||||
blocking_flakes,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[must_use]
|
||||
pub fn is_satisfied_by(&self, observed_level: GreenLevel) -> bool {
|
||||
observed_level >= self.required_level
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
|
||||
pub struct GreenEvidence {
|
||||
pub observed_level: GreenLevel,
|
||||
pub test_commands: Vec<TestCommandProvenance>,
|
||||
pub base_branch_fresh: bool,
|
||||
pub known_flakes: Vec<KnownFlake>,
|
||||
pub recovery_attempt_context_recorded: bool,
|
||||
}
|
||||
|
||||
impl GreenEvidence {
|
||||
#[must_use]
|
||||
pub fn new(observed_level: GreenLevel) -> Self {
|
||||
Self {
|
||||
observed_level,
|
||||
test_commands: Vec::new(),
|
||||
base_branch_fresh: false,
|
||||
known_flakes: Vec::new(),
|
||||
recovery_attempt_context_recorded: false,
|
||||
}
|
||||
}
|
||||
|
||||
#[must_use]
|
||||
pub fn with_test_command(mut self, command: impl Into<String>, exit_code: i32) -> Self {
|
||||
self.test_commands.push(TestCommandProvenance {
|
||||
command: command.into(),
|
||||
exit_code,
|
||||
});
|
||||
self
|
||||
}
|
||||
|
||||
#[must_use]
|
||||
pub fn with_base_branch_fresh(mut self, is_fresh: bool) -> Self {
|
||||
self.base_branch_fresh = is_fresh;
|
||||
self
|
||||
}
|
||||
|
||||
#[must_use]
|
||||
pub fn with_known_flake(mut self, test_name: impl Into<String>, blocks_green: bool) -> Self {
|
||||
self.known_flakes.push(KnownFlake {
|
||||
test_name: test_name.into(),
|
||||
blocks_green,
|
||||
});
|
||||
self
|
||||
}
|
||||
|
||||
#[must_use]
|
||||
pub fn with_recovery_attempt_context(mut self, recorded: bool) -> Self {
|
||||
self.recovery_attempt_context_recorded = recorded;
|
||||
self
|
||||
}
|
||||
|
||||
#[must_use]
|
||||
pub fn has_passing_test_command(&self) -> bool {
|
||||
self.test_commands.iter().any(TestCommandProvenance::passed)
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
|
||||
pub struct TestCommandProvenance {
|
||||
pub command: String,
|
||||
pub exit_code: i32,
|
||||
}
|
||||
|
||||
impl TestCommandProvenance {
|
||||
#[must_use]
|
||||
pub fn passed(&self) -> bool {
|
||||
self.exit_code == 0 && !self.command.trim().is_empty()
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
|
||||
pub struct KnownFlake {
|
||||
pub test_name: String,
|
||||
pub blocks_green: bool,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
|
||||
#[serde(rename_all = "snake_case")]
|
||||
pub enum GreenContractRequirement {
|
||||
RequiredLevel,
|
||||
TestCommandProvenance,
|
||||
BaseBranchFreshness,
|
||||
RecoveryAttemptContext,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
|
||||
#[serde(tag = "outcome", rename_all = "snake_case")]
|
||||
pub enum GreenEvidenceOutcome {
|
||||
Satisfied {
|
||||
required_level: GreenLevel,
|
||||
observed_level: GreenLevel,
|
||||
},
|
||||
Unsatisfied {
|
||||
required_level: GreenLevel,
|
||||
observed_level: GreenLevel,
|
||||
missing: Vec<GreenContractRequirement>,
|
||||
blocking_flakes: Vec<KnownFlake>,
|
||||
},
|
||||
}
|
||||
|
||||
impl GreenEvidenceOutcome {
|
||||
#[must_use]
|
||||
pub fn is_satisfied(&self) -> bool {
|
||||
matches!(self, Self::Satisfied { .. })
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
|
||||
#[serde(tag = "outcome", rename_all = "snake_case")]
|
||||
pub enum GreenContractOutcome {
|
||||
@@ -149,4 +327,83 @@ mod tests {
|
||||
}
|
||||
);
|
||||
}
|
||||
#[test]
|
||||
fn merge_ready_contract_requires_provenance_beyond_test_level() {
|
||||
// given
|
||||
let contract = GreenContract::merge_ready(GreenLevel::Workspace);
|
||||
let evidence = GreenEvidence::new(GreenLevel::Workspace)
|
||||
.with_test_command("cargo test --manifest-path rust/Cargo.toml", 0);
|
||||
|
||||
// when
|
||||
let outcome = contract.evaluate_evidence(&evidence);
|
||||
|
||||
// then
|
||||
assert_eq!(
|
||||
outcome,
|
||||
GreenEvidenceOutcome::Unsatisfied {
|
||||
required_level: GreenLevel::Workspace,
|
||||
observed_level: GreenLevel::Workspace,
|
||||
missing: vec![
|
||||
GreenContractRequirement::BaseBranchFreshness,
|
||||
GreenContractRequirement::RecoveryAttemptContext,
|
||||
],
|
||||
blocking_flakes: vec![],
|
||||
}
|
||||
);
|
||||
assert!(!outcome.is_satisfied());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn merge_ready_contract_accepts_complete_test_provenance_context() {
|
||||
// given
|
||||
let contract = GreenContract::merge_ready(GreenLevel::Workspace);
|
||||
let evidence = GreenEvidence::new(GreenLevel::MergeReady)
|
||||
.with_test_command("cargo test --manifest-path rust/Cargo.toml", 0)
|
||||
.with_base_branch_fresh(true)
|
||||
.with_recovery_attempt_context(true);
|
||||
|
||||
// when
|
||||
let outcome = contract.evaluate_evidence(&evidence);
|
||||
|
||||
// then
|
||||
assert_eq!(
|
||||
outcome,
|
||||
GreenEvidenceOutcome::Satisfied {
|
||||
required_level: GreenLevel::Workspace,
|
||||
observed_level: GreenLevel::MergeReady,
|
||||
}
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn known_blocking_flake_prevents_green_contract_satisfaction() {
|
||||
// given
|
||||
let contract = GreenContract::merge_ready(GreenLevel::Workspace);
|
||||
let evidence = GreenEvidence::new(GreenLevel::MergeReady)
|
||||
.with_test_command("cargo test --manifest-path rust/Cargo.toml", 0)
|
||||
.with_base_branch_fresh(true)
|
||||
.with_recovery_attempt_context(true)
|
||||
.with_known_flake(
|
||||
"session_lifecycle_prefers_running_process_over_idle_shell",
|
||||
true,
|
||||
);
|
||||
|
||||
// when
|
||||
let outcome = contract.evaluate_evidence(&evidence);
|
||||
|
||||
// then
|
||||
assert_eq!(
|
||||
outcome,
|
||||
GreenEvidenceOutcome::Unsatisfied {
|
||||
required_level: GreenLevel::Workspace,
|
||||
observed_level: GreenLevel::MergeReady,
|
||||
missing: vec![],
|
||||
blocking_flakes: vec![KnownFlake {
|
||||
test_name: "session_lifecycle_prefers_running_process_over_idle_shell"
|
||||
.to_string(),
|
||||
blocks_green: true,
|
||||
}],
|
||||
}
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -143,8 +143,9 @@ pub use prompt::{
|
||||
PromptBuildError, SystemPromptBuilder, FRONTIER_MODEL_NAME, SYSTEM_PROMPT_DYNAMIC_BOUNDARY,
|
||||
};
|
||||
pub use recovery_recipes::{
|
||||
attempt_recovery, recipe_for, EscalationPolicy, FailureScenario, RecoveryContext,
|
||||
RecoveryEvent, RecoveryRecipe, RecoveryResult, RecoveryStep,
|
||||
attempt_recovery, recipe_for, EscalationPolicy, FailureScenario, RecoveryAttemptState,
|
||||
RecoveryAttemptType, RecoveryCommandResult, RecoveryContext, RecoveryEvent,
|
||||
RecoveryLedgerEntry, RecoveryRecipe, RecoveryResult, RecoveryStatusReport, RecoveryStep,
|
||||
};
|
||||
pub use remote::{
|
||||
inherited_upstream_proxy_env, no_proxy_list, read_token, upstream_proxy_ws_url,
|
||||
|
||||
@@ -58,7 +58,9 @@ impl PolicyCondition {
|
||||
Self::Or(conditions) => conditions
|
||||
.iter()
|
||||
.any(|condition| condition.matches(context)),
|
||||
Self::GreenAt { level } => context.green_level >= *level,
|
||||
Self::GreenAt { level } => {
|
||||
context.green_contract_satisfied && context.green_level >= *level
|
||||
}
|
||||
Self::StaleBranch => context.branch_freshness >= STALE_BRANCH_THRESHOLD,
|
||||
Self::StartupBlocked => context.blocker == LaneBlocker::Startup,
|
||||
Self::LaneCompleted => context.completed,
|
||||
@@ -134,6 +136,7 @@ pub enum DiffScope {
|
||||
pub struct LaneContext {
|
||||
pub lane_id: String,
|
||||
pub green_level: GreenLevel,
|
||||
pub green_contract_satisfied: bool,
|
||||
pub branch_freshness: Duration,
|
||||
pub blocker: LaneBlocker,
|
||||
pub review_status: ReviewStatus,
|
||||
@@ -156,6 +159,7 @@ impl LaneContext {
|
||||
Self {
|
||||
lane_id: lane_id.into(),
|
||||
green_level,
|
||||
green_contract_satisfied: false,
|
||||
branch_freshness,
|
||||
blocker,
|
||||
review_status,
|
||||
@@ -171,6 +175,7 @@ impl LaneContext {
|
||||
Self {
|
||||
lane_id: lane_id.into(),
|
||||
green_level: 0,
|
||||
green_contract_satisfied: false,
|
||||
branch_freshness: Duration::from_secs(0),
|
||||
blocker: LaneBlocker::None,
|
||||
review_status: ReviewStatus::Pending,
|
||||
@@ -179,6 +184,12 @@ impl LaneContext {
|
||||
reconciled: true,
|
||||
}
|
||||
}
|
||||
|
||||
#[must_use]
|
||||
pub fn with_green_contract_satisfied(mut self, satisfied: bool) -> Self {
|
||||
self.green_contract_satisfied = satisfied;
|
||||
self
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
@@ -257,7 +268,8 @@ mod tests {
|
||||
ReviewStatus::Approved,
|
||||
DiffScope::Scoped,
|
||||
false,
|
||||
);
|
||||
)
|
||||
.with_green_contract_satisfied(true);
|
||||
|
||||
// when
|
||||
let actions = engine.evaluate(&context);
|
||||
@@ -266,6 +278,36 @@ mod tests {
|
||||
assert_eq!(actions, vec![PolicyAction::MergeToDev]);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn merge_rule_blocks_when_green_tests_lack_contract_provenance() {
|
||||
// given
|
||||
let engine = PolicyEngine::new(vec![PolicyRule::new(
|
||||
"merge-to-dev",
|
||||
PolicyCondition::And(vec![
|
||||
PolicyCondition::GreenAt { level: 2 },
|
||||
PolicyCondition::ScopedDiff,
|
||||
PolicyCondition::ReviewPassed,
|
||||
]),
|
||||
PolicyAction::MergeToDev,
|
||||
20,
|
||||
)]);
|
||||
let context = LaneContext::new(
|
||||
"lane-7",
|
||||
3,
|
||||
Duration::from_secs(5),
|
||||
LaneBlocker::None,
|
||||
ReviewStatus::Approved,
|
||||
DiffScope::Scoped,
|
||||
false,
|
||||
);
|
||||
|
||||
// when
|
||||
let actions = engine.evaluate(&context);
|
||||
|
||||
// then
|
||||
assert!(actions.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn stale_branch_rule_fires_at_threshold() {
|
||||
// given
|
||||
@@ -468,7 +510,8 @@ mod tests {
|
||||
ReviewStatus::Pending,
|
||||
DiffScope::Full,
|
||||
false,
|
||||
);
|
||||
)
|
||||
.with_green_contract_satisfied(true);
|
||||
|
||||
// when
|
||||
let actions = engine.evaluate(&context);
|
||||
|
||||
@@ -121,6 +121,21 @@ pub enum RecoveryResult {
|
||||
},
|
||||
}
|
||||
|
||||
/// Type of recovery execution represented in the ledger.
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
|
||||
#[serde(rename_all = "snake_case")]
|
||||
pub enum RecoveryAttemptType {
|
||||
Automatic,
|
||||
}
|
||||
|
||||
/// Result for one executable recovery command/step.
|
||||
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
|
||||
pub struct RecoveryCommandResult {
|
||||
pub command: RecoveryStep,
|
||||
pub status: RecoveryAttemptState,
|
||||
pub result: String,
|
||||
}
|
||||
|
||||
/// Structured event emitted during recovery.
|
||||
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
|
||||
#[serde(rename_all = "snake_case")]
|
||||
@@ -135,14 +150,59 @@ pub enum RecoveryEvent {
|
||||
Escalated,
|
||||
}
|
||||
|
||||
/// Machine-readable recovery progress for one failure scenario.
|
||||
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
|
||||
pub struct RecoveryLedgerEntry {
|
||||
pub recipe_id: String,
|
||||
pub attempt_type: RecoveryAttemptType,
|
||||
pub trigger: FailureScenario,
|
||||
pub attempt_count: u32,
|
||||
pub retry_limit: u32,
|
||||
pub attempts_remaining: u32,
|
||||
pub state: RecoveryAttemptState,
|
||||
pub started_at: Option<String>,
|
||||
pub finished_at: Option<String>,
|
||||
pub command_results: Vec<RecoveryCommandResult>,
|
||||
pub result: Option<RecoveryResult>,
|
||||
pub last_failure_summary: Option<String>,
|
||||
pub escalation_reason: Option<String>,
|
||||
}
|
||||
|
||||
/// Current state of a recovery recipe attempt.
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
|
||||
#[serde(rename_all = "snake_case")]
|
||||
pub enum RecoveryAttemptState {
|
||||
Queued,
|
||||
Running,
|
||||
Succeeded,
|
||||
Failed,
|
||||
Exhausted,
|
||||
}
|
||||
|
||||
/// Machine-readable status projection for callers that need to
|
||||
/// distinguish an untouched scenario from an exhausted recovery.
|
||||
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
|
||||
pub struct RecoveryStatusReport {
|
||||
pub scenario: FailureScenario,
|
||||
pub attempted: bool,
|
||||
pub state: Option<RecoveryAttemptState>,
|
||||
pub attempt_count: u32,
|
||||
pub retry_limit: Option<u32>,
|
||||
pub attempts_remaining: Option<u32>,
|
||||
pub escalation_reason: Option<String>,
|
||||
}
|
||||
|
||||
/// Minimal context for tracking recovery state and emitting events.
|
||||
///
|
||||
/// Holds per-scenario attempt counts, a structured event log, and an
|
||||
/// optional simulation knob for controlling step outcomes during tests.
|
||||
/// Holds per-scenario attempt counts, a structured event log, a recovery
|
||||
/// attempt ledger, and an optional simulation knob for controlling step
|
||||
/// outcomes during tests.
|
||||
#[derive(Debug, Clone, Default)]
|
||||
pub struct RecoveryContext {
|
||||
attempts: HashMap<FailureScenario, u32>,
|
||||
events: Vec<RecoveryEvent>,
|
||||
ledger: HashMap<FailureScenario, RecoveryLedgerEntry>,
|
||||
clock_tick: u64,
|
||||
/// Optional step index at which simulated execution fails.
|
||||
/// `None` means all steps succeed.
|
||||
fail_at_step: Option<usize>,
|
||||
@@ -172,6 +232,51 @@ impl RecoveryContext {
|
||||
pub fn attempt_count(&self, scenario: &FailureScenario) -> u32 {
|
||||
self.attempts.get(scenario).copied().unwrap_or(0)
|
||||
}
|
||||
|
||||
/// Returns the machine-readable recovery ledger entry for a scenario.
|
||||
#[must_use]
|
||||
pub fn ledger_entry(&self, scenario: &FailureScenario) -> Option<&RecoveryLedgerEntry> {
|
||||
self.ledger.get(scenario)
|
||||
}
|
||||
|
||||
/// Returns all recovery ledger entries currently tracked by this context.
|
||||
#[must_use]
|
||||
pub fn ledger_entries(&self) -> Vec<&RecoveryLedgerEntry> {
|
||||
let mut entries: Vec<_> = self.ledger.values().collect();
|
||||
entries.sort_by(|left, right| left.recipe_id.cmp(&right.recipe_id));
|
||||
entries
|
||||
}
|
||||
|
||||
/// Returns a compact machine-readable recovery status for a scenario,
|
||||
/// including `attempted = false` when no ledger entry exists yet.
|
||||
#[must_use]
|
||||
pub fn status_report(&self, scenario: &FailureScenario) -> RecoveryStatusReport {
|
||||
self.ledger_entry(scenario).map_or(
|
||||
RecoveryStatusReport {
|
||||
scenario: *scenario,
|
||||
attempted: false,
|
||||
state: None,
|
||||
attempt_count: 0,
|
||||
retry_limit: None,
|
||||
attempts_remaining: None,
|
||||
escalation_reason: None,
|
||||
},
|
||||
|entry| RecoveryStatusReport {
|
||||
scenario: *scenario,
|
||||
attempted: entry.attempt_count > 0,
|
||||
state: Some(entry.state),
|
||||
attempt_count: entry.attempt_count,
|
||||
retry_limit: Some(entry.retry_limit),
|
||||
attempts_remaining: Some(entry.attempts_remaining),
|
||||
escalation_reason: entry.escalation_reason.clone(),
|
||||
},
|
||||
)
|
||||
}
|
||||
|
||||
fn next_timestamp(&mut self) -> String {
|
||||
self.clock_tick += 1;
|
||||
format!("recovery-ledger-tick-{}", self.clock_tick)
|
||||
}
|
||||
}
|
||||
|
||||
/// Returns the known recovery recipe for the given failure scenario.
|
||||
@@ -233,18 +338,51 @@ pub fn recipe_for(scenario: &FailureScenario) -> RecoveryRecipe {
|
||||
/// Looks up the recipe, enforces the one-attempt-before-escalation
|
||||
/// policy, simulates step execution (controlled by the context), and
|
||||
/// emits structured [`RecoveryEvent`]s for every attempt.
|
||||
#[allow(clippy::too_many_lines)]
|
||||
pub fn attempt_recovery(scenario: &FailureScenario, ctx: &mut RecoveryContext) -> RecoveryResult {
|
||||
let recipe = recipe_for(scenario);
|
||||
let attempt_count = ctx.attempts.entry(*scenario).or_insert(0);
|
||||
let recipe_id = scenario.to_string();
|
||||
ctx.ledger
|
||||
.entry(*scenario)
|
||||
.or_insert_with(|| RecoveryLedgerEntry {
|
||||
recipe_id: recipe_id.clone(),
|
||||
attempt_type: RecoveryAttemptType::Automatic,
|
||||
trigger: *scenario,
|
||||
attempt_count: 0,
|
||||
retry_limit: recipe.max_attempts,
|
||||
attempts_remaining: recipe.max_attempts,
|
||||
state: RecoveryAttemptState::Queued,
|
||||
started_at: None,
|
||||
finished_at: None,
|
||||
command_results: Vec::new(),
|
||||
result: None,
|
||||
last_failure_summary: None,
|
||||
escalation_reason: None,
|
||||
});
|
||||
|
||||
let current_attempts = ctx.attempt_count(scenario);
|
||||
|
||||
// Enforce one automatic recovery attempt before escalation.
|
||||
if *attempt_count >= recipe.max_attempts {
|
||||
if current_attempts >= recipe.max_attempts {
|
||||
let result = RecoveryResult::EscalationRequired {
|
||||
reason: format!(
|
||||
"max recovery attempts ({}) exceeded for {}",
|
||||
recipe.max_attempts, scenario
|
||||
),
|
||||
};
|
||||
let finished_at = ctx.next_timestamp();
|
||||
if let Some(entry) = ctx.ledger.get_mut(scenario) {
|
||||
entry.attempt_count = current_attempts;
|
||||
entry.attempts_remaining = 0;
|
||||
entry.state = RecoveryAttemptState::Exhausted;
|
||||
entry.finished_at = Some(finished_at);
|
||||
entry.result = Some(result.clone());
|
||||
let RecoveryResult::EscalationRequired { reason } = &result else {
|
||||
unreachable!("exhaustion always produces escalation");
|
||||
};
|
||||
entry.last_failure_summary = Some(reason.clone());
|
||||
entry.escalation_reason = Some(reason.clone());
|
||||
}
|
||||
ctx.events.push(RecoveryEvent::RecoveryAttempted {
|
||||
scenario: *scenario,
|
||||
recipe,
|
||||
@@ -254,19 +392,44 @@ pub fn attempt_recovery(scenario: &FailureScenario, ctx: &mut RecoveryContext) -
|
||||
return result;
|
||||
}
|
||||
|
||||
*attempt_count += 1;
|
||||
let updated_attempts = ctx.attempts.entry(*scenario).or_insert(0);
|
||||
*updated_attempts += 1;
|
||||
let updated_attempts = *updated_attempts;
|
||||
let started_at = ctx.next_timestamp();
|
||||
if let Some(entry) = ctx.ledger.get_mut(scenario) {
|
||||
entry.attempt_count = updated_attempts;
|
||||
entry.attempts_remaining = recipe.max_attempts.saturating_sub(updated_attempts);
|
||||
entry.state = RecoveryAttemptState::Running;
|
||||
entry.started_at = Some(started_at);
|
||||
entry.finished_at = None;
|
||||
entry.command_results.clear();
|
||||
entry.result = None;
|
||||
entry.last_failure_summary = None;
|
||||
entry.escalation_reason = None;
|
||||
}
|
||||
|
||||
// Execute steps, honoring the optional fail_at_step simulation.
|
||||
let fail_index = ctx.fail_at_step;
|
||||
let mut executed = Vec::new();
|
||||
let mut command_results = Vec::new();
|
||||
let mut failed = false;
|
||||
|
||||
for (i, step) in recipe.steps.iter().enumerate() {
|
||||
if fail_index == Some(i) {
|
||||
command_results.push(RecoveryCommandResult {
|
||||
command: step.clone(),
|
||||
status: RecoveryAttemptState::Failed,
|
||||
result: format!("step {i} failed for {scenario}"),
|
||||
});
|
||||
failed = true;
|
||||
break;
|
||||
}
|
||||
executed.push(step.clone());
|
||||
command_results.push(RecoveryCommandResult {
|
||||
command: step.clone(),
|
||||
status: RecoveryAttemptState::Succeeded,
|
||||
result: format!("step {i} succeeded for {scenario}"),
|
||||
});
|
||||
}
|
||||
|
||||
let result = if failed {
|
||||
@@ -288,6 +451,29 @@ pub fn attempt_recovery(scenario: &FailureScenario, ctx: &mut RecoveryContext) -
|
||||
};
|
||||
|
||||
// Emit the attempt as structured event data.
|
||||
let finished_at = ctx.next_timestamp();
|
||||
if let Some(entry) = ctx.ledger.get_mut(scenario) {
|
||||
entry.finished_at = Some(finished_at);
|
||||
entry.command_results = command_results;
|
||||
entry.result = Some(result.clone());
|
||||
match &result {
|
||||
RecoveryResult::Recovered { .. } => {
|
||||
entry.state = RecoveryAttemptState::Succeeded;
|
||||
}
|
||||
RecoveryResult::PartialRecovery { remaining, .. } => {
|
||||
entry.state = RecoveryAttemptState::Failed;
|
||||
entry.last_failure_summary = Some(format!(
|
||||
"{} step(s) remaining after partial recovery",
|
||||
remaining.len()
|
||||
));
|
||||
}
|
||||
RecoveryResult::EscalationRequired { reason } => {
|
||||
entry.state = RecoveryAttemptState::Exhausted;
|
||||
entry.last_failure_summary = Some(reason.clone());
|
||||
entry.escalation_reason = Some(reason.clone());
|
||||
}
|
||||
}
|
||||
}
|
||||
ctx.events.push(RecoveryEvent::RecoveryAttempted {
|
||||
scenario: *scenario,
|
||||
recipe,
|
||||
@@ -499,6 +685,126 @@ mod tests {
|
||||
assert_eq!(ctx.attempt_count(&FailureScenario::PromptMisdelivery), 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn recovery_context_exposes_machine_readable_ledger() {
|
||||
// given
|
||||
let mut ctx = RecoveryContext::new();
|
||||
|
||||
// when
|
||||
let result = attempt_recovery(&FailureScenario::StaleBranch, &mut ctx);
|
||||
|
||||
// then
|
||||
assert_eq!(result, RecoveryResult::Recovered { steps_taken: 2 });
|
||||
let entry = ctx
|
||||
.ledger_entry(&FailureScenario::StaleBranch)
|
||||
.expect("stale branch ledger entry");
|
||||
assert_eq!(entry.recipe_id, "stale_branch");
|
||||
assert_eq!(entry.attempt_type, RecoveryAttemptType::Automatic);
|
||||
assert_eq!(entry.trigger, FailureScenario::StaleBranch);
|
||||
assert_eq!(entry.attempt_count, 1);
|
||||
assert_eq!(entry.retry_limit, 1);
|
||||
assert_eq!(entry.attempts_remaining, 0);
|
||||
assert_eq!(entry.state, RecoveryAttemptState::Succeeded);
|
||||
assert!(entry.started_at.is_some());
|
||||
assert!(entry.finished_at.is_some());
|
||||
assert_eq!(
|
||||
entry.result,
|
||||
Some(RecoveryResult::Recovered { steps_taken: 2 })
|
||||
);
|
||||
assert_eq!(entry.command_results.len(), 2);
|
||||
assert_eq!(entry.command_results[0].command, RecoveryStep::RebaseBranch);
|
||||
assert_eq!(
|
||||
entry.command_results[0].status,
|
||||
RecoveryAttemptState::Succeeded
|
||||
);
|
||||
assert_eq!(entry.last_failure_summary, None);
|
||||
assert_eq!(entry.escalation_reason, None);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn recovery_ledger_records_exhausted_escalation_reason() {
|
||||
// given
|
||||
let mut ctx = RecoveryContext::new();
|
||||
let scenario = FailureScenario::PromptMisdelivery;
|
||||
|
||||
// when
|
||||
let _ = attempt_recovery(&scenario, &mut ctx);
|
||||
let result = attempt_recovery(&scenario, &mut ctx);
|
||||
|
||||
// then
|
||||
assert!(matches!(result, RecoveryResult::EscalationRequired { .. }));
|
||||
let entry = ctx.ledger_entry(&scenario).expect("ledger entry");
|
||||
assert_eq!(entry.state, RecoveryAttemptState::Exhausted);
|
||||
assert_eq!(entry.attempt_count, 1);
|
||||
assert_eq!(entry.attempts_remaining, 0);
|
||||
assert!(matches!(
|
||||
entry.result,
|
||||
Some(RecoveryResult::EscalationRequired { .. })
|
||||
));
|
||||
assert!(entry
|
||||
.escalation_reason
|
||||
.as_deref()
|
||||
.expect("escalation reason")
|
||||
.contains("max recovery attempts"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn recovery_status_report_distinguishes_not_attempted_from_exhausted() {
|
||||
// given
|
||||
let mut ctx = RecoveryContext::new();
|
||||
let scenario = FailureScenario::PromptMisdelivery;
|
||||
|
||||
// then — no ledger entry is not the same as exhausted.
|
||||
let not_attempted = ctx.status_report(&scenario);
|
||||
assert!(!not_attempted.attempted);
|
||||
assert_eq!(not_attempted.state, None);
|
||||
assert_eq!(not_attempted.attempt_count, 0);
|
||||
assert_eq!(not_attempted.retry_limit, None);
|
||||
|
||||
// when — one allowed attempt then one extra attempt.
|
||||
let _ = attempt_recovery(&scenario, &mut ctx);
|
||||
let _ = attempt_recovery(&scenario, &mut ctx);
|
||||
|
||||
// then
|
||||
let exhausted = ctx.status_report(&scenario);
|
||||
assert!(exhausted.attempted);
|
||||
assert_eq!(exhausted.state, Some(RecoveryAttemptState::Exhausted));
|
||||
assert_eq!(exhausted.attempt_count, 1);
|
||||
assert_eq!(exhausted.retry_limit, Some(1));
|
||||
assert_eq!(exhausted.attempts_remaining, Some(0));
|
||||
assert!(exhausted
|
||||
.escalation_reason
|
||||
.as_deref()
|
||||
.is_some_and(|reason| reason.contains("max recovery attempts")));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn recovery_ledger_records_failed_command_result() {
|
||||
// given
|
||||
let mut ctx = RecoveryContext::new().with_fail_at_step(1);
|
||||
let scenario = FailureScenario::PartialPluginStartup;
|
||||
|
||||
// when
|
||||
let result = attempt_recovery(&scenario, &mut ctx);
|
||||
|
||||
// then
|
||||
assert!(matches!(result, RecoveryResult::PartialRecovery { .. }));
|
||||
let entry = ctx.ledger_entry(&scenario).expect("ledger entry");
|
||||
assert_eq!(entry.state, RecoveryAttemptState::Failed);
|
||||
assert_eq!(entry.command_results.len(), 2);
|
||||
assert_eq!(
|
||||
entry.command_results[0].status,
|
||||
RecoveryAttemptState::Succeeded
|
||||
);
|
||||
assert_eq!(
|
||||
entry.command_results[1].status,
|
||||
RecoveryAttemptState::Failed
|
||||
);
|
||||
assert!(entry.command_results[1]
|
||||
.result
|
||||
.contains("partial_plugin_startup"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn stale_branch_recipe_has_rebase_then_clean_build() {
|
||||
// given
|
||||
|
||||
@@ -96,9 +96,7 @@ fn green_contract_unsatisfied_blocks_merge() {
|
||||
false,
|
||||
);
|
||||
|
||||
// This is a conceptual test — we need a way to express "requires workspace green"
|
||||
// Currently LaneContext has raw green_level: u8, not a contract
|
||||
// For now we just verify the policy condition works
|
||||
// The context has a test level but lacks the full green contract, so merge stays blocked.
|
||||
let engine = PolicyEngine::new(vec![PolicyRule::new(
|
||||
"workspace-green-required",
|
||||
PolicyCondition::GreenAt { level: 3 }, // GreenLevel::Workspace
|
||||
@@ -267,7 +265,8 @@ fn fresh_approved_lane_gets_merge_action() {
|
||||
ReviewStatus::Approved,
|
||||
DiffScope::Scoped,
|
||||
false,
|
||||
);
|
||||
)
|
||||
.with_green_contract_satisfied(true);
|
||||
|
||||
let engine = PolicyEngine::new(vec![PolicyRule::new(
|
||||
"merge-if-green-approved-not-stale",
|
||||
@@ -357,7 +356,8 @@ fn worker_provider_failure_flows_through_recovery_to_policy() {
|
||||
ReviewStatus::Approved,
|
||||
DiffScope::Scoped,
|
||||
false,
|
||||
);
|
||||
)
|
||||
.with_green_contract_satisfied(true);
|
||||
|
||||
let policy_engine = PolicyEngine::new(vec![
|
||||
// Rule: if recovered from failure + green + approved -> merge
|
||||
|
||||
@@ -45,11 +45,11 @@ use render::{MarkdownStreamState, Spinner, TerminalRenderer};
|
||||
use runtime::{
|
||||
check_base_commit, format_stale_base_warning, format_usd, load_oauth_credentials,
|
||||
load_system_prompt, pricing_for_model, resolve_expected_base, resolve_sandbox_status,
|
||||
ApiClient, ApiRequest, AssistantEvent, CompactionConfig, ConfigLoader, ConfigSource,
|
||||
ContentBlock, ConversationMessage, ConversationRuntime, McpServer, McpServerManager,
|
||||
McpServerSpec, McpTool, MessageRole, ModelPricing, PermissionMode, PermissionPolicy,
|
||||
ProjectContext, PromptCacheEvent, ResolvedPermissionMode, RuntimeError, Session, TokenUsage,
|
||||
ToolError, ToolExecutor, UsageTracker,
|
||||
ApiClient, ApiRequest, AssistantEvent, BaseCommitState, CompactionConfig, ConfigLoader,
|
||||
ConfigSource, ContentBlock, ConversationMessage, ConversationRuntime, McpServer,
|
||||
McpServerManager, McpServerSpec, McpTool, MessageRole, ModelPricing, PermissionMode,
|
||||
PermissionPolicy, ProjectContext, PromptCacheEvent, ResolvedPermissionMode, RuntimeError,
|
||||
Session, TokenUsage, ToolError, ToolExecutor, UsageTracker,
|
||||
};
|
||||
use serde::Deserialize;
|
||||
use serde_json::{json, Map, Value};
|
||||
@@ -1973,6 +1973,7 @@ fn render_doctor_report() -> Result<DoctorReport, Box<dyn std::error::Error>> {
|
||||
parse_git_status_metadata(project_context.git_status.as_deref());
|
||||
let git_summary = parse_git_workspace_summary(project_context.git_status.as_deref());
|
||||
let branch_freshness = BranchFreshness::from_git_status(project_context.git_status.as_deref());
|
||||
let stale_base_state = stale_base_state_for(&cwd, None);
|
||||
let empty_config = runtime::RuntimeConfig::empty();
|
||||
let sandbox_config = config.as_ref().ok().unwrap_or(&empty_config);
|
||||
let boot_preflight = build_boot_preflight_snapshot(
|
||||
@@ -1995,6 +1996,7 @@ fn render_doctor_report() -> Result<DoctorReport, Box<dyn std::error::Error>> {
|
||||
git_branch,
|
||||
git_summary,
|
||||
branch_freshness,
|
||||
stale_base_state,
|
||||
session_lifecycle: classify_session_lifecycle_for(&cwd),
|
||||
boot_preflight,
|
||||
sandbox_status: resolve_sandbox_status(sandbox_config.sandbox(), &cwd),
|
||||
@@ -2334,9 +2336,10 @@ fn check_install_source_health() -> DiagnosticCheck {
|
||||
|
||||
fn check_workspace_health(context: &StatusContext) -> DiagnosticCheck {
|
||||
let in_repo = context.project_root.is_some();
|
||||
let stale_base_warning = format_stale_base_warning(&context.stale_base_state);
|
||||
DiagnosticCheck::new(
|
||||
"Workspace",
|
||||
if in_repo {
|
||||
if in_repo && stale_base_warning.is_none() {
|
||||
DiagnosticLevel::Ok
|
||||
} else {
|
||||
DiagnosticLevel::Warn
|
||||
@@ -2369,6 +2372,10 @@ fn check_workspace_health(context: &StatusContext) -> DiagnosticCheck {
|
||||
"Memory files {} · config files loaded {}/{}",
|
||||
context.memory_file_count, context.loaded_config_files, context.discovered_config_files
|
||||
),
|
||||
format!(
|
||||
"Stale base {}",
|
||||
stale_base_warning.as_deref().unwrap_or("ok")
|
||||
),
|
||||
])
|
||||
.with_data(Map::from_iter([
|
||||
("cwd".to_string(), json!(context.cwd.display().to_string())),
|
||||
@@ -2401,6 +2408,10 @@ fn check_workspace_health(context: &StatusContext) -> DiagnosticCheck {
|
||||
"discovered_config_files".to_string(),
|
||||
json!(context.discovered_config_files),
|
||||
),
|
||||
(
|
||||
"stale_base".to_string(),
|
||||
stale_base_json_value(&context.stale_base_state),
|
||||
),
|
||||
]))
|
||||
}
|
||||
|
||||
@@ -2920,6 +2931,7 @@ struct StatusContext {
|
||||
git_branch: Option<String>,
|
||||
git_summary: GitWorkspaceSummary,
|
||||
branch_freshness: BranchFreshness,
|
||||
stale_base_state: BaseCommitState,
|
||||
session_lifecycle: SessionLifecycleSummary,
|
||||
boot_preflight: BootPreflightSnapshot,
|
||||
sandbox_status: runtime::SandboxStatus,
|
||||
@@ -4167,12 +4179,30 @@ fn enforce_broad_cwd_policy(
|
||||
}
|
||||
}
|
||||
|
||||
fn stale_base_state_for(cwd: &Path, flag_value: Option<&str>) -> BaseCommitState {
|
||||
let source = resolve_expected_base(flag_value, cwd);
|
||||
check_base_commit(cwd, source.as_ref())
|
||||
}
|
||||
|
||||
fn stale_base_json_value(state: &BaseCommitState) -> serde_json::Value {
|
||||
match state {
|
||||
BaseCommitState::Matches => json!({"status": "matches", "fresh": true}),
|
||||
BaseCommitState::Diverged { expected, actual } => json!({
|
||||
"status": "diverged",
|
||||
"fresh": false,
|
||||
"expected": expected,
|
||||
"actual": actual,
|
||||
}),
|
||||
BaseCommitState::NoExpectedBase => json!({"status": "no_expected_base", "fresh": null}),
|
||||
BaseCommitState::NotAGitRepo => json!({"status": "not_git_repo", "fresh": null}),
|
||||
}
|
||||
}
|
||||
|
||||
fn run_stale_base_preflight(flag_value: Option<&str>) {
|
||||
let Ok(cwd) = env::current_dir() else {
|
||||
return;
|
||||
};
|
||||
let source = resolve_expected_base(flag_value, &cwd);
|
||||
let state = check_base_commit(&cwd, source.as_ref());
|
||||
let state = stale_base_state_for(&cwd, flag_value);
|
||||
if let Some(warning) = format_stale_base_warning(&state) {
|
||||
eprintln!("{warning}");
|
||||
}
|
||||
@@ -6221,6 +6251,7 @@ fn status_context(
|
||||
parse_git_status_metadata(project_context.git_status.as_deref());
|
||||
let git_summary = parse_git_workspace_summary(project_context.git_status.as_deref());
|
||||
let branch_freshness = BranchFreshness::from_git_status(project_context.git_status.as_deref());
|
||||
let stale_base_state = stale_base_state_for(&cwd, None);
|
||||
let boot_preflight = build_boot_preflight_snapshot(
|
||||
&cwd,
|
||||
project_root.as_deref(),
|
||||
@@ -6238,6 +6269,7 @@ fn status_context(
|
||||
git_branch,
|
||||
git_summary,
|
||||
branch_freshness,
|
||||
stale_base_state,
|
||||
session_lifecycle: classify_session_lifecycle_for(&cwd),
|
||||
boot_preflight,
|
||||
sandbox_status,
|
||||
@@ -12567,6 +12599,7 @@ mod tests {
|
||||
conflicted_files: 0,
|
||||
},
|
||||
branch_freshness: test_branch_freshness(),
|
||||
stale_base_state: super::BaseCommitState::NoExpectedBase,
|
||||
session_lifecycle: SessionLifecycleSummary {
|
||||
kind: SessionLifecycleKind::IdleShell,
|
||||
pane_id: Some("%7".to_string()),
|
||||
@@ -12692,6 +12725,46 @@ mod tests {
|
||||
fs::remove_dir_all(workspace).expect("cleanup temp dir");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn workspace_health_warns_when_stale_base_diverged() {
|
||||
let context = super::StatusContext {
|
||||
cwd: PathBuf::from("/tmp/project"),
|
||||
session_path: None,
|
||||
loaded_config_files: 0,
|
||||
discovered_config_files: 0,
|
||||
memory_file_count: 0,
|
||||
project_root: Some(PathBuf::from("/tmp/project")),
|
||||
git_branch: Some("feature/stale-base".to_string()),
|
||||
git_summary: GitWorkspaceSummary::default(),
|
||||
branch_freshness: test_branch_freshness(),
|
||||
stale_base_state: super::BaseCommitState::Diverged {
|
||||
expected: "base".to_string(),
|
||||
actual: "head".to_string(),
|
||||
},
|
||||
session_lifecycle: SessionLifecycleSummary {
|
||||
kind: SessionLifecycleKind::SavedOnly,
|
||||
pane_id: None,
|
||||
pane_command: None,
|
||||
pane_path: None,
|
||||
workspace_dirty: false,
|
||||
abandoned: false,
|
||||
},
|
||||
boot_preflight: test_boot_preflight(),
|
||||
sandbox_status: runtime::SandboxStatus::default(),
|
||||
config_load_error: None,
|
||||
};
|
||||
|
||||
let check = super::check_workspace_health(&context);
|
||||
|
||||
assert_eq!(check.level, super::DiagnosticLevel::Warn);
|
||||
assert_eq!(check.data["stale_base"]["status"], "diverged");
|
||||
assert_eq!(check.data["stale_base"]["fresh"], false);
|
||||
assert!(check
|
||||
.details
|
||||
.iter()
|
||||
.any(|detail| detail.contains("stale codebase")));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn status_json_surfaces_session_lifecycle_for_clawhip() {
|
||||
let context = super::StatusContext {
|
||||
@@ -12704,6 +12777,7 @@ mod tests {
|
||||
git_branch: Some("feature/session-lifecycle".to_string()),
|
||||
git_summary: GitWorkspaceSummary::default(),
|
||||
branch_freshness: test_branch_freshness(),
|
||||
stale_base_state: super::BaseCommitState::NoExpectedBase,
|
||||
session_lifecycle: SessionLifecycleSummary {
|
||||
kind: SessionLifecycleKind::RunningProcess,
|
||||
pane_id: Some("%9".to_string()),
|
||||
|
||||
@@ -56,6 +56,7 @@ pub(crate) fn detect_lane_completion(
|
||||
Some(LaneContext {
|
||||
lane_id: output.agent_id.clone(),
|
||||
green_level: 3, // Workspace green
|
||||
green_contract_satisfied: true,
|
||||
branch_freshness: std::time::Duration::from_secs(0),
|
||||
blocker: LaneBlocker::None,
|
||||
review_status: ReviewStatus::Approved,
|
||||
@@ -165,6 +166,7 @@ mod tests {
|
||||
let context = LaneContext {
|
||||
lane_id: "completed-lane".to_string(),
|
||||
green_level: 3,
|
||||
green_contract_satisfied: true,
|
||||
branch_freshness: std::time::Duration::from_secs(0),
|
||||
blocker: LaneBlocker::None,
|
||||
review_status: ReviewStatus::Approved,
|
||||
|
||||
@@ -1503,8 +1503,10 @@ fn run_worker_create(input: WorkerCreateInput) -> Result<String, String> {
|
||||
let merged_roots: Vec<String> = ConfigLoader::default_for(&input.cwd)
|
||||
.load()
|
||||
.ok()
|
||||
.map(|config| config.trusted_roots_with_overrides(&input.trusted_roots))
|
||||
.unwrap_or_else(|| input.trusted_roots.clone());
|
||||
.map_or_else(
|
||||
|| input.trusted_roots.clone(),
|
||||
|config| config.trusted_roots_with_overrides(&input.trusted_roots),
|
||||
);
|
||||
let worker = global_worker_registry().create(
|
||||
&input.cwd,
|
||||
&merged_roots,
|
||||
@@ -6212,6 +6214,8 @@ Command exceeded timeout of {timeout_ms} ms",
|
||||
stderr.trim_end()
|
||||
)
|
||||
};
|
||||
let is_test = is_test_command(command);
|
||||
let return_code_interpretation = if is_test { "test.hung" } else { "timeout" };
|
||||
return Ok(runtime::BashCommandOutput {
|
||||
stdout: String::from_utf8_lossy(&output.stdout).into_owned(),
|
||||
stderr,
|
||||
@@ -6222,9 +6226,11 @@ Command exceeded timeout of {timeout_ms} ms",
|
||||
backgrounded_by_user: None,
|
||||
assistant_auto_backgrounded: None,
|
||||
dangerously_disable_sandbox: None,
|
||||
return_code_interpretation: Some(String::from("timeout")),
|
||||
return_code_interpretation: Some(String::from(return_code_interpretation)),
|
||||
no_output_expected: Some(false),
|
||||
structured_content: None,
|
||||
structured_content: Some(vec![test_timeout_provenance(
|
||||
command, timeout_ms, is_test,
|
||||
)]),
|
||||
persisted_output_path: None,
|
||||
persisted_output_size: None,
|
||||
sandbox_status: None,
|
||||
@@ -6258,6 +6264,37 @@ Command exceeded timeout of {timeout_ms} ms",
|
||||
})
|
||||
}
|
||||
|
||||
fn is_test_command(command: &str) -> bool {
|
||||
let normalized = command
|
||||
.split_whitespace()
|
||||
.collect::<Vec<_>>()
|
||||
.join(" ")
|
||||
.to_ascii_lowercase();
|
||||
normalized.contains("cargo test")
|
||||
|| normalized.contains("cargo nextest")
|
||||
|| normalized.contains("npm test")
|
||||
|| normalized.contains("pnpm test")
|
||||
|| normalized.contains("yarn test")
|
||||
|| normalized.contains("pytest")
|
||||
}
|
||||
|
||||
fn test_timeout_provenance(
|
||||
command: &str,
|
||||
timeout_ms: u64,
|
||||
classified_as_test_hang: bool,
|
||||
) -> serde_json::Value {
|
||||
json!({
|
||||
"event": if classified_as_test_hang { "test.hung" } else { "command.timeout" },
|
||||
"failureClass": if classified_as_test_hang { "test_hang" } else { "timeout" },
|
||||
"data": {
|
||||
"command": command,
|
||||
"timeoutMs": timeout_ms,
|
||||
"provenance": "shell.timeout",
|
||||
"classification": if classified_as_test_hang { "test.hung" } else { "timeout" }
|
||||
}
|
||||
})
|
||||
}
|
||||
|
||||
fn resolve_cell_index(
|
||||
cells: &[serde_json::Value],
|
||||
cell_id: Option<&str>,
|
||||
@@ -9027,6 +9064,23 @@ mod tests {
|
||||
assert_eq!(background_output["noOutputExpected"], true);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn bash_tool_classifies_test_timeout_as_hung_with_provenance() {
|
||||
let timeout = execute_tool(
|
||||
"bash",
|
||||
&json!({ "command": "sleep 1 # cargo test slow_case", "timeout": 10 }),
|
||||
)
|
||||
.expect("bash timeout should return output");
|
||||
let timeout_output: serde_json::Value = serde_json::from_str(&timeout).expect("json");
|
||||
assert_eq!(timeout_output["interrupted"], true);
|
||||
assert_eq!(timeout_output["returnCodeInterpretation"], "test.hung");
|
||||
assert_eq!(timeout_output["structuredContent"][0]["event"], "test.hung");
|
||||
assert_eq!(
|
||||
timeout_output["structuredContent"][0]["data"]["provenance"],
|
||||
"bash.timeout"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn bash_workspace_tests_are_blocked_when_branch_is_behind_main() {
|
||||
let _guard = env_lock()
|
||||
|
||||
Reference in New Issue
Block a user