Compare commits

...

10 Commits

Author SHA1 Message Date
Yeachan-Heo 8632f90f3d docs(roadmap): add delayed binary read gap 2026-05-22 05:30:46 +00:00
Yeachan-Heo a49d8e8361 docs(roadmap): add env wrapper read-only gap 2026-05-22 05:01:15 +00:00
Yeachan-Heo 2bb5d49ce3 docs(roadmap): add git fetch read-only gap 2026-05-22 04:31:16 +00:00
Yeachan-Heo 1882c9582a docs(roadmap): add path traversal validation gap 2026-05-22 04:01:56 +00:00
Yeachan-Heo 665dd1fe2a docs(roadmap): add g004 approval delegation continuity gap 2026-05-22 00:30:49 +00:00
Yeachan-Heo 54bbee2680 docs(roadmap): add g004 suffix path validation gap 2026-05-22 00:01:11 +00:00
Yeachan-Heo 8ea54d3a50 docs(roadmap): add approval token expiry boundary gap 2026-05-21 23:30:43 +00:00
Yeachan-Heo 3d877d78f3 docs(roadmap): add branch lock module normalization gap 2026-05-21 23:00:47 +00:00
Yeachan-Heo f45b651e18 docs(roadmap): add auto compaction telemetry gap 2026-05-21 22:30:42 +00:00
Yeachan-Heo a8c67b08e2 docs(roadmap): add auto compaction short huge session gap 2026-05-21 22:00:55 +00:00
+22
View File
@@ -6671,3 +6671,25 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
557. **Wrong-task prompt-misdelivery detection only recognizes `` prompt echoes, so `>` / `` agent prompts can hide mismatched-task receipts until timeout** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 21:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@bc55711` and binary built from source SHA `25d663d`. Active tmux session at probe time: `gajae-issue-311-auto-merge-race-receipt`. Code inspection: worker readiness accepts multiple prompt glyphs (`>`, ``, ``) in `detect_ready_for_prompt` at `runtime/src/worker_boot.rs:1079-1113`, but `detect_prompt_echo` at `worker_boot.rs:1210-1217` only strips a leading ``. `detect_prompt_misdelivery` relies on `detect_prompt_echo` for the `mismatched_prompt_visible` path that catches wrong-task receipts when the screen shows a different task prompt. The existing regression `wrong_task_receipt_mismatch_is_detected_before_execution_continues` uses ` Explain this KakaoTalk screenshot...`, so it exercises only the single supported glyph. If the coding agent UI echoes `> Explain...` or ` Explain...`, `observed_prompt_preview` is `None`; when the expected prompt text is not also visible, the wrong-task mismatch is not detected and the worker stays `Running` until coarse startup timeout classification. **Required fix shape:** (a) make prompt-echo parsing share the same glyph set as `detect_ready_for_prompt` (`>`, ``, ``, and boxed `│ >` variants if present); (b) add wrong-task receipt tests for `>`, ``, and `` echoes; (c) include the raw echo line/glyph in `WorkerEventPayload::PromptDelivery` or event detail so operators can diagnose UI variant drift; (d) ensure shell prompt detection remains separate so real shell prompts are still classified as `Shell`, not wrong-task agent echoes; (e) add a timeout evidence regression proving observed prompt preview is populated for all supported glyphs. **Why this matters:** prompt-misdelivery protection is only as good as the UI echo parser. Supporting multiple ready glyphs but only one echo glyph creates event/log opacity: operators see a generic timeout instead of a precise wrong-task replay condition for common terminal themes or agent UIs. Source: gaebal-gajae dogfood response to Clawhip message `1507125863408341102` on 2026-05-21.
558. **Tool-permission gate detection is hard-coded to one English MCP prompt shape, so alternate MCP approval wording can fall through as startup-no-evidence instead of `ToolPermissionRequired`** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 21:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@9fd61af` and binary built from source SHA `25d663d`. Active tmux session at probe time: `omx-issue-2443-ralplan-consensus-resume`. Code inspection: `detect_tool_permission_prompt` in `runtime/src/worker_boot.rs:958-999` only enters when the full screen contains either `allow the` + `server` + `tool` + `run`, or `allow tool` + `run`. The only production-shaped tests at `worker_boot.rs:1387-1474` use exactly `Allow the omx_memory MCP server to run tool "..."?`. Common equivalent approval copy such as `Allow MCP server omx_memory to call tool "project_memory_read"?`, `Allow server omx_memory to execute tool ...`, `Approve tool project_memory_read from omx_memory?`, or localized/shorter plugin prompts do not contain the exact `allow the ... server ... run tool` / `allow tool ... run` token pattern. When those appear during boot, `observe` will not set `WorkerStatus::ToolPermissionRequired`, no structured `ToolPermissionPrompt` payload is emitted, and later timeout evidence can degrade to generic `startup_no_evidence` or worker-crashed classification even though the pane clearly showed an approval gate. **Required fix shape:** (a) replace phrase-order checks with a tolerant classifier over permission verbs (`allow`/`approve`/`permit`), execution verbs (`run`/`call`/`execute`), and MCP/tool tokens independent of order; (b) add fixture tests for at least three real-world prompt variants, including `call tool` and prompts where the tool name appears before the server; (c) preserve extracted `server_name`, `tool_name`, allow-scope, and raw `prompt_preview` even when fields are partial; (d) emit an `Unknown`-scope tool-permission event rather than falling through when the approval intent is clear but parsing is incomplete; (e) include classifier confidence/reason in startup timeout evidence so UI wording drift is visible. **Why this matters:** MCP permission prompts are exactly the kind of boot blocker operators need to resolve quickly. A brittle single-template detector converts an actionable “click allow” condition into opaque startup failure noise whenever plugin/UI copy drifts. Source: gaebal-gajae dogfood response to Clawhip message `1507133416884404254` on 2026-05-21.
559. **Auto-compaction can refuse to compact very large short sessions because `compact_session` still enforces the default message-count gate even after real provider usage crosses the input-token threshold** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 22:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@9ef521b` and binary built from source SHA `25d663d`. Active tmux sessions at probe time: `gajae-issue-313-omx-launch-resilience-receipt`, `omx-pr-2447-ralplan-consensus-final-review`. Code inspection: `ConversationRuntime::maybe_auto_compact` at `runtime/src/conversation.rs:559-572` triggers from actual cumulative provider `input_tokens`, then calls `compact_session` with only `max_estimated_tokens: 0` overridden. But `compact_session` first calls `should_compact`, and `should_compact` at `runtime/src/compact.rs:41-50` still requires `compactable.len() > config.preserve_recent_messages` before considering token budget. Because `CompactionConfig::default().preserve_recent_messages` is 4, a session with 1-4 extremely large messages and real provider usage above `auto_compaction_input_tokens_threshold` returns `removed_message_count == 0`; `maybe_auto_compact` then silently returns `None`. The existing auto-compaction regression at `conversation.rs:1520-1572` seeds enough turns so the message-count predicate passes, but it does not cover a short huge transcript that crosses the real usage threshold. **Required fix shape:** (a) distinguish manual estimated-token compaction from auto-compaction-after-real-usage; (b) when actual provider usage crosses the threshold, allow compaction even if message count is <= default preserved tail, while preserving at least the latest user/assistant boundary safely; (c) add a regression with one or two huge messages plus `AssistantEvent::Usage { input_tokens: 120_000 }` proving auto-compaction emits `AutoCompactionEvent`; (d) make the skip reason observable when auto-compaction threshold is crossed but no messages are removed (`too_few_messages`, `tool_boundary`, `empty_prefix`, etc.); (e) ensure tool-use/tool-result boundary protection still wins over unsafe compaction. **Why this matters:** provider-reported usage is the trusted signal that the context window is hot. If auto-compaction ignores that signal for short but huge sessions, users hit context exhaustion with no auto-compaction event and no explanation. Source: gaebal-gajae dogfood response to Clawhip message `1507140966774079521` on 2026-05-21.
560. **Auto-compaction observability reports only removed message count, not token pressure or before/after token estimates, so operators cannot tell whether compaction actually relieved context risk** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 22:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@a8c67b0` and binary built from source SHA `25d663d`. Active tmux sessions at probe time: `gajae-issue-314-omx-launch-resilience-review`, `gajae-pr-314-final-review`, `omx-pr-2447-ralplan-consensus-review3`. Code inspection: `AutoCompactionEvent` at `runtime/src/conversation.rs:123-127` contains only `removed_message_count`; `maybe_auto_compact` at `conversation.rs:559-581` has access to cumulative provider usage and the pre/post session but drops all token-pressure context. The CLI notice at `rusty-claude-cli/src/main.rs:3560-3562` prints only `[auto-compacted: removed N messages]`, and JSON output at `main.rs:5111-5114` exposes only `removed_messages` plus that same notice. The parity harness assertion at `mock_parity_harness.rs:691-712` only checks that the `auto_compaction` key exists and usage input tokens are large; it does not require any before/after estimate, threshold, trigger usage, or reduction metadata. **Required fix shape:** (a) extend `AutoCompactionEvent` with `trigger_input_tokens`, `threshold_input_tokens`, `estimated_tokens_before`, `estimated_tokens_after`, `removed_message_count`, and a `reason/trigger` enum; (b) compute before/after estimates around `compact_session` and include them in CLI text and JSON; (c) add tests proving JSON contains these fields for auto-compaction and that the CLI notice includes enough context to debug risk relief; (d) include a skipped/degraded event shape when threshold crossed but no messages removed, tying into #559; (e) update parity harness to assert semantic values, not merely key presence. **Why this matters:** auto-compaction is a context-safety action. Without token-pressure and reduction telemetry, a user sees “removed 2 messages” but cannot tell if the session went from 120k to 5k tokens, 120k to 119k, or failed to relieve the risk at all. Source: gaebal-gajae dogfood response to Clawhip message `1507148512159203338` on 2026-05-21.
561. **Branch-lock collision detection treats module strings literally, so equivalent paths with `./`, duplicate slashes, or trailing slashes evade same-branch overlap detection** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 23:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@f45b651` and binary built from source SHA `25d663d`. Active tmux session at probe time: `gajae-issue-316-inflight-review-superseded-merge`. Code inspection: `detect_branch_lock_collisions` in `runtime/src/branch_lock.rs:23-49` delegates to `overlapping_modules`, which compares raw `modules: Vec<String>` entries via `modules_overlap` at `branch_lock.rs:65-69`: exact equality or prefix with `format!("{right}/")` / `format!("{left}/")`. No normalization is applied. As a result, two lanes on the same branch with semantically identical module scopes like `runtime/mcp` vs `./runtime/mcp`, `runtime/mcp` vs `runtime/mcp/`, `runtime//mcp` vs `runtime/mcp`, or `crates/runtime/../runtime/mcp` are not detected as collisions, even though they target the same files. The existing tests cover exact same module, nested raw-prefix module, and different branches only; they do not exercise path normalization or malformed-but-common module inputs. This is distinct from Jobdori's #562 empty-modules whole-branch gap: even non-empty module locks can bypass collision checks when strings differ syntactically. **Required fix shape:** (a) normalize module scopes before comparison by trimming whitespace, removing leading `./`, collapsing repeated slashes, dropping trailing slashes, and resolving `.`/`..` components without escaping repo root; (b) store/report both normalized collision scope and raw input modules for auditability; (c) reject or explicitly mark invalid module paths that normalize outside the repo; (d) add tests for `./runtime/mcp`, `runtime/mcp/`, `runtime//mcp`, nested normalized paths, and invalid `../` escape attempts; (e) keep deterministic sorted/deduped collision output after normalization. **Why this matters:** branch locks are a coordination safety rail. If two agents can claim the same branch/module by spelling the path differently, lock receipts give false confidence and concurrent lanes can still overwrite or review-stomp each other. Source: gaebal-gajae dogfood response to Clawhip message `1507156065941192705` on 2026-05-21.
562. **Approval tokens are still accepted at their exact expiry second because validation uses `now > expires_at` instead of `now >= expires_at`** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 23:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@3d877d7` and binary built from source SHA `25d663d`. Active tmux session at probe time: `gajae-issue-317-inflight-review-restart`. Code inspection: `ApprovalTokenGrant::expires_at` stores `expires_at_epoch_seconds`, and `ApprovalTokenLedger::validate_grant` at `runtime/src/approval_tokens.rs:286-290` rejects only when `now_epoch_seconds > expires_at`. Therefore a token with `expires_at(20)` remains valid for verification/consumption when `now_epoch_seconds == 20`; it expires only at 21. The existing test `approval_token_rejects_scope_expansion_expiry_and_revocation` checks `ledger.verify(..., now=21)` for `expires_at(20)` but does not cover the boundary at exactly 20. This differs from the OAuth expiry helper in `api/src/providers/anthropic.rs`, which uses `expires_at <= now_unix_timestamp()` and treats the timestamp as no longer valid once reached. **Required fix shape:** (a) change approval-token expiry validation to `now_epoch_seconds >= expires_at` or explicitly rename/document the field as `valid_through_epoch_seconds` if inclusive semantics are intended; (b) add boundary tests for `now=expires_at-1`, `now=expires_at`, and `now=expires_at+1` for both `verify` and `consume`; (c) include `expires_at_epoch_seconds` and `now_epoch_seconds` in expiry audit/error metadata so operators can see boundary decisions; (d) align approval-token expiry semantics with OAuth/token-cache expiry conventions across the codebase; (e) add conformance coverage if approval tokens are exported in G004 bundles. **Why this matters:** approval tokens authorize policy exceptions. An off-by-one expiry window is small but security-sensitive and, without boundary tests, future short-lived approval grants can be consumed exactly at the moment operators expect them to be dead. Source: gaebal-gajae dogfood response to Clawhip message `1507163615596380191` on 2026-05-21.
563. **G004 conformance `get_path` falls back to suffix pointers, so nested required fields can be satisfied by same-named root fields and invalid bundles pass** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 00:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8ea54d3` and binary built from source SHA `25d663d`. Active tmux session at probe time: `gajae-issue-316-fix-still-open-fixture`. Code inspection: every conformance field helper ultimately calls `get_path(root, pointer)` in `runtime/src/g004_conformance.rs:386-398`. If `root.pointer(path)` misses, `get_path` iterates suffixes of the requested path and tries those against the same root object. For example, validating a report's `/identity/contentHash` at `g004_conformance.rs:132-137` first checks that nested field; if it is missing, `get_path(report, "/identity/contentHash")` then tries `"/contentHash"`. A malformed report with top-level `contentHash` but no `identity.contentHash` therefore passes the nested identity requirement. The same suffix fallback affects `/projection/provenance`, `/redaction/provenance`, `/metadata/provenance`, `/metadata/seq`, `/metadata/eventFingerprint`, and similar nested contract fields. This makes the machine-checkable G004 validator structurally permissive in exactly the fields that are supposed to prove provenance/identity, and tests/fixtures do not appear to include negative cases for misplaced nested fields. **Required fix shape:** (a) remove suffix fallback from `get_path` and use strict JSON Pointer lookup for conformance validation; (b) if fallback is needed for legacy aliases, make it explicit per field with a deprecation warning/error code, never implicit for all nested fields; (c) add negative tests where top-level `contentHash`, `provenance`, `seq`, or `eventFingerprint` exist but the required nested path is absent, proving validation fails; (d) update error messages to report the exact missing nested path; (e) run existing G004 fixtures through strict validation and fix any accidental reliance on suffix aliases. **Why this matters:** conformance validators are trust boundaries. A suffix fallback lets artifacts satisfy provenance/identity requirements from the wrong location, so downstream consumers can accept bundles whose shape does not match the advertised contract. Source: gaebal-gajae dogfood response to Clawhip message `1507171165033201735` on 2026-05-22.
564. **G004 approval-token conformance checks only that delegation hops are non-empty, not that the chain is contiguous from owner to executor, so broken approval provenance passes** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 00:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@54bbee2` and binary built from source SHA `25d663d`. Active tmux sessions at probe time: none. Code inspection: `validate_approval_tokens` in `runtime/src/g004_conformance.rs:226-253` requires `tokenId`, `owner`, `scope`, `issuedAt`, `oneTimeUse`, `replayPreventionNonce`, and calls `validate_delegation_chain`. But `validate_delegation_chain` at `g004_conformance.rs:256-272` only checks each hop has non-empty `from`, `to`, `action`, and `at`; it never verifies that the first hop starts at the token `owner`, that each hop's `to` equals the next hop's `from`, or that the final `to` matches an approved/executing actor field. The valid fixture has one clean `leader-fixed -> worker-3` hop, and the only negative test uses an empty chain; there is no fixture where a non-empty but discontinuous chain like `owner -> lead`, then `unrelated -> worker` is rejected. Therefore an approval token can advertise an auditable delegation chain while the conformance helper accepts broken provenance continuity. **Required fix shape:** (a) add explicit `approvedExecutor`/`executingActor` (or equivalent) to the G004 approval-token schema if final actor is part of the contract; (b) validate first hop `from == owner`; (c) validate adjacent hop continuity (`hop[i].to == hop[i+1].from`); (d) validate final hop reaches the approved/executing actor when present; (e) add negative tests for owner mismatch, discontinuous middle hop, and wrong final executor, plus a positive multi-hop fixture. **Why this matters:** approval tokens are policy-exception evidence. A non-empty list of disconnected hops is not an audit trail; downstream claws can mistakenly trust a token whose delegation provenance does not actually connect the owner approval to the actor consuming it. Source: gaebal-gajae dogfood response to Clawhip message `1507178711106064444` on 2026-05-22.
565. **MCP lifecycle hardened diagnostics expose `timestamp` / `phase_timestamps` as raw epoch seconds while adjacent machine-readable event contracts are being tightened around RFC3339, so MCP/plugin lifecycle logs remain format-inconsistent and parser-hostile** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 02:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@665dd1f`. Active tmux sessions at probe time: none. Code inspection: `runtime/src/mcp_lifecycle_hardened.rs:6-12` defines `now_secs()` as `SystemTime` seconds since `UNIX_EPOCH`; `McpErrorSurface.timestamp` at `mcp_lifecycle_hardened.rs:50-56` stores that raw `u64`, and `McpLifecycleState.phase_timestamps` at `mcp_lifecycle_hardened.rs:125-129` / `180-186` records the same epoch seconds per phase. Tests only assert presence via `phase_timestamp(...).is_some()` and preserve `waited_ms`; they never assert a stable date-time contract or expose both wall-clock and monotonic sequencing. This is a plugin lifecycle variant of the timestamp-contract findings in #548-#551/#554: MCP diagnostics are explicitly machine-readable, but a field named `timestamp` differs from the RFC3339-looking `emittedAt` / `createdAt` surfaces and will force downstream claws to special-case one lifecycle stream. **Required fix shape:** (a) decide and document MCP lifecycle timestamp contract; prefer RFC3339 UTC string fields like `occurredAt` / `phaseStartedAt` for wall-clock log interoperability; (b) if epoch seconds are retained, name them `timestamp_epoch_seconds` and optionally add a separate monotonic `phase_seq`; (c) update `McpErrorSurface` and `phase_timestamps` serialization with backward-compatible aliases or migration notes; (d) add tests that parse MCP lifecycle wall-clock fields as RFC3339 and reject ambiguous raw epoch strings when serialized under the public contract; (e) align MCP lifecycle diagnostics with lane events/recovery/G004 timestamp validators so plugin lifecycle failures can be sorted and displayed without bespoke parser branches. **Why this matters:** MCP startup/plugin lifecycle failures are already hard to debug. If the lifecycle stream uses raw epoch seconds while the rest of claw-code moves to RFC3339 event timestamps, logs and UIs lose a uniform temporal contract exactly on the surface operators need during plugin breakage. Source: gaebal-gajae dogfood response to Clawhip message `1507208914046025731` on 2026-05-22.
566. **Workspace-write outside-workspace detection misses exact sensitive directory targets like `/etc` because `command_targets_outside_workspace` only searches for paths with trailing slashes** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 03:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@665dd1f`. Active tmux sessions at probe time: none. Code inspection: `runtime/src/bash_validation.rs:286-295` routes `PermissionMode::WorkspaceWrite` through `command_targets_outside_workspace`, and that helper at `bash_validation.rs:302-323` only warns for write/state-modifying commands when the command string contains one of `"/etc/", "/usr/", "/var/", "/boot/", "/sys/", "/proc/", "/dev/", "/sbin/", "/lib/", "/opt/"`. Existing coverage checks `validate_mode("cp file.txt /etc/config", PermissionMode::WorkspaceWrite)` and therefore exercises the trailing-slash case. But a write target that is exactly the directory name, such as `cp file.txt /etc`, `mv config /usr`, `install file /opt`, or `tee /etc`, does not contain `"/etc/"` / `"/usr/"` / `"/opt/"`; the helper returns false and `validate_mode` falls through to `Allow`. This is adjacent to, but distinct from, Jobdori's #570 traversal-substring bypass: even without `..`, exact system directory operands evade the workspace-write out-of-scope warning because the heuristic uses substring sentinels instead of token/path boundary checks. **Required fix shape:** (a) tokenize shell words enough to inspect path-like operands rather than raw substring matching; (b) treat exact sensitive directories (`/etc`, `/usr`, `/var`, `/boot`, `/sys`, `/proc`, `/dev`, `/sbin`, `/lib`, `/opt`) and descendants equivalently; (c) include command/path evidence in the warning so operators see which operand escaped workspace scope; (d) add regressions for `cp file.txt /etc`, `mv file /usr`, `install file /opt`, and keep the existing `/etc/config` case green; (e) consider unifying this with the #570 path-resolution hardening so workspace-write/read-only path scope checks use one boundary-aware path classifier. **Why this matters:** workspace-write mode is supposed to allow project edits while warning on system-level writes. Missing exact-directory operands lets an agent attempt writes into sensitive roots with no permission warning, which is precisely the boundary the mode is meant to make visible. Source: gaebal-gajae dogfood response to Clawhip message `1507216459665772677` on 2026-05-22.
567. **Path traversal validation treats any command string containing the workspace path as safe, so mixed outside-target commands can smuggle `../` escapes past the warning** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 04:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@665dd1f`. Active tmux session at probe time: `omc-issue-3079-plugin-cache-commands`; no active claw-code implementation session. Code inspection: `rust/crates/runtime/src/bash_validation.rs::validate_paths` checks `if command.contains("../")`, then suppresses the traversal warning whenever the *entire command string* contains `workspace.to_string_lossy()`. That means a command with one operand inside the workspace and a separate traversal operand outside it, such as `cp /workspace/file ../../../etc/passwd` (workspace `/workspace`) or `tar -cf /workspace/out.tar ../../..`, is treated as `Allow` because the workspace prefix appears somewhere in the command. The validator never tokenizes operands, resolves each candidate path, or proves that the specific `../` path stays under the workspace root. Existing tests cover `cat ../../../etc/passwd` without a workspace substring, but not mixed safe+unsafe operands. This is a separate path-scope gap from #566's exact sensitive-directory miss: here the escape uses traversal and is hidden by an unrelated in-workspace token. **Required fix shape:** (a) tokenize shell words enough to collect path-like operands independently; (b) resolve each relative/absolute candidate against cwd/workspace and warn/block when any operand escapes, rather than checking whether the command string contains the workspace root; (c) keep quoted/non-path text from causing false positives; (d) add regressions for `cp /workspace/file ../../../etc/passwd`, `tar -cf /workspace/out.tar ../../..`, and a positive `cp /workspace/file /workspace/sub/../safe`; (e) align this with the boundary-aware classifier required by #566 so workspace-write/read-only path validation shares one per-operand path-scope primitive. **Why this matters:** path validation is supposed to make workspace escapes visible. A global substring exemption lets one benign workspace path launder a different `../` operand, producing false `Allow` evidence for commands that still target outside the workspace. Source: gaebal-gajae dogfood response to Clawhip message `1507231563484500048` on 2026-05-22.
568. **Read-only bash validation classifies `git fetch` as safe even though fetch mutates `.git/FETCH_HEAD`, remote-tracking refs, packfiles, and optional tags** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 04:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@1882c95`. Active tmux sessions at probe time: `gajae-pr-323-bootstrap-context-readiness-review`, `omx-issue-2451-madmax-detached-lock-timeout`; no active claw-code implementation session. Code inspection: `rust/crates/runtime/src/bash_validation.rs::validate_read_only` delegates `git` commands to `validate_git_read_only`, and `GIT_READ_ONLY_SUBCOMMANDS` includes `fetch` alongside pure inspection commands like `status`, `log`, and `show`. But `git fetch` is not read-only: even without checkout it writes `.git/FETCH_HEAD`, may update remote-tracking refs, downloads pack/object data, updates tags depending on flags/config, and can run hooks/config-driven network behavior. There is no mode distinction for `git fetch --dry-run` versus mutating fetch forms, and no test asserts that read-only mode blocks repository/network mutation. This creates a stale-branch/confusion footgun in the opposite direction: agents can claim read-only validation while changing the local base/ref state used by later freshness checks. **Required fix shape:** (a) remove plain `fetch` from `GIT_READ_ONLY_SUBCOMMANDS` or split it into a separate `NetworkMetadataMutation`/workspace-write classification; (b) allow only explicitly non-mutating forms if Git actually guarantees them for the chosen flags, otherwise require WorkspaceWrite/Prompt; (c) add read-only regressions proving `git fetch`, `git fetch origin main`, and `git fetch --tags` are blocked/warned, while `git status`, `git log`, and `git diff` remain allowed; (d) include a clear reason mentioning `.git` metadata/ref mutation and network access; (e) align stale-branch preflight code so any fetch it performs is an explicit trusted preflight action, not hidden inside user-command read-only permission. **Why this matters:** read-only mode is used as safety evidence. Treating `git fetch` as read-only lets commands mutate repository metadata and remote refs under a permission label that promises observation only, making later branch-freshness and audit results harder to trust. Source: gaebal-gajae dogfood response to Clawhip message `1507239113550725231` on 2026-05-22.
569. **Read-only bash validation does not unwrap `/usr/bin/env` command prefixes, so `env FOO=bar rm file` is allowed even though the executed command is write/destructive** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 05:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@2bb5d49`. Active tmux sessions at probe time: `gajae-pr-323-fix-windows-host-path-safety2`, `omx-pr-2452-madmax-lock-timeout-review`; no active claw-code implementation session. Channel context also had new PR claw-code#3056, but this pinpoint is from local permission-validation inspection. Code inspection: `rust/crates/runtime/src/bash_validation.rs::extract_first_command` strips leading `KEY=value` assignments and `validate_read_only` has a special recursive path for `sudo`, but there is no equivalent handling for the common `env KEY=value <cmd>` launcher form. `env` itself appears in `SEMANTIC_READ_ONLY_COMMANDS`, and in read-only mode `validate_read_only("env FOO=bar rm -rf target", PermissionMode::ReadOnly)` sees first command `env`, does not recurse into the inner `rm`, finds no redirection/git state path, and returns `Allow`. The same bypass applies to `env PATH=/tmp cp a b`, `env -i sh -c 'rm file'`, and likely `/usr/bin/env` shebang-style invocations if the first token is a path. Existing tests cover bare env assignments like `FOO=bar ls -la`, and sudo-wrapped writes, but not `env`-wrapped writes. **Required fix shape:** (a) teach first-command extraction or read-only validation to recognize `env`/`/usr/bin/env` prefixes, skip env flags/assignments, and validate the inner command recursively; (b) block or warn if the inner command cannot be parsed safely (for example `env -i sh -c ...`) under read-only mode; (c) add regressions for `env FOO=bar rm -rf target`, `env PATH=/tmp cp a b`, `env -i npm install`, and safe `env FOO=bar grep x file`; (d) keep the existing `KEY=value cmd` behavior but cover quoted assignment values; (e) report the unwrapped inner command in the block reason so operators can see which payload violated read-only mode. **Why this matters:** permission validation must classify the command that will actually execute, not the wrapper binary. `env` is a routine launcher in scripts; allowing it as read-only lets agents hide write/package/destructive commands behind environment setup and produces false safety evidence. Source: gaebal-gajae dogfood response to Clawhip message `1507246662630903910` on 2026-05-22.
570. **`read_file` binary detection scans only the first 8192 bytes for NUL, so text-prefixed binaries can pass the binary gate and fail later as generic UTF-8 read errors** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 05:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@a49d8e8`. Active tmux session at probe time: `gajae-issue-324-review-gate-degradation`; no active claw-code implementation session. Channel context from Jobdori pointed at `is_binary_file` line 30. Code inspection confirmed `rust/crates/runtime/src/file_ops.rs::is_binary_file` opens the path, reads exactly one 8192-byte chunk, and returns true only if that first chunk contains NUL. `read_file` then calls `fs::read_to_string` over the whole file. A file with an 8KB text header followed by NUL/non-UTF8 binary payload is therefore not rejected by the explicit binary gate; it reaches `read_to_string` and surfaces as a generic invalid UTF-8/io error instead of the stable `InvalidData: file appears to be binary` contract. Existing coverage writes NUL at byte 0 (`rejects_binary_files`) and does not cover delayed-NUL or delayed-non-UTF8 payloads. **Required fix shape:** (a) make binary/text validation cover the entire allowed read range (bounded by `MAX_READ_SIZE`) or stream decode as UTF-8 while detecting NUL/non-text bytes; (b) map any delayed NUL/non-UTF8 failure to the same stable binary-file error kind/message used by the early gate; (c) add regressions with NUL at byte 8192+, non-UTF8 after an ASCII prefix, and valid large UTF-8 text controls; (d) avoid loading oversized files by preserving the metadata size gate before full scan/decode; (e) include byte offset/evidence in diagnostics only if it does not leak file contents. **Why this matters:** `read_file` should fail predictably on binary artifacts. A small text header is common in mixed/generated files, and letting those bypass the binary gate creates inconsistent errors that look like brittle UTF-8/tool failures rather than an intentional binary-read refusal. Source: gaebal-gajae dogfood response to Clawhip message `1507254212403138672` on 2026-05-22.