# Claw Code Parity Plan (Rewritten)

## Scope
- Align AX Copilot with claw-code quality for loop reliability, permission/hook behavior, and session durability.

## Update
- Updated: 2026-04-05 15:34 (KST)
- Rebased the AX Agent improvement plan on actual `claw-code` runtime files instead of earlier AX snapshots. The reference spine is now `src/bootstrap/state.ts -> src/bridge/initReplBridge.ts -> src/bridge/sessionRunner.ts -> src/screens/REPL.tsx -> src/components/Messages.tsx -> src/components/StatusLine.tsx`.
- AX Agent work should follow that same quality order: state first, execution second, render last. UI-only fixes that bypass state/execution should be treated as temporary.
- Updated: 2026-04-05 16:55 (KST)
- Current estimated parity vs `claw-code`: core execution engine `82%`, main chat UI `68%`, Cowork/Code status UX `63%`, internal settings linkage `88%`, overall AX Agent `74%`.
- Engine-affecting settings should be handled conservatively during parity work. If a setting changes the main execution route, approval flow, or recovery behavior without representing a stable real-world user choice, it should be moved to developer-only UI or removed from user-facing surfaces.

## Preserved History (Summary)
- Core loop guards and post-tool verification gates are already partially implemented.
- Plan Mode, parallel tool execution, and unknown-tool recovery are in place.
- Session restore hardening is ongoing.
## Reference Map

| claw-code reference | AX apply target | completion criteria | quality criteria |
|---|---|---|---|
| `src/bootstrap/state.ts` | `src/AxCopilot/Views/ChatWindow.xaml.cs`, `src/AxCopilot/Services/Agent/AxAgentExecutionEngine.cs`, `src/AxCopilot/Services/ChatStorageService.cs` | one canonical runtime/session state for current turn, queue, retry, execution events, and persisted snapshot | reopen/retry/queue flows do not create duplicate or blank assistant messages |
| `src/bridge/initReplBridge.ts` | `src/AxCopilot/Services/Agent/AxAgentExecutionEngine.cs`, `src/AxCopilot/Services/LlmService.cs` | send/regenerate/retry/queued follow-up/slash all enter through one prepared-execution path | same input under same settings takes same execution route regardless of entry point |
| `src/bridge/sessionRunner.ts` | `src/AxCopilot/Services/Agent/AgentLoopService.cs`, `src/AxCopilot/Services/Agent/AgentLoopTransitions.cs`, `src/AxCopilot/Services/Agent/AgentLoopTransitions.Execution.cs` | tool start/result/error/progress normalized once inside loop layer | Cowork/Code no longer flash repeated status strings or overshare debug payloads |
| `src/bridge/bridgeMessaging.ts` | `src/AxCopilot/Views/ChatWindow.xaml.cs`, `src/AxCopilot/Services/Agent/AgentLoopService.cs` | inbound execution events separated from display-only events before UI render | execution event replay does not duplicate visible timeline banners |
| `src/screens/REPL.tsx` | `src/AxCopilot/Views/ChatWindow.xaml`, `src/AxCopilot/Views/ChatWindow.xaml.cs` | screen state transitions, queue flow, retry flow, and composer state use shared runtime helpers | window resize, queue chaining, and retry feel stable instead of UI-patched |
| `src/components/Messages.tsx` | `src/AxCopilot/Views/ChatWindow.xaml.cs` | timeline derives from normalized conversation/session state only | no token-only completions, blank cards, or direct injected duplicates |
| `src/components/StatusLine.tsx` | `src/AxCopilot/Views/ChatWindow.xaml`, `src/AxCopilot/Views/ChatWindow.xaml.cs` | status strip computed from debounced runtime state, not multiple imperative refresh calls | metadata stays lightweight and does not overpower message timeline |

## AX Agent Improvement Phases

### Phase A. Runtime State Canonicalization
- Reference: `src/bootstrap/state.ts`
- AX apply location: `src/AxCopilot/Views/ChatWindow.xaml.cs`, `src/AxCopilot/Services/Agent/AxAgentExecutionEngine.cs`, `src/AxCopilot/Services/ChatStorageService.cs`
- Completion criteria:
  - `Chat`, `Cowork`, `Code` all update one shared runtime/session state model.
  - queue, retry, post-compaction, and execution-event state can be restored after reopen.
- Quality criteria:
  - reopening a conversation reproduces the same visible timeline without extra assistant cards.
  - queue and execution badges remain in sync with the stored conversation.

### Phase B. Prepared Execution Unification
- Reference: `src/bridge/initReplBridge.ts`
- AX apply location: `src/AxCopilot/Services/Agent/AxAgentExecutionEngine.cs`, `src/AxCopilot/Services/LlmService.cs`
- Completion criteria:
  - prompt stack assembly, execution mode choice, and final assistant commit are engine-owned.
  - send/regenerate/retry/queued follow-up/slash flows all call the same preparation API.
- Quality criteria:
  - behavior is deterministic per tab/settings combination.
  - UI stops building different prompt stacks for the same conversation state.

### Phase C. AgentLoop Event Normalization
- Reference: `src/bridge/sessionRunner.ts`, `src/bridge/bridgeMessaging.ts`
- AX apply location: `src/AxCopilot/Services/Agent/AgentLoopService.cs`, `src/AxCopilot/Services/Agent/AgentLoopTransitions.cs`, `src/AxCopilot/Services/Agent/AgentLoopTransitions.Execution.cs`
- Completion criteria:
  - loop events are normalized into bounded activity/event records before UI consumption.
  - permission requests, failure states, retries, and completion states use a stable event shape.
- Quality criteria:
  - Cowork/Code no longer flash rapidly during long-running tool sequences.
  - file path/debug detail remains collapsed by default.

### Phase D. Timeline Render Parity
- Reference: `src/screens/REPL.tsx`, `src/components/Messages.tsx`
- AX apply location: `src/AxCopilot/Views/ChatWindow.xaml`, `src/AxCopilot/Views/ChatWindow.xaml.cs`
- Completion criteria:
  - assistant/user messages, execution logs, compact boundaries, and queue summaries are rendered from one derived timeline model.
  - direct imperative bubble injection is removed from normal send/regenerate/retry flows.
- Quality criteria:
  - no blank assistant cards.
  - no token-only completion without visible content.
  - no duplicate event banners after re-render.

### Phase E. Composer and Status Strip Simplification
- Reference: `src/screens/REPL.tsx`, `src/components/StatusLine.tsx`
- AX apply location: `src/AxCopilot/Views/ChatWindow.xaml`, `src/AxCopilot/Views/ChatWindow.xaml.cs`
- Completion criteria:
  - composer height grows only on explicit line breaks.
  - status strip, queue summary, and runtime activity all use debounced runtime updates.
  - Chat/Cowork/Code share one responsive width calculation policy.
- Quality criteria:
  - resizing feels natural.
  - composer does not keep growing after send.
  - metadata remains subordinate to the message timeline.

### Phase F. Recovery, Resume, and Verification
- Reference: `src/bootstrap/state.ts`, `src/bridge/sessionRunner.ts`, `src/screens/REPL.tsx`
- AX apply location: `src/AxCopilot/Views/ChatWindow.xaml.cs`, `src/AxCopilot/Services/Agent/AxAgentExecutionEngine.cs`, `src/AxCopilot/Services/ChatStorageService.cs`
- Completion criteria:
  - reopen after interruption keeps queue, runtime summary, and latest visible assistant state consistent.
  - retry-last and regenerate do not depend on mutating `InputBox.Text`.
  - all three tabs pass reopen/retry/manual compact/manual stop/manual resume scenarios.
- Quality criteria:
  - stored conversation and rendered conversation stay identical after restore.
  - final reopened state matches the last completed runtime state.

## Execution Tracks
1. Hook contract parity
   - Structured hook output support (`updatedInput`, `updatedPermissions`, `additionalContext`).
   - Runtime gating through settings toggles.
2. Session/state parity
   - Deterministic run resume rules.
   - Stable jsonl event schema + replay compatibility.
3. Recovery parity
   - Failure-type classification and standardized retry guidance.
   - Reduced repeated wrong-tool loops.
4. Completion parity
   - Evidence-based finalization criteria for code/document tasks.

## Done Criteria
- Internal parity scenarios pass target threshold.
- Resume/replay failures: zero.
- `dotnet build` warnings/errors: zero.

## Validation Matrix
- Build: `dotnet build src/AxCopilot/AxCopilot.csproj -c Release -v minimal -p:OutputPath=bin\\verify\\ -p:IntermediateOutputPath=obj\\verify\\`
- Manual scenario 1: Chat send -> answer visible -> retry -> regenerate -> reopen conversation
- Manual scenario 2: Cowork tool run -> progress summary -> completion -> queue next request -> reopen
- Manual scenario 3: Code task with execution log noise -> completion -> compact -> next turn -> reopen
- Manual scenario 4: AX Agent internal settings change -> immediate runtime reflection without layout regression

## Canonical Prompt Set
- Updated: 2026-04-05 22:04 (KST)
- The following prompt set should be used for AX vs `claw-code` parity checks. The goal is not byte-identical output, but equivalent execution route, approval behavior, and artifact/result quality.
- Operational checklist copy: `docs/AX_AGENT_REGRESSION_PROMPTS.md`

1. Chat basic answer
   - Prompt: `회의 일정 조정 메일을 정중한 한국어로 써줘`
   - Apply to: `Chat`
   - Verify: normal reply render, retry/regenerate stability, reopen durability
2. Chat long-form explanation
   - Prompt: `RAG와 fine-tuning 차이를 실무 관점으로 7가지로 설명해줘`
   - Apply to: `Chat`
   - Verify: long response rendering, compaction follow-up continuity
3. Cowork document task
   - Prompt: `신규 ERP 도입 제안서 초안을 작성해줘. 목적, 범위, 기대효과, 추진일정 포함`
   - Apply to: `Cowork`
   - Verify: topic/task preset routing, plan-first execution, actual document-oriented output path
4. Cowork data task
   - Prompt: `매출 CSV를 분석해서 월별 추세와 이상치를 요약해줘`
   - Apply to: `Cowork`
   - Verify: data-analysis tool choice, reduced runtime noise, final summary quality
5. Code bug-fix task
   - Prompt: `현재 프로젝트에서 설정 저장 버그 원인 찾고 수정해줘`
   - Apply to: `Code`
   - Verify: read/search/edit path, diff persistence, reopen consistency
6. Code build/test task
   - Prompt: `빌드 오류를 재현하고 수정한 뒤 다시 빌드해줘`
   - Apply to: `Code`
   - Verify: build/test loop, failure retry, final completion message
7. Queued follow-up
   - Prompt sequence:
     - `이 창 레이아웃 문제 원인 찾아줘`
     - `끝나면 README도 같이 갱신해줘`
   - Apply to: `Cowork`, `Code`
   - Verify: queue chaining, next-turn pickup without UI mutation
8. Post-compaction continuity
   - Prompt: `지금까지 논의한 내용을 5줄로 이어서 정리하고 다음 작업 제안해줘`
   - Apply to: `Chat`, `Cowork`, `Code`
   - Verify: compact-after-next-turn continuity, no token-only completion
9. Permission approval
   - Prompt: `이 파일을 수정해서 저장해줘`
   - Apply to: `Code`
   - Verify: permission request, approve/reject rendering, final transcript consistency
10. Slash / skill entry
    - Prompt: `/bug-hunt src 폴더 잠재 버그 찾아줘`
    - Apply to: `Code`
    - Verify: slash entry uses the same prepared-execution route as normal send

## Tool / Skill Delta Snapshot
- Updated: 2026-04-05 22:04 (KST)
- AX tool registry count is larger than `claw-code`, but the shape is different.
- AX reference: `src/AxCopilot/Services/Agent/ToolRegistry.cs`
- `claw-code` reference: `src/tools/*`, `src/skills/bundledSkills.ts`

### AX stronger areas
- Document/office generation and conversion (`ExcelSkill`, `DocxSkill`, `PptxSkill`, `DocumentPlannerTool`, `DocumentAssemblerTool`)
- Data/business utilities (`DataPivotTool`, `SqlTool`, `FormatConvertTool`, `TextSummarizeTool`)
- WPF-integrated enterprise UX and Korean workflow presets

### claw-code stronger areas
- Transcript-native tool use / rejection / approval message taxonomy
- Plan approval request/response rendering in the message stream
- Permission and tool-result message consistency
- Bundled skill registry and skill message integration

### Remaining parity target
- Keep AX's richer business/document tool set
- Bring transcript rendering and approval/status UX closer to `claw-code`

## Transcript-First Approval / Ask UX
- Updated: 2026-04-05 18:58 (KST)
- `plan approval` and `user ask` should both resolve inside the transcript first.
- Secondary windows are allowed only as detail surfaces, not as the primary decision flow.
- AX implementation status:
  - `plan approval`: transcript-first, detail view via `PlanViewerWindow`
  - `user ask`: transcript-first inline question card with choices / direct input / submit

## Tool / Skill UX Parity Follow-up
- Updated: 2026-04-05 19:04 (KST)
- Default transcript should prefer role-oriented badges and readable labels over raw internal tool names.
- AX implementation status:
  - tool event badges: simplified to role-first labels
  - item naming: normalized into readable Korean labels or `/skill-name` style
  - observability panels: permission/background diagnostics reduced outside debug mode
- Remaining quality target:
  - move more tool-result and permission-result presentation into smaller message-type-specific helpers, closer to `claw-code` component separation

## Current Snapshot
- Updated: 2026-04-05 19:42 (KST)
- Estimated parity:
  - Core engine: `89%`
  - Main transcript UI: `96%`
  - Cowork/Code runtime UX: `92%`
  - Internal settings linkage: `88%`
  - Overall AX Agent parity: `93%`

## Remaining Gaps
1. Prompt lifecycle parity
   - `claw-code` reference: `src/utils/handlePromptSubmit.ts`, `src/utils/processUserInput/processTextPrompt.ts`
   - AX gap:
     - `send / retry / regenerate` are mostly unified, but `slash / next turn after compact / some queue post-processing` still has sections in `ChatWindow.xaml.cs` that touch UI state first.
     - The goal is for every input entry point to go through only the same prepare/execute/finalize axis of `AxAgentExecutionEngine`.
2. Plan / approval rendering parity
   - `claw-code` reference: `src/components/messages/PlanApprovalMessage.tsx`
   - AX gap:
     - The default transcript has been reduced to mostly compact pills, but approval/plan result presentation is still mixed with `Popup/Window + WPF cards`.
     - The goal is to converge on a more unified timeline language based on a "body first, open on demand" principle.
3. Status line / composer parity
   - `claw-code` reference: `src/components/StatusLine.tsx`, `src/components/PromptInput/PromptInput.tsx`
   - AX gap:
     - The bottom status bar and composer options have been trimmed considerably, but status metadata is still scattered and some toggles/quick settings remain as separate rows.
     - The goal is to compress these further into a single action bar below the transcript.
4. Runtime event density parity
   - `claw-code` reference: `src/bridge/sessionRunner.ts`, `src/components/StatusNotices.tsx`
   - AX gap:
     - Default non-debug logs have been reduced, but some Cowork/Code events still disturb the timeline frequently.
     - The goal is to normalize `permission / tool / error / complete / paused / resumed` into a more stable event shape.

## Quality Uplift Plan
- Updated: 2026-04-06 00:22 (KST)
- Goal: move AX Agent from parity-oriented stability into `claw-code`-grade maintainability and transcript quality, without copying implementation expression.

### Track 1. Transcript Renderer Decomposition
- `claw-code` references:
  - `src/components/Messages.tsx`
  - `src/components/MessageRow.tsx`
  - `src/components/messages/AssistantToolUseMessage.tsx`
  - `src/components/messages/PlanApprovalMessage.tsx`
- AX apply targets:
  - `src/AxCopilot/Views/ChatWindow.xaml.cs`
  - new partial/helper files under `src/AxCopilot/Views/`
- Completion criteria:
  - `plan / permission / ask / tool-result / task-summary` rendering no longer lives as one large block inside `ChatWindow.xaml.cs`
  - each transcript concern has a dedicated helper/partial/class boundary
- Quality criteria:
  - render changes for one message type do not regress unrelated timeline behavior
  - transcript behavior remains stable after reopen / retry / regenerate

### Track 2. Permission Presentation Catalog
- `claw-code` references:
  - `src/components/permissions/PermissionRequest.tsx`
  - `src/components/permissions/PermissionDialog.tsx`
  - tool-specific permission request components under `src/components/permissions/*`
- AX apply targets:
  - `src/AxCopilot/Services/Agent/PermissionModeCatalog.cs`
  - new `src/AxCopilot/Services/Agent/PermissionRequestPresentationCatalog.cs`
  - `src/AxCopilot/Views/ChatWindow.xaml.cs`
- Completion criteria:
  - permission request title, subtitle, icon, severity, and choice set are resolved by tool/request type
  - file edit / shell / skill / ask-user / web-like permission requests use distinct presentation metadata
- Quality criteria:
  - permission prompts feel explicit and predictable
  - user can distinguish request type without reading raw tool names or payload

### Track 3. Tool Result Message Taxonomy
- `claw-code` references:
  - `src/components/messages/UserToolResultMessage/UserToolSuccessMessage.tsx`
  - `src/components/messages/UserToolResultMessage/UserToolErrorMessage.tsx`
  - `src/components/messages/UserToolResultMessage/UserToolRejectMessage.tsx`
  - `src/components/messages/UserToolResultMessage/UserToolCanceledMessage.tsx`
- AX apply targets:
  - new `src/AxCopilot/Services/Agent/ToolResultPresentationCatalog.cs`
  - `src/AxCopilot/Views/ChatWindow.TranscriptPolicy.cs`
  - `src/AxCopilot/Views/ChatWindow.xaml.cs`
- Completion criteria:
  - transcript display rules differ for `success / error / reject / cancel`
  - tool-result badges and summaries are resolved from presentation metadata instead of inline ad-hoc branches
- Quality criteria:
  - result cards read as stable UX language, not raw execution logs
  - failed and rejected tool runs are visually distinct without increasing noise

### Track 4. Plan Approval Transcript-Only Flow
- `claw-code` references:
  - `src/components/messages/PlanApprovalMessage.tsx`
  - `src/components/messages/UserPlanMessage.tsx`
- AX apply targets:
  - `src/AxCopilot/Views/ChatWindow.xaml.cs`
  - `src/AxCopilot/Views/PlanViewerWindow.cs`
- Completion criteria:
  - default approval / reject / revise flow completes inline in transcript
  - `PlanViewerWindow` is detail-only and never required for primary approval flow
- Quality criteria:
  - planning feels like part of the conversation, not a modal interruption
  - approval history is replayable from persisted conversation state

### Track 5. Runtime Summary Layer
- `claw-code` references:
  - `src/components/StatusLine.tsx`
  - `src/components/PromptInput/PromptInputFooter.tsx`
  - `src/bootstrap/state.ts`
- AX apply targets:
  - `src/AxCopilot/Services/AppStateService.cs`
  - `src/AxCopilot/Views/ChatWindow.xaml.cs`
- Completion criteria:
  - one runtime/status summary model feeds the status line, queue summary, runtime badge, and completion hint
  - status rendering no longer depends on scattered imperative refresh branches
- Quality criteria:
  - no contradictory or stale runtime badges
  - long-running Cowork/Code sessions stay visually calm

### Track 6. Regression Prompt Ritual
- `claw-code` references:
  - runtime validation scenarios implied by `sessionRunner`, `Messages`, `StatusLine`, and permission components
- AX apply targets:
  - `docs/AX_AGENT_REGRESSION_PROMPTS.md`
  - `docs/claw-code-parity-plan.md`
  - developer workflow / release checklist
- Completion criteria:
  - Chat / Cowork / Code prompt set is treated as mandatory regression for runtime-affecting changes
  - each prompt is mapped to a failure class (`blank reply`, `duplicate banner`, `bad approval flow`, `queue drift`, `restore drift`)
- Quality criteria:
  - parity claims are based on repeatable checks instead of visual spot-checks
  - regressions are easier to catch before release

## Recommended Execution Order
1. Transcript renderer decomposition
2. Permission presentation catalog
3. Tool result taxonomy
4. Plan approval transcript-only flow
5. Runtime summary layer
6. Regression prompt ritual hardening

## Settings and Logic Review
- Updated: 2026-04-06 00:22 (KST)
- Candidate to move to developer-only:
  - `FreeTierDelaySeconds`
  - `MaxAgentIterations`
  - `MaxRetryOnError`
- Keep as runtime-critical user settings:
  - `OperationMode`
  - `MaxContextTokens`
  - `ContextCompactTriggerPercent`
  - `EnableProactiveContextCompact`
  - `EnableCoworkVerification`
  - `EnableCodeVerification`
  - code tool exposure toggles
- Rule:
  - if a setting changes the main execution route or recovery semantics without representing a stable real-world user choice, move it out of default user-facing surfaces
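
The settings rule stated in this plan (a setting that changes the main execution route or recovery semantics without representing a stable real-world user choice should leave the default user-facing surface) can be sketched as a small classification helper. This is an illustration only, written in Python rather than the project's C#. `SettingProfile`, `Surface`, and `classify` are hypothetical names, and the two boolean flags per setting are one reading of the lists in this document, not values taken from the AX codebase. The sketch models only the "developer-only" outcome, not outright removal.

```python
from dataclasses import dataclass
from enum import Enum


class Surface(Enum):
    USER = "user"
    DEVELOPER_ONLY = "developer-only"


@dataclass(frozen=True)
class SettingProfile:
    name: str
    affects_engine: bool      # changes execution route, approval flow, or recovery
    stable_user_choice: bool  # assumption: a reviewer's judgment, not something computed


def classify(profile: SettingProfile) -> Surface:
    """Engine-affecting settings that are not a stable user choice
    are moved off the default user-facing surface."""
    if profile.affects_engine and not profile.stable_user_choice:
        return Surface.DEVELOPER_ONLY
    return Surface.USER


# Illustrative flags only; they mirror the candidate lists in this plan.
PROFILES = [
    SettingProfile("FreeTierDelaySeconds", True, False),
    SettingProfile("MaxAgentIterations", True, False),
    SettingProfile("MaxRetryOnError", True, False),
    SettingProfile("OperationMode", True, True),
    SettingProfile("MaxContextTokens", True, True),
]

for p in PROFILES:
    print(f"{p.name}: {classify(p).value}")
```

Keeping the decision in one function means a review of a new setting only has to answer the two questions encoded in the flags.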
## Settings Review
- Remove candidate:
  - `PlanMode`
    - current state: user-facing UI and the save path are now fixed to `off`, but type remnants remain in `AppSettings`, `SettingsViewModel`, and `AppStateService`
    - rationale: the current policy is fixed to `off`, so the user-selected value does not contribute meaningfully to the engine
  - `Code.EnablePlanModeTools`
    - current state: the UI/save path and the default are fixed to `false`, but compatibility remnants remain in the model/settings types
    - rationale: it no longer changes the actual execution route under the current engine policy
- Move to developer-only candidate:
  - `FreeTierDelaySeconds`
    - rationale: general users have little reason to adjust it, and it directly affects the engine's delay policy
  - `MaxAgentIterations`
  - `MaxRetryOnError`
    - rationale: runtime tuning values that directly affect core execution loop quality
- Keep as runtime-critical:
  - `OperationMode`
  - `MaxContextTokens`
  - `ContextCompactTriggerPercent`
  - `EnableProactiveContextCompact`
  - `EnableCoworkVerification`
  - `EnableCodeVerification`
  - `Code.EnableWorktreeTools / EnableTeamTools / EnableCronTools`

## Known UX / Performance Risks
- Topic preset hover flicker was caused by duplicate hover systems:
  - custom hover label
  - default WPF `ToolTip`
- AX fix:
  - remove default `ToolTip` from topic cards and keep a single hover label path
- Remaining runtime performance review targets:
  - `RefreshContextUsageVisual()` frequency
  - `BuildTopicButtons()` rebuild frequency
  - `OnAgentEvent` timeline churn during long Cowork/Code runs
- compact queue summary still needs one more pass to fully match `claw-code` footer minimalism

## Progress Notes
- Updated: 2026-04-06 00:58 (KST)
- First pass of transcript renderer separation complete
  - AX apply: [ChatWindow.InlineInteractions.cs](/E:/AX%20Copilot%20-%20Codex/src/AxCopilot/Views/ChatWindow.InlineInteractions.cs), [ChatWindow.TaskSummary.cs](/E:/AX%20Copilot%20-%20Codex/src/AxCopilot/Views/ChatWindow.TaskSummary.cs)
  - Completion criteria: `plan / ask / task-summary` render helpers moved out of the main `ChatWindow.xaml.cs`
- Permission / tool-result presentation catalog introduced
  - AX apply: [PermissionRequestPresentationCatalog.cs](/E:/AX%20Copilot%20-%20Codex/src/AxCopilot/Services/Agent/PermissionRequestPresentationCatalog.cs), [ToolResultPresentationCatalog.cs](/E:/AX%20Copilot%20-%20Codex/src/AxCopilot/Services/Agent/ToolResultPresentationCatalog.cs)
  - Completion criteria: `AddAgentEventBanner(...)` resolves permission/tool-result badge metadata from the catalog rather than an inline switch
- First pass of the dedicated runtime summary layer applied
  - AX apply: [AppStateService.cs](/E:/AX%20Copilot%20-%20Codex/src/AxCopilot/Services/AppStateService.cs)
  - Completion criteria: the status line UI consumes `OperationalStatusPresentationState` to compute strip/runtime badge visibility