AX-Copilot-Codex/docs/claw-code-parity-plan.md
lacvet 216b050398
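Flesh out the AX Agent quality improvement plan relative to claw-code
- Re-compared the claw-code source structure with the AX Agent structure and drew up an additional quality improvement plan
- Documented plans for transcript renderer separation, a permission presentation catalog, a tool result taxonomy, finishing inline plan approval, layering the runtime summary, and locking in the regression prompt ritual
- Separated runtime-critical settings from settings that are candidates for moving to developer-only
- Recorded the history as of 2026-04-06 00:22 (KST) in the README and DEVELOPMENT documents
- Verified dotnet build src/AxCopilot/AxCopilot.csproj -c Release -v minimal -p:OutputPath=bin\verify\ -p:IntermediateOutputPath=obj\verify\ with 0 warnings and 0 errors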
2026-04-05 21:26:25 +09:00

Claw Code Parity Plan (Rewritten)

Scope

  • Align AX Copilot with claw-code quality for loop reliability, permission/hook behavior, and session durability.

Update

  • Updated: 2026-04-05 15:34 (KST)
  • Rebased the AX Agent improvement plan on actual claw-code runtime files instead of earlier AX snapshots. The reference spine is now src/bootstrap/state.ts -> src/bridge/initReplBridge.ts -> src/bridge/sessionRunner.ts -> src/screens/REPL.tsx -> src/components/Messages.tsx -> src/components/StatusLine.tsx.
  • AX Agent work should follow that same quality order: state first, execution second, render last. UI-only fixes that bypass state/execution should be treated as temporary.
  • Updated: 2026-04-05 16:55 (KST)
  • Current estimated parity vs claw-code: core execution engine 82%, main chat UI 68%, Cowork/Code status UX 63%, internal settings linkage 88%, overall AX Agent 74%.
  • Engine-affecting settings should be handled conservatively during parity work. If a setting changes the main execution route, approval flow, or recovery behavior without representing a stable real-world user choice, it should be moved to developer-only UI or removed from user-facing surfaces.

Preserved History (Summary)

  • Core loop guards and post-tool verification gates are already partially implemented.
  • Plan Mode, parallel tool execution, and unknown-tool recovery are in place.
  • Session restore hardening is ongoing.

Reference Map

| claw-code reference | AX apply target | completion criteria | quality criteria |
|---|---|---|---|
| src/bootstrap/state.ts | src/AxCopilot/Views/ChatWindow.xaml.cs, src/AxCopilot/Services/Agent/AxAgentExecutionEngine.cs, src/AxCopilot/Services/ChatStorageService.cs | one canonical runtime/session state for current turn, queue, retry, execution events, and persisted snapshot | reopen/retry/queue flows do not create duplicate or blank assistant messages |
| src/bridge/initReplBridge.ts | src/AxCopilot/Services/Agent/AxAgentExecutionEngine.cs, src/AxCopilot/Services/LlmService.cs | send/regenerate/retry/queued follow-up/slash all enter through one prepared-execution path | same input under same settings takes same execution route regardless of entry point |
| src/bridge/sessionRunner.ts | src/AxCopilot/Services/Agent/AgentLoopService.cs, src/AxCopilot/Services/Agent/AgentLoopTransitions.cs, src/AxCopilot/Services/Agent/AgentLoopTransitions.Execution.cs | tool start/result/error/progress normalized once inside loop layer | Cowork/Code no longer flash repeated status strings or overshare debug payloads |
| src/bridge/bridgeMessaging.ts | src/AxCopilot/Views/ChatWindow.xaml.cs, src/AxCopilot/Services/Agent/AgentLoopService.cs | inbound execution events separated from display-only events before UI render | execution event replay does not duplicate visible timeline banners |
| src/screens/REPL.tsx | src/AxCopilot/Views/ChatWindow.xaml, src/AxCopilot/Views/ChatWindow.xaml.cs | screen state transitions, queue flow, retry flow, and composer state use shared runtime helpers | window resize, queue chaining, and retry feel stable instead of UI-patched |
| src/components/Messages.tsx | src/AxCopilot/Views/ChatWindow.xaml.cs | timeline derives from normalized conversation/session state only | no token-only completions, blank cards, or direct injected duplicates |
| src/components/StatusLine.tsx | src/AxCopilot/Views/ChatWindow.xaml, src/AxCopilot/Views/ChatWindow.xaml.cs | status strip computed from debounced runtime state, not multiple imperative refresh calls | metadata stays lightweight and does not overpower message timeline |

AX Agent Improvement Phases

Phase A. Runtime State Canonicalization

  • Reference: src/bootstrap/state.ts
  • AX apply location: src/AxCopilot/Views/ChatWindow.xaml.cs, src/AxCopilot/Services/Agent/AxAgentExecutionEngine.cs, src/AxCopilot/Services/ChatStorageService.cs
  • Completion criteria:
    • Chat, Cowork, Code all update one shared runtime/session state model.
    • queue, retry, post-compaction, and execution-event state can be restored after reopen.
  • Quality criteria:
    • reopening a conversation reproduces the same visible timeline without extra assistant cards.
    • queue and execution badges remain in sync with the stored conversation.
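
The Phase A criteria can be illustrated with a small reducer-style sketch. It is written in TypeScript to mirror the claw-code reference side; every name here (RuntimeState, StateEvent, applyEvent, save, reopen) is hypothetical and does not come from AX or claw-code. The point is only that all tabs change state through one function and that a persisted snapshot restores queue and retry state.

```typescript
// Hypothetical shapes for illustration; not the actual AX or claw-code types.
interface RuntimeState {
  messages: { id: string; role: "user" | "assistant"; text: string }[];
  queue: string[];      // follow-up prompts waiting for the current turn to finish
  retryCount: number;
  compacted: boolean;   // true once a compaction boundary has been recorded
}

type StateEvent =
  | { kind: "message"; id: string; role: "user" | "assistant"; text: string }
  | { kind: "enqueue"; prompt: string }
  | { kind: "dequeue" }
  | { kind: "retry" }
  | { kind: "compact" };

const emptyState: RuntimeState = { messages: [], queue: [], retryCount: 0, compacted: false };

// The only place state changes: Chat, Cowork, and Code would all dispatch here.
function applyEvent(state: RuntimeState, ev: StateEvent): RuntimeState {
  switch (ev.kind) {
    case "message":
      return {
        ...state,
        messages: [...state.messages, { id: ev.id, role: ev.role, text: ev.text }],
      };
    case "enqueue":
      return { ...state, queue: [...state.queue, ev.prompt] };
    case "dequeue":
      return { ...state, queue: state.queue.slice(1) };
    case "retry":
      return { ...state, retryCount: state.retryCount + 1 };
    case "compact":
      return { ...state, compacted: true };
  }
}

// Persist and reopen through a JSON snapshot so queue/retry state survives.
function save(state: RuntimeState): string {
  return JSON.stringify(state);
}

function reopen(json: string): RuntimeState {
  return JSON.parse(json) as RuntimeState;
}
```

Because the reducer is pure, a reopened conversation can be compared field by field against the state that was saved.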

Phase B. Prepared Execution Unification

  • Reference: src/bridge/initReplBridge.ts
  • AX apply location: src/AxCopilot/Services/Agent/AxAgentExecutionEngine.cs, src/AxCopilot/Services/LlmService.cs
  • Completion criteria:
    • prompt stack assembly, execution mode choice, and final assistant commit are engine-owned.
    • send/regenerate/retry/queued follow-up/slash flows all call the same preparation API.
  • Quality criteria:
    • behavior is deterministic per tab/settings combination.
    • UI stops building different prompt stacks for the same conversation state.
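
A minimal sketch of the "same preparation API" idea, in TypeScript. The names (EntryPoint, PreparedExecution, prepareExecution) and the prompt-stack format are assumptions made for illustration, not the AxAgentExecutionEngine API. What it shows is that the entry point is an input for logging only and never changes the prepared result.

```typescript
// Hypothetical names throughout; a sketch of one preparation path, not AX's engine.
type EntryPoint = "send" | "regenerate" | "retry" | "queued" | "slash";

interface Settings {
  tab: "Chat" | "Cowork" | "Code";
  operationMode: string;
}

interface Turn {
  role: "user" | "assistant";
  text: string;
}

interface PreparedExecution {
  mode: "chat" | "agent";
  promptStack: string[];
}

// The entry point is accepted for diagnostics only; it must not affect the result.
function prepareExecution(
  _entry: EntryPoint,
  history: Turn[],
  input: string,
  settings: Settings,
): PreparedExecution {
  const mode = settings.tab === "Chat" ? "chat" : "agent";
  const promptStack = [
    `system:${settings.tab}:${settings.operationMode}`,
    ...history.map((t) => `${t.role}:${t.text}`),
    `user:${input}`,
  ];
  return { mode, promptStack };
}
```

With this shape, determinism per tab/settings combination becomes a property that can be checked by calling the function once per entry point and comparing results.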

Phase C. AgentLoop Event Normalization

  • Reference: src/bridge/sessionRunner.ts, src/bridge/bridgeMessaging.ts
  • AX apply location: src/AxCopilot/Services/Agent/AgentLoopService.cs, src/AxCopilot/Services/Agent/AgentLoopTransitions.cs, src/AxCopilot/Services/Agent/AgentLoopTransitions.Execution.cs
  • Completion criteria:
    • loop events are normalized into bounded activity/event records before UI consumption.
    • permission requests, failure states, retries, and completion states use a stable event shape.
  • Quality criteria:
    • Cowork/Code no longer flash rapidly during long-running tool sequences.
    • file path/debug detail remains collapsed by default.
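
The normalization step can be sketched as follows. This is a TypeScript illustration under assumed shapes: RawLoopEvent, ActivityRecord, the raw type strings, and the 80-character label bound are all hypothetical. Raw loop events are mapped to a bounded record with a small set of kinds, file paths are kept in a separate detail field so the UI can collapse them, and consecutive identical records are merged so repeated progress does not flash.

```typescript
// Hypothetical raw and normalized event shapes, for illustration only.
interface RawLoopEvent {
  type: string;       // assumed values such as "tool_start", "tool_error", "complete"
  tool?: string;
  message?: string;
  filePath?: string;
}

type ActivityKind = "permission" | "tool" | "error" | "complete" | "paused" | "resumed";

interface ActivityRecord {
  kind: ActivityKind;
  label: string;      // short user-facing text, bounded in length
  detail?: string;    // debug detail, meant to stay collapsed by default
}

const MAX_LABEL = 80; // assumed bound

function normalize(raw: RawLoopEvent): ActivityRecord {
  const kind: ActivityKind =
    raw.type === "permission_request" ? "permission"
    : raw.type === "tool_error" ? "error"
    : raw.type === "complete" ? "complete"
    : raw.type === "paused" ? "paused"
    : raw.type === "resumed" ? "resumed"
    : "tool";
  const text = raw.message ?? raw.tool ?? raw.type;
  const label = text.length > MAX_LABEL ? text.slice(0, MAX_LABEL - 1) + "…" : text;
  return raw.filePath ? { kind, label, detail: raw.filePath } : { kind, label };
}

// Merge consecutive records with the same kind and label.
function coalesce(records: ActivityRecord[]): ActivityRecord[] {
  return records.filter((r, i) => {
    const prev = records[i - 1];
    return !prev || prev.kind !== r.kind || prev.label !== r.label;
  });
}
```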

Phase D. Timeline Render Parity

  • Reference: src/screens/REPL.tsx, src/components/Messages.tsx
  • AX apply location: src/AxCopilot/Views/ChatWindow.xaml, src/AxCopilot/Views/ChatWindow.xaml.cs
  • Completion criteria:
    • assistant/user messages, execution logs, compact boundaries, and queue summaries are rendered from one derived timeline model.
    • direct imperative bubble injection is removed from normal send/regenerate/retry flows.
  • Quality criteria:
    • no blank assistant cards.
    • no token-only completion without visible content.
    • no duplicate event banners after re-render.
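
A derived timeline can be sketched as a pure function over stored state. The TypeScript below is illustrative only: StoredConversation, TimelineItem, and deriveTimeline are hypothetical names, and placing banners after messages is a simplification (a real timeline would presumably interleave by sequence). It shows how blank assistant cards and duplicate banners can be excluded at derivation time rather than patched in the UI.

```typescript
// Hypothetical stored and derived shapes; not AX's actual model.
interface StoredConversation {
  messages: { id: string; role: "user" | "assistant"; text: string }[];
  events: { id: string; text: string }[];
}

type TimelineItem =
  | { kind: "message"; id: string; role: "user" | "assistant"; text: string }
  | { kind: "banner"; id: string; text: string };

// Pure derivation: calling it again on the same input yields the same timeline.
function deriveTimeline(conv: StoredConversation): TimelineItem[] {
  const seen = new Set<string>();
  const items: TimelineItem[] = [];
  for (const m of conv.messages) {
    if (m.role === "assistant" && m.text.trim() === "") continue; // no blank cards
    if (seen.has(m.id)) continue;                                  // no duplicates
    seen.add(m.id);
    items.push({ kind: "message", id: m.id, role: m.role, text: m.text });
  }
  for (const e of conv.events) {
    if (seen.has(e.id)) continue; // replayed events do not add a second banner
    seen.add(e.id);
    items.push({ kind: "banner", id: e.id, text: e.text });
  }
  return items;
}
```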

Phase E. Composer and Status Strip Simplification

  • Reference: src/screens/REPL.tsx, src/components/StatusLine.tsx
  • AX apply location: src/AxCopilot/Views/ChatWindow.xaml, src/AxCopilot/Views/ChatWindow.xaml.cs
  • Completion criteria:
    • composer height grows only on explicit line breaks.
    • status strip, queue summary, and runtime activity all use debounced runtime updates.
    • Chat/Cowork/Code share one responsive width calculation policy.
  • Quality criteria:
    • resizing feels natural.
    • composer does not keep growing after send.
    • metadata remains subordinate to the message timeline.
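
One way to picture "debounced runtime updates" is a trailing-edge debouncer with an injected clock, sketched here in TypeScript. DebouncedStatus and its update/tick methods are hypothetical; in a real UI the tick would be driven by a timer. The sketch only shows that a burst of status changes produces a single emitted value once updates go quiet.

```typescript
// Trailing-edge debounce with an injected clock so it can be exercised without timers.
// Class and method names are assumptions for illustration.
class DebouncedStatus<T> {
  private waitMs: number;
  private emit: (value: T) => void;
  private pending: { value: T } | null = null;
  private lastUpdateMs = 0;

  constructor(waitMs: number, emit: (value: T) => void) {
    this.waitMs = waitMs;
    this.emit = emit;
  }

  // Record the latest value; earlier unpublished values are dropped.
  update(value: T, nowMs: number): void {
    this.pending = { value };
    this.lastUpdateMs = nowMs;
  }

  // Called periodically; publishes only after waitMs without new updates.
  tick(nowMs: number): void {
    if (this.pending !== null && nowMs - this.lastUpdateMs >= this.waitMs) {
      const { value } = this.pending;
      this.pending = null;
      this.emit(value);
    }
  }
}
```

Feeding the status strip, queue summary, and runtime activity from one such publisher would keep them from refreshing at different moments.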

Phase F. Recovery, Resume, and Verification

  • Reference: src/bootstrap/state.ts, src/bridge/sessionRunner.ts, src/screens/REPL.tsx
  • AX apply location: src/AxCopilot/Views/ChatWindow.xaml.cs, src/AxCopilot/Services/Agent/AxAgentExecutionEngine.cs, src/AxCopilot/Services/ChatStorageService.cs
  • Completion criteria:
    • reopen after interruption keeps queue, runtime summary, and latest visible assistant state consistent.
    • retry-last and regenerate do not depend on mutating InputBox.Text.
    • all three tabs pass reopen/retry/manual compact/manual stop/manual resume scenarios.
  • Quality criteria:
    • stored conversation and rendered conversation stay identical after restore.
    • final reopened state matches the last completed runtime state.

Execution Tracks

  1. Hook contract parity
  • Structured hook output support (updatedInput, updatedPermissions, additionalContext).
  • Runtime gating through settings toggles.
  2. Session/state parity
  • Deterministic run resume rules.
  • Stable jsonl event schema + replay compatibility.
  3. Recovery parity
  • Failure-type classification and standardized retry guidance.
  • Reduced repeated wrong-tool loops.
  4. Completion parity
  • Evidence-based finalization criteria for code/document tasks.
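
For the hook contract track, the three structured output fields named above (updatedInput, updatedPermissions, additionalContext) can be folded into a tool call as in this TypeScript sketch. The field names come from this plan; their types, the ToolCall shape, and the applyHooks function are assumptions for illustration.

```typescript
// Field names follow the plan; the value types are assumed.
type PermissionDecision = "allow" | "deny" | "ask";

interface HookOutput {
  updatedInput?: Record<string, unknown>;
  updatedPermissions?: Record<string, PermissionDecision>;
  additionalContext?: string;
}

// Hypothetical tool call shape.
interface ToolCall {
  input: Record<string, unknown>;
  permissions: Record<string, PermissionDecision>;
  context: string[];
}

// Apply hook outputs in order; later hooks override earlier ones key by key.
function applyHooks(call: ToolCall, outputs: HookOutput[], hooksEnabled: boolean): ToolCall {
  if (!hooksEnabled) return call; // runtime gating through a settings toggle
  return outputs.reduce<ToolCall>(
    (acc, out) => ({
      input: { ...acc.input, ...(out.updatedInput ?? {}) },
      permissions: { ...acc.permissions, ...(out.updatedPermissions ?? {}) },
      context: out.additionalContext ? [...acc.context, out.additionalContext] : acc.context,
    }),
    call,
  );
}
```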

Done Criteria

  • Internal parity scenarios pass target threshold.
  • Resume/replay failures: zero.
  • dotnet build warnings/errors: zero.

Validation Matrix

  • Build: dotnet build src/AxCopilot/AxCopilot.csproj -c Release -v minimal -p:OutputPath=bin\\verify\\ -p:IntermediateOutputPath=obj\\verify\\
  • Manual scenario 1: Chat send -> answer visible -> retry -> regenerate -> reopen conversation
  • Manual scenario 2: Cowork tool run -> progress summary -> completion -> queue next request -> reopen
  • Manual scenario 3: Code task with execution log noise -> completion -> compact -> next turn -> reopen
  • Manual scenario 4: AX Agent internal settings change -> immediate runtime reflection without layout regression

Canonical Prompt Set

  • Updated: 2026-04-05 22:04 (KST)
  • The following prompt set should be used for AX vs claw-code parity checks. The goal is not byte-identical output, but equivalent execution route, approval behavior, and artifact/result quality.
  • Operational checklist copy: docs/AX_AGENT_REGRESSION_PROMPTS.md
  1. Chat basic answer
  • Prompt: 회의 일정 조정 메일을 정중한 한국어로 써줘 (Write an email in polite Korean to reschedule a meeting)
  • Apply to: Chat
  • Verify: normal reply render, retry/regenerate stability, reopen durability
  2. Chat long-form explanation
  • Prompt: RAG와 fine-tuning 차이를 실무 관점으로 7가지로 설명해줘 (Explain seven differences between RAG and fine-tuning from a practical standpoint)
  • Apply to: Chat
  • Verify: long response rendering, compaction follow-up continuity
  3. Cowork document task
  • Prompt: 신규 ERP 도입 제안서 초안을 작성해줘. 목적, 범위, 기대효과, 추진일정 포함 (Draft a proposal for adopting a new ERP, including purpose, scope, expected benefits, and rollout schedule)
  • Apply to: Cowork
  • Verify: topic/task preset routing, plan-first execution, actual document-oriented output path
  4. Cowork data task
  • Prompt: 매출 CSV를 분석해서 월별 추세와 이상치를 요약해줘 (Analyze the sales CSV and summarize monthly trends and outliers)
  • Apply to: Cowork
  • Verify: data-analysis tool choice, reduced runtime noise, final summary quality
  5. Code bug-fix task
  • Prompt: 현재 프로젝트에서 설정 저장 버그 원인 찾고 수정해줘 (Find the cause of the settings-save bug in the current project and fix it)
  • Apply to: Code
  • Verify: read/search/edit path, diff persistence, reopen consistency
  6. Code build/test task
  • Prompt: 빌드 오류를 재현하고 수정한 뒤 다시 빌드해줘 (Reproduce the build error, fix it, then build again)
  • Apply to: Code
  • Verify: build/test loop, failure retry, final completion message
  7. Queued follow-up
  • Prompt sequence:
    • 이 창 레이아웃 문제 원인 찾아줘 (Find the cause of this window's layout problem)
    • 끝나면 README도 같이 갱신해줘 (When you are done, update the README as well)
  • Apply to: Cowork, Code
  • Verify: queue chaining, next-turn pickup without UI mutation
  8. Post-compaction continuity
  • Prompt: 지금까지 논의한 내용을 5줄로 이어서 정리하고 다음 작업 제안해줘 (Summarize what we have discussed so far in five lines and suggest the next task)
  • Apply to: Chat, Cowork, Code
  • Verify: compact-after-next-turn continuity, no token-only completion
  9. Permission approval
  • Prompt: 이 파일을 수정해서 저장해줘 (Edit this file and save it)
  • Apply to: Code
  • Verify: permission request, approve/reject rendering, final transcript consistency
  10. Slash / skill entry
  • Prompt: /bug-hunt src 폴더 잠재 버그 찾아줘 (/bug-hunt find potential bugs in the src folder)
  • Apply to: Code
  • Verify: slash entry uses the same prepared-execution route as normal send

Tool / Skill Delta Snapshot

  • Updated: 2026-04-05 22:04 (KST)
  • AX tool registry count is larger than claw-code, but the shape is different.
  • AX reference: src/AxCopilot/Services/Agent/ToolRegistry.cs
  • claw-code reference: src/tools/*, src/skills/bundledSkills.ts

AX stronger areas

  • Document/office generation and conversion (ExcelSkill, DocxSkill, PptxSkill, DocumentPlannerTool, DocumentAssemblerTool)
  • Data/business utilities (DataPivotTool, SqlTool, FormatConvertTool, TextSummarizeTool)
  • WPF-integrated enterprise UX and Korean workflow presets

claw-code stronger areas

  • Transcript-native tool use / rejection / approval message taxonomy
  • Plan approval request/response rendering in the message stream
  • Permission and tool-result message consistency
  • Bundled skill registry and skill message integration

Remaining parity target

  • Keep AX's richer business/document tool set
  • Bring transcript rendering and approval/status UX closer to claw-code

Transcript-First Approval / Ask UX

  • Updated: 2026-04-05 18:58 (KST)
  • Plan approval and user-ask prompts should both resolve inside the transcript first.
  • Secondary windows are allowed only as detail surfaces, not as the primary decision flow.
  • AX implementation status:
    • plan approval: transcript-first, detail view via PlanViewerWindow
    • user ask: transcript-first inline question card with choices / direct input / submit

Tool / Skill UX Parity Follow-up

  • Updated: 2026-04-05 19:04 (KST)
  • Default transcript should prefer role-oriented badges and readable labels over raw internal tool names.
  • AX implementation status:
    • tool event badges: simplified to role-first labels
    • item naming: normalized into readable Korean labels or /skill-name style
    • observability panels: permission/background diagnostics reduced outside debug mode
  • Remaining quality target:
    • move more tool-result and permission-result presentation into smaller message-type-specific helpers, closer to claw-code component separation

Current Snapshot

  • Updated: 2026-04-05 19:42 (KST)
  • Estimated parity:
    • Core engine: 89%
    • Main transcript UI: 96%
    • Cowork/Code runtime UX: 92%
    • Internal settings linkage: 88%
    • Overall AX Agent parity: 93%

Remaining Gaps

  1. Prompt lifecycle parity
  • claw-code reference: src/utils/handlePromptSubmit.ts, src/utils/processUserInput/processTextPrompt.ts
  • AX gap:
    • send / retry / regenerate are mostly unified, but slash, the next turn after compact, and some queue post-processing still have stretches in ChatWindow.xaml.cs that touch UI state first.
    • The goal is for every input entry point to go through only the same prepare/execute/finalize axis of AxAgentExecutionEngine.
  2. Plan / approval rendering parity
  • claw-code reference: src/components/messages/PlanApprovalMessage.tsx
  • AX gap:
    • The default transcript has been reduced to mostly compact pills, but approval/plan result presentation is still mixed with Popup/Window + WPF cards.

Quality Uplift Plan

  • Updated: 2026-04-06 00:22 (KST)
  • Goal: move AX Agent from parity-oriented stability into claw-code-grade maintainability and transcript quality, without copying implementation expression.

Track 1. Transcript Renderer Decomposition

  • claw-code references:
    • src/components/Messages.tsx
    • src/components/MessageRow.tsx
    • src/components/messages/AssistantToolUseMessage.tsx
    • src/components/messages/PlanApprovalMessage.tsx
  • AX apply targets:
    • src/AxCopilot/Views/ChatWindow.xaml.cs
    • new partial/helper files under src/AxCopilot/Views/
  • Completion criteria:
    • plan / permission / ask / tool-result / task-summary rendering no longer lives as one large block inside ChatWindow.xaml.cs
    • each transcript concern has a dedicated helper/partial/class boundary
  • Quality criteria:
    • render changes for one message type do not regress unrelated timeline behavior
    • transcript behavior remains stable after reopen / retry / regenerate

Track 2. Permission Presentation Catalog

  • claw-code references:
    • src/components/permissions/PermissionRequest.tsx
    • src/components/permissions/PermissionDialog.tsx
    • tool-specific permission request components under src/components/permissions/*
  • AX apply targets:
    • src/AxCopilot/Services/Agent/PermissionModeCatalog.cs
    • new src/AxCopilot/Services/Agent/PermissionRequestPresentationCatalog.cs
    • src/AxCopilot/Views/ChatWindow.xaml.cs
  • Completion criteria:
    • permission request title, subtitle, icon, severity, and choice set are resolved by tool/request type
    • file edit / shell / skill / ask-user / web-like permission requests use distinct presentation metadata
  • Quality criteria:
    • permission prompts feel explicit and predictable
    • user can distinguish request type without reading raw tool names or payload
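
A permission presentation catalog can be as small as a keyed lookup with a safe fallback. The TypeScript sketch below uses hypothetical request-type keys and placeholder titles, icons, and choices; only the list of metadata fields (title, subtitle, icon, severity, choice set) is taken from the completion criteria above.

```typescript
// Metadata fields follow the completion criteria; all values are placeholders.
interface PermissionPresentation {
  title: string;
  subtitle: string;
  icon: string;
  severity: "low" | "medium" | "high";
  choices: string[];
}

// Hypothetical request-type keys.
const permissionCatalog = new Map<string, PermissionPresentation>([
  ["file_edit", { title: "Edit file", subtitle: "Modify and save a file", icon: "pencil", severity: "medium", choices: ["Allow once", "Always allow", "Reject"] }],
  ["shell", { title: "Run command", subtitle: "Execute a shell command", icon: "terminal", severity: "high", choices: ["Allow once", "Reject"] }],
  ["skill", { title: "Run skill", subtitle: "Invoke a bundled skill", icon: "spark", severity: "low", choices: ["Allow once", "Always allow", "Reject"] }],
  ["ask_user", { title: "Question", subtitle: "The agent needs your input", icon: "question", severity: "low", choices: ["Answer", "Skip"] }],
  ["web", { title: "Web access", subtitle: "Fetch content from the network", icon: "globe", severity: "medium", choices: ["Allow once", "Reject"] }],
]);

// Unknown request types get the most cautious presentation instead of raw tool names.
const fallbackPresentation: PermissionPresentation = {
  title: "Permission required",
  subtitle: "The agent requests an action",
  icon: "shield",
  severity: "high",
  choices: ["Allow once", "Reject"],
};

function resolvePermissionPresentation(requestType: string): PermissionPresentation {
  return permissionCatalog.get(requestType) ?? fallbackPresentation;
}
```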

Track 3. Tool Result Message Taxonomy

  • claw-code references:
    • src/components/messages/UserToolResultMessage/UserToolSuccessMessage.tsx
    • src/components/messages/UserToolResultMessage/UserToolErrorMessage.tsx
    • src/components/messages/UserToolResultMessage/UserToolRejectMessage.tsx
    • src/components/messages/UserToolResultMessage/UserToolCanceledMessage.tsx
  • AX apply targets:
    • new src/AxCopilot/Services/Agent/ToolResultPresentationCatalog.cs
    • src/AxCopilot/Views/ChatWindow.TranscriptPolicy.cs
    • src/AxCopilot/Views/ChatWindow.xaml.cs
  • Completion criteria:
    • transcript display rules differ for success / error / reject / cancel
    • tool-result badges and summaries are resolved from presentation metadata instead of inline ad-hoc branches
  • Quality criteria:
    • result cards read as stable UX language, not raw execution logs
    • failed and rejected tool runs are visually distinct without increasing noise
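
The success / error / reject / cancel split can be sketched as a classifier followed by a metadata lookup. This TypeScript is illustrative: RawToolResult and its flags, the precedence order, and the badge strings are assumptions, not the planned ToolResultPresentationCatalog.

```typescript
type ToolResultKind = "success" | "error" | "reject" | "cancel";

// Hypothetical raw result flags.
interface RawToolResult {
  ok: boolean;
  rejectedByUser?: boolean;
  canceled?: boolean;
  error?: string;
}

// Assumed precedence: an explicit cancel or reject wins over a generic failure.
function classifyToolResult(r: RawToolResult): ToolResultKind {
  if (r.canceled) return "cancel";
  if (r.rejectedByUser) return "reject";
  if (!r.ok || r.error) return "error";
  return "success";
}

interface ToolResultPresentation {
  badge: string;
  showDetailByDefault: boolean;
}

// One entry per kind, so display rules live in data instead of inline branches.
const toolResultPresentation: Record<ToolResultKind, ToolResultPresentation> = {
  success: { badge: "Done", showDetailByDefault: false },
  error: { badge: "Failed", showDetailByDefault: true },
  reject: { badge: "Rejected", showDetailByDefault: false },
  cancel: { badge: "Canceled", showDetailByDefault: false },
};

function presentToolResult(r: RawToolResult): ToolResultPresentation {
  return toolResultPresentation[classifyToolResult(r)];
}
```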

Track 4. Plan Approval Transcript-Only Flow

  • claw-code references:
    • src/components/messages/PlanApprovalMessage.tsx
    • src/components/messages/UserPlanMessage.tsx
  • AX apply targets:
    • src/AxCopilot/Views/ChatWindow.xaml.cs
    • src/AxCopilot/Views/PlanViewerWindow.cs
  • Completion criteria:
    • default approval / reject / revise flow completes inline in transcript
    • PlanViewerWindow is detail-only and never required for primary approval flow
  • Quality criteria:
    • planning feels like part of the conversation, not a modal interruption
    • approval history is replayable from persisted conversation state

Track 5. Runtime Summary Layer

  • claw-code references:
    • src/components/StatusLine.tsx
    • src/components/PromptInput/PromptInputFooter.tsx
    • src/bootstrap/state.ts
  • AX apply targets:
    • src/AxCopilot/Services/AppStateService.cs
    • src/AxCopilot/Views/ChatWindow.xaml.cs
  • Completion criteria:
    • one runtime/status summary model feeds the status line, queue summary, runtime badge, and completion hint
    • status rendering no longer depends on scattered imperative refresh branches
  • Quality criteria:
    • no contradictory or stale runtime badges
    • long-running Cowork/Code sessions stay visually calm
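
A single summary model feeding every status surface might look like the TypeScript sketch below. RuntimeSnapshot, RuntimeSummary, summarize, and the display strings are hypothetical. Because all four outputs are computed in one pure function from one snapshot, they cannot contradict each other.

```typescript
// Hypothetical snapshot and summary shapes; names are assumptions for illustration.
interface RuntimeSnapshot {
  running: boolean;
  activeTool: string | null;
  queued: number;
  lastError: string | null;
}

interface RuntimeSummary {
  badge: "idle" | "running" | "error";
  statusLine: string;
  queueSummary: string;
  completionHint: string;
}

// Everything the status line, queue summary, runtime badge, and completion hint
// display is derived here, in one place.
function summarize(s: RuntimeSnapshot): RuntimeSummary {
  const badge = s.lastError ? "error" : s.running ? "running" : "idle";
  return {
    badge,
    statusLine:
      badge === "error" ? `Failed: ${s.lastError}`
      : badge === "running" ? `Running${s.activeTool ? ` (${s.activeTool})` : ""}`
      : "Ready",
    queueSummary: s.queued > 0 ? `${s.queued} queued` : "",
    completionHint: badge === "idle" && s.queued === 0 ? "Done" : "",
  };
}
```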

Track 6. Regression Prompt Ritual

  • claw-code references:
    • runtime validation scenarios implied by sessionRunner, Messages, StatusLine, and permission components
  • AX apply targets:
    • docs/AX_AGENT_REGRESSION_PROMPTS.md
    • docs/claw-code-parity-plan.md
    • developer workflow / release checklist
  • Completion criteria:
    • Chat / Cowork / Code prompt set is treated as mandatory regression for runtime-affecting changes
    • each prompt is mapped to a failure class (blank reply, duplicate banner, bad approval flow, queue drift, restore drift)
  • Quality criteria:
    • parity claims are based on repeatable checks instead of visual spot-checks
    • regressions are easier to catch before release
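
Mapping prompts to failure classes makes coverage checkable. In the TypeScript sketch below, the five failure classes are the ones listed above, while the prompt-to-class assignments are purely illustrative placeholders; the real assignments would belong in the regression checklist.

```typescript
type FailureClass =
  | "blank reply"
  | "duplicate banner"
  | "bad approval flow"
  | "queue drift"
  | "restore drift";

const ALL_FAILURE_CLASSES: FailureClass[] = [
  "blank reply",
  "duplicate banner",
  "bad approval flow",
  "queue drift",
  "restore drift",
];

// Illustrative assignments only; not taken from the checklist.
const promptFailureMap: [string, FailureClass[]][] = [
  ["Chat basic answer", ["blank reply", "restore drift"]],
  ["Code build/test task", ["duplicate banner"]],
  ["Queued follow-up", ["queue drift"]],
  ["Permission approval", ["bad approval flow"]],
];

// Returns the failure classes that no prompt currently exercises.
function uncoveredClasses(map: [string, FailureClass[]][]): FailureClass[] {
  const covered = new Set<FailureClass>();
  for (const [, classes] of map) {
    for (const c of classes) covered.add(c);
  }
  return ALL_FAILURE_CLASSES.filter((c) => !covered.has(c));
}
```

A release checklist could fail when `uncoveredClasses` returns a non-empty list.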

  • Track summary:
  1. Transcript renderer decomposition
  2. Permission presentation catalog
  3. Tool result taxonomy
  4. Plan approval transcript-only flow
  5. Runtime summary layer
  6. Regression prompt ritual hardening

Settings and Logic Review

  • Updated: 2026-04-06 00:22 (KST)
  • Candidate to move to developer-only:
    • FreeTierDelaySeconds
    • MaxAgentIterations
    • MaxRetryOnError
  • Keep as runtime-critical user settings:
    • OperationMode
    • MaxContextTokens
    • ContextCompactTriggerPercent
    • EnableProactiveContextCompact
    • EnableCoworkVerification
    • EnableCodeVerification
    • code tool exposure toggles
  • Rule:
    • if a setting changes the main execution route or recovery semantics without representing a stable real-world user choice, move it out of default user-facing surfaces
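
Remaining Gaps (continued)

  2. Plan / approval rendering parity (continued)
  • AX gap:
    • The goal is to converge on a more unified timeline language based on "content first, open details on demand".
  3. Status line / composer parity
  • claw-code reference: src/components/StatusLine.tsx, src/components/PromptInput/PromptInput.tsx
  • AX gap:
    • The bottom status bar and composer options have been trimmed considerably, but status metadata is still scattered and some toggles/quick settings remain as separate rows.
    • The goal is to compress them further into a single action bar beneath the transcript.
  4. Runtime event density parity
  • claw-code reference: src/bridge/sessionRunner.ts, src/components/StatusNotices.tsx
  • AX gap:
    • Default non-debug logging has been reduced, but some Cowork/Code events still shake the timeline frequently.
    • The goal is to normalize permission / tool / error / complete / paused / resumed into a more stable event shape.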

Settings Review

  • Remove candidate:
    • PlanMode
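      • current state: the user-facing UI and save path have been fixed to off, but type remnants remain in AppSettings, SettingsViewModel, and AppStateService
      • rationale: the current policy is fixed to off, so the user's chosen value does not meaningfully contribute to the engine
    • Code.EnablePlanModeTools
      • current state: the UI/save path and default value have been fixed to false, but compatibility remnants remain in the model/settings types
      • rationale: under the current engine policy it no longer changes the actual execution route
  • Move to developer-only candidate:
    • FreeTierDelaySeconds
      • rationale: ordinary users have little reason to adjust it, and it directly affects the engine's delay policy
    • MaxAgentIterations
    • MaxRetryOnError
      • rationale: runtime tuning values that directly affect the quality of the core execution loop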
  • Keep as runtime-critical:
    • OperationMode
    • MaxContextTokens
    • ContextCompactTriggerPercent
    • EnableProactiveContextCompact
    • EnableCoworkVerification
    • EnableCodeVerification
    • Code.EnableWorktreeTools / EnableTeamTools / EnableCronTools
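
The review rule can be expressed as a small classifier. In this TypeScript sketch the trait names (hasEngineEffect, affectsRouteOrRecovery, stableUserChoice) and the classifySetting function are hypothetical, and the trait values assigned in the usage lines are assumptions chosen to reproduce the groupings above, not measured properties of the settings.

```typescript
type Exposure = "user" | "developer-only" | "remove";

// Hypothetical traits used to apply the review rule.
interface SettingTraits {
  hasEngineEffect: boolean;        // does the stored value still change engine behavior?
  affectsRouteOrRecovery: boolean; // main execution route, approval flow, or recovery
  stableUserChoice: boolean;       // represents a stable real-world user choice
}

function classifySetting(t: SettingTraits): Exposure {
  // A value the engine ignores is a removal candidate.
  if (!t.hasEngineEffect) return "remove";
  // Engine tuning that is not a real user choice moves out of default surfaces.
  if (t.affectsRouteOrRecovery && !t.stableUserChoice) return "developer-only";
  return "user";
}

// Assumed trait values, for illustration only.
const planMode = classifySetting({ hasEngineEffect: false, affectsRouteOrRecovery: false, stableUserChoice: false });
const maxRetryOnError = classifySetting({ hasEngineEffect: true, affectsRouteOrRecovery: true, stableUserChoice: false });
const operationMode = classifySetting({ hasEngineEffect: true, affectsRouteOrRecovery: true, stableUserChoice: true });
```

Under these assumed traits, `planMode` is "remove", `maxRetryOnError` is "developer-only", and `operationMode` is "user".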

Known UX / Performance Risks

  • Topic preset hover flicker was caused by duplicate hover systems:
    • custom hover label
    • default WPF ToolTip
  • AX fix:
    • remove default ToolTip from topic cards and keep a single hover label path
  • Remaining runtime performance review targets:
    • RefreshContextUsageVisual() frequency
    • BuildTopicButtons() rebuild frequency
    • OnAgentEvent timeline churn during long Cowork/Code runs
    • compact queue summary still needs one more pass to fully match claw-code footer minimalism