- ChatConversation에 CodeWorkingSetSnapshot/진단/시맨틱 요약 상태를 추가하고 AgentLoop가 실행 중 갱신한 작업 세트를 대화에 다시 저장하도록 연결함 - CodeTaskWorkingSetService를 스냅샷 복원/시맨틱 연속성 요약/활성 진단 보호 구조로 확장하고 query assembly에서 working set·semantic summary·workspace bootstrap을 protected evidence로 주입하도록 보강함 - ContextCondenser가 compact 중 MsgId, preview, 토큰 메타데이터를 유지하도록 수정하고 신규 compact marker와 요약 문자열을 영어 안정형으로 정리함 - ChatSessionStateService의 분기 대화가 MsgId와 CodeWorkingSet을 유지하도록 보강하고 관련 회귀 테스트(ChatStorage/ContextCondenser/PreLlmStage/QueryAssembly/WorkingSet)를 추가 및 갱신함 - 검증: dotnet build 경고 0 오류 0, CodeTaskWorkingSetServiceTests|AgentLoopQueryAssemblyServiceTests|AgentQueryContextBuilderTests|ContextCondenserTests|AxAgentExecutionEngineTests|AgentLoopLlmDispatchStageServiceTests|ChatStorageServiceTests 26개 통과, AgentLoopE2ETests 포함 관련 컨텍스트 회귀 56개 통과
390 lines
19 KiB
Markdown
390 lines
19 KiB
Markdown
# Code Context Reliability Plan
|
|
|
|
Update: 2026-04-16 09:12 (KST)
|
|
|
|
- Implemented the remaining continuity gap-closure items that were still open after the earlier staged loop refactor:
|
|
- durable conversation-owned `CodeWorkingSetSnapshot`
|
|
- semantic Code tool-batch continuity summaries
|
|
- compact-safe `MsgId` and preview metadata preservation
|
|
- query-context diagnostics for working-set injection, semantic-summary injection, protected diagnostics, compact-boundary use, and legacy-boundary fallback detection
|
|
- workspace-context bootstrap injection for Code requests when `.ax-context.md` is already available
|
|
- The active implementation now spans:
|
|
- `src/AxCopilot/Models/ChatModels.cs`
|
|
- `src/AxCopilot/Services/Agent/CodeTaskWorkingSetService.cs`
|
|
- `src/AxCopilot/Services/Agent/AgentLoopService.cs`
|
|
- `src/AxCopilot/Services/Agent/AgentLoopCodeWorkingSetPersistence.cs`
|
|
- `src/AxCopilot/Services/Agent/AgentLoopQueryAssemblyService.cs`
|
|
- `src/AxCopilot/Services/Agent/AgentLoopLlmRequestPreparationService.cs`
|
|
- `src/AxCopilot/Services/Agent/AgentQueryContextBuilder.cs`
|
|
- `src/AxCopilot/Services/Agent/AgentLoopContextReliability.cs`
|
|
- `src/AxCopilot/Services/Agent/ContextCondenser.cs`
|
|
- `src/AxCopilot/Services/ChatSessionStateService.cs`
|
|
- `src/AxCopilot/Views/ChatWindow.xaml.cs`
|
|
- Remaining follow-up after this wave is now smaller and mainly optional:
|
|
- broaden the workspace bootstrap path from `LoadContext(...)` reuse into a fuller stale-check refresh only if Code runs show it is still needed
|
|
- continue cleaning low-traffic legacy mojibake comments and dormant compatibility strings as those files are touched again
|
|
- evaluate whether the durable Code working set should surface in future debug tooling or exports
|
|
- keep measuring tool-trace repair frequency in longer real Code sessions to confirm the new durable snapshot and semantic summary blocks reduce repair churn
|
|
|
|
Update: 2026-04-16 07:40 (KST)
|
|
|
|
- Closed the main gaps that were still open versus the comparison checklist:
|
|
- durable structured tool transcript persistence back into `conversation.Messages`
|
|
- `MaxContextTokens = 0` Auto mode with model-aware context budget resolution
|
|
- snippet-based post-compact tool trace restoration
|
|
- repeated system prompt deduplication before each prepared turn
|
|
- The active implementation now spans:
|
|
- `src/AxCopilot/Services/Agent/AxAgentExecutionEngine.cs`
|
|
- `src/AxCopilot/Services/Agent/AgentQueryContextBuilder.cs`
|
|
- `src/AxCopilot/Services/Agent/ContextCondenser.cs`
|
|
- `src/AxCopilot/Models/AppSettings.cs`
|
|
- `src/AxCopilot/Services/SettingsService.cs`
|
|
- `src/AxCopilot/ViewModels/SettingsViewModel.cs`
|
|
- `src/AxCopilot/Views/SettingsWindow.xaml`
|
|
- `src/AxCopilot/Views/SettingsWindow.xaml.cs`
|
|
- `src/AxCopilot/Views/AgentSettingsWindow.xaml.cs`
|
|
- `src/AxCopilot/Views/ChatWindow.xaml`
|
|
- `src/AxCopilot/Views/ChatWindow.xaml.cs`
|
|
- `src/AxCopilot/Views/ChatWindow.OverlaySettingsPresentation.cs`
|
|
- `src/AxCopilot/Views/ChatWindow.ContextUsagePresentation.cs`
|
|
- `src/AxCopilot/Services/LlmService.cs`
|
|
- `src/AxCopilot/Services/LlmService.ToolUse.cs`
|
|
- Remaining follow-up is now narrower:
|
|
- consider splitting context budget and output budget into separate user-facing controls if future provider tuning needs it
|
|
- continue cleaning legacy mojibake strings in low-traffic diagnostic paths as they are touched
|
|
|
|
Update: 2026-04-16 06:41 (KST)
|
|
|
|
- Added `src/AxCopilot/Services/Agent/AgentLoopLlmDispatchStageService.cs` so the LLM dispatch path is now split into:
|
|
- history/query assembly
|
|
- pre-LLM stage planning
|
|
- dispatch/stream stage execution
|
|
- tool execution and recovery
|
|
- `AgentLoopService` no longer owns the inline stream preview callback or the direct `SendWithToolsWithRecoveryAsync(...)` setup for the primary loop.
|
|
- `StreamingToolExecutionCoordinator.cs` was also normalized to English-only active-path status strings so the staged dispatch path no longer reintroduces mojibake text during wait/retry handling.
|
|
- Remaining structural gap versus the target `claw-code` shape:
|
|
- the `NotSupportedException` / `ToolCallNotSupportedException` fallback branch still lives in `AgentLoopService`
|
|
- the next extraction target should be a narrower fallback policy stage so the main loop keeps shrinking toward a pure orchestrator
|
|
|
|
Update: 2026-04-16 01:37 (KST)
|
|
|
|
## Background
|
|
|
|
Recent Code tab runs show that the LLM request payload is still growing over time. In the `2026-04-16 00:46:26` to `00:50:52` run, the request size grew from `messages=7` to `messages=125`. That means the failure mode is not "context does not grow at all." The real problem is context fidelity: detailed evidence that the model still needs is being replaced too quickly by previews, repair notes, and low-signal summaries.
|
|
|
|
The same log window repeatedly shows:
|
|
|
|
- `tool_calls/tool mismatch detected - flattening assistant message`
|
|
- `orphan tool message detected - converting to user`
|
|
- repeated rereads of nearby files after build failures
|
|
- shifting build failures such as `MC3089` followed by `CS0017` without a stable working set that preserves what was already changed and what remains broken
|
|
|
|
In short, the current system grows the raw message count but does not preserve a stable working set for long-running code tasks.
|
|
|
|
## Current Findings
|
|
|
|
### 1. Workspace context bootstrap is weak on first load
|
|
|
|
- AX targets:
|
|
- `src/AxCopilot/Views/ChatWindow.UtilityPresentation.cs`
|
|
- `src/AxCopilot/Services/Agent/WorkspaceContextGenerator.cs`
|
|
- Finding:
|
|
- When `.ax-context.md` is missing, the first Code request can return before background workspace-context generation becomes useful.
|
|
- Impact:
|
|
- Empty-workspace and fresh-project tasks start without a reliable folder or project summary in the early loops.
|
|
|
|
### 2. Build and file evidence is compacted too aggressively
|
|
|
|
- AX targets:
|
|
- `src/AxCopilot/Services/Agent/AgentToolResultBudget.cs`
|
|
- `src/AxCopilot/Services/Agent/ContextCondenser.cs`
|
|
- Current values:
|
|
- `DefaultSoftCharLimit = 900`
|
|
- `DefaultAggregateBudgetChars = 7_500`
|
|
- `RecentKeepCount = 6`
|
|
- Impact:
|
|
- Code tasks lose detailed build, test, and file-read evidence too early and fall back to previews instead of actionable context.
|
|
|
|
### 3. Session learning is not a durable code working set
|
|
|
|
- AX targets:
|
|
- `src/AxCopilot/Services/Agent/SessionLearningCollector.cs`
|
|
- `src/AxCopilot/Services/Agent/AgentLoopService.cs`
|
|
- `src/AxCopilot/Views/ChatWindow.SystemPromptBuilder.cs`
|
|
- Finding:
|
|
- Session learnings are injected every loop, but they are not structured strongly enough to lock in:
|
|
- current goal
|
|
- current architecture
|
|
- changed files
|
|
- latest build or test failure
|
|
- next repair target
|
|
- Impact:
|
|
- The model must repeatedly reconstruct project state from noisy history instead of reading a stable code-task memory layer.
|
|
|
|
### 4. Tool-trace invariant repairs are too common
|
|
|
|
- AX targets:
|
|
- `src/AxCopilot/Services/LlmService.ToolUse.cs`
|
|
- `src/AxCopilot/Services/Agent/AgentMessageInvariantHelper.cs`
|
|
- `src/AxCopilot/Services/Agent/AgentLoopService.cs`
|
|
- Finding:
|
|
- The recent logs show repeated mismatch and orphan corrections.
|
|
- Impact:
|
|
- Even if the total message count grows, the semantic chain between assistant reasoning, tool call, and tool result becomes less reliable.
|
|
|
|
### 5. There is no Code-specific working-set layer
|
|
|
|
- AX targets:
|
|
- new service required
|
|
- injection path should go through:
|
|
- `AgentLoopLlmRequestPreparationService`
|
|
- `AgentQueryContextBuilder`
|
|
- `AgentLoopService`
|
|
- Finding:
|
|
- The current request mixes raw chat history, session learnings, project context, and workspace context, but it does not maintain a dedicated code-task state ledger.
|
|
- Impact:
|
|
- Long-running runs become increasingly inconsistent because the model keeps rediscovering facts that should already be fixed in memory.
|
|
|
|
## External Research Notes
|
|
|
|
### Anthropic Claude Code memory docs
|
|
|
|
- Claude Code explicitly documents memory files that are auto-loaded at startup and inspectable via `/memory`.
|
|
- Planning implication:
|
|
- AX should have a clearly observable memory hierarchy for Code tasks, including what was auto-loaded and why.
|
|
- Source:
|
|
- [Anthropic Claude Code memory docs](https://docs.anthropic.com/zh-CN/docs/claude-code/memory)
|
|
|
|
### OpenAI practical guide to building agents
|
|
|
|
- The guide emphasizes observability, eval baselines, and explicit tool and system design before optimizing agent behavior.
|
|
- Planning implication:
|
|
- AX should log the exact context sections that enter each Code request, including what was compacted and why.
|
|
- Source:
|
|
- [OpenAI practical guide to building agents](https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf)
|
|
|
|
### SWE-Pruner
|
|
|
|
- The paper argues that task-aware adaptive pruning outperforms naive fixed truncation for coding agents.
|
|
- Planning implication:
|
|
- AX should protect code-task evidence such as latest build failures and changed-file summaries instead of applying mostly size-based pruning.
|
|
- Source:
|
|
- [SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents](https://arxiv.org/abs/2601.16746)
|
|
|
|
## `claude-code` Reference Points
|
|
|
|
Reference targets:
|
|
|
|
- `claw-code/claw-code-f5a40b86dede580f6543bf8926c9af017eea9409/src/query.ts`
|
|
- `claw-code/claw-code-f5a40b86dede580f6543bf8926c9af017eea9409/src/history.ts`
|
|
- `claw-code/en/concepts/memory-context.md`
|
|
|
|
Observed direction:
|
|
|
|
- `claude-code` builds a dedicated `messagesForQuery` window.
|
|
- It stages compaction through boundary filtering, tool-result budgeting, snip, microcompact, and autocompact.
|
|
- It treats memory and post-compaction query windows as first-class parts of the request path.
|
|
|
|
AX already has similar mechanisms, but the Code flow still lacks stronger working-set preservation and cleaner invariant handling.
|
|
|
|
## Remediation Plan
|
|
|
|
### Phase 1. Context observability and bootstrap repair
|
|
|
|
- Reference targets:
|
|
- `claw-code/.../src/query.ts`
|
|
- `claw-code/en/concepts/memory-context.md`
|
|
- AX targets:
|
|
- `src/AxCopilot/Views/ChatWindow.UtilityPresentation.cs`
|
|
- `src/AxCopilot/Services/Agent/WorkspaceContextGenerator.cs`
|
|
- `src/AxCopilot/Services/Agent/AgentQueryContextBuilder.cs`
|
|
- `src/AxCopilot/Services/Agent/AgentLoopLlmRequestPreparationService.cs`
|
|
- Work items:
|
|
- guarantee workspace-context generation starts even on first miss
|
|
- log the exact context sections injected into each request
|
|
- add diagnostics for omitted sections
|
|
- Done criteria:
|
|
- empty-workspace runs show workspace context generation by loop 2
|
|
- logs show section names, sizes, and compaction status
|
|
- Quality scenario:
|
|
- a fresh `E:\code` WPF scaffolding run should show folder and project context in the first two request cycles
|
|
|
|
### Phase 2. Code working-set memory layer
|
|
|
|
- Reference targets:
|
|
- `claw-code/.../src/query.ts`
|
|
- `claw-code/.../src/history.ts`
|
|
- Anthropic memory docs
|
|
- AX targets:
|
|
- new `CodeTaskWorkingSetService`
|
|
- `src/AxCopilot/Services/Agent/SessionLearningCollector.cs`
|
|
- `src/AxCopilot/Services/Agent/AgentLoopService.cs`
|
|
- `src/AxCopilot/Services/Agent/AgentQueryContextBuilder.cs`
|
|
- Work items:
|
|
- maintain a stable structured ledger with:
|
|
- current goal
|
|
- selected architecture
|
|
- changed files
|
|
- latest successful writes
|
|
- open diagnostics
|
|
- next repair target
|
|
- inject it only when changed
|
|
- replace superseded failures with the latest active issue
|
|
- Done criteria:
|
|
- long Code runs keep a single coherent working-set block without noisy duplication
|
|
- build and test failures are preserved as part of the working set
|
|
- Quality scenario:
|
|
- after fixing `MC3089`, the run should still remember the earlier structure change while focusing on the new `CS0017` entry-point failure
|
|
|
|
### Phase 3. Task-aware pruning and protected evidence
|
|
|
|
- Reference targets:
|
|
- `claw-code/.../src/query.ts`
|
|
- SWE-Pruner
|
|
- AX targets:
|
|
- `src/AxCopilot/Services/Agent/AgentToolResultBudget.cs`
|
|
- `src/AxCopilot/Services/Agent/ContextCondenser.cs`
|
|
- `src/AxCopilot/Services/Agent/AgentQueryContextBuilder.cs`
|
|
- Work items:
|
|
- protect:
|
|
- latest build error block
|
|
- latest test failure block
|
|
- current plan or working set
|
|
- latest folder tree snapshot
|
|
- last N write diffs
|
|
- move from pure char-based truncation toward semantic snapshots
|
|
- tune compaction rules specifically for Code tasks
|
|
- Done criteria:
|
|
- active repair evidence survives across loops until superseded
|
|
- older noise shrinks without losing the current failure context
|
|
- Quality scenario:
|
|
- a 30-plus-loop Code run should still preserve the latest failure and target files in the request payload
|
|
|
|
### Phase 4. Tool-trace invariant hardening
|
|
|
|
- Reference targets:
|
|
- `claw-code/.../src/query.ts`
|
|
- `claw-code/.../src/history.ts`
|
|
- AX targets:
|
|
- `src/AxCopilot/Services/LlmService.ToolUse.cs`
|
|
- `src/AxCopilot/Services/Agent/AgentMessageInvariantHelper.cs`
|
|
- `src/AxCopilot/Services/Agent/AgentLoopService.cs`
|
|
- Work items:
|
|
- shift from after-the-fact flattening to pre-request validation and normalization
|
|
- classify mismatch and orphan causes and lock them with regression tests
|
|
- add a final integrity pass before query submission
|
|
- Done criteria:
|
|
- standard Code runs approach zero mismatch or orphan repair logs
|
|
- assistant, tool, and tool_result chains remain intact end to end
|
|
- Quality scenario:
|
|
- a 50-loop Code run should complete without repeated tool-trace repair events
|
|
|
|
### Phase 5. Encoding hygiene and prompt cleanup
|
|
|
|
- Reference targets:
|
|
- Anthropic memory docs
|
|
- OpenAI practical guide eval and observability recommendations
|
|
- AX targets:
|
|
- `src/AxCopilot/Views/ChatWindow.SystemPromptBuilder.cs`
|
|
- `src/AxCopilot/Services/Agent/SessionLearningCollector.cs`
|
|
- active status, prompt, and catalog files
|
|
- `AGENTS.md`
|
|
- Work items:
|
|
- enforce English-only comments in code files
|
|
- rewrite mojibake strings in active prompt paths into English
|
|
- add long-run Code evals to catch prompt and status encoding regressions
|
|
- Done criteria:
|
|
- no broken strings remain in active prompt or status paths
|
|
- touched code files keep English comments only
|
|
- Quality scenario:
|
|
- Windows Korean environments should show readable build, test, and status output without mojibake feedback loops
|
|
|
|
## Priority
|
|
|
|
1. Phase 1: bootstrap and observability
|
|
2. Phase 2: working-set memory
|
|
3. Phase 3: task-aware pruning
|
|
4. Phase 4: tool-trace invariants
|
|
5. Phase 5: encoding and prompt cleanup
|
|
|
|
## Expected Outcome
|
|
|
|
- fewer repeated build-failure loops
|
|
- better structural consistency for project generation and large edits
|
|
- less drift in long-running Code tasks
|
|
- fewer quality losses caused by broken strings and low-signal context replacements
|
|
|
|
## Latest Delivery
|
|
|
|
Updated: 2026-04-16 01:41 (KST)
|
|
|
|
- Delivered in this pass:
|
|
- Phase 1 foundation:
|
|
- `ChatWindow.UtilityPresentation.cs` now bootstraps workspace context generation on first access and returns language-workflow fallback hints while `.ax-context.md` is still being generated.
|
|
- `AgentLoopService.cs` now records `query_context` workflow transitions with query-window, budget, supplemental-context, and working-set summaries.
|
|
- Phase 2 foundation:
|
|
- `CodeTaskWorkingSetService.cs` adds a Code-only structured ledger for:
|
|
- goal
|
|
- selected scaffold/profile
|
|
- created directories
|
|
- recent reads/writes
|
|
- latest diagnostics
|
|
- next repair focus
|
|
- the working set is injected into each Code request as a supplemental `code_working_set` system message.
|
|
- Phase 3 foundation:
|
|
- `AgentToolResultBudget.cs` and `AgentQueryContextBuilder.cs` now expose a `code` query profile with a larger protected-recent window and larger retained budgets for `build_run`, `test_loop`, `process`, `file_read`, `multi_read`, `lsp_code_intel`, and `git_tool`.
|
|
- Phase 4 observability step:
|
|
- `LlmService.ToolUse.cs` now logs sanitization counts for flattened assistant tool traces and converted orphan tool messages, so tool-trace repair frequency can be measured per run.
|
|
- Remaining follow-up:
|
|
- extend pre-request tool-trace validation so the flattening/orphan repair count trends toward zero rather than being logged after repair
|
|
- replace more mojibake prompt/status strings in active Code execution paths with English equivalents
|
|
|
|
Updated: 2026-04-16 01:57 (KST)
|
|
|
|
- Delivered in this pass:
|
|
- Phase 4 partial delivery:
|
|
- `AgentMessageInvariantHelper.cs` now normalizes historical tool traces before the request leaves the agent loop.
|
|
- structured assistant tool-call messages without matching `tool_result` now flatten into plain assistant transcript text.
|
|
- orphan `tool_result` messages now flatten into plain user transcript text instead of relying only on late OpenAI payload repair.
|
|
- `AgentLoopLlmRequestPreparationService.cs` clones the query window first, then applies normalization, so request cleanup does not mutate stored conversation history.
|
|
- `AgentLoopContextReliability.cs` now logs tool-trace repair counts inside the `query_context` transition for run-by-run observability.
|
|
- Phase 5 partial delivery:
|
|
- `SessionLearningCollector.cs` was rewritten with English-only comments and English injection text.
|
|
- `AgentLoopDiagnosticsFormatter.cs` no longer emits mojibake-prone compaction status text in active Code paths.
|
|
- Remaining follow-up:
|
|
- measure whether `tool_trace_repair` counts keep trending down in long Code runs after this preflight normalization
|
|
- continue replacing older mojibake strings outside the active Code execution path
|
|
|
|
Updated: 2026-04-16 02:05 (KST)
|
|
|
|
- Delivered in this pass:
|
|
- structural alignment step:
|
|
- `AgentLoopQueryAssemblyService.cs` now owns the staged query/history assembly path.
|
|
- `PrepareHistory(...)` handles session-learning refresh plus queued-command/query-window preparation.
|
|
- `PrepareRequest(...)` handles Code working-set supplemental context and request-message assembly before dispatch.
|
|
- `AgentLoopService.cs` now delegates those responsibilities instead of manually stitching them together inline.
|
|
- test and encoding hygiene step:
|
|
- `AgentLoopQueryAssemblyServiceTests.cs` locks the new staged assembly behavior.
|
|
- `SessionLearningCollectorTests.cs` was rewritten to English-only comments and assertions to match the new repository rule.
|
|
- Remaining follow-up:
|
|
- keep extracting more inline AgentLoop responsibilities into smaller staged services where it improves observability or retry correctness
|
|
- continue measuring long Code runs against claw-code-style continuity scenarios
|
|
|
|
Updated: 2026-04-16 02:13 (KST)
|
|
|
|
- Delivered in this pass:
|
|
- structural alignment step:
|
|
- `AgentLoopPreLlmStageService.cs` now owns the iteration decisions immediately before the LLM call.
|
|
- the service centralizes:
|
|
- thinking-summary selection
|
|
- Gemini free-tier delay planning
|
|
- user-prompt submit hook fingerprint/payload planning
|
|
- missing-tool guard shaping
|
|
- request assembly handoff
|
|
- `AgentLoopService.cs` now consumes that stage result instead of computing those branches inline.
|
|
- test coverage step:
|
|
- `AgentLoopPreLlmStageServiceTests.cs` now locks the new pre-LLM decision layer.
|
|
- Remaining follow-up:
|
|
- continue extracting the actual LLM dispatch / streaming callback branch into a narrower execution service
|
|
- compare long-running Code traces against claw-code-style staged transitions and keep reducing inline loop logic
|