지침과 문서에 코드 컨텍스트 안정화 계획을 반영한다
- AGENTS.md에 코드 파일 주석 영문화와 인코딩 손상 문자열 정리 규칙을 추가한다. - 최근 Code 탭 실행 로그를 재분석해 메시지 수 증가 대비 컨텍스트 충실도 저하 원인을 정리한다. - Code working set, task-aware pruning, tool trace invariant, bootstrap observability를 포함한 장기 수정 계획 문서를 추가한다. - README와 DEVELOPMENT 문서에 2026-04-16 01:28 KST 기준 분석 결과와 후속 계획을 기록한다. - 검증: dotnet build src\\AxCopilot\\AxCopilot.csproj -c Release -v minimal -p:OutputPath=bin\\verify_context_plan_docs\\ -p:IntermediateOutputPath=obj\\verify_context_plan_docs\\ (경고 0 / 오류 0)
This commit is contained in:
@@ -208,6 +208,13 @@ if (!enabled) return ToolResult.Ok("비활성 상태입니다. 설정에서 활
|
||||
- 모든 변경 후 `dotnet build` 실행 → **경고 0, 오류 0** 필수
|
||||
- CS8603 (nullable) 경고 즉시 수정
|
||||
|
||||
### 인코딩 / 주석 기준
|
||||
- **코드 파일 내부 주석은 예외 없이 영어로만 작성**합니다. 새 주석, TODO, XML doc summary, inline comment 모두 동일하게 적용합니다.
|
||||
- 한글 주석 또는 혼합 언어 주석이 필요한 설명이 있더라도, **코드 파일 안에서는 영어 주석으로 변환**하여 작성합니다. 사용자 안내나 설계 설명이 길게 필요하면 `.md` 문서로 분리합니다.
|
||||
- 인코딩 문제가 의심되는 파일을 수정할 때는 **깨진 문자열(mojibake), 깨진 주석, 깨진 프롬프트 조각을 그대로 두지 말고 영어로 정리**합니다.
|
||||
- 특히 `system prompt`, `agent status`, `session memory`, `tool diagnostics`처럼 LLM 컨텍스트에 직접 들어가는 코드 상수 문자열은 깨진 상태로 유지하지 않습니다. 인코딩 오류가 재발하면 해당 코드 파일의 손상된 문자열을 우선 영어로 치환합니다.
|
||||
- 코드 리뷰/수정 시 인코딩 이상 여부를 함께 확인하고, **주석 때문에 인코딩 리스크가 높다고 판단되면 기존 주석도 영어로 정리**합니다.
|
||||
|
||||
### 성능/실행속도 우선 원칙
|
||||
- 기능 구현 시 가능하면 **개발 단계부터 최적화와 실행 속도**를 함께 고려합니다.
|
||||
- 동일 품질을 만족하는 구현안이 여러 개라면, **더 가볍고 빠르게 동작하는 구조**를 우선 채택합니다.
|
||||
|
||||
@@ -1,5 +1,11 @@
|
||||
# AX Commander
|
||||
|
||||
- Update: 2026-04-16 01:28 (KST)
|
||||
- Added an encoding rule to `AGENTS.md`: comments inside code files must be written in English only, and any broken mojibake strings found in touched code files should be rewritten in English before commit.
|
||||
- Reviewed the recent Code tab runs again. On `2026-04-16`, the request message count still grew during long runs (`messages=7 -> 125`, and another run `118 -> 139`), so the main issue is not raw context length. The problem is context fidelity: build/file evidence is being compacted too aggressively while tool-trace repair noise is repeatedly inserted.
|
||||
- Captured the remediation roadmap in `docs/CODE_CONTEXT_RELIABILITY_PLAN.md`. The plan covers workspace-context bootstrap repair, a dedicated Code working-set memory layer, task-aware pruning, tool-trace invariant hardening, and prompt/encoding cleanup.
|
||||
- The plan references `claw-code/.../src/query.ts`, `history.ts`, and `memory-context.md`, and it also incorporates external research from Anthropic Claude Code memory docs, the OpenAI practical guide to building agents, and the SWE-Pruner paper.
|
||||
|
||||
- 업데이트: 2026-04-16 00:57 (KST)
|
||||
- AX Agent 앱 생성 메시지의 가로 폭과 정렬을 다시 다듬었습니다. `src/AxCopilot/Views/ChatWindow.ResponsePresentation.cs`에 `GetAgentEventMaxWidth()`를 추가해 앱이 그리는 진행/도구/완료 카드 폭만 별도로 줄였고, `src/AxCopilot/Views/ChatWindow.AgentEventRendering.cs`, `src/AxCopilot/Views/ChatWindow.V2LiveProgressPresentation.cs`, `src/AxCopilot/Views/ChatWindow.V2AgentEventPresentation.cs`는 해당 폭을 사용하면서 중앙 정렬 대신 좌측 기준으로 붙도록 맞췄습니다.
|
||||
- 라이브 진행 카드와 검증 게이트 문구에서 깨져 보이던 문자열도 함께 정리했습니다. `src/AxCopilot/Services/Agent/AgentLoopTransitions.Verification.cs`의 검증/재시도 안내 문구를 정상 한국어로 교체했고, `src/AxCopilot/Views/ChatWindow.V2LiveProgressPresentation.cs`, `src/AxCopilot/Views/ChatWindow.V2AgentEventPresentation.cs`에는 런타임에 보이는 상태 문구를 안정적인 문자열로 다시 연결했습니다.
|
||||
|
||||
249
docs/CODE_CONTEXT_RELIABILITY_PLAN.md
Normal file
249
docs/CODE_CONTEXT_RELIABILITY_PLAN.md
Normal file
@@ -0,0 +1,249 @@
|
||||
# Code Context Reliability Plan
|
||||
|
||||
Update: 2026-04-16 01:37 (KST)
|
||||
|
||||
## Background
|
||||
|
||||
Recent Code tab runs show that the LLM request payload is still growing over time. In the `2026-04-16 00:46:26` to `00:50:52` run, the request size grew from `messages=7` to `messages=125`. That means the failure mode is not "context does not grow at all." The real problem is context fidelity: detailed evidence that the model still needs is being replaced too quickly by previews, repair notes, and low-signal summaries.
|
||||
|
||||
The same log window repeatedly shows:
|
||||
|
||||
- `tool_calls/tool mismatch detected - flattening assistant message`
|
||||
- `orphan tool message detected - converting to user`
|
||||
- repeated rereads of nearby files after build failures
|
||||
- shifting build failures such as `MC3089` followed by `CS0017` without a stable working set that preserves what was already changed and what remains broken
|
||||
|
||||
In short, the current system grows the raw message count but does not preserve a stable working set for long-running code tasks.
|
||||
|
||||
## Current Findings
|
||||
|
||||
### 1. Workspace context bootstrap is weak on first load
|
||||
|
||||
- AX targets:
|
||||
- `src/AxCopilot/Views/ChatWindow.UtilityPresentation.cs`
|
||||
- `src/AxCopilot/Services/Agent/WorkspaceContextGenerator.cs`
|
||||
- Finding:
|
||||
- When `.ax-context.md` is missing, the first Code request can return before background workspace-context generation becomes useful.
|
||||
- Impact:
|
||||
- Empty-workspace and fresh-project tasks start without a reliable folder or project summary in the early loops.
|
||||
|
||||
### 2. Build and file evidence is compacted too aggressively
|
||||
|
||||
- AX targets:
|
||||
- `src/AxCopilot/Services/Agent/AgentToolResultBudget.cs`
|
||||
- `src/AxCopilot/Services/Agent/ContextCondenser.cs`
|
||||
- Current values:
|
||||
- `DefaultSoftCharLimit = 900`
|
||||
- `DefaultAggregateBudgetChars = 7_500`
|
||||
- `RecentKeepCount = 6`
|
||||
- Impact:
|
||||
- Code tasks lose detailed build, test, and file-read evidence too early and fall back to previews instead of actionable context.
|
||||
|
||||
### 3. Session learning is not a durable code working set
|
||||
|
||||
- AX targets:
|
||||
- `src/AxCopilot/Services/Agent/SessionLearningCollector.cs`
|
||||
- `src/AxCopilot/Services/Agent/AgentLoopService.cs`
|
||||
- `src/AxCopilot/Views/ChatWindow.SystemPromptBuilder.cs`
|
||||
- Finding:
|
||||
- Session learnings are injected every loop, but they are not structured strongly enough to lock in:
|
||||
- current goal
|
||||
- current architecture
|
||||
- changed files
|
||||
- latest build or test failure
|
||||
- next repair target
|
||||
- Impact:
|
||||
- The model must repeatedly reconstruct project state from noisy history instead of reading a stable code-task memory layer.
|
||||
|
||||
### 4. Tool-trace invariant repairs are too common
|
||||
|
||||
- AX targets:
|
||||
- `src/AxCopilot/Services/LlmService.ToolUse.cs`
|
||||
- `src/AxCopilot/Services/Agent/AgentMessageInvariantHelper.cs`
|
||||
- `src/AxCopilot/Services/Agent/AgentLoopService.cs`
|
||||
- Finding:
|
||||
- The recent logs show repeated mismatch and orphan corrections.
|
||||
- Impact:
|
||||
- Even if the total message count grows, the semantic chain between assistant reasoning, tool call, and tool result becomes less reliable.
|
||||
|
||||
### 5. There is no Code-specific working-set layer
|
||||
|
||||
- AX targets:
|
||||
- new service required
|
||||
- injection path should go through:
|
||||
- `AgentLoopLlmRequestPreparationService`
|
||||
- `AgentQueryContextBuilder`
|
||||
- `AgentLoopService`
|
||||
- Finding:
|
||||
- The current request mixes raw chat history, session learnings, project context, and workspace context, but it does not maintain a dedicated code-task state ledger.
|
||||
- Impact:
|
||||
- Long-running runs become increasingly inconsistent because the model keeps rediscovering facts that should already be fixed in memory.
|
||||
|
||||
## External Research Notes
|
||||
|
||||
### Anthropic Claude Code memory docs
|
||||
|
||||
- Claude Code explicitly documents memory files that are auto-loaded at startup and inspectable via `/memory`.
|
||||
- Planning implication:
|
||||
- AX should have a clearly observable memory hierarchy for Code tasks, including what was auto-loaded and why.
|
||||
- Source:
|
||||
- [Anthropic Claude Code memory docs](https://docs.anthropic.com/zh-CN/docs/claude-code/memory)
|
||||
|
||||
### OpenAI practical guide to building agents
|
||||
|
||||
- The guide emphasizes observability, eval baselines, and explicit tool and system design before optimizing agent behavior.
|
||||
- Planning implication:
|
||||
- AX should log the exact context sections that enter each Code request, including what was compacted and why.
|
||||
- Source:
|
||||
- [OpenAI practical guide to building agents](https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf)
|
||||
|
||||
### SWE-Pruner
|
||||
|
||||
- The paper argues that task-aware adaptive pruning outperforms naive fixed truncation for coding agents.
|
||||
- Planning implication:
|
||||
- AX should protect code-task evidence such as latest build failures and changed-file summaries instead of applying mostly size-based pruning.
|
||||
- Source:
|
||||
- [SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents](https://arxiv.org/abs/2601.16746)
|
||||
|
||||
## `claude-code` Reference Points
|
||||
|
||||
Reference targets:
|
||||
|
||||
- `claw-code/claw-code-f5a40b86dede580f6543bf8926c9af017eea9409/src/query.ts`
|
||||
- `claw-code/claw-code-f5a40b86dede580f6543bf8926c9af017eea9409/src/history.ts`
|
||||
- `claw-code/en/concepts/memory-context.md`
|
||||
|
||||
Observed direction:
|
||||
|
||||
- `claude-code` builds a dedicated `messagesForQuery` window.
|
||||
- It stages compaction through boundary filtering, tool-result budgeting, snip, microcompact, and autocompact.
|
||||
- It treats memory and post-compaction query windows as first-class parts of the request path.
|
||||
|
||||
AX already has similar mechanisms, but the Code flow still lacks stronger working-set preservation and cleaner invariant handling.
|
||||
|
||||
## Remediation Plan
|
||||
|
||||
### Phase 1. Context observability and bootstrap repair
|
||||
|
||||
- Reference targets:
|
||||
- `claw-code/.../src/query.ts`
|
||||
- `claw-code/en/concepts/memory-context.md`
|
||||
- AX targets:
|
||||
- `src/AxCopilot/Views/ChatWindow.UtilityPresentation.cs`
|
||||
- `src/AxCopilot/Services/Agent/WorkspaceContextGenerator.cs`
|
||||
- `src/AxCopilot/Services/Agent/AgentQueryContextBuilder.cs`
|
||||
- `src/AxCopilot/Services/Agent/AgentLoopLlmRequestPreparationService.cs`
|
||||
- Work items:
|
||||
- guarantee workspace-context generation starts even on first miss
|
||||
- log the exact context sections injected into each request
|
||||
- add diagnostics for omitted sections
|
||||
- Done criteria:
|
||||
- empty-workspace runs show workspace context generation by loop 2
|
||||
- logs show section names, sizes, and compaction status
|
||||
- Quality scenario:
|
||||
- a fresh `E:\code` WPF scaffolding run should show folder and project context in the first two request cycles
|
||||
|
||||
### Phase 2. Code working-set memory layer
|
||||
|
||||
- Reference targets:
|
||||
- `claw-code/.../src/query.ts`
|
||||
- `claw-code/.../src/history.ts`
|
||||
- Anthropic memory docs
|
||||
- AX targets:
|
||||
- new `CodeTaskWorkingSetService`
|
||||
- `src/AxCopilot/Services/Agent/SessionLearningCollector.cs`
|
||||
- `src/AxCopilot/Services/Agent/AgentLoopService.cs`
|
||||
- `src/AxCopilot/Services/Agent/AgentQueryContextBuilder.cs`
|
||||
- Work items:
|
||||
- maintain a stable structured ledger with:
|
||||
- current goal
|
||||
- selected architecture
|
||||
- changed files
|
||||
- latest successful writes
|
||||
- open diagnostics
|
||||
- next repair target
|
||||
- inject it only when changed
|
||||
- replace superseded failures with the latest active issue
|
||||
- Done criteria:
|
||||
- long Code runs keep a single coherent working-set block without noisy duplication
|
||||
- build and test failures are preserved as part of the working set
|
||||
- Quality scenario:
|
||||
- after fixing `MC3089`, the run should still remember the earlier structure change while focusing on the new `CS0017` entry-point failure
|
||||
|
||||
### Phase 3. Task-aware pruning and protected evidence
|
||||
|
||||
- Reference targets:
|
||||
- `claw-code/.../src/query.ts`
|
||||
- SWE-Pruner
|
||||
- AX targets:
|
||||
- `src/AxCopilot/Services/Agent/AgentToolResultBudget.cs`
|
||||
- `src/AxCopilot/Services/Agent/ContextCondenser.cs`
|
||||
- `src/AxCopilot/Services/Agent/AgentQueryContextBuilder.cs`
|
||||
- Work items:
|
||||
- protect:
|
||||
- latest build error block
|
||||
- latest test failure block
|
||||
- current plan or working set
|
||||
- latest folder tree snapshot
|
||||
- last N write diffs
|
||||
- move from pure char-based truncation toward semantic snapshots
|
||||
- tune compaction rules specifically for Code tasks
|
||||
- Done criteria:
|
||||
- active repair evidence survives across loops until superseded
|
||||
- older noise shrinks without losing the current failure context
|
||||
- Quality scenario:
|
||||
- a 30-plus-loop Code run should still preserve the latest failure and target files in the request payload
|
||||
|
||||
### Phase 4. Tool-trace invariant hardening
|
||||
|
||||
- Reference targets:
|
||||
- `claw-code/.../src/query.ts`
|
||||
- `claw-code/.../src/history.ts`
|
||||
- AX targets:
|
||||
- `src/AxCopilot/Services/LlmService.ToolUse.cs`
|
||||
- `src/AxCopilot/Services/Agent/AgentMessageInvariantHelper.cs`
|
||||
- `src/AxCopilot/Services/Agent/AgentLoopService.cs`
|
||||
- Work items:
|
||||
- shift from after-the-fact flattening to pre-request validation and normalization
|
||||
- classify mismatch and orphan causes and lock them with regression tests
|
||||
- add a final integrity pass before query submission
|
||||
- Done criteria:
|
||||
- standard Code runs approach zero mismatch or orphan repair logs
|
||||
- assistant, tool, and tool_result chains remain intact end to end
|
||||
- Quality scenario:
|
||||
- a 50-loop Code run should complete without repeated tool-trace repair events
|
||||
|
||||
### Phase 5. Encoding hygiene and prompt cleanup
|
||||
|
||||
- Reference targets:
|
||||
- Anthropic memory docs
|
||||
- OpenAI practical guide eval and observability recommendations
|
||||
- AX targets:
|
||||
- `src/AxCopilot/Views/ChatWindow.SystemPromptBuilder.cs`
|
||||
- `src/AxCopilot/Services/Agent/SessionLearningCollector.cs`
|
||||
- active status, prompt, and catalog files
|
||||
- `AGENTS.md`
|
||||
- Work items:
|
||||
- enforce English-only comments in code files
|
||||
- rewrite mojibake strings in active prompt paths into English
|
||||
- add long-run Code evals to catch prompt and status encoding regressions
|
||||
- Done criteria:
|
||||
- no broken strings remain in active prompt or status paths
|
||||
- touched code files keep English comments only
|
||||
- Quality scenario:
|
||||
- Windows Korean environments should show readable build, test, and status output without mojibake feedback loops
|
||||
|
||||
## Priority
|
||||
|
||||
1. Phase 1: bootstrap and observability
|
||||
2. Phase 2: working-set memory
|
||||
3. Phase 3: task-aware pruning
|
||||
4. Phase 4: tool-trace invariants
|
||||
5. Phase 5: encoding and prompt cleanup
|
||||
|
||||
## Expected Outcome
|
||||
|
||||
- fewer repeated build-failure loops
|
||||
- better structural consistency for project generation and large edits
|
||||
- less drift in long-running Code tasks
|
||||
- fewer quality losses caused by broken strings and low-signal context replacements
|
||||
@@ -1746,3 +1746,18 @@ UI ?遺우쁽????域뱀뮆???귐뗫솯?醫딆춦 ???袁る퓮 ?臾믩씜 ??疫
|
||||
- 프로세스 출력 인코딩은 `src/AxCopilot/Services/Agent/BuildRunTool.cs`, `src/AxCopilot/Services/Agent/ProcessTool.cs`에서 UTF-8 고정 대신 Windows 기본 출력 인코딩을 우선 사용하도록 조정했습니다. 한국어 콘솔 출력이 UTF-8로 강제 디코딩되며 깨질 수 있던 경로를 줄이기 위한 수정입니다.
|
||||
- 검증: `dotnet build src/AxCopilot/AxCopilot.csproj -c Release -v minimal -p:OutputPath=bin\\verify_agent_ui_layout_encoding\\ -p:IntermediateOutputPath=obj\\verify_agent_ui_layout_encoding\\` 경고 0 / 오류 0
|
||||
- 검증: `dotnet test src/AxCopilot.Tests/AxCopilot.Tests.csproj -c Release -v minimal --filter "ChatWindowSlashPolicyTests|AgentLoopCodeQualityTests" -p:OutputPath=bin\\verify_agent_ui_layout_encoding_tests\\ -p:IntermediateOutputPath=obj\\verify_agent_ui_layout_encoding_tests\\` 통과 194
|
||||
|
||||
업데이트: 2026-04-16 01:28 (KST)
|
||||
- 최상위 개발 지침 `AGENTS.md`의 코드 품질 섹션에 인코딩/주석 규칙을 추가했습니다. 앞으로 코드 파일 내부 주석은 영어만 사용하고, 인코딩 손상 문자열이 보이는 코드 파일을 수정할 때는 깨진 주석/프롬프트/상태 문자열도 영어로 정리하는 것을 기본 규칙으로 고정했습니다.
|
||||
- 최근 Code 탭 실행 로그를 다시 점검했습니다. `2026-04-16 00:46:26`부터 `00:50:52`까지 같은 실행에서 `messages=7 -> 125`로 증가한 것을 확인했고, 단순히 컨텍스트 길이가 늘지 않는 문제는 아니었습니다. 대신 아래 두 축이 더 직접적인 원인으로 보였습니다.
|
||||
- `tool_calls/tool 쌍 불일치`, `고아 tool 메시지` 보정이 반복되며 tool trace 구조가 흔들리는 문제
|
||||
- `AgentToolResultBudget`, `ContextCondenser`, `SessionLearningCollector`, `LoadWorkspaceContext` 경로가 Code 작업에 필요한 build/file evidence보다 preview/요약을 더 빨리 남기는 문제
|
||||
- 최근 WPF 지뢰찾기 실행에서는 `MC3089(StatusBarItem 자식 중복)` 이후 `CS0017(Program.cs / App.g.cs 진입점 중복)`로 실패 원인이 옮겨갔는데, 이때 이전 수정 의도와 최신 실패 원인을 묶어 주는 Code 전용 working set 계층이 없어 같은 파일과 오류를 반복 재탐색하는 패턴이 나타났습니다.
|
||||
- 위 분석과 외부 리서치를 바탕으로 `docs/CODE_CONTEXT_RELIABILITY_PLAN.md`를 추가했습니다. 이 문서는 다음 5단계 계획을 정리합니다.
|
||||
- Context observability and bootstrap repair
|
||||
- Code working-set memory layer
|
||||
- Task-aware pruning and protected evidence
|
||||
- Tool trace invariant hardening
|
||||
- Encoding hygiene and prompt quality cleanup
|
||||
- 계획 문서는 `claude-code` 참조 지점(`claw-code/.../src/query.ts`, `history.ts`, `memory-context.md`), AX 적용 위치, 완료 조건, 품질 판정 시나리오를 함께 기록했습니다.
|
||||
- 외부 근거로는 Anthropic Claude Code memory docs, OpenAI practical guide to building agents, `SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents`를 반영해 "자동 메모리 계층", "관측 가능성/eval 우선", "task-aware pruning" 원칙을 계획에 녹였습니다.
|
||||
|
||||
Reference in New Issue
Block a user