Code 탭 컨텍스트 누적 신뢰성과 작업 연속성을 전면 보강한다

이번 커밋은 Code 탭 장기 실행에서 build/file 근거가 너무 빨리 축약되고, 이전 수정 맥락이 다음 LLM 요청에 안정적으로 누적되지 않던 문제를 해결하기 위한 전면 보강을 담는다. 핵심 수정사항: - CodeTaskWorkingSetService를 추가해 최근 생성 디렉터리, 최근 읽기/쓰기 파일, 최신 build/test 진단, 다음 복구 초점을 구조화된 working set으로 유지하고 각 반복 요청에 보조 system context로 주입한다. - AgentQueryContextBuilder와 AgentToolResultBudget에 code profile을 도입해 protected recent window와 tool_result budget을 확장하고 build_run, test_loop, file_read, multi_read, lsp_code_intel, git_tool 같은 고가치 evidence가 기본 탭보다 덜 잘리도록 조정한다. - AgentLoopIterationPreparationService와 AgentLoopLlmRequestPreparationService를 확장해 query-context options와 supplemental messages를 함께 전달하고, AgentLoopService에서는 Code 탭에서 generic session learnings 대신 working set 중심으로 요청을 구성하도록 변경한다. - ChatWindow.UtilityPresentation에서 workspace context 첫 부트스트랩을 강화해 .ax-context.md가 아직 없더라도 첫 요청 시점부터 background generation과 language workflow bootstrap hints가 반영되도록 수정한다. - LlmService.ToolUse에서 historical tool trace sanitization 결과를 assistant flatten/orphan conversion 건수로 요약 로그에 남겨 tool-trace 불변식 문제를 추적 가능하게 만든다. - 관련 테스트를 추가·갱신해 working set 누적, code profile budget, supplemental message 주입, query-context option 전달을 회귀 고정한다. 검증 결과: - dotnet build src/AxCopilot/AxCopilot.csproj -c Release -v minimal -p:OutputPath=bin\\verify_context_reliability_full\\ -p:IntermediateOutputPath=obj\\verify_context_reliability_full\\ : 경고 0 / 오류 0 - dotnet test src/AxCopilot.Tests/AxCopilot.Tests.csproj -c Release -v minimal --filter "AgentQueryContextBuilderTests|AgentToolResultBudgetTests|AgentLoopIterationPreparationServiceTests|AgentLoopLlmRequestPreparationServiceTests|CodeTaskWorkingSetServiceTests|AgentLoopCodeQualityTests" -p:OutputPath=bin\\verify_context_reliability_full_tests\\ -p:IntermediateOutputPath=obj\\verify_context_reliability_full_tests\\ : 통과 150 - dotnet test src/AxCopilot.Tests/AxCopilot.Tests.csproj -c Release -v minimal --filter "AgentLoopE2ETests|AgentMessageInvariantHelperTests" -p:OutputPath=bin\\verify_context_reliability_e2e\\ -p:IntermediateOutputPath=obj\\verify_context_reliability_e2e\\ : 통과 21
2026-04-16 01:45:28 +09:00
parent eb884e9263
commit 0f64bf3f84
17 changed files with 1074 additions and 129 deletions
--- a/docs/CODE_CONTEXT_RELIABILITY_PLAN.md
+++ b/docs/CODE_CONTEXT_RELIABILITY_PLAN.md
@@ -247,3 +247,28 @@ AX already has similar mechanisms, but the Code flow still lacks stronger workin
 - better structural consistency for project generation and large edits
 - less drift in long-running Code tasks
 - fewer quality losses caused by broken strings and low-signal context replacements
+
+## Latest Delivery
+
+Updated: 2026-04-16 01:41 (KST)
+
+- Delivered in this pass:
+  - Phase 1 foundation:
+    - `ChatWindow.UtilityPresentation.cs` now bootstraps workspace context generation on first access and returns language-workflow fallback hints while `.ax-context.md` is still being generated.
+    - `AgentLoopService.cs` now records `query_context` workflow transitions with query-window, budget, supplemental-context, and working-set summaries.
+  - Phase 2 foundation:
+    - `CodeTaskWorkingSetService.cs` adds a Code-only structured ledger for:
+      - goal
+      - selected scaffold/profile
+      - created directories
+      - recent reads/writes
+      - latest diagnostics
+      - next repair focus
+    - the working set is injected into each Code request as a supplemental `code_working_set` system message.
+  - Phase 3 foundation:
+    - `AgentToolResultBudget.cs` and `AgentQueryContextBuilder.cs` now expose a `code` query profile with a larger protected-recent window and larger retained budgets for `build_run`, `test_loop`, `process`, `file_read`, `multi_read`, `lsp_code_intel`, and `git_tool`.
+  - Phase 4 observability step:
+    - `LlmService.ToolUse.cs` now logs sanitization counts for flattened assistant tool traces and converted orphan tool messages, so tool-trace repair frequency can be measured per run.
+- Remaining follow-up:
+  - extend pre-request tool-trace validation so the flattening/orphan repair count trends toward zero rather than being logged after repair
+  - replace more mojibake prompt/status strings in active Code execution paths with English equivalents
--- a/docs/DEVELOPMENT.md
+++ b/docs/DEVELOPMENT.md
@@ -1761,3 +1761,46 @@ UI ?遺우쁽????域뱀뮆???귐뗫솯?醫딆춦 ???袁る퓮 ?臾믩씜 ??疫
  - Encoding hygiene and prompt quality cleanup
 - 계획 문서는 `claude-code` 참조 지점(`claw-code/.../src/query.ts`, `history.ts`, `memory-context.md`), AX 적용 위치, 완료 조건, 품질 판정 시나리오를 함께 기록했습니다.
 - 외부 근거로는 Anthropic Claude Code memory docs, OpenAI practical guide to building agents, `SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents`를 반영해 "자동 메모리 계층", "관측 가능성/eval 우선", "task-aware pruning" 원칙을 계획에 녹였습니다.
+
+업데이트: 2026-04-16 01:41 (KST)
+- Code 탭 컨텍스트 신뢰성 보강 1차 구현을 적용했다.
+  - `src/AxCopilot/Services/Agent/CodeTaskWorkingSetService.cs`
+    - Code 전용 working-set 메모리 레이어를 추가했다.
+    - 최근 생성 디렉터리, 최근 읽기/쓰기 파일, 최신 build/test 진단, 다음 복구 초점을 구조화해 유지한다.
+    - `build_run`, `test_loop`, `process`, `file_manage`, `file_write`, `file_edit`, `multi_read` 결과를 바탕으로 현재 작업 연속성을 요약한 `code_working_set` system 메시지를 만든다.
+  - `src/AxCopilot/Services/Agent/AgentLoopService.cs`
+    - Code 탭 실행에서 `CodeTaskWorkingSetService`를 생성하고, 각 도구 실행 뒤 결과를 working set에 기록한다.
+    - Code 탭에서는 generic `session_learnings` 주입을 줄이고, 대신 working set 보조 context를 LLM 요청 직전에 삽입한다.
+    - 각 반복마다 `query_context` 전이 로그를 남겨 query-view 범위, profile, protected recent 값, supplemental context 수, estimated send token, working-set 요약을 관찰 가능하게 만들었다.
+  - `src/AxCopilot/Services/Agent/AgentQueryContextBuilder.cs`
+    - `AgentQueryContextBuildOptions`를 추가해 `default`와 `code` profile을 분리했다.
+    - 결과 객체에 profile, protected recent, tool-result budget 메타를 함께 남긴다.
+  - `src/AxCopilot/Services/Agent/AgentToolResultBudget.cs`
+    - `AgentToolResultBudgetOptions`를 도입했다.
+    - Code profile에서 `build_run`, `test_loop`, `process`, `file_read`, `multi_read`, `lsp_code_intel`, `git_tool` 같은 고가치 evidence의 truncation 한도를 더 크게 잡아 최신 오류와 읽은 파일 근거가 너무 빨리 preview로 축약되지 않게 했다.
+    - truncation marker 문자열은 영어 기준으로 정리했다.
+  - `src/AxCopilot/Services/Agent/AgentLoopIterationPreparationService.cs`
+    - iteration 준비 단계에서 query-context build options를 주입하도록 확장했다.
+  - `src/AxCopilot/Services/Agent/AgentLoopLlmRequestPreparationService.cs`
+    - query view 외에 working set 같은 supplemental messages를 요청 배열에 추가할 수 있게 확장했다.
+    - tool reminder 메시지 문자열을 영어 기준으로 정리했다.
+  - `src/AxCopilot/Views/ChatWindow.UtilityPresentation.cs`
+    - `.ax-context.md`가 아직 없는 첫 요청에서도 workspace context 생성을 즉시 시작한다.
+    - 생성이 완료되기 전에는 `DetectLanguageWorkflowHints(...)` 기반 bootstrap context를 반환해 완전 빈 작업 폴더에서도 첫 루프에 최소 힌트가 포함되도록 보강했다.
+  - `src/AxCopilot/Services/LlmService.ToolUse.cs`
+    - historical tool-call sanitization 결과를 `flattened_assistant`, `converted_orphans` 건수로 요약 로그에 남긴다.
+    - 사후 보정은 유지하면서도 빈도를 추적해 후속 invariant hardening 작업의 기준선을 확보했다.
+- 테스트:
+  - `src/AxCopilot.Tests/Services/CodeTaskWorkingSetServiceTests.cs`
+    - 구조/쓰기 working set 누적, build diagnostic 유지, 성공 build 후 diagnostic clearing을 검증한다.
+  - `src/AxCopilot.Tests/Services/AgentQueryContextBuilderTests.cs`
+    - Code profile 메타데이터 노출을 검증한다.
+  - `src/AxCopilot.Tests/Services/AgentToolResultBudgetTests.cs`
+    - Code mode에서 긴 `build_run` 결과를 더 오래 보존하는지 검증한다.
+  - `src/AxCopilot.Tests/Services/AgentLoopIterationPreparationServiceTests.cs`
+    - iteration 준비 단계가 Code profile query options를 반영하는지 검증한다.
+  - `src/AxCopilot.Tests/Services/AgentLoopLlmRequestPreparationServiceTests.cs`
+    - supplemental messages가 tool reminder 앞에 추가되는지 검증한다.
+- 검증:
+  - `dotnet build src/AxCopilot/AxCopilot.csproj -c Release -v minimal -p:OutputPath=bin\\verify_context_reliability_full\\ -p:IntermediateOutputPath=obj\\verify_context_reliability_full\\` 경고 0 / 오류 0
+  - `dotnet test src/AxCopilot.Tests/AxCopilot.Tests.csproj -c Release -v minimal --filter "AgentQueryContextBuilderTests|AgentToolResultBudgetTests|AgentLoopIterationPreparationServiceTests|AgentLoopLlmRequestPreparationServiceTests|CodeTaskWorkingSetServiceTests|AgentLoopCodeQualityTests" -p:OutputPath=bin\\verify_context_reliability_full_tests\\ -p:IntermediateOutputPath=obj\\verify_context_reliability_full_tests\\` 통과 150