IBM vLLM 배포형 채팅 요청 스키마 분기와 문서 반영

IBM/CP4D 인증을 사용하는 vLLM 등록 모델에서 배포형 /ml/v1/deployments/.../text/chat 계열 엔드포인트를 감지하도록 정리했다. 일반 OpenAI 호환 body 대신 messages+parameters 형태의 IBM deployment chat body를 사용하고 /v1/chat/completions를 강제로 붙이지 않도록 수정했다. IBM 배포형 응답은 results.generated_text, output_text, choices.message.content를 함께 파싱하도록 보강했고 도구 호출 경로는 안전하게 일반 응답 폴백을 유도하도록 정리했다. README와 DEVELOPMENT 문서를 2026-04-06 18:02 (KST) 기준으로 갱신했고 dotnet build 검증에서 경고 0 / 오류 0을 확인했다.
2026-04-06 17:49:48 +09:00
parent e439fd144e
commit 606ecbe6cd
4 changed files with 170 additions and 10 deletions
--- a/docs/DEVELOPMENT.md
+++ b/docs/DEVELOPMENT.md
@@ -4994,3 +4994,6 @@ ow + toggle ?쒓컖 ?몄뼱濡??ㅼ떆 ?뺣젹?덈떎.
 - Document update: 2026-04-06 17:43 (KST) - Removed the eager tray-menu warmup path (`PrepareForDisplay()`) from startup. This avoids doing popup layout/measure work for the tray menu until the user actually opens it, reducing idle desktop overhead further.
 - Document update: 2026-04-06 17:52 (KST) - Restored eager `LauncherWindow` construction so launcher open latency stays low, but kept the tray-menu warmup removed. This keeps the launcher responsive while continuing to trim startup work elsewhere.
 - Document update: 2026-04-06 17:52 (KST) - Moved index warmup from launcher-show time to actual search time. `LauncherViewModel.SearchAsync(...)` now triggers `App.EnsureIndexWarmupStarted()` only when the user performs a real search, so opening the launcher by itself no longer starts a full file scan and watcher bootstrap.
+- Document update: 2026-04-06 18:02 (KST) - Added IBM deployment-chat detection in `LlmService.cs` for vLLM registered models using `ibm_iam` or `cp4d_*` auth with `/ml/v1/deployments/...`-style endpoints. These requests now use IBM deployment chat URLs (`/text/chat` or `/text/chat_stream`) instead of appending `/v1/chat/completions`.
+- Document update: 2026-04-06 18:02 (KST) - Added an IBM deployment request body builder in `LlmService.cs` that omits OpenAI-style `model` and `stream` fields and sends `messages + parameters` instead. This directly addresses IBM responses complaining that `model_id` or `mode` must not be specified in the request body.
+- Document update: 2026-04-06 18:02 (KST) - Hardened vLLM response handling for IBM deployment endpoints by accepting `results[].generated_text`, `output_text`, and `choices[].message.content`, and by short-circuiting tool-use requests in `LlmService.ToolUse.cs` with a `ToolCallNotSupportedException` so IBM deployment chat connections do not receive an incompatible OpenAI function-calling payload.