2026-04-14 LLM Architecture Snapshot

현재 결론

오늘 수정 후 운영 LLM 체인의 핵심 불변식은 다음이다.

openclaw:main은 upstream 모델명이 아니라 추상 라벨/agent selector다.
GPT-5 계열은 GitHub Models/Azure 경로에서 max_completion_tokens를 사용한다.
openai-codex/*, anthropic/*는 Azure가 아니라 Hermes local API 127.0.0.1:18789로 간다.
ollama/*는 OpenAI-compatible endpoint가 아니라 native Ollama /api/chat로 간다.
cloud provider 장애/쿼터 시 최종 안전망은 로컬 Ollama다.

현재 운영 체인 상태

기본 `openclaw:main` 확장 체인

Tier	모델	현재 상태	정상 경로	비고
0	`github-copilot/gpt-5-mini`	429 quota 소진 관측	GitHub Models/Azure	GPT-5 payload는 `max_completion_tokens` 필요
1	`openai-codex/gpt-5.4`	Hermes 라우팅 대상	Hermes `127.0.0.1:18789/v1/chat/completions`	Azure로 가면 unsupported/unknown model
2	`ollama/qwen2.5:3b`	정상 fallback	Ollama `127.0.0.1:11434/api/chat`	오늘 smoke에서 실제 응답 성공

Agent별 주요 체인

Agent	체인	현재 해석
`ron`	`github-copilot/gpt-5-mini` → `openrouter/nvidia/nemotron-3-super-120b-a12b:free` → `openai-codex/gpt-5.4` → `openrouter/minimax/minimax-m2.5` → `ollama/qwen2.5:3b`	GitHub/OpenRouter/Hermes 후 로컬 fallback
`codex`	`github-copilot/gpt-5-mini` → `openrouter/minimax/minimax-m2.5` → `ollama/qwen3.5:9b-nothinker`	코딩 작업용, 최종 로컬 9B
`cowork`	`anthropic/claude-opus-4-6` → `openai-codex/gpt-5.4` → `openrouter/minimax/minimax-m2.5` → `ollama/qwen3.5:9b-nothinker`	Anthropic/Codex는 Hermes 경유
`analyst-fundamental`	`anthropic/claude-sonnet-4-6` → `openai-codex/gpt-5.4` → `openrouter/minimax/minimax-m2.5` → `ollama/qwen3.5:9b-nothinker`	분석 품질 우선, 로컬 fallback 보유
`analyst-macro`	`anthropic/claude-sonnet-4-6` → `openai-codex/gpt-5.4` → `openrouter/minimax/minimax-m2.5` → `ollama/qwen3.5:9b-nothinker`	analyst-fundamental과 동일
`analyst-technical`	`github-copilot/gpt-5-mini` → `openai-codex/gpt-5.4` → `openrouter/minimax/minimax-m2.5` → `openrouter/nvidia/nemotron-3-super-120b-a12b:free` → `ollama/qwen2.5:3b`	smoke test 대상
`analyst-pm`	`github-copilot/gpt-5-mini` → `openrouter/nvidia/nemotron-3-super-120b-a12b:free` → `openai-codex/gpt-5.4` → `openrouter/minimax/minimax-m2.5` → `ollama/qwen2.5:3b`	PM/guardian 기본 체인 계열
`guardian`	`github-copilot/gpt-5-mini` → `openrouter/nvidia/nemotron-3-super-120b-a12b:free` → `openai-codex/gpt-5.4` → `openrouter/minimax/minimax-m2.5` → `ollama/qwen2.5:3b`	운영 감시 체인

수정 전/후 비교

영역	수정 전	수정 후
Ollama 라우팅	`ollama/*`도 Azure/GitHub Models로 POST되어 401/400	`ollama/*`는 native `/api/chat` 사용
GPT-5 payload	`max_tokens` 전송으로 400	GitHub Models + `gpt-5*`에만 `max_completion_tokens` 사용
`openclaw:main`	추상 라벨이 실제 모델명으로 API에 전송	`_expand_model_chain`과 orchestrator/worker 확장으로 구체 체인 변환
pykrx/FDR import	`shared.cycle_base` import만으로 pandas/pykrx/FDR eager load	`_ensure_pykrx`, `_ensure_fdr` lazy loader로 실제 사용 시점 import
Codex/Anthropic	Azure/GitHub Models로 가서 unsupported	Hermes local API server로 프록시
Hermes LaunchAgent	API env 누락 또는 kickstart만으로 env 미반영	`API_SERVER_ENABLED/HOST/PORT` 추가, 변경 시 `bootout + bootstrap` 필요

현재 서비스 상태 스냅샷

항목	상태	확인 방법
Hermes local API	LISTEN 확인됨	`lsof -nP -iTCP:18789 -sTCP:LISTEN`
Hermes health	OK	`curl http://127.0.0.1:18789/v1/health` → `{"status":"ok","platform":"hermes-agent"}`
Ollama	LISTEN 확인됨	`lsof -nP -iTCP:11434 -sTCP:LISTEN`
Local models	`qwen2.5:3b`, `qwen3.5:9b-nothinker` 등 존재	`ollama list`
400 재발	최근 검증 구간 0건	`HTTP Error 400` 로그 집계
Copilot	429 소진 관측	`RateLimitReached`, `12 per 86400s`

다음 리스크

Copilot 일일 한도
오늘 github-copilot/gpt-5-mini는 12 per 86400s 한도 소진이 관측됐다.
내일 quota가 회복되기 전까지 Tier 0은 429를 낼 수 있다.
운영은 Hermes/Ollama fallback 중심으로 간주한다.
OAuth 토큰 만료/갱신 주기
Hermes가 ChatGPT Plus OAuth를 프록시하므로 토큰 만료 시 openai-codex/*, anthropic/* 경로가 실패할 수 있다.
실패 시 먼저 curl /v1/health, Hermes 로그, OAuth profile 상태를 확인한다.
LaunchAgent 환경변수 반영
plist에 env를 추가해도 kickstart만으로는 적용되지 않을 수 있다.
API server env 변경 시 항상 bootout + bootstrap을 쓴다.
OpenRouter key 부재/크레딧 이슈
missing OpenRouter API key 또는 402 credits는 체인 중간 fallback을 느리게 한다.
OpenRouter를 실제 운영 tier로 쓸지, 로컬 fallback을 앞당길지 별도 정책 결정 필요.
메모리 압박
cycle_base lazy import로 worker당 고정 메모리 압박은 줄었지만, qwen3.5:9b 계열 사용 시 Ollama 메모리 압박은 여전히 남는다.
메모리 불안정 시 qwen2.5:3b fallback을 우선한다.

회귀 테스트 산출물

/tmp/openclaw_regression_tests_260414.py
/tmp/test_full_chain_smoke.py

실행:

PYTHONPATH=/Users/ron/.openclaw/workspace/scripts OPENCLAW_MAX_RETRY_PER_MODEL=1 python3 -m pytest -q /tmp/openclaw_regression_tests_260414.py
PYTHONPATH=/Users/ron/.openclaw/workspace/scripts OPENCLAW_MAX_RETRY_PER_MODEL=1 python3 -m pytest -q /tmp/test_full_chain_smoke.py

검증 기록

2026-04-14 13:05 KST 실행:

PYTHONPATH=/Users/ron/.openclaw/workspace/scripts \
OPENCLAW_MAX_RETRY_PER_MODEL=1 \
python3 -m pytest -q /tmp/openclaw_regression_tests_260414.py /tmp/test_full_chain_smoke.py

결과:

8 passed in 0.84s

비고: Codex 샌드박스 내부에서는 로컬 소켓 접근이 Operation not permitted로 막혀 실패했으나, 실제 Mac 로컬 권한으로 실행 시 Hermes/Ollama 소켓 검증까지 모두 통과했다.

2026-04-14 LLM Architecture Snapshot

2026-04-14 LLM Architecture Snapshot

현재 결론

현재 운영 체인 상태

기본 openclaw:main 확장 체인

Agent별 주요 체인

수정 전/후 비교

현재 서비스 상태 스냅샷

다음 리스크

회귀 테스트 산출물

검증 기록

기본 `openclaw:main` 확장 체인