<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Agentforge on Yarang's Tech Lair</title><link>https://blog.fcoinfup.com/tags/agentforge/</link><description>Recent content in Agentforge on Yarang's Tech Lair</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Fri, 08 May 2026 21:57:11 +0900</lastBuildDate><atom:link href="https://blog.fcoinfup.com/tags/agentforge/index.xml" rel="self" type="application/rss+xml"/><item><title>Building a Multi-LLM Distributed Orchestrator with NATS JetStream</title><link>https://blog.fcoinfup.com/post/building-a-multi-llm-distributed-orchestrator-with-nats-jetstream/</link><pubDate>Fri, 08 May 2026 21:57:11 +0900</pubDate><guid>https://blog.fcoinfup.com/post/building-a-multi-llm-distributed-orchestrator-with-nats-jetstream/</guid><description>&lt;p&gt;Part 1 discussed the model-specific limitations discovered while running four AIs—Claude, ZAI, Codex, and Gemini—concurrently on the same tasks. This part is about &amp;ldquo;how we made it possible&amp;rdquo;—the system design and implementation story.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="system-overview"&gt;System Overview
&lt;/h2&gt;&lt;p&gt;AgentForge consists of three components.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[Task Publisher]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; │ NATS JetStream publish
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ▼
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[NATS Broker] ─── af.worker.{id}.inbox
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; │ JetStream consume (independent streams per worker)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ▼
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[Worker Pollers] × N (poller.py × 18 instances)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; │ LLM CLI Execution (claude / codex / gemini)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ▼
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[Result Return] af.task.{task_id}.completed
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;When a publisher posts a task to NATS, each worker, which is independently subscribed, receives the message on its inbox and executes the LLM CLI. The result is then published back to a completion topic.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="why-nats-jetstream"&gt;Why NATS JetStream?
&lt;/h2&gt;&lt;p&gt;We considered several message broker options: Redis Streams, Kafka, RabbitMQ, and NATS JetStream.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Reasons for choosing NATS JetStream:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Single Binary&lt;/strong&gt; — Operates with a single &lt;code&gt;nats-server&lt;/code&gt; without requiring separate runtimes. It has no dependencies like Kafka&amp;rsquo;s ZooKeeper or RabbitMQ&amp;rsquo;s Erlang/OTP.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Built-in Persistence&lt;/strong&gt; — JetStream is a streaming layer on top of NATS, storing messages to the filesystem. This ensures that unprocessed tasks are not lost even if a worker restarts.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;NKey-based Authentication&lt;/strong&gt; — We can issue independent Ed25519 key pairs for each worker. If one worker is compromised, the credentials of other workers remain valid.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Lightweight&lt;/strong&gt; — Memory usage is around 30MB on a single server. Even with 18 workers connected, the broker load is minimal.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h2 id="the-core-backend-adapter-in-pollerpy"&gt;The Core: Backend Adapter in &lt;code&gt;poller.py&lt;/code&gt;
&lt;/h2&gt;&lt;p&gt;The heart of the worker is &lt;code&gt;poller.py&lt;/code&gt;. This single file handles NATS subscriptions, LLM CLI execution, and result returns.&lt;/p&gt;
&lt;p&gt;Since LLMs have different execution methods, we separated them into a backend adapter dictionary.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;_BACKENDS: dict[str, dict] &lt;span style="color:#f92672"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;claude&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;bin&amp;#34;&lt;/span&gt;: os&lt;span style="color:#f92672"&gt;.&lt;/span&gt;environ&lt;span style="color:#f92672"&gt;.&lt;/span&gt;get(&lt;span style="color:#e6db74"&gt;&amp;#34;CLAUDE_BIN&amp;#34;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;/usr/local/bin/claude&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;tools&amp;#34;&lt;/span&gt;: os&lt;span style="color:#f92672"&gt;.&lt;/span&gt;environ&lt;span style="color:#f92672"&gt;.&lt;/span&gt;get(&lt;span style="color:#e6db74"&gt;&amp;#34;ALLOWED_TOOLS&amp;#34;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;Read,Edit,Write,Glob,Grep&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;model&amp;#34;&lt;/span&gt;: os&lt;span style="color:#f92672"&gt;.&lt;/span&gt;environ&lt;span style="color:#f92672"&gt;.&lt;/span&gt;get(&lt;span style="color:#e6db74"&gt;&amp;#34;CLAUDE_MODEL&amp;#34;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;codex&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;bin&amp;#34;&lt;/span&gt;: os&lt;span style="color:#f92672"&gt;.&lt;/span&gt;environ&lt;span style="color:#f92672"&gt;.&lt;/span&gt;get(&lt;span style="color:#e6db74"&gt;&amp;#34;CODEX_BIN&amp;#34;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;/usr/bin/codex&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;model&amp;#34;&lt;/span&gt;: os&lt;span style="color:#f92672"&gt;.&lt;/span&gt;environ&lt;span style="color:#f92672"&gt;.&lt;/span&gt;get(&lt;span style="color:#e6db74"&gt;&amp;#34;CODEX_MODEL&amp;#34;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;sandbox&amp;#34;&lt;/span&gt;: os&lt;span style="color:#f92672"&gt;.&lt;/span&gt;environ&lt;span style="color:#f92672"&gt;.&lt;/span&gt;get(&lt;span style="color:#e6db74"&gt;&amp;#34;CODEX_SANDBOX&amp;#34;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;read-only&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;gemini_cli&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;bin&amp;#34;&lt;/span&gt;: os&lt;span style="color:#f92672"&gt;.&lt;/span&gt;environ&lt;span style="color:#f92672"&gt;.&lt;/span&gt;get(&lt;span style="color:#e6db74"&gt;&amp;#34;GEMINI_BIN&amp;#34;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;/usr/bin/gemini&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;model&amp;#34;&lt;/span&gt;: os&lt;span style="color:#f92672"&gt;.&lt;/span&gt;environ&lt;span style="color:#f92672"&gt;.&lt;/span&gt;get(&lt;span style="color:#e6db74"&gt;&amp;#34;GEMINI_MODEL&amp;#34;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;MODEL_BACKEND&lt;/code&gt; environment variable determines which LLM to use. This allows the same &lt;code&gt;poller.py&lt;/code&gt; code to run different LLMs across 18 workers.&lt;/p&gt;
&lt;h3 id="claude-backend"&gt;Claude Backend
&lt;/h3&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;async&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;def&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;run_claude&lt;/span&gt;(instructions: str, task_id: str) &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; tuple[int, str]:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cfg &lt;span style="color:#f92672"&gt;=&lt;/span&gt; _BACKENDS[&lt;span style="color:#e6db74"&gt;&amp;#34;claude&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cmd &lt;span style="color:#f92672"&gt;=&lt;/span&gt; [cfg[&lt;span style="color:#e6db74"&gt;&amp;#34;bin&amp;#34;&lt;/span&gt;], &lt;span style="color:#e6db74"&gt;&amp;#34;--print&amp;#34;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;--allowedTools&amp;#34;&lt;/span&gt;, cfg[&lt;span style="color:#e6db74"&gt;&amp;#34;tools&amp;#34;&lt;/span&gt;]]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; cfg&lt;span style="color:#f92672"&gt;.&lt;/span&gt;get(&lt;span style="color:#e6db74"&gt;&amp;#34;model&amp;#34;&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cmd &lt;span style="color:#f92672"&gt;+=&lt;/span&gt; [&lt;span style="color:#e6db74"&gt;&amp;#34;--model&amp;#34;&lt;/span&gt;, cfg[&lt;span style="color:#e6db74"&gt;&amp;#34;model&amp;#34;&lt;/span&gt;]]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; proc &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;await&lt;/span&gt; asyncio&lt;span style="color:#f92672"&gt;.&lt;/span&gt;create_subprocess_exec(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;cmd, stdin&lt;span style="color:#f92672"&gt;=&lt;/span&gt;PIPE, stdout&lt;span style="color:#f92672"&gt;=&lt;/span&gt;PIPE, stderr&lt;span style="color:#f92672"&gt;=&lt;/span&gt;PIPE)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;--print&lt;/code&gt; flag is key. It runs Claude Code in non-interactive mode instead of conversational mode, ensuring the results are returned via stdout.&lt;/p&gt;
&lt;h3 id="zai-backend"&gt;ZAI Backend
&lt;/h3&gt;&lt;p&gt;ZAI offers an Anthropic API-compatible endpoint, so it doesn&amp;rsquo;t require a separate backend. Routing is handled by two environment variables.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-ini" data-lang="ini"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# /etc/agentforge/cc-zai-high-dev-01.env&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;lt;ZAI endpoint&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ANTHROPIC_AUTH_TOKEN&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;lt;ZAI API key&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;By injecting this file using systemd&amp;rsquo;s &lt;code&gt;EnvironmentFile=&lt;/code&gt; directive, the &lt;code&gt;claude&lt;/code&gt; binary sends requests to the ZAI endpoint. This allows us to connect to a different LLM provider simply by changing environment variables, without altering the code.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="declarative-management-fleetyaml--serversyaml"&gt;Declarative Management: &lt;code&gt;fleet.yaml&lt;/code&gt; × &lt;code&gt;servers.yaml&lt;/code&gt;
&lt;/h2&gt;&lt;p&gt;Manually managing 18 workers is impractical. We declaratively defined the entire infrastructure using two YAML files.&lt;/p&gt;
&lt;h3 id="serversyaml--server-inventory"&gt;&lt;code&gt;servers.yaml&lt;/code&gt; — Server Inventory
&lt;/h3&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;servers&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;worker-node-1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;role&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;worker-host&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;services&lt;/span&gt;: [&lt;span style="color:#ae81ff"&gt;agentforge-worker, tunnel-arm1]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;broker-host&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;role&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;broker-host&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;services&lt;/span&gt;: [&lt;span style="color:#ae81ff"&gt;nats-jetstream, postgres]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;worker-node-2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;role&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;worker-host&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;services&lt;/span&gt;: [&lt;span style="color:#ae81ff"&gt;agentforge-worker, tunnel-arm1]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="fleetyaml--worker-placement"&gt;&lt;code&gt;fleet.yaml&lt;/code&gt; — Worker Placement
&lt;/h3&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;workers&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;worker_id&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;cc-go-dev-01&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;llm&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;claude-code&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;model&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;claude-sonnet-4-6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;lang&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;go&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;role&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;developer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;host&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;worker-node-1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;enabled&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;create_pr&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;worker_id&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;codex-py-dev-01&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;llm&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;codex&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;model&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;gpt-5.5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;lang&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;python&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;role&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;developer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;host&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;worker-node-1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;enabled&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;create_pr&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;false&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Changing just the &lt;code&gt;host&lt;/code&gt; field moves a worker to a different server. Setting &lt;code&gt;enabled: false&lt;/code&gt; stops the deployment script from starting that worker.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="worker-templating-system-provision_workerpy"&gt;Worker Templating System: &lt;code&gt;provision_worker.py&lt;/code&gt;
&lt;/h2&gt;&lt;p&gt;Manually writing systemd unit files for each new worker is prone to errors. We automated this using Jinja2 templates and a provisioning script.&lt;/p&gt;
&lt;h3 id="template-structure"&gt;Template Structure
&lt;/h3&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;templates/
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; systemd/
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; claude.service.j2 # For claude-code and ZAI alike
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; codex.service.j2 # OpenAI Codex
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; gemini.service.j2 # Google Gemini CLI
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The core part of &lt;code&gt;claude.service.j2&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-gdscript3" data-lang="gdscript3"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;Environment&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;MODEL_BACKEND&lt;span style="color:#f92672"&gt;=&lt;/span&gt;claude
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;Environment&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;CLAUDE_BIN&lt;span style="color:#f92672"&gt;=&lt;/span&gt;{{ claude_bin }}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{&lt;span style="color:#f92672"&gt;%&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; claude_model &lt;span style="color:#f92672"&gt;%&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;Environment&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;CLAUDE_MODEL&lt;span style="color:#f92672"&gt;=&lt;/span&gt;{{ claude_model }}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{&lt;span style="color:#f92672"&gt;%&lt;/span&gt; endif &lt;span style="color:#f92672"&gt;%&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{&lt;span style="color:#f92672"&gt;%&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; env_file &lt;span style="color:#f92672"&gt;%&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EnvironmentFile&lt;span style="color:#f92672"&gt;=&lt;/span&gt;{{ env_file }}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{&lt;span style="color:#f92672"&gt;%&lt;/span&gt; endif &lt;span style="color:#f92672"&gt;%&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;Environment&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;WORK_BASE&lt;span style="color:#f92672"&gt;=&lt;/span&gt;{{ work_base }}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;Environment&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;WORK_DIR&lt;span style="color:#f92672"&gt;=&lt;/span&gt;{{ work_base }}&lt;span style="color:#f92672"&gt;/&lt;/span&gt;repo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;Environment&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;{{ &amp;#39;ALLOWED_TOOLS=&amp;#39; + allowed_tools }}&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;Environment&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;CREATE_PR&lt;span style="color:#f92672"&gt;=&lt;/span&gt;{{ &lt;span style="color:#e6db74"&gt;&amp;#39;true&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; create_pr &lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;false&amp;#39;&lt;/span&gt; }}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{&lt;span style="color:#f92672"&gt;%&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; create_pr &lt;span style="color:#f92672"&gt;and&lt;/span&gt; github_remote &lt;span style="color:#f92672"&gt;%&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;Environment&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;GITHUB_REMOTE&lt;span style="color:#f92672"&gt;=&lt;/span&gt;{{ github_remote }}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{&lt;span style="color:#f92672"&gt;%&lt;/span&gt; endif &lt;span style="color:#f92672"&gt;%&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;For ZAI workers, the &lt;code&gt;env_file&lt;/code&gt; block is activated, adding the &lt;code&gt;EnvironmentFile&lt;/code&gt;. For PR creation workers, &lt;code&gt;github_remote&lt;/code&gt; is injected. Other settings use defaults.&lt;/p&gt;
&lt;h3 id="provision_workerpy-usage"&gt;&lt;code&gt;provision_worker.py&lt;/code&gt; Usage
&lt;/h3&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Preview (no actual deployment)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;python3 scripts/provision_worker.py --worker new-worker-id --dry-run
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Actual deployment (including NATS creds issuance)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;python3 scripts/provision_worker.py --worker new-worker-id --issue-creds
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Bulk deployment for the entire fleet.yaml&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;python3 scripts/provision_worker.py --all
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Internal operations:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Reads worker entries from &lt;code&gt;fleet.yaml&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Reads target hosts from &lt;code&gt;servers.yaml&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Renders Jinja2 templates.&lt;/li&gt;
&lt;li&gt;Deploys &lt;code&gt;/etc/systemd/system/{worker_id}-poller.service&lt;/code&gt; via SSH.&lt;/li&gt;
&lt;li&gt;Creates the working directory.&lt;/li&gt;
&lt;li&gt;Executes &lt;code&gt;systemctl daemon-reload &amp;amp;&amp;amp; enable --now&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;(Optional) Issues NATS NKey with &lt;code&gt;nsc add user&lt;/code&gt; → deploys creds → regenerates &lt;code&gt;auth.conf&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h2 id="distributed-hosting-adding-workers-to-a-second-server"&gt;Distributed Hosting: Adding Workers to a Second Server
&lt;/h2&gt;&lt;p&gt;Running all workers on a single server creates a single point of failure. We added Claude workers to a second host.&lt;/p&gt;
&lt;p&gt;The method for workers on the second host to connect to the NATS broker is via an autossh tunnel.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-ini" data-lang="ini"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;[Unit]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;Description&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;NATS Broker Tunnel&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;After&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;network-online.target&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;[Service]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ExecStart&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;/usr/bin/autossh -N \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; -L 4222:127.0.0.1:4222 \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; -i /home/ubuntu/.ssh/id_ed25519 \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; broker-host&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;Restart&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;always&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;RestartSec&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With this configuration active, workers always connect to &lt;code&gt;nats://127.0.0.1:4222&lt;/code&gt;. They don&amp;rsquo;t need to know the broker host&amp;rsquo;s address. As long as the tunnel is alive, it works the same way from any host.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="nats-credential-operations-experience"&gt;NATS Credential Operations Experience
&lt;/h2&gt;&lt;p&gt;NATS NKey management was the most complex part of the implementation.&lt;/p&gt;
&lt;p&gt;NATS JetStream&amp;rsquo;s authentication structure is hierarchical.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Operator (Root Signing Authority)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; └── Account: SYS (System Account)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; └── Account: Services (Worker Account)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ├── User: cc-dev-01
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ├── User: cc-go-dev-01
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ├── User: codex-py-dev-01
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; └── ...
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Each worker has an independent User NKey and can publish/subscribe within the permissions scope (&lt;code&gt;af.&amp;gt;&lt;/code&gt;, &lt;code&gt;_INBOX.&amp;gt;&lt;/code&gt;, &lt;code&gt;$JS.&amp;gt;&lt;/code&gt;) of the Services account.&lt;/p&gt;
&lt;p&gt;Adding a new worker requires the Operator&amp;rsquo;s signing key. We initially made the mistake of not backing up this key, leading to its loss. Consequently, we had to regenerate the entire Operator and replace all worker credentials en masse. The service downtime was approximately 60 seconds.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Regeneration procedure&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;nsc add operator AgentForge
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;nsc add account SYS
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;nsc add account Services
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; worker in cc-dev-01 cc-go-dev-01 ...; &lt;span style="color:#66d9ef"&gt;do&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; nsc add user --account Services --name $worker &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --allow-pub &lt;span style="color:#e6db74"&gt;&amp;#34;af.&amp;gt;,_INBOX.&amp;gt;,&lt;/span&gt;$JS&lt;span style="color:#e6db74"&gt;.&amp;gt;&amp;#34;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --allow-sub &lt;span style="color:#e6db74"&gt;&amp;#34;af.&amp;gt;,_INBOX.&amp;gt;,&lt;/span&gt;$JS&lt;span style="color:#e6db74"&gt;.&amp;gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;done&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;nsc generate config --mem-resolver --sys-account SYS &amp;gt; auth.new.conf
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;hr&gt;
&lt;h2 id="adding-a-new-worker-the-full-procedure"&gt;Adding a New Worker: The Full Procedure
&lt;/h2&gt;&lt;p&gt;Since the completion of this system, adding a new worker is straightforward.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 1&lt;/strong&gt;: Add an entry to &lt;code&gt;fleet.yaml&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;- &lt;span style="color:#f92672"&gt;worker_id&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;my-new-worker&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;llm&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;claude-code&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;model&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;claude-haiku-4-5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;lang&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;multi&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;role&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;developer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;host&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;worker-node-1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;enabled&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;create_pr&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;false&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Step 2&lt;/strong&gt;: Preview&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;python3 scripts/provision_worker.py --worker my-new-worker --dry-run
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Step 3&lt;/strong&gt;: Actual Deployment&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;python3 scripts/provision_worker.py --worker my-new-worker --issue-creds
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That&amp;rsquo;s it. Template rendering, SSH deployment, NATS credential issuance, and service registration are all handled by a single command.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="next-steps"&gt;Next Steps
&lt;/h2&gt;&lt;p&gt;The current system is structured such that workers process tasks independently. Future plans include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Routing Policies&lt;/strong&gt;: Automatically selecting the appropriate worker based on task characteristics (e.g., Go code → &lt;code&gt;claude-go-dev&lt;/code&gt;, cost-first → ZAI lightweight tier).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Results Comparison Dashboard&lt;/strong&gt;: A UI to display fan-out results side-by-side.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost Tracking&lt;/strong&gt;: Aggregating API call costs per worker.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The code is publicly available on GitHub.&lt;/p&gt;</description></item><item><title>I Sent the Same Coding Task to 4 AIs Simultaneously</title><link>https://blog.fcoinfup.com/post/i-sent-the-same-coding-task-to-4-ais-simultaneously/</link><pubDate>Fri, 08 May 2026 21:55:39 +0900</pubDate><guid>https://blog.fcoinfup.com/post/i-sent-the-same-coding-task-to-4-ais-simultaneously/</guid><description>&lt;p&gt;What happens when the same bug-fixing task is sent to Claude, ZAI (GLM), OpenAI Codex, and Google Gemini simultaneously?&lt;/p&gt;
&lt;p&gt;This question sparked the AgentForge project. We built a system that connects multiple LLM CLIs with the NATS JetStream message queue to process the same tasks in parallel, and in the process, we made some unexpected discoveries. This article focuses on the comparative experimental findings during the setup phase.&lt;/p&gt;
&lt;p&gt;The system&amp;rsquo;s design and implementation will be covered in Part 2.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="list-of-ais-tested"&gt;List of AIs Tested
&lt;/h2&gt;&lt;p&gt;The final configuration of 18 operational workers is as follows:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Family&lt;/th&gt;
 &lt;th&gt;Model&lt;/th&gt;
 &lt;th&gt;Notes&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Claude Code&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;claude-sonnet-4-6&lt;/td&gt;
 &lt;td&gt;Main development worker&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Claude Code&lt;/td&gt;
 &lt;td&gt;claude-sonnet-4-5&lt;/td&gt;
 &lt;td&gt;Previous generation comparison&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Claude Code&lt;/td&gt;
 &lt;td&gt;claude-haiku-4-5&lt;/td&gt;
 &lt;td&gt;Lightweight &amp;amp; High-speed&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Claude Code&lt;/td&gt;
 &lt;td&gt;claude-opus-4-6&lt;/td&gt;
 &lt;td&gt;Top-tier&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Claude Code&lt;/td&gt;
 &lt;td&gt;claude-opus-4-5&lt;/td&gt;
 &lt;td&gt;Previous generation comparison&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;ZAI (GLM)&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;glm-5.1&lt;/td&gt;
 &lt;td&gt;High-tier&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;ZAI (GLM)&lt;/td&gt;
 &lt;td&gt;glm-4.7&lt;/td&gt;
 &lt;td&gt;Mid-tier&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;ZAI (GLM)&lt;/td&gt;
 &lt;td&gt;glm-4.5-air&lt;/td&gt;
 &lt;td&gt;Lightweight tier&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;OpenAI Codex&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;gpt-5.5&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Codex&lt;/td&gt;
 &lt;td&gt;gpt-5.4&lt;/td&gt;
 &lt;td&gt;1M context&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Codex&lt;/td&gt;
 &lt;td&gt;gpt-5.4-mini&lt;/td&gt;
 &lt;td&gt;400K context&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Codex&lt;/td&gt;
 &lt;td&gt;gpt-5.3-codex&lt;/td&gt;
 &lt;td&gt;272K context&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Google Gemini&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;gemini-2.5-flash&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Gemini&lt;/td&gt;
 &lt;td&gt;gemini-2.5-pro&lt;/td&gt;
 &lt;td&gt;High-tier&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Gemini&lt;/td&gt;
 &lt;td&gt;gemini-2.5-flash-lite&lt;/td&gt;
 &lt;td&gt;Lightweight&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The list was much shorter when we first started. It grew as we experimented with which models were available.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="discovery-1-claude-3x-series-is-already-inaccessible"&gt;Discovery 1: Claude 3.x Series is Already Inaccessible
&lt;/h2&gt;&lt;p&gt;Those who have used Claude Code for a long time might recall Claude 3.7 Sonnet, 3.5 Sonnet, and 3.5 Haiku. We attempted to add these models as workers.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;claude --model claude-3-7-sonnet-20250219 --print &lt;span style="color:#e6db74"&gt;&amp;#34;hello&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# → &amp;#34;may not exist or no access&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;All three models returned the same error. The Claude 3 series reached its EOL in early 2026, and access via the Claude Code CLI has been blocked. Currently, only the 4.x series is available with a Claude Code subscription.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;: Claude workers were configured using only the 4.5/4.6 series.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="discovery-2-limited-model-selection-for-chatgpt-account-codex"&gt;Discovery 2: Limited Model Selection for ChatGPT Account Codex
&lt;/h2&gt;&lt;p&gt;The OpenAI Codex CLI authenticates with a ChatGPT Plus/Pro account or a separate API key. If authenticated via a ChatGPT account, the accessible models are limited.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;codex --model gpt-5.5-pro &lt;span style="color:#e6db74"&gt;&amp;#34;fix the bug&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# → &amp;#34;Model gpt-5.5-pro is not supported with ChatGPT account&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;codex --model gpt-5.5 &lt;span style="color:#e6db74"&gt;&amp;#34;fix the bug&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# → Works normally&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Models available with a ChatGPT account:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Model&lt;/th&gt;
 &lt;th&gt;Context&lt;/th&gt;
 &lt;th&gt;Inference Level&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;gpt-5.5&lt;/td&gt;
 &lt;td&gt;1M / 1M&lt;/td&gt;
 &lt;td&gt;High&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;gpt-5.4&lt;/td&gt;
 &lt;td&gt;1M / 1M&lt;/td&gt;
 &lt;td&gt;Medium&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;gpt-5.4-mini&lt;/td&gt;
 &lt;td&gt;400K / 400K&lt;/td&gt;
 &lt;td&gt;Medium&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;gpt-5.3-codex&lt;/td&gt;
 &lt;td&gt;272K / 400K&lt;/td&gt;
 &lt;td&gt;Medium&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;All other models, including &lt;code&gt;gpt-5.5-pro&lt;/code&gt;, returned a &amp;ldquo;not supported with ChatGPT account&amp;rdquo; error. More models are available with an API key, but that&amp;rsquo;s a different approach.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="discovery-3-gemini-cli-only-supports-25-series"&gt;Discovery 3: Gemini CLI Only Supports 2.5 Series
&lt;/h2&gt;&lt;p&gt;We tested various models with the Gemini CLI (&lt;code&gt;gemini&lt;/code&gt; binary).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;gemini -p &lt;span style="color:#e6db74"&gt;&amp;#34;hello&amp;#34;&lt;/span&gt; -m gemini-2.0-flash
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# → ModelNotFoundError: models/gemini-2.0-flash is not found&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;gemini -p &lt;span style="color:#e6db74"&gt;&amp;#34;hello&amp;#34;&lt;/span&gt; -m gemini-1.5-pro
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# → ModelNotFoundError&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;gemini -p &lt;span style="color:#e6db74"&gt;&amp;#34;hello&amp;#34;&lt;/span&gt; -m gemini-2.5-flash
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# → Works normally&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Gemini models accessible with the current account:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;gemini-2.5-flash&lt;/code&gt; — Default recommended model&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gemini-2.5-pro&lt;/code&gt; — High-tier&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gemini-2.5-flash-lite&lt;/code&gt; — Lightweight&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Versions of Gemini 2.0 and below return &lt;code&gt;ModelNotFoundError&lt;/code&gt;. While this might vary based on account plan or API key type, based on the Gemini CLI, only the 2.5 series worked reliably.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="discovery-4-zai-can-be-bypassed-with-claude-sdk"&gt;Discovery 4: ZAI Can Be Bypassed with Claude SDK
&lt;/h2&gt;&lt;p&gt;ZAI is a service that provides an endpoint compatible with the Anthropic API. This allows us to use GLM models with the Claude Code CLI by changing just two environment variables.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ANTHROPIC_BASE_URL&lt;span style="color:#f92672"&gt;=&lt;/span&gt;https://&amp;lt;ZAI endpoint&amp;gt; &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ANTHROPIC_AUTH_TOKEN&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&amp;lt;ZAI_KEY&amp;gt; &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;claude --model glm-5.1 --print &lt;span style="color:#e6db74"&gt;&amp;#34;fix the bug&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Since Claude Code internally uses the Anthropic Python SDK, simply overriding &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; allows calling ZAI&amp;rsquo;s GLM models with the same format. It was interesting that we could reuse the existing &lt;code&gt;claude&lt;/code&gt; backend without any separate adapter code.&lt;/p&gt;
&lt;p&gt;The three GLM models used were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;glm-5.1&lt;/code&gt; — High-tier&lt;/li&gt;
&lt;li&gt;&lt;code&gt;glm-4.7&lt;/code&gt; — Cost-performance balance&lt;/li&gt;
&lt;li&gt;&lt;code&gt;glm-4.5-air&lt;/code&gt; — Lightweight &amp;amp; High-speed&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="4-way-fan-out-comparison-test"&gt;4-Way Fan-out Comparison Test
&lt;/h2&gt;&lt;p&gt;We simultaneously issued the same Go bug-fixing task to 4 representative workers out of the 18 (Claude Sonnet, GLM-5.1, Codex gpt-5.5, Gemini 2.5 Flash).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Task: &amp;#34;fix the off-by-one error in the binary search function&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Response times (wall clock):&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Worker&lt;/th&gt;
 &lt;th&gt;Model&lt;/th&gt;
 &lt;th&gt;Response Time&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;cc-go-dev-01&lt;/td&gt;
 &lt;td&gt;claude-sonnet-4-6&lt;/td&gt;
 &lt;td&gt;~8 seconds&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;cc-zai-high-dev-01&lt;/td&gt;
 &lt;td&gt;glm-5.1&lt;/td&gt;
 &lt;td&gt;~12 seconds&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;codex-py-dev-01&lt;/td&gt;
 &lt;td&gt;gpt-5.5&lt;/td&gt;
 &lt;td&gt;~15 seconds&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;gemini-py-dev-01&lt;/td&gt;
 &lt;td&gt;gemini-2.5-flash&lt;/td&gt;
 &lt;td&gt;~10 seconds&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;More interesting than the response times were the differences in their approaches. Claude tended to refactor the entire function, while Gemini preferred minimal modifications. Codex often included test code along with the fix.&lt;/p&gt;
&lt;p&gt;Of course, this is a single task result and has no statistical significance. It was a verification at the &amp;ldquo;does it actually work&amp;rdquo; level, not a benchmark.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="distributed-workers-adding-a-second-host"&gt;Distributed Workers: Adding a Second Host
&lt;/h2&gt;&lt;p&gt;If all workers are on the same server, the comparative experiment loses some of its meaning. Therefore, we added Claude workers to a second host.&lt;/p&gt;
&lt;p&gt;The method for workers to access the NATS broker (on the first host) from the second host is via an &lt;code&gt;autossh&lt;/code&gt; tunnel.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-ini" data-lang="ini"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;[Service]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ExecStart&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;autossh -N -L 4222:127.0.0.1:4222 broker-host&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;By forwarding the local port 4222 to the broker, workers can connect to &lt;code&gt;nats://127.0.0.1:4222&lt;/code&gt; from any host without code changes.&lt;/p&gt;
&lt;p&gt;Advantage of this method: Workers don&amp;rsquo;t need to know where the broker is. They can always connect to &lt;code&gt;localhost:4222&lt;/code&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="most-panicked-moment-during-operation"&gt;Most Panicked Moment During Operation
&lt;/h2&gt;&lt;p&gt;The most distressing situation was losing the NATS operator signing key. NATS JetStream uses NKey-based authentication, and the operator/account&amp;rsquo;s signing key (nsc seed) is required to issue credentials for new workers.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;nsc add user --account Services --name new-worker
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# → &amp;#34;signing key not found&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;There was no backup. Ultimately, we had to perform a large-scale cutover, regenerating the entire NATS operator and replacing all worker credentials with a new permission tree. Service downtime was approximately 60 seconds.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Lesson&lt;/strong&gt;: Always create an offline backup of the NATS operator seed immediately after generation. If it&amp;rsquo;s lost, regeneration is the only option.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="summary"&gt;Summary
&lt;/h2&gt;&lt;p&gt;Practical conclusions from this experiment:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Claude 3.x is EOL&lt;/strong&gt; - Inaccessible via Claude Code CLI as of 2026. Use only 4.x.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Codex ChatGPT Account Limited to 4 Models&lt;/strong&gt; - gpt-5.5, 5.4, 5.4-mini, 5.3-codex. Pro models require a separate API key.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gemini Only 2.5 Series&lt;/strong&gt; - Previous versions inaccessible via CLI.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ZAI Integrable via Claude SDK Environment Variable Override&lt;/strong&gt; - No separate adapter needed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;NATS NKey Must Be Backed Up&lt;/strong&gt; - Losing the signing key means reissuing everything.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The next installment will cover how these workers are connected, discussing system design and implementation.&lt;/p&gt;</description></item><item><title>AgentForge Blog Automation Service: Full Architecture - From AI Comments to Translation and Post Generation</title><link>https://blog.fcoinfup.com/post/2026-05-05-001-agentforge-blog-automation-architecture/</link><pubDate>Tue, 05 May 2026 00:30:00 +0900</pubDate><guid>https://blog.fcoinfup.com/post/2026-05-05-001-agentforge-blog-automation-architecture/</guid><description>&lt;p&gt;Running a blog involves three of the most tedious tasks: replying to comments, maintaining English translations, and consistently writing new posts. The &lt;a class="link" href="https://github.com/yarang" target="_blank" rel="noopener"
 &gt;AgentForge&lt;/a&gt; project automates all three with AI agents.&lt;/p&gt;
&lt;p&gt;This post outlines the complete architecture of our blog automation service, which operates across two servers.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="system-topology"&gt;System Topology
&lt;/h2&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;┌─────────────────────┐ HTTPS ┌─────────────────────┐
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│ arm1 server │ ──────────────▶ │ ec1 server │
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│ (Agent Operator) │ │ (Blog Hosting) │
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;├─────────────────────┤ ├─────────────────────┤
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│ blog-agent (:8081) │ │ Hugo (nginx) │
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│ ├─ CommentHandler │ │ Blog API (:8000) │
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│ ├─ TranslateHandler│ │ ├─ translator.py │
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│ └─ PostGenerator │ │ ├─ blog_manager.py │
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│ │ │ └─ git_handler.py │
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│ NATS / PostgreSQL │ │ │
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;│ Prometheus / Grafana │ │ Git (yarang/blogs) │
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;└─────────────────────┘ └─────────────────────┘
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Server&lt;/th&gt;
 &lt;th&gt;Role&lt;/th&gt;
 &lt;th&gt;Core Services&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;arm1&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Agent Operator&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;blog-agent.service&lt;/code&gt; — Flask + Scheduler + LLM Client&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;ec1&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Blog Hosting + API&lt;/td&gt;
 &lt;td&gt;Hugo (nginx) + &lt;code&gt;blog-api.service&lt;/code&gt; (FastAPI)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Communication between the two servers is restricted to &lt;strong&gt;HTTPS API calls only&lt;/strong&gt;. SSH access from arm1 to ec1 is blocked, so all integrations are done through the Blog API.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="arm1-unified-blog-agent"&gt;arm1: Unified Blog Agent
&lt;/h2&gt;&lt;h3 id="why-unified"&gt;Why Unified?
&lt;/h3&gt;&lt;p&gt;Initially, comment response, translation, and post generation operated as separate processes (three systemd services). The issues were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Using Claude Code CLI (&lt;code&gt;--print&lt;/code&gt;) for calls resulted in a &lt;strong&gt;response time of 9.7 seconds&lt;/strong&gt; and consumed 688MB of disk space.&lt;/li&gt;
&lt;li&gt;Managing six systemd units was burdensome.&lt;/li&gt;
&lt;li&gt;No state sharing between processes was possible.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By unifying these into &lt;strong&gt;one process&lt;/strong&gt; and switching to direct LLM API calls, we achieved the following:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Metric&lt;/th&gt;
 &lt;th&gt;Before&lt;/th&gt;
 &lt;th&gt;After&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Response Time&lt;/td&gt;
 &lt;td&gt;9.7s&lt;/td&gt;
 &lt;td&gt;1.7s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Disk Usage&lt;/td&gt;
 &lt;td&gt;688MB&lt;/td&gt;
 &lt;td&gt;~50MB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;systemd Units&lt;/td&gt;
 &lt;td&gt;6&lt;/td&gt;
 &lt;td&gt;1&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Processes&lt;/td&gt;
 &lt;td&gt;3&lt;/td&gt;
 &lt;td&gt;1&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="architecture"&gt;Architecture
&lt;/h3&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;class&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;BlogAgent&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&amp;#34;1 Process = Flask (webhook) + Scheduler (timer) + LLM Client&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;def&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;__init__&lt;/span&gt;(self):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; self&lt;span style="color:#f92672"&gt;.&lt;/span&gt;config &lt;span style="color:#f92672"&gt;=&lt;/span&gt; AgentConfig&lt;span style="color:#f92672"&gt;.&lt;/span&gt;from_credentials()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; self&lt;span style="color:#f92672"&gt;.&lt;/span&gt;llm &lt;span style="color:#f92672"&gt;=&lt;/span&gt; LLMClient(self&lt;span style="color:#f92672"&gt;.&lt;/span&gt;config) &lt;span style="color:#75715e"&gt;# ZAI glm-4.7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; self&lt;span style="color:#f92672"&gt;.&lt;/span&gt;api &lt;span style="color:#f92672"&gt;=&lt;/span&gt; BlogAPIClient(self&lt;span style="color:#f92672"&gt;.&lt;/span&gt;config) &lt;span style="color:#75715e"&gt;# ec1 Blog API&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;# Handlers&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; self&lt;span style="color:#f92672"&gt;.&lt;/span&gt;comment &lt;span style="color:#f92672"&gt;=&lt;/span&gt; CommentHandler(self&lt;span style="color:#f92672"&gt;.&lt;/span&gt;llm, self&lt;span style="color:#f92672"&gt;.&lt;/span&gt;config)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; self&lt;span style="color:#f92672"&gt;.&lt;/span&gt;translate &lt;span style="color:#f92672"&gt;=&lt;/span&gt; TranslateHandler(self&lt;span style="color:#f92672"&gt;.&lt;/span&gt;api)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; self&lt;span style="color:#f92672"&gt;.&lt;/span&gt;post_gen &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PostGenerator(self&lt;span style="color:#f92672"&gt;.&lt;/span&gt;llm, self&lt;span style="color:#f92672"&gt;.&lt;/span&gt;api)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;# Scheduler&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; self&lt;span style="color:#f92672"&gt;.&lt;/span&gt;scheduler &lt;span style="color:#f92672"&gt;=&lt;/span&gt; Scheduler()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; self&lt;span style="color:#f92672"&gt;.&lt;/span&gt;scheduler&lt;span style="color:#f92672"&gt;.&lt;/span&gt;every(hours&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, task&lt;span style="color:#f92672"&gt;=&lt;/span&gt;self&lt;span style="color:#f92672"&gt;.&lt;/span&gt;translate&lt;span style="color:#f92672"&gt;.&lt;/span&gt;check_and_sync)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; self&lt;span style="color:#f92672"&gt;.&lt;/span&gt;scheduler&lt;span style="color:#f92672"&gt;.&lt;/span&gt;daily_at(hour&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt;, task&lt;span style="color:#f92672"&gt;=&lt;/span&gt;self&lt;span style="color:#f92672"&gt;.&lt;/span&gt;post_gen&lt;span style="color:#f92672"&gt;.&lt;/span&gt;generate_and_publish)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="module-operations"&gt;Module Operations
&lt;/h3&gt;&lt;h4 id="1-commenthandler--ai-comment-response"&gt;1. CommentHandler — AI Comment Response
&lt;/h4&gt;&lt;p&gt;Receives Webhook events from GitHub Discussions to automatically generate AI comments.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[User Comment] → GitHub Webhook → arm1 Flask → CommentHandler
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; → LLM Call (ZAI glm-4.7) → Generate Reply → Post Comment via GitHub API
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Trigger&lt;/strong&gt;: Webhook event-based (real-time)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Filtering&lt;/strong&gt;: Skips blog owner comments and AI-generated comments.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Security&lt;/strong&gt;: HMAC-SHA256 Webhook secret verification, Flask-Limiter applied.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="2-translatehandler--automatic-translation-trigger"&gt;2. TranslateHandler — Automatic Translation Trigger
&lt;/h4&gt;&lt;p&gt;Requests translation synchronization from ec1&amp;rsquo;s Blog API every 6 hours.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[Scheduler 6h] → TranslateHandler.check_and_sync()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; → POST /translate/sync → ec1 Blog API performs actual translation
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;arm1 does not perform the translation itself; it only sends a &lt;strong&gt;trigger&lt;/strong&gt; to the ec1 API. The actual translation logic resides in &lt;code&gt;translator.py&lt;/code&gt; on ec1.&lt;/p&gt;
&lt;h4 id="3-postgenerator--automatic-post-generation"&gt;3. PostGenerator — Automatic Post Generation
&lt;/h4&gt;&lt;p&gt;Automatically generates technical blog posts every day at 9 AM.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[Scheduler 09:00 KST] → PostGenerator.generate_and_publish()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; → Collect existing topics → Refer to RSS trends → Generate content with LLM
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; → Deduplication Check → Publish via Blog API
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Deduplication&lt;/strong&gt; is key. It compares the similarity between new titles and the last 100 existing titles using &lt;code&gt;difflib.SequenceMatcher&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;def&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;_is_duplicate_title&lt;/span&gt;(self, new_title, existing_titles):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&amp;#34;Considers it a duplicate if the ratio is &amp;gt;= 0.6&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; new_lower &lt;span style="color:#f92672"&gt;=&lt;/span&gt; new_title&lt;span style="color:#f92672"&gt;.&lt;/span&gt;lower()&lt;span style="color:#f92672"&gt;.&lt;/span&gt;strip()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; title &lt;span style="color:#f92672"&gt;in&lt;/span&gt; existing_titles[&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;:]:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ex_lower &lt;span style="color:#f92672"&gt;=&lt;/span&gt; title&lt;span style="color:#f92672"&gt;.&lt;/span&gt;lower()&lt;span style="color:#f92672"&gt;.&lt;/span&gt;strip()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ratio &lt;span style="color:#f92672"&gt;=&lt;/span&gt; difflib&lt;span style="color:#f92672"&gt;.&lt;/span&gt;SequenceMatcher(&lt;span style="color:#66d9ef"&gt;None&lt;/span&gt;, new_lower, ex_lower)&lt;span style="color:#f92672"&gt;.&lt;/span&gt;ratio()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; ratio &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0.6&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;False&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;hr&gt;
&lt;h2 id="ec1-blog-api-translation-system"&gt;ec1: Blog API Translation System
&lt;/h2&gt;&lt;h3 id="transition-to-gemini"&gt;Transition to Gemini
&lt;/h3&gt;&lt;p&gt;Initially, translations were performed using ZAI (glm-4.7), but a critical issue arose:&lt;/p&gt;

 &lt;blockquote&gt;
 &lt;p&gt;glm-4.7 is a &lt;strong&gt;reasoning model&lt;/strong&gt;, which first consumes its &lt;code&gt;max_tokens&lt;/code&gt; budget for &lt;code&gt;reasoning_content&lt;/code&gt; (internal thought process). If &lt;code&gt;max_tokens=256&lt;/code&gt;, it uses all 256 tokens for reasoning, leaving the actual &lt;code&gt;content&lt;/code&gt; as an empty string.&lt;/p&gt;

 &lt;/blockquote&gt;
&lt;p&gt;This led to an incident where &lt;strong&gt;nine English posts were translated with empty string titles&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Solution: Replaced with &lt;strong&gt;Gemini 2.5 Flash Lite&lt;/strong&gt;.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Item&lt;/th&gt;
 &lt;th&gt;ZAI (Previous)&lt;/th&gt;
 &lt;th&gt;Gemini (Current)&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Model&lt;/td&gt;
 &lt;td&gt;glm-4.7 (reasoning)&lt;/td&gt;
 &lt;td&gt;gemini-2.5-flash-lite&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Translation Time&lt;/td&gt;
 &lt;td&gt;~30s/post&lt;/td&gt;
 &lt;td&gt;~8s/post&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Cost&lt;/td&gt;
 &lt;td&gt;Paid API&lt;/td&gt;
 &lt;td&gt;Free (1,500 requests/day)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Empty Response Issue&lt;/td&gt;
 &lt;td&gt;Occurred&lt;/td&gt;
 &lt;td&gt;None&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="openai-compatible-endpoint"&gt;OpenAI-Compatible Endpoint
&lt;/h3&gt;&lt;p&gt;Gemini provides an OpenAI-compatible API. The existing code can be used &lt;strong&gt;without any changes&lt;/strong&gt; by simply switching the base URL:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LLM_BASE_URLS &lt;span style="color:#f92672"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;GEMINI&amp;#34;&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;https://generativelanguage.googleapis.com/v1beta/openai&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;ZAI&amp;#34;&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;https://api.z.ai/api/coding/paas/v4&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="translation-matching-logic"&gt;Translation Matching Logic
&lt;/h3&gt;&lt;p&gt;Pairing Korean↔English posts uses &lt;strong&gt;date prefix matching&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ko: 2026-05-04-001-개발-생산성-17배-극대화-deepseek-v4와-...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;en: 2026-05-04-001-개발-생산성-17배-극대화-deepseek-v4와-...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ↑ Same prefix = Same post
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Although the slugs might differ in language, if the &lt;code&gt;YYYY-MM-DD-NNN&lt;/code&gt; part is the same, it&amp;rsquo;s recognized as the same post. The prerequisite for this method is that &lt;strong&gt;no two posts with the same date and number exist&lt;/strong&gt;.&lt;/p&gt;
&lt;h3 id="title-in-body-translation-technique"&gt;Title-in-Body Translation Technique
&lt;/h3&gt;&lt;p&gt;Translating the title via a separate API call caused issues with empty results from the reasoning model. The solution is to &lt;strong&gt;include the title as the first line of the body&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# When requesting translation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;prompt &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;f&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;# &lt;/span&gt;&lt;span style="color:#e6db74"&gt;{&lt;/span&gt;original_title&lt;span style="color:#e6db74"&gt;}&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\n\n&lt;/span&gt;&lt;span style="color:#e6db74"&gt;{&lt;/span&gt;original_body&lt;span style="color:#e6db74"&gt;}&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Extracting the title from the translation result&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; translated&lt;span style="color:#f92672"&gt;.&lt;/span&gt;lstrip()&lt;span style="color:#f92672"&gt;.&lt;/span&gt;startswith(&lt;span style="color:#e6db74"&gt;&amp;#34;# &amp;#34;&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lines &lt;span style="color:#f92672"&gt;=&lt;/span&gt; translated&lt;span style="color:#f92672"&gt;.&lt;/span&gt;lstrip()&lt;span style="color:#f92672"&gt;.&lt;/span&gt;split(&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\n&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; extracted_title &lt;span style="color:#f92672"&gt;=&lt;/span&gt; lines[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;]&lt;span style="color:#f92672"&gt;.&lt;/span&gt;lstrip(&lt;span style="color:#e6db74"&gt;&amp;#34;# &amp;#34;&lt;/span&gt;)&lt;span style="color:#f92672"&gt;.&lt;/span&gt;strip()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; translated_body &lt;span style="color:#f92672"&gt;=&lt;/span&gt; lines[&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;]&lt;span style="color:#f92672"&gt;.&lt;/span&gt;lstrip(&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\n&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This translates the title and body simultaneously in a single API call, preserving context and saving tokens.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="llm-strategy-role-based-model-separation"&gt;LLM Strategy: Role-Based Model Separation
&lt;/h2&gt;&lt;p&gt;Not all tasks are handled by a single LLM. Models are separated based on the nature of the task.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Task&lt;/th&gt;
 &lt;th&gt;Server&lt;/th&gt;
 &lt;th&gt;Model&lt;/th&gt;
 &lt;th&gt;Reason&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;AI Comment Response&lt;/td&gt;
 &lt;td&gt;arm1&lt;/td&gt;
 &lt;td&gt;ZAI glm-4.7&lt;/td&gt;
 &lt;td&gt;Conversational, excellent Korean quality&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Post Generation&lt;/td&gt;
 &lt;td&gt;arm1&lt;/td&gt;
 &lt;td&gt;ZAI glm-4.7&lt;/td&gt;
 &lt;td&gt;Long-form content generation, creativity required&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Translation (ko→en)&lt;/td&gt;
 &lt;td&gt;ec1&lt;/td&gt;
 &lt;td&gt;Gemini Flash Lite&lt;/td&gt;
 &lt;td&gt;Non-reasoning, fast and free&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Core Principle: &lt;strong&gt;Do not use reasoning models for translation&lt;/strong&gt;. Reasoning models consume tokens for internal thought processes, making non-reasoning models more suitable for simple conversion tasks.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="monitoring-and-operations"&gt;Monitoring and Operations
&lt;/h2&gt;&lt;h3 id="health-check-endpoints"&gt;Health Check Endpoints
&lt;/h3&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# arm1 agent&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;curl http://arm1:8081/health
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# → {&amp;#34;status&amp;#34;:&amp;#34;healthy&amp;#34;,&amp;#34;agent&amp;#34;:&amp;#34;blog-agent&amp;#34;,&amp;#34;scheduler_jobs&amp;#34;:2,&amp;#34;uptime_sec&amp;#34;:...}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;curl http://arm1:8081/status
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# → {&amp;#34;scheduler&amp;#34;:[{&amp;#34;name&amp;#34;:&amp;#34;auto-translate&amp;#34;,&amp;#34;last_run&amp;#34;:...},{&amp;#34;name&amp;#34;:&amp;#34;post-generator&amp;#34;,&amp;#34;last_run&amp;#34;:&amp;#34;2026-05-04&amp;#34;}]}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# ec1 Blog API&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;curl https://blog.example.com/api/health
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# → {&amp;#34;status&amp;#34;:&amp;#34;healthy&amp;#34;,&amp;#34;version&amp;#34;:&amp;#34;2.0.0&amp;#34;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="observability-points"&gt;Observability Points
&lt;/h3&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Metric&lt;/th&gt;
 &lt;th&gt;Normal Range&lt;/th&gt;
 &lt;th&gt;Alert Condition&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;arm1 uptime&lt;/td&gt;
 &lt;td&gt;&amp;gt;0&lt;/td&gt;
 &lt;td&gt;Service Down&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;scheduler_jobs&lt;/td&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;td&gt;≠ 2&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Translation Sync&lt;/td&gt;
 &lt;td&gt;ko post count = en post count&lt;/td&gt;
 &lt;td&gt;Discrepancy occurs&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Post Generation&lt;/td&gt;
 &lt;td&gt;1 post daily&lt;/td&gt;
 &lt;td&gt;No posts for over 24 hours&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2 id="lessons-learned-and-operational-tips"&gt;Lessons Learned and Operational Tips
&lt;/h2&gt;&lt;h3 id="1-the-pitfall-of-reasoning-models"&gt;1. The Pitfall of Reasoning Models
&lt;/h3&gt;&lt;p&gt;It&amp;rsquo;s often not explicitly stated in documentation that &lt;code&gt;max_tokens&lt;/code&gt; &lt;strong&gt;combines&lt;/strong&gt; reasoning and content. If you get an empty response, check the &lt;code&gt;finish_reason&lt;/code&gt;—if it&amp;rsquo;s &lt;code&gt;&amp;quot;length&amp;quot;&lt;/code&gt;, it indicates insufficient token budget.&lt;/p&gt;
&lt;h3 id="2-value-of-the-openai-compatible-pattern"&gt;2. Value of the OpenAI-Compatible Pattern
&lt;/h3&gt;&lt;p&gt;When switching translation providers from ZAI to Gemini, the code change was just &lt;strong&gt;one line for the base URL&lt;/strong&gt;. Abstracting to an OpenAI-compatible interface from the start dramatically reduces LLM replacement costs.&lt;/p&gt;
&lt;h3 id="3-constraints-of-date-prefix-matching"&gt;3. Constraints of Date Prefix Matching
&lt;/h3&gt;&lt;p&gt;In the &lt;code&gt;YYYY-MM-DD-NNN&lt;/code&gt; pattern, if two or more posts share the same date and number, translation matching will break. The &lt;code&gt;PostGenerator&lt;/code&gt; must include logic to check the last number for that date and increment it when generating new posts.&lt;/p&gt;
&lt;h3 id="4-benefits-of-process-consolidation"&gt;4. Benefits of Process Consolidation
&lt;/h3&gt;&lt;p&gt;Consolidating three independent services into one resulted in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;State Sharing (LLM clients, configurations, API clients initialized only once)&lt;/li&gt;
&lt;li&gt;Simplified Deployment (one systemd unit)&lt;/li&gt;
&lt;li&gt;Easier Debugging (logs consolidated in one place)&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="future-plans"&gt;Future Plans
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Review the integration of arm1 agent&amp;rsquo;s LLM with Gemini.&lt;/li&gt;
&lt;li&gt;Comment Quality Evaluation Pipeline (monitoring the appropriateness of auto-generated comments).&lt;/li&gt;
&lt;li&gt;Automatic Translation Quality Verification (comparing with back-translation).&lt;/li&gt;
&lt;li&gt;Expanding inter-agent collaboration through the AgentForge framework.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;p&gt;Blog automation aims not for &amp;ldquo;complete automation,&amp;rdquo; but for &amp;ldquo;minimal human intervention.&amp;rdquo; A structure where AI generates content, humans review it, and the system alerts operators to anomalies is the key to stable operation.&lt;/p&gt;</description></item></channel></rss>