Agentforge on Yarang's Tech Lair

Building a Multi-LLM Distributed Orchestrator with NATS JetStream

Fri, 08 May 2026 21:57:11 +0900

Part 1 discussed the model-specific limitations discovered while running four AIs—Claude, ZAI, Codex, and Gemini—concurrently on the same tasks. This part is about “how we made it possible”—the system design and implementation story.

System Overview

AgentForge consists of three components.

[Task Publisher]
 │ NATS JetStream publish
 ▼
[NATS Broker] ─── af.worker.{id}.inbox
 │ JetStream consume (independent streams per worker)
 ▼
[Worker Pollers] × N (poller.py × 18 instances)
 │ LLM CLI Execution (claude / codex / gemini)
 ▼
[Result Return] af.task.{task_id}.completed

When a publisher posts a task to NATS, each worker, which is independently subscribed, receives the message on its inbox and executes the LLM CLI. The result is then published back to a completion topic.

Why NATS JetStream?

We considered several message broker options: Redis Streams, Kafka, RabbitMQ, and NATS JetStream.

Reasons for choosing NATS JetStream:

Single Binary — Operates with a single nats-server without requiring separate runtimes. It has no dependencies like Kafka’s ZooKeeper or RabbitMQ’s Erlang/OTP.
Built-in Persistence — JetStream is a streaming layer on top of NATS, storing messages to the filesystem. This ensures that unprocessed tasks are not lost even if a worker restarts.
NKey-based Authentication — We can issue independent Ed25519 key pairs for each worker. If one worker is compromised, the credentials of other workers remain valid.
Lightweight — Memory usage is around 30MB on a single server. Even with 18 workers connected, the broker load is minimal.

The Core: Backend Adapter in `poller.py`

The heart of the worker is poller.py. This single file handles NATS subscriptions, LLM CLI execution, and result returns.

Since LLMs have different execution methods, we separated them into a backend adapter dictionary.

_BACKENDS: dict[str, dict] = {
 "claude": {
 "bin": os.environ.get("CLAUDE_BIN", "/usr/local/bin/claude"),
 "tools": os.environ.get("ALLOWED_TOOLS", "Read,Edit,Write,Glob,Grep"),
 "model": os.environ.get("CLAUDE_MODEL", ""),
 },
 "codex": {
 "bin": os.environ.get("CODEX_BIN", "/usr/bin/codex"),
 "model": os.environ.get("CODEX_MODEL", ""),
 "sandbox": os.environ.get("CODEX_SANDBOX", "read-only"),
 },
 "gemini_cli": {
 "bin": os.environ.get("GEMINI_BIN", "/usr/bin/gemini"),
 "model": os.environ.get("GEMINI_MODEL", ""),
 },
}

The MODEL_BACKEND environment variable determines which LLM to use. This allows the same poller.py code to run different LLMs across 18 workers.

Claude Backend

async def run_claude(instructions: str, task_id: str) -> tuple[int, str]:
 cfg = _BACKENDS["claude"]
 cmd = [cfg["bin"], "--print", "--allowedTools", cfg["tools"]]
 if cfg.get("model"):
 cmd += ["--model", cfg["model"]]
 proc = await asyncio.create_subprocess_exec(*cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE)

The --print flag is key. It runs Claude Code in non-interactive mode instead of conversational mode, ensuring the results are returned via stdout.

ZAI Backend

ZAI offers an Anthropic API-compatible endpoint, so it doesn’t require a separate backend. Routing is handled by two environment variables.

# /etc/agentforge/cc-zai-high-dev-01.env
ANTHROPIC_BASE_URL=<ZAI endpoint>
ANTHROPIC_AUTH_TOKEN=<ZAI API key>

By injecting this file using systemd’s EnvironmentFile= directive, the claude binary sends requests to the ZAI endpoint. This allows us to connect to a different LLM provider simply by changing environment variables, without altering the code.

Declarative Management: `fleet.yaml` × `servers.yaml`

Manually managing 18 workers is impractical. We declaratively defined the entire infrastructure using two YAML files.

`servers.yaml` — Server Inventory

servers:
 - name: worker-node-1
 role: worker-host
 services: [agentforge-worker, tunnel-arm1]

 - name: broker-host
 role: broker-host
 services: [nats-jetstream, postgres]

 - name: worker-node-2
 role: worker-host
 services: [agentforge-worker, tunnel-arm1]

`fleet.yaml` — Worker Placement

workers:
 - worker_id: cc-go-dev-01
 llm: claude-code
 model: claude-sonnet-4-6
 lang: go
 role: developer
 host: worker-node-1
 enabled: true
 create_pr: true

 - worker_id: codex-py-dev-01
 llm: codex
 model: gpt-5.5
 lang: python
 role: developer
 host: worker-node-1
 enabled: true
 create_pr: false

Changing just the host field moves a worker to a different server. Setting enabled: false stops the deployment script from starting that worker.

Worker Templating System: `provision_worker.py`

Manually writing systemd unit files for each new worker is prone to errors. We automated this using Jinja2 templates and a provisioning script.

Template Structure

templates/
 systemd/
 claude.service.j2 # For claude-code and ZAI alike
 codex.service.j2 # OpenAI Codex
 gemini.service.j2 # Google Gemini CLI

The core part of claude.service.j2:

Environment=MODEL_BACKEND=claude
Environment=CLAUDE_BIN={{ claude_bin }}
{% if claude_model %}
Environment=CLAUDE_MODEL={{ claude_model }}
{% endif %}
{% if env_file %}
EnvironmentFile={{ env_file }}
{% endif %}
Environment=WORK_BASE={{ work_base }}
Environment=WORK_DIR={{ work_base }}/repo
Environment="{{ 'ALLOWED_TOOLS=' + allowed_tools }}"
Environment=CREATE_PR={{ 'true' if create_pr else 'false' }}
{% if create_pr and github_remote %}
Environment=GITHUB_REMOTE={{ github_remote }}
{% endif %}

For ZAI workers, the env_file block is activated, adding the EnvironmentFile. For PR creation workers, github_remote is injected. Other settings use defaults.

`provision_worker.py` Usage

# Preview (no actual deployment)
python3 scripts/provision_worker.py --worker new-worker-id --dry-run

# Actual deployment (including NATS creds issuance)
python3 scripts/provision_worker.py --worker new-worker-id --issue-creds

# Bulk deployment for the entire fleet.yaml
python3 scripts/provision_worker.py --all

Internal operations:

Reads worker entries from fleet.yaml.
Reads target hosts from servers.yaml.
Renders Jinja2 templates.
Deploys /etc/systemd/system/{worker_id}-poller.service via SSH.
Creates the working directory.
Executes systemctl daemon-reload && enable --now.
(Optional) Issues NATS NKey with nsc add user → deploys creds → regenerates auth.conf.

Distributed Hosting: Adding Workers to a Second Server

Running all workers on a single server creates a single point of failure. We added Claude workers to a second host.

The method for workers on the second host to connect to the NATS broker is via an autossh tunnel.

[Unit]
Description=NATS Broker Tunnel
After=network-online.target

[Service]
ExecStart=/usr/bin/autossh -N \
 -L 4222:127.0.0.1:4222 \
 -i /home/ubuntu/.ssh/id_ed25519 \
 broker-host
Restart=always
RestartSec=10

With this configuration active, workers always connect to nats://127.0.0.1:4222. They don’t need to know the broker host’s address. As long as the tunnel is alive, it works the same way from any host.

NATS Credential Operations Experience

NATS NKey management was the most complex part of the implementation.

NATS JetStream’s authentication structure is hierarchical.

Operator (Root Signing Authority)
 └── Account: SYS (System Account)
 └── Account: Services (Worker Account)
 ├── User: cc-dev-01
 ├── User: cc-go-dev-01
 ├── User: codex-py-dev-01
 └── ...

Each worker has an independent User NKey and can publish/subscribe within the permissions scope (af.>, _INBOX.>, $JS.>) of the Services account.

Adding a new worker requires the Operator’s signing key. We initially made the mistake of not backing up this key, leading to its loss. Consequently, we had to regenerate the entire Operator and replace all worker credentials en masse. The service downtime was approximately 60 seconds.

# Regeneration procedure
nsc add operator AgentForge
nsc add account SYS
nsc add account Services
for worker in cc-dev-01 cc-go-dev-01 ...; do
 nsc add user --account Services --name $worker \
 --allow-pub "af.>,_INBOX.>,$JS.>" \
 --allow-sub "af.>,_INBOX.>,$JS.>"
done
nsc generate config --mem-resolver --sys-account SYS > auth.new.conf

Adding a New Worker: The Full Procedure

Since the completion of this system, adding a new worker is straightforward.

Step 1: Add an entry to fleet.yaml

- worker_id: my-new-worker
 llm: claude-code
 model: claude-haiku-4-5
 lang: multi
 role: developer
 host: worker-node-1
 enabled: true
 create_pr: false

Step 2: Preview

python3 scripts/provision_worker.py --worker my-new-worker --dry-run

Step 3: Actual Deployment

python3 scripts/provision_worker.py --worker my-new-worker --issue-creds

That’s it. Template rendering, SSH deployment, NATS credential issuance, and service registration are all handled by a single command.

Next Steps

The current system is structured such that workers process tasks independently. Future plans include:

Routing Policies: Automatically selecting the appropriate worker based on task characteristics (e.g., Go code → claude-go-dev, cost-first → ZAI lightweight tier).
Results Comparison Dashboard: A UI to display fan-out results side-by-side.
Cost Tracking: Aggregating API call costs per worker.

The code is publicly available on GitHub.

I Sent the Same Coding Task to 4 AIs Simultaneously

Fri, 08 May 2026 21:55:39 +0900

What happens when the same bug-fixing task is sent to Claude, ZAI (GLM), OpenAI Codex, and Google Gemini simultaneously?

This question sparked the AgentForge project. We built a system that connects multiple LLM CLIs with the NATS JetStream message queue to process the same tasks in parallel, and in the process, we made some unexpected discoveries. This article focuses on the comparative experimental findings during the setup phase.

The system’s design and implementation will be covered in Part 2.

List of AIs Tested

The final configuration of 18 operational workers is as follows:

Family	Model	Notes
Claude Code	claude-sonnet-4-6	Main development worker
Claude Code	claude-sonnet-4-5	Previous generation comparison
Claude Code	claude-haiku-4-5	Lightweight & High-speed
Claude Code	claude-opus-4-6	Top-tier
Claude Code	claude-opus-4-5	Previous generation comparison
ZAI (GLM)	glm-5.1	High-tier
ZAI (GLM)	glm-4.7	Mid-tier
ZAI (GLM)	glm-4.5-air	Lightweight tier
OpenAI Codex	gpt-5.5
Codex	gpt-5.4	1M context
Codex	gpt-5.4-mini	400K context
Codex	gpt-5.3-codex	272K context
Google Gemini	gemini-2.5-flash
Gemini	gemini-2.5-pro	High-tier
Gemini	gemini-2.5-flash-lite	Lightweight

The list was much shorter when we first started. It grew as we experimented with which models were available.

Discovery 1: Claude 3.x Series is Already Inaccessible

Those who have used Claude Code for a long time might recall Claude 3.7 Sonnet, 3.5 Sonnet, and 3.5 Haiku. We attempted to add these models as workers.

claude --model claude-3-7-sonnet-20250219 --print "hello"
# → "may not exist or no access"

All three models returned the same error. The Claude 3 series reached its EOL in early 2026, and access via the Claude Code CLI has been blocked. Currently, only the 4.x series is available with a Claude Code subscription.

Conclusion: Claude workers were configured using only the 4.5/4.6 series.

Discovery 2: Limited Model Selection for ChatGPT Account Codex

The OpenAI Codex CLI authenticates with a ChatGPT Plus/Pro account or a separate API key. If authenticated via a ChatGPT account, the accessible models are limited.

codex --model gpt-5.5-pro "fix the bug"
# → "Model gpt-5.5-pro is not supported with ChatGPT account"

codex --model gpt-5.5 "fix the bug"
# → Works normally

Models available with a ChatGPT account:

Model	Context	Inference Level
gpt-5.5	1M / 1M	High
gpt-5.4	1M / 1M	Medium
gpt-5.4-mini	400K / 400K	Medium
gpt-5.3-codex	272K / 400K	Medium

All other models, including gpt-5.5-pro, returned a “not supported with ChatGPT account” error. More models are available with an API key, but that’s a different approach.

Discovery 3: Gemini CLI Only Supports 2.5 Series

We tested various models with the Gemini CLI (gemini binary).

gemini -p "hello" -m gemini-2.0-flash
# → ModelNotFoundError: models/gemini-2.0-flash is not found

gemini -p "hello" -m gemini-1.5-pro
# → ModelNotFoundError

gemini -p "hello" -m gemini-2.5-flash
# → Works normally

Gemini models accessible with the current account:

gemini-2.5-flash — Default recommended model
gemini-2.5-pro — High-tier
gemini-2.5-flash-lite — Lightweight

Versions of Gemini 2.0 and below return ModelNotFoundError. While this might vary based on account plan or API key type, based on the Gemini CLI, only the 2.5 series worked reliably.

Discovery 4: ZAI Can Be Bypassed with Claude SDK

ZAI is a service that provides an endpoint compatible with the Anthropic API. This allows us to use GLM models with the Claude Code CLI by changing just two environment variables.

ANTHROPIC_BASE_URL=https://<ZAI endpoint> \
ANTHROPIC_AUTH_TOKEN=<ZAI_KEY> \
claude --model glm-5.1 --print "fix the bug"

Since Claude Code internally uses the Anthropic Python SDK, simply overriding ANTHROPIC_BASE_URL allows calling ZAI’s GLM models with the same format. It was interesting that we could reuse the existing claude backend without any separate adapter code.

The three GLM models used were:

glm-5.1 — High-tier
glm-4.7 — Cost-performance balance
glm-4.5-air — Lightweight & High-speed

4-Way Fan-out Comparison Test

We simultaneously issued the same Go bug-fixing task to 4 representative workers out of the 18 (Claude Sonnet, GLM-5.1, Codex gpt-5.5, Gemini 2.5 Flash).

Task: "fix the off-by-one error in the binary search function"

Response times (wall clock):

Worker	Model	Response Time
cc-go-dev-01	claude-sonnet-4-6	~8 seconds
cc-zai-high-dev-01	glm-5.1	~12 seconds
codex-py-dev-01	gpt-5.5	~15 seconds
gemini-py-dev-01	gemini-2.5-flash	~10 seconds

More interesting than the response times were the differences in their approaches. Claude tended to refactor the entire function, while Gemini preferred minimal modifications. Codex often included test code along with the fix.

Of course, this is a single task result and has no statistical significance. It was a verification at the “does it actually work” level, not a benchmark.

Distributed Workers: Adding a Second Host

If all workers are on the same server, the comparative experiment loses some of its meaning. Therefore, we added Claude workers to a second host.

The method for workers to access the NATS broker (on the first host) from the second host is via an autossh tunnel.

[Service]
ExecStart=autossh -N -L 4222:127.0.0.1:4222 broker-host

By forwarding the local port 4222 to the broker, workers can connect to nats://127.0.0.1:4222 from any host without code changes.

Advantage of this method: Workers don’t need to know where the broker is. They can always connect to localhost:4222.

Most Panicked Moment During Operation

The most distressing situation was losing the NATS operator signing key. NATS JetStream uses NKey-based authentication, and the operator/account’s signing key (nsc seed) is required to issue credentials for new workers.

nsc add user --account Services --name new-worker
# → "signing key not found"

There was no backup. Ultimately, we had to perform a large-scale cutover, regenerating the entire NATS operator and replacing all worker credentials with a new permission tree. Service downtime was approximately 60 seconds.

Lesson: Always create an offline backup of the NATS operator seed immediately after generation. If it’s lost, regeneration is the only option.

Summary

Practical conclusions from this experiment:

Claude 3.x is EOL - Inaccessible via Claude Code CLI as of 2026. Use only 4.x.
Codex ChatGPT Account Limited to 4 Models - gpt-5.5, 5.4, 5.4-mini, 5.3-codex. Pro models require a separate API key.
Gemini Only 2.5 Series - Previous versions inaccessible via CLI.
ZAI Integrable via Claude SDK Environment Variable Override - No separate adapter needed.
NATS NKey Must Be Backed Up - Losing the signing key means reissuing everything.

The next installment will cover how these workers are connected, discussing system design and implementation.

AgentForge Blog Automation Service: Full Architecture - From AI Comments to Translation and Post Generation

Tue, 05 May 2026 00:30:00 +0900

Running a blog involves three of the most tedious tasks: replying to comments, maintaining English translations, and consistently writing new posts. The AgentForge project automates all three with AI agents.

This post outlines the complete architecture of our blog automation service, which operates across two servers.

System Topology

┌─────────────────────┐ HTTPS ┌─────────────────────┐
│ arm1 server │ ──────────────▶ │ ec1 server │
│ (Agent Operator) │ │ (Blog Hosting) │
├─────────────────────┤ ├─────────────────────┤
│ blog-agent (:8081) │ │ Hugo (nginx) │
│ ├─ CommentHandler │ │ Blog API (:8000) │
│ ├─ TranslateHandler│ │ ├─ translator.py │
│ └─ PostGenerator │ │ ├─ blog_manager.py │
│ │ │ └─ git_handler.py │
│ NATS / PostgreSQL │ │ │
│ Prometheus / Grafana │ │ Git (yarang/blogs) │
└─────────────────────┘ └─────────────────────┘

Server	Role	Core Services
arm1	Agent Operator	`blog-agent.service` — Flask + Scheduler + LLM Client
ec1	Blog Hosting + API	Hugo (nginx) + `blog-api.service` (FastAPI)

Communication between the two servers is restricted to HTTPS API calls only. SSH access from arm1 to ec1 is blocked, so all integrations are done through the Blog API.

arm1: Unified Blog Agent

Why Unified?

Initially, comment response, translation, and post generation operated as separate processes (three systemd services). The issues were:

Using Claude Code CLI (--print) for calls resulted in a response time of 9.7 seconds and consumed 688MB of disk space.
Managing six systemd units was burdensome.
No state sharing between processes was possible.

By unifying these into one process and switching to direct LLM API calls, we achieved the following:

Metric	Before	After
Response Time	9.7s	1.7s
Disk Usage	688MB	~50MB
systemd Units	6	1
Processes	3	1

Architecture

class BlogAgent:
 """1 Process = Flask (webhook) + Scheduler (timer) + LLM Client"""
 
 def __init__(self):
 self.config = AgentConfig.from_credentials()
 self.llm = LLMClient(self.config) # ZAI glm-4.7
 self.api = BlogAPIClient(self.config) # ec1 Blog API
 
 # Handlers
 self.comment = CommentHandler(self.llm, self.config)
 self.translate = TranslateHandler(self.api)
 self.post_gen = PostGenerator(self.llm, self.api)
 
 # Scheduler
 self.scheduler = Scheduler()
 self.scheduler.every(hours=6, task=self.translate.check_and_sync)
 self.scheduler.daily_at(hour=9, task=self.post_gen.generate_and_publish)

Module Operations

1. CommentHandler — AI Comment Response

Receives Webhook events from GitHub Discussions to automatically generate AI comments.

[User Comment] → GitHub Webhook → arm1 Flask → CommentHandler
 → LLM Call (ZAI glm-4.7) → Generate Reply → Post Comment via GitHub API

Trigger: Webhook event-based (real-time)
Filtering: Skips blog owner comments and AI-generated comments.
Security: HMAC-SHA256 Webhook secret verification, Flask-Limiter applied.

2. TranslateHandler — Automatic Translation Trigger

Requests translation synchronization from ec1’s Blog API every 6 hours.

[Scheduler 6h] → TranslateHandler.check_and_sync()
 → POST /translate/sync → ec1 Blog API performs actual translation

arm1 does not perform the translation itself; it only sends a trigger to the ec1 API. The actual translation logic resides in translator.py on ec1.

3. PostGenerator — Automatic Post Generation

Automatically generates technical blog posts every day at 9 AM.

[Scheduler 09:00 KST] → PostGenerator.generate_and_publish()
 → Collect existing topics → Refer to RSS trends → Generate content with LLM
 → Deduplication Check → Publish via Blog API

Deduplication is key. It compares the similarity between new titles and the last 100 existing titles using difflib.SequenceMatcher:

def _is_duplicate_title(self, new_title, existing_titles):
 """Considers it a duplicate if the ratio is >= 0.6"""
 new_lower = new_title.lower().strip()
 for title in existing_titles[-100:]:
 ex_lower = title.lower().strip()
 ratio = difflib.SequenceMatcher(None, new_lower, ex_lower).ratio()
 if ratio >= 0.6:
 return True
 return False

ec1: Blog API Translation System

Transition to Gemini

Initially, translations were performed using ZAI (glm-4.7), but a critical issue arose:

glm-4.7 is a reasoning model, which first consumes its max_tokens budget for reasoning_content (internal thought process). If max_tokens=256, it uses all 256 tokens for reasoning, leaving the actual content as an empty string.

This led to an incident where nine English posts were translated with empty string titles.

Solution: Replaced with Gemini 2.5 Flash Lite.

Item	ZAI (Previous)	Gemini (Current)
Model	glm-4.7 (reasoning)	gemini-2.5-flash-lite
Translation Time	~30s/post	~8s/post
Cost	Paid API	Free (1,500 requests/day)
Empty Response Issue	Occurred	None

OpenAI-Compatible Endpoint

Gemini provides an OpenAI-compatible API. The existing code can be used without any changes by simply switching the base URL:

LLM_BASE_URLS = {
 "GEMINI": "https://generativelanguage.googleapis.com/v1beta/openai",
 "ZAI": "https://api.z.ai/api/coding/paas/v4",
}

Translation Matching Logic

Pairing Korean↔English posts uses date prefix matching:

ko: 2026-05-04-001-개발-생산성-17배-극대화-deepseek-v4와-...
en: 2026-05-04-001-개발-생산성-17배-극대화-deepseek-v4와-...
 ↑ Same prefix = Same post

Although the slugs might differ in language, if the YYYY-MM-DD-NNN part is the same, it’s recognized as the same post. The prerequisite for this method is that no two posts with the same date and number exist.

Title-in-Body Translation Technique

Translating the title via a separate API call caused issues with empty results from the reasoning model. The solution is to include the title as the first line of the body:

# When requesting translation
prompt = f"# {original_title}\n\n{original_body}"

# Extracting the title from the translation result
if translated.lstrip().startswith("# "):
 lines = translated.lstrip().split("\n", 1)
 extracted_title = lines[0].lstrip("# ").strip()
 translated_body = lines[1].lstrip("\n")

This translates the title and body simultaneously in a single API call, preserving context and saving tokens.

LLM Strategy: Role-Based Model Separation

Not all tasks are handled by a single LLM. Models are separated based on the nature of the task.

Task	Server	Model	Reason
AI Comment Response	arm1	ZAI glm-4.7	Conversational, excellent Korean quality
Post Generation	arm1	ZAI glm-4.7	Long-form content generation, creativity required
Translation (ko→en)	ec1	Gemini Flash Lite	Non-reasoning, fast and free

Core Principle: Do not use reasoning models for translation. Reasoning models consume tokens for internal thought processes, making non-reasoning models more suitable for simple conversion tasks.

Monitoring and Operations

Health Check Endpoints

# arm1 agent
curl http://arm1:8081/health
# → {"status":"healthy","agent":"blog-agent","scheduler_jobs":2,"uptime_sec":...}

curl http://arm1:8081/status
# → {"scheduler":[{"name":"auto-translate","last_run":...},{"name":"post-generator","last_run":"2026-05-04"}]}

# ec1 Blog API
curl https://blog.example.com/api/health
# → {"status":"healthy","version":"2.0.0"}

Observability Points

Metric	Normal Range	Alert Condition
arm1 uptime	>0	Service Down
scheduler_jobs	2	≠ 2
Translation Sync	ko post count = en post count	Discrepancy occurs
Post Generation	1 post daily	No posts for over 24 hours

Lessons Learned and Operational Tips

1. The Pitfall of Reasoning Models

It’s often not explicitly stated in documentation that max_tokens combines reasoning and content. If you get an empty response, check the finish_reason—if it’s "length", it indicates insufficient token budget.

2. Value of the OpenAI-Compatible Pattern

When switching translation providers from ZAI to Gemini, the code change was just one line for the base URL. Abstracting to an OpenAI-compatible interface from the start dramatically reduces LLM replacement costs.

3. Constraints of Date Prefix Matching

In the YYYY-MM-DD-NNN pattern, if two or more posts share the same date and number, translation matching will break. The PostGenerator must include logic to check the last number for that date and increment it when generating new posts.

4. Benefits of Process Consolidation

Consolidating three independent services into one resulted in:

State Sharing (LLM clients, configurations, API clients initialized only once)
Simplified Deployment (one systemd unit)
Easier Debugging (logs consolidated in one place)

Future Plans

Review the integration of arm1 agent’s LLM with Gemini.
Comment Quality Evaluation Pipeline (monitoring the appropriateness of auto-generated comments).
Automatic Translation Quality Verification (comparing with back-translation).
Expanding inter-agent collaboration through the AgentForge framework.

Blog automation aims not for “complete automation,” but for “minimal human intervention.” A structure where AI generates content, humans review it, and the system alerts operators to anomalies is the key to stable operation.