Multi-Model AI Agent Team Design: Composed Architecture and 5-Team Hierarchy

Mon, 30 Mar 2026 00:33:34 +0900

Overview

To build a blog system, I designed a multi-model agent team of 14 AI specialists, organized into 5 teams and backed by 4 LLMs. The design rests on two core ideas:

Composed Agent: Separating role definitions from execution profiles for maximum reusability
Hierarchical Bridge Leadership: Dual membership of tech leads between upper and lower teams to resolve communication bottlenecks

This post covers the final structure, model distribution strategy, and the composed architecture design process.

Background: Why Multi-Model?

Using a single LLM for all tasks creates two problems:

Cost: Running 14 specialists on a Claude Opus-level model makes costs uncontrollable
Fit: Design needs fast reasoning, security analysis needs deep logic, implementation needs stable coding

So we distributed models based on task characteristics.

Final Team Structure

5 teams, 14 specialists, 4 models.

graph TD
 subgraph UPPER["Upper — Steering Team · consensus"]
 ORC["Orchestrator<br/>relay:steering-orchestrator"]
 DES["Architect<br/>gemini:gemini-2.5-flash"]
 SEC["Security Reviewer<br/>codex:gpt-4o"]
 STL["Backend Tech Lead<br/>relay:developer-zai"]
 FTL["Frontend Tech Lead<br/>relay:developer-zai"]
 DTL["Desktop Tech Lead<br/>relay:developer-zai"]
 INF["Infra Network<br/>gemini:gemini-2.5-flash"]
 SAD["Server Admin<br/>relay:developer-zai"]
 end

 subgraph LOWER_BE["Backend Team · leader_decides"]
 BTL["Backend Tech Lead"]
 BDEV["Backend Developer"]
 end

 subgraph LOWER_FE["Frontend Team · leader_decides"]
 FTL2["Frontend Tech Lead"]
 FDEV["Frontend Developer"]
 FUX["UX Designer"]
 end

 subgraph LOWER_DT["Desktop Team · leader_decides"]
 DTL2["Desktop Tech Lead"]
 DDEV["Desktop Developer"]
 DUX["UX Designer"]
 end

 subgraph LOWER_INFRA["Infra Team · leader_decides"]
 SAD2["Server Admin (Leader)"]
 INET["Cloud Network"]
 DBA["DB Architect"]
 end

 UPPER -.->|bridge| LOWER_BE
 UPPER -.->|bridge| LOWER_FE
 UPPER -.->|bridge| LOWER_DT
 UPPER -.->|bridge| LOWER_INFRA

 BTL --> BDEV
 FTL2 --> FDEV
 FTL2 --> FUX
 DTL2 --> DDEV
 DTL2 --> DUX
 SAD2 --> INET
 SAD2 --> DBA
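
The bridge relationship in the diagram can be read as a set intersection: a bridge is anyone who sits on the Steering Team and on a lower team at the same time. Here is a minimal Python sketch of that reading. The `Team` class and the `bridges` helper are illustrative names of my own, not part of the Relay plugin, and member names are taken verbatim from the diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Team:
    name: str
    decision: str        # "consensus" or "leader_decides", as in the diagram
    leader: str
    members: frozenset   # includes the leader


# Rosters copied from the diagram; names are used exactly as written there.
STEERING = Team("Steering Team", "consensus", "Orchestrator", frozenset({
    "Orchestrator", "Architect", "Security Reviewer", "Backend Tech Lead",
    "Frontend Tech Lead", "Desktop Tech Lead", "Infra Network", "Server Admin",
}))

LOWER_TEAMS = [
    Team("Backend Team", "leader_decides", "Backend Tech Lead",
         frozenset({"Backend Tech Lead", "Backend Developer"})),
    Team("Frontend Team", "leader_decides", "Frontend Tech Lead",
         frozenset({"Frontend Tech Lead", "Frontend Developer", "UX Designer"})),
    Team("Desktop Team", "leader_decides", "Desktop Tech Lead",
         frozenset({"Desktop Tech Lead", "Desktop Developer", "UX Designer"})),
    Team("Infra Team", "leader_decides", "Server Admin",
         frozenset({"Server Admin", "Cloud Network", "DB Architect"})),
]


def bridges(upper, lowers):
    """Map each lower team name to the members it shares with the upper team."""
    return {team.name: sorted(upper.members & team.members) for team in lowers}


for team_name, shared in bridges(STEERING, LOWER_TEAMS).items():
    print(f"{team_name}: {shared}")
```

With these rosters, each lower team shares exactly one member with the Steering Team, and that member is the lower team's leader, which matches the four bridge arrows in the diagram.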

Team Details

Team	Type	Decision-Making	Leader	Members
Steering Team	upper	consensus	Orchestrator	8 (including bridges)
Backend Team	lower	leader_decides	Backend Tech Lead	2
Frontend Team	lower	leader_decides	Frontend Tech Lead	3
Desktop Team	lower	leader_decides	Desktop Tech Lead	3
Infra Team	lower	leader_decides	Server Admin	3
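
The two decision-making modes in the table behave differently when members disagree. The sketch below shows one possible reading. The post does not define how consensus is reached, so modelling it as unanimous approval is my assumption, and `decide` is a hypothetical helper rather than a Relay API.

```python
def decide(mode: str, leader: str, votes: dict) -> bool:
    """Return True if a proposal passes under the team's decision mode.

    votes maps member name -> True (approve) or False (reject).
    Assumption: "consensus" is modelled as unanimous approval.
    """
    if mode == "consensus":
        return bool(votes) and all(votes.values())
    if mode == "leader_decides":
        return votes[leader]
    raise ValueError(f"unknown decision mode: {mode}")


# Backend Team: the tech lead's vote settles it, even against the developer.
backend = decide("leader_decides", "Backend Tech Lead",
                 {"Backend Tech Lead": True, "Backend Developer": False})

# Steering Team: under unanimity, a single dissent blocks the proposal.
steering = decide("consensus", "Orchestrator",
                  {"Orchestrator": True, "Architect": True,
                   "Security Reviewer": False})

print(backend, steering)
```

Under this reading, a lower team can move quickly on its leader's call, while the Steering Team only proceeds when nobody objects.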

The Infra Team Separation Decision

In the initial design, the DB Architect and Server Admin were part of the backend team. We moved them into a separate infra team because they work in different workspaces from the people writing API code.

graph LR
 subgraph BackendTeam["Backend Team"]
 B["API Code Writing<br/>FastAPI, Python<br/>workspace: VS Code / SSH"]
 end

 subgraph InfraTeam["Infra Team"]
 S["Server Management<br/>Docker, Ubuntu, Nginx<br/>workspace: SSH Terminal"]
 N["Cloud Network<br/>Cloudflare Dashboard<br/>workspace: Web Console"]
 D["DB Management<br/>PostgreSQL, Migrations<br/>workspace: psql / SSH"]
 end

 B -.->|API deployment| S
 B -.->|Query optimization| D

When workspaces differ, separation is more natural than keeping them together.

Model Distribution Strategy

pie title Specialists by Model
 "relay:developer-zai (GLM)" : 10
 "gemini:gemini-2.5-flash" : 2
 "codex:gpt-4o" : 1
 "zai:glm-4" : 1

Model	Specialists	Purpose	Why Chosen
relay:developer-zai	10	Implementation, ops, leads	Cost-effective, stable coding
gemini:gemini-2.5-flash	2	Design, infra network	Fast response, easy external API calls
codex:gpt-4o	1	Security review	High reasoning, OWASP knowledge
zai:glm-4	1	Context compression	Free tier, specialized for text summarization

By assigning 10 specialists to GLM rather than to a Claude-class model, we reduced total cost by 60-70%.
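
The post gives the 60-70% figure without a derivation, so here is a back-of-envelope way to see how a number in that range can arise. It assumes every specialist consumes a similar token volume and that the cheaper model costs a fixed fraction (`price_ratio`) of the baseline model. Both are simplifications of mine, and the ratios used are illustrative, not measured prices.

```python
def blended_cost_reduction(moved: int, total: int, price_ratio: float) -> float:
    """Fraction of total spend saved when `moved` of `total` equally busy
    specialists switch to a model costing `price_ratio` times the baseline."""
    if total <= 0 or not 0 <= moved <= total:
        raise ValueError("need total > 0 and 0 <= moved <= total")
    if not 0.0 <= price_ratio <= 1.0:
        raise ValueError("price_ratio must be in [0, 1]")
    return (moved / total) * (1.0 - price_ratio)


# 10 of 14 specialists moved. The price ratios are hypothetical placeholders.
for ratio in (0.05, 0.10, 0.15):
    saved = blended_cost_reduction(10, 14, ratio)
    print(f"price_ratio={ratio:.2f} -> {saved:.1%} saved")
```

Under these assumptions the saving can never exceed the share of specialists moved (10/14, about 71%), so the result is driven mostly by how many specialists run on the cheaper model.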

Composed Agent Architecture

The core innovation is separating role definitions (Expert) from execution profiles (Definition).

In the traditional approach, the role and its execution logic are coupled in a single definition, so any change means rewriting the whole definition and very little can be reused.

Composed Approach

graph TD
 DEF["Definition<br/>Backend Developer"]
 DEF --> BASE["Base: backend-core"]
 DEF --> CAP["Capabilities:<br/>rest-api, crud, auth-jwt"]
 DEF --> PLAT["Platform: fastapi"]
 DEF --> POL["Policy: blog-default"]

 BASE --> |"compose"| RUN["Runtime Agent"]
 CAP --> |"compose"| RUN
 PLAT --> |"compose"| RUN
 POL --> |"compose"| RUN
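
The compose step in the diagram can be sketched as flattening a Definition into an ordered list of module references. This is a minimal illustration and not the Relay plugin's actual implementation. The `Definition` class, the `compose` function, and the `kind/name` reference format are names I made up for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Definition:
    name: str
    base: str
    capabilities: Tuple[str, ...]
    platform: str
    policy: str


def compose(definition: Definition) -> List[str]:
    """Return the module references that make up the runtime agent,
    in the order base, capabilities, platform, policy."""
    return (
        [f"base/{definition.base}"]
        + [f"capabilities/{cap}" for cap in definition.capabilities]
        + [f"platforms/{definition.platform}", f"policies/{definition.policy}"]
    )


backend_dev = Definition(
    name="Backend Developer",
    base="backend-core",
    capabilities=("rest-api", "crud", "auth-jwt"),
    platform="fastapi",
    policy="blog-default",
)

for ref in compose(backend_dev):
    print(ref)
```

The Definition itself holds only names. Everything the agent actually knows lives in the referenced modules, which is what lets several Definitions share them.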

Module Structure

agent-library/
├── definitions/ ← 14 agent definitions
├── modules/
│ ├── base/ ← 6 base modules
│ ├── capabilities/ ← 15 capability modules
│ ├── platforms/ ← 5 platform modules
│ └── policies/ ← 1 policy
└── runs/ ← execution history
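
With modules laid out by kind like this, a definition can be checked against the library before it runs. The sketch below lists the module files a definition references but the library lacks. The `.md` suffix and the `missing_modules` helper are assumptions for illustration, since the post does not say what file type the modules use.

```python
import tempfile
from pathlib import Path

KINDS = ("base", "capabilities", "platforms", "policies")


def missing_modules(library: Path, refs: dict, suffix: str = ".md") -> list:
    """List module files that a definition references but the library lacks.

    refs maps a module kind (one of KINDS) to the module names it needs.
    The ".md" suffix is an assumption; the real file type is not stated.
    """
    missing = []
    for kind in KINDS:
        for name in refs.get(kind, ()):
            path = library / "modules" / kind / f"{name}{suffix}"
            if not path.is_file():
                missing.append(path.relative_to(library).as_posix())
    return missing


# Build a throwaway library that holds only part of what the definition needs.
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp) / "agent-library"
    for rel in ("modules/base/backend-core.md", "modules/platforms/fastapi.md"):
        module_file = lib / rel
        module_file.parent.mkdir(parents=True, exist_ok=True)
        module_file.write_text("# placeholder module\n")

    refs = {
        "base": ["backend-core"],
        "capabilities": ["rest-api"],
        "platforms": ["fastapi"],
        "policies": ["blog-default"],
    }
    print(missing_modules(lib, refs))
```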

Advantages

Reusability: rest-api capability module shared by backend developer and backend tech lead
Platform swap: Change platform: fastapi to platform: django for instant switching
Capability extension: Add a new capability module and connect it to the Definition
Policy unification: All agents follow the same blog-default policy
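
To make the platform swap and capability extension concrete, here is a small sketch that treats a Definition as an immutable record and derives variants from it. The `Definition` class is illustrative only. Whether a `django` platform module exists in the library is not stated in this post, so treat that name as a placeholder.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Definition:
    base: str
    capabilities: Tuple[str, ...]
    platform: str
    policy: str = "blog-default"  # all agents share the one policy


backend_dev = Definition("backend-core", ("rest-api", "crud", "auth-jwt"), "fastapi")

# Platform swap: only the platform field changes; base and capabilities stay.
django_dev = replace(backend_dev, platform="django")

# Capability extension: append one more module name to the tuple.
with_limits = replace(
    backend_dev,
    capabilities=backend_dev.capabilities + ("rate-limiting",),
)

print(django_dev)
print(with_limits.capabilities)
```

Because the record is frozen, each variant is a new Definition and the original `backend_dev` is left untouched, which keeps the change scoped to one field.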

Expert-Definition Mapping

Expert	Definition	Base	Capabilities	Platform
Backend Developer	backend-developer	backend-core	rest-api, crud, auth-jwt	fastapi
Backend Tech Lead	backend-tech-lead	backend-core	rest-api, crud, code-review	fastapi
Frontend Developer	frontend-developer	frontend-core	markdown-renderer, list-filter-sort	nextjs
Server Admin	server-administrator	server-core	docker-management, nginx-config, postgres-admin	ubuntu
Infra Network	infra-network-admin	infra-core	dns-management, ssl-certificates, rate-limiting	cloudflare
Security Reviewer	security-auditor	specialist-core	security-audit	fastapi
Context Compressor	context-compressor	specialist-core	context-compression	markdown
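
The reuse claim can be checked directly against the mapping table. The sketch below copies the seven rows shown above and counts which modules are referenced by more than one definition. The `reuse_counts` helper and the `kind/name` labels are my own illustrative naming.

```python
from collections import Counter

# Rows copied from the table: definition -> (base, capabilities, platform)
MAPPING = {
    "backend-developer": ("backend-core", ("rest-api", "crud", "auth-jwt"), "fastapi"),
    "backend-tech-lead": ("backend-core", ("rest-api", "crud", "code-review"), "fastapi"),
    "frontend-developer": ("frontend-core", ("markdown-renderer", "list-filter-sort"), "nextjs"),
    "server-administrator": ("server-core", ("docker-management", "nginx-config", "postgres-admin"), "ubuntu"),
    "infra-network-admin": ("infra-core", ("dns-management", "ssl-certificates", "rate-limiting"), "cloudflare"),
    "security-auditor": ("specialist-core", ("security-audit",), "fastapi"),
    "context-compressor": ("specialist-core", ("context-compression",), "markdown"),
}


def reuse_counts(mapping: dict) -> dict:
    """Return modules referenced by more than one definition, with counts."""
    counts = Counter()
    for base, capabilities, platform in mapping.values():
        counts[f"base/{base}"] += 1
        counts[f"platforms/{platform}"] += 1
        for capability in capabilities:
            counts[f"capabilities/{capability}"] += 1
    return {module: n for module, n in counts.items() if n > 1}


for module, n in sorted(reuse_counts(MAPPING).items()):
    print(f"{module}: used by {n} definitions")
```

Among these seven rows, the shared modules are the two cores `backend-core` and `specialist-core`, the `fastapi` platform, and the `rest-api` and `crud` capabilities. Most of the remaining capabilities are specific to a single role.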

TLS Certificate Strategy: Cloudflare Origin CA

For production TLS certificates, we chose Cloudflare Origin CA over Let’s Encrypt.

sequenceDiagram
 participant Client as Visitor
 participant CF as Cloudflare (Proxy)
 participant Nginx as Nginx (Origin)
 participant API as FastAPI

 Client->>CF: HTTPS request
 CF->>CF: Terminate with Cloudflare-managed cert
 CF->>Nginx: Encrypt with Origin CA cert
 Nginx->>API: HTTP (local)
 API-->>Nginx: Response
 Nginx-->>CF: Encrypted with Origin CA
 CF-->>Client: Response

Item	Let’s Encrypt	Cloudflare Origin CA
Validity	90 days (renewal required)	Up to 15 years (no routine renewal)
Issuance	ACME automation required	Manual from Dashboard
Complexity	certbot setup	Copy cert files only

Production architecture:

Oracle Cloud ARM (4 OCPU, 24GB)
├── PostgreSQL (installed directly on host)
├── Docker Compose
│ ├── blog-api (FastAPI)
│ ├── blog-frontend (Next.js standalone)
│ ├── MinIO (S3-compatible storage)
│ └── Nginx (Cloudflare Origin CA)
└── Cloudflare Proxy (SSL/TLS mode: Full (strict))

Relay Plugin: Agent Invocation Mechanism

The team structure runs in Claude Code through the Relay plugin.

sequenceDiagram
 participant User as User
 participant Claude as Claude Code
 participant Plugin as Relay Plugin
 participant MCP as MCP Server
 participant LLM as External LLM

 User->>Claude: /relay:invoke-agent
 Claude->>Plugin: Load definition by expert slug
 Plugin->>Plugin: Compose Definition (base + capabilities + platform + policy)
 Plugin->>Plugin: Check backed_by

 alt relay:developer-zai
 Plugin->>Claude: Run internal agent
 else gemini:*
 Plugin->>MCP: Call gemini_mcp server
 MCP->>LLM: Gemini API
 LLM-->>MCP: Response
 MCP-->>Plugin: Result
 else codex:*
 Plugin->>MCP: Call codex_mcp server
 MCP->>LLM: OpenAI API
 LLM-->>MCP: Response
 MCP-->>Plugin: Result
 end

 Plugin-->>Claude: Final result
 Claude-->>User: Response

backed_by Namespaces

Namespace	MCP Server	Purpose
`relay:developer-zai`	internal agent	Implementation, ops (low cost)
`relay:steering-orchestrator`	internal agent	Coordination, final decisions
`gemini:gemini-2.5-flash`	gemini_mcp	Design, external APIs
`codex:gpt-4o`	codex_mcp	Security analysis
`zai:glm-4`	zai_mcp	Context compression
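
The routing implied by this table can be sketched as a lookup on the namespace prefix of `backed_by`. This is an illustration of the idea, not the plugin's code. The `route` function and the `"internal"` label are hypothetical, while the server names come from the table.

```python
# Namespace -> MCP server, copied from the table.
# None means the agent runs as an internal agent instead of going through MCP.
MCP_SERVERS = {
    "relay": None,
    "gemini": "gemini_mcp",
    "codex": "codex_mcp",
    "zai": "zai_mcp",
}


def route(backed_by: str) -> tuple:
    """Split 'namespace:model' and return (target, model).

    target is "internal" for relay:* entries, otherwise the MCP server name.
    """
    namespace, sep, model = backed_by.partition(":")
    if not sep or not model:
        raise ValueError(f"expected 'namespace:model', got {backed_by!r}")
    if namespace not in MCP_SERVERS:
        raise ValueError(f"unknown namespace: {namespace!r}")
    server = MCP_SERVERS[namespace]
    return ("internal" if server is None else server, model)


for spec in ("relay:developer-zai", "gemini:gemini-2.5-flash",
             "codex:gpt-4o", "zai:glm-4"):
    print(spec, "->", route(spec))
```

Keeping the table as data means adding another provider is a one-line change to the mapping rather than a new branch in the dispatch logic.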

Design Decision Log

Decision	Alternative	Rationale
Separate infra team	Include in backend	Different workspaces (SSH vs IDE)
Cloudflare Origin CA	Let’s Encrypt	15-year validity, no renewal
PostgreSQL on host	Docker container	Memory efficiency on single server
Composed agent	Single-definition agent	Module reusability, easy platform swap
Assign most specialists to GLM	Assign most specialists to Claude	60-70% cost reduction

Retrospective: What I Learned

1. “Executable Structure” over “Perfect Structure”

Trying to design everything perfectly can prevent you from ever starting. It’s better to compromise and improve as you execute.

2. Workspace Defines Team Boundaries

People who write code and people who manage servers have different physical work environments — that’s a natural team boundary.

3. The Value of Composed Architecture

In an environment with 14 specialists, 5 teams, and 4 models interacting, module separation is essential. It minimizes the scope of changes and maximizes reusability.

4. Cost is Determined at Design Time

Asking “does this task really need a high-cost model?” every time naturally optimizes costs.

Next Steps

Start Phase 1 implementation: DB, Auth, Post/Category CRUD, Docker
Share team operation experience: Problems encountered during actual execution
Performance monitoring: Response time per model, cost vs quality analysis

This post summarizes my experience building an AI agent team using Claude Code + Relay plugin. It reflects learnings from applying this to a real project — different approaches may be more suitable depending on your situation.

Gpt on Yarang's Tech Lair