v0.2.0 · Apache 2.0
LLM agents, reconciled.
The Kubernetes operator for controlling LLM agents at scale.
Every guardrail enforced at the infrastructure level.
```yaml
# research-analyst.yaml
spec:
  model: claude-sonnet-4-20250514
  prompt:
    inline: "You are a senior research analyst."
  tools:
    mcp:
      - name: web-search
        url: https://search.mcp.internal/sse
        auth:
          bearer:
            secretKeyRef: { name: mcp-tokens, key: search }
  guardrails:
    tools:
      allow: ["web-search/*"]
      deny: ["shell/*"]
    budgetRef:
      name: monthly-10k
  runtime:
    replicas: 3
  observability:
    healthCheck:
      type: semantic
      prompt: "Reply OK if ready."
```

AI agents are everywhere. Control over them is nowhere.
- Teams burn through LLM budgets with no enforcement. You find out when the invoice arrives.
- Agents call arbitrary tools, access any API, and produce unvalidated output. No audit trail.
- API keys in env vars, no namespace boundaries, no network policies. Every agent is a blast radius.
- Every team deploys agents differently. No unified health checks, scaling, or incident response.
Full control. Every layer.
Ten concerns your platform team needs to manage. All declarative. All enforced by the operator.
- Define (SwarmAgent): Agents as Kubernetes resources. RBAC, GitOps and namespace isolation built in.
- Connect (SwarmAgent): MCP servers with auth enforcement. Dynamic tool discovery at runtime.
- Orchestrate (SwarmTeam): Pipeline DAGs with validation gates. Output verified before it flows downstream.
- Govern (SwarmPolicy): Audit, warn, or enforce. Test policies before hard-enforcing. No agent can opt out.
- Control cost (SwarmBudget): Rolling budgets with warning thresholds. Hard stop at admission. Thinking token caps per call.
- Remember (SwarmMemory): Persistent memory across runs. Redis, Qdrant, or pgvector. Agents learn, not just execute.
- Optimize (SwarmAgent): Tool result sandboxing cuts tokens by 53%. Context compression when the window fills.
- Scale (SwarmAgent): Demand-driven scaling on queue depth. Scale to zero when idle.
- Audit (SwarmRun): Causal chain tracing with W3C context. Structured JSON, OTel-native, redaction rules.
- Extend (SwarmRegistry): gRPC plugin escape hatches. Bring your own LLM provider or queue backend.

Three orchestration modes. One resource.
Pipeline DAGs for deterministic chains, routed dispatch when an LLM picks the specialist, and dynamic delegation when agents call each other at runtime. All three are declared as a SwarmTeam in YAML.
Deterministic chains with validation gates
Ordered steps with dependsOn.
Parallel branches, quality gates and a revision loop - all declared in YAML.
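As a minimal sketch of how such a pipeline could be declared: apart from `dependsOn`, which the text names explicitly, the step and validation field names below are illustrative assumptions, not confirmed CRD schema.

```yaml
# Hypothetical pipeline-mode SwarmTeam (field names beyond dependsOn are assumptions)
apiVersion: kubeswarm.io/v1alpha1
kind: SwarmTeam
metadata:
  name: research-pipeline
spec:
  mode: pipeline
  steps:
    - name: gather
      agentRef: { name: research-analyst }
    - name: draft
      agentRef: { name: writer }
      dependsOn: [gather]
    - name: review                     # quality gate before output flows downstream
      agentRef: { name: editor }
      dependsOn: [draft]
      validation:
        type: semantic
        onFail: { reviseStep: draft }  # revision loop back to the draft step
```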
One request, one specialist
A router LLM classifies each request and picks the single best agent. One hop. Like a load balancer with brains.
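A hedged sketch of how routed dispatch might be declared; the `mode`, `router`, and `agents` field names are assumptions based on the description above, not confirmed CRD fields.

```yaml
# Hypothetical routed-mode SwarmTeam (field names are assumptions)
apiVersion: kubeswarm.io/v1alpha1
kind: SwarmTeam
metadata:
  name: support-router
spec:
  mode: routed
  router:
    model: claude-sonnet-4-20250514    # the classifying LLM
  agents:
    - name: billing-agent
      description: "Invoices, refunds, payment disputes"
    - name: tech-agent
      description: "API errors, SDK and integration issues"
```

The agent descriptions double as the router's classification labels: one request in, one specialist out.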
Agents delegate in chains
No predefined order. Each agent decides at runtime who to call next. Multi-hop delegation - the path emerges from the task.
Need a single entrypoint?
One agent becomes the front door for external requests. It wraps any of the three modes above behind a single endpoint. Optional - add it when your organization needs one URL for the swarm.
# SwarmAgent as gateway entrypoint
spec:
gateway:
registryRef: { name: platform-registry }
dispatchMode: enabled
maxDispatchDepth: 3
fallback: { mode: answer-directly }Built for compliance, not demos.
Agent frameworks give you building blocks. kubeswarm gives your organization control.
Security & governance
- Network policies auto-generated: Every agent pod gets a NetworkPolicy scoped to its declared MCP servers and queue backend. No manual YAML.
- Pod security hardened: Non-root user, read-only root filesystem, all capabilities dropped, RuntimeDefault seccomp profile. Matches CIS benchmarks.
- Tool allow/deny with trust levels: Glob patterns on server/tool paths. Trust levels (internal, external, sandbox) control validation depth.
- Prompt injection defense: Pipeline step outputs are wrapped in structural delimiters. Downstream agents are instructed to treat them as untrusted data.
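A sketch of how tool permissions with trust levels might look on an agent; only `allow` and `deny` appear in the example spec at the top of the page, so the `trustLevels` key and its syntax are assumptions.

```yaml
# Hypothetical guardrails block - the trustLevels syntax is assumed
guardrails:
  tools:
    allow: ["web-search/*", "internal-docs/*"]
    deny: ["shell/*"]
    trustLevels:
      web-search: external      # deeper output validation
      internal-docs: internal   # lighter validation
```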
Operations & cost
- Set a budget. Kubernetes enforces it: Rolling daily, weekly, or monthly budgets with configurable warning thresholds. Scoped by namespace, team, or label. Hard stop at admission - not a dashboard alert, an API-level block.
- Circuit breakers and retry budgets: After 5 consecutive LLM failures the circuit opens. Cooldown, then half-open probes. Retries have a per-task cap. No infinite loops.
- Audit trail with causal chain: Every tool call, delegation and LLM turn is recorded with parentEventID linking. Structured JSON, OTel-native, redaction rules included.
- OTel-native observability: Traces and metrics export via OpenTelemetry. W3C TraceContext propagation across queue boundaries. Works with Jaeger, Grafana, and Datadog out of the box.
- 53% fewer tokens per task: Tool result sandboxing replaces large outputs with compact digests. Context compression auto-summarizes when the window fills. Measured 53% reduction on 7KB tool results.
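To make the redaction claim concrete, a rule set might be configured along these lines; every field name here is illustrative, since the page only states that redaction rules are included.

```yaml
# Hypothetical audit/redaction config - all field names are assumptions
observability:
  audit:
    exporter: otlp                       # OTel-native export
    redact:
      - jsonPath: "$.request.headers.authorization"
      - pattern: "sk-[A-Za-z0-9]{20,}"   # provider API keys in payloads
```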
```yaml
# Auto-generated by the operator
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: research-agent-egress
spec:
  podSelector:
    matchLabels:
      kubeswarm.io/agent: research-agent
  egress:
    - to: [{ podSelector: { matchLabels: { app: mcp-search } } }]
      ports: [{ port: 8080 }]
```

```yaml
# SwarmBudget - hard stop at $10/day
apiVersion: kubeswarm.io/v1alpha1
kind: SwarmBudget
metadata:
  name: daily-cap
spec:
  period: daily
  limit: 10
  currency: USD
  warnAt: 80
  hardStop: true
```

Semantic health checks prompt the model and evaluate the response - not just HTTP 200. The operator knows when an agent is loaded but broken. Falls back to ping mode for cost-sensitive deployments.
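A sketch of how that fallback might appear in the agent spec; only `type` and `prompt` are shown in the example at the top of the page, so the `fallbackMode` field name is an assumption.

```yaml
# Hypothetical - the fallbackMode field name is assumed, not confirmed
observability:
  healthCheck:
    type: semantic
    prompt: "Reply OK if ready."
    fallbackMode: ping   # cheaper liveness probe when LLM calls are rationed
```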
Runs with your stack
Kubernetes-native. No vendor lock-in.
Use the runtime, model providers, memory stores and operators your platform team already trusts.
Framework vs. infrastructure.
Frameworks help you build agents. kubeswarm helps your organization control them.
| What your org needs | Agent frameworks | kubeswarm |
|---|---|---|
| Governance | | |
| Agents as managed resources | Objects in application code | RBAC, GitOps, namespace isolation |
| Spend enforcement | Left to application code | Hard stop at admission - before tokens are spent |
| Namespace-wide policy | Varies by framework | Budgets, model restrictions, mandatory audit |
| Security | | |
| Tool permissions | Per-call code review | Allow/deny lists with trust levels |
| Output validation | Library helpers, opt-in | Regex, schema, semantic, injection detection |
| Audit trail | Print statements | Full trace with automatic redaction |
| Operations | | |
| Multi-mode orchestration | DAG in code; routed/dynamic hand-rolled | Pipeline, routed, dynamic - all in YAML |
| Agent discovery | Hardcoded handoffs | Auto-indexed registry, runtime resolution |
| Demand-driven scaling | Left to application code | Queue-depth scaling, scale to zero |
| Portability | | |
| Vendor lock-in | Framework SDK required in every agent | None. Delete the operator, agents keep running. |
Operator footprint and scope.
The questions your platform team will ask before installing anything new in their cluster.
What it installs
- Single Deployment, leader-elected for HA.
- Default: 100m / 128Mi request, 500m / 512Mi limit.
- If the operator goes down, agents keep serving from their last reconciled state.
- helm uninstall removes the operator. Agent pods stay running.
What it is not
- Not an inference server. Bring your own provider.
- Not a model gateway or router proxy.
- Not a vector database. Wires existing ones.
- Not an agent framework. Your code stays in your container.
- Not a prompt library or eval harness.
v1alpha1 CRDs. Pre-1.0. Apache 2.0. Conversion webhooks ship with every version bump. Get in touch if you're running LLM workloads on Kubernetes.
Your agents. Your cluster.
Your rules.
Open source. Apache 2.0. No vendor lock-in. Deploy on your infrastructure.
```shell
helm repo add kubeswarm https://kubeswarm.github.io/helm-charts && helm install kubeswarm kubeswarm/kubeswarm
```

Get in touch
Running LLM workloads on Kubernetes? We'd like to hear about your use case.