v0.2.0 · Apache 2.0

LLM agents, reconciled.

The Kubernetes operator for controlling LLM agents at scale.
Every guardrail enforced at the infrastructure level.

# research-analyst.yaml
spec:
  model: claude-sonnet-4-20250514
  prompt:
    inline: "You are a senior research analyst."
  tools:
    mcp:
      - name: web-search
        url: https://search.mcp.internal/sse
        auth:
          bearer:
            secretKeyRef: { name: mcp-tokens, key: search }
  guardrails:
    tools:
      allow: ["web-search/*"]
      deny: ["shell/*"]
    budgetRef:
      name: monthly-10k
  runtime:
    replicas: 3
  observability:
    healthCheck:
      type: semantic
      prompt: "Reply OK if ready."

AI agents are everywhere. Control over them is nowhere.

No spend limits

Teams burn through LLM budgets with no enforcement. You find out when the invoice arrives.

No governance

Agents call arbitrary tools, access any API and produce unvalidated output. No audit trail.

No isolation

API keys in env vars, no namespace boundaries, no network policies. Every agent is a blast radius.

No standard

Every team deploys agents differently. No unified health checks, scaling, or incident response.

Full control. Every layer.

Ten concerns your platform team needs to manage. All declarative. All enforced by the operator.

Define

Agents as Kubernetes resources. RBAC, GitOps and namespace isolation built in.

SwarmAgent

Connect

MCP servers with auth enforcement. Dynamic tool discovery at runtime.

SwarmAgent

Orchestrate

Pipeline DAGs with validation gates. Output verified before it flows downstream.

SwarmTeam

Govern

Audit, warn, or enforce. Test policies before hard-enforcing. No agent can opt out.

SwarmPolicy

Control cost

Rolling budgets with warning thresholds. Hard stop at admission. Thinking token caps per call.

SwarmBudget

Remember

Persistent memory across runs. Redis, Qdrant, or pgvector. Agents learn, not just execute.

SwarmMemory

Optimize

Tool result sandboxing cuts tokens by 53% (measured on 7KB tool results). Context compression when the window fills.

SwarmAgent

Scale

Demand-driven scaling on queue depth. Scale to zero when idle.

SwarmAgent

Audit

Causal chain tracing with W3C context. Structured JSON, OTel-native, redaction rules.

SwarmRun

Extend

gRPC plugin escape hatches. Bring your own LLM provider or queue backend.

SwarmRegistry

Three orchestration modes. One resource.

Pipeline DAGs for deterministic chains, routed dispatch when an LLM picks the specialist, and dynamic delegation when agents call each other at runtime. All three are declared as a SwarmTeam in YAML.

01

Deterministic chains with validation gates

Ordered steps with dependsOn.
Parallel branches, quality gates and a revision loop - all declared in YAML.
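
As a rough illustration of how dependsOn ordering resolves into parallel branches, here is a minimal Python sketch built on the standard-library graphlib module. The step names are hypothetical and this is not the operator's scheduler, only the general technique of turning a dependency map into execution waves.

```python
from graphlib import TopologicalSorter

# Hypothetical steps, loosely modeled on the roles named in this section.
# Each key maps a step to the steps it depends on.
depends_on = {
    "research": [],
    "fact-check": ["research"],
    "analyze": ["research"],
    "write": ["fact-check", "analyze"],
    "edit": ["write"],
}


def execution_waves(graph):
    """Group steps into waves; steps in one wave have no unmet dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(depends_on)
print(waves)
# [['research'], ['analyze', 'fact-check'], ['write'], ['edit']]
```

Steps that land in the same wave, such as the two that depend only on research, are the ones that could run as parallel branches.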

[Diagram nodes: Trigger (SwarmEvent) · Coordinator (llama3.2, canDelegate: [*], dynamic dispatch) · Researcher (gpt-4o-mini: web-search, fetch) · Fact-check (claude-haiku-4-5: verify, sources) · Analyst (gpt-4o-mini: data, charts) · Writer (claude-sonnet-4-6: draft, format) · Quality (conditional: pass / retry; quality fail → redraft) · Editor (claude-opus-4-6: review, publish)]

02

One request, one specialist

A router LLM classifies each request and picks the single best agent. One hop. Like a load balancer with brains.

[Diagram nodes: Request (user input) · Router LLM (claude-haiku-4-5, picks specialist by classification) · billing-agent, technical-agent, account-agent (specialists) · generalist (fallback) · Response (to caller)]

03

Agents delegate in chains

No predefined order. Each agent decides at runtime who to call next. Multi-hop delegation - the path emerges from the task.

[Diagram nodes: SwarmRegistry (capability index) · Coordinator (entrypoint, delegate(*)) · Researcher (runtime discovery: web-search) · Analyst (runtime discovery: data, charts) · Summarizer (runtime discovery: condense). Results bubble back to the coordinator.]

+ Gateway

Need a single entrypoint?

One agent becomes the front door for external requests. It wraps any of the three modes above behind a single endpoint. Optional - add it when your organization needs one URL for the swarm.

gateway-agent.yaml
# SwarmAgent as gateway entrypoint
spec:
  gateway:
    registryRef: { name: platform-registry }
    dispatchMode: enabled
    maxDispatchDepth: 3
    fallback: { mode: answer-directly }

Built for compliance, not demos.

Agent frameworks give you building blocks. kubeswarm gives your organization control.

Security & governance

  • Network policies auto-generated

    Every agent pod gets a NetworkPolicy scoped to its declared MCP servers and queue backend. No manual YAML.

  • Pod security hardened

    Non-root user, read-only root filesystem, all capabilities dropped, RuntimeDefault seccomp profile. Matches CIS benchmarks.

  • Tool allow/deny with trust levels

    Glob patterns on server/tool paths. Trust levels (internal, external, sandbox) control validation depth.

  • Prompt injection defense

    Pipeline step outputs are wrapped in structural delimiters. Downstream agents are instructed to treat them as untrusted data.
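
To make the tool allow/deny item above concrete, here is a minimal Python sketch of glob matching on server/tool paths. It rests on two assumptions that the page does not state: a deny match wins over an allow match, and a tool matching no allow pattern is rejected. The function name is hypothetical and this is not the operator's code.

```python
from fnmatch import fnmatchcase


def is_tool_allowed(tool, allow, deny):
    """Check a 'server/tool' path against allow and deny glob patterns.

    Assumptions (not confirmed by the source): deny takes precedence over
    allow, and anything that matches no allow pattern is rejected.
    """
    if any(fnmatchcase(tool, pattern) for pattern in deny):
        return False
    return any(fnmatchcase(tool, pattern) for pattern in allow)


# The patterns from the research-analyst example at the top of the page.
allow = ["web-search/*"]
deny = ["shell/*"]

print(is_tool_allowed("web-search/query", allow, deny))  # True
print(is_tool_allowed("shell/exec", allow, deny))        # False
```

A default-deny stance keeps the allow list as the single source of truth for what an agent may call.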

Operations & cost

  • Set a budget. Kubernetes enforces it.

    Rolling daily, weekly, or monthly budgets with configurable warning thresholds. Scoped by namespace, team, or label. Hard stop at admission - not a dashboard alert, an API-level block.

  • Circuit breakers and retry budgets

    After 5 consecutive LLM failures the circuit opens. Cooldown, then half-open probes. Retries have a per-task cap. No infinite loops.

  • Audit trail with causal chain

    Every tool call, delegation and LLM turn is recorded with parentEventID linking. Structured JSON, OTel-native, redaction rules included.

  • OTel-native observability

    Traces and metrics export via OpenTelemetry. W3C TraceContext propagation across queue boundaries. Works with Jaeger, Grafana and Datadog out of the box.

  • 53% fewer tokens per task

    Tool result sandboxing replaces large outputs with compact digests. Context compression auto-summarizes when the window fills. Measured 53% reduction on 7KB tool results.
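
The circuit breaker item above follows a well-known pattern: closed, open after consecutive failures, then half-open after a cooldown. A minimal Python sketch of that state machine follows. The threshold of 5 comes from the text. The 30-second cooldown, the class name and the method names are assumptions for illustration, not the operator's implementation.

```python
import time


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a cooldown."""

    def __init__(self, failure_threshold=5, cooldown_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds  # assumed value, not from the source
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown_seconds:
            return "half-open"  # cooldown elapsed, probe requests may pass
        return "open"

    def allow_request(self):
        return self.state() != "open"

    def record_success(self):
        # Any success, including a half-open probe, closes the circuit.
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        # A failed probe reopens the circuit; so does reaching the threshold.
        if self.state() == "half-open" or self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


breaker = CircuitBreaker()
for _ in range(5):
    breaker.record_failure()
print(breaker.state())  # open
```

Passing the clock in as a parameter keeps the cooldown logic testable without sleeping.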

auto-generated NetworkPolicy
# Auto-generated by the operator
kind: NetworkPolicy
metadata:
  name: research-agent-egress
spec:
  podSelector:
    matchLabels:
      kubeswarm.io/agent: research-agent
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: mcp-search
      ports:
        - port: 8080

SwarmBudget
# SwarmBudget - hard stop at $10/day
apiVersion: kubeswarm.io/v1alpha1
kind: SwarmBudget
metadata:
  name: daily-cap
spec:
  period: daily
  limit: 10
  currency: USD
  warnAt: 80
  hardStop: true
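
One way to read the fields above is as a three-way decision on each new run. This Python sketch assumes that warnAt is a percentage of limit and that hardStop blocks once spend reaches the limit. Neither reading is confirmed by the page, and the function is hypothetical rather than the operator's admission logic.

```python
def budget_decision(spent, limit, warn_at_percent, hard_stop):
    """Return 'allow', 'warn' or 'deny' for a new run in the current period.

    Assumptions (not confirmed by the source): warn_at_percent is a
    percentage of limit, limit is positive, and a hard stop applies once
    spend reaches 100% of the limit.
    """
    used_percent = 100.0 * spent / limit
    if hard_stop and used_percent >= 100.0:
        return "deny"
    if used_percent >= warn_at_percent:
        return "warn"
    return "allow"


# Values mirror the daily-cap example: limit 10 USD, warnAt 80, hardStop true.
print(budget_decision(5.0, 10, 80, True))   # allow
print(budget_decision(8.0, 10, 80, True))   # warn
print(budget_decision(10.0, 10, 80, True))  # deny
```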

Semantic health checks prompt the model and evaluate the response - not just HTTP 200. The operator knows when an agent is loaded but broken. Falls back to ping mode for cost-sensitive deployments.

Runs with your stack

Kubernetes-native. No vendor lock-in.

Use the runtime, model providers, memory stores and operators your platform team already trusts.

Framework vs. infrastructure.

Frameworks help you build agents. kubeswarm helps your organization control them.

What your org needs: agent frameworks vs. kubeswarm

Governance

  • Agents as managed resources

    Agent frameworks: Objects in application code
    kubeswarm: RBAC, GitOps, namespace isolation

  • Spend enforcement

    Agent frameworks: Left to application code
    kubeswarm: Hard stop at admission - before tokens are spent

  • Namespace-wide policy

    Agent frameworks: Varies by framework
    kubeswarm: Budgets, model restrictions, mandatory audit

Security

  • Tool permissions

    Agent frameworks: Per-call code review
    kubeswarm: Allow/deny lists with trust levels

  • Output validation

    Agent frameworks: Library helpers, opt-in
    kubeswarm: Regex, schema, semantic, injection detection

  • Audit trail

    Agent frameworks: Print statements
    kubeswarm: Full trace with automatic redaction

Operations

  • Multi-mode orchestration

    Agent frameworks: DAG in code; routed/dynamic hand-rolled
    kubeswarm: Pipeline, routed, dynamic - all in YAML

  • Agent discovery

    Agent frameworks: Hardcoded handoffs
    kubeswarm: Auto-indexed registry, runtime resolution

  • Demand-driven scaling

    Agent frameworks: Left to application code
    kubeswarm: Queue-depth scaling, scale to zero

Portability

  • Vendor lock-in

    Agent frameworks: Framework SDK required in every agent
    kubeswarm: None. Delete the operator, agents keep running.

Operator footprint and scope.

The questions your platform team will ask before installing anything new in their cluster.

What it installs

  • Single Deployment, leader-elected for HA.
  • Default: 100m / 128Mi request, 500m / 512Mi limit.
  • If the operator goes down, agents keep serving from their last reconciled state.
  • helm uninstall removes the operator. Agent pods stay running.

What it is not

  • Not an inference server. Bring your own provider.
  • Not a model gateway or router proxy.
  • Not a vector database. Wires existing ones.
  • Not an agent framework. Your code stays in your container.
  • Not a prompt library or eval harness.

v1alpha1 CRDs. Pre-1.0. Apache 2.0. Conversion webhooks ship with every version bump. Get in touch if you're running LLM workloads on Kubernetes.

Your agents. Your cluster.
Your rules.

Open source. Apache 2.0. No vendor lock-in. Deploy on your infrastructure.

helm repo add kubeswarm https://kubeswarm.github.io/helm-charts && helm install kubeswarm kubeswarm/kubeswarm

Get in touch

Running LLM workloads on Kubernetes? We'd like to hear about your use case.