Run AI agents on Kubernetes: a support triage pipeline with budget guardrails
TL;DR - Two AI agents classify support tickets and draft replies on Kubernetes. Runs on a local 7B model. Total cost per ticket: 1,612 tokens, $0.00 (compute costs for self-hosted inference still apply). A namespace policy blocks unauthorized models and tools at admission time. All YAML files are in the examples directory.
You already have ticket automation. Zendesk triggers, Intercom bots, maybe a custom rule engine your team built three years ago. It matches keywords, routes to queues, fires templates.
And it gets things wrong constantly.
A customer writes "I can't access my dashboard, we have a client demo in 2 hours" and your automation routes it to the billing queue because the word "account" appeared somewhere in the thread. The template replies with "we'll look into it within 24 hours." The customer's demo fails.
The problem isn't that you lack automation. It's that your automation can't read. It pattern-matches keywords. It doesn't understand that "demo in 2 hours" means this is critical, or that "blank white screen after login" is a bug, not a billing issue.
An LLM agent actually reads the ticket. It understands context, urgency, and intent. But running LLM agents in production creates new problems: cost control, model governance, audit trails, tool restrictions. That's the gap kubeswarm fills.
| | Rule-based automation | kubeswarm pipeline |
|---|---|---|
| Classification | Keyword matching | Reads and understands context |
| Priority | Static rules | Infers urgency from content |
| Responses | Templates | Contextual, specific to the issue |
| Changing logic | Edit flowcharts in admin UI | Change the prompt |
| Cost control | N/A | Token budgets enforced per namespace |
| Audit | Logs which rule fired | Tracks every token, every agent, every run |
Prerequisites
You need a Kubernetes cluster with kubeswarm installed. See the installation guide for setup.
Two agents, one pipeline, zero API cost
First, create the namespace and a Secret with your LLM endpoint:
kubectl create namespace support-triage
kubectl create secret generic ollama-credentials \
-n support-triage \
--from-literal=LLM_ENDPOINT=http://ollama.ollama.svc:11434
The classifier reads the ticket and outputs a structured JSON with category, priority, and summary.
# classifier-agent.yaml
spec:
model: qwen2.5:7b
prompt:
inline: |
You are a support ticket classifier. For every ticket you receive,
respond with ONLY a JSON object:
{
"category": "<billing|technical|bug|feature-request>",
"priority": "<low|medium|high|critical>",
"summary": "<one sentence summary>"
}
guardrails:
limits:
tokensPerCall: 500
timeoutSeconds: 30
500 tokens. A 30-second timeout. That's all a classifier needs.
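If you consume the classifier's output programmatically, don't trust a 7B model to always honor the contract. A minimal validation sketch - the field names and allowed values come from the prompt above; the helper function itself is hypothetical, not part of kubeswarm:

```python
import json

# Allowed values, taken from the classifier prompt above.
CATEGORIES = {"billing", "technical", "bug", "feature-request"}
PRIORITIES = {"low", "medium", "high", "critical"}

def parse_classification(raw: str) -> dict:
    """Parse and validate the classifier's JSON output.

    Raises ValueError if the model drifted from the contract,
    so the caller can retry or route the ticket to a human queue.
    """
    data = json.loads(raw)
    if data.get("category") not in CATEGORIES:
        raise ValueError(f"unexpected category: {data.get('category')!r}")
    if data.get("priority") not in PRIORITIES:
        raise ValueError(f"unexpected priority: {data.get('priority')!r}")
    if not isinstance(data.get("summary"), str) or not data["summary"]:
        raise ValueError("missing or empty summary")
    return data

result = parse_classification(
    '{"category": "technical", "priority": "high", '
    '"summary": "Blank white screen on dashboard login"}'
)
print(result["priority"])  # → high
```

A failed parse is a signal, not a crash: retry once with the same prompt, then fall back to your old keyword routing.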
The responder takes that classification and drafts an empathetic customer reply.
# responder-agent.yaml
spec:
model: qwen2.5:7b
prompt:
inline: |
You are a customer support agent. You receive a ticket with its
classification. Write a helpful, empathetic reply under 150 words.
guardrails:
limits:
tokensPerCall: 2000
timeoutSeconds: 60
Both agents reference the Secret you created earlier via infrastructure.envFrom -
see the full YAML files for the complete setup.
kubectl apply -f classifier-agent.yaml
kubectl apply -f responder-agent.yaml
kubectl get swarmagents -n support-triage
NAME MODEL REPLICAS READY AGE
ticket-classifier qwen2.5:7b 1 1 5s
ticket-responder qwen2.5:7b 1 1 5s
Two agents. Ready in seconds.
Building a multi-agent pipeline with SwarmTeam
Agents alone are just pods. A SwarmTeam wires them into a pipeline where one agent's output feeds into the next.
# support-team.yaml
spec:
roles:
- name: classifier
swarmAgent: ticket-classifier
- name: responder
swarmAgent: ticket-responder
pipeline:
- role: classifier
inputs:
ticket: "{{ .input.ticket }}"
- role: responder
dependsOn: [classifier]
inputs:
classification: "{{ .steps.classifier.output }}"
ticket: "{{ .input.ticket }}"
Think of it like a Makefile target with dependencies. The responder declares dependsOn: [classifier], so it waits for the classifier to finish. The {{ .steps.classifier.output }} template is how data flows between steps.
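The scheduling rule behind this is ordinary topological ordering: a step becomes runnable once everything in its dependsOn list has finished. A toy sketch of that rule - illustrative only, not kubeswarm's actual scheduler:

```python
def topo_order(steps):
    """Order pipeline steps so each runs after all of its dependsOn."""
    done, order = set(), []
    remaining = {s["role"]: s for s in steps}
    while remaining:
        # A step is ready when all of its dependencies have completed.
        ready = [r for r, s in remaining.items()
                 if set(s.get("dependsOn", [])) <= done]
        if not ready:
            raise ValueError("dependency cycle in pipeline")
        for role in ready:
            order.append(role)
            done.add(role)
            del remaining[role]
    return order

steps = [
    {"role": "responder", "dependsOn": ["classifier"]},
    {"role": "classifier"},
]
print(topo_order(steps))  # → ['classifier', 'responder']
```

Steps with no dependency edge between them could also run in parallel - that's the payoff of declaring the graph instead of scripting the order.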
Running a real support ticket through the pipeline
Here's a real ticket. Sarah's team can't access their dashboard, client demo in two hours.
# sample-run.yaml
spec:
teamRef: support-triage
input:
ticket: |
Subject: Can't access my dashboard since this morning
I've been using your platform for 3 months and everything was fine
until today. When I try to log in, I get a blank white screen after
entering my credentials. Tried Chrome, Firefox, incognito. Nothing.
This is blocking my entire team - we have a demo with a client
in 2 hours and need access to our analytics dashboard urgently.
Sarah Chen, Acme Corp - Enterprise Plan
kubectl apply -f sample-run.yaml
kubectl get swarmrun ticket-001 -n support-triage -w
NAME PHASE AGE
ticket-001 Pending 0s
ticket-001 Running 2s
ticket-001 Succeeded 20s
The classifier returns:
{
"category": "technical",
"priority": "high",
"summary": "Blank white screen on dashboard login from Chrome and Firefox"
}
The responder drafts:
Hi Sarah,
Thank you for reaching out and I'm sorry to hear about the issue on your dashboard. We've marked this as a high-priority task.
Could you please provide us with:
- Your exact version of Chrome/Firefox
- Any error messages shown (if any)
- A screenshot if possible
This will help us diagnose the problem faster. Rest assured, we are escalating this and will keep you updated.
Support Team
Category: correct. Priority: correct. Response: empathetic and actionable. From a 7B model running locally. Total cost: 1,612 tokens, $0.00.
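For scale, it's worth sanity-checking what the same 1,612 tokens would cost on a hosted API. The per-million-token rates and the prompt/completion split below are illustrative placeholders, not quotes - check your provider's current pricing:

```python
# Back-of-the-envelope cost of one ticket on a hypothetical paid API.
# Rates ($3 in / $15 out per million tokens) and the token split
# are assumptions for illustration; only the 1,612 total is from the run.
prompt_tokens, completion_tokens = 1_200, 412  # sums to 1,612
cost = prompt_tokens / 1e6 * 3.00 + completion_tokens / 1e6 * 15.00
print(f"${cost:.4f} per ticket, ${cost * 10_000:.2f} per 10k tickets")
```

Fractions of a cent per ticket either way - the point of the guardrails below isn't this pipeline's bill, it's the agent someone deploys next to it.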
AI agent guardrails: budget limits and model policies on Kubernetes
Your triage pipeline works. You deploy it for the team. Then three things happen:
- Someone deploys a "helper" agent using GPT-4 that burns through $200 in a day
- Another agent gets access to a shell tool and starts executing commands from ticket text
- Nobody knows which agent processed which ticket, or how many tokens it cost
This is where kubeswarm is different from a Python script. A SwarmPolicy enforces rules at the namespace level - before the pod even starts:
# budget-policy.yaml
spec:
enforcementMode: Enforce
limits:
maxDailyTokens: 500000
maxTokensPerCall: 2000
maxTimeoutSeconds: 120
tools:
deny:
- "*"
models:
allowed:
- "qwen*"
- "gpt-oss-*"
| Rule | What it does |
|---|---|
| maxDailyTokens: 500000 | Hard ceiling on daily token spend per namespace |
| maxTokensPerCall: 2000 | No single call burns more than 2K tokens |
| tools.deny: ["*"] | No tools at all - these agents read and write text, nothing else |
| models.allowed | Only approved models can be deployed |
Try to sneak in a different model:
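An offending manifest might look like this - the fields mirror the classifier agent above, and the filename is hypothetical:

# rogue-agent.yaml - rejected at admission, never scheduled
spec:
  model: gpt-4o
  prompt:
    inline: |
      You are a helpful assistant.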
Error from server (Forbidden): model "gpt-4o" is not allowed
by SwarmPolicy "support-budget" (allowed: qwen*, gpt-oss-*)
Rejected at the API level. Not at runtime. Not after it already spent your money. At admission time. The pod never starts.
Connecting agents to Jira, Zendesk, or Slack via MCP
In production the agent should create the ticket itself, not just draft a reply. That's what MCP tools are for. Add an MCP server to the responder agent:
# give the responder access to your ticketing system
spec:
tools:
mcp:
- name: jira
url: "http://jira-mcp-server.support-triage.svc:8080/sse"
You'll also need to update the SwarmPolicy - the policy we created earlier
denies all tools with tools.deny: ["*"]. Narrow the deny list to only block
what you don't want:
# updated policy - block shell and filesystem tools, allow everything else
spec:
tools:
deny:
- "shell/*"
- "filesystem/*"
Now the agent can call jira_create_issue as part of its response - classify the
ticket, draft the reply, and file it in Jira in one step. Any MCP-compatible server
works: Zendesk, Linear, PagerDuty, or your own internal API.
The pipeline stays the same. You just gave the agent a new tool.
What you end up with
| Metric | Value |
|---|---|
| Agents | 2 (classifier + responder) |
| Model | qwen2.5:7b (any model works) |
| Time per ticket | ~20 seconds |
| Tokens per ticket | 1,612 |
| Cost per ticket | $0.00 (local model) |
| Guardrails | Token budget, model allowlist, tool deny-all |
| Audit | Full token tracking per agent, per run |
All defined in YAML. All managed with kubectl. No custom orchestration code.
No vendor lock-in. No $400 surprise bills.
Cleanup
kubectl delete namespace support-triage
This removes all agents, teams, runs, and policies in the namespace.
Ready to try it? Grab the example files, check out the documentation for setup instructions, and the cookbook for more pipeline patterns.
kubeswarm is an open-source Kubernetes operator for managing AI agents. Check out the documentation or the cookbook for more patterns.