Run AI agents on Kubernetes: a support triage pipeline with budget guardrails
TL;DR - Two AI agents classify support tickets and draft replies on Kubernetes. Runs on a local 7B model. Total cost per ticket: 1,612 tokens, $0.00 (compute costs for self-hosted inference still apply). A namespace policy blocks unauthorized models and tools at admission time. All YAML files are in the examples directory.
You already have ticket automation. Zendesk triggers, Intercom bots, maybe a custom rule engine your team built three years ago. It matches keywords, routes to queues, fires templates.
And it gets things wrong constantly.
A customer writes "I can't access my dashboard, we have a client demo in 2 hours" and your automation routes it to the billing queue because the word "account" appeared somewhere in the thread. The template replies with "we'll look into it within 24 hours." The customer's demo fails.
The problem isn't that you lack automation. It's that your automation can't read. It pattern-matches keywords. It doesn't understand that "demo in 2 hours" means this is critical, or that "blank white screen after login" is a bug, not a billing issue.
An LLM agent actually reads the ticket. It understands context, urgency, and intent. But running LLM agents in production creates new problems: cost control, model governance, audit trails, tool restrictions. That's the gap kubeswarm fills.
| | Rule-based automation | kubeswarm pipeline |
|---|---|---|
| Classification | Keyword matching | Reads and understands context |
| Priority | Static rules | Infers urgency from content |
| Responses | Templates | Contextual, specific to the issue |
| Changing logic | Edit flowcharts in admin UI | Change the prompt |
| Cost control | N/A | Token budgets enforced per namespace |
| Audit | Logs which rule fired | Tracks every token, every agent, every run |
Prerequisites
You need a Kubernetes cluster with kubeswarm installed. See the installation guide for setup.
Two agents, one pipeline, zero API cost
First, create the namespace and a Secret with your LLM endpoint:
kubectl create namespace support-triage
kubectl create secret generic ollama-credentials \
-n support-triage \
--from-literal=LLM_ENDPOINT=http://ollama.ollama.svc:11434
The classifier reads the ticket and outputs a structured JSON with category, priority, and summary.
# classifier-agent.yaml
spec:
model: qwen2.5:7b
prompt:
inline: |
You are a support ticket classifier. For every ticket you receive,
respond with ONLY a JSON object:
{
"category": "<billing|technical|bug|feature-request>",
"priority": "<low|medium|high|critical>",
"summary": "<one sentence summary>"
}
guardrails:
limits:
tokensPerCall: 500
timeoutSeconds: 30
500 tokens. A 30-second timeout. That's all a classifier needs.
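If you consume the classifier's output programmatically, don't trust a 7B model to always honor the contract. A minimal validation sketch - the field names and allowed values come from the prompt above; the helper function itself is hypothetical, not part of kubeswarm:

```python
import json

# Allowed values, taken from the classifier prompt above.
CATEGORIES = {"billing", "technical", "bug", "feature-request"}
PRIORITIES = {"low", "medium", "high", "critical"}

def parse_classification(raw: str) -> dict:
    """Parse and validate the classifier's JSON output.

    Raises ValueError if the model drifted from the contract,
    so the caller can retry or route the ticket to a human queue.
    """
    data = json.loads(raw)
    if data.get("category") not in CATEGORIES:
        raise ValueError(f"unexpected category: {data.get('category')!r}")
    if data.get("priority") not in PRIORITIES:
        raise ValueError(f"unexpected priority: {data.get('priority')!r}")
    if not isinstance(data.get("summary"), str) or not data["summary"]:
        raise ValueError("missing or empty summary")
    return data

result = parse_classification(
    '{"category": "technical", "priority": "high", '
    '"summary": "Blank white screen on dashboard login"}'
)
print(result["priority"])  # → high
```

A failed parse is a signal, not a crash: retry once with the same prompt, then fall back to your old keyword routing.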
The responder takes that classification and drafts an empathetic customer reply.
# responder-agent.yaml
spec:
model: qwen2.5:7b
prompt:
inline: |
You are a customer support agent. You receive a ticket with its
classification. Write a helpful, empathetic reply under 150 words.
guardrails:
limits:
tokensPerCall: 2000
timeoutSeconds: 60
Both agents reference the Secret you created earlier via infrastructure.envFrom -
see the full YAML files for the complete setup.
kubectl apply -f classifier-agent.yaml
kubectl apply -f responder-agent.yaml
kubectl get swarmagents -n support-triage
NAME MODEL REPLICAS READY AGE
ticket-classifier qwen2.5:7b 1 1 5s
ticket-responder qwen2.5:7b 1 1 5s
Two agents. Ready in seconds.
Building a multi-agent pipeline with SwarmTeam
Agents alone are just pods. A SwarmTeam wires them into a pipeline where one agent's output feeds into the next.
# support-team.yaml
spec:
roles:
- name: classifier
swarmAgent: ticket-classifier
- name: responder
swarmAgent: ticket-responder
pipeline:
- role: classifier
inputs:
ticket: "{{ .input.ticket }}"
- role: responder
dependsOn: [classifier]
inputs:
classification: "{{ .steps.classifier.output }}"
ticket: "{{ .input.ticket }}"
Think of it like a Makefile target with dependencies. The responder declares dependsOn: [classifier], so it waits for the classifier to finish. The {{ .steps.classifier.output }} template is how data flows between steps.
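The scheduling rule behind this is ordinary topological ordering: a step becomes runnable once everything in its dependsOn list has finished. A toy sketch of that rule - illustrative only, not kubeswarm's actual scheduler:

```python
def topo_order(steps):
    """Order pipeline steps so each runs after all of its dependsOn."""
    done, order = set(), []
    remaining = {s["role"]: s for s in steps}
    while remaining:
        # A step is ready when all of its dependencies have completed.
        ready = [r for r, s in remaining.items()
                 if set(s.get("dependsOn", [])) <= done]
        if not ready:
            raise ValueError("dependency cycle in pipeline")
        for role in ready:
            order.append(role)
            done.add(role)
            del remaining[role]
    return order

steps = [
    {"role": "responder", "dependsOn": ["classifier"]},
    {"role": "classifier"},
]
print(topo_order(steps))  # → ['classifier', 'responder']
```

Steps with no dependency edge between them could also run in parallel - that's the payoff of declaring the graph instead of scripting the order.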
Running a real support ticket through the pipeline
Here's a real ticket. Sarah's team can't access their dashboard, client demo in two hours.
# sample-run.yaml
spec:
teamRef: support-triage
input:
ticket: |
Subject: Can't access my dashboard since this morning
I've been using your platform for 3 months and everything was fine
until today. When I try to log in, I get a blank white screen after
entering my credentials. Tried Chrome, Firefox, incognito. Nothing.
This is blocking my entire team - we have a demo with a client
in 2 hours and need access to our analytics dashboard urgently.
Sarah Chen, Acme Corp - Enterprise Plan
kubectl apply -f sample-run.yaml
kubectl get swarmrun ticket-001 -n support-triage -w
NAME PHASE AGE
ticket-001 Pending 0s
ticket-001 Running 2s
ticket-001 Succeeded 20s
The classifier returns:
{
"category": "technical",
"priority": "high",
"summary": "Blank white screen on dashboard login from Chrome and Firefox"
}
The responder drafts:
Hi Sarah,
Thank you for reaching out and I'm sorry to hear about the issue on your dashboard. We've marked this as a high-priority task.
Could you please provide us with:
- Your exact version of Chrome/Firefox
- Any error messages shown (if any)
- A screenshot if possible
This will help us diagnose the problem faster. Rest assured, we are escalating this and will keep you updated.
Support Team
Category: correct. Priority: correct. Response: empathetic and actionable. From a 7B model running locally. Total cost: 1,612 tokens, $0.00.
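For scale, it's worth sanity-checking what the same 1,612 tokens would cost on a hosted API. The per-million-token rates and the prompt/completion split below are illustrative placeholders, not quotes - check your provider's current pricing:

```python
# Back-of-the-envelope cost of one ticket on a hypothetical paid API.
# Rates ($3 in / $15 out per million tokens) and the token split
# are assumptions for illustration; only the 1,612 total is from the run.
prompt_tokens, completion_tokens = 1_200, 412  # sums to 1,612
cost = prompt_tokens / 1e6 * 3.00 + completion_tokens / 1e6 * 15.00
print(f"${cost:.4f} per ticket, ${cost * 10_000:.2f} per 10k tickets")
```

Fractions of a cent per ticket either way - the point of the guardrails below isn't this pipeline's bill, it's the agent someone deploys next to it.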
AI agent guardrails: budget limits and model policies on Kubernetes
Your triage pipeline works. You deploy it for the team. Then three things happen:
- Someone deploys a "helper" agent using GPT-4 that burns through $200 in a day
- Another agent gets access to a shell tool and starts executing commands from ticket text
- Nobody knows which agent processed which ticket, or how many tokens it cost
This is where kubeswarm is different from a Python script. A SwarmPolicy enforces rules at the namespace level - before the pod even starts:
# budget-policy.yaml
spec:
enforcementMode: Enforce
limits:
maxDailyTokens: 500000
maxTokensPerCall: 2000
maxTimeoutSeconds: 120
tools:
deny:
- "*"
models:
allowed:
- "qwen*"
- "gpt-oss-*"
| Rule | What it does |
|---|---|
| maxDailyTokens: 500000 | Hard ceiling on daily token spend per namespace |
| maxTokensPerCall: 2000 | No single call burns more than 2K tokens |
| tools.deny: ["*"] | No tools at all - these agents read and write text, nothing else |
| models.allowed | Only approved models can be deployed |
Try to sneak in a different model:
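An offending manifest might look like this - the fields mirror the classifier agent above, and the filename is hypothetical:

# rogue-agent.yaml - rejected at admission, never scheduled
spec:
  model: gpt-4o
  prompt:
    inline: |
      You are a helpful assistant.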
Error from server (Forbidden): model "gpt-4o" is not allowed
by SwarmPolicy "support-budget" (allowed: qwen*, gpt-oss-*)
Rejected at the API level. Not at runtime. Not after it already spent your money. At admission time. The pod never starts.
Connecting agents to Jira, Zendesk, or Slack via MCP
In production the agent should create the ticket itself, not just draft a reply. That's what MCP tools are for. Add an MCP server to the responder agent:
# give the responder access to your ticketing system
spec:
tools:
mcp:
- name: jira
url: "http://jira-mcp-server.support-triage.svc:8080/sse"
You'll also need to update the SwarmPolicy - the policy we created earlier
denies all tools with tools.deny: ["*"]. Narrow the deny list to only block
what you don't want:
# updated policy - block shell and filesystem tools, allow everything else
spec:
tools:
deny:
- "shell/*"
- "filesystem/*"
Now the agent can call jira_create_issue as part of its response - classify the
ticket, draft the reply, and file it in Jira in one step. Any MCP-compatible server
works: Zendesk, Linear, PagerDuty, or your own internal API.
The pipeline stays the same. You just gave the agent a new tool.
What you end up with
| Metric | Value |
|---|---|
| Agents | 2 (classifier + responder) |
| Model | qwen2.5:7b (any model works) |
| Time per ticket | ~20 seconds |
| Tokens per ticket | 1,612 |
| Cost per ticket | $0.00 (local model) |
| Guardrails | Token budget, model allowlist, tool deny-all |
| Audit | Full token tracking per agent, per run |
All defined in YAML. All managed with kubectl. No custom orchestration code.
No vendor lock-in. No $400 surprise bills.
Cleanup
kubectl delete namespace support-triage
This removes all agents, teams, runs, and policies in the namespace.
Ready to try it? Grab the example files, check out the documentation for setup instructions, and the cookbook for more pipeline patterns.
kubeswarm is an open-source Kubernetes operator for managing AI agents. Check out the documentation or the cookbook for more patterns.