Pipelines are straight lines. Your agents need to think in trees.
TL;DR - SwarmTeam gains a fourth orchestration mode: search. A planner agent proposes multiple approaches, executors work them in parallel, an evaluator scores each one, and weak branches get pruned. The search converges when a solution scores high enough or the tree hits one of its limits. BFS and BeamSearch strategies, fully observable via kubectl describe, OTel metrics, and an audit trail.
"Find the root cause of this production outage."
Hand that to a pipeline and here's what happens: step 1 checks logs, step 2 reads metrics, step 3 writes a report. One linear path. If the first hypothesis is wrong, the pipeline doesn't backtrack. It writes a confident report about the wrong thing.
Hand it to a human SRE and they do something completely different. They generate three hypotheses. Test the most likely one first. When it doesn't match the evidence, they drop it and go deeper on the second. They prune dead ends. They converge on the answer.
That's a tree search. And until now, you couldn't express it in Kubernetes.
The problem with "just use a pipeline"
kubeswarm already has three orchestration modes for SwarmTeam: pipeline (DAG), routed (LLM dispatch), and dynamic (agents self-organize). They cover most patterns. But they share a limitation.
| Mode | Execution path | Backtracking | Scoring |
|---|---|---|---|
| Pipeline | Fixed at YAML time | No | No |
| Routed | Single dispatch decision | No | No |
| Dynamic | Unstructured delegation | Ad-hoc | No |
| Search | Expands at runtime | Yes (pruning) | Yes (evaluator) |
Pipelines are predetermined. Routed mode makes one decision. Dynamic mode has no structure to guide exploration. None of them lets you say "explore three approaches, score each one, drop the worst, go deeper on the best."
Search mode: structured exploration
Search adds a fourth mode to SwarmTeam. You define three roles:
- Planner - decides what to explore next, what to prune, when to stop
- Executor - does the actual work on each branch
- Evaluator (optional) - scores executor output on a 0-1000 scale
The planner sees the current tree state and outputs structured actions: expand (create branches), prune (kill dead ends), converge (declare a winner). The operator applies these atomically, dispatches executors in parallel, collects scores, and loops back to the planner.
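The key step in that loop is applying a batch of planner actions atomically: either every action lands or none does. Here is a minimal Python sketch of that idea. The expand fields (parentNode, task, scoreMillis) match the action format in this post's planner prompt. The node field used by prune and converge, the Pending and Solution phase handling, and the deep-copy approach are assumptions for illustration, not kubeswarm's actual implementation.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    id: int
    depth: int
    task: str
    parent: Optional[int] = None
    phase: str = "Pending"
    score_millis: Optional[int] = None


@dataclass
class Tree:
    nodes: Dict[int, Node] = field(default_factory=dict)
    solution: Optional[int] = None


def apply_actions(tree: Tree, actions: List[dict]) -> Tree:
    """Apply one batch of planner actions all-or-nothing.

    Every action is applied to a deep copy; if any action is invalid,
    an exception escapes and the caller's tree is left untouched.
    """
    draft = copy.deepcopy(tree)
    for act in actions:
        kind = act.get("action")
        if kind == "expand":
            parent = draft.nodes.get(act.get("parentNode"))
            if parent is None or parent.phase == "Pruned":
                raise ValueError(f"cannot expand node {act.get('parentNode')!r}")
            new_id = max(draft.nodes, default=-1) + 1
            draft.nodes[new_id] = Node(
                id=new_id,
                depth=parent.depth + 1,
                task=act["task"],
                parent=parent.id,
                score_millis=act.get("scoreMillis"),  # planner self-score, if given
            )
        elif kind in ("prune", "converge"):
            target = draft.nodes.get(act.get("node"))  # "node" is a hypothetical field name
            if target is None:
                raise ValueError(f"unknown node {act.get('node')!r}")
            if kind == "prune":
                target.phase = "Pruned"
            else:
                target.phase = "Solution"
                draft.solution = target.id
        else:
            raise ValueError(f"unknown action {kind!r}")
    return draft
```

Working on a copy keeps the design simple: a half-valid batch (say, a prune followed by an expand from a node that does not exist) is rejected as a whole, so the planner never sees a tree that only partly reflects its decision.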
Here's the simplest version - a brainstorming search where the planner self-scores:
```yaml
# brainstorm-team.yaml
spec:
  roles:
    - name: planner
      model: claude-sonnet-4-20250514
      prompt:
        inline: |
          You receive the current search tree as JSON. Decide what to explore next.
          Respond with a JSON array of actions:
          [{"action": "expand", "parentNode": 0, "task": "...", "scoreMillis": 700}]
    - name: worker
      model: claude-sonnet-4-20250514
      prompt:
        inline: "Execute the given task thoroughly."
  search:
    strategy: BFS
    plannerRole: planner
    executorRole: worker
    initialPrompt: "{{ .input.problem }}"
    maxDepth: 3
    maxNodes: 15
    minScorePercent: 85
```
Two roles, eight lines of search config. The planner explores breadth-first and scores its own branches. As soon as a branch scores above 85%, the search stops.
BeamSearch: keep the best, prune the rest
BFS explores everything at each level. For bigger problems, that's expensive. BeamSearch keeps only the top K branches per depth level and prunes the rest.
This is where the evaluator role earns its keep. A separate model scores each branch objectively, so beam pruning decisions are based on independent assessment rather than the planner grading its own work.
```yaml
# root-cause-analyzer.yaml
spec:
  roles:
    - name: investigator
      model: claude-sonnet-4-20250514
    - name: tester
      model: claude-sonnet-4-20250514
    - name: judge
      model: claude-haiku-4-5-20251001  # cheap model for scoring
  search:
    strategy: BeamSearch
    plannerRole: investigator
    executorRole: tester
    evaluatorRole: judge
    initialPrompt: "{{ .input.incident }}"
    beamWidth: 3
    minScorePercent: 85
    maxDepth: 5
    maxNodes: 30
    maxParallel: 3
```
The investigator generates hypotheses. The tester checks each one against evidence. The judge scores how well the evidence supports the hypothesis. After each depth level, only the top 3 hypotheses survive. The rest get pruned.
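The beam step itself is small. Below is a sketch of per-depth top-K pruning over scored nodes. The (node_id, depth, score_millis) tuple shape and the tie-break on lower node id are assumptions made for the example; kubeswarm may order ties differently.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def beam_prune(scored: Iterable[Tuple[int, int, int]], beam_width: int) -> Set[int]:
    """Return the ids to prune so that at most beam_width nodes survive per depth.

    scored holds (node_id, depth, score_millis) for nodes that already have a score.
    Ties are broken by lower node id, an arbitrary but deterministic choice.
    """
    by_depth = defaultdict(list)
    for node_id, depth, score in scored:
        by_depth[depth].append((node_id, score))

    pruned: Set[int] = set()
    for level in by_depth.values():
        # Highest score first; lower id first among equal scores.
        ranked = sorted(level, key=lambda pair: (-pair[1], pair[0]))
        pruned.update(node_id for node_id, _ in ranked[beam_width:])
    return pruned


# Four hypotheses at depth 1, two at depth 2, beam width 3.
hypotheses = [(1, 1, 720), (2, 1, 310), (3, 1, 880), (4, 1, 720), (5, 2, 500), (6, 2, 600)]
print(sorted(beam_prune(hypotheses, beam_width=3)))  # [2]
```

Only the weakest depth-1 hypothesis is dropped here; depth 2 has fewer nodes than the beam width, so nothing there is pruned.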
Cost optimization is built in. The investigator runs once per iteration, so it can afford a stronger, more expensive reasoning model. The judge runs once per node, so a cheap, fast model fits. Most of the compute goes to testers running in parallel. With a small judge model, scoring can cost an order of magnitude less than execution, depending on the models and prompt sizes you pick.
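Taking the call pattern above at face value (planner once per iteration, executor and evaluator once per node), the cost split is simple arithmetic. This sketch assumes every counted node is executed and scored exactly once, with no evaluator retries. The per-call costs and the 10 iterations are made-up relative figures, not measured prices.

```python
from typing import Dict


def role_calls(iterations: int, nodes: int) -> Dict[str, int]:
    """Model calls per role: the planner runs once per iteration,
    the executor and evaluator once per node."""
    return {"planner": iterations, "executor": nodes, "evaluator": nodes}


def cost_by_role(iterations: int, nodes: int, cost_per_call: Dict[str, float]) -> Dict[str, float]:
    """Multiply call counts by an average cost per call for each role."""
    calls = role_calls(iterations, nodes)
    return {role: calls[role] * cost_per_call[role] for role in calls}


# Hypothetical relative costs: a judge call costs a tenth of a tester call.
costs = cost_by_role(
    iterations=10,
    nodes=30,
    cost_per_call={"planner": 10.0, "executor": 10.0, "evaluator": 1.0},
)
print(costs)  # {'planner': 100.0, 'executor': 300.0, 'evaluator': 30.0}
```

Because the evaluator makes as many calls as the executor, its per-call price sets the ratio directly; the planner's cost grows with iterations, not with tree size.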
The evaluator's JSON contract
The evaluator returns structured scores. No prompt engineering gymnastics - just a JSON object:
```json
{
  "scoreMillis": 720,
  "reasoning": "Correct approach but missing edge case handling",
  "shouldPrune": false,
  "metadata": {"evidence_strength": "moderate"}
}
```
scoreMillis is 0-1000 (milli-units, not milliseconds). shouldPrune lets the evaluator flag dead ends without waiting for the planner's next iteration. metadata carries domain-specific signals the planner can reason about.
If the evaluator returns garbage JSON, the operator retries up to maxEvaluatorRetries times, then marks the node EvalFailed. The planner sees the failure and decides whether to retry or move on. No silent data loss.
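Here is a sketch of the validate-and-retry step. The field names come from the JSON contract above. The function names, the call_evaluator callable, and the reading of maxEvaluatorRetries as "retries after a first attempt" are assumptions, not kubeswarm's code.

```python
import json
from typing import Callable, Optional, Tuple


def parse_evaluation(raw: str) -> dict:
    """Validate one evaluator reply against the JSON contract; raise ValueError if it breaks it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    score = data.get("scoreMillis")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(score, int) or isinstance(score, bool) or not 0 <= score <= 1000:
        raise ValueError(f"scoreMillis must be an integer in 0..1000, got {score!r}")
    if not isinstance(data.get("reasoning", ""), str):
        raise ValueError("reasoning must be a string")
    if not isinstance(data.get("shouldPrune", False), bool):
        raise ValueError("shouldPrune must be a boolean")
    if not isinstance(data.get("metadata", {}), dict):
        raise ValueError("metadata must be an object")
    return data


def evaluate_node(call_evaluator: Callable[[], str], max_retries: int) -> Tuple[str, Optional[dict]]:
    """Return ("Scored", result), or ("EvalFailed", None) after 1 + max_retries bad replies."""
    for _ in range(1 + max_retries):
        try:
            return "Scored", parse_evaluation(call_evaluator())
        except ValueError:
            continue  # malformed reply: ask the evaluator again
    return "EvalFailed", None
```

The node ends up in exactly one of two states, which is what lets the planner see the failure instead of silently losing a score.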
Six ways a search terminates
Every search is bounded. There's no way to create a runaway search.
```yaml
search:
  minScorePercent: 85  # a node scores above 85% -> converge
  maxDepth: 5          # tree gets 5 levels deep -> stop
  maxNodes: 30         # 30 nodes created -> stop
  maxIterations: 10    # planner invoked 10 times -> stop
```
Plus: the planner can explicitly converge at any time, and SwarmBudget hard-stops when the token budget runs out. First condition hit wins.
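A sketch of the stop check, run once per loop. The reason strings are hypothetical labels, the check order only matters when several conditions become true in the same iteration, and treating a score exactly at the threshold as converged is an assumption (the comments above say "above").

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchLimits:
    min_score_percent: int = 85
    max_depth: int = 5
    max_nodes: int = 30
    max_iterations: int = 10


@dataclass
class SearchState:
    best_score_millis: Optional[int]  # None until some node has been scored
    depth: int                        # deepest level created so far
    node_count: int
    iterations: int                   # planner invocations so far
    planner_converged: bool = False
    budget_exhausted: bool = False


def termination_reason(state: SearchState, limits: SearchLimits) -> Optional[str]:
    """Return why the search stops now, or None to keep looping.

    When several conditions hold at once, the order of checks below decides.
    """
    # minScorePercent is a percentage; scores are on a 0-1000 scale, so 85 -> 850.
    threshold_millis = limits.min_score_percent * 10
    if state.planner_converged:
        return "PlannerConverged"
    if state.best_score_millis is not None and state.best_score_millis >= threshold_millis:
        return "ScoreThreshold"
    if state.budget_exhausted:
        return "BudgetExhausted"
    if state.depth >= limits.max_depth:
        return "MaxDepth"
    if state.node_count >= limits.max_nodes:
        return "MaxNodes"
    if state.iterations >= limits.max_iterations:
        return "MaxIterations"
    return None
```

Note the unit conversion: minScorePercent: 85 corresponds to scoreMillis 850.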
When the search terminates, the highest-scoring node's output becomes the SwarmRun output. If nothing scored above minScorePercent, the run fails with SearchExhausted. No ambiguity about what happened or why.
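A sketch of that final selection. It assumes each node dict carries a hypothetical output field, that pruned nodes are never picked, and that a score exactly at the threshold counts as success.

```python
from typing import List, Tuple


def search_outcome(nodes: List[dict], min_score_percent: int) -> Tuple[str, str]:
    """Pick the run result from the final tree.

    Returns ("Succeeded", output_of_best_node) or ("Failed", "SearchExhausted").
    """
    threshold_millis = min_score_percent * 10  # percent -> 0-1000 scale
    candidates = [
        n for n in nodes
        if n.get("scoreMillis") is not None and n.get("phase") != "Pruned"
    ]
    if not candidates:
        return "Failed", "SearchExhausted"
    best = max(candidates, key=lambda n: n["scoreMillis"])
    if best["scoreMillis"] < threshold_millis:
        return "Failed", "SearchExhausted"
    return "Succeeded", best["output"]
```

Either branch yields a single, explicit result, which is the "no ambiguity" property described above.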
The tree is in your status
kubectl describe shows the full tree: every node, every score, every pruning decision. To pull the nodes out for scripting, query the status directly:
```bash
kubectl get swrun rca-demo -o jsonpath='{.status.searchTree.nodes}' | \
  jq -c '.[] | {id, depth, phase, scoreMillis, task}'
```

```
{"id":0,"depth":0,"phase":"Scored","scoreMillis":null,"task":"Investigate latency spike"}
{"id":1,"depth":1,"phase":"Scored","scoreMillis":720,"task":"Check database connections"}
{"id":2,"depth":1,"phase":"Pruned","scoreMillis":310,"task":"Check network partitions"}
{"id":3,"depth":2,"phase":"Solution","scoreMillis":920,"task":"Profile connection pool leaks"}
```
Node 2 was pruned (network theory disproved). Node 3 scored 920 and was selected as the solution. The planner explored, the evaluator scored, the operator pruned. All of it auditable.
OTel metrics track the search in real time: kubeswarm.search.best_score shows convergence progress, kubeswarm.search.nodes.pruned counts dead ends, and kubeswarm.search.stagnation_iterations warns you when the search is stuck.
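The post does not spell out how kubeswarm.search.stagnation_iterations is computed. One plausible reading, sketched here purely as an assumption, is the number of consecutive most recent iterations in which the best score did not strictly improve. The input is a hypothetical list of the best scoreMillis after each iteration.

```python
from typing import List, Optional


def stagnation_iterations(best_per_iteration: List[Optional[int]]) -> int:
    """Count trailing iterations where the best score did not strictly improve.

    best_per_iteration[i] is the best scoreMillis seen after iteration i,
    or None if nothing had been scored yet.
    """
    count = 0
    best_so_far: Optional[int] = None
    for best in best_per_iteration:
        if best is not None and (best_so_far is None or best > best_so_far):
            best_so_far = best
            count = 0  # progress: reset the streak
        else:
            count += 1  # no improvement this iteration
    return count


print(stagnation_iterations([None, 500, 720, 720, 720]))  # 2
```

A value like this could be exported as a gauge and alerted on once it approaches maxIterations, which is the "search is stuck" signal described above.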
When to use search vs pipeline
Use search when the problem has multiple valid approaches and you need to systematically find the best one. Root cause analysis, code generation with test validation, adversarial testing, prompt optimization.
Use pipeline when the steps are predetermined. Research -> fact-check -> write. ETL. Anything where the path is known upfront.
Use dynamic when agents should self-organize without scoring. Collaborative brainstorming where there's no "best answer" - just emergent output.
Search mode is not a replacement for pipelines. It's for the class of problems where the first answer is probably wrong and the value is in the exploration.
Try it
The cookbook has two examples: a minimal BFS brainstorm (recipe 18) and a full BeamSearch root cause analyzer (recipe 19). Both work with Ollama or any OpenAI-compatible API.
```bash
kubectl apply -f 18-search-brainstorm/team.yaml
kubectl get swrun brainstorm-demo -w
```
Watch the tree grow. Watch branches get pruned. Watch the search converge on an answer that a single linear pipeline could easily have missed.