
Pipelines are straight lines. Your agents need to think in trees.

Tags: orchestration, agents, deep-dive, architecture

TL;DR - SwarmTeam gains a fourth orchestration mode: search. A planner agent explores multiple approaches in parallel, an evaluator scores each one, and weak branches get pruned. The search converges when a solution scores high enough or the tree hits its budget. It ships with BFS and BeamSearch strategies and is fully observable via kubectl describe, OTel metrics, and the audit trail.


"Find the root cause of this production outage."

Hand that to a pipeline and here's what happens: step 1 checks logs, step 2 reads metrics, step 3 writes a report. One linear path. If the first hypothesis is wrong, the pipeline doesn't backtrack. It writes a confident report about the wrong thing.

Hand it to a human SRE and they do something completely different. They generate three hypotheses. Test the most likely one first. When it doesn't match the evidence, they drop it and go deeper on the second. They prune dead ends. They converge on the answer.

That's a tree search. And until now, you couldn't express it in Kubernetes.

The problem with "just use a pipeline"

kubeswarm already has three orchestration modes for SwarmTeam: pipeline (DAG), routed (LLM dispatch), and dynamic (agents self-organize). They cover most patterns. But they share a limitation.

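Mode     | Execution path           | Backtracking  | Scoring
---------|--------------------------|---------------|----------------
Pipeline | Fixed at YAML time       | No            | No
Routed   | Single dispatch decision | No            | No
Dynamic  | Unstructured delegation  | Ad-hoc        | No
Search   | Expands at runtime       | Yes (pruning) | Yes (evaluator)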

Pipelines are predetermined. Routed mode makes one decision. Dynamic mode has no structure to guide exploration. None of them lets you say "explore three approaches, score each one, drop the worst, go deeper on the best."

Search mode: structured exploration

Search adds a fourth mode to SwarmTeam. You define three roles:

  • Planner - decides what to explore next, what to prune, when to stop
  • Executor - does the actual work on each branch
  • Evaluator (optional) - scores executor output on a 0-1000 scale

The planner sees the current tree state and outputs structured actions: expand (create branches), prune (kill dead ends), converge (declare a winner). The operator applies these atomically, dispatches executors in parallel, collects scores, and loops back to the planner.

Here's the simplest version - a brainstorming search where the planner self-scores:

# brainstorm-team.yaml
spec:
  roles:
    - name: planner
      model: claude-sonnet-4-20250514
      prompt:
        inline: |
          You receive the current search tree as JSON. Decide what to explore next.
          Respond with a JSON array of actions:
          [{"action": "expand", "parentNode": 0, "task": "...", "scoreMillis": 700}]
    - name: worker
      model: claude-sonnet-4-20250514
      prompt:
        inline: "Execute the given task thoroughly."

  search:
    strategy: BFS
    plannerRole: planner
    executorRole: worker
    initialPrompt: "{{ .input.problem }}"
    maxDepth: 3
    maxNodes: 15
    minScorePercent: 85

Two roles, eight lines of search config. The planner explores breadth-first, scoring its own branches. When something scores above 85%, the search stops.
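To make the planner's action array concrete, here is a minimal Python sketch of how a controller might apply one planner response to the tree. It is an illustration, not the operator's code: the `Node` class, the `"Pending"` phase, and the `"node"` field on prune and converge actions are assumptions. Only the `expand` shape (`parentNode`, `task`, `scoreMillis`) comes from the prompt above.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    id: int
    depth: int
    task: str
    parent: Optional[int] = None
    phase: str = "Pending"              # "Pending" is a made-up phase name
    score_millis: Optional[int] = None  # planner self-score, 0-1000


def apply_actions(tree: dict, actions: list) -> Optional[int]:
    """Apply one planner response all-or-nothing.

    Returns the winning node id if the planner converged, else None.
    """
    staged = copy.deepcopy(tree)  # mutate a copy so a bad action changes nothing
    winner = None
    for act in actions:
        kind = act["action"]
        if kind == "expand":
            parent = staged[act["parentNode"]]
            node_id = max(staged) + 1
            staged[node_id] = Node(
                node_id, parent.depth + 1, act["task"], parent.id,
                score_millis=act.get("scoreMillis"),
            )
        elif kind == "prune":
            staged[act["node"]].phase = "Pruned"    # "node" field is assumed
        elif kind == "converge":
            winner = act["node"]                    # "node" field is assumed
            staged[winner].phase = "Solution"
        else:
            raise ValueError(f"unknown action: {kind!r}")
    tree.clear()
    tree.update(staged)  # commit only after every action succeeded
    return winner
```

Staging the changes on a copy is one simple way to get the all-or-nothing behavior: a malformed action raises before anything is committed, so the tree never ends up half-updated.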

BeamSearch: keep the best, prune the rest

BFS explores everything at each level. For bigger problems, that's expensive. BeamSearch keeps only the top K branches per depth level and prunes the rest.

This is where the evaluator role earns its keep. A separate model scores each branch objectively, so beam pruning decisions are based on independent assessment rather than the planner grading its own work.

# root-cause-analyzer.yaml
spec:
  roles:
    - name: investigator
      model: claude-sonnet-4-20250514
    - name: tester
      model: claude-sonnet-4-20250514
    - name: judge
      model: claude-haiku-4-5-20251001   # cheap model for scoring

  search:
    strategy: BeamSearch
    plannerRole: investigator
    executorRole: tester
    evaluatorRole: judge
    initialPrompt: "{{ .input.incident }}"
    beamWidth: 3
    minScorePercent: 85
    maxDepth: 5
    maxNodes: 30
    maxParallel: 3

The investigator generates hypotheses. The tester checks each one against evidence. The judge scores how well the evidence supports the hypothesis. After each depth level, only the top 3 hypotheses survive. The rest get pruned.
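The per-level pruning step can be sketched in a few lines of Python. This is an illustration under assumptions, not the operator's implementation: nodes are plain dicts using the `depth`, `phase`, and `scoreMillis` fields that appear in the run status, and ties are broken by input order.

```python
from collections import defaultdict


def beam_prune(nodes: list, beam_width: int) -> list:
    """Keep the top `beam_width` scored nodes at each depth; mark the rest Pruned."""
    by_depth = defaultdict(list)
    for node in nodes:
        # Only scored nodes compete for a slot in the beam.
        if node["phase"] == "Scored" and node["scoreMillis"] is not None:
            by_depth[node["depth"]].append(node)

    for level in by_depth.values():
        # Stable sort: nodes with equal scores keep their original order.
        level.sort(key=lambda n: n["scoreMillis"], reverse=True)
        for loser in level[beam_width:]:
            loser["phase"] = "Pruned"
    return nodes


hypotheses = [
    {"id": 1, "depth": 1, "phase": "Scored", "scoreMillis": 720},
    {"id": 2, "depth": 1, "phase": "Scored", "scoreMillis": 310},
    {"id": 3, "depth": 1, "phase": "Scored", "scoreMillis": 650},
    {"id": 4, "depth": 1, "phase": "Scored", "scoreMillis": 500},
]
beam_prune(hypotheses, beam_width=3)
```

With a beam width of 3, only the 310-scoring hypothesis is dropped. The other three stay in the tree for the next planner iteration.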

Cost optimization is built in. The investigator (an expensive reasoning model) runs once per iteration. The judge (a cheap, fast model) runs once per node. Most of the compute goes to testers running in parallel, so scoring can cost around a tenth of what execution does, depending on the models you choose.

The evaluator's JSON contract

The evaluator returns structured scores. No prompt engineering gymnastics - just a JSON object:

{
  "scoreMillis": 720,
  "reasoning": "Correct approach but missing edge case handling",
  "shouldPrune": false,
  "metadata": {"evidence_strength": "moderate"}
}

scoreMillis is 0-1000 (milli-units, not milliseconds), so 720 means 0.72, or 72%. shouldPrune lets the evaluator flag dead ends without waiting for the planner's next iteration. metadata carries domain-specific signals the planner can reason about.

If the evaluator returns garbage JSON, the operator retries up to maxEvaluatorRetries times, then marks the node EvalFailed. The planner sees the failure and decides whether to retry or move on. No silent data loss.
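The validate-then-retry behavior can be sketched in Python as follows. This is a hedged illustration rather than the operator's logic: `call_evaluator` is a hypothetical stand-in for the model call, treating every field except `scoreMillis` as optional is an assumption, and "retries up to maxEvaluatorRetries times" is read here as one initial attempt plus that many retries.

```python
import json
from typing import Callable, Optional


def parse_evaluation(raw: str) -> dict:
    """Validate one evaluator reply; raise ValueError if it breaks the contract."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("evaluator reply must be a JSON object")
    score = data.get("scoreMillis")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 1000:
        raise ValueError("scoreMillis must be an integer in 0..1000")
    return {
        "scoreMillis": score,
        "reasoning": str(data.get("reasoning", "")),
        "shouldPrune": bool(data.get("shouldPrune", False)),
        "metadata": data.get("metadata") or {},
    }


def evaluate_with_retries(
    call_evaluator: Callable[[], str], max_evaluator_retries: int
) -> Optional[dict]:
    """Return a valid evaluation, or None so the caller can mark the node EvalFailed."""
    for _ in range(1 + max_evaluator_retries):
        try:
            return parse_evaluation(call_evaluator())
        except ValueError:
            continue  # malformed reply: ask again until retries run out
    return None
```

Returning `None` instead of raising keeps the failure visible to the caller, which matches the idea that the planner gets to decide what happens to a node that could not be scored.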

Six ways a search terminates

Every search is bounded. There's no way to create a runaway search.

search:
  minScorePercent: 85     # a node scores above 85% -> converge
  maxDepth: 5             # tree gets 5 levels deep -> stop
  maxNodes: 30            # 30 nodes created -> stop
  maxIterations: 10       # planner invoked 10 times -> stop

Plus: the planner can explicitly converge at any time, and SwarmBudget hard-stops when the token budget runs out. First condition hit wins.

When the search converges, the highest-scoring node's output becomes the SwarmRun output. If the search hits a limit before any node scores above minScorePercent, the run fails with SearchExhausted. No ambiguity about what happened or why.
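The stop conditions can be written as a single check that runs after every planner iteration. The Python sketch below is illustrative only: the reason strings, the `SearchState` and `Limits` classes, the order in which simultaneous conditions are reported, and the conversion of minScorePercent to milli-units (85% as 850) are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    min_score_percent: int = 85
    max_depth: int = 5
    max_nodes: int = 30
    max_iterations: int = 10


@dataclass
class SearchState:
    best_score_millis: Optional[int] = None
    depth: int = 0
    nodes: int = 0
    iterations: int = 0
    planner_converged: bool = False
    budget_exhausted: bool = False


def stop_reason(state: SearchState, limits: Limits) -> Optional[str]:
    """Return a (hypothetical) reason name for the first stop condition met, else None."""
    threshold = limits.min_score_percent * 10  # percent -> milli-units (assumed)
    best = state.best_score_millis
    checks = [
        ("BudgetExhausted", state.budget_exhausted),
        ("PlannerConverged", state.planner_converged),
        ("ScoreReached", best is not None and best > threshold),
        ("MaxDepth", state.depth >= limits.max_depth),
        ("MaxNodes", state.nodes >= limits.max_nodes),
        ("MaxIterations", state.iterations >= limits.max_iterations),
    ]
    for name, hit in checks:
        if hit:
            return name
    return None
```

For example, `stop_reason(SearchState(best_score_millis=920, depth=2, nodes=4, iterations=2), Limits())` returns `"ScoreReached"`, while a state with 30 nodes and a best score of 720 returns `"MaxNodes"`.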

The tree is in your status

kubectl describe shows the full tree. Every node, every score, every pruning decision.

kubectl get swrun rca-demo -o jsonpath='{.status.searchTree.nodes}' | \
  jq -c '.[] | {id, depth, phase, scoreMillis, task}'
{"id": 0, "depth": 0, "phase": "Scored", "scoreMillis": null, "task": "Investigate latency spike"}
{"id": 1, "depth": 1, "phase": "Scored", "scoreMillis": 720, "task": "Check database connections"}
{"id": 2, "depth": 1, "phase": "Pruned", "scoreMillis": 310, "task": "Check network partitions"}
{"id": 3, "depth": 2, "phase": "Solution", "scoreMillis": 920, "task": "Profile connection pool leaks"}

Node 2 was pruned (network theory disproved). Node 3 scored 920 and was selected as the solution. The planner explored, the evaluator scored, the operator pruned. All of it auditable.

OTel metrics track the search in real time: kubeswarm.search.best_score shows convergence progress, kubeswarm.search.nodes.pruned counts dead ends, and kubeswarm.search.stagnation_iterations warns you when the search is stuck.
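As a rough illustration of what a stagnation signal measures, the Python sketch below counts how many consecutive planner iterations have passed without the best score improving. How kubeswarm actually computes kubeswarm.search.stagnation_iterations is not described here, so treat the definition and the function name as assumptions.

```python
def stagnation_iterations(best_scores: list) -> int:
    """Count trailing iterations in which the best score did not improve.

    `best_scores` holds the best scoreMillis observed after each planner
    iteration, oldest first.
    """
    best = None
    stagnant = 0
    for score in best_scores:
        if best is None or score > best:
            best = score
            stagnant = 0   # progress: reset the counter
        else:
            stagnant += 1  # no improvement this iteration
    return stagnant
```

A history of `[310, 720, 720, 720]` gives 2: the search improved on the second iteration and has been flat for the two since. A rising counter like this is the kind of signal you could alert on before the search burns through its node budget.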

When to use search vs pipeline

Use search when the problem has multiple valid approaches and you need to systematically find the best one: root cause analysis, code generation with test validation, adversarial testing, prompt optimization.

Use pipeline when the steps are predetermined. Research -> fact-check -> write. ETL. Anything where the path is known upfront.

Use dynamic when agents should self-organize without scoring. Collaborative brainstorming where there's no "best answer" - just emergent output.

Search mode is not a replacement for pipelines. It's for the class of problems where the first answer is probably wrong and the value is in the exploration.

Try it

The cookbook has two examples: a minimal BFS brainstorm (recipe 18) and a full BeamSearch root cause analyzer (recipe 19). Both work with Ollama or any OpenAI-compatible API.

kubectl apply -f 18-search-brainstorm/team.yaml
kubectl get swrun brainstorm-demo -w

Watch the tree grow. Watch branches get pruned. Watch the search converge on an answer that a pipeline would never have found.
