2026-05-10 LangGraph Multi-agent Agentic AI Groq Human-in-the-loop Finance AI

Why I Made My AI Agents Argue: Adversarial Multi-Agent Design with LangGraph

Most multi-agent tutorials show agents cooperating: one researches, one writes, one reviews. The outputs are additive. Everyone agrees by the end.

That works fine for content generation. It fails badly for decisions.

When I built the finance-ai-agent, an autonomous investment committee that analyses stocks, I needed agents that would genuinely challenge each other, not just summarise the same data from different angles. The result was an adversarial architecture: sequential conflict with forced cross-examination, orchestrated by a LangGraph state machine with a hard human-in-the-loop gate before any report is generated.

Here is how it works, and why it surfaces insights that cooperative agents routinely miss.

The Problem With Parallel Summarisation

The naive multi-agent approach for investment analysis looks like this:

Agent A reads the balance sheet and writes a bull case
Agent B reads the same data and writes a bear case
Agent C reads both summaries and produces a verdict

This is parallel summarisation with a merge step. It sounds adversarial but it is not. Each agent reasons from the same raw data in isolation. Agent A never has to defend its bull case against Agent B's strongest objection. Agent B never has to confront the specific evidence that undermines the bear thesis.

The result: both sides present their best case on paper, the judge picks the more convincing one, and the gap in the losing argument is never examined. In production, those unexamined gaps are exactly where the bad calls live.

The Adversarial Design Pattern

The architecture runs sequentially, not in parallel. The order matters:

Planner Agent: retrieves live data from three sources, yfinance (price, financials), SEC EDGAR (filings), and Tavily (recent news), and builds a shared fact base that all agents operate from.
Bull Analyst: reads the fact base and constructs the investment thesis.
Bear Analyst: reads the fact base and the Bull's full argument. Must identify the single strongest objection to the bull case, not a general bear thesis.
Bull Rebuttal: reads the Bear's specific objection and responds to it directly. Cannot introduce evidence that was not in the original thesis.
Risk Auditor: independent of the debate. Scores five risk dimensions: valuation, liquidity, macro exposure, regulatory, execution. Returns a structured scorecard.
CIO Judge: reads the full debate transcript (Bull thesis, then Bear objection, then Bull rebuttal) plus the risk scorecard. Delivers a verdict with position sizing, stop-loss level, and confidence score.
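
The Risk Auditor's structured scorecard could be modelled roughly as below. This is a minimal sketch, not the project's actual schema: the five dimensions come from the list above, but the class name, the field names, and the 1 to 5 scale are assumptions made for illustration.

```python
from dataclasses import dataclass, asdict

# The five dimensions named in the article. The 1-5 scale is an assumption.
RISK_DIMENSIONS = ("valuation", "liquidity", "macro_exposure", "regulatory", "execution")


@dataclass(frozen=True)
class RiskScorecard:
    """Hypothetical scorecard: one integer score per risk dimension (1 = low, 5 = high)."""
    valuation: int
    liquidity: int
    macro_exposure: int
    regulatory: int
    execution: int

    def __post_init__(self) -> None:
        # Reject out-of-range scores so a malformed LLM response fails loudly.
        for name in RISK_DIMENSIONS:
            score = getattr(self, name)
            if not 1 <= score <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {score}")

    def highest_risks(self) -> list[str]:
        """Return the dimension(s) with the highest score, in declaration order."""
        scores = asdict(self)
        worst = max(scores.values())
        return [name for name in RISK_DIMENSIONS if scores[name] == worst]
```

Validating the scores at construction time means the CIO Judge never reads a scorecard with a missing or nonsensical dimension.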

The key constraint is the Bear Analyst's instruction: attack the Bull's strongest claim, not the weakest one. Cherry-picking weak arguments is easy. Dismantling the core thesis is where the model has to actually think.
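
One way to encode that constraint is in how the Bear's prompt is assembled. The sketch below is hypothetical: the function name and the prompt wording are mine, not the project's. What it illustrates is that the Bull's thesis is an input to the Bear, alongside the shared fact base.

```python
def build_bear_prompt(fact_base: str, bull_thesis: str) -> str:
    """Assemble the Bear's instructions so the Bull's argument is part of the input.

    The wording is illustrative only; the real prompt may differ.
    """
    return "\n\n".join([
        "You are the Bear Analyst on an investment committee.",
        "Identify the single strongest claim in the bull thesis below and "
        "raise your single strongest objection to that claim. "
        "Do not write a general bear case and do not attack side points.",
        f"FACT BASE:\n{fact_base}",
        f"BULL THESIS:\n{bull_thesis}",
    ])
```

Because the thesis is embedded in the prompt, the Bear cannot be run before the Bull has finished, which is what forces the sequential order.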

Why Sequential Order Matters

In the parallel version, the Bear writes independently. It attacks whatever the data suggests is most vulnerable — which is often a side point, not the load-bearing assumption of the bull case.

In the sequential version, the Bear has read the Bull's actual argument. It knows which claim is central. It cannot ignore it. The forced cross-examination means the debate converges on the thing that actually matters for the investment decision.

When I backtested the verdicts against real 30-day price outcomes, the sequential adversarial model consistently surfaced valuation and liquidity risks that the parallel model summarised away. The difference was not in the raw data, since both versions saw the same numbers. It was in the obligation to respond.
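
For readers who want to run a similar check, here is a rough sketch of scoring a verdict against a 30-day price move. The article does not describe the actual scoring method, so everything here is an assumption: the "buy" / "sell" / "hold" labels, the 2% band that separates a hold from a directional call, and the function names.

```python
def thirty_day_return(entry_price: float, exit_price: float) -> float:
    """Fractional price change between the verdict date and 30 days later."""
    return (exit_price - entry_price) / entry_price


def verdict_was_right(verdict: str, entry_price: float, exit_price: float,
                      hold_band: float = 0.02) -> bool:
    """Check a verdict against the realised move (labels and band are assumptions)."""
    change = thirty_day_return(entry_price, exit_price)
    if verdict == "buy":
        return change > hold_band
    if verdict == "sell":
        return change < -hold_band
    if verdict == "hold":
        return abs(change) <= hold_band
    raise ValueError(f"unknown verdict: {verdict!r}")


def hit_rate(records: list[tuple[str, float, float]]) -> float:
    """Share of (verdict, entry_price, exit_price) records that were right."""
    if not records:
        raise ValueError("no records to score")
    hits = sum(verdict_was_right(v, entry, exit_) for v, entry, exit_ in records)
    return hits / len(records)
```

A hit rate only says whether the call was right. Comparing which risks each architecture raised still means reading the transcripts.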

The LangGraph State Machine

LangGraph was the right tool here because the workflow is a conditional directed graph, not a chain. Each node is an agent. The edges carry the accumulated state — fact base, thesis, objection, rebuttal, risk scorecard — forward to the next node.

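from langgraph.graph import StateGraph, END

# Each node is one agent function that receives the shared AnalysisState.
graph = StateGraph(AnalysisState)
graph.add_node("planner",    planner_agent)
graph.add_node("bull",       bull_analyst)
graph.add_node("bear",       bear_analyst)
graph.add_node("rebuttal",   bull_rebuttal)
graph.add_node("risk",       risk_auditor)
graph.add_node("cio",        cio_judge)
graph.add_node("human_gate", human_approval)
# generate_pdf_report is an assumed name for the report node that the approval branch targets.
graph.add_node("pdf_report", generate_pdf_report)

# Fixed sequential order: thesis, objection, rebuttal, then risk audit and verdict.
graph.set_entry_point("planner")
graph.add_edge("planner",  "bull")
graph.add_edge("bull",     "bear")
graph.add_edge("bear",     "rebuttal")
graph.add_edge("rebuttal", "risk")
graph.add_edge("risk",     "cio")
graph.add_edge("cio",      "human_gate")
# route_approval returns "approved" or "rejected"; only approval reaches the report node.
graph.add_conditional_edges("human_gate", route_approval,
    {"approved": "pdf_report", "rejected": END})
graph.add_edge("pdf_report", END)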

The AnalysisState TypedDict accumulates every agent output. Each agent receives the full state and appends its contribution — no agent operates from a blank slate.
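
The shape of that state might look something like the sketch below. The field names are guesses based on the pieces of state listed earlier (fact base, thesis, objection, rebuttal, risk scorecard), and bull_analyst_stub is a stand-in that skips the LLM call. The plain-dict merge only mimics how a node's returned keys are folded into the shared state.

```python
from typing import TypedDict


class AnalysisState(TypedDict, total=False):
    """Hypothetical field names; total=False because nodes fill the keys in over time."""
    ticker: str
    fact_base: str
    bull_thesis: str
    bear_objection: str
    bull_rebuttal: str
    risk_scorecard: dict[str, int]
    verdict: str


def bull_analyst_stub(state: AnalysisState) -> AnalysisState:
    """Stand-in node: reads the full state and returns only the key it contributes."""
    # A real node would call the LLM here; a placeholder string keeps the sketch runnable.
    thesis = f"Bull thesis for {state['ticker']} based on: {state['fact_base']}"
    return {"bull_thesis": thesis}


def apply_update(state: AnalysisState, update: AnalysisState) -> AnalysisState:
    """Merge a node's partial update into a copy of the accumulated state."""
    return {**state, **update}
```

Each node returns only its own contribution, so earlier outputs such as the fact base stay intact as the state moves through the graph.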

The Human-in-the-Loop Gate

After the CIO Judge delivers the verdict, the graph pauses at human_gate. No PDF report is generated without explicit user approval.

This is not just UX polish. It is an architectural commitment: autonomous agents should inform decisions, not make them unilaterally. The PDF report is the artefact that gets shared — potentially acted upon. Requiring human sign-off before it exists creates a clear accountability boundary between the AI recommendation and the human decision.

In the Streamlit interface, the gate surfaces as a dialogue: approve the verdict, reject it, or request a new analysis. The graph branches accordingly.
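
A routing function for the gate can be very small. The version below is a sketch under assumptions: it supposes the interface writes the user's choice into a human_decision key in the state (a hypothetical name), and it returns the two labels used in the conditional-edge mapping shown earlier. How the graph pauses while waiting for the user is not covered here.

```python
def route_approval(state: dict) -> str:
    """Map the user's choice to a branch label; anything but explicit approval is rejected.

    "human_decision" is an assumed key name, not taken from the project.
    """
    decision = str(state.get("human_decision", "")).strip().lower()
    return "approved" if decision == "approve" else "rejected"
```

Failing closed matters here: a missing or unrecognised decision ends the run instead of producing a report. Supporting the "request a new analysis" option would need a third label and a matching entry in the edge mapping.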

The Fallback Pattern: Groq to GPT-4o

The system runs on Groq Llama 3.3 by default — fast and cheap for the debate steps. If Groq rate-limits or errors under load, the orchestrator automatically retries the same prompt against GPT-4o.

The fallback is transparent to the user. The CIO verdict tags which model produced it, so the output is always traceable — important when scoring verdicts against real market outcomes.
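
The retry-and-tag behaviour can be sketched independently of either vendor's SDK. In the example below the two models are passed in as plain callables, so no real API is assumed. The function name, the model labels, and the shape of the returned dictionary are all illustrative.

```python
from typing import Callable


def call_with_fallback(prompt: str,
                       primary: Callable[[str], str],
                       fallback: Callable[[str], str],
                       primary_name: str = "groq-llama-3.3",
                       fallback_name: str = "gpt-4o") -> dict:
    """Try the primary model; on any error, retry the same prompt on the fallback.

    The result records which model answered so the output stays traceable.
    """
    try:
        return {"text": primary(prompt), "model": primary_name}
    except Exception as exc:
        # Any primary failure (rate limit, timeout, API error) triggers the fallback.
        return {
            "text": fallback(prompt),
            "model": fallback_name,
            "fallback_reason": repr(exc),
        }
```

Catching every exception keeps the sketch short. A production version would more likely catch only rate-limit and transient errors, so that genuine bugs are not hidden behind the fallback.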

When to Use This Pattern

The adversarial sequential pattern is overkill for most tasks. Use it when:

The decision has real consequences (financial, medical, legal, hiring)
Confirmation bias is a known failure mode — you want the system to try to break its own thesis
You need an audit trail showing the reasoning was challenged, not just produced
A cooperative agent would naturally converge on agreement because the data is ambiguous

For content generation, data extraction, or summarisation — use cooperative agents. Adversarial debate is slow, token-expensive, and unnecessary when there is no genuine tension in the problem.

What I Would Change

If I rebuilt this today, I would add a fourth debate turn: the Bear gets a final response to the Bull's rebuttal. One rebuttal round surfaces the core tension; a second round reveals whether the Bull's defence holds under pressure or just restates the original thesis with more confidence.

I would also score the debate quality separately from the verdict: a judge that rates how well each side engaged with the opposition's argument, not just which thesis was more convincing on its own terms.

Stack: LangGraph, Groq Llama 3.3, OpenAI GPT-4o (fallback), yfinance, SEC EDGAR, Tavily, Streamlit, ReportLab. Source: github.com/iampriyabrat14/finance-ai-agent
