Signal 02 · Autonomous research · report in preparation

A research synthesis, measured against the best.

Rhizome researches a question and writes up the result; we then compare that write-up side by side with leading systems on a rubric fixed before any run. We publish the prompt, the rubric, and every raw output, not just the verdict.

The problem

The first signal shows that Rhizome can build software. This one asks whether the same coordination generalizes beyond code: whether a field of agents can do open-ended research, and do it well enough to stand next to the strongest single systems.

The task: a structured synthesis of roughly 2,500–3,500 words on a deliberately hard question, "How autonomous AI agents verify their own work: the 2026 landscape", covering error detection, countering hallucinated progress, and gating real-world actions. It calls for synthesis, not fact lookup: a taxonomy of approaches, a comparative analysis, the open gaps, and a list of real, verifiable sources.

What makes the signal trustworthy is the comparison protocol. The same prompt goes to Rhizome, LLM Council, and ChatGPT Pro. The rubric (factual accuracy, source quality and reality, originality, synthesis depth) is fixed before any run, and every raw output is published, so the comparison is something you can audit rather than a claim you have to accept.

The mechanism on display is self-verification. The system does not trust its own prose; it runs the result through independent checks (source existence and relevance, originality), while a dissent agent keeps the field from collapsing into early consensus. To be explicit: the originality checks are a quality-control measure, never a way to evade AI detectors.

And the report will be honest about the result. Where Rhizome wins an axis, we show it; where it loses one, we show that too. A partial, verifiable win is more convincing than a clean sweep no one can check.

Method

The run pipeline, end to end: source search, multi-agent synthesis, a dissent agent pushing back against premature consensus, self-verification (source existence, relevance, originality), and finally a revision pass that acts on the findings.
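As a rough illustration of the stage order above, here is a minimal Python sketch. Every name in it (`Draft`, `run`, `PIPELINE`, and the stage functions) is hypothetical, and each stage is a stub that only records itself. What it shows is the shape: stages run in a fixed sequence, the dissent and self-verification stages emit findings, and revision consumes them. It is not Rhizome's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Draft:
    """State handed from stage to stage (hypothetical structure)."""
    text: str = ""
    sources: list[str] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)  # issues raised by dissent/verification
    trace: list[str] = field(default_factory=list)     # stage names, in execution order


Stage = Callable[[Draft], Draft]


def source_search(d: Draft) -> Draft:
    # Stub: a real stage would gather candidate sources here.
    d.trace.append("source_search")
    return d


def synthesis(d: Draft) -> Draft:
    # Stub: a real stage would have multiple agents draft the synthesis.
    d.trace.append("synthesis")
    return d


def dissent(d: Draft) -> Draft:
    # Stub for an agent that challenges the emerging consensus.
    d.findings.append("dissent: challenge the leading claim")
    d.trace.append("dissent")
    return d


def self_verification(d: Draft) -> Draft:
    # Stub for the existence, relevance, and originality checks.
    for check in ("existence", "relevance", "originality"):
        d.findings.append(f"verify:{check}")
    d.trace.append("self_verification")
    return d


def revision(d: Draft) -> Draft:
    # Revision consumes every accumulated finding, then clears the list.
    d.trace.append(f"revision({len(d.findings)} findings)")
    d.findings.clear()
    return d


PIPELINE: list[Stage] = [source_search, synthesis, dissent, self_verification, revision]


def run(pipeline: list[Stage], draft: Draft) -> Draft:
    """Apply each stage in order, threading the draft through."""
    for stage in pipeline:
        draft = stage(draft)
    return draft


if __name__ == "__main__":
    result = run(PIPELINE, Draft())
    print(result.trace)
```

Keeping the stages as a plain ordered list makes the sequence itself inspectable, which fits the page's emphasis on publishing the method rather than asserting it.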

lands with the run

Baselines

The same prompt sent to Rhizome, LLM Council, and ChatGPT Pro. Every raw output published in full — so the comparison is something you can re-check, not a verdict you have to take on faith.

lands with the run

Rubric

The four axes fixed before any run: factual accuracy, source quality and reality (do the sources exist, and are they relevant?), originality, and synthesis depth, each scored by a grading method stated openly.
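One way to make "fixed before any run" checkable is to freeze the four axes up front and reject any scorecard that adds, drops, or renames one. The sketch below assumes numeric per-axis scores where higher is better; the scale and grading method are assumptions, since the actual method is still to be published. The names `AXES`, `validate`, and `axis_winners` are hypothetical. `axis_winners` lists every system tied for the top score on each axis, which is what an honest per-axis comparison, partial wins included, needs.

```python
# The four rubric axes, stored as a tuple so the list cannot be mutated in place.
AXES: tuple[str, ...] = (
    "factual_accuracy",
    "source_quality_and_reality",
    "originality",
    "synthesis_depth",
)


def validate(scorecard: dict[str, float]) -> None:
    """Reject a scorecard whose axes differ from the fixed rubric."""
    if set(scorecard) != set(AXES):
        raise ValueError(
            f"scorecard axes {sorted(scorecard)} do not match the fixed rubric"
        )


def axis_winners(scores: dict[str, dict[str, float]]) -> dict[str, list[str]]:
    """For each axis, return every system with the top score.

    `scores` maps a system name to its scorecard. Assumes at least one
    system; exact ties keep all tied systems, sorted by name.
    """
    for card in scores.values():
        validate(card)
    winners: dict[str, list[str]] = {}
    for axis in AXES:
        best = max(card[axis] for card in scores.values())
        winners[axis] = sorted(
            name for name, card in scores.items() if card[axis] == best
        )
    return winners
```

Reporting winners per axis rather than a single total keeps a loss on one axis visible instead of averaging it away.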

lands with the run

Findings

The filled scorecard across all three systems, plus one or two documented episodes where dissent or self-verification caught a real weakness. Where Rhizome loses an axis, we show it: a transparent partial win beats a clean sweep you can't trust.

lands with the run

Reproducibility

The published article itself, the prompt, the rubric, the grading method, the source list with verification status, and run metadata — everything needed to reproduce the comparison.

lands with the run