DOSSIER · SYS.MARP / 01 · operational
> source: github.com/advay-sinha/Multiagentic-Research-Platform
Multi-Agentic Research Platform
Answers research questions only with claims it can verify — every stage traced, every citation grounded.
operational overview
MARP exists because a single LLM call cannot be audited: it answers, but it cannot show its work. The platform decomposes research into five specialized agents that run in sequence, iterates until the answer meets a confidence threshold or the loop hits its iteration cap, and returns every step traced and timed for inspection.
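A minimal sketch of what "traced and timed" could look like in Python. The names TraceEvent, RunTrace and stage are hypothetical, chosen for illustration; the repository's actual trace types may differ.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class TraceEvent:
    """One traced stage: which agent ran, how long it took, and a short note."""
    agent: str
    duration_ms: float
    detail: str = ""


@dataclass
class RunTrace:
    """Ordered record of every stage in a single run (hypothetical shape)."""
    events: list[TraceEvent] = field(default_factory=list)

    @contextmanager
    def stage(self, agent: str, detail: str = ""):
        # Time the wrapped block and record it even if the stage raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.events.append(TraceEvent(agent, elapsed_ms, detail))


trace = RunTrace()
with trace.stage("planner", "build retrieval plan"):
    plan = ["sub-question 1"]  # placeholder for real planner work
with trace.stage("retriever", "vector search"):
    chunks = []  # placeholder for real retrieval

for event in trace.events:
    print(f"{event.agent}: {event.duration_ms:.3f} ms ({event.detail})")
```

Recording in a finally block means a failing stage still leaves a timed entry behind, which is the point of an audit trail.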
architecture
The Planner turns the question into a structured retrieval plan of typed PlanStep objects, each pairing a sub-question with a search query. The Retriever runs cosine-similarity search against PostgreSQL with pgvector, using embeddings generated through Gemini's embedContent API, and returns ranked chunks with source metadata and similarity scores. The Writer drafts from evidence, the Critic challenges the draft, and the Verifier checks claims before release; the Critic→Writer loop repeats until confidence clears the threshold or the iteration cap is reached. Every agent emits typed trace events.
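The retrieval contract can be sketched in plain Python. This is an in-memory stand-in, not the platform's code: the real Retriever delegates the search to pgvector, the vectors below are toy values rather than Gemini embeddings, and the field names on PlanStep and RankedChunk, along with rank_chunks itself, are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanStep:
    """One step of the retrieval plan (field names are assumed)."""
    sub_question: str
    search_query: str


@dataclass(frozen=True)
class RankedChunk:
    """A retrieved chunk with its source and similarity score (assumed shape)."""
    text: str
    source: str
    similarity: float


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query_embedding, store, top_k=3):
    """Score every (text, source, embedding) row and return the top_k, best first."""
    scored = [
        RankedChunk(text, source, cosine_similarity(query_embedding, embedding))
        for text, source, embedding in store
    ]
    return sorted(scored, key=lambda c: c.similarity, reverse=True)[:top_k]


# Toy store: three chunks with hand-written 3-d vectors.
store = [
    ("pgvector adds vector search to Postgres", "doc-a", [1.0, 0.0, 0.0]),
    ("Docker packages services", "doc-b", [0.0, 1.0, 0.0]),
    ("Postgres stores relational data", "doc-c", [1.0, 1.0, 0.0]),
]
step = PlanStep("What does pgvector do?", "pgvector vector search")
query_embedding = [1.0, 0.0, 0.0]  # stand-in for the embedded search_query
ranked = rank_chunks(query_embedding, store, top_k=2)
for chunk in ranked:
    print(f"{chunk.source}: {chunk.similarity:.3f}")
```

An empty store yields an empty list rather than an error, matching the Retriever behavior described under failure notes.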
constraints
- evidence grounding — no claim ships without a retrieval trail behind it
- bounded iteration — the critique loop must converge or stop at a hard cap, never spin
- LLM output fragility — structured JSON from a model cannot be assumed valid
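The bounded-iteration constraint can be sketched as a loop with two exits. The threshold of 0.8, the cap of 3, and the names critique_loop and LoopResult are illustrative assumptions; the source does not state the actual values or signatures.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class LoopResult:
    draft: str
    confidence: float
    iterations: int
    converged: bool  # False means the hard cap ended the loop


def critique_loop(
    write: Callable[[str], str],
    critique: Callable[[str], Tuple[float, str]],
    threshold: float = 0.8,   # assumed value
    max_iterations: int = 3,  # assumed value
) -> LoopResult:
    """Alternate Writer and Critic until confidence clears the bar or the cap is hit."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    feedback = ""
    draft, confidence = "", 0.0
    for iteration in range(1, max_iterations + 1):
        draft = write(feedback)
        confidence, feedback = critique(draft)
        if confidence >= threshold:
            return LoopResult(draft, confidence, iteration, converged=True)
    # Cap reached: return the last draft and say so, rather than spinning.
    return LoopResult(draft, confidence, max_iterations, converged=False)


# Stub agents: confidence improves on each pass.
scores = iter([0.4, 0.7, 0.9])


def fake_write(feedback: str) -> str:
    return "draft" if not feedback else f"draft revised for: {feedback}"


def fake_critique(draft: str) -> Tuple[float, str]:
    return next(scores), "cite a source for claim 2"


result = critique_loop(fake_write, fake_critique, threshold=0.8, max_iterations=5)
print(result.iterations, result.converged)
```

Returning converged alongside the draft lets downstream stages tell a confident answer from one that merely ran out of iterations.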
tradeoffs
- five single-responsibility agents over one omnibus prompt: more inference calls per question, but each stage emits typed traces and can be replaced without retraining the others
- loop-until-confident over single-pass answers: response latency deliberately spent on claim-level verification, bounded by a hard iteration cap so the spend cannot run away
- pgvector inside Postgres over a managed vector service: one database, one operational surface, one failure domain to observe
failure notes
- the Planner's JSON parsing can fail on malformed LLM output — it degrades to treating the raw output as a single search query rather than aborting the run
- the Retriever returns an empty list gracefully when the vector store has nothing — downstream stages handle absence of evidence as a first-class state
- the Retriever currently executes only the first PlanStep of a multi-step plan — a known limit, preserved in the trace rather than papered over
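The Planner's degrade-instead-of-abort behavior might look like the sketch below. The JSON shape (a list of objects with sub_question and search_query keys) and the function name parse_plan are assumptions, as is reusing the original question as the fallback sub-question.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanStep:
    sub_question: str
    search_query: str


def parse_plan(raw_output: str, question: str) -> list[PlanStep]:
    """Parse the model's plan; on any malformed output, fall back to one step."""
    try:
        data = json.loads(raw_output)
        steps = [
            PlanStep(str(item["sub_question"]), str(item["search_query"]))
            for item in data
        ]
        if steps:
            return steps
    except (json.JSONDecodeError, KeyError, TypeError):
        # Invalid JSON, missing keys, or the wrong top-level shape all land here.
        pass
    # Degraded mode: treat the raw output as a single search query.
    return [PlanStep(sub_question=question, search_query=raw_output.strip())]


good = parse_plan('[{"sub_question": "a", "search_query": "b"}]', "q")
bad = parse_plan("Sure! Here is a plan: pgvector indexing", "q")
print(good)
print(bad)
```

Catching TypeError as well as JSONDecodeError covers output that is valid JSON but not a list of objects, which is a common way for structured model output to go wrong.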
infrastructure
python · postgres + pgvector · gemini embeddings · typescript · docker
engineering reasoning
The interesting problem was never the model — it was how intelligence behaves under constraints: what pipeline shape makes an LLM's answer auditable instead of plausible. Single-responsibility stages with typed contracts and traces are boring, and boring is what can be debugged.
future work
- execute the full retrieval plan, not just its first step
- confidence calibration against held-out questions