Benchmarked Graph-RAG vs. Graph-Free Multi-Hop RAG: The graph mostly bought us a massive rebuild bill, not accuracy.

We kept hitting the same wall building multi-hop RAG: the systems with the best accuracy (GraphRAG, HippoRAG 2, RAPTOR) all lean on a knowledge graph built offline. The numbers are great until your data changes, because every update means re-running an LLM indexing pass to rebuild the graph. For a corpus that moves daily (prices, filings, tickets, news), you're paying that rebuild cost constantly.

So we tested whether the graph is actually necessary. We ran a graph-free dense index with query-time orchestration instead (no graph, no GPU, every component behind a commodity API) against the graph-based systems on HotpotQA, 2WikiMultiHopQA, and MuSiQue.
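For anyone wondering what "query-time orchestration" can look like, here is a minimal sketch of one common shape: iterative retrieval, where each hop's evidence is folded into the next query. This is not our actual pipeline. `retrieve`, `multi_hop`, and the word-overlap scoring are hypothetical stand-ins; a real setup would use dense retrieval and have an LLM write the follow-up query.

```python
def tokenize(text):
    """Lowercase, strip basic punctuation, and split into a set of words."""
    return set(text.lower().replace("?", "").replace(".", "").split())


def retrieve(query, passages, seen):
    """Stand-in retriever: the unseen passage with the largest word overlap."""
    candidates = [p for p in passages if p not in seen]
    if not candidates:
        return None
    q = tokenize(query)
    return max(candidates, key=lambda p: len(q & tokenize(p)))


def multi_hop(question, passages, max_hops=2):
    """Retrieve one passage per hop, extending the query with what was found."""
    query, evidence = question, []
    for _ in range(max_hops):
        hit = retrieve(query, passages, evidence)
        if hit is None:
            break
        evidence.append(hit)
        # Assumption: a real system would ask an LLM for the follow-up query.
        # Here the retrieved passage is simply appended to the question.
        query = question + " " + hit
    return evidence


passages = [
    "The film Inception was directed by Christopher Nolan.",
    "Steven Spielberg was born in Cincinnati.",
    "Christopher Nolan was born in London.",
]
for hop, passage in enumerate(
    multi_hop("Where was the director of Inception born?", passages), start=1
):
    print(hop, passage)
```

The second hop is where the bridge entity matters: the question alone can't separate the Spielberg and Nolan passages, but once the first hop surfaces "Christopher Nolan" the follow-up query can.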

Against the graph systems, it won on all three benchmarks:

| Benchmark | MOTHRAG (ours) | GraphRAG | HippoRAG 2 | RAPTOR |
|---|---|---|---|---|
| HotpotQA | 78.1 | 68.6 | 75.5 | 69.5 |
| 2WikiMultiHop | 76.3 | 58.6 | 71.0 | 52.1 |
| MuSiQue | 50.5 | 38.5 | 48.6 | 28.9 |

And updates are just embed-and-append, with no rebuild and no retraining. Cost is ~$0.03/query on commodity APIs, no GPU anywhere.
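To make "embed-and-append" concrete, here is a rough sketch of an append-only dense index. It is an illustration under stated assumptions, not our production code: `embed` is a toy hashed bag-of-words stand-in for a commodity embedding API, and `AppendOnlyIndex` is a hypothetical name.

```python
import hashlib
import math

DIM = 256  # toy dimensionality; a real embedding API returns its own size


def embed(text: str) -> list[float]:
    """Stand-in embedder (hashed bag of words), NOT a real embedding model."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class AppendOnlyIndex:
    """Flat in-memory index: adding a document never touches existing entries."""

    def __init__(self):
        self.vectors = []
        self.docs = []

    def add(self, doc: str) -> None:
        # The whole update path: embed the new document and append it.
        self.vectors.append(embed(doc))
        self.docs.append(doc)

    def search(self, query: str, k: int = 3) -> list[str]:
        # Brute-force cosine similarity (vectors are already unit length).
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), d)
            for v, d in zip(self.vectors, self.docs)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [d for _, d in scored[:k]]
```

The point of the design is that `add` costs one embedding call per new document and leaves every existing vector untouched, whereas a graph index has to re-run its LLM indexing pass when the data changes.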

Against GPU-bound systems that use constrained decoding (NeocorRAG), it's not a clean win. We match them on HotpotQA (78.1 vs 78.3) and 2Wiki (76.3 vs 76.1), but we lose on MuSiQue (50.5 vs 52.6). MuSiQue is our weak spot (retrieval recall bottlenecks there), and we haven't solved it yet.

The takeaway for us: for multi-hop over changing data, the graph overhead mostly buys you a rebuild bill, not accuracy. A graph-free index with good query-time orchestration held up.

Curious where others landed on this: is the graph worth the rebuild cost for data that changes?
