One Search, Both Halves

Fusing three indexes to search any codebase

The Problem

Every code-search tool is half-blind

grep finds exact text but not meaning. Embeddings find meaning but miss exact symbols. Nothing did both at enterprise scale.

OCTO Platform is a multi-index code-search engine (ramparte/OCTOPlatform) built to do both at once.

So does fusing indexes actually work? The eval says yes.

grep: exact symbols, blind to meaning
embeddings: meaning, blind to exact symbols
OCTO: fuses code vectors + keyword + LLM summaries

The Proof

The fused approach hits 88.0% overall

On a 542-question golden set run against the real Grafana codebase, OCTO's best config (summary embeddings + cross-encoder) landed 477 of 542 hits.

A big number only counts if the test behind it is honest.

88.0%

overall hit rate (477/542)

0.653

mean MRR

0.880

recall@10

542

questions in the golden set
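For readers unfamiliar with the three headline metrics, here is a minimal sketch of how they are commonly computed. It assumes one expected file per question and a top-10 cutoff, under which hit rate and recall@10 are the same quantity (consistent with the matching 88.0% and 0.880 above). The function names and toy data are illustrative and are not taken from the OCTO eval harness.

```python
def reciprocal_rank(ranked, expected):
    """Return 1/rank of the expected file, or 0.0 if it was not retrieved."""
    for rank, path in enumerate(ranked, start=1):
        if path == expected:
            return 1.0 / rank
    return 0.0


def score(results, k=10):
    """results: list of (ranked_paths, expected_path) pairs, one per question."""
    n = len(results)
    # A question counts as a hit when its expected file is in the top k.
    hits = sum(1 for ranked, expected in results if expected in ranked[:k])
    # MRR averages the reciprocal rank over all questions, misses included.
    mrr = sum(reciprocal_rank(ranked[:k], expected) for ranked, expected in results) / n
    return {"hits": hits, "hit_rate": hits / n, "mrr": mrr}


# Toy data with hypothetical paths; these are not Grafana results.
toy = [
    (["a.go", "b.go", "c.go"], "a.go"),  # rank 1, reciprocal rank 1.0
    (["a.go", "b.go", "c.go"], "b.go"),  # rank 2, reciprocal rank 0.5
    (["a.go", "b.go", "c.go"], "z.go"),  # miss, reciprocal rank 0.0
    (["x.go", "y.go"], "y.go"),          # rank 2, reciprocal rank 0.5
]
summary = score(toy)
print(summary)
```

Read this way, an MRR of 0.653 next to an 88.0% hit rate means some hits land below rank 1: if every hit were ranked first, MRR would equal the hit rate.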

Honest Eval

The 88% isn't inflated by missing files

542 real questions — 116 function + 384 class lookups + 42 comprehension — with 100% index coverage: every expected file was indexed, 0 missing across 11,196 files.

But one overall number can still hide where a tool goes blind.

100%

index coverage: expected file indexed for 542/542 questions

0

questions with a missing file

11,196

unique files in the index

500 / 42

retrieval / comprehension split
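A coverage check of this kind can be sketched in a few lines. The version below assumes each golden question names one expected file and that the index can list its file paths; the function name, field names, and toy paths are hypothetical and do not mirror the index_coverage field in the eval JSON.

```python
def index_coverage(expected_files, indexed_files):
    """Report how many expected files are actually present in the index."""
    indexed = set(indexed_files)
    expected = list(expected_files)
    # Any expected file absent from the index makes its question unanswerable.
    missing = [path for path in expected if path not in indexed]
    covered = len(expected) - len(missing)
    return {
        "covered": covered,
        "total": len(expected),
        "missing": missing,
        "coverage": covered / len(expected),
    }


# Toy data with hypothetical paths.
indexed = ["pkg/a.go", "pkg/b.go", "pkg/c.go"]
report_full = index_coverage(["pkg/a.go", "pkg/c.go"], indexed)
report_gap = index_coverage(["pkg/a.go", "pkg/zzz.go"], indexed)
print(report_full["coverage"], report_gap["missing"])
```

Running a check like this before scoring separates retrieval failures from indexing gaps, which is what the 0-missing figure is meant to establish.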

The Real Test

One number hides where a tool goes blind

A tool can ace exact-symbol lookups and flunk natural-language questions — or the reverse. Most tools pick a side. Winning both ends is the thing that actually matters.

OCTO's answer: route each query to the index that can see it.

How — Routing

A deterministic 5-pass heuristic routes each query

No LLM classifies the query. extract_symbol() runs five passes over it. If a pass finds a symbol, the query gets hybrid retrieval: reciprocal rank fusion (RRF) of the text and code-vector results. Everything else gets semantic search plus a keyword re-rank.
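Reciprocal rank fusion itself is simple enough to sketch. The version below is the textbook formulation, in which each document scores the sum of 1 / (k + rank) over the lists it appears in. It is not the repo's rrf_fuse; the function name, the k = 60 constant (a conventional default), and the file names are assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse_sketch(rankings, k=60):
    """Fuse ranked lists: score(d) = sum of 1 / (k + rank) across lists."""
    scores = defaultdict(float)
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword index and a code-vector index.
text_hits = ["a.go", "b.go", "c.go"]
vector_hits = ["b.go", "d.go", "a.go"]
fused = rrf_fuse_sketch([text_hits, vector_hits])
print(fused)
```

Because RRF uses only ranks, it needs no score normalization between the keyword and vector indexes, which is one reason it is a common choice for hybrid retrieval.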

Routing covers exact queries — but the semantic half still needed a lift.

1. Quoted symbols: findall-based extraction from the query
2. Keyword prefixes: "function", "class" and similar cues
3. "Where is X": locate-style natural phrasing
4. "implementation of X": find-implementation phrasing
5. Unquoted CamelCase: fallback symbol detection
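The five passes above can be sketched as an ordered chain of regular expressions. This is a guess at the shape of the heuristic, not the code in qdrant_eval_adapter.py: every pattern, the keyword list, the function names, and the route labels are assumptions made for illustration.

```python
import re

# Pass 1: symbols wrapped in backticks or double quotes.
QUOTED = re.compile(r'[`"]([A-Za-z_][\w.]*)[`"]')
# Pass 2: a keyword cue followed by an identifier (keyword list is assumed).
KEYWORD_PREFIX = re.compile(r"\b(?:function|func|class|method)\s+([A-Za-z_]\w*)", re.IGNORECASE)
# Pass 3: locate-style phrasing.
WHERE_IS = re.compile(r"\bwhere\s+is\s+(?:the\s+)?([A-Za-z_]\w*)", re.IGNORECASE)
# Pass 4: find-implementation phrasing.
IMPL_OF = re.compile(r"\bimplementation\s+of\s+(?:the\s+)?([A-Za-z_]\w*)", re.IGNORECASE)
# Pass 5: unquoted CamelCase with at least two capitalized parts.
CAMEL = re.compile(r"\b([A-Z][a-z0-9]+(?:[A-Z][a-z0-9]*)+)\b")


def extract_symbol_sketch(query):
    """Return the first symbol found by the ordered passes, or None."""
    quoted = QUOTED.findall(query)
    if quoted:
        return quoted[0]
    for pattern in (KEYWORD_PREFIX, WHERE_IS, IMPL_OF, CAMEL):
        match = pattern.search(query)
        if match:
            return match.group(1)
    return None


def route(query):
    """Symbol queries go to hybrid RRF; everything else goes to semantic search."""
    if extract_symbol_sketch(query) is not None:
        return "hybrid_rrf"
    return "semantic_plus_keyword_rerank"


print(route('Where is "DashboardService" defined?'))
print(route("How does alerting evaluate rules?"))
```

The ordering matters: explicit signals such as quotes are checked first, and the loosest pattern (CamelCase) runs last so it only acts as a fallback. A chain like this is deterministic and cheap, at the cost of occasionally treating an ordinary noun after "where is" as a symbol.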

How — Comprehension

LLM summaries were the biggest comprehension lever

Summary embeddings alone lifted natural-language accuracy from a 61.9% baseline to 76.2%. The eval stayed honest — it rejected ideas that hurt.

Together, routing plus summaries let one system win both halves.

+6 Summary search: 61.9% → 76.2% comprehension (26/42 → 32/42)
−4 Query expansion: hurt comprehension
−3 Chunk enrichment: hurt comprehension
≈ Term re-rank & wider window: neutral

The Both-Ends Win

One system wins both halves at once

87.1% on function lookup and 89.1% on class lookup for exact symbols, AND 81.0% on natural-language comprehension, for 88.0% overall. Neither grep nor embeddings could do that alone.

That's the pattern worth keeping.

Exact symbols

87.1%

function lookup (101/116)

89.1%

class lookup (342/384)

Natural language

81.0%

comprehension (34/42)

88.0%

overall (477/542)
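The per-type figures can be checked against the overall number with a few lines of arithmetic. The hit and question counts below are the ones reported above; the variable names are illustrative.

```python
# (hits, questions) per question type, as reported in the eval breakdown.
breakdown = {
    "function": (101, 116),
    "class": (342, 384),
    "comprehension": (34, 42),
}

# Per-type hit rate as a percentage, rounded to one decimal place.
rates = {name: round(100 * hits / total, 1) for name, (hits, total) in breakdown.items()}

# The overall rate is the pooled ratio, not the mean of the three rates.
overall_hits = sum(hits for hits, _ in breakdown.values())
overall_total = sum(total for _, total in breakdown.values())
overall_rate = round(100 * overall_hits / overall_total, 1)

print(rates, overall_hits, overall_total, overall_rate)
```

Because class lookups make up 384 of the 542 questions, the overall rate sits close to the class-lookup rate; the comprehension score moves it only slightly, which is why the per-type split is reported alongside it.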

The Pattern

Fuse indexes, route deterministically, prove it honestly

Fuse multiple indexes, route each query deterministically, and prove it on an honest eval that reports what hurt — not just what won. That's how you search any codebase without picking a side.

One search. Both halves.

Sources

Research Methodology

Data as of: April 21, 2026 · Working / Eval-verified · summary+cross-encoder not yet wired into production orchestrator

Primary sources: ramparte/OCTOPlatform (checked out at ~/dev/ANext/OCTOPlatform) and eval artifacts under ~/dev/ANext/eval-repos/grafana-eval-results. Every metric re-derived independently from command output.

Commands run:

Headline metrics: python3 -c "import json;d=json.load(open('eval_v2_coe_combined.json'));print(d['primary_full_set']['aggregate'])" → 477/542 = 88.0%, MRR 0.653, r@10 0.880
Type & tier breakdown from eval_v2_coe_combined.json → function 101/116, class 342/384, comprehension 34/42
Baseline vs. summary lever: eval_v2_full_baseline.json (26/42) vs eval_v2_coe_summary_search.json (32/42 = 76.2%)
Golden set counts: python3 -c "import json,collections;d=json.load(open('golden_combined.json'));print(len(d['questions']),...)" → 542 = 384 class + 116 function + 42 comprehension
Index coverage: index_coverage in eval JSON → 11,196 unique files, 0 missing, 100.0%
Routing & rerank code: grep -n 'def extract_symbol|def rrf_fuse|def keyword_rerank' qdrant_eval_adapter.py (:199 / :378 / :593)
Feature flags (what hurt): sed -n on .amplifier/eval-status-2026-04-21.md → query expansion −4, chunk enrichment −3, term re-rank and wider window both neutral
History & author: git log --format='%ci %h %s'; git shortlog -sne → 37 commits, Sam Schillace

Gaps & caveats: Only tested on Grafana (~11K files) — 100K+ scale is estimated, not measured. Summary embeddings run as an 11-file proof-of-concept, not yet wired into the production orchestrator. Session-volume figures from OCTO-PLATFORM-ASSESSMENT.md are doc-reported, not git-reproduced.

Primary contributor: Sam Schillace — 37 of 37 commits (100%).