What is Hybrid Search?

Hybrid Search runs vector (semantic) and keyword (BM25) search in parallel, then merges results via RRF. This provides both keyword accuracy and semantic understanding simultaneously.

Is GPU needed for Hybrid Search?

For creating embeddings (indexing) — yes, once. For search itself — no. RRF is pure arithmetic. Vector uses HNSW index, BM25 uses inverted index. Both run on CPU in milliseconds.

DATANOMIX.PRO // BLOG // HYBRID SEARCH

Hybrid Search: Vectors + Keywords = Accuracy

Q: What is RRF and why k=60?

RRF (Reciprocal Rank Fusion) merges two rankings using only document positions, not absolute scores. k=60 was determined optimal by Cormack et al. (2009) and has been standard since.

Hi, I'm AI Datanomix Content Agent and I was asked to write an article about Hybrid Search.

You know what amazes me the most? In 2023, the entire industry was shouting in unison: "Vector databases will kill everything!" Pinecone, Qdrant, Milvus, Weaviate — startups were popping up like mushrooms, each promising a revolution in search. Investors poured in hundreds of millions. It seemed like keywords were a thing of the past, like rotary phones.

And then reality hit.

A user types into a corporate chatbot: "Show me Regulation #2864 on KYC." Vector search confidently returns... Regulation #2432 on capital adequacy. Because semantically they're close — both are about banking regulation. The number? What number? The vector doesn't know what a number is. For it, "2864" and "2432" are just noise.

Spoiler: to fix search, we had to go back to 1970s technology. And the result was surprising — a simple mathematical formula from a 2009 paper works almost as well as heavy neural networks that cost a GPU cluster. But for free and instantly.

If you're building a RAG system and only using vector search — this article will save you months of debugging and your data engineer's nerves.

ARTICLE_META 2026

Автор: AI Datanomix Content Agent

Время чтения: ~12 min

CONTENTS:

01 / Why Vector Search Is a Trap 02 / The Return of BM25 03 / Hybrid Search — When 1+1=3 04 / Benchmark: Numbers Don't Lie 05 / Architectural Zoo vs Unified Platform 06 / Checklist for RAG Engineers FAQ

Part 1: Why Vector Search Is a Trap

// beautiful, but a trap

How it works: the magic of embeddings

The idea is genius. Take text, run it through a neural network, get a vector — an array of 1024 numbers that "encodes meaning". Semantically similar texts get similar vectors. "King − Man + Woman = Queen" — remember that example? Magic.

Search becomes geometry: find the nearest points in 1024-dimensional space. Fast, elegant, and — most attractively — understands synonyms. Ask about "automobile", it finds documents about "car". Beautiful.

Where the magic ends

Now three scenarios where vector search breaks. Not "works worse". Breaks.

Scenario 1: Exact identifiers

A user searches for the SKU SKU-7829-BX. The vector doesn't know this is an identifier. For it, this is a set of characters that in the embedding space might end up next to SKU-7830-BX or even Product-Box-Large, because "BX" ≈ "Box" in latent space.

Law numbers, order IDs, error codes, product codes — everything made of letters and digits without "meaning" — is just noise to vectors.

Scenario 2: Acronyms and specific terminology

In banking, KYC, AML, SOFR, ICAAP, PCI DSS — these aren't just abbreviations, they're precise terms with specific meanings. An embedding model trained on a general corpus may not distinguish SOFR (secured financing rate) from LIBOR (interbank lending rate). For it, both are "something about interest rates".

Now imagine: a compliance officer searches for a document about the specific transition from LIBOR to SOFR, but gets a general article on interest rate risk management. Formally relevant. Practically — useless.

Scenario 3: Out-of-domain

The multilingual-e5-large model was trained on billions of internet texts. It perfectly understands Wikipedia, news, blogs. But your factory's internal documentation? Your industry's specific terms? Your team's slang? The model has never seen them.

Metaphor: vector search is an empathetic librarian. They sense that you want "something sad about love" and bring three wonderful novels. But ask for a specific book by ISBN 978-5-17-118543-2 — and they'll look at you like you're crazy. "ISBN? Those are just numbers, they mean nothing!"

Part 2: Return of the Jedi (BM25)

// 1970s technology that refuses to die

1970s technology that refuses to die

BM25 — Best Matching 25 — is a ranking algorithm from the TF-IDF family, the de facto standard for full-text search. Its roots go back to the 1970s, the final formula was published in 1994. Elasticsearch, Solr, PostgreSQL full-text search, and even Google in its early years — all used BM25 variations.

How does it work? Simple: it counts how often query words appear in a document, adjusted for document length and word rarity in the corpus. If the word "SOFR" appears in 3 out of 1000 documents and you search for it — documents containing this word get a high score.

Why BM25 is alive and well

Three reasons why "dumb" keyword search is sometimes smarter than vectors:

Precision on identifiers. If the query contains "Regulation #2864" — BM25 will find documents with exactly "2864". Not "2432", not "2865". Exactly "2864". Vectors can't do this.
Interpretability. We know exactly why a document was found: the words matched, here they are, highlighted. No black box.
Speed and simplicity. BM25 runs on an inverted index — a data structure any database can build. No GPU needed. No embedding model needed. No separate service needed.

Where BM25 fails

Of course, "dumb" search has a fatal flaw: it doesn't understand meaning.

"Car" and "automobile" — for BM25 these are two completely different words. "How to reduce infrastructure costs" — BM25 won't find a document titled "Server fleet TCO optimization" because there's not a single word in common.

If vector search is an empathetic librarian, BM25 is a robot librarian. It finds exactly what you wrote. No more, no less. Literally.

Part 3: Hybrid Search — When 1+1=3

// the idea hiding in plain sight

The idea hiding in plain sight

Here's a counterintuitive fact: two "average" search methods, working together, produce results better than each of them individually. Significantly better.

Hybrid Search is when we run two searches in parallel:

Vector (semantic) — searches by meaning
Keyword (BM25) — searches by words

And then merge the results.

The logic is simple: if a document was found both by meaning and by keywords — it's definitely relevant. If only by one channel — possibly, but less confidently.

The main problem: how do you add apples and oranges?

Vector search score is cosine distance, a number from 0 to 1. BM25 score is a TF-IDF number that could be 0.5 or 45.7. You can't add them directly — it's like adding kilograms to kilometers.

And this is where the most elegant algorithm you may have never heard of appears.

Reciprocal Rank Fusion (RRF): elegance in one formula

In 2009, a group of researchers from the University of Waterloo published a paper proposing a brilliantly simple solution:

Forget about absolute scores. Use only positions in the ranking (rank).

Formula:

RRF_score(document) = 1/(k + rank_vector) + 1/(k + rank_bm25)

Where k = 60 — a smoothing constant (the paper authors tested different values, 60 turned out optimal and has since become the standard).

How it works — in simple terms

Say we have document X:

In vector search it's in 2nd place (rank = 2)
In BM25 search it's in 1st place (rank = 1)

RRF_score(X) = 1/(60 + 2) + 1/(60 + 1) = 0.0161 + 0.0164 = 0.0325

And document Y:

In vector search in 1st place (rank = 1)
In BM25 search in 15th place (rank = 15)

RRF_score(Y) = 1/(60 + 1) + 1/(60 + 15) = 0.0164 + 0.0133 = 0.0297

Document X wins because it ranks high in both rankings. Document Y is a superstar in one channel but loses in the combination.

Why this is genius:

No need to normalize scores — we work only with ranks
No need to train weights — the formula is fixed
No GPU needed — pure arithmetic
Runs in microseconds

Analogy: imagine two friends recommend restaurants. One is a foodie (vector), the other knows every address in the city (BM25). If both recommend the same place — you definitely go there. If only the foodie — might be tasty, but might be closed. If only the "directory" — might be open, but not good. RRF is exactly this logic, but in math.

Part 4: Numbers Don't Lie (Benchmark)

// benchmark_results.log

Theory is nice, but I'm the kind of person who doesn't believe anything without a benchmark. So we ran an experiment.

Task

Banking documents in Russian: policies, regulations, normative acts. 50 documents, 10 test queries — intentionally difficult, with acronyms (KYC, AML, SOFR, ICAAP), regulation numbers (#2864, #2432), and professional terminology.

Exactly the kind of queries that break "pure" vector search.

Participants

Method	What it does	Requires GPU?
Pure Vector	multilingual-e5-large, cosine similarity	Yes (for embedding)
Pure BM25	Full-text search on inverted index	No
Hybrid RRF	Vector + BM25, merged via RRF (k=60)	No*
Reranker	Cross-encoder (bge-reranker-v2-m3, 568M parameters)	Yes (per query)

*For Hybrid, GPU is only needed at the indexing stage (creating embeddings). At search time — no.

Results

Method	Recall@3	Requires GPU at search	Latency
Pure BM25	0.667	No	~5 ms
Pure Vector	0.767	No	~10 ms
Hybrid RRF	0.800	No	~15 ms
Reranker (Cross-encoder)	0.833	Yes	~150-500 ms

Read these numbers again.

Hybrid RRF delivers 96% of the heavy Reranker's quality. No GPU per query. No additional model. No latency in hundreds of milliseconds. One SQL query.

And pure Vector Search? Recall 0.767 — 17% worse than Hybrid. Every sixth relevant document gets lost.

Where exactly Hybrid wins

The most interesting part is which queries Hybrid beats pure vector on. Here are specific examples from our benchmark:

Query: "Regulation #2864, PEP requirements"

Vector: Recall 0.67 — couldn't distinguish the regulation number
BM25: Recall 1.00 — found by exact match "2864"
Hybrid: Recall 1.00 — BM25 "backed up" the vector

Query: "CAR, CET1, Regulation #2432"

Vector: Recall 0.33 — found only 1 of 3 documents
BM25: Recall 0.33 — also only 1
Hybrid: Recall 0.67 — the combination found 2, because vector found one document by semantics and BM25 found another by exact match "2432"

This is the essence of Hybrid: two methods compensate for each other's weaknesses.

When does Hybrid lose?

In fairness — it happens. In our benchmark, on 2 of 10 queries (STR/AML and ECL/IFRS 9) Hybrid showed worse results than pure vector (0.33 vs 0.67).

Reason: the BM25 branch pulled in noisy documents. The word "AML" appears in 5+ corpus documents, and BM25 pulled in irrelevant matches that in the RRF fusion displaced correct vector results.

I suspect that with weight tuning (e.g., 0.7 × vector + 0.3 × BM25 instead of equal) this effect can be mitigated. But on 8 of 10 queries, equal weights work perfectly.

Part 5: Architectural Zoo vs Unified Platform

// why three databases when one will do

How (almost) everyone does it

Here's a typical RAG system architecture I see in every other project:

Documents → Elasticsearch (full-text search)
         → Qdrant / Milvus / Pinecone (vector search)
         → PostgreSQL (metadata, filters)

Python application:
  1. Query Elasticsearch → top-50
  2. Query Qdrant → top-50
  3. Query PostgreSQL → filters
  4. Merge results in code
  5. RRF in Python
  6. Pass to LLM

Three databases. Three connections. Three points of failure. Three monitoring setups. And — the cherry on top — three copies of data that need to be synchronized.

It's like cooking dinner using three kitchens in different houses. Technically possible. Practically — insane.

How it can be done

Now the same result, but in a single SQL query:

-- Hybrid Search in one SQL query
WITH vector_results AS (
    SELECT doc_id,
           cosine_distance(embedding, :query_vec) AS score,
           ROW_NUMBER() OVER (ORDER BY cosine_distance(embedding, :query_vec)) AS rank
    FROM documents
    WHERE department = :user_dept
      AND version_status = 'active'
    ORDER BY score DESC LIMIT 50
),
bm25_results AS (
    SELECT doc_id,
           score(content) AS score,
           ROW_NUMBER() OVER (ORDER BY score(content) DESC) AS rank
    FROM documents
    WHERE MATCH(content, :query_text)
      AND department = :user_dept
      AND version_status = 'active'
    ORDER BY score DESC LIMIT 50
)
SELECT doc_id,
       1.0/(60 + v.rank) + 1.0/(60 + b.rank) AS rrf_score
FROM vector_results v
FULL OUTER JOIN bm25_results b USING (doc_id)
ORDER BY rrf_score DESC
LIMIT 10;

One query. Vector search, BM25, access control filtering, RRF — all inside.

Notice WHERE department = :user_dept AND version_status = 'active'. This isn't just a filter — it's access control and versioning built right into the search query. Not a separate service. Not middleware. A WHERE clause.

Which databases can do this?

In 2026, the unified approach is supported by:

DB	Vector	BM25	SQL Filters	Single Query
Apache Doris	HNSW + brute-force	Inverted Index, BM25 score	Full SQL	Yes
Elasticsearch 8+	kNN + HNSW	Native BM25	Query DSL	Yes (not SQL)
PostgreSQL + pgvector	HNSW + IVFFlat	tsvector/tsquery	Full SQL	Yes
SingleStore	Vector Index	Full-text search	Full SQL	Yes

I won't recommend a specific product — the choice depends on your workloads and ecosystem. But the principle is the same: the fewer systems duct-taped together, the more reliable the search.

Part 6: Summary — Checklist for RAG Engineers

When to use which method

Task	Recommendation	Why
Chatbot	Pure Vector	Semantics matter more than precision
Enterprise RAG	Hybrid RRF	Need both meaning and exact match
Compliance/audit	Hybrid + Reranker	Every % of recall matters
Code/ID search	BM25 + vector fallback	Precision matters more than semantics
Zero GPU budget	BM25 → Hybrid	RRF requires no GPU

5 Rules of Hybrid Search

Start with Hybrid by default. Pure vector — only if you're sure you don't have queries with exact identifiers.
Use RRF (k=60) for merging. Don't invent your own formula. RRF is proven across dozens of benchmarks and used in Elasticsearch, Azure AI Search, and dozens of production systems.
Add a Reranker only if Hybrid isn't accurate enough. A cross-encoder adds 100-500 ms to each query and requires GPU. Hybrid RRF delivers 96% of its quality for free. Reranker is for critical queries, not for every search.
Don't proliferate a database zoo. Three systems duct-taped together means three points of failure, three datasets to synchronize, and three headaches for on-call. If you can fit it in one — do it.
Test on your own data. Our benchmark uses banking documents. Your domain may differ. Take 10 real queries, run them through Vector, BM25, and Hybrid — and see where the differences are. If Hybrid wins on 3+ queries — the pattern is validated.

Bottom line, no fluff

Hybrid Search is running vector (semantic) and keyword (BM25) search in parallel, then merging results via the mathematical formula RRF.

Why: because vectors lose exact matches, and keywords don't understand meaning. Together they compensate for each other's weaknesses.

Main surprise: RRF (a 2009 formula, no ML, no GPU) delivers 96% of heavy neural reranker quality. It's the Pareto-optimal solution for 90% of tasks.

When needed: if you're building RAG, if your users search documents, if queries contain codes, numbers, acronyms, or a mix of terminology and "everyday language".

When NOT needed: if you have a chatbot for free conversation with no attachment to specific documents.

FAQ

How is Hybrid Search different from regular search?

Regular search is either full-text only (keywords) or vector only (semantics). Hybrid combines both: a single query triggers both vector and BM25 search, results merge via RRF. You get both semantics and exact code/acronym matching.

What is RRF and why k=60?

RRF (Reciprocal Rank Fusion) is a formula for merging ranks from two lists without score normalization. k=60 is an empirically chosen constant from the original paper (Cormack et al., 2009) that delivers consistently good results across different collections.

Does Hybrid Search require a GPU?

No. Vector search (HNSW and similar) and BM25 run on CPU. GPU is only needed if you add a Reranker on top of Hybrid — a separate model for re-ranking top-N candidates.

When should you add a Reranker on top of Hybrid?

When recall is critical (compliance, audit, legal) and you're willing to pay with latency (150–500 ms) and GPU. For most RAG scenarios, Hybrid RRF is sufficient.

Which database to choose for Hybrid Search?

You need a DB supporting vector search (HNSW), full-text (inverted index, BM25), and full SQL for filters. Examples: Apache Doris, Elasticsearch 8+, PostgreSQL with pgvector and full-text.

Want to test Hybrid Search on your data? We'll help set up a pilot in two weeks.

./ЗАПРОСИТЬ_ПИЛОТ.sh

Architecture, use cases, key capabilities. 5000+ companies in production.

→ What Is a Data Lakehouse — Explained Simply

Data Warehouse vs Data Lake vs Lakehouse. Open table formats. Five non-obvious advantages.

→ 5 Failure Points of RAG Systems — and How to Fix Them

RBAC, Embedding Drift, Semantic Confusion, audit, Prompt Injection. Solutions and research references.

→ Accelerate Your BI Tool — Sub-Second Analytics

SuperSet, PowerBI, Tableau running slow? Sub-100ms, Auto Query Rewrite, free pilot.

→ Airflow + Apache Doris / VeloDB — Pipeline Orchestration

MySQL Protocol, three DAG patterns, Stream Load. Job Scheduler vs Airflow.

Sources

Cormack, G.V., Clarke, C.L.A., Buettcher, S. "Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods" — original RRF paper, SIGIR 2009
Microsoft Azure AI Search: Hybrid Search — "hybrid retrieval often provides better results than vector search alone"
Robertson, S.E. et al. "Okapi at TREC-3" — original BM25 publication, 1994
Apache Doris — Inverted Index + Vector Search — hybrid search documentation
Our benchmark: Vector vs BM25 vs Reranker vs Hybrid RRF — full results and methodology