Knowledge base search strategies determine how relevant information is retrieved from your documents. Choosing the right strategy dramatically impacts accuracy, relevance, and performance.

Understanding Retrieval

The Retrieval Process

1. User Query
2. Query Embedding (convert to vector)
3. Search Strategy (semantic / hybrid / keyword)
4. Retrieve Top-K Candidates (e.g., 50 chunks)
5. (Optional) Reranking (refine to best 10)
6. Filter by Similarity Threshold
7. Return Final Results
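
For orientation, here is a minimal sketch of this pipeline in Python. The helpers (embed_query, vector_search, rerank_candidates) are hypothetical placeholders for whatever vector store and reranker you actually use:

def retrieve(query, top_k=10, rerank_top_n=50, threshold=0.7):
    # 1-2. Convert the query to an embedding vector
    query_vector = embed_query(query)                          # hypothetical helper
    # 3-4. Run the chosen search strategy and pull a candidate pool
    candidates = vector_search(query_vector, k=rerank_top_n)   # hypothetical helper
    # 5. (Optional) rerank the candidate pool with a stronger model
    candidates = rerank_candidates(query, candidates)          # hypothetical helper
    # 6. Drop anything below the similarity threshold
    candidates = [c for c in candidates if c.score >= threshold]
    # 7. Return the final top-k results
    return candidates[:top_k]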

Key Concepts

  • Chunk: A segment of document text (typically 256-1024 tokens)
  • Embedding: High-dimensional vector representation of text (e.g., 1536 dimensions)
  • Similarity: Measure of how close two embeddings are (cosine similarity, Euclidean distance)
  • Top-K: Number of chunks to retrieve (e.g., top 10 most similar)
  • Threshold: Minimum similarity score to include a result (e.g., 0.7)

Search Strategies

Semantic Search (Vector)

How it works: Compares the query embedding to document embeddings using vector similarity.
Process:
  1. Query: "What's the refund policy?"
  2. Embed query → [0.23, -0.15, 0.42, ...]
  3. Compare to all chunk embeddings in the KB
  4. Return the most similar vectors
Similarity Metrics:
  • Cosine Similarity: Angle between vectors (most common)
  • Euclidean Distance: Straight-line distance
  • Dot Product: Combined magnitude and direction
Strengths:
  • Understands semantic meaning, not just keywords
  • Finds conceptually similar content
  • Handles synonyms and paraphrasing
  • Cross-lingual search (with multilingual embeddings)
Weaknesses:
  • May miss exact term matches
  • Less effective for proper nouns, codes, IDs
  • Computationally intensive
  • Requires good embeddings
Best For:
  • Natural language queries
  • Conceptual searches (“how to improve performance”)
  • When you want related/similar content
  • Cross-lingual documents
Configuration:
search_mode: "semantic"
top_k: 10
similarity_threshold: 0.7
Example:
Query: "return policy"
Matches:
✓ "Refund Policy: Returns accepted within 30 days..."
✓ "Money-back guarantee for unsatisfied customers..."
✓ "Exchange or return items with receipt..."

Semantic understanding matches different wordings.
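
To make the vector comparison concrete, here is a minimal cosine-similarity sketch using NumPy; it assumes the query vector and a matrix of chunk embeddings have already been computed:

import numpy as np

def cosine_top_k(query_vec, chunk_matrix, k=10):
    # Normalize so that a dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    m = chunk_matrix / np.linalg.norm(chunk_matrix, axis=1, keepdims=True)
    scores = m @ q                       # similarity of the query to every chunk
    top = np.argsort(scores)[::-1][:k]   # indices of the k most similar chunks
    return [(int(i), float(scores[i])) for i in top]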

Keyword Search (BM25)

How it works: Traditional keyword matching scored with term frequency and inverse document frequency.
Process:
  1. Query: "SKU-12345"
  2. Tokenize: ["sku", "12345"]
  3. Find documents containing the terms
  4. Score by term frequency and term rarity
  5. Return the highest-scored documents
BM25 Scoring:
  • Term Frequency: More occurrences = higher score
  • Inverse Document Frequency: Rare terms score higher
  • Document Length Normalization: Penalize very long docs
Strengths:
  • Exact term matching (SKU-12345, error codes)
  • Fast and efficient
  • Doesn’t require embeddings
  • Works well for technical queries
Weaknesses:
  • No semantic understanding
  • Misses synonyms and paraphrasing
  • Word order doesn’t matter
  • Sensitive to exact spelling
Best For:
  • Product codes, SKUs, IDs
  • Technical error codes
  • Specific names (people, places)
  • When exact terms are critical
Configuration:
search_mode: "keyword"
top_k: 10
Example:
Query: "error E404-23B"
Matches:
✓ "Error E404-23B: Connection timeout..."
✓ "Troubleshooting guide for E404-23B..."

Semantic search might miss the exact code.
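
For illustration, a simplified in-memory BM25 scorer; real search backends use optimized inverted indexes, but the scoring idea is the same:

import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    # docs: list of token lists; returns one BM25 score per document
    N = len(docs)
    avg_len = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))   # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))   # rare terms score higher
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)      # length normalization
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores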

Hybrid Search (Semantic + Keyword)

How it works: Combines semantic and keyword search with weighted scoring.
Process:
  1. Query
  2. Semantic Search → Score A (0-1)
  3. Keyword Search → Score B (0-1)
  4. Final Score = (alpha × Score A) + ((1 - alpha) × Score B)
  5. Return combined results
Alpha Parameter:
  • 0.0: Pure keyword search (BM25 only)
  • 0.5: Balanced (50% semantic, 50% keyword)
  • 1.0: Pure semantic search (vector only)
Strengths:
  • Best of both worlds
  • Catches both semantic and exact matches
  • Flexible weighting
  • More robust than either alone
Weaknesses:
  • Slightly slower (two searches)
  • Requires tuning alpha parameter
  • More complex to understand results
Best For:
  • General-purpose search (default recommendation)
  • Mixed content (technical + natural language)
  • When you want comprehensive coverage
  • Unknown query types
Configuration:
search_mode: "hybrid"
alpha: 0.5  # Balanced
top_k: 20   # Retrieve more candidates
Example:
Query: "how to reset admin password"

Semantic matches:
✓ "Administrator account recovery procedures..."
✓ "Password reset instructions for users..."

Keyword matches:
✓ "Admin Password Reset: Step-by-step guide..."
✓ "Reset admin credentials with password command..."

Hybrid combines both sets.
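
A minimal sketch of the weighted combination, assuming both searches return scores already normalized to the 0-1 range and keyed by chunk ID:

def hybrid_score(semantic_score, keyword_score, alpha=0.5):
    # alpha=1.0 -> pure semantic, alpha=0.0 -> pure keyword
    return alpha * semantic_score + (1 - alpha) * keyword_score

def merge_results(semantic_hits, keyword_hits, alpha=0.5, top_k=10):
    # hits: dicts of chunk_id -> normalized score from each search
    combined = {}
    for chunk_id in set(semantic_hits) | set(keyword_hits):
        combined[chunk_id] = hybrid_score(
            semantic_hits.get(chunk_id, 0.0),
            keyword_hits.get(chunk_id, 0.0),
            alpha,
        )
    return sorted(combined.items(), key=lambda x: x[1], reverse=True)[:top_k]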
Recommended alpha by use case:

Use Case           Alpha   Reasoning
General Search     0.5     Balanced approach
Technical Docs     0.3     Favor exact terms (error codes, commands)
Help Articles      0.7     Favor semantic (natural language)
Product Catalog    0.3     Favor exact matches (SKUs, models)
Legal Documents    0.2     Exact term matching critical
Research Papers    0.7     Conceptual similarity important

Reranking

What is Reranking?

Purpose: Second-stage ranking that improves precision by reordering the top candidates.
Process:
  1. Initial search → 50 candidates
  2. Reranking model (ColBERT) → fine-grained scoring
  3. Top 10 reranked results
ColBERT Model:
  • Late-interaction architecture (Contextualized Late Interaction over BERT)
  • Compares query to each candidate passage
  • More computationally expensive but more accurate
  • Fine-grained token-level matching

When to Use Reranking

Enable reranking when:
  • Precision is critical (top-k must be highly relevant)
  • You can tolerate slightly slower queries (adds ~200-500ms)
  • Initial retrieval returns too many marginal results
  • Quality over quantity matters
Skip reranking when:
  • Speed is critical (real-time queries)
  • Good results from initial search
  • High volume, low-stakes queries
  • Cost optimization needed

Reranking Configuration

Basic Configuration:
search_mode: "hybrid"
alpha: 0.5
top_k: 10
rerank: true
rerank_top_n: 50
Parameters:
  • rerank: Enable/disable reranking
  • rerank_top_n: How many candidates to rerank (e.g., 50)
  • top_k: Final results after reranking (e.g., 10)
Rule of Thumb: rerank_top_n should be 3-10x top_k
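
A sketch of the retrieve-then-rerank flow. It uses a generic cross-encoder from the sentence-transformers package purely as a stand-in for the reranking step (the checkpoint name is an example, not the ColBERT reranker described above):

from sentence_transformers import CrossEncoder

def rerank(query, candidates, top_k=10, rerank_top_n=50):
    # candidates: list of (chunk_text, initial_score), sorted by the first-stage search
    pool = candidates[:rerank_top_n]
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example checkpoint
    scores = model.predict([(query, text) for text, _ in pool])
    reranked = sorted(zip(pool, scores), key=lambda x: x[1], reverse=True)
    return [(text, float(score)) for (text, _), score in reranked[:top_k]]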

Impact of Reranking

Before Reranking (semantic search, top-k=10):
1. Somewhat relevant (score: 0.75)
2. Highly relevant (score: 0.74)
3. Marginally relevant (score: 0.73)
4. Not relevant (score: 0.72)
5. Highly relevant (score: 0.71)
...
After Reranking (rerank top 50, final top-k=10):
1. Highly relevant (rerank score: 0.95)
2. Highly relevant (rerank score: 0.92)
3. Very relevant (rerank score: 0.88)
4. Relevant (rerank score: 0.82)
5. Somewhat relevant (rerank score: 0.78)
...
Reranking reorders results for better precision.

Retrieval Configuration

Top-K (Number of Results)

Definition: Number of chunks to retrieve.
Guidelines:
  • Small (3-5): Concise answers, low context
  • Medium (10-15): Balanced (recommended)
  • Large (20-50): Comprehensive coverage, more expensive
Trade-offs:
  • More chunks = More context but slower, more expensive
  • Fewer chunks = Faster, cheaper but may miss information
Recommendations:
  • Q&A Systems: 5-10 chunks
  • Research: 15-30 chunks
  • Summarization: 10-20 chunks

Similarity Threshold

Definition: Minimum similarity score required to include a result (0.0 to 1.0).
Guidelines:
  • Low (0.3-0.5): Inclusive, risk of irrelevant results
  • Medium (0.6-0.7): Balanced (recommended)
  • High (0.8-0.9): Strict, may miss relevant results
Recommendations by Use Case:
  • Customer Support: 0.6 (don’t miss potential answers)
  • Technical Docs: 0.7 (balance precision/recall)
  • Legal/Compliance: 0.8 (high precision required)
Dynamic Threshold:
# Adjust the threshold based on query complexity (word count)
def dynamic_threshold(query: str) -> float:
    if len(query.split()) < 5:
        return 0.7  # Short query: stricter matching
    return 0.6      # Longer query: more lenient matching

Chunk Size & Overlap

Chunk Size: Number of tokens per chunk.
Guidelines:
  • Small (256-512): Precise, more chunks needed
  • Medium (512-1024): Balanced (recommended)
  • Large (1024-2048): More context, fewer chunks
Chunk Overlap: Overlap between consecutive chunks.
Guidelines:
  • No overlap (0): Risk cutting off sentences
  • Small (50-100 tokens): Recommended
  • Large (200+ tokens): Redundancy
Example:
Chunk size: 512 tokens
Overlap: 50 tokens

Chunk 1: Tokens 1-512
Chunk 2: Tokens 463-974 (overlaps the last 50 tokens of chunk 1)
Chunk 3: Tokens 925-1436 (overlaps the last 50 tokens of chunk 2)
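
A minimal sketch of the sliding-window chunking illustrated above, operating on an already tokenized document:

def chunk_tokens(tokens, chunk_size=512, overlap=50):
    # Slide a window of chunk_size tokens, keeping `overlap` tokens from the previous chunk
    chunks, start = [], 0
    step = chunk_size - overlap
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        start += step
    return chunks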
Recommendations:
  • Technical Docs: Small chunks (256-512), precise retrieval
  • Articles/Reports: Medium chunks (512-1024)
  • Books: Large chunks (1024-2048), preserve context

Optimization Strategies

Strategy 1: Start with Hybrid + Reranking

Configuration:
search_mode: "hybrid"
alpha: 0.5
top_k: 10
rerank: true
rerank_top_n: 30
similarity_threshold: 0.65
Why: Best default for most use cases
  • Catches both semantic and exact matches
  • Reranking improves precision
  • Balanced performance

Strategy 2: Optimize for Speed

Configuration:
search_mode: "semantic"
# alpha: not used in semantic-only mode
top_k: 5
rerank: false
similarity_threshold: 0.7
Why: Fastest retrieval
  • Single search pass
  • Fewer results
  • No reranking overhead
Use when: High volume, real-time queries

Strategy 3: Optimize for Precision

Configuration:
search_mode: "hybrid"
alpha: 0.5
top_k: 5
rerank: true
rerank_top_n: 50
similarity_threshold: 0.75
Why: Highest quality results
  • Aggressive reranking
  • High threshold filters noise
  • Few but highly relevant results
Use when: Critical accuracy (legal, medical, compliance)

Strategy 4: Optimize for Recall

Configuration:
search_mode: "hybrid"
alpha: 0.5
top_k: 30
rerank: false
similarity_threshold: 0.5
Why: Comprehensive coverage
  • More results
  • Lower threshold
  • Less filtering
Use when: Research, exploratory search

Strategy 5: Technical/Exact Match

Configuration:
search_mode: "hybrid"
alpha: 0.2  # Favor keyword
top_k: 10
rerank: false
similarity_threshold: 0.6
Why: Prioritizes exact terms
  • Low alpha = more weight on keywords
  • Good for codes, IDs, commands
Use when: Technical documentation, product catalogs

Embedding Models

Model Selection

FastEmbed (Default):
  • Speed: Very fast
  • Quality: Good
  • Cost: Low
  • Use case: General-purpose
OpenAI Embeddings:
  • Speed: Moderate
  • Quality: Excellent
  • Cost: Higher
  • Use case: Best accuracy
Jina Embeddings:
  • Speed: Fast
  • Quality: Very good
  • Cost: Low
  • Use case: Multilingual documents
Cohere Embeddings:
  • Speed: Moderate
  • Quality: Excellent
  • Cost: Moderate
  • Use case: Enterprise, specialized domains
Multilingual Embedding Models (for non-English or mixed-language documents):
  • E5-multilingual: Supports 100+ languages
  • LaBSE: Google’s multilingual model
  • Jina-multilingual: Optimized for cross-lingual search
Configuration:
embedding_model: "multilingual-e5-large"
search_mode: "semantic"  # Semantic works best cross-lingual
Cross-Lingual Search:
  • Query in English, find Spanish documents
  • Query in French, find German documents
  • Requires multilingual embedding model
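
One possible way to produce such embeddings locally is the sentence-transformers package; this sketch assumes the multilingual-e5-large checkpoint, which expects "query: " and "passage: " prefixes:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-large")

# E5 models are trained with instruction prefixes on queries and passages
query_vec = model.encode("query: politique de remboursement", normalize_embeddings=True)
doc_vecs = model.encode(
    ["passage: Refunds are accepted within 30 days of purchase..."],
    normalize_embeddings=True,
)
# With normalized embeddings, the dot product is the cosine similarity
similarity = float((doc_vecs @ query_vec)[0])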

Folder Filtering

How Folder Filtering Works

Folder Structure:
Knowledge Base
├── /products/
│   ├── /hardware/
│   └── /software/
├── /policies/
│   ├── /hr/
│   └── /legal/
└── /support/
Filtering by Folder:
folder_filter: "/products"
Returns only chunks from documents in /products and its subfolders.
Benefits:
  • Faster: Smaller search space
  • More Relevant: Domain-specific results
  • Better Accuracy: Less noise
Use Cases:
  • Domain-specific queries (HR queries → /policies/hr/)
  • Multi-tenant KBs (customer1 → /customers/customer1/)
  • Department-specific search
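
For illustration only, a client-side post-filter that mimics folder scoping; in practice folder_filter restricts the search inside the engine itself (which is what makes it faster), and the search(...) call and result fields here are hypothetical placeholders:

def search_in_folder(query, folder, top_k=10):
    # Retrieve broadly, then keep only chunks whose document lives under `folder`
    results = search(query, top_k=top_k * 5)              # hypothetical search helper
    scoped = [r for r in results if r.document_path.startswith(folder)]
    return scoped[:top_k]

# Example: HR-related query scoped to the HR policies folder
hr_results = search_in_folder("parental leave policy", "/policies/hr")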

Monitoring & Tuning

Metrics to Track

Retrieval Metrics:
  • Average similarity score: Are results generally relevant?
  • Results per query: How many chunks returned?
  • Zero-result queries: Queries with no results
  • Query latency: How fast is retrieval?
Quality Metrics:
  • User feedback: Thumbs up/down on results
  • Click-through rate: Which results are used?
  • Answer accuracy: Does LLM generate correct answer?
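
A minimal sketch of computing the retrieval metrics from a simple query log; the log structure used here is an assumption, not a built-in format:

def retrieval_metrics(query_log):
    # query_log: list of dicts like {"scores": [0.82, 0.75, ...], "latency_ms": 120}
    if not query_log:
        return {}
    total = len(query_log)
    zero_result = sum(1 for q in query_log if not q["scores"])
    all_scores = [s for q in query_log for s in q["scores"]]
    return {
        "avg_similarity": sum(all_scores) / len(all_scores) if all_scores else 0.0,
        "avg_results_per_query": len(all_scores) / total,
        "zero_result_rate": zero_result / total,
        "avg_latency_ms": sum(q["latency_ms"] for q in query_log) / total,
    }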

A/B Testing Strategies

Test Different Configurations:
Test 1: Alpha Value
  • Group A: alpha=0.3 (favor keyword)
  • Group B: alpha=0.7 (favor semantic)
  • Measure: Relevance scores, user feedback
Test 2: Reranking
  • Group A: rerank=false
  • Group B: rerank=true, rerank_top_n=30
  • Measure: Precision, latency, user satisfaction
Test 3: Chunk Size
  • Group A: chunk_size=512
  • Group B: chunk_size=1024
  • Measure: Context quality, answer accuracy

Iterative Improvement

Step 1: Baseline
  • Start with default hybrid search
  • Collect metrics for 1 week
Step 2: Identify Issues
  • Too many irrelevant results? → Increase threshold or enable reranking
  • Missing exact matches? → Lower alpha (favor keyword)
  • Slow queries? → Reduce top_k or disable reranking
Step 3: Adjust & Test
  • Make one change at a time
  • A/B test against baseline
  • Measure impact
Step 4: Deploy & Monitor
  • Roll out winning configuration
  • Continue monitoring
  • Iterate regularly

Troubleshooting

“No Results Found”

Causes:
  • Threshold too high
  • Documents not indexed
  • Poor embedding quality
  • Folder filter too restrictive
Solutions:
  1. Lower similarity threshold (try 0.5)
  2. Check if documents indexed
  3. Try different search mode (keyword if semantic fails)
  4. Remove or broaden folder filter
  5. Verify query spelling

“Results Not Relevant”

Causes:
  • Threshold too low
  • Wrong search mode
  • Poor chunk size
  • Query too vague
Solutions:
  1. Increase similarity threshold (try 0.7-0.8)
  2. Enable reranking
  3. Try hybrid search (alpha=0.5)
  4. Adjust chunk size
  5. Refine query to be more specific

“Slow Queries”

Causes:
  • Large knowledge base
  • High top_k value
  • Reranking enabled
  • Large rerank_top_n
Solutions:
  1. Reduce top_k (try 5-10)
  2. Disable reranking or reduce rerank_top_n
  3. Use folder filtering to narrow scope
  4. Consider semantic-only search (faster than hybrid)
  5. Optimize chunk size

“Inconsistent Results”

Causes:
  • Low similarity threshold
  • High alpha causing semantic drift
  • Poor quality embeddings
  • Ambiguous queries
Solutions:
  1. Increase similarity threshold
  2. Lower alpha to favor keyword matching (0.3-0.4)
  3. Enable reranking for consistency
  4. Improve query specificity

Advanced Techniques

Query Expansion

Automatically expand queries for better recall:
Original: "refund policy"
Expanded: "refund policy OR return policy OR money-back guarantee"
Implementation: Use LLM to generate query variations before search.
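
A sketch of the expansion step; generate_variations stands in for a hypothetical LLM-backed call that returns paraphrases of the query:

def expand_query(query, generate_variations):
    # generate_variations: hypothetical callable returning paraphrases of the query
    variations = generate_variations(query)   # e.g. ["return policy", "money-back guarantee"]
    return " OR ".join([query] + variations)

# expand_query("refund policy", generate_variations)
# -> "refund policy OR return policy OR money-back guarantee"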

Two-Stage Retrieval

Stage 1: Fast, broad retrieval (top-k=100, low threshold)
Stage 2: Aggressive reranking (final top-k=10)
This gives better precision than single-stage retrieval.

Contextual Filtering

Filter results based on context:
# Example: Filter retrieved results by date range
def search_recent(query, cutoff_date, top_k=10):
    results = search(query, top_k=50)
    filtered = [r for r in results if r.date > cutoff_date]
    return filtered[:top_k]

Hybrid Scoring Functions

Custom scoring beyond simple weighted average:
# Example: Boost recent documents on top of the weighted hybrid score
def boosted_score(semantic_score, keyword_score, document, alpha=0.5):
    score = (alpha * semantic_score) + ((1 - alpha) * keyword_score)
    if document.age_days < 30:
        score *= 1.2  # 20% boost for documents newer than 30 days
    return score

Best Practices

  • Start Simple: Begin with hybrid search (alpha=0.5) and tune from there
  • Measure Everything: Track metrics before and after changes
  • Domain-Specific Tuning: Different domains need different configurations
  • User Feedback: Collect thumbs up/down and iterate based on real usage
  • Regular Review: Re-evaluate the configuration quarterly as the KB grows
  • Document Changes: Keep a changelog of configuration adjustments
Mastering search strategies unlocks the full potential of your knowledge bases, delivering accurate, relevant answers to every query.

See also: Retrieval Settings, the complete guide to all retrieval configuration options.