Understanding Retrieval
The Retrieval Process

When a user submits a query, the query is embedded, compared against the stored chunk embeddings, and the most similar chunks are returned as context for the LLM.
Key Concepts
- Chunk: A segment of document text (typically 256-1024 tokens)
- Embedding: High-dimensional vector representation of text (e.g., 1536 dimensions)
- Similarity: Measure of how close two embeddings are (cosine similarity, Euclidean distance)
- Top-K: Number of chunks to retrieve (e.g., the top 10 most similar)
- Threshold: Minimum similarity score required to include a result (e.g., 0.7)

Search Strategies
Semantic Search (Vector Search)
How it works: Compares the query embedding to document embeddings using vector similarity (see the sketch at the end of this subsection).

Similarity metrics:
- Cosine Similarity: Angle between vectors (most common)
- Euclidean Distance: Straight-line distance between vectors
- Dot Product: Combines magnitude and direction
Strengths:
- Understands semantic meaning, not just keywords
- Finds conceptually similar content
- Handles synonyms and paraphrasing
- Supports cross-lingual search (with multilingual embeddings)
Limitations:
- May miss exact term matches
- Less effective for proper nouns, codes, and IDs
- Computationally intensive
- Requires good embeddings
Best for:
- Natural language queries
- Conceptual searches (“how to improve performance”)
- When you want related or similar content
- Cross-lingual documents
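As a rough illustration, here is a minimal cosine-similarity retrieval over a matrix of precomputed embeddings (function and variable names are hypothetical, not any particular product's API):

```python
import numpy as np

def top_k_by_cosine(query_vec, doc_vecs, k=10):
    """Return (index, score) pairs for the k most similar document vectors."""
    # Normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                       # one cosine score per document
    top = np.argsort(scores)[::-1][:k]   # indices of the k highest scores
    return [(int(i), float(scores[i])) for i in top]
```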
Keyword Search (BM25)
How it works: Traditional keyword matching scored by term frequency and inverse document frequency (see the sketch at the end of this subsection).

Scoring components:
- Term Frequency: More occurrences = higher score
- Inverse Document Frequency: Rare terms score higher
- Document Length Normalization: Penalizes very long documents
Strengths:
- Exact term matching (SKU-12345, error codes)
- Fast and efficient
- Doesn’t require embeddings
- Works well for technical queries
Limitations:
- No semantic understanding
- Misses synonyms and paraphrasing
- Ignores word order
- Sensitive to exact spelling
Best for:
- Product codes, SKUs, IDs
- Technical error codes
- Specific names (people, places)
- When exact terms are critical
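To make the three scoring components concrete, here is a self-contained BM25 sketch (the Okapi BM25 variant with typical defaults k1=1.5, b=0.75; real engines differ in the details):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Inverse document frequency: rare terms score higher.
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((N - n + 0.5) / (n + 0.5) + 1) for t, n in df.items()}
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequency: more occurrences = higher score
        norm = k1 * (1 - b + b * len(d) / avgdl)  # document length normalization
        scores.append(sum(
            idf.get(t, 0.0) * tf[t] * (k1 + 1) / (tf[t] + norm)
            for t in query_terms
        ))
    return scores
```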
Hybrid Search (Semantic + Keyword)
How it works: Combines semantic and keyword search with a weighted score controlled by an alpha parameter (see the sketch at the end of this subsection).

Alpha parameter:
- 0.0: Pure keyword search (BM25 only)
- 0.5: Balanced (50% semantic, 50% keyword)
- 1.0: Pure semantic search (vector only)
Strengths:
- Best of both worlds
- Catches both semantic and exact matches
- Flexible weighting
- More robust than either approach alone
Limitations:
- Slightly slower (runs two searches)
- Requires tuning the alpha parameter
- Results are more complex to interpret
Best for:
- General-purpose search (default recommendation)
- Mixed content (technical + natural language)
- When you want comprehensive coverage
- Unknown query types
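A minimal sketch of the weighted combination follows. Cosine and BM25 scores live on different scales, so both lists are min-max normalized first; the exact normalization a given engine uses may differ:

```python
def hybrid_scores(semantic, keyword, alpha=0.5):
    """Blend per-document semantic and keyword score lists with weight alpha."""
    def minmax(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    sem, kw = minmax(semantic), minmax(keyword)
    # alpha=1.0 -> pure semantic; alpha=0.0 -> pure keyword (BM25).
    return [alpha * s + (1 - alpha) * k for s, k in zip(sem, kw)]
```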
Recommended Alpha Values
| Use Case | Alpha | Reasoning |
|---|---|---|
| General Search | 0.5 | Balanced approach |
| Technical Docs | 0.3 | Favor exact terms (error codes, commands) |
| Help Articles | 0.7 | Favor semantic (natural language) |
| Product Catalog | 0.3 | Favor exact matches (SKUs, models) |
| Legal Documents | 0.2 | Exact term matching critical |
| Research Papers | 0.7 | Conceptual similarity important |
Reranking
What is Reranking?
Purpose: A second-stage ranking step that improves precision by reordering the top candidates.

How it works:
- Uses a cross-encoder architecture
- Compares the query against each candidate passage
- More computationally expensive than embedding similarity, but more accurate
- Performs fine-grained, token-level matching
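As one concrete open-source example, a cross-encoder from the sentence-transformers library can rerank candidates; whatever reranker your platform ships may differ:

```python
from sentence_transformers import CrossEncoder

def rerank(query, passages, top_k=10):
    """Rerank candidate passages with a cross-encoder; keep the best top_k."""
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    # The cross-encoder reads query and passage together, enabling
    # fine-grained token-level matching (slower but more accurate).
    scores = model.predict([(query, p) for p in passages])
    ranked = sorted(zip(passages, scores), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]
```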
When to Use Reranking
Enable reranking when:
- Precision is critical (the top-k must be highly relevant)
- You can tolerate slightly slower queries (adds ~200-500ms)
- Initial retrieval returns too many marginal results
- Quality matters more than quantity

Skip reranking when:
- Speed is critical (real-time queries)
- The initial search already returns good results
- Queries are high-volume and low-stakes
- Cost optimization is needed
Reranking Configuration
Basic configuration:
- rerank: Enable/disable reranking
- rerank_top_n: How many candidates to rerank (e.g., 50)
- top_k: Final number of results returned after reranking (e.g., 10)

Rule of thumb: rerank_top_n should be 3-10x top_k.
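Putting the three parameters together, a two-stage pipeline might be wired like this sketch (search_fn and rerank_fn stand in for whatever search and reranking calls your stack exposes):

```python
def retrieve(query, search_fn, rerank_fn, top_k=10, rerank_top_n=50):
    """Two-stage retrieval: broad first pass, then rerank down to top_k."""
    # Stage 1: over-fetch candidates (rerank_top_n is typically 3-10x top_k).
    candidates = search_fn(query, limit=rerank_top_n)
    # Stage 2: precise reordering, then truncate to the final result count.
    return rerank_fn(query, candidates)[:top_k]
```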
Impact of Reranking
Before reranking (semantic search, top-k=10), results are ordered purely by embedding similarity; after reranking, the cross-encoder reorders those candidates so the most relevant passages rise to the top.

Retrieval Configuration
Top-K (Number of Results)
Definition: Number of chunks to retrieve.

Guidelines:
- Small (3-5): Concise answers, little context
- Medium (10-15): Balanced (recommended)
- Large (20-50): Comprehensive coverage, more expensive

Trade-offs:
- More chunks = more context, but slower and more expensive
- Fewer chunks = faster and cheaper, but may miss information

Recommendations by use case:
- Q&A Systems: 5-10 chunks
- Research: 15-30 chunks
- Summarization: 10-20 chunks
Similarity Threshold
Definition: Minimum similarity score required to include a result (0.0 to 1.0).

Guidelines:
- Low (0.3-0.5): Inclusive, but risks irrelevant results
- Medium (0.6-0.7): Balanced (recommended)
- High (0.8-0.9): Strict, may miss relevant results

Recommendations by use case:
- Customer Support: 0.6 (don’t miss potential answers)
- Technical Docs: 0.7 (balance precision and recall)
- Legal/Compliance: 0.8 (high precision required)
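Top-K and the similarity threshold compose as two simple filters; a minimal sketch, assuming results arrive as (chunk, similarity) pairs:

```python
def select_results(scored, top_k=10, threshold=0.7):
    """Apply the similarity threshold, then cap the result count at top_k."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [(chunk, sim) for chunk, sim in ranked if sim >= threshold][:top_k]
```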
Chunk Size & Overlap
Chunk Size: Number of tokens per chunk.

Size guidelines:
- Small (256-512): Precise retrieval, but more chunks needed
- Medium (512-1024): Balanced (recommended)
- Large (1024-2048): More context, fewer chunks

Overlap guidelines:
- None (0 tokens): Risks cutting off sentences at chunk boundaries
- Small (50-100 tokens): Recommended
- Large (200+ tokens): Excessive redundancy

Recommendations by content type:
- Technical Docs: Small chunks (256-512) for precise retrieval
- Articles/Reports: Medium chunks (512-1024)
- Books: Large chunks (1024-2048) to preserve context
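A token-level sliding window illustrates how size and overlap interact (production chunkers usually also respect sentence and section boundaries):

```python
def chunk_tokens(tokens, chunk_size=512, overlap=50):
    """Split a token list into fixed-size chunks with overlapping windows."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks
```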
Optimization Strategies
Strategy 1: Start with Hybrid + Reranking
Why it works (example configurations for all five strategies are sketched after Strategy 5):
- Catches both semantic and exact matches
- Reranking improves precision
- Balanced performance
Strategy 2: Optimize for Speed
Why it works:
- Single search pass
- Fewer results to process
- No reranking overhead
Strategy 3: Optimize for Precision
Why it works:
- Aggressive reranking
- A high threshold filters out noise
- Few but highly relevant results
Strategy 4: Optimize for Recall
Why it works:
- More results retrieved
- Lower threshold
- Less filtering
Strategy 5: Technical/Exact Match
Why it works:
- Low alpha puts more weight on keyword matching
- Good for codes, IDs, and commands
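The concrete settings for the five strategies are not reproduced above; as an illustration, assuming a flat configuration dict (all key names and values here are illustrative, not a product API), they might look like:

```python
STRATEGIES = {
    # Strategy 1: hybrid + reranking (balanced default)
    "balanced":    {"mode": "hybrid", "alpha": 0.5, "rerank": True,
                    "rerank_top_n": 50, "top_k": 10, "threshold": 0.6},
    # Strategy 2: speed (single pass, fewer results, no reranking)
    "speed":       {"mode": "semantic", "rerank": False,
                    "top_k": 5, "threshold": 0.6},
    # Strategy 3: precision (aggressive reranking, high threshold)
    "precision":   {"mode": "hybrid", "alpha": 0.5, "rerank": True,
                    "rerank_top_n": 100, "top_k": 5, "threshold": 0.8},
    # Strategy 4: recall (more results, lower threshold, less filtering)
    "recall":      {"mode": "hybrid", "alpha": 0.5, "rerank": False,
                    "top_k": 30, "threshold": 0.4},
    # Strategy 5: technical/exact match (low alpha favors keywords)
    "exact_match": {"mode": "hybrid", "alpha": 0.2, "rerank": False,
                    "top_k": 10, "threshold": 0.6},
}
```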
Embedding Models
Model Selection
FastEmbed (Default):
- Speed: Very fast
- Quality: Good
- Cost: Low
- Use case: General-purpose

Highest-accuracy models:
- Speed: Moderate
- Quality: Excellent
- Cost: Higher
- Use case: Best accuracy

Multilingual models:
- Speed: Fast
- Quality: Very good
- Cost: Low
- Use case: Multilingual documents

Domain-specialized models:
- Speed: Moderate
- Quality: Excellent
- Cost: Moderate
- Use case: Enterprise, specialized domains
Multilingual Search
For non-English or mixed-language documents, use a multilingual embedding model.

Multilingual embedding models:
- E5-multilingual: Supports 100+ languages
- LaBSE: Google’s multilingual model
- Jina-multilingual: Optimized for cross-lingual search

Cross-lingual search:
- Query in English, find Spanish documents
- Query in French, find German documents
- Requires a multilingual embedding model
Folder Filtering
How Folder Filtering Works
Scoping a search to a folder (e.g., /products) restricts retrieval to documents in that folder and its subfolders.
Benefits:
- Faster: Smaller search space
- More Relevant: Domain-specific results
- Better Accuracy: Less noise
Use cases:
- Domain-specific queries (HR queries → /policies/hr/)
- Multi-tenant KBs (customer1 → /customers/customer1/)
- Department-specific search
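Under the hood, a folder filter is essentially a path-prefix check at query time; a minimal sketch (the chunk metadata shape, a "path" key, is assumed):

```python
def filter_by_folder(chunks, folder="/products"):
    """Keep only chunks whose source path is under `folder` or its subfolders."""
    prefix = folder.rstrip("/") + "/"
    return [c for c in chunks if c["path"].startswith(prefix)]
```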
Monitoring & Tuning
Metrics to Track
Retrieval metrics:
- Average similarity score: Are results generally relevant?
- Results per query: How many chunks are returned?
- Zero-result queries: How often does a query return nothing?
- Query latency: How fast is retrieval?

Quality metrics:
- User feedback: Thumbs up/down on results
- Click-through rate: Which results are used?
- Answer accuracy: Does the LLM generate a correct answer?
A/B Testing Strategies
Test different configurations:

Test 1: Alpha Value
- Group A: alpha=0.3 (favor keyword)
- Group B: alpha=0.7 (favor semantic)
- Measure: Relevance scores, user feedback

Test 2: Reranking
- Group A: rerank=false
- Group B: rerank=true, rerank_top_n=30
- Measure: Precision, latency, user satisfaction

Test 3: Chunk Size
- Group A: chunk_size=512
- Group B: chunk_size=1024
- Measure: Context quality, answer accuracy
Iterative Improvement
Step 1: Establish a baseline
- Start with default hybrid search
- Collect metrics for 1 week

Step 2: Diagnose issues
- Too many irrelevant results? → Increase the threshold or enable reranking
- Missing exact matches? → Lower alpha (favor keyword)
- Slow queries? → Reduce top_k or disable reranking

Step 3: Test changes
- Make one change at a time
- A/B test against the baseline
- Measure impact

Step 4: Roll out and repeat
- Roll out the winning configuration
- Continue monitoring
- Iterate regularly
Troubleshooting
“No Results Found”

Causes:
- Threshold too high
- Documents not indexed
- Poor embedding quality
- Folder filter too restrictive

Fixes:
- Lower the similarity threshold (try 0.5)
- Check whether documents are indexed
- Try a different search mode (keyword if semantic fails)
- Remove or broaden the folder filter
- Verify query spelling
“Results Not Relevant”

Causes:
- Threshold too low
- Wrong search mode
- Poorly chosen chunk size
- Query too vague

Fixes:
- Increase the similarity threshold (try 0.7-0.8)
- Enable reranking
- Try hybrid search (alpha=0.5)
- Adjust the chunk size
- Refine the query to be more specific
“Slow Queries”

Causes:
- Large knowledge base
- High top_k value
- Reranking enabled
- Large rerank_top_n

Fixes:
- Reduce top_k (try 5-10)
- Disable reranking or reduce rerank_top_n
- Use folder filtering to narrow the scope
- Consider semantic-only search (faster than hybrid)
- Optimize chunk size
“Inconsistent Results”

Causes:
- Low similarity threshold
- High alpha causing semantic drift
- Poor-quality embeddings
- Ambiguous queries

Fixes:
- Increase the similarity threshold
- Lower alpha to favor keyword matching (0.3-0.4)
- Enable reranking for more consistent ordering
- Make queries more specific
Advanced Techniques
Query Expansion
Automatically expand queries for better recall, as in the sketch below.
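A minimal synonym-substitution sketch (production systems often use an LLM to generate reformulations instead; the synonym table here is illustrative):

```python
def expand_query(query, synonyms):
    """Generate query variants by substituting known synonyms."""
    variants = [query]
    for term, alts in synonyms.items():
        if term in query:
            variants += [query.replace(term, alt) for alt in alts]
    return variants

# Search with every variant and merge the results:
expand_query("improve performance", {"improve": ["boost", "optimize"]})
# -> ["improve performance", "boost performance", "optimize performance"]
```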
Two-Stage Retrieval

Stage 1: Fast, broad retrieval (top-k=100, low threshold)
Stage 2: Rerank aggressively (final top-k=10)

This yields better precision than single-stage retrieval.

Contextual Filtering
Filter retrieved results based on additional context, such as document metadata; see the sketch below.
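For example, results can be filtered on metadata before being passed to the LLM (the metadata fields used here, `type` and `updated`, are assumed for illustration):

```python
from datetime import date

def filter_by_context(results, doc_type=None, since=None):
    """Drop results whose metadata conflicts with the query context."""
    kept = []
    for r in results:
        if doc_type and r["metadata"].get("type") != doc_type:
            continue  # wrong document type for this query
        if since and r["metadata"].get("updated", date.min) < since:
            continue  # too old to be relevant
        kept.append(r)
    return kept
```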
Hybrid Scoring Functions

Custom scoring can go beyond a simple weighted average of the semantic and keyword scores.
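One widely used option is Reciprocal Rank Fusion (RRF), which combines rank positions instead of raw scores and so sidesteps score normalization entirely (the document does not specify which function to use; this is a common choice):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked result lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:  # e.g., [semantic_ids, keyword_ids]
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Documents ranked highly by several searches float to the top.
    return sorted(scores, key=scores.get, reverse=True)
```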
Best Practices

- Start Simple: Begin with hybrid search (alpha=0.5) and tune from there
- Measure Everything: Track metrics before and after changes
- Domain-Specific Tuning: Different domains need different configurations
- User Feedback: Collect thumbs up/down and iterate based on real usage
- Regular Review: Re-evaluate the configuration quarterly as the KB grows
- Document Changes: Keep a changelog of configuration adjustments

Mastering search strategies unlocks the full potential of your knowledge bases, delivering accurate, relevant answers to every query.