Understanding Retrieval
The Retrieval Process

When a user submits a query, the query is embedded, compared against the stored chunk embeddings, and the most similar chunks are returned as context for the LLM.
Key Concepts
- Chunk: A segment of document text (typically 256-1024 tokens)
- Embedding: High-dimensional vector representation of text (e.g., 1536 dimensions)
- Similarity: Measure of how close two embeddings are (cosine similarity, Euclidean distance)
- Top-K: Number of chunks to retrieve (e.g., the top 10 most similar)
- Threshold: Minimum similarity score required to include a result (e.g., 0.7)

Search Strategies
Semantic Search (Vector Search)
How it works: Compares the query embedding to document embeddings using vector similarity (see the sketch at the end of this subsection).

Similarity metrics:
- Cosine Similarity: Angle between vectors (most common)
- Euclidean Distance: Straight-line distance between vectors
- Dot Product: Combines magnitude and direction
Strengths:
- Understands semantic meaning, not just keywords
- Finds conceptually similar content
- Handles synonyms and paraphrasing
- Supports cross-lingual search (with multilingual embeddings)
Limitations:
- May miss exact term matches
- Less effective for proper nouns, codes, and IDs
- Computationally intensive
- Requires good embeddings
Best for:
- Natural language queries
- Conceptual searches (“how to improve performance”)
- When you want related or similar content
- Cross-lingual documents
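As a rough illustration, here is a minimal cosine-similarity retrieval over a matrix of precomputed embeddings (function and variable names are hypothetical, not any particular product's API):

```python
import numpy as np

def top_k_by_cosine(query_vec, doc_vecs, k=10):
    """Return (index, score) pairs for the k most similar document vectors."""
    # Normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                       # one cosine score per document
    top = np.argsort(scores)[::-1][:k]   # indices of the k highest scores
    return [(int(i), float(scores[i])) for i in top]
```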
Keyword Search (BM25)
How it works: Traditional keyword matching scored by term frequency and inverse document frequency (see the sketch at the end of this subsection).

Scoring components:
- Term Frequency: More occurrences = higher score
- Inverse Document Frequency: Rare terms score higher
- Document Length Normalization: Penalizes very long documents
Strengths:
- Exact term matching (SKU-12345, error codes)
- Fast and efficient
- Doesn’t require embeddings
- Works well for technical queries
Limitations:
- No semantic understanding
- Misses synonyms and paraphrasing
- Ignores word order
- Sensitive to exact spelling
Best for:
- Product codes, SKUs, IDs
- Technical error codes
- Specific names (people, places)
- When exact terms are critical
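To make the three scoring components concrete, here is a self-contained BM25 sketch (the Okapi BM25 variant with typical defaults k1=1.5, b=0.75; real engines differ in the details):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Inverse document frequency: rare terms score higher.
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((N - n + 0.5) / (n + 0.5) + 1) for t, n in df.items()}
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequency: more occurrences = higher score
        norm = k1 * (1 - b + b * len(d) / avgdl)  # document length normalization
        scores.append(sum(
            idf.get(t, 0.0) * tf[t] * (k1 + 1) / (tf[t] + norm)
            for t in query_terms
        ))
    return scores
```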
Hybrid Search (Semantic + Keyword)
How it works: Combines semantic and keyword search with a weighted score controlled by an alpha parameter (see the sketch at the end of this subsection).

Alpha parameter:
- 0.0: Pure keyword search (BM25 only)
- 0.5: Balanced (50% semantic, 50% keyword)
- 1.0: Pure semantic search (vector only)
Strengths:
- Best of both worlds
- Catches both semantic and exact matches
- Flexible weighting
- More robust than either approach alone
Limitations:
- Slightly slower (runs two searches)
- Requires tuning the alpha parameter
- Results are more complex to interpret
Best for:
- General-purpose search (default recommendation)
- Mixed content (technical + natural language)
- When you want comprehensive coverage
- Unknown query types
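A minimal sketch of the weighted combination follows. Cosine and BM25 scores live on different scales, so both lists are min-max normalized first; the exact normalization a given engine uses may differ:

```python
def hybrid_scores(semantic, keyword, alpha=0.5):
    """Blend per-document semantic and keyword score lists with weight alpha."""
    def minmax(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    sem, kw = minmax(semantic), minmax(keyword)
    # alpha=1.0 -> pure semantic; alpha=0.0 -> pure keyword (BM25).
    return [alpha * s + (1 - alpha) * k for s, k in zip(sem, kw)]
```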
Recommended Alpha Values
| Use Case | Alpha | Reasoning |
|---|---|---|
| General Search | 0.5 | Balanced approach |
| Technical Docs | 0.3 | Favor exact terms (error codes, commands) |
| Help Articles | 0.7 | Favor semantic (natural language) |
| Product Catalog | 0.3 | Favor exact matches (SKUs, models) |
| Legal Documents | 0.2 | Exact term matching critical |
| Research Papers | 0.7 | Conceptual similarity important |
Reranking
What is Reranking?
Purpose: A second-stage ranking step that improves precision by reordering the top candidates.

How it works:
- Uses a cross-encoder architecture
- Compares the query against each candidate passage
- More computationally expensive than embedding similarity, but more accurate
- Performs fine-grained, token-level matching
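As one concrete open-source example, a cross-encoder from the sentence-transformers library can rerank candidates; whatever reranker your platform ships may differ:

```python
from sentence_transformers import CrossEncoder

def rerank(query, passages, top_k=10):
    """Rerank candidate passages with a cross-encoder; keep the best top_k."""
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    # The cross-encoder reads query and passage together, enabling
    # fine-grained token-level matching (slower but more accurate).
    scores = model.predict([(query, p) for p in passages])
    ranked = sorted(zip(passages, scores), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]
```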
When to Use Reranking
Enable reranking when:
- Precision is critical (the top-k must be highly relevant)
- You can tolerate slightly slower queries (adds ~200-500ms)
- Initial retrieval returns too many marginal results
- Quality matters more than quantity

Skip reranking when:
- Speed is critical (real-time queries)
- The initial search already returns good results
- Queries are high-volume and low-stakes
- Cost optimization is needed
Reranking Configuration
Basic configuration:
- rerank: Enable/disable reranking
- rerank_top_n: How many candidates to rerank (e.g., 50)
- top_k: Final number of results returned after reranking (e.g., 10)

Rule of thumb: rerank_top_n should be 3-10x top_k.
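Putting the three parameters together, a two-stage pipeline might be wired like this sketch (search_fn and rerank_fn stand in for whatever search and reranking calls your stack exposes):

```python
def retrieve(query, search_fn, rerank_fn, top_k=10, rerank_top_n=50):
    """Two-stage retrieval: broad first pass, then rerank down to top_k."""
    # Stage 1: over-fetch candidates (rerank_top_n is typically 3-10x top_k).
    candidates = search_fn(query, limit=rerank_top_n)
    # Stage 2: precise reordering, then truncate to the final result count.
    return rerank_fn(query, candidates)[:top_k]
```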
Impact of Reranking
Before reranking (semantic search, top-k=10), results are ordered purely by embedding similarity; after reranking, the cross-encoder reorders those candidates so the most relevant passages rise to the top.

Retrieval Configuration
Top-K (Number of Results)
Definition: Number of chunks to retrieve.

Guidelines:
- Small (3-5): Concise answers, little context
- Medium (10-15): Balanced (recommended)
- Large (20-50): Comprehensive coverage, more expensive

Trade-offs:
- More chunks = more context, but slower and more expensive
- Fewer chunks = faster and cheaper, but may miss information

Recommendations by use case:
- Q&A Systems: 5-10 chunks
- Research: 15-30 chunks
- Summarization: 10-20 chunks
Similarity Threshold
Definition: Minimum similarity score required to include a result (0.0 to 1.0).

Guidelines:
- Low (0.3-0.5): Inclusive, but risks irrelevant results
- Medium (0.6-0.7): Balanced (recommended)
- High (0.8-0.9): Strict, may miss relevant results

Recommendations by use case:
- Customer Support: 0.6 (don’t miss potential answers)
- Technical Docs: 0.7 (balance precision and recall)
- Legal/Compliance: 0.8 (high precision required)
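Top-K and the similarity threshold compose as two simple filters; a minimal sketch, assuming results arrive as (chunk, similarity) pairs:

```python
def select_results(scored, top_k=10, threshold=0.7):
    """Apply the similarity threshold, then cap the result count at top_k."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [(chunk, sim) for chunk, sim in ranked if sim >= threshold][:top_k]
```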
Chunk Size & Overlap
Chunk Size: Number of tokens per chunk.

Size guidelines:
- Small (256-512): Precise retrieval, but more chunks needed
- Medium (512-1024): Balanced (recommended)
- Large (1024-2048): More context, fewer chunks

Overlap guidelines:
- None (0 tokens): Risks cutting off sentences at chunk boundaries
- Small (50-100 tokens): Recommended
- Large (200+ tokens): Excessive redundancy

Recommendations by content type:
- Technical Docs: Small chunks (256-512) for precise retrieval
- Articles/Reports: Medium chunks (512-1024)
- Books: Large chunks (1024-2048) to preserve context
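A token-level sliding window illustrates how size and overlap interact (production chunkers usually also respect sentence and section boundaries):

```python
def chunk_tokens(tokens, chunk_size=512, overlap=50):
    """Split a token list into fixed-size chunks with overlapping windows."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks
```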
Optimization Strategies
Strategy 1: Start with Hybrid + Reranking
Why it works (example configurations for all five strategies are sketched after Strategy 5):
- Catches both semantic and exact matches
- Reranking improves precision
- Balanced performance
Strategy 2: Optimize for Speed
Why it works:
- Single search pass
- Fewer results to process
- No reranking overhead
Strategy 3: Optimize for Precision
Why it works:
- Aggressive reranking
- A high threshold filters out noise
- Few but highly relevant results
Strategy 4: Optimize for Recall
Why it works:
- More results retrieved
- Lower threshold
- Less filtering
Strategy 5: Technical/Exact Match
Why it works:
- Low alpha puts more weight on keyword matching
- Good for codes, IDs, and commands
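The concrete settings for the five strategies are not reproduced above; as an illustration, assuming a flat configuration dict (all key names and values here are illustrative, not a product API), they might look like:

```python
STRATEGIES = {
    # Strategy 1: hybrid + reranking (balanced default)
    "balanced":    {"mode": "hybrid", "alpha": 0.5, "rerank": True,
                    "rerank_top_n": 50, "top_k": 10, "threshold": 0.6},
    # Strategy 2: speed (single pass, fewer results, no reranking)
    "speed":       {"mode": "semantic", "rerank": False,
                    "top_k": 5, "threshold": 0.6},
    # Strategy 3: precision (aggressive reranking, high threshold)
    "precision":   {"mode": "hybrid", "alpha": 0.5, "rerank": True,
                    "rerank_top_n": 100, "top_k": 5, "threshold": 0.8},
    # Strategy 4: recall (more results, lower threshold, less filtering)
    "recall":      {"mode": "hybrid", "alpha": 0.5, "rerank": False,
                    "top_k": 30, "threshold": 0.4},
    # Strategy 5: technical/exact match (low alpha favors keywords)
    "exact_match": {"mode": "hybrid", "alpha": 0.2, "rerank": False,
                    "top_k": 10, "threshold": 0.6},
}
```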
Embedding Models
Model Selection
FastEmbed (Default):
- Speed: Very fast
- Quality: Good
- Cost: Low
- Use case: General-purpose

Highest-accuracy models:
- Speed: Moderate
- Quality: Excellent
- Cost: Higher
- Use case: Best accuracy

Multilingual models:
- Speed: Fast
- Quality: Very good
- Cost: Low
- Use case: Multilingual documents

Domain-specialized models:
- Speed: Moderate
- Quality: Excellent
- Cost: Moderate
- Use case: Enterprise, specialized domains
Multilingual Search
For non-English or mixed-language documents, use a multilingual embedding model.

Multilingual embedding models:
- E5-multilingual: Supports 100+ languages
- LaBSE: Google’s multilingual model
- Jina-multilingual: Optimized for cross-lingual search

Cross-lingual search:
- Query in English, find Spanish documents
- Query in French, find German documents
- Requires a multilingual embedding model
Folder Filtering
How Folder Filtering Works
Scoping a search to a folder (e.g., /products) restricts retrieval to documents in that folder and its subfolders.
Benefits:
- Faster: Smaller search space
- More Relevant: Domain-specific results
- Better Accuracy: Less noise
Use cases:
- Domain-specific queries (HR queries → /policies/hr/)
- Multi-tenant KBs (customer1 → /customers/customer1/)
- Department-specific search
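Under the hood, a folder filter is essentially a path-prefix check at query time; a minimal sketch (the chunk metadata shape, a "path" key, is assumed):

```python
def filter_by_folder(chunks, folder="/products"):
    """Keep only chunks whose source path is under `folder` or its subfolders."""
    prefix = folder.rstrip("/") + "/"
    return [c for c in chunks if c["path"].startswith(prefix)]
```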
Monitoring & Tuning
Metrics to Track
Retrieval metrics:
- Average similarity score: Are results generally relevant?
- Results per query: How many chunks are returned?
- Zero-result queries: How often does a query return nothing?
- Query latency: How fast is retrieval?

Quality metrics:
- User feedback: Thumbs up/down on results
- Click-through rate: Which results are used?
- Answer accuracy: Does the LLM generate a correct answer?
A/B Testing Strategies
Test different configurations:

Test 1: Alpha Value
- Group A: alpha=0.3 (favor keyword)
- Group B: alpha=0.7 (favor semantic)
- Measure: Relevance scores, user feedback

Test 2: Reranking
- Group A: rerank=false
- Group B: rerank=true, rerank_top_n=30
- Measure: Precision, latency, user satisfaction

Test 3: Chunk Size
- Group A: chunk_size=512
- Group B: chunk_size=1024
- Measure: Context quality, answer accuracy
Iterative Improvement
Step 1: Establish a baseline
- Start with default hybrid search
- Collect metrics for 1 week

Step 2: Diagnose issues
- Too many irrelevant results? → Increase the threshold or enable reranking
- Missing exact matches? → Lower alpha (favor keyword)
- Slow queries? → Reduce top_k or disable reranking

Step 3: Test changes
- Make one change at a time
- A/B test against the baseline
- Measure impact

Step 4: Roll out and repeat
- Roll out the winning configuration
- Continue monitoring
- Iterate regularly
Troubleshooting
“No Results Found”

Causes:
- Threshold too high
- Documents not indexed
- Poor embedding quality
- Folder filter too restrictive

Fixes:
- Lower the similarity threshold (try 0.5)
- Check whether documents are indexed
- Try a different search mode (keyword if semantic fails)
- Remove or broaden the folder filter
- Verify query spelling
“Results Not Relevant”

Causes:
- Threshold too low
- Wrong search mode
- Poorly chosen chunk size
- Query too vague

Fixes:
- Increase the similarity threshold (try 0.7-0.8)
- Enable reranking
- Try hybrid search (alpha=0.5)
- Adjust the chunk size
- Refine the query to be more specific
“Slow Queries”

Causes:
- Large knowledge base
- High top_k value
- Reranking enabled
- Large rerank_top_n

Fixes:
- Reduce top_k (try 5-10)
- Disable reranking or reduce rerank_top_n
- Use folder filtering to narrow the scope
- Consider semantic-only search (faster than hybrid)
- Optimize chunk size
“Inconsistent Results”

Causes:
- Low similarity threshold
- High alpha causing semantic drift
- Poor-quality embeddings
- Ambiguous queries

Fixes:
- Increase the similarity threshold
- Lower alpha to favor keyword matching (0.3-0.4)
- Enable reranking for more consistent ordering
- Make queries more specific
Advanced Techniques
Query Expansion
Automatically expand queries for better recall, as in the sketch below.
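A minimal synonym-substitution sketch (production systems often use an LLM to generate reformulations instead; the synonym table here is illustrative):

```python
def expand_query(query, synonyms):
    """Generate query variants by substituting known synonyms."""
    variants = [query]
    for term, alts in synonyms.items():
        if term in query:
            variants += [query.replace(term, alt) for alt in alts]
    return variants

# Search with every variant and merge the results:
expand_query("improve performance", {"improve": ["boost", "optimize"]})
# -> ["improve performance", "boost performance", "optimize performance"]
```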
Two-Stage Retrieval

Stage 1: Fast, broad retrieval (top-k=100, low threshold)
Stage 2: Rerank aggressively (final top-k=10)

This yields better precision than single-stage retrieval.

Contextual Filtering
Filter retrieved results based on additional context, such as document metadata; see the sketch below.
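For example, results can be filtered on metadata before being passed to the LLM (the metadata fields used here, `type` and `updated`, are assumed for illustration):

```python
from datetime import date

def filter_by_context(results, doc_type=None, since=None):
    """Drop results whose metadata conflicts with the query context."""
    kept = []
    for r in results:
        if doc_type and r["metadata"].get("type") != doc_type:
            continue  # wrong document type for this query
        if since and r["metadata"].get("updated", date.min) < since:
            continue  # too old to be relevant
        kept.append(r)
    return kept
```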
Hybrid Scoring Functions

Custom scoring can go beyond a simple weighted average of the semantic and keyword scores.
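One widely used option is Reciprocal Rank Fusion (RRF), which combines rank positions instead of raw scores and so sidesteps score normalization entirely (the document does not specify which function to use; this is a common choice):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked result lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:  # e.g., [semantic_ids, keyword_ids]
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Documents ranked highly by several searches float to the top.
    return sorted(scores, key=scores.get, reverse=True)
```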
Best Practices

- Start Simple: Begin with hybrid search (alpha=0.5) and tune from there
- Measure Everything: Track metrics before and after changes
- Domain-Specific Tuning: Different domains need different configurations
- User Feedback: Collect thumbs up/down and iterate based on real usage
- Regular Review: Re-evaluate the configuration quarterly as the KB grows
- Document Changes: Keep a changelog of configuration adjustments

Mastering search strategies unlocks the full potential of your knowledge bases, delivering accurate, relevant answers to every query.