Knowledge Bases (KBs) are collections of documents and data that provide context and information to AI agents. They serve as the foundation for enabling AI to access, understand, and utilize both structured and unstructured information.

A Knowledge Base acts as a centralized repository that organizes information in a way that’s optimized for AI consumption and retrieval.

Key Features

  • Centralized Storage: Organize documents and data in one location
  • Intelligent Processing: Content optimized for AI consumption
  • Efficient Retrieval: Quick access to relevant information
  • Flexible Sources: Support for multiple data formats and origins

Common Operations

Creating a Knowledge Base

You can create a new Knowledge Base within a group using the Add Knowledge Base endpoint. This returns the created KB object with its unique identifier.

Adding Documents

The platform provides several ways to add documents to a Knowledge Base:

  • Upload files directly using the Upload Train endpoint
  • Import from external sources like Google Drive, OneDrive, or SharePoint using the Generic Train endpoint
  • Add documents with custom metadata using the Add Knowledge Base Document endpoint
    Note: Adding a document this way will NOT automatically trigger the ingestion process.

These operations support:

  • Multiple file uploads
  • Custom path prefixes for organization
  • Automatic processing and indexing

Managing Documents

Document management is handled through several endpoints, covered in the subsections below.

Viewing and Updating

Knowledge Base details, including document status, can be retrieved through the Get Knowledge Base endpoint. You can also update KB properties using the Update Knowledge Base endpoint.

Monitoring Processing

You can monitor processing through the Running Jobs endpoint, which provides detailed information about ongoing and completed operations.

Supported Sources

Document Upload

Direct file uploads (PDFs, text files, images) with support for batch uploading

Google Drive

Import documents and files directly from your Google Drive

OneDrive

Access and import documents stored in Microsoft OneDrive

SharePoint

Access and import documents from SharePoint repositories

Websites

Web crawling with configurable depth and URL patterns

Coming Soon

Slack, Notion, and more

Although the platform offers more integrations, only the sources listed above are currently supported for Knowledge Bases.
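To make the "configurable depth and URL patterns" options of the Websites source more concrete, here is a minimal sketch of how a depth limit and include patterns typically restrict a crawl. It is not the platform's crawler: the function `crawl`, its parameters, and the in-memory `SITE` link graph are all hypothetical, and the graph stands in for fetching real pages.

```python
from collections import deque
from fnmatch import fnmatchcase


def crawl(start_url, links, max_depth, include_patterns):
    """Breadth-first crawl over an in-memory link graph.

    links: dict mapping a URL to the URLs it links to (stands in for fetching pages).
    max_depth: number of link hops to follow from start_url (0 = start page only).
    include_patterns: glob patterns; only matching URLs are kept and followed.
    """
    def allowed(url):
        return any(fnmatchcase(url, pattern) for pattern in include_patterns)

    visited = []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if not allowed(url):
            continue  # skip pages outside the configured URL patterns
        visited.append(url)
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


# Hypothetical site structure, used only for illustration.
SITE = {
    "https://example.com/docs/": ["https://example.com/docs/a", "https://example.com/blog/x"],
    "https://example.com/docs/a": ["https://example.com/docs/a/deep"],
    "https://example.com/docs/a/deep": [],
    "https://example.com/blog/x": [],
}

pages = crawl(
    "https://example.com/docs/",
    SITE,
    max_depth=1,
    include_patterns=["https://example.com/docs/*"],
)
print(pages)
```

With a depth of 1 and a docs-only pattern, the blog page is filtered out and the page two hops away is never reached, which is how a tight configuration keeps irrelevant content out of a Knowledge Base.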

Knowledge Base Types

Knowledge Bases come in two primary types:

Entity Knowledge Bases

Permanent Knowledge Bases that are managed through the Data Hub and can be referenced by multiple agents or workflows.

Temporary Knowledge Bases

Created within the Workflow Editor for specific workflow use cases, with the option to promote them to Entity KBs.

Entity Knowledge Bases

Entity Knowledge Bases are permanent repositories that:

  • Are created and managed through the Data Hub
  • Can be shared across multiple agents and workflows
  • Persist until explicitly deleted
  • Support all document sources and management operations
  • Provide centralized management of organizational knowledge

These are the standard Knowledge Bases that most users will interact with for long-term knowledge storage and retrieval.

Temporary Knowledge Bases

Temporary Knowledge Bases are workflow-specific repositories that:

  • Are created directly within the Workflow Editor
  • Are initially only available within the workflow where they were created
  • Can be used for processing intermediate data or testing document structures
  • Can be promoted to Entity Knowledge Bases when needed for broader use
  • Provide a flexible way to experiment with different knowledge structures

Temporary KBs are ideal for workflow-specific data that may not need to be part of your permanent knowledge repository. If you later decide the knowledge is valuable for broader use, you can promote it to an Entity KB without losing any data.

Status Tracking

Knowledge Base Status

Knowledge Bases have the following status values that indicate their overall state:

Status | Description
created | KB has been created but no documents have been added yet
training | KB has documents that are currently being processed
trained | All documents in the KB have been successfully processed
error | All documents in the KB have failed processing

The KB status is automatically updated based on the status of its documents.
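The relationship between document statuses and the KB status can be sketched as a small function. This is an illustration of the rules in the table above, not the platform's actual logic: the function name `derive_kb_status` is hypothetical, and combinations the table does not cover (for example, a mix of processed and failed documents) deliberately return `None` rather than guess.

```python
def derive_kb_status(document_statuses):
    """Map a collection of document statuses to a KB status.

    Only the cases described in the status table are handled;
    anything else returns None because the documentation does not specify it.
    """
    statuses = list(document_statuses)
    if not statuses:
        return "created"  # no documents have been added yet
    if "training" in statuses:
        return "training"  # at least one document is being processed
    if all(status == "trained" for status in statuses):
        return "trained"  # every document was processed successfully
    if all(status == "error" for status in statuses):
        return "error"  # every document failed
    return None  # unspecified combination (assumption: left undefined here)


print(derive_kb_status([]))
print(derive_kb_status(["trained", "training"]))
print(derive_kb_status(["trained", "trained"]))
print(derive_kb_status(["error", "error"]))
```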

Document Status

Individual documents within a Knowledge Base have their own status values:

Status | Description
uploaded | Document has been uploaded but processing hasn’t started
training | Document is currently being processed (chunked, embedded, etc.)
trained | Document has been successfully processed and is available for queries
error | Document processing failed

You can filter documents by status using the Get Documents by Status endpoint.
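The document statuses read naturally as a small state machine. The sketch below encodes one plausible set of transitions inferred from the descriptions above. The transition table and the `advance` helper are assumptions for illustration, in particular the idea that a retry moves a document from `error` back to `training` and that `trained` is final.

```python
# Assumed transitions, inferred from the status descriptions (not confirmed by the platform).
ALLOWED_TRANSITIONS = {
    "uploaded": {"training"},           # processing starts
    "training": {"trained", "error"},   # processing succeeds or fails
    "error": {"training"},              # assumption: a retry restarts processing
    "trained": set(),                   # assumption: treated as final in this sketch
}


def advance(current, new):
    """Return the new status if the transition is allowed, otherwise raise ValueError."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move document from {current!r} to {new!r}")
    return new


# A document that fails once and succeeds after a retry.
status = "uploaded"
for step in ("training", "error", "training", "trained"):
    status = advance(status, step)
print(status)
```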

Knowledge Base Workflows

Knowledge Bases are processed through a series of automated workflows that handle document ingestion, processing, and indexing. Understanding these workflows can help you optimize your knowledge base usage.

Document Processing Flow

When you add documents to a Knowledge Base, they go through the following processing steps:

  1. Document Upload: Documents are uploaded to secure storage and registered in the Knowledge Base with ‘uploaded’ status
  2. Text Extraction: Text is extracted from various file formats (PDF, DOCX, images, etc.) using specialized parsers
  3. Chunking: Documents are split into smaller, semantically meaningful chunks for better retrieval
  4. Embedding Generation: Vector embeddings are created for each chunk to enable semantic search
  5. Indexing: Chunks and their embeddings are stored in a vector database for efficient retrieval
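The chunking, embedding, and indexing steps can be illustrated with a toy pipeline built on the standard library only. This is a conceptual sketch, not the platform's implementation: it starts from already extracted text, uses fixed-size overlapping word windows in place of semantic chunking, a bag-of-words count vector in place of a learned embedding model, and a plain list in place of a vector database. All function names and parameters are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40, overlap=10):
    """Split text into overlapping word windows (a stand-in for semantic chunking)."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + max_words]) for i in range(0, last_start, step)]


def embed(text):
    """Toy embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents, max_words=40, overlap=10):
    """Chunk each document and store (name, chunk, vector) entries."""
    index = []
    for name, text in documents.items():
        for chunk in chunk_text(text, max_words, overlap):
            index.append((name, chunk, embed(chunk)))
    return index


def search(index, query, top_k=1):
    """Return the top_k (name, chunk) pairs most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(query_vector, entry[2]), reverse=True)
    return [(name, chunk) for name, chunk, _ in ranked[:top_k]]


docs = {
    "refunds.txt": "Refunds are issued within 14 days of a return request.",
    "shipping.txt": "Orders ship within two business days from our warehouse.",
}
index = build_index(docs)
best_name, best_chunk = search(index, "how long do refunds take")[0]
print(best_name)
```

The same shape explains why focused documents retrieve better: each chunk competes on similarity to the query, so chunks padded with unrelated text score lower.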

Error Handling and Retries

If any step in the document processing flow fails:

  1. The document is marked with ‘error’ status
  2. Error details are captured in the processing run logs
  3. You can view failed documents using the Get Documents by Status endpoint with status=error
  4. You can retry processing all failed documents using the Retry All Errors endpoint
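The error-handling steps above can be simulated locally to show how they fit together. The sketch below does not call the platform: `process_documents`, `documents_with_status`, and `retry_all_errors` are hypothetical local functions that only mirror the behavior described (mark as error, log details, list failures, retry them), and `flaky_process` is a made-up processor that fails once.

```python
def process_documents(documents, process, run_log):
    """Process every document that is not yet trained; record failures in run_log."""
    for doc in documents:
        if doc["status"] == "trained":
            continue
        doc["status"] = "training"
        try:
            process(doc)
            doc["status"] = "trained"
        except Exception as exc:  # any failing step marks the document as failed
            doc["status"] = "error"
            run_log.append({"document": doc["name"], "error": str(exc)})


def documents_with_status(documents, status):
    """Local stand-in for filtering documents by status."""
    return [doc for doc in documents if doc["status"] == status]


def retry_all_errors(documents, process, run_log):
    """Reprocess only the documents currently in 'error' status."""
    failed = documents_with_status(documents, "error")
    process_documents(failed, process, run_log)
    return failed


attempts = {}


def flaky_process(doc):
    """Hypothetical processor: fails the first time it sees scan.pdf."""
    attempts[doc["name"]] = attempts.get(doc["name"], 0) + 1
    if doc["name"] == "scan.pdf" and attempts[doc["name"]] == 1:
        raise RuntimeError("text extraction failed")


docs = [
    {"name": "guide.pdf", "status": "uploaded"},
    {"name": "scan.pdf", "status": "uploaded"},
]
log = []
process_documents(docs, flaky_process, log)
failed_names = [doc["name"] for doc in documents_with_status(docs, "error")]
print(failed_names, log)

retry_all_errors(docs, flaky_process, log)
print([doc["status"] for doc in docs])
```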

Integration with Agents

Knowledge Bases can be integrated with AI agents to provide context for conversations:

  1. Create a Knowledge Base and add documents
  2. Wait for the documents to be fully processed (status=trained)
  3. Create or update an agent with the Knowledge Base ID
  4. The agent will now use the Knowledge Base to provide context-aware responses
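Step 2 usually means polling until processing has finished. The helper below is a minimal sketch of that wait loop. `get_status` is a hypothetical callable you would supply yourself (for example, a wrapper around the Get Knowledge Base endpoint that returns the KB status string), and the timeout and interval defaults are arbitrary.

```python
import time


def wait_until_trained(get_status, timeout_s=300.0, poll_interval_s=5.0, sleep=time.sleep):
    """Poll get_status() until it returns 'trained'.

    Raises RuntimeError if the status becomes 'error' and
    TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "trained":
            return status
        if status == "error":
            raise RuntimeError("knowledge base processing failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        sleep(poll_interval_s)


# Simulated status sequence in place of real API calls.
fake_statuses = iter(["created", "training", "training", "trained"])
result = wait_until_trained(
    lambda: next(fake_statuses),
    poll_interval_s=0,
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
print(result)
```

Only after this returns would you create or update the agent with the Knowledge Base ID, so the agent never answers from a partially processed KB.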

Batch Processing

For large document sets, the platform supports batch processing:

  1. Upload multiple documents in a single request
  2. Monitor processing status through the Running Jobs endpoint
  3. The system automatically manages concurrent processing to optimize performance
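On the client side, batch uploads mostly come down to grouping files per request and summarizing progress while jobs run. The sketch below shows both with hypothetical helpers. The batch size of 3 is an arbitrary example, since no per-request file limit is stated here, and the status list stands in for whatever the Running Jobs endpoint reports.

```python
from collections import Counter


def batches(paths, batch_size):
    """Yield consecutive groups of at most batch_size paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


def progress(document_statuses):
    """Summarize how many documents have finished (trained or error)."""
    counts = Counter(document_statuses)
    total = sum(counts.values())
    finished = counts["trained"] + counts["error"]
    return {"total": total, "finished": finished, "in_progress": total - finished}


files = [f"reports/2024/report_{n}.pdf" for n in range(1, 8)]
groups = list(batches(files, batch_size=3))
print([len(group) for group in groups])

summary = progress(["trained", "training", "uploaded", "error", "trained"])
print(summary)
```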

Best Practices

Organizing Documents

For optimal Knowledge Base management:

  • Use consistent naming conventions for documents
  • Organize documents in a logical folder structure using path prefixes
  • Group related documents together for better context retrieval
  • Consider document size and complexity when uploading (very large documents may need to be split)

Performance Optimization & Recommendations

To get the best performance from your Knowledge Bases:

  • Keep individual documents focused on specific topics
  • Use descriptive filenames that reflect document content
  • Remove unnecessary formatting, headers, footers, and boilerplate text
  • For websites, configure crawl depth appropriately to avoid irrelevant content
  • Regularly review and remove outdated or irrelevant documents
  • Group related documents together in a folder for better context retrieval
  • Create separate KBs for different topics rather than one KB with a large number of documents

Supported File Types

Knowledge Bases can process a wide variety of file formats to accommodate different content types and sources. Understanding which file types are supported helps ensure successful document ingestion.

Document Formats

Category | Supported Formats
Documents | PDF, DOCX, DOC, PPTX, PPT
Text Files | TXT, HTML, MD, JSON
Images | JPG/JPEG, PNG
Archives | ZIP
Google Workspace | Google Docs, Google Slides

Office Documents

The platform supports common office document formats: PDF, Microsoft Word (DOCX, DOC), and PowerPoint (PPTX, PPT).

Web & Code Content

Process plain text (TXT), web pages (HTML), documentation (MD), and structured data (JSON) with full text extraction.

Visual Content

Extract text from images (JPG/JPEG, PNG) using advanced OCR technology to make visual content searchable.

Packaged Content

Upload ZIP archives containing multiple documents for batch processing, with automatic extraction and organization.
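Before uploading an archive, it can help to check which of its members have a supported extension. The sketch below uses Python's standard `zipfile` module to do that locally. The extension set is taken from the format table above; the function name `split_archive_members` is hypothetical, and treating nested ZIP files as unsupported is an assumption, since the behavior for nested archives is not described.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Extensions from the supported formats table (nested .zip is left out as an assumption).
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".pptx", ".ppt",
    ".txt", ".html", ".md", ".json",
    ".jpg", ".jpeg", ".png",
}


def split_archive_members(archive):
    """Return (supported, unsupported) member names of a ZIP archive.

    archive: a path or a binary file-like object accepted by zipfile.ZipFile.
    """
    supported, unsupported = [], []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directories carry no content
            suffix = PurePosixPath(info.filename).suffix.lower()
            target = supported if suffix in SUPPORTED_EXTENSIONS else unsupported
            target.append(info.filename)
    return supported, unsupported


# Build a small archive in memory for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("manuals/setup.pdf", b"%PDF-")
    zf.writestr("notes/readme.md", "# Notes")
    zf.writestr("media/intro.mp4", b"")
buffer.seek(0)

ok, skipped = split_archive_members(buffer)
print(ok, skipped)
```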

File Size Limits

File Type | Maximum Size | Notes
Documents | 50 MB | Includes PDF, DOCX, DOC, etc.
Images | 20 MB | Text will be extracted using OCR
Archives | 100 MB | Contents will be extracted and processed individually

Very large files may take longer to process and could impact system performance. Consider splitting large documents into smaller, more focused files for optimal results.
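A pre-upload check against these limits avoids failed uploads. The sketch below is a hypothetical client-side validator, not part of the platform. It assumes 1 MB means 1024 × 1024 bytes, and because the table gives no limit for text files (TXT, HTML, MD, JSON), it accepts them without a size check rather than inventing one.

```python
from pathlib import PurePath

MB = 1024 * 1024  # assumption: binary megabytes

CATEGORY_BY_EXTENSION = {
    ".pdf": "document", ".docx": "document", ".doc": "document",
    ".pptx": "document", ".ppt": "document",
    ".txt": "text", ".html": "text", ".md": "text", ".json": "text",
    ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".zip": "archive",
}

# Limits from the file size table; "text" has no documented limit.
MAX_BYTES = {"document": 50 * MB, "image": 20 * MB, "archive": 100 * MB}


def check_upload(filename, size_bytes):
    """Return None if the file passes the documented checks, else a reason string."""
    extension = PurePath(filename).suffix.lower()
    category = CATEGORY_BY_EXTENSION.get(extension)
    if category is None:
        return f"unsupported file type: {extension or '(none)'}"
    limit = MAX_BYTES.get(category)
    if limit is not None and size_bytes > limit:
        return f"{category} files are limited to {limit // MB} MB"
    return None


for name, size in [("report.pdf", 10 * MB), ("scan.png", 25 * MB), ("video.mp4", 1 * MB)]:
    print(name, check_upload(name, size))
```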

Text Extraction

The platform uses specialized parsers to extract text from different file types:

  • PDF documents: Full text extraction with layout preservation
  • Microsoft Office: Structured content extraction with formatting awareness
  • Images: Optical Character Recognition (OCR) for text extraction
  • Archives: Automatic extraction and processing of contained files

Special Considerations

  • Password-protected files are not supported and will fail during processing
  • Scanned documents are supported through OCR, but text extraction may be less accurate than for digitally created files
  • Corrupted files will fail during processing with appropriate error messages
  • Embedded content in documents (like images in PDFs) is processed when possible