Knowledge Bases - Noxus Documentation

Knowledge Bases (KBs) are collections of documents and data that provide context and information to AI agents. They serve as the foundation for enabling AI to access, understand, and utilize both structured and unstructured information.

A Knowledge Base acts as a centralized repository that organizes information in a way that’s optimized for AI consumption and retrieval.

Key Features

Centralized Storage: Organize documents and data in one location
Intelligent Processing: Content optimized for AI consumption
Efficient Retrieval: Quick access to relevant information
Flexible Sources: Support for multiple data formats and origins

Common Operations

Creating a Knowledge Base

You can create a new Knowledge Base within a group using the Add Knowledge Base endpoint. This returns the created KB object with its unique identifier.

Adding Documents

The platform provides several ways to add documents to a Knowledge Base:

Upload files directly using the Upload Train endpoint
Import from external sources like Google Drive, OneDrive, or SharePoint using the Generic Train endpoint
Add documents with custom metadata using the Add Knowledge Base Document endpoint

Adding a document will NOT automatically trigger the ingestion process.

These operations support:

Multiple file uploads
Custom path prefixes for organization
Automatic processing and indexing

Managing Documents

Document management is handled through various endpoints that allow you to:

Retrieve documents with a specific status using the Get Knowledge Base Documents endpoint
Remove documents using the Delete Document endpoint
Update document metadata using the Update Document endpoint

Viewing and Updating

Knowledge Base details, including document status, can be retrieved through the Get Knowledge Base endpoint. You can also update KB properties using the Update Knowledge Base endpoint.

Monitoring Processing

You can monitor processing through the Running Jobs endpoint, which provides detailed information about ongoing and completed operations.

Supported Sources

Document Upload

Direct file uploads (PDFs, text files, images) with support for batch uploading

Google Drive

Import documents and files directly from your Google Drive

OneDrive

Access and import documents stored in Microsoft OneDrive

SharePoint

Access and import documents from SharePoint repositories

Websites

Web crawling with configurable depth and URL patterns

Coming Soon

Slack, Notion, and more

Although the platform offers other integrations, only the sources listed above are currently supported for Knowledge Bases.

Knowledge Base Types

Knowledge Bases come in two primary types:

Entity Knowledge Bases

Permanent Knowledge Bases that are managed through the Knowledge Bases section and can be referenced by multiple agents or workflows.

Temporary Knowledge Bases

Created within the Workflow Editor for specific workflow use cases, with the option to promote them to Entity KBs.

Entity Knowledge Bases

Entity Knowledge Bases are permanent repositories that:

Are created and managed through the Knowledge Bases section
Can be shared across multiple agents and workflows
Persist until explicitly deleted
Support all document sources and management operations
Provide centralized management of organizational knowledge

These are the standard Knowledge Bases that most users will interact with for long-term knowledge storage and retrieval.

Temporary Knowledge Bases

Temporary Knowledge Bases are workflow-specific repositories that:

Are created directly within the Workflow Editor
Are initially only available within the workflow where they were created
Can be used for processing intermediate data or testing document structures
Can be promoted to Entity Knowledge Bases when needed for broader use
Provide a flexible way to experiment with different knowledge structures

Temporary KBs are ideal for workflow-specific data that may not need to be part of your permanent knowledge repository. If you later decide the knowledge is valuable for broader use, you can promote it to an Entity KB without losing any data.

Status Tracking

Knowledge Base Status

Knowledge Bases have the following status values that indicate their overall state:

Status	Description
`created`	KB has been created but no documents have been added yet
`training`	KB has documents that are currently being processed
`trained`	All documents in the KB have been successfully processed
`error`	All documents in the KB have failed processing

The KB status is automatically updated based on the status of its documents.
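The relationship between document statuses and the KB status can be sketched as a small function. This is only an illustration of the table above, not the platform's implementation: the function name `derive_kb_status` is hypothetical, and treating a single `training` document as enough to put the KB in `training` is an assumption. Combinations the table does not describe (for example a mix of `trained` and `error` documents) return `None` rather than a guess.

```python
from typing import Iterable, Optional


def derive_kb_status(document_statuses: Iterable[str]) -> Optional[str]:
    """Map document statuses to a KB status, following the status table.

    Hypothetical helper. Returns None for combinations that the
    documented rules do not cover.
    """
    statuses = list(document_statuses)
    if not statuses:
        return "created"   # no documents have been added yet
    if "training" in statuses:
        return "training"  # assumption: one document in progress is enough
    if all(status == "trained" for status in statuses):
        return "trained"   # every document processed successfully
    if all(status == "error" for status in statuses):
        return "error"     # every document failed
    return None            # not covered by the documented rules


print(derive_kb_status([]))
print(derive_kb_status(["trained", "training"]))
print(derive_kb_status(["error", "error"]))
```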

Document Status

Individual documents within a Knowledge Base have their own status values:

Status	Description
`uploaded`	Document has been uploaded but processing hasn’t started
`training`	Document is currently being processed (chunked, embedded, etc.)
`trained`	Document has been successfully processed and is available for queries
`error`	Document processing failed

You can filter documents by status using the Get Documents by Status endpoint.

Knowledge Base Workflows

Knowledge Bases are processed through a series of automated workflows that handle document ingestion, processing, and indexing. Understanding these workflows can help you optimize your knowledge base usage.

Document Processing Flow

When you add documents to a Knowledge Base, they go through the following processing steps:

Document Upload

Documents are uploaded to secure storage and registered in the Knowledge Base with `uploaded` status

Text Extraction

Text is extracted from various file formats (PDF, DOCX, images, etc.) using specialized parsers

Chunking

Documents are split into smaller, semantically meaningful chunks for better retrieval

Embedding Generation

Vector embeddings are created for each chunk to enable semantic search

Indexing

Chunks and their embeddings are stored in a vector database for efficient retrieval
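To make the chunking, embedding, and indexing steps concrete, here is a toy, self-contained sketch of the same idea. It is not how the platform works internally: the bag-of-words "embedding" stands in for a real embedding model, an in-memory list stands in for the vector database, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_text(text: str, max_words: int = 40) -> List[str]:
    """Group whole sentences into chunks of roughly `max_words` words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        words = sentence.split()
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(text: str, max_words: int = 40) -> List[Tuple[str, Counter]]:
    """Chunk a document and store each chunk next to its vector."""
    return [(chunk, embed(chunk)) for chunk in chunk_text(text, max_words)]


def search(index: List[Tuple[str, Counter]], query: str, top_k: int = 1) -> List[str]:
    """Return the `top_k` chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


document = (
    "Invoices are issued monthly. Payment is due within thirty days. "
    "Support tickets are answered within one business day. "
    "Urgent tickets are escalated to an on-call engineer."
)
index = build_index(document, max_words=10)
print(search(index, "When is payment due?"))
```

The sketch also shows why focused documents tend to retrieve better: a query is matched against individual chunks, so a chunk that mixes unrelated topics dilutes its own similarity score.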

Error Handling and Retries

If any step in the document processing flow fails:

The document is marked with `error` status
Error details are captured in the processing run logs
You can view failed documents using the Get Documents by Status endpoint with `status=error`
You can retry processing all failed documents using the Retry All Errors endpoint
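Before retrying, it can help to see how many documents failed and which ones. The sketch below works on a list of document records that has already been fetched; the record shape (`name` and `status` keys) and the helper names are assumptions for illustration, so adapt them to the actual response of the documents endpoint.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical document records; the real response shape may differ.
documents = [
    {"name": "handbook.pdf", "status": "trained"},
    {"name": "scan-001.png", "status": "error"},
    {"name": "pricing.docx", "status": "training"},
    {"name": "locked.pdf", "status": "error"},
]


def summarize_statuses(docs: List[Dict[str, str]]) -> Dict[str, int]:
    """Count how many documents are in each status."""
    return dict(Counter(doc["status"] for doc in docs))


def failed_documents(docs: List[Dict[str, str]]) -> List[str]:
    """Names of documents whose processing failed."""
    return [doc["name"] for doc in docs if doc["status"] == "error"]


print(summarize_statuses(documents))
print(failed_documents(documents))
```

If the list of failed documents is dominated by one cause (for example password-protected PDFs), fixing the source files first avoids retrying documents that will fail again.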

Integration with Agents

Knowledge Bases can be integrated with AI agents to provide context for conversations:

Create a Knowledge Base and add documents
Wait for the documents to be fully processed (status `trained`)
Create or update an agent with the Knowledge Base ID
The agent will now use the Knowledge Base to provide context-aware responses
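The "wait for the documents to be fully processed" step is typically a polling loop. The sketch below keeps the HTTP details out of the picture: `fetch_statuses` is a hypothetical callable that you would implement on top of the documents endpoint, and the default interval and timeout are arbitrary. Returning `False` as soon as any document reports `error` is a design choice made here, not documented platform behavior.

```python
import time
from typing import Callable, Iterable


def wait_until_trained(
    fetch_statuses: Callable[[], Iterable[str]],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every document is `trained`.

    Returns True when all documents are trained and False as soon as any
    document reports `error`. Raises TimeoutError if neither happens
    before the timeout.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        statuses = list(fetch_statuses())
        if statuses and all(status == "trained" for status in statuses):
            return True
        if "error" in statuses:
            return False
        if time.monotonic() >= deadline:
            raise TimeoutError("documents were not trained before the timeout")
        sleep(poll_seconds)


# Simulated responses standing in for successive API calls.
responses = iter([
    ["uploaded", "uploaded"],
    ["training", "trained"],
    ["trained", "trained"],
])
print(wait_until_trained(lambda: next(responses), sleep=lambda _: None))
```

Because adding a document does not by itself start ingestion, a document that stays in `uploaded` will eventually hit the timeout; that usually means processing was never triggered.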

Batch Processing

For large document sets, the platform supports batch processing:

Upload multiple documents in a single request
Monitor processing status through the Running Jobs endpoint
The system automatically manages concurrent processing to optimize performance

Best Practices

Organizing Documents

For optimal Knowledge Base management:

Use consistent naming conventions for documents
Organize documents in a logical folder structure using path prefixes
Group related documents together for better context retrieval
Consider document size and complexity when uploading (very large documents may need to be split)

Performance Optimization & Recommendations

To get the best performance from your Knowledge Bases:

Keep individual documents focused on specific topics
Use descriptive filenames that reflect document content
Remove unnecessary formatting, headers, footers, and boilerplate text
For websites, configure crawl depth appropriately to avoid irrelevant content
Regularly review and remove outdated or irrelevant documents
Group related documents together in a folder for better context retrieval
Create separate KBs for different topics rather than one KB that holds a large number of unrelated documents

Supported File Types

Knowledge Bases can process a wide variety of file formats to accommodate different content types and sources. Understanding which file types are supported helps ensure successful document ingestion.

Document Formats

Category	Supported Formats
Documents	PDF, DOCX, DOC, PPTX, PPT
Text Files	TXT, HTML, MD, JSON
Images	JPG/JPEG, PNG
Archives	ZIP
Google Workspace	Google Docs, Google Slides

Office Documents

Our platform supports all major document formats including PDF, Microsoft Word (DOCX, DOC), and PowerPoint (PPTX, PPT) files.

Web & Code Content

Process plain text (TXT), web pages (HTML), documentation (MD), and structured data (JSON) with full text extraction.

Visual Content

Extract text from images (JPG/JPEG, PNG) using advanced OCR technology to make visual content searchable.

Packaged Content

Upload ZIP archives containing multiple documents for batch processing, with automatic extraction and organization.
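As an illustration of preparing a ZIP archive for batch upload, the following sketch bundles several in-memory files using Python's standard `zipfile` module. The helper name `build_zip` and the sample paths are hypothetical, and how the platform maps folders inside the archive to document paths is not specified here.

```python
import io
import zipfile
from typing import Dict


def build_zip(files: Dict[str, bytes]) -> bytes:
    """Pack {path: content} pairs into a ZIP archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path, content in files.items():
            archive.writestr(path, content)
    # The archive is finalized when the `with` block exits.
    return buffer.getvalue()


archive_bytes = build_zip({
    "policies/leave-policy.md": b"# Leave policy\n",
    "policies/expenses.txt": b"Submit expenses monthly.\n",
})
print(len(archive_bytes), "bytes")
```

Checking `len(archive_bytes)` against the archive size limit before uploading avoids a failed request for oversized bundles.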

File Size Limits

File Type	Maximum Size	Notes
Documents	50 MB	Includes PDF, DOCX, DOC, etc.
Images	20 MB	Text will be extracted using OCR
Archives	100 MB	Contents will be extracted and processed individually

Very large files may take longer to process and could impact system performance. Consider splitting large documents into smaller, more focused files for optimal results.
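A client-side check before uploading can catch unsupported or oversized files early. The sketch below encodes the format and size tables from this page; the helper name `check_file` is hypothetical, and reading "MB" as 1024 × 1024 bytes is an assumption. The size table lists no limit for text files, so the sketch applies none to them.

```python
import os
from typing import Tuple

MB = 1024 * 1024  # assumption: the documented limits use binary megabytes

CATEGORY_BY_EXTENSION = {
    **dict.fromkeys([".pdf", ".docx", ".doc", ".pptx", ".ppt"], "Documents"),
    **dict.fromkeys([".txt", ".html", ".md", ".json"], "Text Files"),
    **dict.fromkeys([".jpg", ".jpeg", ".png"], "Images"),
    ".zip": "Archives",
}

# Limits from the File Size Limits table; text files have no listed limit.
MAX_BYTES = {"Documents": 50 * MB, "Images": 20 * MB, "Archives": 100 * MB}


def check_file(filename: str, size_bytes: int) -> Tuple[bool, str]:
    """Return (ok, detail): the category if accepted, else the reason."""
    extension = os.path.splitext(filename)[1].lower()
    category = CATEGORY_BY_EXTENSION.get(extension)
    if category is None:
        return False, f"unsupported file type: {extension or '(none)'}"
    limit = MAX_BYTES.get(category)
    if limit is not None and size_bytes > limit:
        return False, f"{category} files are limited to {limit // MB} MB"
    return True, category


print(check_file("report.PDF", 10 * MB))
print(check_file("photo.png", 21 * MB))
print(check_file("video.mp4", 1))
```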

Text Extraction

The platform uses specialized parsers to extract text from different file types:

PDF documents: Full text extraction with layout preservation
Microsoft Office: Structured content extraction with formatting awareness
Images: Optical Character Recognition (OCR) for text extraction
Archives: Automatic extraction and processing of contained files

Special Considerations

Password-protected files are not supported and will fail during processing
Scanned documents are supported through OCR but may have lower accuracy
Corrupted files will fail during processing with appropriate error messages
Embedded content in documents (like images in PDFs) is processed when possible
