Knowledge Base
Manage documents and document sources to build a knowledge base for your AI chatbot. Upload files, crawl websites, and integrate with Jira, Google Drive, and SharePoint.
Build your chatbot's knowledge base by uploading documents or configuring automated document sources. Documents are processed into searchable chunks with vector embeddings, enabling the chatbot to provide accurate, context-aware responses.
Documents
Documents are individual files or web pages that make up your knowledge base. Each document is split into chunks, enriched with metadata, and vectorized for semantic search.
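Semantic search over chunk embeddings can be sketched with cosine similarity. The embedding model and vector store the product uses are internal; the three-dimensional vectors below are toy values for illustration only.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

# Toy 3-dimensional "embeddings" for two chunks and a user query.
chunks = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.2],
}
query = [0.85, 0.15, 0.05]

# Retrieval picks the chunk whose embedding is closest to the query.
best = max(chunks, key=lambda name: cosine_similarity(chunks[name], query))
```

Because similarity is computed on meaning-bearing vectors rather than keywords, a query phrased differently from the source text can still retrieve the right chunk.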
Supported File Types
- Documents: PDF, DOCX, XLSX, PPTX
- Web Content: HTML, Markdown
- Data: CSV, Plain Text
Uploading Documents
Go to Documents
Go to Admin → Documents and click Upload Documents.
Select Files
Select one or more files and choose which Workspaces should have access.
Upload
Click Upload. Documents are automatically processed through the ingestion pipeline.
Uploaded documents go through the following processing stages:
- Loading — File content is extracted
- Chunking — Content is split into searchable segments
- Enrichment — Title, description, and locale are generated using AI
- Contextual Processing — Each chunk receives surrounding context for better retrieval
- Vectorization — Embeddings are generated for semantic search
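The five stages above can be sketched as a chain of functions. All names and stand-in implementations below are hypothetical; the real pipeline (including the AI enrichment and embedding models) is internal to the product.

```python
def load(raw: bytes) -> str:
    """Loading: extract text content from the uploaded file (stand-in)."""
    return raw.decode("utf-8")

def chunk(text: str, size: int) -> list[str]:
    """Chunking: split content into fixed-size searchable segments."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def enrich(chunks: list[str]) -> dict:
    """Enrichment: stand-in for AI-generated title, description, locale."""
    return {"title": chunks[0][:30], "description": "", "locale": "en"}

def contextualize(chunks: list[str]) -> list[str]:
    """Contextual processing: give each chunk some neighbouring text."""
    return [(chunks[i - 1][-40:] if i > 0 else "") + c
            for i, c in enumerate(chunks)]

def vectorize(chunks: list[str]) -> list[list[float]]:
    """Vectorization: stand-in for a real embedding model."""
    return [[float(len(c)), float(c.count(" "))] for c in chunks]

def ingest(raw: bytes) -> tuple[dict, list[list[float]]]:
    """Run a document through all five stages in order."""
    chunks = chunk(load(raw), size=1000)
    return enrich(chunks), vectorize(contextualize(chunks))
```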
Managing Documents
The documents index provides search and filtering:
- Search by title or source URL
- Filter by vectorization status (vectorized or pending)
- Sort by name, date, or file size
- Bulk delete multiple documents at once
Each document displays its processing status, including the number of chunks created and how many have been vectorized.
Workspace Access
Documents can be shared across workspaces. For manually uploaded documents (not from a source), admins can manage workspace access from the document detail page.
Document Sources
Document sources automate the ingestion of documents from external systems. Instead of uploading files manually, configure a source to crawl and import content automatically.
Web Source
Crawl websites to import their content as documents.
Go to Document Sources
Go to Admin → Document Sources and click Add Web Source.
Configure Settings
Configure the source settings (see table below).
Save and Process
Click Save, then click Process to start the initial crawl.
Web Source Settings
| Setting | Description |
|---|---|
| URL | Starting URL for the crawler (must be HTTP or HTTPS) |
| Limit | Maximum number of documents to fetch (1–10,000, default: 100) |
| Max Depth | How deep the crawler follows links (0–10, default: 10) |
| Include URL Globs | URL patterns to include, semicolon-separated (e.g., https://example.com/docs/*) |
| Exclude URL Globs | URL patterns to exclude, semicolon-separated. Exclude rules take precedence over include rules. |
| JavaScript Enabled | Enable headless Chrome rendering for JavaScript-heavy websites. Disabled by default for faster crawling. |
| Ignore Selectors | CSS selectors for elements to remove from pages (e.g., nav;footer;.sidebar) |
| Header | Custom HTTP headers, semicolon-separated (see Crawling Authenticated Websites) |
| Locale | Expected language of documents (ISO 639-1 code, e.g., en, lv) |
| Periodic Crawl | Enable daily automatic re-crawling |
| Translate | Automatically translate documents to the base language |
| Description | Optional description of the source |
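The interplay between include and exclude globs can be sketched as follows. This illustrates the documented precedence using Python's `fnmatch`; it is not the crawler's actual matcher.

```python
from fnmatch import fnmatch

def url_allowed(url: str, include_globs: str, exclude_globs: str) -> bool:
    """Exclude globs win over include globs; an empty include list allows all."""
    includes = [g for g in include_globs.split(";") if g]
    excludes = [g for g in exclude_globs.split(";") if g]
    if any(fnmatch(url, g) for g in excludes):
        return False  # exclude rules take precedence
    return not includes or any(fnmatch(url, g) for g in includes)
```

For example, with include `https://example.com/docs/*` and exclude `https://example.com/docs/internal/*`, the page `https://example.com/docs/api` is crawled while `https://example.com/docs/internal/x` is not.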
Crawling Authenticated Websites
If the website you want to crawl requires authentication, you can use the Header setting to pass custom HTTP headers with each request. Headers are semicolon-separated.
Option 1: Basic Authentication
If the website uses HTTP Basic Authentication, add an Authorization header with a Base64-encoded username:password value:
```
Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```

To generate the Base64 value, encode `username:password` (e.g., `echo -n 'username:password' | base64`).
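If you prefer a script over a shell one-liner, the same header value can be generated with the Python standard library (a minimal sketch):

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authentication header from credentials."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Authorization: Basic {token}"

basic_auth_header("username", "password")
# "Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ="
```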
Option 2: Re-using a Session Cookie
If you can log into the website manually, you can copy the session cookie from your browser and pass it as a header:
```
Cookie: session_id=abc123def456
```

Session cookies typically expire after some time. You will need to update the header when the cookie expires.
Option 3: API Key or Bearer Token
If the website supports token-based access, use the appropriate header:
```
Authorization: Bearer your_access_token
```

or

```
X-Api-Key: your_api_key
```

This option typically requires involvement from the website developers to configure robot/service account access that does not expire.
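A sketch of how a semicolon-separated Header value could be split into individual headers. The product's actual parser is internal; this assumes simple `Name: value` pairs with no semicolons inside values.

```python
def parse_headers(setting: str) -> dict[str, str]:
    """Split a semicolon-separated Header setting into name/value pairs."""
    headers = {}
    for part in setting.split(";"):
        if ":" in part:
            name, value = part.split(":", 1)  # split on first colon only
            headers[name.strip()] = value.strip()
    return headers

parse_headers("Authorization: Bearer token123;X-Api-Key: key456")
# {"Authorization": "Bearer token123", "X-Api-Key": "key456"}
```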
Jira Source
Import Jira issues as documents. Each issue is converted to a structured document containing its summary, description, comments, status, and other metadata.
Note: Jira integration requires the Jira feature to be enabled.
Go to Document Sources
Go to Admin → Document Sources and click Add Jira Source.
Configure Settings
Configure the source settings (see table below).
Save and Process
Click Save, then click Process to start the initial import.
Jira Source Settings
| Setting | Description |
|---|---|
| URL | Jira instance URL |
| Chunk Size | Text chunk size for splitting issue content (1–100,000) |
| Locale | Expected language of issues (ISO 639-1 code) |
| Periodic Crawl | Enable daily automatic re-import (only fetches issues updated since last run) |
| Translate | Automatically translate issues to the base language |
| Description | Optional description of the source |
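The conversion from issue to document can be sketched as below. The exact layout the product generates is internal; the field names and formatting here are assumptions for illustration, as is the chunk-splitting helper.

```python
def issue_to_document(issue: dict) -> str:
    """Flatten a Jira issue into one structured text document."""
    lines = [
        f"Summary: {issue['summary']}",
        f"Status: {issue['status']}",
        f"Description: {issue['description']}",
    ]
    lines += [f"Comment: {c}" for c in issue.get("comments", [])]
    return "\n".join(lines)

def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Split the document text by the configured Chunk Size setting."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```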
Google Drive Source
Import files from Google Drive into your knowledge base. Supports Google Docs, Sheets, and Slides (automatically converted to DOCX, XLSX, and PPTX), along with all other supported file types.
Note: Google Drive integration requires a Google Drive connector with OAuth authentication.
Go to Document Sources
Go to Admin → Document Sources and click Add Google Drive Source.
Select a Connector
Select the Google Drive connector to use for authentication.
Pick Files
Click Browse Google Drive to open the file picker. Select individual files or entire folders. When a folder is selected, all files within it (including subfolders) are ingested.
Configure Settings
Configure the source settings (see table below).
Save and Process
Click Save, then click Process to start the initial import.
Google Drive Source Settings
| Setting | Description |
|---|---|
| Connector | Google Drive connector for OAuth authentication |
| Selected Files | Files and folders to import (selected via the file picker) |
| Chunk Size | Text chunk size for splitting document content (1–100,000) |
| Locale | Expected language of documents (ISO 639-1 code) |
| Periodic Crawl | Enable daily automatic re-import. Only changed files are reprocessed. |
| Translate | Automatically translate documents to the base language |
| Description | Optional description of the source |
Google native formats (Docs, Sheets, Slides) are automatically exported to Office formats for processing. Files that have not changed since the last crawl are skipped.
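Change detection can be sketched with content checksums, as below. The product may instead rely on Drive revision IDs or modified timestamps; this is an illustrative approach, not the actual implementation.

```python
import hashlib

def changed_since_last_crawl(content: bytes,
                             previous_checksums: dict,
                             file_id: str) -> bool:
    """Re-process a file only when its content checksum differs."""
    digest = hashlib.sha256(content).hexdigest()
    if previous_checksums.get(file_id) == digest:
        return False  # unchanged since the last crawl: skip re-processing
    previous_checksums[file_id] = digest
    return True
```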
SharePoint Source
Import files from Microsoft SharePoint into your knowledge base. Browse SharePoint sites, drives, and folders to select files for ingestion.
Note: SharePoint integration requires a Microsoft SharePoint connector with OAuth authentication.
Go to Document Sources
Go to Admin → Document Sources and click Add SharePoint Source.
Select a Connector
Select the SharePoint connector to use for authentication.
Pick Files
Select a SharePoint site, then click Browse to open the file picker. Select individual files or entire folders. Folders are traversed recursively during ingestion.
Configure Settings
Configure the source settings (see table below).
Save and Process
Click Save, then click Process to start the initial import.
SharePoint Source Settings
| Setting | Description |
|---|---|
| Connector | SharePoint connector for OAuth authentication |
| Selected Files | Files and folders to import (selected via the file picker) |
| Chunk Size | Text chunk size for splitting document content (1–100,000) |
| Locale | Expected language of documents (ISO 639-1 code) |
| Periodic Crawl | Enable daily automatic re-import. Only changed files are reprocessed. |
| Translate | Automatically translate documents to the base language |
| Description | Optional description of the source |
Processing Sources
After creating a source, click Process to start the crawl. You can monitor progress from the Crawler Runs page, which shows:
- Status — Pending, running, completed, or failed
- Progress — Percentage of completion
- Duration — How long the crawl took
- Error Message — Details if the crawl failed
Periodic Crawling
When Periodic Crawl is enabled on a source, the system automatically re-crawls daily. For Jira sources, only issues updated since the last run are fetched, making subsequent crawls faster.
Workspace Access
Document sources can be assigned to specific workspaces. All documents imported from a source are automatically available to the same workspaces.
Translation
When the translation feature is enabled, documents can be automatically translated to a configured base language. This is useful when your knowledge base contains documents in multiple languages but you want consistent retrieval.
- Enable Translate on the document source
- Set the base translation language in Admin → Settings
- Translated content is used alongside original content for search
- Documents already in the target language are left unchanged