Mitigate AI Platform

Knowledge Base

Manage documents and document sources to build a knowledge base for your AI chatbot. Upload files, crawl websites, and integrate with Jira.

Build your chatbot's knowledge base by uploading documents or configuring automated document sources. Documents are processed into searchable chunks with vector embeddings, enabling the chatbot to provide accurate, context-aware responses.

Documents

Documents are individual files or web pages that make up your knowledge base. Each document is split into chunks, enriched with metadata, and vectorized for semantic search.

Supported File Types

  • Documents: PDF, DOCX, XLSX, PPTX
  • Web Content: HTML, Markdown
  • Data: CSV, Plain Text

Uploading Documents

Go to Documents

Go to Admin → Documents and click Upload Documents.

Select Files

Select one or more files and choose which Workspaces should have access.

Upload

Click Upload. Documents are automatically processed through the ingestion pipeline.

Uploaded documents go through the following processing stages:

  1. Loading — File content is extracted
  2. Chunking — Content is split into searchable segments
  3. Enrichment — Title, description, and locale are generated using AI
  4. Contextual Processing — Each chunk receives surrounding context for better retrieval
  5. Vectorization — Embeddings are generated for semantic search
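The stages above can be sketched as a small pipeline. This is a hypothetical illustration only: the function names, chunk size, and record fields are invented for the example and are not the platform's actual API.

```python
# Hypothetical sketch of the ingestion stages; names and fields are
# illustrative only, not the platform's actual implementation.

def chunk(text: str, size: int = 200) -> list[str]:
    """2. Chunking: split extracted content into fixed-size segments."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def ingest(raw_text: str) -> list[dict]:
    """Run one loaded document (stage 1 output) through the remaining stages."""
    chunks = chunk(raw_text)
    enriched = []
    for i, c in enumerate(chunks):
        enriched.append({
            "text": c,
            "title": c[:40],                         # 3. Enrichment (AI-generated in practice)
            "context": chunks[max(0, i - 1):i + 2],  # 4. Contextual processing: neighboring chunks
            "vector": None,                          # 5. Vectorization: embedding would go here
        })
    return enriched
```

The key idea is that each chunk carries its neighbors as context, so retrieval can surface a segment without losing the surrounding meaning.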

Managing Documents

The documents index provides search and filtering:

  • Search by title or source URL
  • Filter by vectorization status (vectorized or pending)
  • Sort by name, date, or file size
  • Bulk delete multiple documents at once

Each document displays its processing status, including the number of chunks created and how many have been vectorized.

Workspace Access

Documents can be shared across workspaces. For manually uploaded documents (not from a source), admins can manage workspace access from the document detail page.

Document Sources

Document sources automate the ingestion of documents from external systems. Instead of uploading files manually, configure a source to crawl and import content automatically.

Web Source

Crawl websites to import their content as documents.

Go to Document Sources

Go to Admin → Document Sources and click Add Web Source.

Configure Settings

Configure the source settings (see table below).

Save and Process

Click Save, then click Process to start the initial crawl.

Web Source Settings

  • URL — Starting URL for the crawler (must be HTTP or HTTPS)
  • Limit — Maximum number of documents to fetch (1–10,000, default: 100)
  • Max Depth — How deep the crawler follows links (0–10, default: 10)
  • Include URL Globs — URL patterns to include, semicolon-separated (e.g., https://example.com/docs/*)
  • Exclude URL Globs — URL patterns to exclude, semicolon-separated. Exclude rules take precedence over include rules.
  • JavaScript Enabled — Enable headless Chrome rendering for JavaScript-heavy websites. Disabled by default for faster crawling.
  • Ignore Selectors — CSS selectors for elements to remove from pages (e.g., nav;footer;.sidebar)
  • Header — Custom HTTP headers, semicolon-separated (see Crawling Authenticated Websites)
  • Locale — Expected language of documents (ISO 639-1 code, e.g., en, lv)
  • Periodic Crawl — Enable daily automatic re-crawling
  • Translate — Automatically translate documents to the base language
  • Description — Optional description of the source
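The exclude-over-include precedence can be illustrated with Python's standard fnmatch module. This is a sketch of the rule as documented; the crawler's actual glob matcher may differ in details.

```python
from fnmatch import fnmatch

def allowed(url: str, include: list[str], exclude: list[str]) -> bool:
    """Exclude globs take precedence over include globs."""
    if any(fnmatch(url, pat) for pat in exclude):
        return False          # an exclude match always wins
    if not include:
        return True           # no include patterns: everything else passes
    return any(fnmatch(url, pat) for pat in include)

inc = ["https://example.com/docs/*"]
exc = ["https://example.com/docs/internal/*"]
```

With these patterns, https://example.com/docs/intro is crawled, while anything under /docs/internal/ is skipped even though it also matches the include glob.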

Crawling Authenticated Websites

If the website you want to crawl is behind a login or requires authentication, you can use the Header setting to pass custom HTTP headers with each request. Headers are semicolon-separated.
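A semicolon-separated header string might be split into name/value pairs as follows. This is a sketch of the expected format only; the platform's actual parser is not documented, and header values that themselves contain semicolons may need different handling.

```python
def parse_headers(raw: str) -> dict[str, str]:
    """Split a semicolon-separated header string into name/value pairs.
    (Illustrative sketch; the platform's parser may differ.)"""
    headers = {}
    for part in raw.split(";"):
        if ":" in part:
            name, value = part.split(":", 1)  # split on the first colon only
            headers[name.strip()] = value.strip()
    return headers
```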

Option 1: Basic Authentication

If the website uses HTTP Basic Authentication, add an Authorization header with a Base64-encoded username:password value:

Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=

To generate the Base64 value, encode username:password (e.g., echo -n 'username:password' | base64).
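The same value can be produced with Python's standard base64 module, which avoids shell quoting issues:

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authentication header value."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Authorization: Basic {token}"

# basic_auth_header("username", "password")
# → "Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ="
```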

Option 2: Reusing a Session Cookie

If you can log into the website manually, you can copy the session cookie from your browser and pass it as a header:

Cookie: session_id=abc123def456

Session cookies typically expire after some time. You will need to update the header when the cookie expires.

Option 3: API Key or Bearer Token

If the website supports token-based access, use the appropriate header:

Authorization: Bearer your_access_token

or

X-Api-Key: your_api_key

This option typically requires involvement from the website developers to configure robot/service account access that does not expire.

Jira Source

Import Jira issues as documents. Each issue is converted to a structured document containing its summary, description, comments, status, and other metadata.

Note: Jira integration requires the Jira feature to be enabled.

Go to Document Sources

Go to Admin → Document Sources and click Add Jira Source.

Configure Settings

Configure the source settings (see table below).

Save and Process

Click Save, then click Process to start the initial import.

Jira Source Settings

  • URL — Jira instance URL
  • Chunk Size — Text chunk size for splitting issue content (1–100,000)
  • Locale — Expected language of issues (ISO 639-1 code)
  • Periodic Crawl — Enable daily automatic re-import (only fetches issues updated since last run)
  • Translate — Automatically translate issues to the base language
  • Description — Optional description of the source

Google Drive Source

Import files from Google Drive into your knowledge base. Supports Google Docs, Sheets, and Slides (automatically converted to DOCX, XLSX, and PPTX), along with all other supported file types.

Note: Google Drive integration requires a Google Drive connector with OAuth authentication.

Go to Document Sources

Go to Admin → Document Sources and click Add Google Drive Source.

Select a Connector

Select the Google Drive connector to use for authentication.

Pick Files

Click Browse Google Drive to open the file picker. Select individual files or entire folders. When a folder is selected, all files within it (including subfolders) are ingested.

Configure Settings

Configure the source settings (see table below).

Save and Process

Click Save, then click Process to start the initial import.

Google Drive Source Settings

  • Connector — Google Drive connector for OAuth authentication
  • Selected Files — Files and folders to import (selected via the file picker)
  • Chunk Size — Text chunk size for splitting document content (1–100,000)
  • Locale — Expected language of documents (ISO 639-1 code)
  • Periodic Crawl — Enable daily automatic re-import. Only changed files are reprocessed.
  • Translate — Automatically translate documents to the base language
  • Description — Optional description of the source

Google native formats (Docs, Sheets, Slides) are automatically exported to Office formats for processing. Files that have not changed since the last crawl are skipped.
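The skip-unchanged behavior can be sketched as comparing a content checksum against the one stored from the previous crawl. This is an illustration only: the platform may instead compare modification timestamps or revision IDs.

```python
import hashlib

def needs_reprocess(content: bytes, stored: dict[str, str], file_id: str) -> bool:
    """Return True only when the file's content hash differs from the stored one.
    (Hypothetical sketch of change detection; not the platform's actual logic.)"""
    digest = hashlib.sha256(content).hexdigest()
    if stored.get(file_id) == digest:
        return False            # unchanged since last crawl: skip
    stored[file_id] = digest    # remember the new version for next time
    return True
```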

SharePoint Source

Import files from Microsoft SharePoint into your knowledge base. Browse SharePoint sites, drives, and folders to select files for ingestion.

Note: SharePoint integration requires a Microsoft SharePoint connector with OAuth authentication.

Go to Document Sources

Go to Admin → Document Sources and click Add SharePoint Source.

Select a Connector

Select the SharePoint connector to use for authentication.

Pick Files

Select a SharePoint site, then click Browse to open the file picker. Select individual files or entire folders. Folders are traversed recursively during ingestion.

Configure Settings

Configure the source settings (see table below).

Save and Process

Click Save, then click Process to start the initial import.

SharePoint Source Settings

  • Connector — SharePoint connector for OAuth authentication
  • Selected Files — Files and folders to import (selected via the file picker)
  • Chunk Size — Text chunk size for splitting document content (1–100,000)
  • Locale — Expected language of documents (ISO 639-1 code)
  • Periodic Crawl — Enable daily automatic re-import. Only changed files are reprocessed.
  • Translate — Automatically translate documents to the base language
  • Description — Optional description of the source

Processing Sources

After creating a source, click Process to start the crawl. You can monitor progress from the Crawler Runs page, which shows:

  • Status — Pending, running, completed, or failed
  • Progress — Percentage of completion
  • Duration — How long the crawl took
  • Error Message — Details if the crawl failed

Periodic Crawling

When Periodic Crawl is enabled on a source, the system automatically re-crawls daily. For Jira sources, only issues updated since the last run are fetched, making subsequent crawls faster.

Workspace Access

Document sources can be assigned to specific workspaces. All documents imported from a source are automatically available to the same workspaces.

Translation

When the translation feature is enabled, documents can be automatically translated to a configured base language. This is useful when your knowledge base contains documents in multiple languages but you want consistent retrieval.

  • Enable Translate on the document source
  • Set the base translation language in Admin → Settings
  • Translated content is used alongside original content for search
  • Documents already in the target language are left unchanged
