Overview
What Gets Indexed
| Content Type | Description |
|---|---|
| Text files | Full content extraction for text-based files |
| Code files | Source code files (.py, .js, .rs, .go, etc.) |
| Documents | Markdown, JSON, XML, and other structured text |
| Metadata | File name, path, size, timestamps for all files |
Supported File Formats
Full content indexing (text extracted and searchable):
- text/* (all text MIME types)
- application/json
- application/xml
- application/javascript
- application/x-sh (shell scripts)
- application/x-python
- application/x-ruby
Metadata-only indexing (content not extracted):
- Binary files (images, PDFs, etc.)
- Files larger than 10 MB (the default limit)
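The split between full content indexing and metadata-only indexing can be pictured with a small sketch. This is an illustration only, not the connector's actual code: the function name `indexing_mode` and the constant names are hypothetical, and it assumes a file exactly at the size limit still gets its content extracted.

```python
# Hypothetical helper for illustration; not part of Omni.
FULL_CONTENT_MIME_TYPES = {
    "application/json",
    "application/xml",
    "application/javascript",
    "application/x-sh",
    "application/x-python",
    "application/x-ruby",
}

# 10 MB, the documented default for max_file_size_bytes.
DEFAULT_MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024


def indexing_mode(mime_type: str, size_bytes: int,
                  max_size: int = DEFAULT_MAX_FILE_SIZE_BYTES) -> str:
    """Return "full" when content would be extracted, otherwise "metadata"."""
    is_text = mime_type.startswith("text/") or mime_type in FULL_CONTENT_MIME_TYPES
    # Assumption: only files strictly larger than the limit fall back to metadata.
    if is_text and size_bytes <= max_size:
        return "full"
    return "metadata"
```

For example, `indexing_mode("text/markdown", 2048)` yields `"full"`, while `indexing_mode("image/png", 2048)` yields `"metadata"`.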
How It Works
- The connector scans configured directories recursively
- Text files are read and content is extracted for full-text search
- File metadata (name, path, size, timestamps) is indexed for all files
- A file watcher detects changes in real time between full scans
The connector uses read-only access. Omni cannot modify or delete any files in your filesystem.
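The recursive scan described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the connector's implementation: `scan_tree` is a hypothetical name, the record keys are chosen to mirror the metadata fields listed earlier, and the sketch skips symlinked files and files that cannot be stat'ed.

```python
import os
from typing import Dict, Iterator


def scan_tree(base_path: str) -> Iterator[Dict[str, object]]:
    """Yield one metadata record per regular file under base_path (read-only)."""
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(base_path):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # assumption: symlinked files are skipped as well
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is not accessible; skip it
            yield {
                "name": name,
                "path": path,
                "size": st.st_size,
                "modified": st.st_mtime,
            }
```

The sketch only calls `os.walk` and `os.stat`, so it never writes to the tree, which matches the read-only access described above.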
Prerequisites
Before setting up the Filesystem connector, ensure you have:
- Omni deployment with the Filesystem connector service running
- Directory access to the files you want to index
- Docker volume mounting configured (for containerized deployments)
Setup
Step 1: Configure Docker Volume Mounts
For Docker Compose deployments, add volume mounts to the Filesystem connector service in your docker-compose.override.yml:
Step 2: Add Environment Variables
Add the following to your .env file:
Step 3: Connect in Omni
- Navigate to Settings → Integrations in Omni
- Find Filesystem and click Connect
- Configure the source settings:
| Setting | Required | Default | Description |
|---|---|---|---|
| base_path | Yes | - | Root directory to scan (inside container, e.g., /data/documents) |
| scan_interval_seconds | No | 300 | Full scan interval (5 minutes) |
| file_extensions | No | - | Whitelist of extensions (e.g., ["txt", "md", "json"]) |
| exclude_patterns | No | - | Patterns to exclude (e.g., [".git", "node_modules"]) |
| max_file_size_bytes | No | 10485760 | Max file size for content extraction (10MB) |
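To make the defaults in the table concrete, here is a small sketch that merges user-supplied settings with the documented defaults. It is illustrative only: `build_source_config` and `DEFAULTS` are hypothetical names, a default shown as "-" is modeled as `None`, and the way Omni actually stores or accepts these settings is not specified here.

```python
import json
from typing import Any, Dict

# Defaults copied from the settings table; "-" is modeled as None (an assumption).
DEFAULTS: Dict[str, Any] = {
    "scan_interval_seconds": 300,
    "file_extensions": None,
    "exclude_patterns": None,
    "max_file_size_bytes": 10485760,
}


def build_source_config(base_path: str, **overrides: Any) -> Dict[str, Any]:
    """Combine the required base_path with defaults and explicit overrides."""
    if not base_path:
        raise ValueError("base_path is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    return {"base_path": base_path, **DEFAULTS, **overrides}


if __name__ == "__main__":
    # Example values taken from the table's own examples.
    docs_source = build_source_config(
        "/data/documents",
        file_extensions=["txt", "md", "json"],
        exclude_patterns=[".git", "node_modules"],
    )
    print(json.dumps(docs_source, indent=2))
```

Settings that are not overridden keep their documented defaults, so the example above still scans every 300 seconds and extracts content only from files up to 10485760 bytes.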
Step 4: Start Initial Sync
- Click Save Configuration
- Click Sync Now to start the initial scan
- Monitor progress in the admin panel
Your Filesystem connector is now configured. The connector will scan and index files from the configured directory.
Example Configurations
Documentation Repository
Index a documentation folder with Markdown and text files:
Code Repository
Index source code with common development exclusions:
Shared File Server
Index a shared network drive with all text content:
Managing the Integration
Viewing Sync Status
Navigate to Settings → Integrations → Filesystem to view:
- Last sync time
- Number of indexed files
- Any scan errors
Sync Behavior
The Filesystem connector uses two synchronization mechanisms:
| Mechanism | Frequency | Description |
|---|---|---|
| Full Scan | Every 5 min (default) | Walks entire directory tree |
| File Watcher | Real-time | Detects file changes between scans |
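One way to picture how a periodic full scan can catch changes that a file watcher missed is to compare two scan snapshots. The sketch below is an assumption about the general technique, not a description of Omni's internals: `diff_snapshots` is a hypothetical name, and a snapshot is modeled as a mapping from path to `(size, mtime)`.

```python
from typing import Dict, Set, Tuple

# A snapshot maps each file path to its (size in bytes, modification time).
Snapshot = Dict[str, Tuple[int, float]]


def diff_snapshots(previous: Snapshot, current: Snapshot) -> Dict[str, Set[str]]:
    """Classify paths as added, removed, or modified between two full scans."""
    added = set(current) - set(previous)
    removed = set(previous) - set(current)
    # A path present in both scans counts as modified if size or mtime differs.
    modified = {
        path for path in set(current) & set(previous)
        if current[path] != previous[path]
    }
    return {"added": added, "removed": removed, "modified": modified}
```

With this approach, a change missed between scans would surface at the next full scan, at most one scan interval later.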
Adding More Directories
To index additional directories:
- Add another volume mount in Docker Compose
- Create a new Filesystem source in Omni with the new base_path
Removing the Integration
- Navigate to Settings → Integrations → Filesystem
- Click Remove
- Indexed documents will be deleted from search
Troubleshooting
Files not appearing in search
Common causes:
- Volume not mounted correctly in Docker
- File extension not in whitelist
- File matches an exclude pattern
- File larger than max_file_size_bytes (only its content is skipped; metadata is still indexed)
Solution:
- Verify the volume mount:
docker exec omni-filesystem-connector ls /data/documents
- Check the configuration for file_extensions and exclude_patterns
- Trigger a manual sync
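When checking the configuration, it can help to reason through the filters one at a time. The sketch below is a hypothetical diagnostic, not the connector's matching logic: `skip_reason` is an invented name, and it assumes extensions are compared without the leading dot and case-insensitively, and that an exclude pattern matches when it matches any single path component.

```python
import fnmatch
from pathlib import PurePosixPath
from typing import List, Optional


def skip_reason(rel_path: str, size_bytes: int,
                file_extensions: Optional[List[str]] = None,
                exclude_patterns: Optional[List[str]] = None,
                max_file_size_bytes: int = 10485760) -> Optional[str]:
    """Return a reason the file's content would not be searchable, or None."""
    path = PurePosixPath(rel_path)
    # Assumption: a pattern excludes the file if it matches any path component.
    for pattern in exclude_patterns or []:
        if any(fnmatch.fnmatch(part, pattern) for part in path.parts):
            return f"matches exclude pattern {pattern!r}"
    # Assumption: an unset whitelist means every extension is allowed.
    if file_extensions is not None:
        ext = path.suffix.lstrip(".").lower()
        if ext not in {e.lower() for e in file_extensions}:
            return f"extension {ext!r} not in whitelist"
    if size_bytes > max_file_size_bytes:
        return "larger than max_file_size_bytes (metadata only)"
    return None
```

For instance, `skip_reason("node_modules/pkg/index.js", 120, exclude_patterns=[".git", "node_modules"])` reports the exclude pattern, while a path that passes every check returns `None`.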
Permission denied errors
The container doesn’t have read access to the files.
Solution:
- Ensure files are readable by the container user
- Check that SELinux/AppArmor isn’t blocking access
- On SELinux hosts, try mounting with :ro,z (shared label, usable by multiple containers) or :ro,Z (private label, for this container only)
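To find which files are affected, a short script can list everything under the mount that the current user cannot read. This is a hedged sketch rather than an Omni tool: `find_unreadable` is a hypothetical name, and the result is only meaningful when the script runs as the same user, and inside the same container, as the connector.

```python
import os
from typing import List


def find_unreadable(base_path: str) -> List[str]:
    """Return paths under base_path that the current user cannot read."""
    unreadable: List[str] = []

    def record_error(err: OSError) -> None:
        # os.walk reports directories it cannot list through this callback.
        unreadable.append(str(err.filename))

    for dirpath, _dirnames, filenames in os.walk(base_path, onerror=record_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK):
                unreadable.append(path)
    return sorted(unreadable)
```

An empty result suggests that plain file permissions are not the cause, in which case SELinux or AppArmor policy is worth checking next.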
Large files not indexed
Files larger than 10MB (default) are indexed for metadata only.
Solution: Increase max_file_size_bytes in the source configuration. Be aware that this increases memory usage.
Scan taking too long
Large directories with many files will take longer to scan.
Factors affecting scan time:
- Number of files in the directory tree
- Average file size
- Disk I/O speed
Solution:
- Use file_extensions to limit which files are processed
- Use exclude_patterns to skip unnecessary directories
- Increase scan_interval_seconds for stable directories
Changes not detected
The file watcher may not detect all changes.
Solution: The full scan (every 5 minutes by default) will catch any missed changes. You can also trigger a manual sync.
Symlinks not followed
By default, symlinks are not followed to prevent infinite loops.
Solution: Currently, symlink following is disabled. Place actual files in the indexed directory.
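To see which entries are affected, the sketch below lists every symlink under a directory along with the path it resolves to. It is an illustrative helper with a hypothetical name (`find_symlinks`), not part of Omni, and it relies on `os.walk` not descending into symlinked directories by default.

```python
import os
from typing import List, Tuple


def find_symlinks(base_path: str) -> List[Tuple[str, str]]:
    """Return (link_path, resolved_target) for each symlink under base_path."""
    links: List[Tuple[str, str]] = []
    # followlinks defaults to False, so symlinked directories are listed
    # in dirnames but never entered, which avoids infinite loops.
    for dirpath, dirnames, filenames in os.walk(base_path):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                links.append((path, os.path.realpath(path)))
    return sorted(links)
```

Each reported target is a candidate for copying into the indexed directory, or for mounting as an additional volume with its own source.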
Security Considerations
- Read-only mounts: Always use the :ro flag for volume mounts
- File permissions: The connector respects filesystem permissions
- No network access: The connector only reads local files
- Container isolation: Files are accessed through Docker volumes