## Features
- Local execution - No API keys or internet required
- Multiple models - Support for Llama, Mistral, Gemma, and more
- Full model types - Text generation, embeddings, and structured JSON objects
- Cost-free - No API charges
- Fallback option - Can serve as a local fallback when cloud providers are unavailable
## Prerequisites
- Install Ollama from https://ollama.com
- Pull the models you plan to use:
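For example, one text model and one embedding model (the names are illustrative; any supported model works):

```bash
# A text-generation model
ollama pull llama3

# An embedding model
ollama pull nomic-embed-text
```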
## Installation
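A minimal sketch; the package name `@elizaos/plugin-ollama` is an assumption, so substitute the plugin's actual published name:

```bash
# Package name is an assumption; replace with the plugin's actual name
npm install @elizaos/plugin-ollama
```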
## Configuration
### Environment Variables
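A sketch of a typical `.env`; the variable names below are assumptions, so check the plugin's documentation for the exact names it reads:

```bash
# Ollama server endpoint (11434 is Ollama's default port)
OLLAMA_API_ENDPOINT=http://localhost:11434

# Models per tier (names are examples from the Ollama library)
OLLAMA_SMALL_MODEL=llama3.2
OLLAMA_MEDIUM_MODEL=llama3.1
OLLAMA_LARGE_MODEL=llama3.3:70b
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
```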
### Character Configuration
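A minimal character-file sketch; the plugin identifier is the same assumed package name as above:

```json
{
  "name": "MyAgent",
  "plugins": ["@elizaos/plugin-ollama"]
}
```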
## Supported Operations
| Operation | Models | Notes |
| --- | --- | --- |
| TEXT_GENERATION | llama3, mistral, gemma | Various sizes available |
| EMBEDDING | nomic-embed-text, mxbai-embed-large | Local embeddings |
| OBJECT_GENERATION | All text models | JSON generation |
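For reference, Ollama itself can constrain output to valid JSON, which is the capability JSON object generation relies on. A sketch against the default endpoint (the model name is an example):

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "List three colors as a JSON array under the key \"colors\".",
  "format": "json",
  "stream": false
}'
```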
## Model Configuration
The plugin uses three model tiers:

- SMALL_MODEL: Quick responses, lower resource usage
- MEDIUM_MODEL: Balanced performance
- LARGE_MODEL: Best quality, highest resource needs
### Text Generation Models
- Llama models (3, 3.1, 3.2, 3.3)
- Mistral/Mixtral models
- Gemma models
- Phi models
- Any custom models you’ve created
### Embedding Models
- nomic-embed-text - Balanced performance
- mxbai-embed-large - Higher quality
- all-minilm - Lightweight option
## Performance Tips
- GPU Acceleration - Dramatically improves speed
- Model Quantization - Use Q4/Q5 quantized builds for better speed and lower memory use (see the example after this list)
- Context Length - Limit context for faster responses
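Quantized variants are pulled by tag; the tags available vary by model, and the tag below is illustrative:

```bash
# Pull a 4-bit quantized build (check the model's tag list for what exists)
ollama pull llama3.1:8b-instruct-q4_K_M
```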
## Hardware Requirements
| Model Size | RAM Required | GPU Recommended |
| --- | --- | --- |
| 7B | 8GB | Optional |
| 13B | 16GB | Yes |
| 70B | 64GB+ | Required |
## Common Issues
### "Connection refused"
Ensure Ollama is running:
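A quick check, assuming Ollama's default port of 11434:

```bash
# Start the server if it is not already running
ollama serve

# Verify the API responds
curl http://localhost:11434/api/tags
```

### Slow Performance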
- Use smaller models or quantized versions
- Enable GPU acceleration
- Reduce context length