Last Updated: 3/9/2026
Supported Models
Pie supports a variety of open-source LLMs. This page lists all currently supported model families and their variants.
Model Families
| Family | Models | Notes |
|---|---|---|
| Llama 3.x | Llama 3.2 (1B, 3B), Llama 3.1 (8B, 70B), Llama 3 (8B, 70B) | Full support including instruct variants |
| Qwen | Qwen 2, Qwen 2.5, Qwen 3 | All sizes supported |
| Gemma | Gemma 2, Gemma 3 | Google’s open models |
| Mistral | Ministral 3B, Mistral 7B | Including instruct variants |
| OLMo | OLMo 3 | AI2’s open language model |
| GPT-OSS | Various | Community open-source GPT variants |
Downloading Models
Use the Pie CLI to download models from HuggingFace:
```shell
# Download a model
pie model download meta-llama/Llama-3.2-1B-Instruct

# List downloaded models
pie model list

# Remove a model
pie model remove meta-llama/Llama-3.2-1B-Instruct
```
Configuring Models
Set your default model in ~/.pie/config.toml:
```toml
[[model]]
hf_repo = "meta-llama/Llama-3.2-1B-Instruct"
device = ["cuda:0"]
```
For multi-GPU setups:
```toml
[[model]]
hf_repo = "meta-llama/Llama-3.1-70B-Instruct"
device = ["cuda:0", "cuda:1", "cuda:2", "cuda:3"]
```
Quantization
Pie supports quantized inference to reduce memory usage. Configure quantization in your ~/.pie/config.toml:
```toml
[[model]]
hf_repo = "meta-llama/Llama-3.2-1B-Instruct"
device = ["cuda:0"]
activation_dtype = "bfloat16"  # or "float8"
weight_dtype = "float8"        # or "int8", "float4"
```
| Format | Description |
|---|---|
| bfloat16 | Default, full precision |
| float8 | 8-bit floating point, good balance of speed and quality |
| int8 | 8-bit integer quantization |
| float4 | 4-bit floating point, maximum memory savings |
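As a rough rule of thumb (an estimate, not a figure from the Pie documentation), weight memory scales with bytes per parameter, so the formats above trade precision for footprint. A sketch, where `weight_memory_gb` is a hypothetical helper and the byte counts ignore activations, KV cache, and per-tensor overhead:

```python
# Approximate bytes per parameter for each weight_dtype
BYTES_PER_PARAM = {
    "bfloat16": 2.0,
    "float8": 1.0,
    "int8": 1.0,
    "float4": 0.5,
}

def weight_memory_gb(num_params: float, weight_dtype: str) -> float:
    """Rough weight-only memory footprint in GB (decimal)."""
    return num_params * BYTES_PER_PARAM[weight_dtype] / 1e9

# e.g. an 8B-parameter model under each format
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(8e9, dtype):.1f} GB")
```

So an 8B model drops from roughly 16 GB of weights at bfloat16 to roughly 4 GB at float4, which is why float4 is listed as maximum memory savings.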
Adding Model Support
Want support for a new model? Open an issue on GitHub with:
- Model name and HuggingFace link
- Architecture details (if non-standard)
- Your use case
We prioritize models based on community demand.