Last Updated: 3/9/2026
Supported Models
Pie supports a variety of open-source LLMs. This page lists all currently supported model families and their variants.
Model Families
| Family | Models | Notes |
|---|---|---|
| Llama 3.x | Llama 3.2 (1B, 3B), Llama 3.1 (8B, 70B), Llama 3 (8B, 70B) | Full support including instruct variants |
| Qwen | Qwen 2, Qwen 2.5, Qwen 3 | All sizes supported |
| Gemma | Gemma 2, Gemma 3 | Google’s open models |
| Mistral | Ministral 3B, Mistral 7B | Including instruct variants |
| OLMo | OLMo 3 | AI2’s open language model |
| GPT-OSS | Various | Community open-source GPT variants |
Downloading Models
Use the Pie CLI to download models from HuggingFace:
```shell
# Download a model
pie model download meta-llama/Llama-3.2-1B-Instruct

# List downloaded models
pie model list

# Remove a model
pie model remove meta-llama/Llama-3.2-1B-Instruct
```
Configuring Models
Set your default model in ~/.pie/config.toml:
```toml
[[model]]
hf_repo = "meta-llama/Llama-3.2-1B-Instruct"
device = ["cuda:0"]
```
For multi-GPU setups:
```toml
[[model]]
hf_repo = "meta-llama/Llama-3.1-70B-Instruct"
device = ["cuda:0", "cuda:1", "cuda:2", "cuda:3"]
```
Quantization
Pie supports quantized inference to reduce memory usage. Configure quantization in your ~/.pie/config.toml:
```toml
[[model]]
hf_repo = "meta-llama/Llama-3.2-1B-Instruct"
device = ["cuda:0"]
activation_dtype = "bfloat16"  # or "float8"
weight_dtype = "float8"        # or "int8", "float4"
```
| Format | Description |
|---|---|
| bfloat16 | Default, full precision |
| float8 | 8-bit floating point, good balance of speed and quality |
| int8 | 8-bit integer quantization |
| float4 | 4-bit floating point, maximum memory savings |
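As a rough rule of thumb (an estimate, not a figure from the Pie documentation), weight memory scales with bytes per parameter, so the formats above trade precision for footprint. A sketch, where `weight_memory_gb` is a hypothetical helper and the byte counts ignore activations, KV cache, and per-tensor overhead:

```python
# Approximate bytes per parameter for each weight_dtype
BYTES_PER_PARAM = {
    "bfloat16": 2.0,
    "float8": 1.0,
    "int8": 1.0,
    "float4": 0.5,
}

def weight_memory_gb(num_params: float, weight_dtype: str) -> float:
    """Rough weight-only memory footprint in GB (decimal)."""
    return num_params * BYTES_PER_PARAM[weight_dtype] / 1e9

# e.g. an 8B-parameter model under each format
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(8e9, dtype):.1f} GB")
```

So an 8B model drops from roughly 16 GB of weights at bfloat16 to roughly 4 GB at float4, which is why float4 is listed as maximum memory savings.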
Adding Model Support
Want support for a new model? Open an issue on GitHub with:
- Model name and HuggingFace link
- Architecture details (if non-standard)
- Your use case
We prioritize models based on community demand.