inferlet
Python SDK for writing Pie inferlets.
Setup (One-Time)
From the pie repository root:
cd sdk/python
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"
uv pip install -e ../tools/bakery
# Verify
componentize-py --version
bakery --help
Building Python Inferlets
Activate the venv, then use Bakery to build an inferlet:
# If needed, activate Python venv (e.g., sdk/python/.venv)
# source .venv/bin/activate
# Build inferlet
bakery build "$PWD/<input>" -o "$PWD/<output.wasm>"
Example
# From pie root
bakery build "$PWD/sdk/examples/python/text-completion" \
-o "$PWD/text-completion.wasm"Run (Requires Pie Engine)
When a Pie engine is running, submit the built inferlet:
pie-cli submit text-completion.wasm -- --prompt "What is Python?"
Writing an Inferlet
Create main.py:
from inferlet import Context, get_auto_model, get_arguments, send
args = get_arguments()
prompt = args.get("prompt", "Hello!")
model = get_auto_model()
with Context(model) as ctx:
    ctx.system("You are a helpful assistant.")
    ctx.user(prompt)
    for token in ctx.generate(stream=True):
        send(token, streaming=True)
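If you don't need token-by-token output, the same pattern can return the whole completion at once. A minimal sketch, assuming generate() returns the full text as a string when stream is left off:

from inferlet import Context, get_auto_model, get_arguments, set_return

args = get_arguments()
model = get_auto_model()

with Context(model) as ctx:
    ctx.system("You are a helpful assistant.")
    ctx.user(args.get("prompt", "Hello!"))
    # Assumption: without stream=True, generate() returns the complete text.
    text = ctx.generate()
    set_return(text)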
Beam Search Example
from inferlet import Context, get_auto_model, get_arguments, set_return
args = get_arguments()
prompt = args.get("prompt", "Hello!")
model = get_auto_model()
with Context(model) as ctx:
    ctx.system("You are a helpful assistant.")
    ctx.user(prompt)
    # Generate with beam search for higher quality output
    result = ctx.generate_with_beam(beam_size=4, max_tokens=256)
    set_return(result)
API Reference
Runtime
- get_version() - Get Pie runtime version
- get_instance_id() - Get unique instance ID
- get_arguments() - Get CLI arguments as a dict
- set_return(value) - Set the return value
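A minimal sketch of the runtime helpers used together; the return types shown in the comments are assumptions, not confirmed by this document:

from inferlet import get_version, get_instance_id, get_arguments, set_return

args = get_arguments()  # e.g. {"prompt": "..."} from flags after "--"
# Assumption: version and instance ID are returned as plain strings.
set_return(f"pie {get_version()} / instance {get_instance_id()} / args {args}")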
Messaging
- send(message, streaming=False) - Send output
- receive() - Receive input
- broadcast(topic, message) - Broadcast to a topic
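A hedged sketch of the messaging calls, assuming receive() blocks until the client sends a message and that messages are plain strings:

from inferlet import send, receive, broadcast

# Assumption: receive() blocks until the next message from the client arrives.
incoming = receive()
send(f"echo: {incoming}")             # reply to the caller
broadcast("status", "inferlet done")  # publish to subscribers of a topic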
Model
- get_auto_model() - Get the default model
- get_model(service_id) - Get a specific model
- get_all_models() - List available models
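A sketch of model selection, assuming get_all_models() returns a list of model handles usable with Context; the service ID shown is made up for illustration:

from inferlet import get_auto_model, get_model, get_all_models

models = get_all_models()   # assumption: list of available model handles
model = get_auto_model()    # engine-chosen default model
# To pin a specific model, pass its service ID (hypothetical ID shown):
# model = get_model("example-service-id")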
Context
- system(content) - Add a system message
- user(content) - Add a user message
- assistant(content) - Add an assistant message
- generate(...) - Generate text (supports streaming)
- generate_with_beam(beam_size, max_tokens, stop) - Generate with beam search
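A sketch combining the context methods, assuming assistant() seeds a prior assistant turn and that the stop parameter takes a list of stop strings (both are assumptions, not confirmed here):

from inferlet import Context, get_auto_model, set_return

model = get_auto_model()
with Context(model) as ctx:
    ctx.system("You are a terse assistant.")
    ctx.user("Give me a haiku about Wasm.")
    ctx.assistant("Sure, here it is:")  # assumed: pre-fills an assistant turn
    # Assumed: stop is a list of strings that end generation early.
    result = ctx.generate_with_beam(beam_size=2, max_tokens=64, stop=["\n\n"])
    set_return(result)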
Limitations
Python inferlets run inside a WASM sandbox, so the following are not available:
- Network libraries (requests, httpx)
- Native extensions (numpy, pandas)
- Threading/multiprocessing
- File system (limited)
Check compatibility:
# With venv activated
python scripts/validate_imports.py <your-app>/