Serving
This notebook configures a deployment for the Breast Cancer schema without writing any repository checkpoints. It demonstrates serving mechanics, request validation, and target handling; it does not produce a useful trained classifier.
Serving starts with the same model schema used for training. Pydantic request models define the raw payload accepted by the deployment wrapper, and the /predict endpoint accepts either one JSON object or a JSON array of objects for a batched API call. Install the serving extra to enable the FastAPI-backed deployment paths:
uv sync --extra serving
import os
import polars as pl
import pydantic
import torch
from loguru import logger
import json2vec as j2v
logger.remove()
The serving payload uses the same nested measurements shape as the nested supervised training tutorial, but omits the answer field. Keeping train and serve shapes aligned is the main reason schemas, queries, and preprocessors live together.
records = pl.read_ndjson("docs/data/breast-cancer.jsonl").head(32)
class Request(pydantic.BaseModel):
    measurements: list[dict]
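The Request model above can be exercised on its own, before any deployment exists. The following is a minimal sketch, not part of the original notebook: the training-style record, its measurement names, and its values are made up for illustration, and the real records in docs/data/breast-cancer.jsonl may use different names. It shows a serving payload derived from a training-shaped record by dropping the diagnosis field, and a malformed payload being rejected by Pydantic.

```python
import pydantic


class Request(pydantic.BaseModel):
    # Same request model as above: one list of measurement objects.
    measurements: list[dict]


# Hypothetical training-style record; names and values are illustrative only.
training_record = {
    "measurements": [
        {"name": "mean_radius", "value": 14.2},
        {"name": "mean_texture", "value": 19.1},
    ],
    "diagnosis": "benign",
}

# Drop the target so the serving payload carries only the inputs.
payload = {key: value for key, value in training_record.items() if key != "diagnosis"}

# A well-formed payload validates and exposes the parsed measurements.
request = Request(**payload)
print(len(request.measurements))

# A payload without the required measurements field is rejected
# before it could reach the model.
try:
    Request(**{"diagnosis": "benign"})
except pydantic.ValidationError as error:
    rejected = True
    print(f"rejected: {len(error.errors())} validation error(s)")
else:
    rejected = False
```

Because measurements is typed as list[dict], Pydantic only checks that each item is a dictionary. Stricter item models (for example one with name and value fields) would be a design choice beyond what this notebook shows.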
The model includes a diagnosis target so it can be trained or inspected like the nested supervised training example. The request model validates the raw API payload before json2vec encodes it.
model = j2v.Model.from_schema(
    j2v.Array(
        j2v.Category("name", max_vocab_size=16),
        j2v.Number("value"),
        name="measurements",
        max_length=8,
    ),
    j2v.Category("diagnosis", target=True, max_vocab_size=2),
    d_model=16,
    n_layers=1,
    n_heads=4,
    batch_size=8,
    embed=True,
    name="measurements",
    optimizer=lambda module: torch.optim.AdamW(module.parameters(), lr=1e-2),
)
Inspect the model before serving. The display should show diagnosis as a target, which is the field the endpoint will predict from the request.
model
Model [model] batch_size=8 d_model=16 parameters=24,965 arrays=2 fields=3 targets=1 embeds=1
`-- measurements [root] embed attention=mha n_layers=1 n_heads=4 n_linear=1
    |-- measurements [array] max_length=8 overflow=head attention=mha n_layers=1 n_heads=4 n_linear=1
    |   |-- name [category] active query=[*].measurements[*].name
    |   |   pooling=query weight=1 p_mask=0 p_prune=0 n_heads=4 n_linear=1
    |   |   max_vocab_size=16 p_unavailable=0.01 topk=[]
    |   `-- value [number] active query=[*].measurements[*].value
    |       pooling=query weight=1 p_mask=0 p_prune=0 n_heads=4 n_linear=1
    |       jitter=0 n_bands=8 offset=4 objective=mae
    `-- diagnosis [category] active target query=[*].diagnosis
        pooling=query weight=1 p_mask=0 p_prune=1 n_heads=4 n_linear=1
        max_vocab_size=2 p_unavailable=0.01 topk=[]
For serving, keep diagnosis as target=True. In prediction mode, target fields are supplied to the model as empty masked fields and decoded as outputs, so incoming requests do not need to include the answer you want the model to predict.
deployment = j2v.Deployment(model=model).forge(request=Request)
The final cell is guarded by an environment variable so documentation builds configure the server object without starting a long-running process.
if os.environ.get("JSON2VEC_SERVE") == "1":
    deployment.serve()
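Once a server is running, the /predict endpoint can be called with either a single object or an array of objects. The sketch below is a hedged client-side illustration using only the standard library. The host and port (localhost:8000) are assumptions, since the notebook does not state where deployment.serve() listens, and the helper name build_predict_request, the measurement names, and the values are hypothetical. The response format is not documented here, so the actual network call is left commented out.

```python
import json
import urllib.request

# Assumption: the server listens on localhost:8000; adjust to your deployment.
PREDICT_URL = "http://localhost:8000/predict"


def build_predict_request(payload, url=PREDICT_URL):
    """Build a POST request carrying one object or a list of objects as JSON."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payloads; neither includes the diagnosis target.
single = {"measurements": [{"name": "mean_radius", "value": 14.2}]}
batch = [single, {"measurements": [{"name": "mean_texture", "value": 19.1}]}]

single_request = build_predict_request(single)  # one JSON object
batch_request = build_predict_request(batch)    # JSON array for a batched call

# Sending requires a running server, so the call is left commented out:
# with urllib.request.urlopen(single_request) as response:
#     print(json.load(response))
```

Building the request separately from sending it keeps the payload shape easy to inspect and test without a live server.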