Model Mutations
A core tenet of json2vec is the ability to modify a model's architecture dynamically and expressively at any time.
Model developers may update, delete, add, or reset fields before or after any loop (training, validation, testing, or inference).
```python
from rich.pretty import pprint
import json2vec as j2v
```
Let's quickly create a basic model. Each record has a flower species, which is the target, and a history of measurement samples. Each sample holds the flower's age at the time of measurement along with its sepal and petal lengths.
```python
model = j2v.Model.from_schema(
    j2v.Category("species", target=True, max_vocab_size=10),
    j2v.Array(  # include a list of flower measurements over time
        j2v.Number("age"),
        j2v.Number("sepal_length"),
        j2v.Number("petal_length"),
        max_length=10,
        name="samples",
    ),
    d_model=16,
    n_layers=1,
    n_heads=4,
    embed=True,
)
```
```text
2026-06-09 20:47:29.548 | INFO | json2vec.architecture.root:__init__:167 - initialized Model module
```
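To make the schema concrete, here is a minimal sketch of input records it could describe. It assumes the model consumes a list of JSON objects, which is what the `query=` paths in the model summaries on this page (for example `[*].samples[*].age`) suggest. The field values are made up, and `resolve` is a hypothetical toy evaluator for those paths, not part of json2vec.

```python
# Hypothetical input records for the flower schema above; values are illustrative.
records = [
    {
        "species": "setosa",
        "samples": [
            {"age": 30, "sepal_length": 4.9, "petal_length": 1.3},
            {"age": 45, "sepal_length": 5.1, "petal_length": 1.4},
        ],
    },
    {
        "species": "virginica",
        "samples": [{"age": 60, "sepal_length": 6.3, "petal_length": 6.0}],
    },
]


def resolve(path, data):
    """Evaluate a simplified query such as "[*].samples[*].age".

    Supports two step types: "[*]" fans out over a list, and a bare key
    looks up a dict field (missing keys are skipped). Returns a flat list.
    Toy evaluator only, not json2vec's query engine.
    """
    # Split "[*].samples[*].age" into ["[*]", "samples", "[*]", "age"].
    steps = []
    for part in path.split("."):
        if part.endswith("[*]") and part != "[*]":
            steps.extend([part[:-3], "[*]"])
        else:
            steps.append(part)

    values = [data]
    for step in steps:
        next_values = []
        for value in values:
            if step == "[*]":
                next_values.extend(value)
            elif isinstance(value, dict) and step in value:
                next_values.append(value[step])
        values = next_values
    return values


print(resolve("[*].species", records))
print(resolve("[*].samples[*].age", records))
```

Each `Number` inside the `samples` array therefore sees one value per sample, while `species` sees one value per record.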
Before mutating any component of a model, we need to understand predicates.
All model mutations are defined with predicates. Predicates are composable and can be combined with boolean logic (`a & b`, `a | b`, `~b`) to enable complex queries and mutations, even across dozens or hundreds of fields. Predicates can be defined not only on a node's built-in attributes, but also on its parents or children.
Here, the example selects all numbers, all categories, the field named `species`, and every target field. Because `species` is both the only category and the only target, the last three queries return the same node.
```python
print("find all fields of type `number`")
pprint(model.select(j2v.where("type") == "number"))
print("find all fields of type `category`")
pprint(model.select(j2v.where("type") == "category"))
print("find all fields named `species`")
pprint(model.select(j2v.where("name") == "species"))
print("find all fields where `target=True`")
pprint(model.select(j2v.where("target")))
```

```text
find all fields of type `number`
[ age [number] active query=[*].samples[*].age pooling=query weight=1 p_mask=0 p_prune=0 n_heads=4 n_linear=1 jitter=0 n_bands=8 offset=4 objective=mae, sepal_length [number] active query=[*].samples[*].sepal_length pooling=query weight=1 p_mask=0 p_prune=0 n_heads=4 n_linear=1 jitter=0 n_bands=8 offset=4 objective=mae, petal_length [number] active query=[*].samples[*].petal_length pooling=query weight=1 p_mask=0 p_prune=0 n_heads=4 n_linear=1 jitter=0 n_bands=8 offset=4 objective=mae ]
find all fields of type `category`
[ species [category] active target query=[*].species pooling=query weight=1 p_mask=0 p_prune=1 n_heads=4 n_linear=1 max_vocab_size=10 p_unavailable=0.01 topk=[] ]
find all fields named `species`
[ species [category] active target query=[*].species pooling=query weight=1 p_mask=0 p_prune=1 n_heads=4 n_linear=1 max_vocab_size=10 p_unavailable=0.01 topk=[] ]
find all fields where `target=True`
[ species [category] active target query=[*].species pooling=query weight=1 p_mask=0 p_prune=1 n_heads=4 n_linear=1 max_vocab_size=10 p_unavailable=0.01 topk=[] ]
```
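The predicate algebra can be sketched in plain Python with operator overloading. The `Node`, `Predicate`, `where`, `parent_matches`, and `select` names below are hypothetical stand-ins written for illustration; they mimic the shape of `j2v.where(...)` and `model.select(...)` but are not json2vec's implementation. `parent_matches` illustrates the idea of a predicate defined on a node's parent rather than on the node itself.

```python
class Node:
    """Hypothetical schema node with a parent link and children."""

    def __init__(self, name, type_, target=False):
        self.name, self.type, self.target = name, type_, target
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


class Predicate:
    """A boolean test on a node that composes with &, |, and ~."""

    def __init__(self, test):
        self.test = test

    def __call__(self, node):
        return self.test(node)

    def __and__(self, other):
        return Predicate(lambda n: self(n) and other(n))

    def __or__(self, other):
        return Predicate(lambda n: self(n) or other(n))

    def __invert__(self):
        return Predicate(lambda n: not self(n))


class _Where(Predicate):
    """Truthiness test on an attribute; `== value` builds an equality test."""

    def __init__(self, attr):
        super().__init__(lambda n: bool(getattr(n, attr, None)))
        self.attr = attr

    def __eq__(self, value):
        attr = self.attr
        return Predicate(lambda n: getattr(n, attr, None) == value)


def where(attr):
    return _Where(attr)


def parent_matches(pred):
    """True when the node has a parent that satisfies `pred`."""
    return Predicate(lambda n: n.parent is not None and pred(n.parent))


def select(root, pred):
    """Names of all nodes matching `pred`, in depth-first pre-order."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if pred(node):
            found.append(node.name)
        stack.extend(reversed(node.children))
    return found


root = Node("record", "root")
root.add(Node("species", "category", target=True))
samples = root.add(Node("samples", "array"))
for name in ("age", "sepal_length", "petal_length"):
    samples.add(Node(name, "number"))

print(select(root, where("type") == "number"))
print(select(root, (where("type") == "category") & ~where("target")))
print(select(root, where("target") | parent_matches(where("name") == "samples")))
```

In this toy tree, "all categories except the target" (`(where("type") == "category") & ~where("target")`) matches nothing, because the only category is also the target.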
Updates
`model.update(*predicates, **updates)` applies explicit schema mutations to one or more nodes. The predicates select which nodes to mutate, and the keyword arguments give the new attribute values.
Common update APIs and field changes:
| Mutation | Purpose |
|---|---|
| `target=True` | Withhold a field and decode it as a supervised target. |
| `p_mask=...` | Randomly hide values for self-supervised reconstruction. |
| `p_prune=...` | Remove whole field instances from input. |
| `embed=True` | Return embeddings from a selected node. |
| `target=False` | Clear target behavior so the field is encoded as an input when present. |
| `active=False` | Reversibly remove a leaf field from encoding, forward passes, losses, and predictions. |
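The difference between masking and pruning can be illustrated with a toy corruption function. This is a sketch under assumptions (a `<mask>` placeholder token and independent per-field random draws), not json2vec's actual training logic: a masked field keeps its slot in the input but hides its value, which becomes a reconstruction target, while a pruned field disappears from the input entirely. (The summaries above show `p_prune=1` for the target `species`, which would be consistent with a target always being withheld from the input.)

```python
import random

MASK = "<mask>"  # hypothetical placeholder for a hidden value


def corrupt(record, p_mask, p_prune, rng):
    """Toy illustration of masking versus pruning one record's fields.

    Each field is first considered for pruning (dropped from the input
    entirely). A surviving field may then be masked: its slot stays in the
    input with the value replaced by MASK, and the hidden value becomes a
    reconstruction target. Returns (inputs, targets).
    """
    inputs, targets = {}, {}
    for key, value in record.items():
        if rng.random() < p_prune:
            continue  # pruned: neither an input nor a reconstruction target
        if rng.random() < p_mask:
            inputs[key] = MASK
            targets[key] = value
        else:
            inputs[key] = value
    return inputs, targets


rng = random.Random(0)  # seeded so the draws are reproducible
sample = {"age": 45, "sepal_length": 5.1, "petal_length": 1.4}
inputs, targets = corrupt(sample, p_mask=0.15, p_prune=0.0, rng=rng)
print(inputs, targets)
```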
To make such updates temporary, use `model.override(...)` as a context manager.
```python
model.update(j2v.where("type") == "number", p_mask=0.15)
model.update((j2v.where("type") == "category") & j2v.where("target"), p_mask=0.05)
model.update(j2v.where("target"), target=False)
model
```
```text
2026-06-09 20:47:29.587 | INFO | json2vec.architecture.mutations:_log_attribute_changes:397 - mutated record/samples/age: p_mask 0.0 -> 0.15
2026-06-09 20:47:29.587 | INFO | json2vec.architecture.mutations:_log_attribute_changes:397 - mutated record/samples/sepal_length: p_mask 0.0 -> 0.15
2026-06-09 20:47:29.588 | INFO | json2vec.architecture.mutations:_log_attribute_changes:397 - mutated record/samples/petal_length: p_mask 0.0 -> 0.15
2026-06-09 20:47:29.601 | INFO | json2vec.architecture.mutations:_log_attribute_changes:397 - mutated record/species: p_mask 0.0 -> 0.05
2026-06-09 20:47:29.614 | INFO | json2vec.architecture.mutations:_log_attribute_changes:397 - mutated record/species: target True -> False
Model [model] batch_size=1 d_model=16 parameters=29,301 arrays=2 fields=4 targets=0 embeds=1
`-- record [root] embed attention=mha n_layers=1 n_heads=4 n_linear=1
    |-- species [category] active query=[*].species
    |   pooling=query weight=1 p_mask=0.05 p_prune=0 n_heads=4 n_linear=1
    |   max_vocab_size=10 p_unavailable=0.01 topk=[]
    `-- samples [array] max_length=10 overflow=head attention=mha n_layers=1 n_heads=4 n_linear=1
        |-- age [number] active query=[*].samples[*].age
        |   pooling=query weight=1 p_mask=0.15 p_prune=0 n_heads=4 n_linear=1
        |   jitter=0 n_bands=8 offset=4 objective=mae
        |-- sepal_length [number] active query=[*].samples[*].sepal_length
        |   pooling=query weight=1 p_mask=0.15 p_prune=0 n_heads=4 n_linear=1
        |   jitter=0 n_bands=8 offset=4 objective=mae
        `-- petal_length [number] active query=[*].samples[*].petal_length
            pooling=query weight=1 p_mask=0.15 p_prune=0 n_heads=4 n_linear=1
            jitter=0 n_bands=8 offset=4 objective=mae
```
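Conceptually, an update applies the keyword changes to every node the predicate matches and records each change, much like the `mutated <address>: <attr> old -> new` log lines above. The minimal sketch below uses hypothetical `Field` and `update` helpers (not json2vec's API). In this sketch, predicates are evaluated against the current state, so the order of the three calls above matters: clearing `target` first would leave a `target` predicate matching nothing. The `targets=0` in the summary above suggests json2vec's `where("target")` behaves the same way once a target is cleared.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("toy_mutations")


class Field:
    """Hypothetical leaf field: an address plus a dict of attributes."""

    def __init__(self, address, **attrs):
        self.address = address
        self.attrs = attrs


def update(fields, predicate, **updates):
    """Set `updates` on every field matching `predicate` and log each change.

    Predicates see the current state, and values that are already equal are
    skipped. Returns the (address, attr, old, new) tuples that changed.
    """
    changed = []
    for f in fields:
        if not predicate(f):
            continue
        for key, new in updates.items():
            old = f.attrs.get(key)
            if old != new:
                f.attrs[key] = new
                log.info("mutated %s: %s %s -> %s", f.address, key, old, new)
                changed.append((f.address, key, old, new))
    return changed


fields = [
    Field("record/species", type="category", target=True, p_mask=0.0),
    Field("record/samples/age", type="number", target=False, p_mask=0.0),
]

update(fields, lambda f: f.attrs["type"] == "number", p_mask=0.15)
# Clearing the target first...
update(fields, lambda f: f.attrs["target"], target=False)
# ...means this predicate no longer matches species, so nothing changes.
print(update(fields, lambda f: f.attrs["type"] == "category" and f.attrs["target"], p_mask=0.05))
```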
`model.override(*predicates, **updates)` is for temporary mutations: the schema and runtime modules are restored when the context exits.
This is useful for reversible updates, such as temporarily deactivating a feature to measure how much it contributes to the model.
With json2vec, you do not need to train a separate model for each candidate feature set during feature selection. You can delete or deactivate features dynamically with a one-liner.
```python
with model.override(j2v.where("address") == "record/species", active=False):
    print("with overrides")
    pprint(model.select(j2v.where("address") == "record/species"))
print("without overrides")
pprint(model.select(j2v.where("address") == "record/species"))
```
```text
2026-06-09 20:47:29.639 | INFO | json2vec.architecture.mutations:_log_attribute_changes:397 - mutated record/species: active True -> False
with overrides
[ species [category] inactive query=[*].species pooling=query weight=1 p_mask=0.05 p_prune=0 n_heads=4 n_linear=1 max_vocab_size=10 p_unavailable=0.01 topk=[] ]
2026-06-09 20:47:29.654 | INFO | json2vec.architecture.mutations:_log_attribute_changes:397 - restored record/species: active False -> True
without overrides
[ species [category] active query=[*].species pooling=query weight=1 p_mask=0.05 p_prune=0 n_heads=4 n_linear=1 max_vocab_size=10 p_unavailable=0.01 topk=[] ]
```
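A temporary override maps naturally onto a Python context manager that saves the original values and restores them in a `finally` clause, so they come back even if the body raises. The sketch below pairs such an `override` helper (hypothetical, operating on plain dicts rather than json2vec nodes) with a leave-one-feature-out ablation loop. `evaluate` is a made-up placeholder for a real validation loop, and its per-field `value` scores are invented for illustration.

```python
from contextlib import contextmanager


@contextmanager
def override(fields, predicate, **updates):
    """Temporarily apply updates to matching fields, restoring them on exit."""
    saved = []
    try:
        for f in fields:
            if predicate(f):
                for key, new in updates.items():
                    saved.append((f, key, f[key]))  # remember the original
                    f[key] = new
        yield
    finally:
        # Undo in reverse order so overlapping changes unwind correctly.
        for f, key, old in reversed(saved):
            f[key] = old


def evaluate(fields):
    """Hypothetical stand-in for a validation loop: sums made-up per-field
    scores of the active fields. A real run would compute a metric."""
    return sum(f["value"] for f in fields if f["active"])


fields = [
    {"address": "record/species", "active": True, "value": 0.30},
    {"address": "record/samples/age", "active": True, "value": 0.05},
    {"address": "record/samples/sepal_length", "active": True, "value": 0.20},
]

baseline = evaluate(fields)
for entry in fields:
    address = entry["address"]
    with override(fields, lambda g: g["address"] == address, active=False):
        drop = baseline - evaluate(fields)
    print(f"{address}: score drops by {drop:.2f} when deactivated")
```

Because every field is restored after each iteration, a single trained model serves all ablations.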
Extending Model Tree
json2vec is not limited to updating existing nodes. Users may also extend the tree by adding new nodes to it.
To add a node, specify its destination (a predicate that selects the parent node) and the new node definition.
Use `model.extend(*predicates, *requests)` to add fields after the model already exists.
This is a premier feature of json2vec because it enables continuous schema evolution with a one-liner: if new features become available, you can mutate the model's architecture to integrate them.
Node extension is especially powerful when maintaining an organizational foundation model (for example, one of customer behavior at a bank) while still adding use-case-specific data and targets as new nodes in the model tree.
```python
# add a new number "sepal_width" to the array of samples
model.extend(j2v.where("name") == "samples", j2v.Number("sepal_width", p_mask=0.15))
model
```
```text
2026-06-09 20:47:29.676 | INFO | json2vec.architecture.mutations:_log_node_mutation:418 - extended schema node record/samples/sepal_width under record/samples
Model [model] batch_size=1 d_model=16 parameters=33,388 arrays=2 fields=5 targets=0 embeds=1
`-- record [root] embed attention=mha n_layers=1 n_heads=4 n_linear=1
    |-- species [category] active query=[*].species
    |   pooling=query weight=1 p_mask=0.05 p_prune=0 n_heads=4 n_linear=1
    |   max_vocab_size=10 p_unavailable=0.01 topk=[]
    `-- samples [array] max_length=10 overflow=head attention=mha n_layers=1 n_heads=4 n_linear=1
        |-- age [number] active query=[*].samples[*].age
        |   pooling=query weight=1 p_mask=0.15 p_prune=0 n_heads=4 n_linear=1
        |   jitter=0 n_bands=8 offset=4 objective=mae
        |-- sepal_length [number] active query=[*].samples[*].sepal_length
        |   pooling=query weight=1 p_mask=0.15 p_prune=0 n_heads=4 n_linear=1
        |   jitter=0 n_bands=8 offset=4 objective=mae
        |-- petal_length [number] active query=[*].samples[*].petal_length
        |   pooling=query weight=1 p_mask=0.15 p_prune=0 n_heads=4 n_linear=1
        |   jitter=0 n_bands=8 offset=4 objective=mae
        `-- sepal_width [number] active query=[*].samples[*].sepal_width
            pooling=query weight=1 p_mask=0.15 p_prune=0 n_heads=4 n_linear=1
            jitter=0 n_bands=8 offset=4 objective=mae
```
```python
# add a new categorical variable "caretaker" to the root of the tree
model.extend(j2v.where("name") == "record", j2v.Category("caretaker", max_vocab_size=10))
model
```
```text
2026-06-09 20:47:29.704 | INFO | json2vec.architecture.mutations:_log_node_mutation:418 - extended schema node record/caretaker under record
Model [model] batch_size=1 d_model=16 parameters=37,212 arrays=2 fields=6 targets=0 embeds=1
`-- record [root] embed attention=mha n_layers=1 n_heads=4 n_linear=1
    |-- species [category] active query=[*].species
    |   pooling=query weight=1 p_mask=0.05 p_prune=0 n_heads=4 n_linear=1
    |   max_vocab_size=10 p_unavailable=0.01 topk=[]
    |-- samples [array] max_length=10 overflow=head attention=mha n_layers=1 n_heads=4 n_linear=1
    |   |-- age [number] active query=[*].samples[*].age
    |   |   pooling=query weight=1 p_mask=0.15 p_prune=0 n_heads=4 n_linear=1
    |   |   jitter=0 n_bands=8 offset=4 objective=mae
    |   |-- sepal_length [number] active query=[*].samples[*].sepal_length
    |   |   pooling=query weight=1 p_mask=0.15 p_prune=0 n_heads=4 n_linear=1
    |   |   jitter=0 n_bands=8 offset=4 objective=mae
    |   |-- petal_length [number] active query=[*].samples[*].petal_length
    |   |   pooling=query weight=1 p_mask=0.15 p_prune=0 n_heads=4 n_linear=1
    |   |   jitter=0 n_bands=8 offset=4 objective=mae
    |   `-- sepal_width [number] active query=[*].samples[*].sepal_width
    |       pooling=query weight=1 p_mask=0.15 p_prune=0 n_heads=4 n_linear=1
    |       jitter=0 n_bands=8 offset=4 objective=mae
    `-- caretaker [category] active query=[*].caretaker
        pooling=query weight=1 p_mask=0 p_prune=0 n_heads=4 n_linear=1
        max_vocab_size=10 p_unavailable=0.01 topk=[]
```
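The extension step can be sketched as a tree operation: find the parent selected by the predicate and append the new nodes under it, producing addresses like the `record/samples/sepal_width` seen in the log above. In this toy version (hypothetical `Node`, `walk`, and `extend` helpers), requiring exactly one matching parent and rejecting duplicate names before mutating anything are design choices of the sketch, not documented json2vec behavior.

```python
class Node:
    """Hypothetical schema node: a name, a kind, and child nodes."""

    def __init__(self, name, kind, children=()):
        self.name, self.kind = name, kind
        self.children = list(children)


def walk(node, prefix=""):
    """Yield (address, node) pairs in pre-order, e.g. 'record/samples'."""
    address = f"{prefix}/{node.name}" if prefix else node.name
    yield address, node
    for child in node.children:
        yield from walk(child, address)


def extend(root, predicate, *new_nodes):
    """Attach new_nodes under the single node selected by predicate."""
    parents = [(address, node) for address, node in walk(root) if predicate(node)]
    if len(parents) != 1:
        raise ValueError(f"expected exactly one parent, found {len(parents)}")
    address, parent = parents[0]

    # Validate every name first so a clash leaves the tree untouched.
    taken = {child.name for child in parent.children}
    for node in new_nodes:
        if node.name in taken:
            raise ValueError(f"{address}/{node.name} already exists")
        taken.add(node.name)

    for node in new_nodes:
        parent.children.append(node)
        print(f"extended schema node {address}/{node.name} under {address}")


root = Node("record", "root", [
    Node("species", "category"),
    Node("samples", "array", [Node("age", "number"), Node("sepal_length", "number")]),
])
extend(root, lambda n: n.name == "samples", Node("sepal_width", "number"))
extend(root, lambda n: n.name == "record", Node("caretaker", "category"))
print([address for address, _ in walk(root)])
```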
Deleting From Node Tree
Use `model.delete(*predicates)` when a node should leave the schema permanently. This differs from `active=False`, which keeps the node around for later reactivation.
`model.delete` is irreversible. However, in some cases it may be preferable to `active=False` because it removes the node's parameters from the model, saving memory and storage.
```python
model.delete(j2v.where("name") == "petal_length")
model
```
```text
2026-06-09 20:47:29.732 | INFO | json2vec.architecture.mutations:_log_node_mutation:418 - deleted schema node record/samples/petal_length descendants=0
Model [model] batch_size=1 d_model=16 parameters=33,125 arrays=2 fields=5 targets=0 embeds=1
`-- record [root] embed attention=mha n_layers=1 n_heads=4 n_linear=1
    |-- species [category] active query=[*].species
    |   pooling=query weight=1 p_mask=0.05 p_prune=0 n_heads=4 n_linear=1
    |   max_vocab_size=10 p_unavailable=0.01 topk=[]
    |-- samples [array] max_length=10 overflow=head attention=mha n_layers=1 n_heads=4 n_linear=1
    |   |-- age [number] active query=[*].samples[*].age
    |   |   pooling=query weight=1 p_mask=0.15 p_prune=0 n_heads=4 n_linear=1
    |   |   jitter=0 n_bands=8 offset=4 objective=mae
    |   |-- sepal_length [number] active query=[*].samples[*].sepal_length
    |   |   pooling=query weight=1 p_mask=0.15 p_prune=0 n_heads=4 n_linear=1
    |   |   jitter=0 n_bands=8 offset=4 objective=mae
    |   `-- sepal_width [number] active query=[*].samples[*].sepal_width
    |       pooling=query weight=1 p_mask=0.15 p_prune=0 n_heads=4 n_linear=1
    |       jitter=0 n_bands=8 offset=4 objective=mae
    `-- caretaker [category] active query=[*].caretaker
        pooling=query weight=1 p_mask=0 p_prune=0 n_heads=4 n_linear=1
        max_vocab_size=10 p_unavailable=0.01 topk=[]
```
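The parameter counts printed in the model summaries on this page make the memory argument concrete. The arithmetic below uses only those logged numbers; the byte estimate additionally assumes 4-byte (float32) weights, which this page does not state.

```python
# Parameter counts copied from the model summaries printed on this page.
summaries = {
    "initial": 29_301,
    "after adding sepal_width": 33_388,
    "after adding caretaker": 37_212,
    "after deleting petal_length": 33_125,
}

added_number = summaries["after adding sepal_width"] - summaries["initial"]
added_category = summaries["after adding caretaker"] - summaries["after adding sepal_width"]
freed_by_delete = summaries["after adding caretaker"] - summaries["after deleting petal_length"]

print(added_number, added_category, freed_by_delete)
# Assumes float32 weights (4 bytes each); the dtype is not given on this page.
print(f"delete freed about {freed_by_delete * 4:,} bytes")
```

Deleting `petal_length` freed exactly as many parameters (4,087) as adding `sepal_width` cost, as expected for two number fields with the same configuration.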
Resetting Nodes From Tree
Use `model.reset(*predicates)` to discard learned parameters and runtime state for selected nodes while keeping their schema configuration and hyperparameters intact.
This may be useful when refitting a model after extreme drift is observed in specific features, for example when old vocabulary must be removed or standardization statistics must be refit.
In the future, json2vec may accept targeted arguments to `model.reset(...)` so that you can reset a specific parameter group (vocabulary, counters, embedding tables, etc.).
```python
model.reset(j2v.where("type") == "number")
```
```text
2026-06-09 20:47:29.748 | INFO | json2vec.architecture.mutations:_log_node_mutation:418 - reset runtime node record/samples/age descendants=False
2026-06-09 20:47:29.749 | INFO | json2vec.architecture.mutations:_log_node_mutation:418 - reset runtime node record/samples/sepal_length descendants=False
2026-06-09 20:47:29.749 | INFO | json2vec.architecture.mutations:_log_node_mutation:418 - reset runtime node record/samples/sepal_width descendants=False
```
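The split between hyperparameters and runtime state can be sketched with two toy field classes. `NumberField` and `CategoryField` are hypothetical names, the hyperparameter names are borrowed from the summaries above, and the running mean and variance (Welford's online algorithm) and capped vocabulary are illustrative stand-ins for whatever state json2vec actually fits. `reset` clears that fitted state while leaving the configured hyperparameters untouched.

```python
import math
from dataclasses import dataclass, field


@dataclass
class NumberField:
    """Toy number field: hyperparameters plus fitted standardization state."""
    name: str
    n_bands: int = 8  # hyperparameter: kept by reset
    offset: int = 4   # hyperparameter: kept by reset
    count: int = 0    # runtime state below: cleared by reset
    mean: float = 0.0
    m2: float = 0.0

    def observe(self, x: float) -> None:
        # Welford's online update of the running mean and variance.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Population standard deviation; 1.0 before any data is seen.
        return math.sqrt(self.m2 / self.count) if self.count else 1.0

    def reset(self) -> None:
        self.count, self.mean, self.m2 = 0, 0.0, 0.0


@dataclass
class CategoryField:
    """Toy category field: a capped vocabulary learned from data."""
    name: str
    max_vocab_size: int = 10  # hyperparameter: kept by reset
    vocab: dict = field(default_factory=dict)  # runtime state: cleared by reset

    def observe(self, value: str) -> None:
        # Unseen values past the cap are ignored (a choice of this sketch).
        if value not in self.vocab and len(self.vocab) < self.max_vocab_size:
            self.vocab[value] = len(self.vocab)

    def reset(self) -> None:
        self.vocab.clear()


age = NumberField("age")
for x in (10.0, 20.0, 30.0):
    age.observe(x)
print(age.mean, age.std)
age.reset()  # e.g. after drift: refit statistics, keep configuration
print(age.count, age.n_bands, age.offset)
```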