Array
Use Array for repeated nested objects. An array groups child fields, gives
them a shared repeated shape, and builds a context encoder over the repeated
items.
{
  "measurements": [
    {"name": "mean_radius", "value": 17.99},
    {"name": "mean_texture", "value": 10.38}
  ]
}
import json2vec as j2v

measurements = j2v.Array(
    j2v.Category("name", max_vocab_size=32),
    j2v.Number("value"),
    name="measurements",
    max_length=8,
    overflow="tail",
    n_layers=2,
)
Use a different type for scalar vectors (Vector) or unordered labels (Set).
If repeated rows are only a melted representation of a flat table, preprocessing
or flattening is usually more efficient.
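As a rough illustration of the flattening alternative, the sketch below pivots melted name/value rows into one flat record using plain Python. The helper `flatten_measurements` is hypothetical and not part of json2vec; it assumes each `name` appears at most once per record.

```python
def flatten_measurements(record, key="measurements"):
    """Pivot melted {"name", "value"} rows into flat top-level keys.

    Hypothetical preprocessing helper, not a json2vec API.
    """
    # Keep every other top-level key unchanged.
    flat = {k: v for k, v in record.items() if k != key}
    for row in record.get(key, []):
        flat[row["name"]] = row["value"]
    return flat


record = {
    "measurements": [
        {"name": "mean_radius", "value": 17.99},
        {"name": "mean_texture", "value": 10.38},
    ]
}
flat = flatten_measurements(record)
print(flat)
```

Each flattened key could then be modeled as its own scalar field instead of as a child of an Array.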
Input Values
Array expects a list of child objects at the source path given by its name
option, unless the child fields use explicit query expressions. Each child
tensorfield is padded to the configured max_length, and overlong inputs are
handled by the array's overflow policy.
Note
Set max_length deliberately for real repeated data. The default is 1,
which is useful for singleton contexts but usually too small for meaningful
histories.
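One way to pick a value is to look at how many items real records contain. The sketch below describes a possible workflow, not a json2vec feature: the hypothetical helper `suggest_max_length` returns the smallest length that fits a chosen share of records without truncation.

```python
import math


def suggest_max_length(records, key, coverage=0.95):
    """Smallest length that fits at least `coverage` of the records untruncated."""
    lengths = sorted(len(r.get(key, [])) for r in records)
    if not lengths:
        return 1
    # Position of the record sitting at the requested coverage quantile.
    idx = math.ceil(coverage * len(lengths)) - 1
    idx = max(0, min(len(lengths) - 1, idx))
    # max_length must be positive, so never suggest less than 1.
    return max(1, lengths[idx])


# Four made-up records with 1, 2, 3, and 10 events.
records = [{"events": [0] * n} for n in (1, 2, 3, 10)]
suggested = suggest_max_length(records, "events", coverage=0.75)
print(suggested)
```

A lower coverage keeps tensors small at the cost of truncating the longest records; the overflow policy then decides which items survive.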
With max_length=2 and the default overflow="head", the first two items are
retained, extra items are truncated, and missing slots are padded:
[
  {"measurements": [{"name": "a", "value": 1.0}, {"name": "b", "value": 2.0}, {"name": "c", "value": 3.0}]},
  {"measurements": [{"name": "a", "value": 4.0}]}
]
The first record keeps a and b; c is truncated. The second record keeps
a and adds one padded slot. Array order is the order returned by the query.
Sort, window, or filter in a preprocessor when order matters.
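Because order comes from the query result, any reordering has to happen before the data reaches the array. A minimal sketch of such a preprocessing step follows; the function name and the `timestamp` key are hypothetical and stand in for whatever ordering field the source data carries.

```python
def sort_items(record, key, order_by):
    """Return a copy of `record` with record[key] sorted by `order_by`."""
    out = dict(record)  # shallow copy so the input record is left untouched
    out[key] = sorted(record.get(key, []), key=lambda item: item[order_by])
    return out


record = {
    "events": [
        {"event_type": "refund", "amount": 5.0, "timestamp": 3},
        {"event_type": "purchase", "amount": 20.0, "timestamp": 1},
        {"event_type": "purchase", "amount": 7.5, "timestamp": 2},
    ]
}
ordered = sort_items(record, "events", "timestamp")
print([e["timestamp"] for e in ordered["events"]])
```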
Use overflow="tail" when the query result is ordered oldest-to-newest and the
most recent records should be retained:
events = j2v.Array(
    j2v.Category("event_type", max_vocab_size=128),
    j2v.Number("amount"),
    name="events",
    max_length=128,
    overflow="tail",
)
Use overflow="error" when any truncated input would indicate a bad schema,
bad query, or upstream data contract failure.
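The three policies can be pictured with a plain-Python sketch. It illustrates the behavior described in this section and is not json2vec's implementation; `apply_overflow`, the `None` padding marker, and padding at the end of the list are assumptions made for the example.

```python
def apply_overflow(items, max_length, overflow="head", pad=None):
    """Truncate or pad `items` to exactly `max_length` slots."""
    if max_length < 1:
        raise ValueError("max_length must be positive")
    if len(items) > max_length:
        if overflow == "head":
            items = items[:max_length]       # keep the first records
        elif overflow == "tail":
            items = items[-max_length:]      # keep the last records
        elif overflow == "error":
            raise ValueError(
                f"{len(items)} items exceed max_length={max_length}"
            )
        else:
            raise ValueError(f"unknown overflow policy: {overflow!r}")
    # Fill any missing slots with the padding marker (assumed trailing).
    return list(items) + [pad] * (max_length - len(items))


names = ["a", "b", "c"]
print(apply_overflow(names, 2, "head"))
print(apply_overflow(names, 2, "tail"))
print(apply_overflow(["a"], 2))
```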
For a top-level schema like:
model = j2v.Model.from_schema(
    j2v.Array(
        j2v.Category("name", max_vocab_size=32),
        j2v.Number("value"),
        name="measurements",
        max_length=8,
    ),
    d_model=32,
    n_layers=1,
    n_heads=4,
)
json2vec infers child queries like [*].measurements[*].name and
[*].measurements[*].value.
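The inferred paths follow a visible pattern: a leading `[*]`, the array name, a second `[*]` over its items, then the child name. The sketch below reproduces only that string pattern for a single array level; `infer_child_query` is a hypothetical helper, and json2vec's actual inference logic may differ.

```python
def infer_child_query(array_name, child_name):
    """Build the query string pattern shown above for one array level."""
    return f"[*].{array_name}[*].{child_name}"


queries = [infer_child_query("measurements", child) for child in ("name", "value")]
print(queries)
```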
Note
A field's name sets its public schema name; its query sets where values are
read. When query is omitted, json2vec infers the request query from the field
and parent array names. Array itself does not read a value with query; its
child leaves do.
Use explicit queries when the source keys do not match the public schema names:
items = j2v.Array(
    j2v.Category("sku", query="[*].line_items[*].product_sku", max_vocab_size=2048),
    j2v.Number("quantity", query="[*].line_items[*].qty"),
    name="items",
    max_length=32,
)
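To make the mapping concrete, the sketch below evaluates these two query strings against a small batch with a hand-written resolver. The resolver supports only `[*]` and `.key` steps and is an assumption for illustration; it is not json2vec's query engine, and the sample records are made up.

```python
import re


def resolve(query, data):
    """Evaluate a path built from `[*]` and `.key` steps, preserving nesting."""
    tokens = re.findall(r"\[\*\]|[A-Za-z_]\w*", query)

    def walk(node, remaining):
        if not remaining:
            return node
        step, rest = remaining[0], remaining[1:]
        if step == "[*]":
            # Fan out over list items; a missing or non-list node yields none.
            items = node if isinstance(node, list) else []
            return [walk(item, rest) for item in items]
        # Key lookup; a missing key resolves to None.
        child = node.get(step) if isinstance(node, dict) else None
        return walk(child, rest)

    return walk(data, tokens)


batch = [
    {"line_items": [{"product_sku": "A-1", "qty": 2}, {"product_sku": "B-7", "qty": 1}]},
    {"line_items": [{"product_sku": "C-3", "qty": 5}]},
]
sku = resolve("[*].line_items[*].product_sku", batch)
quantity = resolve("[*].line_items[*].qty", batch)
print(sku)
print(quantity)
```

The values are read from the source keys `product_sku` and `qty`, which appear only in the queries, while the schema exposes them under the public names `sku` and `quantity`.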
Explicit queries can also stack repeated semantic roles, such as origin and destination, into one shared field. See Field Stacking.
Examples
Common array fields include:
- Transaction line items, order items, cart contents, or invoice rows.
- Time-ordered events in a session, visit, trip, or workflow.
- Repeated measurements, sensor readings, observations, or lab results.
- Related parties or counterparties attached to one record.
Configuration
| Option | Default | Notes |
|---|---|---|
| name | required | Public schema name and inferred source key. |
| fields | [] | Child arrays or tensorfields. Positional constructor arguments become fields. |
| max_length | 1 | Number of repeated slots retained per observation. Must be positive. |
| overflow | "head" | How to handle more than max_length child records: "head" keeps the first records, "tail" keeps the last records, and "error" raises. |
| attention | "mha" | Attention implementation for the array encoder. |
| n_layers | 1 | Number of encoder layers for this array. |
| n_heads | 4 | Attention heads for this array. Must be even. |
| n_linear | 1 | Number of feed-forward linear layers in this array. |
| dropout | None | Optional dropout rate. |
| embed | False | Includes this array node in Model.predict(...) outputs under embedding. See Learning Modes & Embeddings. |
| description | None | Optional schema metadata. |
Nesting
Arrays can contain tensorfields or other arrays. Each nested array contributes another repeated dimension to the child fields' shape.
sessions = j2v.Array(
    j2v.Array(
        j2v.Number("amount"),
        name="transactions",
        max_length=32,
    ),
    name="sessions",
    max_length=4,
)
Deep nesting is useful when the source shape matters. If repeated records are only an artifact of upstream storage, flattening or preprocessing the data can be more efficient.
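To see how nesting adds dimensions, the sketch below pads one record's nested amounts to the lengths used in the example above (4 sessions of 32 transactions each). It is plain Python for illustration; `pad_nested`, the `0.0` padding value, and head-style truncation are assumptions, not json2vec internals.

```python
def pad_nested(values, lengths, pad=0.0):
    """Pad or truncate nested lists so level i has exactly lengths[i] slots."""
    size, rest = lengths[0], lengths[1:]
    values = list(values)[:size]
    if rest:
        padded = [pad_nested(v, rest, pad) for v in values]
        # Missing outer slots become fully padded inner blocks.
        padded += [pad_nested([], rest, pad) for _ in range(size - len(values))]
        return padded
    return values + [pad] * (size - len(values))


def shape(nested):
    """Report the dimensions of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# One record: two sessions holding three and one transaction amounts.
amounts = [[12.5, 3.0, 8.25], [40.0]]
padded = pad_nested(amounts, lengths=(4, 32))
print(shape(padded))
```

Each record's amount values end up with one axis per enclosing array, here sessions followed by transactions.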
Target And Prediction Behavior
Array itself is not a supervised target. Its child tensorfields can be
targets, and the array context is used to encode those children and route
information through the model.
p_mask, p_prune, and target=True are configured on leaf tensorfields, not
on Array. To apply the same masking or pruning rate to every child field, use
model.update(...) with a selector that matches those leaves.
Configure embed=True on an array when you want Model.predict(...) to return
a representation for the grouped context under that array address. See
Learning Modes & Embeddings for root, array, and leaf embedding patterns.