Skip to content

Array

Use Array for repeated nested objects. An array groups child fields, gives them a shared repeated shape, and builds a context encoder over the repeated items.

{
  "measurements": [
    {"name": "mean_radius", "value": 17.99},
    {"name": "mean_texture", "value": 10.38}
  ]
}
import json2vec as j2v

measurements = j2v.Array(
    j2v.Category("name", max_vocab_size=32),
    j2v.Number("value"),
    name="measurements",
    max_length=8,
    overflow="tail",
    n_layers=2,
)

Use a different type for scalar vectors (Vector) or unordered labels (Set). If repeated rows are only a melted representation of a flat table, preprocessing or flattening is usually more efficient.

Input Values

Array expects a list of child objects at the source path named by name, unless the child fields use explicit query expressions. Each child tensorfield is padded to the configured max_length, and overlong inputs are handled by the array's overflow policy.

Note

Set max_length deliberately for real repeated data. The default is 1, which is useful for singleton contexts but usually too small for meaningful histories.

With max_length=2 and the default overflow="head", the first two items are retained, extra items are truncated, and missing slots are padded:

[
  {"measurements": [{"name": "a", "value": 1.0}, {"name": "b", "value": 2.0}, {"name": "c", "value": 3.0}]},
  {"measurements": [{"name": "a", "value": 4.0}]}
]

The first record keeps a and b; c is truncated. The second record keeps a and adds one padded slot. Array order is the order returned by the query. Sort, window, or filter in a preprocessor when order matters.

Use overflow="tail" when the query result is ordered oldest-to-newest and the most recent records should be retained:

events = j2v.Array(
    j2v.Category("event_type", max_vocab_size=128),
    j2v.Number("amount"),
    name="events",
    max_length=128,
    overflow="tail",
)

Use overflow="error" when any truncated input would indicate a bad schema, bad query, or upstream data contract failure.

For a top-level schema like:

model = j2v.Model.from_schema(
    j2v.Array(
        j2v.Category("name", max_vocab_size=32),
        j2v.Number("value"),
        name="measurements",
        max_length=8,
    ),
    d_model=32,
    n_layers=1,
    n_heads=4,
)

json2vec infers child queries like [*].measurements[*].name and [*].measurements[*].value.

Note

Field name controls the public schema name. query controls where values are read. When omitted, json2vec infers the request query from the field and parent array names. Array itself does not read a value with query; its child leaves do.

Use explicit queries when the source keys do not match the public schema names:

items = j2v.Array(
    j2v.Category("sku", query="[*].line_items[*].product_sku", max_vocab_size=2048),
    j2v.Number("quantity", query="[*].line_items[*].qty"),
    name="items",
    max_length=32,
)

Explicit queries can also stack repeated semantic roles, such as origin and destination, into one shared field. See Field Stacking.

Examples

Common array fields include:

  • Transaction line items, order items, cart contents, or invoice rows.
  • Time-ordered events in a session, visit, trip, or workflow.
  • Repeated measurements, sensor readings, observations, or lab results.
  • Related parties or counterparties attached to one record.

Configuration

Option Default Notes
name required Public schema name and inferred source key.
fields [] Child arrays or tensorfields. Positional constructor arguments become fields.
max_length 1 Number of repeated slots retained per observation. Must be positive.
overflow "head" How to handle more than max_length child records: "head" keeps the first records, "tail" keeps the last records, and "error" raises.
attention "mha" Attention implementation for the array encoder.
n_layers 1 Number of encoder layers for this array.
n_heads 4 Attention heads for this array. Must be even.
n_linear 1 Number of feed-forward linear layers in this array.
dropout None Optional dropout rate.
embed False Includes this array node in Model.predict(...) outputs under embedding. See Learning Modes & Embeddings.
description None Optional schema metadata.

Nesting

Arrays can contain tensorfields or other arrays. Each nested array contributes another repeated dimension to the child fields' shape.

session = j2v.Array(
    j2v.Array(
        j2v.Number("amount"),
        name="transactions",
        max_length=32,
    ),
    name="sessions",
    max_length=4,
)

Deep nesting is useful when the source shape matters. If repeated records are only an artifact of upstream storage, flattening or preprocessing the data can be more efficient.

Target And Prediction Behavior

Array itself is not a supervised target. Its child tensorfields can be targets, and the array context is used to encode those children and route information through the model.

p_mask, p_prune, and target=True are configured on leaf tensorfields, not on Array. To apply the same masking or pruning rate to every child field, use model.update(...) with a selector that matches those leaves.

Configure embed=True on an array when you want Model.predict(...) to return a representation for the grouped context under that array address. See Learning Modes & Embeddings for root, array, and leaf embedding patterns.