API Reference
This page is generated from public docstrings and is meant as a lookup companion to the tutorials. Start with the notebooks when learning the workflow, then use this page to inspect constructor options, mutation methods, and extension base classes.
Common Entry Points
- Model.from_schema(...) builds the model tree from field constructors and arrays.
- Array(...) declares a repeated nested context.
- Overflow enumerates Array overflow policies: head, tail, and error.
- Number, Category, Set, DateParts, Entity, Vector, and Text declare typed fields.
- CustomDataModule(...) wraps user-provided PyTorch iterable datasets.
- PolarsDataModule(...) builds data loaders from in-memory Polars DataFrames for a configured model.
- StreamingDataModule(...) streams local or S3-backed files into Lightning loops.
- Model.predict(...) returns configured target predictions and embeddings.
- Writer(...) writes batch prediction output from Trainer.predict(...).
- Postprocessor reshapes predictions after decoding; see Postprocessors.
- Deployment wraps a checkpoint or model instance for serving; install the serving extra for FastAPI-backed deployment paths.
Learning-oriented entry points:
- Getting Started
- Model Tree
- Query Paths
- Built-In Data Types
- Training With Lightning
- Data Modules
- Batch Inference
- Serving
Package
json2vec
Public json2vec SDK surface.
The top-level package exports the constructors and helpers used by most
applications: Model.from_schema(...) for model construction, tensorfield
request constructors such as Category and Number, data modules, schema
mutation predicates, and the @preprocess decorator.
Postprocessor
module-attribute
Postprocessor: TypeAlias = Callable[
[dict[str, Any], dict[Address, dict[str, Any]]],
dict[Address, dict[str, Any]] | None,
]
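A postprocessor is any callable with the shape of the alias above. The stdlib sketch below writes one without importing json2vec: Address is replaced by plain str (the real Address subclasses str), the address "record/amount" and the "request_id" field are hypothetical, and it assumes the first argument carries the request-level observation while the second maps node addresses to decoded output dicts. It returns a new mapping rather than None, so it does not depend on how a None return is handled, which this page does not specify.

```python
from typing import Any, Callable, Optional

# Stand-in for json2vec's Address, which is documented as a str subclass.
Address = str

# Local copy of the documented alias shape; not imported from json2vec.
PostprocessorLike = Callable[
    [dict[str, Any], dict[Address, dict[str, Any]]],
    Optional[dict[Address, dict[str, Any]]],
]


def round_float_outputs(
    observation: dict[str, Any],
    predictions: dict[Address, dict[str, Any]],
) -> dict[Address, dict[str, Any]]:
    """Return a new mapping with float outputs rounded to 3 decimal places.

    Assumption: `observation` is the request-level input; here it only
    supplies a hypothetical "request_id" copied onto each output dict.
    """
    reshaped: dict[Address, dict[str, Any]] = {}
    for address, outputs in predictions.items():
        rounded = {
            key: round(value, 3) if isinstance(value, float) else value
            for key, value in outputs.items()
        }
        rounded["request_id"] = observation.get("request_id")
        reshaped[address] = rounded
    return reshaped


hook: PostprocessorLike = round_float_outputs
result = hook({"request_id": "abc"}, {"record/amount": {"prediction": 12.34567}})
print(result)
```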
UpdateOperation
module-attribute
UpdateOperation: TypeAlias = tuple[
tuple[
NodePredicate
| NodeAttribute
| Callable[[Node], bool],
...,
],
dict[str, Any],
]
RollbackCheckpoint
Bases: ModelCheckpoint
Checkpoint the best model during fit and restore it into the module at fit end.
Source code in src/json2vec/architecture/checkpoint.py
on_fit_end
Source code in src/json2vec/architecture/checkpoint.py
MutationLockCallback
Bases: Callback
Prevent runtime schema mutations while Lightning owns an active loop.
locks
class-attribute
instance-attribute
on_train_start
class-attribute
instance-attribute
on_train_end
class-attribute
instance-attribute
on_validation_start
class-attribute
instance-attribute
on_validation_end
class-attribute
instance-attribute
on_test_start
class-attribute
instance-attribute
on_test_end
class-attribute
instance-attribute
on_predict_start
class-attribute
instance-attribute
on_predict_end
class-attribute
instance-attribute
on_exception
RuntimePlacementCallback
Bases: Callback
Move late-created modules onto the Lightning module's active device.
on_train_start
class-attribute
instance-attribute
on_validation_start
class-attribute
instance-attribute
on_test_start
class-attribute
instance-attribute
Model
Model(
hyperparameters: Hyperparameters,
*,
batch_size: int = 1,
optimizer: OptimizerConfig | None = None,
scheduler: SchedulerConfig | None = None,
)
Bases: LightningModule, Renderable
Neural model generated from a json2vec schema tree.
Model owns the schema hyperparameters, tensorfield embedders, array
encoders, decoders, and convenience methods for prediction, checkpointing,
schema display and mutation.
Example
Source code in src/json2vec/architecture/root.py
validation_step
class-attribute
instance-attribute
from_schema
classmethod
from_schema(
*field_args: SchemaField,
d_model: int,
n_layers: int,
n_heads: int,
batch_size: int = 1,
fields: Sequence[SchemaField] | None = None,
name: str = "record",
description: str | None = None,
embed: bool = False,
attention: AttentionMode | str = AttentionMode.mha,
n_linear: int = 1,
dropout: Rate | None = None,
optimizer: OptimizerConfig | None = None,
scheduler: SchedulerConfig | None = None,
) -> Self
Build a model directly from schema fields.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *field_args | SchemaField | Field constructors such as | () |
| d_model | int | Shared model width. | required |
| n_layers | int | Number of encoder layers on generated array nodes. | required |
| n_heads | int | Attention heads used by generated nodes. | required |
| batch_size | int | Batch size used by data modules, examples, and mocked Lightning input arrays. | 1 |
| fields | Sequence[SchemaField] \| None | Optional sequence form of | None |
| name | str | Root array name. Defaults to 'record'. | 'record' |
| description | str \| None | Optional description on the generated root array. | None |
| embed | bool | Configure the generated root array as an embedding output. | False |
| attention | AttentionMode \| str | Attention mode for the generated root array. | mha |
| n_linear | int | Feed-forward block count on the generated root array. | 1 |
| dropout | Rate \| None | Optional dropout rate on the generated root array. | None |
| optimizer | OptimizerConfig \| None | Optimizer instance or factory used by Lightning training. | None |
| scheduler | SchedulerConfig \| None | Optional scheduler config or factory. | None |
Returns:
| Type | Description |
|---|---|
| Self | A compiled |
Source code in src/json2vec/architecture/root.py
select
select(
*predicates: NodePredicate
| NodeAttribute
| Callable[[Node], bool],
include_root: bool = True,
use_cache: bool = True,
) -> list[Node]
Return schema nodes that satisfy every predicate.
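Passing several predicates narrows the selection, because a node must satisfy all of them (a logical AND). The sketch below models that rule with plain callables over a hypothetical StubNode tree; it does not use NodePredicate or NodeAttribute, the node names and type strings are made up, and treating include_root=False as "skip the tree's top node" is an assumption about the flag's meaning.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class StubNode:
    """Hypothetical stand-in for a schema node (not json2vec's Node)."""
    name: str
    type: str
    children: list["StubNode"] = field(default_factory=list)


def walk(node: StubNode) -> Iterator[StubNode]:
    """Yield the node and all of its descendants, depth-first."""
    yield node
    for child in node.children:
        yield from walk(child)


def select(
    root: StubNode,
    *predicates: Callable[[StubNode], bool],
    include_root: bool = True,
) -> list[StubNode]:
    """Return nodes for which every predicate is true (logical AND)."""
    return [
        node
        for node in walk(root)
        # Assumption: include_root=False only excludes the top node.
        if (include_root or node is not root)
        and all(predicate(node) for predicate in predicates)
    ]


# Hypothetical schema: a root array with two leaves and a nested array.
tree = StubNode("record", "array", [
    StubNode("amount", "number"),
    StubNode("merchant", "category"),
    StubNode("items", "array", [StubNode("price", "number")]),
])
numbers = select(tree, lambda n: n.type == "number")
priced = select(tree, lambda n: n.type == "number", lambda n: n.name == "price")
print([n.name for n in numbers], [n.name for n in priced])
```

With no predicates at all, every node matches, since all() over an empty sequence is true.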
Source code in src/json2vec/architecture/root.py
update
update(
*predicates: NodePredicate
| NodeAttribute
| Callable[[Node], bool],
strict: bool = True,
allow_extra: bool = False,
include_root: bool = True,
validate: bool = True,
use_cache: bool = False,
**values: Any,
) -> None
Mutate selected schema nodes and rebuild compatible modules.
target=True is shorthand for p_prune=1.0; target=False clears
target behavior by setting p_prune=0.0.
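The target shorthand amounts to a small rewrite of the keyword arguments before they reach the schema. A minimal sketch of that documented rule follows; normalize_target is a hypothetical helper, not the library's code, and because this page does not say what happens when both target and p_prune are passed, the sketch simply lets target win.

```python
from typing import Any


def normalize_target(values: dict[str, Any]) -> dict[str, Any]:
    """Rewrite the `target` shorthand into an explicit `p_prune` value.

    Documented rule: target=True becomes p_prune=1.0 and target=False
    becomes p_prune=0.0. All other keys pass through unchanged.
    """
    normalized = dict(values)  # copy so the caller's dict is untouched
    if "target" in normalized:
        # Assumption: target overrides any p_prune passed alongside it.
        normalized["p_prune"] = 1.0 if normalized.pop("target") else 0.0
    return normalized


print(normalize_target({"target": True}))
print(normalize_target({"target": False, "weight": 2.0}))
```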
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *predicates | NodePredicate \| NodeAttribute \| Callable[[Node], bool] | Predicates used to select nodes. | () |
| strict | bool | Raise when a selected node cannot accept one of | True |
| allow_extra | bool | Permit updates to extra metadata fields on models that allow unknown fields. | False |
| include_root | bool | Include the root node in predicate matching. | True |
| validate | bool | Validate each node after applying candidate values. | True |
| use_cache | bool | Permit cached selector results. Mutations default this to False. | False |
| **values | Any | Schema attributes to update. | {} |
Source code in src/json2vec/architecture/root.py
extend
extend(
*args: NodePredicate
| NodeAttribute
| Callable[[Node], bool]
| SchemaField,
include_root: bool = True,
use_cache: bool = True,
) -> None
Append new schema fields under one selected array node and rebuild modules.
Source code in src/json2vec/architecture/root.py
delete
delete(
*predicates: NodePredicate
| NodeAttribute
| Callable[[Node], bool],
include_root: bool = False,
use_cache: bool = True,
) -> None
Permanently remove selected schema nodes and rebuild modules.
Source code in src/json2vec/architecture/root.py
reset
reset(
*predicates: NodePredicate
| NodeAttribute
| Callable[[Node], bool],
include_root: bool = True,
use_cache: bool = True,
descendants: bool = False,
) -> None
Reinitialize selected runtime node modules while preserving schema values.
Source code in src/json2vec/architecture/root.py
override
override(
*predicates: NodePredicate
| NodeAttribute
| Callable[[Node], bool],
strict: bool = True,
allow_extra: bool = False,
include_root: bool = True,
validate: bool = True,
use_cache: bool = False,
**values: Any,
) -> Iterator[None]
Temporarily mutate selected schema nodes and keep runtime modules synchronized.
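The Iterator[None] return type suggests override(...) is meant to be used as a context manager (with model.override(...): ...), restoring the previous schema values on exit. The sketch below shows that temporary-mutation pattern with contextlib.contextmanager over a hypothetical StubNode list; it does not rebuild any runtime modules, which the real method is documented to keep synchronized.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Any, Callable, Iterator


@dataclass
class StubNode:
    """Hypothetical schema node with two mutable attributes."""
    name: str
    p_prune: float = 0.0
    dropout: float = 0.1


@contextmanager
def override(
    nodes: list[StubNode],
    *predicates: Callable[[StubNode], bool],
    **values: Any,
) -> Iterator[None]:
    """Temporarily set attributes on matching nodes, restoring them on exit."""
    selected = [n for n in nodes if all(p(n) for p in predicates)]
    # Snapshot current values first; an unknown attribute raises AttributeError.
    saved = [(n, {key: getattr(n, key) for key in values}) for n in selected]
    for node in selected:
        for key, value in values.items():
            setattr(node, key, value)
    try:
        yield
    finally:
        # Restore the snapshot even if the body raised.
        for node, previous in saved:
            for key, value in previous.items():
                setattr(node, key, value)


nodes = [StubNode("amount", p_prune=1.0), StubNode("merchant", p_prune=1.0)]
with override(nodes, lambda n: n.name == "amount", p_prune=0.0):
    inside = [n.p_prune for n in nodes]
after = [n.p_prune for n in nodes]
print(inside, after)
```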
Source code in src/json2vec/architecture/root.py
configure_callbacks
Source code in src/json2vec/architecture/root.py
track
Source code in src/json2vec/architecture/root.py
save
Save model weights and schema hyperparameters to a checkpoint.
forward
forward(
inputs: TensorDict[Address, TensorFieldBase],
*,
strata: Strata | str,
dataloader_idx: int = 0,
) -> list[Prediction]
Source code in src/json2vec/architecture/root.py
configure_optimizers
Source code in src/json2vec/architecture/root.py
on_save_checkpoint
restore_checkpoint_state
Restore this model in place from a json2vec checkpoint dictionary.
load
classmethod
write
encode
encode(
batch: EncodedBatch | list[dict[str, Any]],
preprocess: Preprocessor | None = None,
strata: Strata | str = Strata.predict,
mask: bool = True,
) -> EncodedInput
Return encoded tensorfield inputs for raw or processed observations.
Source code in src/json2vec/architecture/root.py
predict
predict(
batch: EncodedBatch | list[dict[str, Any]],
preprocess: Preprocessor | None = None,
postprocess: Postprocessor | None = None,
) -> dict[Address, dict[str, Any]]
Return typed predictions and configured embeddings for a raw or encoded batch.
Source code in src/json2vec/architecture/root.py
CustomDataModule
CustomDataModule(
model: Model,
train: IterableDataset | None = None,
validate: IterableDataset | None = None,
test: IterableDataset | None = None,
predict: IterableDataset | None = None,
preprocessor: str
| Callable[..., Any]
| Preprocessor
| None = None,
datasets: DatasetMap | None = None,
num_workers: NonNegativeInt
| None
| StrataMap[NonNegativeInt | None] = None,
persistent_workers: bool | StrataMap[bool] = True,
pin_memory: bool | StrataMap[bool] = True,
observation_buffer_size: PositiveInt
| StrataMap[PositiveInt] = 1,
sample_rate: SampleRate | StrataMap[SampleRate] = 1.0,
**kwargs: Any,
)
Bases: LightningDataModule
Lightning data module for user-provided iterable datasets.
Source code in src/json2vec/data/datasets/custom.py
persistent_workers
instance-attribute
observation_buffer_size
instance-attribute
sample_rate
instance-attribute
interprocess_encoding_context
property
writable
train_dataloader
class-attribute
instance-attribute
val_dataloader
class-attribute
instance-attribute
test_dataloader
class-attribute
instance-attribute
predict_dataloader
class-attribute
instance-attribute
dataloader
Source code in src/json2vec/data/datasets/custom.py
PolarsDataModule
PolarsDataModule(
model: Model,
train: DataFrame | None = None,
validate: DataFrame | None = None,
test: DataFrame | None = None,
predict: DataFrame | None = None,
preprocessor: str
| Callable[..., Any]
| Preprocessor
| None = None,
dataframe: DataFrame | DataFrameMap | None = None,
num_workers: NonNegativeInt
| None
| StrataMap[NonNegativeInt | None] = None,
persistent_workers: bool | StrataMap[bool] = True,
pin_memory: bool | StrataMap[bool] = True,
sharding: ShardingStrategy
| str
| StrataMap[
ShardingStrategy | str
] = ShardingStrategy.chunk,
chunk_batch_size: PositiveInt
| StrataMap[PositiveInt] = 4096,
observation_buffer_size: PositiveInt
| StrataMap[PositiveInt] = 1,
sample_rate: SampleRate | StrataMap[SampleRate] = 1.0,
replacement: bool | StrataMap[bool] = False,
**kwargs: Any,
)
Bases: LightningDataModule
Lightning data module for in-memory Polars DataFrames.
Source code in src/json2vec/data/datasets/polars.py
persistent_workers
instance-attribute
observation_buffer_size
instance-attribute
sample_rate
instance-attribute
interprocess_encoding_context
property
writable
train_dataloader
class-attribute
instance-attribute
val_dataloader
class-attribute
instance-attribute
test_dataloader
class-attribute
instance-attribute
predict_dataloader
class-attribute
instance-attribute
dataloader
Source code in src/json2vec/data/datasets/polars.py
StreamingDataModule
StreamingDataModule(
model: Model,
root: str | Path,
suffix: Suffix | str,
train: PatternInput | None = None,
validate: PatternInput | None = None,
test: PatternInput | None = None,
predict: PatternInput | None = None,
preprocessor: str
| Callable[..., Any]
| Preprocessor
| None = None,
num_workers: NonNegativeInt
| None
| StrataMap[NonNegativeInt | None] = None,
persistent_workers: bool | StrataMap[bool] = True,
pin_memory: bool | StrataMap[bool] = True,
sharding: ShardingStrategy
| str
| StrataMap[
ShardingStrategy | str
] = ShardingStrategy.file,
chunk_batch_size: PositiveInt
| StrataMap[PositiveInt] = 4096,
file_buffer_size: PositiveInt
| StrataMap[PositiveInt] = 1,
observation_buffer_size: PositiveInt
| StrataMap[PositiveInt] = 1,
sample_rate: SampleRate | StrataMap[SampleRate] = 1.0,
replacement: bool | StrataMap[bool] | None = None,
**kwargs: Any,
)
Bases: LightningDataModule
Lightning data module for streaming records from files.
Reads file-backed records, applies an optional preprocessor, batches observations, and encodes them with model hyperparameters.
Source code in src/json2vec/data/datasets/streaming.py
validate
instance-attribute
persistent_workers
instance-attribute
observation_buffer_size
instance-attribute
sample_rate
instance-attribute
replacement
instance-attribute
replacement = (
{strata: (strata == train) for strata in Strata}
if replacement is None
else expand(replacement, default=False)
)
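Read literally, the default above enables sampling with replacement only for the training strata (train here presumably refers to the Strata training member) and disables it elsewhere, while an explicit value goes through expand(..., default=False). The stdlib sketch below reproduces that defaulting. The Strata members are assumed from the constructor's train, validate, test, and predict arguments, and expand_sketch's behavior (broadcast a scalar, fill a partial mapping with the default) is an assumption about Strata.expand, not its implementation.

```python
from collections.abc import Mapping
from enum import Enum
from typing import Optional, TypeVar, Union

T = TypeVar("T")


class Strata(str, Enum):
    """Assumed strata members, matching the data module's loader arguments."""
    train = "train"
    validate = "validate"
    test = "test"
    predict = "predict"


def expand_sketch(value: Union[T, Mapping[Strata, T]], default: T) -> dict[Strata, T]:
    """Assumption: broadcast a scalar, or fill a partial mapping with default."""
    if isinstance(value, Mapping):
        return {strata: value.get(strata, default) for strata in Strata}
    return {strata: value for strata in Strata}


def resolve_replacement(
    replacement: Optional[Union[bool, Mapping[Strata, bool]]],
) -> dict[Strata, bool]:
    """Mirror the documented default: replacement only for the train strata."""
    if replacement is None:
        return {strata: strata == Strata.train for strata in Strata}
    return expand_sketch(replacement, default=False)


print({s.value: flag for s, flag in resolve_replacement(None).items()})
```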
interprocess_encoding_context
property
writable
train_dataloader
class-attribute
instance-attribute
val_dataloader
class-attribute
instance-attribute
test_dataloader
class-attribute
instance-attribute
predict_dataloader
class-attribute
instance-attribute
dataloader
Source code in src/json2vec/data/datasets/streaming.py
Writer
Writer(
path: PathLike | str,
flush_every_n_batches: int | None = None,
postprocessor: Postprocessor | None = None,
)
Bases: BasePredictionWriter
Source code in src/json2vec/inference/callback.py
write_on_batch_end
write_on_batch_end(
trainer: Trainer,
pl_module: Model,
output: dict[str, list[Prediction]],
batch_indices: list[int] | None,
batch: TensorDict[Address, TensorFieldBase],
batch_idx: int,
dataloader_idx: int,
) -> None
Source code in src/json2vec/inference/callback.py
Preprocessor
Bases: BaseModel
Registered observation preprocessor.
A transformation preprocessor returns one dict. A generator preprocessor yields or returns multiple dict objects, each of which becomes a processed observation.
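The two preprocessor shapes differ only in how many processed observations come back from one raw observation. The sketch below shows one way outputs(...) could normalize both into a stream of dicts; normalize_outputs, add_total, split_items, and the order fields are hypothetical, and rejecting non-dict items is an assumption. In json2vec these functions would be registered with @preprocess rather than driven by hand.

```python
from typing import Any, Callable, Iterator


def normalize_outputs(
    func: Callable[[dict[str, Any]], Any], raw: dict[str, Any]
) -> Iterator[dict[str, Any]]:
    """Yield one processed observation per dict produced by `func`."""
    result = func(raw)
    # A single dict is a transformation; anything else is iterated.
    items = [result] if isinstance(result, dict) else result
    for item in items:
        if not isinstance(item, dict):  # assumption: only dicts are accepted
            raise TypeError(f"preprocessor produced {type(item).__name__}, expected dict")
        yield item


def add_total(raw: dict[str, Any]) -> dict[str, Any]:
    """Transformation shape: one dict in, one dict out."""
    return {**raw, "total": sum(item["price"] for item in raw["items"])}


def split_items(raw: dict[str, Any]):
    """Generator shape: one processed observation per line item."""
    for item in raw["items"]:
        yield {"order_id": raw["order_id"], **item}


order = {"order_id": 7, "items": [{"price": 2.5}, {"price": 4.0}]}
print(list(normalize_outputs(add_total, order)))
print(list(normalize_outputs(split_items, order)))
```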
model_config
class-attribute
instance-attribute
accepted_kwargs
cached
staticmethod
Source code in src/json2vec/preprocessors/base.py
filter_supported_kwargs
classmethod
Source code in src/json2vec/preprocessors/base.py
register
classmethod
outputs
Yield normalized processed observations for one raw observation.
Source code in src/json2vec/preprocessors/base.py
require_object
Source code in src/json2vec/preprocessors/base.py
PreprocessorMode
Bases: StrEnum
Execution mode for a registered preprocessor.
from_yields
classmethod
AttentionMode
Bases: StrEnum
normalize
classmethod
kv_heads
Source code in src/json2vec/structs/enums.py
Component
Bases: StrEnum
Metric
Bases: StrEnum
Overflow
ShardingStrategy
Strata
Bases: StrEnum
normalize
classmethod
expand
classmethod
Source code in src/json2vec/structs/enums.py
Suffix
Bases: StrEnum
TensorKey
Bases: StrEnum
Tokens
Hyperparameters
Bases: Node
Serializable schema and training metadata used to build a Model.
name
class-attribute
instance-attribute
type
class-attribute
instance-attribute
description
class-attribute
instance-attribute
update_values
classmethod
Source code in src/json2vec/structs/experiment.py
jmespath_member
classmethod
query_for_source
classmethod
Infer a request-level query for a leaf source field.
The encoder prepends the outer batch selector during search. Inferred
queries therefore start at the processed-observation level: [*].amount,
not [*][*].amount.
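The nesting rule is easiest to see by evaluating both queries. The sketch below is a tiny evaluator for the [*] projection and .key lookup only (a small subset of JMESPath semantics, not the jmespath package). It assumes each batch element is the list of processed-observation dicts produced from one raw observation, each carrying an amount field; prepending the encoder's outer [*] then yields one list per batch element.

```python
import re
from typing import Any

# Matches either a "[*]" projection or an optional-dot key such as ".amount".
TOKEN = re.compile(r"\[\*\]|\.?([A-Za-z_][A-Za-z0-9_]*)")


def tokenize(query: str) -> list[str]:
    """Split a query such as "[*][*].amount" into ["*", "*", "amount"]."""
    tokens, pos = [], 0
    while pos < len(query):
        match = TOKEN.match(query, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at {query[pos:]!r}")
        tokens.append("*" if match.group(0) == "[*]" else match.group(1))
        pos = match.end()
    return tokens


def evaluate(tokens: list[str], data: Any) -> Any:
    """Apply projections and key lookups; projections drop null results."""
    if not tokens or data is None:
        return data
    head, rest = tokens[0], tokens[1:]
    if head == "*":
        if not isinstance(data, list):
            return None
        projected = (evaluate(rest, element) for element in data)
        return [value for value in projected if value is not None]
    return evaluate(rest, data.get(head)) if isinstance(data, dict) else None


def search(query: str, data: Any) -> Any:
    return evaluate(tokenize(query), data)


processed = [{"amount": 1}, {"amount": 2}]  # from one raw observation
batch = [processed, [{"amount": 3}]]
print(search("[*].amount", processed))      # observation-level query
print(search("[*]" + "[*].amount", batch))  # outer batch selector prepended
```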
Source code in src/json2vec/structs/experiment.py
request_from_leaf
classmethod
Source code in src/json2vec/structs/experiment.py
from_schema_node
classmethod
Source code in src/json2vec/structs/experiment.py
from_schema
classmethod
from_schema(
*field_args: SchemaField,
d_model: int,
n_layers: int,
n_heads: int,
fields: Sequence[SchemaField] | None = None,
name: str = "record",
description: str | None = None,
embed: bool = False,
attention: AttentionMode | str = AttentionMode.mha,
n_linear: Annotated[int, Field(gt=0)] = 1,
dropout: Rate | None = None,
) -> Self
Build hyperparameters from schema fields.
Source code in src/json2vec/structs/experiment.py
model_post_init
Source code in src/json2vec/structs/experiment.py
overflows
array_masks_for
Source code in src/json2vec/structs/experiment.py
clear_selection_cache
refresh_selection_cache
Source code in src/json2vec/structs/experiment.py
select
select(
*predicates: NodeSelector,
include_root: bool = True,
use_cache: bool = True,
) -> list[Node]
Source code in src/json2vec/structs/experiment.py
update
update(
*predicates: NodeSelector,
strict: bool = True,
allow_extra: bool = False,
include_root: bool = True,
validate: bool = True,
use_cache: bool = False,
**values: Any,
) -> None
Mutate matching schema nodes.
target=True is normalized to p_prune=1.0; target=False clears the
target prune rate by setting p_prune=0.0.
Source code in src/json2vec/structs/experiment.py
extend
Append new schema fields under the single array selected by predicates.
Source code in src/json2vec/structs/experiment.py
delete
Permanently remove selected schema nodes from the tree.
Source code in src/json2vec/structs/experiment.py
override
override(
*predicates: NodeSelector,
strict: bool = True,
allow_extra: bool = False,
include_root: bool = True,
validate: bool = True,
use_cache: bool = False,
**values: Any,
) -> Iterator[None]
Source code in src/json2vec/structs/experiment.py
NodeAttribute
Bases: BaseModel
Queryable schema node attribute returned by where(...).
name
class-attribute
instance-attribute
name: str = Field(
description="Queryable node attribute. Built-ins include name, type, address, parent, children, ancestors, descendants, and target. Pydantic fields and extra metadata fields are also queryable."
)
named
classmethod
get
Source code in src/json2vec/structs/selectors.py
exists
is_in
Source code in src/json2vec/structs/selectors.py
matches
Source code in src/json2vec/structs/selectors.py
contains
is_null
NodePredicate
Bases: BaseModel
Composable predicate used to select schema nodes.
model_config
class-attribute
instance-attribute
from_callable
classmethod
from_selector
classmethod
Source code in src/json2vec/structs/selectors.py
Array
Bases: Node
Repeated nested object group in a json2vec schema.
Positional children are treated as fields inside the array.
Source code in src/json2vec/structs/structure.py
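Array declares max_length, and Overflow lists head, tail, and error policies, but how they combine is not spelled out on this page. The sketch below assumes head keeps the first max_length items, tail keeps the last max_length, and error raises on oversized input. The local Overflow enum and apply_overflow are hypothetical stand-ins, so treat these semantics as an assumption.

```python
from enum import Enum
from typing import Optional, Sequence, TypeVar, Union

T = TypeVar("T")


class Overflow(str, Enum):
    """Local stand-in mirroring the documented policy names."""
    head = "head"
    tail = "tail"
    error = "error"


def apply_overflow(
    items: Sequence[T], max_length: Optional[int], policy: Union[Overflow, str]
) -> list[T]:
    """Trim `items` to `max_length` under the assumed policy semantics."""
    policy = Overflow(policy)  # accept either the enum or its string value
    if max_length is None or len(items) <= max_length:
        return list(items)
    if policy is Overflow.head:
        return list(items[:max_length])  # assumed: keep the earliest items
    if policy is Overflow.tail:
        # Slice by index so max_length=0 yields [] rather than everything.
        return list(items[len(items) - max_length:])
    raise ValueError(f"array has {len(items)} items, max_length is {max_length}")


print(apply_overflow([1, 2, 3, 4], 2, "head"), apply_overflow([1, 2, 3, 4], 2, "tail"))
```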
type
class-attribute
instance-attribute
max_length
class-attribute
instance-attribute
fields
class-attribute
instance-attribute
normalize_mask_shorthand
classmethod
Source code in src/json2vec/structs/structure.py
model_post_init
check_unique_child_names
Source code in src/json2vec/structs/structure.py
post_bind_validate
Source code in src/json2vec/structs/structure.py
excluded_leaves
Source code in src/json2vec/structs/structure.py
Mask
Bases: BaseModel
Structured masking policy attached to an Array.
model_config
class-attribute
instance-attribute
check_rate_or_count
normalize_exclude
classmethod
Source code in src/json2vec/structs/structure.py
Address
Bases: str
Slash-delimited stable path to a schema node.
Leaf
Bases: Node
Base tensorfield request node.
Concrete tensorfield constructors such as Number and Category inherit
from this class through their registered request models.
Source code in src/json2vec/structs/tree.py
weight
class-attribute
instance-attribute
resolve_role_shorthands
classmethod
Source code in src/json2vec/structs/tree.py
merge_constructor_kwargs
classmethod
Source code in src/json2vec/structs/tree.py
validate_type
classmethod
Source code in src/json2vec/structs/tree.py
check_jmespath_query
Source code in src/json2vec/structs/tree.py
DecoderBase
Bases: Module
Base class for tensorfield decoders.
Source code in src/json2vec/tensorfields/base.py
decode
forward
Source code in src/json2vec/tensorfields/base.py
EmbedderBase
Plugin
Registry object for a tensorfield implementation.
Register request, tensorfield, embedder, decoder, loss, and write
components with @plugin.register. Creating a plugin with an existing
name replaces the registry entry and emits a warning.
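Replacing an existing entry and emitting a warning is a standard registry pattern. The stdlib sketch below mimics only that rule; REGISTRY, PluginSketch, and its register(kind) decorator are hypothetical, and the real @plugin.register decorator's arguments and component kinds are not reproduced.

```python
import warnings
from typing import Any, Callable

# Hypothetical module-level registry keyed by plugin name.
REGISTRY: dict[str, "PluginSketch"] = {}


class PluginSketch:
    """Hypothetical registry entry; not json2vec's Plugin."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.components: dict[str, Any] = {}
        if name in REGISTRY:
            warnings.warn(f"replacing existing plugin {name!r}", stacklevel=2)
        REGISTRY[name] = self  # a new plugin always replaces the old entry

    def register(self, kind: str) -> Callable[[Any], Any]:
        """Decorator that records `obj` under `kind` and returns it unchanged."""
        def decorator(obj: Any) -> Any:
            self.components[kind] = obj
            return obj
        return decorator


money = PluginSketch("money")


@money.register("decoder")
class MoneyDecoder:
    pass
```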
Source code in src/json2vec/tensorfields/base.py
register
Register one tensorfield component with this plugin.
Source code in src/json2vec/tensorfields/base.py
callback
Register one or more Lightning callback factories for this tensorfield.
Source code in src/json2vec/tensorfields/base.py
TensorFieldBase
Bases: Renderable
Tensorized field values plus trainable target state.
STATE_LABELS
class-attribute
instance-attribute
STATE_STYLES
class-attribute
instance-attribute
STATE_STYLES: dict[int, str] = {
value: "bold green",
value: "bold yellow",
value: "dim",
value: "bold magenta",
value: "bold cyan",
}
new
abstractmethod
classmethod
empty
abstractmethod
classmethod
mask
abstractmethod
target
abstractmethod
Category
Bases: RequestBase
Categorical scalar tensorfield request backed by an online vocabulary.
max_vocab_size
class-attribute
instance-attribute
p_unavailable
class-attribute
instance-attribute
reject_removed_options
classmethod
Source code in src/json2vec/tensorfields/extensions/category.py
check_topk
Source code in src/json2vec/tensorfields/extensions/category.py
DateParts
Bases: RequestBase
Date/time tensorfield request that extracts configured calendar parts.
pattern
class-attribute
instance-attribute
check_dateparts
classmethod
Source code in src/json2vec/tensorfields/extensions/dateparts.py
check_date_pattern
classmethod
Source code in src/json2vec/tensorfields/extensions/dateparts.py
Entity
Bases: RequestBase
Per-observation entity tensorfield request for local identity matching.
check_topk
Source code in src/json2vec/tensorfields/extensions/entity.py
post_bind_validate
Source code in src/json2vec/tensorfields/extensions/entity.py
Number
Bases: RequestBase
Numeric scalar tensorfield request.
jitter
class-attribute
instance-attribute
alpha
class-attribute
instance-attribute
Set
Bases: RequestBase
Multi-label set tensorfield request backed by an online vocabulary.
max_vocab_size
class-attribute
instance-attribute
p_unavailable
class-attribute
instance-attribute
Text
Bases: RequestBase
Text tensorfield request encoded by a frozen Hugging Face model.
max_length
class-attribute
instance-attribute
encoder_batch_size
class-attribute
instance-attribute
normalize_model_name
classmethod
normalize_revision
classmethod
Source code in src/json2vec/tensorfields/extensions/text.py
Vector
Bases: RequestBase
Fixed-width numeric vector tensorfield request.
VocabularySyncCallback
Bases: Callback
Synchronize online vocabularies registered by tensorfield extensions.
on_fit_start
class-attribute
instance-attribute
on_train_epoch_end
class-attribute
instance-attribute
on_fit_end
Accelerator
Deployment
Bases: BaseSettings
Serving configuration for a json2vec checkpoint or model instance.
Deployment records request/response schemas, optional preprocessors and
postprocessors, and queued update(...) mutations up front; they take effect
when the FastAPI application starts up and loads the model.
model_config
class-attribute
instance-attribute
model_config = SettingsConfigDict(
extra="ignore",
case_sensitive=False,
validate_by_name=True,
validate_by_alias=True,
arbitrary_types_allowed=True,
)
checkpoint
class-attribute
instance-attribute
checkpoint: ModelSource = Field(
default="model.ckpt",
validation_alias=AliasChoices(
"JSON2VEC_CHECKPOINT", "CHECKPOINT"
),
)
max_batch_size
class-attribute
instance-attribute
max_batch_size: int = Field(
default=128,
ge=1,
validation_alias=AliasChoices(
"JSON2VEC_MAX_BATCH_SIZE", "MAX_BATCH_SIZE"
),
)
batch_timeout
class-attribute
instance-attribute
batch_timeout: float = Field(
default=0.0,
ge=0.0,
validation_alias=AliasChoices(
"JSON2VEC_BATCH_TIMEOUT", "BATCH_TIMEOUT"
),
)
workers
class-attribute
instance-attribute
workers: int = Field(
default=1,
ge=1,
validation_alias=AliasChoices(
"JSON2VEC_WORKERS", "WORKERS"
),
)
accelerator
class-attribute
instance-attribute
accelerator: Accelerator = Field(
default=auto,
validation_alias=AliasChoices(
"JSON2VEC_ACCELERATOR", "ACCELERATOR"
),
)
host
class-attribute
instance-attribute
port
class-attribute
instance-attribute
port: int = Field(
default=8000,
ge=1,
le=65535,
validation_alias=AliasChoices("JSON2VEC_PORT", "PORT"),
)
log_level
class-attribute
instance-attribute
log_level: str = Field(
default="info",
validation_alias=AliasChoices(
"JSON2VEC_LOG_LEVEL", "LOG_LEVEL"
),
)
monitor_queries
class-attribute
instance-attribute
monitor_queries: bool = Field(
default=False,
validation_alias=AliasChoices(
"JSON2VEC_MONITOR_QUERIES", "MONITOR_QUERIES"
),
)
query_monitor_every
class-attribute
instance-attribute
query_monitor_every: int = Field(
default=1000,
gt=0,
validation_alias=AliasChoices(
"JSON2VEC_QUERY_MONITOR_EVERY",
"QUERY_MONITOR_EVERY",
),
)
json_backend
class-attribute
instance-attribute
json_backend: JSONBackend = Field(
default=orjson,
validation_alias=AliasChoices(
"JSON2VEC_JSON_BACKEND", "JSON_BACKEND"
),
)
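Every field above accepts either a JSON2VEC_-prefixed environment variable or an unprefixed one, and case_sensitive=False makes the match case-insensitive. The stdlib sketch below shows how a container environment would resolve under those rules. It assumes the alias listed first wins when both are set and skips the ge/le validation; resolve_setting is a hypothetical helper, since Deployment itself delegates this to pydantic-settings.

```python
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")


def resolve_setting(
    env: Mapping[str, str],
    aliases: tuple[str, ...],
    default: T,
    cast: Callable[[str], T],
) -> T:
    """Return the first alias found in `env` (case-insensitively), cast to T."""
    lowered = {key.lower(): value for key, value in env.items()}
    # Assumption: aliases are tried in the order they are declared.
    for alias in aliases:
        if alias.lower() in lowered:
            return cast(lowered[alias.lower()])
    return default


env = {"json2vec_port": "9000", "MAX_BATCH_SIZE": "64", "JSON2VEC_MAX_BATCH_SIZE": "32"}
port = resolve_setting(env, ("JSON2VEC_PORT", "PORT"), 8000, int)
batch = resolve_setting(env, ("JSON2VEC_MAX_BATCH_SIZE", "MAX_BATCH_SIZE"), 128, int)
timeout = resolve_setting(env, ("JSON2VEC_BATCH_TIMEOUT", "BATCH_TIMEOUT"), 0.0, float)
print(port, batch, timeout)
```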
strip_checkpoint
classmethod
Source code in src/json2vec/inference/deployment.py
check_model_source
forge
forge(
request: type[BaseModel] | None = None,
response: type[BaseModel] | None = None,
) -> Deployment
Attach optional Pydantic request and response signatures.
Source code in src/json2vec/inference/deployment.py
preprocess
Attach an optional request preprocessor.
If this method is not called, request objects are encoded unchanged.
Source code in src/json2vec/inference/deployment.py
postprocess
Attach an optional response postprocessor.
Source code in src/json2vec/inference/deployment.py
update
update(
*predicates: NodePredicate
| NodeAttribute
| Callable[[Node], bool],
strict: bool = True,
allow_extra: bool = False,
include_root: bool = True,
validate: bool = True,
**values: Any,
) -> Deployment
Queue a model schema mutation to apply during server startup.
This mirrors Model.update(...) and is useful for serving-time changes
such as target=False.
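Because update(...) returns the Deployment, calls can be chained, and the queued mutations only run once the model exists at startup. The stdlib sketch below shows that deferred pattern; DeferredUpdates and StubNode are hypothetical, each queued entry follows the (predicates, values) shape of the UpdateOperation alias, and validation is skipped. Since the docs equate target=False with p_prune=0.0, the example queues p_prune directly.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StubNode:
    """Hypothetical schema node."""
    name: str
    p_prune: float = 1.0


Predicate = Callable[[StubNode], bool]


class DeferredUpdates:
    """Queue (predicates, values) pairs now and apply them to nodes later."""

    def __init__(self) -> None:
        self.queue: list[tuple[tuple[Predicate, ...], dict[str, Any]]] = []

    def update(self, *predicates: Predicate, **values: Any) -> "DeferredUpdates":
        self.queue.append((predicates, values))
        return self  # returning self allows chaining, like Deployment.update

    def apply(self, nodes: list[StubNode]) -> None:
        """Replay queued operations in order, e.g. during server startup."""
        for predicates, values in self.queue:
            for node in nodes:
                if all(predicate(node) for predicate in predicates):
                    for key, value in values.items():
                        setattr(node, key, value)


pending = DeferredUpdates().update(lambda n: n.name == "amount", p_prune=0.0)
nodes = [StubNode("amount"), StubNode("merchant")]  # created later, at "startup"
pending.apply(nodes)
print([(n.name, n.p_prune) for n in nodes])
```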
Source code in src/json2vec/inference/deployment.py
app
Build a FastAPI app for the configured checkpoint or model.
Source code in src/json2vec/inference/deployment.py
serve
Start the FastAPI server for the configured checkpoint or model.
Source code in src/json2vec/inference/deployment.py
JSONBackend
preprocess
preprocess(
func: Callable[..., Any] | None = None,
*,
yields: bool | None = None,
**kwargs: Any,
) -> Callable[..., Any]
Register a callable as a json2vec preprocessor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| func | Callable[..., Any] \| None | Callable to register when used as | None |
| yields | bool \| None | Set to | None |
| **kwargs | Any | Reserved for validation of unsupported decorator arguments. | {} |
Returns:
| Type | Description |
|---|---|
| Callable[..., Any] | The original callable, after registering it in |
Example
Source code in src/json2vec/preprocessors/base.py
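The func=None, *, yields=... signature is the usual pattern for a decorator that works both bare (@preprocess) and with arguments (@preprocess(yields=True)). The stdlib sketch below shows that pattern; PREPROCESSORS and preprocess_sketch are hypothetical, and inferring yields from inspect.isgeneratorfunction when it is None is an assumption, since this page does not state how yields=None is resolved. Rejecting extra keyword arguments mirrors the documented purpose of **kwargs.

```python
import inspect
from typing import Any, Callable, Optional

# Hypothetical registry; json2vec keeps its own internal registry.
PREPROCESSORS: dict[str, dict[str, Any]] = {}


def preprocess_sketch(
    func: Optional[Callable[..., Any]] = None,
    *,
    yields: Optional[bool] = None,
    **kwargs: Any,
) -> Callable[..., Any]:
    """Register `func`; usable bare or as preprocess_sketch(yields=True)."""
    if kwargs:
        raise TypeError(f"unsupported arguments: {sorted(kwargs)}")

    def register(target: Callable[..., Any]) -> Callable[..., Any]:
        # Assumption: when yields is None, infer it from the function kind.
        is_generator = inspect.isgeneratorfunction(target) if yields is None else yields
        PREPROCESSORS[target.__name__] = {"func": target, "yields": is_generator}
        return target  # the original callable is returned unchanged

    return register if func is None else register(func)


@preprocess_sketch
def lowercase_merchant(raw: dict) -> dict:
    return {**raw, "merchant": raw["merchant"].lower()}


@preprocess_sketch(yields=True)
def explode_items(raw: dict):
    yield from raw["items"]
```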
predicate
Create a cacheable node predicate from a callable.
Model
Model
Model(
hyperparameters: Hyperparameters,
*,
batch_size: int = 1,
optimizer: OptimizerConfig | None = None,
scheduler: SchedulerConfig | None = None,
)
Bases: LightningModule, Renderable
Neural model generated from a json2vec schema tree.
Model owns the schema hyperparameters, tensorfield embedders, array
encoders, decoders, and convenience methods for prediction, checkpointing,
schema display and mutation.
Example
Source code in src/json2vec/architecture/root.py
from_schema
classmethod
from_schema(
*field_args: SchemaField,
d_model: int,
n_layers: int,
n_heads: int,
batch_size: int = 1,
fields: Sequence[SchemaField] | None = None,
name: str = "record",
description: str | None = None,
embed: bool = False,
attention: AttentionMode | str = AttentionMode.mha,
n_linear: int = 1,
dropout: Rate | None = None,
optimizer: OptimizerConfig | None = None,
scheduler: SchedulerConfig | None = None,
) -> Self
Build a model directly from schema fields.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *field_args | SchemaField | Field constructors such as | () |
| d_model | int | Shared model width. | required |
| n_layers | int | Number of encoder layers on generated array nodes. | required |
| n_heads | int | Attention heads used by generated nodes. | required |
| batch_size | int | Batch size used by data modules, examples, and mocked Lightning input arrays. | 1 |
| fields | Sequence[SchemaField] \| None | Optional sequence form of | None |
| name | str | Root array name. Defaults to 'record'. | 'record' |
| description | str \| None | Optional description on the generated root array. | None |
| embed | bool | Configure the generated root array as an embedding output. | False |
| attention | AttentionMode \| str | Attention mode for the generated root array. | mha |
| n_linear | int | Feed-forward block count on the generated root array. | 1 |
| dropout | Rate \| None | Optional dropout rate on the generated root array. | None |
| optimizer | OptimizerConfig \| None | Optimizer instance or factory used by Lightning training. | None |
| scheduler | SchedulerConfig \| None | Optional scheduler config or factory. | None |
Returns:
| Type | Description |
|---|---|
| Self | A compiled |
Source code in src/json2vec/architecture/root.py
select
select(
*predicates: NodePredicate
| NodeAttribute
| Callable[[Node], bool],
include_root: bool = True,
use_cache: bool = True,
) -> list[Node]
Return schema nodes that satisfy every predicate.
Source code in src/json2vec/architecture/root.py
update
update(
*predicates: NodePredicate
| NodeAttribute
| Callable[[Node], bool],
strict: bool = True,
allow_extra: bool = False,
include_root: bool = True,
validate: bool = True,
use_cache: bool = False,
**values: Any,
) -> None
Mutate selected schema nodes and rebuild compatible modules.
target=True is shorthand for p_prune=1.0; target=False clears
target behavior by setting p_prune=0.0.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *predicates | NodePredicate \| NodeAttribute \| Callable[[Node], bool] | Predicates used to select nodes. | () |
| strict | bool | Raise when a selected node cannot accept one of | True |
| allow_extra | bool | Permit updates to extra metadata fields on models that allow unknown fields. | False |
| include_root | bool | Include the root node in predicate matching. | True |
| validate | bool | Validate each node after applying candidate values. | True |
| use_cache | bool | Permit cached selector results. Mutations default this to False. | False |
| **values | Any | Schema attributes to update. | {} |
Source code in src/json2vec/architecture/root.py
extend
extend(
*args: NodePredicate
| NodeAttribute
| Callable[[Node], bool]
| SchemaField,
include_root: bool = True,
use_cache: bool = True,
) -> None
Append new schema fields under one selected array node and rebuild modules.
Source code in src/json2vec/architecture/root.py
delete
delete(
*predicates: NodePredicate
| NodeAttribute
| Callable[[Node], bool],
include_root: bool = False,
use_cache: bool = True,
) -> None
Permanently remove selected schema nodes and rebuild modules.
Source code in src/json2vec/architecture/root.py
reset
reset(
*predicates: NodePredicate
| NodeAttribute
| Callable[[Node], bool],
include_root: bool = True,
use_cache: bool = True,
descendants: bool = False,
) -> None
Reinitialize selected runtime node modules while preserving schema values.
Source code in src/json2vec/architecture/root.py
override
override(
*predicates: NodePredicate
| NodeAttribute
| Callable[[Node], bool],
strict: bool = True,
allow_extra: bool = False,
include_root: bool = True,
validate: bool = True,
use_cache: bool = False,
**values: Any,
) -> Iterator[None]
Temporarily mutate selected schema nodes and keep runtime modules synchronized.
Source code in src/json2vec/architecture/root.py
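override(...) returns Iterator[None], which suggests it is meant to be used as a context manager (with model.override(...):) that undoes its mutation on exit. The sketch below shows that temporary-mutation pattern on a plain dict. It is a hypothetical illustration, not json2vec's implementation, and it does not rebuild any runtime modules.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def override(node: dict[str, Any], **values: Any) -> Iterator[None]:
    """Temporarily set keys on node and restore the previous state on exit."""
    missing = object()  # sentinel for keys that did not exist before
    previous = {key: node.get(key, missing) for key in values}
    node.update(values)
    try:
        yield
    finally:
        # Restore even if the body raised.
        for key, old in previous.items():
            if old is missing:
                node.pop(key, None)
            else:
                node[key] = old


node = {"name": "price", "p_prune": 1.0}
with override(node, p_prune=0.0, note="serving"):
    print(node["p_prune"])  # 0.0
print(node)  # {'name': 'price', 'p_prune': 1.0}
```

Restoring in a finally clause means the original values come back even when the body raises. This is the property that makes a context manager preferable to a pair of manual update(...) calls.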
save
Save model weights and schema hyperparameters to a checkpoint.
load
classmethod
predict
predict(
batch: EncodedBatch | list[dict[str, Any]],
preprocess: Preprocessor | None = None,
postprocess: Postprocessor | None = None,
) -> dict[Address, dict[str, Any]]
Return typed predictions and configured embeddings for a raw or encoded batch.
Source code in src/json2vec/architecture/root.py
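predict(...) accepts a postprocess callable that matches the Postprocessor alias. That callable receives a dict[str, Any] and the predictions keyed by Address, and it returns either new predictions or None. The sketch below keeps only selected addresses. The address strings, the inner "value" key, and the use of a plain Hashable as Address are all hypothetical. The first argument is accepted but ignored because its contents are not described here.

```python
from __future__ import annotations

from collections.abc import Hashable
from typing import Any

Address = Hashable  # assumption: stand-in for json2vec's Address type

KEEP = {"record.price"}  # hypothetical address labels, for illustration only


def keep_selected(
    context: dict[str, Any],
    predictions: dict[Address, dict[str, Any]],
) -> dict[Address, dict[str, Any]] | None:
    """Return only the predictions whose address is in KEEP."""
    return {addr: out for addr, out in predictions.items() if addr in KEEP}


preds = {"record.price": {"value": 3.2}, "record.color": {"value": "red"}}
print(keep_selected({}, preds))  # {'record.price': {'value': 3.2}}
```

Under these assumptions, the callable would be passed as model.predict(batch, postprocess=keep_selected).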
Schema
Array
Bases: Node
Repeated nested object group in a json2vec schema.
Positional children are treated as fields inside the array.
Source code in src/json2vec/structs/structure.py
type
class-attribute
instance-attribute
max_length
class-attribute
instance-attribute
fields
class-attribute
instance-attribute
normalize_mask_shorthand
classmethod
Source code in src/json2vec/structs/structure.py
model_post_init
check_unique_child_names
Source code in src/json2vec/structs/structure.py
post_bind_validate
Source code in src/json2vec/structs/structure.py
excluded_leaves
Source code in src/json2vec/structs/structure.py
Hyperparameters
Bases: Node
Serializable schema and training metadata used to build a Model.
from_schema
classmethod
from_schema(
*field_args: SchemaField,
d_model: int,
n_layers: int,
n_heads: int,
fields: Sequence[SchemaField] | None = None,
name: str = "record",
description: str | None = None,
embed: bool = False,
attention: AttentionMode | str = AttentionMode.mha,
n_linear: Annotated[int, Field(gt=0)] = 1,
dropout: Rate | None = None,
) -> Self
Build hyperparameters from schema fields.
Source code in src/json2vec/structs/experiment.py
select
select(
*predicates: NodeSelector,
include_root: bool = True,
use_cache: bool = True,
) -> list[Node]
Source code in src/json2vec/structs/experiment.py
update
update(
*predicates: NodeSelector,
strict: bool = True,
allow_extra: bool = False,
include_root: bool = True,
validate: bool = True,
use_cache: bool = False,
**values: Any,
) -> None
Mutate matching schema nodes.
target=True is normalized to p_prune=1.0; target=False clears the
target prune rate by setting p_prune=0.0.
Source code in src/json2vec/structs/experiment.py
extend
Append new schema fields under the single array selected by predicates.
Source code in src/json2vec/structs/experiment.py
delete
Permanently remove selected schema nodes from the tree.
Source code in src/json2vec/structs/experiment.py
override
override(
*predicates: NodeSelector,
strict: bool = True,
allow_extra: bool = False,
include_root: bool = True,
validate: bool = True,
use_cache: bool = False,
**values: Any,
) -> Iterator[None]
Source code in src/json2vec/structs/experiment.py
where
Tensorfield Constructors
Number
Bases: RequestBase
Numeric scalar tensorfield request.
jitter
class-attribute
instance-attribute
alpha
class-attribute
instance-attribute
Category
Bases: RequestBase
Categorical scalar tensorfield request backed by an online vocabulary.
max_vocab_size
class-attribute
instance-attribute
p_unavailable
class-attribute
instance-attribute
reject_removed_options
classmethod
Source code in src/json2vec/tensorfields/extensions/category.py
check_topk
Source code in src/json2vec/tensorfields/extensions/category.py
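Category is described as backed by an online vocabulary with a max_vocab_size cap. The sketch below shows one common way such a vocabulary can work: ids are assigned on first sight until the cap is reached. OnlineVocabulary is hypothetical. Reserving id 0 for values seen after the cap is full is an assumption, and json2vec's actual policy (including how p_unavailable is used) may differ.

```python
class OnlineVocabulary:
    """Hypothetical vocabulary that assigns ids on first sight, up to a cap.

    Assumption: id 0 is reserved for values that arrive after the cap
    is reached; json2vec's real overflow policy may differ.
    """

    UNKNOWN = 0

    def __init__(self, max_vocab_size: int) -> None:
        self.max_vocab_size = max_vocab_size
        self.index: dict[str, int] = {}

    def encode(self, value: str) -> int:
        if value in self.index:
            return self.index[value]
        if len(self.index) < self.max_vocab_size:
            self.index[value] = len(self.index) + 1  # known ids start at 1
            return self.index[value]
        return self.UNKNOWN


vocab = OnlineVocabulary(max_vocab_size=2)
encoded = [vocab.encode(v) for v in ["red", "blue", "red", "green"]]
print(encoded)  # [1, 2, 1, 0]
```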
Set
Bases: RequestBase
Multi-label set tensorfield request backed by an online vocabulary.
max_vocab_size
class-attribute
instance-attribute
p_unavailable
class-attribute
instance-attribute
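A multi-label set is often represented as a multi-hot vector over a vocabulary. The sketch below shows that encoding. Whether json2vec's Set tensorfield uses multi-hot vectors internally is an assumption. multi_hot and the materials vocabulary are hypothetical, and labels that are missing from the vocabulary are simply ignored here.

```python
def multi_hot(labels: set[str], vocabulary: dict[str, int]) -> list[int]:
    """Encode a label set as a 0/1 vector over a fixed vocabulary.

    Labels missing from the vocabulary are ignored (an assumption).
    """
    vector = [0] * len(vocabulary)
    for label in labels:
        position = vocabulary.get(label)
        if position is not None:
            vector[position] = 1
    return vector


materials = {"wool": 0, "cotton": 1, "silk": 2}
print(multi_hot({"silk", "wool", "linen"}, materials))  # [1, 0, 1]
```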
DateParts
Bases: RequestBase
Date/time tensorfield request that extracts configured calendar parts.
pattern
class-attribute
instance-attribute
check_dateparts
classmethod
Source code in src/json2vec/tensorfields/extensions/dateparts.py
check_date_pattern
classmethod
Source code in src/json2vec/tensorfields/extensions/dateparts.py
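DateParts parses a date/time value and extracts configured calendar parts. The sketch below does the same with the standard library. It assumes that pattern uses strptime-style directives, which is not confirmed here. The part names and the date_parts helper are hypothetical.

```python
from datetime import datetime


def date_parts(value: str, pattern: str, parts: tuple[str, ...]) -> dict[str, int]:
    """Parse value with a strptime pattern and extract the requested parts."""
    moment = datetime.strptime(value, pattern)
    extractors = {
        "year": lambda d: d.year,
        "month": lambda d: d.month,
        "day": lambda d: d.day,
        "weekday": lambda d: d.weekday(),  # Monday == 0
        "hour": lambda d: d.hour,
    }
    return {part: extractors[part](moment) for part in parts}


# 2024-03-15 is a Friday, so weekday() returns 4.
print(date_parts("2024-03-15 08:30", "%Y-%m-%d %H:%M", ("month", "weekday", "hour")))
# {'month': 3, 'weekday': 4, 'hour': 8}
```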
Entity
Bases: RequestBase
Per-observation entity tensorfield request for local identity matching.
check_topk
Source code in src/json2vec/tensorfields/extensions/entity.py
post_bind_validate
Source code in src/json2vec/tensorfields/extensions/entity.py
Vector
Bases: RequestBase
Fixed-width numeric vector tensorfield request.
Text
Bases: RequestBase
Text tensorfield request encoded by a frozen Hugging Face model.
max_length
class-attribute
instance-attribute
encoder_batch_size
class-attribute
instance-attribute
normalize_model_name
classmethod
normalize_revision
classmethod
Source code in src/json2vec/tensorfields/extensions/text.py
Data
Use Data Modules for the workflow-level guide to
CustomDataModule, PolarsDataModule, and StreamingDataModule.
CustomDataModule
CustomDataModule(
model: Model,
train: IterableDataset | None = None,
validate: IterableDataset | None = None,
test: IterableDataset | None = None,
predict: IterableDataset | None = None,
preprocessor: str
| Callable[..., Any]
| Preprocessor
| None = None,
datasets: DatasetMap | None = None,
num_workers: NonNegativeInt
| None
| StrataMap[NonNegativeInt | None] = None,
persistent_workers: bool | StrataMap[bool] = True,
pin_memory: bool | StrataMap[bool] = True,
observation_buffer_size: PositiveInt
| StrataMap[PositiveInt] = 1,
sample_rate: SampleRate | StrataMap[SampleRate] = 1.0,
**kwargs: Any,
)
Bases: LightningDataModule
Lightning data module for user-provided iterable datasets.
Source code in src/json2vec/data/datasets/custom.py
train_dataloader
class-attribute
instance-attribute
val_dataloader
class-attribute
instance-attribute
test_dataloader
class-attribute
instance-attribute
predict_dataloader
class-attribute
instance-attribute
dataloader
Source code in src/json2vec/data/datasets/custom.py
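Several data module options accept either a single value or a StrataMap[...] of per-stratum values (for example, num_workers, pin_memory, and sample_rate). The sketch below shows one way to resolve such a setting for one stratum. The stratum keys are borrowed from the constructor argument names. Both those keys and the fallback to a default for unlisted strata are assumptions about StrataMap, not its documented behavior.

```python
from collections.abc import Mapping
from typing import TypeVar, Union

T = TypeVar("T")
STRATA = ("train", "validate", "test", "predict")  # assumed StrataMap keys


def resolve(setting: Union[T, Mapping[str, T]], stratum: str, default: T) -> T:
    """Return the value for one stratum from a scalar-or-mapping setting."""
    if stratum not in STRATA:
        raise ValueError(f"unknown stratum: {stratum!r}")
    if isinstance(setting, Mapping):
        return setting.get(stratum, default)  # assumed fallback behavior
    return setting  # a scalar applies to every stratum


print(resolve(4, "train", default=0))                           # 4
print(resolve({"train": 8, "predict": 2}, "test", default=0))   # 0
```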
PolarsDataModule
PolarsDataModule(
model: Model,
train: DataFrame | None = None,
validate: DataFrame | None = None,
test: DataFrame | None = None,
predict: DataFrame | None = None,
preprocessor: str
| Callable[..., Any]
| Preprocessor
| None = None,
dataframe: DataFrame | DataFrameMap | None = None,
num_workers: NonNegativeInt
| None
| StrataMap[NonNegativeInt | None] = None,
persistent_workers: bool | StrataMap[bool] = True,
pin_memory: bool | StrataMap[bool] = True,
sharding: ShardingStrategy
| str
| StrataMap[
ShardingStrategy | str
] = ShardingStrategy.chunk,
chunk_batch_size: PositiveInt
| StrataMap[PositiveInt] = 4096,
observation_buffer_size: PositiveInt
| StrataMap[PositiveInt] = 1,
sample_rate: SampleRate | StrataMap[SampleRate] = 1.0,
replacement: bool | StrataMap[bool] = False,
**kwargs: Any,
)
Bases: LightningDataModule
Lightning data module for in-memory Polars DataFrames.
Source code in src/json2vec/data/datasets/polars.py
train_dataloader
class-attribute
instance-attribute
val_dataloader
class-attribute
instance-attribute
test_dataloader
class-attribute
instance-attribute
predict_dataloader
class-attribute
instance-attribute
dataloader
Source code in src/json2vec/data/datasets/polars.py
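PolarsDataModule defaults to ShardingStrategy.chunk with chunk_batch_size=4096. The sketch below illustrates one plausible reading of chunk sharding: rows are split into fixed-size chunks, and the chunks are dealt round-robin to data loader workers. chunk_ranges is hypothetical, and the round-robin assignment is an assumption rather than json2vec's confirmed behavior.

```python
def chunk_ranges(n_rows: int, chunk_batch_size: int,
                 worker_id: int, num_workers: int) -> list[range]:
    """Split row indices into chunks and deal them round-robin to workers."""
    starts = range(0, n_rows, chunk_batch_size)
    chunks = [range(s, min(s + chunk_batch_size, n_rows)) for s in starts]
    return chunks[worker_id::num_workers]


print(chunk_ranges(10, 4, worker_id=0, num_workers=2))  # [range(0, 4), range(8, 10)]
print(chunk_ranges(10, 4, worker_id=1, num_workers=2))  # [range(4, 8)]
```

Because every worker takes a disjoint slice of the same chunk list, each row is read exactly once per pass.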
StreamingDataModule
StreamingDataModule(
model: Model,
root: str | Path,
suffix: Suffix | str,
train: PatternInput | None = None,
validate: PatternInput | None = None,
test: PatternInput | None = None,
predict: PatternInput | None = None,
preprocessor: str
| Callable[..., Any]
| Preprocessor
| None = None,
num_workers: NonNegativeInt
| None
| StrataMap[NonNegativeInt | None] = None,
persistent_workers: bool | StrataMap[bool] = True,
pin_memory: bool | StrataMap[bool] = True,
sharding: ShardingStrategy
| str
| StrataMap[
ShardingStrategy | str
] = ShardingStrategy.file,
chunk_batch_size: PositiveInt
| StrataMap[PositiveInt] = 4096,
file_buffer_size: PositiveInt
| StrataMap[PositiveInt] = 1,
observation_buffer_size: PositiveInt
| StrataMap[PositiveInt] = 1,
sample_rate: SampleRate | StrataMap[SampleRate] = 1.0,
replacement: bool | StrataMap[bool] | None = None,
**kwargs: Any,
)
Bases: LightningDataModule
Lightning data module for streaming records from files.
Reads file-backed records, applies an optional preprocessor, batches observations, and encodes them with model hyperparameters.
Source code in src/json2vec/data/datasets/streaming.py
train_dataloader
class-attribute
instance-attribute
val_dataloader
class-attribute
instance-attribute
test_dataloader
class-attribute
instance-attribute
predict_dataloader
class-attribute
instance-attribute
dataloader
Source code in src/json2vec/data/datasets/streaming.py
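StreamingDataModule defaults to ShardingStrategy.file, so each worker presumably reads whole files rather than row chunks. The sketch below shows file-level sharding under that assumption. It finds files under root with a given suffix and gives each worker every num_workers-th file in sorted order. shard_files and the ".jsonl" suffix are hypothetical, and the real module also supports S3-backed roots, which this sketch does not.

```python
import tempfile
from pathlib import Path


def shard_files(root: str, suffix: str, worker_id: int, num_workers: int) -> list[Path]:
    """Give each worker every num_workers-th matching file, in sorted order."""
    files = sorted(Path(root).glob(f"*{suffix}"))
    return files[worker_id::num_workers]


with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.jsonl", "b.jsonl", "c.jsonl", "notes.txt"]:
        (Path(tmp) / name).write_text("{}\n")
    print([p.name for p in shard_files(tmp, ".jsonl", 0, 2)])  # ['a.jsonl', 'c.jsonl']
```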
Batch Inference
Use Batch Inference for the workflow-level guide
to Trainer.predict(...), Writer, and postprocessed Parquet output.
Writer
Writer(
path: PathLike | str,
flush_every_n_batches: int | None = None,
postprocessor: Postprocessor | None = None,
)
Bases: BasePredictionWriter
Source code in src/json2vec/inference/callback.py
write_on_batch_end
write_on_batch_end(
trainer: Trainer,
pl_module: Model,
output: dict[str, list[Prediction]],
batch_indices: list[int] | None,
batch: TensorDict[Address, TensorFieldBase],
batch_idx: int,
dataloader_idx: int,
) -> None
Source code in src/json2vec/inference/callback.py
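Writer takes flush_every_n_batches: int | None. The sketch below shows the buffering pattern that such an option usually controls: rows accumulate per batch and are flushed every n batches, with a final flush at the end. BufferedSink is hypothetical, and it appends to an in-memory list instead of writing Parquet. Treating None as "flush only at the end" is an assumption.

```python
from typing import Any, Optional


class BufferedSink:
    """Hypothetical sink: buffer per-batch rows and flush every n batches."""

    def __init__(self, flush_every_n_batches: Optional[int] = None) -> None:
        self.flush_every_n_batches = flush_every_n_batches
        self.buffer: list[dict[str, Any]] = []
        self.flushed: list[list[dict[str, Any]]] = []  # stands in for written files
        self.batches_seen = 0

    def write_batch(self, rows: list[dict[str, Any]]) -> None:
        self.buffer.extend(rows)
        self.batches_seen += 1
        n = self.flush_every_n_batches
        if n is not None and self.batches_seen % n == 0:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flushed.append(self.buffer)
            self.buffer = []


sink = BufferedSink(flush_every_n_batches=2)
for batch in ([{"id": 1}], [{"id": 2}], [{"id": 3}]):
    sink.write_batch(batch)
sink.flush()  # final flush at the end of prediction
print(sink.flushed)  # [[{'id': 1}, {'id': 2}], [{'id': 3}]]
```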
Preprocessing
preprocess
preprocess(
func: Callable[..., Any] | None = None,
*,
yields: bool | None = None,
**kwargs: Any,
) -> Callable[..., Any]
Register a callable as a json2vec preprocessor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| func | Callable[..., Any] \| None | Callable to register when used as | None |
| yields | bool \| None | Set to | None |
| **kwargs | Any | Reserved for validation of unsupported decorator arguments. | {} |
Returns:
| Type | Description |
|---|---|
| Callable[..., Any] | The original callable, after registering it in |
Example
Source code in src/json2vec/preprocessors/base.py
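The preprocess decorator can be used bare or with arguments, and it returns the original callable after registering it. The sketch below reproduces that decorator shape with a hypothetical REGISTRY. It is not json2vec's code. Inferring yields with inspect.isgeneratorfunction when yields is None is an assumption. An explicit yields=True would then cover functions that return a list of observations instead of yielding them.

```python
import inspect
from typing import Any, Callable, Optional

REGISTRY: dict[str, dict[str, Any]] = {}  # hypothetical registry


def preprocess(func: Optional[Callable[..., Any]] = None, *,
               yields: Optional[bool] = None):
    """Register func, inferring generator behavior when yields is None."""
    def register(f: Callable[..., Any]) -> Callable[..., Any]:
        is_generator = inspect.isgeneratorfunction(f) if yields is None else yields
        REGISTRY[f.__name__] = {"func": f, "yields": is_generator}
        return f  # the original callable is returned unchanged

    # Support both @preprocess and @preprocess(yields=...).
    return register(func) if func is not None else register


@preprocess
def lowercase(obs: dict[str, Any]) -> dict[str, Any]:
    return {k: v.lower() if isinstance(v, str) else v for k, v in obs.items()}


@preprocess
def explode(obs):
    for item in obs["items"]:
        yield {"sku": item}


@preprocess(yields=True)  # returns a list, so generator behavior is declared
def split_words(obs):
    return [{"word": w} for w in obs["text"].split()]


print(REGISTRY["lowercase"]["yields"], REGISTRY["explode"]["yields"])  # False True
```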
Preprocessor
Bases: BaseModel
Registered observation preprocessor.
A transformation preprocessor returns one dict. A generator preprocessor yields or returns multiple dict objects, each of which becomes a processed observation.
outputs
Yield normalized processed observations for one raw observation.
Source code in src/json2vec/preprocessors/base.py
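Preprocessor.outputs normalizes both preprocessor styles into a stream of processed observations. A transformation returns one dict, and a generator preprocessor yields or returns several. The sketch below shows that normalization. outputs here is a hypothetical free function that takes an explicit yields flag, and the type checks are illustrative rather than json2vec's exact validation.

```python
from collections.abc import Iterator
from typing import Any, Callable


def outputs(func: Callable[[dict[str, Any]], Any],
            observation: dict[str, Any], yields: bool) -> Iterator[dict[str, Any]]:
    """Normalize one or many results into a stream of processed observations."""
    result = func(observation)
    if not yields:
        if not isinstance(result, dict):
            raise TypeError("transformation preprocessors must return one dict")
        yield result
        return
    for item in result:  # works for generators and returned lists alike
        if not isinstance(item, dict):
            raise TypeError("generator preprocessors must produce dicts")
        yield item


def split(obs):
    return [{"part": p} for p in obs["text"].split()]


print(list(outputs(split, {"text": "a b"}, yields=True)))  # [{'part': 'a'}, {'part': 'b'}]
print(list(outputs(lambda o: {**o, "n": 1}, {"x": 0}, yields=False)))  # [{'x': 0, 'n': 1}]
```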
Serving
Deployment
Bases: BaseSettings
Serving configuration for a json2vec checkpoint or model instance.
Deployment collects request/response schemas, optional preprocessors and
postprocessors, and queued update(...) mutations before FastAPI
application startup loads the model.
forge
forge(
request: type[BaseModel] | None = None,
response: type[BaseModel] | None = None,
) -> Deployment
Attach optional Pydantic request and response signatures.
Source code in src/json2vec/inference/deployment.py
preprocess
Attach an optional request preprocessor.
If this method is not called, request objects are encoded unchanged.
Source code in src/json2vec/inference/deployment.py
postprocess
Attach an optional response postprocessor.
Source code in src/json2vec/inference/deployment.py
update
update(
*predicates: NodePredicate
| NodeAttribute
| Callable[[Node], bool],
strict: bool = True,
allow_extra: bool = False,
include_root: bool = True,
validate: bool = True,
**values: Any,
) -> Deployment
Queue a model schema mutation to apply during server startup.
This mirrors Model.update(...) and is useful for serving-time changes
such as target=False.
Source code in src/json2vec/inference/deployment.py
serve
Start the FastAPI server for the configured checkpoint or model.
Source code in src/json2vec/inference/deployment.py
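Deployment.update(...) returns the Deployment and defers the mutation until server startup. The sketch below shows that queue-then-replay pattern. Each queued entry is a (predicates, values) pair, which mirrors the shape of the UpdateOperation alias. ToyDeployment operates on plain dicts and is hypothetical. Its p_prune=0.0 update stands in for the documented target=False shorthand.

```python
from typing import Any, Callable

Predicate = Callable[[dict[str, Any]], bool]


class ToyDeployment:
    """Hypothetical builder: queue schema mutations, replay them at startup."""

    def __init__(self) -> None:
        self._updates: list[tuple[tuple[Predicate, ...], dict[str, Any]]] = []

    def update(self, *predicates: Predicate, **values: Any) -> "ToyDeployment":
        self._updates.append((predicates, values))
        return self  # returning self allows chained configuration

    def startup(self, nodes: list[dict[str, Any]]) -> list[dict[str, Any]]:
        """Apply queued updates in order, as a server would at startup."""
        for predicates, values in self._updates:
            for node in nodes:
                if all(p(node) for p in predicates):
                    node.update(values)
        return nodes


nodes = [{"name": "price", "p_prune": 1.0}, {"name": "color", "p_prune": 0.0}]
deployment = ToyDeployment().update(lambda n: n["name"] == "price", p_prune=0.0)
print(deployment.startup(nodes))
```

Queuing instead of mutating immediately means the configuration can be declared before the checkpoint is loaded, so nothing is touched until the serving process actually starts.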
Tensorfield Extension API
Plugin
Registry object for a tensorfield implementation.
Register request, tensorfield, embedder, decoder, loss, and write
components with @plugin.register. Creating a plugin with an existing
name replaces the registry entry and emits a warning.
Source code in src/json2vec/tensorfields/base.py
register
Register one tensorfield component with this plugin.
Source code in src/json2vec/tensorfields/base.py
callback
Register one or more Lightning callback factories for this tensorfield.
Source code in src/json2vec/tensorfields/base.py
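The Plugin notes say that creating a plugin with an existing name replaces the registry entry and emits a warning, and that components are registered with @plugin.register. The sketch below shows that replace-with-warning behavior using the standard warnings module. ToyPlugin and PLUGINS are hypothetical. It keys components by __name__, whereas json2vec presumably distinguishes request, tensorfield, embedder, decoder, loss, and write components in some other way.

```python
import warnings
from typing import Any

PLUGINS: dict[str, "ToyPlugin"] = {}  # hypothetical global registry


class ToyPlugin:
    """Name-keyed registry entry; re-creating a name replaces it with a warning."""

    def __init__(self, name: str) -> None:
        if name in PLUGINS:
            warnings.warn(f"replacing existing plugin {name!r}", stacklevel=2)
        self.name = name
        self.components: dict[str, Any] = {}
        PLUGINS[name] = self

    def register(self, component: Any) -> Any:
        """Decorator that records a component under its __name__."""
        self.components[component.__name__] = component
        return component


plugin = ToyPlugin("color_hex")


@plugin.register
class ColorHexRequest:
    pass


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ToyPlugin("color_hex")  # replaces the entry and warns
print(len(caught), list(PLUGINS["color_hex"].components))  # 1 []
```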
TensorFieldBase
Bases: Renderable
Tensorized field values plus trainable target state.
STATE_LABELS
class-attribute
instance-attribute
STATE_STYLES
class-attribute
instance-attribute
STATE_STYLES: dict[int, str] = {
value: "bold green",
value: "bold yellow",
value: "dim",
value: "bold magenta",
value: "bold cyan",
}
new
abstractmethod
classmethod
empty
abstractmethod
classmethod
mask
abstractmethod
target
abstractmethod
EmbedderBase
DecoderBase
Bases: Module
Base class for tensorfield decoders.