Torch¶
- Context
- DataLoader
- Distributors
- Initializers
- Model
- Schedulers
- Utils
- Modules
- Training
- Workflow
formed.integrations.torch.context
¶
Context management for PyTorch operations.
use_device
¶
use_device(device=None)
Context manager to set and restore the default PyTorch device.
This context manager allows temporarily setting the default device
used in PyTorch operations (e.g., in ensure_torch_tensor). It saves
the current device on entry and restores it on exit.
| PARAMETER | DESCRIPTION |
|---|---|
| `device` | Device to use within the context. Can be a `torch.device` or a device string such as `"cuda:0"` or `"cpu"`. |

| YIELDS | DESCRIPTION |
|---|---|
| `device` | The current device within the context. |
Examples:
>>> import torch
>>> from formed.integrations.torch import use_device, ensure_torch_tensor
>>> import numpy as np
>>> with use_device("cuda:0" if torch.cuda.is_available() else "cpu"):
... arr = np.array([1.0, 2.0, 3.0])
... tensor = ensure_torch_tensor(arr)
... print(tensor.device)
cuda:0 # or cpu if CUDA not available
Source code in src/formed/integrations/torch/context.py
get_device
¶
get_device()
Get the current default PyTorch device from context.
| RETURNS | DESCRIPTION |
|---|---|
| `device \| None` | The current device set in the context, or `None` if no device has been set. |
Examples:
>>> from formed.integrations.torch import use_device, get_device
>>> with use_device("cuda:0"):
... print(get_device())
cuda:0
Source code in src/formed/integrations/torch/context.py
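The save/restore behavior of `use_device` and `get_device` can be sketched with a `contextvars`-based reimplementation. This is an illustrative sketch only — the names mirror the API above, but it is not the library's actual code:

```python
import contextvars
from contextlib import contextmanager

# Context variable holding the current default device (illustrative).
_DEVICE: contextvars.ContextVar = contextvars.ContextVar("torch_device", default=None)

@contextmanager
def use_device(device=None):
    # Save the current device on entry and restore it on exit.
    token = _DEVICE.set(device)
    try:
        yield device
    finally:
        _DEVICE.reset(token)

def get_device():
    # Return the device set in the current context, or None.
    return _DEVICE.get()

with use_device("cuda:0"):
    print(get_device())  # cuda:0
print(get_device())      # None
```

Using `contextvars` (rather than a module-level global) keeps the setting correct under threads and async code, which is the usual motivation for this pattern.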
formed.integrations.torch.dataloader
¶
DataLoader utilities for PyTorch training.
This module provides convenient wrappers for creating PyTorch DataLoaders that work seamlessly with the formed training framework.
Examples:
>>> from formed.integrations.torch import DataLoader
>>>
>>> # Create a simple dataloader
>>> train_loader = DataLoader(
... batch_size=32,
... shuffle=True,
... collate_fn=my_collate_fn
... )
>>>
>>> # Use with trainer
>>> trainer = TorchTrainer(
... train_dataloader=train_loader,
... ...
... )
DataLoader
¶
DataLoader(
batch_size,
shuffle=False,
collate_fn=None,
num_workers=0,
drop_last=False,
pin_memory=False,
**kwargs,
)
Simple DataLoader wrapper for PyTorch training.
This class wraps PyTorch's DataLoader with a simpler interface that works with the formed training framework.
| PARAMETER | DESCRIPTION |
|---|---|
| `batch_size` | Number of samples per batch. |
| `shuffle` | Whether to shuffle the data at every epoch. Default: `False`. |
| `collate_fn` | Function to collate samples into batches. Default: `None`. |
| `num_workers` | Number of subprocesses for data loading. Default: `0`. |
| `drop_last` | Whether to drop the last incomplete batch. Default: `False`. |
| `pin_memory` | If `True`, tensors are copied to CUDA pinned memory. Default: `False`. |
| `**kwargs` | Additional arguments passed to `torch.utils.data.DataLoader`. |
Examples:
>>> def collate_fn(batch):
... # Convert list of samples to batch tensors
... return {"features": torch.stack([x["features"] for x in batch])}
>>>
>>> loader = DataLoader(
... batch_size=32,
... shuffle=True,
... collate_fn=collate_fn
... )
Source code in src/formed/integrations/torch/dataloader.py
formed.integrations.torch.distributors
¶
Distributed computing abstractions for PyTorch models.
This module provides abstractions for distributed training across multiple devices, supporting both single-device and data-parallel training strategies.
Key Components
- BaseDistributor: Abstract interface for device distribution strategies
- SingleDeviceDistributor: No-op distributor for single-device training
- DataParallelDistributor: Data-parallel training using torch.nn.DataParallel
Features
- Transparent device sharding and replication
- Reduction operations (mean, sum) across devices
- Compatible with TorchTrainer
Examples:
>>> from formed.integrations.torch import DataParallelDistributor
>>> import torch
>>>
>>> # Create data-parallel distributor for all available GPUs
>>> distributor = DataParallelDistributor()
>>>
>>> # Shard batch across devices
>>> sharded_batch = distributor.shard(batch)
BaseDistributor
¶
Bases: Registrable, ABC, Generic[ModelInputT]
Abstract base class for device distribution strategies.
BaseDistributor defines the interface for distributing computations across devices in a PyTorch training pipeline. It provides a unified API for single-device, data-parallel, and distributed data-parallel training.
| CLASS TYPE PARAMETER | DESCRIPTION |
|---|---|
| `ModelInputT` | Type of model input data. |
Key Methods
- device: Primary device for computation
- is_main_process: Whether this is the main process (for logging, saving, etc.)
- wrap_model: Wrap model for distributed training
- prepare_data_loader: Prepare data loader with appropriate sampler
- reduce: Reduce tensor across devices/processes
- barrier: Synchronize all processes
- all_gather: Gather tensors from all processes
is_main_process
property
¶
is_main_process
Whether this is the main process.
The main process is responsible for:

- Logging to console
- Saving models and checkpoints
- Writing metrics to file

| RETURNS | DESCRIPTION |
|---|---|
| `bool` | True if this is the main process (rank 0), False otherwise. |
world_size
property
¶
world_size
Total number of processes/devices.
| RETURNS | DESCRIPTION |
|---|---|
| `int` | Number of processes in distributed training, or 1 for single device. |
rank
property
¶
rank
Global rank of this process.
| RETURNS | DESCRIPTION |
|---|---|
| `int` | Rank of this process (0 for main process). |
wrap_model
¶
wrap_model(model)
Wrap model for distributed training.
| PARAMETER | DESCRIPTION |
|---|---|
| `model` | Model to wrap. |

| RETURNS | DESCRIPTION |
|---|---|
| `Module` | Wrapped model (DataParallel, DDP, or unchanged). |
Source code in src/formed/integrations/torch/distributors.py
prepare_data_loader
¶
prepare_data_loader(
dataset,
batch_size,
shuffle=False,
num_workers=0,
drop_last=False,
**kwargs,
)
Prepare data loader with appropriate sampler for this distributor.
- Single device: uses the default sampler
- DataParallel: uses the default sampler (data is split in the forward pass)
- DDP: uses DistributedSampler to split data across processes

| PARAMETER | DESCRIPTION |
|---|---|
| `dataset` | Dataset to load. |
| `batch_size` | Batch size per device/process. |
| `shuffle` | Whether to shuffle data. Default: `False`. |
| `num_workers` | Number of worker processes. Default: `0`. |
| `drop_last` | Whether to drop the last incomplete batch. Default: `False`. |
| `**kwargs` | Additional arguments for DataLoader. |

| RETURNS | DESCRIPTION |
|---|---|
| `DataLoader` | Configured DataLoader. |
Source code in src/formed/integrations/torch/distributors.py
reduce
abstractmethod
¶
reduce(tensor, op='mean')
Reduce a tensor across devices/processes.
| PARAMETER | DESCRIPTION |
|---|---|
| `tensor` | Tensor to reduce. |
| `op` | Reduction operation (`"mean"` or `"sum"`). Default: `"mean"`. |

| RETURNS | DESCRIPTION |
|---|---|
| `_TensorT` | Reduced tensor. |
Source code in src/formed/integrations/torch/distributors.py
barrier
¶
barrier()
Synchronize all processes.
This is a no-op for single device and DataParallel. For DDP, it blocks until all processes reach this point.
Source code in src/formed/integrations/torch/distributors.py
all_gather
¶
all_gather(tensor)
Gather tensors from all processes.
| PARAMETER | DESCRIPTION |
|---|---|
| `tensor` | Tensor to gather. |

| RETURNS | DESCRIPTION |
|---|---|
| `list[Tensor]` | List of tensors from all processes. For single device/DataParallel, returns `[tensor]`. |
Source code in src/formed/integrations/torch/distributors.py
cleanup
¶
cleanup()
Cleanup resources (e.g., distributed process group).
This is a no-op for single device and DataParallel. For DDP, destroys the process group.
Source code in src/formed/integrations/torch/distributors.py
SingleDeviceDistributor
¶
SingleDeviceDistributor(device=None)
Bases: BaseDistributor[ModelInputT]
Distributor for single-device training.
This distributor operates on a single device without any distribution. All shard, replicate, and unreplicate operations are no-ops.
| PARAMETER | DESCRIPTION |
|---|---|
| `device` | Device to use. Default: `None`. |
Examples:
>>> distributor = SingleDeviceDistributor(device="cuda:0")
>>> model = model.to(distributor.device)
Source code in src/formed/integrations/torch/distributors.py
is_main_process
property
¶
is_main_process
Whether this is the main process.
The main process is responsible for:

- Logging to console
- Saving models and checkpoints
- Writing metrics to file

| RETURNS | DESCRIPTION |
|---|---|
| `bool` | True if this is the main process (rank 0), False otherwise. |
world_size
property
¶
world_size
Total number of processes/devices.
| RETURNS | DESCRIPTION |
|---|---|
| `int` | Number of processes in distributed training, or 1 for single device. |
rank
property
¶
rank
Global rank of this process.
| RETURNS | DESCRIPTION |
|---|---|
| `int` | Rank of this process (0 for main process). |
reduce
¶
reduce(tensor, op='mean')
Return tensor unchanged (no reduction needed for single device).
| PARAMETER | DESCRIPTION |
|---|---|
| `tensor` | Input tensor. |
| `op` | Reduction operation (ignored). Default: `"mean"`. |

| RETURNS | DESCRIPTION |
|---|---|
| `_TensorT` | Input tensor unchanged. |
Source code in src/formed/integrations/torch/distributors.py
wrap_model
¶
wrap_model(model)
Wrap model for distributed training.
| PARAMETER | DESCRIPTION |
|---|---|
| `model` | Model to wrap. |

| RETURNS | DESCRIPTION |
|---|---|
| `Module` | Wrapped model (DataParallel, DDP, or unchanged). |
Source code in src/formed/integrations/torch/distributors.py
prepare_data_loader
¶
prepare_data_loader(
dataset,
batch_size,
shuffle=False,
num_workers=0,
drop_last=False,
**kwargs,
)
Prepare data loader with appropriate sampler for this distributor.
- Single device: uses the default sampler
- DataParallel: uses the default sampler (data is split in the forward pass)
- DDP: uses DistributedSampler to split data across processes

| PARAMETER | DESCRIPTION |
|---|---|
| `dataset` | Dataset to load. |
| `batch_size` | Batch size per device/process. |
| `shuffle` | Whether to shuffle data. Default: `False`. |
| `num_workers` | Number of worker processes. Default: `0`. |
| `drop_last` | Whether to drop the last incomplete batch. Default: `False`. |
| `**kwargs` | Additional arguments for DataLoader. |

| RETURNS | DESCRIPTION |
|---|---|
| `DataLoader` | Configured DataLoader. |
Source code in src/formed/integrations/torch/distributors.py
barrier
¶
barrier()
Synchronize all processes.
This is a no-op for single device and DataParallel. For DDP, it blocks until all processes reach this point.
Source code in src/formed/integrations/torch/distributors.py
all_gather
¶
all_gather(tensor)
Gather tensors from all processes.
| PARAMETER | DESCRIPTION |
|---|---|
| `tensor` | Tensor to gather. |

| RETURNS | DESCRIPTION |
|---|---|
| `list[Tensor]` | List of tensors from all processes. For single device/DataParallel, returns `[tensor]`. |
Source code in src/formed/integrations/torch/distributors.py
cleanup
¶
cleanup()
Cleanup resources (e.g., distributed process group).
This is a no-op for single device and DataParallel. For DDP, destroys the process group.
Source code in src/formed/integrations/torch/distributors.py
DataParallelDistributor
¶
DataParallelDistributor(
device_ids=None, output_device=None
)
Bases: BaseDistributor[ModelInputT]
Distributor for data-parallel training across multiple GPUs.
This distributor uses torch.nn.DataParallel to execute the same computation
on different data shards across multiple GPUs. Data is automatically
sharded along the batch dimension.
| PARAMETER | DESCRIPTION |
|---|---|
| `device_ids` | List of GPU device IDs to use. Defaults to all available GPUs. |
| `output_device` | Device for outputs. Defaults to `device_ids[0]`. |
Examples:
>>> # Train on GPUs 0 and 1 with data parallelism
>>> distributor = DataParallelDistributor(device_ids=[0, 1])
>>>
>>> # Wrap model for data parallel training
>>> model = distributor.wrap_model(model)
Note
Batch size must be divisible by the number of devices for proper sharding.
Source code in src/formed/integrations/torch/distributors.py
is_main_process
property
¶
is_main_process
Whether this is the main process.
The main process is responsible for:

- Logging to console
- Saving models and checkpoints
- Writing metrics to file

| RETURNS | DESCRIPTION |
|---|---|
| `bool` | True if this is the main process (rank 0), False otherwise. |
world_size
property
¶
world_size
Total number of processes/devices.
| RETURNS | DESCRIPTION |
|---|---|
| `int` | Number of processes in distributed training, or 1 for single device. |
rank
property
¶
rank
Global rank of this process.
| RETURNS | DESCRIPTION |
|---|---|
| `int` | Rank of this process (0 for main process). |
wrap_model
¶
wrap_model(model)
Wrap model with DataParallel.
| PARAMETER | DESCRIPTION |
|---|---|
| `model` | Model to wrap. |

| RETURNS | DESCRIPTION |
|---|---|
| `Module` | Model wrapped with `torch.nn.DataParallel`. |
Source code in src/formed/integrations/torch/distributors.py
reduce
¶
reduce(tensor, op='mean')
Reduce tensor across devices.
| PARAMETER | DESCRIPTION |
|---|---|
| `tensor` | Tensor to reduce across the device dimension. |
| `op` | Reduction operation: `"mean"` or `"sum"`. Default: `"mean"`. |

| RETURNS | DESCRIPTION |
|---|---|
| `_TensorT` | Reduced tensor. |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If an unsupported reduction operation is specified. |
Source code in src/formed/integrations/torch/distributors.py
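The reduction semantics can be sketched in plain Python: per-device values come in, a single value comes out. This is a hedged illustration of the behavior described above, not the library's implementation:

```python
def reduce_across_devices(values, op="mean"):
    """Collapse a list of per-device values into a single value."""
    if op == "sum":
        return sum(values)
    if op == "mean":
        return sum(values) / len(values)
    # Mirror the documented behavior: unsupported ops raise ValueError.
    raise ValueError(f"Unsupported reduction op: {op}")

print(reduce_across_devices([1.0, 3.0], op="mean"))  # 2.0
```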
prepare_data_loader
¶
prepare_data_loader(
dataset,
batch_size,
shuffle=False,
num_workers=0,
drop_last=False,
**kwargs,
)
Prepare data loader with appropriate sampler for this distributor.
- Single device: uses the default sampler
- DataParallel: uses the default sampler (data is split in the forward pass)
- DDP: uses DistributedSampler to split data across processes

| PARAMETER | DESCRIPTION |
|---|---|
| `dataset` | Dataset to load. |
| `batch_size` | Batch size per device/process. |
| `shuffle` | Whether to shuffle data. Default: `False`. |
| `num_workers` | Number of worker processes. Default: `0`. |
| `drop_last` | Whether to drop the last incomplete batch. Default: `False`. |
| `**kwargs` | Additional arguments for DataLoader. |

| RETURNS | DESCRIPTION |
|---|---|
| `DataLoader` | Configured DataLoader. |
Source code in src/formed/integrations/torch/distributors.py
barrier
¶
barrier()
Synchronize all processes.
This is a no-op for single device and DataParallel. For DDP, it blocks until all processes reach this point.
Source code in src/formed/integrations/torch/distributors.py
all_gather
¶
all_gather(tensor)
Gather tensors from all processes.
| PARAMETER | DESCRIPTION |
|---|---|
| `tensor` | Tensor to gather. |

| RETURNS | DESCRIPTION |
|---|---|
| `list[Tensor]` | List of tensors from all processes. For single device/DataParallel, returns `[tensor]`. |
Source code in src/formed/integrations/torch/distributors.py
cleanup
¶
cleanup()
Cleanup resources (e.g., distributed process group).
This is a no-op for single device and DataParallel. For DDP, destroys the process group.
Source code in src/formed/integrations/torch/distributors.py
DistributedDataParallelDistributor
¶
DistributedDataParallelDistributor(
backend=None,
init_method="env://",
world_size=None,
rank=None,
local_rank=None,
find_unused_parameters=False,
broadcast_buffers=True,
bucket_cap_mb=25,
)
Bases: BaseDistributor[ModelInputT]
Distributor for distributed data-parallel training using DDP.
This distributor uses torch.nn.parallel.DistributedDataParallel to execute training across multiple processes and devices. This is more efficient than DataParallel for multi-GPU training as it uses one process per GPU.
| PARAMETER | DESCRIPTION |
|---|---|
| `backend` | Backend to use for distributed training (e.g., `"nccl"` for GPU, `"gloo"` for CPU). Default: `None`. |
| `init_method` | URL specifying how to initialize the process group. Defaults to `"env://"`. |
| `world_size` | Total number of processes. If `None`, reads from the environment. |
| `rank` | Rank of this process. If `None`, reads from the environment. |
| `local_rank` | Local rank on this machine. If `None`, reads from the environment. |
| `find_unused_parameters` | Whether to find unused parameters. Default: `False`. |
| `broadcast_buffers` | Whether to broadcast buffers. Default: `True`. |
| `bucket_cap_mb` | Bucket size in MB for gradient allreduce. Default: `25`. |
Environment Variables
- RANK: Global rank of the process
- LOCAL_RANK: Local rank on the machine
- WORLD_SIZE: Total number of processes
- MASTER_ADDR: Address of the master node
- MASTER_PORT: Port of the master node
Examples:
>>> # On each process, initialize the distributor
>>> distributor = DistributedDataParallelDistributor(
... backend="nccl",
... init_method="env://"
... )
>>>
>>> # Wrap model with DDP
>>> model = distributor.wrap_model(model)
>>>
>>> # Train as usual - gradients are automatically synchronized
Note
- Requires launching multiple processes (e.g., using torch.distributed.launch)
- Each process should initialize its own distributor
- Batch size should be the per-process batch size
Source code in src/formed/integrations/torch/distributors.py
wrap_model
¶
wrap_model(model)
Wrap model with DistributedDataParallel.
| PARAMETER | DESCRIPTION |
|---|---|
| `model` | Model to wrap. |

| RETURNS | DESCRIPTION |
|---|---|
| `Module` | DDP-wrapped model. |
Source code in src/formed/integrations/torch/distributors.py
prepare_data_loader
¶
prepare_data_loader(
dataset,
batch_size,
shuffle=False,
num_workers=0,
drop_last=False,
**kwargs,
)
Prepare data loader with DistributedSampler for DDP.
| PARAMETER | DESCRIPTION |
|---|---|
| `dataset` | Dataset to load. |
| `batch_size` | Batch size per process. |
| `shuffle` | Whether to shuffle data. Default: `False`. |
| `num_workers` | Number of worker processes. Default: `0`. |
| `drop_last` | Whether to drop the last incomplete batch. Default: `False`. |
| `**kwargs` | Additional arguments for DataLoader. |

| RETURNS | DESCRIPTION |
|---|---|
| `DataLoader` | DataLoader with DistributedSampler. |
Source code in src/formed/integrations/torch/distributors.py
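The DistributedSampler-style split can be pictured as each rank taking every `world_size`-th index of the dataset. This is a simplified sketch of the idea; the real sampler also handles shuffling, epoch seeding, and padding so every rank gets the same number of samples:

```python
def shard_indices(dataset_len, world_size, rank):
    # Each process sees a disjoint, strided slice of the dataset.
    return list(range(rank, dataset_len, world_size))

print(shard_indices(8, 2, 0))  # [0, 2, 4, 6]
print(shard_indices(8, 2, 1))  # [1, 3, 5, 7]
```

Because the shards are disjoint, each rank trains on different data and gradients are averaged across processes by DDP.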
reduce
¶
reduce(tensor, op='mean')
Reduce tensor across all processes.
| PARAMETER | DESCRIPTION |
|---|---|
| `tensor` | Tensor to reduce. |
| `op` | Reduction operation: `"mean"` or `"sum"`. Default: `"mean"`. |

| RETURNS | DESCRIPTION |
|---|---|
| `_TensorT` | Reduced tensor. |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If an unsupported reduction operation is specified. |
Source code in src/formed/integrations/torch/distributors.py
all_gather
¶
all_gather(tensor)
Gather tensors from all processes.
| PARAMETER | DESCRIPTION |
|---|---|
| `tensor` | Tensor to gather. |

| RETURNS | DESCRIPTION |
|---|---|
| `list[Tensor]` | List of tensors from all processes. |
Source code in src/formed/integrations/torch/distributors.py
barrier
¶
barrier()
Synchronize all processes.
This creates a barrier that blocks until all processes reach this point.
Source code in src/formed/integrations/torch/distributors.py
cleanup
¶
cleanup()
Cleanup distributed process group.
This should be called at the end of training.
Source code in src/formed/integrations/torch/distributors.py
formed.integrations.torch.initializers
¶
BaseTensorInitializer
¶
Bases: Registrable
UniformTensorInitializer
¶
UniformTensorInitializer(shape, low=0.0, high=1.0)
Bases: BaseTensorInitializer
Source code in src/formed/integrations/torch/initializers.py
NormalTensorInitializer
¶
NormalTensorInitializer(shape, mean=0.0, std=1.0)
Bases: BaseTensorInitializer
Source code in src/formed/integrations/torch/initializers.py
XavierUniformTensorInitializer
¶
XavierUniformTensorInitializer(shape, gain=1.0)
Bases: BaseTensorInitializer
Source code in src/formed/integrations/torch/initializers.py
XavierNormalTensorInitializer
¶
XavierNormalTensorInitializer(shape, gain=1.0)
Bases: BaseTensorInitializer
Source code in src/formed/integrations/torch/initializers.py
KaimingUniformTensorInitializer
¶
KaimingUniformTensorInitializer(
shape, a=0, mode="fan_in", nonlinearity="leaky_relu"
)
Bases: BaseTensorInitializer
Source code in src/formed/integrations/torch/initializers.py
KaimingNormalTensorInitializer
¶
KaimingNormalTensorInitializer(
shape, a=0, mode="fan_in", nonlinearity="leaky_relu"
)
Bases: BaseTensorInitializer
Source code in src/formed/integrations/torch/initializers.py
OrthogonalTensorInitializer
¶
OrthogonalTensorInitializer(shape, gain=1.0)
Bases: BaseTensorInitializer
Source code in src/formed/integrations/torch/initializers.py
SparseTensorInitializer
¶
SparseTensorInitializer(shape, sparsity=0.1, std=0.01)
Bases: BaseTensorInitializer
Source code in src/formed/integrations/torch/initializers.py
ZerosTensorInitializer
¶
ZerosTensorInitializer(shape)
Bases: BaseTensorInitializer
Source code in src/formed/integrations/torch/initializers.py
OnesTensorInitializer
¶
OnesTensorInitializer(shape)
Bases: BaseTensorInitializer
Source code in src/formed/integrations/torch/initializers.py
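These initializers wrap standard schemes from the literature. For instance, Xavier/Glorot uniform initialization draws from U(-a, a) with a = gain · sqrt(6 / (fan_in + fan_out)). A quick sketch of that bound computation (the standard formula, not library code):

```python
import math

def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    # Bound 'a' of the U(-a, a) distribution used by Xavier/Glorot
    # uniform initialization.
    return gain * math.sqrt(6.0 / (fan_in + fan_out))

print(round(xavier_uniform_bound(10, 10), 4))  # 0.5477
```

This choice keeps activation variance roughly constant across layers, which is the motivation shared by the Xavier and Kaiming families above.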
formed.integrations.torch.model
¶
Base model abstraction for PyTorch models.
This module provides the base class for all PyTorch models in the framework, integrating torch.nn.Module with the registrable pattern for configuration-based model instantiation.
Key Features
- Integration with PyTorch Module system
- Registrable pattern for configuration-based instantiation
- Generic type support for inputs, outputs, and parameters
- Compatible with TorchTrainer for end-to-end training
Examples:
>>> from formed.integrations.torch import BaseTorchModel
>>> import torch
>>> import torch.nn as nn
>>>
>>> @BaseTorchModel.register("my_model")
... class MyModel(BaseTorchModel[dict, torch.Tensor, None]):
... def __init__(self, hidden_dim: int):
... super().__init__()
... self.linear = nn.Linear(10, hidden_dim)
...
... def forward(self, inputs: dict, params: None = None) -> torch.Tensor:
... return self.linear(inputs["features"])
BaseTorchModel
¶
Bases: Module, Registrable, Generic[ModelInputT, ModelOutputT, ModelParamsT]
Base class for all PyTorch models in the framework.
This class combines PyTorch's nn.Module with the registrable pattern, allowing models to be instantiated from configuration files and seamlessly integrated with the training infrastructure.
| CLASS TYPE PARAMETER | DESCRIPTION |
|---|---|
| `ModelInputT` | Type of input data to the model. |
| `ModelOutputT` | Type of model output. |
| `ModelParamsT` | Type of additional parameters (typically None or a dataclass). |
Note
Subclasses should implement forward() to define the forward pass.
Models are automatically compatible with TorchTrainer when registered.
forward
¶
forward(inputs, params=None)
Forward pass of the model.
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | Input data to the model. |
| `params` | Optional additional parameters for the forward pass. Default: `None`. |

| RETURNS | DESCRIPTION |
|---|---|
| `ModelOutputT` | Model output. |

| RAISES | DESCRIPTION |
|---|---|
| `NotImplementedError` | This method must be implemented by subclasses. |
Source code in src/formed/integrations/torch/model.py
formed.integrations.torch.schedulers
¶
Learning rate schedulers for PyTorch models.
This module provides custom learning rate schedulers that extend PyTorch's standard scheduler functionality, including cosine annealing with warm restarts and warmup phases.
Available Schedulers
CosineLRScheduler: Cosine annealing with optional restarts and warmup
Features
- Cosine decay with configurable cycle length
- Warm restarts with cycle multiplier
- Learning rate warmup phase
- Cycle-based decay multiplier
- Compatible with Colt registration system
Examples:
>>> from formed.integrations.torch.schedulers import CosineLRScheduler
>>>
>>> scheduler = CosineLRScheduler(
... optimizer,
... t_initial=100,
... lr_min=1e-6,
... warmup_t=5,
... warmup_lr_init=1e-5
... )
>>> for epoch in range(num_epochs):
... train(...)
... scheduler.step(epoch + 1)
CosineLRScheduler
¶
CosineLRScheduler(
optimizer,
t_initial,
lr_min=0.0,
cycle_mul=1.0,
cycle_decay=1.0,
cycle_limit=1,
warmup_t=0,
warmup_lr_init=0.0,
warmup_prefix=False,
t_in_epochs=True,
last_epoch=-1,
)
Bases: LRScheduler
Cosine annealing learning rate scheduler with warm restarts.
Implements the SGDR (Stochastic Gradient Descent with Warm Restarts) algorithm described in https://arxiv.org/abs/1608.03983.
This scheduler decreases the learning rate following a cosine curve, optionally restarting the schedule multiple times during training. It also supports a warmup phase at the beginning.
| PARAMETER | DESCRIPTION |
|---|---|
| `optimizer` | Wrapped optimizer. |
| `t_initial` | Number of iterations/epochs for the first cycle. |
| `lr_min` | Minimum learning rate. Default: `0.0`. |
| `cycle_mul` | Multiplier for cycle length after each restart. Default: `1.0`. |
| `cycle_decay` | Decay factor applied to the learning rate at each restart. Default: `1.0`. |
| `cycle_limit` | Maximum number of restart cycles (0 means no limit). Default: `1`. |
| `warmup_t` | Number of warmup iterations/epochs. Default: `0`. |
| `warmup_lr_init` | Initial learning rate during warmup. Default: `0.0`. |
| `warmup_prefix` | If `True`, the warmup phase is treated as a prefix and not counted toward the cycle length. Default: `False`. |
| `t_in_epochs` | If `True`, step values are interpreted as epochs rather than iterations. Default: `True`. |
| `last_epoch` | The index of the last epoch. Default: `-1`. |
Examples:
>>> # Create scheduler with 100 epoch cycles and 5 epoch warmup
>>> scheduler = CosineLRScheduler(
... optimizer,
... t_initial=100,
... lr_min=1e-6,
... cycle_mul=2.0, # Each cycle is 2x longer
... warmup_t=5,
... warmup_lr_init=1e-5
... )
>>>
>>> # Update learning rate each epoch
>>> for epoch in range(num_epochs):
... train_one_epoch(...)
... scheduler.step(epoch + 1)
Source code in src/formed/integrations/torch/schedulers.py
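The warmup-then-cosine shape of this schedule can be sketched with a single-cycle reimplementation. This is illustrative math only, not the library's code; restarts, `cycle_mul`, and `cycle_decay` are omitted:

```python
import math

def cosine_lr(t, base_lr, t_initial, lr_min=0.0, warmup_t=0, warmup_lr_init=0.0):
    """Learning rate at step t for one cycle of warmup + cosine decay."""
    if t < warmup_t:
        # Linear warmup from warmup_lr_init up to base_lr.
        return warmup_lr_init + t * (base_lr - warmup_lr_init) / warmup_t
    # Cosine decay from base_lr down to lr_min over the remaining steps.
    progress = min((t - warmup_t) / max(t_initial - warmup_t, 1), 1.0)
    return lr_min + 0.5 * (base_lr - lr_min) * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(5, 0.1, 100, warmup_t=5, warmup_lr_init=1e-5))  # 0.1 (warmup complete)
```

Warm restarts would reset `t` at each cycle boundary, with the cycle length scaled by `cycle_mul` and the peak learning rate scaled by `cycle_decay`.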
warmup_steps
instance-attribute
¶
warmup_steps = [
((base_lr - warmup_lr_init) / warmup_t)
for base_lr in (base_lrs)
]
get_lr
¶
get_lr()
Compute learning rate at the current step.
| RETURNS | DESCRIPTION |
|---|---|
list[float | Tensor]
|
List of learning rates for each parameter group. |
Source code in src/formed/integrations/torch/schedulers.py
get_cycle_length
¶
get_cycle_length(cycles=0)
Calculate total number of iterations for a given number of cycles.
| PARAMETER | DESCRIPTION |
|---|---|
| `cycles` | Number of cycles (if `0`, uses `cycle_limit`). Default: `0`. |
| RETURNS | DESCRIPTION |
|---|---|
int
|
Total number of iterations. |
Source code in src/formed/integrations/torch/schedulers.py
state_dict
¶
state_dict()
Return the state of the scheduler as a dict.
| RETURNS | DESCRIPTION |
|---|---|
dict[str, Any]
|
Dictionary containing scheduler state. |
Source code in src/formed/integrations/torch/schedulers.py
load_state_dict
¶
load_state_dict(state_dict)
Load the scheduler state.
| PARAMETER | DESCRIPTION |
|---|---|
| `state_dict` | Scheduler state dict. |
Source code in src/formed/integrations/torch/schedulers.py
formed.integrations.torch.utils
¶
Utility functions for PyTorch integration.
PoolingMethod
module-attribute
¶
PoolingMethod = Literal[
"mean", "max", "min", "sum", "first", "last", "hier"
]
set_random_seed
¶
set_random_seed(seed)
Set random seed for reproducibility across torch, numpy, and random.
| PARAMETER | DESCRIPTION |
|---|---|
| `seed` | Random seed value. |
Source code in src/formed/integrations/torch/utils.py
ensure_torch_tensor
¶
ensure_torch_tensor(x, dtype=None, device=None)
Convert array-like objects to PyTorch tensors.
This function converts various array-like objects (numpy arrays, lists, etc.) to PyTorch tensors. If the input is already a tensor, it returns it with the appropriate dtype and device.
The device can be specified explicitly via the device parameter, or it will
be taken from the context set by use_device(). If neither is provided and
the input is not already a tensor, the tensor will be created on CPU.
| PARAMETER | DESCRIPTION |
|---|---|
| `x` | Input data (tensor, numpy array, list, etc.). |
| `dtype` | Optional dtype for the output tensor. |
| `device` | Optional device for the output tensor. If `None`, uses the device from context (set by `use_device()`). |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | PyTorch tensor on the specified device with the specified dtype. |
Examples:
>>> import numpy as np
>>> from formed.integrations.torch import ensure_torch_tensor, use_device
>>> arr = np.array([1, 2, 3])
>>>
>>> # Without context
>>> tensor = ensure_torch_tensor(arr)
>>> tensor.device
device(type='cpu')
>>>
>>> # With context
>>> with use_device("cuda:0"):
... tensor = ensure_torch_tensor(arr)
... print(tensor.device)
cuda:0
Source code in src/formed/integrations/torch/utils.py
move_to_device
¶
move_to_device(inputs, device)
Move tensor inputs to the appropriate device.
This function only moves existing torch.Tensor objects to the target device.
Other types (numpy arrays, primitives, etc.) are left unchanged.
Users should explicitly convert numpy arrays to tensors in their model's
forward method using ensure_torch_tensor().
Source code in src/formed/integrations/torch/utils.py
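The recursive move pattern described above can be sketched as follows. This is an illustrative reimplementation; a stand-in class with a `.to()` method substitutes for `torch.Tensor` so the sketch is self-contained:

```python
def move_to_device(inputs, device):
    # Only tensor-like objects (anything exposing .to) are moved;
    # containers are traversed recursively; everything else (numpy
    # arrays, primitives) passes through unchanged.
    if hasattr(inputs, "to"):
        return inputs.to(device)
    if isinstance(inputs, dict):
        return {k: move_to_device(v, device) for k, v in inputs.items()}
    if isinstance(inputs, (list, tuple)):
        return type(inputs)(move_to_device(v, device) for v in inputs)
    return inputs

class FakeTensor:
    """Stand-in for torch.Tensor with a .to(device) method."""
    def __init__(self, device="cpu"):
        self.device = device
    def to(self, device):
        return FakeTensor(device)

batch = {"x": FakeTensor(), "meta": [1, 2, 3]}
moved = move_to_device(batch, "cuda:0")
print(moved["x"].device, moved["meta"])  # cuda:0 [1, 2, 3]
```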
determine_ndim
¶
determine_ndim(first, *args)
Source code in src/formed/integrations/torch/utils.py
masked_pool
¶
masked_pool(
inputs,
*,
mask=None,
pooling="mean",
normalize=False,
window_size=None,
)
Apply masked pooling over the sequence dimension.
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | Input tensor of shape (batch_size, seq_len, embedding_dim). |
| `mask` | Mask tensor of shape (batch_size, seq_len). |
| `pooling` | Pooling method or sequence of methods. |
| `normalize` | Whether to L2-normalize before pooling. |
| `window_size` | Window size for hierarchical pooling (required if hierarchical pooling is used). |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Pooled tensor. |
Source code in src/formed/integrations/torch/utils.py
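For the common mean-pooling case, the masked computation can be sketched in plain PyTorch (an illustration of the idea, not the library's exact code):

```python
import torch

def masked_mean_pool(inputs: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Mean-pool over the sequence dimension, counting only unmasked positions.

    inputs: (batch_size, seq_len, embedding_dim); mask: (batch_size, seq_len) bool.
    """
    mask = mask.unsqueeze(-1).to(inputs.dtype)   # (B, S, 1)
    summed = (inputs * mask).sum(dim=1)          # (B, D): padded positions contribute 0
    counts = mask.sum(dim=1).clamp(min=1e-13)    # avoid division by zero for empty rows
    return summed / counts

x = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])  # last position is padding
m = torch.tensor([[True, True, False]])
pooled = masked_mean_pool(x, m)  # → tensor([[2., 3.]])
```

Dividing by the mask count, rather than `seq_len`, is what keeps padding from dragging the mean toward zero.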
info_value_of_dtype
¶
info_value_of_dtype(dtype)
Returns the finfo or iinfo object of a given PyTorch data type. Does not allow torch.bool.
Source code in src/formed/integrations/torch/utils.py
min_value_of_dtype
¶
min_value_of_dtype(dtype)
Returns the minimum value of a given PyTorch data type. Does not allow torch.bool.
Source code in src/formed/integrations/torch/utils.py
max_value_of_dtype
¶
max_value_of_dtype(dtype)
Returns the maximum value of a given PyTorch data type. Does not allow torch.bool.
Source code in src/formed/integrations/torch/utils.py
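A plausible sketch of these three helpers (assumed here, not copied from the library) simply dispatches to `torch.finfo` / `torch.iinfo`:

```python
import torch

def info_value_of_dtype(dtype: torch.dtype):
    """Return finfo for floating dtypes, iinfo for integer dtypes."""
    if dtype == torch.bool:
        raise TypeError("Does not support torch.bool")
    if dtype.is_floating_point:
        return torch.finfo(dtype)
    return torch.iinfo(dtype)

def min_value_of_dtype(dtype: torch.dtype):
    return info_value_of_dtype(dtype).min

def max_value_of_dtype(dtype: torch.dtype):
    return info_value_of_dtype(dtype).max

print(max_value_of_dtype(torch.int32))  # 2147483647
print(min_value_of_dtype(torch.uint8))  # 0
```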
tiny_value_of_dtype
¶
tiny_value_of_dtype(dtype)
Returns a moderately tiny value for a given PyTorch data type that is used to avoid numerical
issues such as division by zero.
This is intentionally different from info_value_of_dtype(dtype).tiny, which can itself cause NaN bugs when used for this purpose.
Only supports floating point dtypes.
Source code in src/formed/integrations/torch/utils.py
masked_mean
¶
masked_mean(vector, mask, dim, keepdim=False)
Source code in src/formed/integrations/torch/utils.py
masked_softmax
¶
masked_softmax(
vector, mask, dim=-1, memory_efficient=False
)
Source code in src/formed/integrations/torch/utils.py
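The standard masked-softmax pattern can be sketched as follows (an illustration of the common technique, not necessarily this function's exact implementation): masked positions receive a very large negative logit so they get (near-)zero probability.

```python
import torch

def masked_softmax_sketch(vector: torch.Tensor, mask: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Softmax over `dim` that assigns ~zero probability to masked-out positions."""
    masked = vector.masked_fill(~mask, torch.finfo(vector.dtype).min)
    return torch.softmax(masked, dim=dim)

scores = torch.tensor([[1.0, 2.0, 3.0]])
mask = torch.tensor([[True, True, False]])
probs = masked_softmax_sketch(scores, mask)
```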
formed.integrations.torch.modules.embedders
¶
Text embedding modules for PyTorch models.
This module provides embedders that convert tokenized text into dense vector representations. Embedders handle various text representations including surface forms, part-of-speech tags, and character sequences.
Key Components
- BaseEmbedder: Abstract base class for all embedders
- TokenEmbedder: Embeds token ID sequences into dense vectors
- AnalyzedTextEmbedder: Combines multiple embedding types (surface, POS, chars)
Features
- Support for nested token sequences (e.g., word -> character)
- Automatic masking and padding handling
- Configurable vectorization for character-level embeddings
- Concatenation of multiple embedding types
Examples:
>>> from formed.integrations.torch.modules import TokenEmbedder, AnalyzedTextEmbedder
>>> import torch.nn as nn
>>>
>>> # Simple token embedder
>>> embedder = TokenEmbedder(
... vocab_size=10000,
... embedding_dim=128
... )
>>>
>>> # Multi-feature embedder
>>> embedder = AnalyzedTextEmbedder(
... surface=TokenEmbedder(vocab_size=10000, embedding_dim=128),
... postag=TokenEmbedder(vocab_size=50, embedding_dim=32)
... )
SurfaceBatchT
module-attribute
¶
SurfaceBatchT = TypeVar(
"SurfaceBatchT", bound="IIDSequenceBatch", default=Any
)
PostagBatchT
module-attribute
¶
PostagBatchT = TypeVar(
"PostagBatchT",
bound=Union["IIDSequenceBatch", None],
default=Any,
)
CharacterBatchT
module-attribute
¶
CharacterBatchT = TypeVar(
"CharacterBatchT",
bound=Union["IIDSequenceBatch", None],
default=Any,
)
TokenVectorBatchT
module-attribute
¶
TokenVectorBatchT = TypeVar(
"TokenVectorBatchT",
bound=Union["IVariableTensorBatch", None],
default=Any,
)
IVariableTensorBatch
¶
IAnalyzedTextBatch
¶
Bases: Protocol[SurfaceBatchT, PostagBatchT, CharacterBatchT, TokenVectorBatchT]
Protocol for analyzed text batches with multiple linguistic features.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `surfaces` | Surface form token IDs. |
| `postags` | Part-of-speech tag IDs (optional). |
| `characters` | Character sequence IDs (optional). |
| `token_vectors` | Token-level dense vectors (optional). |
EmbedderOutput
¶
Bases: NamedTuple
Output from an embedder.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `embeddings` | Dense embeddings of shape (batch_size, seq_len, embedding_dim). |
| `mask` | Attention mask of shape (batch_size, seq_len). |
BaseEmbedder
¶
Bases: Module, Registrable, Generic[_TextBatchT], ABC
Abstract base class for text embedders.
Embedders convert tokenized text into dense vector representations. They output both embeddings and attention masks.
| CLASS TYPE PARAMETER | DESCRIPTION |
|---|---|
| `_TextBatchT` | Type of input batch. |
forward
abstractmethod
¶
forward(inputs)
Embed input tokens into dense vectors.
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | Batch of tokenized text. |

| RETURNS | DESCRIPTION |
|---|---|
| `EmbedderOutput` | EmbedderOutput containing embeddings and mask. |
Source code in src/formed/integrations/torch/modules/embedders.py
get_output_dim
abstractmethod
¶
get_output_dim()
Get the output embedding dimension.
| RETURNS | DESCRIPTION |
|---|---|
| `int` | Embedding dimension. |
Source code in src/formed/integrations/torch/modules/embedders.py
PassThroughEmbedder
¶
Bases: BaseEmbedder[IVariableTensorBatch[TensorCompatibleT]]
Embedder that passes through input tensors unchanged.
This embedder is useful when the input tensors are already in the desired embedding format. It simply returns the input tensors and their masks.
Examples:
>>> from formed.integrations.torch.modules import PassThroughEmbedder
>>>
>>> embedder = PassThroughEmbedder()
>>> output = embedder(variable_tensor_batch)
>>> assert torch.equal(output.embeddings, variable_tensor_batch.tensor)
>>> assert torch.equal(output.mask, variable_tensor_batch.mask)
forward
¶
forward(inputs)
Source code in src/formed/integrations/torch/modules/embedders.py
get_output_dim
¶
get_output_dim()
Source code in src/formed/integrations/torch/modules/embedders.py
TokenEmbedder
¶
TokenEmbedder(
initializer,
*,
padding_idx=0,
freeze=False,
vectorizer=None,
)
Bases: BaseEmbedder['IIDSequenceBatch']
Embedder for token ID sequences.
This embedder converts token IDs into dense embeddings using a learned
embedding matrix. It supports both 2D (batch_size, seq_len) and 3D
(batch_size, seq_len, char_len) token ID tensors.
For 3D inputs (e.g., character-level tokens within words), the embedder can either average the embeddings or apply a custom vectorizer.
| PARAMETER | DESCRIPTION |
|---|---|
| `initializer` | Tensor initializer or callable that returns the embedding tensor. |
| `padding_idx` | Index of the padding token (default: `0`). |
| `vectorizer` | Optional vectorizer for 3D inputs (character sequences). |
Examples:
>>> # Simple word embeddings
>>> embedder = TokenEmbedder(vocab_size=10000, embedding_dim=128)
>>> output = embedder(word_ids_batch)
>>>
>>> # Character-level embeddings with pooling
>>> from formed.integrations.torch.modules import BagOfEmbeddingsSequenceVectorizer
>>> embedder = TokenEmbedder(
... vocab_size=256,
... embedding_dim=32,
... vectorizer=BagOfEmbeddingsSequenceVectorizer(pooling="max")
... )
Source code in src/formed/integrations/torch/modules/embedders.py
forward
¶
forward(inputs)
Source code in src/formed/integrations/torch/modules/embedders.py
get_output_dim
¶
get_output_dim()
Source code in src/formed/integrations/torch/modules/embedders.py
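The core idea — look token IDs up in an embedding matrix and derive an attention mask from the padding index — can be shown with a minimal sketch built on plain `nn.Embedding` (a simplified stand-in, not the library's `TokenEmbedder` class):

```python
import torch
import torch.nn as nn

class TinyTokenEmbedder(nn.Module):
    """Hypothetical minimal embedder: IDs -> dense vectors plus a padding mask."""

    def __init__(self, vocab_size: int, embedding_dim: int, padding_idx: int = 0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=padding_idx)
        self.padding_idx = padding_idx

    def forward(self, token_ids: torch.Tensor):
        mask = token_ids != self.padding_idx     # (B, S): False at padding positions
        return self.embedding(token_ids), mask   # (B, S, D), (B, S)

embedder = TinyTokenEmbedder(vocab_size=100, embedding_dim=8)
ids = torch.tensor([[5, 7, 0, 0]])               # trailing zeros are padding
embeddings, mask = embedder(ids)
```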
PretrainedTransformerEmbedder
¶
PretrainedTransformerEmbedder(
model,
auto_class=None,
subcmodule=None,
freeze=False,
eval_mode=False,
layer_to_use="last",
gradient_checkpointing=None,
**kwargs,
)
Bases: BaseEmbedder[IIDSequenceBatch]
Embedder using pretrained transformer models from Hugging Face.
This embedder wraps pretrained transformer models (BERT, RoBERTa, etc.) to extract contextualized embeddings. It uses the last hidden state from the transformer as the embedding representation.
| PARAMETER | DESCRIPTION |
|---|---|
| `model` | Either a model name/path string or an already loaded model. |
| `auto_class` | The auto class to use for loading the model. |
| `subcmodule` | Optional submodule path to extract from the loaded model. |
| `freeze` | If `True`, the model parameters are frozen. |
| `**kwargs` | Additional keyword arguments passed to the model loader. |
Examples:
>>> # Load a pretrained BERT model
>>> embedder = PretrainedTransformerEmbedder(
... model="bert-base-uncased",
... freeze=True
... )
>>>
>>> # Use a specific auto class
>>> from transformers import AutoModel
>>> embedder = PretrainedTransformerEmbedder(
... model="roberta-base",
... auto_class=AutoModel,
... freeze=False
... )
>>>
>>> # Use an already loaded model
>>> from transformers import AutoModel
>>> model = AutoModel.from_pretrained("bert-base-uncased")
>>> embedder = PretrainedTransformerEmbedder(model=model)
Note
Models are cached using LRU cache by the load_pretrained_transformer utility.
When freeze=True, all model parameters have requires_grad=False.
Source code in src/formed/integrations/torch/modules/embedders.py
forward
¶
forward(inputs)
Source code in src/formed/integrations/torch/modules/embedders.py
get_output_dim
¶
get_output_dim()
Source code in src/formed/integrations/torch/modules/embedders.py
get_vocab_size
¶
get_vocab_size()
Source code in src/formed/integrations/torch/modules/embedders.py
train
¶
train(mode=True)
Source code in src/formed/integrations/torch/modules/embedders.py
AnalyzedTextEmbedder
¶
AnalyzedTextEmbedder(
surface=None,
postag=None,
character=None,
token_vector=None,
)
Bases: BaseEmbedder['IAnalyzedTextBatch']
Embedder for analyzed text with multiple linguistic features.
This embedder combines embeddings from multiple linguistic representations (surface forms, part-of-speech tags, character sequences) by concatenating them along the feature dimension.
| PARAMETER | DESCRIPTION |
|---|---|
| `surface` | Optional embedder for surface form tokens. |
| `postag` | Optional embedder for part-of-speech tags. |
| `character` | Optional embedder for character sequences. |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If all embedders are None (at least one is required). |
Examples:
>>> from formed.integrations.torch.modules import (
... AnalyzedTextEmbedder,
... TokenEmbedder
... )
>>>
>>> embedder = AnalyzedTextEmbedder(
... surface=TokenEmbedder(vocab_size=10000, embedding_dim=128),
... postag=TokenEmbedder(vocab_size=50, embedding_dim=32),
... character=TokenEmbedder(vocab_size=256, embedding_dim=32)
... )
>>>
>>> # Output dimension is sum of all embedding dimensions (128 + 32 + 32 = 192)
>>> assert embedder.get_output_dim() == 192
Note
All provided embedders share the same mask, which is taken from the last non-None embedder processed.
Source code in src/formed/integrations/torch/modules/embedders.py
forward
¶
forward(inputs)
Source code in src/formed/integrations/torch/modules/embedders.py
get_output_dim
¶
get_output_dim()
Source code in src/formed/integrations/torch/modules/embedders.py
formed.integrations.torch.modules.encoders
¶
Sequence encoding modules for PyTorch models.
This module provides encoders that process sequential data, including RNN-based encoders, positional encoders, and Transformer encoders.
Key Components
- BaseSequenceEncoder: Abstract base for sequence encoders
- LSTMSequenceEncoder: LSTM-specific encoder
- GRUSequenceEncoder: GRU-specific encoder
- BasePositionalEncoder: Abstract base for positional encoders
- SinusoidalPositionalEncoder: Sinusoidal positional encoding
- RotaryPositionalEncoder: Rotary positional encoding (RoPE)
- LearnablePositionalEncoder: Learnable positional embeddings
- TransformerEncoder: Transformer-based encoder with configurable masking
Features
- Bidirectional RNN support
- Stacked layers with dropout
- Masked sequence processing
- Various positional encoding strategies
- Flexible attention masking
Examples:
>>> from formed.integrations.torch.modules import LSTMSequenceEncoder
>>>
>>> # Bidirectional LSTM encoder
>>> encoder = LSTMSequenceEncoder(
... input_dim=128,
... hidden_dim=256,
... num_layers=2,
... bidirectional=True,
... dropout=0.1
... )
BaseSequenceEncoder
¶
Bases: Module, Registrable, ABC
Abstract base class for sequence encoders.
Sequence encoders process sequential data and output encoded representations.
forward
abstractmethod
¶
forward(inputs, mask=None)
Encode input sequence.
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | Input sequence of shape (batch_size, seq_len, input_dim). |
| `mask` | Optional mask of shape (batch_size, seq_len). |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Encoded sequence of shape (batch_size, seq_len, output_dim). |
Source code in src/formed/integrations/torch/modules/encoders.py
get_input_dim
abstractmethod
¶
get_input_dim()
Get the expected input dimension.
Source code in src/formed/integrations/torch/modules/encoders.py
get_output_dim
abstractmethod
¶
get_output_dim()
Get the output dimension.
Source code in src/formed/integrations/torch/modules/encoders.py
LSTMSequenceEncoder
¶
LSTMSequenceEncoder(
input_dim,
hidden_dim,
num_layers=1,
bidirectional=False,
dropout=0.0,
batch_first=True,
)
Bases: BaseSequenceEncoder
LSTM-based sequence encoder.
| PARAMETER | DESCRIPTION |
|---|---|
| `input_dim` | Input dimension. |
| `hidden_dim` | Hidden state dimension. |
| `num_layers` | Number of LSTM layers. |
| `bidirectional` | Whether to use bidirectional LSTM. |
| `dropout` | Dropout rate between layers. |
| `batch_first` | Whether input is batch-first (default: `True`). |
Examples:
>>> encoder = LSTMSequenceEncoder(
... input_dim=128,
... hidden_dim=256,
... num_layers=2,
... bidirectional=True
... )
Source code in src/formed/integrations/torch/modules/encoders.py
lstm
instance-attribute
¶
lstm = LSTM(
input_size=input_dim,
hidden_size=hidden_dim,
num_layers=num_layers,
bidirectional=bidirectional,
dropout=dropout if num_layers > 1 else 0.0,
batch_first=batch_first,
)
forward
¶
forward(inputs, mask=None)
Encode input sequence.
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | Input of shape (batch_size, seq_len, input_dim). |
| `mask` | Optional mask of shape (batch_size, seq_len). |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Encoded sequence of shape (batch_size, seq_len, output_dim). |
Source code in src/formed/integrations/torch/modules/encoders.py
get_input_dim
¶
get_input_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
get_output_dim
¶
get_output_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
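The effect of `bidirectional=True` on the output dimension can be checked directly with a plain `nn.LSTM`, the module this encoder wraps: forward and backward hidden states are concatenated, so the feature dimension doubles.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=128, hidden_size=256, num_layers=2,
               bidirectional=True, batch_first=True, dropout=0.1)
inputs = torch.randn(4, 10, 128)   # (batch_size, seq_len, input_dim)
outputs, _ = lstm(inputs)
print(outputs.shape)               # torch.Size([4, 10, 512]) = 2 * hidden_dim
```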
GRUSequenceEncoder
¶
GRUSequenceEncoder(
input_dim,
hidden_dim,
num_layers=1,
bidirectional=False,
dropout=0.0,
batch_first=True,
)
Bases: BaseSequenceEncoder
GRU-based sequence encoder.
| PARAMETER | DESCRIPTION |
|---|---|
| `input_dim` | Input dimension. |
| `hidden_dim` | Hidden state dimension. |
| `num_layers` | Number of GRU layers. |
| `bidirectional` | Whether to use bidirectional GRU. |
| `dropout` | Dropout rate between layers. |
| `batch_first` | Whether input is batch-first (default: `True`). |
Examples:
>>> encoder = GRUSequenceEncoder(
... input_dim=128,
... hidden_dim=256,
... num_layers=2,
... bidirectional=True
... )
Source code in src/formed/integrations/torch/modules/encoders.py
gru
instance-attribute
¶
gru = GRU(
input_size=input_dim,
hidden_size=hidden_dim,
num_layers=num_layers,
bidirectional=bidirectional,
dropout=dropout if num_layers > 1 else 0.0,
batch_first=batch_first,
)
forward
¶
forward(inputs, mask=None)
Encode input sequence.
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | Input of shape (batch_size, seq_len, input_dim). |
| `mask` | Optional mask of shape (batch_size, seq_len). |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Encoded sequence of shape (batch_size, seq_len, output_dim). |
Source code in src/formed/integrations/torch/modules/encoders.py
get_input_dim
¶
get_input_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
get_output_dim
¶
get_output_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
ResidualSequenceEncoder
¶
ResidualSequenceEncoder(encoder)
Bases: BaseSequenceEncoder
Residual wrapper for sequence encoders.
Adds the input to the encoder output (residual connection). Requires input and output dimensions to match.
| PARAMETER | DESCRIPTION |
|---|---|
| `encoder` | Base encoder to wrap. Must have matching input and output dimensions. |
Examples:
>>> from formed.integrations.torch.modules.encoders import (
... ResidualSequenceEncoder,
... LSTMSequenceEncoder
... )
>>>
>>> # Wrap LSTM with residual connection
>>> base_encoder = LSTMSequenceEncoder(input_dim=128, hidden_dim=128)
>>> encoder = ResidualSequenceEncoder(encoder=base_encoder)
Source code in src/formed/integrations/torch/modules/encoders.py
forward
¶
forward(inputs, mask=None)
Source code in src/formed/integrations/torch/modules/encoders.py
get_input_dim
¶
get_input_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
get_output_dim
¶
get_output_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
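The wrapper amounts to `output = encoder(input) + input`, which is why matching dimensions are required. A minimal sketch (not the library's class):

```python
import torch
import torch.nn as nn

class Residual(nn.Module):
    """Hypothetical residual wrapper: add the input back to the encoder output."""

    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.encoder = encoder

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        return self.encoder(inputs) + inputs

double = Residual(nn.Identity())   # encoder(x) = x, so output = 2 * x
x = torch.randn(2, 5, 16)
y = double(x)
```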
FeedForwardSequenceEncoder
¶
FeedForwardSequenceEncoder(feedforward)
Bases: BaseSequenceEncoder
Position-wise feedforward sequence encoder.
Applies a feedforward network independently to each position in the sequence. The same transformation is applied at each position (no cross-position interaction).
| PARAMETER | DESCRIPTION |
|---|---|
| `feedforward` | Feedforward network to apply at each position. |
Examples:
>>> from formed.integrations.torch.modules.encoders import (
... FeedForwardSequenceEncoder
... )
>>> from formed.integrations.torch.modules.feedforward import FeedForward
>>>
>>> # Apply feedforward to each position independently
>>> feedforward = FeedForward(input_dim=128, hidden_dims=[256, 128])
>>> encoder = FeedForwardSequenceEncoder(feedforward=feedforward)
Source code in src/formed/integrations/torch/modules/encoders.py
forward
¶
forward(inputs, mask=None)
Source code in src/formed/integrations/torch/modules/encoders.py
get_input_dim
¶
get_input_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
get_output_dim
¶
get_output_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
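"Position-wise" application falls out of how `nn.Linear` works: it operates on the last dimension only, so the same weights are applied independently at every sequence position. A sketch with a plain `nn.Sequential` (not the library's `FeedForward` class):

```python
import torch
import torch.nn as nn

# Same transformation at every position; no interaction across positions.
feedforward = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))
x = torch.randn(4, 10, 128)   # (batch_size, seq_len, input_dim)
y = feedforward(x)            # (batch_size, seq_len, 128)
```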
GatedCnnSequenceEncoder
¶
GatedCnnSequenceEncoder(
input_dim, layers, output_dim=None, dropout=0.0
)
Bases: BaseSequenceEncoder
Gated Convolutional Neural Network sequence encoder.
Uses stacked residual blocks with gated linear units (GLU) for efficient sequence modeling. Processes sequences in both forward and backward directions, then concatenates the results for bidirectional context capture.
Based on "Language Modeling with Gated Convolutional Networks" (Dauphin et al., 2017).
| PARAMETER | DESCRIPTION |
|---|---|
| `input_dim` | Input dimension. |
| `layers` | List of layer configurations for each residual block. Each block is a list of `Layer` configurations. |
| `output_dim` | Optional output dimension. If provided, applies linear projection. Default is input_dim * 2 (concatenation of forward + backward). |
| `dropout` | Dropout rate applied to the first convolution of each block. |
Examples:
>>> # Simple gated CNN encoder
>>> encoder = GatedCnnSequenceEncoder(
... input_dim=128,
... layers=[
... [GatedCnnSequenceEncoder.Layer(kernel_size=3, output_dim=128)],
... [GatedCnnSequenceEncoder.Layer(kernel_size=3, output_dim=128)],
... ]
... )
>>>
>>> # With dilated convolutions for larger receptive field
>>> encoder = GatedCnnSequenceEncoder(
... input_dim=128,
... layers=[
... [GatedCnnSequenceEncoder.Layer(kernel_size=2, output_dim=128, dilation=1)],
... [GatedCnnSequenceEncoder.Layer(kernel_size=2, output_dim=128, dilation=2)],
... [GatedCnnSequenceEncoder.Layer(kernel_size=2, output_dim=128, dilation=4)],
... ],
... output_dim=256,
... dropout=0.1
... )
Source code in src/formed/integrations/torch/modules/encoders.py
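The building block from Dauphin et al. (2017) is a causal convolution whose output channels are split into a value half and a gate half. A minimal single-layer sketch (an illustration of the GLU idea, not this encoder's actual code):

```python
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    """One gated convolutional layer: causal Conv1d -> split -> value * sigmoid(gate)."""

    def __init__(self, dim: int, kernel_size: int):
        super().__init__()
        self.pad = kernel_size - 1     # left-pad so positions only see the past
        self.conv = nn.Conv1d(dim, 2 * dim, kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, S, D)
        h = x.transpose(1, 2)                             # (B, D, S) for Conv1d
        h = nn.functional.pad(h, (self.pad, 0))           # causal left padding
        h = self.conv(h)                                  # (B, 2D, S)
        value, gate = h.chunk(2, dim=1)
        return (value * torch.sigmoid(gate)).transpose(1, 2)

layer = GatedConv(dim=8, kernel_size=3)
out = layer(torch.randn(2, 10, 8))    # shape preserved: (2, 10, 8)
```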
Layer
¶
Bases: NamedTuple
Configuration for a single convolutional layer.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `kernel_size` | Size of the convolution kernel. |
| `output_dim` | Output dimension of the layer. Must match input_dim for residual connections to work. |
| `dilation` | Dilation rate for the convolution. When dilation > 1, kernel_size must be 2. |
ResidualBlock
¶
ResidualBlock(
input_dim,
layers,
direction,
do_weight_norm=True,
dropout=0.0,
)
Bases: Module
Residual block with gated convolutions for sequence encoding.
Stacks multiple gated convolutional layers with residual connections. Supports causal masking via directional processing (forward/backward).
| PARAMETER | DESCRIPTION |
|---|---|
| `input_dim` | Input dimension. Must match output dimension of all layers for residual connection. |
| `layers` | Sequence of Layer configurations defining the convolutional stack. |
| `direction` | Direction of causal masking (forward or backward). |
| `do_weight_norm` | Whether to apply weight normalization to convolutions. |
| `dropout` | Dropout rate applied to the first convolution. |
Source code in src/formed/integrations/torch/modules/encoders.py
forward
¶
forward(inputs)
Apply gated convolutions with residual connection.
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | Input of shape (batch_size, seq_len, input_dim). |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Encoded sequence with residual connection of shape (batch_size, seq_len, input_dim). |
Source code in src/formed/integrations/torch/modules/encoders.py
get_input_dim
¶
get_input_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
get_output_dim
¶
get_output_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
forward
¶
forward(inputs, mask=None)
Encode input sequence using gated CNN.
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | Input of shape (batch_size, seq_len, input_dim). |
| `mask` | Optional mask of shape (batch_size, seq_len). |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Encoded sequence of shape (batch_size, seq_len, output_dim). |
Source code in src/formed/integrations/torch/modules/encoders.py
StackedSequenceEncoder
¶
StackedSequenceEncoder(encoders)
Bases: BaseSequenceEncoder
Stacks multiple sequence encoders sequentially.
Applies encoders in order, passing the output of each as input to the next. The output dimension of each encoder must match the input dimension of the next.
| PARAMETER | DESCRIPTION |
|---|---|
| `encoders` | List of encoders to apply in sequence. Each encoder's output dimension must match the next encoder's input dimension. |
Examples:
>>> from formed.integrations.torch.modules.encoders import (
... StackedSequenceEncoder,
... LSTMSequenceEncoder,
... GRUSequenceEncoder,
... ResidualSequenceEncoder
... )
>>>
>>> # Stack LSTM and GRU
>>> encoders = [
... LSTMSequenceEncoder(input_dim=128, hidden_dim=128),
... GRUSequenceEncoder(input_dim=128, hidden_dim=64),
... ]
>>> encoder = StackedSequenceEncoder(encoders=encoders)
>>>
>>> # More complex: LSTM -> Residual LSTM -> GRU
>>> base_lstm = LSTMSequenceEncoder(input_dim=128, hidden_dim=128)
>>> residual_lstm = ResidualSequenceEncoder(encoder=base_lstm)
>>> gru = GRUSequenceEncoder(input_dim=128, hidden_dim=128)
>>> encoder = StackedSequenceEncoder(encoders=[base_lstm, residual_lstm, gru])
Source code in src/formed/integrations/torch/modules/encoders.py
forward
¶
forward(inputs, mask=None)
Source code in src/formed/integrations/torch/modules/encoders.py
get_input_dim
¶
get_input_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
get_output_dim
¶
get_output_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
ConcatSequenceEncoder
¶
ConcatSequenceEncoder(encoders)
Bases: BaseSequenceEncoder
Concatenates outputs from multiple sequence encoders.
Applies multiple encoders in parallel to the same input and concatenates their outputs along the feature dimension. All encoders receive the same input tensor.
| PARAMETER | DESCRIPTION |
|---|---|
| `encoders` | List of encoders to apply in parallel. All encoders must have the same input dimension. |
Examples:
>>> from formed.integrations.torch.modules.encoders import (
... ConcatSequenceEncoder,
... LSTMSequenceEncoder,
... GRUSequenceEncoder
... )
>>>
>>> # Concatenate LSTM and GRU outputs
>>> encoders = [
... LSTMSequenceEncoder(input_dim=128, hidden_dim=64),
... GRUSequenceEncoder(input_dim=128, hidden_dim=64),
... ]
>>> encoder = ConcatSequenceEncoder(encoders=encoders)
Source code in src/formed/integrations/torch/modules/encoders.py
get_input_dim
¶
get_input_dim()
Get the expected input dimension.
| RETURNS | DESCRIPTION |
|---|---|
| `int` | Sum of input dimensions across all encoders. |
Source code in src/formed/integrations/torch/modules/encoders.py
get_output_dim
¶
get_output_dim()
Get the output dimension.
| RETURNS | DESCRIPTION |
|---|---|
| `int` | Sum of output dimensions across all encoders. |
Source code in src/formed/integrations/torch/modules/encoders.py
forward
¶
forward(inputs, mask=None)
Encode input sequence by concatenating outputs from all encoders.
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | Input of shape (batch_size, seq_len, input_dim). |
| `mask` | Optional mask of shape (batch_size, seq_len). |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Concatenated encoded sequence of shape (batch_size, seq_len, output_dim). |
Source code in src/formed/integrations/torch/modules/encoders.py
WindowConcatSequenceEncoder
¶
WindowConcatSequenceEncoder(
input_dim, window_size, output_dim=None
)
Bases: BaseSequenceEncoder
Concatenates context window features for each position in the sequence.
For each position, concatenates the embeddings from surrounding positions within a specified window. This creates richer positional representations by explicitly including local context.
| PARAMETER | DESCRIPTION |
|---|---|
| `input_dim` | Input dimension. |
| `window_size` | Size of context window on each side. If int, uses symmetric window. If tuple (left, right), uses asymmetric window. |
| `output_dim` | Optional output dimension. If provided, applies linear projection to the concatenated features. Otherwise, output dimension is (left_window + 1 + right_window) * input_dim. |
Examples:
>>> # Symmetric 2-position window on each side
>>> encoder = WindowConcatSequenceEncoder(
... input_dim=128,
... window_size=2
... )
>>>
>>> # Asymmetric window with projection
>>> encoder = WindowConcatSequenceEncoder(
... input_dim=128,
... window_size=(1, 2),
... output_dim=256
... )
Source code in src/formed/integrations/torch/modules/encoders.py
get_input_dim
¶
get_input_dim()
Get the expected input dimension.
| RETURNS | DESCRIPTION |
|---|---|
| `int` | Input dimension of the embeddings. |
Source code in src/formed/integrations/torch/modules/encoders.py
get_output_dim
¶
get_output_dim()
Get the output dimension.
| RETURNS | DESCRIPTION |
|---|---|
| `int` | Output dimension after window concatenation and optional projection. |
Source code in src/formed/integrations/torch/modules/encoders.py
forward
¶
forward(inputs, mask=None)
Encode input sequence by concatenating context windows.
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | Input of shape (batch_size, seq_len, input_dim). |
| `mask` | Optional mask of shape (batch_size, seq_len). True indicates valid positions, False indicates padding. |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Window-concatenated sequence of shape (batch_size, seq_len, output_dim). |
Source code in src/formed/integrations/torch/modules/encoders.py
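The window-concatenation step can be sketched with shifted slices of a zero-padded sequence (an illustration of the technique, without the optional projection or masking):

```python
import torch

def window_concat(inputs: torch.Tensor, left: int, right: int) -> torch.Tensor:
    """For each position, concatenate features at offsets -left .. +right,
    zero-padding past the sequence edges."""
    batch_size, seq_len, dim = inputs.shape
    padded = torch.nn.functional.pad(inputs, (0, 0, left, right))  # pad seq dim
    windows = [padded[:, i : i + seq_len] for i in range(left + right + 1)]
    return torch.cat(windows, dim=-1)   # (B, S, (left + 1 + right) * D)

x = torch.randn(2, 5, 8)
out = window_concat(x, left=1, right=2)
print(out.shape)  # torch.Size([2, 5, 32]) = (1 + 1 + 2) * 8
```

The slice at offset 0 (the second window here) is just the original sequence, so the center block of the output equals the input features.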
BasePositionalEncoder
¶
Bases: Module, Registrable, ABC
Abstract base class for positional encoders.
Positional encoders add positional information to sequential data.
forward
abstractmethod
¶
forward(inputs, mask=None)
Add positional encoding to input sequence.
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | Input sequence of shape (batch_size, seq_len, input_dim). |
| `mask` | Optional mask of shape (batch_size, seq_len). |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Position-encoded sequence of shape (batch_size, seq_len, output_dim). |
Source code in src/formed/integrations/torch/modules/encoders.py
get_input_dim
abstractmethod
¶
get_input_dim()
Get the expected input dimension.
Source code in src/formed/integrations/torch/modules/encoders.py
get_output_dim
abstractmethod
¶
get_output_dim()
Get the output dimension.
Source code in src/formed/integrations/torch/modules/encoders.py
SinusoidalPositionalEncoder
¶
SinusoidalPositionalEncoder(
input_dim, max_len=5000, dropout=0.0
)
Bases: BasePositionalEncoder
Sinusoidal positional encoding.
Uses sine and cosine functions of different frequencies to encode position information, as introduced in "Attention Is All You Need".
| PARAMETER | DESCRIPTION |
|---|---|
| `input_dim` | Dimension of the embeddings. |
| `max_len` | Maximum sequence length to pre-compute. |
| `dropout` | Dropout rate to apply after adding positional encoding. |
Examples:
>>> encoder = SinusoidalPositionalEncoder(
... input_dim=512,
... max_len=5000,
... dropout=0.1
... )
Source code in src/formed/integrations/torch/modules/encoders.py
forward
¶
forward(inputs, mask=None)
Add sinusoidal positional encoding to inputs.
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | Input of shape (batch_size, seq_len, input_dim). |
| `mask` | Optional mask of shape (batch_size, seq_len). |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Position-encoded sequence of shape (batch_size, seq_len, input_dim). |
Source code in src/formed/integrations/torch/modules/encoders.py
get_input_dim
¶
get_input_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
get_output_dim
¶
get_output_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
RotaryPositionalEncoder
¶
RotaryPositionalEncoder(
input_dim, max_len=2048, base=10000.0
)
Bases: BasePositionalEncoder
Rotary positional encoding (RoPE).
Applies rotary position embeddings by rotating pairs of dimensions in the feature space, as introduced in "RoFormer: Enhanced Transformer with Rotary Position Embedding".
| PARAMETER | DESCRIPTION |
|---|---|
| `input_dim` | Dimension of the embeddings (must be even). |
| `max_len` | Maximum sequence length to pre-compute. |
| `base` | Base for the geometric progression (default: 10000). |
Examples:
>>> encoder = RotaryPositionalEncoder(
... input_dim=512,
... max_len=2048
... )
Source code in src/formed/integrations/torch/modules/encoders.py
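The rotation that RoPE applies to each dimension pair can be sketched in NumPy; the function below is an illustration of the technique, not the library's implementation:

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq_len, dim), dim even.

    Each dimension pair (2i, 2i+1) is rotated by angle pos * theta_i,
    where theta_i = base^(-2i/dim) forms a geometric progression.
    """
    seq_len, dim = x.shape
    theta = base ** (-np.arange(0, dim, 2) / dim)        # (dim/2,)
    angles = np.arange(seq_len)[:, None] * theta         # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

x = np.random.randn(8, 16)
y = rope(x)
```

Because each pair is a pure rotation, the transform preserves vector norms, and position 0 (angle zero) is left unchanged.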
forward
¶
forward(inputs, mask=None)
Apply rotary positional encoding to inputs.
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | Input of shape `(batch_size, seq_len, input_dim)`. |
| `mask` | Optional mask of shape `(batch_size, seq_len)`. |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Position-encoded sequence of shape `(batch_size, seq_len, input_dim)`. |
Source code in src/formed/integrations/torch/modules/encoders.py
get_input_dim
¶
get_input_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
get_output_dim
¶
get_output_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
LearnablePositionalEncoder
¶
LearnablePositionalEncoder(
input_dim, max_len=1024, dropout=0.0
)
Bases: BasePositionalEncoder
Learnable positional embeddings.
Uses a learnable embedding table to encode position information, similar to token embeddings.
| PARAMETER | DESCRIPTION |
|---|---|
| `input_dim` | Dimension of the embeddings. |
| `max_len` | Maximum sequence length (vocabulary size for positions). |
| `dropout` | Dropout rate to apply after adding positional encoding. |
Examples:
>>> encoder = LearnablePositionalEncoder(
... input_dim=512,
... max_len=1024,
... dropout=0.1
... )
Source code in src/formed/integrations/torch/modules/encoders.py
forward
¶
forward(inputs, mask=None)
Add learnable positional encoding to inputs.
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | Input of shape `(batch_size, seq_len, input_dim)`. |
| `mask` | Optional mask of shape `(batch_size, seq_len)`. |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Position-encoded sequence of shape `(batch_size, seq_len, input_dim)`. |
Source code in src/formed/integrations/torch/modules/encoders.py
get_input_dim
¶
get_input_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
get_output_dim
¶
get_output_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
TransformerEncoder
¶
TransformerEncoder(
input_dim,
num_heads,
num_layers,
feedforward_dim,
dropout=0.1,
positional_encoder=None,
attention_mask=None,
activation="relu",
layer_norm_eps=1e-05,
batch_first=True,
)
Bases: BaseSequenceEncoder
Transformer-based sequence encoder.
Uses stacked TransformerEncoderLayers with positional encoding and configurable attention masking via dependency injection.
| PARAMETER | DESCRIPTION |
|---|---|
| `input_dim` | Dimension of the embeddings (`d_model`). |
| `num_heads` | Number of attention heads. |
| `num_layers` | Number of transformer layers. |
| `feedforward_dim` | Dimension of the feedforward network. |
| `dropout` | Dropout rate. |
| `positional_encoder` | Optional positional encoder to add position information. |
| `attention_mask` | Optional mask generator for self-attention. |
| `activation` | Activation function (default: `"relu"`). |
| `layer_norm_eps` | Epsilon for layer normalization. |
| `batch_first` | Whether input is batch-first (default: `True`). |
Examples:
>>> from formed.integrations.torch.modules.encoders import (
... TransformerEncoder,
... SinusoidalPositionalEncoder,
... CausalMask
... )
>>>
>>> # Standard transformer encoder
>>> encoder = TransformerEncoder(
... input_dim=512,
... num_heads=8,
... num_layers=6,
... feedforward_dim=2048,
... dropout=0.1,
... positional_encoder=SinusoidalPositionalEncoder(input_dim=512)
... )
>>>
>>> # Transformer with causal masking (for autoregressive tasks)
>>> causal_encoder = TransformerEncoder(
... input_dim=512,
... num_heads=8,
... num_layers=6,
... feedforward_dim=2048,
... dropout=0.1,
... positional_encoder=SinusoidalPositionalEncoder(input_dim=512),
... attention_mask=CausalMask()
... )
Source code in src/formed/integrations/torch/modules/encoders.py
transformer_encoder
instance-attribute
¶
transformer_encoder = TransformerEncoder(
encoder_layer=encoder_layer, num_layers=num_layers
)
forward
¶
forward(inputs, mask=None)
Encode input sequence using transformer.
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | Input of shape `(batch_size, seq_len, input_dim)` if `batch_first=True`, or `(seq_len, batch_size, input_dim)` if `batch_first=False`. |
| `mask` | Optional mask of shape `(batch_size, seq_len)` where 1 = valid, 0 = padding. |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Encoded sequence of the same shape as the input. |
Source code in src/formed/integrations/torch/modules/encoders.py
get_input_dim
¶
get_input_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
get_output_dim
¶
get_output_dim()
Source code in src/formed/integrations/torch/modules/encoders.py
formed.integrations.torch.modules.feedforward
¶
Feed-forward neural network modules for PyTorch models.
This module provides feed-forward network layers with support for multiple layers, dropout, layer normalization, and residual connections.
Key Components
- FeedForward: Multi-layer feed-forward network
Features
- Configurable activation functions
- Layer normalization
- Dropout for regularization
- Residual connections
Examples:
>>> from formed.integrations.torch.modules import FeedForward
>>> import torch.nn as nn
>>>
>>> # Simple 3-layer feed-forward network
>>> ffn = FeedForward(
... input_dim=256,
... hidden_dims=[512, 512, 256],
... dropout=0.1,
... activation=nn.GELU()
... )
FeedForward
¶
FeedForward(
input_dim, hidden_dims, dropout=0.0, activation=ReLU()
)
Bases: Module
A simple feed forward neural network.
| PARAMETER | DESCRIPTION |
|---|---|
| `input_dim` | The dimension of the input. |
| `hidden_dims` | A sequence of integers specifying the dimensions of each layer. |
| `dropout` | The dropout probability. Defaults to `0.0`. |
| `activation` | The activation function. Defaults to `ReLU()`. |
Source code in src/formed/integrations/torch/modules/feedforward.py
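The layer-stacking idea behind such a feed-forward network can be sketched in NumPy (this is an illustration of the shapes involved, not the library's implementation; dropout is omitted for determinism):

```python
import numpy as np

rng = np.random.default_rng(0)

def feed_forward(x, dims):
    """Stack of Linear -> ReLU layers, one per entry in `dims`.

    Mirrors the FeedForward contract: input (..., input_dim) maps to
    output (..., dims[-1]).
    """
    for d_out in dims:
        w = rng.standard_normal((x.shape[-1], d_out)) * 0.01
        b = np.zeros(d_out)
        x = np.maximum(x @ w + b, 0.0)   # Linear layer followed by ReLU
    return x

out = feed_forward(rng.standard_normal((4, 256)), dims=[512, 512, 128])
```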
forward
¶
forward(inputs)
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | A tensor of shape `(..., input_dim)`. |

| RETURNS | DESCRIPTION |
|---|---|
| `FloatTensor` | A tensor of shape `(..., hidden_dims[-1])`. |
Source code in src/formed/integrations/torch/modules/feedforward.py
get_input_dim
¶
get_input_dim()
Source code in src/formed/integrations/torch/modules/feedforward.py
get_output_dim
¶
get_output_dim()
Source code in src/formed/integrations/torch/modules/feedforward.py
formed.integrations.torch.modules.losses
¶
Loss functions for classification tasks.
This module provides loss functions for classification with support for label weighting and different reduction strategies.
Key Components
- `BaseClassificationLoss`: Abstract base class for classification losses
- `CrossEntropyLoss`: Standard cross-entropy loss with optional weighting
Examples:
>>> from formed.integrations.torch.modules import CrossEntropyLoss
>>> import torch
>>>
>>> # Simple cross-entropy
>>> loss_fn = CrossEntropyLoss()
>>> logits = torch.randn(4, 10) # (batch_size, num_classes)
>>> labels = torch.randint(0, 10, (4,)) # (batch_size,)
>>> loss = loss_fn(logits, labels)
>>>
>>> # With label weighting
>>> from formed.integrations.torch.modules import StaticLabelWeighter
>>> weighter = StaticLabelWeighter(weights=torch.ones(10))
>>> loss_fn = CrossEntropyLoss(weighter=weighter)
BaseClassificationLoss
¶
Bases: Module, Registrable, Generic[_ParamsT], ABC
Abstract base class for classification loss functions.
A ClassificationLoss defines a strategy for computing loss based on model logits and true labels.
| CLASS TYPE PARAMETER | DESCRIPTION |
|---|---|
| `_ParamsT` | Type of additional parameters used during loss computation. |
forward
abstractmethod
¶
forward(logits, labels, params=None)
Compute the classification loss.
| PARAMETER | DESCRIPTION |
|---|---|
| `logits` | Model output logits of shape `(batch_size, num_classes)`. |
| `labels` | True target labels of shape `(batch_size,)`. |
| `params` | Optional additional parameters for loss computation. |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Computed loss as a scalar tensor. |
Source code in src/formed/integrations/torch/modules/losses.py
CrossEntropyLoss
¶
CrossEntropyLoss(weighter=None, reduce='mean')
Bases: BaseClassificationLoss[_ParamsT]
Cross-entropy loss for classification tasks.
| PARAMETER | DESCRIPTION |
|---|---|
| `weighter` | An optional label weighter to assign weights to each class. |
| `reduce` | Reduction method. Defaults to `"mean"`. |
Examples:
>>> loss_fn = CrossEntropyLoss()
>>> logits = torch.randn(4, 10)
>>> labels = torch.randint(0, 10, (4,))
>>> loss = loss_fn(logits, labels)
Source code in src/formed/integrations/torch/modules/losses.py
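The weighted cross-entropy computation can be sketched in NumPy; this is an illustration of the math (stable log-softmax plus weighted mean reduction), not the library's implementation:

```python
import numpy as np

def cross_entropy(logits, labels, weights=None):
    """Weighted cross-entropy from raw logits.

    logits: (batch, num_classes); labels: (batch,) integer class ids.
    weights: optional (num_classes,) per-class weights; the mean reduction
    divides by the total weight of the selected labels.
    """
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    if weights is None:
        return nll.mean()
    w = weights[labels]
    return (w * nll).sum() / w.sum()

logits = np.array([[2.0, 0.5, -1.0], [0.0, 3.0, 0.0]])
labels = np.array([0, 1])
loss = cross_entropy(logits, labels)
```

With uniform weights the weighted mean reduces to the plain mean, which is a quick sanity check for a weighter.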
forward
¶
forward(logits, labels, params=None)
Compute cross-entropy loss.
| PARAMETER | DESCRIPTION |
|---|---|
| `logits` | Logits of shape `(batch_size, num_classes)`. |
| `labels` | Labels of shape `(batch_size,)`. |
| `params` | Optional parameters for the weighter. |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Loss scalar. |
Source code in src/formed/integrations/torch/modules/losses.py
BCEWithLogitsLoss
¶
BCEWithLogitsLoss(
weighter=None, reduce="mean", pos_weight=None
)
Bases: BaseClassificationLoss[_ParamsT]
Binary cross-entropy loss with logits for multilabel classification tasks.
This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by BCELoss.
| PARAMETER | DESCRIPTION |
|---|---|
| `weighter` | An optional label weighter to assign weights to each class. |
| `reduce` | Reduction method. Defaults to `"mean"`. |
| `pos_weight` | Optional weight for positive examples per class. |
Examples:
>>> loss_fn = BCEWithLogitsLoss()
>>> logits = torch.randn(4, 10) # (batch_size, num_classes)
>>> labels = torch.randint(0, 2, (4, 10)).float() # (batch_size, num_classes)
>>> loss = loss_fn(logits, labels)
Source code in src/formed/integrations/torch/modules/losses.py
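Why fusing the sigmoid into the loss is more stable can be shown with a NumPy sketch (illustrative only, not the library's code): the fused form uses a softplus identity that never exponentiates a large positive logit.

```python
import numpy as np

def bce_with_logits(logits, labels, pos_weight=None):
    """Numerically stable binary cross-entropy on raw logits.

    Uses softplus(x) = log(1 + e^x) = max(x, 0) + log1p(e^-|x|),
    so large logits never overflow.
    """
    softplus = np.maximum(logits, 0) + np.log1p(np.exp(-np.abs(logits)))
    pw = 1.0 if pos_weight is None else pos_weight
    # -y*log(sigmoid(x)) = y*(softplus(x) - x); -(1-y)*log(1-sigmoid(x)) = (1-y)*softplus(x)
    loss = pw * labels * (softplus - logits) + (1 - labels) * softplus
    return loss.mean()

stable = bce_with_logits(np.array([1000.0]), np.array([1.0]))
```

A naive `sigmoid` followed by `log` would produce `log(1.0) - log(0.0)` artifacts at such extreme logits, whereas the fused form stays finite.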
forward
¶
forward(logits, labels, params=None)
Compute BCE with logits loss.
| PARAMETER | DESCRIPTION |
|---|---|
| `logits` | Logits of shape `(batch_size, num_classes)`. |
| `labels` | Binary labels of shape `(batch_size, num_classes)`. |
| `params` | Optional parameters for the weighter. |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Loss scalar. |
Source code in src/formed/integrations/torch/modules/losses.py
BaseRegressionLoss
¶
Bases: Module, Registrable, Generic[_ParamsT], ABC
Abstract base class for regression loss functions.
A RegressionLoss defines a strategy for computing loss based on model predictions and true labels.
| CLASS TYPE PARAMETER | DESCRIPTION |
|---|---|
| `_ParamsT` | Type of additional parameters used during loss computation. |
forward
abstractmethod
¶
forward(predictions, labels, params=None)
Compute the regression loss.
| PARAMETER | DESCRIPTION |
|---|---|
| `predictions` | Model output predictions of shape `(batch_size,)`. |
| `labels` | True target labels of shape `(batch_size,)`. |
| `params` | Optional additional parameters for loss computation. |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Computed loss as a scalar tensor. |
Source code in src/formed/integrations/torch/modules/losses.py
MeanSquaredErrorLoss
¶
MeanSquaredErrorLoss(reduce='mean')
Bases: BaseRegressionLoss[_ParamsT]
Mean Squared Error (MSE) loss for regression tasks.
| PARAMETER | DESCRIPTION |
|---|---|
| `reduce` | Reduction method. Defaults to `"mean"`. |
Examples:
>>> loss_fn = MeanSquaredErrorLoss()
>>> predictions = torch.randn(4)
>>> labels = torch.randn(4)
>>> loss = loss_fn(predictions, labels)
Source code in src/formed/integrations/torch/modules/losses.py
forward
¶
forward(predictions, labels, params=None)
Compute MSE loss.
| PARAMETER | DESCRIPTION |
|---|---|
| `predictions` | Predictions of shape `(batch_size,)`. |
| `labels` | Labels of shape `(batch_size,)`. |
| `params` | Ignored. |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Loss scalar. |
Source code in src/formed/integrations/torch/modules/losses.py
formed.integrations.torch.modules.masks
¶
Attention mask generation for transformer models.

This module provides reusable attention mask generators for transformer-based models in PyTorch. Attention masks control which positions in a sequence can attend to which other positions, enabling various attention patterns such as causal masking for autoregressive models or sliding window attention for long sequences.
Key Components
- `BaseAttentionMask`: Abstract base class for attention mask generators
- `CausalMask`: Generates causal (autoregressive) attention masks
- `SlidingWindowAttentionMask`: Generates sliding window attention masks
- `CombinedMask`: Combines multiple attention masks into a single mask
Features
- Standardized attention mask format compatible with PyTorch Transformer modules
- Support for batch-wise and sequence-wise masks
- Easily extensible via registration system for custom masks
Examples:
>>> from formed.integrations.torch.modules import CausalMask
>>>
>>> # Create a causal mask generator
>>> mask_generator = CausalMask()
>>>
>>> # Generate a causal mask for sequence length 5 and batch size 2
>>> mask = mask_generator(seq_len=5, batch_size=2, device=torch.device('cpu'))
>>> # mask shape will be (5, 5) with float values: 0.0 for attendable positions,
>>> # float('-inf') for masked positions
BaseAttentionMask
¶
Bases: Registrable, ABC
Base class for attention mask generation.
Attention masks control which positions can attend to which other positions in transformer models.
All attention masks must return a mask of shape `(seq_len, seq_len)` or `(batch_size, seq_len, seq_len)` using float values where:

- `0.0` indicates positions that CAN be attended to
- `float('-inf')` indicates positions that should NOT be attended to

This standardized format ensures compatibility with the `mask` argument of PyTorch's `TransformerEncoder`.
CausalMask
¶
Bases: BaseAttentionMask
Generates causal (autoregressive) attention masks.
Causal masks ensure that each position can only attend to itself and previous positions, enabling autoregressive generation.
Examples:
>>> masks = CausalMask()
>>> mask = masks(seq_len=4, batch_size=1, device=torch.device('cpu'))
>>> # mask[i, j] = 0.0 if j <= i else float('-inf')
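The additive causal mask described above can be built in a few lines of NumPy (an illustration of the mask format, not the library's implementation):

```python
import numpy as np

def causal_mask(seq_len):
    """Build a (seq_len, seq_len) additive mask: 0.0 where j <= i, -inf above
    the diagonal, so each position attends only to itself and earlier positions."""
    mask = np.zeros((seq_len, seq_len))
    mask[np.triu_indices(seq_len, k=1)] = -np.inf
    return mask

m = causal_mask(4)
```

Adding this mask to the attention scores before the softmax drives the probability of every masked position to zero.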
SlidingWindowAttentionMask
¶
SlidingWindowAttentionMask(window_size)
Bases: BaseAttentionMask
Sliding window attention mask.
Restricts attention to a local window around each position, enabling efficient processing of long sequences. Commonly used in models like Longformer and Mistral.
| PARAMETER | DESCRIPTION |
|---|---|
| `window_size` | Size of the attention window on each side. The total window spans `2 * window_size + 1` positions. |
Examples:
>>> # Window size of 1 means each position can attend to itself and
>>> # one position on each side
>>> mask_gen = SlidingWindowAttentionMask(window_size=1)
>>> mask = mask_gen(seq_len=4, batch_size=1, device=torch.device('cpu'))
>>> # Position 0: can attend to [0, 1]
>>> # Position 1: can attend to [0, 1, 2]
>>> # Position 2: can attend to [1, 2, 3]
>>> # Position 3: can attend to [2, 3]
Source code in src/formed/integrations/torch/modules/masks.py
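The window pattern from the docstring example can be reproduced with a short NumPy sketch (illustrative, not the library's code):

```python
import numpy as np

def sliding_window_mask(seq_len, window_size):
    """Additive mask allowing attention only where |i - j| <= window_size."""
    idx = np.arange(seq_len)
    allowed = np.abs(idx[:, None] - idx[None, :]) <= window_size
    return np.where(allowed, 0.0, -np.inf)

m = sliding_window_mask(4, window_size=1)
# Position 1 can attend to [0, 1, 2] but not 3, matching the example above.
```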
CombinedMask
¶
CombinedMask(masks)
Bases: BaseAttentionMask
Combines multiple attention masks.
Applies multiple masks in sequence and combines their results.
A position is masked if ANY mask blocks it (logical OR for -inf values).
| PARAMETER | DESCRIPTION |
|---|---|
| `masks` | List of attention masks to combine. |
Examples:
>>> # Combine multiple structural masks
>>> mask1 = CausalMask()
>>> mask2 = SomeOtherMask()
>>> combined = CombinedMask(masks=[mask1, mask2])
Source code in src/formed/integrations/torch/modules/masks.py
formed.integrations.torch.modules.samplers
¶
Label samplers for classification tasks.
This module provides samplers that convert model logits into discrete labels.
Key Components
- `BaseLabelSampler`: Abstract base class for label samplers
- `ArgmaxLabelSampler`: Selects the label with the highest logit
- `MultinomialLabelSampler`: Samples from a categorical distribution
- `BaseMultilabelSampler`: Abstract base class for multilabel samplers
- `ThresholdMultilabelSampler`: Selects labels above a threshold
- `TopKMultilabelSampler`: Selects the top-k labels
- `BernoulliMultilabelSampler`: Samples labels from independent Bernoulli distributions
Examples:
>>> from formed.integrations.torch.modules import ArgmaxLabelSampler, MultinomialLabelSampler
>>> import torch
>>>
>>> logits = torch.randn(4, 10) # (batch_size, num_classes)
>>>
>>> # Argmax sampling (deterministic)
>>> argmax_sampler = ArgmaxLabelSampler()
>>> labels = argmax_sampler(logits)
>>>
>>> # Multinomial sampling (stochastic)
>>> multi_sampler = MultinomialLabelSampler()
>>> labels = multi_sampler(logits, temperature=0.8)
BaseLabelSampler
¶
Bases: Module, Registrable, Generic[_ParamsT], ABC
Abstract base class for label samplers.
A LabelSampler defines a strategy for sampling labels based on model logits.
| CLASS TYPE PARAMETER | DESCRIPTION |
|---|---|
| `_ParamsT` | Type of additional parameters used during sampling. |
forward
abstractmethod
¶
forward(logits, params=None)
Sample labels from logits.
| PARAMETER | DESCRIPTION |
|---|---|
| `logits` | Model output logits of shape `(batch_size, num_classes)`. |
| `params` | Additional parameters for sampling. |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Sampled labels of shape `(batch_size,)`. |
Source code in src/formed/integrations/torch/modules/samplers.py
ArgmaxLabelSampler
¶
Bases: BaseLabelSampler[None]
Label sampler that selects the label with the highest logit.
Examples:
>>> sampler = ArgmaxLabelSampler()
>>> logits = torch.randn(4, 10)
>>> labels = sampler(logits) # Shape: (4,)
forward
¶
forward(logits, params=None)
Select the argmax label.
| PARAMETER | DESCRIPTION |
|---|---|
| `logits` | Logits of shape `(batch_size, num_classes)`. |
| `params` | Ignored. |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Labels of shape `(batch_size,)`. |
Source code in src/formed/integrations/torch/modules/samplers.py
MultinomialLabelSamplerParams
¶
Bases: TypedDict
Parameters for MultinomialLabelSampler.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `temperature` | Sampling temperature to control randomness. Higher temperature = more random, lower = more deterministic. |
MultinomialLabelSampler
¶
Bases: BaseLabelSampler[MultinomialLabelSamplerParams]
Label sampler that samples labels from a multinomial distribution.
Examples:
>>> sampler = MultinomialLabelSampler()
>>> logits = torch.randn(4, 10)
>>>
>>> # Sample with default temperature
>>> labels = sampler(logits)
>>>
>>> # Sample with temperature scaling
>>> labels = sampler(logits, temperature=0.5)
forward
¶
forward(logits, params=None)
Sample labels from categorical distribution.
| PARAMETER | DESCRIPTION |
|---|---|
| `logits` | Logits of shape `(batch_size, num_classes)`. |
| `params` | Optional parameters containing `temperature` for sampling. |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Sampled labels of shape `(batch_size,)`. |
Source code in src/formed/integrations/torch/modules/samplers.py
BaseMultilabelSampler
¶
Bases: Module, Registrable, Generic[_ParamsT], ABC
Abstract base class for multilabel samplers.
A MultilabelSampler defines a strategy for sampling multiple labels based on model logits.
| CLASS TYPE PARAMETER | DESCRIPTION |
|---|---|
| `_ParamsT` | Type of additional parameters used during sampling. |
forward
abstractmethod
¶
forward(logits, params=None)
Sample multiple labels from logits.
| PARAMETER | DESCRIPTION |
|---|---|
| `logits` | Model output logits of shape `(batch_size, num_labels)`. |
| `params` | Additional parameters for sampling. |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Sampled labels of shape `(batch_size, num_labels)`. |
Source code in src/formed/integrations/torch/modules/samplers.py
ThresholdMultilabelSamplerParams
¶
ThresholdMultilabelSampler
¶
ThresholdMultilabelSampler(threshold=0.5)
Bases: BaseMultilabelSampler[ThresholdMultilabelSamplerParams]
Multilabel sampler that selects labels above a certain threshold.
Examples:
>>> sampler = ThresholdMultilabelSampler(threshold=0.5)
>>> logits = torch.randn(4, 10)
>>> labels = sampler(logits) # Shape: (4, num_labels)
Source code in src/formed/integrations/torch/modules/samplers.py
forward
¶
forward(logits, params=None)
Select labels above the threshold.
| PARAMETER | DESCRIPTION |
|---|---|
| `logits` | Logits of shape `(batch_size, num_labels)`. |
| `params` | Optional parameters containing `threshold`. |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Labels of shape `(batch_size, num_labels)`. |
Source code in src/formed/integrations/torch/modules/samplers.py
TopKMultilabelSamplerParams
¶
TopKMultilabelSampler
¶
TopKMultilabelSampler(k=1)
Bases: BaseMultilabelSampler[TopKMultilabelSamplerParams]
Multilabel sampler that selects the top-k labels.
Examples:
>>> sampler = TopKMultilabelSampler(k=3)
>>> logits = torch.randn(4, 10)
>>> labels = sampler(logits) # Shape: (4, num_labels)
Source code in src/formed/integrations/torch/modules/samplers.py
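Top-k selection over logits can be sketched in NumPy (an illustration of the technique, not the library's implementation):

```python
import numpy as np

def top_k_multilabel(logits, k):
    """Return a 0/1 matrix marking the k highest logits in each row."""
    idx = np.argsort(logits, axis=1)[:, -k:]       # column indices of the top-k
    out = np.zeros_like(logits)
    np.put_along_axis(out, idx, 1.0, axis=1)
    return out

labels = top_k_multilabel(np.array([[0.1, 2.0, -1.0, 0.5]]), k=2)
# The two highest logits (2.0 and 0.5) are at indices 1 and 3.
```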
forward
¶
forward(logits, params=None)
Select the top-k labels.
| PARAMETER | DESCRIPTION |
|---|---|
| `logits` | Logits of shape `(batch_size, num_labels)`. |
| `params` | Optional parameters containing `k` for top-k selection. |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Labels of shape `(batch_size, num_labels)`. |
Source code in src/formed/integrations/torch/modules/samplers.py
BernoulliMultilabelSampler
¶
Bases: BaseMultilabelSampler[None]
Multilabel sampler that samples labels from independent Bernoulli distributions.
Examples:
>>> sampler = BernoulliMultilabelSampler()
>>> logits = torch.randn(4, 10)
>>> labels = sampler(logits) # Shape: (4, num_labels)
forward
¶
forward(logits, params=None)
Sample labels from Bernoulli distributions.
| PARAMETER | DESCRIPTION |
|---|---|
| `logits` | Logits of shape `(batch_size, num_labels)`. |
| `params` | Ignored. |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Sampled labels of shape `(batch_size, num_labels)`. |
Source code in src/formed/integrations/torch/modules/samplers.py
formed.integrations.torch.modules.scalarmix
¶
ScalarMix
¶
ScalarMix(
mixture_size,
do_layer_norm=False,
initial_scalar_parameters=None,
trainable=True,
)
Bases: Module
Computes a parameterised scalar mixture of N tensors:

`mixture = gamma * sum(s_k * tensor_k)`

where `s = softmax(w)`, with `w` and `gamma` learned scalar parameters.

In addition, if `do_layer_norm=True`, layer normalization is applied to each tensor before weighting.
Note
This script is based on the AllenNLP implementation of ScalarMix: https://github.com/allenai/allennlp/blob/v2.10.0/allennlp/modules/scalar_mix.py
Source code in src/formed/integrations/torch/modules/scalarmix.py
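The mixture formula above can be verified with a small NumPy sketch (layer norm and trainability omitted; this illustrates the math only):

```python
import numpy as np

def scalar_mix(tensors, w, gamma):
    """mixture = gamma * sum_k softmax(w)_k * tensor_k."""
    w = np.asarray(w, dtype=float)
    s = np.exp(w - w.max())
    s /= s.sum()                                   # softmax over the weights
    return gamma * sum(s_k * t for s_k, t in zip(s, tensors))

# Three "layer" tensors valued 0, 1, 2; uniform weights average them to 1.
layers = [np.ones((2, 3)) * i for i in range(3)]
mixed = scalar_mix(layers, w=[0.0, 0.0, 0.0], gamma=2.0)
```

With equal weights `w`, the softmax is uniform and the mixture is just `gamma` times the mean of the layers.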
scalar_parameters
instance-attribute
¶
scalar_parameters = ParameterList(
    [
        Parameter(
            FloatTensor([initial_scalar_parameters[i]]),
            requires_grad=trainable,
        )
        for i in range(mixture_size)
    ]
)
forward
¶
forward(tensors, mask=None)
Compute a weighted average of the tensors. The input tensors can be any shape with at least two dimensions, but must all be the same shape.

When `do_layer_norm=True`, the mask is a required input. If the tensors are dimensioned `(dim_0, ..., dim_{n-1}, dim_n)`, then the mask is dimensioned `(dim_0, ..., dim_{n-1})`, as in the typical case with tensors of shape `(batch_size, timesteps, dim)` and mask of shape `(batch_size, timesteps)`.

When `do_layer_norm=False`, the mask is ignored.
Source code in src/formed/integrations/torch/modules/scalarmix.py
formed.integrations.torch.modules.vectorizers
¶
Sequence vectorization modules for PyTorch models.
This module provides vectorizers that convert variable-length sequences into fixed-size vectors. Vectorizers apply pooling operations over the sequence dimension to produce single vectors per sequence.
Key Components
- `BaseSequenceVectorizer`: Abstract base class for vectorizers
- `BagOfEmbeddingsSequenceVectorizer`: Pools sequence embeddings
Features
- Multiple pooling strategies (mean, max, min, sum, first, last, hier)
- Masked pooling to ignore padding tokens
- Optional normalization before pooling
- Hierarchical pooling with sliding windows
Examples:
>>> from formed.integrations.torch.modules import BagOfEmbeddingsSequenceVectorizer
>>>
>>> # Mean pooling over sequence
>>> vectorizer = BagOfEmbeddingsSequenceVectorizer(pooling="mean")
>>> vector = vectorizer(embeddings, mask=mask)
>>>
>>> # Max pooling with normalization
>>> vectorizer = BagOfEmbeddingsSequenceVectorizer(
... pooling="max",
... normalize=True
... )
BaseSequenceVectorizer
¶
Bases: Module, Registrable, ABC
Abstract base class for sequence vectorizers.
Vectorizers convert variable-length sequences into fixed-size vectors by applying pooling operations over the sequence dimension.
forward
abstractmethod
¶
forward(inputs, *, mask=None)
Vectorize a sequence into a fixed-size vector.
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | Input embeddings of shape `(batch_size, seq_len, embedding_dim)`. |
| `mask` | Optional attention mask of shape `(batch_size, seq_len)`. |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Vectorized output of shape `(batch_size, output_dim)`. |
Source code in src/formed/integrations/torch/modules/vectorizers.py
get_input_dim
abstractmethod
¶
get_input_dim()
Get the expected input dimension.
| RETURNS | DESCRIPTION |
|---|---|
| `Optional[int]` | Input dimension, or `None` if dimension-agnostic. |
Source code in src/formed/integrations/torch/modules/vectorizers.py
get_output_dim
abstractmethod
¶
get_output_dim()
Get the output dimension.
| RETURNS | DESCRIPTION |
|---|---|
| `Union[int, Callable[[int], int]]` | Output feature dimension, or a function mapping input dim to output dim. |
Source code in src/formed/integrations/torch/modules/vectorizers.py
BagOfEmbeddingsSequenceVectorizer
¶
BagOfEmbeddingsSequenceVectorizer(
pooling="mean", normalize=False, window_size=None
)
Bases: BaseSequenceVectorizer
Bag-of-embeddings vectorizer using pooling operations.
This vectorizer applies pooling over the sequence dimension to create fixed-size vectors. Multiple pooling strategies are supported, and padding tokens are properly masked during pooling.
| PARAMETER | DESCRIPTION |
|---|---|
| `pooling` | Pooling strategy to use: `"mean"`, `"max"`, `"min"`, `"sum"`, `"first"`, `"last"`, or `"hier"`, or a list of these. |
| `normalize` | Whether to L2-normalize embeddings before pooling. |
| `window_size` | Window size for hierarchical pooling (required if `pooling="hier"`). |
Examples:
>>> # Mean pooling
>>> vectorizer = BagOfEmbeddingsSequenceVectorizer(pooling="mean")
>>> vector = vectorizer(embeddings, mask=mask)
>>>
>>> # Max pooling with normalization
>>> vectorizer = BagOfEmbeddingsSequenceVectorizer(
... pooling="max",
... normalize=True
... )
>>>
>>> # Multiple pooling methods combined
>>> vectorizer = BagOfEmbeddingsSequenceVectorizer(
... pooling=["mean", "max"]
... )
>>>
>>> # Hierarchical pooling
>>> vectorizer = BagOfEmbeddingsSequenceVectorizer(
... pooling="hier",
... window_size=3
... )
Note
This vectorizer is dimension-agnostic - it preserves the embedding dimension from input to output (multiplied by number of pooling methods).
Source code in src/formed/integrations/torch/modules/vectorizers.py
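Masked mean pooling, the default strategy, can be sketched in NumPy (an illustration of how padding is excluded from the average, not the library's code):

```python
import numpy as np

def masked_mean_pool(embeddings, mask):
    """Mean over the sequence axis, counting only positions where mask == 1.

    embeddings: (batch, seq_len, dim); mask: (batch, seq_len) of 0/1.
    """
    m = mask[:, :, None].astype(float)
    summed = (embeddings * m).sum(axis=1)
    counts = np.clip(m.sum(axis=1), 1.0, None)     # avoid divide-by-zero
    return summed / counts

emb = np.arange(12, dtype=float).reshape(1, 3, 4)  # one sequence of 3 tokens
mask = np.array([[1, 1, 0]])                       # last token is padding
vec = masked_mean_pool(emb, mask)
# Only the first two token vectors contribute to the average.
```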
forward
¶
forward(inputs, *, mask=None)
Vectorize sequence using bag-of-embeddings pooling.
| PARAMETER | DESCRIPTION |
|---|---|
| `inputs` | Input embeddings of shape `(batch_size, seq_len, embedding_dim)`. |
| `mask` | Optional attention mask of shape `(batch_size, seq_len)`. |

| RETURNS | DESCRIPTION |
|---|---|
| `Tensor` | Vectorized output of shape `(batch_size, output_dim)`. If multiple pooling methods are used, `output_dim = input_dim * num_pooling`. |
Source code in src/formed/integrations/torch/modules/vectorizers.py
get_input_dim
¶
get_input_dim()
Get the expected input dimension.
| RETURNS | DESCRIPTION |
|---|---|
| `None` | `None` (dimension-agnostic vectorizer). |
Source code in src/formed/integrations/torch/modules/vectorizers.py
get_output_dim
¶
get_output_dim()
Get the output dimension.
| RETURNS | DESCRIPTION |
|---|---|
| `Callable[[int], int]` | Function mapping input dimension to output dimension: `output_dim = input_dim * number_of_pooling_methods`. |
Source code in src/formed/integrations/torch/modules/vectorizers.py
CnnSequenceVectorizer
¶
CnnSequenceVectorizer(
input_dim,
num_filters,
ngram_filter_sizes=(2, 3, 4, 5),
conv_layer_activation=None,
output_dim=None,
)
Bases: BaseSequenceVectorizer
CNN-based sequence vectorizer using multiple n-gram filters.
This vectorizer applies multiple 1D convolutions with different kernel sizes (n-gram filters) to capture local patterns of different lengths. Max pooling is applied over each filter's output to create a fixed-size representation.
Based on "A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification" by Zhang and Wallace (2016).
| PARAMETER | DESCRIPTION |
|---|---|
| `input_dim` | Input embedding dimension. |
| `num_filters` | Number of filters for each n-gram size. |
| `ngram_filter_sizes` | Tuple of n-gram sizes for convolution filters. Default is `(2, 3, 4, 5)` for bigrams through 5-grams. |
| `conv_layer_activation` | Activation function after convolution. Default is ReLU. |
| `output_dim` | Optional output dimension. If provided, applies a linear projection after concatenating filter outputs. |
Examples:
>>> from formed.integrations.torch.modules.vectorizers import CnnSequenceVectorizer
>>>
>>> # Standard CNN with multiple n-gram filters
>>> vectorizer = CnnSequenceVectorizer(
... input_dim=128,
... num_filters=100,
... ngram_filter_sizes=(2, 3, 4, 5)
... )
>>> # Output dim = 100 * 4 = 400
>>>
>>> # With output projection
>>> vectorizer = CnnSequenceVectorizer(
... input_dim=128,
... num_filters=100,
... ngram_filter_sizes=(3, 4, 5),
... output_dim=256
... )
>>>
>>> # Custom activation
>>> import torch.nn as nn
>>> vectorizer = CnnSequenceVectorizer(
... input_dim=128,
... num_filters=50,
... ngram_filter_sizes=(2, 3),
... conv_layer_activation=nn.Tanh()
... )
Note
- Properly handles padding masks to avoid max-pooling over padding positions
- Output dimension without projection: num_filters * len(ngram_filter_sizes)
- Each filter extracts patterns of a specific n-gram size
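The Note above can be made concrete with a short sketch. This is not the library's implementation (in particular it omits the mask handling the real module performs); it only shows the conv-and-max-pool pattern, with the dimensions from the first example assumed:

```python
import torch
import torch.nn as nn

def cnn_vectorize(inputs: torch.Tensor, convs: nn.ModuleList) -> torch.Tensor:
    # inputs: (batch, seq_len, dim); Conv1d expects (batch, dim, seq_len)
    x = inputs.transpose(1, 2)
    # For each n-gram filter: convolve, activate, then max-pool over time
    pooled = [torch.relu(conv(x)).max(dim=-1).values for conv in convs]
    return torch.cat(pooled, dim=-1)

convs = nn.ModuleList(
    nn.Conv1d(in_channels=128, out_channels=100, kernel_size=k)
    for k in (2, 3, 4, 5)
)
out = cnn_vectorize(torch.randn(8, 20, 128), convs)
# Output dim = num_filters * len(ngram_filter_sizes) = 100 * 4
assert out.shape == (8, 400)
```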
Source code in src/formed/integrations/torch/modules/vectorizers.py, lines 261–294
get_input_dim
¶
get_input_dim()
Get the expected input dimension.
| RETURNS | DESCRIPTION |
|---|---|
int
|
Input embedding dimension. |
Source code in src/formed/integrations/torch/modules/vectorizers.py, lines 296–303
get_output_dim
¶
get_output_dim()
Get the output dimension.
| RETURNS | DESCRIPTION |
|---|---|
int
|
Output vector dimension (num_filters * len(ngram_filter_sizes) or custom output_dim). |
Source code in src/formed/integrations/torch/modules/vectorizers.py, lines 305–312
forward
¶
forward(inputs, *, mask=None)
Vectorize sequence using CNN with multiple n-gram filters.
| PARAMETER | DESCRIPTION |
|---|---|
inputs
|
Input embeddings of shape (batch_size, sequence_length, input_dim).
TYPE: Tensor
|
mask
|
Optional attention mask of shape (batch_size, sequence_length), where True marks valid (non-padding) positions.
TYPE: Tensor | None DEFAULT: None
|
| RETURNS | DESCRIPTION |
|---|---|
Tensor
|
Vectorized output of shape (batch_size, output_dim). |
Source code in src/formed/integrations/torch/modules/vectorizers.py, lines 314–371
SelfAttentiveSequenceVectorizer
¶
SelfAttentiveSequenceVectorizer(
input_dim, num_heads=1, hidden_dims=()
)
Bases: BaseSequenceVectorizer
Self-attentive sequence vectorizer using learned attention weights.
This vectorizer uses learned attention mechanisms to compute weighted averages of sequence embeddings. Multiple attention heads can be used to capture different aspects of the sequence.
Based on "A Structured Self-attentive Sentence Embedding" by Lin et al. (2017).
| PARAMETER | DESCRIPTION |
|---|---|
input_dim
|
Input embedding dimension. Must be divisible by num_heads.
TYPE: int
|
num_heads
|
Number of attention heads. Each head learns different attention patterns.
TYPE: int DEFAULT: 1
|
hidden_dims
|
Hidden dimensions for the attention scoring network. Empty tuple means direct scoring without hidden layers.
TYPE: Sequence[int] DEFAULT: ()
|
Examples:
>>> from formed.integrations.torch.modules.vectorizers import (
... SelfAttentiveSequenceVectorizer
... )
>>>
>>> # Single attention head
>>> vectorizer = SelfAttentiveSequenceVectorizer(
... input_dim=128,
... num_heads=1
... )
>>>
>>> # Multiple attention heads
>>> vectorizer = SelfAttentiveSequenceVectorizer(
... input_dim=128,
... num_heads=4
... )
>>>
>>> # With hidden layers in attention scorer
>>> vectorizer = SelfAttentiveSequenceVectorizer(
... input_dim=128,
... num_heads=2,
... hidden_dims=(64,)
... )
Note
- Each attention head operates on input_dim // num_heads dimensions
- Outputs are concatenated across heads to preserve input dimension
- Properly handles padding masks via masked softmax
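A minimal single-head sketch of the masked-softmax attention pooling described above. This is not the module's implementation: the real vectorizer splits the input across heads and learns the scoring network, whereas here the `scores` tensor stands in for that learned scorer:

```python
import torch

def self_attentive_pool(inputs, scores, mask=None):
    # scores: (batch, seq_len) unnormalized attention logits
    if mask is not None:
        # Masked positions get -inf so softmax assigns them zero weight
        scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)             # (batch, seq_len)
    return (weights.unsqueeze(-1) * inputs).sum(dim=1)  # (batch, dim)

x = torch.randn(2, 5, 8)
mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]], dtype=torch.bool)
pooled = self_attentive_pool(x, torch.randn(2, 5), mask)
assert pooled.shape == (2, 8)  # output dim equals input dim
```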
Source code in src/formed/integrations/torch/modules/vectorizers.py, lines 421–447
get_input_dim
¶
get_input_dim()
Get the expected input dimension.
| RETURNS | DESCRIPTION |
|---|---|
int
|
Input embedding dimension. |
Source code in src/formed/integrations/torch/modules/vectorizers.py, lines 449–456
get_output_dim
¶
get_output_dim()
Get the output dimension.
| RETURNS | DESCRIPTION |
|---|---|
int
|
Output dimension (same as input dimension). |
Source code in src/formed/integrations/torch/modules/vectorizers.py, lines 458–465
forward
¶
forward(inputs, *, mask=None)
Vectorize sequence using self-attention.
| PARAMETER | DESCRIPTION |
|---|---|
inputs
|
Input embeddings of shape (batch_size, sequence_length, input_dim).
TYPE: Tensor
|
mask
|
Optional attention mask of shape (batch_size, sequence_length), where True marks valid (non-padding) positions.
TYPE: Tensor | None DEFAULT: None
|
| RETURNS | DESCRIPTION |
|---|---|
Tensor
|
Vectorized output of shape (batch_size, input_dim). |
Source code in src/formed/integrations/torch/modules/vectorizers.py, lines 467–508
ConcatSequenceVectorizer
¶
ConcatSequenceVectorizer(vectorizers)
Bases: BaseSequenceVectorizer
Concatenates outputs from multiple sequence vectorizers.
Applies multiple vectorizers to the same input sequence and concatenates their outputs along the feature dimension. This allows combining different vectorization strategies (e.g., mean pooling + max pooling + attention).
| PARAMETER | DESCRIPTION |
|---|---|
vectorizers
|
List of vectorizers to apply in parallel. All vectorizers receive the same input sequence.
TYPE: Sequence[BaseSequenceVectorizer]
|
Examples:
>>> from formed.integrations.torch.modules.vectorizers import (
... ConcatSequenceVectorizer,
... BagOfEmbeddingsSequenceVectorizer,
... SelfAttentiveSequenceVectorizer
... )
>>>
>>> # Combine mean pooling and max pooling
>>> vectorizers = [
... BagOfEmbeddingsSequenceVectorizer(pooling="mean"),
... BagOfEmbeddingsSequenceVectorizer(pooling="max"),
... ]
>>> vectorizer = ConcatSequenceVectorizer(vectorizers=vectorizers)
>>>
>>> # Combine pooling and attention
>>> vectorizers = [
... BagOfEmbeddingsSequenceVectorizer(pooling="mean"),
... SelfAttentiveSequenceVectorizer(input_dim=128, num_heads=2),
... ]
>>> vectorizer = ConcatSequenceVectorizer(vectorizers=vectorizers)
Note
- Output dimension is the sum of all vectorizer output dimensions
- Handles both fixed and dynamic output dimensions from vectorizers
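The output-dimension combination rule in the Note can be sketched as follows. `combine_output_dims` is a hypothetical helper, not the module's actual method; it only illustrates how fixed and dynamic dimensions can be summed:

```python
from typing import Callable, List, Union

Dim = Union[int, Callable[[int], int]]

def combine_output_dims(dims: List[Dim]) -> Dim:
    # If every vectorizer reports a fixed dimension, sum them directly;
    # otherwise return a function of the (shared) input dimension.
    if all(isinstance(d, int) for d in dims):
        return sum(d for d in dims if isinstance(d, int))
    return lambda input_dim: sum(
        d if isinstance(d, int) else d(input_dim) for d in dims
    )

assert combine_output_dims([400, 128]) == 528
dynamic = combine_output_dims([400, lambda d: 2 * d])  # e.g. mean+max pooling
assert callable(dynamic) and dynamic(128) == 656
```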
Source code in src/formed/integrations/torch/modules/vectorizers.py, lines 550–552
get_input_dim
¶
get_input_dim()
Get the expected input dimension.
| RETURNS | DESCRIPTION |
|---|---|
int | None
|
First non-None input dimension from vectorizers, or None if all are None. |
Source code in src/formed/integrations/torch/modules/vectorizers.py, lines 554–562
get_output_dim
¶
get_output_dim()
Get the output dimension.
| RETURNS | DESCRIPTION |
|---|---|
int | Callable[[int], int]
|
Sum of all vectorizer output dimensions if all are fixed integers; otherwise a function that computes the sum given an input dimension. |
Source code in src/formed/integrations/torch/modules/vectorizers.py, lines 564–585
forward
¶
forward(inputs, *, mask=None)
Vectorize sequence by concatenating multiple vectorizer outputs.
| PARAMETER | DESCRIPTION |
|---|---|
inputs
|
Input embeddings of shape (batch_size, sequence_length, input_dim).
TYPE: Tensor
|
mask
|
Optional attention mask of shape (batch_size, sequence_length), where True marks valid (non-padding) positions.
TYPE: Tensor | None DEFAULT: None
|
| RETURNS | DESCRIPTION |
|---|---|
Tensor
|
Concatenated vectors of shape (batch_size, total_output_dim). |
Source code in src/formed/integrations/torch/modules/vectorizers.py, lines 587–604
formed.integrations.torch.modules.weighters
¶
Label weighters for classification tasks.
This module provides weighters that assign weights to class labels, useful for handling imbalanced datasets.
Key Components
- BaseLabelWeighter: Abstract base class for label weighters
- StaticLabelWeighter: Uses fixed weights per class
- BalancedByDistributionLabelWeighter: Balances based on class distribution
Examples:
>>> from formed.integrations.torch.modules import StaticLabelWeighter
>>> import torch
>>>
>>> # Static weights for 3 classes
>>> weights = torch.tensor([1.0, 2.0, 3.0]) # Weight rare classes more
>>> weighter = StaticLabelWeighter(weights=weights)
>>>
>>> logits = torch.randn(4, 3)
>>> labels = torch.tensor([0, 1, 2, 0])
>>> class_weights = weighter(logits, labels) # Shape: (1, 3)
BaseLabelWeighter
¶
Bases: Module, Registrable, Generic[_ParamsT], ABC
Abstract base class for label weighters.
A LabelWeighter defines a strategy for assigning weights to each label based on model logits and true targets.
| CLASS TYPE PARAMETER | DESCRIPTION |
|---|---|
_ParamsT
|
Type of additional parameters used during weighting.
|
forward
abstractmethod
¶
forward(logits, targets, params=None)
Compute weights for each target label.
| PARAMETER | DESCRIPTION |
|---|---|
logits
|
Model output logits of shape (batch_size, num_classes).
TYPE: Tensor
|
targets
|
True target labels of shape (batch_size,).
TYPE: Tensor
|
params
|
Optional additional parameters for weighting.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Tensor
|
Weights for each logit of shape (1, num_classes). |
Source code in src/formed/integrations/torch/modules/weighters.py, lines 49–67
StaticLabelWeighter
¶
StaticLabelWeighter(weights)
Bases: BaseLabelWeighter[None]
Label weighter that assigns static weights to each class.
| PARAMETER | DESCRIPTION |
|---|---|
weights
|
A tensor of shape (num_classes,) assigning a weight to each class.
TYPE: Tensor
|
Examples:
>>> # Weight class 1 twice as much as class 0
>>> weights = torch.tensor([1.0, 2.0, 1.0])
>>> weighter = StaticLabelWeighter(weights=weights)
>>> logits = torch.randn(4, 3)
>>> labels = torch.tensor([0, 1, 2, 0])
>>> class_weights = weighter(logits, labels)
Source code in src/formed/integrations/torch/modules/weighters.py, lines 87–90
forward
¶
forward(logits, targets, params=None)
Return static weights.
| PARAMETER | DESCRIPTION |
|---|---|
logits
|
Ignored.
TYPE:
|
targets
|
Ignored.
TYPE:
|
params
|
Ignored.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Tensor
|
Weights of shape (1, num_classes). |
Source code in src/formed/integrations/torch/modules/weighters.py, lines 92–109
BalancedByDistributionLabelWeighter
¶
BalancedByDistributionLabelWeighter(
distribution, eps=1e-08
)
Bases: BaseLabelWeighter[None]
Label weighter that balances classes based on their distribution.
The weight for each class is computed as: 1 / (distribution * num_classes + eps)
| PARAMETER | DESCRIPTION |
|---|---|
distribution
|
A tensor of shape (num_classes,) giving the class distribution (probabilities that sum to 1).
TYPE: Tensor
|
eps
|
A small epsilon value to avoid division by zero.
TYPE: float DEFAULT: 1e-08
|
Examples:
>>> # Class distribution: 50%, 30%, 20%
>>> distribution = torch.tensor([0.5, 0.3, 0.2])
>>> weighter = BalancedByDistributionLabelWeighter(distribution=distribution)
>>> logits = torch.randn(4, 3)
>>> labels = torch.tensor([0, 1, 2, 0])
>>> class_weights = weighter(logits, labels)
>>> # Rare classes get higher weights
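The weighting formula can be checked with a few lines of plain Python. This is a sketch of the arithmetic only, not the library's tensor implementation:

```python
def balanced_weights(distribution, eps=1e-8):
    # weight_c = 1 / (p_c * num_classes + eps): a uniformly distributed
    # class (p_c = 1/num_classes) gets weight ~1, rarer classes get more.
    num_classes = len(distribution)
    return [1.0 / (p * num_classes + eps) for p in distribution]

w = balanced_weights([0.5, 0.3, 0.2])
assert w[2] > w[1] > w[0]          # rarer classes weighted higher
assert abs(w[0] - 1 / 1.5) < 1e-6  # 1 / (0.5 * 3)
```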
Source code in src/formed/integrations/torch/modules/weighters.py, lines 134–138
forward
¶
forward(logits, targets, params=None)
Compute balanced weights.
| PARAMETER | DESCRIPTION |
|---|---|
logits
|
Ignored.
TYPE:
|
targets
|
Ignored.
TYPE:
|
params
|
Ignored.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Tensor
|
Weights of shape (1, num_classes). |
Source code in src/formed/integrations/torch/modules/weighters.py, lines 140–159
formed.integrations.torch.training.callbacks
¶
Training callbacks for monitoring and controlling PyTorch model training.
This module provides a callback system for PyTorch training, allowing custom logic to be executed at various points in the training loop. Callbacks can monitor metrics, save checkpoints, implement early stopping, and integrate with experiment tracking systems.
Key Components
- TorchTrainingCallback: Base class for all callbacks
- EvaluationCallback: Computes metrics using custom evaluators
- EarlyStoppingCallback: Stops training based on metric improvements
- MlflowCallback: Logs metrics to MLflow
Features
- Hook points at training/epoch/batch start and end
- Metric computation and logging
- Model checkpointing
- Early stopping with patience
- MLflow integration
- Extensible for custom callbacks
Examples:
>>> from formed.integrations.torch import (
... TorchTrainer,
... EarlyStoppingCallback,
... EvaluationCallback,
... MlflowCallback
... )
>>>
>>> trainer = TorchTrainer(
... train_dataloader=train_loader,
... val_dataloader=val_loader,
... callbacks=[
... EvaluationCallback(my_evaluator),
... EarlyStoppingCallback(patience=5, metric="-loss"),
... MlflowCallback()
... ]
... )
TorchTrainingCallback
¶
Bases: Registrable
Base class for training callbacks.
Callbacks provide hooks to execute custom logic at various points during training. Subclasses can override any hook method to implement custom behavior such as logging, checkpointing, or early stopping.
Hook execution order
- on_training_start - once at the beginning
- on_epoch_start - at the start of each epoch
- on_batch_start - before each training batch
- on_batch_end - after each training batch
- on_eval_start - before evaluation (returns evaluator)
- on_eval_end - after evaluation with computed metrics
- on_log - when metrics are logged
- on_epoch_end - at the end of each epoch
- on_training_end - once at the end (can modify final state)
Examples:
>>> @TorchTrainingCallback.register("my_callback")
... class MyCallback(TorchTrainingCallback):
... def on_epoch_end(self, trainer, model, state, epoch):
... print(f"Completed epoch {epoch} at step {state.step}")
on_training_start
¶
on_training_start(trainer, model, state)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 86–92
on_training_end
¶
on_training_end(trainer, model, state)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 94–100
on_epoch_start
¶
on_epoch_start(trainer, model, state, epoch)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 102–109
on_epoch_end
¶
on_epoch_end(trainer, model, state, epoch)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 111–118
on_batch_start
¶
on_batch_start(trainer, model, state, epoch)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 120–127
on_batch_end
¶
on_batch_end(trainer, model, state, epoch, output)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 129–137
on_eval_start
¶
on_eval_start(trainer, model, state)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 139–158
on_eval_end
¶
on_eval_end(trainer, model, state, metrics, prefix='')
Source code in src/formed/integrations/torch/training/callbacks.py, lines 160–168
on_log
¶
on_log(trainer, model, state, metrics, prefix='')
Source code in src/formed/integrations/torch/training/callbacks.py, lines 170–178
EvaluationCallback
¶
EvaluationCallback(evaluator)
Bases: TorchTrainingCallback, Generic[ModelInputT, ModelOutputT]
Callback for computing metrics using a custom evaluator.
This callback integrates a custom evaluator into the training loop, resetting it before each evaluation phase and returning it for metric accumulation.
| PARAMETER | DESCRIPTION |
|---|---|
evaluator
|
Evaluator implementing the
TYPE:
|
Examples:
>>> from formed.integrations.ml.metrics import MulticlassAccuracy
>>>
>>> evaluator = MulticlassAccuracy()
>>> callback = EvaluationCallback(evaluator)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 200–201
on_eval_start
¶
on_eval_start(trainer, model, state)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 203–211
on_training_start
¶
on_training_start(trainer, model, state)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 86–92
on_training_end
¶
on_training_end(trainer, model, state)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 94–100
on_epoch_start
¶
on_epoch_start(trainer, model, state, epoch)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 102–109
on_epoch_end
¶
on_epoch_end(trainer, model, state, epoch)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 111–118
on_batch_start
¶
on_batch_start(trainer, model, state, epoch)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 120–127
on_batch_end
¶
on_batch_end(trainer, model, state, epoch, output)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 129–137
on_eval_end
¶
on_eval_end(trainer, model, state, metrics, prefix='')
Source code in src/formed/integrations/torch/training/callbacks.py, lines 160–168
on_log
¶
on_log(trainer, model, state, metrics, prefix='')
Source code in src/formed/integrations/torch/training/callbacks.py, lines 170–178
EarlyStoppingCallback
¶
EarlyStoppingCallback(patience=5, metric='-train/loss')
Bases: TorchTrainingCallback
Callback for early stopping based on metric improvements.
This callback monitors a specified metric and stops training if it doesn't improve for a given number of evaluations (patience). The best model is automatically saved and restored at the end of training.
| PARAMETER | DESCRIPTION |
|---|---|
patience
|
Number of evaluations without improvement before stopping.
TYPE: int DEFAULT: 5
|
metric
|
Metric to monitor. Prefix with - to minimize (e.g., -val/loss) or + to maximize (e.g., +accuracy).
TYPE: str DEFAULT: '-train/loss'
|
Examples:
>>> # Stop if validation loss doesn't improve for 5 evaluations
>>> callback = EarlyStoppingCallback(patience=5, metric="-val/loss")
>>>
>>> # Stop if accuracy doesn't improve for 3 evaluations
>>> callback = EarlyStoppingCallback(patience=3, metric="+accuracy")
Note
The best model is saved to the step working directory and automatically restored when training ends early or completes.
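The sign convention for the metric spec can be sketched as below. `parse_metric` is a hypothetical helper; the library's actual parsing may differ in detail:

```python
def parse_metric(spec: str) -> tuple:
    # "-val/loss" -> monitor val/loss, lower is better
    # "+accuracy" -> monitor accuracy, higher is better
    if spec.startswith(("-", "+")):
        return spec[1:], spec.startswith("+")
    return spec, True  # assumption: higher-is-better when no sign is given

assert parse_metric("-val/loss") == ("val/loss", False)
assert parse_metric("+accuracy") == ("accuracy", True)
```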
Source code in src/formed/integrations/torch/training/callbacks.py, lines 240–249
on_training_start
¶
on_training_start(trainer, model, state)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 259–266
on_eval_end
¶
on_eval_end(trainer, model, state, metrics, prefix='')
Source code in src/formed/integrations/torch/training/callbacks.py, lines 268–315
on_training_end
¶
on_training_end(trainer, model, state)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 317–351
on_epoch_start
¶
on_epoch_start(trainer, model, state, epoch)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 102–109
on_epoch_end
¶
on_epoch_end(trainer, model, state, epoch)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 111–118
on_batch_start
¶
on_batch_start(trainer, model, state, epoch)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 120–127
on_batch_end
¶
on_batch_end(trainer, model, state, epoch, output)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 129–137
on_eval_start
¶
on_eval_start(trainer, model, state)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 139–158
on_log
¶
on_log(trainer, model, state, metrics, prefix='')
Source code in src/formed/integrations/torch/training/callbacks.py, lines 170–178
MlflowCallback
¶
MlflowCallback()
Bases: TorchTrainingCallback
Callback for logging metrics to MLflow.
This callback automatically logs training and validation metrics to MLflow when used within a workflow step that has MLflow tracking enabled.
Examples:
>>> from formed.integrations.torch import TorchTrainer, MlflowCallback
>>>
>>> trainer = TorchTrainer(
... train_dataloader=train_loader,
... val_dataloader=val_loader,
... callbacks=[MlflowCallback()]
... )
Note
Requires the formed mlflow integration and must be used within a workflow step with MLflow tracking configured.
Source code in src/formed/integrations/torch/training/callbacks.py, lines 376–379
on_training_start
¶
on_training_start(trainer, model, state)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 381–398
on_log
¶
on_log(trainer, model, state, metrics, prefix='')
Source code in src/formed/integrations/torch/training/callbacks.py, lines 400–426
on_epoch_end
¶
on_epoch_end(trainer, model, state, epoch)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 428–440
on_training_end
¶
on_training_end(trainer, model, state)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 94–100
on_epoch_start
¶
on_epoch_start(trainer, model, state, epoch)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 102–109
on_batch_start
¶
on_batch_start(trainer, model, state, epoch)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 120–127
on_batch_end
¶
on_batch_end(trainer, model, state, epoch, output)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 129–137
on_eval_start
¶
on_eval_start(trainer, model, state)
Source code in src/formed/integrations/torch/training/callbacks.py, lines 139–158
on_eval_end
¶
on_eval_end(trainer, model, state, metrics, prefix='')
Source code in src/formed/integrations/torch/training/callbacks.py, lines 160–168
formed.integrations.torch.training.engine
¶
Training engine abstractions for PyTorch models.
This module provides the training engine abstraction that defines how models are trained and evaluated. Engines handle loss computation, gradient calculation, and parameter updates.
Key Components
- TorchTrainingEngine: Abstract base class for training engines
- DefaultTorchTrainingEngine: Default implementation with automatic differentiation
Features
- Customizable loss functions
- Automatic gradient computation using PyTorch autograd
- State creation and management
- Separate train and eval steps
- Compatible with TorchTrainer and distributors
Examples:
>>> from formed.integrations.torch import DefaultTorchTrainingEngine
>>>
>>> # Create engine with custom loss accessor
>>> engine = DefaultTorchTrainingEngine(loss="total_loss")
>>>
>>> # Or with custom loss function
>>> def custom_loss(output):
... return output.loss + 0.1 * output.regularization
>>> engine = DefaultTorchTrainingEngine(loss=custom_loss)
TorchTrainingEngine
¶
Bases: ABC, Registrable, Generic[ModelInputT, ModelOutputT, ModelParamsT]
Abstract base class for PyTorch training engines.
A training engine defines how models are trained by implementing state creation, training steps, and evaluation steps. This allows for custom training loops and loss computations.
| CLASS TYPE PARAMETER | DESCRIPTION |
|---|---|
ModelInputT
|
Type of model input.
|
ModelOutputT
|
Type of model output.
|
ModelParamsT
|
Type of additional parameters.
|
create_state
abstractmethod
¶
create_state(trainer, model)
Create initial training state from model and trainer.
| PARAMETER | DESCRIPTION |
|---|---|
trainer
|
Trainer instance.
TYPE:
|
model
|
Model to train.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
TrainState
|
Initial training state. |
Source code in src/formed/integrations/torch/training/engine.py, lines 75–91
train_step
abstractmethod
¶
train_step(inputs, state, trainer)
Execute a single training step.
| PARAMETER | DESCRIPTION |
|---|---|
inputs
|
Batch of training inputs.
TYPE:
|
state
|
Current training state (model and optimizer are updated in-place).
TYPE:
|
trainer
|
Trainer instance.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ModelOutputT
|
Model output. |
Note
This method updates the state in-place for efficiency. The step counter is incremented automatically.
Source code in src/formed/integrations/torch/training/engine.py, lines 93–115
eval_step
abstractmethod
¶
eval_step(inputs, state, trainer)
Execute a single evaluation step.
| PARAMETER | DESCRIPTION |
|---|---|
inputs
|
Batch of evaluation inputs.
TYPE:
|
state
|
Current training state.
TYPE:
|
trainer
|
Trainer instance.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ModelOutputT
|
Model output. |
Source code in src/formed/integrations/torch/training/engine.py, lines 117–135
DefaultTorchTrainingEngine
¶
DefaultTorchTrainingEngine(
optimizer=None,
lr_scheduler=None,
loss="loss",
gradient_accumulation_steps=1,
max_grad_norm=None,
params=None,
dtype=None,
grad_scaler=None,
)
Bases: TorchTrainingEngine[ModelInputT, ModelOutputT, ModelParamsT]
Default training engine using automatic differentiation.
This engine computes gradients using PyTorch's autograd and updates parameters using the provided optimizer. Loss is extracted from model output either by attribute name or custom function.
| PARAMETER | DESCRIPTION |
|---|---|
optimizer
|
Optimizer factory or instance. Can be a Lazy object, callable that takes model parameters, or an optimizer instance.
TYPE:
|
lr_scheduler
|
Optional learning rate scheduler factory or instance. Can be a Lazy object, callable that takes optimizer, a sequence of schedulers (will be chained), or a scheduler instance.
TYPE:
|
loss
|
Loss accessor: either an attribute name on the model output (e.g., "loss") or a callable mapping the model output to a scalar loss tensor.
TYPE: str | Callable DEFAULT: 'loss'
|
gradient_accumulation_steps
|
Number of steps to accumulate gradients before performing an optimizer step.
TYPE: int DEFAULT: 1
|
max_grad_norm
|
Maximum gradient norm for clipping. If None, no clipping is applied.
TYPE: float | None DEFAULT: None
|
params
|
Optional additional parameters to pass to the model during training.
TYPE:
|
dtype
|
Data type for mixed precision training (e.g., "float16" or "bfloat16").
TYPE: str | dtype | None DEFAULT: None
|
grad_scaler
|
Gradient scaler for mixed precision training.
TYPE:
|
Examples:
>>> # Basic usage with optimizer
>>> engine = DefaultTorchTrainingEngine(
... optimizer=torch.optim.Adam,
... loss="loss"
... )
>>>
>>> # With learning rate scheduler and gradient clipping
>>> engine = DefaultTorchTrainingEngine(
... optimizer=Lazy(cls=torch.optim.Adam, config={"lr": 1e-3}),
... lr_scheduler=Lazy(cls=torch.optim.lr_scheduler.CosineAnnealingLR, config={"T_max": 100}),
... max_grad_norm=1.0,
... loss=lambda output: output.loss + 0.01 * output.regularization
... )
>>>
>>> # With mixed precision training
>>> engine = DefaultTorchTrainingEngine(
... optimizer=torch.optim.AdamW,
... dtype="bfloat16",
... grad_scaler=Lazy(cls=torch.amp.GradScaler),
... max_grad_norm=1.0
... )
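A generic sketch of what gradient_accumulation_steps and max_grad_norm imply for the update loop. This is not the engine's actual code (which also handles mixed precision, schedulers, and state management); the model, data, and accumulation count are placeholder choices:

```python
import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
accum, optimizer_steps = 4, 0

for step, batch in enumerate(torch.randn(8, 2, 4)):  # 8 micro-batches
    loss = model(batch).pow(2).mean()
    (loss / accum).backward()  # scale so accumulated gradients average out
    if (step + 1) % accum == 0:
        # Clip only at the real optimizer step (mirrors max_grad_norm)
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        opt.step()
        opt.zero_grad()
        optimizer_steps += 1

assert optimizer_steps == 2  # one optimizer step per 4 micro-batches
```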
Source code in src/formed/integrations/torch/training/engine.py, lines 196–219
create_state
¶
create_state(trainer, model)
Source code in src/formed/integrations/torch/training/engine.py, lines 221–273
train_step
¶
train_step(inputs, state, trainer)
Source code in src/formed/integrations/torch/training/engine.py, lines 275–349
eval_step
¶
eval_step(inputs, state, trainer)
Source code in src/formed/integrations/torch/training/engine.py, lines 351–366
get_default_optimizer_factory
¶
get_default_optimizer_factory()
Get a default optimizer factory (Adam with lr=1e-3).
Source code in src/formed/integrations/torch/training/engine.py, lines 138–140
get_default_lr_scheduler_factory
¶
get_default_lr_scheduler_factory()
Get a default learning rate scheduler factory (None).
Source code in src/formed/integrations/torch/training/engine.py, lines 143–145
formed.integrations.torch.training.exceptions
¶
StopEarly
¶
Bases: Exception
Raised to stop training early.
formed.integrations.torch.training.state
¶
Training state management for PyTorch models.
This module provides a training state class that encapsulates model parameters, optimizer state, and training progress for PyTorch models.
TrainState
¶
TrainState(
model,
optimizer,
step=0,
lr_scheduler=None,
grad_scaler=None,
)
Training state for PyTorch models.
This class encapsulates the training state including model, optimizer, learning rate scheduler, and training progress counters. Unlike the Flax version, this directly holds references to the model and optimizer for efficiency.
| ATTRIBUTE | DESCRIPTION |
|---|---|
model |
The PyTorch model being trained.
|
optimizer |
The optimizer for training.
|
lr_scheduler |
Optional learning rate scheduler.
|
step |
Training step counter.
|
grad_scaler |
Optional gradient scaler for mixed precision training.
|
Examples:
>>> # Create state from model and optimizer
>>> state = TrainState(
... model=model,
... optimizer=optimizer,
... step=0
... )
>>>
>>> # Access model and optimizer directly
>>> state.model.train()
>>> state.optimizer.zero_grad()
Source code in src/formed/integrations/torch/training/state.py, lines 45–57
state_dict
¶
state_dict()
Get state dictionary for serialization.
| RETURNS | DESCRIPTION |
|---|---|
dict[str, Any]
|
Dictionary containing model state, optimizer state, lr_scheduler state (optional), grad_scaler state (optional), and step counter. |
Source code in src/formed/integrations/torch/training/state.py, lines 59–75
load_state_dict
¶
load_state_dict(state_dict)
Load state from dictionary.
| PARAMETER | DESCRIPTION |
|---|---|
state_dict
|
Dictionary containing model state, optimizer state, lr_scheduler state (optional), grad_scaler state (optional), and step.
TYPE:
|
Source code in src/formed/integrations/torch/training/state.py, lines 77–90
get_learning_rate
¶
get_learning_rate()
Get current learning rate from optimizer.
| RETURNS | DESCRIPTION |
|---|---|
Optional[float]
|
Current learning rate from the first parameter group, or None if unavailable. |
Examples:
>>> lr = state.get_learning_rate()
>>> if lr is not None:
... print(f"Current learning rate: {lr}")
Source code in src/formed/integrations/torch/training/state.py, lines 92–109
get_gradient_norm
¶
get_gradient_norm()
Compute L2 norm of all gradients.
| RETURNS | DESCRIPTION |
|---|---|
Optional[float]
|
L2 norm of all parameter gradients, or None if no gradients are available. |
Examples:
>>> grad_norm = state.get_gradient_norm()
>>> if grad_norm is not None:
... print(f"Gradient norm: {grad_norm:.4f}")
Note
This method computes the gradient norm on-demand. It should be called
after backward() but before optimizer.step() or zero_grad() to get
meaningful results.
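Conceptually, the norm is the square root of the summed squares over all gradient entries. A plain-Python sketch, with nested lists standing in for the parameter gradient tensors the library actually iterates over:

```python
import math

def l2_norm(grads):
    # Overall L2 norm across all gradient "tensors":
    # sqrt of the sum of squared entries.
    return math.sqrt(sum(v * v for g in grads for v in g))

# A 3-4-5 triangle split across two "parameters":
assert abs(l2_norm([[3.0], [4.0]]) - 5.0) < 1e-9
```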
Source code in src/formed/integrations/torch/training/state.py, lines 111–140
formed.integrations.torch.training.trainer
¶
High-level trainer for PyTorch models.
This module provides the TorchTrainer class, which orchestrates the complete training process for PyTorch models including data loading, optimization, evaluation, callbacks, and distributed training.
Key Features
- Flexible training loop with epoch and step-based logging/evaluation
- Support for callbacks at various training stages
- Distributed training via data parallelism
- Integration with PyTorch optimizers
- Rich progress bars with training metrics
- Early stopping and checkpointing
- MLflow integration
Examples:
>>> from formed.integrations.torch import (
... TorchTrainer,
... EvaluationCallback,
... EarlyStoppingCallback
... )
>>> from formed.integrations.ml import DataLoader, BasicBatchSampler
>>> import torch.optim as optim
>>>
>>> # Setup data loaders and engine
>>> train_dataloader = DataLoader(
... sampler=BasicBatchSampler(batch_size=32, shuffle=True),
... collator=datamodule.batch
... )
>>> engine = DefaultTorchTrainingEngine(
... optimizer=optim.Adam,
... lr_scheduler=optim.lr_scheduler.StepLR
... )
>>>
>>> # Create trainer
>>> trainer = TorchTrainer(
... train_dataloader=train_dataloader,
... val_dataloader=val_dataloader,
... engine=engine,
... max_epochs=10,
... callbacks=[
... EvaluationCallback(my_evaluator),
... EarlyStoppingCallback(patience=3)
... ]
... )
>>>
>>> # Train model
>>> state = trainer.train(model, train_dataset, val_dataset)
TorchTrainer
¶
TorchTrainer(
*,
train_dataloader,
val_dataloader=None,
engine=None,
callbacks=(),
distributor=None,
max_epochs=None,
eval_strategy="epoch",
eval_interval=1,
logging_strategy="epoch",
logging_interval=1,
logging_first_step=True,
train_prefix="train/",
val_prefix="val/",
)
Bases: Generic[ItemT, ModelInputT, ModelOutputT, ModelParamsT]
High-level trainer for PyTorch models.
TorchTrainer provides a complete training loop with support for distributed training, callbacks, evaluation, and metric logging. It handles the coordination of data loading, model training, evaluation, and callback execution.
| CLASS TYPE PARAMETER | DESCRIPTION |
|---|---|
| ItemT | Type of raw dataset items. |
| ModelInputT | Type of batched model inputs. |
| ModelOutputT | Type of model outputs. |
| ModelParamsT | Type of additional model parameters. |
| PARAMETER | DESCRIPTION |
|---|---|
| train_dataloader | Data loader for training dataset. |
| val_dataloader | Optional data loader for validation dataset. |
| engine | Training engine (defaults to `DefaultTorchTrainingEngine`). |
| callbacks | Sequence of training callbacks. |
| distributor | Device distributor (defaults to a single-device distributor). |
| max_epochs | Maximum number of training epochs. |
| eval_strategy | When to evaluate: `"epoch"` or `"step"`. |
| eval_interval | Evaluation interval (epochs or steps). |
| logging_strategy | When to log: `"epoch"` or `"step"`. |
| logging_interval | Logging interval (epochs or steps). |
| logging_first_step | Whether to log after the first training step. |
| train_prefix | Prefix for training metrics logging. Default is `"train/"`. |
| val_prefix | Prefix for validation metrics logging. Default is `"val/"`. |
Examples:
>>> engine = DefaultTorchTrainingEngine(
... optimizer=torch.optim.Adam,
... lr_scheduler=torch.optim.lr_scheduler.StepLR
... )
>>> trainer = TorchTrainer(
... train_dataloader=train_loader,
... val_dataloader=val_loader,
... engine=engine,
... max_epochs=10,
... eval_strategy="epoch",
... logging_strategy="step",
... logging_interval=100
... )
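The `logging_strategy` / `logging_interval` / `logging_first_step` semantics described above can be sketched as a simple predicate (an illustration of the documented behavior with hypothetical names, not the trainer's actual implementation):

```python
def should_log(step, *, logging_strategy="epoch", logging_interval=1,
               logging_first_step=True, epoch_end=False):
    # Illustrative predicate: step-based logging fires every
    # `logging_interval` steps; epoch-based logging fires at epoch
    # boundaries (the epoch-interval check is elided for brevity).
    if logging_first_step and step == 1:
        return True
    if logging_strategy == "step":
        return step % logging_interval == 0
    return epoch_end

# With step-based logging every 3 steps, the first step also logs.
print([s for s in range(1, 10)
       if should_log(s, logging_strategy="step", logging_interval=3)])
# [1, 3, 6, 9]
```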
Source code in src/formed/integrations/torch/training/trainer.py
train
¶
train(model, train_dataset, val_dataset=None, state=None)
Train a model on the provided datasets.
| PARAMETER | DESCRIPTION |
|---|---|
| model | Model to train. |
| train_dataset | Sequence of training items. |
| val_dataset | Optional sequence of validation items. |
| state | Optional pre-initialized training state (for resuming). |
| RETURNS | DESCRIPTION |
|---|---|
| TrainState | Final training state with trained parameters. |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If |
Examples:
>>> state = trainer.train(
... model, train_items, val_items
... )
>>> # Load trained parameters
>>> model.load_state_dict(state.model_state)
Source code in src/formed/integrations/torch/training/trainer.py
get_default_max_epochs
¶
get_default_max_epochs()
Get a default maximum number of training epochs.
Source code in src/formed/integrations/torch/training/trainer.py
get_default_distributor
¶
get_default_distributor()
Get a default single-device distributor.
Source code in src/formed/integrations/torch/training/trainer.py
formed.integrations.torch.workflow
¶
Workflow integration for PyTorch model training.
This module provides workflow steps for training PyTorch models, allowing them to be integrated into the formed workflow system with automatic caching and dependency tracking.
Available Steps
- `torch::train`: Train a PyTorch model using the provided trainer.
- `torch::evaluate`: Evaluate a PyTorch model on a dataset.
- `torch::predict`: Generate predictions on a dataset using a PyTorch model.
- `torch::predict_without_caching`: Generate predictions without caching (same as `torch::predict` but uncached).
Examples:
>>> from formed.integrations.torch import train_torch_model
>>>
>>> # In workflow configuration (jsonnet):
>>> # {
>>> # steps: {
>>> # train: {
>>> # type: "torch::train",
>>> # model: { type: "my_model", ... },
>>> # trainer: { type: "torch_trainer", ... },
>>> # train_dataset: { type: "ref", ref: "preprocess" },
>>> # random_seed: 42
>>> # }
>>> # }
>>> # }
TorchModelFormat
¶
Bases: Format[_ModelT]
identifier
property
¶
identifier
Get the unique identifier for this format.
| RETURNS | DESCRIPTION |
|---|---|
| str | Format identifier string. |
write
¶
write(artifact, directory)
Source code in src/formed/integrations/torch/workflow.py
read
¶
read(directory)
Source code in src/formed/integrations/torch/workflow.py
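The `write`/`read` pair above forms a directory-based serialization contract: `write` persists the artifact into a directory, `read` reconstructs it. A framework-free sketch of that contract, with a hypothetical JSON file layout standing in for the real torch serialization:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical format: persist a dict of weights as JSON in the
# artifact directory, mirroring the write/read contract above.
def write(artifact: dict, directory: Path) -> None:
    (directory / "weights.json").write_text(json.dumps(artifact))

def read(directory: Path) -> dict:
    return json.loads((directory / "weights.json").read_text())

# Round-trip through a temporary artifact directory.
with tempfile.TemporaryDirectory() as d:
    write({"w": 0.5}, Path(d))
    restored = read(Path(d))
print(restored)  # {'w': 0.5}
```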
is_default_of
classmethod
¶
is_default_of(obj)
Check if this format is the default for the given object type.
| PARAMETER | DESCRIPTION |
|---|---|
| obj | Object to check. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if this format should be used by default for this type. |
Source code in src/formed/workflow/format.py
train_torch_model
¶
train_torch_model(
model,
trainer,
train_dataset,
val_dataset=None,
random_seed=0,
)
Train a PyTorch model using the provided trainer.
This workflow step trains a PyTorch model on the provided datasets, returning the trained model. The training process is cached based on the model architecture, trainer configuration, and dataset fingerprints.
| PARAMETER | DESCRIPTION |
|---|---|
| model | PyTorch model to train. |
| trainer | Trainer configuration with dataloaders and callbacks. |
| train_dataset | Training dataset items. |
| val_dataset | Optional validation dataset items. |
| random_seed | Random seed for reproducibility. Default is 0. |

| RETURNS | DESCRIPTION |
|---|---|
| BaseTorchModel | Trained PyTorch model with updated parameters. |
Examples:
>>> # Use in Python code
>>> trained_model = train_torch_model(
... model=my_model,
... trainer=trainer,
... train_dataset=train_data,
... val_dataset=val_data,
... random_seed=42
... )
Source code in src/formed/integrations/torch/workflow.py
evaluate_torch_model
¶
evaluate_torch_model(
model,
evaluator,
dataset,
dataloader,
params=None,
random_seed=None,
device=None,
)
Evaluate a PyTorch model on a dataset using the provided evaluator.
This workflow step evaluates a PyTorch model on the provided dataset, computing metrics using the evaluator. Evaluation is performed in evaluation mode (no gradient computation).
| PARAMETER | DESCRIPTION |
|---|---|
| model | PyTorch model to evaluate. |
| evaluator | Evaluator to compute metrics. |
| dataset | Dataset items for evaluation. |
| dataloader | DataLoader to convert items to model inputs. |
| params | Optional model parameters to use for evaluation. |
| random_seed | Optional random seed for reproducibility. |
| device | Optional device (e.g., `"cuda:0"` or `"cpu"`). |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, float] | Dictionary of computed evaluation metrics. |
Examples:
>>> # Use in Python code
>>> metrics = evaluate_torch_model(
... model=trained_model,
... evaluator=my_evaluator,
... dataset=test_data,
... dataloader=test_loader
... )
Source code in src/formed/integrations/torch/workflow.py
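Evaluation streams batches through the model and aggregates metrics across them before reporting a single `dict[str, float]`. A minimal framework-free stand-in for that aggregation (hypothetical predictions and labels, simple accuracy):

```python
# Hypothetical (predictions, labels) pairs per batch, aggregated
# across batches the way an evaluator would.
batches = [([1, 0, 1], [1, 1, 1]), ([0, 0], [0, 1])]

correct = total = 0
for preds, labels in batches:
    # Count exact matches batch by batch; divide only at the end
    # so uneven batch sizes are weighted correctly.
    correct += sum(p == l for p, l in zip(preds, labels))
    total += len(labels)

metrics = {"accuracy": correct / total}
print(metrics)  # {'accuracy': 0.6}
```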
predict
¶
predict(
dataset,
dataloader,
model,
postprocessor,
params=None,
device=None,
random_seed=None,
)
Generate predictions on a dataset using a PyTorch model.
This step applies a model to a dataset and postprocesses the outputs to generate final predictions.
| PARAMETER | DESCRIPTION |
|---|---|
| dataset | Dataset items for prediction. |
| dataloader | DataLoader to convert items to model inputs. |
| model | PyTorch model to use for prediction. |
| postprocessor | Function to convert model outputs to final results. |
| params | Optional model parameters to use for prediction. |
| device | Optional device (e.g., `"cuda:0"` or `"cpu"`). |
| random_seed | Optional random seed for reproducibility. |

| RETURNS | DESCRIPTION |
|---|---|
| Iterator[_ResultT] | Iterator of prediction results. |
Source code in src/formed/integrations/torch/workflow.py
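Because the step returns an iterator, predictions are produced lazily: each batch is run through the model and its outputs are postprocessed as they are consumed. A toy sketch of that streaming shape (all names here are hypothetical stand-ins, not the formed API):

```python
# Hypothetical stand-ins: a "model" mapping inputs to raw scores and
# a postprocessor mapping scores to labels, streamed lazily.
def model(batch):
    return [x * 2 for x in batch]

def postprocess(output):
    return "pos" if output > 2 else "neg"

def predict(batches):
    # Generator: nothing runs until the caller iterates, so large
    # datasets never need all predictions in memory at once.
    for batch in batches:
        for out in model(batch):
            yield postprocess(out)

print(list(predict([[0, 2], [3]])))  # ['neg', 'pos', 'pos']
```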