unaiverse.modules.hl.hl_utils
What this module does
Implements Hamiltonian Learning primitives: the HL optimizer class plus a library of helpers for Euler integration, state/costate initialization, and structure-preserving tensor operations (copy, zero, detach, apply) over nested parameter dicts.
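As a rough illustration of what "structure-preserving" means here, the sketch below applies a function to every leaf of a nested dict and returns a dict with the same keys and nesting. The name `map_nested` is hypothetical and is not taken from this module, and plain floats stand in for tensors.

```python
def map_nested(fn, tree):
    """Hypothetical helper: apply fn to every leaf, keeping the dict layout."""
    if isinstance(tree, dict):
        return {key: map_nested(fn, value) for key, value in tree.items()}
    return fn(tree)


# Plain floats stand in for parameter tensors.
params = {"layer1": {"weight": 2.0, "bias": -1.0}, "scale": 3.0}

zeros = map_nested(lambda leaf: 0.0, params)          # a "zero" operation
doubled = map_nested(lambda leaf: 2.0 * leaf, params)  # an "apply" operation
```

The input dict is left untouched; each call builds a new dict with the same layout.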
hl_utils
A Collectionless AI Project (https://collectionless.ai)
Registration/Login: https://unaiverse.io
Code Repositories: https://github.com/collectionlessai/
Main Developers: Stefano Melacci (Project Leader), Christian Di Maio, Tommaso Guidi
HL
HL(models: Module | Iterable[Dict[str, Module | Any]], *, gamma=1.0, flip=-1.0, theta=0.1, beta=1.0, reset_neuron_costate=False, reset_weight_costate=False, local=True)
Hamiltonian Learning (HL) optimizer for UNaIVERSE neural network modules.
HL implements the Hamiltonian Learning update rule, a biologically motivated
optimization algorithm based on Hamiltonian mechanics. Rather than following
standard back-propagation with a gradient-descent rule, it jointly evolves the
network state (xi) and a momentum-like co-state (p) according to Hamilton's
equations of motion, using an Euler discretization with step size delta (read
from the model).
Each model is treated as an independent parameter group (analogous to parameter
groups in torch.optim.Optimizer). Hyperparameters such as gamma, flip,
theta, and beta can be set globally (as keyword arguments) or overridden
per group by providing a list of dicts. This mirrors the PyTorch optimizer API so
that HL can be used as a drop-in optimizer-like object.
The optimizer distinguishes between two variants:
- Local HL (local=True): weights are updated using the co-state from the previous step, then the co-state is updated. This is a symmetric, Jacobi-style update.
- Non-local HL (local=False): the co-state is updated first (using the current gradient), and then that updated co-state is used to update the weights. This is an asymmetric, Gauss-Seidel-style update.
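The difference between the two orderings can be sketched on a single scalar weight. The update rules below are simplified placeholders chosen only to show which co-state value the weight sees. They are not the library's actual equations, and `toy_step` is a hypothetical name.

```python
def toy_step(w, p, grad, delta, beta, local):
    """One toy update of a scalar weight w and its co-state p.

    Placeholder rules (assumptions, for illustration only):
        p_new = p + delta * grad
        w_new = w + delta * beta * (co-state value in use)
    """
    if local:
        # Jacobi-style: the weight sees the co-state from the previous step.
        w_new = w + delta * beta * p
        p_new = p + delta * grad
    else:
        # Gauss-Seidel-style: refresh the co-state first, then use it.
        p_new = p + delta * grad
        w_new = w + delta * beta * p_new
    return w_new, p_new


w_local, p_local = toy_step(1.0, 0.0, grad=-2.0, delta=0.5, beta=1.0, local=True)
w_nonlocal, p_nonlocal = toy_step(1.0, 0.0, grad=-2.0, delta=0.5, beta=1.0, local=False)
```

Starting from a zero co-state, the local variant leaves the weight unchanged on the first step, while the non-local variant already moves it. The co-state ends up the same in both cases.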
Attributes:
| Name | Type | Description |
|---|---|---|
| param_groups | | List of parameter-group dictionaries, one per model, each containing all hyperparameters together with a |
| state | | List of optimizer-state dictionaries, one per parameter group. Each dictionary contains |
Examples:
Basic usage with a single model:
>>> import torch
>>> from unaiverse.modules.hl.hl_utils import HL
>>> model = MyHLModule() # an nn.Module exposing .h, .dh, .delta, .h_init
>>> optimizer = HL(model, gamma=1.0, flip=-1.0, theta=0.1, beta=1.0)
>>> for x, y in dataloader:
... optimizer.zero_grad()
... loss = model(x, y)
... ham = optimizer.compute_hamiltonian(loss)
... ham.backward()
... optimizer.step()
Multiple models with per-group hyperparameters:
>>> optimizer = HL([
... {'params': model_a, 'gamma': 0.5, 'beta': 0.8},
... {'params': model_b, 'gamma': 1.0, 'beta': 1.0},
... ])
Initialize the HL optimizer with one or more parameter groups.
When models is a single nn.Module, it is automatically wrapped into a
one-element parameter group using the keyword-argument defaults. When models
is an iterable of dicts, each dict must contain a "params" key holding the
model, and any hyperparameter keys present in the dict override the keyword-argument
defaults for that group. The optimizer state (co-state tensors) for every group
is initialized to zero via _init_state_and_costate.
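The group-normalization behavior described above might look roughly like the following. This is a minimal sketch, not the class's actual constructor: `build_param_groups` is a hypothetical name, the list-or-tuple test is an assumption, and plain objects stand in for nn.Module instances.

```python
# Keyword-argument defaults copied from the HL signature.
DEFAULTS = {
    "gamma": 1.0, "flip": -1.0, "theta": 0.1, "beta": 1.0,
    "reset_neuron_costate": False, "reset_weight_costate": False,
    "local": True,
}


def build_param_groups(models, **overrides):
    """Hypothetical helper: normalize `models` into a list of group dicts."""
    base = {**DEFAULTS, **overrides}
    # Assumption: anything that is not a list or tuple is a single model.
    if not isinstance(models, (list, tuple)):
        models = [{"params": models}]
    groups = []
    for group in models:
        # Assumed check; the source documents an AssertionError, but its
        # exact condition is not shown.
        assert isinstance(group, dict) and "params" in group
        groups.append({**base, **group})  # per-group keys win over defaults
    return groups


model_a, model_b = object(), object()  # stand-ins for nn.Module instances
single = build_param_groups(model_a, theta=0.2)
multi = build_param_groups([
    {"params": model_a, "gamma": 0.5, "beta": 0.8},
    {"params": model_b},
])
```

Merging with `{**base, **group}` is what lets a key in the group dict override the keyword-argument default for that group only.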
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| models | Module \| Iterable[Dict[str, Module \| Any]] | Either a single | required |
| gamma | | Scaling factor applied to the potential term inside the Hamiltonian. Defaults to 1.0. | 1.0 |
| flip | | Sign of the Hamiltonian flow (-1 for standard gradient descent direction, +1 for ascent). Defaults to -1.0. | -1.0 |
| theta | | Weight-decay-like regularization coefficient applied to the co-state update. Defaults to 0.1. | 0.1 |
| beta | | Step-size scaling factor for the weight update from the co-state. Defaults to 1.0. | 1.0 |
| reset_neuron_costate | | If True, the neuron co-state ( | False |
| reset_weight_costate | | If True, the weight co-state ( | False |
| local | | If True, use the local (Jacobi-style) HL update where weights are updated with the previous co-state before the co-state is refreshed. If False, the co-state is updated first (Gauss-Seidel style). Defaults to True. | True |
Raises:
| Type | Description |
|---|---|
| AssertionError | If any element of |
Source code in unaiverse/modules/hl/hl_utils.py
state
instance-attribute
step
Perform one Hamiltonian Learning update step for all parameter groups.
For each parameter group, the method reads the current model state (model.h)
and its gradient (obtained via _get_grad), then jointly evolves the neuron
co-state and the weight co-state using an Euler integration step of size
model.delta. Weight parameters are then updated from the (possibly
already-updated) co-state using a second Euler step scaled by beta.
The update order depends on the local flag:
- Local (local=True): weights are updated using the co-state from the previous step; the co-state is updated after the weight update.
- Non-local (local=False): the co-state is updated before the weight update, so weights see the freshly computed co-state.
The neuron co-state p['xi'] is always updated with flip * theta decay
applied. Weight decay for the weight update itself is disabled (decay=None).
Note
This method is decorated with @torch.no_grad() and therefore does not
accumulate gradients during the parameter update. Call zero_grad before
the forward pass and compute_hamiltonian(...).backward() before calling
step.
compute_hamiltonian
Compute the total Hamiltonian across all parameter groups.
The Hamiltonian for each group is defined as::
H_i = gamma_i * V_i + <dh_i, p_xi_i>
where V_i is the i-th potential term (typically a task loss or energy
function), dh_i is the time derivative of the neuron state (model.dh),
and p_xi_i is the neuron co-state. The kinetic term is the real part of the
dot product between dh and p['xi'], both flattened to 1-D vectors. The
contributions from all groups are summed into a single scalar tensor.
The returned tensor is suitable for calling .backward() on, which will
populate .grad fields used by step.
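The arithmetic of the formula above can be checked with a framework-free sketch. It uses plain Python lists in place of flattened tensors, so it does not support `.backward()`. The function name `toy_hamiltonian`, the length check, and the use of an unconjugated product in the dot product are all assumptions for illustration.

```python
def toy_hamiltonian(potentials, gammas, dhs, costates):
    """Sum gamma_i * V_i + Re(<dh_i, p_xi_i>) over all groups.

    Each dh_i and p_xi_i is a flat list of numbers, standing in for a
    flattened tensor.
    """
    # Assumed check: one potential term per parameter group.
    assert len(potentials) == len(gammas) == len(dhs) == len(costates)
    total = 0.0
    for v, gamma, dh, p_xi in zip(potentials, gammas, dhs, costates):
        # Assumption: plain (unconjugated) products, then the real part.
        kinetic = complex(sum(a * b for a, b in zip(dh, p_xi))).real
        total += gamma * v + kinetic
    return total


ham = toy_hamiltonian(
    potentials=[2.0, 1.0],
    gammas=[0.5, 1.0],
    dhs=[[1.0, 2.0], [0.5]],
    costates=[[3.0, -1.0], [4.0]],
)
# Group 1: 0.5 * 2.0 + (1.0 * 3.0 + 2.0 * -1.0) = 2.0
# Group 2: 1.0 * 1.0 + (0.5 * 4.0)             = 3.0
```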
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *potential_terms | Tensor | One scalar | () |
Returns:
| Type | Description |
|---|---|
| Tensor | A scalar |
| Tensor | over all parameter groups. |
Raises:
| Type | Description |
|---|---|
| AssertionError | If the number of |
zero_grad
Zero the gradients of all model parameters and optionally reset co-states.
For each parameter group, the gradient of the neuron state tensor (model.h)
and the gradients of all learnable parameters are zeroed using _zero_grad.
If reset_neuron_costate is True for a group, the neuron co-state (p['xi'])
is also zeroed in place. Likewise, if reset_weight_costate is True, the weight
co-state (p['w']) is zeroed in place.
This method should be called at the beginning of each training iteration, before the forward pass, to prevent gradient accumulation across steps.
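The reset logic can be sketched with plain lists standing in for gradient and co-state tensors. Both function names are hypothetical, and the flat `costate["w"]` list is a simplification, since the real weight co-state may be a nested structure.

```python
def zero_in_place(values):
    """Overwrite every entry of a list with 0.0 without rebinding it."""
    for i in range(len(values)):
        values[i] = 0.0


def toy_zero_grad(grads, costate, reset_neuron_costate=False,
                  reset_weight_costate=False):
    """Hypothetical stand-in for the per-group work done by zero_grad."""
    for grad in grads.values():
        zero_in_place(grad)           # gradients are always cleared
    if reset_neuron_costate:
        zero_in_place(costate["xi"])  # optional neuron co-state reset
    if reset_weight_costate:
        zero_in_place(costate["w"])   # optional weight co-state reset


grads = {"h": [0.3, -0.1], "weight": [1.5]}
costate = {"xi": [2.0, 2.0], "w": [-4.0]}
xi_before = costate["xi"]
toy_zero_grad(grads, costate, reset_neuron_costate=True)
```

Because the reset happens in place, any other reference to the same co-state object sees the zeroed values. In this run the weight co-state is left untouched, since its flag is False.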
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| set_to_none | bool | If True, gradient tensors are set to | False |