
unaiverse.modules.hl.hl_utils

What this module does

Implements Hamiltonian Learning primitives: the HL optimizer class plus a library of helpers for Euler integration, state/costate initialization, and structure-preserving tensor operations (copy, zero, detach, apply) over nested parameter dicts.

hl_utils

A Collectionless AI Project (https://collectionless.ai)
Registration/Login: https://unaiverse.io
Code Repositories: https://github.com/collectionlessai/
Main Developers: Stefano Melacci (Project Leader), Christian Di Maio, Tommaso Guidi

HL

HL(models: Module | Iterable[Dict[str, Module | Any]], *, gamma=1.0, flip=-1.0, theta=0.1, beta=1.0, reset_neuron_costate=False, reset_weight_costate=False, local=True)

Hamiltonian Learning (HL) optimizer for UNaIVERSE neural network modules.

HL implements the Hamiltonian Learning update rule, a biologically motivated optimization algorithm based on Hamiltonian mechanics. Rather than following standard back-propagation with a gradient-descent rule, it jointly evolves the network state (xi) and a momentum-like co-state (p) according to Hamilton's equations of motion, using an Euler discretization with step size delta (read from the model).

Each model is treated as an independent parameter group (analogous to parameter groups in torch.optim.Optimizer). Hyperparameters such as gamma, flip, theta, and beta can be set globally (as keyword arguments) or overridden per group by providing a list of dicts. This mirrors the PyTorch optimizer API so that HL can be used as a drop-in optimizer-like object.

The optimizer distinguishes between two variants:

  • Local HL (local=True): weights are updated using the co-state from the previous step, then the co-state is updated. This is a symmetric, Jacobi-style update.
  • Non-local HL (local=False): the co-state is updated first (using the current gradient), and then that updated co-state is used to update the weights. This is an asymmetric, Gauss-Seidel-style update.
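The difference between the two variants can be seen on a single scalar weight. The sketch below is illustrative only: `euler`, `local_update`, and `non_local_update` are hypothetical names, the Euler helper omits the decay term, and the numeric values are made up. The step sizes `-delta * beta` and `-delta * flip` follow the source listing of `step`.

```python
def euler(x, dx, step_size):
    """Plain explicit Euler update (hypothetical helper, no decay term)."""
    return x + step_size * dx


def local_update(w, p, grad, delta, beta=1.0, flip=-1.0):
    # Jacobi-style: the weight step reads the co-state from the previous step.
    w_new = euler(w, p, -delta * beta)
    p_new = euler(p, grad, -delta * flip)
    return w_new, p_new


def non_local_update(w, p, grad, delta, beta=1.0, flip=-1.0):
    # Gauss-Seidel-style: the co-state is refreshed first, then drives the weight step.
    p_new = euler(p, grad, -delta * flip)
    w_new = euler(w, p_new, -delta * beta)
    return w_new, p_new


# Made-up values: weight, weight co-state, gradient, step size.
w, p, grad, delta = 1.0, 0.5, 2.0, 0.1
print("local:    ", local_update(w, p, grad, delta))
print("non-local:", non_local_update(w, p, grad, delta))
```

Both variants end with the same co-state; only the weight differs, because the non-local variant already sees the effect of the current gradient.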

Attributes:

Name Type Description
param_groups

List of parameter-group dictionaries, one per model, each containing all hyperparameters together with a "params" key that holds the associated nn.Module.

state

List of optimizer-state dictionaries, one per parameter group. Each dictionary contains "x" (state: xi and weights w) and "p" (co-state: xi and weights w).

Examples:

Basic usage with a single model:

>>> import torch
>>> from unaiverse.modules.hl.hl_utils import HL
>>> model = MyHLModule()  # an nn.Module exposing .h, .dh, .delta, .h_init
>>> optimizer = HL(model, gamma=1.0, flip=-1.0, theta=0.1, beta=1.0)
>>> for x, y in dataloader:
...     optimizer.zero_grad()
...     loss = model(x, y)
...     ham = optimizer.compute_hamiltonian(loss)
...     ham.backward()
...     optimizer.step()

Multiple models with per-group hyperparameters:

>>> optimizer = HL([
...     {'params': model_a, 'gamma': 0.5, 'beta': 0.8},
...     {'params': model_b, 'gamma': 1.0, 'beta': 1.0},
... ])

Initialize the HL optimizer with one or more parameter groups.

When models is a single nn.Module, it is automatically wrapped into a one-element parameter group using the keyword-argument defaults. When models is an iterable of dicts, each dict must contain a "params" key holding the model, and any hyperparameter keys present in the dict override the keyword-argument defaults for that group. The optimizer state (co-state tensors) for every group is initialized to zero via _init_state_and_costate.
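The merging rule can be sketched in plain Python. This is a simplified stand-in, not the library code: `build_param_groups` is a hypothetical name, strings stand in for `nn.Module` instances, and the single-model check uses `list`/`tuple` instead of `isinstance(models, torch.nn.Module)` so that the snippet runs without PyTorch.

```python
def build_param_groups(models, **defaults):
    """Normalize `models` into a list of per-group dicts (sketch of the merge in __init__)."""
    defaults = {"params": None, **defaults}
    if not isinstance(models, (list, tuple)):
        # A single model becomes a one-element group list.
        models = [{"params": models}]
    groups = []
    for group in models:
        assert "params" in group, "Each parameter group must contain a 'params' key storing the model."
        # Keys present in the group override the keyword-argument defaults.
        groups.append({**defaults, **group})
    return groups


groups = build_param_groups(
    [{"params": "model_a", "gamma": 0.5}, {"params": "model_b"}],
    gamma=1.0, beta=1.0,
)
for g in groups:
    print(g)
```

The first group keeps its own `gamma`, while the second inherits every hyperparameter from the defaults.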

Parameters:

Name Type Description Default
models Module | Iterable[Dict[str, Module | Any]]

Either a single nn.Module or an iterable of dicts. Each dict must have a "params" key whose value is an nn.Module. Additional keys override the default hyperparameters for that group.

required
gamma

Scaling factor applied to the potential term inside the Hamiltonian. Defaults to 1.0.

1.0
flip

Sign of the Hamiltonian flow (-1 for standard gradient descent direction, +1 for ascent). Defaults to -1.0.

-1.0
theta

Weight-decay-like regularization coefficient applied to the co-state update. Defaults to 0.1.

0.1
beta

Step-size scaling factor for the weight update from the co-state. Defaults to 1.0.

1.0
reset_neuron_costate

If True, the neuron co-state (p['xi']) is zeroed at every zero_grad call. Defaults to False.

False
reset_weight_costate

If True, the weight co-state (p['w']) is zeroed at every zero_grad call. Defaults to False.

False
local

If True, use the local (Jacobi-style) HL update where weights are updated with the previous co-state before the co-state is refreshed. If False, the co-state is updated first (Gauss-Seidel style). Defaults to True.

True

Raises:

Type Description
AssertionError

If any element of models (when it is an iterable of dicts) does not contain the required "params" key.

Source code in unaiverse/modules/hl/hl_utils.py
def __init__(self, models: torch.nn.Module | Iterable[Dict[str, torch.nn.Module | Any]], *,
             gamma=1., flip=-1., theta=0.1, beta=1., reset_neuron_costate=False, reset_weight_costate=False,
             local=True):
    """Initialize the HL optimizer with one or more parameter groups.

    When ``models`` is a single ``nn.Module``, it is automatically wrapped into a
    one-element parameter group using the keyword-argument defaults. When ``models``
    is an iterable of dicts, each dict must contain a ``"params"`` key holding the
    model, and any hyperparameter keys present in the dict override the keyword-argument
    defaults for that group. The optimizer state (co-state tensors) for every group
    is initialized to zero via ``_init_state_and_costate``.

    Args:
        models: Either a single ``nn.Module`` or an iterable of dicts. Each dict
            must have a ``"params"`` key whose value is an ``nn.Module``. Additional
            keys override the default hyperparameters for that group.
        gamma: Scaling factor applied to the potential term inside the Hamiltonian.
            Defaults to 1.0.
        flip: Sign of the Hamiltonian flow (-1 for standard gradient descent
            direction, +1 for ascent). Defaults to -1.0.
        theta: Weight-decay-like regularization coefficient applied to the co-state
            update. Defaults to 0.1.
        beta: Step-size scaling factor for the weight update from the co-state.
            Defaults to 1.0.
        reset_neuron_costate: If True, the neuron co-state (``p['xi']``) is zeroed
            at every ``zero_grad`` call. Defaults to False.
        reset_weight_costate: If True, the weight co-state (``p['w']``) is zeroed
            at every ``zero_grad`` call. Defaults to False.
        local: If True, use the local (Jacobi-style) HL update where weights are
            updated with the previous co-state before the co-state is refreshed.
            If False, the co-state is updated first (Gauss-Seidel style).
            Defaults to True.

    Raises:
        AssertionError: If any element of ``models`` (when it is an iterable of
            dicts) does not contain the required ``"params"`` key.
    """

    # Set defaults
    defaults = dict(params=None, gamma=gamma, flip=flip, theta=theta, beta=beta,
                    reset_neuron_costate=reset_neuron_costate, reset_weight_costate=reset_weight_costate,
                    local=local)

    # Ensure models is a list of dicts and assign the specified values
    if isinstance(models, torch.nn.Module):
        models = [{**defaults, 'params': models}]

    self.param_groups = []
    for group in models:
        assert 'params' in group, "Each parameter group must contain a 'params' key storing the model."
        self.param_groups.append({**defaults, **group})

    # Store the optimizer state for each model in a list of dicts, not to be confused with the state of the model
    self.state = [_init_state_and_costate(group['params']) for group in self.param_groups]

param_groups instance-attribute

param_groups = []

state instance-attribute

state = [(_init_state_and_costate(group['params'])) for group in (param_groups)]

step

step()

Perform one Hamiltonian Learning update step for all parameter groups.

For each parameter group, the method reads the current model state (model.h) and its gradient (obtained via _get_grad), then jointly evolves the neuron co-state and the weight co-state using an Euler integration step of size model.delta. Weight parameters are then updated from the (possibly already-updated) co-state using a second Euler step scaled by beta.

The update order depends on the local flag:

  • Local (local=True): weights are updated using the co-state from the previous step; the co-state is updated after the weight update.
  • Non-local (local=False): the co-state is updated before the weight update, so weights see the freshly computed co-state.

Both co-state updates (neuron p['xi'] and weight p['w']) use step size -delta * flip and decay -flip * theta. The weight update itself uses step size -delta * beta with decay disabled (decay=None).
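As a quick check of the signs, the snippet below computes the `step_size` and `decay` arguments that the source listing of `step` passes to `_euler_step`. The helper names are hypothetical and `delta=0.01` is an arbitrary value chosen for illustration; how `_euler_step` combines these arguments internally is not shown here.

```python
def costate_euler_coefficients(delta, flip=-1.0, theta=0.1):
    """(step_size, decay) used for both the neuron and the weight co-state updates."""
    step_size = -delta * flip
    decay = -flip * theta
    return step_size, decay


def weight_euler_coefficients(delta, beta=1.0):
    """(step_size, decay) used for the weight update; decay is disabled."""
    return -delta * beta, None


delta = 0.01  # arbitrary illustrative value; the optimizer reads it from model.delta
print(costate_euler_coefficients(delta, flip=-1.0))  # default flip
print(costate_euler_coefficients(delta, flip=+1.0))  # flipped sign
print(weight_euler_coefficients(delta))
```

With the default `flip=-1.0` both co-state coefficients are positive; setting `flip=+1.0` negates both of them, while the weight step is unaffected by `flip`.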

Note

This method is decorated with @torch.no_grad() and therefore does not accumulate gradients during the parameter update. Call zero_grad before the forward pass and compute_hamiltonian(...).backward() before calling step.

Source code in unaiverse/modules/hl/hl_utils.py
@torch.no_grad()
def step(self):
    """Perform one Hamiltonian Learning update step for all parameter groups.

    For each parameter group, the method reads the current model state (``model.h``)
    and its gradient (obtained via ``_get_grad``), then jointly evolves the neuron
    co-state and the weight co-state using an Euler integration step of size
    ``model.delta``. Weight parameters are then updated from the (possibly
    already-updated) co-state using a second Euler step scaled by ``beta``.

    The update order depends on the ``local`` flag:

    - **Local** (``local=True``): weights are updated using the co-state from the
      *previous* step; the co-state is updated *after* the weight update.
    - **Non-local** (``local=False``): the co-state is updated *before* the weight
      update, so weights see the freshly computed co-state.

    Both co-state updates (neuron ``p['xi']`` and weight ``p['w']``) use step size
    ``-delta * flip`` and decay ``-flip * theta``. The weight update itself uses step
    size ``-delta * beta`` with decay disabled (``decay=None``).

    Note:
        This method is decorated with ``@torch.no_grad()`` and therefore does not
        accumulate gradients during the parameter update. Call ``zero_grad`` before
        the forward pass and ``compute_hamiltonian(...).backward()`` before calling
        ``step``.
    """

    for group, state in zip(self.param_groups, self.state):
        model = group['params']
        delta = model.delta

        # Track the model state (by reference) during the optimization and fetch its gradient, which drives
        # the neuron costate update; the locality of these operations is handled by the model
        state['x']['xi'] = model.h
        dp_xi = _get_grad(model.h)
        _euler_step(state['p']['xi'], dp_xi, step_size=-delta * group['flip'],
                    decay=-group['flip'] * group['theta'], in_place=True)

        # Track the network weights (by reference) during the optimization and fetch their gradients
        dp_w = {}
        for name, param in model.named_parameters():
            state['x']['w'][name] = param
            dp_w[name] = _get_grad(param)

        if group['local']:

            # Local HL uses the old costates to update the weights
            d_w = state['p']['w']
            _euler_step(state['x']['w'], d_w, step_size=-delta*group['beta'], decay=None, in_place=True)
            _euler_step(state['p']['w'], dp_w, step_size=-delta*group['flip'],
                        decay=-group['flip']*group['theta'], in_place=True)
        else:

            # Non-local HL updates the costates before updating the weights
            d_w = _euler_step(state['p']['w'], dp_w, step_size=-delta * group['flip'],
                              decay=-group['flip'] * group['theta'], in_place=True)
            _euler_step(state['x']['w'], d_w, step_size=-delta * group['beta'], decay=None, in_place=True)

compute_hamiltonian

compute_hamiltonian(*potential_terms: Tensor) -> Tensor

Compute the total Hamiltonian across all parameter groups.

The Hamiltonian for each group is defined as:

H_i = gamma_i * V_i + <dh_i, p_xi_i>

where V_i is the i-th potential term (typically a task loss or energy function), dh_i is the time derivative of the neuron state (model.dh), and p_xi_i is the neuron co-state. The kinetic term is the real part of the dot product between dh and p['xi'], both flattened to 1-D vectors. The contributions from all groups are summed into a single scalar tensor.

The returned tensor is suitable for calling .backward() on, which will populate .grad fields used by step.
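A small worked example may help to read the formula. The sketch below reproduces the arithmetic with plain Python lists in place of tensors, so it cannot be differentiated. The function name `hamiltonian` and the dictionary keys are hypothetical, and all numbers are made up.

```python
def hamiltonian(groups):
    """Sum gamma * V + Re<dh, p_xi> over all groups.

    Each group is a dict with hypothetical keys 'gamma', 'potential', 'dh', 'p_xi';
    'dh' and 'p_xi' are flat lists of equal length.
    """
    total = 0.0
    for g in groups:
        # Kinetic term: dot product of the flattened state derivative and co-state.
        kinetic = sum(a * b for a, b in zip(g["dh"], g["p_xi"]))
        total += g["gamma"] * g["potential"] + complex(kinetic).real
    return total


groups = [
    # 1.0 * 2.0 + (1.0 * 0.5 + 2.0 * -1.0) = 2.0 - 1.5 = 0.5
    {"gamma": 1.0, "potential": 2.0, "dh": [1.0, 2.0], "p_xi": [0.5, -1.0]},
    # 0.5 * 4.0 + (1.0 * 3.0) = 2.0 + 3.0 = 5.0
    {"gamma": 0.5, "potential": 4.0, "dh": [1.0], "p_xi": [3.0]},
]
print(hamiltonian(groups))
```

As in `compute_hamiltonian`, one potential term is needed per group, and the per-group contributions are summed into a single scalar.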

Parameters:

Name Type Description Default
*potential_terms Tensor

One scalar torch.Tensor per parameter group, in the same order as param_groups. Each tensor represents the potential energy (e.g. a loss value) for the corresponding model.

()

Returns:

Type Description
Tensor

A scalar torch.Tensor containing the total Hamiltonian value summed over all parameter groups.

Raises:

Type Description
AssertionError

If the number of potential_terms does not equal the number of parameter groups in self.param_groups.

Examples:

>>> ham = optimizer.compute_hamiltonian(loss)
>>> ham.backward()
>>> optimizer.step()
Source code in unaiverse/modules/hl/hl_utils.py
def compute_hamiltonian(self, *potential_terms: torch.Tensor) -> torch.Tensor:
    """Compute the total Hamiltonian across all parameter groups.

    The Hamiltonian for each group is defined as::

        H_i = gamma_i * V_i + <dh_i, p_xi_i>

    where ``V_i`` is the ``i``-th potential term (typically a task loss or energy
    function), ``dh_i`` is the time derivative of the neuron state (``model.dh``),
    and ``p_xi_i`` is the neuron co-state. The kinetic term is the real part of the
    dot product between ``dh`` and ``p['xi']``, both flattened to 1-D vectors. The
    contributions from all groups are summed into a single scalar tensor.

    The returned tensor is suitable for calling ``.backward()`` on, which will
    populate ``.grad`` fields used by ``step``.

    Args:
        *potential_terms: One scalar ``torch.Tensor`` per parameter group, in the
            same order as ``param_groups``. Each tensor represents the potential
            energy (e.g. a loss value) for the corresponding model.

    Returns:
        A scalar ``torch.Tensor`` containing the total Hamiltonian value summed
        over all parameter groups.

    Raises:
        AssertionError: If the number of ``potential_terms`` does not equal the
            number of parameter groups in ``self.param_groups``.

    Examples:
        >>> ham = optimizer.compute_hamiltonian(loss)
        >>> ham.backward()
        >>> optimizer.step()
    """

    # The number of potential terms provided should be equal to the number of models
    assert len(potential_terms) == len(self.param_groups), "A potential term for each model is expected."

    ham = torch.tensor(0., dtype=potential_terms[0].dtype, device=potential_terms[0].device)
    for group, state, potential_term in zip(self.param_groups, self.state, potential_terms):
        model = group['params']
        ham += group['gamma'] * potential_term + torch.dot(model.dh.view(-1), state['p']['xi'].view(-1)).real
    return ham

zero_grad

zero_grad(set_to_none: bool = False) -> None

Zero the gradients of all model parameters and optionally reset co-states.

For each parameter group, the gradient of the neuron state tensor (model.h) and the gradients of all learnable parameters are zeroed using _zero_grad. If reset_neuron_costate is True for a group, the neuron co-state (p['xi']) is also zeroed in place. Likewise, if reset_weight_costate is True, the weight co-state (p['w']) is zeroed in place.

This method should be called at the beginning of each training iteration, before the forward pass, to prevent gradient accumulation across steps.

Parameters:

Name Type Description Default
set_to_none bool

If True, gradient tensors are set to None instead of being filled with zeros. Setting to None can reduce memory usage and improve performance in some cases, but may break code that expects .grad to always be a tensor. Defaults to False.

False
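The effect of set_to_none can be illustrated without PyTorch. In the sketch below, `FakeTensor` and `zero_grad_sketch` are hypothetical stand-ins (a list plays the role of the gradient buffer); the real `_zero_grad` helper is not shown on this page and may differ.

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor that carries a .grad attribute."""

    def __init__(self, grad):
        self.grad = grad


def zero_grad_sketch(t, set_to_none=False):
    """Either drop the gradient buffer or overwrite it with zeros in place."""
    if t.grad is None:
        return
    if set_to_none:
        t.grad = None
    else:
        t.grad[:] = [0.0] * len(t.grad)


t = FakeTensor([1.0, -2.0])
zero_grad_sketch(t)                    # buffer is kept and filled with zeros
print(t.grad)
zero_grad_sketch(t, set_to_none=True)  # buffer is released
print(t.grad)
```

Code that reads `.grad` unconditionally (for example to sum it) works in the first case and has to handle `None` in the second.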
Source code in unaiverse/modules/hl/hl_utils.py
def zero_grad(self, set_to_none: bool = False) -> None:
    """Zero the gradients of all model parameters and optionally reset co-states.

    For each parameter group, the gradient of the neuron state tensor (``model.h``)
    and the gradients of all learnable parameters are zeroed using ``_zero_grad``.
    If ``reset_neuron_costate`` is True for a group, the neuron co-state (``p['xi']``)
    is also zeroed in place. Likewise, if ``reset_weight_costate`` is True, the weight
    co-state (``p['w']``) is zeroed in place.

    This method should be called at the beginning of each training iteration, before
    the forward pass, to prevent gradient accumulation across steps.

    Args:
        set_to_none: If True, gradient tensors are set to ``None`` instead of being
            filled with zeros. Setting to ``None`` can reduce memory usage and improve
            performance in some cases, but may break code that expects ``.grad`` to
            always be a tensor. Defaults to False.
    """

    for group, state in zip(self.param_groups, self.state):
        model = group['params']
        _zero_grad(model.h, set_to_none)
        for param in model.parameters():
            _zero_grad(param, set_to_none)

        # Optionally reset the costates
        if group['reset_neuron_costate']:
            _zero_inplace(state['p']['xi'], detach=True)
        if group['reset_weight_costate']:
            _zero_inplace(state['p']['w'], detach=True)