unaiverse.modules.networks

What this module does

Central catalog of ready-to-use neural network architectures wrapped as ModuleWrapper subclasses: RNN/state-space token language models, CNU-augmented CNNs, torchvision backbones (ResNet, ViT, DenseNet, EfficientNet, FasterRCNN), HuggingFace LLMs/VLMs, and API-backed model wrappers.

networks

A Collectionless AI Project (https://collectionless.ai)
Registration/Login: https://unaiverse.io
Code Repositories: https://github.com/collectionlessai/
Main Developers: Stefano Melacci (Project Leader), Christian Di Maio, Tommaso Guidi

RNNTokenLM

RNNTokenLM(num_emb: int, emb_dim: int, y_dim: int, h_dim: int, batch_size: int = 1, *args, **kwargs)

Bases: ModuleWrapper

Token-level language model backed by a single-layer Elman RNN.

At each time step the network embeds the previously predicted token, applies the Elman recurrence h = tanh(A h + B u), and projects the hidden state to logit space via C. The token embedding is a torch.nn.Embedding table, while the recurrence, input, and projection matrices (A, B, C) are bias-free torch.nn.Linear layers. The initial hidden state h_init is drawn from a standard normal distribution at construction time, and the initial input embedding u_init is all zeros. Both are stored as plain tensor attributes rather than registered buffers, so they are not part of the module's state_dict and are not moved by .to(device).

On the first step (first=True) the network starts from h_init and uses u_init as the input embedding, unless a tensor with the same shape as u_init is passed as u. On subsequent steps any supplied u is ignored: the network detaches the previous hidden state, takes the argmax of the previous (detached) output as the next token, and embeds it, so gradients do not propagate through time across calls.

This class wraps the inner Net as a ModuleWrapper, so all ModuleWrapper machinery (device handling, stream-based I/O descriptors, optional learning support) is available. The processor input is a single scalar torch.long token index; the processor output is a y_dim-dimensional torch.float32 logit vector.
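The recurrence and the greedy token feedback described above can be traced by hand. The following is a minimal, dependency-free sketch that uses plain Python lists instead of torch tensors; the toy sizes, the hand-picked weights, and the helper names matvec and elman_step are assumptions made for illustration and are not part of unaiverse.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def elman_step(A, B, C, h, u):
    """One bias-free Elman step: h' = tanh(A h + B u), y = C h'."""
    pre = [a + b for a, b in zip(matvec(A, h), matvec(B, u))]
    h_new = [math.tanh(p) for p in pre]
    return h_new, matvec(C, h_new)

# Hypothetical toy sizes: vocabulary of 3 tokens, emb_dim = 2, h_dim = 2.
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # one row per token
A = [[0.5, 0.0], [0.0, 0.5]]                        # h_dim x h_dim
B = [[1.0, 0.0], [0.0, 1.0]]                        # h_dim x emb_dim
C = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]            # y_dim x h_dim

# First step: start from the initial state and an all-zero input embedding.
h_init, u_init = [0.2, -0.1], [0.0, 0.0]
h, y = elman_step(A, B, C, h_init, u_init)

# Later steps: feed back the embedding of the greedy (argmax) token.
token = max(range(len(y)), key=lambda i: y[i])
h, y = elman_step(A, B, C, h, embeddings[token])
```

In the actual module the previous state and output are additionally detached before reuse; plain floats carry no autograd history, so that step has no counterpart in this sketch.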

Examples:

>>> lm = RNNTokenLM(num_emb=256, emb_dim=32, y_dim=256, h_dim=128)
>>> # Single autoregressive step (first call):
>>> import torch
>>> logits = lm.module(first=True)   # returns tensor of shape (1, 256)
>>> # Continue from the previous state:
>>> logits = lm.module(first=False)

Initialize an RNNTokenLM with the given vocabulary and architecture sizes.

Builds the inner Net (embedding layer, three weight matrices, and initial state tensors), then calls ModuleWrapper.__init__ with stream descriptors derived from the architecture: one scalar torch.long input stream and one y_dim-dimensional torch.float32 output stream.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| num_emb | int | Vocabulary size; the number of rows in the embedding table. | required |
| emb_dim | int | Dimensionality of each token embedding vector. | required |
| y_dim | int | Dimensionality of the output logit vector (equals num_emb for a closed-vocabulary LM). | required |
| h_dim | int | Dimensionality of the hidden state vector. | required |
| batch_size | int | Number of sequences processed in parallel. Defaults to 1. | 1 |
| *args | | Additional positional arguments forwarded to ModuleWrapper.__init__. | () |
| **kwargs | | Additional keyword arguments forwarded to ModuleWrapper.__init__. | {} |

Source code in unaiverse/modules/networks.py
def __init__(self, num_emb: int, emb_dim: int, y_dim: int, h_dim: int, batch_size: int = 1, *args, **kwargs):
    """Initialize an ``RNNTokenLM`` with the given vocabulary and architecture sizes.

    Builds the inner ``Net`` (embedding layer, three weight matrices, and initial
    state tensors), then calls ``ModuleWrapper.__init__`` with stream descriptors
    derived from the architecture: one scalar ``torch.long`` input stream and one
    ``y_dim``-dimensional ``torch.float32`` output stream.

    Args:
        num_emb: Vocabulary size; the number of rows in the embedding table.
        emb_dim: Dimensionality of each token embedding vector.
        y_dim: Dimensionality of the output logit vector (equals ``num_emb`` for a
            closed-vocabulary LM).
        h_dim: Dimensionality of the hidden state vector.
        batch_size: Number of sequences processed in parallel. Defaults to 1.
        *args: Additional positional arguments forwarded to ``ModuleWrapper.__init__``.
        **kwargs: Additional keyword arguments forwarded to ``ModuleWrapper.__init__``.
    """

    class Net(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.embeddings = torch.nn.Embedding(num_emb, emb_dim)
            self.A = torch.nn.Linear(h_dim, h_dim, bias=False)
            self.B = torch.nn.Linear(emb_dim, h_dim, bias=False)
            self.C = torch.nn.Linear(h_dim, y_dim, bias=False)
            self.h_init = torch.randn((batch_size, h_dim))
            self.u_init = torch.zeros((batch_size, emb_dim))
            self.h = None
            self.y = None

        def forward(self, u: torch.Tensor | None = None, first: bool = True):
            if first:
                h = self.h_init
                u = self.u_init if (u is None or not isinstance(u, torch.Tensor) or u.shape != self.u_init.shape) \
                    else u
            else:
                h = self.h.detach()
                y_pred = torch.argmax(self.y.detach(), dim=-1).view(-1) if self.y.shape[-1] > 1 else self.y.detach()
                u = self.embeddings(y_pred)

            self.h = torch.tanh(self.A(h) + self.B(u))
            self.y = self.C(self.h)
            return self.y

    super(RNNTokenLM, self).__init__(module=Net(),
                                     proc_inputs=[StreamType(data_type="tensor", tensor_shape=(1,),
                                                             tensor_dtype=torch.long,
                                                             pubsub=False, private_only=False)],
                                     proc_outputs=[StreamType(data_type="tensor", tensor_shape=(y_dim,),
                                                              tensor_dtype=torch.float32,
                                                              pubsub=False, private_only=False)],
                                     *args, **kwargs)

RNN

RNN(u_shape: tuple[int], d_dim: int, y_dim: int, h_dim: int, batch_size: int = 1, *args, **kwargs)

Bases: ModuleWrapper

Single-layer Elman RNN with a combined main-input and descriptor-input port.

Implements the recurrence h = tanh(A h + B [du; u]) where [du; u] is the concatenation of a descriptor (delta-u) vector and the flattened main input, and y = C h. All three weight matrices are bias-free torch.nn.Linear layers. The initial hidden state h_init is registered as a buffer (so it moves to the correct device automatically) and is used only when first=True; subsequent steps detach the previous hidden state and reuse it.

The processor input signature follows the UNaIVERSE RNN convention produced by get_proc_inputs_and_proc_outputs_for_rnn: two input streams (main tensor of shape u_shape and descriptor tensor of size d_dim) and one output stream of size y_dim.
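The width of the vector fed to B follows from the two input streams: the main input is counted as a flat vector and the descriptor is placed in front of it. A small sketch of that bookkeeping, assuming plain Python lists in place of tensors (the helper names rnn_input_width and concat_inputs and the sizes are illustrative only):

```python
import math

def rnn_input_width(u_shape, d_dim):
    """Width of the vector fed to B: flattened main input plus descriptor."""
    u_dim = math.prod(u_shape)   # same count as torch.Size(u_shape).numel()
    return u_dim + d_dim

def concat_inputs(du, u):
    """Build [du; u] for one sample: descriptor entries first, then main input."""
    return list(du) + list(u)

width = rnn_input_width((16,), 8)            # illustrative sizes
x = concat_inputs([0.0] * 8, [1.0] * 16)     # descriptor of zeros, input of ones
```

Note that the RNN forward pass listed on this page concatenates along dim=1 without flattening u, so this sketch assumes a one-dimensional u_shape or an input that has already been flattened.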

Examples:

>>> rnn = RNN(u_shape=(16,), d_dim=8, y_dim=4, h_dim=64)
>>> import torch
>>> u = torch.randn(1, 16)
>>> du = torch.randn(1, 8)
>>> y = rnn.module(u, du, first=True)   # returns tensor of shape (1, 4)
>>> y = rnn.module(u, du, first=False)  # continues from detached hidden state

Initialize an RNN module with the given architecture sizes.

Computes the flat input dimensionality from u_shape, builds the inner Net (matrices A, B, C and registered buffer h_init), then delegates to ModuleWrapper.__init__ using stream descriptors generated by get_proc_inputs_and_proc_outputs_for_rnn.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| u_shape | tuple[int] | Shape of the main input tensor, excluding the batch dimension (e.g. (16,) for a 16-dimensional vector input). | required |
| d_dim | int | Dimensionality of the secondary descriptor (delta-u) input stream. | required |
| y_dim | int | Dimensionality of the output tensor. | required |
| h_dim | int | Dimensionality of the hidden state. | required |
| batch_size | int | Number of sequences processed in parallel. Defaults to 1. | 1 |
| *args | | Additional positional arguments forwarded to ModuleWrapper.__init__. | () |
| **kwargs | | Additional keyword arguments forwarded to ModuleWrapper.__init__. | {} |

Source code in unaiverse/modules/networks.py
def __init__(self, u_shape: tuple[int], d_dim: int, y_dim: int, h_dim: int, batch_size: int = 1, *args, **kwargs):
    """Initialize an ``RNN`` module with the given architecture sizes.

    Computes the flat input dimensionality from ``u_shape``, builds the inner ``Net``
    (matrices ``A``, ``B``, ``C`` and registered buffer ``h_init``), then delegates to
    ``ModuleWrapper.__init__`` using stream descriptors generated by
    ``get_proc_inputs_and_proc_outputs_for_rnn``.

    Args:
        u_shape: Shape of the main input tensor, excluding the batch dimension (e.g.
            ``(16,)`` for a 16-dimensional vector input).
        d_dim: Dimensionality of the secondary descriptor (delta-u) input stream.
        y_dim: Dimensionality of the output tensor.
        h_dim: Dimensionality of the hidden state.
        batch_size: Number of sequences processed in parallel. Defaults to 1.
        *args: Additional positional arguments forwarded to ``ModuleWrapper.__init__``.
        **kwargs: Additional keyword arguments forwarded to ``ModuleWrapper.__init__``.
    """
    u_shape = torch.Size(u_shape)
    u_dim = u_shape.numel()
    du_dim = d_dim

    class Net(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.A = torch.nn.Linear(h_dim, h_dim, bias=False)
            self.B = torch.nn.Linear(u_dim + du_dim, h_dim, bias=False)
            self.C = torch.nn.Linear(h_dim, y_dim, bias=False)
            self.register_buffer('h_init', torch.randn((batch_size, h_dim)))
            self.h = None
            self.u_dim = u_dim
            self.du_dim = du_dim

        def forward(self, u: torch.Tensor, du: torch.Tensor, first: bool = True):
            if first:
                h = self.h_init.data
            else:
                h = self.h.detach()
            if u is None:
                u = torch.zeros((h.shape[0], self.u_dim), dtype=torch.float32, device=self.device)
            else:
                u = u.to(self.device)
            if du is None:
                du = torch.zeros((h.shape[0], self.du_dim), dtype=torch.float32, device=self.device)
            else:
                du = du.to(self.device)

            self.h = torch.tanh(self.A(h) + self.B(torch.cat([du, u], dim=1)))
            y = self.C(self.h)
            return y

    proc_inputs, proc_outputs = get_proc_inputs_and_proc_outputs_for_rnn(u_shape, du_dim, y_dim)
    super(RNN, self).__init__(module=Net(), proc_inputs=proc_inputs, proc_outputs=proc_outputs, *args, **kwargs)

CSSM

CSSM(u_shape: tuple[int], d_dim: int, y_dim: int, h_dim: int, sigma: Callable = tanh, project_every: int = 0, local: bool = False, batch_size: int = 1, *args, **kwargs)

Bases: ModuleWrapper

Continuous-State Space Model: a linear recurrence with a configurable activation.

Implements the update h_new = A h + B [du; u] followed by the output projection y = C sigma(h), where A, B, and C are dense, bias-free torch.nn.Linear layers. The activation sigma appears only in the readout, so the state update itself is linear. The hidden state is carried across steps in the registered buffer h_next; on the first step (first=True) the registered buffer h_init is used instead.

Two operating modes are supported via the local flag:

  • local=False (default): the hidden state exposed externally is h = h_new (post-update). The state derivative estimate dh is the finite difference (h - h_prev) / delta, with delta fixed to 1 in this class.
  • local=True: the hidden state exposed externally is h = h_prev (pre-update). The state derivative estimate dh is the finite difference (h_new - h) / delta.
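
The two modes can be compared side by side. The sketch below is a plain-Python rendering of the bookkeeping in the source listing, with lists standing in for tensors; the function name cssm_expose is hypothetical. It suggests that both modes produce the same finite difference and differ only in which state is exposed as h.

```python
def cssm_expose(h_prev, h_new, delta=1.0, local=False):
    """Return the (h, dh) pair exposed after one update, element-wise on lists."""
    dh = [(n - p) / delta for n, p in zip(h_new, h_prev)]
    h = list(h_prev) if local else list(h_new)
    return h, dh

h_prev, h_new = [0.0, 1.0], [0.5, 0.25]
h_post, dh_post = cssm_expose(h_prev, h_new, local=False)  # exposes h_new
h_pre, dh_pre = cssm_expose(h_prev, h_new, local=True)     # exposes h_prev
```

Since the readout is y = C sigma(h), the choice of mode changes which state the output is computed from, even though dh is unchanged.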

When project_every > 0 the adjust_eigs hook is called every project_every forward steps. In this base class adjust_eigs is a no-op; subclasses override it to constrain the spectrum of A.
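
The projection schedule can be made concrete with a small counter simulation. This is a sketch that mirrors the forward_count logic of the source listing under the assumption that nothing else touches the counter; the function name projection_steps is hypothetical.

```python
def projection_steps(project_every, firsts):
    """Return the indices of the calls at which adjust_eigs would run.

    firsts is a list of booleans, one per forward call, mirroring the
    first flag; a True entry resets the step counter to zero.
    """
    triggered, count = [], 0
    for i, first in enumerate(firsts):
        if first:
            count = 0
        if project_every and count % project_every == 0:
            triggered.append(i)
        count += 1
    return triggered

calls = [True] + [False] * 6            # one sequence of seven steps
steps = projection_steps(3, calls)      # calls 0, 3 and 6 trigger the hook
```

Because the counter is reset before the check, the hook also runs on every call made with first=True, not only after project_every steps have elapsed.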

The processor I/O streams follow the RNN convention: two inputs (main tensor and descriptor) and one output tensor.

Examples:

>>> import torch
>>> cssm = CSSM(u_shape=(8,), d_dim=4, y_dim=3, h_dim=32, batch_size=2)
>>> u = torch.randn(2, 8)
>>> du = torch.randn(2, 4)
>>> y = cssm.module(u, du, first=True)   # tensor of shape (2, 3)
>>> y = cssm.module(u, du, first=False)  # continues from stored h_next

Initialize a CSSM module with the given architecture and dynamics options.

Builds the inner Net (matrices A, B, C; registered buffers h_init and h_next; control attributes), then delegates to ModuleWrapper.__init__ using stream descriptors from get_proc_inputs_and_proc_outputs_for_rnn.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| u_shape | tuple[int] | Shape of the main input tensor, excluding the batch dimension. | required |
| d_dim | int | Dimensionality of the secondary descriptor (delta-u) input. | required |
| y_dim | int | Dimensionality of the output tensor. | required |
| h_dim | int | Dimensionality of the hidden state. | required |
| sigma | Callable | Element-wise activation applied to the hidden state before the output projection. Defaults to torch.nn.functional.tanh. | tanh |
| project_every | int | If positive, call adjust_eigs every this many forward steps to constrain the eigenvalues of the recurrence matrix. A value of 0 disables projection. Defaults to 0. | 0 |
| local | bool | If True, use the pre-update hidden state for the output and gradient computation (local mode). Defaults to False. | False |
| batch_size | int | Number of sequences processed in parallel. Defaults to 1. | 1 |
| *args | | Additional positional arguments forwarded to ModuleWrapper.__init__. | () |
| **kwargs | | Additional keyword arguments forwarded to ModuleWrapper.__init__. | {} |

Source code in unaiverse/modules/networks.py
def __init__(self, u_shape: tuple[int], d_dim: int, y_dim: int, h_dim: int, sigma: Callable = F.tanh,
             project_every: int = 0, local: bool = False, batch_size: int = 1, *args, **kwargs):
    """Initialize a ``CSSM`` module with the given architecture and dynamics options.

    Builds the inner ``Net`` (matrices ``A``, ``B``, ``C``; registered buffers
    ``h_init`` and ``h_next``; control attributes), then delegates to
    ``ModuleWrapper.__init__`` using stream descriptors from
    ``get_proc_inputs_and_proc_outputs_for_rnn``.

    Args:
        u_shape: Shape of the main input tensor, excluding the batch dimension.
        d_dim: Dimensionality of the secondary descriptor (delta-u) input.
        y_dim: Dimensionality of the output tensor.
        h_dim: Dimensionality of the hidden state.
        sigma: Element-wise activation applied to the hidden state before the output
            projection. Defaults to ``torch.nn.functional.tanh``.
        project_every: If positive, call ``adjust_eigs`` every this many forward
            steps to constrain the eigenvalues of the recurrence matrix. A value of
            ``0`` disables projection. Defaults to 0.
        local: If ``True``, use the pre-update hidden state for the output and
            gradient computation (local mode). Defaults to False.
        batch_size: Number of sequences processed in parallel. Defaults to 1.
        *args: Additional positional arguments forwarded to ``ModuleWrapper.__init__``.
        **kwargs: Additional keyword arguments forwarded to ``ModuleWrapper.__init__``.
    """
    u_shape = torch.Size(u_shape)
    u_dim = u_shape.numel()
    du_dim = d_dim

    class Net(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.A = torch.nn.Linear(h_dim, h_dim, bias=False)
            self.B = torch.nn.Linear(u_dim + du_dim, h_dim, bias=False)
            self.C = torch.nn.Linear(h_dim, y_dim, bias=False)
            self.register_buffer('h_init', torch.randn((batch_size, h_dim)))
            self.register_buffer('h_next', torch.randn((batch_size, h_dim)))
            self.h = None
            self.dh = None
            self.sigma = sigma
            self.u_dim = u_dim
            self.du_dim = du_dim
            self.batch_size = batch_size
            self.delta = 1.
            self.local = local
            self.forward_count = 0
            self.project_every = project_every

        @torch.no_grad()
        def adjust_eigs(self):
            pass

        # noinspection PyUnusedLocal
        def init_h(self, udu: torch.Tensor) -> torch.Tensor:
            return self.h_init.data

        @staticmethod
        def handle_inputs(du, u):
            return du, u

        def forward(self, u: torch.Tensor, du: torch.Tensor, first: bool = True):
            device = self.h_init.device
            u = u.flatten(1).to(device) if u is not None else torch.zeros((self.batch_size, self.u_dim),
                                                                          device=device)
            du = du.to(device) if du is not None else torch.zeros((self.batch_size, self.du_dim), device=device)

            if first:
                h = self.init_h(torch.cat([du, u], dim=1))
                self.forward_count = 0
            else:
                h = self.h_next.data
            h.requires_grad_()

            if self.project_every:
                if self.forward_count % self.project_every == 0:
                    self.adjust_eigs()

            du, u = self.handle_inputs(du, u)
            h_new = self.A(h) + self.B(torch.cat([du, u], dim=1))

            if self.local:
                self.h = h
                self.dh = (h_new - self.h) / self.delta
            else:
                self.h = h_new
                self.dh = (self.h - h) / self.delta

            y = self.C(self.sigma(self.h))
            self.h_next.data = h_new.detach()
            self.forward_count += 1
            return y

    proc_inputs, proc_outputs = get_proc_inputs_and_proc_outputs_for_rnn(u_shape, du_dim, y_dim)
    super(CSSM, self).__init__(module=Net(), proc_inputs=proc_inputs, proc_outputs=proc_outputs,
                               *args, **kwargs)

CDiagR

CDiagR(u_shape: tuple[int], d_dim: int, y_dim: int, h_dim: int, sigma: Callable = lambda x: x, project_every: int = 0, local: bool = False, batch_size: int = 1, *args, **kwargs)

Bases: ModuleWrapper

State-space model with a real-valued diagonal recurrence matrix.

Replaces the dense square matrix A used in CSSM with a diagonal one parameterized as a single torch.nn.Linear layer (diag) mapping a constant scalar 1 to h_dim values. The recurrence is therefore

h_new = diag_weights * h + B [du; u]

where the element-wise multiplication uses the learned diagonal coefficients stored in diag.weight. The output projection and activation follow the same pattern as CSSM: y = C sigma(h).

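When project_every > 0, adjust_eigs is called every project_every forward steps and replaces each diagonal entry with its sign (torch.sign), snapping nonzero weights to -1 or +1 so that the corresponding eigenvalues of the diagonal recurrence have unit modulus. An entry that is exactly zero stays zero.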
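
A short plain-Python sketch of the sign projection and of the element-wise recurrence it acts on. The coefficient values and the helper names sign and diag_step are made up for illustration; the real module operates on torch tensors.

```python
def sign(x):
    """Sign of a real number: -1.0, 0.0 or 1.0 (same convention as torch.sign)."""
    return float((x > 0) - (x < 0))

def diag_step(diag, h, bu):
    """Element-wise diagonal recurrence: h_new[i] = diag[i] * h[i] + bu[i]."""
    return [d * x + b for d, x, b in zip(diag, h, bu)]

diag = [0.7, -1.3, 0.0]                 # hypothetical learned coefficients
projected = [sign(d) for d in diag]     # what a sign projection would leave

h = [2.0, 2.0, 2.0]
h_new = diag_step(projected, h, [0.0, 0.0, 0.0])   # no input contribution
```

With no input, the components whose coefficient is -1 or +1 keep their magnitude from step to step, while a component with a zero coefficient is wiped out.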

All weight matrices (diag, B, C) use torch.float32. The hidden-state buffers h_init and h_next are registered buffers and move with the module to the correct device.

Examples:

>>> import torch
>>> cdr = CDiagR(u_shape=(10,), d_dim=5, y_dim=4, h_dim=64)
>>> u = torch.randn(1, 10)
>>> du = torch.randn(1, 5)
>>> y = cdr.module(u, du, first=True)   # tensor of shape (1, 4)

Initialize a CDiagR module with a real diagonal recurrence.

Builds the inner Net (diagonal linear layer diag, input matrix B, output matrix C, and registered buffers h_init / h_next), then delegates to ModuleWrapper.__init__ using stream descriptors from get_proc_inputs_and_proc_outputs_for_rnn.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| u_shape | tuple[int] | Shape of the main input tensor, excluding the batch dimension. | required |
| d_dim | int | Dimensionality of the secondary descriptor (delta-u) input. | required |
| y_dim | int | Dimensionality of the output tensor. | required |
| h_dim | int | Dimensionality of the hidden state (and the diagonal recurrence). | required |
| sigma | Callable | Element-wise activation applied to the hidden state before the output projection. Defaults to the identity function. | lambda x: x |
| project_every | int | If positive, snap the diagonal weights to their sign every this many forward steps to enforce unit-modulus eigenvalues. A value of 0 disables projection. Defaults to 0. | 0 |
| local | bool | If True, expose the pre-update hidden state for output and gradient computation (local mode). Defaults to False. | False |
| batch_size | int | Number of sequences processed in parallel. Defaults to 1. | 1 |
| *args | | Additional positional arguments forwarded to ModuleWrapper.__init__. | () |
| **kwargs | | Additional keyword arguments forwarded to ModuleWrapper.__init__. | {} |

Source code in unaiverse/modules/networks.py
def __init__(self, u_shape: tuple[int], d_dim: int, y_dim: int, h_dim: int, sigma: Callable = lambda x: x,
             project_every: int = 0, local: bool = False, batch_size: int = 1, *args, **kwargs):
    """Initialize a ``CDiagR`` module with a real diagonal recurrence.

    Builds the inner ``Net`` (diagonal linear layer ``diag``, input matrix ``B``,
    output matrix ``C``, and registered buffers ``h_init`` / ``h_next``), then
    delegates to ``ModuleWrapper.__init__`` using stream descriptors from
    ``get_proc_inputs_and_proc_outputs_for_rnn``.

    Args:
        u_shape: Shape of the main input tensor, excluding the batch dimension.
        d_dim: Dimensionality of the secondary descriptor (delta-u) input.
        y_dim: Dimensionality of the output tensor.
        h_dim: Dimensionality of the hidden state (and the diagonal recurrence).
        sigma: Element-wise activation applied to the hidden state before the output
            projection. Defaults to the identity function.
        project_every: If positive, snap the diagonal weights to their sign every
            this many forward steps to enforce unit-modulus eigenvalues. A value of
            ``0`` disables projection. Defaults to 0.
        local: If ``True``, expose the pre-update hidden state for output and gradient
            computation (local mode). Defaults to False.
        batch_size: Number of sequences processed in parallel. Defaults to 1.
        *args: Additional positional arguments forwarded to ``ModuleWrapper.__init__``.
        **kwargs: Additional keyword arguments forwarded to ``ModuleWrapper.__init__``.
    """
    u_shape = torch.Size(u_shape)
    u_dim = u_shape.numel()
    du_dim = d_dim

    class Net(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.diag = torch.nn.Linear(in_features=1, out_features=h_dim, bias=False, dtype=torch.float32)
            self.B = torch.nn.Linear(u_dim + du_dim, h_dim, bias=False)
            self.C = torch.nn.Linear(h_dim, y_dim, bias=False)
            self.register_buffer('h_init', torch.randn((batch_size, h_dim)))
            self.register_buffer('h_next', torch.randn((batch_size, h_dim)))
            self.h = None
            self.dh = None
            self.sigma = sigma
            self.u_dim = u_dim
            self.du_dim = du_dim
            self.batch_size = batch_size
            self.delta = 1.
            self.local = local
            self.forward_count = 0
            self.project_every = project_every

        @torch.no_grad()
        def adjust_eigs(self):
            self.diag.weight.copy_(torch.sign(self.diag.weight))

        # noinspection PyUnusedLocal
        def init_h(self, udu: torch.Tensor) -> torch.Tensor:
            return self.h_init.data

        @staticmethod
        def handle_inputs(du, u):
            return du, u

        def forward(self, u: torch.Tensor, du: torch.Tensor, first: bool = True):
            device = self.h_init.device
            u = u.flatten(1).to(device) if u is not None else torch.zeros((self.batch_size, self.u_dim),
                                                                          device=device)
            du = du.to(device) if du is not None else torch.zeros((self.batch_size, self.du_dim), device=device)

            if first:
                h = self.init_h(torch.cat([du, u], dim=1))
                self.forward_count = 0
            else:
                h = self.h_next.data
            h.requires_grad_()

            if self.project_every:
                if self.forward_count % self.project_every == 0:
                    self.adjust_eigs()

            du, u = self.handle_inputs(du, u)
            h_new = self.diag.weight.view(self.diag.out_features) * h + self.B(torch.cat([du, u], dim=1))

            if self.local:
                self.h = h
                self.dh = (h_new - self.h) / self.delta
            else:
                self.h = h_new
                self.dh = (self.h - h) / self.delta

            y = self.C(self.sigma(self.h))
            self.h_next.data = h_new.detach()
            self.forward_count += 1
            return y

    proc_inputs, proc_outputs = get_proc_inputs_and_proc_outputs_for_rnn(u_shape, du_dim, y_dim)
    super(CDiagR, self).__init__(module=Net(), proc_inputs=proc_inputs, proc_outputs=proc_outputs,
                                 *args, **kwargs)

CDiagC

CDiagC(u_shape: tuple[int], d_dim: int, y_dim: int, h_dim: int, sigma: Callable = lambda x: x, project_every: int = 0, local: bool = False, batch_size: int = 1, *args, **kwargs)

Bases: ModuleWrapper

State-space model with a complex-valued diagonal recurrence matrix.

Identical in structure to CDiagR but promotes all weight matrices (diag, B, C) to torch.cfloat (complex float). The recurrence is

h_new = diag_weights * h + B [du; u]

with complex arithmetic throughout. The output y is the real part of C sigma(h), ensuring the output stream remains real-valued.

When project_every > 0, adjust_eigs normalizes each complex diagonal entry to unit modulus (diag.weight /= |diag.weight|), keeping all eigenvalues on the unit circle in the complex plane.
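
The effect of the unit-modulus normalization can be checked on a few scalars. The sketch below uses Python's built-in complex type rather than torch.cfloat tensors; the values and the helper name to_unit_modulus are illustrative, and entries are assumed to be nonzero because a zero entry has no defined phase.

```python
import cmath

def to_unit_modulus(z):
    """Rescale a nonzero complex number to modulus 1, keeping its phase."""
    return z / abs(z)

diag = [complex(3, 4), complex(0, -0.5), complex(-2, 0)]   # hypothetical values
unit = [to_unit_modulus(z) for z in diag]

# Multiplying by a unit-modulus entry rotates a state component
# without changing its magnitude.
h = complex(1.5, -2.0)
rotated = unit[0] * h
phase_kept = cmath.isclose(cmath.phase(unit[0]), cmath.phase(diag[0]))
```

Each normalized entry keeps its phase, so the projection changes how fast a component grows or decays but not how it rotates.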

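The hidden-state buffers h_init and h_next are created as real-valued registered buffers. The forward pass in the source listing performs no explicit cast: the state becomes complex through type promotion when it is multiplied by the complex diagonal weights, and h_next is then overwritten with the complex h_new.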

Examples:

>>> import torch
>>> cdc = CDiagC(u_shape=(10,), d_dim=5, y_dim=4, h_dim=64)
>>> u = torch.randn(1, 10, dtype=torch.cfloat)   # complex input for the complex-typed B layer
>>> du = torch.randn(1, 5, dtype=torch.cfloat)
>>> y = cdc.module(u, du, first=True)   # real tensor of shape (1, 4)

Initialize a CDiagC module with a complex diagonal recurrence.

Builds the inner Net (complex-typed diag, B, C layers, and real-typed registered buffers h_init / h_next), then delegates to ModuleWrapper.__init__ using stream descriptors from get_proc_inputs_and_proc_outputs_for_rnn.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| u_shape | tuple[int] | Shape of the main input tensor, excluding the batch dimension. | required |
| d_dim | int | Dimensionality of the secondary descriptor (delta-u) input. | required |
| y_dim | int | Dimensionality of the output tensor (real-valued). | required |
| h_dim | int | Dimensionality of the hidden state (complex-valued). | required |
| sigma | Callable | Element-wise activation applied to the complex hidden state before the output projection. Defaults to the identity function. | lambda x: x |
| project_every | int | If positive, normalize each complex diagonal weight to unit modulus every this many forward steps. A value of 0 disables projection. Defaults to 0. | 0 |
| local | bool | If True, expose the pre-update hidden state for output and gradient computation (local mode). Defaults to False. | False |
| batch_size | int | Number of sequences processed in parallel. Defaults to 1. | 1 |
| *args | | Additional positional arguments forwarded to ModuleWrapper.__init__. | () |
| **kwargs | | Additional keyword arguments forwarded to ModuleWrapper.__init__. | {} |

Source code in unaiverse/modules/networks.py
def __init__(self, u_shape: tuple[int], d_dim: int, y_dim: int, h_dim: int, sigma: Callable = lambda x: x,
             project_every: int = 0, local: bool = False, batch_size: int = 1, *args, **kwargs):
    """Initialize a ``CDiagC`` module with a complex diagonal recurrence.

    Builds the inner ``Net`` (complex-typed ``diag``, ``B``, ``C`` layers, and
    real-typed registered buffers ``h_init`` / ``h_next``), then delegates to
    ``ModuleWrapper.__init__`` using stream descriptors from
    ``get_proc_inputs_and_proc_outputs_for_rnn``.

    Args:
        u_shape: Shape of the main input tensor, excluding the batch dimension.
        d_dim: Dimensionality of the secondary descriptor (delta-u) input.
        y_dim: Dimensionality of the output tensor (real-valued).
        h_dim: Dimensionality of the hidden state (complex-valued).
        sigma: Element-wise activation applied to the complex hidden state before the
            output projection. Defaults to the identity function.
        project_every: If positive, normalize each complex diagonal weight to unit
            modulus every this many forward steps. A value of ``0`` disables
            projection. Defaults to 0.
        local: If ``True``, expose the pre-update hidden state for output and gradient
            computation (local mode). Defaults to False.
        batch_size: Number of sequences processed in parallel. Defaults to 1.
        *args: Additional positional arguments forwarded to ``ModuleWrapper.__init__``.
        **kwargs: Additional keyword arguments forwarded to ``ModuleWrapper.__init__``.
    """
    u_shape = torch.Size(u_shape)
    u_dim = u_shape.numel()
    du_dim = d_dim

    class Net(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.diag = torch.nn.Linear(in_features=1, out_features=h_dim, bias=False, dtype=torch.cfloat)
            self.B = torch.nn.Linear(u_dim + du_dim, h_dim, bias=False, dtype=torch.cfloat)
            self.C = torch.nn.Linear(h_dim, y_dim, bias=False, dtype=torch.cfloat)
            self.register_buffer('h_init', torch.randn((batch_size, h_dim)))
            self.register_buffer('h_next', torch.randn((batch_size, h_dim)))
            self.h = None
            self.dh = None
            self.sigma = sigma
            self.u_dim = u_dim
            self.du_dim = du_dim
            self.batch_size = batch_size
            self.delta = 1.
            self.local = local
            self.forward_count = 0
            self.project_every = project_every

        @torch.no_grad()
        def adjust_eigs(self):
            self.diag.weight.div_(self.diag.weight.abs())

        # noinspection PyUnusedLocal
        def init_h(self, udu: torch.Tensor) -> torch.Tensor:
            return self.h_init.data

        @staticmethod
        def handle_inputs(du, u):
            return du, u

        def forward(self, u: torch.Tensor, du: torch.Tensor, first: bool = True):
            device = self.h_init.device
            u = u.flatten(1).to(device) if u is not None else torch.zeros((self.batch_size, self.u_dim),
                                                                          device=device, dtype=torch.cfloat)
            du = du.to(device) if du is not None else torch.zeros((self.batch_size, self.du_dim),
                                                                  device=device, dtype=torch.cfloat)

            if first:
                h = self.init_h(torch.cat([du, u], dim=1))
                self.forward_count = 0
            else:
                h = self.h_next.data
            h.requires_grad_()

            if self.project_every:
                if self.forward_count % self.project_every == 0:
                    self.adjust_eigs()

            du, u = self.handle_inputs(du, u)
            h_new = self.diag.weight.view(self.diag.out_features) * h + self.B(torch.cat([du, u], dim=1))

            if self.local:
                self.h = h
                self.dh = (h_new - self.h) / self.delta
            else:
                self.h = h_new
                self.dh = (self.h - h) / self.delta

            y = self.C(self.sigma(self.h))
            self.h_next.data = h_new.detach()
            self.forward_count += 1
            return y.real

    proc_inputs, proc_outputs = get_proc_inputs_and_proc_outputs_for_rnn(u_shape, du_dim, y_dim)
    super(CDiagC, self).__init__(module=Net(), proc_inputs=proc_inputs, proc_outputs=proc_outputs,
                                 *args, **kwargs)
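
The adjust_eigs step in the listing above divides each complex diagonal entry by its own modulus, which moves every eigenvalue of the diagonal recurrence onto the unit circle while keeping its phase. A minimal standard-library sketch of that projection follows; the helper name project_to_unit_circle and the sample values are illustrative and not part of the library.

```python
import cmath

def project_to_unit_circle(eigs):
    """Rescale each complex eigenvalue to modulus 1, keeping its phase."""
    return [z / abs(z) for z in eigs]

# Hypothetical diagonal entries with moduli 5, 0.5 and 2.
eigs = [3 + 4j, 0.5j, -2 + 0j]
projected = project_to_unit_circle(eigs)

for before, after in zip(eigs, projected):
    # The modulus becomes 1 while the phase (argument) is unchanged.
    print(abs(after), cmath.phase(before), cmath.phase(after))
```

An entry that is exactly zero has no defined phase, so this projection (like the in-place division in the listing) is only meaningful for non-zero diagonal entries.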

CTE

CTE(u_shape: tuple[int], d_dim: int, y_dim: int, h_dim: int, delta: float, sigma: Callable = lambda x: x, project_every: int = 0, local: bool = False, cnu_memories: int = 0, batch_size: int = 1, *args, **kwargs)

Bases: ModuleWrapper

Antisymmetric matrix-exponential state-space model (CTE).

Implements the continuous-time exact (CTE) discretization of a linear state-space model whose recurrence matrix is constrained to be antisymmetric. The skew-symmetric matrix A = 0.5 * (W - W^T) is exponentiated with the matrix exponential, and the input is integrated via the zero-order-hold formula

``h_new = exp(A * delta) * h + A^{-1} * (exp(A * delta) - I) * B [du; u]``

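Because A is skew-symmetric, exp(A * delta) is an orthogonal matrix, so all eigenvalues of the recurrence lie on the unit circle in the complex plane. The input-free dynamics therefore preserve the norm of the hidden state, which keeps the hidden-state dynamics stable by construction.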
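
The norm-preservation property can be checked numerically. The sketch below covers only the input-free part of the update and uses a truncated Taylor series as a stand-in for a library matrix exponential; it is not how _CTENet computes the update. The helper names matmul and expm_taylor and all numeric values are assumptions made for illustration.

```python
import math

def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def expm_taylor(M, terms=30):
    """Matrix exponential via a truncated Taylor series (adequate for small ||M||)."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term_k = term_{k-1} @ M / k, i.e. M^k / k!
        term = [[v / k for v in row] for row in matmul(term, M)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

# Hypothetical unconstrained weight W and step size delta.
W = [[0.3, 1.2, -0.7],
     [0.1, -0.4, 0.9],
     [2.0, 0.5, 0.6]]
delta = 0.1
n = len(W)

# Skew-symmetric part A = 0.5 * (W - W^T), already scaled by delta.
A_delta = [[0.5 * (W[i][j] - W[j][i]) * delta for j in range(n)] for i in range(n)]
Phi = expm_taylor(A_delta)  # state-transition matrix exp(A * delta)

h = [1.0, -2.0, 0.5]
h_new = [sum(Phi[i][j] * h[j] for j in range(n)) for i in range(n)]

norm = math.sqrt(sum(v * v for v in h))
norm_new = math.sqrt(sum(v * v for v in h_new))
print(norm, norm_new)  # the two norms agree up to floating-point error
```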

The output is computed as y = C sigma(h) where C is either a standard torch.nn.Linear layer (when cnu_memories <= 0) or a LinearCNU layer (when cnu_memories > 0) for memory-augmented readout. The inner module is _CTENet; see its docstring for the full forward-pass specification.

The local and project_every flags share the same semantics as in CSSM.
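
A minimal sketch of what the local flag changes, assuming it follows the same branch that appears in the CDiagC and CTB source listings on this page (the _CTENet source itself is not shown here). The helper name expose_state is illustrative.

```python
def expose_state(h, h_new, delta, local):
    """Return (h_exposed, dh) following the local / non-local branch."""
    if local:
        h_exposed = h        # the pre-update state feeds the readout
        dh = [(a - b) / delta for a, b in zip(h_new, h_exposed)]
    else:
        h_exposed = h_new    # the post-update state feeds the readout
        dh = [(a - b) / delta for a, b in zip(h_exposed, h)]
    return h_exposed, dh

h, h_new, delta = [1.0, 2.0], [1.5, 1.0], 0.5
print(expose_state(h, h_new, delta, local=True))   # ([1.0, 2.0], [1.0, -2.0])
print(expose_state(h, h_new, delta, local=False))  # ([1.5, 1.0], [1.0, -2.0])
```

In both branches dh has the same value, the finite difference (h_new - h) / delta. The flag only decides which state is exposed as h and therefore which state reaches the output projection.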

Examples:

>>> import torch
>>> cte = CTE(u_shape=(8,), d_dim=4, y_dim=3, h_dim=32, delta=0.1)
>>> u = torch.randn(1, 8)
>>> du = torch.randn(1, 4)
>>> y = cte.module(u, du, first=True)   # tensor of shape (1, 3)
>>> y = cte.module(u, du, first=False)  # continues from stored h_next

Initialize a CTE module with antisymmetric matrix-exponential dynamics.

Computes the flat input dimension from u_shape, builds a _CTENet inner module, then delegates to ModuleWrapper.__init__ using stream descriptors from get_proc_inputs_and_proc_outputs_for_rnn.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `u_shape` | `tuple[int]` | Shape of the main input tensor, excluding the batch dimension. | required |
| `d_dim` | `int` | Dimensionality of the secondary descriptor (delta-u) input. | required |
| `y_dim` | `int` | Dimensionality of the output tensor. | required |
| `h_dim` | `int` | Dimensionality of the hidden state. | required |
| `delta` | `float` | Discrete time step used in the matrix-exponential update. Larger values correspond to coarser time discretization. | required |
| `sigma` | `Callable` | Element-wise activation applied to the hidden state before the output projection. Defaults to the identity function. | `lambda x: x` |
| `project_every` | `int` | If positive, call adjust_eigs every this many forward steps. In the base _CTENet this is a no-op; subclasses may override it. A value of 0 disables projection. Defaults to 0. | `0` |
| `local` | `bool` | If True, expose the pre-update hidden state for output and gradient computation. Defaults to False. | `False` |
| `cnu_memories` | `int` | If positive, replace the linear output projection with a LinearCNU layer with this many memory units. Defaults to 0. | `0` |
| `batch_size` | `int` | Number of sequences processed in parallel. Defaults to 1. | `1` |
| `*args` | | Additional positional arguments forwarded to ModuleWrapper.__init__. | `()` |
| `**kwargs` | | Additional keyword arguments forwarded to ModuleWrapper.__init__. | `{}` |

Source code in unaiverse/modules/networks.py
def __init__(self, u_shape: tuple[int], d_dim: int, y_dim: int, h_dim: int, delta: float,
             sigma: Callable = lambda x: x, project_every: int = 0, local: bool = False,
             cnu_memories: int = 0, batch_size: int = 1, *args, **kwargs):
    """Initialize a ``CTE`` module with antisymmetric matrix-exponential dynamics.

    Computes the flat input dimension from ``u_shape``, builds a ``_CTENet`` inner
    module, then delegates to ``ModuleWrapper.__init__`` using stream descriptors from
    ``get_proc_inputs_and_proc_outputs_for_rnn``.

    Args:
        u_shape: Shape of the main input tensor, excluding the batch dimension.
        d_dim: Dimensionality of the secondary descriptor (delta-u) input.
        y_dim: Dimensionality of the output tensor.
        h_dim: Dimensionality of the hidden state.
        delta: Discrete time step used in the matrix-exponential update. Larger values
            correspond to coarser time discretization.
        sigma: Element-wise activation applied to the hidden state before the output
            projection. Defaults to the identity function.
        project_every: If positive, call ``adjust_eigs`` every this many forward
            steps. In the base ``_CTENet`` this is a no-op; subclasses may override
            it. A value of ``0`` disables projection. Defaults to 0.
        local: If ``True``, expose the pre-update hidden state for output and gradient
            computation. Defaults to False.
        cnu_memories: If positive, replace the linear output projection with a
            ``LinearCNU`` layer with this many memory units. Defaults to 0.
        batch_size: Number of sequences processed in parallel. Defaults to 1.
        *args: Additional positional arguments forwarded to ``ModuleWrapper.__init__``.
        **kwargs: Additional keyword arguments forwarded to ``ModuleWrapper.__init__``.
    """
    u_shape = torch.Size(u_shape)
    u_dim = u_shape.numel()
    du_dim = d_dim
    proc_inputs, proc_outputs = get_proc_inputs_and_proc_outputs_for_rnn(u_shape, du_dim, y_dim)
    super(CTE, self).__init__(
        module=_CTENet(u_dim, du_dim, y_dim, h_dim, delta, sigma, project_every, local,
                       cnu_memories, batch_size),
        proc_inputs=proc_inputs, proc_outputs=proc_outputs, *args, **kwargs)

CTEInitStateBZeroInput

CTEInitStateBZeroInput(u_shape: tuple[int], d_dim: int, y_dim: int, h_dim: int, delta: float, sigma: Callable = lambda x: x, project_every: int = 0, local: bool = False, cnu_memories: int = 0, batch_size: int = 1, *args, **kwargs)

Bases: ModuleWrapper

CTE variant that initializes the hidden state from the first input and then runs the recurrence with zeroed inputs.

Specializes _CTENet in two ways:

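  1. init_h: the initial hidden state is set to B(udu) / sum(udu), where the sum runs over the feature dimension of udu, rather than to the random h_init buffer, so the first hidden state is derived directly from the concatenated input [du; u]. The result is non-finite when the entries of udu sum to zero.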
  2. handle_inputs: on every forward step (including the first) both du and u are replaced with zero tensors of matching shape, so the recurrence after initialization is input-free (driven purely by the antisymmetric dynamics).

All other aspects (matrix-exponential update, output projection, local mode, project_every projection, cnu_memories readout) are identical to CTE.
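
The two overrides can be sketched for a single sample (batch size 1) with plain Python lists. The matrix B, the input values, and the list-based helpers below are hypothetical stand-ins for the tensors used by the library.

```python
def init_h(B, udu):
    """Initial state: B @ udu divided by the sum of the input entries."""
    total = sum(udu)  # must be non-zero, otherwise the division fails
    return [sum(w * x for w, x in zip(row, udu)) / total for row in B]

def handle_inputs(du, u):
    """Replace both input streams with zeros of the same length."""
    return [0.0] * len(du), [0.0] * len(u)

# Hypothetical 3x4 input matrix B (h_dim=3, du_dim=1, u_dim=3) and one sample.
B = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [1.0, 1.0, 1.0, 1.0]]
du, u = [2.0], [1.0, 0.0, 1.0]
udu = du + u  # concatenation order [du; u]

h0 = init_h(B, udu)
print(h0)                    # [0.5, 0.5, 1.0]
print(handle_inputs(du, u))  # ([0.0], [0.0, 0.0, 0.0])
```

The last row of B is all ones, so its entry of h0 is exactly 1.0: dividing by sum(udu) normalizes the initial state by the total input mass.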

Initialize a CTEInitStateBZeroInput module.

Builds a specialized _CTENet subclass (Net) that overrides init_h and handle_inputs, then delegates to ModuleWrapper.__init__ using stream descriptors from get_proc_inputs_and_proc_outputs_for_rnn.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `u_shape` | `tuple[int]` | Shape of the main input tensor, excluding the batch dimension. | required |
| `d_dim` | `int` | Dimensionality of the secondary descriptor (delta-u) input. | required |
| `y_dim` | `int` | Dimensionality of the output tensor. | required |
| `h_dim` | `int` | Dimensionality of the hidden state. | required |
| `delta` | `float` | Discrete time step used in the matrix-exponential update. | required |
| `sigma` | `Callable` | Element-wise activation applied to the hidden state before the output projection. Defaults to the identity function. | `lambda x: x` |
| `project_every` | `int` | If positive, call adjust_eigs every this many steps. A value of 0 disables projection. Defaults to 0. | `0` |
| `local` | `bool` | If True, expose the pre-update hidden state for output and gradient computation. Defaults to False. | `False` |
| `cnu_memories` | `int` | If positive, use a LinearCNU output layer with this many memory units. Defaults to 0. | `0` |
| `batch_size` | `int` | Number of sequences processed in parallel. Defaults to 1. | `1` |
| `*args` | | Additional positional arguments forwarded to ModuleWrapper.__init__. | `()` |
| `**kwargs` | | Additional keyword arguments forwarded to ModuleWrapper.__init__. | `{}` |

Source code in unaiverse/modules/networks.py
def __init__(self, u_shape: tuple[int], d_dim: int, y_dim: int, h_dim: int, delta: float,
             sigma: Callable = lambda x: x, project_every: int = 0, local: bool = False,
             cnu_memories: int = 0, batch_size: int = 1, *args, **kwargs):
    """Initialize a ``CTEInitStateBZeroInput`` module.

    Builds a specialized ``_CTENet`` subclass (``Net``) that overrides ``init_h`` and
    ``handle_inputs``, then delegates to ``ModuleWrapper.__init__`` using stream
    descriptors from ``get_proc_inputs_and_proc_outputs_for_rnn``.

    Args:
        u_shape: Shape of the main input tensor, excluding the batch dimension.
        d_dim: Dimensionality of the secondary descriptor (delta-u) input.
        y_dim: Dimensionality of the output tensor.
        h_dim: Dimensionality of the hidden state.
        delta: Discrete time step used in the matrix-exponential update.
        sigma: Element-wise activation applied to the hidden state before the output
            projection. Defaults to the identity function.
        project_every: If positive, call ``adjust_eigs`` every this many steps. A
            value of ``0`` disables projection. Defaults to 0.
        local: If ``True``, expose the pre-update hidden state for output and gradient
            computation. Defaults to False.
        cnu_memories: If positive, use a ``LinearCNU`` output layer with this many
            memory units. Defaults to 0.
        batch_size: Number of sequences processed in parallel. Defaults to 1.
        *args: Additional positional arguments forwarded to ``ModuleWrapper.__init__``.
        **kwargs: Additional keyword arguments forwarded to ``ModuleWrapper.__init__``.
    """
    u_shape = torch.Size(u_shape)
    u_dim = u_shape.numel()
    du_dim = d_dim
    proc_inputs, proc_outputs = get_proc_inputs_and_proc_outputs_for_rnn(u_shape, du_dim, y_dim)

    class Net(_CTENet):
        @torch.no_grad()
        def init_h(self, udu: torch.Tensor) -> torch.Tensor:
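            # keepdim=True keeps the sum as (batch, 1) so it broadcasts over h_dim for any batch size
            return self.B(udu).detach() / torch.sum(udu, dim=1, keepdim=True)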

        @staticmethod
        def handle_inputs(du, u):
            return torch.zeros_like(du), torch.zeros_like(u)

    super(CTEInitStateBZeroInput, self).__init__(
        module=Net(u_dim, du_dim, y_dim, h_dim, delta, sigma, project_every, local,
                   cnu_memories, batch_size),
        proc_inputs=proc_inputs, proc_outputs=proc_outputs, *args, **kwargs)

CTEToken

CTEToken(num_emb: int, emb_dim: int, d_dim: int, y_dim: int, h_dim: int, *args, **kwargs)

Bases: ModuleWrapper

Token-level variant of CTE with a learned embedding lookup before the recurrence.

Specializes _CTENet by prepending a torch.nn.Embedding layer: when the main input u is not None, it is first passed through self.embeddings before entering the standard _CTENet.forward logic. This makes CTEToken suitable for sequence-to-sequence or language-modelling tasks where inputs are integer token indices rather than continuous vectors.

Architecture parameters are fixed at construction: delta=1.0, sigma=identity, project_every=0, local=False, cnu_memories=0, batch_size=1. These cannot be overridden via constructor arguments; use CTE directly for full control.

The processor I/O signature follows the standard RNN convention with u_shape = (emb_dim,): two input streams (an emb_dim-dimensional float tensor and a d_dim-dimensional descriptor) and one output stream of size y_dim.
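
The embedding step that precedes the recurrence amounts to a row lookup in a learned table. The sketch below mimics it with plain lists; the table contents, the sizes, and the helper name embed are hypothetical, and the real module uses torch.nn.Embedding.

```python
import random

random.seed(0)
num_emb, emb_dim = 5, 3  # hypothetical vocabulary size and embedding width

# Embedding table: one emb_dim-dimensional row per token id.
embeddings = [[random.gauss(0.0, 1.0) for _ in range(emb_dim)] for _ in range(num_emb)]

def embed(token_ids):
    """Map a (batch, 1) list of token ids to (batch, emb_dim) vectors."""
    return [list(embeddings[row[0]]) for row in token_ids]

u = embed([[4]])          # a single token id, batch size 1
print(len(u), len(u[0]))  # 1 3
```

The resulting vector plays the role of the continuous input u in CTE, which is why the stream descriptors are built for an input shape of (emb_dim,).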

Examples:

>>> import torch
>>> cte_tok = CTEToken(num_emb=128, emb_dim=16, d_dim=8, y_dim=128, h_dim=64)
>>> token_ids = torch.tensor([[42]])   # shape (1, 1)
>>> du = torch.randn(1, 8)
>>> logits = cte_tok.module(token_ids, du, first=True)  # shape (1, 128)

Initialize a CTEToken module with an embedding table and CTE dynamics.

Builds a _CTENet subclass (Net) that adds a torch.nn.Embedding layer and overrides forward to embed integer token inputs before the recurrence. The inner _CTENet is constructed with fixed hyperparameters (delta=1.0, identity activation, no projection, global mode, no CNU memories, batch size 1). Stream descriptors are generated by get_proc_inputs_and_proc_outputs_for_rnn for an input shape of (emb_dim,).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `num_emb` | `int` | Vocabulary size; the number of rows in the embedding table. | required |
| `emb_dim` | `int` | Dimensionality of each token embedding vector. This determines the effective u_dim for the inner recurrence. | required |
| `d_dim` | `int` | Dimensionality of the secondary descriptor (delta-u) input stream. | required |
| `y_dim` | `int` | Dimensionality of the output tensor. | required |
| `h_dim` | `int` | Dimensionality of the hidden state. | required |
| `*args` | | Additional positional arguments forwarded to ModuleWrapper.__init__. | `()` |
| `**kwargs` | | Additional keyword arguments forwarded to ModuleWrapper.__init__. | `{}` |

Source code in unaiverse/modules/networks.py
def __init__(self, num_emb: int, emb_dim: int, d_dim: int, y_dim: int, h_dim: int, *args, **kwargs):
    """Initialize a ``CTEToken`` module with an embedding table and CTE dynamics.

    Builds a ``_CTENet`` subclass (``Net``) that adds a ``torch.nn.Embedding`` layer
    and overrides ``forward`` to embed integer token inputs before the recurrence.
    The inner ``_CTENet`` is constructed with fixed hyperparameters (``delta=1.0``,
    identity activation, no projection, global mode, no CNU memories, batch size 1).
    Stream descriptors are generated by ``get_proc_inputs_and_proc_outputs_for_rnn``
    for an input shape of ``(emb_dim,)``.

    Args:
        num_emb: Vocabulary size; the number of rows in the embedding table.
        emb_dim: Dimensionality of each token embedding vector. This determines the
            effective ``u_dim`` for the inner recurrence.
        d_dim: Dimensionality of the secondary descriptor (delta-u) input stream.
        y_dim: Dimensionality of the output tensor.
        h_dim: Dimensionality of the hidden state.
        *args: Additional positional arguments forwarded to ``ModuleWrapper.__init__``.
        **kwargs: Additional keyword arguments forwarded to ``ModuleWrapper.__init__``.
    """
    u_shape = torch.Size((emb_dim,))
    u_dim = u_shape.numel()
    du_dim = d_dim
    proc_inputs, proc_outputs = get_proc_inputs_and_proc_outputs_for_rnn(u_shape, du_dim, y_dim)

    class Net(_CTENet):
        def __init__(self):
            super().__init__(u_dim, du_dim, y_dim, h_dim, delta=1.0, sigma=lambda x: x,
                             project_every=0, local=False, cnu_memories=0, batch_size=1)
            self.embeddings = torch.nn.Embedding(num_emb, emb_dim)

        def forward(self, u: torch.Tensor | None, du: torch.Tensor | None,
                    first: bool = True, last: bool = False) -> torch.Tensor:
            if u is not None:
                u = self.embeddings(u.to(self.embeddings.weight.device))
            return super().forward(u, du, first=first, last=last)

    super(CTEToken, self).__init__(
        module=Net(), proc_inputs=proc_inputs, proc_outputs=proc_outputs, *args, **kwargs)

CTB

CTB(u_shape: tuple[int], d_dim: int, y_dim: int, h_dim: int, delta: float = 0.1, alpha: float = 0.0, sigma: Callable = lambda x: x, project_every: int = 0, local: bool = False, batch_size: int = 1, *args, **kwargs)

Bases: ModuleWrapper

Block-structured state-space model with 2x2 antisymmetric rotation blocks.

Implements a structured linear recurrence whose recurrence matrix is block-diagonal with 2x2 blocks of the form

``[[1 - delta*alpha,  delta*omega], [-delta*omega,  1 - delta*alpha]]``

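where omega is a per-block learnable frequency parameter and alpha is a dissipation coefficient. This parameterization is a first-order (forward Euler) approximation of the exact block-rotation dynamics, so it needs no cosine/sine evaluation, unlike the exact per-block rotation used by CTBE. The price is that, with alpha = 0 and no projection, each block's eigenvalues 1 +/- i*delta*omega have modulus sqrt(1 + (delta*omega)^2), which exceeds 1 whenever omega is non-zero.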
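
A short numeric check of the eigenvalue modulus of one Euler block, using the fact that a matrix [[a, b], [-b, a]] has eigenvalues a +/- i*b. The helper name and the sample frequency are illustrative.

```python
import math

def euler_block_modulus(omega, delta, alpha=0.0):
    """Modulus of the eigenvalues (1 - delta*alpha) +/- i*delta*omega of one block."""
    return math.hypot(1.0 - delta * alpha, delta * omega)

omega = 2.0  # hypothetical block frequency
for delta in (0.01, 0.1, 0.5):
    rho = euler_block_modulus(omega, delta)
    # Without dissipation the modulus is above 1, so repeated steps slowly grow the state;
    # an exact rotation would keep it at exactly 1.
    print(delta, rho, rho ** 100)
```

The growth factor after 100 steps shrinks toward 1 as delta decreases, which is the usual trade-off of a first-order scheme: cheaper steps, but the unit-modulus property holds only approximately.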

Three eigenvalue projection modes are selected by the value of alpha at construction time:

  • alpha > 0: constant dissipation mode (project_method = 'const'). The alpha buffer is fixed to the given value and adjust_eigs leaves it unchanged.
  • alpha == 0: unit-modulus mode (project_method = 'modulus'). When project_every > 0, adjust_eigs divides both ones and omega by sqrt(ones^2 + (delta*omega)^2), so each block's [ones, delta*omega] pair has unit modulus.
  • alpha == -1: adaptive alpha mode (project_method = 'alpha'). When project_every > 0, adjust_eigs computes alpha from the current omega and delta to keep eigenvalues on the unit circle.

Any other negative alpha matches none of these branches, so neither project_method nor the alpha buffer is set and the first forward call fails.
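
The two active projections can be sketched for a single block with scalar arithmetic. The alpha rule below is written as the closed-form solution of the unit-modulus condition (1 - delta*alpha)^2 + (delta*omega)^2 = 1, which requires |delta*omega| <= 1; the function names and sample values are illustrative.

```python
import math

def project_modulus(ones, omega, delta):
    """'modulus' mode: divide ones and omega by sqrt(ones^2 + (delta*omega)^2)."""
    m = math.sqrt(ones ** 2 + (delta * omega) ** 2)
    return ones / m, omega / m

def project_alpha(omega, delta):
    """'alpha' mode: alpha solving (1 - delta*alpha)^2 + (delta*omega)^2 = 1."""
    return (1.0 - math.sqrt(1.0 - (delta * omega) ** 2)) / delta

def block_modulus(ones, omega, delta, alpha):
    """Modulus of the block eigenvalues (ones - delta*alpha) +/- i*delta*omega."""
    return math.hypot(ones - delta * alpha, delta * omega)

omega, delta = 2.0, 0.1  # hypothetical frequency and step size

print(block_modulus(1.0, omega, delta, 0.0))          # above 1 before any projection

ones_p, omega_p = project_modulus(1.0, omega, delta)
print(block_modulus(ones_p, omega_p, delta, 0.0))     # 1 up to rounding

alpha = project_alpha(omega, delta)
print(alpha, block_modulus(1.0, omega, delta, alpha)) # positive alpha, modulus 1 up to rounding
```

The modulus rule changes the frequency itself (omega shrinks), whereas the alpha rule keeps omega and compensates with dissipation.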

The hidden dimension h_dim must be even because the state is partitioned into h_dim // 2 blocks of size 2x2; construction raises an AssertionError if h_dim is odd.

The processor I/O convention follows the standard RNN pattern: two input streams (main tensor of shape u_shape and descriptor of size d_dim) and one output stream of size y_dim.

Examples:

>>> import torch
>>> ctb = CTB(u_shape=(8,), d_dim=4, y_dim=3, h_dim=32, delta=0.1)
>>> u = torch.randn(1, 8)
>>> du = torch.randn(1, 4)
>>> y = ctb.module(u, du, first=True)   # tensor of shape (1, 3)
>>> y = ctb.module(u, du, first=False)  # continues from stored h_next

Initialize a CTB module with block-rotation dynamics.

Validates that h_dim is even, then builds the inner Net (learnable omega frequency vector, buffers alpha and ones, input matrix B, output matrix C, and registered hidden-state buffers h_init / h_next). Delegates to ModuleWrapper.__init__ using stream descriptors from get_proc_inputs_and_proc_outputs_for_rnn.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `u_shape` | `tuple[int]` | Shape of the main input tensor, excluding the batch dimension. | required |
| `d_dim` | `int` | Dimensionality of the secondary descriptor (delta-u) input stream. | required |
| `y_dim` | `int` | Dimensionality of the output tensor. | required |
| `h_dim` | `int` | Dimensionality of the hidden state. Must be even (2x2 block structure). | required |
| `delta` | `float` | Discrete time step for the first-order rotation update. Defaults to 0.1. | `0.1` |
| `alpha` | `float` | Dissipation coefficient and projection mode selector. Positive values set constant dissipation; zero selects unit-modulus projection; -1 selects adaptive alpha projection. Defaults to 0. | `0.0` |
| `sigma` | `Callable` | Element-wise activation applied to the hidden state before the output projection. Defaults to the identity function. | `lambda x: x` |
| `project_every` | `int` | If positive, call adjust_eigs every this many forward steps to enforce the selected eigenvalue projection. A value of 0 disables projection. Defaults to 0. | `0` |
| `local` | `bool` | If True, expose the pre-update hidden state for output and gradient computation (local mode). Defaults to False. | `False` |
| `batch_size` | `int` | Number of sequences processed in parallel. Defaults to 1. | `1` |
| `*args` | | Additional positional arguments forwarded to ModuleWrapper.__init__. | `()` |
| `**kwargs` | | Additional keyword arguments forwarded to ModuleWrapper.__init__. | `{}` |

Raises:

| Type | Description |
|---|---|
| `AssertionError` | If h_dim is not divisible by 2. |

Source code in unaiverse/modules/networks.py
def __init__(self, u_shape: tuple[int], d_dim: int, y_dim: int, h_dim: int, delta: float = 0.1,
             alpha: float = 0., sigma: Callable = lambda x: x, project_every: int = 0, local: bool = False,
             batch_size: int = 1, *args, **kwargs):
    """Initialize a ``CTB`` module with block-rotation dynamics.

    Validates that ``h_dim`` is even, then builds the inner ``Net`` (learnable
    ``omega`` frequency vector, buffers ``alpha`` and ``ones``, input matrix ``B``,
    output matrix ``C``, and registered hidden-state buffers ``h_init`` / ``h_next``).
    Delegates to ``ModuleWrapper.__init__`` using stream descriptors from
    ``get_proc_inputs_and_proc_outputs_for_rnn``.

    Args:
        u_shape: Shape of the main input tensor, excluding the batch dimension.
        d_dim: Dimensionality of the secondary descriptor (delta-u) input stream.
        y_dim: Dimensionality of the output tensor.
        h_dim: Dimensionality of the hidden state. Must be even (2x2 block structure).
        delta: Discrete time step for the first-order rotation update. Defaults to 0.1.
        alpha: Dissipation coefficient and projection mode selector. Positive values
            set constant dissipation; zero selects unit-modulus projection; ``-1``
            selects adaptive alpha projection. Defaults to 0.
        sigma: Element-wise activation applied to the hidden state before the output
            projection. Defaults to the identity function.
        project_every: If positive, call ``adjust_eigs`` every this many forward
            steps to enforce the selected eigenvalue projection. A value of ``0``
            disables projection. Defaults to 0.
        local: If ``True``, expose the pre-update hidden state for output and gradient
            computation (local mode). Defaults to False.
        batch_size: Number of sequences processed in parallel. Defaults to 1.
        *args: Additional positional arguments forwarded to ``ModuleWrapper.__init__``.
        **kwargs: Additional keyword arguments forwarded to ``ModuleWrapper.__init__``.

    Raises:
        AssertionError: If ``h_dim`` is not divisible by 2.
    """
    u_shape = torch.Size(u_shape)
    u_dim = u_shape.numel()
    du_dim = d_dim
    assert h_dim % 2 == 0, "Hidden dimension must be even for 2x2 blocks"

    class Net(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.order = h_dim // 2
            self.omega = torch.nn.Parameter(torch.empty(self.order))
            self.register_buffer('ones', torch.ones(self.order, requires_grad=False))
            self.B = torch.nn.Linear(u_dim + du_dim, h_dim, bias=False)
            self.C = torch.nn.Linear(h_dim, y_dim, bias=False)

            if alpha > 0.:
                self.project_method = 'const'
                self.register_buffer('alpha', torch.full_like(self.omega.data, alpha))
            elif alpha == 0.:
                self.project_method = 'modulus'
                self.register_buffer('alpha', torch.zeros_like(self.omega.data))
            elif alpha == -1.:
                self.project_method = 'alpha'
                self.register_buffer('alpha', torch.zeros_like(self.omega.data))

            self.register_buffer('h_init', torch.randn((batch_size, h_dim)))
            self.register_buffer('h_next', torch.randn((batch_size, h_dim)))
            self.h = None
            self.dh = None
            self.sigma = sigma
            self.u_dim = u_dim
            self.du_dim = du_dim
            self.batch_size = batch_size
            self.delta = delta
            self.local = local
            self.forward_count = 0
            self.project_every = project_every
            self.reset_parameters()

        def reset_parameters(self) -> None:
            torch.nn.init.uniform_(self.omega)

        @torch.no_grad()
        def adjust_eigs(self):
            if self.project_method == 'alpha':
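                # Solve (1 - delta*alpha)^2 + (delta*omega)^2 = 1 for alpha; the division by delta
                # applies to the whole numerator, not only to the square root
                self.alpha.copy_((1. - torch.sqrt(1. - (self.delta * self.omega) ** 2)) / self.delta)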
            elif self.project_method == 'modulus':
                module = torch.sqrt(self.ones ** 2 + (self.delta * self.omega) ** 2)
                self.omega.div_(module)
                self.ones.div_(module)

        # noinspection PyUnusedLocal
        def init_h(self, udu: torch.Tensor) -> torch.Tensor:
            return self.h_init.data

        @staticmethod
        def handle_inputs(du, u):
            return du, u

        def forward(self, u: torch.Tensor, du: torch.Tensor, first: bool = True):
            device = self.h_init.device
            u = u.flatten(1).to(device) if u is not None else torch.zeros((self.batch_size, self.u_dim),
                                                                          device=device)
            du = du.to(device) if du is not None else torch.zeros((self.batch_size, self.du_dim), device=device)

            if first:
                h = self.init_h(torch.cat([du, u], dim=1))
                self.forward_count = 0
            else:
                h = self.h_next.data
            h.requires_grad_()
            h_pair = h.view(-1, self.order, 2)

            if self.project_every:
                if self.forward_count % self.project_every == 0:
                    self.adjust_eigs()

            du, u = self.handle_inputs(du, u)
            h1 = (self.ones - self.delta * self.alpha) * h_pair[..., 0] + self.delta * self.omega * h_pair[..., 1]
            h2 = -self.delta * self.omega * h_pair[..., 0] + (self.ones - self.delta * self.alpha) * h_pair[..., 1]
            rec = torch.stack([h1, h2], dim=-1).flatten(start_dim=1)
            inp = self.delta * self.B(torch.cat([du, u], dim=1))

            h_new = rec + inp
            if self.local:
                self.h = h
                self.dh = (h_new - self.h) / self.delta
            else:
                self.h = h_new
                self.dh = (self.h - h) / self.delta

            y = self.C(self.sigma(self.h))
            self.h_next.data = h_new.detach()
            self.forward_count += 1
            return y

    proc_inputs, proc_outputs = get_proc_inputs_and_proc_outputs_for_rnn(u_shape, du_dim, y_dim)
    super(CTB, self).__init__(module=Net(), proc_inputs=proc_inputs, proc_outputs=proc_outputs,
                              *args, **kwargs)

CTBE

CTBE(u_shape: tuple[int], d_dim: int, y_dim: int, h_dim: int, delta: float, sigma: Callable = lambda x: x, project_every: int = 0, local: bool = False, cnu_memories: int = 0, batch_size: int = 1, *args, **kwargs)

Bases: ModuleWrapper

Block-structured state-space model with exact trigonometric rotation.

Implements a 2x2-block antisymmetric recurrence using the exact matrix-exponential solution for each block. For each block k, the recurrence is

``h1_new = cos(omega_k * delta) * h1 + sin(omega_k * delta) * h2 + inp1``
``h2_new = -sin(omega_k * delta) * h1 + cos(omega_k * delta) * h2 + inp2``

where the input terms inp1 and inp2 are derived from the zero-order-hold integral of the input matrix B. Because each block is a pure rotation by the angle omega_k * delta, all eigenvalues of the recurrence lie exactly on the unit circle. This is the exact-discretization counterpart of the first-order approximation used by CTB.
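
One step of a single block can be sketched with scalar arithmetic. The rotation part follows the two formulas above. The input terms are an assumption: they use the textbook zero-order-hold form J^{-1} (exp(J*delta) - I) b for the block generator J = [[0, omega], [-omega, 0]], since the _CTBENet source is not shown here. The function name and sample values are illustrative, and omega must be non-zero.

```python
import math

def ctbe_block_step(h1, h2, b1, b2, omega, delta):
    """One exact step of a 2x2 block with generator [[0, omega], [-omega, 0]].

    (b1, b2) is the block's slice of B @ [du; u], held constant over the step.
    """
    c, s = math.cos(omega * delta), math.sin(omega * delta)
    # Assumed zero-order-hold input term: J^{-1} (exp(J*delta) - I) b for this block.
    inp1 = (s * b1 + (1.0 - c) * b2) / omega
    inp2 = (-(1.0 - c) * b1 + s * b2) / omega
    h1_new = c * h1 + s * h2 + inp1
    h2_new = -s * h1 + c * h2 + inp2
    return h1_new, h2_new

omega, delta = 1.5, 0.1  # hypothetical block frequency and step size

# With zero input the step is a pure rotation, so the block norm is preserved.
free = ctbe_block_step(3.0, 4.0, 0.0, 0.0, omega, delta)
print(math.hypot(3.0, 4.0), math.hypot(*free))

# The equilibrium of dh/dt = J h + b, h* = (b2/omega, -b1/omega), is left unchanged by the step.
b1, b2 = 0.7, -0.2
fixed = ctbe_block_step(b2 / omega, -b1 / omega, b1, b2, omega, delta)
print((b2 / omega, -b1 / omega), fixed)
```

The second check is a quick way to validate an exact discretization: a constant input has a stationary state in continuous time, and the discrete update should reproduce it regardless of delta.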

The hidden dimension h_dim must be even. Optional memory-augmented output is supported via a LinearCNU readout layer when cnu_memories > 0. The inner module is _CTBENet; see its docstring for the full forward-pass specification.

The local and project_every flags share the same semantics as in CSSM.
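
The project_every schedule can be sketched as a counter, assuming _CTBENet uses the same forward_count logic that appears in the CTB source listing on this page. The helper name projection_steps is illustrative.

```python
def projection_steps(project_every, num_steps):
    """Indices of forward calls on which adjust_eigs would run."""
    steps = []
    forward_count = 0  # the listing resets this counter whenever first=True
    for step in range(num_steps):
        if project_every and forward_count % project_every == 0:
            steps.append(step)
        forward_count += 1
    return steps

print(projection_steps(0, 6))  # []
print(projection_steps(3, 7))  # [0, 3, 6]
print(projection_steps(1, 3))  # [0, 1, 2]
```

Under this logic a positive project_every always triggers a projection on the first step of a sequence, because the counter starts at zero.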

Examples:

>>> import torch
>>> ctbe = CTBE(u_shape=(8,), d_dim=4, y_dim=3, h_dim=32, delta=0.1)
>>> u = torch.randn(1, 8)
>>> du = torch.randn(1, 4)
>>> y = ctbe.module(u, du, first=True)   # tensor of shape (1, 3)
>>> y = ctbe.module(u, du, first=False)  # continues from stored h_next

Initialize a CTBE module with exact trigonometric block-rotation dynamics.

Computes the flat input dimension from u_shape, builds a _CTBENet inner module (which asserts that h_dim is even), then delegates to ModuleWrapper.__init__ using stream descriptors from get_proc_inputs_and_proc_outputs_for_rnn.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `u_shape` | `tuple[int]` | Shape of the main input tensor, excluding the batch dimension. | required |
| `d_dim` | `int` | Dimensionality of the secondary descriptor (delta-u) input stream. | required |
| `y_dim` | `int` | Dimensionality of the output tensor. | required |
| `h_dim` | `int` | Dimensionality of the hidden state. Must be even (2x2 block structure). | required |
| `delta` | `float` | Discrete time step passed to the trigonometric rotation formula. | required |
| `sigma` | `Callable` | Element-wise activation applied to the hidden state before the output projection. Defaults to the identity function. | `lambda x: x` |
| `project_every` | `int` | If positive, call adjust_eigs every this many forward steps. In the base _CTBENet this is a no-op; subclasses may override it. A value of 0 disables projection. Defaults to 0. | `0` |
| `local` | `bool` | If True, expose the pre-update hidden state for output and gradient computation. Defaults to False. | `False` |
| `cnu_memories` | `int` | If positive, replace the linear output projection with a LinearCNU layer with this many memory units. Defaults to 0. | `0` |
| `batch_size` | `int` | Number of sequences processed in parallel. Defaults to 1. | `1` |
| `*args` | | Additional positional arguments forwarded to ModuleWrapper.__init__. | `()` |
| `**kwargs` | | Additional keyword arguments forwarded to ModuleWrapper.__init__. | `{}` |

Raises:

| Type | Description |
|---|---|
| `AssertionError` | If h_dim is not divisible by 2. |

Source code in unaiverse/modules/networks.py
def __init__(self, u_shape: tuple[int], d_dim: int, y_dim: int, h_dim: int, delta: float,
             sigma: Callable = lambda x: x, project_every: int = 0, local: bool = False,
             cnu_memories: int = 0, batch_size: int = 1, *args, **kwargs):
    """Initialize a ``CTBE`` module with exact trigonometric block-rotation dynamics.

    Computes the flat input dimension from ``u_shape``, builds a ``_CTBENet`` inner
    module (which asserts that ``h_dim`` is even), then delegates to
    ``ModuleWrapper.__init__`` using stream descriptors from
    ``get_proc_inputs_and_proc_outputs_for_rnn``.

    Args:
        u_shape: Shape of the main input tensor, excluding the batch dimension.
        d_dim: Dimensionality of the secondary descriptor (delta-u) input stream.
        y_dim: Dimensionality of the output tensor.
        h_dim: Dimensionality of the hidden state. Must be even (2x2 block structure).
        delta: Discrete time step passed to the trigonometric rotation formula.
        sigma: Element-wise activation applied to the hidden state before the output
            projection. Defaults to the identity function.
        project_every: If positive, call ``adjust_eigs`` every this many forward
            steps. In the base ``_CTBENet`` this is a no-op; subclasses may override
            it. A value of ``0`` disables projection. Defaults to 0.
        local: If ``True``, expose the pre-update hidden state for output and gradient
            computation. Defaults to False.
        cnu_memories: If positive, replace the linear output projection with a
            ``LinearCNU`` layer with this many memory units. Defaults to 0.
        batch_size: Number of sequences processed in parallel. Defaults to 1.
        *args: Additional positional arguments forwarded to ``ModuleWrapper.__init__``.
        **kwargs: Additional keyword arguments forwarded to ``ModuleWrapper.__init__``.

    Raises:
        AssertionError: If ``h_dim`` is not divisible by 2.
    """
    u_shape = torch.Size(u_shape)
    u_dim = u_shape.numel()
    du_dim = d_dim
    proc_inputs, proc_outputs = get_proc_inputs_and_proc_outputs_for_rnn(u_shape, du_dim, y_dim)
    super(CTBE, self).__init__(
        module=_CTBENet(u_dim, du_dim, y_dim, h_dim, delta, sigma, project_every, local,
                        cnu_memories, batch_size),
        proc_inputs=proc_inputs, proc_outputs=proc_outputs, *args, **kwargs)

CTBEInitStateBZeroInput

CTBEInitStateBZeroInput(u_shape: tuple[int], d_dim: int, y_dim: int, h_dim: int, delta: float, sigma: Callable = lambda x: x, project_every: int = 0, local: bool = False, cnu_memories: int = 0, batch_size: int = 1, *args, **kwargs)

Bases: ModuleWrapper

CTBE variant that initializes the hidden state from the first input and then runs the recurrence with zeroed inputs.

Specializes _CTBENet in two ways:

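  1. init_h: the initial hidden state is set to B(udu) / sum(udu), where the sum runs over the feature dimension of udu, rather than to the random h_init buffer, so the first hidden state is derived directly from the concatenated input [du; u]. The result is non-finite when the entries of udu sum to zero.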
  2. handle_inputs: on every forward step (including the first) both du and u are replaced with zero tensors of matching shape, so the recurrence after initialization is input-free (driven purely by the trigonometric rotation).

All other aspects (exact cosine/sine block rotation, output projection, local mode, project_every projection, cnu_memories readout) are identical to CTBE.
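
As a rough illustration of the two overrides, the sketch below reproduces their arithmetic with plain Python lists instead of tensors. The helper names (matvec, init_h, handle_inputs as free functions) and the 2x3 matrix B are stand-ins chosen for this example and are not part of the library; the actual class applies a torch.nn.Linear layer and divides by torch.sum(udu).

```python
def matvec(B, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(b * v for b, v in zip(row, x)) for row in B]


def init_h(B, udu):
    """Initial hidden state: B(udu) divided by the sum of all input entries."""
    total = sum(udu)  # Assumed nonzero; a zero sum would make the division undefined
    return [value / total for value in matvec(B, udu)]


def handle_inputs(du, u):
    """Replace both input streams with zeros of the same length."""
    return [0.0] * len(du), [0.0] * len(u)


# Hypothetical 2x3 input matrix and a concatenated input [du; u]
B = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]
udu = [1.0, 2.0, 1.0]

h0 = init_h(B, udu)
zero_du, zero_u = handle_inputs([1.0], [2.0, 1.0])
print(h0, zero_du, zero_u)
```

Because the divisor is the plain sum of the input entries, an input whose entries sum to zero would leave the initial state undefined, so it may be worth checking the first input before relying on this variant.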

Initialize a CTBEInitStateBZeroInput module.

Builds a specialized _CTBENet subclass (Net) that overrides init_h and handle_inputs, then delegates to ModuleWrapper.__init__ using stream descriptors from get_proc_inputs_and_proc_outputs_for_rnn.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `u_shape` | `tuple[int]` | Shape of the main input tensor, excluding the batch dimension. | required |
| `d_dim` | `int` | Dimensionality of the secondary descriptor (delta-u) input stream. | required |
| `y_dim` | `int` | Dimensionality of the output tensor. | required |
| `h_dim` | `int` | Dimensionality of the hidden state. Must be even (2x2 block structure). | required |
| `delta` | `float` | Discrete time step passed to the trigonometric rotation formula. | required |
| `sigma` | `Callable` | Element-wise activation applied to the hidden state before the output projection. Defaults to the identity function. | `lambda x: x` |
| `project_every` | `int` | If positive, call adjust_eigs every this many forward steps. A value of 0 disables projection. Defaults to 0. | `0` |
| `local` | `bool` | If True, expose the pre-update hidden state for output and gradient computation. Defaults to False. | `False` |
| `cnu_memories` | `int` | If positive, use a LinearCNU output layer with this many memory units. Defaults to 0. | `0` |
| `batch_size` | `int` | Number of sequences processed in parallel. Defaults to 1. | `1` |
| `*args` | | Additional positional arguments forwarded to ModuleWrapper.__init__. | `()` |
| `**kwargs` | | Additional keyword arguments forwarded to ModuleWrapper.__init__. | `{}` |

Raises:

| Type | Description |
| --- | --- |
| `AssertionError` | If h_dim is not divisible by 2. |

Source code in unaiverse/modules/networks.py
def __init__(self, u_shape: tuple[int], d_dim: int, y_dim: int, h_dim: int, delta: float,
             sigma: Callable = lambda x: x, project_every: int = 0, local: bool = False,
             cnu_memories: int = 0, batch_size: int = 1, *args, **kwargs):
    """Initialize a ``CTBEInitStateBZeroInput`` module.

    Builds a specialized ``_CTBENet`` subclass (``Net``) that overrides ``init_h``
    and ``handle_inputs``, then delegates to ``ModuleWrapper.__init__`` using stream
    descriptors from ``get_proc_inputs_and_proc_outputs_for_rnn``.

    Args:
        u_shape: Shape of the main input tensor, excluding the batch dimension.
        d_dim: Dimensionality of the secondary descriptor (delta-u) input stream.
        y_dim: Dimensionality of the output tensor.
        h_dim: Dimensionality of the hidden state. Must be even (2x2 block structure).
        delta: Discrete time step passed to the trigonometric rotation formula.
        sigma: Element-wise activation applied to the hidden state before the output
            projection. Defaults to the identity function.
        project_every: If positive, call ``adjust_eigs`` every this many forward
            steps. A value of ``0`` disables projection. Defaults to 0.
        local: If ``True``, expose the pre-update hidden state for output and gradient
            computation. Defaults to False.
        cnu_memories: If positive, use a ``LinearCNU`` output layer with this many
            memory units. Defaults to 0.
        batch_size: Number of sequences processed in parallel. Defaults to 1.
        *args: Additional positional arguments forwarded to ``ModuleWrapper.__init__``.
        **kwargs: Additional keyword arguments forwarded to ``ModuleWrapper.__init__``.

    Raises:
        AssertionError: If ``h_dim`` is not divisible by 2.
    """
    u_shape = torch.Size(u_shape)
    u_dim = u_shape.numel()
    du_dim = d_dim
    proc_inputs, proc_outputs = get_proc_inputs_and_proc_outputs_for_rnn(u_shape, du_dim, y_dim)

    class Net(_CTBENet):
        @torch.no_grad()
        def init_h(self, udu: torch.Tensor) -> torch.Tensor:
            return self.B(udu).detach() / torch.sum(udu)

        @staticmethod
        def handle_inputs(du, u):
            return torch.zeros_like(du), torch.zeros_like(u)

    super(CTBEInitStateBZeroInput, self).__init__(
        module=Net(u_dim, du_dim, y_dim, h_dim, delta, sigma, project_every, local,
                   cnu_memories, batch_size),
        proc_inputs=proc_inputs, proc_outputs=proc_outputs, *args, **kwargs)

CNN

CNN(d_dim: int, in_channels: int = 3, in_res: int = 32, *args, **kwargs)

Bases: ModuleWrapper

Convolutional image-feature extractor with a sigmoid-activated output.

Implements a three-block convolutional backbone followed by a fully-connected head. Each convolutional block consists of a Conv2d layer, a ReLU activation, and an AvgPool2d downsampling step. The final feature vector is produced by a lazy linear layer (2048 units, ReLU) followed by a Linear(2048, d_dim) projection and a Sigmoid activation, so every output element lies in (0, 1).
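
The lazy linear layer infers its input width from the first batch it sees. The sketch below estimates that width for a given in_res by applying the standard output-size formula for Conv2d and AvgPool2d (default dilation, no ceil mode) to the kernel sizes, paddings, and strides listed in the source. The function names are hypothetical helpers for this example only.

```python
def conv_out(size, kernel, padding, stride=1):
    """Spatial size after a convolution with the given kernel and padding."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=3, stride=2):
    """Spatial size after AvgPool2d(kernel_size=3, stride=2) without padding."""
    return (size - kernel) // stride + 1


def flat_features(in_res):
    """Number of features entering the lazy linear layer for a square input."""
    size = in_res
    # (kernel_size, padding) of the three convolutions, each followed by pooling
    for kernel, padding in [(5, 2), (5, 2), (3, 1)]:
        size = conv_out(size, kernel, padding)
        size = pool_out(size)
    return 256 * size * size  # 256 channels leave the last convolution


features_32 = flat_features(32)
features_28 = flat_features(28)
print(features_32, features_28)
```

Under these assumptions a 32x32 input shrinks to 15, 7, and then 3 pixels per side, and a 28x28 input to 13, 6, and then 2, which is why the first linear layer is declared lazily rather than with a fixed input size.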

Input transforms (resize, crop, normalization) are selected automatically via transforms_factory based on in_channels and in_res:

  • in_channels == 3: "rgb<in_res>" transform (e.g. "rgb32").
  • otherwise: "gray<in_res>" transform (e.g. "gray32").
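
The selection rule above amounts to a single conditional expression. The following sketch mirrors it in a standalone function; preset_name is a hypothetical helper, and the returned string is what the constructor passes to transforms_factory.

```python
def preset_name(in_channels, in_res):
    """Build the transform preset key from the channel count and resolution."""
    prefix = "rgb" if in_channels == 3 else "gray"
    return prefix + str(in_res)


print(preset_name(3, 32))   # RGB input
print(preset_name(1, 28))   # single-channel input
print(preset_name(4, 64))   # any channel count other than 3 also maps to "gray"
```

Note that the rule only tests for exactly three channels, so every other channel count falls through to the "gray" prefix.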

The processor I/O streams are configured by get_proc_inputs_and_proc_outputs_for_image_classification: one image input stream and one d_dim-dimensional float output stream.

Examples:

>>> import torch
>>> cnn = CNN(d_dim=64, in_channels=3, in_res=32)
>>> # A random RGB image tensor of shape (channels, height, width), without a batch dimension:
>>> img = torch.randn(3, 32, 32)
>>> # (Actual inference is done through the processor pipeline, not directly here.)

Initialize a CNN feature extractor.

Builds the convolutional backbone and fully-connected head, generates input transforms from transforms_factory, and delegates to ModuleWrapper.__init__ with stream descriptors from get_proc_inputs_and_proc_outputs_for_image_classification.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `d_dim` | `int` | Dimensionality of the output feature vector (number of sigmoid units). | required |
| `in_channels` | `int` | Number of input image channels (3 for RGB, 1 for grayscale). Defaults to 3. | `3` |
| `in_res` | `int` | Spatial resolution (height and width) of the input image in pixels. Determines which transform preset is loaded. Defaults to 32. | `32` |
| `*args` | | Additional positional arguments forwarded to ModuleWrapper.__init__. | `()` |
| `**kwargs` | | Additional keyword arguments forwarded to ModuleWrapper.__init__. | `{}` |

Source code in unaiverse/modules/networks.py
def __init__(self, d_dim: int, in_channels: int = 3, in_res: int = 32, *args, **kwargs):
    """Initialize a ``CNN`` feature extractor.

    Builds the convolutional backbone and fully-connected head, generates input
    transforms from ``transforms_factory``, and delegates to
    ``ModuleWrapper.__init__`` with stream descriptors from
    ``get_proc_inputs_and_proc_outputs_for_image_classification``.

    Args:
        d_dim: Dimensionality of the output feature vector (number of sigmoid units).
        in_channels: Number of input image channels (3 for RGB, 1 for grayscale).
            Defaults to 3.
        in_res: Spatial resolution (height and width) of the input image in pixels.
            Determines which transform preset is loaded. Defaults to 32.
        *args: Additional positional arguments forwarded to ``ModuleWrapper.__init__``.
        **kwargs: Additional keyword arguments forwarded to ``ModuleWrapper.__init__``.
    """
    net = torch.nn.Sequential(
        torch.nn.Conv2d(in_channels, 64, kernel_size=5, padding=2),
        torch.nn.ReLU(inplace=True),
        torch.nn.AvgPool2d(kernel_size=3, stride=2),
        torch.nn.Conv2d(64, 128, kernel_size=5, padding=2),
        torch.nn.ReLU(inplace=True),
        torch.nn.AvgPool2d(kernel_size=3, stride=2),
        torch.nn.Conv2d(128, 256, kernel_size=3, padding=1),
        torch.nn.ReLU(inplace=True),
        torch.nn.AvgPool2d(kernel_size=3, stride=2),
        torch.nn.Flatten(),
        torch.nn.LazyLinear(2048),
        torch.nn.ReLU(inplace=True),
        torch.nn.Linear(2048, d_dim),
        torch.nn.Sigmoid())

    transforms = transforms_factory("rgb" + str(in_res) if in_channels == 3 else "gray" + str(in_res))
    proc_inputs, proc_outputs = get_proc_inputs_and_proc_outputs_for_image_classification(d_dim, transforms)
    super(CNN, self).__init__(net, proc_inputs=proc_inputs, proc_outputs=proc_outputs, *args, **kwargs)

CNNCNU

CNNCNU(d_dim: int, cnu_memories: int, in_channels: int = 3, in_res: int = 32, delta: int = 1, scramble: bool = False, *args, **kwargs)

Bases: ModuleWrapper

Convolutional image-feature extractor with a LinearCNU memory-augmented head.

Shares the same three-block convolutional backbone as CNN (Conv2d -> ReLU -> AvgPool2d, repeated three times, followed by a lazy linear layer with 2048 units and ReLU), but replaces the final torch.nn.Linear projection with a LinearCNU layer. The LinearCNU head uses a content-addressable key-value memory of cnu_memories slots to produce contextually adapted feature vectors, and its output is passed through a Sigmoid activation.

Input transforms are selected by transforms_factory in the same way as CNN. The processor I/O is configured by get_proc_inputs_and_proc_outputs_for_image_classification.

Initialize a CNNCNU feature extractor with a memory-augmented head.

Builds the convolutional backbone with a LinearCNU output layer, generates input transforms from transforms_factory, and delegates to ModuleWrapper.__init__ with stream descriptors from get_proc_inputs_and_proc_outputs_for_image_classification.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `d_dim` | `int` | Dimensionality of the output feature vector (number of sigmoid units). | required |
| `cnu_memories` | `int` | Number of key-value memory slots in the LinearCNU head. | required |
| `in_channels` | `int` | Number of input image channels (3 for RGB, 1 for grayscale). Defaults to 3. | `3` |
| `in_res` | `int` | Spatial resolution (height and width) of the input image in pixels. Defaults to 32. | `32` |
| `delta` | `int` | Delta hyperparameter passed to LinearCNU (controls memory retrieval sharpness). Defaults to 1. | `1` |
| `scramble` | `bool` | If True, applies a fixed random scrambling transform to the LinearCNU keys. Defaults to False. | `False` |
| `*args` | | Additional positional arguments forwarded to ModuleWrapper.__init__. | `()` |
| `**kwargs` | | Additional keyword arguments forwarded to ModuleWrapper.__init__. | `{}` |

Source code in unaiverse/modules/networks.py
def __init__(self, d_dim: int, cnu_memories: int, in_channels: int = 3, in_res: int = 32,
             delta: int = 1, scramble: bool = False, *args, **kwargs):
    """Initialize a ``CNNCNU`` feature extractor with a memory-augmented head.

    Builds the convolutional backbone with a ``LinearCNU`` output layer, generates
    input transforms from ``transforms_factory``, and delegates to
    ``ModuleWrapper.__init__`` with stream descriptors from
    ``get_proc_inputs_and_proc_outputs_for_image_classification``.

    Args:
        d_dim: Dimensionality of the output feature vector (number of sigmoid units).
        cnu_memories: Number of key-value memory slots in the ``LinearCNU`` head.
        in_channels: Number of input image channels (3 for RGB, 1 for grayscale).
            Defaults to 3.
        in_res: Spatial resolution (height and width) of the input image in pixels.
            Defaults to 32.
        delta: Delta hyperparameter passed to ``LinearCNU`` (controls memory
            retrieval sharpness). Defaults to 1.
        scramble: If ``True``, applies a fixed random scrambling transform to the
            ``LinearCNU`` keys. Defaults to False.
        *args: Additional positional arguments forwarded to ``ModuleWrapper.__init__``.
        **kwargs: Additional keyword arguments forwarded to ``ModuleWrapper.__init__``.
    """
    net = torch.nn.Sequential(
        torch.nn.Conv2d(in_channels, 64, kernel_size=5, padding=2),
        torch.nn.ReLU(inplace=True),
        torch.nn.AvgPool2d(kernel_size=3, stride=2),
        torch.nn.Conv2d(64, 128, kernel_size=5, padding=2),
        torch.nn.ReLU(inplace=True),
        torch.nn.AvgPool2d(kernel_size=3, stride=2),
        torch.nn.Conv2d(128, 256, kernel_size=3, padding=1),
        torch.nn.ReLU(inplace=True),
        torch.nn.AvgPool2d(kernel_size=3, stride=2),
        torch.nn.Flatten(),
        torch.nn.LazyLinear(2048),
        torch.nn.ReLU(inplace=True),
        LinearCNU(2048, d_dim, key_mem_units=cnu_memories, delta=delta, scramble=scramble),
        torch.nn.Sigmoid())

    transforms = transforms_factory("rgb" + str(in_res) if in_channels == 3 else "gray" + str(in_res))
    proc_inputs, proc_outputs = get_proc_inputs_and_proc_outputs_for_image_classification(d_dim, transforms)
    super(CNNCNU, self).__init__(net, proc_inputs=proc_inputs, proc_outputs=proc_outputs, *args, **kwargs)

SingleLayerCNU

SingleLayerCNU(d_dim: int, cnu_memories: int, in_channels: int = 3, in_res: int = 32, delta: int = 1, scramble: bool = False, *args, **kwargs)

Bases: ModuleWrapper

Single-layer image classifier built entirely from a LinearCNU memory layer.

Flattens the input image to a one-dimensional vector and passes it through a single LinearCNU layer followed by a Sigmoid activation. No convolutional backbone is used; the entire spatial structure of the image is handled by the memory-augmented linear layer. This makes the model fast and lightweight at the cost of spatial invariance.

The flat input size is computed as in_res * in_res * in_channels. Input transforms are selected by transforms_factory in the same way as CNN. The processor I/O is configured by get_proc_inputs_and_proc_outputs_for_image_classification.
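
To make the flat input size concrete, the sketch below flattens a tiny nested-list "image" in channel, row, column order, which is the order a row-major flatten of a (channels, height, width) tensor produces. The helper names and the 2-channel 2x2 image are illustrative assumptions, not library code.

```python
def flat_input_dim(in_res, in_channels):
    """Length of the vector fed to the LinearCNU layer."""
    return in_res * in_res * in_channels


def flatten_image(image):
    """Flatten a channels x height x width nested list in row-major order."""
    return [pixel for channel in image for row in channel for pixel in row]


# Hypothetical image with 2 channels and a 2x2 resolution
image = [[[1, 2], [3, 4]],
         [[5, 6], [7, 8]]]
flat = flatten_image(image)

print(flat, len(flat) == flat_input_dim(2, 2))
print(flat_input_dim(32, 3), flat_input_dim(28, 1))
```

With the default arguments the layer therefore receives 3072 inputs, and 784 inputs for a 28x28 single-channel image.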

Initialize a SingleLayerCNU classifier with a flat LinearCNU head.

Builds a sequential model (Flatten -> LinearCNU -> Sigmoid), generates input transforms from transforms_factory, and delegates to ModuleWrapper.__init__ with stream descriptors from get_proc_inputs_and_proc_outputs_for_image_classification.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `d_dim` | `int` | Dimensionality of the output feature vector (number of sigmoid units). | required |
| `cnu_memories` | `int` | Number of key-value memory slots in the LinearCNU layer. | required |
| `in_channels` | `int` | Number of input image channels (3 for RGB, 1 for grayscale). Defaults to 3. | `3` |
| `in_res` | `int` | Spatial resolution (height and width) of the input image in pixels. The flat input dimension is in_res * in_res * in_channels. Defaults to 32. | `32` |
| `delta` | `int` | Delta hyperparameter passed to LinearCNU. Defaults to 1. | `1` |
| `scramble` | `bool` | If True, applies a fixed random scrambling transform to the LinearCNU keys. Defaults to False. | `False` |
| `*args` | | Additional positional arguments forwarded to ModuleWrapper.__init__. | `()` |
| `**kwargs` | | Additional keyword arguments forwarded to ModuleWrapper.__init__. | `{}` |

Source code in unaiverse/modules/networks.py
def __init__(self, d_dim: int, cnu_memories: int, in_channels: int = 3, in_res: int = 32,
             delta: int = 1, scramble: bool = False, *args, **kwargs):
    """Initialize a ``SingleLayerCNU`` classifier with a flat ``LinearCNU`` head.

    Builds a sequential model (Flatten -> LinearCNU -> Sigmoid), generates input
    transforms from ``transforms_factory``, and delegates to
    ``ModuleWrapper.__init__`` with stream descriptors from
    ``get_proc_inputs_and_proc_outputs_for_image_classification``.

    Args:
        d_dim: Dimensionality of the output feature vector (number of sigmoid units).
        cnu_memories: Number of key-value memory slots in the ``LinearCNU`` layer.
        in_channels: Number of input image channels (3 for RGB, 1 for grayscale).
            Defaults to 3.
        in_res: Spatial resolution (height and width) of the input image in pixels.
            The flat input dimension is ``in_res * in_res * in_channels``. Defaults to 32.
        delta: Delta hyperparameter passed to ``LinearCNU``. Defaults to 1.
        scramble: If ``True``, applies a fixed random scrambling transform to the
            ``LinearCNU`` keys. Defaults to False.
        *args: Additional positional arguments forwarded to ``ModuleWrapper.__init__``.
        **kwargs: Additional keyword arguments forwarded to ``ModuleWrapper.__init__``.
    """
    net = torch.nn.Sequential(
        torch.nn.Flatten(),
        LinearCNU(in_res * in_res * in_channels, d_dim, key_mem_units=cnu_memories, delta=delta, scramble=scramble),
        torch.nn.Sigmoid())

    transforms = transforms_factory("rgb" + str(in_res) if in_channels == 3 else "gray" + str(in_res))
    proc_inputs, proc_outputs = get_proc_inputs_and_proc_outputs_for_image_classification(d_dim, transforms)
    super(SingleLayerCNU, self).__init__(net, proc_inputs=proc_inputs, proc_outputs=proc_outputs, *args, **kwargs)

CNNMNIST

CNNMNIST(*args, **kwargs)

Bases: CNN

CNN pre-configured for grayscale MNIST images (28x28 pixels, 1 channel).

A thin convenience subclass that calls CNN.__init__ with in_channels=1 and in_res=28 forced in kwargs. After the parent is initialized, each input stream's per-property transform is overridden with the "gray_mnist" preset from transforms_factory, which applies the standard MNIST normalization.
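
The effect of forcing values through kwargs can be seen with two stand-in classes. Base and MnistVariant below are hypothetical and only imitate the constructor signatures; they do not build any network.

```python
class Base:
    """Stand-in for a parent class with the same keyword parameters as CNN."""

    def __init__(self, d_dim, in_channels=3, in_res=32):
        self.d_dim = d_dim
        self.in_channels = in_channels
        self.in_res = in_res


class MnistVariant(Base):
    """Stand-in for a subclass that pins the MNIST geometry."""

    def __init__(self, *args, **kwargs):
        kwargs['in_channels'] = 1  # Overwrites any caller-supplied value
        kwargs['in_res'] = 28
        super().__init__(*args, **kwargs)


# A keyword value supplied by the caller is silently replaced
model = MnistVariant(d_dim=10, in_channels=3)
print(model.in_channels, model.in_res)

# A positional value for the same parameter collides with the forced keyword
try:
    MnistVariant(10, 3)
    clash = False
except TypeError:
    clash = True
print(clash)
```

In this pattern it is safest to pass only d_dim (and any ModuleWrapper options) by keyword, since a positional in_channels or in_res conflicts with the forced keyword arguments.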

Source code in unaiverse/modules/networks.py
def __init__(self, *args, **kwargs):
    kwargs['in_channels'] = 1
    kwargs['in_res'] = 28
    super(CNNMNIST, self).__init__(*args, **kwargs)
    for p in self.proc_inputs:
        for prop in p.props:
            prop.set_stream_to_proc_transforms(transforms_factory("gray_mnist"))

CNNCNUMNIST

CNNCNUMNIST(*args, **kwargs)

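Bases: CNNCNU

CNNCNU pre-configured for grayscale MNIST images (28x28 pixels, 1 channel). The constructor forces in_channels=1 and in_res=28 in kwargs and, after the parent is initialized, overrides each input stream's per-property transform with the "gray_mnist" preset from transforms_factory, in the same way as CNNMNIST.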

Source code in unaiverse/modules/networks.py
def __init__(self, *args, **kwargs):
    kwargs['in_channels'] = 1
    kwargs['in_res'] = 28
    super(CNNCNUMNIST, self).__init__(*args, **kwargs)
    for p in self.proc_inputs:
        for prop in p.props:
            prop.set_stream_to_proc_transforms(transforms_factory("gray_mnist"))

SingleLayerCNUMNIST

SingleLayerCNUMNIST(*args, **kwargs)

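Bases: SingleLayerCNU

SingleLayerCNU pre-configured for grayscale MNIST images (28x28 pixels, 1 channel). The constructor forces in_channels=1 and in_res=28 in kwargs and, after the parent is initialized, overrides each input stream's per-property transform with the "gray_mnist" preset from transforms_factory, in the same way as CNNMNIST.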

Source code in unaiverse/modules/networks.py
def __init__(self, *args, **kwargs):
    kwargs['in_channels'] = 1
    kwargs['in_res'] = 28
    super(SingleLayerCNUMNIST, self).__init__(*args, **kwargs)
    for p in self.proc_inputs:
        for prop in p.props:
            prop.set_stream_to_proc_transforms(transforms_factory("gray_mnist"))

ResNet

ResNet(d_dim: int = -1, freeze_backbone: bool = True, *args, **kwargs)

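Bases: ModuleWrapper

Image classifier built on torchvision's ResNet-50, loaded with the IMAGENET1K_V1 weights. If freeze_backbone is True, the pretrained parameters are frozen (requires_grad = False) before the head is replaced. If d_dim > 0, the final fc layer is replaced with Linear(in_features, d_dim) followed by a Sigmoid; with the default d_dim=-1 the original ImageNet head is kept. Inputs use the "rgb224" preset from transforms_factory.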

Source code in unaiverse/modules/networks.py
def __init__(self, d_dim: int = -1, freeze_backbone: bool = True, *args, **kwargs):
    net = torchvision.models.resnet50(weights="IMAGENET1K_V1")
    if freeze_backbone:
        for layer in net.parameters():
            if layer != net.fc:
                layer.requires_grad = False

    if d_dim > 0:
        net.fc = torch.nn.Sequential(
            torch.nn.Linear(net.fc.in_features, d_dim),
            torch.nn.Sigmoid())

    transforms = transforms_factory("rgb224")
    proc_inputs, proc_outputs = get_proc_inputs_and_proc_outputs_for_image_classification(d_dim, transforms)
    super(ResNet, self).__init__(net, proc_inputs=proc_inputs, proc_outputs=proc_outputs, *args, **kwargs)

ResNetCNU

ResNetCNU(d_dim: int, cnu_memories: int, delta: int = 1, scramble: bool = False, freeze_backbone: bool = True, *args, **kwargs)

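Bases: ModuleWrapper

ResNet-50 classifier (IMAGENET1K_V1 weights) with a memory-augmented head. It follows the same construction as ResNet, except that when d_dim > 0 the final fc layer is replaced with a LinearCNU layer (cnu_memories memory units, configured by delta and scramble) followed by a Sigmoid. Inputs use the "rgb224" preset from transforms_factory.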

Source code in unaiverse/modules/networks.py
def __init__(self, d_dim: int, cnu_memories: int,
             delta: int = 1, scramble: bool = False, freeze_backbone: bool = True, *args, **kwargs):
    net = torchvision.models.resnet50(weights="IMAGENET1K_V1")
    if freeze_backbone:
        for layer in net.parameters():
            if layer != net.fc:
                layer.requires_grad = False

    if d_dim > 0:
        net.fc = torch.nn.Sequential(
            LinearCNU(net.fc.in_features, d_dim, key_mem_units=cnu_memories, delta=delta, scramble=scramble),
            torch.nn.Sigmoid())

    transforms = transforms_factory("rgb224")
    proc_inputs, proc_outputs = get_proc_inputs_and_proc_outputs_for_image_classification(d_dim, transforms)
    super(ResNetCNU, self).__init__(net, proc_inputs=proc_inputs, proc_outputs=proc_outputs, *args, **kwargs)

ViT

ViT(d_dim: int = -1, *args, **kwargs)

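Bases: ModuleWrapper

Image classifier built on torchvision's ViT-B/16 with the IMAGENET1K_V1 weights. Preprocessing uses the transforms bundled with the weights, followed by an unsqueeze that adds the batch dimension. If d_dim > 0, the classification head is replaced with Linear(in_features, 2048) -> ReLU -> Linear(2048, d_dim) -> Sigmoid and labels is filled with d_dim "unk" placeholders. Otherwise the ImageNet class names are downloaded at construction time from the pytorch/hub repository, which requires network access.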

Source code in unaiverse/modules/networks.py
def __init__(self, d_dim: int = -1, *args, **kwargs):
    weights = torchvision.models.ViT_B_16_Weights.IMAGENET1K_V1
    transforms = torchvision.transforms.Compose([
        weights.transforms(),
        torchvision.transforms.Lambda(lambda x: x.unsqueeze(0))  # Add batch dimension
    ])
    vit = torchvision.models.vit_b_16(weights=weights)

    if d_dim > 0:
        vit.heads = torch.nn.Sequential(
            torch.nn.Linear(vit.heads.head.in_features, 2048),
            torch.nn.ReLU(inplace=True),
            torch.nn.Linear(2048, d_dim),
            torch.nn.Sigmoid()
        )
        self.labels = ["unk"] * d_dim
    else:
        url = "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"
        with urllib.request.urlopen(url) as f:
            self.labels = [line.strip().decode('utf-8') for line in f.readlines()]

    class Net(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.backbone = vit
            self.tfm = transforms

        def forward(self, y: Image.Image):
            device = next(self.backbone.parameters()).device
            return self.backbone(self.tfm(y).to(device))

    proc_inputs, proc_outputs = get_proc_inputs_and_proc_outputs_for_image_classification(d_dim)
    super(ViT, self).__init__(module=Net(), proc_inputs=proc_inputs, proc_outputs=proc_outputs, *args, **kwargs)

labels instance-attribute

labels = [line.strip().decode('utf-8') for line in f.readlines()]
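
The two ways the labels attribute is populated can be tried offline. In the sketch below an io.BytesIO object stands in for the HTTP response returned by urllib.request.urlopen, and the three sample lines are placeholder content for this example; read_labels is a hypothetical helper wrapping the same list comprehension.

```python
import io


def read_labels(stream):
    """Decode one class name per line from a binary stream."""
    return [line.strip().decode('utf-8') for line in stream.readlines()]


# Stand-in for the downloaded class-name file (one name per line, as bytes)
fake_response = io.BytesIO(b"tench\ngoldfish\ngreat white shark\n")
labels = read_labels(fake_response)

# With a custom head (d_dim > 0) the names are unknown placeholders instead
d_dim = 3
placeholder_labels = ["unk"] * d_dim

print(labels)
print(placeholder_labels)
```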

DenseNet

DenseNet(d_dim: int = -1, *args, **kwargs)

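Bases: ModuleWrapper

Image classifier built on torchvision's DenseNet-121. The network is created with weights=None, so it starts from random initialization rather than pretrained weights. If d_dim > 0, the classifier is replaced with Linear(in_features, 2048) -> ReLU -> Linear(2048, d_dim) -> Sigmoid. Inputs are preprocessed inside forward with the "rgb224" preset from transforms_factory and moved to the backbone's device.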

Source code in unaiverse/modules/networks.py
def __init__(self, d_dim: int = -1, *args, **kwargs):
    transforms = transforms_factory("rgb224")
    densenet = torchvision.models.densenet121(weights=None)

    if d_dim > 0:
        densenet.classifier = torch.nn.Sequential(
            torch.nn.Linear(densenet.classifier.in_features, 2048),
            torch.nn.ReLU(inplace=True),
            torch.nn.Linear(2048, d_dim),
            torch.nn.Sigmoid()
        )

    class Net(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.backbone = densenet
            self.tfm = transforms

        def forward(self, y: Image.Image):
            device = next(self.backbone.parameters()).device
            return self.backbone(self.tfm(y).to(device))

    proc_inputs, proc_outputs = get_proc_inputs_and_proc_outputs_for_image_classification(d_dim)
    super(DenseNet, self).__init__(module=Net(), proc_inputs=proc_inputs, proc_outputs=proc_outputs,
                                   *args, **kwargs)

EfficientNet

EfficientNet(d_dim: int = -1, *args, **kwargs)

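Bases: ModuleWrapper

Image classifier built on torchvision's EfficientNet-B0 with the IMAGENET1K_V1 weights and the preprocessing bundled with those weights. If d_dim > 0, the classifier is replaced with Linear(in_features, 2048) -> ReLU -> Linear(2048, d_dim) -> Sigmoid. If the backbone returns a one-dimensional output, forward adds a leading batch dimension.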

Source code in unaiverse/modules/networks.py
def __init__(self, d_dim: int = -1, *args, **kwargs):
    weights = torchvision.models.EfficientNet_B0_Weights.IMAGENET1K_V1
    transforms = weights.transforms()  # Instantiate the preset; weights.transforms is only its factory
    effnet = torchvision.models.efficientnet_b0(weights=weights)

    if d_dim > 0:
        effnet.classifier = torch.nn.Sequential(
            torch.nn.Linear(effnet.classifier[1].in_features, 2048),
            torch.nn.ReLU(inplace=True),
            torch.nn.Linear(2048, d_dim),
            torch.nn.Sigmoid()
        )

    class Net(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.backbone = effnet
            self.tfm = transforms

        def forward(self, y: Image.Image):
            device = next(self.backbone.parameters()).device
            o = self.backbone(self.tfm(y).to(device))
            if o.dim() == 1:
                o = o.unsqueeze(0)
            return o

    proc_inputs, proc_outputs = get_proc_inputs_and_proc_outputs_for_image_classification(d_dim)
    super(EfficientNet, self).__init__(module=Net(), proc_inputs=proc_inputs, proc_outputs=proc_outputs,
                                       *args, **kwargs)

FasterRCNN

FasterRCNN(*args, **kwargs)

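Bases: ModuleWrapper

Object detector built on torchvision's Faster R-CNN with a ResNet-50 FPN backbone, loaded with the default pretrained weights and put in eval mode. forward keeps only the detections whose score is strictly greater than 0.8 and returns four outputs: the class indices (long tensor), the scores (float32 tensor), the bounding boxes (float32 tensor with 4 values per detection), and a comma-separated string of class names looked up in labels.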

Source code in unaiverse/modules/networks.py
def __init__(self, *args, **kwargs):
    self.labels: list[str] = ['__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
                              'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
                              'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
                              'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
                              'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
                              'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
                              'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
                              'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
                              'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
                              'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
                              'cell phone',
                              'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
                              'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
                              ]
    labels = self.labels

    weights = torchvision.models.detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
    faster_rcnn = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=weights)
    faster_rcnn.eval()
    transforms = torchvision.transforms.Compose([transforms_factory("rgb-no_norm"),
                                                 torchvision.transforms.Lambda(lambda x: x.squeeze(0)),
                                                 weights.transforms()])

    class Net(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.backbone = faster_rcnn
            self.tfm = transforms
            self.labels = labels

        def forward(self, y: Image.Image):
            device = next(self.backbone.parameters()).device
            o = self.backbone([self.tfm(y).to(device)])  # List with 1 image per element (no batch dim)

            found_class_indices = o[0]['labels']
            found_class_scores = o[0]['scores']
            found_class_boxes = o[0]['boxes']
            valid = found_class_scores > 0.8

            found_class_indices: torch.Tensor = found_class_indices[valid]
            found_class_scores = found_class_scores[valid]
            found_class_boxes = found_class_boxes[valid]
            found_class_names: list[str] = [self.labels[int(i.item())] for i in found_class_indices]

            return found_class_indices, found_class_scores, found_class_boxes, ", ".join(found_class_names)

    super(FasterRCNN, self).__init__(
        module=Net(),
        proc_inputs=[StreamType(data_type="img", pubsub=False, private_only=False)],
        proc_outputs=[StreamType(data_type="tensor", tensor_dtype=torch.long, tensor_shape=(None,),
                                 pubsub=False, private_only=False),
                      StreamType(data_type="tensor", tensor_dtype=torch.float32, tensor_shape=(None,),
                                 pubsub=False, private_only=False),
                      StreamType(data_type="tensor", tensor_dtype=torch.float32, tensor_shape=(None, 4),
                                 pubsub=False, private_only=False),
                      StreamType(data_type="text",
                                 pubsub=False, private_only=False)],
        *args, **kwargs)

labels instance-attribute

labels: list[str] = ['__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table', 'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
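
The post-processing in forward (score threshold, then index-to-name lookup) can be sketched with plain lists in place of tensors. filter_detections is a hypothetical helper, and the shortened label list and detection values are made up for this example.

```python
def filter_detections(indices, scores, boxes, labels, threshold=0.8):
    """Keep detections scoring strictly above the threshold and name them."""
    keep = [score > threshold for score in scores]
    kept_indices = [i for i, k in zip(indices, keep) if k]
    kept_scores = [s for s, k in zip(scores, keep) if k]
    kept_boxes = [b for b, k in zip(boxes, keep) if k]
    names = ", ".join(labels[i] for i in kept_indices)
    return kept_indices, kept_scores, kept_boxes, names


# First four entries of the label list, enough for this example
labels = ['__background__', 'person', 'bicycle', 'car']

# Three hypothetical detections: (class index, score, box as x1, y1, x2, y2)
indices = [1, 3, 2]
scores = [0.95, 0.8, 0.81]
boxes = [[0, 0, 10, 20], [5, 5, 30, 30], [2, 8, 12, 18]]

result = filter_detections(indices, scores, boxes, labels)
print(result)
```

A detection scoring exactly 0.8 is dropped because the comparison is strict, and an image with no confident detections yields empty lists and an empty string.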

TinyLLama

TinyLLama(device=None, *args, **kwargs)

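Bases: ModuleWrapper

Text-in, text-out wrapper around a HuggingFace text-generation pipeline running TinyLlama/TinyLlama-1.1B-Chat-v1.0 in bfloat16. The device is resolved with guess_device before the pipeline is created. Each incoming message is placed in the tokenizer's chat template after the fixed system prompt "You are a helpful assistant", and up to 256 new tokens are sampled (temperature=0.7, top_k=50, top_p=0.95). The generated text that follows the "<|assistant|>\n" marker is extracted and stripped; if the pipeline returns nothing usable, the output is "Error!".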

Source code in unaiverse/modules/networks.py
def __init__(self, device=None, *args, **kwargs):

    class Net(torch.nn.Module):
        def __init__(self, _device):
            super().__init__()
            self.__pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
                                   torch_dtype=torch.bfloat16, device=_device)

        def forward(self, msg: str) -> str:
            msg_struct = [{"role": "system", "content": "You are a helpful assistant"},
                          {"role": "user", "content": msg}]
            assert self.__pipe.tokenizer is not None
            prompt = self.__pipe.tokenizer.apply_chat_template(msg_struct, tokenize=False,
                                                               add_generation_prompt=True)
            assert isinstance(prompt, str)
            out: list = self.__pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7,
                                    top_k=50, top_p=0.95)
            out: str = out[0]["generated_text"] if (out is not None and len(out) > 0 and
                                                    "generated_text" in out[0]) else "Error!"
            if "<|assistant|>\n" in out:
                out = out.split("<|assistant|>\n")[1]
            return out.strip()

    # Populate self.device
    self.guess_device(device)

    super(TinyLLama, self).__init__(
        module=Net(self.device),
        device=device,
        proc_inputs=[StreamType(data_type="text", pubsub=False, private_only=False)],
        proc_outputs=[StreamType(data_type="text", pubsub=False, private_only=False)],
        *args, **kwargs
    )
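
The reply extraction at the end of forward can be exercised without loading a model. In this sketch extract_reply is a hypothetical standalone copy of that logic, and the sample pipeline output is hand-written to resemble a prompt followed by a completion.

```python
def extract_reply(out):
    """Pull the assistant's reply out of a text-generation pipeline result."""
    usable = out is not None and len(out) > 0 and "generated_text" in out[0]
    text = out[0]["generated_text"] if usable else "Error!"
    # When the prompt is echoed back, keep what follows the assistant marker
    if "<|assistant|>\n" in text:
        text = text.split("<|assistant|>\n")[1]
    return text.strip()


# Hand-written stand-in for a pipeline result that includes the prompt
sample = [{"generated_text": "<|user|>\nHi</s>\n<|assistant|>\nHello there! "}]

reply = extract_reply(sample)
empty_reply = extract_reply([])
plain_reply = extract_reply([{"generated_text": "  Only the completion.  "}])
print(reply, empty_reply, plain_reply)
```

When the marker is absent, for example because the pipeline was asked to return only the completion, the text passes through unchanged apart from the final strip.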

LLama

LLama(device=None, *args, **kwargs)

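Bases: ModuleWrapper

Text-in, text-out wrapper around a HuggingFace text-generation pipeline running meta-llama/Llama-3.2-3B-Instruct in bfloat16. The device is resolved with guess_device before the pipeline is created. Each incoming message is placed in the tokenizer's chat template after the fixed system prompt "You are a helpful assistant", and up to 256 new tokens are sampled (temperature=0.7, top_k=50, top_p=0.95) with return_full_text=False, so only the completion is returned. If the pipeline returns nothing usable, the output is "Error!".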

Source code in unaiverse/modules/networks.py
def __init__(self, device=None, *args, **kwargs):

    class Net(torch.nn.Module):
        def __init__(self, _device):
            super().__init__()
            self.__pipe = pipeline("text-generation", model="meta-llama/Llama-3.2-3B-Instruct",
                                   torch_dtype=torch.bfloat16, device=_device)

        def forward(self, msg: str) -> str:
            msg_struct = [{"role": "system", "content": "You are a helpful assistant"},
                          {"role": "user", "content": msg}]
            assert self.__pipe.tokenizer is not None
            prompt = self.__pipe.tokenizer.apply_chat_template(msg_struct, tokenize=False,
                                                               add_generation_prompt=True)
            assert isinstance(prompt, str)
            out = self.__pipe(prompt, max_new_tokens=256, do_sample=True, return_full_text=False,
                              temperature=0.7, top_k=50, top_p=0.95)
            out = out[0]["generated_text"] if (out is not None and len(out) > 0 and
                                               "generated_text" in out[0]) else "Error!"
            if "<|assistant|>\n" in out:
                out = out.split("<|assistant|>\n")[1]
            return out.strip()

    # Populate self.device
    self.guess_device(device)

    super(LLama, self).__init__(
        module=Net(self.device),
        device=device,
        proc_inputs=[StreamType(data_type="text", pubsub=False, private_only=False)],
        proc_outputs=[StreamType(data_type="text", pubsub=False, private_only=False)],
        *args, **kwargs
    )

Phi

Phi(device=None, *args, **kwargs)

Bases: ModuleWrapper

Source code in unaiverse/modules/networks.py
def __init__(self, device=None, *args, **kwargs):
    class Net(torch.nn.Module):
        def __init__(self, _device):
            super().__init__()
            self.__pipe = pipeline("text-generation", model="microsoft/Phi-3.5-mini-instruct",
                                   torch_dtype="auto", device=_device)

        def forward(self, msg: str) -> str:
            msg_struct = [{"role": "system", "content": "You are a helpful assistant"},
                          {"role": "user", "content": msg}]
            assert self.__pipe.tokenizer is not None
            prompt = self.__pipe.tokenizer.apply_chat_template(msg_struct, tokenize=False,
                                                               add_generation_prompt=True)
            assert isinstance(prompt, str)
            out_: list = self.__pipe(prompt, max_new_tokens=256, do_sample=True, return_full_text=False)
            out: str = out_[0]["generated_text"] if (out_ is not None and len(out_) > 0 and
                                                     "generated_text" in out_[0]) else "Error!"
            if "<|assistant|>\n" in out:
                out = out.split("<|assistant|>\n")[1]
            return out.strip()

    # Populate self.device
    self.guess_device(device)

    super(Phi, self).__init__(
        module=Net(self.device),
        device=device,
        proc_inputs=[StreamType(data_type="text", pubsub=False, private_only=False)],
        proc_outputs=[StreamType(data_type="text", pubsub=False, private_only=False)],
        *args, **kwargs
    )
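
The three text-generation wrappers above (TinyLLama, LLama, Phi) post-process the pipeline result the same way: take `generated_text` from the first item, fall back to "Error!" when the result is empty or malformed, and keep only what follows an `<|assistant|>\n` marker if one is present. A minimal standalone sketch of that step follows; `extract_reply` and `ASSISTANT_MARKER` are hypothetical names used for illustration, not part of the module.

```python
ASSISTANT_MARKER = "<|assistant|>\n"  # marker string checked in the listings above


def extract_reply(out) -> str:
    """Mirror the fallback-and-strip logic of the forward() methods (illustrative helper)."""
    # Accept only a non-empty sequence whose first item carries "generated_text"
    if out is not None and len(out) > 0 and "generated_text" in out[0]:
        text = out[0]["generated_text"]
    else:
        text = "Error!"
    # Keep what follows the first assistant marker, if the marker is present
    if ASSISTANT_MARKER in text:
        text = text.split(ASSISTANT_MARKER)[1]
    return text.strip()


print(extract_reply([{"generated_text": "  Hello there.  "}]))
print(extract_reply([]))
```

Because the pipelines are called with `return_full_text=False`, the marker branch is expected to be a no-op in most cases; it only matters when the generated text itself contains the marker.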

LangSegmentAnything

LangSegmentAnything(device=None, *args, **kwargs)

Bases: ModuleWrapper

Source code in unaiverse/modules/networks.py
def __init__(self, device=None, *args, **kwargs):
    from lang_sam import LangSAM
    from PIL import ImageDraw, ImageFont

    class Net(torch.nn.Module):
        def __init__(self, _device):
            super().__init__()

            # Generate a 64x64 error image (with text "Error" on it)
            error_img = Image.new("RGB", (64, 64), color="white")
            draw = ImageDraw.Draw(error_img)
            font = ImageFont.load_default()
            text = "Error"
            bbox = draw.textbbox((0, 0), text, font=font)
            text_width = bbox[2] - bbox[0]
            text_height = bbox[3] - bbox[1]
            position = ((64 - text_width) // 2, (64 - text_height) // 2)
            draw.text(position, text, fill="black", font=font)

            self.__sam = LangSAM(device=_device)
            self.__error_img = error_img

        def forward(self, image_pil: Image.Image, msg: str):
            try:
                image_pil = image_pil.convert("RGB") if image_pil.mode != "RGB" else image_pil  # Forcing RGB
                out = self.__sam.predict([image_pil], [msg])
                if (out is None or not isinstance(out, list) or len(out) < 1 or not isinstance(out[0], dict) or
                        'masks' not in out[0]) or out[0]['masks'].ndim != 3:
                    return image_pil
                else:
                    return LangSegmentAnything.highlight_masks_on_image(image_pil, out[0]['masks'])
            except Exception:
                return self.__error_img

    # Populate self.device
    self.guess_device(device)

    super(LangSegmentAnything, self).__init__(
        module=Net(self.device),
        device=device,
        proc_inputs=[StreamType(data_type="img", pubsub=False, private_only=False),
                     StreamType(data_type="text", pubsub=False, private_only=False)],
        proc_outputs=[StreamType(data_type="img", pubsub=False, private_only=False)],
        *args, **kwargs
    )

highlight_masks_on_image staticmethod

highlight_masks_on_image(image_pil: Image, masks: ndarray, alpha: float = 0.75)
Source code in unaiverse/modules/networks.py
@staticmethod
def highlight_masks_on_image(image_pil: Image.Image, masks: np.ndarray, alpha: float = 0.75):
    img_np = np.array(image_pil, dtype=np.float32) / 255.0
    height, width, _ = img_np.shape
    num_masks = masks.shape[0]

    overlay_np = np.zeros((height, width, 3), dtype=np.float32)
    alpha_mask_combined = np.zeros((height, width, 1), dtype=np.float32)

    color_palette = [
        (255, 102, 102),  # Light Red
        (102, 255, 102),  # Light Green
        (102, 102, 255),  # Light Blue
        (255, 255, 102),  # Light Yellow
        (255, 102, 255),  # Light Magenta
        (102, 255, 255),  # Light Cyan
        (255, 178, 102),  # Orange
        (178, 102, 255),  # Purple
        (102, 178, 255),  # Sky Blue
    ]

    for i in range(num_masks):
        mask = masks[i, :, :].astype(bool)  # Built-in bool: the np.bool alias is missing in NumPy 1.24-1.26

        color_rgb_int = color_palette[i % len(color_palette)]
        color = np.array(color_rgb_int, dtype=np.float32) / 255.0
        overlay_np[mask] = (1 - alpha) * overlay_np[mask] + alpha * color
        alpha_mask_combined[mask] = np.maximum(alpha_mask_combined[mask], alpha)

    # Final blending and conversion ...
    final_np = (1 - alpha_mask_combined) * img_np + alpha_mask_combined * overlay_np
    final_np = (final_np * 255).astype(np.uint8)
    final_image = Image.fromarray(final_np)
    return final_image
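
To make the blending arithmetic concrete, here is a pure-Python sketch for a single pixel covered by exactly one mask. `blend_pixel` is a hypothetical helper written for this illustration. It follows the two steps in the listing above (the overlay starts at zero, so it becomes `alpha * color`; the final blend then applies `alpha` a second time) but uses Python floats rather than float32 arrays.

```python
def blend_pixel(pixel_rgb, color_rgb, alpha=0.75):
    """Blend one masked pixel the way the listing does, channel by channel (illustrative)."""
    result = []
    for p, c in zip(pixel_rgb, color_rgb):
        p_f = p / 255.0                                    # image channel in [0, 1]
        c_f = c / 255.0                                    # palette channel in [0, 1]
        overlay = (1 - alpha) * 0.0 + alpha * c_f          # overlay starts from zeros
        final = (1 - alpha) * p_f + alpha * overlay        # final blend with combined alpha
        result.append(int(final * 255))                    # truncation, like astype(uint8)
    return tuple(result)


# A white pixel under the first palette color (Light Red)
print(blend_pixel((255, 255, 255), (255, 102, 102)))  # (207, 121, 121)
```

Since `alpha` enters twice, a masked pixel receives the palette color at weight `alpha ** 2`, so the highlight comes out darker than the palette entry. Pixels outside every mask keep a combined alpha of 0 and pass through unchanged.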

SmolVLM

SmolVLM(device=None, *args, **kwargs)

Bases: ModuleWrapper

Source code in unaiverse/modules/networks.py
def __init__(self, device=None, *args, **kwargs):
    from transformers import AutoModelForImageTextToText

    class Net(torch.nn.Module):
        def __init__(self, _device):
            super().__init__()
            model_id = "HuggingFaceTB/SmolVLM2-500M-Video-Instruct"
            self.__backbone = (
                AutoModelForImageTextToText.from_pretrained(model_id,
                                                            torch_dtype=torch.bfloat16,
                                                            device_map=_device).to(_device))
            self.__pp = AutoProcessor.from_pretrained(model_id, device_map=_device)

        def forward(self, image_pil: Image.Image, msg: str = "what is this?"):
            image_pil = image_pil.convert("RGB") if image_pil.mode != "RGB" else image_pil  # Forcing RGB
            _device = next(self.__backbone.parameters()).device

            msg_struct = [{"role": "user", "content": [{"type": "text", "text": f"{msg}"},
                                                       {"type": "image", "image": image_pil}]}]

            prompt = self.__pp.apply_chat_template(msg_struct,
                                                   tokenize=True,
                                                   add_generation_prompt=True,
                                                   return_dict=True,
                                                   return_tensors="pt").to(_device, dtype=torch.bfloat16)

            out = self.__backbone.generate(**prompt, do_sample=False, max_new_tokens=128)
            out = self.__pp.batch_decode(out, skip_special_tokens=True)[0] if out is not None else "Error!"
            if "Assistant:" in out:
                out = out.split("Assistant:")[1]
            return out.strip()

    # Populate self.device
    self.guess_device(device)

    super(SmolVLM, self).__init__(
        module=Net(self.device),
        device=device,
        proc_inputs=[StreamType(data_type="img", pubsub=False, private_only=False),
                     StreamType(data_type="text", pubsub=False, private_only=False)],
        proc_outputs=[StreamType(data_type="text", pubsub=False, private_only=False)],
        *args, **kwargs
    )

SiteRAG

SiteRAG(site_url: str, site_folder: str = join('rag', 'downloaded_site'), db_folder: str = join('rag', 'chroma_db'), *args, **kwargs)

Bases: ModuleWrapper

Source code in unaiverse/modules/networks.py
def __init__(self,
             site_url: str,
             site_folder: str = os.path.join("rag", "downloaded_site"),
             db_folder: str = os.path.join("rag", "chroma_db"),
             *args, **kwargs):
    # Saving options
    self.site_url = site_url
    self.site_folder = site_folder
    self.db_folder = db_folder

    # Loading neural model
    device_env = os.getenv("PROC_DEVICE", None)
    target_device = torch.device("cpu") if device_env is None else torch.device(device_env)
    model_id = "TheBloke/vicuna-7b-1.1-HF"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16,
                                                 device_map=target_device, offload_folder="offload")
    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=200)

    # Embedder
    from langchain.embeddings import SentenceTransformerEmbeddings
    self.embedder = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2",
                                                  model_kwargs={"device": target_device.type})

    # Crawling site (uses self.embedder + self.site_folder + self.db_folder + self.site_url)
    self.crawl_website()
    self.crawled_site_to_rag_knowledge_base()

    # Setting up RAG stuff
    from langchain.vectorstores import Chroma
    db = Chroma(persist_directory=db_folder, embedding_function=self.embedder)
    retriever = db.as_retriever(search_kwargs={"k": 3})

    class Net(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self._pipe = pipe
            self._retriever = retriever

        def forward(self, msg: str):
            # Build context
            docs = self._retriever.get_relevant_documents(msg)
            context = "\n\n".join(doc.page_content for doc in docs)
            prompt = f"Answer the question based on the following context:\n\n{context}\n\nQuestion: {msg}\nAnswer:"

            # Generate answer
            out = self._pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
            out = out[0]['generated_text'][len(prompt):].strip() if (out is not None and len(out) > 0 and
                                                                     "generated_text" in out[0]) else "Error!"

            # Append source URLs
            best_doc_with_score = self._retriever.vectorstore.similarity_search_with_score(msg, k=1)
            best_doc, _ = best_doc_with_score[0]
            docs = [best_doc]
            sources = set("<a href='" +
                          doc.metadata['source'] +
                          "' onclick='window.open(this.href); return false;' style='color: blue;'>" +
                          doc.metadata['source'] + "</a>" for doc in docs)
            sources_text = "<br/><br/>\nURLs:\n" + "\n".join(sources)

            return out.strip() + sources_text

    super(SiteRAG, self).__init__(
        module=Net(),
        proc_inputs=[StreamType(data_type="text", pubsub=False, private_only=False)],
        proc_outputs=[StreamType(data_type="text", pubsub=False, private_only=False)],
        *args, **kwargs
    )
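
The prompt assembly and answer extraction in `forward` can be tried in isolation, without loading the model or the vector store. The sketch below assumes the retrieved chunks are already available as plain strings; `build_prompt` and `strip_prompt_echo` are hypothetical helper names, not part of the module.

```python
def build_prompt(chunks: list[str], question: str) -> str:
    """Join retrieved chunks into the context block used by the listing above."""
    context = "\n\n".join(chunks)
    return (f"Answer the question based on the following context:\n\n{context}"
            f"\n\nQuestion: {question}\nAnswer:")


def strip_prompt_echo(generated_text: str, prompt: str) -> str:
    """Drop the echoed prompt: by default the pipeline returns prompt + continuation."""
    return generated_text[len(prompt):].strip()


prompt = build_prompt(["Chunk one.", "Chunk two."], "What is this?")
fake_output = prompt + " It is a test. "  # stand-in for out[0]["generated_text"]
print(strip_prompt_echo(fake_output, prompt))
```

Slicing by `len(prompt)` relies on the pipeline echoing the prompt verbatim at the start of `generated_text`, which is the default behaviour when `return_full_text` is not set to `False`.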

site_url instance-attribute

site_url = site_url

site_folder instance-attribute

site_folder = site_folder

db_folder instance-attribute

db_folder = db_folder

embedder instance-attribute

embedder = SentenceTransformerEmbeddings(model_name='all-MiniLM-L6-v2', model_kwargs={'device': target_device.type})

crawl_website

crawl_website(max_pages=300)
Source code in unaiverse/modules/networks.py
def crawl_website(self, max_pages=300):
    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin, urlparse

    if os.path.exists(self.site_folder):
        shutil.rmtree(self.site_folder)
    os.makedirs(self.site_folder)
    visited = set()
    to_visit = [self.site_url]

    while to_visit and len(visited) < max_pages:
        url = to_visit.pop(0)
        if url in visited:
            continue
        visited.add(url)

        try:
            r = requests.get(url, timeout=10)
            if "text/html" not in r.headers.get("Content-Type", ""):
                continue

            parsed = urlparse(url)
            filename = parsed.path.strip("/") or "index.html"
            filename += ".crawled"
            file_path = os.path.join(self.site_folder, filename.replace("/", "__"))
            with open(file_path, "w", encoding="utf-8") as f:
                f.write(r.text)

            soup = BeautifulSoup(r.text, "html.parser")
            for link in soup.find_all("a", href=True):
                link: dict
                full_url = urljoin(url, link["href"])
                if full_url.startswith(self.site_url) and full_url not in visited:
                    to_visit.append(full_url)
        except Exception as e:
            print(f"Error fetching {url}: {e}")

    print(f"Crawled {len(visited)} pages.")
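
The traversal in `crawl_website` is a breadth-first walk with a visited set, a same-site prefix filter, and a page cap. The sketch below reproduces only that control flow over an in-memory link graph, so it runs without network access. `crawl_order`, `SITE`, and `LINKS` are hypothetical names, and the example URLs are made up.

```python
def crawl_order(start: str, links: dict[str, list[str]], max_pages: int = 300) -> list[str]:
    """Return pages in the order the breadth-first loop would fetch them (illustrative)."""
    visited: set[str] = set()
    order: list[str] = []
    to_visit = [start]
    while to_visit and len(visited) < max_pages:
        url = to_visit.pop(0)          # FIFO queue gives breadth-first order
        if url in visited:
            continue                   # the same URL may have been queued twice
        visited.add(url)
        order.append(url)
        for nxt in links.get(url, []):
            # Stay within the site prefix and skip pages already fetched
            if nxt.startswith(start) and nxt not in visited:
                to_visit.append(nxt)
    return order


SITE = "https://example.com/"
LINKS = {
    SITE: [SITE + "a", SITE + "b", "https://other.org/x"],
    SITE + "a": [SITE + "b", SITE],
    SITE + "b": [SITE + "c"],
}
print(crawl_order(SITE, LINKS))
print(crawl_order(SITE, LINKS, max_pages=2))
```

Note that the cap counts visited URLs, including ones whose fetch failed or returned non-HTML content, so the number of saved files can be lower than `max_pages`.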

crawled_site_to_rag_knowledge_base

crawled_site_to_rag_knowledge_base()
Source code in unaiverse/modules/networks.py
def crawled_site_to_rag_knowledge_base(self):
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin
    from langchain.vectorstores import Chroma
    from langchain.docstore.document import Document
    from langchain.text_splitter import RecursiveCharacterTextSplitter

    docs = []
    for filename in os.listdir(self.site_folder):
        if filename.endswith(".crawled"):
            file_path = os.path.join(self.site_folder, filename)
            with open(file_path, encoding="utf-8") as f:
                html = f.read()

            soup: BeautifulSoup = BeautifulSoup(html, "html.parser")
            text = soup.get_text(separator=" ", strip=True)  # type: ignore

            page_path = filename.replace("__", "/").replace(".crawled", "")
            url = urljoin(self.site_url, page_path)

            docs.append(Document(page_content=text, metadata={"source": url}))

    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    split_docs = splitter.split_documents(docs)

    chroma_db = Chroma.from_documents(split_docs, self.embedder, persist_directory=self.db_folder)
    chroma_db.persist()
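
The two methods above agree on a filename convention: `crawl_website` flattens the URL path into a `.crawled` filename, and `crawled_site_to_rag_knowledge_base` reverses it to recover the source URL. A small sketch of that round trip, using only the standard library; `url_to_filename` and `filename_to_url` are hypothetical helper names and the URLs are made up.

```python
from urllib.parse import urljoin, urlparse


def url_to_filename(url: str) -> str:
    """Flatten a URL path into a single filename, as done while crawling."""
    path = urlparse(url).path.strip("/") or "index.html"
    return (path + ".crawled").replace("/", "__")


def filename_to_url(site_url: str, filename: str) -> str:
    """Recover the page URL from a stored filename, as done while indexing."""
    page_path = filename.replace("__", "/").replace(".crawled", "")
    return urljoin(site_url, page_path)


name = url_to_filename("https://example.com/docs/intro/")
print(name)                                           # docs__intro.crawled
print(filename_to_url("https://example.com/", name))  # https://example.com/docs/intro
```

The mapping is lossy: only the path is kept, so URLs that differ in their query string map to the same file, trailing slashes are not restored, and a path that itself contains `__` would be decoded with extra slashes.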

FeatherlessAPI

FeatherlessAPI(model: str | None = None, cost: int = 1, system_prompt: str = '', process_id: str | None = None, max_tokens: int = -1, temperature: float = -1.0, top_p: float = -1.0, top_k: int = -1, frequency_penalty: float | None = None, presence_penalty: float | None = None, repetition_penalty: float | None = None, min_p: float | None = None, sampler: dict | None = None, connect_timeout: float = 15.0, *args, **kwargs)

Bases: ModuleWrapper

Callable handle onto the shared Featherless gateway.

Typical usage:

    api = FeatherlessAPI(model="some-model-id", cost=2)
    text = api("write me a haiku")  # Routed through the gateway

One instance per logical caller. The model and unit cost are fixed at construction, so the callable takes only the prompt string. Construction bootstraps the shared server if needed (self-spawning, race-safe) and opens this caller's persistent registration socket, which serves as a liveness token: the gateway counts the caller as interested for as long as that socket stays open. Closing the instance (or letting the process die) releases it; when the last instance goes away, the server shuts itself down.

The whole client lifecycle (bootstrap, registration, request round-trip) is self-contained here; callers never touch the server internals.
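
The request round-trip is newline-delimited JSON, as the `forward` listing further down this page shows: one JSON object per line in each direction. The sketch below isolates the encoding and decoding so they can be exercised without a running gateway. `encode_request` and `decode_response` are hypothetical helper names. Raising exceptions on failure is an assumption made for this example: the original reports errors through `log.critical`, whose behaviour is not shown here.

```python
import json


def encode_request(prompt: str, *, process_id: str, system_prompt: str = "",
                   cost: int = 1, model=None, sampler=None) -> bytes:
    """Build one request line with the fields sent by forward() (illustrative)."""
    msg = {"op": "generate", "process_id": process_id, "sys_prompt": system_prompt,
           "prompt": prompt, "cost": cost, "model": model, "sampler": sampler or {}}
    return (json.dumps(msg) + "\n").encode()


def decode_response(line: str) -> str:
    """Parse one response line; exceptions here stand in for log.critical (assumption)."""
    if not line:
        raise ConnectionError("Gateway closed the connection")
    resp = json.loads(line)
    if not resp.get("ok"):
        raise RuntimeError(resp.get("error", "unknown error"))
    return resp["result"]


raw = encode_request("write me a haiku", process_id="1234", cost=2)
print(raw)
print(decode_response('{"ok": true, "result": "An old silent pond"}\n'))
```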

Create a FeatherlessAPI handle and connect it to the shared gateway.

Parameters:

Name (Type, Default): Description

model (str | None, None): The model identifier used for every call (None lets the server fall back to its MODEL_ID default).
cost (int, 1): The unit cost charged for every call (one of VALID_COSTS).
system_prompt (str, ''): The system prompt prepended to every call ("" means no system prompt).
process_id (str | None, None): Identifier used for round-robin fairness; defaults to this process's PID.
max_tokens (int, -1): Maximum number of tokens to generate per call (-1 means no limit).
temperature (float, -1.0): Sampling temperature for every call (negative lets the API use its default).
top_p (float, -1.0): Nucleus-sampling probability (negative lets the API use its default).
top_k (int, -1): Top-k sampling cutoff (negative lets the API use its default).
frequency_penalty (float | None, None): Frequency penalty (None lets the API use its default).
presence_penalty (float | None, None): Presence penalty (None lets the API use its default).
repetition_penalty (float | None, None): Repetition penalty, a vLLM/Featherless extension (None uses the default).
min_p (float | None, None): Minimum-probability cutoff, a vLLM/Featherless extension (None uses the default).
sampler (dict | None, None): Extra sampler params merged last (its keys win); use it for any knob not covered above.
connect_timeout (float, 15.0): Maximum seconds to wait for the gateway server to come up.
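
The sentinel defaults above decide which sampling knobs are actually sent. A standalone sketch of that rule, following the constructor listing below: `build_sampler` is a hypothetical function name, and the `seed` key in the usage lines is only an example of an extra knob, not a documented parameter.

```python
def build_sampler(max_tokens=-1, temperature=-1.0, top_p=-1.0, top_k=-1,
                  frequency_penalty=None, presence_penalty=None,
                  repetition_penalty=None, min_p=None, sampler=None) -> dict:
    """Collect only the knobs that were explicitly set (illustrative helper)."""
    out: dict = {}
    if max_tokens > 0:            # 0 and negatives are treated as "unset"
        out["max_tokens"] = max_tokens
    if temperature >= 0.0:        # 0.0 is a valid, explicit value
        out["temperature"] = temperature
    if top_p >= 0.0:
        out["top_p"] = top_p
    if top_k >= 0:
        out["top_k"] = top_k
    optional = {"frequency_penalty": frequency_penalty, "presence_penalty": presence_penalty,
                "repetition_penalty": repetition_penalty, "min_p": min_p}
    for name, value in optional.items():
        if value is not None:
            out[name] = value
    if sampler:                   # merged last, so its keys override the ones above
        out.update(sampler)
    return out


print(build_sampler())                                              # {}
print(build_sampler(temperature=0.7, sampler={"temperature": 0.2, "seed": 1}))
```

The second call prints `{'temperature': 0.2, 'seed': 1}`: the free-form `sampler` dict wins over the dedicated argument.
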
Source code in unaiverse/modules/networks.py
def __init__(self, model: str | None = None, cost: int = 1, system_prompt: str = "",
             process_id: str | None = None, max_tokens: int = -1, temperature: float = -1.,
             top_p: float = -1., top_k: int = -1, frequency_penalty: float | None = None,
             presence_penalty: float | None = None, repetition_penalty: float | None = None,
             min_p: float | None = None, sampler: dict | None = None, connect_timeout: float = 15.0,
             *args, **kwargs):
    """Create a FeatherlessAPI handle and connect it to the shared gateway.

    Args:
        model: The model identifier used for every call (None lets the server fall back to its MODEL_ID default).
        cost: The unit cost charged for every call (one of VALID_COSTS) (Default: 1).
        system_prompt: The system prompt prepended to every call ("" means no system prompt) (Default: "").
        process_id: Identifier used for round-robin fairness; defaults to this process's PID.
        max_tokens: Maximum number of tokens to generate per call (-1 means no limit) (Default: -1).
        temperature: Sampling temperature for every call (negative lets the API use its default) (Default: -1.).
        top_p: Nucleus-sampling probability (negative lets the API use its default) (Default: -1.).
        top_k: Top-k sampling cutoff (negative lets the API use its default) (Default: -1).
        frequency_penalty: Frequency penalty (None lets the API use its default) (Default: None).
        presence_penalty: Presence penalty (None lets the API use its default) (Default: None).
        repetition_penalty: Repetition penalty, a vLLM/Featherless extension (None uses the default) (Default: None).
        min_p: Minimum-probability cutoff, a vLLM/Featherless extension (None uses the default) (Default: None).
        sampler: Extra sampler params merged last (its keys win); use it for any knob not covered above.
        connect_timeout: Maximum seconds to wait for the gateway server to come up (Default: 15.0).
    """
    if cost not in APIGatewayServer.VALID_COSTS:
        log.critical(f"Invalid cost {cost}: it must be one of {APIGatewayServer.VALID_COSTS}")

    # Bring up the shared gateway server before opening any sockets (idempotent across processes)
    FeatherlessAPI._ensure_server(connect_timeout)

    class Net(torch.nn.Module):
        """Holds the sampler config and the two gateway sockets, and routes each `forward(prompt)`
        through the gateway. Plain Python attributes (socket, dict, str): none get registered as
        torch submodules since none are `nn.Module`/`Parameter`/`Tensor`."""

        def __init__(self):
            super().__init__()
            self.model_name: str | None = model
            self.cost: int = cost
            self.system_prompt: str = system_prompt

            # Per-call sampler: include each knob only when explicitly set, so unset ones fall back
            # to the API default. The free-form `sampler` arg is merged last and overrides.
            self.sampler: dict = {}
            if max_tokens > 0:
                self.sampler["max_tokens"] = max_tokens
            if temperature >= 0.:
                self.sampler["temperature"] = temperature
            if top_p >= 0.:
                self.sampler["top_p"] = top_p
            if top_k >= 0:
                self.sampler["top_k"] = top_k
            if frequency_penalty is not None:
                self.sampler["frequency_penalty"] = frequency_penalty
            if presence_penalty is not None:
                self.sampler["presence_penalty"] = presence_penalty
            if repetition_penalty is not None:
                self.sampler["repetition_penalty"] = repetition_penalty
            if min_p is not None:
                self.sampler["min_p"] = min_p
            if sampler:
                self.sampler.update(sampler)

            # Round-robin fairness is per process; default ID is the PID
            self.process_id: str = str(process_id if process_id is not None else os.getpid())

            # Persistent registration socket: lifetime == this caller's interest in the gateway
            self._reg: socket.socket = socket.create_connection(
                (APIGatewayServer.HOST, APIGatewayServer.PORT))
            self._reg.sendall(b'{"op":"hello"}\n')

            # Separate request socket (one in-flight request per instance; this client is synchronous)
            self._req: socket.socket = socket.create_connection(
                (APIGatewayServer.HOST, APIGatewayServer.PORT))
            self._rf = self._req.makefile("r")

        def forward(self, prompt: str) -> str:
            if not isinstance(prompt, str):
                log.critical(f"Invalid prompt: it must be a str, got {type(prompt).__name__}")
            msg = json.dumps({"op": "generate", "process_id": self.process_id,
                              "sys_prompt": self.system_prompt, "prompt": prompt,
                              "cost": self.cost, "model": self.model_name,
                              "sampler": self.sampler}) + "\n"
            self._req.sendall(msg.encode())
            line = self._rf.readline()
            if not line:
                log.critical("Gateway closed the connection")
            resp = json.loads(line)
            if not resp.get("ok"):
                log.critical(f"Gateway returned an error: {resp.get('error', 'unknown error')}")
            return resp["result"]

        def close(self) -> None:
            """Close both gateway sockets, releasing this caller's interest."""
            for s in (self._req, self._reg):
                try:
                    s.close()
                except OSError:
                    pass

    super(FeatherlessAPI, self).__init__(
        module=Net(),
        proc_inputs=[StreamType(data_type="text", pubsub=False, private_only=False)],
        proc_outputs=[StreamType(data_type="text", pubsub=False, private_only=False)],
        *args, **kwargs
    )

close

close() -> None

Close both the request and the persistent registration socket, releasing this caller's interest.

Source code in unaiverse/modules/networks.py
def close(self) -> None:
    """Close both the request and the persistent registration socket, releasing this caller's interest."""
    assert self.module is not None
    assert isinstance(self.module, ModuleWrapper)
    self.module.close()
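
Because the handle exposes a plain `close()` method, one way to guarantee the sockets are released is `contextlib.closing` from the standard library. The sketch below uses a stand-in class so it runs without a gateway; `FakeHandle` is hypothetical and only mimics the call-then-close shape of a FeatherlessAPI instance.

```python
from contextlib import closing


class FakeHandle:
    """Stand-in for a FeatherlessAPI instance (hypothetical, for illustration only)."""

    def __init__(self):
        self.closed = False

    def __call__(self, prompt: str) -> str:
        return f"echo: {prompt}"

    def close(self) -> None:
        self.closed = True


# closing() calls close() on exit, even if the body raises
with closing(FakeHandle()) as api:
    reply = api("write me a haiku")

print(reply, api.closed)  # echo: write me a haiku True
```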