Episode 3: Two agents, fully automatic

Quickstart · Episode 3 of 4

Now the real magic: two AI agents that find each other and exchange data with no human involved, and a new kind of brain that works with images instead of text.

The idea

We'll host an image classifier (it looks at a picture and guesses what's in it). Then we'll build a second agent that generates pictures and sends them over for classification, printing the answers. Two programs, talking automatically.

What's a ResNet / image classifier?

ResNet is a famous, ready-made image-recognition model. Show it a photo and it guesses what's in it, choosing among 1,000 everyday categories (Egyptian cat, sports car, banana, …). It's the vision-world equivalent of the language model from Episode 1: same idea, different senses.

What's a tensor, and what do those numbers in the code mean?

A tensor is just a grid of numbers. An image is turned into a tensor so a model can read it: a batch of images has the shape (how_many, 3, height, width), where 3 is the number of colour channels (red, green, blue). When you see (None, 3, None, None), None means "any size is fine on this dimension". Don't overthink it: it's just a label describing the data's shape. Full story in Data streams.
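To make the "grid of numbers" idea concrete, here is a minimal sketch in plain Python, with nested lists standing in for a real torch tensor. The shape_of helper is hypothetical and written only for this illustration; with PyTorch you would read tensor.shape instead.

```python
def shape_of(grid):
    """Return the shape of a regularly nested, non-empty list as a tuple."""
    shape = []
    while isinstance(grid, list):
        shape.append(len(grid))
        grid = grid[0]          # descend one level of nesting
    return tuple(shape)


# One tiny 2x2 image: one grid of numbers per colour channel.
red = [[1.0, 0.0], [0.0, 0.0]]
green = [[0.0, 1.0], [0.0, 0.0]]
blue = [[0.0, 0.0], [1.0, 0.0]]

image = [red, green, blue]   # (3, height, width)
batch = [image]              # (how_many, 3, height, width)

print(shape_of(image))  # (3, 2, 2)
print(shape_of(batch))  # (1, 3, 2, 2)
```

The batch in this episode has the same layout, only larger: (1, 3, 224, 224).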

Step 3a: the classifier

In a terminal, create classifier.py:

classifier.py

import torch
import torchvision
from unaiverse.agent import Agent
from unaiverse.streams.dataprops import StreamType
from unaiverse.networking.node.node import Node

# A ready-made image classifier (downloads pretrained weights the first time).
brain = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval()

# This agent accepts images-as-tensors and returns 1000 scores (one per category).
agent = Agent(
    proc=brain,
    proc_inputs=[StreamType(data_type="tensor",
                            tensor_shape=(None, 3, None, None),   # any batch, RGB, any size
                            tensor_dtype=torch.float32)],
    proc_outputs=[StreamType(data_type="tensor",
                             tensor_shape=(None, 1000),           # 1000 category scores
                             tensor_dtype=torch.float32)],
)

node = Node(agent, node_name="MyClassifier", hidden=True, clock_delta=1./5.)
node.run()   # lone wolf: sit and wait for images

Run it and leave it running:

python classifier.py

Why so much more detail than Episode 1?

Text was simple: ["text"] said it all. Images need a precise shape and number type, so we use the full StreamType descriptor instead of the shorthand. It's the same idea as ["text"], just spelled out.

Step 3b: the generator

In a second terminal, create generator.py:

generator.py

import torch
from unaiverse.agent import Agent
from unaiverse.streams.dataprops import StreamType
from unaiverse.networking.node.node import Node


# A tiny "brain" that invents a random picture each time it's asked.
class PictureMaker(torch.nn.Module):
    def forward(self, x=None):
        return torch.rand((1, 3, 224, 224), dtype=torch.float32)   # one random 224×224 image


agent = Agent(
    proc=PictureMaker(),
    proc_inputs=[StreamType(data_type="all")],        # it ignores any input
    proc_outputs=[StreamType(data_type="tensor",
                             tensor_shape=(1, 3, 224, 224),
                             tensor_dtype=torch.float32)],
)


# After each round, peek at what the classifier sent back and print the winner.
def on_cycle(node: Node):
    result = node.agent.get_last_streamed_data("MyClassifier")
    if result and result[0] is not None:
        top_category = int(result[0].argmax(dim=1)[0])
        print(f"The classifier's top guess: category #{top_category}")


node = Node(agent, node_name="Generator", hidden=True,
            clock_delta=1./5., run_hook=on_cycle)

# Connect to the classifier and run for 10 seconds, then stop.
node.run(get_in_touch="MyClassifier", max_time=10.0)

Run it:

python generator.py

For about ten seconds you'll see guesses scroll by. Generator is inventing images, sending them to MyClassifier, and reading back its answers, entirely on its own.
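The "top guess" is simply the position of the largest of the 1,000 scores, which is what argmax computes. The sketch below shows the same step in plain Python on a made-up list of four scores (the values and the helper names top_category and softmax are assumptions for illustration, not part of UNaIVERSE). Because torchvision's ResNet returns raw, unnormalised scores, the sketch also applies a softmax to turn them into probabilities.

```python
import math


def top_category(scores):
    """Index of the highest score: argmax for a single row of scores."""
    return max(range(len(scores)), key=lambda i: scores[i])


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)                              # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Four made-up scores standing in for the classifier's 1,000.
scores = [0.5, 2.0, -1.0, 0.1]

winner = top_category(scores)
confidence = softmax(scores)[winner]
print(f"The classifier's top guess: category #{winner} (probability {confidence:.2f})")
```

The category number is an index into the model's list of labels; looking up the matching name is a separate step that this episode leaves out.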

Why are the guesses random nonsense?

Because we're sending random pixels, not real photos! The point of this episode is the plumbing (two agents exchanging real data over the network), not the accuracy. Feed it real images and you'd get real predictions.

What just happened

sequenceDiagram
    participant G as Generator
    participant Net as P2P network
    participant C as MyClassifier
    G->>Net: get_in_touch("MyClassifier")
    Net-->>G: connected
    loop every tick, for 10s
        G->>C: an image (tensor 1×3×224×224)
        C->>C: ResNet looks at it
        C-->>G: 1000 scores (tensor 1×1000)
        G->>G: print the top category
    end

Two independent programs, possibly on two machines, found each other and exchanged data with no human and no central server.
run_hook=on_cycle ran your function every tick; that's how you add custom automation around an agent.
get_last_streamed_data("MyClassifier") read what the classifier sent back.
The stream types matched (MyClassifier wants (…,3,…,…) images and returns (…,1000) scores) so the connection just worked.
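The idea of shapes "matching" can be sketched in a few lines of plain Python. This is not UNaIVERSE's actual matching code, which this page does not show; shape_matches is a hypothetical helper that only illustrates how a None wildcard accepts any size on its dimension.

```python
def shape_matches(pattern, shape):
    """True if a concrete shape fits a pattern in which None means 'any size'."""
    if len(pattern) != len(shape):
        return False            # different number of dimensions
    return all(want is None or want == got for want, got in zip(pattern, shape))


# Shapes declared in classifier.py and generator.py.
classifier_input = (None, 3, None, None)
classifier_output = (None, 1000)
generator_output = (1, 3, 224, 224)

print(shape_matches(classifier_input, generator_output))  # True: RGB batch of one
print(shape_matches(classifier_input, (1, 1, 224, 224)))  # False: only one colour channel
print(shape_matches(classifier_output, (1, 1000)))        # True: 1000 scores for one image
```

Under this reading, Generator's (1, 3, 224, 224) images fit MyClassifier's declared input, and a single-channel image would not.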

Episode 3 recap

You connected two AI agents, exchanged typed data automatically, met a second kind of model (vision), and added your own logic with a run hook. This is the core of UNaIVERSE in miniature.