10 · Stats, badges & going live

Build worlds · Chapter 10 of 12

You can build a world, give it roles and behaviors, fill it with actions, and lead many agents at once. The last pieces are keeping score, running it for real, and knowing which example to steal from next. This chapter closes the path.

Badges: rewarding what happened

A world keeps score with badges. A badge is a permanent note, attached to an agent, that says this peer did this thing and here is how well. The world (the lead) awards one by calling add_badge on itself, typically from inside one of its own actions once an agent has finished something worth recording:

world.add_badge(
    peer_id="<agent_peer_id>",      # WHO earned it (already a resolved peer ID)
    score=0.97,                     # HOW well, a float in [0, 1]
    badge_type="completed",         # WHICH kind: completed · attended · intermediate · pro
    agent_token="<token>",          # the agent's token (the world is handed this)
    badge_description="Passed the image classification challenge")

Reading the call top to bottom: peer_id is the agent receiving the badge; score is a quality number the world validates to be in [0, 1] (anything outside raises, so a badge with a nonsense score never gets recorded); badge_type is one of the four recognised kinds and is likewise validated; and badge_description is free text for humans. The slightly surprising argument is agent_token: the world usually does not know an agent's token on its own, because agents rarely message the world directly, so whoever triggers the award must supply it. When this line runs, the badge is appended to that agent's list. Badges accumulate: calling add_badge twice for the same peer adds two badges, it does not overwrite. The world then flags a connection change so the new badge is pushed out in its next dynamic profile and becomes visible across the network. You then see the badge appear on the agent's profile.
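The validation and accumulation rules described above can be pictured with a small stand-in. This is a toy model under stated assumptions, not the framework's implementation: ToyWorld, its storage layout, and the choice of ValueError are all hypothetical; only the argument names and the four badge kinds come from the call shown earlier.

```python
from collections import defaultdict

# The four badge kinds named in the text.
BADGE_TYPES = {"completed", "attended", "intermediate", "pro"}


class ToyWorld:
    """Hypothetical stand-in for a world; NOT the unaiverse World class."""

    def __init__(self):
        self.badges = defaultdict(list)   # peer_id -> list of badge dicts

    def add_badge(self, peer_id, score, badge_type, agent_token,
                  badge_description=""):
        # Reject bad input before anything is recorded.
        # (ValueError is an assumption; the text only says it "raises".)
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score must be in [0, 1], got {score}")
        if badge_type not in BADGE_TYPES:
            raise ValueError(f"unknown badge_type: {badge_type!r}")
        # Append, never overwrite: a second call adds a second badge.
        self.badges[peer_id].append({
            "score": float(score),
            "badge_type": badge_type,
            "agent_token": agent_token,
            "badge_description": badge_description,
        })


world = ToyWorld()
world.add_badge("peer-A", 0.97, "completed", "tok", "Passed the challenge")
world.add_badge("peer-A", 0.80, "attended", "tok")

try:
    world.add_badge("peer-A", 1.5, "completed", "tok")   # out of range
    rejected = False
except ValueError:
    rejected = True
```

After this runs, peer-A holds two badges and the out-of-range award has left no trace, which is the behavior the paragraph above describes.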

  • Read them back with world.get_all_badges(), which returns a live dict mapping each peer_id to its list of badges. Each badge carries badge_type, score, badge_description, the agent's node id and token, and a last_edit_utc timestamp. To reset everything the world is holding, call world.clear_badges().
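As a hedged illustration of what you might do with that dict, the sketch below summarizes badges per peer. The sample data and the summarize helper are hypothetical; the code only assumes the peer_id to list-of-badges shape and the badge_type and score keys named above.

```python
from statistics import mean

# Hypothetical sample shaped like the dict described above
# (peer_id -> list of badges); the values are made up for illustration.
all_badges = {
    "peer-A": [
        {"badge_type": "completed", "score": 0.97, "badge_description": "Passed"},
        {"badge_type": "attended", "score": 0.50, "badge_description": "Showed up"},
    ],
    "peer-B": [
        {"badge_type": "completed", "score": 0.75, "badge_description": "Passed"},
    ],
}


def summarize(badges_by_peer):
    """Return peer_id -> (badge count, mean score), skipping peers with no badges."""
    return {
        peer_id: (len(badges), mean(b["score"] for b in badges))
        for peer_id, badges in badges_by_peer.items()
        if badges
    }


summary = summarize(all_badges)
```

In a real world you would pass the result of world.get_all_badges() instead of the sample dict.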

From the other side, an agent does not award itself a badge; it suggests one with the built-in suggest_badges_to_world(...) (catalogued, with every argument, in Built-in actions). The teaching worlds use exactly this when a student passes (Chapter 8): the student suggests a completed badge, and the world validates the score and badge type and records it with add_badge, the same path as above. The split matters: an agent proposes and the world decides, so no peer can simply grant itself a trophy. Badges are how a world that runs for six months remembers who did what.

Custom stats and a dashboard

Beyond badges, a world can track anything over time. You declare stat schemas in the world's src/stats.py and the framework stores each value as a time series, with a built-in sidebar dashboard to chart them live. The richer examples lean on this: social_learning tracks each student's exam error and the round-by-round best student; class_incremental_learning tracks per-class accuracy to visualize forgetting. You decide what is worth measuring; the world aggregates and plots it.

Step 1 · Declare what you measure

A world's stats live in a WStats class that subclasses the framework's Stats. The smallest worlds (chat, cat_library) ship an empty one; they simply inherit the core stats every world gets for free (how many agents, what state each is in, what action each last ran). You add your own by declaring schemas as class-level attributes. Here is the real declaration from social_learning:

src/stats.py
from unaiverse.stats import Stats


class WStats(Stats):

    # A per-AGENT dynamic stat: each student reports its own error on the full test set.
    CUSTOM_AGENT_STATS_DYNAMIC_SCHEMA = {'full_test_err': (float, -1.0)}

    # A per-pair ("outer") dynamic stat: the teacher's view of each student's exam error.
    CUSTOM_OUTER_STATS_DYNAMIC_SCHEMA = {'exam_err': (float, -1.0)}

    # WORLD-wide history (a time series the whole world owns).
    CUSTOM_WORLD_STATS_DYNAMIC_SCHEMA = {
        'best_exam_err_history': (float, -1.0),   # sent by the teacher
        'best_student_history': (str, None),      # set by the world
        'best_student_role_history': (str, None),
    }

    # WORLD-wide static facts (overwritten in place, not a series).
    CUSTOM_WORLD_STATS_STATIC_SCHEMA = {
        'overall_best_student': (str, None),
        'overall_best_exam_err': (float, -1.0),
    }

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)   # the base merges your schemas with the core ones

What these lines declare, before any value is ever stored:

  • Each entry is 'stat_name': (type, default). The type is enforced: a value is cast to it on the way in, so full_test_err is always a float. The default is what a reader sees before anything has been recorded.
  • The prefix decides the scope. WORLD stats belong to the world as a whole; AGENT stats are bucketed per agent; OUTER stats are per pair (a teacher's reading about a particular student). DYNAMIC means a time series (every write is kept with its timestamp, which is what the charts plot); STATIC means a single current value that is overwritten in place (good for "the best so far").
  • You never touch the CORE_* schemas. super().__init__ merges your CUSTOM_* ones on top of them, so your stats sit alongside the built-in counters.
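The three points above (typed entries with defaults, and custom schemas merged alongside core ones) can be sketched in plain Python. This is an illustration only: CORE_SCHEMA here is a made-up one-entry stand-in, and merge_schemas, cast_value, and read_or_default are hypothetical helpers, not framework functions.

```python
# Hypothetical core schema; the real CORE_* contents are not shown in the text.
CORE_SCHEMA = {"state": (str, None)}
# Same 'stat_name': (type, default) shape as the declaration above.
CUSTOM_SCHEMA = {"full_test_err": (float, -1.0)}


def merge_schemas(core, custom):
    """Custom entries sit alongside the core ones in a single lookup table."""
    merged = dict(core)
    merged.update(custom)
    return merged


def cast_value(schema, stat_name, value):
    """Cast an incoming value to the type declared for that stat."""
    declared_type, _default = schema[stat_name]
    return declared_type(value)


def read_or_default(schema, recorded, stat_name):
    """Return the recorded value, or the declared default if nothing was stored."""
    _declared_type, default = schema[stat_name]
    return recorded.get(stat_name, default)


schema = merge_schemas(CORE_SCHEMA, CUSTOM_SCHEMA)
recorded = {}
before = read_or_default(schema, recorded, "full_test_err")      # the default
recorded["full_test_err"] = cast_value(schema, "full_test_err", "0.25")
after = read_or_default(schema, recorded, "full_test_err")       # a real float
```

Note that the string "0.25" comes out as the float 0.25, which is the "cast on the way in" behavior the first bullet describes.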

The world uses this class by constructing a WStats instance and handing it to the base World, in social_learning's world.py:

from .stats import WStats

# Inside the world class's __init__:
stats = WStats(is_world=True, db_path=f"{world_folder}/stats/world_stats.db")
super().__init__(world_folder=world_folder, stats=stats, **kwargs)

is_world=True puts this instance in world mode: it runs the full persistence stack, writing every value to the SQLite file at db_path so the series survives a restart. Each agent has its own lightweight Stats that just buffers and ships its numbers to the world.
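To make "the series survives a restart" concrete, here is a minimal sketch using Python's standard sqlite3 module. The table layout, the column names, and the write_point and read_series helpers are assumptions for illustration; the framework's actual database schema is not shown in this chapter.

```python
import os
import sqlite3
import tempfile

# Assumed table layout, for illustration only.
DDL = """CREATE TABLE IF NOT EXISTS points (
    stat_name TEXT, group_key TEXT, timestamp_ms INTEGER, value REAL)"""


def write_point(db_path, stat_name, group_key, timestamp_ms, value):
    """Open the file, append one point, commit, and close."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(DDL)
            conn.execute(
                "INSERT INTO points VALUES (?, ?, ?, ?)",
                (stat_name, group_key, timestamp_ms, value),
            )
    finally:
        conn.close()


def read_series(db_path, stat_name, group_key):
    """Return [(timestamp_ms, value), ...] ordered by time."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT timestamp_ms, value FROM points "
            "WHERE stat_name = ? AND group_key = ? ORDER BY timestamp_ms",
            (stat_name, group_key),
        ).fetchall()
    finally:
        conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "world_stats.db")
write_point(db_path, "full_test_err", "peer-A", 1000, 0.40)
write_point(db_path, "full_test_err", "peer-A", 2000, 0.25)
# A brand-new connection stands in for "after a restart".
series = read_series(db_path, "full_test_err", "peer-A")
```

Because every write is committed to the file, a fresh connection reads back the same two points in time order.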

Step 2 · Record a value

An agent (or the world) writes one measurement with store_stat. This is the line a social_learning student runs after grading itself on the held-out MNIST test set:

_t = self.clock.get_time_ms()                  # the timestamp for this point
_, _peer_id = self.get_peer_ids()              # this student's own peer id
self.stats.store_stat("full_test_err", error_rate,
                      group_key=_peer_id, timestamp=_t)

store_stat(stat_name, value, group_key, timestamp) files one point on a series. The stat_name must match a schema key: an unknown name is logged and dropped, so a typo leaves a trace in the log, but the value itself is lost. group_key is the bucket, here the student's own peer_id, which is why every student gets its own line on the chart. timestamp is when it happened, in milliseconds, and becomes the x-coordinate of the point. Because full_test_err was declared DYNAMIC, every call adds a point; a STATIC stat would just replace the current value. The framework casts value to the declared type and stores it; there is nothing else for you to do.
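The dynamic versus static distinction, and the fate of a misspelled name, can be sketched with a toy store. ToyStats is a hypothetical stand-in, its in-memory layout and boolean return value are assumptions, and it is not how the framework's Stats is implemented.

```python
import logging

logger = logging.getLogger("toy_stats")

# name -> (kind, type, default); the kind column is this sketch's own shorthand.
SCHEMA = {
    "full_test_err": ("dynamic", float, -1.0),
    "overall_best_exam_err": ("static", float, -1.0),
}


class ToyStats:
    """Hypothetical in-memory store mimicking the semantics described above."""

    def __init__(self):
        self.series = {}    # (stat_name, group_key) -> [(timestamp, value), ...]
        self.current = {}   # (stat_name, group_key) -> latest value

    def store_stat(self, stat_name, value, group_key, timestamp):
        if stat_name not in SCHEMA:
            # Unknown names are logged and dropped, never stored.
            logger.warning("unknown stat %r dropped", stat_name)
            return False
        kind, declared_type, _default = SCHEMA[stat_name]
        value = declared_type(value)
        key = (stat_name, group_key)
        if kind == "dynamic":
            self.series.setdefault(key, []).append((timestamp, value))  # keep history
        else:
            self.current[key] = value                                   # overwrite
        return True


stats = ToyStats()
stats.store_stat("full_test_err", "0.40", group_key="peer-A", timestamp=1000)
stats.store_stat("full_test_err", 0.25, group_key="peer-A", timestamp=2000)
stats.store_stat("overall_best_exam_err", 0.30, group_key="world", timestamp=1000)
stats.store_stat("overall_best_exam_err", 0.20, group_key="world", timestamp=2000)
dropped = not stats.store_stat("full_test_er", 0.1, group_key="peer-A", timestamp=3000)
```

The dynamic stat ends up with two timestamped points, the static one with a single value, and the misspelled full_test_er is discarded.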

Step 3 · Draw the dashboard (optional)

If you stop after Step 2 the values are recorded and queryable. To get the live sidebar dashboard, override plot(...) on your WStats. It returns an HTML string the framework serves; you build it from a data view plus a handful of ready-made populate helpers, so you assemble panels rather than hand-write charts:

def plot(self, since_timestamp: int = 0) -> str | None:
    view = self.get_view(since_timestamp) if self.is_world else self._world_view
    if not view:
        return None

    dash = WorldSidebarDash("MNIST Monitor")   # a grid of panels (your layout class)

    p = UIPlot(title="Full Test Error")
    self._populate_time_series(p, view, "full_test_err")  # one line per student
    dash.add_panel(p, "center_bot")

    return render_plotly_html(dash.to_json())

Reading it in order: get_view(since_timestamp) returns a snapshot of every stored stat since a given moment; call it with 0 for everything. You hand that view to helpers the base Stats provides: _populate_time_series(panel, view, "name") turns one stat into a line chart (one line per group_key, so per student), _populate_distribution(...) makes a bar chart of how many peers are in each state or ran each action, and _populate_graph(...) draws the live network topology. Each goes into a UIPlot panel placed on a grid, and render_plotly_html serializes the whole thing to the HTML the sidebar shows. You see these charts update in place as agents keep calling store_stat. The class_incremental_learning world uses the same shape to draw its per-class forgetting table.
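The "one line per group_key" idea is easy to see without any plotting library. The sketch below is a plain-Python approximation: the flat list-of-dicts view, the toy_view and time_series_traces helpers, and the inclusive treatment of since_timestamp are all assumptions, not the framework's data structures.

```python
def toy_view(points, since_timestamp=0):
    """Keep only points at or after since_timestamp (0 keeps everything)."""
    return [p for p in points if p["timestamp"] >= since_timestamp]


def time_series_traces(view, stat_name):
    """Group one stat by group_key: one trace (line) per key, sorted by time."""
    traces = {}
    for p in sorted(view, key=lambda p: p["timestamp"]):
        if p["stat_name"] != stat_name:
            continue
        trace = traces.setdefault(p["group_key"], {"x": [], "y": []})
        trace["x"].append(p["timestamp"])
        trace["y"].append(p["value"])
    return traces


# Hypothetical recorded points, made up for illustration.
points = [
    {"stat_name": "full_test_err", "group_key": "peer-A", "timestamp": 1000, "value": 0.40},
    {"stat_name": "full_test_err", "group_key": "peer-B", "timestamp": 1500, "value": 0.55},
    {"stat_name": "full_test_err", "group_key": "peer-A", "timestamp": 2000, "value": 0.25},
    {"stat_name": "exam_err", "group_key": "peer-A", "timestamp": 2000, "value": 0.10},
]

everything = time_series_traces(toy_view(points, 0), "full_test_err")
recent = time_series_traces(toy_view(points, 1500), "full_test_err")
```

Each key of the resulting dict corresponds to one line on the chart, with x as timestamps and y as values.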

Stats are optional

A world runs fine with no custom stats at all: an empty WStats inherits the core counters, and that is plenty for most worlds. Reach for schemas and a plot(...) when you want a dashboard of how the session is going, not before.

Running a world: local, then live

You have used this already; here it is as the deployment story. The key idea is that the world's code never changes between the two. What differs is only how the nodes are wired together.

A world folder always ships the same runner scripts: one run_w.py that hosts the world, and one run_*.py per agent that joins it. On their own, each of those calls .run() and would open a real network node. The two ways below differ only in whether those nodes talk over a synthetic in-process clock or the real network.

Every example ships a run_synch.py helper. You do not edit it; you point it at a world folder and it does the wiring for you:

python worlds/run_synch.py social_learning

Under the hood it reads that world's run_w.py and every run_*.py, strips the .run() calls out of them, and instead registers all of those nodes with one NodeSynchronizer that drives them together on a single synthetic clock in one process, with no network and no ports. So the same scripts you would deploy are reused verbatim; only the final "go" is swapped for a synchronized run. This is how you build and debug: the whole society on your laptop, ticking in lockstep, every agent's logs interleaved in one terminal.
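What "ticking in lockstep on one synthetic clock" means can be shown with a toy loop. ToyNode and ToySynchronizer are hypothetical classes written for this sketch; they do not reproduce the real NodeSynchronizer API, only the idea that every node advances once per shared tick.

```python
class ToyNode:
    """Hypothetical node: records which ticks it was stepped on."""

    def __init__(self, name):
        self.name = name
        self.ticks_seen = []

    def step(self, tick):
        self.ticks_seen.append(tick)
        return f"[t={tick}] {self.name} stepped"


class ToySynchronizer:
    """Hypothetical driver: one shared integer clock, all nodes in one process."""

    def __init__(self):
        self.nodes = []
        self.tick = 0

    def add_node(self, node):
        self.nodes.append(node)

    def run(self, cycles):
        log = []
        for _ in range(cycles):
            # Every node sees the same tick before the clock advances.
            for node in self.nodes:
                log.append(node.step(self.tick))
            self.tick += 1
        return log


sync = ToySynchronizer()
world_node, student_node = ToyNode("world"), ToyNode("student_1")
sync.add_node(world_node)
sync.add_node(student_node)
log = sync.run(cycles=2)
```

The resulting log is interleaved deterministically, tick by tick, which is why a synchronized run is easy to read and to reproduce.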

Going online uses the same scripts, each now run on its own machine and each calling its own .run(). You start the host once:

python worlds/social_learning/run_w.py

and then, on any other computer, phone, or in the browser as a human, an agent joins by name:

node.run(join_world="<world name>")    # the call run_synch.py would have stripped

Each joiner declares what it is (its role_preference, Chapter 3), finds the host on the real network, and takes part exactly as it did locally. Nothing about the world's roles, behaviors, actions, or stats changes. Only the clock is now wall-clock, and the messages now cross the wire.

There is also an async local runner

Alongside run_synch.py, examples ship run_asynch.py, which launches each run_*.py as its own real subprocess (each with its own .run() and its own clock) and streams their logs together, terminating the rest if one fails. It is the middle ground: closer to live (real nodes, real timing) but still one command on one machine. Use run_synch.py to debug logic on a shared clock; reach for run_asynch.py when you want each agent running for real before going fully live.
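The launch-everything, stop-everything-on-failure pattern described above can be approximated with the standard subprocess module. This is a sketch of the general technique, not the contents of run_asynch.py; run_all and the two demo commands are hypothetical.

```python
import subprocess
import sys
import time


def run_all(commands, poll_interval=0.05):
    """Launch every command; if one exits non-zero, terminate the rest.

    Children inherit this process's stdout/stderr, so their logs interleave
    in one terminal. Returns (list of return codes, whether any failed).
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    failed = False
    try:
        while True:
            codes = [p.poll() for p in procs]       # None while still running
            if any(c not in (None, 0) for c in codes):
                failed = True
                break
            if all(c is not None for c in codes):
                break
            time.sleep(poll_interval)
    finally:
        for p in procs:
            if p.poll() is None:
                p.terminate()                       # stop survivors
        for p in procs:
            p.wait()                                # reap every child
    return [p.returncode for p in procs], failed


# Demo with stand-in commands instead of real run_*.py scripts.
commands = [
    [sys.executable, "-c", "import sys; sys.exit(3)"],       # fails quickly
    [sys.executable, "-c", "import time; time.sleep(30)"],   # would run long
]
codes, failed = run_all(commands)
```

In the demo the first child exits with code 3, so the long-running second child is terminated instead of being left to sleep for thirty seconds.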

Testing a world

Testing a world is mostly watching states advance. A practical order:

  1. Run it synchronized first with run_synch.py, and follow the printed transition messages. The set_welcome_message each role prints on entry and the msg= text you put on transitions (Chapter 4) are your trace: they tell you, in order, which role moved to which state and why. A world that works reads as a clean sequence of these lines.
  2. Read a stuck world as a stuck state. Because behaviors are explicit state machines, a misbehaving world almost always shows up as a state that never advances: an agent sitting in init because connect_to_devices keeps returning False, or a master parked waiting on a reply that never came. The message that should have printed next, and did not, points straight at the transition to inspect.
  3. Add one real joiner at a time. Once the synchronized run is clean, move to run_asynch.py or a live host and bring agents in one by one. Introducing a single new node per step keeps any new failure attributable to the thing you just added.
  4. Watch the dashboard if you built one. If you wrote a plot(...), the sidebar is a second, independent read on health: a series that flatlines, or a peer stuck in one state on the distribution chart, says the same thing the missing log line does.
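Step 2 above can be turned into a small check once you have the transitions in machine-readable form. The sketch below assumes a hypothetical trace of (timestamp_ms, agent, state) tuples that you collected yourself, for example by parsing the printed transition messages; find_stuck is not a framework function.

```python
def find_stuck(transitions, now_ms, max_idle_ms):
    """Return {agent: (state, idle_ms)} for agents whose last transition is too old."""
    last = {}
    for ts, agent, state in transitions:
        # Keep only the most recent transition per agent.
        if agent not in last or ts >= last[agent][0]:
            last[agent] = (ts, state)
    return {
        agent: (state, now_ms - ts)
        for agent, (ts, state) in last.items()
        if now_ms - ts > max_idle_ms
    }


# Hypothetical trace: the student never leaves init.
trace = [
    (0, "teacher", "init"),
    (0, "student_1", "init"),
    (500, "teacher", "teaching"),
    (4000, "teacher", "examining"),
]
stuck = find_stuck(trace, now_ms=5000, max_idle_ms=3000)
```

Here the teacher moved recently, so only student_1 is reported, parked in init for 5000 ms.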

Surviving a restart: checkpoints

A world that teaches for hours, or runs for months, should not lose everything the moment the process stops. Persistence is built in and opt-in, and it lives on the Node that hosts the world or the agent:

  • Node(..., save_checkpoint_every=<seconds>)
    Periodically save the hosted entity's state to disk: the processor weights, the optimizer state, and the agent's saved attributes. Leave it at the default (-1.) to disable.

  • node.run(resume_from_checkpoint=True)
    On startup, load the last saved state if one exists; otherwise log "Starting fresh" and begin from zero.

node = Node(world, node_name="MyWorld", save_checkpoint_every=300.)   # every 5 minutes
node.run(resume_from_checkpoint=True)

This matters most once agents learn: a student that trained for an hour comes back with what it learned, not from scratch. It is the same save() / load() an agent exposes, now on a timer. A run also stops cleanly on its own with max_time=<seconds> or cycles=<n> on run(...), and a first Ctrl+C asks for a graceful stop (a second one quits immediately).
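The save-on-a-timer and resume-if-present logic can be sketched independently of the framework. ToyCheckpointer, its JSON file format, and the explicit now argument are assumptions made so the example is deterministic; the real Node saves processor weights and optimizer state, which this toy does not attempt.

```python
import json
import os
import tempfile


class ToyCheckpointer:
    """Hypothetical periodic checkpointing; a non-positive interval disables it."""

    def __init__(self, path, save_every):
        self.path = path
        self.save_every = save_every   # seconds, e.g. 300.0; -1.0 disables
        self.last_save = 0.0           # time of the last save, relative to start

    def resume(self):
        """Load the last saved state if one exists, otherwise start from zero."""
        if os.path.exists(self.path):
            with open(self.path) as f:
                return json.load(f)
        print("Starting fresh")
        return {"steps": 0}

    def maybe_save(self, state, now):
        """Save only if the interval has elapsed; return whether a save happened."""
        if self.save_every <= 0 or now - self.last_save < self.save_every:
            return False
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, self.path)     # atomic swap: no half-written checkpoint
        self.last_save = now
        return True


path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
ckpt = ToyCheckpointer(path, save_every=300.0)
state = ckpt.resume()                  # no file yet, so this starts fresh
saved_at = []
for now in (100.0, 299.0, 300.0, 450.0, 600.0):
    state["steps"] += 1
    if ckpt.maybe_save(state, now):
        saved_at.append(now)

# A new checkpointer stands in for a restarted process.
resumed = ToyCheckpointer(path, save_every=300.0).resume()
```

Writing to a temporary file and swapping it in with os.replace is a design choice of this sketch: a crash in the middle of a save leaves the previous checkpoint intact.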

Every example world teaches one thing best. Clone unaiverse-examples and read the one closest to what you want to build.

  • chat
    Custom actions and a relay. The smallest complete world; start here. (Chapters 2, 7.)

  • info_extraction
    Roles by capability, and the ask/answer (requester/provider) pattern with interchangeable models. (Chapter 3.)

  • cat_library
    The minimal teaching loop: teach, exam, grade. (Chapter 8.)

  • animal_school
    A curriculum, catastrophic forgetting made visible, and promotion of successful students. (Chapter 8.)

  • signal_school
    Forward, on-device learning of generators, with a held-out test of genuine generalization. The research edge.

  • class_incremental_learning
    Leading many students, a cumulative exam, and a live forgetting table. (Chapter 9.)

  • social_learning
    Peer teaching over pub/sub, best-student selection, and round callbacks. (Chapter 9.)

  • turing
    The flagship: a multi-tier, anonymous, vote-scored game, built on the raw API with hand-written behaviors. The deep end.

Where next

Scoring and shipping is the operational side. The last two chapters put it all in your hands: build a world from an empty folder, then take the patterns and pitfalls forward.