9 · The world master at work¶
Build worlds · Chapter 9 of 12 · Path home
A teacher leading one student (Chapter 8) is the skeleton. A master leading many is where worlds become powerful, and where UNaIVERSE pulls clearly ahead of anything that just wires steps together. The master's toolkit is small: fan out to a group, run rounds, wait with callbacks, and change roles on the fly. We will see all of it in two worlds.
The two flows behind this chapter
Both moves here, one leader to many followers and publish once, many receive, are diagrammed and shown from both sides in the communication scenarios; the completion guards a master waits on are in the action catalogue.
Leading a group: class_incremental_learning¶
This world teaches many students the MNIST digits one class at a time, then
tests all classes seen so far after each lesson, so you watch forgetting happen
live. The master's teacher.py is one Agent subclass; its behavior (teacher.json)
loops through four states, welcoming, brainstorm, lesson, exam, and
the actions below are the body of each. Everything the master does to lead the group
fits in two moves: fan out a lesson, then examine the whole class.
Keeping the roster current¶
Before it can lead anyone, the master has to know who is in the room. It does not
ask once and assume; it refreshes the set on every clock tick from the built-in
on_tick:
@action
async def on_tick(self):
await super().on_tick()
connected = set(self.get_agents_by_role("student", handshake_completed=True))
joined = connected - set(self._students.keys()) # who is new this tick
gone = set(self._students.keys()) - connected # who dropped out
for s in joined:
self._students[s] = {"exams": [], "lessons": [], "evaluations": []}
for s in gone:
del self._students[s]
return True
Step by step: get_agents_by_role("student", handshake_completed=True) returns the
peers that currently hold the student role and have finished the handshake (so
the master can actually talk to them). The two set differences tell it who joined
and who left since last tick; it opens a fresh history dict for each newcomer and
forgets anyone who disconnected. Because this runs every tick, self._students is
always the live group, students can come and go mid-session and the next fan-out
simply addresses whoever is present. The welcoming action then just blocks until
len(self._students) >= 1 (or a timer expires), so the master never starts teaching
an empty room.
Fan out a lesson¶
When the brainstorm state decides it is time to teach (it sets _current_intent
to "teach" while there are still classes left), init_lesson sends one request
to the whole group at once, there is no per-student loop:
New to stdin / stdtar?
These are an agent's fixed input/target slots. If the names are unfamiliar, read An agent's own streams first, it explains the whole convention in one place.
@action
async def init_lesson(self):
if not self._students:
return False
self._current_lesson += 1
cls = self._current_lesson
_, prv_id = self.get_peer_ids()
ok = await self.send(
action_name="learn",
target=list(self._students.keys()), # the whole group
streams={"stdin": [f"{prv_id}:images@teach_{cls}"],
"stdtar": [f"{prv_id}:labels@teach_{cls}"],
"stdext": []},
num_steps=self.TEACH_PER_CLASS,
)
if not ok:
self._current_lesson -= 1 # roll back, retry next tick
return False
return True
What happens in order:
- Pick the next class.
_current_lessonadvances from-1to0, then1, and so on;clsis the digit this lesson teaches. State lives on the instance, so the master remembers where it is in the curriculum across ticks (the same "keep state onself" idiom from Chapter 5). - Address everyone.
target=list(self._students.keys())is the whole roster. A singlesendto a list of targets reaches them all, this is the "lead a whole group" move, not a loop. - Point each student at the master's own data. The
streamsmap wires the students' inputs to the master's streams:stdinto this teacher'simagesof groupteach_{cls}, andstdtar(the supervision target) to its matchinglabels. The master built one(images, labels)stream pair per class up front inaccept_new_role, so "teach class 3" is just namingteach_3.prv_idis the master's own in-world peer id, the streams belong to it. - Bound the lesson.
num_steps=TEACH_PER_CLASStells every student how many samples to train on before the request is considered finished. - Fail safe. If the
sendcould not go out, the master un-does the lesson counter and returnsFalse, so the behavior retries the very same class next tick instead of skipping it.
The send returns immediately; it does not wait for the students to finish. The
waiting is the next state's job. The behavior leaves init_lesson and parks on a
completion guard, the built-in all_sent_completed(action_name="learn"), which
returns True only once every student has reported its learn request done.
That is the fan-out-then-wait shape: one send, then one guard that gates the group.
Examine cumulatively¶
After a lesson the brainstorm state flips to the exam branch and send_exam runs.
It is the same fan-out, but now the master also has to collect answers back:
@action
async def send_exam(self):
if self._current_exam is not None:
return True
students = list(self._students.keys())
seen = self._classes_taught_so_far() # e.g. [0, 1, 2]
if not students or not seen:
return False
self._current_exam = {
"id": f"exam_{int(datetime.datetime.now().timestamp())}",
"seen_classes": seen,
"predictions_by_student": {s: [] for s in students},
"scores": {},
}
_, prv_id = self.get_peer_ids()
ok = await self.send(
action_name="process",
target=students,
streams={"stdin": [f"{prv_id}:images@eval"], "stdtar": [], "stdext": []},
num_steps=self.EVAL_PER_CLASS * len(seen), # all classes seen so far
timeout=self.EXAM_TIMEOUT_SEC, # per-student deadline
)
if not ok:
self._current_exam = None
return False
return True
Reading it as a round:
- Open an exam record.
_current_examis a scratchpad for this exam: the classes being tested, an empty prediction list per student, and a place for scores. Guarding on_current_exam is not Nonemakes the action idempotent, if the behavior re-enters the state, it does not start a second exam. - Test everything taught so far.
seenis[0..current_lesson]. The exam streams the combinedevalset (all classes seen, in order) to every student and asks them toprocessit, run their model and emit a prediction per sample.num_stepsis sized to cover every class seen, which is what makes forgetting visible: a student that learned class 0 but later overwrote it will now get class 0 wrong. - Give each student its own deadline.
timeout=EXAM_TIMEOUT_SECis per-student, so one slow or dead learner auto-completes after the deadline instead of stalling the whole exam. The group guard fires on partial results and the master scores whatever arrived.
Predictions do not come back all at once, they trickle in as each student works.
The master drains them with the built-in
received_some_asked_data(processing_fcn="collect_predictions"), which calls one
method of the master's once per sample received:
def collect_predictions(self, agent, props, data, data_tag):
if self._current_exam is None or agent not in self._current_exam["predictions_by_student"]:
return
prediction = int(data.item()) if isinstance(data, torch.Tensor) else int(data)
self._current_exam["predictions_by_student"][agent].append(prediction)
collect_predictions is a plain method, not an action, the guard invokes it as
fcn(agent, props, data, data_tag) for each incoming sample. agent is the student
that sent it, data is its predicted class index (the student's processor already
applied argmax, so it arrives as a bare class index, not raw logits). The master
just files each prediction under the right student. The behavior stays on this state,
looping the guard, until all_sent_completed(action_name="process") confirms the
fan-out is over; then score_exam runs _compute_accuracy per student and prints
the table, green where a class is remembered, red where it is forgotten. That table
is the whole point: a continual-learning result you can see.
Fan-out, then wait
The pattern to take away: one send to a list of targets, then a
completion guard (all_sent_completed to wait for the group, or
received_some_asked_data while results stream in and you process each one). The
master never loops over students and never holds a list of futures; it sends
once, parks on a guard, and the guard tells it when the whole group is done. The
guards themselves are catalogued in the
action catalogue.
Coordinating peers: social_learning¶
social_learning is the most advanced teaching world, and it shows the master's
two remaining tools: callbacks and pub/sub peer teaching. Its behavior
(teacher.json) runs a full round per lecture: teach the group, exam it, find the
best student, and then, instead of teaching the next round itself, hand the
teaching over to that best student. The states that matter here are begin_teaching
to exam_time to compare_time to best_found to best_teaching.
Picking who leads the round¶
The master does not nominate a best student by hand, it scores the class and lets
the result choose. After the exam (exam_time sends process with
wait_completion=True and parks until the group is done), the compare_time state
runs the built-in compare_eval(cmp="min", thres=0.5). That action looks at the
exam errors the master collected and keeps only the agents that pass the threshold,
leaving them in self._valid_cmp_agents. In this world the comparison is tuned to
yield a single winner, so _valid_cmp_agents ends up holding exactly one peer id:
the best student. If nobody clears the bar, compare_time falls through to
best_not_found and the round skips the social phase entirely. Selection is just
scoring plus a threshold, no manual choice.
Two requests, back to back¶
With a winner in hand, the best_found state runs social_round. This is the heart
of the chapter: the master sets up the best student to teach the others, peer to
peer, then steps back and waits. Walk it in order:
await self.find_agents("student")
all_students = copy.deepcopy(self._engaged_agents)
not_isolated_students = copy.deepcopy(self._found_agents)
teacher = self.get_peer_id()
best_student = next(iter(self._valid_cmp_agents)) # the single winner
other_not_isolated = not_isolated_students - {best_student}
# 1) Engage exactly the audience for this round
await self.set_engaged_partner(other_not_isolated)
# 2) Find the best student's pub/sub stream and subscribe the others to it
streams = self.find_streams(best_student, "best_student_stream")
best_stream_hash = next(iter(streams), None)
await self.send_subscribe(stream_hashes=[best_stream_hash])
# 3) Ask the OTHER students to learn from the best student's stream
learn_interaction = await self._send(
action_name="learn_from_student", target=list(other_not_isolated),
streams={"stdin": [f"{best_student}:images"], "stdtar": [f"{best_student}:labels"]},
num_steps=self.get_unlabeled_steps(), callback="on_asked_done")
self._pending_asks.update(learn_interaction.target)
relay_uuid = learn_interaction.uuid
# 4) Ask the BEST student to label the master's unlabeled data, publishing under that uuid
gen_interaction = await self._send(
action_name="teach", target=best_student,
action_kwargs={"relay_uuid": relay_uuid},
streams={"stdin": [f"{teacher}:images@unlabeled"]},
num_steps=self.get_unlabeled_steps(), callback="on_asked_done")
self._pending_asks.update(gen_interaction.target)
- Sort the cast for this round.
find_agents("student")refreshes the discovered set._engaged_agentsis everyone in the room;_found_agentsis the non-isolated students (isolated ones never join peer teaching). The master subtracts the best student to get the audience,other_not_isolated, the students who will learn this round. - Engage just that audience.
set_engaged_partner(other_not_isolated)narrows the master's active partners to exactly the learners for this round, so the two sends that follow act on the right set and the gossip mesh links the right peers. - Wire the pub/sub link. The best student owns a
best_student_streamit only fills when nominated.find_streams(best_student, "best_student_stream")returns a{net-hash: stream}dict; that net-hash is the stream's unambiguous global address.send_subscribe(stream_hashes=[best_stream_hash])then tells the engaged audience to subscribe to it, so the best student's output reaches them over a standing channel, no per-sample request, no direct link (Chapter 7). - Ask the audience to learn (request A).
learn_from_studentpoints each listener'sstdin/stdtarat the best student's images and labels and runs forget_unlabeled_steps()steps. It is a dedicated action name (a thin wrapper over the built-inlearn) so that the next round's ordinarylearncannot accidentally re-trigger this one. The send returns anInteraction; the master stashes itstargets in_pending_asksand grabs itsuuidasrelay_uuid. - Ask the best student to teach (request B).
teachstreams the master's ownunlabeledimages into the best student, which runs its model and re-publishes each image plus its predicted label intobest_student_stream, underrelay_uuid, the same id the audience is reading from. That shared uuid is what lines the two requests up: the labels the best student publishes land exactly where the audience'slearn_from_studentis listening.
Two _send calls, two Interactions, two sets of targets dropped into
_pending_asks. The master has now set the whole round in motion without doing any
teaching itself.
Waiting on callbacks¶
The master must not start the next lecture until both requests above have
finished. It does that with callbacks, not a wait loop. Each _send named
callback="on_asked_done"; the interaction manager fires that callback once per
interaction, when all of that interaction's targets have reported (whether they
finished cleanly, timed out, or dropped):
@action
async def on_asked_done(self, interaction=None) -> bool:
if interaction is not None:
self._pending_asks.difference_update(interaction.target) # cross these off
return True
@action
async def all_asks_done(self) -> bool:
return len(self._pending_asks) == 0 # the guard
on_asked_done removes that interaction's targets from _pending_asks; the
best_teaching state loops on all_asks_done(), which is True only once
_pending_asks is empty, that is, once both the audience's learning and the best
student's teaching have completed. Only then does the behavior move on to the next
lecture. The two callbacks plus one guard replace any manual joining of the two
asynchronous requests.
No agent changed role here, the best student is simply asked to run a different action this round. Learning spreads sideways, between peers, with the master only selecting the winner, wiring the channel, issuing two requests, and waiting on a guard.
graph TD
M[Master] -->|compare_eval picks best| Best[Best student]
M -->|send_subscribe others to best| Sub[(pub/sub link)]
M -->|teach + relay_uuid| Best
M -->|learn_from_student| O[Other students]
Best -->|labels via pub/sub| O
O -->|callback: on_asked_done| M
Best -->|callback: on_asked_done| M
Changing roles mid-session¶
In social_learning the best student takes over the teaching without changing
role, the master just asks it to run a different action (teach) for one round.
That is the lighter-weight option, and often the right one: if the change is only "do
this other thing now", a different send is enough.
When a master does want to change who is what, it uses
set_role (or applies an agent's
suggest_role_to_world). Promotion in animal_school
(Chapter 8) is the simplest case; in larger worlds a master
can reshape the whole cast between rounds. The new role's behavior is shipped
automatically, the agent just starts running it. The distinction is worth keeping:
ask for a one-off action when the change is momentary, change the role when the
agent should permanently behave differently.
Rewarding the round's winner¶
Leading a session is not only sending work out; the master also records and rewards
what happened. After the social round, social_learning's manage_best_of_class
turns the best student's exam result into a stat and a badge:
@action
async def manage_best_of_class(self):
if len(self._valid_cmp_agents) > 0:
best_student_peer_id = next(iter(self._valid_cmp_agents)) # same winner
best_student_result = self._eval_results[best_student_peer_id]
if best_student_result >= 0:
self.stats.store_stat("best_exam_err_history", best_student_result,
group_key=best_student_peer_id, timestamp=...)
return await super().suggest_badges_to_world(
agent=best_student_peer_id, score=best_student_result,
badge_type="intermediate",
badge_description="Best student of a class, MNIST ...")
return True
It reuses the same _valid_cmp_agents winner the social round used, reads that
student's score from _eval_results (filled by the master's evaluate action),
stores a per-student stat keyed by peer id, and then suggests a badge to the
world. The master proposes; the world awards. Scoring, stat-keeping, and badges are
how a session leaves a durable record, covered in full in
Chapter 10.
When one master is not enough: tiers¶
Everything above assumes a single master leading every peer. That works up to a point, but a world with hundreds of participants, or several independent activities at once, needs more than one leader. Nothing limits a world to one master, so you give it a small hierarchy of leading roles.
The flagship turing world does exactly this. Its assign_role reads a list of
manager names from a file and hands out three tiers:
# turing/src/world.py (shape)
if unaid in self.hotel_managers: return "hotel_manager" # top: load-balances guests onto floors
if unaid in self.floor_managers: return "floor_manager" # mid: runs one floor's activity
return "guest" # everyone else
The pattern is general:
- Several leading roles, not one. Each is just a role
with its own behavior; a world can define
hotel_manager,floor_manager, and more. - Leaders come from a list. turing keeps the manager names in a
managers.txtit can reload while the world runs, so you add or remove leaders without restarting. - Each tier owns a slice. The top leader load-balances newcomers onto mid leaders; each mid leader then runs the single-master loop from this chapter, scoped to its own slice.
Reach for tiers when one master would be a bottleneck: many simultaneous participants, or independent sub-activities that should run in parallel.
The master's toolkit, in one place¶
| Move | How |
|---|---|
| Know who is in the room | get_agents_by_role / find_agents, refreshed on on_tick |
| Engage the right peers | send_engage / set_engaged_partner |
| Lead a whole group | one send with target=[many] |
| Wait for the group | all_sent_completed, received_some_asked_data |
| React on completion | callback= + a guard like all_asks_done() |
| Pick who leads next | evaluate then compare_eval (fills _valid_cmp_agents) |
| Teach peer to peer | subscribe / send_subscribe + a shared relay_uuid |
| Reshape the cast | set_role / suggest_role_to_world |
| Score and reward | evaluate, compare_eval, store_stat, suggest_badges_to_world |
That is the whole of leading a session. There is no pipeline and no central script: just a master sending, waiting, and adjusting, while a society of agents acts and learns. For the concept-level reference behind this chapter, see Learning and teaching and Roles and the world master.
Where next¶
-
Score it, run it locally and online, and the world gallery.
-
Revisit
callback,wait_completion, and multi-step sends. -
The completion guards a master waits on.