4 · Behaviors as state machines¶
Build worlds · Chapter 4 of 12 · Path home
A role is a name; a behavior is what that name actually does, step by step.
In UNaIVERSE a behavior is a hybrid state machine (HSM):
a small graph of states connected by transitions, where each transition
runs an action. You build one per role inside
create_behav_files, and the world ships it to whoever takes that role.
Why a state machine
Because behavior over time is naturally "I am waiting, then I connect, then I am ready, then I respond, then back to ready." States and transitions capture exactly that, and the framework runs the loop for you so you never write an event loop by hand.
States¶
A state is a named condition the agent can be in. You add one with
add_state:
behav.add_state("init", blocking=True)
behav.add_state("ready", action="check_messages",
args={"max_silence_seconds": 25.0})
Read this as two declarations:
- The first line registers a state called
initand marks it blocking. The first state you add also becomes the machine's initial state automatically, soinitis where every agent that takes this role starts. - The second registers a state called
readythat carries an in-state action: while the agent sits inready, the framework keeps callingcheck_messages(with theargsyou gave) on every clock tick. The state action is the "what I do while I wait here", distinct from the transition actions that move the agent out of the state.
Three things are happening, and each matters at runtime:
- The name is yours (
init,ready,waiting). It is only a label; nothing about the name is magic. - The optional in-state action (
action=...,args=...) runs while the agent sits in that state. Herecheck_messageskeeps running while the agent isready, polling the room. Theargsdict is passed to it as keyword arguments, the same channel a transition uses (see Chapter 5). blockingcontrols pacing, and it is the one flag people misread. A blocking state (blocking=True, the default) makes theactloop stop once the agent lands there: the machine runs its inner action and then yields, waiting for the next tick or event before it tries to move on. A non-blocking state (blocking=False) does the opposite, the machine flows straight through it in the same cycle, immediately attempting the next transition. You mix the two to get the rhythm you want: blocking where the agent should pause and react to the outside world, non-blocking for bookkeeping states it should pass through without pausing.
What blocking is not
blocking=True does not mean "freeze the agent" or "run synchronously". The
agent's clock keeps ticking; a blocking state simply means the per-cycle
transition loop ends there instead of cascading onward. A long synchronous call
inside an action is what actually freezes an agent, see the "Do not block the
tick" warning in Chapter 5.
Exactly one state is active at a time.
Transitions¶
A transition is a rule: from one state, run an action, and to another
state if that action succeeds. You add one with add_transit:
behav.add_transit("init", "waiting_handshake",
action="connect_to_broadcaster", args={"role": "broadcaster"},
msg="🔗 Connecting to the room...")
Read the call argument by argument:
"init"is the from-state and"waiting_handshake"is the to-state. If either state has not been declared yet,add_transitcreates it for you, so you can sketch a behavior purely in transitions and let the states appear implicitly.action="connect_to_broadcaster"names the method that guards the edge. On each tick while the agent is ininit, the framework calls it.args={"role": "broadcaster"}is passed to that method as keyword arguments.- The action's return value decides the edge: if
connect_to_broadcasterreturnsTrue, the transition fires and the agent moves towaiting_handshake; if it returnsFalse, nothing moves and the action is simply retried next tick (this is the success/retry contract from Chapter 5). msg="🔗 Connecting to the room..."is an optional human-readable label printed when the transition fires, purely for following along in the logs.
The takeaway: a transition is "from here, keep calling this action until it
succeeds, then go there." You never write the polling loop; returning False from
the action is the wait.
Two flags shape when a transition is eligible, and both are worth getting exactly right because they decide who drives the edge:
ready(defaultTrue) sets the action's inner-readiness. Aready=Trueaction is inner-ready: the policy may dispatch it autonomously, on the agent's own initiative, every tick. A transition withready=Falseis outer-only: it is never fired proactively, it sits dormant until an external interaction request arrives for it, and only then does it run, serving that request. This is how an agent exposes a capability for others to invoke rather than something it does on its own. Thebroadcaster's relay loop is exactly this: a singleready=Falseself-loop that only acts when auserpushes a message to be broadcast.avoid_changing_ready: keep the readiness state as-is across this transition. By default, an action whose signature declares aninteraction/requesterparameter is automatically forced to outer-only (because such actions are meant to answer requests). Passingavoid_changing_ready=Truesuppresses that override, you use it when an action stages work for the next step and you want it to stay proactively dispatchable rather than wait for a request.
A state can have several outgoing transitions. Which one fires is decided by the policy.
The policy¶
When more than one transition out of the current state could fire, a policy picks exactly one. The default policy walks the feasible actions in a fixed order and returns the first match:
- High-priority actions first. Any transition flagged
high_priority=Truewins over everything else (rarely needed; off by default). - Then actions with a pending request. If an outer-only (
ready=False) transition has an external interaction waiting, it is served next, oldest request first. This is why an agent answers what it was asked before doing its own thing. - Then the first inner-ready transition. Among the remaining
ready=Truetransitions, the first one (in the order you added them) that is currently ready fires.
So ordering matters: when a state has several proactive transitions, the one you
add_transit first is tried first. You rarely need to change any of this; when you
do, you pass your own policy to the HSM or set a policy_filter on the agent. For
everything in this path, the default is what you want.
Pacing actions with a policy_filter
A policy_filter is a callable on the Agent (the
policy_filter constructor argument) that can veto or delay the action the
policy just chose. The most common one ships with UNaIVERSE,
PolicyFilterDelayAction, which enforces a minimum wait before a named action
fires. The flagship turing world uses it so its chat bots feel human and never
reply faster than about a second:
from unaiverse.utils.misc import PolicyFilterDelayAction
agent = Agent(proc=my_model, proc_inputs=["text"], proc_outputs=["text"],
policy_filter=PolicyFilterDelayAction({"process"}, wait=1.,
add_random_up_to=1.))
{"process"} is the set of action names to pace; wait is the minimum seconds
before each fires again; add_random_up_to adds up to that many seconds of random
jitter. Unlike a transition's per-edge delay (which belongs to one edge), a
policy_filter is cross-cutting: it applies to an action by name wherever
it appears.
Teleports: rules that apply from anywhere¶
Sometimes you need "no matter what state I'm in, if X happens, go to Y", a timeout, a disconnect, a reset. That is a teleport: a transition that can fire from any state.
# If the partner vanished, no matter where we are, go home and wait.
behav.add_global_teleport("init", action="disconnected", args={"delay": 5.0})
Notice the argument order: the first argument is the destination
(add_global_teleport(to_state, action, ...)), because there is no single source,
the call wires the same transition from every state currently registered, all at
once. So this one line means: "from anywhere, if disconnected returns True
(waiting at least delay=5.0 seconds first), go to init." Under the hood a
teleport is an ordinary transition flagged teleport=True; it is only hidden in
the diagram to keep the visual graph readable, but it is selected by the policy and
runs exactly like any other edge. If you want a teleport out of one specific state
instead of all of them, pass teleport=True to a normal add_transit.
This is how the teaching worlds recover when a student drops mid-lesson: a single
global teleport sends agents back to init on disconnected from anywhere, so you
do not have to wire a "what if they leave" edge onto every state by hand.
The chat world spells it out per state instead
A global teleport is the concise way to express "from anywhere, on X, go to Y".
The chat world you will read below makes the same recovery explicit with
two ordinary transitions, waiting_handshake to init and ready to init, both on
disconnected, rather than one teleport. Both styles are correct; the teleport
just collapses the repetition.
Timing controls¶
Transitions accept three timing arguments so you can pace and bound them. Each is in
seconds, each defaults to 0. (meaning "no constraint"), and each does something
different:
delay: a minimum dwell time. The transition is not even considered by the policy until at least this many seconds have elapsed since the agent entered the source state. Use it to hold an agent in a state for a beat, for example, waiting5.0seconds before treating a partner as trulydisconnected.timeout: a patience limit on retrying. If the action keeps returningFalse, the machine keeps retrying it tick after tick, untiltimeoutseconds have accumulated, at which point it gives up on this transition. Use it so a stuck handshake does not wait forever.total_time: a hard wall-clock budget for the whole transition once it starts, useful for bounding a multi-step action.
Read this as: in waiting, wait at least 1 second before even trying
connected; then retry it until it returns True, but if 30 seconds pass
without success, abandon the edge. So delay gates the start, timeout bounds the
retrying, and total_time caps the total run. These three names are consumed by
the transition itself, they are stripped before your action sees them, so do not add
parameters of the same name to your method.
Wildcards: one behavior, many agents¶
A behavior file is a template. Placeholders written in angle brackets, like
<agent>, <world>, or <stream_name>, are filled in at runtime so the same
file adapts to each agent and each session:
# In the teaching worlds, a teacher's playlist references the student's streams:
behav.add_transit("engagement_complete", "begin_teaching",
action="set_pref_streams", args={"net_hashes": ["<agent>:recorded1"]})
When you write this transition you do not yet know which student the teacher
will pair with, so you cannot hard-code a peer. Instead you leave the placeholder
<agent> inside the argument. The transition is saved with the placeholder intact;
at runtime, once the teacher has actually engaged a student, the machine substitutes
the real partner for <agent>, so args becomes the concrete "<real-peer>:recorded1"
before the action runs. The behavior file ships generic and specialises itself per
session.
You fill placeholders with the HSM's set_wildcards({...}) / update_wildcard(...),
or, on an agent, add_behav_wildcard("<stream_name>", "animal_stream"). This is what
lets one teacher.json teach any student, and one user.json connect to
whichever broadcaster happens to be present.
Reusing templates¶
You do not have to build every behavior from scratch. Common patterns ship as reusable behavior JSON you load and fill with wildcards:
engage_by_role: find and engage agents of a given role.service_requester/service_provider: the ask/answer pair theinfo_extractionworld uses.listening_to_teacher: the student side of the teaching worlds.
You snap these together and override the parts that differ, which is why a
teaching world's create_behav_files is short despite a rich behavior.
Saving, and the world ships it¶
Each behavior ends with a save, writing the <role>.json the world hands out to
whoever takes that role:
save serialises the whole machine, states, transitions, wildcards, the welcome
message, into one JSON file. The only_if_changed=dummy_agent argument is a small
convenience: it rewrites the file only if the behavior actually differs from what
is already on disk (checked against the dummy agent the machine was built with), so
re-running create_behav_files does not churn unchanged files.
Read one for real: the chat user¶
Putting it together, here is the user role from the chat world, the whole
loop a chat member runs, taken verbatim from create_behav_files:
dummy_agent = UserAgent(proc=None)
behav = HybridStateMachine(dummy_agent)
behav.set_welcome_message(welcome_msg)
behav.set_role("user")
behav.add_state("init", blocking=True)
behav.add_state("waiting_handshake", blocking=False)
behav.add_state("message_sent", blocking=False)
behav.add_state("ready", action="check_messages",
args={"max_silence_seconds": 25.0, "talk_probability": 0.01, "history_len": 3},
msg="👍 Ready!")
behav.add_transit("init", "waiting_handshake",
action="connect_to_broadcaster", args={"role": "broadcaster"},
msg="🔗 Connecting to the room...")
behav.add_transit("waiting_handshake", "ready",
action="connected", args={"handshake_completed": True})
behav.add_transit("waiting_handshake", "init", action="disconnected", args={"delay": 5.0})
behav.add_transit("ready", "init", action="disconnected", args={"delay": 5.0})
behav.add_transit("ready", "message_sent",
action="generate_and_send", args={"samples": 1},
ready=True, avoid_changing_ready=True)
behav.add_transit("message_sent", "ready", action="nop")
behav.save(os.path.join(self.world_folder, "user.json"), only_if_changed=dummy_agent)
The first three lines set up the machine: a HybridStateMachine is built around a
dummy UserAgent (created with proc=None, no model, no network) whose only job
is to let the machine verify at build time that every action name below really
exists on the user class. set_welcome_message is the greeting logged the first
time the agent reaches its initial state; set_role("user") stamps the file. Now walk
the four states in the order a chat member lives them.
init (blocking). The starting state, the first one added, so the machine begins
here. It is blocking, so the agent pauses here and reacts rather than racing onward.
Its single outgoing transition runs connect_to_broadcaster(role="broadcaster"),
which locates the local processor stream and dials the room's broadcaster. While that
is still in progress the action returns False and is retried; when the connection is
made it returns True and the agent advances to waiting_handshake, logging
"🔗 Connecting to the room...".
waiting_handshake (non-blocking). A short staging state with two outgoing
edges, and this is where ordering and the policy meet:
connected(handshake_completed=True)toready. This is the happy path: it returnsTrueonce the connection has fully settled, sending the agent toready.disconnected(delay=5.0)toinit. The recovery path:delay=5.0means it is not even considered for the first 5 seconds, and if the partner has actually dropped it sends the agent back toinitto start over.
Both are proactive (ready=True), so the policy takes the first ready one,
connected is added first and wins whenever the handshake is succeeding; only if
it keeps failing does the disconnected edge get its chance. Because the state is
non-blocking, the agent does not linger here once an edge fires.
ready (in-state action check_messages). The heart of the loop, and the only
state with a long-running in-state action: while the agent sits here, the
framework calls check_messages every tick with max_silence_seconds=25.0,
talk_probability=0.01, history_len=3. That action polls the broadcaster's stream
and the agent's own messages, keeps a short rolling history, and decides whether
there is anything worth saying, if so it stages a prompt in stdin for the next
step. The msg="👍 Ready!" is printed once on entry. ready has three outgoing
edges:
generate_and_send(samples=1)tomessage_sent. The proactive reply path. Note its two flags:ready=Truekeeps it inner-ready (the agent fires it on its own), andavoid_changing_ready=Trueprevents the automatic "force outer-only" override, so the agent keeps the initiative to speak rather than waiting to be asked. Whencheck_messageshas staged something, this action runs the model on it and forwards the result to the broadcaster.disconnected(delay=5.0)toinit. The same recovery edge as before: if the room vanishes, go home and reconnect.- (The
check_messagesstate action keeps running underneath all of this, it is not a transition, it is the work the agent does while inready.)
message_sent (non-blocking) to ready via nop. After a message goes out the
agent passes through message_sent and immediately runs nop, a built-in no-op that
simply returns True, to loop straight back to ready. Because message_sent is
non-blocking, this is a clean "reset the cycle" hop with no pause, leaving the agent
back in ready, watching the room again.
So the life of a chat member is: init dials the broadcaster, waiting_handshake
confirms the connection (or bails to init), ready watches the room and, when
moved to speak, runs generate_and_send, then message_sent to ready closes
the loop, with a disconnected escape hatch on the two live states. In
Chapter 7 we follow the actual data through this exact loop end
to end.
Where next¶
-
The transitions above call actions; now write your own.
-
The reference for states, transitions, wildcards, and saving/loading.
-
See this exact
userloop carry real messages.