
Tooling

Four verbs, each a distinct job, each writing a self-describing run-dir: manifest.toml (git SHA + seeds + versions) · CSVs · figures/*.png in the house palette · a README.md callout with the headline result.

| Verb | Job | Run-dir |
| --- | --- | --- |
| benchmark | a roster of nodes across a task grid: rank and compare, with baseline-relative statistics | bench/runs/<UTCstamp>_<shortgit>_<id>/ |
| profile | one node, in depth: the full analytic suite + a behaviour GIF per task | profile/runs/<node>/<UTCstamp>_<shortgit>_<id>/ |
| sweep | perturb parameter axes, measure signatures per cell | sweeps/<id>/ |
| ablate | a config-free preset sweep: baseline vs each registered ablation | sweeps/ablate_<node>_<task>/ |

benchmark and profile timestamp each run (<UTCstamp>_<shortgit>_<id>, so repeats never collide). sweep and ablate instead key the run-dir on the sweep id — a re-run with the same id resumes/overwrites in place (completed cells are skipped), which is why sweeps/ is not timestamped.
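
The naming rule can be sketched in a few lines of Python. This is an illustration only: `run_dir` and its parameters are hypothetical names, not part of the project, and the stamp and short SHA are passed in as plain strings because their exact format is not specified here.

```python
from pathlib import PurePosixPath

# Roots for the two timestamped verbs, taken from the run-dir patterns above.
TIMESTAMPED_ROOTS = {"benchmark": "bench/runs", "profile": "profile/runs"}


def run_dir(verb, run_id=None, stamp=None, shortgit=None, node=None, task=None):
    """Return the run-dir for a verb (hypothetical helper, not the project's API)."""
    if verb in TIMESTAMPED_ROOTS:
        if None in (run_id, stamp, shortgit):
            raise ValueError(f"{verb} needs run_id, stamp and shortgit")
        leaf = f"{stamp}_{shortgit}_{run_id}"
        root = PurePosixPath(TIMESTAMPED_ROOTS[verb])
        if verb == "profile":
            # profile nests its runs under the node name; benchmark does not.
            if node is None:
                raise ValueError("profile needs a node")
            return root / node / leaf
        return root / leaf
    if verb == "sweep":
        # Keyed on the sweep id only, so a re-run lands in the same directory.
        if run_id is None:
            raise ValueError("sweep needs run_id")
        return PurePosixPath("sweeps") / run_id
    if verb == "ablate":
        if node is None or task is None:
            raise ValueError("ablate needs node and task")
        return PurePosixPath("sweeps") / f"ablate_{node}_{task}"
    raise ValueError(f"unknown verb: {verb}")
```

Two benchmark runs with different stamps get different directories, while `run_dir("sweep", run_id="falandays_wall_perturb")` always yields `sweeps/falandays_wall_perturb`, which is what lets a re-run resume in place.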

The commands

Each command below is the real entrypoint. benchmark and profile each live in their own directory with a local Julia project; sweep and ablate share one script, sweep/run.jl (ablate is a subcommand, not a separate directory).

# benchmark — from bench/, roster across a task grid
cd bench && julia --project=. run.jl --neurons falandays_base,compartmental_structured --tasks wall,pong
# profile — from profile/, one node in depth
cd profile && julia --project=. run.jl falandays_base
# sweep — from the repo root, a TOML config
julia --project=. sweep/run.jl configs/sweep_falandays_wall.toml
# ablate — same runner, a subcommand: NODE TASK, no config needed
julia --project=. sweep/run.jl ablate falandays_base wall

Discover a sweep’s tunable axes before writing a config:

julia --project=. sweep/run.jl --list-axes --node falandays_base --task wall

Authoring a benchmark run

benchmark reads bench/configs/core.toml by default (or the file given with --config <path>); flags such as --neurons, --tasks, and --no-gifs override the config. The config sets the roster (neurons = [] means all registered variants), the task grid, n_trials, the baseline every table is scored against, and a [prep] block that encodes the fairness rule:

neurons = [] # empty = all registered variants
tasks = ["wall", "tracking", "pong", "cartpole", "cartpole_swingup"]
n_trials = 20
baseline = "falandays_base"
[prep]
# falandays* default to "untrained" (seeded wiring + online plasticity);
# compartmental* default to "trained" (untrained non-plastic weights aren't a
# meaningful benchmark). A cell needing a trained genome that has none falls
# back to untrained and is flagged "trained-required-but-untrained".
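
The fairness rule in the [prep] comments can be read as a small decision function. The Python sketch below is an interpretation of those comments, not the code in bench/run.jl: `resolve_prep` and its parameters are hypothetical, and raising an error for a neuron family with no stated default is an assumption of this sketch.

```python
def resolve_prep(neuron, has_trained_genome, requested=None):
    """Return (prep_mode, flag) for one benchmark cell (hypothetical helper)."""
    if requested is None:
        if neuron.startswith("falandays"):
            # Seeded wiring + online plasticity: benchmarked untrained by default.
            requested = "untrained"
        elif neuron.startswith("compartmental"):
            # Untrained non-plastic weights are not a meaningful benchmark.
            requested = "trained"
        else:
            # Assumption: families without a documented default are rejected.
            raise ValueError(f"no default prep for neuron family: {neuron}")
    if requested == "trained" and not has_trained_genome:
        # Fall back to untrained, but flag the cell so the table shows it.
        return "untrained", "trained-required-but-untrained"
    return requested, None
```

For example, `resolve_prep("compartmental_structured", has_trained_genome=False)` returns `("untrained", "trained-required-but-untrained")`, so the cell still runs but is visibly marked.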

Source: bench/README.md; bench/configs/core.toml; bench/run.jl.

Sweep — “what makes it work, how it breaks”

A sweep perturbs the parameters that shape a run and records the analytic signatures per cell, so you can see how performance and criticality change as each knob moves.

[sweep]
id = "falandays_wall_perturb" # names the run-dir: sweeps/falandays_wall_perturb/
mode = "one_at_a_time" # default: each axis varied alone (Σ of axis lengths)
# "factorial" = full cartesian product
seeds = [0, 1, 2, 3]
max_cells = 200 # cost guard; the preview shows cells × seeds rollouts
[baseline] # the canonical setup every axis perturbs around
node = "falandays_base"
task = "wall"
N = 100
ticks = 2000
[axes] # namespaced parameter path -> values
"node.threshold_mult" = [1.5, 1.75, 2.0, 2.25, 2.5]
"node.lrate_targ" = [0.0, 0.005, 0.01, 0.02]
"env.lam" = [0.5, 1.0, 2.0]
"ablation" = ["none", "freeze_plasticity", "zero_recurrent"]
[analytics]
measures = ["sigma_mr", "spectral_radius", "liveness"]
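
To see what the two modes cost for this config, the cell count can be worked out by hand. The Python sketch below follows the comments in the config (one_at_a_time is the sum of axis lengths, factorial is the cartesian product). It assumes that max_cells is compared against the cell count rather than the rollout count, and `count_cells` is a hypothetical name.

```python
from math import prod

# Axes, seeds and guard copied from the example config above.
axes = {
    "node.threshold_mult": [1.5, 1.75, 2.0, 2.25, 2.5],
    "node.lrate_targ": [0.0, 0.005, 0.01, 0.02],
    "env.lam": [0.5, 1.0, 2.0],
    "ablation": ["none", "freeze_plasticity", "zero_recurrent"],
}
seeds = [0, 1, 2, 3]
max_cells = 200


def count_cells(axes, mode):
    """Number of cells a sweep would run in the given mode."""
    lengths = [len(values) for values in axes.values()]
    if mode == "one_at_a_time":
        return sum(lengths)   # each axis varied alone
    if mode == "factorial":
        return prod(lengths)  # full cartesian product
    raise ValueError(f"unknown mode: {mode}")


for mode in ("one_at_a_time", "factorial"):
    cells = count_cells(axes, mode)
    rollouts = cells * len(seeds)
    print(f"{mode}: {cells} cells, {rollouts} rollouts, within guard: {cells <= max_cells}")
# one_at_a_time: 15 cells, 60 rollouts, within guard: True
# factorial: 180 cells, 720 rollouts, within guard: True
```

With these four axes, one_at_a_time gives 5 + 4 + 3 + 3 = 15 cells and factorial gives 5 × 4 × 3 × 3 = 180, so both fit under max_cells = 200, but factorial costs twelve times as many rollouts.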

Axis namespaces: node.*, env.*, drive.*, task.*, ablation, seed — routed into the real simulate kwargs, validated up front (a wrong or inapplicable axis is a clear “did you mean…” error, not a silent no-op). A failing cell records an error and the sweep continues.
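
Up-front validation with a "did you mean…" hint can be sketched as follows. This is a Python illustration using the standard library's difflib, not the project's implementation in src/run/Sweep.jl; `KNOWN_AXES` is a hypothetical stand-in for whatever --list-axes reports for a given node and task.

```python
import difflib

# Hypothetical axis list; the real one depends on the node and task.
KNOWN_AXES = ["node.threshold_mult", "node.lrate_targ", "env.lam", "ablation", "seed"]


def validate_axes(requested, known):
    """Raise ValueError listing every unknown axis, with a suggestion if one is close."""
    errors = []
    for name in requested:
        if name in known:
            continue
        close = difflib.get_close_matches(name, known, n=1)
        hint = f' did you mean "{close[0]}"?' if close else ""
        errors.append(f'unknown axis "{name}".{hint}')
    if errors:
        # Fail before any cell runs, so a typo never becomes a silent no-op.
        raise ValueError("; ".join(errors))
```

Calling `validate_axes(["node.treshold_mult"], KNOWN_AXES)` raises a ValueError whose message suggests `node.threshold_mult`, while a list of valid names returns without error.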

Outputs: results.csv (one row per cell: axis × value × score + each measure) · per-axis breakdown figures (score, σ, liveness vs the knob, with σ and ρ(W) shown together) · a README.md callout (best value / breakdown point / regime flip) · cells/cell_NNN/ holding each cell’s metrics.csv and manifest. Behaviour GIFs are opt-in: add a [capture] block (group = …) to record a representative GIF per cell, otherwise the numeric metrics are the whole output.
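
Because results.csv is one row per cell, a first pass over it needs nothing beyond a CSV reader. The Python sketch below picks the best-scoring value per axis. The column names `axis`, `value` and `score` are assumptions (check the header of your own file), as are the conventions that a higher score is better and that a failed cell leaves the score empty.

```python
import csv


def best_per_axis(fileobj, axis_col="axis", value_col="value", score_col="score"):
    """Map each axis to its (value, score) pair with the highest score."""
    best = {}
    for row in csv.DictReader(fileobj):
        try:
            score = float(row[score_col])
        except ValueError:
            # Assumption: a failed cell has no numeric score; skip it.
            continue
        axis = row[axis_col]
        if axis not in best or score > best[axis][1]:
            best[axis] = (row[value_col], score)
    return best


# Usage (path follows the sweep id from the example config):
#   with open("sweeps/falandays_wall_perturb/results.csv", newline="") as f:
#       print(best_per_axis(f))
```

Values are returned as the strings found in the file, since an axis such as ablation is not numeric.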

Source: src/run/Sweep.jl (run_sweep, ablate, sweepable_axes); CLI sweep/run.jl; examples in configs/sweep_*.toml.