Evolution
Some nodes learn online and run as-is; others have emergent weights with no plasticity and mean nothing until they are evolved. This page covers the optimizer, how fitness is computed, and how evolved genomes are stored and traced.
Evolution is experimental — the platform around the stable Falandays baseline,
not part of it. :falandays_base with default 2021 constants is the settled,
authors-faithful model; evolving its 7 control parameters is an optional experiment.
The compartmental / CTRNN nodes are our own construction and must be evolved,
because their weights are the genotype — an untrained one is random noise.
Who needs evolving
| family | online plasticity | evolution |
|---|---|---|
| Falandays base (:falandays_base, alias :falandays) | yes | optional; the settled baseline is the default 2021 constants |
| Falandays variants (:falandays_noisy, :falandays_ablated, :falandays_hemispheric, :falandays_oosawa) | yes; target homeostasis off in :falandays_ablated | optional / experimental; useful perturbations, not the authors-faithful baseline |
| Compartmental (:compartmental_dense, :compartmental_structured) | no | required: untrained weights are random; only an evolved genome is a fair test |
See Nodes — overview for the families and their default benchmark prep.
The optimizer
A hand-rolled separable CMA-ES (diagonal covariance), validated to ~1e-6 against pycma. It minimizes negated fitness.
- Initialization: find_alive_centroid sweeps random genomes and starts CMA from one that is alive on at least two seeds, avoiding a dead region of a non-plastic genome space. For Falandays models the default start vector is pack_params(FalandaysParams()).
- Population: the sep-CMA default 4 + ⌊3 ln n⌋ for genome dim n (about 20 for the 220-dim structured genome); raise it (e.g. 32) to explore more on short runs.
- Common random numbers: every candidate in a generation is evaluated on the same trial seeds, so within-generation comparisons are fair.
Source: src/drivers/Evolve.jl (SepCMA, find_alive_centroid, EvolveRunner).
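The population default and the common-random-numbers rule can be sketched in a few lines. This is a minimal illustration, not the code in Evolve.jl: default_popsize, toy_rollout, and evaluate_generation are hypothetical names, the rollout is a toy stand-in for the simulator, and the population formula is the standard CMA-ES default, which gives 20 for a 220-dim genome.

```julia
using Random

# Standard CMA-ES default population size for genome dimension n.
default_popsize(n::Integer) = 4 + floor(Int, 3 * log(n))

# Hypothetical stand-in for one rollout: a noisy score for `genome` under `seed`.
# All randomness derives from the seed, so the same seed gives the same noise.
function toy_rollout(genome::AbstractVector{<:Real}, seed::Integer)
    rng = MersenneTwister(seed)
    noise = 0.1 * randn(rng)
    return -sum(abs2, genome) + noise   # higher is better in this toy
end

# Common random numbers: every candidate is scored on the SAME seeds,
# and each candidate's score is the mean over those seeds.
function evaluate_generation(candidates, seeds)
    return [sum(toy_rollout(g, s) for s in seeds) / length(seeds) for g in candidates]
end

seeds = [101, 102, 103]                          # shared by the whole generation
candidates = [zeros(4), fill(0.5, 4), ones(4)]   # placeholder genomes
scores = evaluate_generation(candidates, seeds)
```

Because the seeds are shared, the noise term is identical for every candidate in this toy, so differences between scores reflect only the genomes. That is the property that makes within-generation ranking fair.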
Fitness
Per candidate, per train task: run k_trials rollouts at distinct CRN seeds and take
the mean of their normalized scores. Then aggregate across tasks with the
aggregator:
- :min (default): worst-case over tasks; rewards a genome good on every task.
- :mean: average over tasks.
The per-seed step is always a mean; :min/:mean acts only across tasks. For
single-task training (the common case) the across-task step is a no-op, so fitness is
just the mean over the k_trials seeds. The normalized score is the task’s TaskSpec
floor/ceiling transform, clamped to a fixed range; see
Contracts for the exact mapping and why scores are not comparable across tasks.
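The two-step aggregation can be written out as a short sketch. It assumes the normalized scores are already computed; the function name fitness, its keyword aggregator, and the dictionary layout are illustrative choices rather than the project's API, and the numbers are placeholders, not measured results.

```julia
using Statistics

# scores[task] holds one candidate's normalized scores on that task,
# one entry per CRN seed (k_trials entries).
function fitness(scores::Dict{Symbol,Vector{Float64}}; aggregator::Symbol = :min)
    per_task = [mean(v) for v in values(scores)]   # per-seed step: always a mean
    aggregator === :min  && return minimum(per_task)
    aggregator === :mean && return mean(per_task)
    throw(ArgumentError("unknown aggregator: $aggregator"))
end

# Placeholder scores for two tasks with k_trials = 2.
two_tasks = Dict(:wall => [0.8, 0.6], :pong => [0.2, 0.4])
worst_case = fitness(two_tasks)                      # min over task means
average    = fitness(two_tasks; aggregator = :mean)  # mean over task means

# Single-task training: the across-task step is a no-op.
one_task = Dict(:wall => [0.8, 0.6])
single = fitness(one_task)
```

With a single task, both aggregators return the same value, which is the mean over that task's seeds.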
Training workflow & the genotype store
bench/train.jl evolves one (neuron, task) and writes a tagged store entry:

    cd bench
    julia --project=. -t 8 train.jl compartmental_structured wall \
        --generations 20 --popsize 32 --k-trials 8 --N 200 --sigma0 2.5

Output:

- bench/genomes/<neuron>__<task>/genome.jld2: the evolved weight / parameter vector.
- bench/genomes/<neuron>__<task>/train_manifest.toml: git SHA, neuron, task, seed, generations, popsize, k_trials, N, ticks, sigma0, best_fitness, timestamp, and a content-hash tag identifying the run.
The benchmark then loads these for :trained cells, copies the genome +
provenance into each cell’s output, and records prep = trained:<tag> — so every
reported number is traceable to the exact weights and the run that produced them. A cell
that needs training but has no stored genome is run untrained and flagged.
Source: bench/train.jl (the entrypoint), bench/src/Store.jl (genome + manifest store), bench/run.jl (trained-cell loading and flagging).
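The traceability rule can be illustrated with a small sketch. Everything here is an assumption made for illustration: genome_dir and prep_label are hypothetical helpers, the manifest values are placeholders, the wording of the untrained flag is invented, and the real layout in Store.jl may differ. Only the directory pattern, the manifest field names, and the trained:<tag> label come from the description above.

```julia
using TOML

# Directory for one (neuron, task) entry, following the pattern described above.
genome_dir(neuron, task) = joinpath("bench", "genomes", "$(neuron)__$(task)")

# Placeholder manifest text; values are illustrative, not from a real run.
manifest_text = """
neuron = "compartmental_structured"
task = "wall"
generations = 20
popsize = 32
k_trials = 8
tag = "abc123"
"""

# Returns the prep label for a cell: "trained:<tag>" when a manifest is
# available, otherwise a flagged untrained label (flag wording is assumed).
function prep_label(manifest::Union{Nothing,AbstractDict})
    manifest === nothing && return "untrained (flagged: no stored genome)"
    return "trained:" * manifest["tag"]
end

manifest = TOML.parse(manifest_text)   # Dict{String,Any}
with_genome    = prep_label(manifest)
without_genome = prep_label(nothing)
```

Keeping the tag in the label means a reported number can be matched back to one manifest, and through it to the git SHA and training settings of the run.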
A worked readiness run
compartmental_structured, 20 generations, fitness = mean over 8 CRN
trials (single task), N=200, alive-centroid init:
| task | evolved best fitness | untrained baseline |
|---|---|---|
| wall | 0.728 | ~0 |
| tracking | 0.523 | ~0 / negative |
| pong | 0.327 | ≈ floor (0.33) |
| cartpole | 0.043 | ~0 |
| cartpole_swingup | 0.109 | ~0 |
Reading it: the loop works — wall went from a dead 0 to a competent ~0.73 avoider, and tracking made real progress. But 20 generations on a 220-dim genome is a short probe — pong sits at its floor and cartpole / swing-up barely moved. Competent agents on the hard control tasks need hundreds of generations, and pong likely wants the author size N=500. The machinery is ready; the budget here was deliberately small.
Open design points
Flagged for a dedicated design pass:
- Statistical honesty: a trained cell currently uses one evolved genome over n eval-seeds (evaluation variance only). The fair version runs K independent evolution runs per cell (search variance) and reports the distribution.
- Specialist vs generalist: train per-(neuron, task) vs one genome on a task suite.
- Ergonomics: train --all, training profiles (quick / standard / thorough), resumable, parallel.
- Store management: list / inspect / supersede / best-of-K genomes.
- Co-evolving morphology: extend the genome with the bounded sensor/effector layout (see Receptors & Effectors for the receptor/effector contract).
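As one possible shape for the statistical-honesty point, the sketch below runs K independent searches and summarizes their best fitness values. It is a design illustration only: search_distribution is a hypothetical helper, run_evolution stands for any callable that returns the best fitness of one evolution run for a given seed, and the stub used here is a deterministic placeholder, not a result.

```julia
using Statistics

# `run_evolution(seed)` is a hypothetical callable returning the best fitness
# of one independent evolution run started from `seed`.
function search_distribution(run_evolution, K::Integer)
    best = [run_evolution(seed) for seed in 1:K]
    return (runs = best, median = median(best), min = minimum(best), max = maximum(best))
end

# Deterministic stand-in so the sketch runs without the simulator.
stub_run(seed) = 0.1 * seed

report = search_distribution(stub_run, 5)
```

Reporting the spread across runs, rather than a single genome's score, separates search variance from the evaluation variance that the current single-genome cells measure.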