Introduction
ISOKANN.jl learns slow reaction coordinates (χ, "chi") as eigenfunctions of the Koopman operator from short trajectory data. This page covers the data model, the training loop, and the handful of types you will use most.
The data model
ISOKANN needs two kinds of samples:
xs: a matrix of shape(d, n)—nstarting points in ad-dimensional state or feature space.ys: a 3-D array of shape(d, nk, n)— for each starting point inxs,nkKoopman replicas obtained by propagating the dynamics with a common lag time.
These are used to estimate the action of the Koopman operator via a Monte-Carlo approximation,
\[[K\chi](x) = \mathbb{E}_{X_0 = x} [\chi(X_t)] \approx \frac{1}{N} \sum_{i=1}^{N} \chi(y_i).\]
The starting points can be sampled arbitrarily (they do not need to come from the stationary distribution), but the learned χ can only represent the part of the dynamics covered by xs. The ys must be propagated from the corresponding xs under the system's dynamics with a common lag time.
Minimal example
For raw (xs, ys) data you can construct an Iso directly:
using ISOKANN
xs = rand(3, 100)
ys = rand(3, 10, 100)
iso = Iso((xs, ys))
run!(iso, 100) # 100 ISOKANN iterations
chis(iso) # learned χ valuesUsing a simulation
For adaptive sampling and for bookkeeping the link between coordinates and network features, wrap the dynamics in a SimulationData:
using ISOKANN
sim = Doublewell() # 1D analytical potential
data = SimulationData(sim, 100, 10) # 100 starting points, 10 Koopman samples each
iso = Iso(data)
run!(iso, 100)
chis(iso)Built-in simulations include Doublewell, Triplewell, MuellerBrown, Diffusion (general Langevin dynamics) and OpenMMSimulation (molecular dynamics via OpenMM). You can also plug in your own dynamics by implementing the IsoSimulation interface (at minimum propagate) or provide externally-generated trajectories via ExternalSimulation / data_from_trajectory.
OpenMM quick start
using ISOKANN
sim = OpenMMSimulation() # bundled alanine dipeptide
data = SimulationData(sim, 100, 5)
iso = Iso(data)
run!(iso, 100) # or: run_kde!(iso; generations=5, iter=100)
plot_training(iso)
scatter_ramachandran(iso)
rates(iso) # macro-state transition rates
save_reactive_path(iso, out="path.pdb")Classical ISOKANN on precomputed trajectories
If you already have trajectory data and just want to run ISOKANN on it (no new sampling), turn your trajectories into (xs, ys) pairs with data_from_trajectory (single trajectory) or data_from_trajectories (many). Both support stride, lag, and reverse (treat the trajectory as time-reversible, which doubles the data):
# single trajectory of shape (d, nframes)
data = SimulationData(sim, data_from_trajectory(xs; lag=10, stride=1))
# many trajectories of varying length
data = SimulationData(sim, data_from_trajectories([xs1, xs2, ...]; lag=10))
# or, if you have no `sim` and only raw arrays
data = SimulationData(xs, ys)For OpenMM trajectories on disk, load_trajectory / readchemfile return a (d, nframes) coordinate matrix ready to pass into data_from_trajectory.
What the pieces do
Isois the main training object. It holds the neural network, the optimizer, training data, the target transform, and loss/log history.SimulationDatawraps anIsoSimulationtogether with its sampledcoordsand the derivedfeaturesthat are actually fed into the network. The mappingcoords → featuresis defined by the simulation's featurizer (forOpenMMSimulationthe default is pairwise distances, selectable between all atoms, locally close atoms, or only C-α atoms; see its docstring).- Training is driven by
run!. Each iteration computes a target fromE[χ(ys)]via a configurable transform (TransformShiftscalefor 1-D χ,TransformISA/TransformPseudoInvfor multi-dim χ) and takes a batch of gradient steps against‖model(xs) − target‖². - Adaptive sampling is available via
run_kde!, which interleaves KDE-based resampling along the current χ with training. (The olderrunadaptive!is a deprecated alias.) - Results:
chisreturns the learned χ values,ratesreturns the macro-state transition rate matrix,save_reactive_pathwrites a PDB of the extracted reaction path.
Accessing data
SimulationData lets you reach either view of the samples:
coords/propcoords— rawxs/yscoordinates.features/propfeatures— featurizedxs/ysactually passed to the network.
Source layout
Core:
src/iso.jl— theIsotype and the training loop (run!,run_kde!)src/simulation.jl—SimulationDataand theIsoSimulationinterfacesrc/data.jl— low-level operations on the(xs, ys)tuplesrc/isotarget.jl— ISOKANN target transforms (1-D and multi-dim)src/models.jl— default networks and optimizers
Simulators:
src/simulators/langevin.jl— analytical potentials (Doublewell, Triplewell, MuellerBrown) integrated via StochasticDiffEqsrc/simulators/openmm.jl— OpenMM wrapper (through PythonCall)src/simulators/bridge.jl— guided Langevin bridges toward target χ valuessrc/simulators/metadynamics.jl— draft metadynamics implementation
Utilities:
src/molutils.jl— alignment, dihedrals, and other molecular helperssrc/pairdists.jl— pairwise-distance featuressrc/plots.jl— training / χ plotssrc/subsample.jl— stratified and KDE-based subsamplingsrc/reactionpath.jl,src/reactionpath2.jl— reaction-path extraction