Dynamical models

Author

Aaron Meyer (based on slides from Bruce Tidor)

Ordinary Differential Equations

  • ODE models are typically most useful when we already have an idea of the system components
    • As opposed to data-driven approaches, which we use when we don’t know how to connect the data
    • Incredibly powerful for making specific predictions about how a system works
  • Limits of these approaches:
    • Results can be extremely sensitive to missing components or model errors
    • Can quickly explode in complexity
    • May rely on variables that are impossible to measure
  • Despite these limits, ODEs are extremely valuable
  • Model fitting in ML is itself a dynamical process; we will return to this at the end of the lecture and show that optimizers like Adam are coupled dynamical systems

Applications of ODE models

Molecular kinetics

Remember BE 110!

Let’s say we have two ligands that dimerize, then this dimer binds to a receptor as one unit:

\[ L_f + L_f \leftrightarrow L_D \]

\[ L_D + R_f \leftrightarrow R_b \]

If we want to know about how these species interact, we can model their behavior with the rate equations:

BOARD

  • \(\frac{dL_f}{dt} = -2 k_1 L_f^2 + 2 k_{-1}L_D\)
  • \(\frac{dL_D}{dt} = k_1 L_f^2 - k_{-1} L_D - k_2 L_D R_f + k_{-2} R_b\)
  • \(\frac{dR_b}{dt} = k_2 L_D R_f - k_{-2} R_b\)
  • \(\frac{dR_f}{dt} = k_{-2} R_b - k_2 L_D R_f\)
  • What could we measure here?
    • How would that be implemented?
  • Function-wise what do we need here?
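
A minimal sketch of what that function looks like in practice, using scipy.integrate.solve_ivp (the rate constants and initial concentrations below are arbitrary placeholders, not values from the lecture):

import numpy as np
from scipy.integrate import solve_ivp

def binding_deriv(t, y, k1, km1, k2, km2):
    # State vector: free ligand, dimer, free receptor, bound receptor
    Lf, LD, Rf, Rb = y
    dLf = -2.0 * k1 * Lf**2 + 2.0 * km1 * LD  # two free ligands per dimer
    dLD = k1 * Lf**2 - km1 * LD - k2 * LD * Rf + km2 * Rb
    dRf = km2 * Rb - k2 * LD * Rf
    dRb = k2 * LD * Rf - km2 * Rb
    return [dLf, dLD, dRf, dRb]

# Arbitrary illustrative parameters and initial condition
sol = solve_ivp(binding_deriv, (0.0, 10.0), [1.0, 0.0, 0.5, 0.0],
                args=(1.0, 0.1, 2.0, 0.05), t_eval=np.linspace(0.0, 10.0, 100))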

Pharmacokinetics

flowchart LR
    drugInjection("drug injection") --> C1
    
    subgraph CentralCompartment["central compartment"]
        C1["C<sub>1</sub><br/>V<sub>1</sub>"]
    end
    
    subgraph PeripheralCompartment["peripheral compartment"]
        C2["C<sub>2</sub><br/>V<sub>2</sub>"]
    end
    
    C1 -->|k1| C2
    C2 -->|k2| C1
    C1 -->|ke| Elimination


  • Copy to board
  • What does each compartment represent?
  • \(V_1 \frac{dC_1}{dt} = -k_e C_1 V_1 - k_1 C_1 V_1 + k_2 C_2 V_2\)
  • \(V_2 \frac{dC_2}{dt} = k_1 C_1 V_1 - k_2 C_2 V_2\)

Applications of ODE models: Pharmacokinetics

  • Central compartment corresponds to the plasma in the body.
    • \(V_1\) is the distribution volume of plasma in the body.
    • \(C_1\) is the concentration of drug in the plasma.
  • Peripheral compartment represents a group of organs that significantly take up the particular drug.
    • \(V_2\) is the volume of this group of organs.
    • \(C_2\) is the concentration of drug in the group of organs.

Applications of ODE models: Pharmacokinetics

  • \(k_e\) is the rate constant for clearance.
    • \(k_e C_1 V_1\) is the mass of drug/time that’s cleared.
  • \(k_1\) is the rate constant for mass transfer from the central to peripheral compartment.
    • \(k_1 C_1 V_1\) is the mass of drug/time that transfers from the central to peripheral compartment.
  • \(k_2\) is the rate constant for mass transfer from the peripheral to central compartment.
    • \(k_2 C_2 V_2\) is the mass of drug/time that transfers from the peripheral to the central compartment.

Applications of ODE models: Pharmacokinetics

  • Consider a bolus i.v. injection
    • No drug in either compartment for \(t<0\)
    • Total dose \(D\) (μg) administered at once at \(t=0\)
    • Drug distribution occurs instantaneously in the central compartment.
      • The compartment also becomes well-mixed instantaneously.
      • Concentration in central compartment at \(t=0\) is \(C_1(0) = D/V_1\) μg/mL
  • No chemical reactions in the compartment

Work out equations above. Going to learn a mathematical trick to solve!
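
As a sketch of how this model might be simulated numerically (all parameter values below are illustrative assumptions, not from the lecture):

import numpy as np
from scipy.integrate import solve_ivp

def two_compartment(t, C, ke, k1, k2, V1, V2):
    # Divide the mass-balance equations through by each volume
    C1, C2 = C
    dC1 = (-ke * C1 * V1 - k1 * C1 * V1 + k2 * C2 * V2) / V1
    dC2 = (k1 * C1 * V1 - k2 * C2 * V2) / V2
    return [dC1, dC2]

D, V1, V2 = 100.0, 5.0, 10.0  # dose (ug) and volumes (mL); illustrative only
y0 = [D / V1, 0.0]            # bolus: all drug starts in the plasma
sol = solve_ivp(two_compartment, (0.0, 24.0), y0,
                args=(0.2, 0.5, 0.3, V1, V2), t_eval=np.linspace(0.0, 24.0, 200))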

Applications of ODE models: Population kinetics

Lotka-Volterra Equations

  • Prey population \(x\), predator population \(y\):

\[\dot{x} = \alpha x - \beta x y\]

\[\dot{y} = \delta x y - \gamma y\]

  • \(\alpha\): prey birth rate; \(\beta\): predation rate per encounter
  • \(\gamma\): predator death rate; \(\delta\): predator growth per prey consumed
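
A brief sketch of simulating these equations (the parameter values are arbitrary, chosen only to show the characteristic cycling):

import numpy as np
from scipy.integrate import solve_ivp

def lotka_volterra(t, z, alpha, beta, delta, gamma):
    x, y = z  # prey, predator
    return [alpha * x - beta * x * y, delta * x * y - gamma * y]

# Trajectories form closed orbits around the coexistence fixed point
sol = solve_ivp(lotka_volterra, (0.0, 50.0), [10.0, 5.0],
                args=(1.0, 0.1, 0.075, 1.5), t_eval=np.linspace(0.0, 50.0, 1000))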

Types of Analysis

Note about difference from other models we’ve covered

  • ODE models can be part of inference techniques just as elsewhere
  • But we often don’t have a symbolic expression of the answer
    • Have to simulate the model every time
    • Can only focus on the input-output we get from the black box
  • In this respect, what we do with ODE models will be very similar to what you could do with any computational simulation
  • If we have a closed-form solution, then fitting an ODE model to data is just non-linear least squares

Analytic vs Numerical Modeling

Analytic

  • Wider range of parameters
  • Avoid numerical problems
  • Physical intuition more direct
  • Often must simplify model

Numerical

  • Can handle complex models
  • Dependence on parameters & initial conditions
  • Physical insight may be difficult to extract
  • Convergence, numerical stability

Analytic vs Numerical Modeling

Reality often requires a combination of the two:

  • Use analytic treatment to study entire parameter space
  • Use numerical treatment to study interesting regions
  • Use both to handle complex behavior

Stability Analysis

  • Can solve for steady-states of a system \[\frac{dF}{dt} = 0\]
  • Results of this can be both stable or unstable points
    • With stable points, the slope of \(\frac{dF}{dt}\) with respect to \(F\) is negative at the fixed point
    • In the multivariate case, this means the eigenvalues of the Jacobian have negative real parts
  • Steady-state points aren’t necessarily realistic or feasible!
    • Root-finding (e.g., non-linear least squares) can locate steady-state points
    • Only simulating the system confirms they are actually reachable

Walk through 1D case of stability.
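
One standard 1D example (an illustration added here, not from the slides): the logistic equation \(\dot{x} = x(1-x)\) has fixed points at \(x=0\) and \(x=1\), and the sign of the slope of \(\dot{x}\) at each point gives its stability:

def f(x):
    return x * (1.0 - x)  # dx/dt = x(1 - x); fixed points at x = 0 and x = 1

eps = 1e-6
for x_star in (0.0, 1.0):
    # Central-difference slope of dx/dt at the fixed point
    slope = (f(x_star + eps) - f(x_star - eps)) / (2.0 * eps)
    print(x_star, "stable" if slope < 0 else "unstable")
# x = 0 is unstable (slope +1); x = 1 is stable (slope -1)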

Linear vs. Nonlinear Systems

  • Linear models are easier to simulate and understand than non-linear
    • Linearity: If \(\mathbf{x}_1\) and \(\mathbf{x}_2\) are both solutions, then \(c_1 \mathbf{x}_1 + c_2 \mathbf{x}_2\) is also a solution
  • Linear systems tend to be separable (effective decoupling)
  • Non-linear systems exhibit interesting properties
  • Go through how linear systems come to a solution.
  • \(\frac{dx}{dt} = c_1 x\) (worked solution below)
  • What is this like if \(c_1\) is negative? Positive? Imaginary?
  • \(e^{i\theta} = \cos \theta + i \sin \theta\)
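
Writing out the solution makes the three cases explicit:

\[x(t) = x(0)\, e^{c_1 t} \qquad \begin{cases} c_1 < 0: & \text{exponential decay toward } 0 \\ c_1 > 0: & \text{exponential growth} \\ c_1 = i\omega: & x(t) = x(0)\left(\cos \omega t + i \sin \omega t\right), \text{ pure oscillation} \end{cases}\]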

Linearity & Separability

Linear separability and superposition principles in dynamical systems

  • What is a Jacobian?
  • What is steady-state here?
  • Combined with above relationship, this means we can solve linear systems as a special case

Phase Portraits

\[ \dot{\vec{x}} = \vec{f}(\vec{x}) \]

Phase portrait showing trajectories in a dynamical system

Non-linear systems

  • No general analytic approach to finding trajectory
  • So, goal is to understand qualitative trajectory behavior

Features in Phase Portraits

Features of phase portraits including fixed points and closed orbits

  • Fixed Points (A, B, C). Steady states
  • Closed Orbits (D). Periodic solutions
  • Flow patterns in trajectory (A and C are similar to each other, different from B)
  • Stability of fixed points and closed orbits (A, B, and C are unstable, D is stable)

Solving a Set of Equations for Phase Portrait

  • Numerical computation
    • e.g., Runge-Kutta integration
  • Qualitative
    • Sufficient for some purposes
  • Analytic/symbolic
    • Elegant, though not always tractable

Step 1: Find the fixed points

Identification of fixed points where derivatives are zero

Step 2: Determine stability of fixed points

  • If the system moves slightly away from each fixed point, will it return or will it move further away?
  • Another way to ask the same question: as time approaches infinity, does the system tend toward or away from a given fixed point?
  • Note \(y\) solution must be of form:
    • \(y = y_0 e^{-t}\) (because \(\dot{y} = \frac{dy}{dt} = -y\))
    • So \(y \rightarrow 0\) for \(t \rightarrow \infty\)
  • Thus, \(\dot{x} = x + e^{-y}\) becomes \(\dot{x} \rightarrow x + 1\) for long times
    • This has exponentially growing solutions
    • Toward \(\infty\) for \(x > -1\) and \(-\infty\) for \(x < -1\)

Solution grows exponentially in at least one dimension, so it is unstable.

Step 3: Sketch nullclines

Nullclines are the sets of points for which \(\dot{x} = 0\) or \(\dot{y} = 0\), so flow is either horizontal or vertical.

Nullclines separating regions of horizontal and vertical flow

Step 4: Plot flow lines

Flow lines indicating system trajectories and saddle point behavior

  • Why do solutions partition the graph?
    • Uniqueness. To cross, you must be on the solution.
  • This is a non-linear version of a saddle point.
  • Can we identify saddle-like behavior in linearized version of system?

Existence & Uniqueness

Non-linear \(\dot{\mathbf{x}} = f(\mathbf{x})\) and given an initial condition.

  • Existence and uniqueness of solution guaranteed if \(f\) is continuously differentiable
  • Corollary: Trajectories do not intersect, because if they did, then there would be two solutions for the same initial condition at the crossing point

Linearization About Fixed Points

Linearization of a non-linear system around a fixed point

  • Think of the Jacobian as the way to convert position to velocity.
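
When the right-hand side is awkward to differentiate by hand, the Jacobian can be approximated by central finite differences. A generic sketch (the helper name jacobian is ours, not a library function):

import numpy as np

def jacobian(f, x_star, eps=1e-6):
    # Central-difference Jacobian of f (the ODE right-hand side) at x_star
    x_star = np.asarray(x_star, dtype=float)
    n = x_star.size
    J = np.zeros((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        J[:, j] = (np.asarray(f(x_star + dx)) - np.asarray(f(x_star - dx))) / (2.0 * eps)
    return J

np.linalg.eigvals(jacobian(f, x_star)) then gives the eigenvalues analyzed on the next slides.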

Solving Linearized Systems

Eigenvalue analysis for solving linearized dynamical systems

  • The third row is the definition of an eigenvalue.
  • So if we move along an eigenvector \(\vec{v}\) (remember this is a displacement \(\delta\vec{x}\) from the fixed point), the matrix \(A\) has the same effect as \(\lambda I\).
  • At the fixed point itself: \(\delta\vec{x} = 0\), so \(\dot{\vec{x}} = J\,\delta\vec{x} = 0\)
  • Along an eigenvector: \(J\vec{v} = \lambda\vec{v} = \dot{\vec{v}}\)
    • What if \(\lambda\) is positive?
    • What if \(\lambda\) is negative?

Example

Example phase portrait of a linearized system

More Examples

Various types of fixed points in phase portraits

Classification of Fixed Points

Classification of fixed points based on eigenvalue properties

Eigenvalue Behavior Summary

| Eigenvalues | Fixed Point Type | Behavior |
|---|---|---|
| Real, both negative | Stable node | Trajectories converge to fixed point |
| Real, both positive | Unstable node | Trajectories diverge from fixed point |
| Real, opposite signs | Saddle point | Stable in one direction, unstable in another |
| Complex, negative real part | Stable spiral | Oscillatory decay toward fixed point |
| Complex, positive real part | Unstable spiral | Oscillatory divergence from fixed point |
| Purely imaginary | Center | Neutral closed orbits (undamped oscillation) |

Relevance for Nonlinear Dynamics

  • So far we have said that we can find the fixed points of a nonlinear system, linearize about each one, and characterize the dynamics near each fixed point of the non-linear model by the corresponding linear model.
  • Is this always true? Do the nonlinearities ever disturb this approach?
  • A theorem can be proven which states
    • That all the regions on the previous slide are “robust” (nodes, spirals, saddles) and correspond between linear and nonlinear models.
    • But that all the lines on the previous slide are “delicate” (centers, stars, degenerate nodes, non-isolated fixed points) and can have different behaviors in linear and non-linear models.

Bifurcations

  • The phase portraits we have been looking at describe the trajectories of the system for given initial conditions, but with “fixed” parameters (the rate constants in the equations, for instance).
  • What we might like is a series of phase portraits corresponding to different sets of parameters.
  • Many will be qualitatively similar. The interesting ones will be where a small change of parameters creates a qualitative change in the phase portrait (bifurcations).
  • What we will find is that fixed points & closed orbits can be created/destroyed and stabilized/destabilized.

Saddle-Node Bifurcation

Diagram of a saddle-node bifurcation

  • Canonical example: \(\dot{x} = r + x^2\)
    • For \(r < 0\): two fixed points (one stable, one unstable)
    • At \(r = 0\): the two fixed points collide and annihilate
    • For \(r > 0\): no fixed points — system escapes to infinity
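
A few lines make the fixed-point structure concrete (the r values are arbitrary):

import numpy as np

# Fixed points of dx/dt = r + x^2 are x* = +/-sqrt(-r), existing only for r < 0;
# since d(r + x^2)/dx = 2x, the negative root is stable and the positive root unstable
for r in (-1.0, -0.25, 0.25):
    if r < 0:
        print(f"r = {r}: stable at {-np.sqrt(-r):.2f}, unstable at {np.sqrt(-r):.2f}")
    else:
        print(f"r = {r}: no fixed points; x(t) escapes to infinity")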

Hopf Bifurcation

  • A Hopf bifurcation occurs when a fixed point changes stability and a limit cycle (sustained oscillation) is born or destroyed
  • Eigenvalues of \(J\) cross the imaginary axis as a parameter varies
  • Common in biological oscillators: circadian rhythms, cell cycle checkpoints, neural firing
  • A stable limit cycle is born at a supercritical Hopf bifurcation
  • An unstable limit cycle is born at a subcritical Hopf bifurcation — the system can then jump discontinuously to a distant attractor

Example: Genetic Control Network

Genetic Control Network

The Griffith model is a canonical example of a saddle-node bifurcation giving rise to bistability — two coexisting stable steady states separated by an unstable one. Griffith (1971) model of genetic control:

  • x = protein concentration
  • y = mRNA concentration

Griffith (1971) genetic control network model diagram

Genetic Control Network

Biochemical version of a bistable switch:

  1. The only stable points are either no protein and no mRNA, or a fixed composition of both
  2. If degradation rates are too great, the only stable point is the origin

Phase portrait of a genetic control network acting as a bistable switch
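
The slides give the model as a diagram; one common formulation (an assumption here, following Strogatz’s presentation of Griffith 1971) is \(\dot{x} = -ax + y\), \(\dot{y} = x^2/(1+x^2) - by\). A short simulation shows the two basins of attraction:

import numpy as np
from scipy.integrate import solve_ivp

def griffith(t, z, a, b):
    x, y = z  # x: protein, y: mRNA (one common convention; an assumption)
    return [-a * x + y, x**2 / (1.0 + x**2) - b * y]

a, b = 0.5, 0.5  # degradation rates small enough for bistability
for x0, y0 in ((0.2, 0.1), (2.0, 1.0)):
    sol = solve_ivp(griffith, (0.0, 200.0), [x0, y0], args=(a, b))
    print(f"start ({x0}, {y0}) -> x(end) = {sol.y[0, -1]:.2f}")
# One start decays to the origin; the other settles at the high-protein state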

  • How could we compare this to data?
  • Measure steady-state
  • Start at points and see bifurcation
  • Measure over time

Implementation

Testing

  • Many properties one can test
    • Mass balance
    • Changes upon parameter adjustment
  • Good to test these before and after integration

Go through example of mass balance at deriv and solution.
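
For the ligand-dimer system earlier in the lecture, total ligand \(L_f + 2L_D + 2R_b\) and total receptor \(R_f + R_b\) are conserved. A sketch of both checks, reusing binding_deriv and sol from that earlier sketch:

import numpy as np

def total_ligand(y):
    Lf, LD, Rf, Rb = y
    return Lf + 2.0 * LD + 2.0 * Rb  # dimers and bound complexes carry two ligands

# Before integration: the derivative of a conserved quantity should be exactly zero
dy = binding_deriv(0.0, [1.0, 0.2, 0.5, 0.1], 1.0, 0.1, 2.0, 0.05)
assert abs(total_ligand(dy)) < 1e-12

# After integration: the conserved quantity should not drift along the solution
assert np.allclose(total_ligand(sol.y[:, 0]), total_ligand(sol.y[:, -1]), rtol=1e-6)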

Python Packages

SciPy provides several interfaces for ODE solving: odeint (a simple function-based wrapper), ode (an object-oriented interface), and solve_ivp (the modern, recommended interface).

Notes:

  • All can solve stiff and non-stiff equations.
  • scipy.integrate.ode exposes multiple backends; use set_integrator to select (e.g., 'vode', 'lsoda').
  • solve_ivp uses the method argument instead (default 'RK45'; use 'Radau' or 'BDF' for stiff systems).

Code Example: Pendulum Oscillation

The second order differential equation for the angle \(\theta\) of a pendulum acted on by gravity with friction can be written:

\[\theta''(t) + b\,\theta'(t) + c\sin(\theta(t)) = 0\]

where b and c are positive constants, and a prime (‘) denotes a derivative. To solve this equation with odeint, we must first convert it to a system of first order equations. By defining the angular velocity \(\omega(t) = \theta'(t)\), we obtain the system:

\[\theta'(t) = \omega(t)\]

\[\omega'(t) = -b\,\omega(t) - c\sin(\theta(t))\]

Pendulum Oscillation

Let y be the vector \([\theta, \omega]\). We implement this system in python as:

import numpy as np

def pend(y, t, b, c):
    theta, omega = y
    dydt = [omega, -b*omega - c*np.sin(theta)]
    return dydt

We assume the constants are \(b = 0.25\) and \(c = 5.0\):

b, c = 0.25, 5.0

Pendulum Oscillation

For initial conditions, we assume the pendulum is nearly vertical with \(\theta(0) = \pi - 0.1\), and it is initially at rest, so \(\omega(0) = 0\). Then the vector of initial conditions is

y0 = [np.pi - 0.1, 0.0]

We generate a solution at 101 evenly spaced samples in the interval \(0 \leq t \leq 10\). So our array of times is:

t = np.linspace(0, 10, 101)

Pendulum Oscillation

Call odeint to generate the solution. To pass the parameters b and c to pend, we give them to odeint using the args argument.

from scipy.integrate import odeint
sol = odeint(pend, y0, t, args=(b, c))

The solution is an array with shape (101, 2). The first column is \(\theta(t)\), and the second is \(\omega(t)\). The following code plots both components.

Pendulum Oscillation

import matplotlib.pyplot as plt
plt.plot(t, sol[:, 0], 'b', label='theta(t)')
plt.plot(t, sol[:, 1], 'g', label='omega(t)')
plt.legend(loc='best')
plt.xlabel('t')
plt.grid()
plt.show()

Pendulum Oscillation

Plot of pendulum angle and angular velocity over time

Analysis Example: The Cell Cycle

Consider a simple model of cells transitioning between two phases of the cell cycle:

graph LR
    G1[G1 Phase] -->|α| G2S[G2/S Phase]
    G2S -->|β| G1
    G1 -->|γ| Death[Death]
    G2S -->|γ| Death
    
    style G1 fill:#f9d5e5,stroke:#333,stroke-width:2px
    style G2S fill:#a1e7e6,stroke:#333,stroke-width:2px
    style Death fill:#f8b195,stroke:#333,stroke-width:2px

Can this model oscillate under physically meaningful situations and, if so, when?

\[\frac{dG1}{dt} = - \alpha [G1] + 2 \beta [G2] - \gamma [G1]\]

\[\frac{dG2}{dt} = \alpha [G1] - \beta [G2] - \gamma [G2]\]

\[J = \begin{pmatrix} -\alpha - \gamma & 2\beta \\ \alpha & -\beta - \gamma \\ \end{pmatrix} \]

\[\lambda = \frac{1}{2} \left( -\alpha -\beta -2 \gamma \pm \sqrt{\alpha^2 + 6 \alpha \beta + \beta ^2} \right)\]

\(\alpha\) and/or \(\beta\) would have to be negative to oscillate.
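
A quick numerical check of this conclusion (the rate values are arbitrary):

import numpy as np

alpha, beta, gamma = 0.6, 0.4, 0.1  # arbitrary positive rates
J = np.array([[-alpha - gamma, 2.0 * beta],
              [alpha, -beta - gamma]])
print(np.linalg.eigvals(J))  # always real when alpha and beta are positive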

Implementation Note: Stiff Systems

  • Very roughly, most ODE solvers take steps inversely proportional to the rate at which the state is changing
  • For systems where there are two processes operating on differing timescales, this can be problematic
    • If everything happens really fast, the system will come to equilibrium quickly
    • If everything is slow, you can take longer steps
  • A typical biological example: receptor–ligand binding equilibrates in milliseconds, while downstream transcriptional responses occur over hours — a \(10^6\)-fold difference in timescale
  • Stiff solvers additionally require the Jacobian matrix
    • This very roughly allows them to keep track of these differences in timescales
  • odeint can approximate this numerically for you
    • Sometimes it’s faster/better to provide it via the Dfun argument
  • For stiff problems with solve_ivp, prefer method='Radau' or method='BDF'
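
A toy stiff problem (our own illustration, not from the slides) shows why the solver choice matters; the fast relaxation rate of 1000 forces an explicit method to take tiny steps:

import numpy as np
from scipy.integrate import solve_ivp

def stiff_rhs(t, y):
    # Fast relaxation (rate 1000) onto a slowly varying solution near cos(t)
    return -1000.0 * (y - np.cos(t))

for method in ("RK45", "BDF"):
    sol = solve_ivp(stiff_rhs, (0.0, 10.0), [0.0], method=method)
    print(method, "steps:", sol.t.size)
# The explicit RK45 needs thousands of small steps; the implicit BDF needs far fewer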

Fitting ODE Models to Data

  • ODE models have no closed-form solution in general — must simulate to evaluate the likelihood
  • Fitting is an outer optimization loop around numerical integration:
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

def residuals(params):
    # Simulate the model at the observation times, then score the squared error
    # (model, t_span, y0, t_obs, y_obs, params_init are defined elsewhere)
    sol = solve_ivp(model, t_span, y0, args=(params,), t_eval=t_obs)
    return np.sum((sol.y.T - y_obs)**2)

result = minimize(residuals, params_init)
  • Can constrain to oscillatory solutions by checking eigenvalues of \(J\) at steady state
    • Oscillation requires eigenvalues with nonzero imaginary part

Machine Learning as a Dynamical System

Optimization as a Discrete-Time ODE

  • Every iterative ML optimizer is a discrete-time dynamical system
    • State vector: model parameters \(\theta_t\) (and, for adaptive optimizers, auxiliary moment estimates)
    • Update rule: \(\theta_{t+1} = \theta_t - \eta \, g_t\) (analogue of an Euler integration step)
  • The learning rate \(\eta\) plays the role of the integration step size
    • Too small: slow convergence (fine-grained but inefficient)
    • Too large: oscillation or divergence — exactly as in stiff ODE solvers
  • Gradient descent with step size \(\eta\) on \(L(\theta) = \theta^2/2\) gives \(\theta_{t+1} = (1-\eta)\theta_t\)
  • This is a linear map with discrete eigenvalue \(\lambda = 1-\eta\)
  • \(|\lambda| < 1\): convergence; \(\lambda < 0\): sign-alternating (oscillatory)
  • The stability condition \(0 < \eta < 2/L\) (\(L\) = Lipschitz constant of the gradient) mirrors the ODE condition that all eigenvalues have negative real part

Stability Analysis of Gradient Descent

For a quadratic loss \(L(\theta) = \frac{1}{2}\theta^T H \theta\), the gradient descent update is:

\[\theta_{t+1} = (I - \eta H)\,\theta_t\]

The discrete eigenvalues are \(\lambda_i = 1 - \eta h_i\), where \(h_i\) are eigenvalues of the Hessian \(H\).

| Eigenvalue \(\lambda_i\) | Fixed-Point Type | Behavior |
|---|---|---|
| \(0 < \lambda < 1\) | Stable node | Monotone convergence |
| \(-1 < \lambda < 0\) | Stable spiral | Oscillatory convergence |
| \(\lambda < -1\) | Unstable | Divergence |
| \(\lambda = 0\) | Dead-beat | Converges in one step |

Stability Analysis of Gradient Descent

Stability requires \(0 < \eta < \frac{2}{h_\text{max}}\) — controlled entirely by the largest Hessian eigenvalue.

Three regimes of gradient descent for different learning rates: monotone convergence, oscillatory convergence, and divergence
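
The three regimes take only a few lines to reproduce on the scalar loss \(L(\theta) = h\theta^2/2\) (values chosen for illustration):

h = 1.0  # Hessian eigenvalue of the quadratic loss
for eta in (0.5, 1.5, 2.5):  # below 1/h, between 1/h and 2/h, above 2/h
    theta = 1.0
    for _ in range(20):
        theta = (1.0 - eta * h) * theta  # the gradient descent map
    print(eta, theta)
# eta=0.5: monotone decay; eta=1.5: decaying oscillation; eta=2.5: divergence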

The Condition Number Problem

  • Realistic loss surfaces have Hessian eigenvalues spanning many orders of magnitude (ill-conditioned)
  • For \(L(\theta) = \frac{a}{2}\theta_1^2 + \frac{b}{2}\theta_2^2\) with \(a \gg b\):
    • Stability requires \(\eta < 2/a\) — set by the stiff direction
    • The slow direction \(\theta_2\) then requires \(\sim \frac{a}{b}\) more steps to converge

The Condition Number Problem

  • This is precisely the stiff ODE problem: fast and slow timescales coexist
    • Just as stiff ODE solvers need the Jacobian, adaptive optimizers track local curvature

Phase portraits of gradient descent on an ill-conditioned quadratic loss showing monotone, oscillatory, and diverging trajectories

  • Condition number a/b is directly analogous to the ratio of timescales in a stiff ODE
  • The same phenomenon that makes receptor-ligand binding + transcription hard to simulate also makes training deep networks hard
  • Preconditioning (Adam, L-BFGS) addresses this exactly as implicit ODE solvers do for stiff systems

Learning Rate as a Bifurcation Parameter

  • As \(\eta\) increases past \(1/h_\text{max}\), the fixed point changes from a stable node to a stable spiral — a bifurcation
  • Past \(2/h_\text{max}\): the fixed point becomes unstable — training diverges
  • This is structurally identical to the bifurcations we studied for ODE systems

Bifurcation diagram showing how final parameter values depend on learning rate, with stable, oscillatory, and divergent regimes

  • Learning rate schedules (warmup + decay) are a way to navigate the bifurcation landscape
  • Start with small η to find the basin, then increase (warmup) for speed, then decay as you approach the optimum
  • “Loss spikes” during training are transient exits from the basin of attraction — the system recovers when the trajectory re-enters

Adam as a Coupled Dynamical System

Adam maintains three state variables — exactly like an ODE system with three coupled equations:

\[m_t = \beta_1 m_{t-1} + (1-\beta_1)\,g_t \qquad \text{(1st moment — smoothed gradient)}\]

\[v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2 \qquad \text{(2nd moment — smoothed curvature)}\]

\[\theta_t = \theta_{t-1} - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \varepsilon} \qquad \text{(parameter update)}\]

  • \(m_t\) and \(v_t\) are state variables with their own dynamics (exponential moving averages)
  • \(\beta_1, \beta_2\) are decay constants — analogous to rate constants in ODE models
  • The effective step size \(\eta / \sqrt{v_t}\) adapts to local curvature, making Adam more robust to ill-conditioning than SGD
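
A minimal scalar implementation of these three coupled updates (standard Adam defaults; the function name and test problem are ours):

import numpy as np

def adam_minimize(grad, theta0, eta=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g     # 1st moment: smoothed gradient
        v = beta2 * v + (1.0 - beta2) * g**2  # 2nd moment: smoothed squared gradient
        m_hat = m / (1.0 - beta1**t)          # bias corrections for zero initialization
        v_hat = v / (1.0 - beta2**t)
        theta -= eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Minimize L(theta) = theta^2 / 2, whose gradient is theta; ends near 0
print(adam_minimize(lambda th: th, theta0=5.0))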

Adam as a Coupled Dynamical System

Adam optimizer state variables over training: parameter trajectory, first moment, second moment, and effective step size

  • The bias correction \(1/(1-\beta^t)\) is the transient response of the exponential filter, the same as the early-time behavior of a first-order ODE with a zero initial condition
  • Fixed points of the Adam update equations: \(m^* = g\), \(v^* = g^2\), giving a constant effective step of \(\eta\), which is SGD at the optimum
  • Momentum (\(\beta_1\)) is literally the same as inertia in a damped harmonic oscillator: it smooths the gradient signal, suppressing oscillations

Reading & Resources

Review Questions

  1. What are the steady-state solutions of an ODE model? How do you solve for them?
  2. What is a phase-space diagram?
  3. What property of ODE models ensures that solutions can never cross in phase space?
  4. What is a Jacobian matrix? How do you calculate it from an ODE system?
  5. What are the eigenvalues and eigenvectors of a matrix?
  6. What is the behavior of an ODE system if its eigenvalues are all positive and real? How about negative and real?
  7. What is the behavior of an ODE system if all its eigenvalues are imaginary and positive? How about imaginary and negative?
  8. What does it mean if the eigenvalues of an ODE system give conflicting answers (e.g. one is real and negative, the other is imaginary)?
  9. How can you fit an ODE model to a series of measurements over time?
  10. How could you constrain an ODE system so that you only consider parameter values that have oscillation?
  11. Why does gradient descent on an ill-conditioned loss surface behave like a stiff ODE? What do adaptive optimizers like Adam do to address this?
  12. What is the analogue of a bifurcation in the context of optimizer dynamics? What happens at the stability boundary \(\eta = 2/h_\text{max}\)?