

\[\mathrm{PE}(\mathbf{z}_0) = \sigma_\epsilon^2 + \mathrm{Bias}^2\!\left(\hat{f}(\mathbf{z}_0)\right) + \mathrm{Var}\!\left(\hat{f}(\mathbf{z}_0)\right)\]

Question: Is this always true?
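To see the decomposition concretely, here is a minimal Monte-Carlo sketch (the true function, noise level, and cubic-polynomial model are illustrative assumptions, not from the source): it refits the model on many fresh training sets and compares \(\sigma_\epsilon^2 + \mathrm{Bias}^2 + \mathrm{Var}\) at a single test point against a direct estimate of the prediction error.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_true(x):
    return np.sin(2 * np.pi * x)          # ground-truth function (illustrative)

sigma_eps = 0.3                            # irreducible noise std
x0 = 0.25                                  # test point z_0
n_train, n_repeats, degree = 30, 2000, 3   # illustrative settings

preds = np.empty(n_repeats)
for r in range(n_repeats):
    # Draw a fresh training set and refit the model each time.
    x = rng.uniform(0, 1, n_train)
    y = f_true(x) + rng.normal(0, sigma_eps, n_train)
    coefs = np.polyfit(x, y, degree)
    preds[r] = np.polyval(coefs, x0)

bias2 = (preds.mean() - f_true(x0)) ** 2   # squared bias at x0
var = preds.var()                          # variance of the fitted predictions at x0

# Monte-Carlo estimate of the expected squared prediction error at x0.
y0 = f_true(x0) + rng.normal(0, sigma_eps, n_repeats)
pe = np.mean((y0 - preds) ** 2)

print(f"sigma_eps^2 + bias^2 + var = {sigma_eps**2 + bias2 + var:.4f}")
print(f"Monte-Carlo PE(x0)         = {pe:.4f}")
```

With these settings the two numbers agree up to Monte-Carlo error.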
When the system is underdetermined, standard least-squares solvers (e.g., `numpy.linalg.lstsq`) return the minimum-norm solution:

\[\hat{\boldsymbol{\theta}}^+ = \mathbf{X}^\top (\mathbf{X}\mathbf{X}^\top)^{-1} \mathbf{y}\]

An analogy with ligand binding (a sketch follows the table):

| Analogy | Binding | Machine Learning |
|---|---|---|
| Parameters | \((k_\text{on}, k_\text{off})\) | Weights \(\boldsymbol{\theta}\) |
| Constraint | \(k_\text{off}/k_\text{on} = K_d\) | \(\mathbf{X}\boldsymbol{\theta} = \mathbf{y}\) |
| Prediction | \(\theta = [L]/(K_d + [L])\) | \(\hat{y} = \mathbf{x}^\top \boldsymbol{\theta}\) |
| Solution space | 1-D manifold | Affine subspace |
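A minimal sketch of the binding column (rate constants and concentrations are made-up values): every \((k_\text{on}, k_\text{off})\) pair with the same ratio \(K_d = k_\text{off}/k_\text{on}\) predicts the same occupancy curve, so equilibrium occupancy data pin down only a 1-D manifold of parameters.

```python
import numpy as np

def occupancy(L, k_on, k_off):
    # Equilibrium fractional occupancy; depends on the rates only through K_d = k_off / k_on.
    K_d = k_off / k_on
    return L / (K_d + L)

L = np.linspace(0.0, 10.0, 6)               # ligand concentrations (arbitrary units)

# Two different points on the 1-D manifold k_off / k_on = 2.0.
curve_a = occupancy(L, k_on=1.0, k_off=2.0)
curve_b = occupancy(L, k_on=5.0, k_off=10.0)

print(np.allclose(curve_a, curve_b))         # True: the data cannot distinguish them
```

Only the ratio is identifiable from equilibrium data, just as only \(\mathbf{X}\boldsymbol{\theta}\), not \(\boldsymbol{\theta}\) itself, is pinned down by an underdetermined linear system.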



Equivalently, \(\hat{\boldsymbol{\theta}}^+\) is the interpolating solution of smallest norm:

\[\hat{\boldsymbol{\theta}}^+ = \arg\min_{\boldsymbol{\theta}} \|\boldsymbol{\theta}\|_2 \quad \text{s.t.} \quad \mathbf{X}\boldsymbol{\theta} = \mathbf{y}\]
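As a quick numpy check (dimensions, seed, and data are arbitrary), the closed form \(\mathbf{X}^\top(\mathbf{X}\mathbf{X}^\top)^{-1}\mathbf{y}\) matches what `numpy.linalg.lstsq` returns on an underdetermined problem, and adding any null-space component to it only increases the norm:

```python
import numpy as np

rng = np.random.default_rng(1)

# Underdetermined system: fewer equations than unknowns (sizes are illustrative).
n, p = 5, 20
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Closed form X^T (X X^T)^{-1} y (valid when X has full row rank).
theta_plus = X.T @ np.linalg.solve(X @ X.T, y)

# lstsq returns the minimum-norm least-squares solution.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(theta_plus, theta_lstsq)

# Any other interpolating solution differs by a null-space component,
# which only increases the norm.
_, _, Vt = np.linalg.svd(X)
v = Vt[n:].T @ rng.normal(size=p - n)        # random element of the null space of X
theta_other = theta_plus + v
assert np.allclose(X @ theta_other, y)       # still interpolates the data
print(np.linalg.norm(theta_plus) <= np.linalg.norm(theta_other))  # True
```

Because \(\hat{\boldsymbol{\theta}}^+\) lies in the row space of \(\mathbf{X}\), it is orthogonal to every null-space direction, which is why it has the smallest norm among all interpolating solutions.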
Theorem (Cybenko, 1989; Hornik et al., 1989): A feedforward network with a single hidden layer containing sufficiently many neurons with a sigmoidal activation function can approximate any continuous function on a compact subset of \(\mathbb{R}^n\) to arbitrary precision. (Leshno et al., 1993, later showed that any non-polynomial activation suffices.)
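The theorem is existential: it says a good-enough network exists, not how to find it. Its flavor can still be illustrated with a small sketch using a single hidden layer of random tanh features and least-squares output weights (the target function, weight scales, and grid below are arbitrary choices): the attainable error on a fixed grid typically shrinks as the hidden layer widens.

```python
import numpy as np

rng = np.random.default_rng(2)

def target(x):
    return np.sin(3 * x) + 0.5 * np.cos(7 * x)    # continuous target on [0, 1]

x = np.linspace(0, 1, 400)[:, None]
y = target(x).ravel()

def one_hidden_layer_error(width):
    # Random hidden layer with tanh units; only the output weights are fit (least squares).
    W = rng.normal(scale=10.0, size=(1, width))
    b = rng.normal(scale=5.0, size=width)
    H = np.tanh(x @ W + b)                         # hidden activations, shape (400, width)
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # output-layer weights
    return np.max(np.abs(H @ beta - y))            # sup-norm error on the grid

for width in (5, 20, 100, 500):
    print(width, one_hidden_layer_error(width))
```

This is only an empirical illustration on a finite grid, not a proof; the theorem guarantees that for any continuous target and tolerance some finite width suffices.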