Cramér-Rao Bound
Normal Model: Known vs Unknown Variance

Output of the numerical example in Case A below:

Sample size: n = 30
True variance: σ² = 1.0
CRB for μ: 0.033333
Empirical Var(μ̂): 0.033935
Ratio (should be ≈ 1.0): 1.018049
✓ Sample mean attains CRB: True
This notebook demonstrates the Cramér-Rao Bound (CRB) using a natural, realistic example: estimating the mean and variance of a normal distribution. We’ll explore two scenarios:
- Case A: Variance \sigma^2 is known → 1D parameter \theta = \mu
- Case B: Variance \sigma^2 is unknown → 2D parameter \theta = (\mu, \sigma^2)
Case A: \sigma^2 Known — Classical CRB Example
Model: X_1, \dots, X_n \sim \mathcal{N}(\mu, \sigma^2) with \sigma^2 known.
Parameter: \theta = \mu (1D)
Likelihood and Score
For one observation X, the log-likelihood is:
\log f(x; \mu) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2}
The score function is:
\frac{\partial}{\partial \mu} \log f(x; \mu) = \frac{x-\mu}{\sigma^2}
Fisher Information
The Fisher information is defined as the expected value of the squared score function:
I_1(\mu) = \mathcal{E} \left\{ \left(\frac{\partial}{\partial \mu} \log f(X; \mu)\right)^2 \right\} = \mathcal{E} \left\{ \left(\frac{X-\mu}{\sigma^2}\right)^2 \right\} = \frac{1}{\sigma^2}
For n i.i.d. samples: I_n(\mu) = \frac{n}{\sigma^2}
Cramér-Rao Bound
\mathrm{Var}(\hat{\mu}) \geq \frac{1}{I_n(\mu)} = \frac{\sigma^2}{n}
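The per-sample information I_1(\mu) = 1/\sigma^2 can be checked symbolically; a small sketch using sympy's stats module (the symbol names are illustrative):

```python
# Symbolic sanity check (sympy): per-sample Fisher information for mu is 1/sigma^2.
import sympy as sp
from sympy.stats import Normal, E

mu = sp.symbols('mu', real=True)
sigma2 = sp.symbols('sigma2', positive=True)

X = Normal('X', mu, sp.sqrt(sigma2))   # X ~ N(mu, sigma2)
score = (X - mu) / sigma2              # d/dmu log f(X; mu)
I1 = sp.simplify(E(score**2))          # Fisher information = E[score^2]
print(I1)                              # -> 1/sigma2
```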
Estimator: Sample Mean
\hat{\mu} = \frac{1}{n}\sum_{i=1}^n X_i
Derivation of Properties
1. Unbiasedness:
Since X_i \sim \mathcal{N}(\mu, \sigma^2) for i = 1, \dots, n, we have \mathcal{E} \{ X_i \} = \mu:
\mathcal{E} \{ \hat{\mu} \} = \mathcal{E} \left\{ \frac{1}{n}\sum_{i=1}^n X_i \right\} = \frac{1}{n}\sum_{i=1}^n \mathcal{E} \{ X_i \} = \frac{1}{n} \cdot n\mu = \mu
2. Variance:
Since the X_i are i.i.d. with \mathrm{Var}(X_i) = \sigma^2:
\mathrm{Var}(\hat{\mu}) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}(X_i) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}
This exactly equals the CRB: \frac{\sigma^2}{n} = \frac{1}{I_n(\mu)}.
Conclusion
\hat{\mu} is efficient — it achieves the CRB.
Numerical Example
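A simulation along the following lines produces output like that shown at the top of the notebook; a minimal NumPy sketch (the seed and trial count are arbitrary choices):

```python
# Monte Carlo check that the sample mean attains the CRB (Case A).
# Assumed settings match the printed output: n = 30, sigma^2 = 1.
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma2 = 30, 0.0, 1.0
n_trials = 100_000

samples = rng.normal(mu, np.sqrt(sigma2), size=(n_trials, n))
mu_hat = samples.mean(axis=1)          # sample mean in each trial

crb = sigma2 / n                       # CRB for mu: sigma^2 / n
emp_var = mu_hat.var(ddof=1)           # empirical Var(mu_hat) across trials

print(f"CRB for mu:         {crb:.6f}")
print(f"Empirical Var(mu^): {emp_var:.6f}")
print(f"Ratio (~ 1.0):      {emp_var / crb:.6f}")
```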
Case B: \sigma^2 Unknown — 2D CRB
Model: X_1, \dots, X_n \sim \mathcal{N}(\mu, \sigma^2) with \sigma^2 unknown.
Parameter vector: \theta = (\mu, \sigma^2) (2D). Let v = \sigma^2 for clarity.
Likelihood and Score
For one observation X, the log-likelihood is:
\log f(x; \mu, v) = -\frac{1}{2}\log(2\pi v) - \frac{(x-\mu)^2}{2v}
The gradient components are:
\frac{\partial}{\partial \mu} \log f = \frac{x-\mu}{v}, \qquad \frac{\partial}{\partial v} \log f = -\frac{1}{2v} + \frac{(x-\mu)^2}{2v^2}
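As a quick sanity check, both score components should average to zero under the true parameters (the standard regularity property behind the Fisher information); a minimal numerical sketch, assuming mu = 0 and v = 1:

```python
# Numerical check that both score components have mean zero under the true
# parameters. The values mu = 0, v = 1 are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(2)
mu, v = 0.0, 1.0
x = rng.normal(mu, np.sqrt(v), size=2_000_000)

s_mu = (x - mu) / v                             # score w.r.t. mu
s_v = -1.0/(2*v) + (x - mu)**2 / (2*v**2)       # score w.r.t. v
print(s_mu.mean(), s_v.mean())                  # both close to 0
```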
Fisher Information Matrix
The Fisher information matrix is defined as the expected outer product of the score vector with itself:
I_1(\mu, v) = \mathcal{E} \left\{ \nabla \log f(X; \mu, v) \cdot (\nabla \log f(X; \mu, v))^T \right\}
where the (i,j)-th element is:
[I_1(\mu, v)]_{ij} = \mathcal{E} \left\{ \frac{\partial}{\partial \theta_i} \log f(X; \mu, v) \cdot \frac{\partial}{\partial \theta_j} \log f(X; \mu, v) \right\}
For one sample, taking these expectations gives the matrix below; the off-diagonal entries vanish because the third central moment of a normal distribution is zero, \mathcal{E} \{ (X-\mu)^3 \} = 0:
I_1(\mu, v) = \begin{pmatrix} 1/v & 0 \\ 0 & 1/(2v^2) \end{pmatrix}
For n samples: I_n(\mu, v) = n I_1(\mu, v), so the inverse is:
I_n(\mu, v)^{-1} = \frac{1}{n}\begin{pmatrix} v & 0 \\ 0 & 2v^2 \end{pmatrix}
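The outer-product definition can be verified numerically by averaging score outer products over simulated data; a sketch assuming mu = 0 and v = 1:

```python
# Monte Carlo estimate of the 2x2 Fisher information matrix via
# I_1 = E[grad log f * (grad log f)^T], compared with the closed form.
# Assumed parameter values for the check: mu = 0, v = 1.
import numpy as np

rng = np.random.default_rng(1)
mu, v = 0.0, 1.0
x = rng.normal(mu, np.sqrt(v), size=1_000_000)

# Score components of the N(mu, v) log-density
s_mu = (x - mu) / v
s_v = -1.0/(2*v) + (x - mu)**2 / (2*v**2)

scores = np.stack([s_mu, s_v])          # shape (2, N)
I1_mc = scores @ scores.T / x.size      # Monte Carlo estimate of E[s s^T]

I1_exact = np.array([[1/v, 0.0], [0.0, 1/(2*v**2)]])
print(np.round(I1_mc, 3))               # close to [[1, 0], [0, 0.5]]
print(I1_exact)
```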
Cramér-Rao Bounds
For any unbiased estimator (\hat{\mu}, \widehat{v}):
\mathrm{Var}(\hat{\mu}) \geq \frac{v}{n} = \frac{\sigma^2}{n}, \qquad \mathrm{Var}(\widehat{v}) \geq \frac{2v^2}{n} = \frac{2\sigma^4}{n}
Key observation: The bound for \mu is exactly the same as in the “known variance” case!
Estimators
Sample mean: \hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i
- Unbiased: \mathcal{E} \{ \hat{\mu} \} = \mu
- Variance: \mathrm{Var}(\hat{\mu}) = \sigma^2/n → attains CRB exactly
Unbiased sample variance: S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2
- Unbiased: \mathcal{E} \{ S^2 \} = \sigma^2
- Variance: \mathrm{Var}(S^2) = \frac{2\sigma^4}{n-1}
- Compare with CRB: \frac{2\sigma^4}{n-1} > \frac{2\sigma^4}{n} for finite n
- Does not attain CRB for finite sample size
MLE for variance: \hat{v}_{\text{ML}} = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2
- Biased: \mathcal{E} \{ \hat{v}_{\text{ML}} \} = \frac{n-1}{n}\sigma^2
- Smaller variance than S^2
- Asymptotically efficient: approaches CRB as n \to \infty
Summary
This gives us a natural comparison in the same model:
- \mu (known \sigma^2): Efficient estimator
- \mu (unknown \sigma^2): Still efficient
- \sigma^2: Natural unbiased estimator does not hit CRB; MLE is asymptotically efficient but biased
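The comparison above can be reproduced by simulation; a minimal sketch in the spirit of the Monte Carlo results below (assumed settings: mu = 0, sigma^2 = 1, 100,000 trials per sample size):

```python
# Monte Carlo comparison of the two variance estimators against the CRB
# 2*sigma^4/n. Settings (mu = 0, sigma^2 = 1, trial count) are assumptions.
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2 = 0.0, 1.0
n_trials = 100_000

results = {}
for n in (10, 30, 100):
    x = rng.normal(mu, np.sqrt(sigma2), size=(n_trials, n))
    S2 = x.var(axis=1, ddof=1)      # unbiased sample variance (divides by n-1)
    v_ml = x.var(axis=1, ddof=0)    # MLE of the variance (divides by n)
    results[n] = {
        "crb": 2 * sigma2**2 / n,
        "var_S2": S2.var(ddof=1),
        "var_ml": v_ml.var(ddof=1),
        "bias_S2": S2.mean() - sigma2,
        "bias_ml": v_ml.mean() - sigma2,
    }
    r = results[n]
    print(f"n={n:4d}  CRB={r['crb']:.4f}  Var(S2)={r['var_S2']:.4f}  "
          f"Var(ML)={r['var_ml']:.4f}  bias(ML)={r['bias_ml']:+.4f}")
```

Note that S² should show a variance strictly above the CRB for every finite n, while the MLE shows the bias -σ²/n derived above.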
Monte Carlo Results (n_trials = 10000)
================================================================================
For μ̂ (sample mean):
n CRB_mu empirical_var_mu
10 0.100000 0.096057
30 0.033333 0.033168
100 0.010000 0.009984
✓ μ̂ attains CRB for all n (ratio ≈ 1.0)
================================================================================
For σ² estimators:
n CRB_sigma2 empirical_var_S2 empirical_var_ML bias_S2 bias_ML
10 0.200000 0.226105 0.183145 0.007915 -0.092876
30 0.066667 0.069099 0.064569 0.001492 -0.031891
100 0.020000 0.019565 0.019175 0.001725 -0.008292
• S² (unbiased): Does NOT attain CRB for finite n
• v̂_ML (MLE): Approaches CRB as n increases, but is biased