Cramér-Rao Bound

Normal Model: Known vs Unknown Variance

This notebook demonstrates the Cramér-Rao Bound (CRB) using a natural, realistic example: estimating the mean and variance of a normal distribution. We’ll explore two scenarios:

  1. Case A: Variance \sigma^2 is known → 1D parameter \theta = \mu
  2. Case B: Variance \sigma^2 is unknown → 2D parameter \theta = (\mu, \sigma^2)

Case A: \sigma^2 Known — Classical CRB Example

Model: X_1, \dots, X_n \sim \mathcal{N}(\mu, \sigma^2) with \sigma^2 known.

Parameter: \theta = \mu (1D)

Likelihood and Score

For one observation X, the log-likelihood is:

\log f(x; \mu) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2}

The score function is:

\frac{\partial}{\partial \mu} \log f(x; \mu) = \frac{x-\mu}{\sigma^2}
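As a sanity check, the score can also be derived symbolically; below is a minimal sketch using sympy (the use of sympy here is an illustration, not part of the original derivation):

```python
import sympy as sp

# Symbols: x and mu are real, sigma^2 is positive
x, mu = sp.symbols('x mu', real=True)
sigma2 = sp.symbols('sigma2', positive=True)

# Log-likelihood of one observation under N(mu, sigma2)
log_f = -sp.Rational(1, 2) * sp.log(2 * sp.pi * sigma2) - (x - mu)**2 / (2 * sigma2)

# Score: derivative of the log-likelihood with respect to mu
print(sp.simplify(sp.diff(log_f, mu)))  # -> (x - mu)/sigma2
```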

Fisher Information

The Fisher information is defined as the expected value of the squared score function:

I_1(\mu) = \mathcal{E} \left\{ \left(\frac{\partial}{\partial \mu} \log f(X; \mu)\right)^2 \right\} = \mathcal{E} \left\{ \left(\frac{X-\mu}{\sigma^2}\right)^2 \right\} = \frac{\mathcal{E}\{(X-\mu)^2\}}{\sigma^4} = \frac{\sigma^2}{\sigma^4} = \frac{1}{\sigma^2}

For n i.i.d. samples: I_n(\mu) = \frac{n}{\sigma^2}
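This identity is easy to verify numerically by averaging the squared score over simulated draws. A minimal sketch, with illustrative parameter values, seed, and sample size:

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed
mu, sigma2 = 2.0, 1.5           # illustrative true parameters
x = rng.normal(mu, np.sqrt(sigma2), size=1_000_000)

# Score for one observation: d/dmu log f = (x - mu) / sigma^2
score = (x - mu) / sigma2

# Fisher information = E[score^2]; should be close to 1/sigma^2
print(np.mean(score**2))  # ~ 0.6667
print(1 / sigma2)         # 0.6667
```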

Cramér-Rao Bound

\mathrm{Var}(\hat{\mu}) \geq \frac{1}{I_n(\mu)} = \frac{\sigma^2}{n}

Estimator: Sample Mean

\hat{\mu} = \frac{1}{n}\sum_{i=1}^n X_i

Derivation of Properties

1. Unbiasedness:

Since X_i \sim \mathcal{N}(\mu, \sigma^2) for i = 1, \dots, n, we have \mathcal{E} \{ X_i \} = \mu:

\mathcal{E} \{ \hat{\mu} \} = \mathcal{E} \left\{ \frac{1}{n}\sum_{i=1}^n X_i \right\} = \frac{1}{n}\sum_{i=1}^n \mathcal{E} \{ X_i \} = \frac{1}{n} \cdot n\mu = \mu

2. Variance:

Since the X_i are i.i.d. with \mathrm{Var}(X_i) = \sigma^2:

\mathrm{Var}(\hat{\mu}) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}(X_i) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}

This exactly equals the CRB: \frac{\sigma^2}{n} = \frac{1}{I_n(\mu)}.

Conclusion

\hat{\mu} is efficient — it achieves the CRB.

Numerical Example

Sample size: n = 30
True variance: σ² = 1.0

CRB for μ: 0.033333
Empirical Var(μ̂): 0.033935
Ratio (should be ≈ 1.0): 1.018049

✓ Sample mean attains CRB: True
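Below is a sketch of the kind of simulation that could produce the numbers above; the seed and trial count are assumptions, so exact values will differ:

```python
import numpy as np

rng = np.random.default_rng(42)  # assumed seed; results vary with it
n, sigma2, mu = 30, 1.0, 0.0     # matches the setup above
n_trials = 10_000                # assumed number of Monte Carlo trials

# Draw n_trials datasets of size n and compute the sample mean of each
samples = rng.normal(mu, np.sqrt(sigma2), size=(n_trials, n))
mu_hat = samples.mean(axis=1)

crb = sigma2 / n
emp_var = mu_hat.var(ddof=1)
print(f"CRB for mu:           {crb:.6f}")
print(f"Empirical Var(mu_hat): {emp_var:.6f}")
print(f"Ratio:                {emp_var / crb:.6f}")
```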

Case B: \sigma^2 Unknown — 2D CRB

Model: X_1, \dots, X_n \sim \mathcal{N}(\mu, \sigma^2) with \sigma^2 unknown.

Parameter vector: \theta = (\mu, \sigma^2) (2D). Let v = \sigma^2 for clarity.

Likelihood and Score

For one observation X, the log-likelihood is:

\log f(x; \mu, v) = -\frac{1}{2}\log(2\pi v) - \frac{(x-\mu)^2}{2v}

The gradient components are:

\frac{\partial}{\partial \mu} \log f = \frac{x-\mu}{v}, \qquad \frac{\partial}{\partial v} \log f = -\frac{1}{2v} + \frac{(x-\mu)^2}{2v^2}

Fisher Information Matrix

The Fisher information matrix is defined as the expected value of the outer product of the score vector:

I_1(\mu, v) = \mathcal{E} \left\{ \nabla \log f(X; \mu, v) \cdot (\nabla \log f(X; \mu, v))^T \right\}

where the (i,j)-th element is:

[I_1(\mu, v)]_{ij} = \mathcal{E} \left\{ \frac{\partial}{\partial \theta_i} \log f(X; \mu, v) \cdot \frac{\partial}{\partial \theta_j} \log f(X; \mu, v) \right\}

For one sample, taking expectations gives the matrix below (the off-diagonal entries vanish because the third central moment \mathcal{E}\{(X-\mu)^3\} = 0, and the lower-right entry uses \mathcal{E}\{(X-\mu)^4\} = 3v^2):

I_1(\mu, v) = \begin{pmatrix} 1/v & 0 \\ 0 & 1/(2v^2) \end{pmatrix}

For n samples: I_n(\mu, v) = n I_1(\mu, v), so the inverse is:

I_n(\mu, v)^{-1} = \frac{1}{n}\begin{pmatrix} v & 0 \\ 0 & 2v^2 \end{pmatrix}
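The closed form above can be checked by averaging the outer product of simulated score vectors. A minimal sketch, with illustrative parameter values and sample size:

```python
import numpy as np

rng = np.random.default_rng(1)  # illustrative seed
mu, v = 0.0, 2.0                # illustrative true (mu, sigma^2)
x = rng.normal(mu, np.sqrt(v), size=1_000_000)

# Score components from the gradient above
s_mu = (x - mu) / v
s_v = -1 / (2 * v) + (x - mu) ** 2 / (2 * v**2)

scores = np.stack([s_mu, s_v])   # shape (2, N)
I1 = scores @ scores.T / x.size  # E[score score^T], estimated by averaging
print(I1)                        # ~ [[1/v, 0], [0, 1/(2v^2)]]
print(np.array([[1 / v, 0], [0, 1 / (2 * v**2)]]))
```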

Cramér-Rao Bounds

For any unbiased estimator (\hat{\mu}, \widehat{v}):

\mathrm{Var}(\hat{\mu}) \geq \frac{v}{n} = \frac{\sigma^2}{n}, \qquad \mathrm{Var}(\widehat{v}) \geq \frac{2v^2}{n} = \frac{2\sigma^4}{n}

Key observation: The bound for \mu is exactly the same as in the “known variance” case! This is because the Fisher information matrix is diagonal, so inverting it does not couple the parameters: not knowing \sigma^2 does not inflate the bound for \mu.

Estimators

  1. Sample mean: \hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i

    • Unbiased: \mathcal{E} \{ \hat{\mu} \} = \mu
    • Variance: \mathrm{Var}(\hat{\mu}) = \sigma^2/n, which attains the CRB exactly
  2. Unbiased sample variance: S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2

    • Unbiased: \mathcal{E} \{ S^2 \} = \sigma^2
    • Variance: \mathrm{Var}(S^2) = \frac{2\sigma^4}{n-1} (since (n-1)S^2/\sigma^2 \sim \chi^2_{n-1} and \mathrm{Var}(\chi^2_k) = 2k)
    • Compare with CRB: \frac{2\sigma^4}{n-1} > \frac{2\sigma^4}{n} for finite n
    • Does not attain CRB for finite sample size
  3. MLE for variance: \hat{v}_{\text{ML}} = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2

    • Biased: \mathcal{E} \{ \hat{v}_{\text{ML}} \} = \frac{n-1}{n}\sigma^2
    • Smaller variance than S^2
    • Asymptotically efficient: approaches CRB as n \to \infty
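For concreteness, here is a minimal sketch computing the three estimators above on one simulated dataset (the seed and parameter values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)  # assumed seed
n, mu, sigma2 = 30, 0.0, 1.0    # assumed true parameters
x = rng.normal(mu, np.sqrt(sigma2), size=n)

mu_hat = x.mean()     # sample mean
s2 = x.var(ddof=1)    # unbiased sample variance (divides by n-1)
v_ml = x.var(ddof=0)  # MLE of variance (divides by n)

print(mu_hat, s2, v_ml)
# The two variance estimators differ exactly by the factor (n-1)/n
print(np.isclose(v_ml, (n - 1) / n * s2))  # True
```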

Summary

This gives us a natural comparison in the same model:

  • \mu (known \sigma^2): Efficient estimator
  • \mu (unknown \sigma^2): Still efficient
  • \sigma^2: Natural unbiased estimator does not hit CRB; MLE is asymptotically efficient but biased

Monte Carlo Results (n_trials = 10000)
================================================================================

For μ̂ (sample mean):
  n   CRB_mu  empirical_var_mu
 10 0.100000          0.096057
 30 0.033333          0.033168
100 0.010000          0.009984

✓ μ̂ attains CRB for all n (ratio ≈ 1.0)

================================================================================

For σ² estimators:
  n  CRB_sigma2  empirical_var_S2  empirical_var_ML  bias_S2   bias_ML
 10    0.200000          0.226105          0.183145 0.007915 -0.092876
 30    0.066667          0.069099          0.064569 0.001492 -0.031891
100    0.020000          0.019565          0.019175 0.001725 -0.008292

• S² (unbiased): Does NOT attain CRB for finite n
• v̂_ML (MLE): Approaches CRB as n increases, but is biased
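
A sketch of the kind of Monte Carlo loop that could generate the tables above (seed and trial count are assumptions, so exact values will differ):

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed
mu, sigma2, n_trials = 0.0, 1.0, 10_000

for n in (10, 30, 100):
    # n_trials datasets of size n
    x = rng.normal(mu, np.sqrt(sigma2), size=(n_trials, n))
    s2 = x.var(axis=1, ddof=1)    # unbiased sample variance per dataset
    v_ml = x.var(axis=1, ddof=0)  # MLE of variance per dataset
    crb = 2 * sigma2**2 / n       # CRB for sigma^2
    print(f"n={n:4d}  CRB={crb:.6f}  "
          f"Var(S2)={s2.var(ddof=1):.6f}  Var(ML)={v_ml.var(ddof=1):.6f}  "
          f"bias(S2)={s2.mean() - sigma2:+.6f}  bias(ML)={v_ml.mean() - sigma2:+.6f}")
```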