Classical Estimation

Estimating Total Production from Observed Items

Problem Statement

  • A factory produces an unknown but deterministic number of objects: \phi
  • All items are numbered consecutively from 1 to \phi
  • We observe K of these items, selected at random without replacement
  • From the observed item numbers, we want to estimate \widehat{\phi}
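This is the classical "German tank" setup. As a minimal simulation sketch (the estimator below, the minimum-variance unbiased estimator \hat{\phi} = m\,(K+1)/K - 1 with m = \max_i x_i, is stated here without derivation; \phi = 1000 and the seed are arbitrary choices):

```python
import random

def estimate_total(observed):
    """Minimum-variance unbiased estimator for the total count phi,
    given item numbers sampled without replacement from 1..phi:
        phi_hat = m * (K + 1) / K - 1,  m = max(observed), K = len(observed)."""
    m, k = max(observed), len(observed)
    return m * (k + 1) / k - 1

# Simulate: phi = 1000 items produced, we observe K = 5 of them
random.seed(0)
phi, K = 1000, 5
estimates = [estimate_total(random.sample(range(1, phi + 1), K))
             for _ in range(10_000)]
mean_est = sum(estimates) / len(estimates)
print(round(mean_est))  # should land close to phi = 1000 (unbiased)
```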

Mean and Variance Estimation

Consider an unknown distribution of X whose moments we want to estimate.

  • Let’s assume a uniform distribution for X \sim U[\mu-1, \mu+1]
  • We observe N = 5 samples and compute the sample estimates \hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i, \quad \hat{\sigma}^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \hat{\mu})^2
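A short numerical sketch of these two estimators (the value \mu = 3 and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)
mu, N = 3.0, 5

# N = 5 draws from the assumed uniform distribution U[mu - 1, mu + 1]
x = rng.uniform(mu - 1, mu + 1, size=N)

mu_hat = x.mean()            # sample mean, (1/N) * sum of x_i
sigma2_hat = x.var(ddof=1)   # sample variance with the unbiased 1/(N-1) factor

# For comparison: the true variance of U[mu-1, mu+1] is 2**2 / 12 = 1/3
print(mu_hat, sigma2_hat)
```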

MMSE and Least Squares for Linear Regression

Consider a regression problem with vector input \mathbf{X} and scalar output Y. We try to predict Y from \mathbf{X} with a linear function with coefficient vector \mathbf{a}: \widehat{Y} = \mathbf{a}^T \mathbf{X}

The Mean Square Error (MSE) cost function is J_{\mathrm{MSE}}(\mathbf{a}) = \mathcal{E}\big\{(Y - \hat{Y})^2\big\} = \mathcal{E}\big\{(Y - \mathbf{a}^T \mathbf{X})^2\big\}.

For minimization, the gradient of J(\mathbf{a}) with respect to \mathbf{a} (using \mathbf{a}^T\mathbf{X} = \mathbf{X}^T \mathbf{a}):

\nabla_{\mathbf{a}} J_{\mathrm{MSE}}(\mathbf{a}) = -2\,\mathcal{E}\{\mathbf{X}(Y - \mathbf{X}^T \mathbf{a})\} \overset{!}{=} 0.

For more details on vector derivatives see the Wikipedia entry.

Solving for \mathbf{a} yields

\mathbf{a}_{\mathrm{MMSE}} = \big(\mathcal{E}\{\mathbf{X}\mathbf{X}^T\}\big)^{-1} \mathcal{E}\{\mathbf{X}Y\} = \mathbf{R}_{\mathbf{XX}}^{-1} \, \mathbf{r}_{\mathbf{X}Y}.
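In practice the expectations \mathbf{R}_{\mathbf{XX}} and \mathbf{r}_{\mathbf{X}Y} are replaced by sample averages, which gives the least-squares solution. A minimal sketch with synthetic data (the true coefficients and the noise level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 1000, 2

# Synthetic data: Y = a_true^T X + small noise
a_true = np.array([1.5, -0.5])
X = rng.normal(size=(N, d))              # each row is one sample of the input vector
y = X @ a_true + 0.1 * rng.normal(size=N)

# Sample versions of R_XX = E{X X^T} and r_XY = E{X Y}
R_xx = X.T @ X / N
r_xy = X.T @ y / N

a_hat = np.linalg.solve(R_xx, r_xy)      # a = R_XX^{-1} r_XY
print(a_hat)                             # close to a_true
```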

Example

Applying this linear regression to jointly Gaussian X and Y is shown in the accompanying plot. Note that the slope of the regression line, a_\textrm{MMSE}, is not the correlation coefficient \rho_{XY} itself, but (for the zero-mean scalar case)

a_{\textrm{MMSE}} = \dfrac{\mathcal{E}\{XY\}}{\mathcal{E}\{X^2\}} = \rho_{XY} \, \dfrac{\sigma_Y}{\sigma_X} .
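This relation can be checked numerically for the zero-mean scalar case (the values of \rho, \sigma_X, \sigma_Y below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
rho, sigma_x, sigma_y = 0.8, 1.0, 2.0

# Jointly Gaussian (X, Y) with zero mean and the given covariance
cov = [[sigma_x ** 2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y ** 2]]
xy = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
x, y = xy[:, 0], xy[:, 1]

# Sample version of a_MMSE = E{XY} / E{X^2}
a_mmse = (x * y).mean() / (x ** 2).mean()
print(a_mmse, rho * sigma_y / sigma_x)   # both close to 0.8 * 2 / 1 = 1.6
```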

Likelihood Function: Triangular Distribution Example

Setup:

  • Model: X \sim \text{Tri}[a, b] with a < b and a \leq x_i \leq b for all i
  • Unknown parameters a and b
  • Observations: \mathbf{x} = [3.0, 4.0, 7.0]

Key Concepts:

  • The likelihood function L(a,b | \mathbf{x}) \propto \prod_i f(x_i | a,b) is the product of PDFs evaluated at each observation

Maximum Likelihood

  • Taking the logarithm converts the product into a sum: \log L(a,b | \mathbf{x}) = \sum_i \log f(x_i | a,b)
  • The likelihood is zero when a > \min(x_i) or b < \max(x_i) because at least one observation falls outside the support [a, b]
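A sketch of this behavior, assuming a symmetric triangular PDF on [a, b] with its mode at the midpoint (the section does not fix the exact shape, so that is an assumption of this example):

```python
import numpy as np

def tri_pdf(x, a, b):
    """PDF of a symmetric triangular distribution on [a, b]
    (mode at the midpoint; peak height 2 / (b - a))."""
    c, h = (a + b) / 2, 2.0 / (b - a)
    x = np.asarray(x, dtype=float)
    inside = (x >= a) & (x <= b)
    return np.where(inside, h * (1 - np.abs(x - c) / ((b - a) / 2)), 0.0)

def log_likelihood(a, b, x):
    """log L(a, b | x) = sum_i log f(x_i | a, b); -inf if any x_i has f = 0."""
    f = tri_pdf(x, a, b)
    return -np.inf if np.any(f == 0) else np.log(f).sum()

x = np.array([3.0, 4.0, 7.0])
# The likelihood vanishes as soon as an observation leaves [a, b]:
print(log_likelihood(3.5, 8.0, x))   # -inf, since a = 3.5 > min(x_i) = 3
print(log_likelihood(2.0, 8.0, x))   # finite
```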

Inconsistent Estimator: Periodogram

The periodogram is an estimate of the power spectral density (PSD) computed as:

P[k] = \frac{1}{N} |X[k]|^2 \quad \textrm{ with } \quad X[k] = \sum_{n=0}^{N-1} x[n] e^{-j2\pi kn/N}

Signal Setup

For white noise x[n] with variance \sigma^2, we estimate the PSD S_{XX}[k] at a fixed frequency k:

  • Mean: \mathcal{E}\{P[k]\} = \sigma^2 (unbiased)
  • Variance: \text{Var}\{P[k]\} = \sigma^4 (constant, independent of N), so the variance does not vanish as N \to \infty: the periodogram is not a consistent estimator
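A simulation sketch of this inconsistency: at a fixed DFT bin, the mean of P[k] stays near \sigma^2 and its variance stays near \sigma^4 no matter how large N is (the bin k = 3, \sigma^2 = 2, and the number of trials are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 2.0
trials, k = 5000, 3          # k: an arbitrary fixed bin (not 0 or N/2)

results = {}
for N in (64, 1024):
    # Many independent realizations of white Gaussian noise with variance sigma2
    x = rng.normal(scale=np.sqrt(sigma2), size=(trials, N))
    X = np.fft.fft(x, axis=1)
    P = np.abs(X[:, k]) ** 2 / N   # periodogram value at bin k
    results[N] = (P.mean(), P.var())
    print(N, results[N])
# The mean stays near sigma2 and the variance stays near sigma2**2
# for both N: more data does not reduce the estimator's variance.
```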