Classical Estimation
Estimating Total Production from Observed Items
Problem Statement
- A factory produces an unknown but deterministic number of objects: \phi
- All items are numbered consecutively from 1 to \phi
- We observe K of these items, selected at random (without replacement)
- From the observed item numbers, we want to estimate \widehat{\phi}
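The section does not fix a particular estimator, so as an illustration the sketch below uses the well-known minimum-variance unbiased estimator for this setting (the classical "German tank problem"), \widehat{\phi} = m(1 + 1/K) - 1 with m the largest observed serial number; the values of \phi and K are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

phi = 500   # true (unknown) total production -- illustrative value
K = 10      # number of observed items

# Draw K distinct serial numbers from 1..phi without replacement
serials = rng.choice(np.arange(1, phi + 1), size=K, replace=False)

m = serials.max()
# Minimum-variance unbiased estimator based on the sample maximum:
phi_hat = m * (1 + 1 / K) - 1
print(phi_hat)
```

The estimate scales the sample maximum up by the average gap between observed serial numbers.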
Mean and Variance Estimation
Consider an unknown distribution of X whose moments we want to estimate.
- Let’s assume a uniform distribution for X \sim U[\mu-1, \mu+1]
- We observe N = 5 samples and compute \hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i, \quad \hat{\sigma}^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \hat{\mu})^2
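These estimators can be checked numerically; a minimal sketch drawing N = 5 samples from U[\mu-1, \mu+1], where \mu = 3 is an arbitrary illustrative choice (the true variance of this uniform distribution is 2^2/12 = 1/3):

```python
import numpy as np

rng = np.random.default_rng(42)

mu = 3.0   # illustrative true mean (assumption)
N = 5
x = rng.uniform(mu - 1, mu + 1, size=N)

mu_hat = x.mean()              # sample mean
sigma2_hat = x.var(ddof=1)     # unbiased sample variance, 1/(N-1) normalization
print(mu_hat, sigma2_hat)
```

With only N = 5 samples both estimates fluctuate noticeably around the true values; averaging over many repetitions would recover \mu and 1/3.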
MMSE and Least Squares for Linear Regression
Consider a regression problem with vector input \mathbf{X} and scalar output Y. We try to predict Y from \mathbf{X} with a linear function with vector coefficients \mathbf{a}: \widehat{Y} = \mathbf{a}^T \mathbf{X}
The Mean Square Error (MSE) cost function is J_{\mathrm{MSE}}(\mathbf{a}) = \mathcal{E}\big\{(Y - \hat{Y})^2\big\} = \mathcal{E}\big\{(Y - \mathbf{a}^T \mathbf{X})^2\big\}.
For minimization, the gradient of J(\mathbf{a}) with respect to \mathbf{a} (using \mathbf{a}^T\mathbf{X} = \mathbf{X}^T \mathbf{a}):
\nabla_{\mathbf{a}} J_{\mathrm{MSE}}(\mathbf{a}) = -2\,\mathcal{E}\{\mathbf{X}(Y - \mathbf{X}^T \mathbf{a})\} \overset{!}{=} \mathbf{0}.
For more details on vector derivatives see the Wikipedia entry.
Solving for \mathbf{a} yields
\mathbf{a}_{\mathrm{MMSE}} = \big(\mathcal{E}\{\mathbf{X}\mathbf{X}^T\}\big)^{-1} \mathcal{E}\{\mathbf{X}Y\} = \mathbf{R}_{\mathbf{XX}}^{-1}\,\mathbf{r}_{\mathbf{X}Y}.
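In practice the expectations \mathbf{R}_{\mathbf{XX}} and \mathbf{r}_{\mathbf{X}Y} are replaced by sample averages, which turns the MMSE solution into the least-squares estimate. A sketch with synthetic data (the true coefficient vector and noise level are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

n, d = 1000, 3
a_true = np.array([2.0, -1.0, 0.5])        # assumed ground-truth coefficients
X = rng.normal(size=(n, d))
Y = X @ a_true + 0.1 * rng.normal(size=n)  # linear model plus small noise

# Sample versions of R_XX = E{X X^T} and r_XY = E{X Y}
R_XX = X.T @ X / n
r_XY = X.T @ Y / n
a_hat = np.linalg.solve(R_XX, r_XY)        # a_hat = R_XX^{-1} r_XY
print(a_hat)
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse, which is numerically preferable.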
Example
Applying linear regression to jointly Gaussian X and Y is shown in the following plot. Note that the slope a_\textrm{MMSE} of the regression line is not the correlation coefficient \rho_{XY} itself, but
a_{\textrm{MMSE}} = \dfrac{\mathcal{E}\{XY\}}{\mathcal{E}\{X^2\}} = \rho_{XY} \, \dfrac{\sigma_Y}{\sigma_X} .
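This relation can be verified numerically with zero-mean jointly Gaussian samples (the values of \rho_{XY}, \sigma_X, \sigma_Y below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)

rho, sx, sy = 0.8, 2.0, 1.0   # illustrative correlation and standard deviations
cov = [[sx**2, rho * sx * sy],
       [rho * sx * sy, sy**2]]
X, Y = rng.multivariate_normal([0, 0], cov, size=100_000).T

a_mmse = np.mean(X * Y) / np.mean(X**2)   # E{XY} / E{X^2}
print(a_mmse, rho * sy / sx)              # the two values should nearly agree
```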
Likelihood Function: Triangular Distribution Example
Setup:
- Model: X \sim \text{Tri}[a, b] with a < b and a \leq x_i \leq b for all i
- Unknown parameters a and b
- Observations: \mathbf{x} = [3.0, 4.0, 7.0]
Key Concepts:
- The likelihood function L(a,b | \mathbf{x}) \propto \prod_i f(x_i | a,b) is the product of PDFs evaluated at each observation
Maximum Likelihood
- Taking the logarithm converts the product into a sum: \log L(a,b | \mathbf{x}) = \sum_i \log f(x_i | a,b)
- The likelihood is zero when a > \min(x_i) or b < \max(x_i) because at least one observation falls outside the support [a, b]
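A small numerical sketch of this effect, assuming \text{Tri}[a, b] denotes the symmetric triangular PDF on [a, b] with its peak at the midpoint (the section does not spell out the exact PDF, so this form is an assumption):

```python
import numpy as np

def tri_pdf(x, a, b):
    """Symmetric triangular PDF on [a, b] with peak at (a+b)/2 (assumed form)."""
    x = np.asarray(x, dtype=float)
    h = b - a
    c = (a + b) / 2
    return np.where((x >= a) & (x <= c), 4 * (x - a) / h**2,
           np.where((x > c) & (x <= b), 4 * (b - x) / h**2, 0.0))

def log_likelihood(x, a, b):
    p = tri_pdf(x, a, b)
    # Zero density at any observation makes the whole product zero
    return -np.inf if np.any(p == 0) else np.sum(np.log(p))

x = np.array([3.0, 4.0, 7.0])
print(log_likelihood(x, 2.0, 8.0))   # support covers all observations: finite
print(log_likelihood(x, 3.5, 8.0))  # a > min(x_i): likelihood is zero
print(log_likelihood(x, 2.0, 6.0))  # b < max(x_i): likelihood is zero
```

The log-likelihood drops to -\infty as soon as the support [a, b] fails to contain every observation, exactly as stated above.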
Inconsistent Estimator: Periodogram
The periodogram is an estimate of the power spectral density (PSD) computed as:
P[k] = \frac{1}{N} |X[k]|^2 \quad \textrm{ with } \quad X[k] = \sum_{n=0}^{N-1} x[n] e^{-j2\pi kn/N}
Signal Setup
For white noise x[n] with variance \sigma^2, we estimate the PSD S_{XX}[k] at a fixed frequency k:
- Mean: \mathcal{E}\{P[k]\} = \sigma^2 (unbiased)
- Variance: \text{Var}\{P[k]\} = \sigma^4 (constant, independent of N)
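The inconsistency can be checked by Monte Carlo: at a fixed bin k, the variance of P[k] stays near \sigma^4 no matter how large N gets. A sketch for Gaussian white noise with \sigma^2 = 1 (the bin index and trial count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 1.0
trials = 20_000

results = {}
for N in (64, 1024):
    x = rng.normal(scale=np.sqrt(sigma2), size=(trials, N))
    X = np.fft.fft(x, axis=1)
    P = np.abs(X[:, 5]) ** 2 / N          # periodogram at fixed bin k = 5
    results[N] = (P.mean(), P.var())
    print(N, P.mean(), P.var())           # mean ~ sigma^2, variance ~ sigma^4
```

Increasing N from 64 to 1024 leaves the variance essentially unchanged; only averaging over independent segments (as in Bartlett's or Welch's method) reduces it.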