Linear Inverse Problems and Probabilistic Estimators

This notebook outlines how to treat a linear inverse problem with maximum likelihood (MLE), maximum a posteriori (MAP), and minimum mean-square error (MMSE) estimators.

1. Problem Setup

We consider a linear observation model Y = A X + N, where

  • X \in \mathbb{R}^d is the unknown signal or parameter vector,
  • A \in \mathbb{R}^{m \times d} is a known linear operator,
  • N \in \mathbb{R}^m is noise,
  • Y \in \mathbb{R}^m is the observation.

We will derive different estimators \hat X = g(Y) under various probabilistic assumptions.
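The observation model can be simulated directly; a minimal NumPy sketch, with purely illustrative dimensions and noise level:

```python
import numpy as np

rng = np.random.default_rng(0)

d, m = 3, 5                       # illustrative dimensions
A = rng.standard_normal((m, d))   # known linear operator A
x = rng.standard_normal(d)        # unknown signal X (sampled here for simulation)
n = 0.1 * rng.standard_normal(m)  # additive noise N
y = A @ x + n                     # observation Y = A X + N
```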

2. Probabilistic Model

The key ingredients are:

  1. Noise model
    We assume a known distribution for N.
    This defines the likelihood p_{Y \mid X}(y \mid x) = p_N(y - A x).

  2. Prior on X (optional)
    If we have prior beliefs about X, we specify a prior density p_X(x).

  3. Joint distribution
    When both are specified: p_{X,Y}(x,y) = p_{Y \mid X}(y \mid x)\, p_X(x).
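The identity p_{Y \mid X}(y \mid x) = p_N(y - A x) translates directly into code. A sketch for isotropic Gaussian noise (the variance value is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 3, 5
A = rng.standard_normal((m, d))
sigma2 = 0.25  # assumed noise variance (illustrative)

def log_likelihood(y, x):
    """log p_{Y|X}(y|x) = log p_N(y - A x) for N ~ N(0, sigma2 I)."""
    r = y - A @ x
    return -0.5 * (r @ r / sigma2 + m * np.log(2 * np.pi * sigma2))

x = rng.standard_normal(d)
y = A @ x + np.sqrt(sigma2) * rng.standard_normal(m)
print(log_likelihood(y, x))
```

As expected, the log-likelihood is larger at the true x than at a point far from it.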

3. Special Case: Gaussian–Gaussian Model

Assume

  • Prior: X \sim \mathcal N(\mu_X, C_X)
  • Noise: N \sim \mathcal N(0, C_N)
  • Independent: X \perp N

Then Y = A X + N.

3.1 Likelihood

Given X = x, (Y \mid X=x) \sim \mathcal N(Ax, C_N), so p_{Y \mid X}(y \mid x) \propto \exp\left( -\tfrac{1}{2} (y - A x)^\top C_N^{-1} (y - A x) \right).

3.2 Prior

p_X(x) \propto \exp\left( -\tfrac{1}{2} (x - \mu_X)^\top C_X^{-1} (x - \mu_X) \right).

3.3 Estimators

The posterior is Gaussian; its mean equals its mode.
Thus \hat x_{\mathrm{MAP}}(y) = \hat x_{\mathrm{MMSE}}(y).

Closed-form: \hat x(y) = \left(A^\top C_N^{-1} A + C_X^{-1}\right)^{-1} \left(A^\top C_N^{-1} y + C_X^{-1} \mu_X\right).

Special cases:

  • No prior (MLE):
    Take C_X^{-1} \to 0 (weak prior), then \hat x_{\mathrm{MLE}}(y) = (A^\top C_N^{-1} A)^{-1} A^\top C_N^{-1} y, which is (weighted) least squares.

  • Isotropic prior and noise:
    C_X = \sigma_X^2 I, C_N = \sigma_N^2 I, then \hat x(y) = (A^\top A + \lambda I)^{-1} A^\top y \quad\text{with}\quad \lambda = \frac{\sigma_N^2}{\sigma_X^2}, i.e. ridge regression / Tikhonov regularization.

4. Special Case: Gaussian Noise + Arbitrary Prior (Generalized Tweedie Formula)

Consider the linear inverse problem
Y = AX + N, \qquad N \sim \mathcal N(0, \sigma^2 I), with any prior on X.
This setting admits a direct generalization of the classical Tweedie formula.

4.1 Classical Tweedie Formula (1-D Denoising)

For the simple model
Y = X + N, the MMSE estimator is
\mathbb E[X \mid Y=y] = y + \sigma^2 \, \frac{d}{dy} \log p_Y(y),
where p_Y is the marginal density of Y.
This holds for any prior on X.
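Although the formula holds for any prior, a Gaussian prior makes the marginal score available in closed form, so the identity can be verified directly. A sketch with illustrative parameter values:

```python
import numpy as np

# Gaussian prior X ~ N(mu0, s0_2): the marginal p_Y is then Gaussian,
# Y ~ N(mu0, s0_2 + sigma2), and its score is analytic.
mu0, s0_2 = 1.0, 4.0   # prior mean and variance (illustrative)
sigma2 = 0.5           # noise variance (illustrative)
y = 2.3                # an observation

# Score of the marginal: d/dy log p_Y(y) = -(y - mu0) / (s0_2 + sigma2)
score = -(y - mu0) / (s0_2 + sigma2)
x_tweedie = y + sigma2 * score

# Direct posterior mean for the conjugate Gaussian model
x_post = (s0_2 * y + sigma2 * mu0) / (s0_2 + sigma2)

print(np.isclose(x_tweedie, x_post))
```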

4.2 Generalized Tweedie Formula for Linear Inverse Problems

For the linear model Y = AX + N, \qquad N \sim \mathcal N(0, \sigma^2 I), one can show that the posterior mean of AX satisfies \mathbb E[AX \mid Y = y] = y + \sigma^2 \, \nabla_y \log p_Y(y).

If A^\top A is invertible (full column rank), this gives a direct expression for the MMSE estimator of X: \mathbb E[X \mid Y = y] = (A^\top A)^{-1} A^\top \bigl( y + \sigma^2 \, \nabla_y \log p_Y(y) \bigr).

For A = I, this reduces to the classical Tweedie formula.
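As in the 1-D case, a Gaussian prior gives an analytic marginal score, so the generalized formula can be checked numerically. A sketch, with an illustrative prior covariance:

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 3, 5
A = rng.standard_normal((m, d))
sigma2 = 0.3           # noise variance (illustrative)
C_x = np.eye(d)        # illustrative zero-mean Gaussian prior
y = rng.standard_normal(m)

# With a Gaussian prior, Y ~ N(0, S) with S = A C_X A^T + sigma2 I,
# so grad_y log p_Y(y) = -S^{-1} y in closed form.
S = A @ C_x @ A.T + sigma2 * np.eye(m)
score = -np.linalg.solve(S, y)

# Generalized Tweedie: E[AX | Y=y] = y + sigma2 * score
ax_tweedie = y + sigma2 * score

# Direct conditional mean: E[AX | Y=y] = A C_X A^T S^{-1} y
ax_direct = A @ C_x @ A.T @ np.linalg.solve(S, y)

print(np.allclose(ax_tweedie, ax_direct))
```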

5. Sensor Fusion as an Inverse Problem

Sensor fusion appears when we want to estimate the same underlying quantity X from multiple measurements: Y^{(1)},\; Y^{(2)},\; \ldots,\; Y^{(K)}.

Each sensor has its own measurement model: Y^{(k)} = A_k X + N_k, \qquad k = 1,\dots,K, where A_k is the sensor’s observation matrix and N_k is its noise.

5.1 Likelihood (Fusion of Information)

Assuming independent noise terms \{N_k\}, the joint likelihood is p_{Y^{(1)},\ldots,Y^{(K)}\mid X}(y^{(1)},\ldots,y^{(K)}\mid x) = \prod_{k=1}^K p_{Y^{(k)}\mid X}(y^{(k)}\mid x).

This leads to the standard estimators MLE, MAP, and MMSE.

Sensor fusion is therefore nothing more than an inverse problem with multiple observation channels.

5.2 Gaussian Special Case (Weighted Least Squares)

If each N_k is Gaussian: N_k \sim \mathcal N(0, C_k), and independent,

then the joint model is Y = \begin{bmatrix} Y^{(1)} \\ \vdots \\ Y^{(K)} \end{bmatrix}, \qquad A = \begin{bmatrix} A_1 \\ \vdots \\ A_K \end{bmatrix}, \qquad C = \operatorname{blkdiag}(C_1,\ldots,C_K),

and MLE becomes weighted least squares: \hat X_{\text{MLE}} = (A^\top C^{-1} A)^{-1} A^\top C^{-1} Y.
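The stacking construction can be sketched for two hypothetical sensors with different geometries and noise levels (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 2
# Two illustrative sensors observing the same X
A1, A2 = rng.standard_normal((3, d)), rng.standard_normal((4, d))
C1, C2 = 0.1 * np.eye(3), 0.5 * np.eye(4)
x_true = rng.standard_normal(d)
y1 = A1 @ x_true + rng.multivariate_normal(np.zeros(3), C1)
y2 = A2 @ x_true + rng.multivariate_normal(np.zeros(4), C2)

# Stack channels: Y = [y1; y2], A = [A1; A2], C = blkdiag(C1, C2)
Y = np.concatenate([y1, y2])
A = np.vstack([A1, A2])
C_inv = np.linalg.inv(np.block([[C1, np.zeros((3, 4))],
                                [np.zeros((4, 3)), C2]]))

# Weighted least squares: (A^T C^{-1} A)^{-1} A^T C^{-1} Y
x_mle = np.linalg.solve(A.T @ C_inv @ A, A.T @ C_inv @ Y)
print(x_mle.shape)  # (2,)
```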

5.3 Scalar Observations and Sensitivity Weighting

For scalar observations of the same unknown, Y^{(k)} = a_k X + N_k with N_k \sim \mathcal N(0, \sigma_k^2), the MLE (which coincides with MAP and MMSE under a flat prior) is \hat X = \frac{\displaystyle \sum_{k=1}^K a_k\, y^{(k)} / \sigma_k^2} {\displaystyle \sum_{k=1}^K a_k^{\,2} / \sigma_k^2 }. This shows that sensor fusion performs sensitivity weighting: each sensor contributes proportionally to its precision a_k^{2}/\sigma_k^{2}. Sensors with higher sensitivity and lower noise variance receive larger weight in the fused estimate.
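The scalar fusion rule is a special case of the general weighted-least-squares formula; a small check with illustrative sensor values:

```python
import numpy as np

# Three illustrative scalar sensors y_k = a_k * X + N_k, N_k ~ N(0, sigma_k^2)
a = np.array([1.0, 0.5, 2.0])       # sensitivities a_k
sigma2 = np.array([0.1, 0.4, 0.2])  # noise variances sigma_k^2
y = np.array([1.1, 0.48, 2.05])     # observations y^(k)

# Sensitivity-weighted fusion
x_hat = np.sum(a * y / sigma2) / np.sum(a**2 / sigma2)

# Equivalent general WLS with A = a (as a column) and C = diag(sigma2)
A = a.reshape(-1, 1)
C_inv = np.diag(1.0 / sigma2)
x_wls = np.linalg.solve(A.T @ C_inv @ A, A.T @ C_inv @ y)

print(np.isclose(x_hat, x_wls.item()))
```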

5.4 Relation to Kalman Filtering

Kalman filtering is the dynamic version of the same fusion idea:
each new sensor measurement provides one more likelihood factor, and the posterior is updated recursively.
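For a static scalar state, this recursive update can be sketched and compared against batch precision weighting (all values illustrative; the dynamic case adds a prediction step not shown here):

```python
import numpy as np

# Recursive (Kalman-style) fusion of scalar measurements y_k = X + N_k
y = np.array([1.2, 0.9, 1.1])   # measurements (illustrative)
r = np.array([0.2, 0.5, 0.1])   # per-measurement noise variances

# Gaussian prior on X (illustrative)
mu, P = 0.0, 10.0

for yk, rk in zip(y, r):
    K = P / (P + rk)            # scalar gain
    mu = mu + K * (yk - mu)     # posterior mean update
    P = (1 - K) * P             # posterior variance update

# Batch answer: precision-weighted mean, including the prior as one term
w = np.concatenate([[1 / 10.0], 1 / r])
vals = np.concatenate([[0.0], y])
mu_batch = np.sum(w * vals) / np.sum(w)

print(np.isclose(mu, mu_batch))
```

Processing the measurements one at a time yields exactly the batch fused estimate, which is the sense in which filtering is recursive fusion.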

6. Linear MMSE Solution

This section derives the linear MMSE estimator for a standard linear inverse problem and expresses the corresponding posterior (error) covariance in a form consistent with the inverse-problem setup.

6.1. Problem Setup

We consider the linear observation model Y = A X + N, where

  • X \in \mathbb{R}^d is the unknown random vector with zero mean,
  • A \in \mathbb{R}^{m \times d} is a known linear operator,
  • N \in \mathbb{R}^m is additive noise, independent of X,
  • Y \in \mathbb{R}^m is the observation with zero mean.

Since X and Y have zero mean, the optimal affine estimator has no offset term, so we restrict ourselves to linear estimators \widehat{X} = K Y and seek the one minimizing the mean-square error.

6.2. Estimation Error and Error Covariance

Define the estimation error e := X - \widehat{X} = X - K Y .

Its covariance matrix is \begin{aligned} C_{ee} &= \mathbb{E}[e e^H] \\ &= \mathbb{E}\!\left[(X - K Y)(X - K Y)^H\right] \\ &= C_{XX} - K C_{YX} - C_{XY} K^H + K C_{YY} K^H , \end{aligned} where \begin{aligned} C_{XX} &= \mathbb{E}[X X^H], & C_{YY} &= \mathbb{E}[Y Y^H], \\ C_{XY} &= \mathbb{E}[X Y^H], & C_{YX} &= C_{XY}^H . \end{aligned}

Minimizing \mathrm{tr}(C_{ee}) with respect to K yields the linear MMSE (LMMSE) gain K = C_{XY} C_{YY}^{-1}.

6.3. Posterior (Error) Covariance

Substituting the optimal gain into the general expression for the error covariance we obtain, step by step, \begin{aligned} C_{ee} &= C_{XX} - (C_{XY} C_{YY}^{-1}) C_{YX} - C_{XY} (C_{XY} C_{YY}^{-1})^H + (C_{XY} C_{YY}^{-1}) C_{YY} (C_{YY}^{-1})^H C_{YX} \\ &= C_{XX} - C_{XY} C_{YY}^{-1} C_{YX} - C_{XY} C_{YY}^{-1} C_{YX} + C_{XY} C_{YY}^{-1} C_{YY} C_{YY}^{-1} C_{YX}. \end{aligned}

Using that C_{YY} is Hermitian and positive definite, so that C_{YY}^{-1} = (C_{YY}^{-1})^H, the last term simplifies to C_{XY} C_{YY}^{-1} C_{YX}.

Collecting terms yields \begin{aligned} C_{ee} &= C_{XX} - 2\, C_{XY} C_{YY}^{-1} C_{YX} + C_{XY} C_{YY}^{-1} C_{YX} \\ &= C_{XX} - C_{XY} C_{YY}^{-1} C_{YX} = C_{XX} - K C_{YX}. \end{aligned}

Using C_{YX} = A C_{XX}, the posterior covariance can also be written as C_{ee} = (I - K A)\, C_{XX}.

This simplification relies on using the optimal LMMSE gain.
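The equivalence of the two expressions for C_{ee} can be verified numerically; a sketch using an illustrative prior covariance and the second-order statistics implied by Y = A X + N:

```python
import numpy as np

rng = np.random.default_rng(5)
d, m = 3, 5
A = rng.standard_normal((m, d))
C_xx = np.eye(d)          # illustrative prior covariance (zero-mean X)
C_nn = 0.2 * np.eye(m)    # illustrative noise covariance

# Second-order statistics implied by Y = A X + N, with X independent of N
C_yy = A @ C_xx @ A.T + C_nn
C_xy = C_xx @ A.T
C_yx = C_xy.T

# LMMSE gain and the two forms of the error covariance
K = C_xy @ np.linalg.inv(C_yy)
C_ee_1 = C_xx - K @ C_yx            # C_XX - K C_YX
C_ee_2 = (np.eye(d) - K @ A) @ C_xx # (I - K A) C_XX

print(np.allclose(C_ee_1, C_ee_2))
```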

6.4. Interpretation

  • C_{ee} quantifies the remaining uncertainty in X after observing Y.
  • (I - K A) acts as an uncertainty-reduction operator applied to the prior covariance.
  • If X and N are jointly Gaussian, the affine MMSE estimator coincides with:
    • the true MMSE estimator, and
    • the MAP estimator.

This places the LMMSE estimator as the natural bridge between classical inverse problems and probabilistic Bayesian estimation.