Partial Least Squares

Introduction

The goal of the Least-Squares Method is to find a good estimate of the parameters of a function, f(x), that fits a set of data, [math]x_1 ... x_n[/math]. The Least-Squares Method requires that the estimated function deviate as little as possible from f(x) in the sense of the 2-norm. Generally speaking, least-squares methods fall into two categories, linear and non-linear. They can be classified further into ordinary least squares (OLS), weighted least squares (WLS), alternating least squares (ALS), and partial least squares (PLS).

To best fit a set of data, the least-squares method minimizes the sum of squared residuals (also called the Sum of Squared Errors, SSE):

[math] S=\sum_{i=1}^{i=m}r_i^2[/math],

where the residual [math]r_i[/math] is the difference between the actual data point and the fitted model, defined as

[math] r_i= y_i - f(x_i)[/math]

where the m data pairs are [math] (x_i, y_i)\! [/math], and the model function is [math] f(x_i) [/math].

Here, we can choose n parameters for f(x) so that the approximating function best fits the data set.
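
As a concrete illustration of these definitions, the sketch below computes the residuals and the sum of squared residuals S for an arbitrary candidate model on a small made-up data set (the helper name sum_of_squared_residuals and the numbers are only illustrative):

<syntaxhighlight lang="python">
# For a candidate model f(x) and data pairs (x_i, y_i), compute the residuals
# r_i = y_i - f(x_i) and the sum of squared residuals S.

def sum_of_squared_residuals(f, xs, ys):
    """Return S = sum of r_i^2, where r_i = y_i - f(x_i)."""
    residuals = [y - f(x) for x, y in zip(xs, ys)]
    return sum(r * r for r in residuals)

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]          # made-up data for illustration
    ys = [2.1, 3.9, 6.2, 7.8]

    # Example candidate model: f(x) = 2x (an arbitrary guess, not the best fit).
    S = sum_of_squared_residuals(lambda x: 2.0 * x, xs, ys)
    print("S =", S)
</syntaxhighlight>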

Theory

The Linear Least-Squares (LLS) Method assumes that the data set falls approximately on a straight line, so [math] f(x) = ax + b [/math], where a and b are constants. However, due to experimental error, some data points may not lie exactly on the line, so there is an error (residual) between the estimated function and the real data. The Linear Least-Squares Method (or [math] l_2 [/math] approximation) defines the best-fit function as the one that minimizes [math] S=\sum_{i=0}^{i=n}(y_i - (ax_i + b) )^2[/math] over the n + 1 data points [math] (x_0, y_0), \dots, (x_n, y_n) [/math].

The advantages of LLS:

1. If we assume that the errors have a normal probability distribution, then minimizing S gives us the best approximation of a and b.

2. We can easily use calculus to determine the approximate values of a and b.

To minimize S, the following conditions must be satisfied [math] \frac{\partial S}{\partial a}=0 [/math], and [math] \frac{\partial S}{\partial b}=0 [/math]

Taking the partial derivatives, we obtain [math] \sum_{i=0}^{i=n}(2*((ax_i + b) - y_i))*x_i = 0[/math], and [math] \sum_{i=0}^{i=n}(2*((ax_i + b) - y_i)) = 0[/math].

This system consists of two simultaneous linear equations in the two unknowns a and b. (These two equations are the so-called normal equations.)

By expanding the summations and solving these two equations, we find that

[math] a = \frac{1}{c}*[(n+1)*\sum_{i=0}^{i=n}(x_i*y_i)-(\sum_{i=0}^{i=n}(x_i))(\sum_{i=0}^{i=n}(y_i))] [/math] and [math] b = \frac{1}{c}*[(\sum_{i=0}^{i=n}((x_i)^2))*(\sum_{i=0}^{i=n}(y_i))-(\sum_{i=0}^{i=n}(x_i))(\sum_{i=0}^{i=n}(x_i*y_i))] [/math]

where

[math] c = (n+1)*(\sum_{i=0}^{i=n}((x_i)^2))-(\sum_{i=0}^{i=n}(x_i))^2 [/math] .
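
A minimal sketch of these closed-form formulas, assuming the data are indexed [math]x_0 ... x_n[/math] (that is, n + 1 points); the helper name linear_least_squares and the sample data are illustrative only:

<syntaxhighlight lang="python">
# Closed-form linear least squares, transcribed from the formulas for a, b
# and c given above.

def linear_least_squares(xs, ys):
    """Return (a, b) minimizing sum((y_i - (a*x_i + b))**2)."""
    m = len(xs)                      # m = n + 1 data points
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    c = m * sum_x2 - sum_x ** 2      # (n+1)*sum(x^2) - (sum(x))^2
    a = (m * sum_xy - sum_x * sum_y) / c
    b = (sum_x2 * sum_y - sum_x * sum_xy) / c
    return a, b

if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]        # made-up data for illustration
    ys = [1.1, 2.9, 5.2, 6.8]
    a, b = linear_least_squares(xs, ys)
    print("a =", a, "b =", b)
</syntaxhighlight>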

Thus, for the special data set [math] (i, y_i) [/math], where i is an integer in [1, n], the best estimated function is

[math] y = ax + b [/math], where [math] a = \frac{1}{n^3-n}*[12*\sum_{i=1}^{i=n}(i*y_i)-6*(n+1)(\sum_{i=1}^{i=n}(y_i))] [/math] and [math] b = \frac{1}{n^2-n}*[(4*n+2)*\sum_{i=1}^{i=n}(y_i)-6*(\sum_{i=1}^{i=n}(i*y_i))] [/math].
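
The simplified formulas for this special case [math] x_i = i [/math] can be sketched as follows (the data values are made up; the result should agree with the general closed-form solver sketched above):

<syntaxhighlight lang="python">
# Linear fit for the special case where the abscissae are 1, 2, ..., n,
# using the simplified formulas for a and b given above.

def linear_fit_integer_abscissae(ys):
    """Return (a, b) for the data set (i, y_i), i = 1 ... n."""
    n = len(ys)
    sum_y = sum(ys)
    sum_iy = sum(i * y for i, y in enumerate(ys, start=1))
    a = (12 * sum_iy - 6 * (n + 1) * sum_y) / (n ** 3 - n)
    b = ((4 * n + 2) * sum_y - 6 * sum_iy) / (n ** 2 - n)
    return a, b

if __name__ == "__main__":
    ys = [2.0, 4.1, 5.9, 8.2, 9.8]   # made-up data for illustration
    print(linear_fit_integer_abscissae(ys))
</syntaxhighlight>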

Least-Squares Method from a statistical point of view

From the normal equations in matrix form, [math] [[X]^T[X]]{A} = {[X]^T[Y]} [/math], we can derive the coefficient vector: [math]{A} = [[X]^T[X]]^{-1}{[X]^T{Y}}[/math].

From this equation, we can determine not only the coefficients but also their statistical properties.
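
A short sketch of the matrix form, assuming a straight-line model so that the design matrix [X] has a column of ones and a column of x values (the data are made up; the normal equations are solved directly rather than by forming the explicit inverse, which gives the same coefficients):

<syntaxhighlight lang="python">
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])   # made-up data for illustration
y = np.array([1.1, 2.9, 5.2, 6.8])

X = np.column_stack([np.ones_like(x), x])     # design matrix [1, x_i]
XtX = X.T @ X
XtY = X.T @ y

# Solving the normal equations is numerically preferable to computing the
# explicit inverse [[X]^T[X]]^{-1}, but both yield the same coefficients here.
A = np.linalg.solve(XtX, XtY)
print("intercept b =", A[0], "slope a =", A[1])
</syntaxhighlight>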

Using calculus, the following formulas for coefficients can be obtained:

[math] a = \frac{S_{xy}}{S^2_x} [/math] and [math] b = \bar{y} - a*\bar{x} [/math]

where

[math] S_{xy} = \frac{\sum_{i=1}^{i=n}((x_i-\bar{x})*(y_i-\bar{y}))}{n-1} [/math]

[math] S^2_x = \frac{\sum_{i=1}^{i=n}((x_i-\bar{x})^2)}{n-1} [/math]

[math] \bar{x} = \frac{\sum_{i=1}^{i=n}(x_i)}{n} [/math]

[math] \bar{y} = \frac{\sum_{i=1}^{i=n}(y_i)}{n} [/math].
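
These statistical formulas can be sketched directly (the helper name slope_intercept and the data are illustrative; the result should match the matrix form above):

<syntaxhighlight lang="python">
# Slope and intercept from the sample covariance S_xy and sample variance S_x^2:
# a = S_xy / S_x^2 and b = y_bar - a * x_bar.

def slope_intercept(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_xx = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    a = s_xy / s_xx
    b = y_bar - a * x_bar
    return a, b

if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]        # made-up data for illustration
    ys = [1.1, 2.9, 5.2, 6.8]
    print(slope_intercept(xs, ys))
</syntaxhighlight>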

Moreover, the diagonal and off-diagonal entries of the matrix [math][[X]^T[X]]^{-1}[/math] give the variances and covariances of the coefficients [math] a_i [/math], respectively.

Assume the diagonal entries of [math][[X]^T[X]]^{-1}[/math] are [math] x_{i,i} [/math] and the corresponding coefficients are [math] a_{i-1} [/math]; then

[math]var(a_{i-1}) = x_{i,i}*s^2_{y/x}[/math] and [math]cov(a_{i-1}, a_{j-1}) = x_{i,j}*s^2_{y/x}[/math]

where [math] s_{y/x} [/math] is called the standard error of the estimate, and [math] s_{y/x} = \sqrt{\frac{S}{n - 2}} [/math].

(Here, the subscript y/x indicates that the error measures the spread of the observed y values around the value predicted from the corresponding x.)

These two quantities have many applications. For example, we can derive upper and lower bounds (confidence intervals) for the intercept and the slope.
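
A sketch of how these quantities yield such bounds, assuming the straight-line model and using an approximate 95% interval with a z-value of 1.96 for illustration (for small n a t-quantile would normally be used; the data are made up):

<syntaxhighlight lang="python">
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])   # made-up data for illustration
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 10.9])

X = np.column_stack([np.ones_like(x), x])       # columns: intercept, slope
XtX_inv = np.linalg.inv(X.T @ X)
coef = XtX_inv @ X.T @ y                        # {A} = [[X]^T[X]]^{-1}[X]^T{Y}

residuals = y - X @ coef
S = np.sum(residuals ** 2)                      # sum of squared residuals
n = len(x)
s_yx = np.sqrt(S / (n - 2))                     # standard error of the estimate

cov_coef = XtX_inv * s_yx ** 2                  # variance-covariance matrix
se = np.sqrt(np.diag(cov_coef))                 # standard errors of b and a

for name, c, s in zip(["intercept b", "slope a"], coef, se):
    # Approximate 95% bounds; 1.96 is an illustrative z-value, not a t-quantile.
    print(f"{name}: {c:.3f}  bounds: [{c - 1.96 * s:.3f}, {c + 1.96 * s:.3f}]")
</syntaxhighlight>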