Regression Analysis

We begin with the equation of a straight line, which is given by

Y = a + bX

where a and b are constants:

a is the Y intercept, i.e. the point where the line Y = a + bX cuts the Y axis;

b is the slope of the line. It gives the rate of change of Y with respect to X.


The line Y = -3 + (2/3)X cuts the Y axis at -3 and has slope 2/3, which means that for a one-unit increase in X the value of Y increases by about 0.67.

The line Y = 2 + 1.75X cuts the Y axis at 2 and has slope 1.75, which means that for a one-unit increase in X the value of Y increases by 1.75.
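A minimal sketch in Python of what the intercept and slope mean for these two lines. The helper name `line` is an assumption introduced only for illustration.

```python
from fractions import Fraction

def line(a, b):
    """Return the function x -> a + b*x (hypothetical helper)."""
    return lambda x: a + b * x

f = line(-3, Fraction(2, 3))   # Y = -3 + (2/3)X
g = line(2, 1.75)              # Y = 2 + 1.75X

# The intercept is the value of Y at X = 0.
print(f(0), g(0))

# The slope is the change in Y for a one-unit increase in X.
print(float(f(4) - f(3)), g(4) - g(3))
```

Using `Fraction` keeps the slope 2/3 exact, so the rounded figure 0.67 appears only when the value is displayed.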

Now we see how to calculate the constants: a, the Y intercept where the regression line meets the Y axis, and b, the slope (i.e. gradient) of the straight line. We can find the values of a and b using the following normal equations.

From Y = a + bX ........(1)

Taking the sum of both sides over the n observations, we get

$$\sum Y = na + b\sum X$$ ........(2), since a is a constant and hence $$\sum a = na$$.

Multiplying equation (1) by X and taking the sum of both sides, we get

$$\sum XY = a\sum X + b\sum X^2$$ ........(3)

Solving equations (2) and (3) we get

b=$$\frac{\sum XY-n\bar X \bar Y}{\sum X^2-n\bar X^2}$$

a = $$\bar Y - b\bar X$$
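
The solution step can be spelled out as follows. This is a short derivation that uses only equations (2) and (3) together with the definitions $$\bar X = \frac{\sum X}{n}$$ and $$\bar Y = \frac{\sum Y}{n}$$.

Dividing equation (2) by n gives $$\bar Y = a + b\bar X$$, so $$a = \bar Y - b\bar X$$.

Substituting this value of a into equation (3), and using $$\sum X = n\bar X$$, gives

$$\sum XY = (\bar Y - b\bar X)\,n\bar X + b\sum X^2$$

Collecting the terms in b,

$$\sum XY - n\bar X \bar Y = b\left(\sum X^2 - n\bar X^2\right)$$

and dividing both sides by $$\sum X^2 - n\bar X^2$$ gives the expression for b.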

After obtaining the values of a and b we get an estimating equation,

$$\hat Y = a + bX$$

where $$\hat Y$$ is the estimated value of Y for a given value of X.
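
The formulas for a and b can be sketched in Python as below. The function name `fit_line` and the sample data are assumptions made for illustration; the data are made up so that the points lie exactly on a known line.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for y = a + b*x from paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = sum_xx - n * x_bar ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (sum_xy - n * x_bar * y_bar) / denom   # slope
    a = y_bar - b * x_bar                      # intercept
    return a, b

# Illustrative data (made up): the points lie exactly on y = 3 + 2x.
a, b = fit_line([1, 2, 3, 4], [5, 7, 9, 11])
y_hat = a + b * 5   # estimated Y at X = 5
print(a, b, y_hat)
```

Because the sample points are exactly collinear, the fitted constants reproduce the line that generated them. With real data the fitted line only approximates the points.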

Obtain the regression equation for the following data.

We find the values of a and b from the column totals of the data (n = 5):

TOTAL: $$\sum X$$ = 45, $$\sum Y$$ = 20, $$\sum XY$$ = 188, $$\sum X^2$$ = 415

$$\bar X = \frac{45}{5}$$= 9

$$\bar Y = \frac{20}{5}$$= 4

b = $$\frac{\sum XY-n\bar X \bar Y}{\sum X^2-n\bar X^2} = \frac{188 - 5 \times 9 \times 4}{415 - 5 \times 9^2} = \frac{8}{10}$$ = 0.8

a = $$\bar Y - b\bar X$$ = 4 - 0.8 × 9 = -3.2

$$\hat Y$$ = -3.2 + 0.8X is the estimating equation.
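
The arithmetic of this example can be checked with a few lines of Python. The sketch assumes that the four totals are, in order, ∑X, ∑Y, ∑XY and ∑X², and that n = 5, as implied by the means computed above. The value X = 10 used for the estimate is a hypothetical input.

```python
n = 5
sum_x, sum_y, sum_xy, sum_xx = 45, 20, 188, 415   # totals from the example

x_bar = sum_x / n
y_bar = sum_y / n

# Slope and intercept from the formulas for b and a.
b = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)
a = y_bar - b * x_bar

def estimate(x):
    """Estimated Y for a given X using the fitted line."""
    return a + b * x

print(round(b, 2), round(a, 2))
print(round(estimate(10), 2))   # X = 10 is a hypothetical input
```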

For bivariate data (Xi, Yi), the relationship may be that Y depends on X or that X depends on Y.

If Y depends on X, the regression line is that of Y on X: Y is the dependent variable and X is the independent variable.

If X depends on Y, the regression line is that of X on Y: X is the dependent variable and Y is the independent variable. The regression equation of Y on X, Y = a + bX, is used to estimate the value of Y when X is known. The regression equation of X on Y, X = c + dY, is used to estimate the value of X when Y is given. Here a, b, c and d are constants.


 * [[Image:kavita.png|300px|centre]]

In Y = a + bX, the constant a can also be interpreted as the average value of Y when X is zero.

Similarly, in X = c + dY, the constant c is the average value of X when Y is zero.

The slopes of the regression equations of Y on X and X on Y are denoted by byx and bxy respectively.

The values of byx and bxy are

byx = $$\frac{Cov(X, Y)}{Var(X)}$$

bxy = $$\frac{Cov(X, Y)}{Var(Y)}$$

Simplifying we get,

byx = $$\frac{\sum XY-n\bar X \bar Y}{\sum X^2-n\bar X^2}$$

bxy = $$\frac{\sum XY-n\bar X \bar Y}{\sum Y^2-n\bar Y^2}$$
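
The simplification can be checked by writing out the covariance and the variance. The forms below use the divisor n, which is an assumption; any common divisor cancels in the ratio as long as the same one is used in the numerator and the denominator.

$$Cov(X, Y) = \frac{1}{n}\sum (X-\bar X)(Y-\bar Y) = \frac{1}{n}\left(\sum XY - n\bar X \bar Y\right)$$

$$Var(X) = \frac{1}{n}\sum (X-\bar X)^2 = \frac{1}{n}\left(\sum X^2 - n\bar X^2\right)$$

Dividing the first expression by the second cancels the factor $$\frac{1}{n}$$ and leaves the computational formula for byx. Replacing Var(X) by Var(Y) gives bxy in the same way.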

byx and bxy are called the regression coefficients.

Once the values of byx and bxy are known, the regression equations are obtained by substituting them into the following equations.

Y on X is $$ Y-\bar Y= byx (X-\bar X)$$

and

X on Y is $$ X-\bar X= bxy (Y-\bar Y)$$

The value of b in the previous section is the same as byx.
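
Both regression equations can be sketched in Python as follows. The helper name `regression_lines` and the five data pairs are assumptions made up for illustration; they do not come from the text.

```python
def regression_lines(xs, ys):
    """Return (x_bar, y_bar, byx, bxy) for paired data (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    s_yy = sum(y * y for y in ys) - n * y_bar ** 2
    return x_bar, y_bar, s_xy / s_xx, s_xy / s_yy

# Illustrative data (made up).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar, y_bar, byx, bxy = regression_lines(xs, ys)

def y_on_x(x):
    """Estimate Y from X: Y - y_bar = byx * (X - x_bar)."""
    return y_bar + byx * (x - x_bar)

def x_on_y(y):
    """Estimate X from Y: X - x_bar = bxy * (Y - y_bar)."""
    return x_bar + bxy * (y - y_bar)

print(byx, bxy)
print(y_on_x(x_bar), x_on_y(y_bar))   # each line returns the other mean
```

The two lines are generally different: the Y on X line minimises errors measured in Y, while the X on Y line minimises errors measured in X, so each should be used only for the estimate it was built for.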

The regression equations of Y on X and X on Y have the following properties:

a) The lines of regression meet at the point whose co-ordinates are $$(\bar X, \bar Y)$$. The averages of both X and Y lie on both lines of regression.

b) The regression coefficients byx and bxy and the correlation coefficient ‘r’ all have the same sign: they are either all positive or all negative.

c) There is an angle between the two lines of regression. Let the angle be denoted by $$\theta$$. If the correlation is perfect, the angle $$\theta$$ is 0° and the lines coincide exactly. As the correlation becomes weaker, $$\theta$$ increases. If $$\theta$$ is 90°, the variables are not linearly correlated.


 * [[Image:kavita.png|300px|centre]]

d) The correlation coefficient ‘r’ is the geometric mean of the regression coefficients. The sign (+ or -) given to ‘r’ is the sign shared by byx and bxy.

byx = $$\frac{\sum XY-n\bar X \bar Y}{\sum X^2-n\bar X^2}$$

bxy = $$\frac{\sum XY-n\bar X \bar Y}{\sum Y^2-n\bar Y^2}$$

r = ± $$\sqrt{byx*bxy}$$

e) byx = r $$\frac{\sigma_y}{\sigma_x}$$

and

bxy = r $$\frac{\sigma_x}{\sigma_y}$$
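
Properties (b) to (e) can be checked numerically. The sketch below uses a small made-up data set (an assumption, not data from the text) and standard deviations with divisor n. It computes r directly from the sums and compares it with the values obtained from the regression coefficients.

```python
import math

# Illustrative data (made up).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
s_xx = sum(x * x for x in xs) - n * x_bar ** 2
s_yy = sum(y * y for y in ys) - n * y_bar ** 2

byx = s_xy / s_xx
bxy = s_xy / s_yy
r = s_xy / math.sqrt(s_xx * s_yy)          # correlation coefficient
sigma_x = math.sqrt(s_xx / n)
sigma_y = math.sqrt(s_yy / n)

# (d) r is the geometric mean of byx and bxy, carrying their common sign.
r_from_b = math.copysign(math.sqrt(byx * bxy), byx)

# (e) byx = r * sigma_y / sigma_x and bxy = r * sigma_x / sigma_y.
byx_from_r = r * sigma_y / sigma_x
bxy_from_r = r * sigma_x / sigma_y

# (c) Angle between the lines, with both slopes measured in the (X, Y) plane.
# The X on Y line has slope 1 / bxy there (this assumes bxy is not zero).
m1, m2 = byx, 1 / bxy
theta = math.degrees(math.atan(abs((m2 - m1) / (1 + m1 * m2))))

print(r, r_from_b)
print(byx, byx_from_r, bxy, bxy_from_r)
print(theta)
```

For this data set r is positive and less than 1, so the two lines are distinct and the angle between them is small but not zero.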

Regression analysis is used to describe the relationship between two variables. A linear relationship is expressed by a mathematical formula. For a given value of the independent variable we can predict the value of the dependent variable. For example, given the advertising expenditure, what is the projected revenue? We use regression analysis for this kind of forecasting.

Regression: A general process of predicting one variable from another by statistical means, using previous data.

Regression line: A line fitted to a set of data points to estimate the relationship between the variables.

Dependent variable: The variable we are trying to predict.

Independent variable: The known variable in regression analysis.

1. A computer, while calculating the correlation coefficient between two variables X and Y from 25 pairs of observations, obtained the following:

N = 25

Find the correlation coefficient of X and Y, the mean values of X and Y, and the regression equations of Y on X and X on Y.

2. A furniture retailer in a locality is interested in studying whether some relationship exists between the number of building permits issued in that locality in past years and the volume of sales in those years. He has accordingly collected data on the sales (Y) and the number of building permits issued (X) in the past 10 years. The results are as follows: ∑X = 200, ∑Y = 2200, ∑XY = 45800, ∑X² = 4600 and ∑Y² = 490400. Using the appropriate regression equation, find the level of sales expected next year when 2000 building permits are to be issued.

3. To the Internal Revenue Service, the reasonableness of total itemized deductions depends on the taxpayer’s adjusted gross income. Large deductions, which include charity and medical deductions, are more reasonable for taxpayers with large adjusted gross incomes. If a taxpayer claims larger than average itemized deductions for a given level of income, the chances of an IRS audit are increased. Data (in $1000s) on adjusted gross income and the average or reasonable amount of itemized deductions follow.

Adjusted gross income ($1000s): 22, 27, 32, 48, 65, 85, 120

Total itemized deductions ($1000s): 9.6, 9.6, 10.1, 11.1, 13.5, 17.7, 25.5

Use the least squares method to develop the estimated regression equation. Estimate a reasonable level of total itemized deductions for a taxpayer with an adjusted gross income of $52,000. If this taxpayer has claimed total itemized deductions of $20,00, would the IRS agent’s request for an audit appear justified? Explain.

4. In a laboratory experiment in a correlation research study, the equations of the two regression lines were found to be 2X-Y+1=0 and 3X-2Y+7=0. Find the means of X and Y. Also work out the values of the regression coefficients and the coefficient of correlation between the two variables X and Y. Given that the variance of X is 9, find the standard deviation of Y.

5. The two lines of regression based on 100 observations were 20X-9Y-106=0 and 4X-5Y+30=0. Determine the coefficient of correlation, and calculate the variance of Y if the variance of X is 9.
