User:Rupali andc/NORA

REGRESSION: A TOOL FOR DATA ANALYSIS

STRUCTURE

1.1	Introduction
1.2	Objectives
1.3	Important Concepts
1.3.1	Regression
1.3.2	Dependent Variable
1.3.3	Independent Variable
1.3.4	Simple Linear Regression
1.3.5	Multiple Linear Regression
1.4	Regression Lines
1.4.1	Features of Regression Lines
1.4.2	Coefficient of Correlation and Regression Lines
1.5	Regression Equations
1.5.1	Properties of Regression Coefficients
1.5.2	Methods of Calculating Regression Equations
1.6	Standard Error of Estimates
1.7	Comparison between Correlation and Regression
1.8	Limitations of Regression Analysis
1.9	Summary
1.10	Glossary
1.11	Self Assessment Exercise
1.12	Further Readings
1.13	Solution To Check Your Progress

1.1 INTRODUCTION

In general, the word regression means moving backward or returning to the average value. The concept was introduced by Sir Francis Galton. In statistics, regression is a widely used data analysis technique: it estimates or predicts the unknown value of one variable from the known values of other variables.

To survive in a dynamic environment, every organization has to predict future events. For instance, a government requires estimates of population, production, consumption, prices and much more to devise policies for the efficient functioning of the economy. Similarly, business houses use this tool to estimate the demand for their products as a function of price, to analyze sales trends over time, and so on. These estimates help the organization to remain competitive. Banks require estimates of the demand for various types of loans, and of government policies, in order to adjust interest rates.

Estimates are never perfectly accurate; therefore the standard error of estimate is used to measure their reliability. Regression analysis can be used only if there is some kind of relationship, not necessarily a cause-and-effect relationship, between the variables under study.

1.2 OBJECTIVES

This lesson aims to enable students to:
•	Define the concept of regression analysis
•	Differentiate between correlation and regression
•	Formulate the basic techniques of estimating the unknown values of a variable for making decisions

1.3 IMPORTANT CONCEPTS

1.3.1 Regression: Regression measures the average relationship between two or more variables in terms of the original units of the data. It is used to estimate the unknown value of the dependent variable from the given values of the independent variables.

1.3.2 Dependent variable: The variable whose values are predicted on the basis of the values of another variable. It is also called the explained variable. If the price of wheat is used to determine the quantity of wheat demanded, then the quantity of wheat is the dependent variable and the price is the independent variable.

1.3.3 Independent variable: The variable that is used to predict the values of another variable. It is also called the explanatory variable.

1.3.4 Simple linear regression: Only one independent variable is used to estimate the value of the dependent variable. For instance, a study of the effect of rainfall on the productivity of food grains.

1.3.5 Multiple linear regression: Two or more independent variables are used to estimate the value of the dependent variable. For instance, a study of the effect of rainfall and fertilizer use on the productivity of food grains.

1.4 REGRESSION LINES

A regression line gives the best estimate of the values of one variable for any given values of the other variable. The lines are drawn on the principle of least squares, i.e., each line is drawn through the plotted points in such a manner that the sum of the squares of the deviations of the actual values from the computed values is the least. In simple linear regression there are only two variables, X and Y; therefore there are two regression lines.

•	Regression line of X on Y: this gives the best estimate of the values of variable X for any given values of variable Y.
•	Regression line of Y on X: this gives the best estimate of the values of variable Y for any given values of variable X.

1.4.1 Features of Regression Lines

•	The sum of the squared deviations from the regression line, i.e., $\sum (Y - Y_c)^2$, is less than the sum of the squared deviations from any other line.
•	The positive and negative deviations cancel out, i.e., $\sum (Y - Y_c) = 0$.
•	The regression lines intersect at the mean values of the variables, i.e., at $(\bar{X}, \bar{Y})$.

Here $Y$ denotes the actual values of variable Y and $Y_c$ the computed values of variable Y.
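These features can be checked numerically. The sketch below is only an illustration, not part of the original lesson: it fits both regression lines by least squares to the five data points of Illustration 1 (given later in this lesson) and tests each feature. The helper names `fit_line` and `sum_sq_dev` are hypothetical.

```python
def fit_line(u, v):
    """Least-squares line v = a + b*u; returns (a, b)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    b = s_uv / s_uu
    a = mean_v - b * mean_u
    return a, b


def sum_sq_dev(a, b, u, v):
    """Sum of squared deviations of v from the line a + b*u."""
    return sum((vi - (a + b * ui)) ** 2 for ui, vi in zip(u, v))


X = [1, 2, 3, 4, 5]
Y = [6, 8, 5, 4, 7]
mean_x = sum(X) / len(X)
mean_y = sum(Y) / len(Y)

a_yx, b_yx = fit_line(X, Y)   # regression line of Y on X
a_xy, b_xy = fit_line(Y, X)   # regression line of X on Y

# Feature 1: any other line has a larger sum of squared deviations.
best = sum_sq_dev(a_yx, b_yx, X, Y)
shifted = sum_sq_dev(a_yx + 0.5, b_yx, X, Y)   # same slope, moved up
tilted = sum_sq_dev(a_yx, b_yx + 0.1, X, Y)    # same intercept, new slope

# Feature 2: positive and negative deviations cancel out.
total_dev = sum(y - (a_yx + b_yx * x) for x, y in zip(X, Y))

# Feature 3: both lines pass through the point of means.
y_at_mean_x = a_yx + b_yx * mean_x
x_at_mean_y = a_xy + b_xy * mean_y

print(best < shifted and best < tilted)
print(abs(total_dev) < 1e-9)
print(abs(y_at_mean_x - mean_y) < 1e-9, abs(x_at_mean_y - mean_x) < 1e-9)
```

Comparing against only two alternative lines does not prove the least-squares property; it merely shows the behaviour on a couple of nearby lines.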

1.4.2 Coefficient of Correlation and Regression Lines

•	When the coefficient of correlation is zero, i.e., $r = 0$, the two regression lines are perpendicular to each other.
•	When the coefficient of correlation is perfectly positive or negative, i.e., $r = \pm 1$, the two regression lines coincide, i.e., there is only one line.
•	When there is a high degree of correlation, the two regression lines are closer to each other.
•	When there is a low degree of correlation, the two regression lines are farther away from each other.

CHECK YOUR PROGRESS

Activity A          There will be only one regression line in case of two variables if:
1.	r = 0
2.	r = +1
3.	r = -1
4.	none of the above

Activity B           When the coefficient of correlation is zero, i.e., r = 0, the regression lines cut each other making an angle of:
1.	
2.	90°, i.e., parallel to the OX and OY axes
3.	60°
4.	none of the above

1.5 REGRESSION EQUATIONS

Regression equations are the mathematical expressions of the regression lines. Each regression equation has two constants, "a" and "b", also called the parameters of the line. These parameters determine the position of the line; a change in the value of either "a" or "b", or both, gives a new line. In simple linear regression there are two regression equations.

•	Regression of X on Y: $X = a + bY$

Here, a = X intercept. It gives the value of the dependent variable (X) when the value of the independent variable (Y) is zero.

b = slope of the line. It gives the change in the value of the dependent variable (X) for a unit change in the value of the independent variable (Y). It is also called the regression coefficient of X on Y, i.e., $b_{xy}$.

•	Regression of Y on X: $Y = a + bX$

Here, a = Y intercept. It gives the value of the dependent variable (Y) when the value of the independent variable (X) is zero.

b = slope of the line. It gives the change in the value of the dependent variable (Y) for a unit change in the value of the independent variable (X). It is also called the regression coefficient of Y on X, i.e., $b_{yx}$.

1.5.1 Properties of Regression Coefficients

1.	Both regression coefficients have the same sign, i.e., they are either both positive or both negative.
2.	The coefficient of correlation is the geometric mean of the two regression coefficients, i.e., $r = \pm\sqrt{b_{xy} \cdot b_{yx}}$.
3.	Both regression coefficients cannot be greater than one in magnitude at the same time, because their product equals $r^2$ and the coefficient of correlation cannot exceed one in magnitude.
4.	The coefficient of correlation has the same sign as the regression coefficients. That is, if the regression coefficients are positive, the coefficient of correlation is also positive; if they are negative, it is negative.
5.	Regression coefficients are independent of origin but not of scale. This means that if a constant is added to or subtracted from each value of X and Y, the regression coefficients do not change. But if each value of X and Y is multiplied by a constant, the values of the regression coefficients change.

CHECK YOUR PROGRESS

Activity C          The values of   and of ,  calculate coefficient of correlation (r).

Activity D          The values of     and  , calculate coefficient of correlation (r).

Activity E         Do you agree? Comment. is possible
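The properties of regression coefficients listed in this section can be verified with a short computation. The following is a minimal sketch on the five data points of Illustration 1; the function name `coeffs` and the particular shift and scale constants are assumptions made for the example only.

```python
import math


def coeffs(X, Y):
    """Return (b_xy, b_yx, r) computed from deviations about the means."""
    n = len(X)
    mean_x = sum(X) / n
    mean_y = sum(Y) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
    s_xx = sum((x - mean_x) ** 2 for x in X)
    s_yy = sum((y - mean_y) ** 2 for y in Y)
    b_xy = s_xy / s_yy            # regression coefficient of X on Y
    b_yx = s_xy / s_xx            # regression coefficient of Y on X
    r = s_xy / math.sqrt(s_xx * s_yy)
    return b_xy, b_yx, r


X = [1, 2, 3, 4, 5]
Y = [6, 8, 5, 4, 7]
b_xy, b_yx, r = coeffs(X, Y)

# Properties 1, 2 and 4: same sign, and r is the signed geometric mean.
same_sign = (b_xy > 0) == (b_yx > 0)
r_from_b = math.copysign(math.sqrt(b_xy * b_yx), b_xy)

# Property 5, change of origin: add 10 to every X, subtract 3 from every Y.
shifted = coeffs([x + 10 for x in X], [y - 3 for y in Y])

# Property 5, change of scale: multiply every X by 2 and every Y by 5.
scaled = coeffs([2 * x for x in X], [5 * y for y in Y])

print(same_sign, abs(r - r_from_b) < 1e-12)
print(shifted)   # coefficients and r unchanged by the shift
print(scaled)    # b_xy and b_yx change, r does not
```

Under the scale change, $b_{xy}$ is multiplied by 2/5 and $b_{yx}$ by 5/2, while $r$ stays the same.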

1.5.2 Methods of Calculating Regression Equations

1.	Normal equations method
2.	Direct method
3.	Deviation from actual mean
4.	Deviation from assumed mean
5.	In case of frequency distribution table

Illustration 1
From the following data:

X	1	2	3	4	5
Y	6	8	5	4	7

Find out:
1.	Regression of X on Y
2.	Regression of Y on X
3.	Predict the value of X when Y = 20
4.	Predict the value of Y when X = 15
5.	Calculate the coefficient of correlation

Normal Equations Method

X	Y	$X^2$	$Y^2$	XY
1	6	1	36	6
2	8	4	64	16
3	5	9	25	15
4	4	16	16	16
5	7	25	49	35
$\sum X = 15$	$\sum Y = 30$	$\sum X^2 = 55$	$\sum Y^2 = 190$	$\sum XY = 88$

1.	Regression of X on Y: $X = a + bY$

To find the values of a and b, solve the two normal equations simultaneously:

$\sum X = Na + b\sum Y$   (1)
$\sum XY = a\sum Y + b\sum Y^2$   (2)

Putting the values in (1) and (2):

15 = 5a + 30b   (3)
88 = 30a + 190b   (4)

Multiply equation (3) by 6 and subtract equation (4) from the result:

90 = 30a + 180b   (5)
88 = 30a + 190b   (6)
2 = -10b
b = -0.2

Putting this value of b in (3), we get

15 = 5a + 30(-0.2)
15 = 5a - 6
a = 4.2

Putting these values of a and b in the equation, we get the regression of X on Y as

X = 4.2 - 0.2Y

2.	Regression of Y on X: $Y = a + bX$

To find the values of a and b, solve the two normal equations simultaneously:

$\sum Y = Na + b\sum X$   (1)
$\sum XY = a\sum X + b\sum X^2$   (2)

Putting the values in (1) and (2):

30 = 5a + 15b   (3)
88 = 15a + 55b   (4)

Multiply equation (3) by 3 and subtract equation (4) from the result:

90 = 15a + 45b   (5)
88 = 15a + 55b   (6)
2 = -10b
b = -0.2

Putting this value of b in (3), we get

30 = 5a + 15(-0.2)
30 = 5a - 3
a = 6.6

Putting these values of a and b in the equation, we get the regression of Y on X as

Y = 6.6 - 0.2X

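3.	Predict the value of X when Y = 20

Regression of X on Y: X = 4.2 - 0.2Y
X = 4.2 - 0.2(20)
X = 0.2

4.	Predict the value of Y when X = 15

Regression of Y on X: Y = 6.6 - 0.2X
Y = 6.6 - 0.2(15)
Y = 3.6

5.	Coefficient of correlation

$r = \pm\sqrt{b_{xy} \cdot b_{yx}}$
$r = -\sqrt{(-0.2)(-0.2)}$
$r = -0.2$

The negative sign is taken because both regression coefficients are negative.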
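The normal equations method can also be sketched in code. The example below is an illustration under stated assumptions rather than a prescribed implementation: the hypothetical helper `solve_normal_equations` builds the two normal equations from the sums of Illustration 1 and solves the 2×2 system with Cramer's rule instead of the elimination used in the hand calculation.

```python
def solve_normal_equations(u, v):
    """Fit v = a + b*u by solving the two normal equations

        sum(v)   = n*a      + b*sum(u)
        sum(u*v) = a*sum(u) + b*sum(u^2)

    with Cramer's rule. Returns (a, b).
    """
    n = len(u)
    s_u = sum(u)
    s_v = sum(v)
    s_uu = sum(ui * ui for ui in u)
    s_uv = sum(ui * vi for ui, vi in zip(u, v))
    det = n * s_uu - s_u * s_u          # determinant of the 2x2 system
    a = (s_v * s_uu - s_u * s_uv) / det
    b = (n * s_uv - s_u * s_v) / det
    return a, b


X = [1, 2, 3, 4, 5]
Y = [6, 8, 5, 4, 7]

a_xy, b_xy = solve_normal_equations(Y, X)   # X on Y: X = a + bY
a_yx, b_yx = solve_normal_equations(X, Y)   # Y on X: Y = a + bX

x_when_y_20 = a_xy + b_xy * 20   # about 0.2
y_when_x_15 = a_yx + b_yx * 15   # about 3.6

print(f"X = {a_xy:.1f} + ({b_xy:.1f})Y")
print(f"Y = {a_yx:.1f} + ({b_yx:.1f})X")
print(round(x_when_y_20, 4), round(y_when_x_15, 4))
```

Both predictions lie far outside the observed range of the data (Y from 4 to 8, X from 1 to 5), so they should be read as arithmetic practice rather than reliable estimates.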

Direct Method

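X	Y	$X^2$	$Y^2$	XY
1	6	1	36	6
2	8	4	64	16
3	5	9	25	15
4	4	16	16	16
5	7	25	49	35
$\sum X = 15$	$\sum Y = 30$	$\sum X^2 = 55$	$\sum Y^2 = 190$	$\sum XY = 88$

$\bar{X} = \frac{\sum X}{N} = \frac{15}{5} = 3$ ; $\bar{Y} = \frac{\sum Y}{N} = \frac{30}{5} = 6$

1.	Regression of X on Y: $(X - \bar{X}) = b_{xy}(Y - \bar{Y})$

Regression coefficient of X on Y:

$b_{xy} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - (\sum Y)^2}$ or $b_{xy} = \frac{\sum XY - N\bar{X}\bar{Y}}{\sum Y^2 - N\bar{Y}^2}$

Putting in the values, we get

$b_{xy} = \frac{5(88) - (15)(30)}{5(190) - (30)^2} = \frac{-10}{50} = -0.2$

Put the value of $b_{xy}$ in the regression equation of X on Y:

X - 3 = -0.2(Y - 6)
X - 3 = -0.2Y + 1.2
X = 4.2 - 0.2Y

2.	Regression of Y on X: $(Y - \bar{Y}) = b_{yx}(X - \bar{X})$

Regression coefficient of Y on X:

$b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2}$ or $b_{yx} = \frac{\sum XY - N\bar{X}\bar{Y}}{\sum X^2 - N\bar{X}^2}$

Putting in the values, we get

$b_{yx} = \frac{5(88) - (15)(30)}{5(55) - (15)^2} = \frac{-10}{50} = -0.2$

Put the value of $b_{yx}$ in the regression equation of Y on X:

Y - 6 = -0.2(X - 3)
Y - 6 = -0.2X + 0.6
Y = 6.6 - 0.2X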

3.	Predict the value of X when Y = 20

Regression of X on Y: X = 4.2 - 0.2Y
X = 4.2 - 0.2(20)
X = 0.2

4.	Predict the value of Y when X = 15

Regression of Y on X: Y = 6.6 - 0.2X
Y = 6.6 - 0.2(15)
Y = 3.6

5.	Coefficient of correlation

$r = \pm\sqrt{b_{xy} \cdot b_{yx}}$
$r = -\sqrt{(-0.2)(-0.2)}$
$r = -0.2$
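As a rough sketch of the direct method, the code below computes both regression coefficients from the raw sums (no deviations taken) and then writes each regression equation in intercept form through the point of means. The function name `direct_coefficients` is hypothetical, and the raw-sum formula used is the standard one, assumed here because the formulas in this section did not survive in the text.

```python
def direct_coefficients(X, Y):
    """Regression coefficients from raw sums, without taking deviations.

    b_xy = (N*sum(XY) - sum(X)*sum(Y)) / (N*sum(Y^2) - sum(Y)^2)
    b_yx = (N*sum(XY) - sum(X)*sum(Y)) / (N*sum(X^2) - sum(X)^2)
    """
    n = len(X)
    s_x = sum(X)
    s_y = sum(Y)
    s_xy = sum(x * y for x, y in zip(X, Y))
    s_xx = sum(x * x for x in X)
    s_yy = sum(y * y for y in Y)
    numerator = n * s_xy - s_x * s_y
    b_xy = numerator / (n * s_yy - s_y ** 2)
    b_yx = numerator / (n * s_xx - s_x ** 2)
    return b_xy, b_yx


X = [1, 2, 3, 4, 5]
Y = [6, 8, 5, 4, 7]
mean_x = sum(X) / len(X)
mean_y = sum(Y) / len(Y)

b_xy, b_yx = direct_coefficients(X, Y)

# (X - mean_x) = b_xy * (Y - mean_y)  rearranged to  X = a_xy + b_xy * Y
a_xy = mean_x - b_xy * mean_y
# (Y - mean_y) = b_yx * (X - mean_x)  rearranged to  Y = a_yx + b_yx * X
a_yx = mean_y - b_yx * mean_x

print(f"X = {a_xy:.1f} + ({b_xy:.1f})Y")
print(f"Y = {a_yx:.1f} + ({b_yx:.1f})X")
```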

Actual Mean Method

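X	Y	$x = X - \bar{X}$	$y = Y - \bar{Y}$	$x^2$	$y^2$	$xy$
1	6	-2	0	4	0	0
2	8	-1	2	1	4	-2
3	5	0	-1	0	1	0
4	4	1	-2	1	4	-2
5	7	2	1	4	1	2
$\sum X = 15$	$\sum Y = 30$	$\sum x = 0$	$\sum y = 0$	$\sum x^2 = 10$	$\sum y^2 = 10$	$\sum xy = -2$

$\bar{X} = \frac{\sum X}{N} = \frac{15}{5} = 3$ ; $\bar{Y} = \frac{\sum Y}{N} = \frac{30}{5} = 6$

1.	Regression equation of X on Y: $(X - \bar{X}) = b_{xy}(Y - \bar{Y})$

Regression coefficient of X on Y:

$b_{xy} = \frac{\sum xy}{\sum y^2} = \frac{-2}{10} = -0.2$

Put this value in the regression equation of X on Y:

X - 3 = -0.2(Y - 6)
X - 3 = -0.2Y + 1.2
X = 4.2 - 0.2Y

2.	Regression equation of Y on X: $(Y - \bar{Y}) = b_{yx}(X - \bar{X})$

Regression coefficient of Y on X:

$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{-2}{10} = -0.2$

Put this value in the regression equation of Y on X:

Y - 6 = -0.2(X - 3)
Y - 6 = -0.2X + 0.6
Y = 6.6 - 0.2X

3.	Predict the value of X when Y = 20

Regression of X on Y: X = 4.2 - 0.2Y
X = 4.2 - 0.2(20)
X = 0.2

4.	Predict the value of Y when X = 15

Regression of Y on X: Y = 6.6 - 0.2X
Y = 6.6 - 0.2(15)
Y = 3.6

5.	Coefficient of correlation

$r = \pm\sqrt{b_{xy} \cdot b_{yx}}$
$r = -\sqrt{(-0.2)(-0.2)}$
$r = -0.2$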
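The working table of the actual mean method can be rebuilt programmatically. The sketch below is illustrative only: it computes the deviation columns for the data of Illustration 1, prints the table, and derives both regression coefficients as ratios of the column totals. The variable names are assumptions made for the example.

```python
X = [1, 2, 3, 4, 5]
Y = [6, 8, 5, 4, 7]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n

# Deviations of each observation from its actual mean.
x_dev = [xi - mean_x for xi in X]
y_dev = [yi - mean_y for yi in Y]

# Column totals of the working table.
sum_x2 = sum(d * d for d in x_dev)
sum_y2 = sum(d * d for d in y_dev)
sum_xy = sum(dx * dy for dx, dy in zip(x_dev, y_dev))

b_xy = sum_xy / sum_y2   # regression coefficient of X on Y
b_yx = sum_xy / sum_x2   # regression coefficient of Y on X

print("X\tY\tx\ty\tx^2\ty^2\txy")
for xi, yi, dx, dy in zip(X, Y, x_dev, y_dev):
    print(f"{xi}\t{yi}\t{dx:g}\t{dy:g}\t{dx * dx:g}\t{dy * dy:g}\t{dx * dy:g}")
print(f"totals:\t\t\t\t{sum_x2:g}\t{sum_y2:g}\t{sum_xy:g}")
print(f"b_xy = {b_xy:.1f}, b_yx = {b_yx:.1f}")
```

Because the deviations are taken from the true means, their column totals are zero and the coefficient formulas reduce to simple ratios of sums.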

Assumed Mean Method

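X	$d_x = X - A$	$d_x^2$	Y	$d_y = Y - A$	$d_y^2$	$d_x d_y$
1	-3	9	6	1	1	-3
2	-2	4	8	3	9	-6
3	-1	1	5	0	0	0
4	0	0	4	-1	1	0
5	1	1	7	2	4	2
$\sum X = 15$	$\sum d_x = -5$	$\sum d_x^2 = 15$	$\sum Y = 30$	$\sum d_y = 5$	$\sum d_y^2 = 15$	$\sum d_x d_y = -7$

A = assumed mean; here A = 4 for X and A = 5 for Y.

$\bar{X} = \frac{\sum X}{N} = \frac{15}{5} = 3$ ; $\bar{Y} = \frac{\sum Y}{N} = \frac{30}{5} = 6$

1.	Regression equation of X on Y: $(X - \bar{X}) = b_{xy}(Y - \bar{Y})$

Regression coefficient of X on Y:

$b_{xy} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - (\sum d_y)^2} = \frac{5(-7) - (-5)(5)}{5(15) - (5)^2} = \frac{-10}{50} = -0.2$

Put this value of $b_{xy}$ in the regression equation of X on Y:

X - 3 = -0.2(Y - 6)
X - 3 = -0.2Y + 1.2
X = 4.2 - 0.2Y

2.	Regression equation of Y on X: $(Y - \bar{Y}) = b_{yx}(X - \bar{X})$

Regression coefficient of Y on X:

$b_{yx} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - (\sum d_x)^2} = \frac{5(-7) - (-5)(5)}{5(15) - (-5)^2} = \frac{-10}{50} = -0.2$

Put this value of $b_{yx}$ in the regression equation of Y on X:

Y - 6 = -0.2(X - 3)
Y - 6 = -0.2X + 0.6
Y = 6.6 - 0.2X

3.	Predict the value of X when Y = 20

Regression of X on Y: X = 4.2 - 0.2Y
X = 4.2 - 0.2(20)
X = 0.2

4.	Predict the value of Y when X = 15

Regression of Y on X: Y = 6.6 - 0.2X
Y = 6.6 - 0.2(15)
Y = 3.6

5.	Coefficient of correlation

$r = \pm\sqrt{b_{xy} \cdot b_{yx}}$
$r = -\sqrt{(-0.2)(-0.2)}$
$r = -0.2$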
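A useful check on the assumed mean method is that the answer does not depend on which assumed means are chosen. The sketch below is a minimal illustration: the hypothetical function `assumed_mean_coefficients` takes the assumed means as parameters and is called first with 4 and 5 (the values implied by the table above) and then with other arbitrary choices.

```python
def assumed_mean_coefficients(X, Y, a_x, a_y):
    """Regression coefficients from deviations about assumed means.

    dx = X - a_x and dy = Y - a_y; the correction terms sum(dx)*sum(dy),
    sum(dx)^2 and sum(dy)^2 compensate for a_x, a_y not being true means.
    Returns (b_xy, b_yx).
    """
    n = len(X)
    dx = [x - a_x for x in X]
    dy = [y - a_y for y in Y]
    s_dx = sum(dx)
    s_dy = sum(dy)
    s_dxdy = sum(p * q for p, q in zip(dx, dy))
    s_dx2 = sum(p * p for p in dx)
    s_dy2 = sum(q * q for q in dy)
    numerator = n * s_dxdy - s_dx * s_dy
    b_xy = numerator / (n * s_dy2 - s_dy ** 2)
    b_yx = numerator / (n * s_dx2 - s_dx ** 2)
    return b_xy, b_yx


X = [1, 2, 3, 4, 5]
Y = [6, 8, 5, 4, 7]

from_table = assumed_mean_coefficients(X, Y, 4, 5)   # assumed means of the table
from_zero = assumed_mean_coefficients(X, Y, 0, 0)    # reduces to the direct method
from_other = assumed_mean_coefficients(X, Y, 2, 7)   # another arbitrary choice

print(from_table, from_zero, from_other)
```

With both assumed means set to zero, the deviations equal the raw values and the formula coincides with the one used in the direct method.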

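Regression Equation in case of frequency distribution table:

•	Regression equation of X on Y: $(X - \bar{X}) = b_{xy}(Y - \bar{Y})$, where

$b_{xy} = \frac{N\sum f d_x d_y - \sum f d_x \sum f d_y}{N\sum f d_y^2 - (\sum f d_y)^2} \times \frac{i_x}{i_y}$

$i_x$ = class interval of variable X, $i_y$ = class interval of variable Y

•	Regression equation of Y on X: $(Y - \bar{Y}) = b_{yx}(X - \bar{X})$, where

$b_{yx} = \frac{N\sum f d_x d_y - \sum f d_x \sum f d_y}{N\sum f d_x^2 - (\sum f d_x)^2} \times \frac{i_y}{i_x}$

$i_x$ = class interval of variable X, $i_y$ = class interval of variable Y

Here $d_x$ and $d_y$ are the step deviations of the class mid-points from the assumed means, $f$ is the frequency and $N = \sum f$.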

CHECK YOUR PROGRESS

Activity  F         Regression coefficient of X on Y is denoted by: (1)   ,   (2)     ,  (3)  r  ,  (4)  none of them

1.6 STANDARD ERROR OF ESTIMATE

The standard error of estimate measures the accuracy of the estimated values. The estimated values of the dependent variable may differ from its observed values. The main reason for such error is that the variation in the dependent variable may not be due only to variation in the independent variables; it may also be due to a number of other factors. For example, the production of rice depends not only on the amount of rainfall but also on other factors such as the quality of seeds used, the amount and quality of fertilizer used, and many more. The standard error of estimate shows how good and representative the regression line is as a measure of the relationship between the two variables. It measures the scatter of the observations around the regression line. The concept is similar to the standard deviation, which measures the dispersion of observations about the mean of the distribution.

Interpretations:

1.	If the value of the standard error of estimate is zero, the dots lie on the regression line and estimates based on the regression line are perfect.

2.	If the value of the standard error of estimate is small, the dots are close to the regression line and estimates based on the regression line are good.

3.	If the value of the standard error of estimate is large, the dots are far from the regression line and estimates based on the regression line are poor.

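•	Standard error of estimate of X: $S_{xy} = \sigma_x\sqrt{1 - r^2}$, where $\sigma_x$ = standard deviation of X

•	Standard error of estimate of Y: $S_{yx} = \sigma_y\sqrt{1 - r^2}$, where $\sigma_y$ = standard deviation of Y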
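To make the standard error of estimate concrete, the sketch below computes it for the data of Illustration 1 in two ways. It assumes the common textbook form $S_{yx} = \sigma_y\sqrt{1 - r^2}$ with standard deviations computed using divisor N (an assumption; some texts divide by N - 2 instead), and cross-checks it against the root mean square of the deviations from the regression line of Y on X.

```python
import math

X = [1, 2, 3, 4, 5]
Y = [6, 8, 5, 4, 7]
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n
s_xx = sum((x - mean_x) ** 2 for x in X)
s_yy = sum((y - mean_y) ** 2 for y in Y)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

r = s_xy / math.sqrt(s_xx * s_yy)

# Standard deviations with divisor N (assumed convention).
sigma_x = math.sqrt(s_xx / n)
sigma_y = math.sqrt(s_yy / n)

# Standard errors of estimate from the formula sigma * sqrt(1 - r^2).
se_x = sigma_x * math.sqrt(1 - r ** 2)
se_y = sigma_y * math.sqrt(1 - r ** 2)

# Cross-check for Y: scatter of the observed Y about the line of Y on X.
b_yx = s_xy / s_xx
a_yx = mean_y - b_yx * mean_x
se_y_from_line = math.sqrt(
    sum((y - (a_yx + b_yx * x)) ** 2 for x, y in zip(X, Y)) / n
)

print(round(r, 4), round(se_x, 4), round(se_y, 4), round(se_y_from_line, 4))
```

Here $r$ is only -0.2, so the standard error is almost as large as the standard deviation itself: the regression line explains very little of the variation in this data.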

CHECK YOUR PROGRESS

Activity G     From the following data compute standard error of estimate of  X  ,     ,

1.7 COMPARISON BETWEEN CORRELATION AND REGRESSION

1.	Correlation measures the degree and direction of the relationship between the variables. Regression measures the nature and extent of the relationship between the variables.
2.	Correlation is a relative measure; hence the correlation coefficient is independent of the units of measurement. Regression is an absolute measure; therefore the regression coefficient is not independent of the units of measurement.
3.	The coefficients of correlation satisfy $r_{xy} = r_{yx}$, i.e., they are symmetrical. The regression coefficients satisfy $b_{xy} \neq b_{yx}$ in general, i.e., they are not symmetrical.
4.	The correlation coefficient is independent of change of origin and change of scale. Regression coefficients are independent of change of origin but not of scale.
5.	There may be nonsense correlation between the variables, but this is not so in the case of regression.
6.	Correlation cannot be used for prediction. Regression is a tool for making predictions.

1.8 LIMITATIONS OF REGRESSION ANALYSIS

1.	Specified limited range: A regression relationship based on limited data cannot be used beyond the range of that data. For example, there is a close relationship between the productivity of food grains and the amount of fertilizer used. However, indiscriminate use of fertilizer may destroy the food grains.

2.	Cause and effect relationship: Regression analysis cannot identify a cause-and-effect relationship between the variables. It simply assumes that one variable is independent and the other is dependent. For example, there is a relation between the volume of sales and the amount spent on advertisement, but it is not clear which is the cause and which is the effect.

3.	Historical data used to estimate future events: A regression equation is calculated on the basis of past trends. Over a period of time conditions may change, and the assumptions on which the equation is based may fail.

1.9 SUMMARY

Forecasting refers to the process of predicting future events. As organizations grow in size and competition increases, their problems become more complicated. Therefore, to be successful, every organization should be able to analyze data to identify the key factors for decision making. Regression analysis is one of the techniques for making predictions. It should be used carefully, with awareness of the prevailing socio-economic and business conditions. Common sense and good judgement help in making accurate forecasts.

1.10 GLOSSARY

•	Correlation: It measures the degree and direction of the relationship between the variables.
•	Standard Deviation: It measures the dispersion of observed values around the mean.

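LIST OF FORMULAE

•	Regression equation of X on Y: $(X - \bar{X}) = b_{xy}(Y - \bar{Y})$

Regression coefficient of X on Y:

1.	When deviations are taken from actual mean: $b_{xy} = \frac{\sum xy}{\sum y^2}$
2.	When deviations are taken from assumed mean: $b_{xy} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - (\sum d_y)^2}$
3.	When no deviations are taken: $b_{xy} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - (\sum Y)^2}$
4.	In case of frequency distribution table: $b_{xy} = \frac{N\sum f d_x d_y - \sum f d_x \sum f d_y}{N\sum f d_y^2 - (\sum f d_y)^2} \times \frac{i_x}{i_y}$

•	Regression equation of Y on X: $(Y - \bar{Y}) = b_{yx}(X - \bar{X})$

Regression coefficient of Y on X:

1.	When deviations are taken from actual mean: $b_{yx} = \frac{\sum xy}{\sum x^2}$
2.	When deviations are taken from assumed mean: $b_{yx} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - (\sum d_x)^2}$
3.	When no deviations are taken: $b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2}$
4.	In case of frequency distribution table: $b_{yx} = \frac{N\sum f d_x d_y - \sum f d_x \sum f d_y}{N\sum f d_x^2 - (\sum f d_x)^2} \times \frac{i_y}{i_x}$

•	Coefficient of correlation: $r = \pm\sqrt{b_{xy} \cdot b_{yx}}$

•	Standard error of estimate of X: $S_{xy} = \sigma_x\sqrt{1 - r^2}$

•	Standard error of estimate of Y: $S_{yx} = \sigma_y\sqrt{1 - r^2}$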

1.11 SELF ASSESSMENT EXERCISE

1.	What are regression coefficients? State the properties of regression coefficients.

2.	The regression equations of X on Y and Y on X are irreversible. Explain.

3.	Distinguish between correlation and regression.

4.	Explain the concept of standard error of estimate. What is the standard error of estimating Y from X if r = 1?

5.	From the following data calculate (1) the two regression coefficients, (2) the coefficient of correlation and (3) the two regression equations: N = 10,   ,  ,   ,   ,

1.12 FURTHER READINGS

•	Gupta, S.P., and Archana Gupta, Statistical Methods, Sultan Chand & Sons, latest edition.
•	Jhunjhunwala, Bharat, Business Statistics, S. Chand & Company Ltd., first edition, 2008.
•	http://www.nlreg.com
•	http://learning.mazoo.net
•	http://www.statitically significant consulting.com
•	http://www.wikipedia