welcome to this lecture. you may kindly recall that in the earlier lecture we had discussed the analysis of variance in a multiple linear regression model. this is a test of the null hypothesis that all the regression coefficients are equal to zero. so through the test of analysis of variance, we are judging

the overall adequacy of the model. now there are two possibilities: the results from this analysis of variance indicate either that h-naught is accepted, or that h-naught is rejected. so in case h-naught is accepted, then there is

no issue; we understand that none of the variables is going to contribute in explaining the variation in y. when h-naught is rejected, this indicates that there is at least one independent variable that is explaining the variation in the values of y, and possibly more than one explanatory variable is helping in explaining the variation in y. so now the question is this: we would like to identify those variables which are

contributing and those variables which are not contributing. so in order to do it, we

have to proceed one by one. one by one means we have to go stepwise, and we need to test the significance of the regression coefficients one by one. so now we try to discuss here the test of hypothesis for individual regression coefficients, under the assumption that they are the ones responsible for the rejection of the null hypothesis. so since

we have considered the regression coefficients beta2, beta3, ..., betak, i would like to develop a test of hypothesis for an individual beta. so let us postulate our null hypothesis, say h-naught: beta j = 0, where j goes from 2 to k. you may also recall that in the case of simple linear regression modelling, we had discussed the construction of the test statistic for testing the significance of the slope parameter, h-naught: beta1 = beta1-naught. so this test of hypothesis in our case is going to be based on similar lines to what we did in the case of the simple linear regression model, but in this case the alternative hypothesis in which we are interested is h1: beta j not equal to 0. so now, in the case of simple linear regression

model, we had two cases, when sigma square is known and when sigma square is unknown.

in this case, we can see that we have estimated sigma square by the sum of squares due to residuals divided by its degrees of freedom. so here we are interested in the case when sigma square is unknown to us and is being estimated from the given sample of data. so under these assumptions, we can construct the test statistic tj, which is beta j hat minus its hypothesised value 0, divided by the standard error of beta j hat, that is the square root of sigma hat square cjj, where cjj is the jth diagonal element of the matrix x transpose x whole inverse. why? if you try to remember, we had obtained that the covariance matrix of beta hat was sigma square x transpose x whole inverse, and this covariance matrix is giving the variances

of beta1 hat, beta2 hat, ..., betak hat on the diagonal terms, and the covariances on the off-diagonal terms. so in order to find out the standard error of beta j hat, we pick up the variance of beta j hat, which is given by sigma square times the jth diagonal element of the matrix x transpose x whole inverse. now this statistic is going to follow a t distribution with n − k degrees of freedom under h-naught, and you can also recall that we are estimating sigma square by ss res divided by its degrees of freedom, n − k. so now in this case, once we have obtained

the data and calculated the statistic tj, the decision rule is: reject h-naught at the alpha level of significance if the absolute value of tj is greater than the critical value t alpha-by-2 with n − k degrees of freedom. so this is actually

a sort of marginal test. marginal test means the following: earlier we had the analysis of variance, which tests the equality of all of beta2, beta3, ..., betak together, and now we are coming to the aspect of testing one regression coefficient at a time. this is why it is called a marginal test: the estimate beta j hat depends on all the other explanatory variables xi (i not equal to j) that are present in the model. so now, using this, we can identify which of the independent variables are explaining the variation in y and which are not. simultaneously, i would also like to discuss the aspect of confidence

interval estimation. so in this case, if we try to find out the confidence interval for beta j, then the 100(1 − alpha) percent confidence interval for beta j, with j going from 2, 3, up to k, is given by the following expression. if you recall, in the case of the simple linear regression model we had obtained the confidence intervals for the slope parameter beta1 and the intercept term beta0, and here also we are going to follow the same philosophy. i can say here that (beta j hat − beta j) divided by the square root of sigma hat square cjj lies between −t alpha-by-2 with n − k degrees of freedom and +t alpha-by-2 with n − k degrees of freedom, and this probability is going to be 1 − alpha. based on this, i can find out the confidence interval as beta j hat − t alpha-by-2 (n − k) times the square root of sigma hat square cjj to beta j hat + t alpha-by-2 (n − k) times the square root of sigma hat square cjj. so this is the 100(1 − alpha) percent confidence interval for an individual beta j. so now here you can see how the theory, concepts and algebra that we have learnt in the case of simple linear regression modelling are helping us in developing the model for the case when we have more than one independent variable.
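the t statistic and the marginal interval above can be sketched in a few lines of code. this is only an illustration, not part of the lecture: the small data set and the names X and y are hypothetical, and the formulas are applied exactly as written above.

```python
import numpy as np
from scipy import stats

# hypothetical data: n observations, a column of ones for the intercept
# plus two explanatory variables
X = np.array([[1, 2.0, 1.0],
              [1, 3.0, 2.0],
              [1, 5.0, 2.5],
              [1, 7.0, 4.0],
              [1, 8.0, 4.5],
              [1, 9.0, 6.0]])
y = np.array([3.1, 4.9, 7.2, 10.1, 11.0, 13.2])
n, k = X.shape          # k counts the intercept as well
alpha = 0.05

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                 # least squares estimate
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k)         # ss res / (n - k)

se = np.sqrt(sigma2_hat * np.diag(XtX_inv))  # sqrt(sigma hat square * c_jj)
t_stat = beta_hat / se                       # t_j for h-naught: beta_j = 0
t_crit = stats.t.ppf(1 - alpha / 2, df=n - k)

for j in range(k):
    lo = beta_hat[j] - t_crit * se[j]
    hi = beta_hat[j] + t_crit * se[j]
    decision = "reject h-naught" if abs(t_stat[j]) > t_crit else "accept h-naught"
    print(f"beta_{j + 1}: t = {t_stat[j]:.2f} ({decision}), "
          f"{100 * (1 - alpha):.0f}% CI = [{lo:.3f}, {hi:.3f}]")
```

each printed interval is the marginal 100(1 − alpha) percent interval for one coefficient at a time, using the t distribution with n − k degrees of freedom.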

now continuing on the aspects of this confidence interval, this is a confidence interval for

an individual regression coefficient. now we can also construct the simultaneous

confidence intervals on the regression coefficients. what do we really mean by simultaneous confidence intervals on the regression coefficients? this is actually a set of confidence intervals that hold simultaneously with probability 1 − alpha, and they are also called joint confidence intervals. so now, in order to construct a joint confidence

interval, we can use the result that (beta hat − beta) transpose x transpose x (beta hat − beta), divided by k times ms res, follows an f distribution with k and n − k degrees of freedom. so i can write down that we would like to find the values of beta such that (beta hat − beta) transpose x transpose x (beta hat − beta), divided by k times ms res, is less than or equal to f alpha with k and n − k degrees of freedom, and the probability of such an event is 1 − alpha. so now the 100(1 − alpha) percent confidence region for all the parameters in beta is the region given by this inequality, and it describes an elliptically shaped

region. you see, when you have only one parameter, we have a confidence interval; when we have two parameters and we want to find their simultaneous confidence interval, that is a region in two dimensions. similarly, when we go to higher dimensions, this confidence interval is transformed into a region. so now, this confidence region is essentially the simultaneous interval for all the parameters beta1, beta2, ..., betak, and it is going to be a sort of elliptically shaped region. well, after this we come to another aspect

and we talk about the coefficient of determination, which is denoted by r square. so now the first question is: what is this coefficient of determination? we have now reached a stage where, using the data on the independent and dependent variables, we have obtained a model by estimating the parameters beta1, beta2, ..., betak and sigma square. this estimation technique can be anything, either least squares estimation or maximum likelihood estimation, and based on that we have obtained the fitted model. now the basic question is: how do we know whether the model which we have got is good or bad, or how do we judge the goodness of fit of this model? the coefficient of determination helps us in determining the goodness of fit of a model. so the first question comes, how should i judge

it? try to recall what we did in the case of the simple linear regression model. we had one independent variable x and one dependent variable y, we had obtained the data, and then we had created a scatter diagram. so suppose i take two situations, with x and y plotted on the same scales in both, and we fit a line in each. now, which of the models is fitted better? obviously, in the first figure the points are lying more closely to the fitted line than in the second figure, where the points are quite far away from the line compared with figure number one. so this gives us the idea that if the points lie close to the line, then our model is better fitted. one simple option to measure this quantity is to find out the correlation coefficient between x and y. so obviously, in the case of figure number one, the correlation coefficient will have a higher value than in figure number two.
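this comparison can be imitated numerically. below is a small sketch, not part of the lecture, with two made-up data sets generated around the same line, one with small scatter and one with large scatter; the seed and all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)

# figure-1-like data: points close to the line y = 2x + 1
y_tight = 2 * x + 1 + rng.normal(scale=0.5, size=x.size)
# figure-2-like data: same line, but points far away from it
y_noisy = 2 * x + 1 + rng.normal(scale=8.0, size=x.size)

r_tight = np.corrcoef(x, y_tight)[0, 1]
r_noisy = np.corrcoef(x, y_noisy)[0, 1]
print(r_tight, r_noisy)   # the tighter scatter gives the higher correlation
```

the tighter scatter yields a correlation close to one, while the noisy data give a visibly smaller value, matching the two figures described above.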

this concept is extended and used to judge the goodness of fit in a multiple linear regression model: we extend the idea of the simple correlation coefficient to the multiple correlation coefficient. so if i define r to be the multiple correlation coefficient between y and x1, x2, ..., xk, then the square of this multiple correlation, denoted r square, is called the coefficient of determination. the utility of r square is this: it describes how well the sample regression line fits the observed data, and in that sense it measures the goodness of fit of the model. it also measures the explanatory power of the model, and reflects the model adequacy in the sense of how much explanatory power the explanatory variables have. so in a simple sense, i would say r square is a measure that gives us an idea whether the model we have obtained is good or bad, and if good, how good, and if bad, how bad. so now we

consider the model yi = beta1 + beta2 xi2 + ... + betak xik + epsilon i, with i going from 1 to n. one very important thing we have to keep in mind is that we are assuming here that the intercept term is present. this r square has the limitation that it assumes the intercept term is present in the model, and the value of r square can be obtained only under this condition. if you do not include the intercept term in the model, then the definition of r square which we are going to consider here will not remain valid. so now, in this case, we try to define r square as 1 minus the sum of squares due to residuals divided by the sum of squares due to total. now if you see

the interpretation of this r square, under what condition would you call a model well fitted? the model is well fitted when the contribution of the random error component in the model is as small as possible, ideally zero. now, r square can be rewritten as (sst − ss res) divided by sst, and you may recall that in the case of analysis of variance we had proved that sst = ss reg + ss res. so if i use this relation over here, r square can be written as ss reg over sst. so now the expression for this r square is simply 1 − ss res / sst, that is, 1 minus epsilon hat transpose epsilon hat divided by the summation over i from 1 to n of (yi − y bar) whole square. now the question comes, how do we

ensure that this definition of r square is giving us what we want? consider the expression r square = 1 − ss res over sst. a model will be good in case the contribution due to random error is as small as possible, and in that case, ideally, the sum of squares due to residuals should be zero. so if the sum of squares due to residuals is 0, then r square = 1, and this is the best fitted model; that would be the ideal condition in which we are all interested. on the other hand, in case the model is not at all well fitted, the contribution of the sum of squares due to regression is 0; that means none of the variables x1, x2, ..., xk is helping us in explaining the variation in the values of y. in that case, when the sum of squares due to regression becomes 0, then sst = ss res, and r square becomes 1 − ss res/sst = 1 − 1 = 0. so this would indicate the poorest fit of the model, or rather, i would say, the worst fitted model. so now we see that the value of r square is

lying between 0 and 1: r square = 0 indicates the poorest fit, the worst fit, and r square = 1 indicates the best fit. now, if i take any other value of r square, say for example 0.95, then this indicates that 95 percent of the variation in y is being explained by the fitted model, that is, by the independent variables x1, x2, ..., xk. this r square has one limitation: its value increases as the number of explanatory variables increases.
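this behaviour can be demonstrated with a few lines of code. the sketch below is illustrative only: the data are simulated and the helper name r_squared is hypothetical. a purely random, irrelevant column is appended to the design matrix, and r square still does not decrease.

```python
import numpy as np

def r_squared(X, y):
    """r square = 1 - ss res / sst for a least squares fit (intercept included in X)."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    ss_res = resid @ resid
    ss_t = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_t

rng = np.random.default_rng(1)
n = 30
x2 = rng.uniform(0, 10, n)
y = 1 + 2 * x2 + rng.normal(scale=2.0, size=n)

X1 = np.column_stack([np.ones(n), x2])          # model with the relevant variable
junk = rng.normal(size=n)                       # irrelevant, useless variable
X2 = np.column_stack([np.ones(n), x2, junk])    # same model plus the useless column

print(r_squared(X1, y), r_squared(X2, y))  # the second value is never smaller
```

least squares guarantees that adding a column can only reduce ss res, so the second printed value is at least as large as the first, even though the added variable is pure noise.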

now suppose somebody fits a model with a certain number of explanatory variables, and then some more variables are added to the model which are not relevant; they are simply useless variables that do not affect the values of y at all. this does not mean that the model gets better by using those irrelevant variables, but if we try to do so, the value of r square will increase, and that would suggest the model is getting better and better. so in order to handle this limitation, we can define a variant of r square which is called the adjusted r square, and this adjusted r square corrects this limitation. so now we discuss the adjusted r square.

the adjusted r square is denoted by r bar square and is defined as 1 minus the sum of squares due to residuals divided by (n − k), over the total sum of squares divided by (n − 1); that is, r bar square = 1 − [ss res/(n − k)] / [sst/(n − 1)]. this can be further simplified as r bar square = 1 − [(n − 1)/(n − k)] (1 − r square).
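the two forms of this formula can be checked against each other in code. this is a small sketch, not from the lecture; the function names and the sum-of-squares values are made up for illustration.

```python
def adjusted_r2_from_ss(ss_res, ss_t, n, k):
    # r bar square = 1 - (ss res / (n - k)) / (sst / (n - 1))
    return 1 - (ss_res / (n - k)) / (ss_t / (n - 1))

def adjusted_r2_from_r2(r2, n, k):
    # equivalent simplified form: 1 - (n - 1)/(n - k) * (1 - r square)
    return 1 - (n - 1) / (n - k) * (1 - r2)

# the two forms agree for arbitrary (made-up) sums of squares
ss_res, ss_t, n, k = 40.0, 100.0, 25, 4
r2 = 1 - ss_res / ss_t
print(adjusted_r2_from_ss(ss_res, ss_t, n, k))
print(adjusted_r2_from_r2(r2, n, k))      # same value

# adjusted r square can even be negative, e.g. k = 3, n = 10, r square = 0.16
print(adjusted_r2_from_r2(0.16, 10, 3))   # 1 - (9/7) * 0.84, about -0.08
```

the last line reproduces the hypothetical example discussed later in the lecture, where the adjusted r square comes out below zero.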

now observe what we are doing here: we divide the sum of squares due to residuals by n − k. if you recall, n − k in the context of the sum of squares due to residuals was the degrees of freedom associated with the distribution of ss res in the analysis of variance. similarly, we divide the total sum of squares sst by n − 1, and this n − 1 is simply the degrees of freedom associated with the

distribution of sst in the context of analysis of variance. so this adjusted r square helps us: it does not automatically increase as the number of independent variables added to the model grows. but, on the other side, the adjusted r square also has certain limitations. the first limitation is that the adjusted r square can be negative. this is difficult to believe, since it looks like a squared quantity, but note that r bar square is a function of n, k and r square; r square itself cannot be negative, yet r bar square can be. for example,

if i try to illustrate this limitation, let me take a hypothetical example with k = 3, n = 10 and r square = 0.16. then r bar square = 1 − (9/7)(0.84) = −0.08, which is less than 0, and obviously such an r bar square has no interpretation. on the other hand, if you look at this

situation from the application point of view, i would argue that in practice such situations are very rare. why? because here we are getting a value of r square of 0.16, which means the fitted model is explaining only 16 percent of the variability using the variables x1, x2, ..., xk. this already indicates that the linearity of the model is questionable, and in this situation i would not use a multiple linear regression model at all, but rather try to fit some other model. so now, after this, let me try to explain

what the limitations of r square are. r square is a very popular goodness of fit statistic among users in the experimental sciences, but it also has some serious limitations. the first limitation we have already discussed: if the constant term, the intercept term, is absent from the model, then r square is not defined. this can also be shown mathematically, but i am skipping the proof here. in case someone considers a model without the intercept term and still tries to find the value of r square, there is a risk that the r square value comes out negative. the next question comes: if

someone is trying to fit a model without the intercept term, then how can the goodness of fit of the model be judged? well, there is no unique answer; in the literature some ad hoc measures have been defined, but there is definitely no guarantee that those ad hoc measures will give us a good statistical outcome. anyway, this is a limitation of r square and this is how it has to be used. the second limitation of r square is that r square is sensitive to extreme values; in simple words, r square lacks robustness. now coming to the issue from the application

point of view: we know that when we are going to fit a model to a given set of data, we first need to make sure that there are no extreme values in the data, and in case they are present, there are other ways to handle them. so really, this condition should not occur in practice if somebody is carefully building the model; but certainly, if somebody ignores this aspect, then r square will lack robustness. and earlier, we had also discussed the third limitation, that r square increases as the number of explanatory variables increases.

now, let me come to a different type of situation, where we are interested in comparing two different fitted models. suppose model number 1 is yi = beta1 + beta2 xi2 + ... + betak xik + epsilon i. the second model is fitted using the same data, but after a log transformation: the log of every value of the response variable is taken and then the model is fitted. in this case, i denote the regression coefficients by gamma1, gamma2, ..., gammak in place of beta1, beta2, ..., betak, so the model can be written as log yi = gamma1 + gamma2 xi2 + ... + gammak xik + epsilon i, with some random error component epsilon i. in this case, if we try to define the r square

then for model number one the r square, say r1 square, is defined as 1 minus the summation over i from 1 to n of (yi − yi hat) whole square, divided by the summation over i from 1 to n of (yi − y bar) whole square. now we have to define the r square for model number two, and there can be various possibilities. one simple option, and i am not saying this is the only option, is r2 square = 1 minus the summation over i from 1 to n of (log yi − log yi hat) whole square, divided by the summation over i from 1 to n of (log yi − log y bar) whole square. here someone may argue that instead of taking the log of y bar we should take the arithmetic mean of the log yi's, first applying the transformation and then finding the sample mean; but it is not my objective here to discuss those details. what is clear from r1 square and r2 square is that these two values are not comparable. now the issue is quite simple:

two different persons, or the same person, have obtained two different models, and we want to know which of model number one and model number two is the better fitted model; this cannot be decided using the definition of r square. i am not at all saying that r square is a bad measure; r square is a very good measure of goodness of fit, with some nice properties, but it has its limitations. so i would suggest: use r square, but be careful; handle with care. this r square has also got a relationship

with the f statistic that we had obtained in the case of analysis of variance. let us see how. if you recall, in the case of analysis of variance we had considered the model yi = beta1 + beta2 xi2 + ... + betak xik + epsilon i, and at that time i had also told you that we are going to assume the presence of the intercept term in the model, because we would use it later on to establish the relationship between r square and this f statistic. so now this is the situation where we are going to use it. if you remember, our null hypothesis in the case of analysis of variance was h-naught: beta2 = beta3 = ... = betak = 0, and based on that we had finally obtained the f statistic, which was the mean square due to regression divided by the mean square due to residuals, that is, f = [(n − k)/(k − 1)] (ss reg / ss res). now, using the relationship that the total sum of squares equals the sum of squares due to regression plus the sum of squares due to residuals, i can rewrite the sum of squares due to residuals as sst − ss reg. so f can be written as [(n − k)/(k − 1)] (ss reg / sst) / (1 − ss reg / sst), and this comes out to be nothing but [(n − k)/(k − 1)] r square / (1 − r square). so you can see here that there is a closed

form relationship between the f statistic of analysis of variance and r square; the two are very closely related and have closely related interpretations as well. we will see how. when r square = 0, f also becomes zero, and as r square approaches one, f tends to infinity. so a larger value of r square implies a greater value of f. what is the interpretation of this? i can conclude that if f is highly significant, the test of hypothesis based on this f statistic indicates that at least one of the regression coefficients is significant, and then we can reject h-naught. and when we reject the hypothesis h-naught,

this indicates that the variables x2, x3, ..., xk, taken together, contain at least one relevant variable that is helping in explaining the variation in y. so i can conclude that y is linearly related to x2, x3, ..., xk, and that is what we had also said in the case of analysis of variance: this is a test of overall adequacy. so now one can see that there is a close connection between the f statistic of analysis of variance and this r square, and their interpretations are also related. so when we come to the software issues,

then we will see that the software output consists of the r square value, the adjusted r square value, as well as the analysis of variance table. looking at the output of the software, we try to draw different types of conclusions about the fitted linear regression model. so now we stop here with the topics of the multiple linear regression model; in the next lecture, we will come up with some other issues. till then, good bye.