# Testing of Hypothesis (continued) and Goodness of Fit of the Model

Welcome to this lecture. You may recall that in the earlier lecture we discussed the analysis of variance in a multiple linear regression model. That is a test of the null hypothesis that all the regression coefficients are equal to zero, so through the analysis of variance we judge the overall adequacy of the model. Now there are two possible outcomes: either H0 is accepted or H0 is rejected. If H0 is accepted, there is no issue; we understand that none of the variables contributes to explaining the variation in y. If H0 is rejected, this indicates that at least one independent variable explains the variation in the values of y, and possibly more than one explanatory variable does so. So now the question is this: we would like to identify which variables are contributing and which are not. In order to do that, we have to proceed step by step and test the significance of the regression coefficients one by one. So now we try to discuss the test of
hypothesis for individual regression coefficients. These are the coefficients which, on our assumption, are responsible for the rejection of the null hypothesis. Since we have considered the regression coefficients beta_2, beta_3, ..., beta_k, I would like to develop a test of hypothesis for an individual beta. So let us postulate our null hypothesis as H0: beta_j = 0, where j goes from 2 to k. You may also recall that in the case of the simple linear regression model we had discussed the construction of the test statistic for testing the significance of the slope parameter, H0: beta_1 = beta_10. The test of hypothesis in our present case is based on similar lines, but here the alternative hypothesis in which we are interested is H1: beta_j is not equal to 0. In the simple linear regression model we had two cases: sigma^2 known and sigma^2 unknown. In the present case, sigma^2 is estimated by the sum of squares due to residuals divided by its degrees of freedom, so we are interested in the case when sigma^2 is unknown and is estimated from the given sample of data. So under these assumptions, we can construct
the test statistic t_j = (beta_j hat − 0) / SE(beta_j hat), where the standard error of beta_j hat is sqrt(sigma^2 hat C_jj) and C_jj is the j-th diagonal element of the matrix (X'X)^(-1). Why? If you remember, we had obtained that the covariance matrix of beta hat is sigma^2 (X'X)^(-1). This covariance matrix has the variances of beta_1 hat, beta_2 hat, ..., beta_k hat on the diagonal and the covariances on the off-diagonal terms. So in order to find the standard error of beta_j hat, we pick up the variance of beta_j hat, which is given by sigma^2 times the j-th diagonal element of (X'X)^(-1). Under H0, this statistic follows a t distribution with n − k degrees of freedom, since the model has k parameters including the intercept. You may also recall that we estimate sigma^2 by SS_res divided by its degrees of freedom, n − k. So now, once we have obtained
the data and calculated the statistic t_j, the decision rule is: reject H0 at the alpha level of significance if the absolute value of t_j is greater than the critical value t_{alpha/2, n−k}. This is actually a sort of marginal test. Marginal test means the following: earlier we had the analysis of variance, which tests whether beta_2, beta_3, ..., beta_k are all zero together, and now we are testing one regression coefficient at a time. It is called a marginal test also because beta_j hat depends on all the other explanatory variables besides x_j. So now, using this, we can identify which independent variables are explaining the variation in y and which are not. Simultaneously, I would also like to discuss the aspect of confidence
interval estimation. In this case, if we try to find the confidence interval for beta_j, then the 100(1 − alpha) percent confidence interval for beta_j, j = 2, 3, ..., k, is given by the following expression. If you recall, in the case of the simple linear regression model we had obtained the confidence intervals for the slope parameter beta_1 and the intercept term beta_0, and here we follow the same philosophy. The quantity (beta_j hat − beta_j) / sqrt(sigma^2 hat C_jj) lies between −t_{alpha/2, n−k} and t_{alpha/2, n−k} with probability 1 − alpha. Based on this, the confidence interval is

[beta_j hat − t_{alpha/2, n−k} sqrt(sigma^2 hat C_jj), beta_j hat + t_{alpha/2, n−k} sqrt(sigma^2 hat C_jj)].

This is the 100(1 − alpha) percent confidence interval for an individual beta_j. So now you can see how the concepts and algebra that we learnt in the case of simple linear regression modelling are helping us develop the model for a case when we have more than one independent variable.
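The marginal t-test and the individual confidence intervals described above can be sketched numerically. This is a minimal illustration on simulated data, assuming numpy and scipy are available; all variable names and the data are hypothetical, and it uses n − k residual degrees of freedom since the model has k parameters including the intercept:

```python
import numpy as np
from scipy import stats

# Hypothetical simulated data: intercept plus two explanatory variables.
rng = np.random.default_rng(0)
n, k = 30, 3                                   # k parameters including the intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, 0.0])          # third coefficient truly zero
y = X @ beta_true + rng.normal(scale=0.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                   # least-squares estimate
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k)           # SS_res / (n - k) estimates sigma^2

se = np.sqrt(sigma2_hat * np.diag(XtX_inv))    # sqrt(sigma^2_hat * C_jj)
t_stat = beta_hat / se                         # t_j under H0: beta_j = 0
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - k)   # t_{alpha/2, n-k}

reject = np.abs(t_stat) > t_crit               # marginal test at alpha = 0.05
ci_lower = beta_hat - t_crit * se              # 100(1 - alpha)% confidence interval
ci_upper = beta_hat + t_crit * se
```

With a strong true signal, as here, the nonzero coefficients are flagged as significant, while the coefficient whose true value is zero typically is not.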
Now, continuing with confidence intervals: what we have constructed is a confidence interval for an individual regression coefficient. We can also construct simultaneous confidence intervals on the regression coefficients. What do we really mean by simultaneous confidence intervals? This is a set of confidence intervals that hold simultaneously with probability 1 − alpha; they are also called joint confidence intervals. In order to construct a joint confidence region, we can use the result that

(beta hat − beta)' X'X (beta hat − beta) / (k MS_res)

follows an F distribution with k and n − k degrees of freedom, where k counts all the parameters including the intercept. So we would like to find the values of beta for which this quantity is less than or equal to F_alpha(k, n − k), and the probability of such an event is 1 − alpha. The 100(1 − alpha) percent confidence region for all the parameters in beta is therefore the region given by this inequality, and it describes an elliptically shaped region. You see, when we have only one parameter we have a confidence interval; when we have two parameters and want a simultaneous statement, it becomes a region in two dimensions; and similarly, in higher dimensions the confidence interval is transformed into a region. So this is essentially a simultaneous statement for all the parameters beta_1, beta_2, ..., beta_k, and it takes the form of an elliptically shaped region. Well, after this we come to another aspect: the coefficient of determination, which is denoted by R^2.
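Before moving on, the joint confidence region just described can also be sketched in code. This is a hedged sketch on simulated data, not the lecture's own example; treating all k parameters jointly, intercept included, it uses an F distribution with k and n − k degrees of freedom:

```python
import numpy as np
from scipy import stats

# Hypothetical simulated data for the joint (elliptical) confidence region.
rng = np.random.default_rng(1)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
ms_res = np.sum((y - X @ beta_hat) ** 2) / (n - k)

def in_joint_region(beta0, alpha=0.05):
    """True if beta0 lies inside the 100(1-alpha)% elliptical confidence region."""
    d = beta_hat - beta0
    q = d @ (X.T @ X) @ d / (k * ms_res)
    return q <= stats.f.ppf(1 - alpha, k, n - k)

inside = in_joint_region(beta_true)          # True with probability 0.95
far_away = in_joint_region(beta_true + 10)   # a point far outside the ellipse
```

The centre of the ellipse, beta hat itself, is always inside the region, while a point far from the estimate falls outside.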
So now the first question is: what is this coefficient of determination? We have reached a stage where, using the data on the independent and dependent variables, we have obtained a model by estimating the parameters beta_1, beta_2, ..., beta_k and sigma^2. The estimation technique can be anything, either least squares estimation or maximum likelihood estimation, and based on that we have the fitted model. The basic question is: how do we know whether the model we have got is good or bad, that is, how do we judge the goodness of fit of this model? The coefficient of determination helps us in judging the goodness of fit of a model. So first, how should I judge it? Recall what we did in the case of the simple linear regression model. We had one independent variable x and one dependent variable y, we obtained the data, and we created a scatter diagram. Suppose I take two such situations, two plots of y against x in which the scales of x and y are the same, and fit a line in each. Which model is fitted better? In the first figure the points lie close to the fitted line, while in the second figure the points are quite far away from the line. This gives us the idea that if the points lie close to the line, the model is better fitted. One simple option to measure this is the correlation coefficient between x and y: in the first figure the correlation coefficient will have a higher value than in the second. This concept is extended and used to judge the goodness of fit in a multiple linear regression model: we extend the simple correlation coefficient to the multiple correlation coefficient. So if I define R to be the multiple correlation coefficient between y and x_2, ..., x_k, then the square of this multiple correlation, denoted R^2, is called the coefficient of determination. The utility of R^2 is that it describes how well the sample regression fits the observed data, and in that sense it measures the goodness of fit of the model. It also measures the explanatory power of the model and reflects the model adequacy, in the sense of how much explanatory power the explanatory variables have. In simple terms, R^2 gives us an idea whether the model we have obtained is good or bad, and if good, how good, and if bad, how bad. So we
consider the model y_i = beta_1 + beta_2 x_i2 + ... + beta_k x_ik + epsilon_i, where i goes from 1 to n. One very important thing to keep in mind is that we are assuming the intercept term is present. This R^2 has the limitation that it assumes the intercept term is present in the model; the value of R^2 can be obtained only under that condition. If you do not include the intercept term in the model, then the definition of R^2 that we are going to use here is no longer valid. So now we define

R^2 = 1 − SS_res / SST,

that is, one minus the sum of squares due to residuals divided by the total sum of squares. Now consider the interpretation of this R^2: under what condition would you call a model well fitted? The model is well fitted when the contribution of the random error component in the model is as small as possible, ideally zero. R^2 can also be written as (SST − SS_res) / SST, and you may recall that in the analysis of variance we had proved that SST = SS_reg + SS_res. Using this relation, R^2 can be written as SS_reg / SST. So the expression for R^2 is simply

R^2 = 1 − SS_res / SST = 1 − (epsilon hat' epsilon hat) / (sum over i = 1 to n of (y_i − y bar)^2).

Now the question is: how do we ensure that this definition of R^2 gives us what we want? Consider what happens in a good model with R^2 = 1 − SS_res / SST. A model is good when the contribution due to random error is as small as possible; ideally the sum of squares due to residuals should be zero. If SS_res = 0, then R^2 = 1, and this is the best fitted model, the ideal condition in which we are all interested. On the other hand, if the model is not at all well fitted, the contribution of the sum of squares due to regression is zero, meaning none of the variables x_2, ..., x_k helps in explaining the variation in the values of y. In that case SST = SS_res, so R^2 = 1 − 1 = 0, indicating the poorest fit, or rather, the worst fitted model. So we see that the value of R^2 lies between 0 and 1: R^2 = 0 indicates the poorest, worst fit and R^2 = 1 indicates the best fit. If I take any other value of R^2, say 0.95, this indicates that 95 percent of the variation in y is being explained by the fitted model, that is, by the independent variables x_2, ..., x_k. This R^2 has one limitation: its value increases as the number of explanatory variables increases. Suppose somebody fits a model with a certain number of explanatory variables, and then some more variables are added which are not relevant; they are simply useless variables not affecting the values of y at all. This does not mean the model gets better by using those irrelevant variables, but if we do so, the value of R^2 will increase, and that would suggest the model is getting better and better. In order to handle this limitation, we can define a variant of R^2 which is called the adjusted R^2; the adjusted R^2 corrects this limitation. So now we discuss the adjusted R^2.
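Before turning to the adjusted version, the computation of R^2 described above can be sketched numerically. The data below are simulated and numpy is assumed; the decomposition SST = SS_reg + SS_res holds here because the intercept is included:

```python
import numpy as np

# Hypothetical simulated data with an intercept and two explanatory variables.
rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(scale=1.0, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # least-squares fit
y_hat = X @ beta_hat

ss_res = np.sum((y - y_hat) ** 2)              # sum of squares due to residuals
ss_total = np.sum((y - y.mean()) ** 2)         # total sum of squares
ss_reg = np.sum((y_hat - y.mean()) ** 2)       # sum of squares due to regression

r_squared = 1 - ss_res / ss_total              # equals ss_reg / ss_total here
```

With the intercept present, 1 − SS_res/SST and SS_reg/SST agree, and R^2 stays between 0 and 1.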
The adjusted R^2 is denoted by R bar^2 and is defined as

R bar^2 = 1 − (SS_res / (n − k)) / (SST / (n − 1)),

which can be further simplified as

R bar^2 = 1 − ((n − 1) / (n − k)) (1 − R^2).
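Using the simplified form, the adjusted R^2 is one line of code. The numbers below reproduce a hypothetical case, discussed shortly in the lecture, in which the adjusted value comes out negative:

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2: 1 - ((n - 1) / (n - k)) * (1 - R^2)."""
    return 1 - (n - 1) / (n - k) * (1 - r_squared)

# Hypothetical example: k = 3 parameters, n = 10 observations, R^2 = 0.16.
val = adjusted_r_squared(0.16, n=10, k=3)   # 1 - (9/7) * 0.84 = -0.08
```

Unlike R^2 itself, the adjusted version is a function of n and k as well, which is how it can drop below zero.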
Now observe what we are doing here: we divide the sum of squares due to residuals by n − k. If you recall, in the context of the analysis of variance, n − k is the degrees of freedom associated with the distribution of SS_res. Similarly, we divide the total sum of squares SST by n − 1, which is the degrees of freedom associated with the distribution of SST in the analysis of variance. This adjusted R^2 helps us because it does not automatically increase as more independent variables are added to the model. On the other hand, the adjusted R^2 also has certain limitations. The first limitation is that the adjusted R^2 can be negative. This may be difficult to believe, since R^2 itself is a squared quantity and cannot be negative, but the adjusted version is a function of n, k and R^2. For example, to illustrate this limitation, let me take a hypothetical example with k = 3, n = 10 and R^2 = 0.16. Then R bar^2 = 1 − (9/7)(0.84), and this value turns out to be less than 0, namely −0.08. Obviously such an R bar^2 has no interpretation. On the other hand, from the application point of view, I would argue that in practice such situations are rare to occur. If we are getting an R^2 of 0.16, the fitted model is explaining only 16 percent of the variability using the variables x_2, ..., x_k, which already indicates that the linearity of the model is questionable. In this situation I would not like to use the multiple linear regression model at all, but would rather try to fit some other model. So now, after this, let me explain
what are the limitations of R^2. Well, R^2 is a very popular goodness-of-fit statistic among users in the experimental sciences, but it has some serious limitations. The first limitation we have already discussed: if the constant term, the intercept term, is absent from the model, then R^2 is not defined. This can also be shown mathematically, but I am skipping the proof here. If someone considers a model without the intercept term and still tries to compute the value of R^2, there is a risk that the R^2 value can come out negative. The next question is: if someone fits a model without the intercept term, then how can the goodness of fit of the model be judged? Well, there is no unique answer; in the literature some ad hoc measures have been defined, but there is no guarantee that those ad hoc measures will give a good statistical outcome. Anyway, this is a limitation of R^2, and this is how it has to be used. The second limitation of R^2 is that it is sensitive to extreme values; in simple words, R^2 lacks robustness. From the application point of view, when we fit a model to a given set of data, we first need to make sure that there are no extreme values in the data, and if they are present, there are other ways to handle them. So this condition should not arise in practice if somebody is carefully building the model, but if somebody ignores this aspect, then R^2 will lack robustness. And earlier we had also discussed the third limitation, that R^2 increases as the number of explanatory variables increases. Now let me come to a different type of situation, where we are interested in comparing two different fitted models. Suppose model number one is y_i = beta_1 + beta_2 x_i2 + ... + beta_k x_ik + epsilon_i. The second model is fitted
using the same data, but after taking the log transformation: the log of all the values of the response variable is taken, and then the model is fitted. In this case I denote the regression coefficients by gamma_1, gamma_2, ..., gamma_k in place of beta_1, beta_2, ..., beta_k, so the model can be written as log(y_i) = gamma_1 + gamma_2 x_i2 + ... + gamma_k x_ik + epsilon_i, with some random error component epsilon_i. If we define R^2 for model number one, it is

R_1^2 = 1 − (sum over i = 1 to n of (y_i − y_i hat)^2) / (sum over i = 1 to n of (y_i − y bar)^2).

Now we have to define R^2 for model number two, and there can be various possibilities. One simple option, and I am not saying this is the only option, is

R_2^2 = 1 − (sum over i = 1 to n of (log y_i − log y_i hat)^2) / (sum over i = 1 to n of (log y_i − log y bar)^2).

Here someone may argue that instead of taking the log of y bar we should use the arithmetic mean of the log y_i's, that is, first take the transformation and then find the sample mean, but it is not my objective to discuss those details here. What is clear is that the values of R_1^2 and R_2^2 are not comparable. So the issue is very simple: two different persons, or the same person, have obtained two different models, and one wants to know which of model number one and model number two is the better fitted model; this cannot be decided using the definition of R^2. I am not saying that R^2 is a bad measure; R^2 is a very good measure of goodness of fit, but it has some limitations along with its nice properties. So I would suggest: use R^2, but be careful, handle it with care. This R^2 also has a relationship
with the F statistic that we had obtained in the analysis of variance. Let us see how. Recall that in the analysis of variance we had considered the model y_i = beta_1 + beta_2 x_i2 + ... + beta_k x_ik + epsilon_i, and at that time I had also told you that we consider the intercept term present in the model because we would use it later on to establish the relationship between R^2 and this F statistic. So now we are going to use it. If you remember, our null hypothesis in the analysis of variance was H0: beta_2 = beta_3 = ... = beta_k = 0, and based on that we had obtained the F statistic as the mean square due to regression divided by the mean square due to residuals:

F = MS_reg / MS_res = ((n − k) / (k − 1)) (SS_reg / SS_res).

Using the relationship SST = SS_reg + SS_res, I can rewrite SS_res as SST − SS_reg, so this can be written as

F = ((n − k) / (k − 1)) (SS_reg / SST) / (1 − SS_reg / SST),

which comes out to be nothing but

F = ((n − k) / (k − 1)) R^2 / (1 − R^2).

So you can see that there is a closed-form relationship between the F statistic of the analysis of variance and R^2; they are very closely related and have closely related interpretations as well. We will see how. When R^2 = 0, then F also becomes zero, and when R^2 approaches one, F tends to infinity in the limit. This implies that the larger the value of R^2, the greater the value of F. What is the interpretation of this? If F is highly significant, the test of hypothesis based on this F statistic indicates that at least one of the regression coefficients is significant, and we reject H0. When we reject H0, the variables x_2, x_3, ..., x_k are helping in explaining the variation in y, so I can conclude that y is linearly related to x_2, x_3, ..., x_k, and that is what we had also said in the analysis of variance, that this is a test of overall adequacy. So one can see that there is a close connection between the F statistic of the analysis of variance and this R^2, and their interpretations are also related.
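The identity F = ((n − k)/(k − 1)) R^2/(1 − R^2) can be checked numerically. Below is a sketch on simulated data, numpy assumed: the ANOVA F is computed directly from the sums of squares and compared with the value obtained from R^2:

```python
import numpy as np

# Hypothetical simulated data: intercept plus three explanatory variables.
rng = np.random.default_rng(3)
n, k = 40, 4                                   # k parameters including the intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ rng.normal(size=k) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # least-squares fit
y_hat = X @ beta_hat
ss_res = np.sum((y - y_hat) ** 2)
ss_total = np.sum((y - y.mean()) ** 2)
ss_reg = ss_total - ss_res                     # SST = SS_reg + SS_res
r_squared = 1 - ss_res / ss_total

f_anova = (ss_reg / (k - 1)) / (ss_res / (n - k))            # MS_reg / MS_res
f_from_r2 = (n - k) / (k - 1) * r_squared / (1 - r_squared)  # from R^2
```

The two values agree up to floating-point error, which is exactly the closed-form relationship derived above.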
So when we come to the software issues, we will see that the software output consists of the R^2 value, the adjusted R^2 value, as well as the analysis of variance table. Looking at the software output, we try to draw different types of conclusions about the fitted linear regression model. So now we stop here with the topics of the multiple linear regression model, and in the next lecture we will take up some other issues. Till then, good bye.