Calculation of the least squares method. Where the least squares method applies

The method of least squares (ordinary least squares, OLS) is a mathematical method used to solve various problems by minimizing the sum of squared deviations of certain functions from the desired variables. It can be used to "solve" overdetermined systems of equations (when the number of equations exceeds the number of unknowns), to find solutions of ordinary (not overdetermined) nonlinear systems of equations, and to approximate point values by some function. OLS is one of the basic methods of regression analysis for estimating the unknown parameters of regression models from sample data.

Essence of the least squares method

Let x be a set of unknown variables (parameters), and let f_i(x), i = 1, ..., m, be a set of functions of these variables. The task is to choose values of x such that the values of these functions are as close as possible to certain given values y_i. Essentially we are talking about a "solution" of the overdetermined system of equations f_i(x) = y_i in the indicated sense of maximum proximity of the left and right sides of the system. The essence of OLS is to choose as the "measure of proximity" the sum of squared deviations of the left and right sides. Thus, the essence of OLS can be expressed as follows:

S(x) = Σ_i (f_i(x) − y_i)² → min.

If the system of equations has a solution, then the minimum of the sum of squares is zero, and exact solutions of the system can be found analytically or, for example, by various numerical optimization methods. If the system is overdetermined, that is, loosely speaking, the number of independent equations is greater than the number of unknown variables, then the system has no exact solution, and the method of least squares allows one to find some "optimal" vector x in the sense of maximum proximity of the vectors f(x) and y, or maximum proximity of the residual vector e = y − f(x) to zero (proximity understood in the sense of Euclidean distance).

Example: a system of linear equations

In particular, the method of least squares can be used to "solve" a system of linear equations

Ax = b,

where the matrix A is not square but rectangular, of size m×n with m > n (more precisely, the rank of A equals the number of unknown variables).

Such a system of equations generally has no solution. Therefore this system can be "solved" only in the sense of choosing a vector x that minimizes the "distance" between the vectors Ax and b. To do this, one can apply the criterion of minimizing the sum of squared differences of the left and right sides of the system's equations, that is, ||Ax − b||² → min. It is easy to show that solving this minimization problem leads to solving the following system of equations:

A^T A x = A^T b.

Using the pseudoinverse operator, the solution can be rewritten as x = A⁺b, where A⁺ is the pseudoinverse matrix of A.

This problem can also be "solved" using so-called weighted least squares (see below), when different equations of the system receive different weights for theoretical reasons.
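As an illustration, the least-squares "solution" of an overdetermined linear system can be sketched in a few lines of NumPy (the matrix and right-hand side below are made-up numbers, not data from the text):

```python
import numpy as np

# Overdetermined system: 4 equations, 2 unknowns (made-up example data)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([6.0, 5.0, 7.0, 10.0])

# Least-squares "solution" via the normal equations A^T A x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# The same solution via the pseudoinverse: x = A+ b
x_pinv = np.linalg.pinv(A) @ b

print(x_normal, x_pinv)  # both give the same vector
```

Both routes give the same vector; for well-conditioned problems the normal equations are the cheaper option, while the pseudoinverse route is more robust numerically.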

A strict substantiation and determination of the limits of meaningful applicability of the method were given by A. A. Markov and A. N. Kolmogorov.

OLS in regression analysis (data approximation)

Let there be n values of some variable y (these may be results of observations, experiments, etc.) and corresponding values of variables x. The task is to approximate the relationship between y and x by some function f(x, b), known up to unknown parameters b; that is, in fact, to find the best values of the parameters b, bringing the values f(x_t, b) as close as possible to the actual values y_t. In fact, this reduces to the case of "solving" an overdetermined system of equations with respect to b:

f(x_t, b) = y_t, t = 1, ..., n.

In regression analysis, and in particular in econometrics, probabilistic models of the relationship between variables are used:

y_t = f(x_t, b) + ε_t,

where ε_t are the so-called random errors of the model.

Accordingly, deviations of the observed values of y from the model values f(x_t, b) are assumed in the model itself. The essence of (ordinary, classical) OLS is to find parameters b for which the sum of squared deviations (for regression models the errors are often called regression residuals) e_t is minimal:

b_OLS = arg min RSS,

where RSS (Residual Sum of Squares) is defined as:

RSS = Σ_t e_t² = Σ_t (y_t − f(x_t, b))².

In the general case, this problem can be solved by numerical optimization (minimization) methods. In this case one speaks of nonlinear least squares (NLS or NLLS, non-linear least squares). In many cases an analytical solution can be obtained. To solve the minimization problem, it is necessary to find the stationary points of the function RSS(b) by differentiating it with respect to the unknown parameters b, equating the derivatives to zero, and solving the resulting system of equations:

Σ_t (y_t − f(x_t, b)) ∂f(x_t, b)/∂b = 0.
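A minimal sketch of such an iterative solution is the Gauss-Newton method, which repeatedly linearizes the residuals; the model f(x; a, b) = a·exp(b·x) and the data below are illustrative assumptions, not taken from the text:

```python
import numpy as np

def gauss_newton(x, y, a, b, iters=30):
    """Fit y ~ a*exp(b*x) by repeatedly solving the linearized LS problem."""
    for _ in range(iters):
        f = a * np.exp(b * x)
        r = y - f                                        # current residuals
        # Jacobian of f with respect to the parameters (a, b)
        J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
        step, *_ = np.linalg.lstsq(J, r, rcond=None)     # Gauss-Newton step
        a, b = a + step[0], b + step[1]
    return a, b

x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(0.5 * x)            # noise-free data with known parameters
# Gauss-Newton needs a reasonable starting point; we start near the solution
a_hat, b_hat = gauss_newton(x, y, a=1.8, b=0.4)
print(a_hat, b_hat)                  # converges to a = 2, b = 0.5
```

In practice one would add step damping and a convergence check; this bare loop only shows the structure of the iteration.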

OLS in the case of linear regression

Let the regression dependence be linear:

y_t = Σ_j b_j x_{tj} + ε_t = x_t^T b + ε_t.

Let y be the column vector of observations of the explained variable, and X the matrix of factor observations (the rows of the matrix are the vectors of factor values in a given observation; the columns are the values of a given factor in all observations). The matrix representation of the linear model is:

y = Xb + ε.

Then the vector of estimates of the explained variable and the vector of regression residuals are

ŷ = Xb̂,  e = y − ŷ = y − Xb̂;

accordingly, the sum of squared regression residuals equals

RSS = e^T e = (y − Xb̂)^T (y − Xb̂).

Differentiating this function with respect to the parameter vector and equating the derivatives to zero, we obtain a system of equations (in matrix form):

X^T X b̂ = X^T y.

In expanded form, the j-th equation of this system reads

Σ_s ( Σ_t x_{tj} x_{ts} ) b̂_s = Σ_t x_{tj} y_t,  j = 1, ..., k,

where all sums are taken over all admissible values of t.

If a constant is included in the model (as usual), then x_{t1} = 1 for all t; therefore the upper left corner of the matrix of the system contains the number of observations n, the remaining elements of the first row and first column are simply the sums of the variable values Σ_t x_{tj}, and the first element of the right side of the system is Σ_t y_t.

The solution of this system of equations gives the general formula for the OLS estimates of the linear model:

b̂ = (X^T X)^(-1) X^T y = ( (1/n) X^T X )^(-1) (1/n) X^T y.

For analytical purposes, the last representation of this formula is useful (in the system of equations, on dividing by n, arithmetic means appear instead of the sums). If in the regression model the data are centered, then in this representation the first matrix has the meaning of the sample covariance matrix of the factors, and the second vector is the vector of covariances of the factors with the dependent variable. If, in addition, the data are also normalized by the standard deviation (that is, ultimately standardized), then the first matrix has the meaning of the sample correlation matrix of the factors, and the second vector is the vector of sample correlations of the factors with the dependent variable.
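A sketch with simulated data (all numbers illustrative): the general matrix formula can be checked against NumPy's built-in least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
# Design matrix: a constant column plus 2 random factors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

b_formula = np.linalg.inv(X.T @ X) @ (X.T @ y)   # general OLS formula
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # library solver
print(b_formula)
```

The two results coincide; in production code `lstsq` (or a QR decomposition) is preferred over explicitly inverting X^T X, for numerical stability.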

An important property of OLS estimates for models with a constant is that the constructed regression line passes through the center of gravity of the sample data, that is, the equality holds:

ȳ = x̄^T b̂.

In particular, in the extreme case when the only regressor is the constant, we find that the OLS estimate of the single parameter (the constant itself) equals the mean value of the explained variable. That is, the arithmetic mean, known for its good properties from the laws of large numbers, is also an OLS estimate: it satisfies the criterion of minimum sum of squared deviations from it.

The simplest special cases

In the case of paired linear regression y_t = a + b x_t + ε_t, when the linear dependence of one variable on another is estimated, the calculation formulas simplify (one can do without matrix algebra). The system of equations has the form:

{ n a + b Σx_t = Σy_t
  a Σx_t + b Σx_t² = Σx_t y_t }

From here it is easy to find the coefficient estimates:

b̂ = ( (1/n)Σx_t y_t − x̄ ȳ ) / ( (1/n)Σx_t² − x̄² ) = Cov(x, y) / Var(x),
â = ȳ − b̂ x̄.
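These paired-regression formulas can be sketched directly (the four data points are made up):

```python
import numpy as np

def paired_ols(x, y):
    """Slope and intercept of y ~ a + b*x from the closed-form OLS formulas."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b = ((x * y).mean() - x.mean() * y.mean()) / ((x ** 2).mean() - x.mean() ** 2)
    a = y.mean() - b * x.mean()
    return a, b

a, b = paired_ols([1, 2, 3, 4], [6, 5, 7, 10])
print(a, b)  # intercept ~ 3.5, slope ~ 1.4
```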

Although in the general case a model with a constant is preferable, in some cases it is known from theoretical considerations that the constant should equal zero. For example, in physics the dependence between voltage and current has the form U = I·R; measuring voltage and current strength, it is necessary to estimate the resistance. In this case we are dealing with the model y = b x. In this case, instead of a system of equations, we have the single equation

b Σx_t² = Σx_t y_t.

Consequently, the formula for estimating the single coefficient has the form

b̂ = Σx_t y_t / Σx_t².
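A sketch of this no-constant case with made-up current/voltage measurements:

```python
import numpy as np

# Made-up current/voltage measurements for the model U = R * I (no constant term)
I = np.array([1.0, 2.0, 3.0])   # current, A
U = np.array([2.1, 3.9, 6.0])   # voltage, V

R_hat = (I * U).sum() / (I ** 2).sum()   # b = sum(x_t * y_t) / sum(x_t^2)
print(R_hat)  # close to 2 ohms for these made-up readings
```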

Statistical properties of OLS estimates

First of all, we note that for linear models the OLS estimates are linear estimates, as follows from the formula above. For unbiasedness of the OLS estimates it is necessary and sufficient that the most important condition of regression analysis holds: the mathematical expectation of the random error conditional on the factors must equal zero. This condition is satisfied, in particular, if the mathematical expectation of the random errors is zero and the factors and random errors are independent random variables.

The first condition can be considered always satisfied for models with a constant, since the constant absorbs a nonzero mathematical expectation of the errors (which is why models with a constant are generally preferable).

The second condition, the condition of exogeneity of the factors, is fundamental. If this property does not hold, then we can assume that almost any estimates will be extremely unsatisfactory: they will not even be consistent (that is, even a very large amount of data does not allow obtaining qualitative estimates in this case). In the classical case a stronger assumption is made, that the factors are deterministic (in contrast to the random error), which automatically implies the exogeneity condition. In the general case, for consistency of the estimates it suffices that the exogeneity condition holds together with convergence of the matrix (1/n) X^T X to some nondegenerate matrix as the sample size grows to infinity.

In order that, besides being consistent and unbiased, the (ordinary) OLS estimates also be efficient (the best in the class of linear unbiased estimates), additional properties of the random error must hold:

Constant (identical) variance of the random errors in all observations (no heteroskedasticity): V(ε_t) = σ² for all t.

No correlation (autocorrelation) between random errors in different observations: cov(ε_t, ε_s) = 0 for t ≠ s.

These assumptions can be formulated for the covariance matrix of the vector of random errors: V(ε) = σ² I.

A linear model satisfying these conditions is called classical. OLS estimates for classical linear regression are the most efficient estimates in the class of all linear unbiased estimates (in the English-language literature the abbreviation BLUE, Best Linear Unbiased Estimator, is sometimes used; in the Russian-language literature the Gauss–Markov theorem is more often cited). As is easy to show, the covariance matrix of the vector of coefficient estimates equals:

V(b̂) = σ² (X^T X)^(-1).

Efficiency means that this covariance matrix is "minimal" (any linear combination of the coefficients, and in particular the coefficients themselves, have minimal variance); that is, in the class of linear unbiased estimates the OLS estimates are the best. The diagonal elements of this matrix, the variances of the coefficient estimates, are important parameters of the quality of the estimates. However, it is impossible to compute this covariance matrix, since the variance of the random errors is unknown. It can be proved that an unbiased and consistent (for the classical linear model) estimate of the variance of the random errors is the quantity:

s² = RSS / (n − k).
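A sketch of these estimates on simulated data (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)   # true error variance = 1

b = np.linalg.solve(X.T @ X, X.T @ y)   # OLS coefficients
e = y - X @ b                           # residuals
s2 = (e @ e) / (n - k)                  # unbiased estimate of error variance
cov_b = s2 * np.linalg.inv(X.T @ X)     # estimated covariance matrix of b
se = np.sqrt(np.diag(cov_b))            # standard errors of the coefficients
print(s2, se)
```

The standard errors `se` are exactly the quantities used to build t-statistics for hypothesis tests on the coefficients.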

It should be noted that if the classical assumptions are not satisfied, the OLS parameter estimates are not the most efficient (while remaining unbiased and consistent). However, the estimate of the covariance matrix deteriorates even more: it becomes biased and inconsistent. This means that statistical conclusions about the quality of the constructed model can in this case be extremely unreliable. One way to solve the latter problem is to use special estimates of the covariance matrix that are consistent under violations of the classical assumptions (White standard errors and Newey–West standard errors). Another approach is to use so-called generalized least squares.

Generalized least squares

Main article: The generalized method of least squares

The method of least squares admits a broad generalization. Instead of minimizing the sum of squared residuals, one can minimize some positive definite quadratic form of the residual vector, e^T W e, where W is some symmetric positive definite weight matrix. Ordinary OLS is a special case of this approach, when the weight matrix is proportional to the identity matrix. As is known from the theory of symmetric matrices (or operators), for such matrices there exists a decomposition W = P^T P. Consequently, the indicated functional can be represented as

e^T W e = (P e)^T (P e),

that is, this functional can be represented as the sum of squares of certain transformed "residuals". Thus one can distinguish a whole class of least squares methods: the LS methods (Least Squares).

It has been proved (Aitken's theorem) that for the generalized linear regression model (in which no restrictions are imposed on the covariance matrix of the random errors) the most efficient estimates (in the class of linear unbiased estimates) are those of so-called generalized least squares (GLS, Generalized Least Squares): the LS method with weight matrix equal to the inverse of the covariance matrix of the random errors, W = V(ε)^(-1).

It can be shown that the formula for the GLS estimates of the parameters of the linear model has the form

b̂_GLS = (X^T V^(-1) X)^(-1) X^T V^(-1) y.

The covariance matrix of these estimates is accordingly equal to

V(b̂_GLS) = (X^T V^(-1) X)^(-1).

In fact, the essence of GLS consists in a certain (linear) transformation (P) of the source data and the application of ordinary OLS to the transformed data. The purpose of this transformation is that for the transformed data the random errors already satisfy the classical assumptions.
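A sketch of this equivalence on simulated data (the error covariance matrix here is an assumed, made-up diagonal one):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
V = np.diag(rng.uniform(0.5, 2.0, size=n))          # assumed error covariance matrix
y = X @ np.array([1.0, -1.0]) + np.sqrt(np.diag(V)) * rng.normal(size=n)

# Direct GLS formula: b = (X' V^-1 X)^-1 X' V^-1 y
Vinv = np.linalg.inv(V)
b_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

# Equivalent route: transform the data with P such that P'P = V^-1,
# then run ordinary OLS on the transformed data
P = np.linalg.cholesky(Vinv).T
Xs, ys = P @ X, P @ y
b_ols_transformed, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
print(b_gls)
```

Both routes produce the same coefficient vector, which is exactly the point of the transformation argument in the text.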

Weighted least squares

In the case of a diagonal weight matrix (and hence a diagonal covariance matrix of the random errors) we have so-called weighted least squares (WLS, Weighted Least Squares). In this case the weighted sum of squared residuals of the model is minimized; that is, each observation receives a "weight" inversely proportional to the variance of the random error in that observation:

Σ_t e_t² / σ_t².

In fact, the data are transformed by weighting the observations (dividing by a quantity proportional to the assumed standard deviation of the random errors), and ordinary OLS is applied to the weighted data.
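A weighted-least-squares sketch on simulated data (the error standard deviations are assumed known here, which is an illustrative simplification):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
x = rng.uniform(1.0, 5.0, size=n)
sigma = 0.1 * x                            # assumed: error std grows with x
y = 2.0 + 0.5 * x + sigma * rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
w = 1.0 / sigma ** 2                       # weight = inverse error variance

# Weighted LS: minimize sum_t w_t * e_t^2, i.e. OLS on data scaled by 1/sigma_t
Xs = X * np.sqrt(w)[:, None]
ys = y * np.sqrt(w)
b_wls, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
print(b_wls)
```

Scaling each row by 1/σ_t and then running ordinary OLS is algebraically identical to solving the weighted normal equations X' diag(w) X b = X' diag(w) y.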

Having chosen the type of regression function, i.e. the type of model of the dependence of y on x (or x on y), for example the linear model ŷ = a + bx, it is necessary to determine the specific values of the model coefficients.

For different values of a and b one can construct an infinite number of dependences of the form ŷ = a + bx; that is, on the coordinate plane there is an infinite number of lines, while we need the dependence that corresponds to the observed values in the best possible way. Thus the task reduces to selecting the best coefficients.

We look for the linear function a + bx based only on a certain number of available observations. To find the function with the best fit to the observed values, we use the method of least squares.

Denote: Y_i, the value calculated from the equation Y_i = a + b x_i; y_i, the measured value; ε_i = y_i − Y_i, the difference between the measured value and the value calculated from the equation, that is, ε_i = y_i − a − b x_i.

The method of least squares requires that the ε_i, the differences between the measured y_i and the values Y_i calculated from the equation, be minimal in aggregate. Therefore we find the coefficients a and b so that the sum of squared deviations of the observed values from the values on the regression line is the smallest:

Investigating this function of the arguments a and b with the help of derivatives for an extremum, one can prove that the function takes its minimum value when the coefficients a and b are the solution of the system:

{ Σy_i = a n + b Σx_i
  Σx_i y_i = a Σx_i + b Σx_i² }     (2)

If we divide both sides of the normal equations by n, we obtain:

{ ȳ = a + b x̄
  (1/n)Σx_i y_i = a x̄ + b (1/n)Σx_i² }     (3)

From the first equation of (3) we obtain a = ȳ − b x̄; substituting this value of a into the second equation, we get:

b = ( (1/n)Σx_i y_i − x̄ ȳ ) / ( (1/n)Σx_i² − x̄² ).     (4)

Here b is called the regression coefficient; a is called the free term of the regression equation and is calculated by the formula:

a = ȳ − b x̄.     (5)

The resulting line is an estimate of the theoretical regression line. We have:

Y = a + b x.

This is the equation of linear regression.

The regression can be direct (b > 0) or inverse (b < 0).

Example 1. The results of measurements of the quantities X and Y are given in the table:

x_i: -2    0    1    2    4
y_i: 0.5   1    1.5  2    3

Assuming that between X and Y there is a linear dependence y = a + bx, determine the coefficients a and b by the method of least squares.

Solution. Here n = 5;
Σx_i = −2 + 0 + 1 + 2 + 4 = 5;
Σx_i² = 4 + 0 + 1 + 4 + 16 = 25;
Σx_i y_i = (−2)·0.5 + 0·1 + 1·1.5 + 2·2 + 4·3 = 16.5;
Σy_i = 0.5 + 1 + 1.5 + 2 + 3 = 8;

and the normal system (2) has the form

{ 8 = 5a + 5b
  16.5 = 5a + 25b }

Solving this system, we obtain: b = 0.425, a = 1.175. Therefore, y = 1.175 + 0.425x.
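The arithmetic of Example 1 can be checked with a few lines of Python:

```python
xs = [-2, 0, 1, 2, 4]
ys = [0.5, 1, 1.5, 2, 3]
n = len(xs)

sx = sum(xs)                               # 5
sy = sum(ys)                               # 8.0
sxx = sum(x * x for x in xs)               # 25
sxy = sum(x * y for x, y in zip(xs, ys))   # 16.5

# Closed-form solution of the normal system (2)
b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a = (sy - b * sx) / n
print(a, b)  # 1.175 0.425
```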

Example 2. There is a sample of 10 observations of economic indicators (x) and (y):

x_i: 180 172 173 169 175 170 179 170 167 174
y_i: 186 180 176 171 182 166 182 172 169 177

It is required to find the sample equation of the regression of Y on X and to construct the sample regression line of Y on X.

Solution. 1. Let us order the data by the values x_i and y_i. We obtain a new table:

x_i: 167 169 170 170 172 173 174 175 179 180
y_i: 169 171 166 172 180 176 177 182 182 186

To simplify the calculations, we compile a calculation table containing the necessary numerical values.

x_i     y_i     x_i²     x_i·y_i
167     169     27889    28223
169     171     28561    28899
170     166     28900    28220
170     172     28900    29240
172     180     29584    30960
173     176     29929    30448
174     177     30276    30798
175     182     30625    31850
179     182     32041    32578
180     186     32400    33480
Σx_i = 1729   Σy_i = 1761   Σx_i² = 299105   Σx_i·y_i = 304696
x̄ = 172.9   ȳ = 176.1   (1/n)Σx_i² = 29910.5   (1/n)Σx_i·y_i = 30469.6

According to formula (4), we calculate the regression coefficient

b = (30469.6 − 172.9 · 176.1) / (29910.5 − 172.9²) = 21.91 / 16.09 ≈ 1.3617,

and according to formula (5)

a = 176.1 − 1.3617 · 172.9 ≈ −59.34.

Thus, the sample regression equation has the form y = −59.34 + 1.3617x. Let us plot the points (x_i; y_i) on the coordinate plane and draw the regression line.


Figure 4.

Figure 4 shows how the observed values are located relative to the regression line. For a numerical assessment of the deviations of y_i from Y_i, where y_i are the observed values and Y_i the values determined by the regression, we compile a table:

x_i     y_i     Y_i       Y_i − y_i
167 169 168.055 -0.945
169 171 170.778 -0.222
170 166 172.140 6.140
170 172 172.140 0.140
172 180 174.863 -5.137
173 176 176.225 0.225
174 177 177.587 0.587
175 182 178.949 -3.051
179 182 184.395 2.395
180 186 185.757 -0.243

The values of Y_i are calculated from the regression equation.

The noticeable deviation of some observed values from the regression line is explained by the small number of observations. When studying the degree of linear dependence of Y on X, the number of observations is taken into account. The strength of the dependence is determined by the correlation coefficient.
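The arithmetic of Example 2 can be rechecked in the same way; the recomputation reproduces the fitted values Y_i in the table above (slope ≈ 1.3617, intercept ≈ −59.34):

```python
xs = [167, 169, 170, 170, 172, 173, 174, 175, 179, 180]
ys = [169, 171, 166, 172, 180, 176, 177, 182, 182, 186]
n = len(xs)

x_mean = sum(xs) / n                                   # 172.9
y_mean = sum(ys) / n                                   # 176.1
xy_mean = sum(x * y for x, y in zip(xs, ys)) / n       # 30469.6
xx_mean = sum(x * x for x in xs) / n                   # 29910.5

# Formulas (4) and (5) from the text
b = (xy_mean - x_mean * y_mean) / (xx_mean - x_mean ** 2)
a = y_mean - b * x_mean
print(round(b, 4), round(a, 2))  # 1.3617 -59.34
```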

The method finds wide application in various fields of science and practical activity: physics, chemistry, biology, economics, sociology, psychology, and so on. By the will of fate one most often has to deal with economics, and therefore today I will arrange for you a trip to an amazing country called Econometrics =) ... What do you mean you don't want to?! It's very good there, you just have to make up your mind! ... But what you surely do want is to learn to solve problems by the method of least squares. And especially diligent readers will learn to solve them not only accurately but also very quickly ;-) But first, the general statement of the problem plus a related example:

Suppose that in some subject area indicators having a quantitative expression are studied. In this case there is every reason to believe that indicator y depends on indicator x. This belief can be either a scientific hypothesis or based on elementary common sense. Let us, however, leave science aside and investigate more appetizing areas, namely grocery stores. Denote by:

x: the sales area of a grocery store, sq. m;
y: the annual turnover of a grocery store, million rubles.

It is perfectly clear that the larger the store's area, the larger, in most cases, its turnover will be.

Suppose that after carrying out observations/experiments/calculations/dances with a tambourine we have at our disposal numerical data (x_i; y_i), i = 1, ..., n.

With the data, I think, everything is clear: x_1 is the area of the 1st store, y_1 its annual turnover; x_2 the area of the 2nd store, y_2 its annual turnover, and so on. By the way, it is not at all necessary to have access to classified materials: a fairly accurate estimate of turnover can be obtained by means of mathematical statistics. However, let us not get distracted; the course in commercial espionage is paid separately =)

The tabular data can also be written in the form of points and depicted in the familiar Cartesian coordinate system.

Let us answer an important question: how many points are needed for a qualitative study?

The more, the better. The minimum admissible set consists of 5-6 points. In addition, with a small amount of data, "anomalous" results must not be included in the sample. For example, a small elite store may earn orders of magnitude more than "its colleagues", thereby distorting the general pattern that we are required to find!

To put it very simply, we need to choose a function whose graph passes as close as possible to the points. Such a function is called an approximating (approximation = bringing closer) or theoretical function. Generally speaking, an obvious "candidate" immediately appears here: a polynomial of high degree whose graph passes through all the points. But this option is complicated and often simply incorrect (since the graph will "loop" all the time and reflect the main tendency poorly).

Thus, the sought function must be sufficiently simple and at the same time reflect the dependence adequately. As you may guess, one of the methods of finding such functions is called the method of least squares. First let us analyze its essence in general form. Let some function approximate experimental data:


How can the accuracy of this approximation be assessed? Let us calculate the differences (deviations) between the experimental and functional values (study the drawing). The first thought that comes to mind is to assess how large the sum of these differences is, but the problem is that the differences can be negative (for example, y_i − f(x_i) < 0), and deviations would mutually cancel in such a summation. Therefore, as an estimate of the accuracy of the approximation, the sum of the absolute values of the deviations suggests itself:

Σ|y_i − f(x_i)|

(in case someone does not know: Σ is the summation sign, and i is an auxiliary "counter" variable that takes values from 1 to n).

Approximating the experimental points with various functions, we will obtain different values of this sum, and obviously, where this sum is smaller, that function is more accurate.

Such a method exists and is called the method of least absolute deviations. However, in practice the method of least squares has become far more widespread, in which possible negative values are eliminated not by the absolute value but by squaring the deviations:

Σ(y_i − f(x_i))²,

after which efforts are directed at selecting a function such that the sum of squared deviations is as small as possible. This, in fact, is where the name of the method comes from.
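A tiny pure-Python illustration of why raw signed deviations are a bad criterion while absolute or squared deviations work (the points and the candidate line y = x are made up):

```python
# Illustrative (made-up) data and a candidate line y = x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 1.0, 4.0, 3.0]
dev = [y - x for x, y in zip(xs, ys)]   # deviations: [1.0, -1.0, 1.0, -1.0]

plain_sum = sum(dev)                    # 0.0: positives and negatives cancel
abs_sum = sum(abs(d) for d in dev)      # 4.0: least-modules criterion
sq_sum = sum(d * d for d in dev)        # 4.0: least-squares criterion
print(plain_sum, abs_sum, sq_sum)       # 0.0 4.0 4.0
```

The raw sum suggests a perfect fit even though every point misses the line by a full unit; the absolute and squared criteria both detect the misfit.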

And now we return to another important point: as noted above, the chosen function should be fairly simple, but there are also many such functions: linear, hyperbolic, exponential, logarithmic, quadratic, etc. And, of course, one would immediately like to "narrow the field of activity". Which class of functions should be chosen for the study? A primitive but effective technique:

The simplest way is to plot the points on a drawing and analyze their arrangement. If they tend to lie along a straight line, then one should look for the equation of a line y = ax + b with optimal values of a and b. In other words, the task is to find coefficients such that the sum of squared deviations is the smallest.

If the points are arranged, for example, along a hyperbola, then it is clear that the linear function will give a poor approximation. In this case we look for the most "advantageous" coefficients for the hyperbola equation, those that give the minimum sum of squares.

Now note that in both cases we are talking about functions of two variables, whose arguments are the parameters of the sought dependences: S(a, b).

In essence, we need to solve a standard problem: to find the minimum of a function of two variables.

Let us recall our example: suppose that the "store" points tend to lie along a straight line and there is every reason to assume a linear dependence of turnover on sales area. Let us find coefficients "a" and "b" such that the sum of squared deviations is the smallest. Everything is as usual: first the partial derivatives of the 1st order. By the linearity rule, one can differentiate directly under the summation sign:

If you want to use this information for an essay or a term paper, I will be very grateful for a link in the list of sources; you will find such detailed calculations in few places:

Let us form a standard system:

We reduce each equation by "two" and, in addition, "collapse" the sums:

Note: analyze on your own why "a" and "b" can be taken out of the summation sign. By the way, this can also be done formally with the sum itself.

Let us rewrite the system in "applied" form:

{ a Σx_i² + b Σx_i = Σx_i y_i
  a Σx_i + b n = Σy_i }

After which the algorithm for solving our problem kicks in:

Do we know the coordinates of the points? We do. Can we find the sums? Easily. We form the simplest system of two linear equations in two unknowns ("a" and "b"). We solve the system, for example, by Cramer's method, and obtain a stationary point. Checking the sufficient condition for an extremum, one can verify that at this point the function attains precisely a minimum. The check involves additional calculations, and therefore we leave it behind the scenes (if necessary, the missing frame can be looked up). We draw the final conclusion:
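The steps above, for the system { aΣx² + bΣx = Σxy; aΣx + bn = Σy }, can be sketched with Cramer's rule (the sums below come from four made-up points):

```python
def solve_2x2_cramer(a11, a12, b1, a21, a22, b2):
    """Solve {a11*u + a12*v = b1; a21*u + a22*v = b2} by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    u = (b1 * a22 - a12 * b2) / det
    v = (a11 * b2 - b1 * a21) / det
    return u, v

# Normal system for y = a*x + b with made-up sums:
# n = 4, sum(x) = 10, sum(x^2) = 30, sum(y) = 28, sum(x*y) = 77
a, b = solve_2x2_cramer(30, 10, 77, 10, 4, 28)
print(a, b)  # 1.4 3.5
```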

The function y = ax + b (with the coefficients found) fits the experimental points in the best way (at least compared with any other linear function). Roughly speaking, its graph passes as close as possible to these points. In the tradition of econometrics, the resulting approximating function is also called the equation of paired linear regression.

The problem under consideration is of great practical importance. In the situation of our example, the equation allows one to predict what turnover ("y") the store will have at one or another value of the sales area ("x"). Yes, the resulting forecast will be only a forecast, but in many cases it will turn out quite accurate.

I will analyze just one problem with "real" numbers, since there are no difficulties in it: all calculations are at the level of the 7th-8th grade school curriculum. In 95 percent of cases you will be asked to find precisely a linear function, but at the very end of the article I will show that it is no more difficult to find the equations of the optimal hyperbola, exponential, and some other functions.

In fact, it remains to hand out the promised goodies, so that you learn to solve such examples not only accurately but also quickly. We carefully study the standard:

Task

As a result of studying the relationship between two indicators, the following pairs of numbers were obtained:

Using the method of least squares, find the linear function that best approximates the empirical (experimental) data. Make a drawing on which to construct, in a Cartesian rectangular coordinate system, the experimental points and the graph of the approximating function. Find the sum of squared deviations between the empirical and theoretical values. Find out whether the given exponential function would fit the experimental points better (from the point of view of the method of least squares).

Note that the "x" values are natural numbers, and this has a characteristic meaningful sense, which I will talk about a little later; but they, of course, can also be fractional. In addition, depending on the content of a particular task, both the "x" and the "y" values can be fully or partially negative. Well, we have been given a "faceless" task, and we begin its solution:

We find the coefficients of the optimal function as the solution of the system:

For the sake of a more compact notation, the "counter" variable can be omitted, since it is already clear that summation is carried out from 1 to n.

It is more convenient to arrange the calculation of the necessary sums in tabular form:


The calculations can be carried out on a microcalculator, but it is much better to use Excel: both faster and without errors; watch a short video:

Thus, we obtain the following system:

Here one can multiply the second equation by 3 and subtract the 2nd from the 1st equation term by term. But that would be luck: in practice the system is more often not a gift, and in such cases Cramer's method saves the day:
Thus the system has a unique solution.

Let us perform a check. I understand that you don't want to, but why skip errors where they can be avoided entirely? We substitute the found solution into the left side of each equation of the system:

The right sides of the respective equations are obtained, which means the system is solved correctly.

Thus, the sought approximating function: of all linear functions it is the one that best approximates the experimental data.

Unlike the direct dependence of the store's turnover on its area, the found dependence is inverse (the principle of "the more, the less"), and this fact is immediately revealed by the negative slope coefficient. The function tells us that when a certain indicator increases by 1 unit, the value of the dependent indicator decreases on average by 0.65 units. As they say, the higher the price of buckwheat, the less of it is sold.

To construct the graph of the approximating function, let us find two of its values:

and make the drawing:


The constructed line is called the trend line (namely, a linear trend line, i.e. in the general case a trend is not necessarily a straight line). Everyone is familiar with the expression "to be in trend", and I think this term needs no additional comments.

Let us calculate the sum of squared deviations between the empirical and theoretical values. Geometrically, this is the sum of the squares of the lengths of the "crimson" segments (two of which are so small that they are not even visible).

Let us summarize the calculations in a table:


They can again be carried out manually; just in case, I will give an example for the 1st point:

But it is much more efficient to do it in the already known manner:

Let us repeat once more: what is the meaning of the obtained result? Of all linear functions, the found function has the smallest value of this indicator, that is, in its family it is the best approximation. And here, by the way, the final question of the problem is not accidental: what if the proposed exponential function would fit the experimental points better?

Let us find the corresponding sum of squared deviations; to distinguish them, I will denote them by the letter "epsilon". The technique is exactly the same:


And again to every fire calculation for the 1st point:

In Excel, we use standard feature Exp (Syntax can be viewed in Exele Help).

Output:, Therefore, the exponential function brings the experimental points worse than direct .

But it should be noted that "worse" does not mean "bad". I have now built a graph of this exponential function, and it also passes close to the points; indeed, without an analytical study it is difficult to say which function is more accurate.
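To make the comparison concrete, here is a minimal Python sketch of the same check. The data points and the coefficients of both candidate functions are made up for illustration (the article's actual table and fitted formulas are not reproduced in the text): we compute the sum of squared deviations for each model and keep the smaller one.

```python
import math

# Hypothetical experimental points (not the article's actual table)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.2, 4.3, 3.9, 3.1, 2.6]

def sse(model, xs, ys):
    """Sum of squared deviations between empirical y's and model(x)."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))

# Two candidate approximations (coefficients are illustrative only)
linear = lambda x: -0.64 * x + 5.74        # y = ax + b
expo = lambda x: 6.0 * math.exp(-0.2 * x)  # y = A * e^(kx)

sse_lin = sse(linear, xs, ys)
sse_exp = sse(expo, xs, ys)
# The model with the smaller sum of squares is the better fit in the OLS sense
best = "linear" if sse_lin < sse_exp else "exponential"
```

The same comparison works for any pair of candidate functions: only the model callables change, the criterion stays the same.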

This completes the solution, and I return to the question of the natural values of the argument. In various studies, as a rule economic or sociological ones, natural x's are used to number months, years, or other equal time intervals. Consider, for example, the following problem.

Example.

Experimental data on the values of the variables x and y are given in the table.

As a result of aligning these data, a function was obtained.

Using the least squares method, approximate these data with the linear dependence y = ax + b (find the parameters a and b). Find out which of the two lines better (in the sense of the least squares method) aligns the experimental data. Make a drawing.

The essence of the least squares method (OLS).

The task is to find the coefficients of the linear dependence for which the function of the two variables a and b takes the smallest value. That is, for these a and b, the sum of squared deviations of the experimental data from the found straight line will be the smallest. This is the whole essence of the least squares method.

Thus, the solution of the example reduces to finding the extremum of a function of two variables.
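In the notation of this example (model y = ax + b, data points (x_i, y_i), i = 1, …, n), the function of two variables being minimized can be written out explicitly. This is the standard formulation, supplied here because the original formula image did not survive:

```latex
S(a,b)=\sum_{i=1}^{n}\bigl(y_i-(a x_i+b)\bigr)^2 \to \min_{a,\,b}
```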

Deriving the formulas for finding the coefficients.

A system of two equations with two unknowns is set up and solved. We find the partial derivatives of the function with respect to the variables a and b and equate these derivatives to zero.

We solve the resulting system of equations by any method (for example, by substitution or by the method of determinants) and obtain the formulas for finding the coefficients by the least squares method (OLS).

For the found a and b, the function takes the smallest value. A proof of this fact is given below.

That is the whole least squares method. The formula for finding the parameter a contains the sums Σx, Σy, Σxy, Σx² and the parameter n, the number of experimental points. The values of these sums are best calculated separately. The coefficient b is found after a has been calculated.
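For reference, the textbook OLS formulas that this paragraph describes (the original formula images are missing, so these are the standard expressions for the model y = ax + b; the formula for a contains the sums just listed, and b is computed after a):

```latex
a=\frac{n\sum x_i y_i-\sum x_i\sum y_i}{n\sum x_i^2-\left(\sum x_i\right)^2},
\qquad
b=\frac{\sum y_i-a\sum x_i}{n}
```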

It is time to return to the original example.

Solution.

In our example, n = 5. We fill in the table for convenience in calculating the sums that enter the formulas for the desired coefficients.

The values in the fourth row of the table are obtained by multiplying the values of the 2nd row by the values of the 3rd row, for each index i.

The values in the fifth row of the table are obtained by squaring the values of the 2nd row, for each index i.

The values in the last column of the table are the row-wise sums of the values.

We use the formulas of the least squares method to find the coefficients a and b. We substitute into them the corresponding values from the last column of the table:

Hence, y = 0.165x + 2.184 is the desired approximating straight line.
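The whole tabular computation can be expressed in a few lines of Python. This is a sketch with illustrative data (the article's own table is not reproduced in the text, so the numbers below are hypothetical); the function computes exactly the sums n, Σx, Σy, Σxy, Σx² that the table above accumulates.

```python
def least_squares_line(xs, ys):
    """Fit y = a*x + b by ordinary least squares, using the sums
    n, Σx, Σy, Σxy, Σx² (the same sums tabulated in the article)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Illustrative data, not the article's table:
a, b = least_squares_line([1, 2, 3, 4, 5], [2.4, 2.5, 2.7, 2.8, 3.0])
```

With the article's actual x and y columns substituted in, this routine would reproduce the coefficients found by hand.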

It remains to find out which of the lines, y = 0.165x + 2.184 or the one obtained earlier, better approximates the initial data, that is, to estimate this by the least squares method.

Estimating the error of the least squares method.

To do this, we calculate the sums of squared deviations of the source data from each of these lines. The smaller value corresponds to the line that better approximates the source data in the sense of the least squares method.

Since the former sum turns out to be smaller, the straight line y = 0.165x + 2.184 approximates the source data better.

Graphic illustration of the least squares method (OLS).

Everything is clearly visible on the graph. The red line is the found straight line y = 0.165x + 2.184, the blue line is the other curve, and the pink dots are the source data.

Why are all these approximations needed?

Personally, I use them to solve data-smoothing problems and interpolation and extrapolation problems (in the original example one could have been asked to find the value of the observed quantity y at x = 3 or at x = 6 by the OLS method). But we will talk more about this later in another section of the site.

Proof.

In order for the function to take the smallest value at the found a and b, it is necessary that at this point the matrix of the quadratic form of the second-order differential of the function be positive definite. Let us show this.

The second order differential is:

That is,

Consequently, the matrix of the quadratic form is

and the values of its elements do not depend on a and b.

Let us show that the matrix is positive definite. For this it is necessary that its corner minors be positive.

The first-order corner minor: . The inequality is strict, since the points do not all coincide. In what follows, we will assume this.

The second-order corner minor:

We prove this by the method of mathematical induction.

Conclusion: the found values of a and b correspond to the smallest value of the function and are therefore the desired parameters of the least squares method.

Linear regression is widely used in econometrics because of the clear economic interpretation of its parameters.

Linear regression comes down to finding the equation of the form

or

An equation of this form makes it possible, for given values of the parameters, to obtain theoretical values of the resultant feature by substituting the actual values of the factor x into it.

Building a linear regression reduces to estimating its parameters a and b. Estimates of the parameters of a linear regression can be found by different methods.

The classic approach to estimating linear regression parameters is based on the method of least squares (OLS).

OLS makes it possible to obtain estimates of the parameters a and b at which the sum of squared deviations of the actual values of the resultant feature (y) from the calculated (theoretical) ones is minimal:

To find the minimum of the function, it is necessary to calculate the partial derivatives with respect to each of the parameters a and b and to equate them to zero.

Denoting this sum by S, we obtain:

Transforming this formula, we obtain the following system of normal equations for estimating the parameters a and b:
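In this section's notation (model y = a + bx), the system of normal equations has the standard textbook form; the original image is missing, so it is reproduced here:

```latex
\begin{cases}
na+b\sum x=\sum y,\\[2pt]
a\sum x+b\sum x^2=\sum xy.
\end{cases}
```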

Solving the system of normal equations (3.5) either by successive elimination of variables or by the method of determinants, we find the desired estimates of the parameters a and b.

The parameter b is called the regression coefficient. Its value shows the average change in the result when the factor changes by one unit.

The regression equation is always supplemented with an indicator of the closeness of the relationship. With linear regression, the linear correlation coefficient serves as such an indicator. There are various modifications of the formula for the linear correlation coefficient. Some of them are given below:

As is known, the linear correlation coefficient lies within the limits −1 ≤ r ≤ 1.

To assess the quality of the fit of the linear function, the square of the linear correlation coefficient, called the coefficient of determination, is calculated. The coefficient of determination characterizes the share of the variance of the resultant feature y that is explained by the regression in the total variance of the resultant feature:

Accordingly, the quantity 1 minus the coefficient of determination characterizes the share of the variance of y caused by the influence of other factors not accounted for in the model.
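Here is a minimal Python sketch of these two indicators on hypothetical data (the function and variable names are my own; the formula used is the standard r = cov(x, y) / (s_x · s_y) modification mentioned above):

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient: covariance of x and y
    divided by the product of their standard deviations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Hypothetical data (not the article's)
xs = [1, 2, 3, 4, 5]
ys = [2.4, 2.5, 2.7, 2.8, 3.0]
r = pearson_r(xs, ys)
r2 = r ** 2           # coefficient of determination: share of explained variance
unexplained = 1 - r2  # share of variance due to factors not in the model
```

By construction, the explained and unexplained shares always sum to one.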

Questions for self-control

1. What is the essence of the least squares method?

2. How many variables does paired regression involve?

3. What coefficient characterizes the closeness of the relationship between the variables?

4. Within what limits does the coefficient of determination lie?

5. How is the parameter b estimated in correlation-regression analysis?


Nonlinear economic models. Nonlinear regression models. Transformation of variables.


Transformation of variables.

The coefficient of elasticity.

If the relationships between economic phenomena are nonlinear, they are expressed using the corresponding nonlinear functions: for example, the equilateral hyperbola, the parabola of the second degree, etc.

There are two classes of nonlinear regressions:

1. Regressions that are nonlinear with respect to the explanatory variables included in the analysis but linear in the estimated parameters, for example:

polynomials of various degrees;

the equilateral hyperbola;

the semi-logarithmic function.

2. Regressions that are nonlinear in the estimated parameters, for example:

the power function;

the exponential function;

the exponent (e-based exponential) function.

The total sum of squared deviations of the individual values of the resultant feature y from its mean value is caused by the influence of many reasons. Let us conditionally divide the entire set of reasons into two groups: the studied factor x and other factors.

If the factor does not affect the result, then the regression line on the graph is parallel to the Ox axis, and

then the entire variance of the resultant feature is due to the influence of other factors, and the total sum of squared deviations coincides with the residual sum. If other factors do not affect the result, then y is related to x functionally, and the residual sum of squares is zero. In this case, the sum of squared deviations explained by the regression coincides with the total sum of squares.

Since not all points of the correlation field lie on the regression line, some scatter of the points always occurs, caused both by the influence of the factor x (that is, by the regression of y on x) and by the action of other causes (unexplained variation). The suitability of the regression line for forecasting depends on what part of the total variation of the feature y falls on the explained variation.

Obviously, if the sum of squared deviations explained by the regression is greater than the residual sum of squares, then the regression equation is statistically significant and the factor x has a significant effect on the result y.


The significance of the regression equation as a whole is assessed with the help of Fisher's F-criterion. A null hypothesis is put forward that the regression coefficient is zero, i.e. b = 0, and therefore the factor x has no effect on the result y.

The direct calculation of the F-criterion is preceded by analysis of variance. The central place in it is occupied by the decomposition of the total sum of squared deviations of the variable y from the mean ȳ into two parts, "explained" and "unexplained":

the total sum of squared deviations;

the sum of squared deviations explained by the regression;

the residual sum of squared deviations.
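The decomposition just listed is the standard analysis-of-variance identity, supplied here because the original formulas are missing (ŷ_i denotes the value fitted by the regression, ȳ the mean of y):

```latex
\underbrace{\sum (y_i-\bar y)^2}_{\text{total}}
=\underbrace{\sum (\hat y_i-\bar y)^2}_{\text{explained}}
+\underbrace{\sum (y_i-\hat y_i)^2}_{\text{residual}}
```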

Any sum of squared deviations is associated with a number of degrees of freedom, that is, with the number of degrees of freedom of independent variation of the feature. The number of degrees of freedom is related to the number of units of the population n and to the number of constants determined from it. In relation to the problem under study, the number of degrees of freedom should show how many independent deviations out of n possible are required to form the given sum of squares.

The variance per one degree of freedom is D.

The F-criterion:

If the null hypothesis is true, the factor and residual variances do not differ from each other. For H 0 to be refuted, the factor variance must exceed the residual variance several times. The statistician Snedecor developed tables of critical values of the F-ratio at different significance levels of the null hypothesis and different numbers of degrees of freedom. The tabular value of the F-criterion is the maximum value of the ratio of variances that can occur by chance at the given probability level of the null hypothesis. The calculated F-ratio is recognized as reliable if it is greater than the tabular one.
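A sketch of the same calculation in Python for paired regression (the data and fitted coefficients below are illustrative, not the article's): the explained and residual sums of squares are divided by their degrees of freedom, 1 and n − 2, before taking the ratio.

```python
def f_statistic(xs, ys, a, b):
    """F-ratio for the paired model y = a + b*x:
    (explained SS / 1) / (residual SS / (n - 2))."""
    n = len(ys)
    my = sum(ys) / n
    fitted = [a + b * x for x in xs]
    ss_total = sum((y - my) ** 2 for y in ys)
    ss_resid = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_explained = ss_total - ss_resid  # ANOVA decomposition
    return (ss_explained / 1) / (ss_resid / (n - 2))

# Illustrative data and coefficients (hypothetical)
F = f_statistic([1, 2, 3, 4, 5], [2.4, 2.5, 2.7, 2.8, 3.0], 2.23, 0.15)
# F is then compared with the tabular critical value for (1, n - 2) d.f.
```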

In this case, the null hypothesis about the absence of a relationship between the features is rejected, and a conclusion is drawn that this relationship is significant: if F fact > F table, H 0 is rejected.

If the value is less than the tabular one, F fact < F table, then the probability of the null hypothesis is higher than the specified level, and it cannot be rejected without serious risk of drawing a wrong conclusion about the presence of a relationship. In this case, the regression equation is considered statistically insignificant, and H 0 is not rejected.

The standard error of the regression coefficient

To assess the significance of the regression coefficient, it is compared with its standard error; that is, the actual value of Student's t-criterion is determined, which is then compared with the tabular value at a certain significance level and number of degrees of freedom (n − 2).
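The t-statistic described here has the standard textbook form (the original image is missing); m_b denotes the standard error of the regression coefficient b, and ŷ_i the fitted values:

```latex
t=\frac{b}{m_b},\qquad
m_b=\sqrt{\frac{\sum (y_i-\hat y_i)^2/(n-2)}{\sum (x_i-\bar x)^2}}
```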

The standard error of the parameter a:

The significance of the linear correlation coefficient is tested on the basis of the magnitude of the error of the correlation coefficient, t r:

The total variance of the feature x:

Multiple linear regression

Building the model

Multiple regression is a regression of a resultant feature on two or more factors, that is, a model of the form

Regression can give a good result in modeling if the influence of other factors affecting the object of study can be neglected. The behavior of individual economic variables cannot be controlled; that is, it is impossible to ensure that "all other conditions are equal" when assessing the influence of the factor under study. In this case, one should try to identify the influence of other factors by introducing them into the model, that is, by constructing a multiple regression equation: y = a + b1x1 + b2x2 + ... + bpxp + ε.

The main goal of multiple regression is to build a model with a large number of factors, determining the influence of each of them separately as well as their cumulative impact on the modeled indicator. The model specification includes two sets of questions: the selection of factors and the choice of the type of regression equation.
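A self-contained Python sketch of fitting such a model y = a + b1·x1 + … + bp·xp by OLS via the normal equations (all names and data are mine; the small system is solved with plain Gaussian elimination rather than a library call):

```python
def multiple_regression(X, y):
    """Fit y = a + b1*x1 + ... + bp*xp by OLS: solve the normal
    equations (X'X) beta = X'y by Gaussian elimination."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    # Build X'X and X'y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Forward elimination with partial pivoting
    for i in range(p):
        piv = max(range(i, p), key=lambda k: abs(A[k][i]))
        A[i], A[piv] = A[piv], A[i]
        b[i], b[piv] = b[piv], b[i]
        for k in range(i + 1, p):
            f = A[k][i] / A[i][i]
            for j in range(i, p):
                A[k][j] -= f * A[i][j]
            b[k] -= f * b[i]
    # Back substitution
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta  # [a, b1, ..., bp]

# Hypothetical data generated by y = 1 + 2*x1 + 3*x2
beta = multiple_regression([(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)],
                           [1, 3, 4, 6, 8])
```

With two or more factor columns, this is the same least squares criterion as in the paired case; only the size of the system of normal equations grows.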
