Least squares fitting in Excel. Linear regression analysis. Using the Solver add-in

The method of least squares is a mathematical procedure for constructing the linear equation that most accurately fits two series of numbers. The purpose of the method is to minimize the total squared error. Excel has tools that let you apply this method in calculations. Let's see how this is done.

The method of least squares (LSM) is a mathematical description of the dependence of one variable on another. It can be used for forecasting.

Enabling the Solver add-in

To use the least squares method in Excel, you need to enable the Solver add-in, which is disabled by default.


Now the Solver function is activated in Excel, and its tools have appeared on the ribbon.

Problem statement

Let's describe the application of the least squares method using a specific example. We have two series of numbers, x and y, shown in the image below.

This dependence is most accurately described by the function y=a+nx.

At x=0, y also equals 0. Therefore, the dependence can be described as y=nx.

We need to find the minimum of the sum of squared differences.
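For a line through the origin, this minimization has a closed-form answer: setting the derivative of the sum of squared differences with respect to n to zero gives n = Σxy / Σx². The Python sketch below illustrates that calculation outside Excel. The sample numbers are hypothetical and are not the series from the article's image.

```python
def fit_through_origin(xs, ys):
    """Return n that minimizes sum((y - n*x)**2) for the model y = n*x."""
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all x values are zero; the slope is undefined")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


def sum_squared_diff(xs, ys, n):
    """Sum of squared differences between observed y and n*x."""
    return sum((y - n * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical data that roughly follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = fit_through_origin(xs, ys)
print(f"n = {n:.4f}, sum of squares = {sum_squared_diff(xs, ys, n):.4f}")
```

Solver reaches the same kind of answer by searching numerically for the value of n that makes the target cell smallest.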

Solution

Let's move on to the direct application of the method.


As we can see, applying the least squares method is a fairly complex mathematical procedure. We showed it in action on the simplest example, and there are far more complicated cases. However, Microsoft Excel's tools are designed to simplify the calculations as much as possible.

The least squares method is used to estimate the parameters of a regression equation.

One of the methods for studying stochastic relationships between variables is regression analysis.
Regression analysis is the derivation of a regression equation, which is used to find the average value of a random variable (the response) when the value of another variable or variables (the factors) is known. It includes the following steps:

  1. choosing the form of the relationship (the type of analytical regression equation);
  2. estimating the parameters of the equation;
  3. assessing the quality of the analytical regression equation.
The linear form is the one most often used to describe a statistical relationship between variables. The attention to linear relationships is explained by the clear economic interpretation of their parameters, by the limited variation of the variables, and by the fact that in most cases nonlinear forms are converted to linear form (by taking logarithms or by a change of variables) before the calculations are carried out.
In the case of a linear paired relationship, the regression equation takes the form $y_i = a + b \cdot x_i + u_i$. The parameters a and b of this equation are estimated from the statistical observations of x and y. The result of such an estimation is the equation $\hat{y}_i = \hat{a} + \hat{b} \cdot x_i$, where $\hat{a}$ and $\hat{b}$ are the estimates of the parameters a and b, and $\hat{y}_i$ is the value of the response obtained from the regression equation (the calculated value).

The method most often used to estimate the parameters is the least squares method (LSM).
The least squares method gives the best (consistent, efficient, and unbiased) estimates of the parameters of the regression equation, but only if certain assumptions about the random term (u) and the independent variable (x) hold (see the OLS assumptions).

The task of estimating the parameters of a linear paired equation by least squares is as follows: obtain parameter estimates for which the sum of squared deviations of the actual values of the response, $y_i$, from the calculated values, $\hat{y}_i$, is minimal.
Formally, the OLS criterion can be written as: $S = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \to \min$.

Classification of least squares methods

  1. Ordinary least squares (OLS).
  2. The method of maximum likelihood (for the normal classical linear regression model, normality of the regression residuals is postulated).
  3. Generalized least squares (GLS), used when the errors are autocorrelated or heteroscedastic.
  4. Weighted least squares (a special case of GLS with heteroscedastic residuals).

Let us illustrate the essence of the classical method of least squares graphically. To do this, we build a scatter plot of the observed data ($x_i$, $y_i$, i = 1, ..., n) in a rectangular coordinate system (such a scatter plot is called a correlation field). We then try to choose the straight line that lies closest to the points of the correlation field. According to the method of least squares, the line is chosen so that the sum of squared vertical distances between the points of the correlation field and the line is minimal.

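Mathematical statement of this problem: $S(\hat{a}, \hat{b}) = \sum_{i=1}^{n} (y_i - \hat{a} - \hat{b} x_i)^2 \to \min$.
The values $y_i$ and $x_i$, i = 1...n, are known to us; they are the observed data. In the function S they are constants. The variables of this function are the parameter estimates $\hat{a}$ and $\hat{b}$. To find the minimum of a function of two variables, it is necessary to compute its partial derivatives with respect to each parameter and set them to zero, i.e. $\frac{\partial S}{\partial \hat{a}} = 0$, $\frac{\partial S}{\partial \hat{b}} = 0$.
As a result, we obtain a system of two normal linear equations: $n\hat{a} + \hat{b} \sum x_i = \sum y_i$, $\hat{a} \sum x_i + \hat{b} \sum x_i^2 = \sum x_i y_i$.
Solving this system, we find the required parameter estimates: $\hat{b} = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2}$, $\hat{a} = \bar{y} - \hat{b}\bar{x}$.

The correctness of the calculated regression parameters can be checked by comparing the sums $\sum y_i = \sum \hat{y}_i$ (a small discrepancy is possible because of rounding in the calculations).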
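The system of two normal equations mentioned above has a standard closed-form solution, and the sum check is easy to automate. The following Python sketch shows both steps under the assumption of the model y = a + b·x; the function name `fit_line` and the data are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Solve the two normal equations for y = a + b*x and return (a, b)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sxy - sx * sy) / denom   # regression coefficient
    a = (sy - b * sx) / n             # intercept: mean(y) - b * mean(x)
    return a, b


# Hypothetical observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]

a, b = fit_line(xs, ys)
fitted = [a + b * x for x in xs]

print(f"a = {a:.4f}, b = {b:.4f}")
# Check: the sum of observed y should match the sum of fitted y.
print(f"sum(y) = {sum(ys):.6f}, sum(fitted) = {sum(fitted):.6f}")
```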
Table 1 can be used to calculate the parameter estimates.
The sign of the regression coefficient b indicates the direction of the relationship (if b > 0, the relationship is direct; if b < 0, the relationship is inverse). The value of b shows by how many units the response y changes on average when the factor x changes by one unit of its measurement.
Formally, the value of the parameter a is the average value of y when x equals zero. If the factor does not and cannot take a zero value, this interpretation of the parameter a makes no sense.

The strength of the relationship between the variables is assessed with the linear pair correlation coefficient $r_{x,y}$. It can be calculated by the formula $r_{x,y} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{s_x s_y}$, where $s_x$ and $s_y$ are the standard deviations of x and y. In addition, the linear pair correlation coefficient can be determined through the regression coefficient b: $r_{x,y} = b \frac{s_x}{s_y}$.
The admissible values of the linear pair correlation coefficient range from -1 to +1. The sign of the correlation coefficient indicates the direction of the relationship. If $r_{x,y} > 0$, the relationship is direct; if $r_{x,y} < 0$, the relationship is inverse.
If this coefficient is close to one in absolute value, the relationship between the variables can be interpreted as a fairly close linear one. If its absolute value equals one, $|r_{x,y}| = 1$, the relationship between the variables is functional and linear. If x and y are linearly independent, $r_{x,y}$ is close to 0.
Table 1 can also be used to calculate $r_{x,y}$.

To assess the quality of the obtained regression equation, the theoretical coefficient of determination $R^2_{yx}$ is calculated:

$$R^2_{yx} = \frac{d^2}{s_y^2} = 1 - \frac{e^2}{s_y^2},$$
where $d^2$ is the variance of y explained by the regression;
$e^2$ is the residual variance of y (not explained by the regression equation);
$s_y^2$ is the total variance of y.
The coefficient of determination characterizes the share of the variation (variance) of the response y that is explained by the regression (and therefore by the factor x) in the total variation (variance) of y. The coefficient of determination $R^2_{yx}$ takes values from 0 to 1. Accordingly, the value $1 - R^2_{yx}$ characterizes the share of the variance of y caused by other factors not included in the model and by specification errors.
For paired linear regression, $R^2_{yx} = r^2_{yx}$.
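The identity between the coefficient of determination and the squared correlation coefficient can be checked numerically. The Python sketch below computes r directly from the data and R² from the residual and total sums of squares of a fitted line y = a + b·x; the helper name and the data are hypothetical.

```python
from math import sqrt


def correlation_and_r2(xs, ys):
    """Return (r, R2) for a paired linear regression of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")

    r = sxy / sqrt(sxx * syy)          # linear pair correlation coefficient

    b = sxy / sxx                      # regression coefficient
    a = mean_y - b * mean_x            # intercept
    residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r2 = 1.0 - residual / syy          # share of variance explained
    return r, r2


# Hypothetical observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [2.3, 2.9, 4.1, 4.8, 6.2, 6.6, 8.1]

r, r2 = correlation_and_r2(xs, ys)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}, R^2 = {r2:.4f}")
```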

4.1. Using built-in functions

The regression coefficients are calculated with the function

LINEST(Known_y; Known_x; Const; Stats),

Known_y - the array of y values,

Known_x - an optional array of x values; if the array x is omitted, it is assumed to be the array (1; 2; 3; ...) of the same size as Known_y,

Const - a logical value that indicates whether the constant b must equal 0. If Const is TRUE or omitted, b is calculated in the usual way. If Const is FALSE, b is set to 0 and the value of a is chosen so that the relation y=ax holds.

Stats - a logical value that indicates whether additional regression statistics should be returned. If Stats is TRUE, LINEST returns additional regression statistics. If Stats is FALSE or omitted, LINEST returns only the coefficient a and the constant b.

Remember that the result of the LINEST() function is an array of values.

The correlation coefficient is calculated with the function

CORREL(Array1; Array2),

which returns the value of the correlation coefficient, where Array1 is the array of y values and Array2 is the array of x values. Array1 and Array2 must be the same size.

EXAMPLE 1. The dependence y(x) is given in a table. Build the regression line and calculate the correlation coefficient.

y 0.5 1.5 2.5 3.5
x 2.39 2.81 3.25 3.75 4.11 4.45 4.85 5.25

Enter the table of values into an MS Excel worksheet and build a scatter plot. The worksheet will look as shown in Fig. 2.

To calculate the values of the regression coefficients a and b, select cells A7:B7, open the Function Wizard, and choose the LINEST function in the Statistical category. Fill in the dialog box that appears as shown in Fig. 3 and click OK.


As a result, the calculated value appears only in cell A7 (Fig. 4). For the value to appear in cell B7 as well, enter edit mode (the F2 key) and then press CTRL+SHIFT+ENTER.

To calculate the correlation coefficient, the following formula was entered in cell C7:

C7=CORREL(B3:J3;B2:J2).

Knowing the regression coefficients a and b, we calculate the values of the function y=ax+b for the given x. To do this, enter the formula

B5=$A$7*B2+$B$7

and copy it to the range C5:J5 (Fig. 5).

Let's draw the regression line on the chart. Select the experimental points on the chart, right-click, and choose the Source Data command. In the dialog box that appears (Fig. 5), select the Series tab and click the Add button. Fill in the input fields as shown in Fig. 6 and click OK. A regression line will be added to the chart of experimental data. By default, it is drawn as points that are not connected by smoothing lines.



To change the appearance of the regression line, do the following. Right-click the points that represent the line, choose the Chart Type command, and set the scatter chart type as shown in Fig. 7.

The line type, color, and thickness can be changed as follows. Select the line on the chart, right-click, and choose Format Data Series… from the context menu. Then make the settings, for example, as shown in Fig. 8.

As a result of all the transformations, we get the chart of experimental data and the regression line in one plot area (Fig. 9).

4.2. Using a trendline.

In MS Excel, fitting various approximating dependencies is implemented as a chart property: the trendline.

EXAMPLE 2. As a result of an experiment, a certain tabular dependence was obtained.

0.15 0.16 0.17 0.18 0.19 0.20
4.4817 4.4930 5.4739 6.0496 6.6859 7.3891

Select and build an approximating dependence. Plot the tabular dependence and the fitted analytical dependence.

The solution can be divided into the following stages: entering the source data, building a scatter plot, and adding a trendline to that plot.

Let's look at this process in detail. Enter the source data on a worksheet and plot the experimental data. Then select the experimental points on the chart, right-click, and use the Add Trendline command (Fig. 10).

The dialog box that appears lets you build the approximating dependence.

The first tab of this window (Fig. 11) specifies the type of the approximating dependence.

The second tab (Fig. 12) sets the fitting parameters:

· the name of the approximating dependence;

· forecast forward (backward) by n units (this parameter determines by how many units forward (backward) the trendline should be extended);

· whether to show the point where the curve intersects the line y=const;

· whether to show the approximating function on the chart (the "display equation on chart" option);

· whether to place the value of the root-mean-square deviation on the chart (the "display R-squared value on chart" option).

Let's choose a second-degree polynomial as the approximating dependence (Fig. 11) and display the equation that describes this polynomial on the chart (Fig. 12). The resulting chart is shown in Fig. 13.

Similarly, trendlines can be used to fit the parameters of dependencies such as

· linear y=a∙x+b,

· logarithmic y=a∙ln(x)+b,

· exponential y=a∙e^(b∙x),

· power y=a∙x^b,

· polynomial y=a∙x^2+b∙x+c, y=a∙x^3+b∙x^2+c∙x+d, and so on, up to and including a polynomial of the 6th degree,

· linear filtering (moving average).
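One common way to fit an exponential dependence y = a·e^(b·x) by least squares is to take logarithms, which turns it into the linear model ln y = ln a + b·x. The Python sketch below applies that transform to the numbers of Example 2, assuming the first row of the table holds x and the second holds y. Note that this minimizes the squared error in ln y rather than in y, and that the result has not been compared against the digits Excel's trendline displays.

```python
from math import exp, log


def fit_exponential(xs, ys):
    """Fit y = a*exp(b*x) by ordinary least squares on the pairs (x, ln y)."""
    if any(y <= 0 for y in ys):
        raise ValueError("the log transform needs strictly positive y values")
    logs = [log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; b is undefined")
    b = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs)) / sxx
    a = exp(mean_l - b * mean_x)
    return a, b


# Example 2 table (row assignment is an assumption: first row x, second row y).
xs = [0.15, 0.16, 0.17, 0.18, 0.19, 0.20]
ys = [4.4817, 4.4930, 5.4739, 6.0496, 6.6859, 7.3891]

a, b = fit_exponential(xs, ys)
print(f"y = {a:.4f} * exp({b:.4f} * x)")
for x, y in zip(xs, ys):
    print(f"x = {x:.2f}  observed = {y:.4f}  fitted = {a * exp(b * x):.4f}")
```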

4.3. Using the what-if analysis tool: Solver.

Of considerable interest is fitting the parameters of a functional dependence by least squares in MS Excel with the what-if analysis tool Solver. This technique makes it possible to fit the parameters of a function of any form. Let's consider this possibility on the following problem.

EXAMPLE 3. As a result of an experiment, the dependence z(t) given in the table was obtained

0,66 0,9 1,17 1,47 1,7 1,74 2,08 2,63 3,12
38,9 68,8 64,4 66,5 64,95 59,36 82,6 90,63 113,5

Fit the coefficients of the dependence Z(t)=At^4+Bt^3+Ct^2+Dt+K by the method of least squares.

This problem is equivalent to finding the minimum of a function of five variables

$$S(A, B, C, D, K) = \sum_{i=1}^{n} \left(Z(t_i) - z_i\right)^2 \to \min \qquad (10)$$

Let's look at the process of solving this optimization problem (Fig. 14).

Let the values A, B, C, D, and K be stored in cells A7:E7. We calculate the theoretical values of the function Z(t)=At^4+Bt^3+Ct^2+Dt+K for the given t (B2:J2). To do this, in cell B4 we enter the value of the function at the first point (cell B2):

B4=$A$7*B2^4+$B$7*B2^3+$C$7*B2^2+$D$7*B2+$E$7.

Copy this formula to the range C4:J4 to obtain the expected values of the function at the points whose abscissas are stored in cells B2:J2.

In cell B5 we enter the formula that calculates the square of the difference between the experimental and calculated points:

B5=(B4-B3)^2,

and copy it to the range C5:J5. Cell F7 will store the total squared error (10). To do this, enter the formula:

F7 = SUM(B5:J5).

Use the Tools → Solver command and solve the optimization problem without constraints. Fill in the input fields of the dialog box as shown in Fig. 14 and click Solve. If a solution is found, the window shown in Fig. 15 appears.

The result of running Solver is that the values of the parameters of the function Z(t)=At^4+Bt^3+Ct^2+Dt+K are written to cells A7:E7. Cells B4:J4 contain the expected values of the function at the source points. Cell F7 stores the total squared error.

You can display the experimental points and the fitted curve in one plot area by selecting the range B2:J4, launching the Chart Wizard, and then formatting the appearance of the resulting charts.

Fig. 17 shows the MS Excel worksheet after the calculations.
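Because Z(t) is linear in its five coefficients, the same minimum can also be found without an iterative search by solving the normal equations directly. The Python sketch below does this for the Example 3 table. It is a swapped-in technique, not what Solver does internally, and it assumes the first row of the table is t and the second is z; all function names are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting; returns the solution list."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_polynomial(ts, zs, degree):
    """Least squares polynomial coefficients, highest power first."""
    powers = list(range(degree, -1, -1))
    matrix = [[sum(t ** (p + q) for t in ts) for q in powers] for p in powers]
    rhs = [sum(z * t ** p for t, z in zip(ts, zs)) for p in powers]
    return solve_linear_system(matrix, rhs)


def evaluate(coeffs, t):
    """Evaluate the polynomial at t using Horner's scheme."""
    result = 0.0
    for c in coeffs:
        result = result * t + c
    return result


# Example 3 table (decimal commas converted to points).
ts = [0.66, 0.9, 1.17, 1.47, 1.7, 1.74, 2.08, 2.63, 3.12]
zs = [38.9, 68.8, 64.4, 66.5, 64.95, 59.36, 82.6, 90.63, 113.5]

coeffs = fit_polynomial(ts, zs, 4)  # A, B, C, D, K
A, B, C, D, K = coeffs
sse = sum((evaluate(coeffs, t) - z) ** 2 for t, z in zip(ts, zs))

print(f"A={A:.4f} B={B:.4f} C={C:.4f} D={D:.4f} K={K:.4f}")
print(f"total squared error = {sse:.4f}")
```

The printed total squared error plays the role of cell F7. An iterative search stops when its convergence criteria are met, so the values Solver reports may differ slightly from the direct solution.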

The method of least squares (LSM) belongs to the field of regression analysis. It has many applications, since it allows a given function to be represented approximately by other, simpler ones. LSM can be extremely useful in processing observations, and it is actively used to estimate some quantities from the results of measurements of others that contain random errors. From this article you will learn how to implement least squares calculations in Excel.

Statement of the problem on a specific example

Suppose there are two indicators X and Y, and Y depends on X. Since we are interested in OLS from the standpoint of regression analysis (in Excel its methods are implemented with built-in functions), we move straight to a specific problem.

So, let X be the retail area of a grocery store, measured in square meters, and Y the annual turnover, measured in millions of rubles.

We need to forecast what turnover (Y) a store will have if it has a given retail area. Obviously, the function Y = f(X) is increasing, since a hypermarket sells more goods than a stall.

A few words about the correctness of the source data used for prediction

Suppose we have a table built from data for n stores.

According to mathematical statistics, the results will be more or less correct if data on at least 5-6 objects are examined. In addition, "anomalous" results cannot be used. In particular, a small elite boutique may have a turnover many times greater than that of large "mass-market" retail outlets.

The essence of the method

The table data can be plotted on the Cartesian plane as the points $M_1(x_1, y_1)$, ..., $M_n(x_n, y_n)$. The solution of the problem now reduces to choosing an approximating function y = f(x) whose graph passes as close as possible to the points $M_1$, $M_2$, ..., $M_n$.

Of course, a high-degree polynomial could be used, but this option is not only hard to implement but simply incorrect, because it would not reflect the main trend that needs to be found. The most reasonable solution is to search for the straight line y = ax + b that best approximates the experimental data, or more precisely, for the coefficients a and b.

Accuracy assessment

In any approximation, assessing its accuracy is of particular importance. Denote by $e_i$ the difference (deviation) between the functional and experimental values at the point $x_i$, i.e. $e_i = y_i - f(x_i)$.

Obviously, to assess the accuracy of the approximation one could use the sum of the deviations, i.e. when choosing a straight line to approximate the dependence of Y on X, prefer the one with the smallest sum of $e_i$ over all points. However, it is not that simple, because along with positive deviations there will almost always be negative ones.

The issue can be resolved by using the absolute values of the deviations or their squares. The latter method is the most widely used. It is employed in many areas, including regression analysis (in Excel it is implemented with two built-in functions), and has long proven its effectiveness.
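The cancellation of positive and negative deviations is easy to see on a small example. In the Python sketch below (hypothetical data lying exactly on y = x), a horizontal line that clearly misses the trend still has a zero sum of signed deviations, while the sums of absolute values and of squares expose the poor fit.

```python
def deviations(xs, ys, a, b):
    """Deviations e_i = y_i - (a*x_i + b) for the line y = a*x + b."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]


# Hypothetical points that lie exactly on y = x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0]

flat = deviations(xs, ys, 0.0, 3.0)   # horizontal line y = 3
exact = deviations(xs, ys, 1.0, 0.0)  # the true line y = x

for name, e in (("y = 3", flat), ("y = x", exact)):
    signed = sum(e)
    absolute = sum(abs(v) for v in e)
    squared = sum(v * v for v in e)
    print(f"{name}: sum = {signed}, sum of |e| = {absolute}, sum of e^2 = {squared}")
```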

Least squares method

Excel, as you know, has a built-in AutoSum function that calculates the sum of all values in a selected range. So nothing prevents us from calculating the value of the expression ($e_1^2 + e_2^2 + e_3^2 + \dots + e_n^2$).

In mathematical notation this looks like:

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2 \to \min$$

Since we decided at the outset to approximate with a straight line, we have:

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2 \to \min$$

Thus, the problem of finding the straight line that best describes the specific dependence between X and Y reduces to calculating the minimum of a function of two variables:

$$F(a, b) = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2 \to \min$$

To do this, set the partial derivatives with respect to the new variables a and b to zero and solve a simple system of two equations with two unknowns:

$$\frac{\partial F}{\partial a} = -2 \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right) x_i = 0, \qquad \frac{\partial F}{\partial b} = -2 \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right) = 0$$

After simple transformations, including division by 2 and manipulation of the sums, we get:

$$a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i, \qquad a \sum_{i=1}^{n} x_i + n b = \sum_{i=1}^{n} y_i$$

Solving it, for example, by Cramer's method, we get a stationary point with certain coefficients a* and b*. This is the minimum, i.e. to predict what turnover a store with a given area will have, the straight line y = a*x + b* is suitable; it is the regression model for the example in question. Of course, it will not give the exact result, but it helps to get an idea of whether buying a store of a particular area on credit will pay off.
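Cramer's method for the two-equation system can be written out in a few lines. The Python sketch below assumes the model y = a·x + b and the standard form of the system given in the code comment; the store areas and turnover figures are hypothetical and serve only as an illustration.

```python
def fit_line_cramer(xs, ys):
    """Solve the 2x2 least squares system for y = a*x + b by Cramer's rule.

    System (standard form for this model):
        a * sum(x^2) + b * sum(x) = sum(x*y)
        a * sum(x)   + b * n      = sum(y)
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))

    det = sxx * n - sx * sx          # main determinant
    if det == 0:
        raise ValueError("all x values are equal; no unique solution")
    det_a = sxy * n - sx * sy        # first column replaced by right-hand side
    det_b = sxx * sy - sx * sxy      # second column replaced by right-hand side
    return det_a / det, det_b / det


# Hypothetical data: retail area (sq. m) and annual turnover (million rubles).
areas = [50.0, 80.0, 120.0, 200.0, 350.0, 500.0]
turnover = [4.0, 6.5, 9.0, 15.5, 26.0, 38.0]

a_star, b_star = fit_line_cramer(areas, turnover)
print(f"y = {a_star:.4f} * x + {b_star:.4f}")
print(f"forecast for 250 sq. m: {a_star * 250.0 + b_star:.2f}")
```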

How to implement the least squares method in Excel

Excel has a function for calculating values by the least squares method. It has the form: TREND(known Y values; known X values; new X values; const). Let's apply this Excel least squares formula to our table.

To do this, in the cell where the result of the least squares calculation should be displayed, enter the "=" sign and select the TREND function. In the window that opens, fill in the appropriate fields by selecting:

  • the range of known Y values (here, the turnover data);
  • the range $x_1$, …, $x_n$, i.e. the retail area sizes;
  • both the known and the unknown x values for which the turnover needs to be found (for their location on the worksheet, see below).

In addition, the formula has the logical variable "Const". If you enter 0 (FALSE) in the corresponding field, the calculation is carried out assuming b = 0; if you enter 1 (TRUE) or leave the field empty, b is calculated in the usual way.

If you need the forecast for more than one x value, then after entering the formula do not press Enter; instead, press the combination Shift + Control + Enter on the keyboard.
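The prediction step can be sketched in Python as a small function that fits a line and evaluates it at new x values. The function `trend` below is a hypothetical analogue written for illustration, not Excel's implementation; its `const` flag follows the convention described for LINEST earlier in this document (FALSE forces b = 0), and the data are invented so that the answer is easy to verify by hand.

```python
def trend(known_ys, known_xs, new_xs, const=True):
    """Fit y = a*x + b by least squares and predict y for each new x.

    With const=False the line is forced through the origin (b = 0).
    """
    if len(known_xs) != len(known_ys):
        raise ValueError("known x and y ranges must be the same size")
    n = len(known_xs)
    if const:
        mean_x = sum(known_xs) / n
        mean_y = sum(known_ys) / n
        sxx = sum((x - mean_x) ** 2 for x in known_xs)
        if sxx == 0:
            raise ValueError("all x values are equal; the slope is undefined")
        a = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(known_xs, known_ys)) / sxx
        b = mean_y - a * mean_x
    else:
        sxx = sum(x * x for x in known_xs)
        if sxx == 0:
            raise ValueError("all x values are zero; the slope is undefined")
        a = sum(x * y for x, y in zip(known_xs, known_ys)) / sxx
        b = 0.0
    return [a * x + b for x in new_xs]


# Hypothetical data chosen to lie exactly on y = 0.1*x + 1.
areas = [50.0, 80.0, 120.0, 200.0]
turnover = [6.0, 9.0, 13.0, 21.0]

forecasts = trend(turnover, areas, [150.0, 300.0])
print(forecasts)
```

Passing several new x values at once mirrors entering the worksheet function as an array formula.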

Some features

Regression analysis can be accessible even to beginners. The Excel formula for predicting the values of an array of unknown variables, TREND, can be used even by those who have never heard of the method of least squares. It is enough to know some features of how it works. In particular:

  • If the range of known values of the variable y is placed in one row or column, then each row (column) with known values of x is treated by the program as a separate variable.
  • If the range of known x values is not specified in the TREND window, the function treats it as an array of integers whose count matches the range of given values of the variable y.
  • To get an array of "predicted" values as output, the trend expression must be entered as an array formula.
  • If no new x values are specified, the TREND function assumes they are equal to the known ones. If those are not specified either, the array 1; 2; 3; 4; … is taken as the argument, matching the size of the range with the given values of y.
  • The range containing the new x values must consist of the same or a greater number of rows or columns as the range with the given y values. In other words, it must be commensurate with the independent variables.
  • An array with known x values may contain several variables. However, if there is only one, the ranges with the given x and y values must be commensurate. In the case of several variables, the range with the given y values must fit in one column or one row.

The FORECAST function

Regression analysis in Excel is implemented with several functions. One of them is called FORECAST. It is similar to TREND, i.e. it returns the result of a least squares calculation, but only for a single X for which the value of Y is unknown.

Now you know the Excel formulas for beginners that let you predict the future value of an indicator from a linear trend.

The method of least squares (LSM) is based on minimizing the sum of squared deviations of the chosen function from the data under study. In this article, we approximate the available data with the linear function y = ax + b.

The least squares method (Ordinary Least Squares, OLS) is one of the basic methods of regression analysis for estimating the unknown parameters of regression models from sample data.

Let's consider approximation by functions that depend on only one variable:

  • Linear: y=ax+b (this article)
  • Logarithmic: y=a*Ln(x)+b
  • Power: y=a*x^m
  • Exponential: y=a*EXP(b*x)+c
  • Quadratic: y=ax^2+bx+c

Note: Approximation by polynomials from the 3rd to the 6th degree and approximation by a trigonometric polynomial are covered in separate articles.

Linear dependence

We are interested in the relationship between 2 variables x and y. There is an assumption that y depends on x according to the linear law y = ax + b. To determine the parameters of this relationship, the researcher made observations: for each value $x_i$, the value $y_i$ was measured (see the example file). Accordingly, suppose there are 20 pairs of values ($x_i$; $y_i$).

Note: If the step of change in x is constant, a Line chart can be used to build the scatter diagram; if not, the Scatter chart type must be used.

From the chart it is obvious that the relationship between the variables is close to linear. To understand which of the many straight lines most "correctly" describes the dependence between the variables, we need to define the criterion by which the lines will be compared.

As such a criterion we use the expression:

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $\hat{y}_i = a x_i + b$; n is the number of pairs of values (here n=20)

The expression above is the sum of squared distances between the observed values $y_i$ and $\hat{y}_i$ and is often denoted SSE (Sum of Squared Errors (Residuals)).

The least squares method consists of choosing the line $\hat{y} = ax + b$ for which the expression above takes its minimum value.

Note: Any line in two-dimensional space is uniquely determined by the values of 2 parameters: a (slope) and b (intercept).

It is assumed that the smaller the sum of squared distances, the better the line approximates the available data and the better it can be used later to predict the value of y from the variable x. Clearly, even if there is really no relationship between the variables, or the relationship is nonlinear, OLS will still pick the "best" line. Thus, OLS says nothing about the presence of a real relationship between the variables; the method simply lets you choose the parameters a and b for which the expression above is minimal.

After some fairly simple mathematical operations, the parameters a and b can be calculated:

$$a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b = \bar{y} - a\bar{x}$$

As the formula shows, the parameter a is the ratio of the covariance to the variance, so in MS EXCEL the following formulas can be used to calculate the parameter a (see the example file, sheet Linear):

=COVAR(B26:B45;C26:C45)/VAR.P(B26:B45) or

=COVARIANCE.S(B26:B45;C26:C45)/VAR.S(B26:B45)

The parameter a can also be calculated with the formula =SLOPE(C26:C45;B26:B45). For the parameter b, use the formula =INTERCEPT(C26:C45;B26:B45).
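The two covariance-to-variance formulas give the same slope because the divisor (n for the population versions, n - 1 for the sample versions) cancels in the ratio. The Python sketch below checks this numerically; the function name and the data are hypothetical.

```python
def slope_two_ways(xs, ys):
    """Return (slope from population stats, slope from sample stats, intercept)."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two points are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    a_population = (sxy / n) / (sxx / n)          # population cov / population var
    a_sample = (sxy / (n - 1)) / (sxx / (n - 1))  # sample cov / sample var
    b = mean_y - a_population * mean_x            # intercept
    return a_population, a_sample, b


# Hypothetical observations.
xs = [1.0, 2.5, 3.0, 4.5, 6.0, 7.5]
ys = [2.2, 4.9, 6.1, 9.3, 12.0, 15.4]

a_pop, a_samp, b = slope_two_ways(xs, ys)
print(f"a (population) = {a_pop:.6f}")
print(f"a (sample)     = {a_samp:.6f}")
print(f"b              = {b:.6f}")
```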

Finally, the LINEST() function lets you calculate both parameters at once. To enter the formula LINEST(C26:C45;B26:B45), select 2 cells in a row and press CTRL + SHIFT + ENTER (see the article about array formulas). The value of a is returned in the left cell, and the value of b in the right one.

Note: To avoid entering array formulas, you need to additionally use the INDEX() function. The formula =INDEX(LINEST(C26:C45;B26:B45);1) or simply =LINEST(C26:C45;B26:B45) returns the parameter responsible for the slope of the line, i.e. a. The formula =INDEX(LINEST(C26:C45;B26:B45);2) returns the parameter responsible for the intersection of the line with the Y axis, i.e. b.

Having calculated the parameters, you can draw the corresponding line on the scatter chart.

Another way to draw a straight line by the least squares method is the Trendline chart tool. To do this, select the chart, go to the Layout tab, and in the Analysis group click Trendline, then Linear Trendline.

By ticking "Display equation on chart" in the dialog box, you can verify that the parameters found above match the values on the chart.

Note: For the parameters to match, the chart type must be Scatter. The point is that in a Line chart the X-axis values cannot be specified by the user (the user can specify only labels, which do not affect the position of the points). Instead of the X values, the sequence 1; 2; 3; ... is used (for numbering the categories). Therefore, if you build a trendline on a Line chart, the values of this sequence are used instead of the actual X values, which leads to an incorrect result (unless, of course, the actual X values coincide with the sequence 1; 2; 3; ...).
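The effect described in the note can be reproduced outside Excel by fitting the same y values twice: once against the actual x values and once against the positions 1; 2; 3; .... The Python sketch below uses hypothetical data that lie exactly on y = 2x + 1, so the correct slope is known in advance.

```python
def slope(xs, ys):
    """Least squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx


# Hypothetical data with unevenly spaced x, lying exactly on y = 2x + 1.
xs = [10.0, 20.0, 40.0, 80.0]
ys = [2.0 * x + 1.0 for x in xs]

# Category positions 1; 2; 3; ... used in place of the actual x values.
positions = [float(i) for i in range(1, len(xs) + 1)]

true_slope = slope(xs, ys)
category_slope = slope(positions, ys)

print(f"slope against actual x:  {true_slope}")
print(f"slope against 1; 2; 3..: {category_slope}")
```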
