Normal distribution of a random variable and the rule of three sigma.

garden equipment 25.09.2019

Brief theory

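A normal distribution is the probability distribution of a continuous random variable $X$ whose density has the form:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-a)^2}{2\sigma^2}},$$

where $a$ is the mathematical expectation and $\sigma$ is the standard deviation.

The probability that $X$ will take a value belonging to the interval $(\alpha, \beta)$:

$$P(\alpha < X < \beta) = \Phi\left(\frac{\beta - a}{\sigma}\right) - \Phi\left(\frac{\alpha - a}{\sigma}\right),$$

where $\Phi(x)$ is the Laplace function:

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_0^x e^{-t^2/2}\,dt.$$

The probability that the absolute value of the deviation is less than a positive number $\delta$:

$$P(|X - a| < \delta) = 2\Phi\left(\frac{\delta}{\sigma}\right).$$

In particular, for $a = 0$, the following equality holds:

$$P(|X| < \delta) = 2\Phi\left(\frac{\delta}{\sigma}\right).$$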

When solving problems put forward by practice, one has to deal with various distributions of continuous random variables.

In addition to the normal distribution, the main distribution laws for continuous random variables are:

Problem solution example

The part is made on a machine. Its length $X$ is a random variable distributed according to the normal law with parameters $a$, $\sigma$. Find the probability that the length of the part will be between 22 and 24.2 cm. What deviation of the length of the part from $a$ can be guaranteed with a probability of 0.92? Of 0.98? Within what limits, symmetrical with respect to $a$, will practically all dimensions of the parts lie?

Solution:

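The probability that a random variable distributed according to the normal law with expectation $a$ and standard deviation $\sigma$ will be in the interval $(\alpha, \beta)$:

$$P(\alpha < X < \beta) = \Phi\left(\frac{\beta - a}{\sigma}\right) - \Phi\left(\frac{\alpha - a}{\sigma}\right)$$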

We get:

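The probability that a random variable, distributed according to the normal law, deviates from the mean $a$ by no more than $\delta$:

$$P(|X - a| < \delta) = 2\Phi\left(\frac{\delta}{\sigma}\right)$$

By condition, this probability equals 0.92 in the first case and 0.98 in the second, i.e. $2\Phi(\delta/\sigma) = 0.92$ and $2\Phi(\delta/\sigma) = 0.98$.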
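
The two formulas used in this solution can be sketched in Python. This is only an illustration: the function names are made up for this example, and the values of `a` and `sigma` below are hypothetical placeholders, not the parameters from the problem statement.

```python
from math import erf, sqrt
from statistics import NormalDist


def laplace(x: float) -> float:
    """Laplace function: integral of the standard normal density from 0 to x."""
    return 0.5 * erf(x / sqrt(2))


def interval_probability(alpha: float, beta: float, a: float, sigma: float) -> float:
    """P(alpha < X < beta) for X normal with expectation a and deviation sigma."""
    return laplace((beta - a) / sigma) - laplace((alpha - a) / sigma)


def guaranteed_deviation(prob: float, sigma: float) -> float:
    """delta such that P(|X - a| < delta) = prob, i.e. 2 * laplace(delta / sigma) = prob."""
    # 2 * laplace(z) = prob is equivalent to the standard normal CDF at z being 0.5 + prob / 2
    return sigma * NormalDist().inv_cdf(0.5 + prob / 2)


# Hypothetical parameters, chosen only to make the sketch runnable
a, sigma = 23.0, 1.0
p = interval_probability(22.0, 24.2, a, sigma)
d92 = guaranteed_deviation(0.92, sigma)
d98 = guaranteed_deviation(0.98, sigma)
print(round(p, 4), round(d92, 3), round(d98, 3))
```

With the real parameters of the problem substituted for the placeholders, the same calls would give the requested probability and deviations.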


From this article you will learn:

    What is a confidence interval?

    What is the point of the 3 sigma rule?

    How can this knowledge be put into practice?

Nowadays, an overabundance of information about a large assortment of products, sales channels, employees, activities, etc. makes it hard to pick out the main things that deserve attention and management effort first. Defining a confidence interval and analyzing when actual values go beyond its boundaries is a technique that helps you identify the situations that influence trends. You will be able to develop positive factors and reduce the influence of negative ones. This technique is used in many well-known international companies.

There are so-called "alerts" that inform managers that the next value in a certain area has gone beyond the confidence interval. What does this mean? It is a signal that some non-standard event has occurred, which may change the existing trend in this area. It is a signal to look into the situation and understand what influenced it.

For example, consider several situations. We calculated a sales forecast with forecast boundaries for 100 commodity items for 2011 by month, and in March actual sales:

  1. For "Sunflower oil" broke through the upper limit of the forecast and did not fall into the confidence interval.
  2. For "Dry yeast" went beyond the lower limit of the forecast.
  3. For "Oatmeal" broke through the upper limit.

For the rest of the goods, actual sales were within the specified forecast limits, i.e. their sales were in line with expectations. So we identified 3 products that went beyond the boundaries and began to figure out what caused this:

  1. With Sunflower Oil, we entered a new retail chain, which gave us additional sales volume and pushed sales above the upper limit. For this product, it is worth recalculating the forecast until the end of the year, taking into account the forecast for sales to this chain.
  2. For Dry Yeast, the truck got stuck at customs, and there was a shortage for 5 days, which reduced sales and pushed them below the lower limit. It may be worthwhile to find the root cause and try not to repeat this situation.
  3. For Oatmeal, a sales promotion was launched, which resulted in a significant increase in sales and pushed them above the forecast.

We identified 3 factors that pushed sales beyond the forecast limits. In practice there can be many more. To improve the accuracy of forecasting and planning, it is worth singling out the factors that can push actual sales beyond the forecast and building separate forecasts and plans for them, and then taking their impact on the main sales forecast into account. You can also regularly evaluate the impact of these factors and change the situation for the better by reducing the influence of negative factors and increasing the influence of positive ones.

With a confidence interval, we can:

  1. Highlight the areas that deserve attention, because events have occurred in them that may change the trend.
  2. Determine the factors that actually make a difference.
  3. Make an informed decision (for example, about procurement, planning, etc.).

Now let's look at what a confidence interval is and how to calculate it in Excel using an example.

What is a confidence interval?

The confidence interval is the forecast boundaries (upper and lower) within which the actual values will fall with a given probability (set by the number of sigmas).

I.e. we calculate the forecast, which is our main benchmark, but we understand that the actual values are unlikely to be 100% equal to our forecast. The question arises: within what limits may the actual values fall if the current trend continues? Calculating the confidence interval, i.e. the upper and lower bounds of the forecast, helps answer this question.

What is the given probability (sigma)?

When calculating the confidence interval, we can set the probability that actual values fall within the given forecast limits. How? We set the number of sigmas, and (assuming normally distributed deviations) if it equals:

    3 sigma - the probability that the next actual value falls within the confidence interval is ≈ 99.7%, i.e. the odds are about 370 to 1, or there is a ≈ 0.3% probability of going beyond the boundaries.

    2 sigma - the probability that the next value falls within the boundaries is ≈ 95.5%, i.e. the odds are about 20 to 1, or there is a 4.5% chance of going out of bounds.

    1 sigma - the probability is ≈ 68.3%, i.e. the odds are about 2 to 1, or there is a 31.7% chance that the next value will fall outside the confidence interval.

This is the 3 sigma rule: the probability that the next random value falls within a confidence interval of three sigma is 99.7%.

The great Russian mathematician Chebyshev proved an inequality that holds for any distribution with a finite variance: the probability of going beyond the three sigma boundaries is at most 1/9 ≈ 11.1%. I.e. the probability of falling into the 3 sigma confidence interval is at least about 88.9% even without the normality assumption, while an attempt to estimate the forecast and its boundaries "by eye" is fraught with much more significant errors.
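
The percentages for 1, 2 and 3 sigma can be checked with a few lines of Python. The sketch below compares the coverage under the normal law with the distribution-free lower bound $1 - 1/k^2$ from Chebyshev's inequality; the function names are chosen for this illustration only.

```python
from math import erf, sqrt


def normal_coverage(k: float) -> float:
    """P(|X - a| < k * sigma) for a normally distributed X."""
    return erf(k / sqrt(2))


def chebyshev_lower_bound(k: float) -> float:
    """Minimum coverage guaranteed by Chebyshev's inequality (trivial for k <= 1)."""
    return 1 - 1 / k**2


for k in (1, 2, 3):
    print(f"{k} sigma: normal {normal_coverage(k):.4f}, "
          f"Chebyshev at least {chebyshev_lower_bound(k):.4f}")
```

The gap between the two columns shows how much the normality assumption contributes: without it, three sigma guarantees only about 89% rather than 99.7%.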

How to independently calculate the confidence interval in Excel?

Let's consider the calculation of the confidence interval in Excel (i.e. the upper and lower bounds of the forecast) using an example. We have a time series: sales by month for 5 years. See attached file.

To calculate the boundaries of the forecast, we calculate:

  1. Sales forecast.
  2. Sigma - the standard deviation of the forecast model from the actual values.
  3. Three sigma.
  4. Confidence interval.

1. Sales forecast.

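For each period, calculate the squared deviation of the actual data from the model value: =(RC[-14]-RC[-1])^2, where RC[-14] is the data in the time series and RC[-1] is the model value.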


3. For each month, sum the squared deviations from stage 8, Sum((Xi-Ximod)^2), i.e. sum the January values, the February values, etc. across all years.

To do this, use the formula =SUMIF()

SUMIF(array with the numbers of periods inside the cycle (for months, from 1 to 12); reference to the number of the period in the cycle; reference to the array with squares of the difference between the initial data and the model values of the periods)


4. Calculate the standard deviation for each period in the cycle from 1 to 12 (stage 10 in the attached file).

To do this, divide the value calculated at stage 9 by the number of periods in this cycle minus 1 and take the square root: =SQRT(Sum((Xi-Ximod)^2)/(n-1))

In Excel, the formula is =SQRT(R8/(COUNTIF($O$8:$O$67;O8)-1)), where R8 is the reference to Sum((Xi-Ximod)^2), $O$8:$O$67 is the array with cycle numbers, and O8 is the specific cycle number being counted in that array.

The Excel function =COUNTIF gives the number n.


By calculating the standard deviation of the actual data from the forecast model, we obtained the sigma value for each month (stage 10 in the attached file).
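
The same sigma calculation (squared deviations, a SUMIF-style sum per month, a COUNTIF-style count, then the root) can be sketched outside Excel. The function name and the small data set below are hypothetical and serve only to illustrate the steps.

```python
from collections import defaultdict
from math import sqrt


def sigma_by_period(periods, actual, model):
    """Standard deviation of actual values from model values, per period number.

    Mirrors the spreadsheet steps: sum (Xi - Ximod)^2 per period,
    count the observations n per period, then sqrt(sum / (n - 1)).
    Every period needs at least two observations.
    """
    squared_sum = defaultdict(float)
    count = defaultdict(int)
    for period, x, x_model in zip(periods, actual, model):
        squared_sum[period] += (x - x_model) ** 2
        count[period] += 1
    return {p: sqrt(squared_sum[p] / (count[p] - 1)) for p in squared_sum}


# Hypothetical data: two "months" observed over three "years"
periods = [1, 2, 1, 2, 1, 2]
actual = [10, 20, 12, 18, 14, 25]
model = [11, 19, 11, 20, 13, 22]
sigmas = sigma_by_period(periods, actual, model)
print(sigmas)
```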

3. Calculate 3 sigma.

At stage 11, we set the number of sigmas - in our example, "3" (stage 11 in the attached file):

Other sigma values used in practice:

1.64 sigma - 10% chance of going out of bounds (1 chance in 10);

1.96 sigma - 5% chance of going out of bounds (1 chance in 20);

2.58 sigma - 1% chance of going out of bounds (1 chance in 100).
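
These multipliers follow from the inverse of the standard normal distribution function. A minimal sketch, assuming normally distributed deviations and a two-sided interval (the function name is made up for this example):

```python
from statistics import NormalDist


def sigma_multiplier(p_outside: float) -> float:
    """Number of sigmas such that a normal value falls outside +-k*sigma with probability p_outside."""
    # Half of p_outside lies in each tail, so take the (1 - p_outside / 2) quantile
    return NormalDist().inv_cdf(1 - p_outside / 2)


for p_outside in (0.10, 0.05, 0.01, 0.0027):
    print(f"{p_outside:.2%} outside -> {sigma_multiplier(p_outside):.3f} sigma")
```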

5) We calculate three sigma: multiply the "sigma" value for each month by "3".

4. Determine the confidence interval.

  1. Upper forecast limit - sales forecast taking into account growth and seasonality plus 3 sigma;
  2. Lower forecast limit - sales forecast taking into account growth and seasonality minus 3 sigma.

To make it convenient to calculate the confidence interval over a long period (see attached file), we use the Excel formula =Y8+VLOOKUP(W8;$U$8:$V$19;2;0), where

Y8 - sales forecast;

W8 - the number of the month for which we take the 3 sigma value.

I.e. upper forecast limit = "sales forecast" + "3 sigma" (in the example, VLOOKUP(month number; table with 3 sigma values; column from which we extract the 3 sigma value for the row matching the month number; 0)).

Lower forecast limit = "sales forecast" minus "3 sigma".
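
As a rough equivalent of the VLOOKUP step, the sketch below looks up sigma by month number, builds both limits, and flags the actual values that fall outside them. All names and numbers are hypothetical and are not taken from the attached file.

```python
def forecast_bounds(forecast, months, sigma_by_month, k=3.0):
    """Lower and upper forecast limits: forecast -+ k * sigma of the matching month."""
    lower = [f - k * sigma_by_month[m] for f, m in zip(forecast, months)]
    upper = [f + k * sigma_by_month[m] for f, m in zip(forecast, months)]
    return lower, upper


def out_of_bounds(actual, lower, upper):
    """Indices of actual values that fall outside the confidence interval."""
    return [i for i, (x, lo, hi) in enumerate(zip(actual, lower, upper))
            if x < lo or x > hi]


# Hypothetical forecast for two months and the sigma values found for them
forecast = [100.0, 120.0]
months = [1, 2]
sigma_by_month = {1: 5.0, 2: 10.0}

lower, upper = forecast_bounds(forecast, months, sigma_by_month)
alerts = out_of_bounds([118.0, 95.0], lower, upper)
print(lower, upper, alerts)
```

The returned indices play the role of the "alerts" described earlier: they mark the periods worth investigating.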

So, we have calculated the confidence interval in Excel.

Now we have a forecast and a range with boundaries within which the actual values will fall with the probability set by the chosen number of sigmas.

In this article, we looked at what sigma and the three sigma rule are, how to determine the confidence interval, and how this technique can be used in practice.

Accurate forecasts and success to you!


Find the probability that a normally distributed random variable takes a value from the interval $(a - 3\sigma, a + 3\sigma)$:

$$P(a - 3\sigma < X < a + 3\sigma) = \Phi(3) - \Phi(-3) = 2\Phi(3) = 2 \cdot 0.49865 = 0.9973,$$

where $\Phi$ is the Laplace function.

Therefore, the probability that the value of a random variable will be outside of this interval is equal to 0.0027, that is, it is 0.27% and can be considered negligible. Thus, in practice, one can assume that all possible values of the normally distributed random variable lie in the interval $(a - 3\sigma, a + 3\sigma)$.

The result obtained allows us to formulate the three sigma rule: if a random variable is normally distributed, then the absolute value of its deviation from the mathematical expectation $a$ practically never exceeds $3\sigma$.

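16.7. Exponential distribution.

Definition. The exponential distribution is the probability distribution of a continuous random variable $X$ described by the density

$$f(x) = \begin{cases} 0, & x < 0, \\ \lambda e^{-\lambda x}, & x \ge 0. \end{cases}$$

Unlike the normal distribution, the exponential law is determined by only one parameter $\lambda$. This is its advantage, since usually the distribution parameters are not known in advance and have to be estimated approximately. It is clear that it is easier to estimate one parameter than several.

Let's find the distribution function of the exponential law. For $x \ge 0$:

$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x}.$$

Hence,

$$F(x) = \begin{cases} 0, & x < 0, \\ 1 - e^{-\lambda x}, & x \ge 0. \end{cases}$$

Now we can find the probability that an exponentially distributed random variable falls into the interval $(a, b)$, $0 \le a < b$:

$$P(a < X < b) = F(b) - F(a) = e^{-\lambda a} - e^{-\lambda b}.$$

Values of the function $e^{-x}$ can be found from tables.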
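
Instead of tables, the interval probability can be evaluated directly. A minimal sketch with hypothetical values of $\lambda$, $a$ and $b$ (the function names are chosen for this example):

```python
from math import exp


def exp_cdf(x: float, lam: float) -> float:
    """Distribution function of the exponential law with parameter lam."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0


def exp_interval_probability(a: float, b: float, lam: float) -> float:
    """P(a < X < b) = F(b) - F(a) for an exponentially distributed X."""
    return exp_cdf(b, lam) - exp_cdf(a, lam)


# Hypothetical parameters: lam = 2, interval (0.5, 1)
prob = exp_interval_probability(0.5, 1.0, 2.0)
print(round(prob, 4))  # equals exp(-1) - exp(-2)
```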

16.8. Reliability function.

Let an element (that is, some device) start working at time $t_0 = 0$ and be required to work for a period of time $t$. Denote by $T$ the continuous random variable equal to the uptime of the element; then the function $F(t) = p(T < t)$ determines the probability of failure within time $t$. Therefore, the probability of failure-free operation during the same time is equal to

R(t) = p(T > t) = 1 – F(t).

This function is called reliability function.

16.9. The exponential law of reliability.

Often, the uptime of an element has an exponential distribution, i.e.

$F(t) = 1 - e^{-\lambda t}$.

Therefore, the reliability function in this case has the form:

$R(t) = 1 - F(t) = 1 - (1 - e^{-\lambda t}) = e^{-\lambda t}$.

Definition. The exponential law of reliability is the reliability function defined by the equality

$R(t) = e^{-\lambda t}$,

where $\lambda$ is the failure rate.

Example. Let the uptime of an element be distributed according to an exponential law with distribution density $f(t) = 0.1 e^{-0.1 t}$ for $t \ge 0$. Find the probability that the element will work without failure for 10 hours.

Solution. Since $\lambda = 0.1$, $R(10) = e^{-0.1 \cdot 10} = e^{-1} \approx 0.368$.
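
The example is easy to verify numerically. The sketch below evaluates the exponential reliability function for the failure rate 0.1 and the time 10 given in the example; the function name is chosen for illustration.

```python
from math import exp


def reliability(t: float, lam: float) -> float:
    """Exponential law of reliability: probability of failure-free operation for time t."""
    return exp(-lam * t)


r10 = reliability(10, 0.1)   # failure rate 0.1, time 10 hours
failure_within_10 = 1 - r10  # F(10) = 1 - R(10)
print(round(r10, 3), round(failure_within_10, 3))
```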

16.10. Expected value.

Definition. The mathematical expectation of a discrete random variable is the sum of the products of its possible values and their corresponding probabilities:

$M(X) = x_1 p_1 + x_2 p_2 + \ldots + x_n p_n$.

If the number of possible values of a random variable is infinite, then $M(X) = \sum_{i=1}^{\infty} x_i p_i$,
provided the resulting series converges absolutely.

Remark 1. The mathematical expectation is sometimes called the weighted average, since it is approximately equal to the arithmetic mean of the observed values of the random variable over a large number of experiments.

Remark 2. From the definition of mathematical expectation, it follows that its value is not less than the smallest possible value of a random variable and not more than the largest.

Remark 3. The mathematical expectation of a discrete random variable is a non-random (constant) quantity. Later we will see that the same is true for continuous random variables.

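Example. Let's find the mathematical expectation of the random variable $X$, the number of standard parts among three selected from a batch of 10 parts that includes 2 defective ones. Let us compose a distribution series for $X$. It follows from the condition of the problem that $X$ can take the values 1, 2, 3. Then

$$p(X=1) = \frac{C_8^1 C_2^2}{C_{10}^3} = \frac{1}{15}, \quad p(X=2) = \frac{C_8^2 C_2^1}{C_{10}^3} = \frac{7}{15}, \quad p(X=3) = \frac{C_8^3}{C_{10}^3} = \frac{7}{15},$$

so $M(X) = 1 \cdot \frac{1}{15} + 2 \cdot \frac{7}{15} + 3 \cdot \frac{7}{15} = 2.4$.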
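
The distribution series for this example can be built and checked with exact fractions. The sketch below uses only the numbers from the problem statement (10 parts, 2 defective, 3 selected); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

TOTAL, DEFECTIVE, SELECTED = 10, 2, 3
STANDARD = TOTAL - DEFECTIVE

# P(X = k): choose k standard parts and (SELECTED - k) defective ones
distribution = {
    k: Fraction(comb(STANDARD, k) * comb(DEFECTIVE, SELECTED - k),
                comb(TOTAL, SELECTED))
    for k in range(SELECTED + 1)
}

# M(X) is the sum of values multiplied by their probabilities
expectation = sum(k * p for k, p in distribution.items())
print(distribution, expectation)
```

The value $X = 0$ gets probability zero here, which matches the statement that $X$ takes only the values 1, 2, 3.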

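Example 2. Find the mathematical expectation of the random variable $X$, the number of coin tosses until the first appearance of the coat of arms. This quantity can take an infinite number of values (the set of possible values is the set of natural numbers). Its distribution series has the form:

$$\begin{array}{c|ccccc} X & 1 & 2 & \ldots & n & \ldots \\ \hline p & 0.5 & (0.5)^2 & \ldots & (0.5)^n & \ldots \end{array}$$

Then

$$M(X) = \sum_{n=1}^{\infty} \frac{n}{2^n} = \sum_{k=1}^{\infty} \sum_{n=k}^{\infty} \frac{1}{2^n} = \sum_{k=1}^{\infty} \frac{1}{2^{k-1}} = 2$$

(when calculating, the formula for the sum of an infinitely decreasing geometric progression was used twice:
$S = \frac{b_1}{1 - q}$, where $|q| < 1$).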
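
The convergence of this series to 2 can be observed from its partial sums. A small sketch (the function name is made up for this example):

```python
def partial_expectation(n_terms: int) -> float:
    """Partial sum of n * 0.5**n for n = 1 .. n_terms."""
    return sum(n * 0.5**n for n in range(1, n_terms + 1))


for n_terms in (5, 10, 20, 60):
    print(n_terms, partial_expectation(n_terms))
```

The partial sum over $N$ terms equals $2 - (N + 2)/2^N$, so the remainder shrinks very quickly as $N$ grows.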

Properties of mathematical expectation.

    1) The mathematical expectation of a constant is equal to the constant itself:

$M(C) = C$.

Proof. If we consider $C$ as a discrete random variable that takes only one value $C$ with probability $p = 1$, then $M(C) = C \cdot 1 = C$.

    2) A constant factor can be taken out of the expectation sign:

$M(CX) = C\,M(X)$.

Proof. If the random variable $X$ is given by the distribution series

$$\begin{array}{c|cccc} x_i & x_1 & x_2 & \ldots & x_n \\ \hline p_i & p_1 & p_2 & \ldots & p_n \end{array}$$

then the distribution series for $CX$ looks like:

$$\begin{array}{c|cccc} Cx_i & Cx_1 & Cx_2 & \ldots & Cx_n \\ \hline p_i & p_1 & p_2 & \ldots & p_n \end{array}$$

Then $M(CX) = Cx_1 p_1 + Cx_2 p_2 + \ldots + Cx_n p_n = C(x_1 p_1 + x_2 p_2 + \ldots + x_n p_n) = C\,M(X)$.

Definition. Two random variables are called independent if the distribution law of one of them does not depend on what values the other has taken. Otherwise the random variables are dependent.

Definition. The product of independent random variables $X$ and $Y$ is the random variable $XY$ whose possible values are equal to the products of every possible value of $X$ by every possible value of $Y$, and whose corresponding probabilities are equal to the products of the probabilities of the factors.

    3) The mathematical expectation of the product of two independent random variables is equal to the product of their mathematical expectations:

$M(XY) = M(X)\,M(Y)$.

Proof. To simplify the calculations, we restrict ourselves to the case when $X$ and $Y$ take only two possible values:

$$\begin{array}{c|cc} x_i & x_1 & x_2 \\ \hline p_i & p_1 & p_2 \end{array} \qquad \begin{array}{c|cc} y_i & y_1 & y_2 \\ \hline g_i & g_1 & g_2 \end{array}$$

Then the distribution series for $XY$ looks like this:

$$\begin{array}{c|cccc} XY & x_1 y_1 & x_2 y_1 & x_1 y_2 & x_2 y_2 \\ \hline p & p_1 g_1 & p_2 g_1 & p_1 g_2 & p_2 g_2 \end{array}$$

Hence, $M(XY) = x_1 y_1 \cdot p_1 g_1 + x_2 y_1 \cdot p_2 g_1 + x_1 y_2 \cdot p_1 g_2 + x_2 y_2 \cdot p_2 g_2 = y_1 g_1 (x_1 p_1 + x_2 p_2) + y_2 g_2 (x_1 p_1 + x_2 p_2) = (y_1 g_1 + y_2 g_2)(x_1 p_1 + x_2 p_2) = M(X)\,M(Y)$.

Remark 1. Similarly, one can prove this property for a larger number of possible values of the factors.

Remark 2. Property 3 is valid for the product of any number of independent random variables, which is proved by the method of mathematical induction.

Definition. The sum of random variables $X$ and $Y$ is the random variable $X + Y$ whose possible values are equal to the sums of every possible value of $X$ with every possible value of $Y$; the probabilities of such sums are equal to the products of the probabilities of the terms (for dependent random variables, to the products of the probability of one term and the conditional probability of the second).

4) The mathematical expectation of the sum of two random variables (dependent or independent) is equal to the sum of the mathematical expectations of the terms:

M (X + Y) = M (X) + M (Y).

Proof.

Consider again the random variables given by the distribution series from the proof of property 3. Then the possible values of $X + Y$ are $x_1 + y_1$, $x_1 + y_2$, $x_2 + y_1$, $x_2 + y_2$. Denote their probabilities by $p_{11}$, $p_{12}$, $p_{21}$ and $p_{22}$ respectively. Let's find $M(X+Y) = (x_1 + y_1)p_{11} + (x_1 + y_2)p_{12} + (x_2 + y_1)p_{21} + (x_2 + y_2)p_{22} =$

$= x_1 (p_{11} + p_{12}) + x_2 (p_{21} + p_{22}) + y_1 (p_{11} + p_{21}) + y_2 (p_{12} + p_{22})$.

Let's prove that $p_{11} + p_{12} = p_1$. Indeed, the event that $X + Y$ takes the value $x_1 + y_1$ or $x_1 + y_2$, whose probability is $p_{11} + p_{12}$, coincides with the event that $X = x_1$ (its probability is $p_1$). Similarly, it is proved that $p_{21} + p_{22} = p_2$, $p_{11} + p_{21} = g_1$, $p_{12} + p_{22} = g_2$. Hence,

$M(X + Y) = x_1 p_1 + x_2 p_2 + y_1 g_1 + y_2 g_2 = M(X) + M(Y)$.

Comment. Property 4 implies that the mathematical expectation of the sum of any number of random variables is equal to the sum of the mathematical expectations of the terms.

Example. Find the mathematical expectation of the sum of the number of points rolled when throwing five dice.

Let's find the mathematical expectation of the number of points rolled when throwing one die:

$M(X_1) = (1 + 2 + 3 + 4 + 5 + 6) \cdot \frac{1}{6} = \frac{7}{2}$.
The mathematical expectation of the number of points rolled on any other die equals the same number. Therefore, by property 4, $M(X) = 5 \cdot \frac{7}{2} = \frac{35}{2} = 17.5$.
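
The result for five dice can be confirmed both by the additivity property and by brute-force enumeration of all $6^5$ outcomes. A short sketch with illustrative variable names:

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Expectation for one die, then property 4 (additivity) for five dice
one_die = sum(Fraction(face, 6) for face in faces)
five_dice = 5 * one_die

# Independent check: average the sum over all equally likely outcomes
outcomes = list(product(faces, repeat=5))
brute_force = Fraction(sum(sum(o) for o in outcomes), len(outcomes))

print(one_die, five_dice, brute_force)
```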

When carrying out practical calculations, the standard deviation $\sigma$ is taken as the unit of measurement of the deviation of a random variable subject to the normal law from its dispersion center (mathematical expectation $a$). Then, on the basis of formula (7) § 17, we obtain equalities useful in various calculations:

$$P(|x - a| < \sigma) = 0.683, \quad P(|x - a| < 2\sigma) = 0.954, \quad P(|x - a| < 3\sigma) = 0.997.$$

These results are shown geometrically in Fig. 439.

It is almost certain that the random variable (error) will not deviate from the mathematical expectation in absolute value by more than $3\sigma$. This assumption is called the rule of three sigma.

In the theory of shooting and in the processing of various statistical materials, it is useful to know the probability of a random variable falling into the intervals (0, E),

With a distribution density determined by formula (1) § 19. Knowledge of these probabilities in many cases reduces calculations and helps in the analysis of phenomena.

When calculating these probabilities, we will use the formula (8) § 19 and the function table

The calculation results are shown geometrically in Fig. 440, which is called the error dispersion scale. It follows from these calculations that it is almost certain that the value of a random variable falls within the interval. The probability that the value of a random variable falls outside this interval is less than 0.01.

Example 1. One shot is fired at a strip 100 m wide. Aiming was calculated at the middle line of the strip, which is perpendicular to the projectile flight plane. Scattering obeys the normal law with a probable deviation in range. Determine the probability of hitting the band (Fig. 441). The median deviation in range in the theory of shooting is denoted lateral.

Solution. Let's use the formula (7) § 19. In our case . Hence,

Comment. Approximately, it would be possible to solve the problem without using function tables, but using the dispersion scale (Fig. 440).

As I wrote earlier, owing to my natural science education and my bias towards a logical understanding and explanation of reality, I am an adherent of the technical analysis of markets.

After graduating from university with a degree in radiophysics, I worked on the research and application of signal analysis and processing methods in systems for technical diagnostics and condition monitoring of aviation and rocket equipment. The work required a thorough knowledge of analog and digital signal processing methods. At that time, it was not clear to me why foreign authors illustrated digital methods with stock quotes. But when I first saw market price charts at the end of 2000, everything became clear. Where else would human interest and the best brains be concentrated, if not where it smells of real money?
Well, it also became extremely interesting for me to try the methodology, principles and mathematical apparatus familiar to me in this area. And money did not play the leading role here; ambition did.

Market prices are ultimately determined by the state and dynamics of the world economic system and by fundamental factors - key statistics on the state of national economies. Superimposed on these global shifts are local price trends that reflect production renewal cycles, seasonal factors in the balance of supply and demand, and so on, down to changes caused by economic and political news and the actions of individual market participants.
The study of the nature and degree of influence of macroeconomic indicators on price dynamics is the subject of fundamental analysis, which relies on statistical data for a past period, i.e. on an already accomplished fact.

Technical analysis is based on the study of charts depicting price behavior over time. It is applicable to all assets whose price is determined by free fluctuations in supply and demand (currencies, commodity futures, options, securities, and much more), and rests on the postulates arising from Dow theory. The main postulate is formulated as follows: the market takes everything into account. The price is a consequence and an exhaustive reflection of all the driving forces of the market. Any factor influencing the price (economic, political or psychological) is already taken into account by the market and included in the price. Everything that influences the price is sure to be reflected in the price itself. Through price charts, the market itself announces its intentions to an attentive analyst, whose task is to interpret these intentions correctly and in time.

The natural approach of a radiophysicist is to analyze the price chart as a function of time, a kind of signal from which the information of interest must be extracted. And this is technical analysis.

Everyone has tried their hand at technical analysis and indicator design, and produced all sorts of things. But a mind corrupted by the Faculty of Physics did not accept methods lacking a clear and transparent physical meaning and interpretation of results. One of the most beautiful and effective indicators with something behind it is the Bollinger Bands indicator, which plots channels in units of the standard deviation (sigma) of the price from the mean and is based on the three sigma criterion.

The three sigma rule means that almost all values of a normally distributed random variable (with a probability of 0.9973) lie within ±3 sigma of the mean. We will not discuss how justified it is to apply this criterion to markets, which do not obey the normal distribution law and are not stationary.

Normal distribution and limits of deviations.

Probabilities of a random variable falling into given intervals.

As we said, in technical analysis the standard deviation is used to build Bollinger channels.
The principle of constructing the indicator is very simple. In a given time window, the average price and the standard deviation from the average are determined, and then distances in standard deviation units are plotted up and down from this average price, marking the upper and lower borders of the channel. Then the window is shifted one sample to the right, the calculations are repeated, and so on. The result is a graph of a simple moving average and two boundaries, above and below, separated from the average by a given number of standard deviations, usually 2 sigma.
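
The sliding-window construction described above can be sketched as follows. This is a minimal illustration rather than any terminal's actual implementation: the function name, the default window of 20 samples, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def bollinger_bands(prices, window=20, k=2.0):
    """Return (lower, middle, upper) for every full window of prices.

    middle is the simple moving average over the window;
    lower and upper are k standard deviations below and above it.
    """
    bands = []
    for end in range(window, len(prices) + 1):
        chunk = prices[end - window:end]  # current time window
        middle = mean(chunk)
        sd = pstdev(chunk)                # assumption: population standard deviation
        bands.append((middle - k * sd, middle, middle + k * sd))
    return bands


# Hypothetical closing prices with a short window to keep the output small
bands = bollinger_bands([1, 2, 3, 4, 5], window=3, k=2.0)
for lower, middle, upper in bands:
    print(round(lower, 3), middle, round(upper, 3))
```

Each step of the loop corresponds to shifting the window one sample to the right, so a series of N prices produces N - window + 1 points of the channel.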

An example of an indicator is shown in the figure below.

Bollinger bands indicator and its application.

The Bollinger bands (BB) are two lines placed at a distance from the moving average proportional to the standard deviation of closing prices from the moving average. This parameter characterizes market volatility; accordingly, the channel width increases as volatility grows.
A decision based on BB analysis is made when the price either rises above the upper BB (resistance) line or falls below the lower BB (support) line. If the price chart fluctuates between these two lines, there are no reliable buy/sell signals from BB analysis. The decision to open a position is made only when the price chart crosses a BB line on its way back into the channel.
Sometimes going beyond a BB line is a "false breakout", i.e. prices just tested a new level and immediately returned. In this case there is an opportunity to trade against the trend, but you should carefully evaluate whether the breakout really was false. A good confirmation in such cases is the volume indicator, which should drop sharply in the case of a false breakout.
Additional signals of the BB lines. Convergence of the BB lines is observed when the market calms down and shows no significant fluctuations: the market is consolidating before the current trend continues or a new one emerges. Divergence of the BB lines is observed when the current trend strengthens or a new one begins. Divergence with increased volumes is a good confirmation of the trend. The moving average is a good support level in a bull market and a good resistance level in a bear market.

The indicator is good, and I used it at one time.
But in my modification, I replaced the simple moving average with an exponential one (the reasons can be explained in the comments), and replaced the deviation used to calculate sigma with the standard deviation of the price from the exponential average. Later the indicator parameters were heavily modified, and only the three sigma principle remained from the classic Bollinger Bands.

Today the indicator looks like this.

The blue line in the center is the moving average.

The +-sigma, +-2*sigma and +-3*sigma channels are laid up and down from it.
The rules for interpretation and use are approximately the same as for the classic BB indicator, but decision-making is less fussy.
The indicator parameters are automatically adjusted to the time frame.
Recommended timeframes and parameters to use:
Weekly chart - long-term trend - cycle 2-3 years;
Daily chart - medium-term trend - a cycle of 5-7 months;
Chart H4 - short-term trend - a cycle of 30-40 days;
Chart H1 - local trend - a cycle of 4-6 days;
M15 chart - daily trend - cycle 15-30 hours;
M5 chart - intraday trend - cycle 4-6 hours;
Chart M1 - hourly trend - cycle 50-70 minutes.

P.S. I myself practically do not use this indicator, since the toolkit of the SWT method has volatility channels that solve similar problems. I can send it to those who wish.
Since Timofey has not yet added file exchange to the chat, send me your e-mail in a private message.
A plus on the post is welcome but not required to receive the indicator.
Yes, I completely forgot: it works only in the MT4 terminal. I never got around to MT5 because I have no use for it.

P.P.S. And, I beg your pardon, I will send it out once a day as requests accumulate.
