Variation series. Statistical distribution of the sample

Decor elements 25.09.2019

Variation series - a series in which the variants (arranged in ascending or descending order) are matched with their corresponding frequencies.

Variants are separate quantitative expressions of a feature. Designated with a Latin letter V . The classical understanding of the term "variant" assumes that each unique value of a feature is called a variant, regardless of the number of repetitions.

For example, in a variational series of indicators of systolic blood pressure measured in ten patients:

110, 120, 120, 130, 130, 130, 140, 140, 160, 170;

only 6 values are variants:

110, 120, 130, 140, 160, 170.

Frequency is a number indicating how many times a variant is repeated. It is denoted by the Latin letter P. The sum of all frequencies (which is equal to the number of subjects studied) is denoted n.

In our example, the frequencies take the following values:
  • for variant 110 frequency P = 1 (value 110 occurs in one patient),
  • for variant 120 frequency P = 2 (value 120 occurs in two patients),
  • for variant 130 frequency P = 3 (value 130 occurs in three patients),
  • for variant 140 frequency P = 2 (value 140 occurs in two patients),
  • for variant 160 frequency P = 1 (value 160 occurs in one patient),
  • for variant 170 frequency P = 1 (value 170 occurs in one patient).
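The counting above can be reproduced with a short sketch. The variable names are illustrative; only the ten pressure values come from the example.

```python
from collections import Counter

# Systolic blood pressure of the ten patients from the example above.
pressures = [110, 120, 120, 130, 130, 130, 140, 140, 160, 170]

# Counter maps each variant V to its frequency P.
frequencies = Counter(pressures)
variants = sorted(frequencies)   # unique values in ascending order
n = sum(frequencies.values())    # total number of observations

for v in variants:
    print(f"V = {v}: P = {frequencies[v]}")
print("n =", n)
```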

Types of variation series:

  1. simple - a series in which each variant occurs only once (all frequencies are equal to 1);
  2. weighted - a series in which one or more variants occur repeatedly.

The variation series is used to describe large arrays of numbers; it is in this form that the collected data of most medical studies are initially presented. In order to characterize the variation series, special indicators are calculated, including average values, indicators of variability (the so-called dispersion), indicators of the representativeness of sample data.

Variation series indicators

1) The arithmetic mean is a generalizing indicator that characterizes the size of the studied trait. Denoted M, it is the most common type of average. It is calculated as the ratio of the sum of the values of all units of observation to the number of units examined. The method of calculation differs for simple and weighted variation series.

Formula for calculating the simple arithmetic mean:

$$M = \frac{\sum V}{n}$$

Formula for calculating the weighted arithmetic mean:

$$M = \frac{\sum (V \cdot P)}{n}$$
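As a rough check, the weighted formula can be applied to the blood-pressure series from the example above and compared with the simple mean of the ungrouped list. The variable names are illustrative.

```python
# Variants V and frequencies P of the blood-pressure example.
V = [110, 120, 130, 140, 160, 170]
P = [1, 2, 3, 2, 1, 1]
n = sum(P)

# Weighted arithmetic mean: M = sum(V * P) / n
weighted_mean = sum(v * p for v, p in zip(V, P)) / n

# Simple arithmetic mean over the expanded (ungrouped) list of observations.
observations = [v for v, p in zip(V, P) for _ in range(p)]
simple_mean = sum(observations) / len(observations)

print(weighted_mean, simple_mean)
```

Both calculations give the same value, because the weighted series is only a compact way of writing the ungrouped one.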

2) Mode - another average value of the variation series, corresponding to the most frequently repeated variant. In other words, it is the variant with the highest frequency. Designated Mo. The mode is calculated only for weighted series, since in simple series no variant is repeated and all frequencies are equal to one.

For example, in the variation series of heart rate values:

80, 84, 84, 86, 86, 86, 90, 94;

the value of the mode is 86, since this variant occurs 3 times, therefore its frequency is the highest.

3) Median - the value of the variant that divides the variation series in half: an equal number of variants lies on either side of it. Like the arithmetic mean and the mode, the median is an average value. Designated Me.
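A minimal sketch using Python's standard `statistics` module shows the mode and median for the heart-rate series quoted above. The use of this module is a choice of illustration, not part of the original method.

```python
import statistics

# Heart-rate series from the example above (already ranked).
heart_rates = [80, 84, 84, 86, 86, 86, 90, 94]

mode = statistics.mode(heart_rates)      # most frequent variant
median = statistics.median(heart_rates)  # mean of the two central values when n is even

print(mode, median)
```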

4) Standard deviation (synonyms: root-mean-square deviation, sigma deviation, sigma) - a measure of the variability of the variation series. It is an integral indicator that combines all deviations of the variants from the mean. In effect, it answers the question: how far and how often do the variants spread from the arithmetic mean? It is denoted by the Greek letter σ ("sigma").

When the population size is more than 30 units, the standard deviation is calculated using the following formula:

$$\sigma = \sqrt{\frac{\sum (V - M)^2 \cdot P}{n}}$$

For small populations - 30 observation units or less - the standard deviation is calculated using a different formula:

$$\sigma = \sqrt{\frac{\sum (V - M)^2 \cdot P}{n - 1}}$$
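The two divisors can be compared on the blood-pressure series from the beginning of the text. The sketch assumes that the difference between the two formulas is only the divisor (n versus n - 1). The variable names are illustrative.

```python
import math
import statistics

# Variants V and frequencies P of the blood-pressure example.
V = [110, 120, 130, 140, 160, 170]
P = [1, 2, 3, 2, 1, 1]
n = sum(P)
M = sum(v * p for v, p in zip(V, P)) / n

# Sum of squared deviations from the mean, weighted by frequency.
ss = sum((v - M) ** 2 * p for v, p in zip(V, P))

sigma_large = math.sqrt(ss / n)        # divisor n (more than 30 observations)
sigma_small = math.sqrt(ss / (n - 1))  # divisor n - 1 (30 observations or fewer)

# Cross-check against the standard library on the ungrouped observations.
observations = [v for v, p in zip(V, P) for _ in range(p)]
print(round(sigma_small, 2), round(statistics.stdev(observations), 2))
```

With only ten observations, the second formula (divisor n - 1) is the one the text prescribes.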

Variation series - a statistical series showing the distribution of the phenomenon under study according to the value of some quantitative trait. For example, the distribution of patients by age or by duration of treatment, of newborns by weight, etc.

Variant - an individual value of the characteristic by which the grouping is carried out (denoted V).

Frequency - a number indicating how often a given variant occurs (denoted P). The sum of all frequencies gives the total number of observations and is denoted n. The difference between the largest and smallest variants of the variation series is called the range, or amplitude.

Variation series are classified as follows:

1. Discontinuous (discrete) and continuous.

A series is considered continuous if the grouping attribute can take fractional values (weight, height, etc.), and discontinuous if the grouping attribute is expressed only as an integer (days of disability, number of heartbeats, etc.).

2. Simple and weighted.

A simple variation series is a series in which each quantitative value of the varying attribute occurs once. In a weighted variation series, the quantitative values of the varying attribute are repeated with a certain frequency.

3. Grouped (interval) and ungrouped.

In a grouped series, the variants are combined into groups by size within a certain interval. In an ungrouped series, each individual variant has its own frequency.

4. Even and odd.

In even variation series, the sum of the frequencies, i.e. the total number of observations, is an even number; in odd series, it is an odd number.

5. Symmetrical and asymmetrical.

In a symmetrical variation series, all types of averages (mode, median, arithmetic mean) coincide or are very close.

Depending on the nature of the phenomena being studied, on the specific tasks and objectives of the statistical study, as well as on the content of the source material, in sanitary statistics the following types of averages are used:

structural averages (mode, median);

arithmetic mean;

harmonic mean;

geometric mean;

progressive mean.

Mode (Mo) - the value of the variable trait that occurs most often in the studied population, i.e. the variant corresponding to the highest frequency. It is found directly from the structure of the variation series, without any calculation. It is usually very close to the arithmetic mean and is very convenient in practice.

Median (Me) - the value that divides the variation series (ranked, i.e. with the values of the variants arranged in ascending or descending order) into two equal halves. The median is found using the so-called cumulative series, which is obtained by successively summing the frequencies. If the sum of the frequencies is an even number, the median is conventionally taken as the arithmetic mean of the two central values.

The mode and median are applied in the case of an open population, i.e. when the largest or smallest variants have no exact quantitative value (for example, under 15 years old, 50 and older, etc.). In this case, the arithmetic mean (a parametric characteristic) cannot be calculated.

Arithmetic mean - the most commonly used average. It is usually denoted M.

A distinction is made between the simple arithmetic mean and the weighted arithmetic mean.

The simple arithmetic mean is calculated:

- when the population is represented by a simple list of the values of an attribute for each unit;

- if the number of repetitions of each variant cannot be determined;

- if the numbers of repetitions of each variant are close to each other.

The simple arithmetic mean is calculated by the formula:

$$M = \frac{\sum V}{n}$$

where V - individual values of the attribute; n - number of individual values; Σ - sign of summation.

Thus, the simple arithmetic mean is the ratio of the sum of the variants to the number of observations.

Example: determine the average length of stay in bed for 10 patients with pneumonia:

16 days - 1 patient; 17–1; 18–1; 19–1; 20–1; 21–1; 22–1; 23–1; 26–1; 31–1.

$$M = \frac{\sum V}{n} = \frac{16 + 17 + \dots + 31}{10} = \frac{213}{10} = 21.3 \text{ bed-days.}$$

The weighted arithmetic mean is calculated in cases where individual values of the characteristic are repeated. It can be calculated in two ways:

1. Directly (the arithmetic-mean or direct method) according to the formula:

$$M = \frac{\sum VP}{n}$$

where P is the frequency (number of cases) of observations of each variant.

Thus, the weighted arithmetic mean is the ratio of the sum of the products of the variants and their frequencies to the number of observations.

2. By calculating deviations from a conditional average (the method of moments).

The basis for calculating the weighted arithmetic mean is:

- material grouped according to the variants of a quantitative trait;

- all variants arranged in ascending or descending order of the attribute value (a ranked series).

A prerequisite for calculation by the method of moments is that all intervals have the same size.

According to the method of moments, the arithmetic mean is calculated by the formula:

$$M = M_o + i \cdot \frac{\sum aP}{\sum P}$$

where Mo is the conditional average, for which the value of the feature with the highest frequency, i.e. the most frequently repeated value (the mode), is often taken;

i - the size of the interval;

a - the conditional deviation from the conditional average: a sequence of numbers (1, 2, etc.) taken with a + sign for variants above the conditional average and with a - sign (-1, -2, etc.) for variants below it. The conditional deviation of the variant taken as the conditional average is 0;

P - frequencies;

ΣP - the total number of observations, or n.
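The method of moments can be sketched on a small grouped series and checked against the direct method. The data below are hypothetical (they are not the data of the tables in this text), and the variable names are illustrative.

```python
# Hypothetical grouped series with equal intervals (not the data of Table 1).
V = [10, 20, 30, 40, 50]   # central variants
P = [2, 5, 8, 4, 1]        # frequencies
i = V[1] - V[0]            # interval size
n = sum(P)

# Conditional average: the variant with the highest frequency.
Mo = V[P.index(max(P))]

# Conditional deviations in interval units: ..., -2, -1, 0, 1, 2, ...
a = [(v - Mo) // i for v in V]

mean_moments = Mo + i * sum(ai * p for ai, p in zip(a, P)) / n
mean_direct = sum(v * p for v, p in zip(V, P)) / n

print(mean_moments, mean_direct)
```

Both methods give the same mean. The method of moments simply replaces large products VP with small integer products aP.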

Example: determine the average height of 8-year-old boys by the direct method (table 1).

Table 1. Columns: height in cm; number of boys (P); central variant (V).

The central variant, the midpoint of the interval, is defined as the half-sum of the initial values of two adjacent groups.

The product VP is obtained by multiplying each central variant by its frequency. The resulting products are then added to give ΣVP, which is divided by the number of observations (100) to obtain the weighted arithmetic mean in cm.

We will solve the same problem using the method of moments, for which table 2 is compiled.

Table 2. Columns: height in cm (V); number of boys (P); n = 100.

We take 122 as Mo because, out of 100 observations, 33 boys had a height of 122 cm. We find the conditional deviations (a) from the conditional average as described above. We then multiply the conditional deviations by the frequencies (aP) and sum the products (ΣaP). The result is 17. Finally, we substitute these values into the formula.

When studying a variable trait, one should not be limited only to the calculation of average values. It is also necessary to calculate indicators characterizing the degree of diversity of the studied features. The value of one or another quantitative attribute is not the same for all units of the statistical population.

A characteristic of the variation series is the standard deviation (σ), which shows the scatter of the studied values relative to the arithmetic mean, i.e. characterizes the variability of the variation series. It can be determined directly by the formula:

$$\sigma = \sqrt{\frac{\sum (V - M)^2 P}{\sum P}}$$

The standard deviation is equal to the square root of the sum of the products of the squared deviations of each variant from the arithmetic mean, (V - M)², and its frequency, divided by the sum of the frequencies (ΣP).

Calculation example: determine the average number of sick-leave certificates issued in the clinic per day (table 3).

Table 3. Columns: number of sick-leave certificates issued by a doctor per day (V); number of doctors (P).

When the number of observations is less than 30, one must be subtracted from the sum of the frequencies in the denominator.

If the series is grouped at equal intervals, the standard deviation can be determined by the method of moments:

$$\sigma = i \cdot \sqrt{\frac{\sum a^2 P}{\sum P} - \left(\frac{\sum aP}{\sum P}\right)^2}$$

where i is the size of the interval;

a - conditional deviation from the conditional average;

P - frequency of the variants in the corresponding intervals;

ΣP is the total number of observations.
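A short sketch can compare the method of moments with the direct formula for σ. It assumes the usual form of the method (the first and second conditional moments, with divisor ΣP) and uses hypothetical data, not the data of table 4.

```python
import math

# Hypothetical grouped series with equal intervals (not the data of Table 4).
V = [10, 20, 30, 40, 50]   # central variants
P = [2, 5, 8, 4, 1]        # frequencies
i = V[1] - V[0]            # interval size
n = sum(P)
Mo = V[P.index(max(P))]    # conditional average
a = [(v - Mo) // i for v in V]

# First and second conditional moments.
first_moment = sum(ai * p for ai, p in zip(a, P)) / n
second_moment = sum(ai ** 2 * p for ai, p in zip(a, P)) / n
sigma_moments = i * math.sqrt(second_moment - first_moment ** 2)

# Direct calculation for comparison (divisor n).
M = sum(v * p for v, p in zip(V, P)) / n
sigma_direct = math.sqrt(sum((v - M) ** 2 * p for v, p in zip(V, P)) / n)

print(round(sigma_moments, 2), round(sigma_direct, 2))
```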

Calculation example: determine the average length of stay of patients in a therapeutic bed by the method of moments (table 4).

Table 4. Columns: number of days of bed stay (V); number of patients (P).

The Belgian statistician A. Quetelet discovered that the variations of mass phenomena obey the law of error distribution, discovered almost simultaneously by K. Gauss and P. Laplace. The curve representing this distribution is bell-shaped. According to the normal distribution law, the variability of the individual values of a trait lies within M ± 3σ, which covers 99.73% of all units in the population.

It has been calculated that if 2σ is added to and subtracted from the arithmetic mean, 95.45% of all members of the variation series lie within the resulting limits, and if 1σ is added to and subtracted from the arithmetic mean, 68.27% of all members lie within the resulting limits. In medicine, the interval M ± 1σ is associated with the concept of the norm. A deviation from the arithmetic mean greater than 1σ but less than 2σ is considered subnormal, and a deviation greater than 2σ is considered abnormal (above or below the norm).
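The three percentages can be checked numerically. The sketch below assumes an exactly normal distribution and uses the error function from Python's standard library. The helper name `coverage` is illustrative.

```python
import math

def coverage(k: float) -> float:
    """Share of a normal population lying within M ± k·σ."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"M ± {k}σ: {coverage(k) * 100:.2f}%")
```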

In sanitary statistics, the three sigma rule is used in the study of physical development, assessment of the activities of health care institutions, and assessment of public health. The same rule is widely applied in national economy in setting standards.

Thus, the standard deviation serves:

- to measure the dispersion of a variation series;

- to characterize the degree of diversity of a trait, which is determined by the coefficient of variation:

$$C_v = \frac{\sigma}{M} \cdot 100\%$$

A coefficient of variation of more than 20% indicates strong diversity, from 10 to 20% medium diversity, and less than 10% weak diversity of the trait. The coefficient of variation is, to a certain extent, a criterion of the reliability of the arithmetic mean.
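The classification can be sketched as two small helper functions. The function names are illustrative, and assigning the boundary values of exactly 10% and 20% to the medium group is an assumption, since the text does not specify them.

```python
def coefficient_of_variation(sigma: float, mean: float) -> float:
    """Coefficient of variation in percent: sigma / M * 100."""
    return sigma / mean * 100

def diversity(cv: float) -> str:
    """Classify the diversity of a trait by its coefficient of variation (%)."""
    if cv > 20:
        return "strong"
    if cv >= 10:   # assumption: exactly 10% and 20% count as medium
        return "medium"
    return "weak"

print(diversity(coefficient_of_variation(sigma=15, mean=100)))
```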

Statistical distribution series are the simplest kind of grouping.

A statistical distribution series is an ordered quantitative distribution of population units into homogeneous groups according to a varying (attributive or quantitative) feature.

Depending on the feature underlying the formation of groups, attributive and variation distribution series are distinguished.

Attributive distribution series are those built on qualitative features, i.e. features that have no numerical expression. An example of an attributive distribution series is the distribution of the economically active population of the Russian Federation by sex in 2010 (Table 3.10).

Table 3.10. Distribution of the economically active population of the Russian Federation by sex in 2010

Variation distribution series are those built on a quantitative feature, i.e. a feature that has a numerical expression.

A variation distribution series consists of two elements: variants and frequencies.

Variants are the individual values that the feature takes in the variation series.

Frequencies are the counts of individual variants or of each group of the variation series. They show how often particular values of a trait occur in the studied population. The sum of all frequencies gives the size of the entire population, its volume.

Relative frequencies are frequencies expressed as fractions of a unit or as percentages of the total. Accordingly, the sum of the relative frequencies is equal to 1, or 100%.

Depending on the nature of the variation of the trait, discrete and interval variation distribution series are distinguished.

Discrete variation distribution series - a distribution series in which the groups are formed according to a feature that changes discontinuously, i.e. in steps of a certain number of units, and takes only integer values. An example is the distribution of the apartments built in the Russian Federation by the number of rooms in 2010 (Table 3.11).

Table 3.11. Distribution of the number of apartments built in the Russian Federation by the number of rooms in them in 2010

Interval variation distribution series - a distribution series in which the grouping attribute can take any values within an interval, differing from each other by an arbitrarily small amount.

Constructing an interval variation series is advisable first of all when the trait varies continuously (Table 3.12), and also when a discrete trait varies over a wide range (Table 3.13), i.e. when the number of variants of the discrete feature is quite large.

Table 3.12. Distribution of subjects of the Southern Federal District of the Russian Federation by area as of January 1, 2011

Table 3.13. Distribution of subjects of the Central Federal District of the Russian Federation by the number of municipal educational institutions as of January 1, 2011

The rules for constructing distribution series are similar to the rules for constructing a grouping.

The analysis of distribution series can be made visual by representing them graphically. For this purpose, a polygon, a histogram and a cumulate of the distribution are built.

The polygon is used to display discrete variation distribution series. To build it in a rectangular coordinate system, the ranked values of the varying feature are plotted along the abscissa axis on a uniform scale, and a scale for the frequencies is applied along the ordinate axis. The points obtained at the intersections of the abscissa (X) and ordinate (Y) values are connected by straight lines, resulting in a broken line called the frequency polygon.

The histogram is used to display an interval variation series. When constructing a histogram, the intervals are plotted on the abscissa axis, and the frequencies are depicted by rectangles built on the corresponding intervals. The height of the bars should be proportional to the frequencies.

A histogram can be converted to a distribution polygon by connecting the midpoints of the upper sides of the rectangles with straight lines.

When constructing a histogram for a variation series with unequal intervals, the ordinate axis shows not the frequencies but the distribution density of the feature in the corresponding intervals. The distribution density is the frequency calculated per unit of interval width, i.e. how many units of each group fall on one unit of the interval width.
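As an illustration, the density can be computed for a small series with unequal intervals. The intervals and frequencies below are hypothetical.

```python
# Hypothetical interval series with unequal interval widths.
intervals = [(0, 10), (10, 30), (30, 60)]
frequencies = [20, 30, 15]

# Density = frequency per unit of interval width.
densities = [f / (hi - lo) for (lo, hi), f in zip(intervals, frequencies)]

for (lo, hi), d in zip(intervals, densities):
    print(f"{lo}-{hi}: density {d}")
```

Although the second interval has the highest frequency, the first has the highest density, which is why bar heights for unequal intervals are based on density.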

A cumulative curve (cumulate) can also be used to represent variation distribution series graphically. The cumulate depicts the series of accumulated frequencies, which are determined by successively summing the frequencies group by group.

When constructing the cumulate of an interval variation series, the variants of the series are plotted along the abscissa axis (X) and the accumulated frequencies along the ordinate axis (Y); the latter are plotted as perpendiculars to the abscissa axis erected at the upper limits of the intervals. The tops of these perpendiculars are then connected to give a broken line, the cumulate.

If the X and Y axes of a cumulate are interchanged, the resulting graph is called an ogive.
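The accumulated frequencies that form the cumulate are a running sum, as this short sketch with hypothetical frequencies shows.

```python
from itertools import accumulate

# Hypothetical frequencies of five consecutive groups.
frequencies = [3, 7, 12, 5, 3]
n = sum(frequencies)

# Accumulated frequencies: running sum from the first group to the last.
cumulative = list(accumulate(frequencies))

# The same values as shares of the total number of observations.
relative_cumulative = [c / n for c in cumulative]

print(cumulative)
```

The last accumulated frequency always equals the total number of observations, which is a convenient check on the table.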

Practice 1

VARIATIONAL SERIES OF DISTRIBUTION

A variation series, or distribution series, is an ordered distribution of the units of a population by increasing (more often) or decreasing (less often) values of an attribute, together with a count of the number of units having each value of the attribute.

There are three kinds of distribution series:

1) ranked series - a list of the individual units of the population in ascending order of the studied trait; if the number of population units is large, the ranked series becomes cumbersome, and in such cases the distribution series is constructed by grouping the population units according to the values of the trait under study (if the trait takes a small number of values, a discrete series is constructed; otherwise, an interval series);

2) discrete series - a table consisting of two columns (rows): the specific values of the varying attribute Xi and the number of population units with the given value of the attribute, the frequencies fi; the number of groups in a discrete series is determined by the number of values of the varying attribute that actually occur;

3) interval series - a table consisting of two columns (rows): the intervals of the varying attribute Xi and the number of population units falling within each interval (frequencies), or the share of this number in the total size of the population (relative frequencies).

The numbers showing how many times individual variants occur in a given population are called the frequencies, or weights, of the variants and are denoted by the lowercase Latin letter f. The total sum of the frequencies of the variation series is equal to the size of the population, i.e.

$$\sum_{i=1}^{k} f_i = n$$

where k is the number of groups and n is the total number of observations, or the size of the population.

Frequencies (weights) are expressed not only in absolute but also in relative numbers: in fractions of a unit or as a percentage of the total number of variants that make up the population. In such cases, the weights are called relative frequencies. The total sum of the relative frequencies is equal to one,

$$\sum_{i=1}^{k} \frac{f_i}{n} = 1$$

or

$$\sum_{i=1}^{k} \frac{f_i}{n} \cdot 100\% = 100\%$$

if the frequencies are expressed as a percentage of the total number of observations n. Replacing frequencies with relative frequencies is not obligatory, but it is sometimes useful and even necessary when variation series that differ greatly in size have to be compared with each other.

Depending on how the attribute varies - discretely or continuously, over a wide or a narrow range - the statistical population is distributed into non-interval or interval variation series. In the first case, the frequencies refer directly to the ranked values of the trait, which take the position of individual groups or classes of the variation series. In the second, the frequencies are calculated for the individual intervals (from - to) into which the overall variation of the trait is divided, from the minimum to the maximum variant of the population. These intervals, or class intervals, may or may not be equal in width; hence equal-interval and unequal-interval variation series are distinguished. In unequal-interval series, the nature of the frequency distribution changes as the width of the class intervals changes. Unequal-interval grouping is used relatively rarely in biology. As a rule, biometric data are distributed into equal-interval series, which not only reveals the pattern of variation but also makes it easier to calculate the summary numerical characteristics of the variation series and to compare distribution series with each other.

When starting to construct an equal-interval variational series, it is important to correctly outline the width of the class interval. The fact is that a rough grouping (when very wide class intervals are set) distorts the typical features of variation and leads to a decrease in the accuracy of the numerical characteristics of the series. When choosing excessively narrow intervals, the accuracy of the generalizing numerical characteristics increases, but the series turns out to be too extended and does not give a clear picture of the variation.

To obtain a well-defined variation series and to ensure sufficient accuracy of the numerical characteristics calculated from it, the variation of the trait (from the minimum to the maximum variant) must be divided into a number of groups or classes that satisfies both requirements. This problem is solved by dividing the range of variation of the attribute by the number of groups or classes planned for the variation series:

$$h = \frac{X_{max} - X_{min}}{k}$$

where h is the interval width; Xmax and Xmin are the maximum and minimum values in the population; k is the number of groups.

When constructing an interval distribution series, it is necessary to choose the optimal number of groups (intervals of the attribute) and to set the length (width) of the interval. Since the analysis of a distribution series compares frequencies in different intervals, the length of the intervals should be constant. If one has to deal with an interval distribution series with unequal intervals, then for comparability the frequency or relative frequency must be reduced to a unit of interval width; the resulting value is called the density ρ, that is

$$\rho = \frac{f}{h}$$

The optimal number of groups is chosen so that the variety of values ​​of the attribute in the aggregate is reflected to a sufficient extent and, at the same time, the regularity of the distribution, its shape is not distorted by random frequency fluctuations. If there are too few groups, there will be no pattern of variation; if there are too many groups, random frequency jumps will distort the shape of the distribution.

Most often, the number of groups in a distribution series is determined by Sturges' formula:

$$k = 1 + 3.322 \lg n$$

where n is the size of the population.

A graphical representation is of considerable help in analysing a distribution series and its properties. An interval series is represented by a bar chart in which the bases of the bars, located along the abscissa axis, are the intervals of values of the varying attribute, and the heights of the bars are the frequencies, on the scale of the ordinate axis. This type of diagram is called a histogram.

If the distribution series is discrete, or the midpoints of the intervals are used, the graphical representation of the series is called a polygon; it is obtained by connecting the points with coordinates Xi and fi with straight lines.

If the class values are plotted along the abscissa axis and the accumulated frequencies along the ordinate axis, and the points are then connected with straight lines, the resulting graph is called a cumulate. The accumulated frequencies are found by successive summation, or cumulation, of the frequencies from the first class to the end of the variation series.

Example. There are data on the egg production of 50 laying hens for 1 year kept on a poultry farm (Table 1.1).

Table 1.1. Egg production of laying hens (columns: No. of laying hen; Egg production, pcs.)

It is required to build an interval distribution series and display it graphically in the form of a histogram, a polygon and a cumulate.

It can be seen that the trait varies from 212 to 245 eggs obtained from a laying hen in 1 year.

In our example, using Sturges' formula, we determine the number of groups:

$$k = 1 + 3.322 \lg 50 = 6.644 \approx 7$$

Calculate the length (width) of the interval using the formula:

$$h = \frac{X_{max} - X_{min}}{k} = \frac{245 - 212}{7} = 4.71 \approx 5$$

Let's build an interval series with 7 groups and an interval of 5 eggs (Table 1.2). To build the graphs, we also calculate the interval midpoints and the accumulated frequencies in the table.
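The number of groups, the interval length and the interval bounds can be sketched as follows. Only n = 50 and the extreme values 212 and 245 come from the example. Rounding with `round` and starting the first interval at the minimum are assumptions that happen to match the 7 groups and the interval of 5 used here.

```python
import math

n, x_min, x_max = 50, 212, 245

k_exact = 1 + 3.322 * math.log10(n)   # Sturges' formula
k = round(k_exact)                    # rounded number of groups
h_exact = (x_max - x_min) / k
h = round(h_exact)                    # rounded interval length

# Lower and upper bounds of the k intervals, starting at the minimum.
bounds = [(x_min + j * h, x_min + (j + 1) * h) for j in range(k)]

print(round(k_exact, 3), k, round(h_exact, 2), h)
print(bounds)
```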

Table 1.2. Interval series of distribution of egg production (columns: group of laying hens by egg production, Xi; number of laying hens, fi; interval midpoint, Xi'; accumulated frequency)

Let's build a histogram of the distribution of egg production (Fig. 1.1).

Fig. 1.1. Histogram of egg production distribution

The histogram shows a form of distribution characteristic of many traits: values in the middle intervals of the trait are more common, and the extreme (small and large) values are less common. The form of this distribution is close to the normal distribution law, which arises when a variable is influenced by a large number of factors, none of which is predominant.

The polygon and cumulate of the distribution of egg production are shown in Fig. 1.2 and 1.3.

Fig. 1.2. Polygon of egg production distribution

Fig. 1.3. Cumulate of egg production distribution

The procedure for solving the problem in the Microsoft Excel spreadsheet application is as follows.

1. Enter the initial data in accordance with fig. 1.4.

2. Rank the row.

2.1. Select cells A2:A51.

2.2. Left-click the <Sort Ascending> button on the toolbar.

3. Determine the size of the interval for constructing the interval series of the distribution.

3.1. Copy cell A2 to cell E53.

3.2. Copy cell A51 to cell E54.

3.3. Calculate the range of variation. To do this, enter the formula in cell E55 =E54-E53.

3.4. Calculate the number of variation groups. To do this, enter the formula in cell E56 =1+3.322*LOG10(50).

3.5. Enter in cell E57 the rounded number of groups.

3.6. Calculate the length of the interval. To do this, enter the formula in cell E58 =E55/E57.

3.7. Enter in cell E59 the rounded length of the interval.

4. Build an interval series.

4.1. Copy cell E53 to cell B64.

4.2. Enter the formula in cell B65 =B64+$E$59.

4.3. Copy cell B65 to cells B66:B70.

4.4. Enter the formula in cell C64 =B65.

4.5. Enter the formula in cell C65 =C64+$E$59.

4.6. Copy cell C65 to cells C66:C70.

The results of the solution are displayed on the display screen in the following form (Fig. 1.5).

5. Calculate the interval frequency.

5.1. Run the command Tools, Data Analysis by left-clicking each item in turn.

5.2. In the Data Analysis dialog box, left-click to select: Analysis Tools <Histogram> (Fig. 1.6).

5.3. Left-click the <OK> button.

5.4. In the Histogram dialog box, set the parameters according to fig. 1.7.

5.5. Left-click the <OK> button.

The results of the solution are displayed on the display screen in the following form (Fig. 1.8).

6. Fill in the table "Interval series of distribution".

6.1. Copy cells B74:B80 to cells D64:D70.

6.2. Calculate the sum of the frequencies. To do this, select cells D64:D70 and left-click the <AutoSum> button on the toolbar.

6.3. Calculate the middle of the intervals. To do this, enter the formula in cell E64 =(B64+C64)/2 and copy to cells E65:E70.

6.4. Calculate the accumulated frequencies. To do this, copy cell D64 to cell F64. In cell F65, enter the formula =F64+D65 and copy it to cells F66:F70.

The results of the solution are displayed on the display screen in the following form (Fig. 1.9).
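Outside a spreadsheet, steps 4 to 6 can be approximated by a short function. This is a sketch under stated assumptions. The function name `interval_series` and the sample values are hypothetical (the 50 values of Table 1.1 are not reproduced here). Intervals are treated as half-open [lower, upper), which may assign boundary values differently from Excel's Histogram tool.

```python
from itertools import accumulate

def interval_series(values, k, h):
    """Return rows (lower, upper, frequency, midpoint, cumulative frequency)."""
    start = min(values)
    lowers = [start + j * h for j in range(k)]
    freqs = [0] * k
    for x in values:
        # Index of the half-open interval [lower, lower + h); values beyond
        # the last lower bound are clamped into the last interval.
        j = min(int((x - start) // h), k - 1)
        freqs[j] += 1
    cumulative = list(accumulate(freqs))
    return [(lo, lo + h, f, lo + h / 2, c)
            for lo, f, c in zip(lowers, freqs, cumulative)]

# Hypothetical egg-production values, for illustration only.
sample = [212, 214, 218, 221, 223, 224, 226, 227, 229, 231, 233, 238, 245]
rows = interval_series(sample, k=7, h=5)

for row in rows:
    print(row)
```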

7. Edit the histogram.

7.1. Right-click the "Bin" label on the chart and, in the menu that appears, click the <Clear> button.

7.2. Right-click the chart and, in the menu that appears, click the <Source Data> button.

7.3. In the Source Data dialog box, change the x-axis labels. To do this, select cells B64:C70 (Fig. 1.10).

7.5. Press key .

The results are displayed on the display screen in the following form (Fig. 1.11).

8. Build the egg production distribution polygon.

8.1. Left-click the <Chart Wizard> button on the toolbar.

8.2. In the Chart Wizard (Step 1 of 4) dialog box, left-click to select: Standard Types <Line> (Fig. 1.12).

8.3. Left-click the <Next> button.

8.4. In the Chart Wizard (Step 2 of 4) dialog box, set the parameters according to fig. 1.13.

8.5. Left-click the <Next> button.

8.6. In the Chart Wizard (Step 3 of 4) dialog box, enter the names of the chart and of the Y axis (Fig. 1.14).

8.7. Left-click the <Next> button.

8.8. In the Chart Wizard (Step 4 of 4) dialog box, set the parameters according to fig. 1.15.

8.9. Left-click the <Finish> button.

The results are displayed on the display screen in the following form (Fig. 1.16).

9. Insert data labels on the chart.

9.1. Right-click the chart and, in the menu that appears, click the <Source Data> button.

9.2. In the Source Data dialog box, change the x-axis labels. To do this, select cells E64:E70 (Fig. 1.17).

9.3. Press key .

The results are displayed on the display screen in the following form (Fig. 1.18).

The distribution cumulate is constructed similarly to the distribution polygon based on the accumulated frequencies.

The set of values of the parameter studied in a given experiment or observation, ranked by magnitude (in ascending or descending order), is called a variation series.

Suppose we measured the blood pressure of ten patients and recorded only the upper BP value, the systolic pressure, i.e. a single number per patient.

Imagine that the series of observations (statistical population) of systolic blood pressure in the 10 patients has the following form (Table 1):

Table 1

The components of a variational series are called variants. Variants represent the numerical value of the trait being studied.

The construction of a variation series from a statistical set of observations is only the first step towards understanding the features of the entire set. Next, it is necessary to determine the average level of the studied quantitative trait (the average level of blood protein, the average weight of patients, the average time to onset of anesthesia, etc.).

The average level is measured using criteria that are called averages. The average value is a generalizing numerical characteristic of qualitatively homogeneous values, characterizing by one number the entire statistical population according to one attribute. The average value expresses the general that is characteristic of a trait in a given set of observations.

Three types of averages are in common use: the mode, the median and the arithmetic mean.

To determine any average, it is necessary to use the results of the individual observations, written in the form of a variation series (Table 2).

Mode - the value that occurs most frequently in a series of observations. In our example, the mode is 120. If no value is repeated in the variation series, the series is said to have no mode. If several values are repeated the same number of times, the smallest of them is taken as the mode.

Median - the value that divides the distribution into two equal parts: the central value of a series of observations arranged in ascending or descending order. Thus, if the variation series contains 5 values, its median is the third member of the series; if the series has an even number of members, the median is the arithmetic mean of the two central observations, e.g. with 10 observations the median is the arithmetic mean of the 5th and 6th observations. In our example, the median is 120.

Note an important feature of the mode and median: their values ​​are not affected by the numerical values ​​of the extreme variants.

The arithmetic mean is calculated by the formula:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

where $x_i$ is the value observed in the i-th observation and n is the number of observations. In our case, the arithmetic mean is 119.

The arithmetic mean has three properties:

The mean occupies the middle position in the variation series. In a strictly symmetrical series, the mean, the mode and the median coincide.

The mean is a generalizing value: random fluctuations and differences between individual data are not visible behind it. It reflects what is typical of the entire population.

The sum of the deviations of all variants from the mean is equal to zero: $$\sum (x_i - \bar{x}) = 0$$, where $x_i - \bar{x}$ is the deviation of a variant from the mean.

The variation series consists of variants and their corresponding frequencies. Of the ten values obtained, 120 occurred 6 times, 115 occurred 3 times and 125 occurred once. Frequency - the absolute number of occurrences of an individual variant in the population, indicating how many times that variant occurs in the variation series.
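The three averages and the zero-sum property of the deviations can be checked on the ten values quoted above. The sketch uses Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# The ten systolic values quoted in the text: 120 six times, 115 three times, 125 once.
values = sorted([120] * 6 + [115] * 3 + [125])

mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Property 3: the deviations from the mean sum to zero.
deviation_sum = sum(x - mean for x in values)

print(mean, median, mode, deviation_sum)
```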

The variation series can be simple (all frequencies = 1) or grouped (condensed), with 3-5 variants in each group. A simple series is used when the number of observations is small, a grouped series when it is large.
