Determination of the variation series. Variational series and its characteristics

Plant encyclopedia 25.09.2019

The distinct values observed in a sample are called variants and are denoted $x_1, x_2, \dots$. First we rank the variants, i.e. arrange them in ascending or descending order. Each variant has a weight, a number that characterizes its contribution to the whole population. Frequencies or relative frequencies are used as weights.

The frequency $n_i$ of variant $x_i$ is the number of times that variant occurs in the sample.

The relative frequency $w_i$ of variant $x_i$ is the ratio of the variant's frequency to the sum of the frequencies of all variants. It shows what share of the sample takes the given variant.

A sequence of variants with their corresponding weights (frequencies or relative frequencies), written in ascending (or descending) order, is called a variation series.

Variation series are either discrete or interval.

A discrete variation series specifies point values of the feature; an interval series specifies the feature values as intervals. A variation series can show the distribution of frequencies or of relative frequencies, depending on which of the two is given for each variant.

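A discrete variation series of the frequency distribution lists the variants $x_1, x_2, \dots, x_m$ together with their frequencies $n_1, n_2, \dots, n_m$.

The relative frequencies are found by the formula $w_i = n_i / n$, $i = 1, 2, \dots, m$, where $n = n_1 + n_2 + \dots + n_m$, so that

$w_1 + w_2 + \dots + w_m = 1$.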

Example 4.1. For the set of numbers

4, 6, 6, 3, 4, 9, 6, 4, 6, 6

build the discrete variation series of the distribution of frequencies and of relative frequencies.

Solution. The size of the population is $n = 10$. The variants in ascending order are 3, 4, 6, 9 with frequencies 1, 3, 5, 1, so the relative frequencies are 0.1, 0.3, 0.5, 0.1.
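As a minimal sketch, the discrete series of Example 4.1 can be reproduced by counting occurrences. The function name `discrete_series` is illustrative and does not come from the original text.

```python
from collections import Counter

def discrete_series(sample):
    """Return variants (ascending), their frequencies and relative frequencies."""
    counts = Counter(sample)           # frequency n_i of every distinct value
    n = len(sample)                    # size of the population
    variants = sorted(counts)          # ranking: ascending order
    freqs = [counts[x] for x in variants]
    rel_freqs = [f / n for f in freqs] # w_i = n_i / n
    return variants, freqs, rel_freqs

sample = [4, 6, 6, 3, 4, 9, 6, 4, 6, 6]
variants, freqs, rel_freqs = discrete_series(sample)
print(variants)   # [3, 4, 6, 9]
print(freqs)      # [1, 3, 5, 1]
print(rel_freqs)  # [0.1, 0.3, 0.5, 0.1]
```

The frequencies add up to the sample size and the relative frequencies add up to one, as the definitions require.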

Interval series are written in a similar form.

An interval variation series of the frequency distribution gives, for each interval from $a_i$ to $a_{i+1}$, its frequency $n_i$, $i = 1, 2, \dots, m$.

The sum of all frequencies is the total number of observations, i.e. the size of the population: $n = n_1 + n_2 + \dots + n_m$.

An interval variation series of the distribution of relative frequencies gives, for each interval, its relative frequency $w_i$.

The relative frequency is found by the formula $w_i = n_i / n$, $i = 1, 2, \dots, m$.

The sum of all relative frequencies is equal to one: $w_1 + w_2 + \dots + w_m = 1$.

Interval series are the ones most often used in practice. If the sample is large and its values differ from one another by arbitrarily small amounts, a discrete series for these data is cumbersome and inconvenient for further study. In this case the data are grouped: the interval containing all values of the feature is divided into several partial intervals, the frequency of each partial interval is counted, and an interval series is obtained. Let us describe in more detail the scheme for constructing an interval series, assuming that all partial intervals have the same length.

2.2 Building an interval series

To build an interval series, you need:

Determine the number of intervals;

Determine the length of the intervals;

Determine the position of the intervals on the axis.

To determine the number of intervals $k$ there is Sturges' formula, according to which

$k = 1 + 3.322 \lg n$,

where $n$ is the size of the entire population.

For example, if there are 100 values of the feature (variants), the formula gives $k = 1 + 3.322 \cdot 2 = 7.644$, so about 8 intervals of equal length are recommended for the interval series.
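A quick numerical check of Sturges' rule, assuming its common form $k = 1 + 3.322 \lg n$ (the same constant appears later in this text in the formula for the interval width). The helper name `sturges_k` is hypothetical.

```python
import math

def sturges_k(n):
    """Recommended number of intervals for a sample of size n (Sturges' rule)."""
    return 1 + 3.322 * math.log10(n)

k = sturges_k(100)
print(round(k, 3))  # 7.644
print(round(k))     # 8
```

The result is a recommendation only; in practice it is rounded to a whole number and may be adjusted by the researcher.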

In practice, however, the researcher very often chooses the number of intervals, bearing in mind that it should be neither so large that the series becomes cumbersome nor so small that properties of the distribution are lost.

The interval length $h$ is determined by the formula

$h = \dfrac{x_{\max} - x_{\min}}{k}$,

where $x_{\max}$ and $x_{\min}$ are the largest and smallest values of the variants.

The value $x_{\max} - x_{\min}$ is called the range of the series.

The intervals themselves can be constructed in different ways. One of the simplest is as follows. The beginning of the first interval is taken as $a_1 = x_{\min} - h/2$. The remaining boundaries are then found by the formula $a_{i+1} = a_i + h$. Obviously, the end of the last interval $a_{m+1}$ must satisfy the condition $a_{m+1} \ge x_{\max}$.

After all the interval boundaries have been found, the frequencies (or relative frequencies) of the intervals are determined. To do this, go through all the variants and count how many fall into each interval. Let us consider the complete construction of an interval series using an example.

Example 4.2. For the following statistical data, written in ascending order, construct an interval series with the number of intervals equal to 5:

11, 12, 12, 14, 14, 15, 21, 21, 22, 23, 25, 38, 38, 39, 42, 42, 44, 45, 50, 50, 55, 56, 58, 60, 62, 63, 65, 68, 68, 68, 70, 75, 78, 78, 78, 78, 80, 80, 86, 88, 90, 91, 91, 91, 91, 91, 93, 93, 95, 96.

Solution. There are $n = 50$ values in total.

The number of intervals is specified in the problem statement, i.e. $k = 5$.

The length of the intervals is $h = \dfrac{96 - 11}{5} = 17$.

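Let us define the boundaries of the intervals:

$a_1 = 11 - 8.5 = 2.5$; $a_2 = 2.5 + 17 = 19.5$; $a_3 = 19.5 + 17 = 36.5$;

$a_4 = 36.5 + 17 = 53.5$; $a_5 = 53.5 + 17 = 70.5$; $a_6 = 70.5 + 17 = 87.5$;

$a_7 = 87.5 + 17 = 104.5$.

Because the first interval starts half an interval length below $x_{\min}$, a sixth interval is needed to cover $x_{\max} = 96$.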

To determine the frequency of an interval, we count the number of variants that fall into it. For example, the first interval, from 2.5 to 19.5, contains the variants 11, 12, 12, 14, 14, 15. There are 6 of them, so the frequency of the first interval is $n_1 = 6$ and its relative frequency is $w_1 = 6/50 = 0.12$. The second interval, from 19.5 to 36.5, contains the variants 21, 21, 22, 23, 25, of which there are 5. Therefore the frequency of the second interval is $n_2 = 5$ and its relative frequency is $w_2 = 5/50 = 0.1$. Finding the frequencies and relative frequencies of all intervals in the same way, we obtain the following interval series.

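The interval series of the frequency distribution is as follows:

Interval | 2.5-19.5 | 19.5-36.5 | 36.5-53.5 | 53.5-70.5 | 70.5-87.5 | 87.5-104.5
Frequency $n_i$ | 6 | 5 | 9 | 11 | 8 | 11

The sum of the frequencies is 6 + 5 + 9 + 11 + 8 + 11 = 50.

The interval series of the relative frequency distribution is as follows:

Interval | 2.5-19.5 | 19.5-36.5 | 36.5-53.5 | 53.5-70.5 | 70.5-87.5 | 87.5-104.5
Relative frequency $w_i$ | 0.12 | 0.1 | 0.18 | 0.22 | 0.16 | 0.22

The sum of the relative frequencies is 0.12 + 0.1 + 0.18 + 0.22 + 0.16 + 0.22 = 1. ■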
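The construction in Example 4.2 can be sketched in code as follows. The sketch assumes half-open intervals $[a_i, a_{i+1})$, a convention the example itself does not fix (no value in this data set lies on a boundary, so the choice does not affect the counts). The name `interval_series` is illustrative.

```python
def interval_series(data, k):
    """Boundaries and frequencies of an equal-width interval series."""
    x_min, x_max = min(data), max(data)
    h = (x_max - x_min) / k            # interval length
    bounds = [x_min - h / 2]           # a_1 = x_min - h/2
    while bounds[-1] <= x_max:         # extend until x_max is covered
        bounds.append(bounds[-1] + h)  # a_{i+1} = a_i + h
    # Assumption: a value belongs to [lo, hi), i.e. the left end is included.
    freqs = [sum(1 for x in data if lo <= x < hi)
             for lo, hi in zip(bounds, bounds[1:])]
    return bounds, freqs

data = [11, 12, 12, 14, 14, 15, 21, 21, 22, 23, 25, 38, 38, 39, 42, 42, 44,
        45, 50, 50, 55, 56, 58, 60, 62, 63, 65, 68, 68, 68, 70, 75, 78, 78,
        78, 78, 80, 80, 86, 88, 90, 91, 91, 91, 91, 91, 93, 93, 95, 96]

bounds, freqs = interval_series(data, k=5)
rel_freqs = [f / len(data) for f in freqs]
print(bounds)  # [2.5, 19.5, 36.5, 53.5, 70.5, 87.5, 104.5]
print(freqs)   # [6, 5, 9, 11, 8, 11]
```

Note that with $a_1 = x_{\min} - h/2$ the procedure yields six intervals even though $h$ was computed with $k = 5$.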

When constructing interval series, other rules can also be applied, depending on the specific conditions of the problem:

1. An interval variation series can consist of partial intervals of different lengths. Unequal interval lengths make it possible to bring out the properties of a statistical population in which the feature is unevenly distributed. For example, if the interval boundaries are numbers of inhabitants of cities, it is advisable to use intervals of unequal length: for small cities even a small difference in the number of inhabitants matters, whereas for large cities a difference of tens or hundreds of inhabitants is not significant. Interval series with unequal partial intervals are studied mainly in the general theory of statistics, and their consideration is beyond the scope of this manual.

2. In mathematical statistics, interval series are sometimes considered in which the left boundary of the first interval is taken to be −∞ and the right boundary of the last interval +∞. This is done to bring the statistical distribution closer to the theoretical one.

3. When constructing an interval series, the value of some variant may coincide exactly with an interval boundary. In this case it is best to proceed as follows. If there is only one such coincidence, the variant, with its frequency, is assigned to the interval lying closer to the middle of the interval series; if there are several such variants, then either all of them are assigned to the intervals on their right, or all to the intervals on their left.

4. After the number of intervals and their length have been determined, the intervals can be positioned in another way. Find the arithmetic mean $\bar{x}$ of all the values of the variants and construct the first interval so that this sample mean lies in its middle. This gives the interval from $\bar{x} - 0.5h$ to $\bar{x} + 0.5h$. Then, adding the interval length to the left and to the right, build the remaining intervals until $x_{\min}$ and $x_{\max}$ fall into the first and last intervals, respectively.

5. When the number of intervals is large, it is convenient to write the interval series vertically, i.e. to record the intervals in the first column rather than in the first row, and the frequencies (or relative frequencies) in the second column.
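Rule 4 above (positioning the intervals around the sample mean) can be sketched as follows. The data and the interval length are hypothetical, chosen only to keep the arithmetic exact, and `centered_bounds` is an illustrative name.

```python
def centered_bounds(data, h):
    """Interval boundaries built outwards from an interval centred on the mean."""
    mean = sum(data) / len(data)
    bounds = [mean - h / 2, mean + h / 2]   # central interval around the mean
    while bounds[0] > min(data):            # extend left until x_min is covered
        bounds.insert(0, bounds[0] - h)
    while bounds[-1] < max(data):           # extend right until x_max is covered
        bounds.append(bounds[-1] + h)
    return bounds

data = [2, 3, 5, 6, 9]   # hypothetical sample, mean = 5.0
bounds = centered_bounds(data, h=3)
print(bounds)  # [0.5, 3.5, 6.5, 9.5]
```

Here the central interval is 3.5 to 6.5, and one interval is added on each side so that the smallest and largest values fall into the first and last intervals.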

Sample data can be regarded as values of some random variable $X$. A random variable has its own distribution law. From probability theory it is known that the distribution law of a discrete random variable can be specified as a distribution series, and that of a continuous one by the probability density function. However, there is a universal form of the distribution law that applies to both discrete and continuous random variables: the distribution function $F(x) = P(X < x)$. For sample data one can specify an analogue of the distribution function, the empirical distribution function.
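A minimal sketch of an empirical distribution function, assuming the strict-inequality convention $F(x) = P(X < x)$ used in the text: the value at $x$ is the share of sample values strictly less than $x$. The sample is the one from Example 4.1, and `empirical_cdf` is an illustrative name.

```python
from bisect import bisect_left

def empirical_cdf(sample):
    """Return F(x) = (number of sample values strictly below x) / n."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # bisect_left gives the count of elements strictly less than x
        return bisect_left(ordered, x) / n

    return F

F = empirical_cdf([4, 6, 6, 3, 4, 9, 6, 4, 6, 6])
print(F(3))   # 0.0
print(F(4))   # 0.1
print(F(6))   # 0.4
print(F(10))  # 1.0
```

Some texts define the distribution function with a non-strict inequality instead; with that convention `bisect_right` would be the corresponding choice.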




Statistical distribution series are the simplest kind of grouping.

A statistical distribution series is an ordered quantitative distribution of the units of a population into homogeneous groups according to a varying (attributive or quantitative) characteristic.

Depending on the characteristic underlying the formation of the groups, attributive and variation distribution series are distinguished.

Attributive distribution series are those built on qualitative characteristics, i.e. characteristics that have no numerical expression. An example of an attributive distribution series is the distribution of the economically active population of the Russian Federation by sex in 2010 (Table 3.10).

Table 3.10. Distribution of the economically active population of the Russian Federation by sex in 2010

Variation distribution series are those built on a quantitative characteristic, i.e. a feature that has a numerical expression.

A variation distribution series consists of two elements: variants and frequencies.

Variants are the individual values that the feature takes in the variation series.

Frequencies are the numbers of units with each individual variant or in each group of the variation series. They show how often a given value of the feature occurs in the population under study. The sum of all frequencies gives the size of the entire population, its volume.

Relative frequencies are frequencies expressed as fractions of one or as percentages of the total. Accordingly, the sum of the relative frequencies is 1, or 100%.

Depending on the nature of the variation of the feature, discrete and interval variation distribution series are distinguished.

A discrete variation distribution series is a distribution series in which the groups are formed according to a feature that changes discontinuously, i.e. in steps of a certain number of units, and takes only integer values. An example is the distribution of the apartments built in the Russian Federation in 2010 by number of rooms (Table 3.11).

Table 3.11. Distribution of the number of apartments built in the Russian Federation by the number of rooms in them in 2010

An interval variation distribution series is a distribution series in which the grouping feature can take any values within an interval, differing from one another by an arbitrarily small amount.

Constructing an interval variation series is expedient above all when the feature varies continuously (Table 3.12), and also when a discrete feature varies within wide limits (Table 3.13), i.e. when the number of variants of the discrete feature is large.

Table 3.12. Distribution of constituent entities of the Southern Federal District of the Russian Federation by area as of January 1, 2011

Table 3.13. Distribution of subjects of the Central Federal District of the Russian Federation by the number of municipal educational institutions as of January 1, 2011

The rules for constructing distribution series are similar to the rules for constructing a grouping.

Distribution series can be analyzed visually on the basis of their graphical representation. For this purpose a polygon, a histogram, and a cumulate (cumulative curve) of the distribution are built.

A polygon is used to display discrete variation distribution series. To construct it in a rectangular coordinate system, the ranked values of the varying feature are plotted on a uniform scale along the abscissa axis, and a scale for the frequencies is applied along the ordinate axis. The points defined by each abscissa (X) value and its ordinate (Y) value are connected by straight lines, which yields a polyline called the frequency polygon.

A histogram is used to represent an interval variation series. When constructing a histogram, the intervals are plotted on the abscissa axis, and the frequencies are depicted by rectangles built on the corresponding intervals. The height of the bars should be proportional to the frequencies.

A histogram can be converted into a distribution polygon by connecting the midpoints of the upper sides of the rectangles with straight lines.

When constructing a histogram for a variation series with unequal intervals, the ordinate axis shows not the frequencies but the density of the feature distribution in the corresponding intervals. The distribution density is the frequency per unit width of the interval, i.e. the number of units of each group per unit of interval width.

A cumulative curve (cumulate) can also be used to represent a variation distribution series graphically. The cumulate depicts the series of accumulated frequencies, which are determined by successively summing the frequencies of the groups.

To construct the cumulate of an interval variation series, the variants of the series are laid off along the abscissa axis (X), and the accumulated frequencies are laid off along the ordinate axis (Y), plotted as perpendiculars to the abscissa axis at the upper boundaries of the intervals. The tops of these perpendiculars are then connected, yielding a polyline, the cumulate.

If the X and Y axes of the cumulate are swapped, the result is an ogive.
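As a rough sketch, the points of a cumulate and of the corresponding ogive can be computed before any plotting. The interval boundaries and frequencies below are those of Example 4.2 earlier in this text; the variable names are illustrative.

```python
from itertools import accumulate

upper_bounds = [19.5, 36.5, 53.5, 70.5, 87.5, 104.5]  # upper interval boundaries
freqs = [6, 5, 9, 11, 8, 11]                          # interval frequencies

accumulated = list(accumulate(freqs))                 # running totals
cumulate_points = list(zip(upper_bounds, accumulated))   # (X, Y) for the cumulate
ogive_points = [(y, x) for x, y in cumulate_points]      # axes swapped

print(accumulated)         # [6, 11, 20, 31, 39, 50]
print(cumulate_points[0])  # (19.5, 6)
print(ogive_points[0])     # (6, 19.5)
```

Connecting consecutive points with straight segments gives the polyline described above; the last accumulated value equals the size of the population.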

All values of the property under study that occur in the population are called values of the feature (variants), and the change in this value is called variation. Variants are denoted by lowercase Latin letters with an index corresponding to the ordinal number of the group: $x_i$.

The number showing how many times each value of the feature occurs in the population is called the frequency and is denoted $f_i$. The sum of all frequencies of the series equals the size of the population.

Very often the accumulated frequency ($S$) has to be calculated. The accumulated frequency of a feature value shows how many units of the population have a feature value no greater than that value. It is calculated by successively adding to the frequency of the first feature value the frequencies of the following values:

$S_k = f_1 + f_2 + \dots + f_k$.

The accumulation starts from the very first feature value.

The frequencies of the series ($f_i$) can in some cases be replaced by relative frequencies ($\omega_i$). The sum of the relative frequencies is always equal to one, or 100%. Replacing frequencies with relative frequencies makes it possible to compare variation series with different numbers of observations.

If the variation series is given with unequal intervals, then for a correct picture of the nature of the distribution the absolute or relative distribution density must be calculated.

The absolute distribution density ($p_f$) is the frequency per unit of interval size of an individual group of the series:

$p_f = f / i$.

The relative distribution density ($p_\omega$) is the relative frequency per unit of interval size of an individual group of the series:

$p_\omega = \omega / i$.

Here $i$ is the size (width) of the interval. For series with unequal intervals, these characteristics give a more accurate picture of the nature of the distribution than the frequency and the relative frequency.
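A small sketch of the two densities for a series with unequal intervals. The boundaries and frequencies are hypothetical, and `densities` is an illustrative name; the relative density is computed as $f / (n \cdot i)$, which equals $\omega / i$.

```python
def densities(bounds, freqs):
    """Absolute and relative distribution density for each interval."""
    n = sum(freqs)
    rows = []
    for lo, hi, f in zip(bounds, bounds[1:], freqs):
        width = hi - lo                # interval size i
        p_abs = f / width              # absolute density: f / i
        p_rel = f / (n * width)        # relative density: (f / n) / i
        rows.append((lo, hi, p_abs, p_rel))
    return rows

bounds = [0, 10, 30, 70]   # hypothetical unequal intervals
freqs = [20, 30, 50]       # hypothetical frequencies

rows = densities(bounds, freqs)
for lo, hi, p_abs, p_rel in rows:
    print(lo, hi, p_abs, p_rel)
# 0 10 2.0 0.02
# 10 30 1.5 0.015
# 30 70 1.25 0.0125
```

Although the last interval has the largest frequency, its density is the lowest, which is exactly what the raw frequencies hide when the intervals are unequal.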

The statistical distribution of a sample is a list of the variants (feature values) and their corresponding frequencies or distribution densities, or relative frequencies or relative distribution densities.

Different distribution series are characterized by different sets of frequency characteristics:

an attributive series has the minimum set (frequency, relative frequency),

for a discrete series four characteristics are used (frequency, relative frequency, accumulated frequency, accumulated relative frequency),

for an interval series, all six (frequency, relative frequency, accumulated frequency, accumulated relative frequency, absolute and relative distribution density).

  1. Rules for constructing an interval variation series

  1. Graphic representation of variation series

The first stage in studying a variation series is to construct its graphical representation. A graphical representation makes a variation series easier to analyze and makes it possible to judge the shape of the distribution. In statistics a variation series is represented graphically by a histogram, a polygon, and a cumulate of the distribution.

A discrete variation series is depicted as a so-called frequency polygon.

An interval series is displayed using the frequency distribution polygon and the frequency histogram.

Graphs are built in a rectangular coordinate system.

Statistical distribution series are an ordered arrangement of the units of the population under study into groups according to a grouping characteristic.

Attributive and variation distribution series are distinguished.

An attributive series is a distribution series built on a qualitative characteristic. It characterizes the composition of the population with respect to various essential features.

A variation distribution series is built on a quantitative characteristic. It includes the frequencies (numbers of units) of the individual variants or of each group of the variation series. These numbers show how often the different variants (feature values) occur in the distribution series. The sum of all frequencies gives the size of the entire population.

The sizes of the groups are expressed in absolute and relative terms: in absolute terms as the number of population units in each group, and in relative terms as shares (specific weights), presented as fractions or percentages of the total.

Depending on the nature of the variation of the feature, discrete and interval variation distribution series are distinguished. In a discrete variation series the groups are formed according to a feature that changes discretely and takes only integer values.

In an interval variation series the grouping feature can take any values within a certain interval.

A variation series consists of two elements: variants and frequencies.

A variant is an individual value that the varying feature takes in the distribution series.

A frequency is the number of units with an individual variant or in a group of the variation series. Frequencies expressed as fractions of one or as percentages of the total are called relative frequencies.

The rules and principles for constructing interval distribution series are based on the analogous rules and principles for constructing statistical groupings. If an interval variation series is built with equal intervals, the frequencies make it possible to judge how densely each interval is filled with population units. To compare the filling of intervals of unequal width, an indicator characterizing the distribution density is calculated.

The distribution density is the ratio of the number of population units in an interval to the width of the interval.

Variation distribution series are those built on a quantitative characteristic. Any variation series consists of two elements: variants and frequencies. Variants are the individual values that the feature takes in the variation series, i.e. the specific values of the varying feature. Frequencies are the numbers of units with each individual variant or in each group of the variation series, i.e. numbers showing how often particular variants occur in the distribution series. The sum of all frequencies gives the size of the entire population, its volume.

Relative frequencies are frequencies expressed as fractions of one or as percentages of the total. Accordingly, the sum of the relative frequencies is 1, or 100%.

Depending on the nature of the variation of the feature, discrete and interval variation series are distinguished.

The variation of quantitative features can be discrete (discontinuous) or continuous.

In the case of discrete variation, the quantitative feature takes only integer values. Hence a discrete variation series characterizes the distribution of the population units by a discrete feature. An example of a discrete variation series is the distribution of families by the number of rooms in individual apartments, given in Table 3.12.

The first column of the table shows the variants of the discrete variation series, the second the frequencies, and the third the relative frequencies.

In the case of continuous variation, the value of the feature in the units of the population can take, within certain limits, any values differing from one another by an arbitrarily small amount. Constructing an interval variation series is expedient above all when the feature varies continuously, and also when discrete variation extends over wide limits, i.e. when the number of variants of the discrete feature is large. Table 3.3 shows an interval variation series.

Graphical representation of distribution series

Distribution series can be analyzed on the basis of their graphical representation. Bar and pie charts are plotted to show the structure of a population.

Along with charts, lines such as the polygon, cumulate, ogive, and histogram are used. A polygon is used to display discrete variation series.

A polygon is a broken line built in a rectangular coordinate system, with the feature values plotted along the X-axis and the frequencies along the Y-axis.

A smooth curve connecting the points is the empirical distribution density.

A cumulate is a broken line built in a rectangular coordinate system, with the feature values plotted along the X-axis and the accumulated frequencies along the Y-axis.

For discrete series the feature values themselves are plotted on the axis; for interval series, the midpoints of the intervals.

On the basis of histograms, diagrams of accumulated frequencies can be constructed, followed by the empirical cumulative distribution function.

Variational series, their elements.

A researcher interested in the wage grade of workers in a machine shop surveyed 100 workers. Let us arrange the observed values of the feature in ascending order. This operation is called ranking the statistical data. As a result we obtain the following series, which is called ranked:

1,1,..1, 2,2..2, 3,3,..3, 4,4,..4, 5,5,..5, 6,6,..6.

From the ranked series it follows that the feature under study (the wage grade) took six different values: 1, 2, 3, 4, 5 and 6.

In what follows, the different values of the feature will be called variants, and variation will mean the change in the values of the feature.

Depending on the values they take, features are divided into discretely varying and continuously varying.

The wage grade is a discretely varying feature. The number showing how many times variant $x$ occurs in the series of observations is called the frequency of the variant and is denoted $m_x$.

Instead of the frequency of variant $x$, one can consider its ratio to the total number of observations $n$, which is called the relative frequency of the variant and is denoted $w_x$:

$w_x = m_x / n = m_x / \sum m_x$.

A table that makes it possible to judge the distribution of frequencies (or relative frequencies) among the variants is called a discrete variation series.

Along with the concept of frequency, the concept of accumulated frequency is used, denoted $m_x^{acc}$. The accumulated frequency shows in how many observations the feature took values less than a given value $x$. The ratio of the accumulated frequency to the total number of observations $n$ is called the accumulated relative frequency and is denoted $w_x^{acc}$. Obviously,

$w_x^{acc} = m_x^{acc} / n = m_x^{acc} / \sum m_x$.

The accumulated frequencies (and accumulated relative frequencies) for a discrete variation series are calculated in the following table:

x | m_x | accumulated frequency m_x^acc | accumulated relative frequency w_x^acc
0+4=4 | 0.04
4+6=10 | 0.10
10+12=22 | 0.22
22+16=38 | 0.38
38+44=82 | 0.82
82+18=100 | 1.00
Above 6
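The running totals in the table above can be checked with a short sketch. The frequencies 4, 6, 12, 16, 44, 18 are read off from the increments shown in the table (each sum adds one of them); the variable names are illustrative.

```python
from itertools import accumulate

m = [4, 6, 12, 16, 44, 18]     # frequencies taken from the increments in the table
n = sum(m)                     # total number of observations

m_acc = list(accumulate(m))    # 0+4, 4+6, 10+12, ...
w_acc = [s / n for s in m_acc] # accumulated relative frequencies

print(n)      # 100
print(m_acc)  # [4, 10, 22, 38, 82, 100]
print(w_acc)  # [0.04, 0.1, 0.22, 0.38, 0.82, 1.0]
```

The last accumulated frequency equals the number of workers surveyed, and the last accumulated relative frequency equals one.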

Suppose we need to study the output per worker (machine operator) of a machine shop in the reporting year as a percentage of the previous year. Here the feature under study, $x$, is the output in the reporting year as a percentage of the previous year's. This is a continuously varying feature. To bring out the characteristic pattern of variation of the feature, we combine into groups the workers whose output varies within a 10% band. The grouped data are presented in the table:

Feature x, % | Number of workers m | Share of workers w | Accumulated frequency m_x^acc | Accumulated relative frequency w_x^acc
80-90 | 8 | 8/117 | 8 | 8/117
90-100 | 15 | 15/117 | 8+15=23 | 23/117
100-110 | 46 | 46/117 | 23+46=69 | 69/117
110-120 | 29 | 29/117 | 69+29=98 | 98/117
120-130 | 13 | 13/117 | 98+13=111 | 111/117
130-140 | 3 | 3/117 | 111+3=114 | 114/117
140-150 | 3 | 3/117 | 114+3=117 | 117/117
Σ | 117 | 1 | |

In the table, the frequencies $m$ show in how many observations the feature took values belonging to a given interval. Such a frequency is called the interval frequency, and its ratio to the total number of observations is called the interval relative frequency $w$. A table that makes it possible to judge the distribution of frequencies (or relative frequencies) among the intervals of variation of the feature is called an interval variation series.

An interval variation series is constructed from observation data for a continuously varying feature, and also for a discretely varying one if the number of observed variants is large. A discrete variation series is constructed only for a discretely varying feature.

Sometimes an interval variation series is conventionally replaced by a discrete one. Then the midpoint of the interval is taken as the variant $x$, and the corresponding interval frequency as $m_x$.
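Replacing an interval series by a discrete one can be sketched as below, using the output table above. The frequencies are the numerators of the shares in that table; the variable names are illustrative.

```python
intervals = [(80, 90), (90, 100), (100, 110), (110, 120),
             (120, 130), (130, 140), (140, 150)]
m = [8, 15, 46, 29, 13, 3, 3]   # interval frequencies (numerators of the shares)

# Each interval is represented by its midpoint, keeping the interval frequency.
discrete = [((lo + hi) / 2, f) for (lo, hi), f in zip(intervals, m)]

print(sum(m))        # 117
print(discrete[:3])  # [(85.0, 8), (95.0, 15), (105.0, 46)]
```

The replacement is a convention: it treats all observations in an interval as if they sat at its midpoint.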

To determine the optimal constant interval width $h$, Sturges' formula is often used:

$h = \dfrac{x_{\max} - x_{\min}}{1 + 3.322 \lg n}$.

Construction of an interval variation series

To construct an interval variation series, it is necessary to determine the size of the interval, set the full scale of intervals and group the results of observations in accordance with it.

To determine the optimal constant interval width $h$, Sturges' formula is often used:

$h = \dfrac{x_{\max} - x_{\min}}{1 + 3.322 \lg n}$,

where $x_{\max}$ and $x_{\min}$ are the largest and smallest variants, respectively. If the calculation gives a fractional $h$, then either the nearest integer or the nearest simple fraction should be taken as the interval width.

It is recommended to take $a_1 = x_{\min} - h/2$ as the beginning of the first interval; the beginning of the second interval coincides with the end of the first and equals $a_2 = a_1 + h$; the beginning of the third coincides with the end of the second and equals $a_3 = a_2 + h$. The construction of intervals continues until the beginning of the next interval exceeds $x_{\max}$. Once the scale of intervals has been set, the results of the observations are grouped.
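The procedure just described can be sketched as follows. The inputs ($x_{\min} = 11$, $x_{\max} = 96$, $n = 50$) are borrowed from Example 4.2 earlier in this text purely for illustration; rounding $h$ to the nearest integer is one of the two options the text allows, and the function names are hypothetical.

```python
import math

def sturges_width(x_min, x_max, n):
    """Interval width h = (x_max - x_min) / (1 + 3.322 lg n)."""
    return (x_max - x_min) / (1 + 3.322 * math.log10(n))

def interval_scale(x_min, x_max, n):
    """Rounded width and the list of interval boundaries."""
    # Assumption: round to the nearest integer, but never below 1.
    h = max(1, round(sturges_width(x_min, x_max, n)))
    bounds = [x_min - h / 2]            # a_1 = x_min - h/2
    while bounds[-1] <= x_max:          # stop once the next start exceeds x_max
        bounds.append(bounds[-1] + h)   # a_{i+1} = a_i + h
    return h, bounds

h, bounds = interval_scale(11, 96, 50)
print(h)           # 13
print(bounds[0])   # 4.5
print(bounds[-1])  # 108.5
```

With these inputs the unrounded width is about 12.8, so $h = 13$ and the scale consists of eight intervals, the last one starting at 95.5.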

5) Concept, forms of expression and types of statistical indicators.

A statistical indicator is a quantitative characteristic of socio-economic phenomena and processes under conditions of qualitative certainty. The qualitative certainty of an indicator lies in the fact that it is directly related to the inner content of the phenomenon or process under study, its essence.

A system of statistical indicators is a set of interrelated indicators that has a single-level or multi-level structure and is aimed at solving a specific statistical problem.

In contrast to a feature, a statistical indicator is obtained by calculation. This can be a simple count of the units of a population, the summation of their feature values, a comparison of two or more values, or more complex calculations.

Specific statistical indicators and category indicators are distinguished.

A specific statistical indicator characterizes the size or magnitude of the phenomenon or process under study in a given place and at a given time. However, in theoretical works and at the design stage of statistical observation, absolute indicators or category indicators are also used.

Category indicators reflect the essence and the general distinctive properties of specific statistical indicators of the same type without specifying the place, time, or numerical value. All statistical indicators are divided by coverage of population units into individual and summary, and by form into absolute, relative, and average.

Individual indicators characterize a separate object or a separate unit of the aggregate - an enterprise, a firm, a bank, etc. An example is the number of industrial production personnel of an enterprise. On the basis of the correlation of two individual absolute indicators characterizing the same object or unit, an individual relative indicator is obtained.

Summary indicators, in contrast to individual ones, characterize a group of units that is part of a statistical population or the population as a whole. These indicators are subdivided into volume indicators and calculated indicators.

Volume indicators are obtained by adding the values ​​of the attribute of individual units of the population. The resulting value, called the volume of the trait, can act as a volumetric absolute indicator, or it can be compared with another volumetric absolute value or the volume of the population. In the last 2 cases, the volumetric relative and volumetric average indicators are obtained.

Calculated indicators, calculated according to various formulas, serve to solve individual statistical problems of analysis - measurement of variation, characteristics of structural shifts, assessment of the relationship, etc. They are also divided into absolute, relative or average.

This group includes indices, closeness coefficients, sampling errors and other indicators.

The coverage of population units and the form of expression are the main, but not the only, classification features of statistical indicators. The time factor is also an important classification feature. Socio-economic processes and phenomena are reflected in statistical indicators either as of a certain point in time, as a rule, on a certain date, the beginning or end of a month, a year, or for a certain period - a day, a week, a month, a quarter, a year. In the first case, the indicators are momentary, in the second - interval.

Depending on whether they relate to one or two objects of study, single-object and inter-object indicators are distinguished. The former characterize only one object, while the latter are obtained by comparing two values related to different objects.

In terms of spatial definiteness, statistical indicators are subdivided into nationwide indicators, which characterize the studied object or phenomenon for the country as a whole, and regional and local indicators, which relate to a part of the territory or to a separate object.

6) Types and relationship of relative indicators.

A relative indicator is the result of dividing one absolute indicator by another and expresses the relationship between the quantitative characteristics of socio-economic processes and phenomena. Relative indicators, or indicators in the form of relative values, are therefore derived from absolute indicators.

When calculating a relative indicator, the absolute indicator in the numerator of the ratio is called the current or compared indicator. The indicator with which the comparison is made, and which is in the denominator, is called the base, or basis of comparison. Relative indicators can be expressed as coefficients, percentages, per mille, or as named numbers.

All relative indicators used in practice are divided into:

· Dynamics; · Plan; · Plan fulfilment; · Structure; · Coordination; · Intensity and level of economic development; · Comparison.

The relative indicator of dynamics is the ratio of the level of the process or phenomenon under study in a given period of time to the level of the same process or phenomenon in the past.

Relative indicator of dynamics = current level / previous (or base) level.

The value calculated in this way shows how many times the current level exceeds the previous one, or what proportion of the latter it is. If this indicator is expressed as a multiple ratio, it is called the growth coefficient; multiplying this coefficient by 100% gives the growth rate.
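The ratio described above can be sketched in a few lines of Python. This is only an illustration: the function names and the levels 120 and 100 are hypothetical and do not come from the text.

```python
def growth_coefficient(current: float, base: float) -> float:
    """Relative indicator of dynamics as a multiple ratio: current / base."""
    if base == 0:
        raise ValueError("the base level must be non-zero")
    return current / base


def growth_rate_percent(current: float, base: float) -> float:
    """The same ratio expressed as a percentage."""
    return growth_coefficient(current, base) * 100.0


# Hypothetical levels: 120 units in the current period, 100 in the previous one.
k = growth_coefficient(120.0, 100.0)
rate = growth_rate_percent(120.0, 100.0)
print(k, rate)
```

With these made-up levels the current level is 1.2 times the previous one, that is, 120% of it.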

The relative indicator of structure is the ratio of the structural parts of the studied object to the whole. It is expressed in fractions of a unit or as a percentage. The calculated values (d i), called shares or specific weights, show what share the i-th part has in the total.

Relative indicators of coordination characterize the ratio of the individual parts of the whole to each other. In this case, the part that has the greatest specific weight or is a priority from an economic, social or some other point of view is selected as a comparison base. The result is how many units of each structural part are in 1 unit of the basic structural part.
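A minimal sketch of the two indicators just described, assuming a whole made of three hypothetical parts A, B and C (the names and values are invented for illustration). Structure divides each part by the whole; coordination divides each part by the part chosen as the comparison base.

```python
def structure_shares(parts: dict) -> dict:
    """Share of each part in the whole (fractions of a unit)."""
    total = sum(parts.values())
    if total == 0:
        raise ValueError("the whole must be non-zero")
    return {name: value / total for name, value in parts.items()}


def coordination_ratios(parts: dict, base: str) -> dict:
    """Units of each part per 1 unit of the part chosen as the base."""
    base_value = parts[base]
    if base_value == 0:
        raise ValueError("the base part must be non-zero")
    return {name: value / base_value
            for name, value in parts.items() if name != base}


# Hypothetical structure: part A has the greatest weight, so it is the base.
parts = {"A": 50.0, "B": 30.0, "C": 20.0}
shares = structure_shares(parts)
ratios = coordination_ratios(parts, base="A")
print(shares)
print(ratios)
```

The shares always add up to 1, while the coordination ratios have no such constraint.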

The relative indicator of intensity characterizes the degree to which the studied process or phenomenon is spread in its inherent environment. It is calculated when the absolute value is insufficient for drawing well-founded conclusions about the scale of the phenomenon, its size, saturation and distribution density. It can be expressed as a percentage, per mille, or a named value. A variety of the relative indicators of intensity are the relative indicators of the level of economic development, which characterize production per capita and play an important role in assessing the development of a state's economy. In their form of expression these indicators are close to average indicators, which often leads to the two being confused or identified. The difference between them lies only in the fact that when calculating an average indicator we are dealing with a set of units, each of which is a carrier of the averaged attribute.

The relative indicator of comparison is the ratio of like-named absolute indicators characterizing different objects (enterprises, firms, regions, districts, etc.).

Variation indicators

The study of variation (change in the values ​​of a trait within the population) is of great importance in statistics and socio-economic research in general. The absolute and relative indicators of variation, characterizing the variability of the values ​​of a variable attribute, allow, in particular, to measure the degree of connection and relationship, to assess the degree of homogeneity of the population, typicality and stability of the average, to determine the magnitude of the possible error of sample observation.

The absolute indicators of variation include the range of variation, the average linear deviation, the variance, the standard deviation and the quartile deviation.

The range of variation shows by how much the value of a quantitatively varying attribute changes:

$R = x_{\max} - x_{\min}$, where $x_{\max}$ ($x_{\min}$) is the maximum (minimum) value of the attribute in the population (in the distribution series).

The average linear deviation $\bar{d}$ is defined as the mean of the absolute values of the deviations of the individual values of the attribute from their mean: $\bar{d} = \sum |x_i - \bar{x}| / n$.

The average linear deviation is relatively rarely used to assess the variation of a trait. The variance and standard deviation are usually calculated.

If it is necessary to compare the variability of several characteristics in one population or the same characteristic in several populations with different indicators of the center of distribution, then use the relative indicators of variation.

These include the following indicators:

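1. Oscillation coefficient: $V_R = \frac{R}{\bar{x}} \cdot 100\%$

2. Relative linear deviation: $V_{\bar{d}} = \frac{\bar{d}}{\bar{x}} \cdot 100\%$

3. Coefficient of variation: $V_\sigma = \frac{\sigma}{\bar{x}} \cdot 100\%$

4. Relative index of quartile variation: $V_Q = \frac{Q}{Me} \cdot 100\%$, where $Q$ is the quartile deviation and $Me$ is the median.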

The most commonly used measure of relative variation is the coefficient of variation. This indicator is used not only for the comparative assessment of variation, but also as a characteristic of the homogeneity of the population. A population is considered homogeneous if the coefficient of variation is less than 0.33 (33%).
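The indicators of variation can be sketched as follows. This is an illustration under stated assumptions: the data set is a small made-up sample, the variance is computed with divisor n (the population form), and the function and key names are hypothetical.

```python
import math


def variation_indicators(values):
    """Absolute and relative indicators of variation for ungrouped data."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("relative indicators need a non-zero mean")
    value_range = max(values) - min(values)
    # Average linear deviation: mean of absolute deviations from the mean.
    mean_linear_deviation = sum(abs(x - mean) for x in values) / n
    # Variance with divisor n (an assumption of this sketch).
    variance = sum((x - mean) ** 2 for x in values) / n
    std_dev = math.sqrt(variance)
    return {
        "mean": mean,
        "range": value_range,
        "mean_linear_deviation": mean_linear_deviation,
        "variance": variance,
        "std_dev": std_dev,
        "oscillation_coefficient": value_range / mean,
        "relative_linear_deviation": mean_linear_deviation / mean,
        "coefficient_of_variation": std_dev / mean,
    }


# Illustrative data set of ten observations.
data = [4, 6, 6, 3, 4, 9, 6, 4, 6, 6]
result = variation_indicators(data)
# Homogeneity check using the 0.33 threshold mentioned in the text.
is_homogeneous = result["coefficient_of_variation"] < 0.33
print(result)
print(is_homogeneous)
```

The relative indicators are returned as fractions of a unit; multiply by 100 to express them as percentages.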

Forms.

1. Statistical reporting is an organizational form in which the units of observation provide information about their activities on forms of a regulated format.

The peculiarity of reporting is that it is mandatory, documented, and legally confirmed by the signature of the head or the person in charge.

2. Specially organized observation. The most striking and simple example of this form of observation is the census. A census is usually carried out at regular intervals, simultaneously over the entire study area and as of the same moment in time.

The Russian statistics bodies conduct censuses of the population, of certain types of enterprises and organizations, of material resources, perennial plantations, unfinished construction projects, etc.

3. The register form of observation is based on maintaining a statistical register. In the register, every unit of observation is characterized by a number of indicators. In domestic statistical practice, the most widespread are population registers and enterprise registers.

Population registration is carried out by the civil registry offices.

Enterprise registration: the EGRPO is maintained by the statistics bodies.

Types.

Statistical observations can be divided into groups according to the following features:

a) by the time of registration

b) by the coverage of the units of the population

By the time of registration, observations are:

Current (continuous)

Discontinuous (periodic and one-time)

With current observation, changes in phenomena and processes are recorded as they occur (registration of births, deaths, marriages, divorces, etc.)

Periodic observation is carried out at definite time intervals (for example, a population census every 10 years)

One-time observation is held either irregularly or only once (for example, a referendum)

By the coverage of the units of the population, statistical observation is:

Complete (continuous)

Non-continuous (partial)

Complete observation is a survey of all units of the population.

Non-continuous observation assumes that only a part of the studied population is subject to survey.

There are several types of non-continuous observation:

Main-array method

Sample observation (proper)

Monographic observation

The main-array method consists in selecting, as a rule, the most significant, usually the largest, units of the population, in which a significant part of the attribute under study is concentrated.

In monographic observation, individual units of the studied population are subjected to thorough analysis; these may be either units typical of the population or units that represent new varieties of the phenomenon.

Monographic observation is carried out in order to identify existing or emerging trends in the development of the phenomenon.

Methods.

Direct observation

Documentary observation

Survey

Direct observation is observation in which the registrars themselves establish the fact to be recorded by direct measurement, weighing or counting, and on this basis make an entry in the form.

The documentary method of observation is based on using various documents, as a rule of an accounting nature, as sources of information (for example, statistical reporting).

A survey is a method of observation in which the necessary information is obtained from the words of the respondent (oral, correspondent, questionnaire, in-person, etc.)

Determination of sampling errors.

In sample observation, two types of errors are distinguished: registration errors and representativeness errors.

Registration errors are deviations between the value of an indicator obtained during statistical observation and its actual value. These errors can appear in both continuous and non-continuous observation. They occur because of incorrect or inaccurate information. Their sources can be a misunderstanding of the essence of the question, inattention of the registrar, or the omission or repeated counting of individual observation units. Registration errors are divided into systematic errors, which are caused by reasons acting in one direction and distort the survey results (for example, rounding of numbers), and random errors, which result from the action of various random factors (for example, transposition of adjacent digits). Random errors act in different directions and, with a sufficiently large surveyed population, cancel each other out.

Representativeness errors are deviations of the value of an indicator in the surveyed population from its value in the initial population. These errors are also divided into systematic errors, which appear when the principles of selecting units from the initial population are violated, and random errors, which arise when the selected population does not fully reproduce the population as a whole. The size of the random error can be estimated.

Sampling error is the difference between the value of an attribute in the general population and its value calculated from the results of the sample observation. In the practice of sample surveys, the average and the marginal sampling errors are determined most often.

The average sampling error is calculated differently for different sampling methods. For simple random or mechanical selection:

For the mean: $\mu = \sqrt{\sigma^2 / n}$

For a share: $\mu = \sqrt{w(1-w) / n}$, where

$\mu$ - average (mean) sampling error

$\sigma^2$ - general variance

$n$ - sample size

If the sample is formed by typical selection and the units are selected in proportion to the size of the typical groups, then the average error is:

For the mean: $\mu = \sqrt{\overline{\sigma_i^2} / n}$

For a share: $\mu = \sqrt{\overline{w_i(1-w_i)} / n}$, where

$\overline{\sigma_i^2}$ - average of the within-group variances

$w_i$ - the proportion of units in the i-th group that have the attribute under study.

$\overline{\sigma_i^2} = \sum \sigma_i^2 n_i / \sum n_i$

The average error of a serial sample is:

For the mean: $\mu = \sqrt{\delta_x^2 / r}$

For a share: $\mu = \sqrt{\delta_w^2 / r}$

$\delta_w^2$ - between-series variance of the share

$\delta_x^2$ - between-series variance of the quantitative attribute

$r$ - the number of selected series

$\delta_x^2 = \sum (\bar{x}_i - \bar{x})^2 / r$

$\delta_w^2 = \sum (w_i - w)^2 / r$
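The serial-sample formulas for the mean can be sketched as below. The sketch assumes series of equal size, so the overall mean is taken as the simple average of the series means; the function name and the four series means are hypothetical.

```python
import math


def serial_mean_error(series_means):
    """Average error of a serial sample for the mean.

    Computes the between-series variance as the mean squared deviation of
    the series means from the overall mean, divided by the number of
    series r, and then takes sqrt(variance / r).
    """
    r = len(series_means)
    # Assumption: equal-size series, so the overall mean is unweighted.
    overall_mean = sum(series_means) / r
    between_variance = sum((m - overall_mean) ** 2 for m in series_means) / r
    return math.sqrt(between_variance / r)


# Hypothetical means of four selected series.
series_means = [10.0, 12.0, 14.0, 12.0]
error = serial_mean_error(series_means)
print(error)
```

Here the between-series variance is 2.0, so the average error is the square root of 2.0 / 4.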

If the units are selected from the general population without replacement, the average error formulas are multiplied by the correction factor $\sqrt{1 - n/N}$, where $n$ is the sample size and $N$ is the size of the general population.

The marginal sampling error $\Delta$ is calculated as the product of the confidence coefficient $t$ and the average sampling error $\mu$: $\Delta = t \cdot \mu$. The marginal error is tied to the confidence level that guarantees it. This level determines the confidence coefficient $t$, and vice versa. The values of $t$ are given in special mathematical tables.
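The formulas for simple random selection, the correction for selection without replacement, and the marginal error can be put together in a short sketch. All function names and numeric inputs are hypothetical; in particular t = 2 is just an example value of the confidence coefficient, which in practice is taken from tables.

```python
import math


def mean_error_for_mean(variance, n, population_size=None):
    """Average sampling error for the mean: sqrt(variance / n).

    If population_size is given, the selection is treated as being without
    replacement and the result is multiplied by sqrt(1 - n / N).
    """
    error = math.sqrt(variance / n)
    if population_size is not None:
        error *= math.sqrt(1 - n / population_size)
    return error


def mean_error_for_share(w, n, population_size=None):
    """Average sampling error for a share: sqrt(w * (1 - w) / n)."""
    return mean_error_for_mean(w * (1 - w), n, population_size)


def marginal_error(t, mean_error):
    """Marginal sampling error: confidence coefficient times average error."""
    return t * mean_error


# Hypothetical inputs: variance 25, sample of 100 from a population of 10000.
mu_repeated = mean_error_for_mean(25.0, 100)
mu_non_repeated = mean_error_for_mean(25.0, 100, population_size=10000)
mu_share = mean_error_for_share(0.2, 100)
delta = marginal_error(2.0, mu_repeated)
print(mu_repeated, mu_non_repeated, mu_share, delta)
```

The correction factor always reduces the error, and the reduction is small when the sample is a small fraction of the population.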

Determination of the sample size.

The sample size is calculated, as a rule, at the design stage of the sample survey. The formulas for determining the sample size follow from the formulas for the marginal sampling errors.

The size of simple random and mechanical samples with replacement is determined by the formulas:

For the mean: $n = t^2 \sigma^2 / \Delta^2$

For a share: $n = t^2 w(1-w) / \Delta^2$

For sampling without replacement:

For the mean: $n = \dfrac{t^2 \sigma^2 N}{N \Delta^2 + t^2 \sigma^2}$

For a share: $n = \dfrac{t^2 w(1-w) N}{N \Delta^2 + t^2 w(1-w)}$

Here $t$ is the confidence coefficient, $\Delta$ is the marginal sampling error and $N$ is the size of the general population. The quantities $\sigma^2$ and $w$ are unknown before the sample observation. They are found approximately as follows:

1. they are taken from previous surveys;

2. if the maximum and minimum values of the attribute are known, the standard deviation is determined according to the "three sigma" rule:

$\sigma = (x_{\max} - x_{\min}) / 6$

3. when studying an alternative attribute, if there is no information about its share in the general population, $w = 0.5$ is taken, which gives the maximum possible variance $w(1-w) = 0.25$.
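A hedged sketch of the sample size formulas. The function names and all numbers are hypothetical. The functions return the raw value of the formula; in practice the result is rounded up to a whole number of units.

```python
def sample_size_for_mean(t, variance, delta, population_size=None):
    """Required sample size for estimating a mean.

    With replacement:    t^2 * variance / delta^2
    Without replacement: t^2 * variance * N / (N * delta^2 + t^2 * variance)
    """
    numerator = t ** 2 * variance
    if population_size is None:
        return numerator / delta ** 2
    return numerator * population_size / (
        population_size * delta ** 2 + numerator
    )


def sample_size_for_share(t, w, delta, population_size=None):
    """Required sample size for a share; the variance is w * (1 - w)."""
    return sample_size_for_mean(t, w * (1 - w), delta, population_size)


def three_sigma_std(x_max, x_min):
    """Rough standard deviation from the range ("three sigma" rule)."""
    return (x_max - x_min) / 6


# Hypothetical design values: t = 2, variance 25, marginal error 1.
n_mean = sample_size_for_mean(2.0, 25.0, 1.0)
n_mean_finite = sample_size_for_mean(2.0, 25.0, 1.0, population_size=1000)
# Unknown share, so the most cautious value w = 0.5 is used.
n_share = sample_size_for_share(2.0, 0.5, 0.05)
sigma_guess = three_sigma_std(x_max=40.0, x_min=10.0)
print(n_mean, n_mean_finite, n_share, sigma_guess)
```

With these inputs sampling without replacement from a population of 1000 needs about 91 units instead of 100.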

With typical selection proportional to the size of the typical groups, the sample size for each group is determined by the formula $n_i = n \cdot N_i / N$, where

$n_i$ - sample size from the i-th group

$N_i$ - size of the i-th group in the general population.

With selection proportional to the variation of the attribute, the sample size from each group is found as follows: $n_i = n N_i \sigma_i / \sum N_i \sigma_i$.
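The two allocation rules can be sketched as follows. Group sizes, standard deviations and function names are made up for illustration, and the results are left unrounded.

```python
def proportional_allocation(n, group_sizes):
    """Sample size per group in proportion to group size: n * N_i / N."""
    total = sum(group_sizes)
    return [n * size / total for size in group_sizes]


def variation_allocation(n, group_sizes, group_std_devs):
    """Sample size per group in proportion to N_i * sigma_i."""
    weights = [size * sd for size, sd in zip(group_sizes, group_std_devs)]
    total_weight = sum(weights)
    return [n * weight / total_weight for weight in weights]


# Hypothetical population of three typical groups and a total sample of 100.
sizes = [500, 300, 200]
std_devs = [2.0, 4.0, 5.0]
by_size = proportional_allocation(100, sizes)
by_variation = variation_allocation(100, sizes, std_devs)
print(by_size)
print(by_variation)
```

Allocation by variation moves part of the sample from the large but uniform first group to the smaller groups where the attribute varies more.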

With typical sampling with replacement proportional to the size of the groups, the total sample size is found as follows:

For the mean: $n = t^2 \overline{\sigma_i^2} / \Delta^2$

For a share: $n = t^2 w(1-w) / \Delta^2$

In the case of typical sampling without replacement:

For the mean: $n = \dfrac{t^2 \overline{\sigma_i^2} N}{\Delta^2 N + t^2 \overline{\sigma_i^2}}$

For a share: $n = \dfrac{t^2 w(1-w) N}{\Delta^2 N + t^2 w(1-w)}$

Basic concepts and prerequisites for the use of correlation-regression analysis.

Correlation is a statistical relationship between random variables that is not strictly functional in nature, in which a change in one of the random variables leads to a change in the mathematical expectation of the other.

Correlation analysis has as its task the quantitative determination of the closeness of the relationship between two attributes, and between the resultant attribute and a set of factor attributes. The closeness of the relationship is expressed quantitatively by the magnitude of the correlation coefficients.

Correlation-regression analysis, as a general concept, includes measuring the closeness and direction of the relationship and establishing its analytical expression (form), which is the task of regression analysis.

Regression analysis determines the analytical expression of a relationship in which a change in one quantity (called the dependent, or resultant, attribute) is due to the influence of one or more independent quantities (factors), while all the other factors that also affect the dependent quantity are taken as constant and average values. Regression can be one-factor (paired) or multifactor (multiple).

The purpose of regression analysis is to assess the functional dependence of the conditional mean value of the resultant attribute (Y) on the factor attributes ($x_1, x_2, \ldots, x_k$).

The basic premise of regression analysis is that only the resultant attribute (Y) obeys the normal distribution law, while the factor attributes $x_1, x_2, \ldots, x_k$ can have an arbitrary distribution law. In the analysis of time series, time t is used as the factor attribute. Regression analysis also assumes in advance that there are causal relationships between the resultant attribute (Y) and the factor attributes $x_1, x_2, \ldots, x_k$. The regression equation, or statistical model of the relationship of socio-economic phenomena, expressed by the function $Y_x = f(x_1, x_2, \ldots, x_k)$, is sufficiently adequate to the real modeled phenomenon or process if the following requirements for its construction are met.

1. The set of the studied initial data must be homogeneous and mathematically described by continuous functions.

2. The ability to describe the modeled phenomenon by one or more equations of cause-and-effect relationships.

3. All factor signs must have a quantitative (digital) expression.

4. The presence of a sufficiently large volume of the studied sample population.

5. Causal relationships between phenomena and processes should be described by linear or reduced to linear forms of dependence.

6. Absence of quantitative restrictions on the parameters of the relationship model.

7. The constancy of the territorial and temporal structure of the studied population.

The theoretical validity of the relationship models built on the basis of correlation-regression analysis is ensured by observing the following basic conditions.

1. All signs and their joint distributions must obey the normal distribution law;

2. The variance of the modeled attribute (Y) should remain constant all the time when the value (Y) and the values ​​of the factor attributes change.

3. Individual observations must be independent, that is, the results obtained in the i-th observation should not be associated with previous ones, contain information about subsequent observations, or influence them.

OBJECTIVES OF THE SUMMARY AND ITS CONTENT

Observation provides information on each unit of the investigated object. The data obtained are not generalized indicators. Without preliminary processing, they cannot be used to draw conclusions about the object as a whole.

Therefore, the goal of the next stage of statistical research is to systematize the primary data and, on this basis, obtain a summary characteristic of the entire object using generalizing statistical indicators.

Summary - a set of sequential operations to generalize specific individual facts that form a set, to identify the typical features and patterns inherent in the phenomenon under study as a whole.

If, during statistical observation, data are collected about each unit of an object, then the result of the summary is detailed data reflecting the population as a whole.

The statistical summary should be based on a preliminary theoretical analysis of phenomena and processes, so that no information about the phenomenon under study is lost during the summary and all statistical results reflect the most important characteristic features of the object.

In terms of the depth of processing of the material, the summary can be simple and complex.

A simple summary is the operation of calculating the overall totals for the population of observation units.

A complex summary is a set of operations that include grouping of observation units, calculating totals for each group and for the entire object, and presenting the grouping results and summary in the form of statistical tables.

The summary is preceded by the development of its program, which consists of the following stages: selection of grouping attributes; determination of the order in which groups are formed; development of a system of statistical indicators to characterize the groups and the object as a whole; development of a system of layouts of statistical tables in which the results of the summary are to be presented.

According to the form of material processing, the summary can be decentralized or centralized.

With a decentralized summary (the form used, as a rule, in processing statistical reporting), the material is processed in successive stages. Thus, the reports of enterprises are compiled by the statistical bodies of the constituent entities of the Russian Federation, the results for each region are then submitted to the State Statistics Committee of Russia, and there the indicators for the national economy of the country as a whole are determined.

With a centralized summary, all primary material goes to one organization, where it is processed from start to finish. A centralized summary is usually used to process materials from one-time statistical surveys.

According to the technique of execution, the statistical summary is subdivided into mechanized and manual.

A mechanized summary is one in which all operations are carried out using electronic computers. With a manual summary, all the main operations (calculation of group and grand totals) are carried out manually.

To conduct a summary, a plan is drawn up that sets out the organizational issues: by whom and when all operations will be carried out, the procedure for carrying them out, and the composition of the information to be published in the periodical press.

Closing time series

When analyzing time series, it becomes necessary to close them, that is, to combine two or more series into one. Closing is necessary when the levels of the series are not comparable because of territorial changes, changes in prices, or changes in the methodology for calculating the levels of the series. In such cases the two series are closed (combined) into one. This can be done using a coefficient of comparability: multiplying the data of the earlier series by the obtained coefficient gives a closed (comparable) time series of absolute values. Alternatively, the levels of the period in which the change occurred, both before and after the change, are taken as 100%, and the remaining levels are recalculated as percentages of these levels, respectively.
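The first approach can be sketched as below. The text does not spell out how the coefficient of comparability is obtained, so this sketch assumes it is the ratio of the new-basis level to the old-basis level in the period for which both are known. The function name and all levels are hypothetical.

```python
def close_series(old_levels, new_levels):
    """Combine two series that overlap in exactly one period.

    old_levels ends, and new_levels starts, with the level of the same
    period measured on the old and on the new basis respectively.
    Returns the comparability coefficient and the closed series.
    """
    # Assumed definition: new-basis level / old-basis level in the overlap.
    coefficient = new_levels[0] / old_levels[-1]
    converted = [level * coefficient for level in old_levels[:-1]]
    return coefficient, converted + list(new_levels)


# Hypothetical data: periods 1-3 on the old basis, periods 3-5 on the new one.
old = [100.0, 110.0, 120.0]
new = [150.0, 160.0, 170.0]
coefficient, closed = close_series(old, new)
print(coefficient)
print(closed)
```

After closing, all five levels are expressed on the new basis and can be compared with each other.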

30. Methods of smoothing time series

Any time series can theoretically be represented as three components:

Trend (the main tendency of development of the time series);

Cyclic (periodic) fluctuations, including seasonal ones;

Random fluctuations.

One of the tasks arising in the analysis of time series is to establish the pattern of change in the levels of the phenomenon under study. In some cases the pattern is quite clear, for example a systematic decrease or increase in the levels of the series. Sometimes, however, the levels undergo very different changes (now increasing, now decreasing). In that case one can speak only of a general tendency of development: either toward growth or toward decline.

Identifying the main tendency of development (the trend) is called smoothing (alignment) of the time series, and the methods for identifying it are called smoothing methods.

The trend can be isolated directly by three methods.

* Method of interval enlargement. This method is based on enlarging the time periods to which the levels of the series refer. For example, a series of daily output is replaced by a series of monthly output, and so on.
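Interval enlargement amounts to summing consecutive levels in blocks. A minimal sketch, assuming interval levels (such as output) that may be added together; the monthly figures and the function name are invented for illustration.

```python
def enlarge_intervals(levels, width):
    """Replace each block of `width` consecutive levels by their sum."""
    if width < 1 or len(levels) % width != 0:
        raise ValueError("the series length must be a multiple of width")
    return [sum(levels[i:i + width]) for i in range(0, len(levels), width)]


# Hypothetical monthly output for one year, enlarged to quarters.
monthly = [5, 7, 6, 8, 9, 7, 10, 9, 11, 12, 11, 13]
quarterly = enlarge_intervals(monthly, 3)
print(quarterly)
```

The month-to-month fluctuations disappear in the quarterly series, and the upward tendency becomes easier to see.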

* Moving average method. In this method the initial levels of the series are replaced by average values, each obtained from a given level and several levels symmetrically surrounding it. The whole number of levels over which the average is calculated is called the smoothing interval. The smoothing interval can be odd (3, 5, 7, etc. points) or even (2, 4, 6, etc. points). The averages are calculated by sliding, that is, by gradually excluding the first level from the current period and including the next one. With odd smoothing, the resulting arithmetic mean is assigned to the middle of the interval over which it was calculated.

A drawback of smoothing by moving averages is that the smoothed levels for points at the beginning and end of the series are determined only conditionally.
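A minimal sketch of moving-average smoothing with an odd interval. The function name and data are hypothetical. The sketch leaves the end points, for which no full window exists, as None instead of estimating them.

```python
def moving_average(levels, window=3):
    """Centered moving average for an odd smoothing interval.

    Each average is assigned to the middle point of its window; points
    without a full window on both sides are returned as None.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("this sketch handles odd smoothing intervals only")
    half = window // 2
    smoothed = [None] * len(levels)
    for i in range(half, len(levels) - half):
        smoothed[i] = sum(levels[i - half:i + half + 1]) / window
    return smoothed


# Hypothetical series of five levels smoothed over three points.
series = [3, 5, 7, 6, 8]
smoothed = moving_average(series, window=3)
print(smoothed)
```

The None values at both ends illustrate the loss of end points: with a three-point interval one level is lost on each side.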

* Analytical smoothing is the most effective way to identify the main tendency of development. In this case the levels of the time series are expressed as a function of time: $Y_t = f(t)$

The purpose of analytical smoothing of a time series is to determine the analytical function $f(t)$. In practice, the form of the function $f(t)$ is chosen from the available time series and its parameters are found, and then the behavior of the deviations from the trend is analyzed.

In economics, a polynomial function of the following form is often used: $Y_t = a_0 + \sum_{i=1}^{p} a_i t^i$

Of the functions of this form, the ones most often used for smoothing are the linear function $f(t) = a_0 + a_1 t$ and the parabolic function $f(t) = a_0 + a_1 t + a_2 t^2$.

The coefficients $a_0, a_1, a_2, \ldots, a_p$ in the formula are found by the least squares method.

According to this method, to find the parameters of a polynomial of degree p it is necessary to solve a system of so-called normal equations. For the linear function the system is:

$n a_0 + a_1 \sum t = \sum Y$

$a_0 \sum t + a_1 \sum t^2 = \sum Y t$
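For the linear trend, the two normal equations can be solved in closed form. The sketch below does this directly from the sums; the function name and the data are hypothetical.

```python
def fit_linear_trend(times, levels):
    """Solve the normal equations of the linear trend Y = a0 + a1 * t.

    n * a0 + a1 * sum(t)      = sum(Y)
    a0 * sum(t) + a1 * sum(t^2) = sum(Y * t)
    """
    n = len(times)
    sum_t = sum(times)
    sum_t2 = sum(t * t for t in times)
    sum_y = sum(levels)
    sum_yt = sum(y * t for y, t in zip(levels, times))
    determinant = n * sum_t2 - sum_t ** 2
    if determinant == 0:
        raise ValueError("time points must not all be equal")
    a1 = (n * sum_yt - sum_t * sum_y) / determinant
    a0 = (sum_y - a1 * sum_t) / n
    return a0, a1


# Hypothetical series that lies exactly on the line Y = 1 + 2t.
times = [1, 2, 3, 4, 5]
levels = [3, 5, 7, 9, 11]
a0, a1 = fit_linear_trend(times, levels)
print(a0, a1)
```

For a parabolic or higher-degree trend the system has more equations and is usually solved numerically.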

The trend shows how systematic factors affect the level of the time series. The fluctuation of levels around the trend is a measure of the impact of residual (random) factors. This impact can be assessed using the formula of the standard deviation.
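One way to sketch this assessment is to take the standard deviation of the actual levels around the trend values. The divisor n used here is an assumption of the sketch (some treatments subtract the number of estimated parameters), and the function name and data are hypothetical.

```python
import math


def deviation_around_trend(actual, trend_values):
    """Standard deviation of the actual levels around the trend values."""
    if len(actual) != len(trend_values):
        raise ValueError("both series must have the same length")
    squared = [(y - f) ** 2 for y, f in zip(actual, trend_values)]
    # Assumption: divide by n, the number of levels.
    return math.sqrt(sum(squared) / len(actual))


# Hypothetical actual levels and the corresponding trend values.
actual = [3.0, 5.0, 8.0, 9.0, 11.0]
trend_values = [3.0, 5.0, 7.0, 9.0, 11.0]
sigma = deviation_around_trend(actual, trend_values)
print(sigma)
```

A smaller value means the levels stay closer to the trend, so residual factors play a smaller role.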
