Distribution and grouping series. Give a definition of the variation series

Site arrangement 25.09.2019

    All values of the trait under study that occur in the population are called values of the attribute (variants, options), and the change in these values is called variation. Variants are denoted by lowercase Latin letters with subscripts corresponding to the ordinal number of the group: $x_i$.

    The number showing how many times each value of the trait occurs in the population under study is called its frequency and is denoted $f_i$. The sum of all frequencies of the series equals the size of the population under study.

    It is often necessary to compute the cumulative (accumulated) frequency (S). The cumulative frequency for each value of the attribute shows how many population units have a value no greater than that value. It is calculated by successively adding the frequencies of the following values to the frequency of the first value:

$S_k = f_1 + f_2 + \dots + f_k$

Accumulation always starts from the very first value of the attribute.

The frequencies of the series ($f_i$) can in some cases be replaced by relative frequencies ($\omega_i$), i.e. frequencies expressed as fractions of one or as percentages of the total: $\omega_i = f_i / \sum f_i$.

The sum of the relative frequencies always equals 1, or 100%. Replacing frequencies with relative frequencies makes it possible to compare variation series with different numbers of observations.
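A minimal Python sketch of these frequency characteristics, assuming a discrete series is stored as a mapping from each variant $x_i$ to its frequency $f_i$ (the function name `frequency_table` is our own, not part of any library). The sample frequencies are tallied from the duration-of-disability data of 35 patients given later in this section.

```python
from itertools import accumulate


def frequency_table(freqs: dict[int, int]) -> list[tuple[int, int, float, int, float]]:
    """Return rows (x_i, f_i, omega_i, S_i, cumulative omega_i) for a discrete series.

    freqs maps each variant x_i to its frequency f_i.
    """
    variants = sorted(freqs)                  # arrange variants in ascending order
    f = [freqs[x] for x in variants]
    n = sum(f)                                # sum of frequencies = population size
    omega = [fi / n for fi in f]              # relative frequencies; they sum to 1
    s = list(accumulate(f))                   # S_k = f_1 + ... + f_k
    s_omega = [sk / n for sk in s]            # cumulative relative frequencies
    return list(zip(variants, f, omega, s, s_omega))


# Duration of disability (days) -> number of patients, tallied from the ARI example
ari = {2: 1, 3: 2, 4: 2, 5: 6, 6: 8, 7: 6, 8: 3, 9: 3, 10: 1, 11: 1, 12: 1, 13: 1}
for x, fx, w, s_k, s_w in frequency_table(ari):
    print(f"x={x:2d}  f={fx}  omega={w:.3f}  S={s_k:2d}  S_omega={s_w:.3f}")
```

The last cumulative frequency always equals n (here 35), and the last cumulative relative frequency equals 1.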

If a variation series has unequal intervals, then to understand the shape of the distribution correctly one must calculate the absolute or relative distribution density.

    Absolute distribution density ($p_f$) is the frequency per unit width of the interval of an individual group of the series:

$p_f = f / i$, where $i$ is the width of the group's interval.

    Relative distribution density ($p_\omega$) is the relative frequency per unit width of the interval of an individual group of the series:

$p_\omega = \omega / i$.

For series with unequal intervals, these characteristics give a more accurate picture of the distribution than frequencies and relative frequencies do.
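To illustrate why densities matter, here is a small Python sketch, assuming each group is given as a (lower bound, upper bound, frequency) triple; the function `densities` and the three intervals are hypothetical, not taken from the text. The widest interval has the largest raw frequency, yet the narrow middle interval has the highest density.

```python
def densities(groups: list[tuple[float, float, int]]) -> list[tuple[float, float]]:
    """For each interval (lower, upper, f) return (p_f, p_omega).

    p_f = f / i and p_omega = omega / i, where i = upper - lower is the
    interval width and omega = f / n is the group's relative frequency.
    """
    n = sum(f for _, _, f in groups)
    result = []
    for lower, upper, f in groups:
        i = upper - lower
        omega = f / n
        result.append((f / i, omega / i))
    return result


# Hypothetical series with unequal intervals (not from the text)
groups = [(0, 10, 20), (10, 15, 15), (15, 30, 30)]
for (lower, upper, f), (p_f, p_w) in zip(groups, densities(groups)):
    print(f"{lower}-{upper}: f={f}, p_f={p_f:.2f}, p_omega={p_w:.4f}")
```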

    The statistical distribution of a sample is the list of variants (attribute values) together with their frequencies or distribution densities, or their relative frequencies or relative distribution densities.

Different kinds of distribution series are described by different sets of frequency characteristics:

attributive series use the fewest: frequency and relative frequency;

discrete series use four: frequency, relative frequency, cumulative frequency, and cumulative relative frequency;

interval series use the full set: frequency, relative frequency, cumulative frequency, cumulative relative frequency, and the absolute and relative distribution densities.

  1. Rules for constructing an interval variation series

  2. Graphic representation of variation series

The first stage in studying a variation series is to represent it graphically. A graph makes the series easier to analyze and shows the shape of the distribution. In statistics, variation series are depicted with a histogram, a polygon, and a cumulative curve (cumulate).

A discrete variation series is depicted as a frequency polygon.

An interval series is depicted with a frequency polygon and a frequency histogram.

Graphs are built in a rectangular coordinate system.

Variation series: definition, types, main characteristics. Methods of calculating the mode, median, and arithmetic mean in medical and statistical research (illustrated with a hypothetical example).

A variation series is a series of numerical values of the trait under study that differ from each other in magnitude and are arranged in a certain order (ascending or descending). Each numerical value of the series is called a variant (V), and the numbers showing how often each variant occurs in the series are called frequencies (p).

The total number of observations that make up the variation series is denoted by n. The difference in the values of the trait under study is called variation. If the varying trait has no quantitative measure, the variation is called qualitative and the distribution series is attributive (for example, the distribution of patients by outcome of the disease or by state of health).

If the varying trait has a quantitative expression, the variation is called quantitative and the distribution series is called a variation series.

Variation series are divided into discrete (discontinuous) and continuous according to the nature of the quantitative trait, and into simple and weighted according to the frequency of occurrence of the variants.

In a simple variation series each variant occurs only once (p = 1); in a weighted series the same variant occurs several times (p > 1). Examples of such series are discussed later in the text. If the quantitative trait is continuous, i.e. intermediate fractional values exist between integer values, the variation series is called continuous.

For example: 10.0-11.9, 14.0-15.9, etc.

If the quantitative trait is discontinuous, i.e. its individual values (variants) differ from each other by whole units and have no intermediate fractional values, the variation series is called discontinuous, or discrete.

Using the heart rate data for 21 students from the previous example, we construct a variation series (Table 1).

Table 1

Distribution of medical students by heart rate (beats / min)

Thus, to construct a variation series means to systematize the available numerical values (variants), i.e. to arrange them in a certain order (ascending or descending) together with their frequencies. In this example the variants are arranged in ascending order and are whole (discrete) numbers, and each variant occurs several times, so we are dealing with a weighted discrete variation series.
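The construction described above (order the variants, attach frequencies, check whether any variant repeats) can be sketched in Python as follows. The helper names are our own; the respiratory-rate values come from the simple-mean example later in this section, while the pulse readings are hypothetical, since the Table 1 data are not reproduced here.

```python
from collections import Counter


def variation_series(values: list[int]) -> list[tuple[int, int]]:
    """Arrange raw observations into (variant V, frequency p) pairs, ascending by V."""
    return sorted(Counter(values).items())


def is_simple(series: list[tuple[int, int]]) -> bool:
    """A series is simple if every variant occurs once (p = 1), otherwise weighted."""
    return all(p == 1 for _, p in series)


# Respiratory rate of 9 men (simple-mean example): a simple series
breaths = variation_series([20, 22, 19, 15, 16, 21, 17, 23, 18])
# Hypothetical pulse readings with repeats: a weighted series
pulse = variation_series([72, 68, 72, 70, 68, 72])
print(breaths, is_simple(breaths))
print(pulse, is_simple(pulse))
```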

As a rule, if the number of observations in the population under study does not exceed 30, it is enough to arrange all values of the trait in ascending order, as in Table 1, or in descending order.

With a large number of observations (n > 30) the number of distinct variants can be very large. In this case an interval, or grouped, variation series is compiled: the variants are combined into groups to simplify further processing and to reveal the shape of the distribution.

Usually the number of variant groups ranges from 8 to 15.

There should be at least 5 groups; otherwise the grouping is too coarse, which distorts the overall pattern of variation and seriously affects the accuracy of the mean. With more than 20-25 groups, the accuracy of the calculated means increases, but the features of the trait's variation are significantly distorted and the mathematical processing becomes more complicated.

When compiling a grouped series, keep the following in mind:

- the variant groups should be arranged in a certain order (ascending or descending);

- the intervals of the variant groups must be equal;

- the boundaries of adjacent intervals must not coincide, otherwise it is unclear to which group a boundary variant belongs;

- the qualitative features of the collected material must be taken into account when setting the interval limits (for example, when studying the weight of adults an interval of 3-4 kg is acceptable, whereas for infants in the first months of life it should not exceed 100 g).

Let's build a grouped (interval) series from the heart rate data (beats per minute) of 55 medical students before an exam: 64, 66, 60, 62, 64, 68, 70, 66, 70, 68, 62, 68, 70, 72, 60, 70, 74, 62, 70, 72, 72, 64, 70, 72, 76, 76, 68, 70, 58, 76, 74, 76, 76, 82, 76, 72, 76, 74, 79, 78, 74, 78, 74, 78, 74, 74, 78, 76, 78, 76, 80, 80, 80, 78, 78.

To build a grouped series, you must:

1. Determine the width of the interval;

2. Determine the beginning, middle, and end of each variant group of the series.

● The interval width (i) depends on the intended number of groups (r), which is chosen according to the number of observations (n) from a special table.

Number of groups depending on the number of observations:

In our case, for 55 students, 8 to 10 groups can be formed.

The interval width (i) is calculated by the formula:

$i = (V_{max} - V_{min}) / r$

In our example the interval width is $(82 - 58) / 8 = 24 / 8 = 3$.

If the value of the interval is a fractional number, the result should be rounded to the nearest whole number.
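A sketch of the grouping procedure in Python, applied to the 55 heart-rate values above. Assumptions not stated in the text: the values are whole numbers, each group runs from its start to start + i - 1 so that boundaries never coincide, rounding is half up, and groups are added until $V_{max}$ is covered. With r = 8 and i = 3 this yields 9 groups, within the 8 to 10 suggested for 55 observations.

```python
import math

# Heart rate (beats/min) of 55 medical students before the exam, from the text
hr = [64, 66, 60, 62,
      64, 68, 70, 66, 70, 68, 62, 68, 70, 72, 60, 70, 74, 62, 70, 72, 72,
      64, 70, 72, 76, 76, 68, 70, 58, 76, 74, 76, 76, 82, 76, 72, 76, 74,
      79, 78, 74, 78, 74, 78, 74, 74, 78, 76, 78, 76, 80, 80, 80, 78, 78]


def grouped_series(values: list[int], r: int) -> tuple[int, list[tuple[int, int, float, int]]]:
    """Return the interval width i and rows (start, end, middle, frequency).

    i = (V_max - V_min) / r, rounded half up to a whole number (at least 1).
    Each group covers the whole values start .. start + i - 1, so adjacent
    boundaries never coincide; groups are added until V_max is covered.
    """
    v_min, v_max = min(values), max(values)
    i = max(1, math.floor((v_max - v_min) / r + 0.5))
    rows = []
    start = v_min
    while start <= v_max:
        end = start + i - 1
        count = sum(start <= v <= end for v in values)
        rows.append((start, end, (start + end) / 2, count))
        start += i
    return i, rows


i, rows = grouped_series(hr, r=8)
print("i =", i)
for start, end, middle, count in rows:
    print(f"{start}-{end}  middle {middle}  p = {count}")
```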

There are several types of average values:

● arithmetic mean,

● geometric mean,

● harmonic mean,

● root mean square,

● progressive mean,

● median.

In medical statistics, arithmetic means are used most often.

The arithmetic mean (M) is a summary value that expresses what is typical of the entire population. The main methods of calculating M are the arithmetic mean method and the method of moments (conditional deviations).

The arithmetic mean method is used to calculate the simple arithmetic mean and the weighted arithmetic mean. Which one to use depends on the type of variation series. In a simple variation series, in which each variant occurs only once, the simple arithmetic mean is determined by the formula:

$M = \sum V / n$

where: M is the arithmetic mean;

V is the value of the varying trait (variant);

Σ is the summation sign;

n is the total number of observations.

Example: calculating a simple arithmetic mean. Respiratory rate (breaths per minute) in 9 men aged 35: 20, 22, 19, 15, 16, 21, 17, 23, 18.

To determine the average respiratory rate in men aged 35, it is necessary to:

1. Construct a variation series by arranging all variants in ascending or descending order: 15, 16, 17, 18, 19, 20, 21, 22, 23. This is a simple variation series, because each variant occurs only once.

2. Calculate the simple arithmetic mean: M = ∑V / n = 171/9 = 19 breaths per minute.

Conclusion. The respiratory rate of men aged 35 averages 19 breaths per minute.

If individual variants are repeated, there is no need to write out each of them; it is enough to list the values of the variants (V) and give the number of their repetitions (p) next to each. Such a variation series, in which the variants are, as it were, weighted by their frequencies, is called a weighted variation series, and the average calculated from it is a weighted arithmetic mean.

The weighted arithmetic mean is determined by the formula: M = ∑Vp / n

where n is the number of observations, equal to the sum of the frequencies, Σp.

Example: calculating a weighted arithmetic mean.

The duration of disability (in days) in 35 patients with acute respiratory infections (ARI) treated by a district doctor during the first quarter of this year was: 6, 7, 5, 3, 9, 8, 7, 5, 6, 4, 9, 8, 7, 6, 6, 9, 6, 5, 10, 8, 7, 11, 13, 5, 6, 7, 12, 4, 3, 5, 2, 5, 6, 6, 7 days.

The method for determining the average duration of disability in patients with acute respiratory infections is as follows:

1. Construct a weighted variation series, since individual variants are repeated several times. To do this, arrange all variants in ascending or descending order together with their frequencies.

In our case the variants are arranged in ascending order.

2. Calculate the weighted arithmetic mean: M = ∑Vp / n = 233/35 ≈ 6.7 days.

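Distribution of patients with acute respiratory infections by duration of disability:

Duration of disability, days (V) | Number of patients (p) | Vp
2 | 1 | 2
3 | 2 | 6
4 | 2 | 8
5 | 6 | 30
6 | 8 | 48
7 | 6 | 42
8 | 3 | 24
9 | 3 | 27
10 | 1 | 10
11 | 1 | 11
12 | 1 | 12
13 | 1 | 13
Total | ∑p = n = 35 | ∑Vp = 233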

Conclusion. The duration of disability in patients with acute respiratory infections averaged 6.7 days.
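Both formulas can be checked with a few lines of Python; the function names are illustrative, and the frequency mapping is the tally of the 35 durations listed above.

```python
def simple_mean(values: list[float]) -> float:
    """M = sum(V) / n for a simple series (each variant occurs once)."""
    return sum(values) / len(values)


def weighted_mean(series: dict[int, int]) -> float:
    """M = sum(V * p) / n, where n = sum(p), for a weighted series."""
    n = sum(series.values())
    return sum(v * p for v, p in series.items()) / n


breaths = [20, 22, 19, 15, 16, 21, 17, 23, 18]
ari = {2: 1, 3: 2, 4: 2, 5: 6, 6: 8, 7: 6, 8: 3, 9: 3, 10: 1, 11: 1, 12: 1, 13: 1}
print(simple_mean(breaths))            # 19.0
print(round(weighted_mean(ari), 1))    # 6.7
```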

The mode (Mo) is the variant that occurs most often in a variation series. For the distribution shown in the table, the mode is the variant 10: it occurs more often than any other, 6 times.

Distribution of patients by length of hospital stay (in days): variant V, frequency p

Sometimes the exact value of the mode is hard to establish, because several values in the data may occur "most often".

The median (Me) is a nonparametric measure that divides the variation series into two equal halves: the same number of variants lies on each side of it.

For example, for the distribution shown in the table the median is 10, because there are 14 variants on each side of this value; i.e. 10 occupies the central position in the series and is its median.

Since the number of observations in this example is even (n = 34), the position of the median is found as half the sum of the frequencies:

$\sum p / 2 = (2 + 3 + 4 + 5 + 6 + 5 + 4 + 3 + 2) / 2 = 34 / 2 = 17$

This means that the middle of the series falls between the 17th and 18th variants; both equal 10, so the median is 10. For the distribution shown in the table, the arithmetic mean is:

M = ∑Vp / n = 334/34 = 10.1

So, for the 34 observations in Table 8 we obtained Mo = 10, Me = 10, and M = 10.1. In this example all three measures turned out equal or close to one another, although they are fundamentally different indicators.

The arithmetic mean reflects the combined effect of all influences: every variant without exception contributes to it, including the extreme ones, which are often atypical of the phenomenon or population.

The mode and median, unlike the arithmetic mean, do not depend on the magnitudes of all individual values of the varying trait (the values of the extreme variants and the degree of scatter of the series). The arithmetic mean characterizes the entire mass of observations, while the mode and median characterize its main bulk.
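The difference between the three measures can be seen with Python's standard `statistics` module on the ARI data above. The second half is a hypothetical change (the 13-day case replaced by an extreme 60-day case, not in the source data): the mean shifts, while the mode and median stay put.

```python
import statistics

# Durations of disability (days) of the 35 ARI patients listed above
ari_days = [6, 7, 5, 3, 9, 8, 7, 5, 6, 4, 9, 8, 7, 6, 6, 9, 6, 5,
            10, 8, 7, 11, 13, 5, 6, 7, 12, 4, 3, 5, 2, 5, 6, 6, 7]

print(statistics.mode(ari_days), statistics.median(ari_days),
      round(statistics.fmean(ari_days), 1))     # 6 6 6.7

# Hypothetical: the 13-day case replaced by an extreme 60-day case
extreme = [60 if d == 13 else d for d in ari_days]
print(statistics.mode(extreme), statistics.median(extreme),
      round(statistics.fmean(extreme), 1))      # 6 6 8.0
```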

  • the series is formed from relative or average values.

    27. Indicators of a time series: calculation and application in medical practice.

    Absolute level of the series: the values (levels) that make up the time series; they reflect the phenomenon at a given moment or over a given time interval.

    Absolute increase is the difference between a given level and the previous one.

    Growth rate is the ratio of a given level to the previous one, multiplied by 100%.

    Rate of increase is the ratio of the absolute increase (decrease) to the previous level, multiplied by 100%.

    The value of 1% of increase is the ratio of the absolute increase to the rate of increase.

    The visibility index is the ratio of each level of the series to one of its levels, usually the initial one, taken as 100%.
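A minimal Python sketch of these indicators, assuming the levels are given in chronological order and each indicator is computed relative to the previous level (chain indicators); the function name and the three levels are hypothetical. Note that the value of 1% of increase always reduces to the previous level divided by 100.

```python
def time_series_indicators(levels: list[float]) -> list[dict[str, float]]:
    """Chain indicators of a time series, one dict per level after the first.

    absolute increase     = y_k - y_(k-1)
    growth rate, %        = y_k / y_(k-1) * 100
    rate of increase, %   = (y_k - y_(k-1)) / y_(k-1) * 100
    value of 1% increase  = absolute increase / rate of increase (= y_(k-1) / 100)
    visibility index, %   = y_k / y_0 * 100 (initial level taken as 100%)
    """
    rows = []
    for k in range(1, len(levels)):
        prev, cur = levels[k - 1], levels[k]
        gain = cur - prev
        increase_rate = gain / prev * 100
        rows.append({
            "level": cur,
            "absolute_increase": gain,
            "growth_rate": cur / prev * 100,
            "rate_of_increase": increase_rate,
            # 0 / 0 when the level does not change, so report NaN in that case
            "value_of_1pct": gain / increase_rate if gain else float("nan"),
            "visibility": cur / levels[0] * 100,
        })
    return rows


# Hypothetical levels, e.g. a rate per 1,000 population over three years
for row in time_series_indicators([200, 220, 198]):
    print(row)
```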

    28. Variational series, its elements, types, construction rules.

    A variation series is a series of homogeneous statistical values characterizing the same quantitative attribute, differing from one another in magnitude and arranged in a certain order (ascending or descending).

    Elements of the variation series:

    a) variant (V): the numerical value of the quantitative trait under study;

    b) frequency (p or f): the number of times a variant is repeated in the series, showing how often each variant occurs within it;

    c) total number of observations (n): the sum of all frequencies, n = Σp. If the total number of observations is more than 30, the sample is considered large; if n is less than or equal to 30, small.

    Types of variation series:

    by frequency of occurrence of the variants:

    a) simple: each variant occurs once, i.e. all frequencies equal one;

    b) ordinary (weighted): variants occur more than once;

    c) grouped: the variants are combined into groups by magnitude within a certain interval, with the frequency of all variants in each group indicated.

    A grouped variation series is used when there are many observations and the extreme variants lie far apart.

    Processing a variation series consists in obtaining its parameters (the mean, the standard deviation, and the standard error of the mean).

    by the number of observations:

    a) even and odd;

    b) large (more than 30 observations) and small (30 or fewer observations).
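As an illustration of the series parameters named above, the Python sketch below computes the mean M, the standard deviation σ, and the standard error of the mean m = σ/√n. It assumes the sample standard deviation with divisor n - 1 (what `statistics.stdev` computes); textbooks differ on whether to use n or n - 1, so treat this as one convention. The data are the 9 respiratory rates from the simple-mean example.

```python
import math
import statistics


def series_parameters(values: list[float]) -> tuple[float, float, float]:
    """Return (M, sigma, m): mean, standard deviation, standard error of the mean.

    sigma is the sample standard deviation (divisor n - 1);
    m = sigma / sqrt(n).
    """
    n = len(values)
    mean = statistics.fmean(values)
    sigma = statistics.stdev(values)
    return mean, sigma, sigma / math.sqrt(n)


breaths = [20, 22, 19, 15, 16, 21, 17, 23, 18]
M, sigma, m = series_parameters(breaths)
print(f"M = {M:.1f}, sigma = {sigma:.2f}, m = {m:.2f}")   # M = 19.0, sigma = 2.74, m = 0.91
```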

    29. Average values, types, calculation methods. Application in the work of a doctor.

    Average values give a summary characteristic of a statistical population with respect to a varying quantitative attribute. An average characterizes the whole series of observations with one number expressing the general level of the trait under study. It smooths out random deviations of individual observations and gives a typical characteristic of the quantitative trait.

    Requirements for average values:

    1) the population for which the average is calculated must be qualitatively homogeneous; only then does the average objectively reflect the characteristic features of the phenomenon under study;

    2) the average should be based on a mass generalization of the trait under study, since only then does it express the typical magnitude of the trait.

    Average values are obtained from distribution series (variation series).

    Types of average values:

    a) mode (Mo): the value of the attribute that occurs most often in the population. The mode is the variant with the largest frequency in the variation series.

    b) median (Me): the value of the attribute that occupies the middle position in the variation series. It divides the series into two equal parts.

    The mode and median are not influenced by the numerical values of the extreme variants in the series. They do not always characterize the series accurately and are relatively rarely used in medical statistics. The arithmetic mean characterizes a variation series more precisely.

    c) arithmetic mean (M): calculated from all numerical values of the trait under study.

    Other averages are used less often: the geometric mean (when processing the results of titration of antibodies, toxins, vaccines); the root mean square (when determining the average cross-sectional diameter of cells or the results of cutaneous immunological tests); the cubic mean (to determine the average volume of tumors), and others.

    In a simple variation series, where each variant occurs only once, the simple arithmetic mean is calculated by the formula:

    $M = \sum V / n$

    where V is the numerical value of the variant and n is the number of observations.

    In an ordinary (weighted) variation series, the weighted arithmetic mean is calculated by the formula:

    $M = \sum V p / n$

    where V is the numerical value of the variant, p is the frequency of the variant, and n is the number of observations.

    Averages of the same magnitude can be obtained from series with different degrees of scatter; therefore, to characterize a variation series, one needs, in addition to the average, another characteristic that assesses the degree of its fluctuation.

    Simple indicators characterizing the diversity of a trait in the population under study are:

    a) limits: the minimum and maximum values of the quantitative attribute;

    b) amplitude: the difference between the largest and the smallest variant.
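Both indicators are one-liners; a hypothetical helper in Python, applied to the 9 respiratory rates from the simple-mean example:

```python
def limits_and_amplitude(values: list[float]) -> tuple[float, float, float]:
    """Return (V_min, V_max, Am), where the amplitude Am = V_max - V_min."""
    v_min, v_max = min(values), max(values)
    return v_min, v_max, v_max - v_min


print(limits_and_amplitude([20, 22, 19, 15, 16, 21, 17, 23, 18]))   # (15, 23, 8)
```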

    Application of averages:

    a) to characterize physical development (height, weight, chest circumference, dynamometry)

    b) to assess the state of human health by analyzing the physiological, biochemical parameters of the body (blood pressure, heart rate, body temperature)

    c) to analyze the activities of medical organizations (e.g. the average number of days a bed is occupied per year);

    d) to assess the work of doctors (the average number of visits per doctor, the average number of surgical operations, the average hourly workload of a doctor at a clinic appointment)

Statistical distribution series are the simplest kind of grouping.

A statistical distribution series is an ordered distribution of population units into homogeneous groups according to a varying (attributive or quantitative) attribute.

Depending on the attribute underlying the formation of groups, attributive and variation distribution series are distinguished.

Attributive series are distribution series built according to qualitative attributes, i.e. attributes that have no numerical expression. An example of an attributive distribution series is the distribution of the economically active population of the Russian Federation by sex in 2010 (Table 3.10).

Table 3.10. Distribution of the economically active population of the Russian Federation by sex in 2010

Variation distribution series are those built according to a quantitative attribute, i.e. an attribute that has a numerical expression.

A variation distribution series consists of two elements: variants and frequencies.

Variants are the individual values that the attribute takes in the variation series.

Frequencies are the counts of individual variants or of each group of the variation series. They show how often a given attribute value occurs in the population under study. The sum of all frequencies determines the size of the whole population, its volume.

Relative frequencies are frequencies expressed as fractions of one or as percentages of the total. Accordingly, the relative frequencies sum to 1, or 100%.

Depending on the nature of the variation of the trait distinguish between discrete and interval variation series of distribution.

Discrete variation distribution series are series in which the groups are formed by an attribute that changes discontinuously, i.e. by a certain whole number of units, and takes only integer values. An example is the distribution of apartments built in the Russian Federation by the number of rooms in 2010 (Table 3.11).

Table 3.11. Distribution of the number of apartments built in the Russian Federation by the number of rooms in them in 2010

An interval variation distribution series is a distribution series in which the grouping attribute can take any value within an interval, and these values may differ from each other by an arbitrarily small amount.

Constructing an interval variation series is advisable primarily when the trait varies continuously (Table 3.12), and also when a discrete trait varies over a wide range (Table 3.13), i.e. when the number of variants of the discrete feature is large.

Table 3.12. Distribution of constituent entities of the Southern Federal District of the Russian Federation by area as of January 1, 2011

Table 3.13. Distribution of constituent entities of the Central Federal District of the Russian Federation by the number of municipal educational institutions as of January 1, 2011

The rules for constructing distribution series are similar to the rules for constructing a grouping.
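As an illustration, the counting step for an interval series with equal intervals can be sketched in Python. The function name, the interval parameters, and the data are hypothetical, and the boundary convention (a value lying exactly on a boundary goes to the upper group, except the maximum, which stays in the last group) is an assumption; the text does not fix one.

```python
def group_into_intervals(values, start, width, k):
    """Count values in k equal-width intervals [start + j*width, start + (j+1)*width).

    Assumption: intervals are half-open, except the last one, which is closed
    on the right so that the largest value is counted.
    Returns a list of ((lower, upper), frequency) pairs.
    """
    counts = [0] * k
    upper_end = start + k * width
    for x in values:
        if x < start or x > upper_end:
            raise ValueError(f"value {x} lies outside [{start}, {upper_end}]")
        j = min(int((x - start) // width), k - 1)  # index of the group for x
        counts[j] += 1
    return [((start + j * width, start + (j + 1) * width), counts[j]) for j in range(k)]


# Hypothetical data grouped into three intervals of width 10.
print(group_into_intervals([3, 7, 12, 15, 18, 21, 25, 29, 30], 0, 10, 3))
# [((0, 10), 2), ((10, 20), 3), ((20, 30), 4)]
```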

Distribution series can be analyzed visually on the basis of their graphical representation. For this purpose, a polygon, a histogram, a cumulate, and an ogive are constructed.

A polygon is used to display a discrete variation distribution series. To construct it in a rectangular coordinate system, the ranked values of the variable feature are plotted on the abscissa axis (X) on a uniform scale, and a scale expressing the magnitude of the frequencies is marked on the ordinate axis (Y). The points whose coordinates are the variants and their frequencies are connected by straight line segments, producing a broken line called the frequency polygon.

A histogram is used to represent an interval variation series. When constructing a histogram, the intervals are plotted on the abscissa axis, and the frequencies are depicted as rectangles built on the corresponding intervals. The heights of the rectangles should be proportional to the frequencies.

A histogram can be converted to a distribution polygon by connecting the midpoints of the upper sides of the rectangles with straight lines.
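The conversion of a histogram into a polygon reduces to taking the midpoint of each interval together with its frequency. A minimal Python sketch with hypothetical intervals and frequencies:

```python
def polygon_from_intervals(intervals, freqs):
    """Polygon vertices: (midpoint of each interval, frequency of that interval)."""
    if len(intervals) != len(freqs):
        raise ValueError("each interval needs exactly one frequency")
    return [((lo + hi) / 2, f) for (lo, hi), f in zip(intervals, freqs)]


# Hypothetical interval series.
print(polygon_from_intervals([(0, 10), (10, 20), (20, 30)], [2, 3, 4]))
# [(5.0, 2), (15.0, 3), (25.0, 4)]
```

Joining these points with straight lines in any plotting tool reproduces the polygon drawn through the midpoints of the upper sides of the rectangles.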

When constructing a histogram for a variation series with unequal intervals, the ordinate axis shows not the frequencies but the distribution density of the feature in the corresponding intervals. The distribution density is the frequency per unit of interval width, i.e. how many units of each group fall on one unit of the interval:

$$p_f = \frac{f}{i},$$

where f is the frequency of the group and i is the width of its interval.
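A short Python sketch of the density calculation p_f = f / i for unequal intervals; the intervals and frequencies are hypothetical. It shows why raw frequencies can mislead: the widest group has the largest frequency but the smallest density.

```python
def distribution_density(intervals, freqs):
    """Absolute density p_f = f / i for each interval of width i."""
    densities = []
    for (lo, hi), f in zip(intervals, freqs):
        width = hi - lo
        if width <= 0:
            raise ValueError(f"invalid interval ({lo}, {hi})")
        densities.append(f / width)
    return densities


# Hypothetical series with unequal intervals (widths 10, 20 and 40).
print(distribution_density([(0, 10), (10, 30), (30, 70)], [20, 30, 40]))
# [2.0, 1.5, 1.0]
```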

A cumulative curve (cumulate) can also be used to represent a variation distribution series graphically. The cumulate represents the series of accumulated frequencies, which are determined by successively summing the frequencies of the groups.

To construct the cumulate of an interval variation series, the variants of the series are plotted on the abscissa axis (X), and the accumulated frequencies are plotted along the ordinate (Y) as perpendiculars to the abscissa axis at the upper boundaries of the intervals. The tops of these perpendiculars are then connected, giving a broken line, the cumulate.

If the X and Y axes are swapped in the graphical representation of the cumulate, the result is an ogive.
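The points of the cumulate and of the ogive can be sketched in Python as follows, using the upper interval boundaries as described above. The function names and data are hypothetical.

```python
from itertools import accumulate


def cumulate_points(intervals, freqs):
    """Cumulate: (upper boundary of each interval, accumulated frequency)."""
    accumulated = accumulate(freqs)  # f1, f1 + f2, f1 + f2 + f3, ...
    return [(hi, s) for (_, hi), s in zip(intervals, accumulated)]


def ogive_points(intervals, freqs):
    """Ogive: the cumulate with the X and Y axes swapped."""
    return [(s, x) for x, s in cumulate_points(intervals, freqs)]


intervals = [(0, 10), (10, 20), (20, 30)]  # hypothetical
freqs = [2, 3, 4]
print(cumulate_points(intervals, freqs))  # [(10, 2), (20, 5), (30, 9)]
print(ogive_points(intervals, freqs))     # [(2, 10), (5, 20), (9, 30)]
```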

A variation series is a statistical series showing the distribution of the phenomenon under study by the value of some quantitative attribute, for example, patients by age or by duration of treatment, newborns by weight, etc.

A variant is an individual value of the characteristic by which the grouping is carried out (denoted by V).

A frequency is a number showing how often a particular variant occurs (denoted by P). The sum of all frequencies gives the total number of observations and is denoted by n. The difference between the largest and smallest variants of the variation series is called the range, or amplitude.
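The range can be computed directly. In this Python sketch (the function name is hypothetical), variants with zero frequency are ignored when frequencies are given; the example uses the lengths of stay of the 10 pneumonia patients given later in this text.

```python
def series_range(variants, freqs=None):
    """Range (amplitude): largest variant minus smallest variant."""
    if freqs is not None:
        # keep only variants that actually occur in the series
        variants = [v for v, p in zip(variants, freqs) if p > 0]
    if not variants:
        raise ValueError("the series is empty")
    return max(variants) - min(variants)


stay = [16, 17, 18, 19, 20, 21, 22, 23, 26, 31]
print(series_range(stay))  # 15
```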

Variation series are classified as follows:

1. Discontinuous (discrete) and continuous.

A series is continuous if the grouping attribute can be expressed in fractional values (weight, height, etc.) and discontinuous if the grouping attribute is expressed only in integers (days of disability, the number of heartbeats, etc.).

2. Simple and weighted.

In a simple variation series, each quantitative value of the variable characteristic occurs once. In a weighted variation series, the quantitative values of the variable feature are repeated with certain frequencies.

3. Grouped (interval) and ungrouped.

In a grouped series, the variants are combined into groups according to their magnitude within a certain interval. In an ungrouped series, each individual variant has its own frequency.

4. Even and odd.

In even variation series, the sum of the frequencies (the total number of observations) is an even number; in odd series, it is an odd number.

5. Symmetrical and asymmetrical.

In a symmetrical variation series, all the averages (mode, median, arithmetic mean) coincide or are very close to each other.

Depending on the nature of the phenomena under study, the specific tasks and goals of the statistical research, and the content of the source material, the following types of averages are used in sanitary statistics:

structural averages (mode, median);

arithmetic mean;

harmonic mean;

geometric mean;

progressive mean.

The mode (Mo) is the value of the variable characteristic that occurs most often in the studied population, i.e. the variant with the highest frequency. It is found directly from the structure of the variation series, without any calculations. It is usually a value very close to the arithmetic mean and is very convenient in practice.
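Reading the mode off a discrete weighted series amounts to picking the variant(s) with the highest frequency. A minimal Python sketch (the function name and the data are hypothetical); it returns a list because a series can have more than one modal value.

```python
def mode(variants, freqs):
    """Variant(s) with the highest frequency in a discrete weighted series."""
    if not freqs:
        raise ValueError("the series is empty")
    top = max(freqs)
    return [v for v, p in zip(variants, freqs) if p == top]


# Hypothetical series: number of rooms (V) and number of apartments (P).
print(mode([1, 2, 3, 4], [30, 45, 20, 5]))  # [2]
```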

The median (Me) is the variant that divides the ranked variation series (i.e. one in which the variants are arranged in ascending or descending order) into two equal halves. The median is found from the series of accumulated frequencies, which is obtained by successively summing the frequencies. If the sum of the frequencies n is odd, the median is the variant with ordinal number (n + 1)/2; if it is even, the arithmetic mean of the two middle values is conventionally taken as the median.
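A Python sketch of finding the median of a weighted discrete series through the accumulated frequencies. The function name is hypothetical; `bisect_left` finds the first variant whose accumulated frequency reaches the required ordinal position.

```python
from bisect import bisect_left
from itertools import accumulate


def median(variants, freqs):
    """Median of a weighted discrete series located via accumulated frequencies."""
    ranked = sorted(zip(variants, freqs))          # rank the series by variant value
    values = [v for v, _ in ranked]
    cum = list(accumulate(p for _, p in ranked))   # accumulated frequencies
    if not cum or cum[-1] == 0:
        raise ValueError("the series is empty")
    n = cum[-1]

    def value_at(pos):
        # first variant whose accumulated frequency reaches the 1-based position pos
        return values[bisect_left(cum, pos)]

    if n % 2 == 1:
        return value_at((n + 1) // 2)
    # even n: mean of the two middle values
    return (value_at(n // 2) + value_at(n // 2 + 1)) / 2


# Lengths of stay of the 10 pneumonia patients from the example in this text.
print(median([16, 17, 18, 19, 20, 21, 22, 23, 26, 31], [1] * 10))  # 20.5
```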

The mode and median are used for open-ended series, i.e. when the largest or smallest variants have no exact quantitative value (for example, "under 15 years", "50 and older", etc.). In this case, the arithmetic mean (a parametric characteristic) cannot be calculated.

The arithmetic mean is the most commonly used average. It is usually denoted by M.

A distinction is made between the simple and the weighted arithmetic mean.

The simple arithmetic mean is calculated:

- when the population is given as a simple list of the attribute values for each unit;

- when the number of repetitions of each variant cannot be determined;

- when the numbers of repetitions of the variants are close to each other.

The simple arithmetic mean is calculated by the formula:

$$M = \frac{\sum V}{n},$$

where V is the individual values of the attribute, n is the number of individual values, and Σ is the summation sign.

Thus, the simple arithmetic mean is the ratio of the sum of the variants to the number of observations.

Example: determine the average length of stay in bed for 10 patients with pneumonia:

16 days - 1 patient; 17-1; 18-1; 19-1; 20-1; 21-1; 22-1; 23-1; 26-1; 31-1.

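$$M = \frac{16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 26 + 31}{10} = \frac{213}{10} = 21.3 \text{ bed-days.}$$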
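The same calculation in Python, as a minimal sketch (the function name is hypothetical):

```python
def simple_mean(values):
    """Simple arithmetic mean M = sum(V) / n."""
    if not values:
        raise ValueError("no observations")
    return sum(values) / len(values)


# Length of stay (bed-days) of the 10 pneumonia patients from the example above.
stay = [16, 17, 18, 19, 20, 21, 22, 23, 26, 31]
print(simple_mean(stay))  # 21.3
```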

The weighted arithmetic mean is calculated when individual values of the characteristic are repeated. It can be calculated in two ways:

1. Directly (the direct method), by the formula:

$$M = \frac{\sum VP}{n},$$

where P is the frequency (number of cases) of each variant and n = ΣP.

Thus, the weighted arithmetic mean is the ratio of the sum of the products of the variants and their frequencies to the number of observations.
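A Python sketch of the direct method; the function name and the series are hypothetical. When every frequency equals 1, the result coincides with the simple arithmetic mean.

```python
def weighted_mean(variants, freqs):
    """Weighted arithmetic mean M = sum(V * P) / n, where n = sum(P)."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("the sum of frequencies must be positive")
    return sum(v * p for v, p in zip(variants, freqs)) / n


# Hypothetical series: variants V and frequencies P.
print(weighted_mean([1, 2, 3, 4], [2, 3, 4, 1]))  # 2.4
```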

2. By calculating deviations from a conditional average (the method of moments).

The prerequisites for calculating the weighted arithmetic mean are:

- the material is grouped by the variants of the quantitative attribute;

- all variants are arranged in ascending or descending order of the feature value (a ranked series).

Calculation by the method of moments further requires that all intervals be of the same size.

According to the method of moments, the arithmetic mean is calculated by the formula:

$$M = M_o + i\,\frac{\sum aP}{n},$$

where M_o is the conditional average, for which the value of the feature with the highest frequency, i.e. the most frequently repeated one (the mode), is often taken;

i is the size of the interval;

a is the conditional deviation from the conditional average, expressed as a sequence of numbers: +1, +2, etc. for variants greater than the conditional average and −1, −2, etc. for variants below it; the conditional deviation of the variant taken as the conditional average is 0;

P is the frequencies;

n = ΣP is the total number of observations.
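A Python sketch of the method of moments, assuming the variants are given in ascending order as central values of equal intervals and that the conditional average is the variant with the highest frequency (the first one, in case of ties). The function name and the height series below are hypothetical, not the data of Table 1.

```python
import math


def mean_by_moments(variants, freqs, interval):
    """Weighted mean by the method of moments: M = M_o + i * sum(a * P) / n."""
    steps = [hi - lo for lo, hi in zip(variants, variants[1:])]
    if not all(math.isclose(s, interval) for s in steps):
        raise ValueError("variants must be ascending central values of equal intervals")
    n = sum(freqs)
    k = freqs.index(max(freqs))               # position of the conditional average M_o
    m_o = variants[k]
    a = [j - k for j in range(len(variants))]  # conditional deviations ..., -1, 0, +1, ...
    return m_o + interval * sum(ai * p for ai, p in zip(a, freqs)) / n


# Hypothetical series: central variants (cm) with interval 2 and frequencies.
heights = [118, 120, 122, 124, 126]
boys = [10, 20, 35, 25, 10]
direct = sum(v * p for v, p in zip(heights, boys)) / sum(boys)
print(mean_by_moments(heights, boys, 2), direct)  # both approximately 122.1
```

The method of moments only simplifies hand arithmetic; it should give the same value as the direct method.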

Example: determine the average height of 8-year-old boys by the direct method (Table 1).

Table 1

Columns: height, cm; number of boys (P); central variant (V)

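The central variant (the middle of the interval) is defined as the half-sum of the initial values of two neighboring groups:

$$V_k = \frac{x_k + x_{k+1}}{2},$$

where x_k and x_{k+1} are the initial values of the k-th group and the next group.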

The products VP are obtained by multiplying the central variants by the frequencies. The products are then added to obtain ΣVP, which is divided by the number of observations (100) to give the weighted arithmetic mean:

$$M = \frac{\sum VP}{100} \text{ cm.}$$

We will now solve the same problem by the method of moments, for which Table 2 is compiled:

Table 2

Columns: height, cm (V); number of boys (P); total n = 100

We take 122 as M_o, because out of 100 observations, 33 boys were 122 cm tall. We find the conditional deviations (a) from the conditional average as described above, multiply them by the frequencies (aP), and sum the products (ΣaP). The sum is 17. Finally, we substitute the data into the formula:

$$M = 122 + i\cdot\frac{17}{100} \text{ cm.}$$

When studying a variable characteristic, one cannot be limited to calculating averages. It is also necessary to calculate indicators that characterize the degree of diversity of the studied characteristic, since the value of a quantitative characteristic is not the same for all units of the statistical population.

The standard deviation (σ) characterizes a variation series: it shows the spread (dispersion) of the studied feature around the arithmetic mean, i.e. the variability of the series. It can be determined directly by the formula:

$$\sigma = \sqrt{\frac{\sum (V - M)^2 P}{\sum P}}$$

The standard deviation equals the square root of the sum of the products of the squared deviations of each variant from the arithmetic mean, (V − M)², and their frequencies, divided by the sum of the frequencies (ΣP).

Calculation example: determine the average number of sick-leave certificates issued by a doctor in the clinic per day (Table 3).

Table 3

Columns: number of sick-leave certificates issued by a doctor per day (V); number of doctors (P)

When the number of observations is less than 30, one must be subtracted from ΣP in the denominator, i.e. n − 1 is used instead of n.
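A Python sketch of the direct standard deviation for a weighted series, including the rule above of using n − 1 when there are fewer than 30 observations. The function name, the `small_sample_correction` flag, and the example values are hypothetical.

```python
import math


def standard_deviation(variants, freqs, small_sample_correction=True):
    """sigma = sqrt(sum((V - M)**2 * P) / n), with n - 1 instead of n when n < 30."""
    n = sum(freqs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(v * p for v, p in zip(variants, freqs)) / n
    squares = sum((v - mean) ** 2 * p for v, p in zip(variants, freqs))
    denominator = n - 1 if (small_sample_correction and n < 30) else n
    return math.sqrt(squares / denominator)


# Hypothetical small series: three observations, so n - 1 = 2 is used.
print(standard_deviation([2, 4, 6], [1, 1, 1]))  # 2.0
```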

If the series is grouped with equal intervals, the standard deviation can be determined by the method of moments:

$$\sigma = i\sqrt{\frac{\sum a^2 P}{n} - \left(\frac{\sum aP}{n}\right)^2},$$

where i is the size of the interval;

a is the conditional deviation from the conditional average;

P is the frequency of the variant of the corresponding interval;

n = ΣP is the total number of observations.
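A Python sketch of σ by the method of moments, under the same assumptions as for the mean (ascending central variants of equal intervals, conditional average at the highest frequency). The function name and data are hypothetical; the result equals the direct formula with denominator n.

```python
import math


def sigma_by_moments(variants, freqs, interval):
    """sigma = i * sqrt(sum(a**2 * P) / n - (sum(a * P) / n) ** 2)."""
    n = sum(freqs)
    k = freqs.index(max(freqs))                # conditional average at the highest frequency
    a = [j - k for j in range(len(variants))]   # conditional deviations in intervals
    m1 = sum(ai * p for ai, p in zip(a, freqs)) / n
    m2 = sum(ai * ai * p for ai, p in zip(a, freqs)) / n
    return interval * math.sqrt(m2 - m1 ** 2)


# Hypothetical series with interval 2.
print(sigma_by_moments([118, 120, 122, 124, 126], [10, 20, 35, 25, 10], 2))
```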

Calculation example: determine the average length of stay of patients in a therapeutic bed by the method of moments (Table 4):

Table 4

Columns: number of days in bed (V); number of patients (P)

The Belgian statistician A. Quetelet discovered that variations in mass phenomena obey the law of distribution of errors, discovered almost simultaneously by C. F. Gauss and P. Laplace. The curve representing this distribution is bell-shaped. According to the normal distribution law, the individual values of the trait vary within the range M ± 3σ, which covers 99.73% of all units of the population.

It has been calculated that the interval M ± 2σ contains 95.45% of all members of the variation series, and the interval M ± 1σ contains 68.27% of them. In medicine, the concept of the norm is associated with M ± 1σ. A deviation from the arithmetic mean of more than 1σ but less than 2σ is subnormal, and a deviation of more than 2σ is abnormal (above or below normal).

In sanitary statistics, the three-sigma rule is applied in studying physical development, assessing the performance of health care institutions, and assessing the health of the population. The same rule is widely used in the national economy when setting standards.
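The percentages above can be checked with the error function, since for a normal distribution the share of values within M ± kσ equals erf(k/√2). The Python sketch below also classifies a value by its deviation from the mean following the text; the function names and the mean and σ in the example are hypothetical, and assigning a value exactly at 1σ or 2σ to the lower category is an assumption.

```python
import math


def coverage(k):
    """Share of a normal distribution lying within M ± k*sigma."""
    return math.erf(k / math.sqrt(2))


def classify_by_sigma(x, mean, sigma):
    """Norm within 1 sigma, subnormal between 1 and 2 sigma, abnormal beyond 2 sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    d = abs(x - mean) / sigma  # deviation measured in sigmas
    if d <= 1:                 # assumption: boundary values go to the lower category
        return "normal"
    if d <= 2:
        return "subnormal"
    return "abnormal"


print([round(coverage(k) * 100, 2) for k in (1, 2, 3)])  # [68.27, 95.45, 99.73]
# Hypothetical mean 120 and sigma 5:
print([classify_by_sigma(x, 120, 5) for x in (123, 128, 132)])
# ['normal', 'subnormal', 'abnormal']
```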

Thus, the standard deviation serves:

- to measure the dispersion of the variation series;

- to characterize the degree of diversity of traits, which is determined by the coefficient of variation:

$$C_v = \frac{\sigma}{M}\cdot 100\%$$

If the coefficient of variation exceeds 20%, the diversity of traits is strong; from 10 to 20%, it is average; below 10%, it is weak. To a certain extent, the coefficient of variation is a criterion of the reliability of the arithmetic mean.
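A Python sketch of the coefficient of variation and the verbal scale from the text. The function names and the example values are hypothetical; the sketch assumes a positive mean and treats exactly 10% and 20% as "average".

```python
def coefficient_of_variation(sigma, mean):
    """C_v = sigma / M * 100 (percent); assumes a positive mean."""
    if mean <= 0:
        raise ValueError("this sketch expects a positive mean")
    return sigma * 100 / mean


def diversity_level(cv):
    """Scale from the text: above 20% strong, 10-20% average, below 10% weak."""
    if cv > 20:
        return "strong"
    if cv >= 10:
        return "average"
    return "weak"


cv = coefficient_of_variation(3.0, 40.0)  # hypothetical sigma and mean
print(cv, diversity_level(cv))  # 7.5 weak
```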
