Basic terms and concepts of medical statistics. Reliability and statistical significance

Engineering systems 25.09.2019


Basic terms and concepts of medical statistics

In this article, we present some key concepts of statistics relevant to medical research. The terms are discussed in more detail in the corresponding articles.

Variation

Definition. The degree to which the data (the values of a characteristic) are dispersed over their range of values.

Probability

Definition. Probability is the degree of possibility that a certain event occurs under certain conditions.

Example. Let us explain the definition of the term in the sentence "The probability of recovery when using the medicinal product Arimidex equals 70%." The event is "the recovery of the patient," the condition is "the patient is taking Arimidex," and the degree of possibility is 70% (roughly speaking, out of 100 people taking Arimidex, 70 recover).

Cumulative Probability

Definition. The cumulative probability of surviving to time t is the proportion of patients who are still alive at that time.

Example. If it is said that the cumulative probability of survival after a five-year course of treatment is 0.7, then this means that of the considered group of patients, 70% of the initial number remained alive, and 30% died. In other words, out of every hundred people, 30 died within the first 5 years.

Time to event

Definition. Time to event is the time, expressed in some units, that elapses from a chosen starting point until the occurrence of the event of interest.

Explanation. The units of time in medical research are days, months, and years.

Typical examples of initial times:

    start of patient follow-up

    surgical treatment

Typical examples of considered events:

    disease progression

    recurrence

    patient death

Sample

Definition. Part of a population obtained by selection.

Based on the results of the sample analysis, conclusions are drawn about the entire population, which is valid only if the selection was random. Since truly random selection from a population is often impossible in practice, one should at least strive to ensure that the sample is representative of the population.

Dependent and independent samples

Definition. Independent samples are samples in which the objects of study were recruited independently of each other. The alternative to independent samples is dependent (connected, paired) samples.

Hypothesis

Two-sided and one-sided hypotheses

Let us first explain the use of the term hypothesis in statistics.

The goal of most research is to test the truth of some statement. The purpose of drug testing is most often to test the hypothesis that one drug is more effective than another (for example, Arimidex is more effective than Tamoxifen).

To make the study rigorous, the statement being tested is expressed mathematically. For example, if A is the number of years a patient on Arimidex will live and T is the number of years a patient will live on Tamoxifen, then the hypothesis being tested can be written as A > T.

Definition. A hypothesis is called two-sided if it consists in the equality of two quantities.

An example of a two-sided hypothesis: A=T.

Definition. A hypothesis is called one-sided if it consists in the inequality of two quantities.

Examples of one-sided hypotheses: A > T, A < T.

Dichotomous (binary) data

Definition. Data that can take only two alternative values.

Example: A patient is either "healthy" or "sick"; edema is either "present" or "absent."

Confidence interval

Definition. The confidence interval for some quantity is the range around the estimated value that contains the true value of that quantity (with a given confidence level).

Example. Let the quantity under study be the number of patients visiting a clinic per year. On average, it equals 500, and the 95% confidence interval is (350, 900). This means that, most likely (with 95% confidence), at least 350 and at most 900 people will visit the clinic during the year.

Notation. A very common abbreviation: 95% CI denotes a confidence interval with a confidence level of 95%.
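To make the arithmetic concrete, here is a minimal sketch of a normal-approximation 95% confidence interval for a mean. The monthly patient counts are hypothetical, and z = 1.96 is the standard two-sided 95% quantile of the normal distribution:

```python
import math

def mean_ci_95(values):
    """Approximate 95% confidence interval for the mean,
    using the normal approximation (z = 1.96)."""
    n = len(values)
    mean = sum(values) / n
    # sample variance (n - 1 in the denominator)
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    se = math.sqrt(var / n)   # standard error of the mean
    half = 1.96 * se          # half-width at 95% confidence
    return mean - half, mean + half

# hypothetical monthly patient counts
counts = [480, 510, 530, 470, 520, 490, 500, 505]
low, high = mean_ci_95(counts)
print(round(low, 1), round(high, 1))
```

For small samples, a t-quantile rather than 1.96 would give a slightly wider and more honest interval.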

Reliability and statistical significance (p-level)

Definition. The statistical significance of a result is a measure of confidence in its "truth".

Any research is based on only a part of the objects. The study of the effectiveness of a drug is not carried out on the basis of all patients on the planet in general, but only on a certain group of patients (it is simply impossible to conduct an analysis on the basis of all patients).

Let's assume that some conclusion was made as a result of the analysis (for example, the use of Arimidex as an adequate therapy is 2 times more effective than Tamoxifen).

The question that needs to be asked is: "How much can you trust this result?".

Imagine that we conducted a study based on only two patients. In that case, of course, the results should be treated with caution. If a large number of patients were examined (what counts as "a large number" depends on the situation), then the conclusions drawn can be trusted.

So, the degree of trust is determined by the value of the p-level (p-value).

A higher p-level corresponds to a lower level of confidence in the results obtained from the sample. For example, a p-level of 0.05 (5%) means there is only a 5% probability that the conclusion drawn from the analysis of the group is merely a chance feature of those particular objects.

In other words, with a very high probability (95%), the conclusion can be extended to all objects.

In many studies, 5% is considered an acceptable p-value. This means that if, for example, p = 0.01, the results can be trusted, but if p = 0.06, the result is not considered statistically significant.

Study

A prospective study is a study in which the samples are selected based on an input factor, and some resulting factor is analyzed in the samples.

A retrospective study is a study in which the samples are selected based on the resulting factor, and some input factor is analyzed in the samples.

Example. The input factor is whether the pregnant woman is younger or older than 20; the resulting factor is whether the child weighs less or more than 2.5 kg. We analyze whether the weight of the child depends on the age of the mother.

If we take 2 samples, one with mothers younger than 20 years old, the other with older ones, and then analyze the mass of children in each group, then this is a prospective study.

If we collect 2 samples, in one - mothers who gave birth to children lighter than 2.5 kg, in the other - heavier, and then we analyze the age of mothers in each group, then this is a retrospective study (of course, such a study can be carried out only when the experiment is completed, i.e. all children were born).

Outcome

Definition. A clinically significant event, laboratory indicator, or feature that serves as an object of interest for the researcher. In clinical trials, outcomes serve as criteria for evaluating the effectiveness of a therapeutic or preventive intervention.

Clinical epidemiology

Definition. The science of predicting a particular outcome for a specific patient, based on studying the clinical course of the disease in similar cases with rigorous scientific methods, so that the forecasts are accurate.

Cohort

Definition. A group of study participants united by some common feature at the time of the group's formation and followed over a long period of time.

Control

Historical control

Definition. The control group formed and examined in the period preceding the study.

Parallel control

Definition. The control group, formed simultaneously with the formation of the main group.

Correlation

Definition. A statistical association between two characteristics (quantitative or ordinal), such that, in some proportion of cases, a larger value of one characteristic corresponds to a larger value of the other (positive, or direct, correlation) or to a smaller value of the other (negative, or inverse, correlation).

Example. A significant correlation was found between the level of platelets and leukocytes in the patient's blood. The correlation coefficient is 0.76.

Risk ratio (hazard ratio, HR)

Definition. The risk ratio (hazard ratio) is the ratio of the probability of a certain ("bad") event for the first group of objects to the probability of the same event occurring for the second group of objects.

Example. If nonsmokers have a 20% chance of getting lung cancer while smokers have a 100% chance, then the HR is one fifth (0.2 / 1.0 = 0.2). In this example, the first group of objects is the non-smokers, the second group is the smokers, and the occurrence of lung cancer is considered the "bad" event.

It's obvious that:

1) if HR = 1, then the probability of the event occurring is the same in both groups

2) if HR > 1, then the event occurs more often with objects from the first group than from the second

3) if HR < 1, then the event occurs more often with objects from the second group than from the first
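The definition above reduces to a one-line calculation. Here is a minimal sketch using the smoker example from the text (the function name is illustrative):

```python
def risk_ratio(p_event_group1, p_event_group2):
    """Risk ratio: probability of the event in group 1
    divided by the probability of the same event in group 2."""
    return p_event_group1 / p_event_group2

# the example from the text: 20% risk in non-smokers, 100% in smokers
rr = risk_ratio(0.20, 1.00)
print(rr)
```

Since rr < 1 here, the "bad" event (lung cancer) occurs more often in the second group (smokers), consistent with interpretation rule 3 above.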

Meta-analysis

Definition. A statistical analysis that combines the results of several studies of the same problem (usually the effectiveness of treatment, prevention, or diagnostic methods). Pooling the studies provides a larger sample and greater statistical power. Meta-analysis is used to strengthen the evidence for, or confidence in, a conclusion about the effectiveness of the method under study.

Kaplan-Meier method (Kaplan-Meier estimate)

This method was invented by the statisticians E. L. Kaplan and Paul Meier.

The method is used to calculate various quantities related to the time of observation of the patient. Examples of such values:

    chance of recovery within one year when using the drug

    chance of recurrence after surgery within three years after surgery

    cumulative probability of survival at five years among patients with prostate cancer after organ amputation

Let us explain the advantages of using the Kaplan-Meier method.

In a "conventional" analysis (one that does not use the Kaplan-Meier method), the quantities are estimated by dividing the observation period into intervals.

For example, if we examine the probability of a patient's death within 5 years, the time interval can be divided into 5 parts (less than 1 year, 1-2 years, 2-3 years, 3-4 years, 4-5 years), into 10 parts (half a year each), or into some other number of intervals. The results will differ for different partitions.

Choosing the most appropriate partition is not an easy task.

Estimates obtained by the Kaplan-Meier method do not depend on how the observation time is divided into intervals; they depend only on the lifetime of each individual patient.

Therefore, it is easier for the researcher to carry out the analysis, and the results often turn out to be of higher quality than the results of the “ordinary” analysis.

The Kaplan-Meier curve is a graph of the survival curve obtained using the Kaplan-Meier method.
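As a rough sketch of how the Kaplan-Meier estimate works without fixed intervals, here is a pure-Python implementation. The follow-up times and censoring flags below are hypothetical; a real analysis would use a survival library (e.g. lifelines) instead:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.
    times  - follow-up time for each patient
    events - 1 if the event (e.g. death) occurred, 0 if censored
    Returns a list of (time, survival probability) steps."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # aggregate all patients with the same observed time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= (n_at_risk - deaths) / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed   # censored patients leave the risk set too
    return curve

# hypothetical follow-up times in years; 0 marks a censored patient
times  = [1, 2, 2, 3, 4, 5, 5, 6]
events = [1, 1, 0, 1, 0, 1, 1, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Note how the survival probability only changes at observed event times, which is exactly why the estimate does not depend on any chosen partition of the time axis.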

Cox model

This model was invented by Sir David Roxby Cox (b. 1924), a famous English statistician, author of over 300 articles and books.

The Cox model is used in situations where the quantities studied in survival analysis depend on functions of time. For example, the probability of recurrence after t years (t = 1, 2, …) may depend on the logarithm of time, log(t).

An important advantage of the method proposed by Cox is the applicability of this method in a large number of situations (the model does not impose strict restrictions on the nature or form of the probability distribution).

Based on the Cox model, an analysis (called a Cox analysis) can be performed, which results in a risk ratio value and a confidence interval for the risk ratio.

Nonparametric methods of statistics

Definition. A class of statistical methods used primarily for the analysis of quantitative data that are not normally distributed, as well as for the analysis of qualitative data.

Example. To identify the significance of differences in the systolic pressure of patients depending on the type of treatment, we will use the nonparametric Mann-Whitney test.
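For illustration, the Mann-Whitney U statistic can be computed directly by counting pairs. This is a minimal sketch with hypothetical pressure values; in practice scipy.stats.mannwhitneyu would be used, since it also returns a p-value:

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for two independent samples:
    counts pairs (xi, yj) with xi > yj; ties count as 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# hypothetical systolic pressure (mm Hg) under two treatments
treatment_a = [128, 132, 140, 135, 126]
treatment_b = [138, 145, 150, 142, 139]
u = mann_whitney_u(treatment_a, treatment_b)
print(u)
```

Under the null hypothesis the expected value of U is n1·n2/2 (here 12.5); a value far from that, like the U = 2 obtained here, suggests the two groups differ. Significance is then judged against tables or a normal approximation.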

Feature (variable)

Definition. A characteristic of the object of study (observation). Characteristics may be qualitative or quantitative.

Randomization

Definition. A method of randomly distributing the objects of study into the main and control groups using special means (tables or a random number generator, coin tossing, and other methods of randomly assigning a group number to an included observation). Randomization minimizes the differences between groups in terms of known and unknown characteristics that potentially influence the outcome under study.

Risk

Attributable risk is the additional risk of an unfavorable outcome (for example, a disease) due to the presence of a certain characteristic (risk factor) in the object of study. It is the part of the risk of developing the disease that is associated with, and explained by, this risk factor and that can be eliminated if the risk factor is removed.

Relative risk is the ratio of the risk of an unfavorable condition in one group to the risk of the same condition in another group. It is used in prospective (observational) studies, where the groups are formed in advance and the condition under study has not yet occurred.

Cross-validation

Definition. A method of checking the stability, reliability, and validity of a statistical model by successively deleting observations and re-estimating the model. The more similar the resulting models are, the more stable and reliable the model is.

Event

Definition. The clinical outcome observed in the study, such as the occurrence of complications, relapse, recovery, death.

Stratification

Definition. A sampling method in which the population of all participants meeting the inclusion criteria of a study is first divided into groups (strata) based on one or more characteristics (usually sex and age) that potentially influence the outcome under study; participants are then recruited from each stratum independently into the experimental and control groups. This allows the researcher to balance important characteristics between the experimental and control groups.

Contingency table

Definition. A table of absolute frequencies (counts) of observations, whose columns correspond to the values of one characteristic and whose rows correspond to the values of another (in the case of a two-dimensional contingency table). The absolute frequencies are located in the cells at the intersections of rows and columns.

Let us give an example of a contingency table. Aneurysm surgery was performed in 194 patients; the severity of edema before surgery was known for each patient.

Edema \ Outcome       Survived   Died   Total
No edema                  20        6      26
Moderate edema            27       15      42
Pronounced edema           8       21      29
Total                     55       42      97

Thus, out of 26 patients without edema, 20 patients survived after the operation, 6 patients died. Out of 42 patients with moderate edema, 27 patients survived, 15 died, etc.

Chi-square test for contingency tables

To determine the significance (reliability) of differences in one characteristic depending on another (for example, the outcome of an operation depending on the severity of edema), a chi-square test for contingency tables is used:

χ² = Σ (O − E)² / E,

where O are the observed frequencies and E are the frequencies expected under the assumption that the two characteristics are independent.
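As a rough illustration, here is a minimal pure-Python sketch of the Pearson chi-square statistic applied to the edema table above (the row and column labels follow the example; in practice a statistics package would also report the p-value):

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# edema severity (rows) vs operation outcome (columns) from the text
table = [[20, 6],   # no edema: survived, died
         [27, 15],  # moderate edema
         [8, 21]]   # pronounced edema
print(round(chi_square(table), 2))
```

For a 3×2 table there are (3−1)(2−1) = 2 degrees of freedom, and the 0.05 critical value is about 5.99, so the statistic of roughly 15.3 obtained here indicates a significant association between edema severity and outcome.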


Chance

Let the probability of some event be equal to p. Then the probability that the event will not occur is 1-p.

For example, if the probability that the patient will still be alive after five years is 0.8 (80%), then the probability that he will die during this time period is 0.2 (20%).

Definition. Chance is the ratio of the probability that an event will occur to the probability that the event will not occur.

Example. In our example (about the patient), the chance is 4, since 0.8/0.2=4

Thus, the probability of recovery is 4 times the probability of death.

Interpretation of the value of a quantity.

1) If Chance=1, then the probability of the event occurring is equal to the probability that the event will not occur;

2) if Chance >1, then the probability of the event occurring is greater than the probability that the event will not occur;

3) if Chance < 1, then the probability of the event occurring is less than the probability that the event will not occur.

Odds ratio

Definition. The odds ratio is the ratio of the odds for the first group of objects to the odds for the second group of objects.

Example. Let us assume that both men and women undergo some treatment.

The probability that a male patient will still be alive after five years is 0.6 (60%); the probability that he will die during this time period is 0.4 (40%).

Similar probabilities for women are 0.8 and 0.2.

The odds for men are 0.6/0.4 = 1.5; the odds for women are 0.8/0.2 = 4. The odds ratio in this example is therefore 1.5/4 = 0.375.

Interpretation of the value of a quantity.

1) If the odds ratio = 1, then the chance for the first group is equal to the chance for the second group

2) If the odds ratio is >1, then the chance for the first group is greater than the chance for the second group

3) If the odds ratio is < 1, then the chance for the first group is less than the chance for the second group
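The example above can be checked in a few lines of code; this sketch simply restates the definitions:

```python
def odds(p):
    """Odds: probability the event occurs over probability it does not."""
    return p / (1 - p)

def odds_ratio(p1, p2):
    """Odds ratio between two groups with event probabilities p1 and p2."""
    return odds(p1) / odds(p2)

# example from the text: five-year survival is 0.6 for men, 0.8 for women
print(odds(0.6))            # men's odds
print(odds(0.8))            # women's odds
print(odds_ratio(0.6, 0.8))
```

Since the odds ratio is below 1, the chance for the first group (men) is lower than for the second group (women), matching interpretation rule 3 above.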

What do you think makes your "soulmate" special and meaningful? Is it their personality, or the feelings you have for this person? Or perhaps the simple fact that, according to research, the hypothesis "your attraction is random" has a probability of less than 5%? If we took that last statement at face value, successful dating sites could not exist in principle.

When you are doing split testing or any other analysis of your site, a misunderstanding of "statistical significance" can lead to misinterpretation of the results and therefore erroneous steps in the conversion optimization process. This is true of the thousands of other statistical tests performed daily in any existing industry.

To figure out what "statistical significance" is, you need to look into the history of the term, learn its true meaning, and understand how this "new" old understanding will help you correctly interpret the results of your research.

A bit of history

Although mankind has been using statistics to solve problems for many centuries, the modern understanding of statistical significance, hypothesis testing, randomization, and even design of experiments (DOE) began to take shape only at the beginning of the 20th century and is inextricably linked with the name of Sir Ronald Fisher (1890-1962):

Ronald Fisher was an evolutionary biologist and statistician who had a particular passion for the study of evolution and natural selection in the animal and plant kingdoms. During his illustrious career, he developed and popularized many useful statistical tools that we still use today.

Fisher used the techniques he developed to explain processes in biology such as dominance, mutation, and genetic variation. We can apply the same tools today to optimize and improve the content of web resources. The fact that these analysis tools can be used to work with objects that did not even exist at the time of their creation seems rather surprising. It is equally surprising that people used to perform the most complex calculations without calculators or computers.

To describe the results of a statistical experiment as having a high probability of being true, Fisher used the word significance.

One of Fisher's most interesting ideas is the "sexy son" hypothesis. According to this theory, females prefer promiscuous males because sons born of such males will inherit the same predisposition and produce more offspring of their own (note that this is just a theory).

But no one, even brilliant scientists, is immune from making mistakes. Fisher's flaws annoy specialists to this day. But remember the words of Albert Einstein: "He who has never made a mistake has never created anything new."

Before moving on to the next point, remember that statistical significance describes a situation where the difference in testing results is so large that it cannot plausibly be explained by random factors alone.

What is your hypothesis?

To understand what “statistically significant” means, you first need to understand what “hypothesis testing” is, since the two terms are closely intertwined.
A hypothesis is just a theory. Once you develop a theory, you will need to establish a procedure for collecting enough evidence and, in fact, collect this evidence. There are two types of hypotheses.

Apples or oranges - which is better?

Null hypothesis

As a rule, this is where many people run into difficulties. Keep in mind that the null hypothesis is not the thing you are trying to prove (such as proving that a certain change to the site will increase conversion); it is the opposite. The null hypothesis is the theory that if you make any changes to the site, nothing will happen. The researcher's goal is to refute this theory, not to prove it.

If we turn to the experience of crime detection, where investigators also hypothesize who the perpetrator is, the null hypothesis takes the form of the so-called presumption of innocence, the concept that the accused is presumed innocent until proven guilty in court.

If the null hypothesis is that two objects are equal in their properties, and you are trying to prove that one of them is still better (for example, A is better than B), you need to drop the null hypothesis in favor of the alternative one. For example, you compare one or another conversion optimization tool with each other. In the null hypothesis, they both have the same effect on the target (or have no effect). In the alternative, the effect of one of them is better.

Your alternative hypothesis may contain a numerical value, such as B − A > 20%. In this case, the null and alternative hypotheses can take the following form: H0: B − A ≤ 20%; H1: B − A > 20%.

Another name for an alternative hypothesis is a research hypothesis, since the researcher is always interested in proving this particular hypothesis.

Statistical significance and "p" value

Let's go back to Ronald Fisher and his concept of statistical significance.

Now that you have the null hypothesis and the alternative, how can you prove one and disprove the other?

Because statistics, by its very nature, involves studying a sample of a population, you can never be 100% sure of the results you get. An illustrative example: election results often diverge from preliminary polls and even exit polls.

Dr. Fisher wanted to create a dividing line that would tell you whether your experiment was a success or not. This is how the significance level came about: the threshold we use to decide what counts as "significant" and what does not. If p, the computed value, is 0.05 or less, the results are considered significant.

Don't worry, it's really not as confusing as it seems.

Gaussian probability distribution. At the edges are the less probable values of the variable; in the center, the most probable. The p-value (shaded green area) is the probability of obtaining the observed result by chance.

A normal probability distribution (Gaussian distribution) is a representation of all possible values of a certain variable on a graph (in the figure above) together with their frequencies. If you conduct your research correctly and then plot all the responses you get, this is the distribution you will obtain. According to the normal distribution, you will get a large percentage of similar answers, while the remaining options will lie at the edges of the graph (the so-called "tails"). Such a distribution of values is often found in nature, which is why it is called "normal."

Using an equation based on your sample and test results, you can calculate what's called a "test statistic" that tells you how much the results deviated. It will also tell you how close you are to the null hypothesis being true.

To save yourself the trouble, you can use online calculators to compute statistical significance:

One example of such calculators

The letter "p" stands for the probability that the null hypothesis is true. If the number is small, this would indicate a difference between the test groups, while the null hypothesis would be that they are the same. Graphically, this will look like your test statistic is closer to one of the tails of your bell distribution.

Dr. Fisher decided to set the confidence threshold for the results at p ≤ 0.05. However, this choice is also controversial, since it leads to two difficulties:

1. First, the fact that you have rejected the null hypothesis does not mean that you have proved the alternative hypothesis. Significance only says that the observed difference is unlikely to be due to chance, not that your specific explanation for it is correct.

2. Second, if the p-value equals 0.049, there is still a 4.9% probability of obtaining such results purely by chance, so your test result could be a false positive.

You can use the p-value or not, but then you will need to calculate the probability of the null hypothesis in each individual case and decide whether it is large enough not to make the changes that you planned and tested.

The most common scenario for conducting a statistical test today is to set a significance threshold of p ≤ 0.05 before running the actual test. Just remember to carefully examine the p-value when checking the results.
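As an illustration of how such a p-value is obtained in a split test, here is a minimal two-proportion z-test sketch (normal approximation with pooled variance; the conversion counts are hypothetical):

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion
    rates, using the pooled two-proportion z-test (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided tail probability of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))

# hypothetical split test: 200/2000 conversions (A) vs 260/2000 (B)
p = two_proportion_p_value(200, 2000, 260, 2000)
print(round(p, 4))
```

Here p is well below the 0.05 threshold set before the test, so the difference between variants A and B would be declared statistically significant.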

Type 1 and type 2 errors

So much time has passed that the errors that can occur when using a measure of statistical significance have even received their own names.

Type 1 errors

As mentioned above, a significance threshold of p = 0.05 leaves a 5% chance of rejecting the null hypothesis when it is actually true; doing so is a type 1 error (a false positive). The results may say your new website has increased conversion rates, but there is a 5% chance it has not.

Type 2 errors

This error is the opposite of a type 1 error: you accept the null hypothesis when it is false. For example, the test results tell you that the changes made to the site did not bring any improvement, when in fact they did. As a result, you miss the opportunity to improve your performance.

This error is common in tests with insufficient sample sizes, so remember that the larger the sample, the more reliable the result.

Conclusion

Perhaps no term among researchers is as popular as statistical significance. When test results are misjudged as statistically significant or not, the consequences can range from a missed increase in conversion rates to the collapse of a company.

And since marketers use this term when optimizing their resources, you need to know what it really means. Test conditions may change, but sample size and success criteria are always important. Remember this.

Statistical reliability is essential in the computational practice of physical culture and sports (FCC). Earlier it was noted that many samples can be drawn from the same population:

If the samples are chosen correctly, their mean indicators and the indicators of the general population differ from each other only within the error of representativeness at the accepted reliability;

If they are chosen from different general populations, the difference between them turns out to be significant. The comparison of samples is a standard topic in statistics;

If the samples differ insignificantly, i.e., they actually belong to the same general population, the difference between them is called statistically unreliable.

A statistically significant difference between samples means that the samples differ substantially and fundamentally, i.e., they belong to different general populations.

In the FCC, assessing the statistical significance of sample differences arises in many practical problems. For example, introducing new teaching methods, programs, sets of exercises, tests, or control exercises requires experimental verification, which should show that the test group differs fundamentally from the control group. Special statistical methods, called criteria of statistical significance, are therefore used to detect the presence or absence of a statistically significant difference between samples.

All criteria are divided into two groups: parametric and nonparametric. Parametric criteria require a normal distribution law, i.e., they require estimating the main parameters of the normal law: the arithmetic mean and the standard deviation s. Parametric criteria are the most accurate and correct. Nonparametric criteria are based on the rank (ordinal) differences between sample elements.

Here are the main criteria for statistical significance used in the practice of the FCC: Student's test and Fisher's test.

Student's criterion is named after the English scientist W. Gosset ("Student" was his pseudonym), who devised the method. Student's t-test is parametric and is used to compare the absolute indicators (means) of samples. The samples may differ in size.

Student's criterion is defined like this.

1. We find Student's criterion t by the formula

t = |x̄1 − x̄2| / √(m1² + m2²),

where x̄1 and x̄2 are the arithmetic means of the compared samples, and m1 and m2 are the representativeness errors (standard errors of the means) computed from those samples.

2. FCC practice has shown that for sports work it is sufficient to accept a reliability of P = 0.95.

For the accepted reliability P = 0.95 (α = 0.05) and the number of degrees of freedom k = n1 + n2 − 2, we find the boundary value of the criterion t_gr from the table in Appendix 4.

3. Based on the properties of the normal distribution law, Student's criterion is applied by comparing t and t_gr.

We draw conclusions:

if t ≥ t_gr, then the difference between the compared samples is statistically significant;

if t < t_gr, then the difference is not statistically significant.

For researchers in the field of FCC, the assessment of statistical significance is the first step in solving a specific problem: whether the compared samples differ fundamentally or not. The next step is to evaluate this difference from a pedagogical point of view, which is determined by the condition of the problem.

Consider the application of the Student's criterion on a specific example.

Example 2.14. The heart rate (bpm) of a group of 18 subjects was measured before (x i) and after (y i) a warm-up.

Evaluate the effectiveness of the warm-up in terms of heart rate. The initial data and calculations are presented in Tables 2.30 and 2.31.

Table 2.30

Processing heart rate data before warm-up


The errors for both groups coincided, since the sample sizes are equal (the same group is studied under different conditions) and the standard deviations were s x = s y = 3 bpm. Let us move on to determining Student's criterion:

We set the reliability P = 0.95.

The number of degrees of freedom is k = n1 + n2 − 2 = 18 + 18 − 2 = 34. From the table in Appendix 4 we find t_gr = 2.02.

Statistical conclusion. Since t = 11.62 and the boundary value t_gr = 2.02, we have 11.62 > 2.02, i.e., t > t_gr, so the difference between the samples is statistically significant.

Pedagogical conclusion. In terms of heart rate, the difference between the group's state before and after the warm-up turned out to be statistically significant, i.e., real and important. By the heart rate indicator, then, the warm-up can be considered effective.
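The calculation in example 2.14 can be sketched in a few lines. Note that the text does not give the group means, so the 80 and 91.6 bpm below are assumed values chosen only to reproduce t ≈ 11.6:

```python
import math

def student_t(mean1, mean2, m1, m2):
    """Student's criterion as used in the text:
    t = |mean1 - mean2| / sqrt(m1**2 + m2**2),
    where m1, m2 are the representativeness errors (standard errors)."""
    return abs(mean1 - mean2) / math.sqrt(m1 ** 2 + m2 ** 2)

# sketch of example 2.14: s = 3 bpm and n = 18 for both measurements
n = 18
s = 3.0
m = s / math.sqrt(n)              # representativeness error of each mean
t = student_t(80.0, 91.6, m, m)   # 80 and 91.6 bpm are assumed means
t_gr = 2.02                       # boundary value for k = 34, P = 0.95
print(round(t, 2), t > t_gr)
```

Since t far exceeds t_gr, the difference would be declared statistically significant, as in the text.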

Fisher's criterion is parametric. It is used to compare the scatter (dispersion) indicators of samples, which in practice usually means comparing the stability of athletic performance or the stability of functional and technical indicators in physical culture and sports. The samples may be of different sizes.

The Fisher criterion is defined in the following sequence.

1. Find the Fisher Criterion F by the formula


where , are the variances of the compared samples.

The conditions of the Fisher criterion require that the larger variance be in the numerator of the formula for F; thus F is always greater than one.

We set the reliability level P = 0.95 and determine the number of degrees of freedom for both samples: k 1 = n 1 - 1, k 2 = n 2 - 1.

According to the table of Appendix 4, we find the boundary value of the criterion F gr.

Comparison of criteria F and F gr allows us to draw the following conclusions:

if F > F gr, then the difference between the samples is statistically significant;

if F < F gr, then the difference between the samples is not statistically significant.
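The sequence above can be sketched as a short Python function; the variances in the usage line are hypothetical, chosen only to reproduce the value F = 2.5 obtained in example 2.15 below:

```python
def fisher_f(var_a, var_b):
    """Fisher's criterion: the ratio of the larger variance to the
    smaller one, so that F is always greater than (or equal to) one."""
    return max(var_a, var_b) / min(var_a, var_b)

# Hypothetical sample variances; F_gr = 2.4 is the boundary value
# from Appendix 6 for P = 0.95 (example 2.15).
F = fisher_f(5.0, 2.0)
print(F, F > 2.4)  # 2.5 True: the difference is statistically significant
```

Putting the larger variance on top inside the function means the caller never has to remember which sample was more scattered.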

Let's take a concrete example.

Example 2.15. Let us analyze two groups of handball players: x i (n 1 = 16 people) and y i (n 2 = 18 people). These groups of athletes were studied for the push-off time (s) when throwing the ball at the goal.

Are the push-off indicators the same?

The initial data and basic calculations are presented in Tables 2.32 and 2.33.

Table 2.32

Processing the push-off indicators of the first group of handball players


Let's define the Fisher criterion:





According to the data presented in the table of Appendix 6, we find F gr = 2.4.

Note that in the table of Appendix 6 the listing of the numbers of degrees of freedom, both of the larger and of the smaller variance, becomes coarser as the numbers grow. Thus, the numbers of degrees of freedom of the larger variance follow in this order: 8, 9, 10, 11, 12, 14, 16, 20, 24, etc., and of the smaller one: 28, 29, 30, 40, 50, etc.

This is explained by the fact that as the sample size increases, the differences in the F-test decrease, so tabular values close to the original data can be used. Thus, in example 2.15, k = 17 is absent from the table, and we take the nearest value k = 16, which gives F gr = 2.4.

Statistical inference. Since Fisher's criterion F = 2.5 > F gr = 2.4, the difference between the samples is statistically significant.

Pedagogical conclusion. The push-off times (s) when throwing the ball at the goal differ significantly between the handball players of the two groups. These groups should be considered different.

Further research should show what is the reason for this difference.

Example 2.20 (on the statistical significance of samples). Has the footballer's qualification increased if the time (s) from giving the signal to kicking the ball was x i at the beginning of the training and y i at the end?

The initial data and basic calculations are given in Tables 2.40 and 2.41.

Table 2.40

Processing of time indicators from giving a signal to hitting the ball at the beginning of a workout


Let's determine the difference between groups of indicators according to Student's criterion:

With reliability P = 0.95 and degrees of freedom k = n 1 + n 2 - 2 = 22 + 22 - 2 = 42, from the table in Appendix 4 we find t gr = 2.02. Since t = 8.3 > t gr = 2.02, the difference is statistically significant.

Let's determine the difference between the groups of indicators according to the Fisher criterion:


From the table of Appendix 2, with reliability P = 0.95 and degrees of freedom k = 22 - 1 = 21, the value F gr = 2.1. Since F = 1.53 < F gr = 2.1, the difference in the scatter of the initial data is not statistically significant.

Statistical inference. In terms of the arithmetic mean, the difference between the groups of indicators is statistically significant. In terms of scatter (variance), the difference between the groups of indicators is not statistically significant.

Pedagogical conclusion. The footballer's qualification has improved significantly, but attention should be paid to the stability of his results.
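The two-criterion logic of example 2.20 can be sketched as a single decision rule; the numbers below are the ones reported in the example (t = 8.3, t gr = 2.02, F = 1.53, F gr = 2.1):

```python
def compare_groups(t, t_gr, F, F_gr):
    """Return (means_differ, scatter_differs): whether the difference
    is significant by Student's criterion and by Fisher's criterion."""
    return t > t_gr, F > F_gr

means_differ, scatter_differs = compare_groups(8.3, 2.02, 1.53, 2.1)
print(means_differ, scatter_differs)  # True False: the averages differ
# significantly, but the scatter (stability) does not.
```

Checking both criteria together mirrors the structure of the statistical inference above: one verdict about the average level, a separate verdict about stability.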

Preparation for work

Before performing this laboratory work in the discipline "Sports Metrology", all students of the study group must form work teams of 3-4 students each in order to jointly complete the work assignments of all the laboratory works.

In preparation for the work, read the relevant sections of the recommended literature (see Section 6 of these guidelines) and the lecture notes. Study Sections 1 and 2 for this laboratory work, as well as its work assignment (Section 4).

Prepare a report form on standard sheets of A4 writing paper and include in it the materials necessary for the work.

The report must contain:

A title page indicating the department (UK and TR), the study group, the surname, first name and patronymic of the student, the number and title of the laboratory work, the date of its completion, and the surname, academic degree, academic title and position of the teacher accepting the work;

Objective;

Formulas with numerical values explaining the intermediate and final results of the calculations;

Tables of measured and calculated values;

The graphic material required by the assignment;

Brief conclusions on the results of each stage of the work assignment and on the work performed as a whole.

All graphs and tables must be drawn accurately using drawing tools. Graphic symbols and letter designations must comply with the GOST standards. The report may also be prepared using computer technology.

Work task

Before carrying out any measurements, each member of the team must study the rules of the sports game Darts given in Appendix 7, which are needed for the following stages of the research.

Stage I of the research: "Testing the results of each team member's hits on the target of the sports game Darts for conformity to the normal distribution law by Pearson's χ 2 criterion and the three-sigma test"

1. Measure (test) your personal speed and coordination of actions by throwing darts 30-40 times at the circular target of the sports game Darts.

2. Arrange the measurement (test) results x i (in points) as a variation series and enter them in Table 4.1; perform all the necessary calculations, fill in the required tables and draw the appropriate conclusions on the conformity of the obtained empirical distribution to the normal distribution law, by analogy with the similar calculations, tables and conclusions of example 2.12 given in Section 2 of these guidelines on pages 7-10.

Table 4.1

Correspondence of the speed and coordination of the actions of the subjects to the normal distribution law

Item no. | Rounded values
Total

Stage II of the research

"Estimation of the average indicators of the general population of hits on the target of the sports game Darts of all students of the educational group based on the results of measurements of members of one brigade"

Assess the average speed-and-coordination indicators of all students of the study group (according to the list in the class register) based on the results of the hits on the target of the sports game Darts by all team members, obtained at the first stage of research of this laboratory work.

1. Record the results of the measurements of speed and coordination of actions when throwing darts at the circular target of the sports game Darts for all members of your team (2-4 people). These results are a sample from the general population (the measurement results of all students of the study group, e.g. 15 people). Enter them in the second and third columns of Table 4.2.

Table 4.2

Processing the speed-and-coordination indicators of the team members

Item no.
Total

In Table 4.2, the entries should be understood as the matched average scores (see the results of the calculations for Table 4.1) of your team members obtained at the first stage of research. Note that, as a rule, each row of Table 4.2 contains the calculated average value of the measurement results obtained by one team member at the first stage of research, since the probability that the measurement results of different team members coincide is very small. In that case the values in the corresponding column of Table 4.2 are equal to 1 for each row, and in the "Total" row of that column the number of members of your team is written.

2. Perform all the calculations necessary to fill in Table 4.2, as well as the other calculations and conclusions, by analogy with the calculations and conclusions of example 2.13 given in Section 2 of this methodological development on pages 13-14. Keep in mind that when calculating the representativeness error m, formula 2.4 given on page 13 of this methodological development must be used, since the sample is small and the number of elements of the general population N is known and equal to the number of students in the study group according to the list in the class register.
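The representativeness-error calculation referred to above can be sketched in Python. Formulas 2.1 and 2.4 themselves are not reproduced in this excerpt, so the finite-population correction below is an assumption based on their standard form (m = s/√n, multiplied by √(1 − n/N) when the population size N is known):

```python
import math

def repr_error(s, n, N=None):
    """Representativeness error of a sample mean.

    Without N (population size unknown, as in formula 2.1): m = s / sqrt(n).
    With a known finite N (as in formula 2.4), the usual correction
    factor sqrt(1 - n / N) is applied -- an assumed standard form,
    since the excerpt does not reproduce the formulas themselves."""
    m = s / math.sqrt(n)
    if N is not None:
        m *= math.sqrt(1 - n / N)
    return m

# A small sample (n = 9, s = 3) drawn from a group of N = 15 students:
# the correction shrinks the error relative to the infinite-population case.
print(repr_error(3.0, 9), repr_error(3.0, 9, 15))
```

The correction matters exactly in the situation the assignment describes: the sample (one team) is a large fraction of the known population (the study group).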

Stage III of the research

Evaluation of the effectiveness of the warm-up in terms of "Speed and coordination of actions" by each member of the team using Student's criterion

Evaluate the effectiveness of the warm-up for throwing darts at the target of the sports game "Darts", performed at the first stage of research of this laboratory work, by each member of the team in terms of "Speed and coordination of actions", using Student's criterion, a parametric criterion of statistical significance.

… Total

2. Calculate the mean, variance and standard deviation (SD) of the measurement results of the "Speed and coordination of actions" indicator given in Table 4.3 (see the similar calculations given immediately after Table 2.30 of example 2.14 on page 16 of this methodological development).

3. Each member of the work team measures (tests) his or her personal speed and coordination of actions after the warm-up,

… Total

5. Calculate the mean, variance and standard deviation (SD) of the measurement results of the "Speed and coordination of actions" indicator after the warm-up given in Table 4.4, and write down the overall result of the measurements after the warm-up (see the similar calculations given immediately after Table 2.31 of example 2.14 on page 17 of this methodological development).

6. Perform all the necessary calculations and conclusions by analogy with the calculations and conclusions of example 2.14 given in Section 2 of this methodological development on pages 16-17. Keep in mind that when calculating the representativeness error m, formula 2.1 given on page 12 of this methodological development must be used, since the sample is small and the number of elements of the general population N is unknown.

Stage IV of the research

Evaluation of the uniformity (stability) of the "Speed and coordination of actions" indicators of two team members using the Fisher criterion

Assess the uniformity (stability) of the "Speed and coordination of actions" indicators of two team members using the Fisher criterion, based on the measurement results obtained at the third stage of research of this laboratory work.

To do this, do the following.

Using the data of Tables 4.3 and 4.4, the variances calculated for these tables at the third stage of the research, and the methodology for calculating and applying the Fisher criterion for assessing the uniformity (stability) of sports indicators, given in example 2.15 on pages 18-19 of this methodological development, draw the appropriate statistical and pedagogical conclusions.

Stage V of the research

Evaluation of the groups of "Speed and coordination of actions" indicators of one team member before and after the warm-up

Task 3. Five preschoolers are given a test. The time to solve each task is recorded. Are there statistically significant differences between the times to solve the first three tasks of the test?

No. of subjects

Reference material

This task is based on the theory of analysis of variance. In general, the task of analysis of variance is to identify the factors that have a significant effect on the result of an experiment. Analysis of variance can be used to compare the means of several samples when the number of samples is greater than two; one-way analysis of variance serves this purpose.

To solve the tasks posed, the following approach is adopted: if the variances of the obtained values of the optimization parameter under the influence of the factors differ from the variances of the results in the absence of that influence, then the factor is recognized as significant.

As can be seen from the formulation of the problem, methods for testing statistical hypotheses are used here, namely the comparison of two empirical variances. Therefore, analysis of variance is based on comparing variances by the Fisher criterion. In this task it is necessary to check whether the differences between the times for solving the first three tasks of the test by each of the preschoolers are statistically significant.

The null (basic) hypothesis is denoted H o. Its essence is the assumption that the difference between the compared parameters is zero (hence the name of the hypothesis) and that the observed differences are random.

The competing (alternative) hypothesis H 1 is the one that contradicts the null hypothesis.

Solution:

Using the method of analysis of variance at a significance level of α = 0.05, we test the null hypothesis (H o) of the absence of statistically significant differences between the times of solving the first three tasks of the test by the preschoolers.

Consider the data table of the task, in which we find the average time to solve each of the three test tasks.

No. of subjects

Factor levels

Time to solve the first task of the test (in sec.).

Time to solve the second task of the test (in sec.).

Time to solve the third task of the test (in sec.).

Group average

Finding the overall average:

To account for the significance of the time differences for each task, the total sample variance is divided into two parts: the first is called the factor variance and the second the residual variance.

Calculate the total sum of squared deviations of the values from the overall average using the formula

S total = Σ i Σ j (x ij - x̄)²,

or, in computational form, S total = Σ x ij ² - (Σ x ij)² / (pq), where p is the number of time measurements for solving the test tasks and q is the number of subjects. To do this, we compile a table of the squares of the values.

No. of subjects

Factor levels

Time to solve the first task of the test (in sec.).

Time to solve the second task of the test (in sec.).

Time to solve the third task of the test (in sec.).
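The decomposition described above (total sum of squared deviations split into a factor part and a residual part, followed by Fisher's criterion on the two variance estimates) can be sketched in Python. The timing data below are hypothetical, since the task's table is not reproduced in this excerpt:

```python
def one_way_anova(groups):
    """One-way ANOVA: split the total sum of squared deviations from
    the overall average into a factor part and a residual part.

    groups: one list of measurements per factor level (test task)."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_factor = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                    for g in groups)
    ss_residual = ss_total - ss_factor
    df_factor = len(groups) - 1               # p - 1
    df_residual = len(values) - len(groups)   # p(q - 1) for equal groups
    F = (ss_factor / df_factor) / (ss_residual / df_residual)
    return ss_total, ss_factor, ss_residual, F

# Hypothetical solving times (s): three test tasks, five subjects each.
times = [[28, 30, 32, 29, 31], [25, 27, 26, 28, 24], [22, 21, 23, 20, 24]]
ss_t, ss_f, ss_r, F = one_way_anova(times)
print(ss_t == ss_f + ss_r)  # the decomposition holds by construction
```

A large F (factor variance well above residual variance) is grounds for rejecting the null hypothesis, exactly as in the two-sample Fisher criterion earlier in the chapter.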

Before collecting and studying data, experimental psychologists usually decide how the data will be analyzed statistically. Often the researcher sets a significance level, defined as the value of a statistic above (or below) which lie values that allow the influence of the factors to be considered non-random. Researchers usually state this level as a probability.

In many psychological experiments it is expressed as "the 0.05 level" or "the 0.01 level". This means that random outcomes will occur with a frequency of only 0.05 (1 in 20 times) or 0.01 (1 in 100 times). Results of statistical analysis that meet the predetermined criterion (be it 0.05, 0.01 or even 0.001) are referred to below as statistically significant.
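This convention amounts to a one-line check; the p-values in the usage lines are illustrative only:

```python
def is_significant(p_value, alpha=0.05):
    """A result is statistically significant when the probability of
    obtaining it by chance is below the predetermined level alpha."""
    return p_value < alpha

print(is_significant(0.03))        # True at the 0.05 level
print(is_significant(0.03, 0.01))  # False at the stricter 0.01 level
```

The level alpha is fixed before the data are analyzed, which is the point the paragraph above makes: the criterion is chosen in advance, not after seeing the results.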

It should be noted that a result may fail to be statistically significant and still be of some interest. Often, especially in preliminary studies or in experiments with a small number of subjects or a limited number of observations, the results do not reach the level of statistical significance but suggest that in further studies, with more precise control and more observations, they will become more reliable. At the same time, the experimenter must be very careful not to purposefully change the conditions of the experiment in order to achieve the desired result at any cost.

In another example of a 2x2 design, Chi used two types of subjects and two types of tasks to study the effect of special knowledge on the memorization of information.

In this study, Chi examined the memorization of numbers and chess pieces (variable A) by children and adults (variable B), i.e. according to a 2x2 design. The children were 10 years old and played chess well, while the adults were new to the game. The first task was to memorize the positions of the pieces on the board as they would stand during normal play and to restore them after the pieces were removed. The other part of the task was to memorize a standard series of numbers, as is usually done when determining IQ.

It turns out that special knowledge, such as the ability to play chess, makes it easier to remember information related to that domain but has little effect on the memorization of numbers. The adults, unversed in the wisdom of the ancient game, remembered fewer pieces but were more successful at memorizing numbers.

In the body of the report, Chi gives a statistical analysis that mathematically confirms the presented results.

The 2x2 design is the simplest of all factorial designs. Increasing the number of factors or the number of levels of individual factors greatly complicates such designs.
