Sample size planning in clinical studies

Background: This article describes the goal, necessity, and methodology of sample size planning in clinical trials. Neither too small nor too large a sample size can be justified clinically, methodologically, or ethically. The physicians involved in clinical studies take a direct part in sample size planning, because their expertise and knowledge of the literature are essential here.

Method: The procedure for sample size planning is explained on the basis of a selective search of international scientific articles and our own expertise.

Results: Using a fictitious example in which two antihypertensive drugs A and B are compared by means of a t-test, sample size planning is demonstrated and calculated. A general principle of sample size planning is then described, which is in principle applicable to other statistical tests as well. The medical expertise and assumptions required for sample size planning are listed for several cases by way of example. These usually depend on the statistical test.

Conclusion: Every clinical trial requires a rational justification for the planned sample size. The aim of sample size planning is to determine the optimal number of test subjects or patients for a clinical study. Sample sizes should be planned jointly by experienced biometricians and physicians, because medical expertise is essential for sample size planning.

The design is essential for the quality of any clinical or epidemiological study, and sample size planning is a crucial part of it (1). For methodological reasons, the course of the study and the sample size must be determined before the study is carried out and laid down in a protocol before recruitment starts. Deviations are permitted only within the framework of general guidelines for clinical studies. If this is neglected, an independent examiner can no longer determine afterwards whether the investigator selected data or statistical methods in such a way that a desired result could be "proven". It is also necessary to control the probability with which an effect that actually exists will be found to be statistically significant in the study. For example, a pharmaceutical company planning to introduce a new drug will not, for economic as well as ethical reasons, run the risk of failing to demonstrate efficacy or non-inferiority to other drugs because the sample size is too small. Likewise, it is unacceptable for too many patients to be exposed to the new drug. Studies with too small a sample and studies with too large a sample are therefore both ethically and economically unjustifiable (2–4). For descriptive and retrospective studies, too, the sources and the extent of data collection should be planned in advance. Sample size planning is essential in medical research. Its absence indicates a lack of quality in the study concerned, and the results are viewed with skepticism.

The present article deals primarily with sample size planning when a single statistical test is intended to be used in relation to a confirmatory question. The aim of sample size planning is to select the sample sizes in such a way that an actually existing effect is recorded as statistically significant with a high degree of probability. In addition, it is important to have sufficient certainty that such an effect does not actually exist if it cannot be found in the study (4).

Determination of case numbers

For a study comparing two antihypertensive drugs A and B, two homogeneous and independent groups are formed by randomizing the study participants, i.e. by randomly assigning the patients to the therapy groups. Patients in the first group receive drug A, those in the second group receive drug B. The primary endpoint is the mean reduction in blood pressure after four weeks.

From literature studies it is known that the lowering of the blood pressure in the population of hypertensive patients can be assumed to be normally distributed under both drugs and that drug A lowers the blood pressure of hypertensive patients by an average of about 10 mm Hg. Based on previous studies, a greater reduction of about 15 mm Hg is expected for drug B. This is seen as a relevant improvement. In addition, based on medical assessments, a standard deviation of 5 mm Hg for lowering blood pressure is assumed for both drugs.

To clarify whether drug B lowers blood pressure statistically significantly more than drug A, a one-sided Student's t-test for independent samples can be carried out (5, 6). Sample size planning ensures that neither too few nor too many patients are included in the study. To determine a sample size, the power and the significance level (7) of the statistical test are specified in advance. For the significance level, that is, the probability of obtaining a statistically significant test result even though there is no difference in reality, a value of 2.5% is usual for one-sided tests (see [8], Section 5.5). However, other values are also conceivable, depending on the question. For the power, that is, the probability of detecting an actual difference with the statistical test, a value of 80% or 90% is often used.

The figure illustrates this relation for standard deviations of 4, 5, and 6 mm Hg. For a standard deviation of 5 mm Hg, the above figures and the specified power of 80% require a sample size of 17 per group. A standard deviation of 4 mm Hg would require 12 subjects per group, and a standard deviation of 6 mm Hg would require 24 subjects per group (figure). A short sample calculation is also given in the box.
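The per-group numbers quoted above can be approximated with a few lines of Python. This is a minimal sketch under stated assumptions, not the calculation used by the authors: it uses the normal approximation for a one-sided two-sample comparison and adds the term z²/4 as a small-sample correction for the t-distribution. The function name `n_per_group` and the choice of correction are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.025, power=0.80):
    """Approximate per-group sample size for a one-sided two-sample t-test.

    delta: difference in means to be detected (here 15 - 10 = 5 mm Hg)
    sd:    common standard deviation assumed for both groups
    The z_alpha**2 / 4 term is an assumed small-sample correction;
    the article does not state which formula it used.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # quantile for the significance level
    z_beta = NormalDist().inv_cdf(power)        # quantile for the power
    n_normal = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return ceil(n_normal + z_alpha ** 2 / 4)


# Standard deviations of 4, 5 and 6 mm Hg, as in the example
for sd in (4, 5, 6):
    print(f"sd = {sd} mm Hg: n per group = {n_per_group(delta=5, sd=sd)}")
```

With the assumptions of the example (difference 5 mm Hg, one-sided significance level 2.5%, power 80%), this approximation returns 12, 17, and 24 subjects per group for standard deviations of 4, 5, and 6 mm Hg. For the final planning of a real study, a validated program should be used.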

Necessary medical expertise

In the example above, medical expertise is required to estimate the expected difference and the variance of the antihypertensive effects of the two drugs. Literature searches or pilot studies are often used for this purpose. The biometrician can help the physician to determine this information, but only a qualified physician can assess what it means in terms of content. It is therefore up to the physician, and not the biometrician, to decide whether the expected difference in mean blood pressure reduction between the two drugs is also clinically relevant. If the drugs differ by only 1 mm Hg, for example, it could probably not be concluded that patients treated with the more strongly antihypertensive preparation actually benefit from this treatment, for example through a reduced risk of cardiovascular events.

The procedure presented for determining the sample size can in principle also be applied to other tests, such as the Wilcoxon rank sum test for independent samples (difference in location) or Fisher's exact test for comparing two rates. Depending on the statistical method, different information is required from the physician. Table 1 lists, for some statistical methods, the assumptions that make sample size planning possible.

For the t-test, the physician should make assumptions about the mean values (μ1 and μ2) in the two populations as well as about the standard deviations (σ1 and σ2) in these populations.

For the Fisher test, estimates of the relative proportions or event rates (π1 and π2) in the two populations are sufficient. For this purpose, it must be determined from the literature how many out of 100 patients under therapy 1 and under therapy 2 experience an event such as a side effect (= relative frequencies).
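To show how the two assumed rates enter a calculation, the following sketch uses the common normal approximation for comparing two proportions. This is a swapped-in approximation, not Fisher's exact test itself, and it tends to give somewhat smaller numbers than an exact calculation would. The function name `n_two_proportions` and the event rates of 10% and 25% are hypothetical values chosen for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for comparing two event rates.

    Normal approximation for a two-sided test with equal group sizes.
    It is NOT an exact calculation for Fisher's test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2                      # pooled rate under the null hypothesis
    term_null = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    term_alt = z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ceil((term_null + term_alt) ** 2 / (p1 - p2) ** 2)


# Hypothetical assumption: 10 of 100 patients have an event under therapy 1,
# 25 of 100 under therapy 2.
print(n_two_proportions(0.10, 0.25))
```

The smaller the assumed difference between the two rates, the larger the required sample, because the difference enters the denominator squared.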

The Wilcoxon rank sum test requires an expert estimate of the probability that the target variable of a randomly drawn individual from population 1 is smaller than that of a randomly drawn individual from population 2. An estimate or assumption for this quantity should always be made in cooperation with a biometrician.
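One widely used approximation that works directly with this probability is Noether's formula. The sketch below is an illustration under that assumption; the article does not say which method underlies its table. The function name `noether_total_n` and the probability of 0.7 are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def noether_total_n(p, alpha=0.025, power=0.80, fraction_group1=0.5):
    """Approximate TOTAL sample size for a one-sided Wilcoxon rank sum test.

    p: assumed probability that an observation from population 1 is
       smaller than an observation from population 2 (0.5 = no effect)
    fraction_group1: share of all subjects allocated to group 1
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    c = fraction_group1
    return ceil((z_alpha + z_beta) ** 2 / (12 * c * (1 - c) * (p - 0.5) ** 2))


# Hypothetical expert estimate: p = 0.7, equal allocation to both groups
print(noether_total_n(0.7))
```

The closer the assumed probability is to 0.5, the larger the required sample, which is one reason why this estimate deserves careful discussion between physician and biometrician.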

A careful assessment of the necessary parameters is worthwhile and helps to prevent incorrect power analyses and sample size calculations (9).

Sample size planning

The above example for the unpaired t-test illustrates a frequently used scheme for determining sample sizes. After the necessary parameters, for example mean values and standard deviations, have been assessed and a significance level has been defined, the sample sizes for the corresponding test are determined for varying assumptions about power. The relation is as follows: the greater the power, i.e. the certainty of obtaining a significant result when the effect exists, the greater the sample size required for the study. The smallest sample size that achieves at least the specified power is chosen.

On the other hand, the sample size may be limited by external factors, for example the duration of the recruitment period, the rarity of a disease, or the limited duration of a funded study, while evaluation by means of a statistical test is still planned. In this case, the achievable power must be determined during planning. The lower the power, the lower the chance of demonstrating the suspected hypothesis (2, 3). Too little power can lead to a study being modified during planning or not being carried out at all. Breckenkamp and co-authors (10) report on a planned cohort study to investigate the relationship between occupational exposure to electromagnetic fields and cancer. The authors state that none of the conceivable occupational cohorts would have included enough exposed persons. No study was carried out, although such a study was desirable from an environmental point of view.
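When the sample size is fixed in advance, the question is reversed: what power can be achieved? A rough sketch for the two-group comparison of means follows, again using a normal approximation. For small groups this approximation is somewhat optimistic compared with an exact calculation based on the t-distribution, so the values should be read as upper guides only. The function name `approx_power` and the group size of 10 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group, delta, sd, alpha=0.025):
    """Approximate power of a one-sided two-sample comparison of means.

    Normal approximation; slightly overestimates the power of the
    t-test when the groups are small.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    standard_error = sd * sqrt(2 / n_per_group)   # SE of the difference in means
    return NormalDist().cdf(delta / standard_error - z_alpha)


# Hypothetical situation: only 10 patients per group can be recruited
print(f"{approx_power(n_per_group=10, delta=5, sd=5):.2f}")
```

If the resulting power is clearly below the usual 80%, the design has to be reconsidered before any patient is recruited.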

If the focus of a study is not on proving a hypothesis but on estimating a parameter, sample size planning can be based on the expected width of the confidence interval (7). Suppose the prevalence of high blood pressure is to be estimated (together with a 95% confidence interval). The narrower the confidence interval, the more precisely this population parameter (here the prevalence) can be pinned down. A sample size can be determined by defining the expected width of this confidence interval. This procedure requires an idea of the size of the prevalence and a desired precision.
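As a minimal sketch of this approach, the following function solves the simple normal-approximation (Wald) confidence interval for the sample size. The function name `n_for_prevalence`, the assumed prevalence of 30%, and the desired half-width of 5 percentage points are hypothetical values chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_for_prevalence(prevalence, half_width, confidence=0.95):
    """Sample size so that the Wald confidence interval for a proportion
    has approximately the desired half-width (precision)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * prevalence * (1 - prevalence) / half_width ** 2)


# Hypothetical planning values: prevalence about 30%, precision +/- 5 points
print(n_for_prevalence(prevalence=0.30, half_width=0.05))
```

Halving the desired half-width roughly quadruples the required sample size, because the precision enters the formula squared.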

Since, even with medical expertise, the parameters used to determine the sample size can often be assessed only roughly and rather unreliably, several scenarios are usually examined. For this purpose, consider the example and the figure again. With an assumed standard deviation of 5 mm Hg, 17 test persons per group were necessary for a power of 80%. If, contrary to expectations, the standard deviation is 6 mm Hg, the power is only 65%, and it returns to around 80% only when the sample is increased to 24 test persons per group. This shows that greater scatter also requires a larger sample. A lower significance level also leads to larger sample sizes, because it lowers the probability of erroneously demonstrating an effect. However, the significance level must not be varied for the purpose of sample size planning. Table 2 illustrates further relations of this kind for the unpaired t-test.

In addition, care should always be taken that the difference to be demonstrated is also clinically relevant. The additional 5 mm Hg reduction with drug B compared to drug A is regarded by the clinician or researcher as a clinically relevant effect. However, if the effect to be expected in the study is too small, the benefit of the clinical study must be questioned. In this case, even statistically significant results may not be clinically relevant (7).

A key point in sample size planning is the consideration of "lost to follow-up" or "drop-out" (11). If, for example, it can be assumed that sufficient data cannot be collected for some of the subjects in a study, for whatever reason, the sample size must be increased in accordance with this proportion. The number of additional patients required depends on the estimated participation rate and the study conditions. It should be noted, however, that such losses usually also affect the representativeness of the data and thus bias the results. This must also be taken into account when planning the study.
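A common way to make this adjustment is to divide the calculated sample size by the expected proportion of subjects with complete data. The sketch below illustrates this; the function name `inflate_for_dropout` and the drop-out rate of 20% are hypothetical.

```python
from math import ceil


def inflate_for_dropout(n_required, dropout_rate):
    """Number of subjects to recruit so that about n_required remain
    evaluable when a share dropout_rate is expected to be lost."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return ceil(n_required / (1 - dropout_rate))


# 17 evaluable patients per group are needed; assume 20% drop-out
print(inflate_for_dropout(17, 0.20))
```

This adjustment only compensates for the loss of power. It does not correct any bias that arises when the subjects who drop out differ systematically from those who remain.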

Explicit formulas for determining the sample size are available for the most common tests (12–14). Machin and co-authors (12) provide extensive tables for common values of the variables that enter into sample size planning, from which the sample size can be read off directly.

Common statistical software packages offer suitable solutions for sample size calculation, for example SPSS with SamplePower, SAS with the procedures "PROC POWER" and "PROC GLMPOWER", and the software nQuery. The program G*Power 3 of the Institute for Experimental Psychology at Heinrich Heine University Düsseldorf can be used free of charge (www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/). It is best to use a validated program, such as one of the above.


To plan the number of cases in a clinical study, you need advance information. Which prior information is required depends on the statistical methods planned. If the corresponding quantities cannot be estimated, it is advisable, for example, to conduct a pilot study before the confirmatory study with the aim of estimating the corresponding parameters of the populations. In any case, the expected effect should be at least as great as the minimal clinically relevant effect.

Even in exploratory and descriptive studies (1), the size of the study group(s) must be determined so that the parameters to be estimated can be narrowed down with sufficient accuracy. A lack of sample size planning suggests that a study is of poor quality.

Sample size planning for a clinical study rests on estimates derived from prior information, whose precision can vary from study to study. This should always be taken into account when interpreting the results. A treatment effect that is overestimated in the planning phase usually results in too small a sample. The observed treatment effect may then fail to reach statistical significance merely because the sample is too small.

The handling of missing values and of patients who leave the study should also be taken into account in sample size planning.

Only a small part of sample size planning can be covered here. Depending on the study design, other aspects are also important. The methods of sample size planning change, for example, depending on whether a test for superiority, non-inferiority, or equivalence is to be carried out in the clinical study (13). Non-inferiority studies may require quite large samples, since the smallest clinically relevant difference is often used as the mean difference to be demonstrated, which then serves as the non-inferiority margin. This is usually much smaller than an actual mean difference.

Often several hypotheses are to be tested on the basis of one data set. Multiple testing problems must then be taken into account in sample size planning. In many cases, therefore, only one primary question is defined.

In addition, the sample size in modern studies is not always fixed in advance. In adaptive designs, for example, the sample size can be influenced or controlled during a study according to a scheme strictly defined in the planning phase. However, this procedure requires careful, statistically demanding planning and should never be carried out without an experienced biometrician.

Because sample size planning is complex and has far-reaching consequences, experienced biometricians and physicians should work together. Joint planning of all important details can significantly improve the quality and informative value of studies (2, 3, 15).

Conflict of interest
The authors declare that there is no conflict of interest within the meaning of the guidelines of the International Committee of Medical Journal Editors.

Manuscript dates
Submitted: January 15, 2010; revised version accepted: March 22, 2010

Corresponding author
Prof. Dr. rer. nat. Maria Blettner
Institute for Medical Biometry, Epidemiology and Computer Science (IMBEI)
Clinic of the University of Mainz
Obere Zahlbacher Strasse 69
55131 Mainz


Sample Size Calculation in Clinical Trials — Part 13 of a Series on Evaluation of Scientific Publications

Background: In this article, we discuss the purpose of sample size calculation in clinical trials, the need for it, and the methods by which it is accomplished. Study samples that are either too small or too large are unacceptable, for clinical, methodological, and ethical reasons. The physicians participating in clinical trials should be directly involved in sample size planning, because their expertise and knowledge of the literature are indispensable.

Methods: We explain the process of sample size calculation on the basis of articles retrieved by a selective search of the international literature, as well as our own experience.

Results: We present a fictitious clinical trial in which two antihypertensive agents are to be compared to each other with a t-test and then show how the appropriate size of the study sample should be calculated. Next, we describe the general principles of sample size calculation that apply when any kind of statistical test is to be used. We give further illustrative examples and explain what types of expert medical knowledge and assumptions are needed to calculate the appropriate sample size for each. These generally depend on the particular statistical test that is to be performed.

Conclusion: In any clinical trial, the sample size has to be planned on a justifiable, rational basis. The purpose of sample size calculation is to determine the optimal number of participants (patients) to be included in the trial. Sample size calculation requires the collaboration of experienced biostatisticians and physician-researchers: expert medical knowledge is an essential part of it.

How to cite: Dtsch Arztebl Int 2010; 107 (31-32): 552-6

DOI: 10.3238/arztebl.2010.0552

1. Röhrig B, du Prel JB, Blettner M: Study design in medical research. Part 2 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2009; 106 (11): 184-9.
2. Eng J: Sample size estimation: how many individuals should be studied? Radiology 2003; 227: 309-13.
3. Halpern SD, Karlawish JHT, Berlin JA: The continuing unethical conduct of underpowered clinical trials. JAMA 2002; 288: 358-62.
4. Altman DG: Practical Statistics for Medical Research. London: Chapman and Hall 1991.
5. du Prel JB, Röhrig B, Hommel G, Blettner M: Choosing statistical tests. Part 12 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2010; 107 (19): 343-8.
6. Sachs L: Applied statistics: application of statistical methods. 11th edition. Springer 2004; 352-61.
7. du Prel JB, Hommel G, Röhrig B, Blettner M: Confidence interval or p-value? Part 4 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2009; 106 (19): 335-9.
8. ICH E9: Statistical Principles for Clinical Trials. London UK: International Conference on Harmonization 1998; adopted by CPMP July 1998 (CPMP/ICH/363/96).
9. Blettner M, Ashby D: Power calculation for cohort studies with improved estimation of expected numbers of death. Soz Preventivmed 1992; 37: 13-21.
10. Breckenkamp J, Berg-Beckhoff G, Münster E, Schüz J, Schlehofer B, Wahrendorf J, Blettner M: Feasibility of a cohort study on health risks caused by occupational exposure to radiofrequency electromagnetic fields. Environ Health 2009; 8: 23.
11. Schumacher M, Schulgen G: Methodology of clinical studies: methodological principles of planning, implementation and evaluation (statistics and their applications). 3rd edition. Berlin, Heidelberg, New York: Springer Verlag 2008: 1-436.
12. Machin D, Campbell MJ, Fayers PM, Pinol APY: Sample size tables for clinical studies. 2nd edition. Oxford, London, Berlin: Blackwell Science Ltd. 1987; 1-315.
13. Chow SC, Shao J, Wang H: Sample size calculations in clinical research. Boca Raton: Taylor & Francis, 2003; 1-358.
14. Bock J: Determination of the sample size for biological experiments and controlled clinical studies. Munich: Oldenbourg Verlag 1998; 1-246.
15. Altman DG: Statistics and ethics in medical research: misuse of statistics is unethical. BMJ 1980; 281: 1182-4.
16. Altman DG, Machin D, Bryant TN, Gardner MJ: Statistics with confidence. 2nd edition. BMJ Books 2000.
17. Fahrmeir L, Künstler R, Pigeot I, Tutz G: Statistics: The way to data analysis. 4th edition. Berlin, Heidelberg, New York: Springer Verlag 2003; 1-608.