Critical appraisal of meta-analyses: an introductory guide for the practicing surgeon
© Lawrentschuk et al; licensee BioMed Central Ltd. 2009
Received: 12 May 2009
Accepted: 22 July 2009
Published: 22 July 2009
Skip to main content
© Lawrentschuk et al; licensee BioMed Central Ltd. 2009
Received: 12 May 2009
Accepted: 22 July 2009
Published: 22 July 2009
Meta-analyses are an essential tool of clinical research. Meta-analyses of individual randomized controlled trials frequently constitute the highest possible level of scientific evidence for a given research question and allow surgeons to rapidly gain a comprehensive understanding of an important clinical issue. Moreover, meta-analyses often serve as cornerstones for evidence-based surgery, treatment guidelines, and knowledge transfer. Given the importance of meta-analyses to the medical (and surgical) knowledge base, it is of cardinal importance that surgeons have a basic grasp of the principles that guide a high-quality meta-analysis, and be able to weigh objectively the advantages and potential pitfalls of this clinical research tool. Unfortunately, surgeons are often ill-prepared to successfully conduct, critically appraise, and correctly interpret meta-analyses. The objective of this educational review is to provide surgeons with a brief introductory overview of the knowledge and skills required for understanding and critically appraising surgical meta-analyses as well as assessing their implications for their own surgical practice.
The statistical tool of meta-analysis is used with increasing frequency in surgical research. A recent review demonstrates that over the past decade, appearances of meta-analyses in the medical literature have increased by fourfold . The vast majority of meta-analyses combine results from different randomized controlled trials (RCTs) and, to a much lesser extent, cohort studies or case-control studies. In the interest of brevity, we will focus this short educational review on meta-analyses of RCTs only.
Since their invention and subsequent application in the medical literature in the early 20th century , meta-analyses have continuously evolved. The practice of performing high-quality, methodologically sound, and critically evaluated meta-analyses culminated in the creation of the Cochrane Group. Named for Archie Cochrane, a British researcher who contributed greatly to the development of modern epidemiology, the Cochrane group was established 15 years ago and is an international collaboration of over 10,000 investigators who appraise and compile high-quality meta-analyses on numerous topics, with over 1,600 published to date .
Given the ubiquity of meta-analyses in the current flourishing culture of evidence-based medicine, it is imperative for the practicing surgeon to acquire a basic understanding of the advantages and limitations of meta-analyses. Unfortunately, many surgeons lack a solid foundation in this essential area of knowledge. The present article represents an invited review and is based on different educational articles by the senior author (U.G.) [4–9]. Our objective is to provide a brief introductory overview of the techniques used to perform a meta-analysis and discuss some of the advantages and potential shortcomings of this statistical tool.
Type I (alpha) and type II (beta) error .
Truth in the overall patient population
No treatment difference
A. Correct conclusion
B. Type I error (false-positive result)
No treatment difference
C. Type II error (false-negative result)
D. Correct conclusion
The first situation represents false-positive result and is called a type I error. The bound that we put on the probability of committing a type I error is named alpha, also referred to as the level of statistical significance or significance level. The second situation represents a false-negative result and is called a type II error or beta error. Beta, the false-negative rate, is complementary to the power of a study , which is defined as the probability of finding a statistically significant result (i.e., rejecting the null hypothesis) in a study when a true difference exists between or among the groups of subjects being compared.
Often in biomedical research alpha is set at 0.05, meaning that a 5% chance of obtaining a false-positive result (i.e., the results show a statistically significant difference even though no real difference exists) is considered acceptable. Alpha is the benchmark to which p values are compared. If the p value is larger than alpha, a result is said to be non-significant. On the other hand, if the p value is smaller than the benchmark alpha, the findings are considered statistically significant.
Although it might at first seem reasonable to assume that both alpha and beta errors could be set at the same level of 5%, a false-positive finding is often considered potentially more harmful than a false-negative result (e.g., finding a surgical procedure to be beneficial to patients when no benefit actually exists). Thus, in medical science, beta is commonly set between 0.2 and 0.1. As the power of a study is complementary to the beta error, type II errors of 0.2, 0.15, and 0.1 correspond to statistical powers of 80% (1.0-0.2), 85% (1.0-0.15), and 90% (1-0.10), respectively.
It is important to recall that the sample size of a study is directly proportional to the power: the larger the sample size, the higher the power. This simple statistical concept is of tremendous importance to meta-analyses. A meta-analysis combines different RCTs to increase the overall sample size, thus increasing the statistical power. This increase in statistical power in turn shrinks the value of beta, with result that the chances of a false-negative finding are minimized in a well-performed meta-analysis.
Finally, effect size forms a critical part of evaluating meta-analyses, as has been described by the senior author in greater detail elsewhere . In summary, a meta-analysis combines the results of multiple studies that test a similar research hypothesis. Thererfore, the findings of individual RCTs (effect sizes) are combined using statistical techniques into an overall effect size, sometimes called meta-effect size. The meta-effect size is a more powerful and accurate estimate of the true effect sized compared with individual single studies.
Formulate a research question.
State the a priori hypothesis (a hypothesis generated prior to collecting the data). This step is vital to ensuring the validity of the meta-analysis. No matter how tempting it might be to form hypotheses as interesting correlations or patterns appear in the collected data, doing so is likely to bias the meta-analyses irretrievably and diminish its validity.
Write a protocol in which the research question, as well as inclusion and exclusion criteria for the trials to be pooled in the meta-analysis, are clearly described.
Perform a thorough literature search using several different search engines (e.g., PubMed, Embase, Cochrane, Google Scholar, etc.). Be sure to formally document the search strategies used and the results that the searches retrieved; the sensitivity and precision of the literature search is itself likely to affect the ultimate validity of the findings [13–16]. A non-electronic hand searching of the literature may also be useful, despite the time and effort required .
Perform a quality assessment (critical appraisal) and extraction of studies (usually performed by two independent investigators).
Extract the data from the RCTs.
Perform a statistical analysis (including a sensitivity analysis).
State conclusions and provide recommendations.
As we examine in detail the steps of performing a meta-analysis, it is important to emphasize particular aspects of the process. First, as with any study, the value of a meta-analysis can be assessed using the mnemonic 'FINER'. That is, the study must be F easible, I nteresting, N ovel, E thical, and R elevant . If a meta-analysis is feasible and ethical but not relevant and novel, it will be worthless–there is little to be gained from answering questions without any clinical relevance, or re-hashing research questions that have been thoroughly and definitively addressed.
Second, the literature search for relevant RCTs should be undertaken only after a protocol has been written, with a priori hypotheses and inclusion and exclusion criteria clearly defined. Ideally, at least two independent investigators should search for studies that meet inclusion criteria. Further, the search should be as comprehensive as possible; that is, not limited only to Medline, but rather spread among other scientific databases (e.g., Embase, Cochrane, Google Scholar, etc.) to minimize the possibility of omitting a study of interest simply because of the vagaries of data collection or indexing. Similarly, limits on the language of the publication, date, country of origin, etc. should be avoided if possible. Moreover, as explained in greater detail below, performing a systematic search for unpublished studies is imperative.
Finally, two independent investigators must assess the quality and suitability of retrieved studies. Assessments and decisions regarding inclusion of a given study in the meta-analysis should be based on the inclusion and exclusion criteria outlined in the protocol.
As mentioned above, the whiskers represent the 95% confidence intervals. These 95% confidence intervals comprise the range of values for which you can be 95% confident that the true value is included, and help provide the reader with an appreciation of the reliability of the results. The wider the 95% confidence intervals, the higher the uncertainty that the reported results are accurate . The width of the 95% confidence intervals is indirectly proportional to the sample size of the RCT: if the trial includes a large number of patients (e.g., Study 5, Figure 2), the width of the 95% confidence intervals will be narrow; as sample sizes dwindle, the 95% confidence intervals correspondingly grows wider.
In addition to providing information on the reliability of the results, the whiskers of a 95% confidence interval can inform the reader as to whether the study was statistically significant . If the confidence interval crosses the vertical line of no effect (0 for a difference between two groups and 1 for a ratio of two groups), then that trial result, taken individually, is not statistically significant (e.g., Study 6 in Figure 2). Conversely, if the confidence interval does not cross the vertical line of no effect, the result is statistically significant (e.g., Studies 5 and 8 in Figure 2). The overall result (summary effect) is represented by the diamond shape. In Figure 2, the confidence interval does not cross the line of no effect and thus represents a statistically significant overall result (p < 0.05).
When performed properly, meta-analyses have a number of important advantages over individual RCTs. These include:
The primary advantage of meta-analyses is an increase in statistical power over that of individual RCTs. As previously mentioned, power is defined as the probability of detecting a statistically significant result if the patient samples are truly different [7, 9]. For various reasons that are often difficult to predict beforehand, RCTs frequently prove to be underpowered. In other words, they enroll too few patients to prove that a detected difference, even when clinically relevant, is statistically significant [7, 22]. The result is a negative study; in other words, the p value exceeds the threshold for significance. In these small RCTs it may be unclear whether the lack of statistical significance truly reflects the fact that there is no difference between treatments, or whether the sample size was simply too small to demonstrate that a detected difference was significant. As mentioned earlier, meta-analyses may overcome this limitation by combining different RCTs, thereby increasing overall sample size (and with it, statistical power) in a way that would simply not be feasible if one were to attempt creating a single RCT with a comparable sample size.
Often, in a given area of interest, various RCTs may provide contradictory results. Quite commonly, this occurs for the same reason discussed above: the sample sizes for the various RCTs were not sufficient to ensure a definitive answer to the research question. This confusion, however, can be abated by applying the tool of meta-analysis, which can reveal an underlying unifying conclusion among seemingly contradictory study findings.
For a practical example, let us consider the role of neoadjuvant chemotherapy prior to radical cystectomy with extended lymphadenectomy in the treatment of bladder cancer. There have been many conflicting RCTs performed in this clinical arena, some of which found a statistically significant overall survival advantage with neoadjuvant chemotherapy in addition to radical cystectomy, while others found no such advantage. In order to elucidate this issue, a meta-analysis combining the various RCTs was performed. The meta-analysis demonstrated a 5% absolute improvement in overall survival at 5 years when more than 3000 patients from 11 RCTs were analysed . In this case, a meta-analysis helped reveal a unifying conclusion and led to important gains in knowledge as well as direct benefit for patients.
Checklist for the surgeon to critically appraise a meta-analysis
Has a research question/hypothesis been formulated a priori (before starting the meta-analysis)?
Is the research question FINER (F easible, I nteresting, N ovel, E thical, and R elevant)?
Is the meta-analysis based on a written protocol that clearly outlines research question, primary and secondary outcomes, and inclusion and exclusion criteria?
Has a thorough literature search been performed? Have different search engines (PubMed, Embase, Cochrane library, etc.) been used to identify relevant literature?
Did the authors look for unpublished data, for negative studies, and for publications in non-English languages to minimize retrieval, language, and publication bias?
Was a strategy to exclude individual studies clearly outlined in the publication?
Did two investigators independently perform the quality assessment of the individual studies?
Were sensitivity analyses performed?
It is quite possible for a researcher to apply methodologically sound meta-analytical techniques to suboptimal data. Unfortunately no amount of statistical technique can improve the fundamental quality of the data being combined for the meta-analysis. If the individual RCTs that make up the meta-analysis are themselves poorly designed and poorly conducted, the meta-analysis summarizing these trials will be of correspondingly limited reliability. Remember: garbage in-garbage out !
It is important to acknowledge that RCTs in surgery are themselves subject to their own particular challenges and sets of biases as has been discussed previously by the senior author in greater detail elsewhere [4, 5]. Briefly, typical caveats of surgical trials include limitations such as low external validity (poor generalizability), difficulty of blinding patients and investigators, co-intervention bias, lost-to-follow-up bias, and performance bias. Because it is often difficult to control for these biases it is therefore important that the astute reader of a meta-analysis evaluate the individual RCTs to assess the overall quality of the meta-analysis [4, 9].
There is a well-known phenomenon, extensively documented in the published literature, whereby positive trials (studies that produce a statistically significant result) are much more likely to be published than so-called negative trials (studies producing no statistically significant association), or trials that produce equivocal results [1, 24]. There are a number of possible reasons for the existence of such a bias, but although editorial bias in medical journals is often held as a culprit, there is some evidence to suggest that the bias may also arise when investigators or sponsors simply decide not to write and publish negative results. Regardless of its origin, however, publication bias may lead to overestimating the effect of the intervention being examined in the meta-analysis. Thus, in order to ensure the reliability of a meta-analysis, one must systematically search for negative trials for inclusion.
A variety of methods are used to detect potential publication biases, including graphical depiction through the use of funnel plots [1, 26]. In summary, such plots are scatter diagrams of the estimated treatment effects in the individual studies against the study size. Small studies will give more variable estimates and hence greater scatter. When completed, the plot should have a symmetrical appearance like that of a triangle or inverted funnel. An asymmetry in the funnel plot may reflect the possibility that smaller studies were not published due to non-significant findings, thus indicating publication bias. Funnel plots are easy-to-use, practical tools and should be employed systematically to detect and possibly prevent publication bias .
Recent consensus statements by the World Health Organization (WHO) and the International Council of Medical Journal Editors (ICMJE)  have led to requirements that any RCT be registered with http://www.clinicaltrials.gov/ before commencing patient accrual; failing to do so may render the study unable to be published by a large and growing number of peer-reviewed journals and additional sanctions may apply to researchers who neglect to register clinical trials. Although even the most finely-honed searches may fail to reveal negative data that has simply been "shelved" by investigators or sponsors, any author attempting a meta-analysis should exhaust every possible avenue for obtaining the most complete set of data possible.
Retrieval bias and language bias are often considered as two facets of publication bias and will be discussed below.
This bias refers to a potential distortion of the findings of a meta-analysis due to the overlooking or exclusion of relevant studies that merit inclusion in the meta-analysis. Retrieval bias may be the result of suboptimal search of electronic databases and failing to identify important unpublished results. It is critically important to search a variety of different databases . Novice researchers should be particularly attentive to carefully choosing search terms. The appropriate use of Medical Search Headers (MeSH) keywords and Boolean search strings can help maximize the retrieval of relevant articles; in particular, relatively inexperienced researchers may wish to experiment with search tutorials, such as those offered by PubMed, and refer to the growing body of literature regarding the optimization of literature searches.
Language bias is closely related with retrieval bias. It refers to a potential distortion of the results of a meta-analysis due to a failure to identify relevant study findings published in languages other than English. If a methodologically sound meta-analysis is performed, the investigators must systematically search for relevant studies outside of the scientific English literature .
Because perioperative care and surgical techniques are not necessarily uniform or easily standardized  meta-analyses are consequently more difficult to perform on surgical interventions than in drug trials. Dealing with such heterogeneity may be challenging: Not only is there a risk of heterogeneity within the RCT, but this heterogeneity may be amplified when combining different trials. Therefore, if the heterogeneity among the various RCTs included in a meta-analysis is high, the investigator risks comparing apples and oranges. Although it may occur by chance alone, heterogeneity in surgical trials is most often associated with variation in technical ability among surgeons .
Well-designed and rigorously performed multi-center surgical RCTs now typically contain some validation of standardized surgical technique. For instance, standardization can be achieved through peer review of surgical procedures. A good example of this is the COST trial, which compared open and laparoscopic colectomies for cancer . Before the laparoscopic surgeons were allowed to enroll patients into the trial, their surgical techniques were submitted to peer review, thereby ensuring a baseline standard. The peer review process diminishes the risk of suboptimal technical skills acting as a potential confounder and enhances homogeneity of surgical skills among similar trials.
The extent of heterogeneity among various trials can be determined through the application of the method of sensitivity analysis. In this case, what is important is not whether the subgroups remain statistically significant with respect to the overall research question, but whether any statistically significant differences exist among them. Also, if homogeneity exists, then omitting a certain trial from the analysis should not change the overall result. Conversely, if heterogeneity is present, then omitting a key trial may well change the pooled estimate. It follows that the more similar the included RCTs are, the less the degree of heterogeneity. If too much heterogeneity is found among different RCTs in the area of interest, performing a meta-analysis may not be feasible, and the reporting of different studies in a systematic review may be more appropriate.
Basic knowledge of the advantages and limitations of meta-analyses is essential to the practicing surgeon. Although meta-analyses are considered the highest level of evidence and an essential part of medical research, the clinician must be aware of their potential limitations, either when conducting a meta-analysis or interpreting the findings from one. Combining independent RCTs using statistical techniques will increase the statistical power in the context of the research question, but this will not necessarily translate into a higher-confidence conclusion if the individual studies that make up the meta-analysis are not themselves sufficiently well-designed and conducted.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.