Independent Statistical Consultant, Berlin
Systematic reviews of diagnostic tests are undertaken for the same reasons as systematic reviews of therapeutic interventions: to produce estimates of performance based on all available evidence, to evaluate the quality of published studies, and to account for the variation in findings between studies.
Studies of test performance or accuracy compare test results between separate groups of patients with and without the target disease, each of whom undergoes the experimental test as well as a second “Gold Standard” reference test. The relationship between the test results and the disease status is described using probabilistic measures, such as sensitivity, specificity, accuracy and likelihood ratios.
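As a rough illustration of the measures named above, the following sketch computes them from the 2x2 table of one study. The class name `TwoByTwo`, the function name `accuracy_measures` and the counts are hypothetical and chosen only for illustration; they do not come from any study discussed here.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cross-classification of index test result against the reference test."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent


def accuracy_measures(t: TwoByTwo) -> dict:
    """Return sensitivity, specificity, accuracy and likelihood ratios."""
    sens = t.tp / (t.tp + t.fn)   # proportion of diseased who test positive
    spec = t.tn / (t.tn + t.fp)   # proportion of non-diseased who test negative
    acc = (t.tp + t.tn) / (t.tp + t.fp + t.fn + t.tn)
    # Likelihood ratios are undefined at the boundaries; report infinity there.
    lr_pos = sens / (1 - spec) if spec < 1 else math.inf
    lr_neg = (1 - sens) / spec if spec > 0 else math.inf
    return {
        "sensitivity": sens,
        "specificity": spec,
        "accuracy": acc,
        "lr_positive": lr_pos,
        "lr_negative": lr_neg,
    }


# Hypothetical study: 100 diseased and 100 non-diseased patients.
example = accuracy_measures(TwoByTwo(tp=90, fp=20, fn=10, tn=80))
```

Note that accuracy, unlike sensitivity and specificity, depends on the proportion of diseased patients in the study sample.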
The choice of a statistical method for pooling study results depends on the sources of heterogeneity, especially variation in diagnostic thresholds. Sensitivity, specificity, accuracy and likelihood ratios may be combined directly if the results are reasonably homogeneous. When a threshold effect exists, study results may be best summarised as a summary receiver operating characteristic (ROC) curve. Such a curve can prove difficult to interpret and apply in practice.
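One way a summary ROC curve might be fitted is sketched below. The text does not specify a fitting method, so this is an assumption: the sketch uses the unweighted linear regression approach usually attributed to Moses, Shapiro and Littenberg, in which the difference D and the sum S of the logit-transformed true and false positive rates are computed for each study and D is regressed on S. The function names, the 0.5 continuity correction and the study counts are all hypothetical choices for illustration.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def fit_sroc(studies):
    """Fit D = a + b*S by ordinary least squares.

    studies: list of (tp, fp, fn, tn) tuples, one per study.
    D = logit(TPR) - logit(FPR) is the log diagnostic odds ratio;
    S = logit(TPR) + logit(FPR) acts as a proxy for the threshold.
    """
    d_vals, s_vals = [], []
    for tp, fp, fn, tn in studies:
        # Assumed 0.5 continuity correction to avoid logit(0) or logit(1).
        tpr = (tp + 0.5) / (tp + fn + 1)
        fpr = (fp + 0.5) / (fp + tn + 1)
        d_vals.append(logit(tpr) - logit(fpr))
        s_vals.append(logit(tpr) + logit(fpr))
    n = len(studies)
    mean_s = sum(s_vals) / n
    mean_d = sum(d_vals) / n
    sxx = sum((s - mean_s) ** 2 for s in s_vals)
    sxy = sum((s - mean_s) * (d - mean_d) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx              # slope: how accuracy varies with threshold
    a = mean_d - b * mean_s    # intercept: log odds ratio where S = 0
    return a, b


def sroc_sensitivity(a: float, b: float, fpr: float) -> float:
    """Sensitivity on the fitted curve at a given false positive rate.

    Solving D = a + b*S for logit(TPR) gives
    logit(TPR) = a/(1-b) + ((1+b)/(1-b)) * logit(FPR); requires b != 1.
    """
    lt = a / (1 - b) + (1 + b) / (1 - b) * logit(fpr)
    return 1 / (1 + math.exp(-lt))


# Hypothetical studies that differ mainly in their positivity threshold.
studies = [(48, 20, 2, 30), (40, 8, 10, 42), (30, 3, 20, 47), (44, 12, 6, 38)]
a, b = fit_sroc(studies)
curve = [(fpr, sroc_sensitivity(a, b, fpr)) for fpr in (0.1, 0.2, 0.3)]
```

A slope b close to zero would suggest that the diagnostic odds ratio is roughly constant across thresholds, giving a symmetric curve. Note that the fitted curve gives a sensitivity only for a chosen false positive rate rather than a single summary pair of values.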