Tutorial - Statistical Learning Machines with Applications to Biomedical Data

James D. Malley, Ph.D.
Math. & Statistical Computing Laboratory; National Institutes of Health, Bethesda, USA
Venue: 
n/a
Time: 
n/a
Date: 
11 December 2003

Over the past few years the machine learning community has developed many quite successful methods for classification and regression, including support vector machines, boosting, and random forests, each with several variants. However, these methods have not been widely used or understood in the biostatistical community. To help close this gap, we review the main ideas of statistical learning machines and consider possible applications to biomedical data analysis. We observe that there are important differences in how these machines might be applied to such data, quite distinct from how they have been used in computer science. These differences include much smaller sample sizes and a focus on prediction error estimates, both consequences of having to use the same data for training and testing. Other differences include unbalanced data (positive cases much more numerous than negative cases) and questions about the interpretability of the proposed prediction scheme. We outline possible solutions to these problems and discuss the analysis of two data sets: the German Stroke Collaboration (with Prof. Hans-C. Diener, Clinic of Neurology, University Hospital Essen; Prof. Andreas Ziegler, Lübeck) and a study of prognostic factors in Systemic Lupus Erythematosus (in association with Dr. Michael Ward, National Institutes of Health, National Institute of Arthritis and Musculoskeletal and Skin Diseases).