Vol. 2, Issue 2, Part A (2025)

Machine learning models for early disease detection in healthcare

Author(s):

Liwen Zhang and Chenhao Liu

Abstract:

Early detection of disease is one of the most powerful determinants of patient outcomes, yet current diagnostic workflows often fail to identify pathological changes before clinical symptoms emerge. This study explores the application of machine learning (ML) models for early disease detection across multiple healthcare datasets, combining structured electronic health records (EHRs), medical imaging, and clinical variables. Six ML algorithms, namely logistic regression, random forest, support vector machine (SVM), gradient boosting (XGBoost), deep neural network (DNN), and an ensemble model, were trained and validated on three benchmark datasets: MIMIC-III for acute kidney injury (AKI), NIH Chest X-rays for pneumonia, and the UCI dataset for diabetes prediction. Data preprocessing included normalization, dimensionality reduction using principal component analysis (PCA), and synthetic oversampling to address class imbalance. Evaluation metrics comprised accuracy, sensitivity, specificity, F1-score, area under the receiver operating characteristic (ROC) curve (AUC), and Brier score for calibration. The ensemble model achieved the highest mean AUC (0.90 on external validation) and better calibration (Brier score ≈ 0.14) than the individual models. Statistical analysis using the DeLong and McNemar tests confirmed that the ensemble's improvement over the baseline models was statistically significant (p < 0.05). Explainability methods such as SHAP and LIME were integrated to highlight clinically relevant features (creatinine change, urine output, and baseline eGFR), corroborating established risk factors and enhancing interpretability. The study concludes that ensemble-based, interpretable ML frameworks can achieve high predictive accuracy and clinical reliability when supported by balanced data handling and rigorous external validation. Practical recommendations emphasize the need for multi-modal data integration, standardized AI governance, model transparency, and periodic recalibration before real-world deployment.
Overall, the findings reinforce that responsible machine learning, grounded in methodological rigor and explainable design, can substantially advance early disease detection, thereby improving prognosis, reducing treatment burden, and supporting proactive, data-driven clinical care.
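The abstract names its evaluation metrics and the McNemar test without giving formulas. The following is a minimal, standard-library-only Python sketch of how these quantities are commonly computed for a binary classifier; it is not the authors' code, and the function names, the 0.5 decision threshold, and the toy data are illustrative assumptions. AUC is computed from its rank interpretation (the probability that a randomly chosen positive case scores above a randomly chosen negative one), and McNemar uses the continuity-corrected statistic. The DeLong comparison of correlated AUCs is not reproduced here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, specificity and F1 after thresholding probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the diseased class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on the healthy class
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


def roc_auc(y_true, y_prob):
    """AUC = P(random positive outscores random negative); ties count 1/2.

    Assumes both classes are present in y_true.
    """
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def mcnemar(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar test for two classifiers on the same cases."""
    triples = list(zip(y_true, pred_a, pred_b))
    n01 = sum(1 for t, a, b in triples if a == t and b != t)  # only model A correct
    n10 = sum(1 for t, a, b in triples if a != t and b == t)  # only model B correct
    if n01 + n10 == 0:
        return 0.0, 1.0  # the two models never disagree
    stat = max(abs(n01 - n10) - 1, 0) ** 2 / (n01 + n10)
    # Chi-square survival function with 1 df: P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical toy data for illustration only, not from the study.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print(threshold_metrics(y_true, y_prob))
print(roc_auc(y_true, y_prob))      # 0.75
print(brier_score(y_true, y_prob))  # approx. 0.158
```

Sensitivity, specificity, and F1 depend on the chosen threshold, whereas AUC and the Brier score use the raw probabilities; this is why the abstract can report discrimination (AUC) and calibration (Brier score) separately from threshold-based accuracy.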

Pages: 87-92

How to cite this article:
Liwen Zhang and Chenhao Liu. Machine learning models for early disease detection in healthcare. J. Mach. Learn. Data Sci. Artif. Intell. 2025;2(2):87-92.