Abstract:
Thalassemia is a hereditary blood disorder that affects millions of people worldwide, and early identification of carriers is needed to reduce its prevalence and associated complications. Carriers often show no symptoms themselves but can pass the genetic mutation to their children, potentially causing thalassemia in future generations. Detecting carriers early enables important interventions such as genetic counseling, family planning, and education on thalassemia risk. In recent years, machine learning algorithms have become valuable tools in healthcare, capable of analyzing large datasets for predictive insights. This thesis explores the use of machine learning to identify thalassemia carriers from Complete Blood Count (CBC) results. The project involves data collection, preprocessing, feature selection, and model training. Specifically, we prioritize the features most relevant to thalassemia detection, including the Mentzer index, and use a Random Forest model to detect carriers. Model performance will be evaluated rigorously using appropriate metrics for reliability and accuracy. The outcomes of this study have the potential to contribute significantly to the field of thalassemia diagnosis and management. An accurate and efficient machine learning model based on CBC results would give clinicians and researchers insights that could improve patient outcomes and inform future research and treatment strategies. This research could ultimately lead to better-targeted interventions and personalized care for individuals affected by thalassemia.

Keywords: Genetic counseling, Complete Blood Count (CBC), feature selection, Mentzer index, Random Forest Model, patient outcomes, personalized care.
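
As an illustration of the Mentzer index mentioned above, the following minimal Python sketch derives it from two CBC values. The index is conventionally defined as mean corpuscular volume (MCV, in fL) divided by red blood cell count (RBC, in millions per microliter), with values below 13 commonly taken to suggest thalassemia trait. The function names, the record keys `mcv` and `rbc`, and the use of 13 as a fixed cutoff are assumptions made for this example and are not taken from the thesis pipeline.

```python
def mentzer_index(mcv_fl: float, rbc_millions_per_ul: float) -> float:
    """Return the Mentzer index: MCV (fL) divided by RBC (millions/uL)."""
    if rbc_millions_per_ul <= 0:
        raise ValueError("RBC count must be positive")
    return mcv_fl / rbc_millions_per_ul


def add_mentzer_feature(record: dict) -> dict:
    """Return a copy of a CBC record with Mentzer-derived fields added.

    The keys 'mcv' and 'rbc' are hypothetical names chosen for this sketch.
    """
    enriched = dict(record)  # copy so the caller's record is not mutated
    index = mentzer_index(record["mcv"], record["rbc"])
    enriched["mentzer_index"] = index
    # Assumed conventional cutoff: an index below 13 suggests thalassemia trait.
    enriched["mentzer_below_13"] = index < 13
    return enriched
```

The derived fields could then be supplied alongside the other CBC measurements as input features to a classifier such as the Random Forest model; whether the index is used as a raw value, a thresholded flag, or both is a design choice left open here.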