Abstract:
Pulmonary diseases, including Tuberculosis (TB), can be fatal, and millions of people are infected by them each year. Timely and accurate detection of pulmonary diseases can save millions of lives around the globe. Chest x-ray (CXR) is considered the foremost diagnostic radiology technique for the initial screening of pulmonary diseases. Moreover, it is adopted worldwide and is available in even the remotest corners of the world because of its non-invasive nature, simple procedure, accessibility and economic feasibility. On the other hand, CXR interpretation can be difficult and time-consuming, which becomes a problem when radiologists need to interpret thousands of CXRs. To cope with this, the radiologists' workload can be shared with Computer-Aided Diagnostic (CAD) systems. CAD systems can perform the routine processing and present disease-specific information more clearly, helping the radiologist make quicker and more appropriate decisions and thereby saving the radiologist's time. Most existing CAD systems are based on research using the Montgomery County (MC), Shenzhen (SH) and Japanese Society of Radiological Technology (JSRT) datasets, which contain only a few hundred images each, and thus cannot be generalized to a large scale, while very few of them have utilized the National Institutes of Health (NIH) CXR dataset, which contains more than 112,000 CXR images. The proposed research work presents a hierarchical custom Convolutional Neural Network (CNN) model for (i) improved pulmonary disease identification and classification, and (ii) automated report generation by learning physicians' reports against CXR images using Natural Language Processing (NLP), transformers and Recurrent Neural Networks (RNNs). The proposed hierarchical model is based on the NIH CXR dataset, the Indiana University (IU) dataset and the locally gathered HealthWays dataset.
In the first approach, pulmonary disease classification, the proposed hierarchical model is applied to the NIH CXR dataset in several ways: distinguishing healthy from infected images, distinguishing healthy from TB-infected images with an F1 score of 0.92, classifying TB-specific class labels with an average accuracy of 0.84, and classifying 14 thoracic disease class labels with an average accuracy of 0.82. The model is evaluated on the benchmark split of the NIH CXR dataset for the 14 thoracic disease classifications and shows improved classification performance, surpassing the results of state-of-the-art methods. In the second approach, CXR classification using automated report generation, the proposed hierarchical model is trained on the IU dataset using medical reports and CXR images. The medical reports serve as ground truth alongside the CXR images, and transformers and RNNs are used for automated report generation. After training, the proposed model generates comparable reports on its own by analyzing only the CXR and then classifies the image accordingly. Finally, the hierarchical model is evaluated on the locally gathered HealthWays dataset to find localized patterns of pulmonary diseases. The proposed hierarchical model contributes to more accurate identification of pulmonary diseases using CXR images.