Analysis of Machine Learning Techniques for Heart Failure Readmission
In the 21st century it is inevitable that people depend on technology, which solves problems and makes life better. Nowadays many people suffer from cardiac problems, mainly because of lifestyle factors. A few points are worth knowing:
- Coronary heart disease (CHD) is the most common type of heart disease. Current methods of identifying CHD risk do not give good results.
- Risk factors such as smoking, elevated systolic blood pressure, and diabetes trigger tests such as lipid levels and the electrocardiogram (ECG), which also seem to be insensitive.
- Hence enriching the data (with genetic data or with more clinical and phenotype data) or using machine learning helps to improve the results.
- Genetic data improve results by 10%, and more detailed data of other kinds give similar gains.
Doctors often misread ECGs and other reports; patients with chest pain have been sent home after being told it was gastric pain, and some of them died. By contrast, the results produced by machine learning are easy to understand, and machine learning gives systems the ability to learn and improve automatically from experience without being explicitly programmed.
Hence the aim of this study is to predict readmission within 30 days and within 180 days of discharge, both for all causes and for heart failure specifically. Readmission places a heavy burden on patients as well as on healthcare and insurance systems. Current predictive models add to this burden because they perform poorly on clinical data, so this study uses a richer data set that includes clinical, social, and demographic variables.
ML methods work best when they can efficiently exploit the correlations among the independent variables.
Hence this study uses three ML techniques, random forest (RF), boosting, and support vector machines (SVM), and uses logistic regression (LR) as the traditional prediction model for comparison.
2. Literature Survey
We began our literature study intending to work on either brain-related or heart-related problems. Because cardiac problems are very prominent in today's world and require attention, we chose to focus on cardiac problems.
In our literature survey we found many papers on using machine learning to predict cardiac problems. We grouped this varied literature into the following categories:
- Use of various algorithms to predict heart-related problems
- Comparison of machine learning techniques for predicting heart problems
- Use of varied data sets, ranging from phenotype to genetic data
- Probability that patients are readmitted after heart failure
- Use of hierarchical models to predict heart failure
Below is a snippet of some of the papers referred to during our survey; each falls into one of the categories above.
3. Problem Identification and Problem Formulation
The literature survey produced a few findings, and these determined the problem statement and its formulation. The findings were:
- Across the literature, prediction is normally done with a traditional logistic regression model.
- The data sets used varied: some studies used clinical data, some used genomic data, and some used phenotype data.
- Standalone models are normally used; hierarchical models are used only rarely.
- Heart failure has been studied extensively, but predicting readmission is a relatively new topic in the literature.
Based on these findings, we chose to analyse the readmission of patients who were admitted for heart failure, and to examine whether standalone ML models or hierarchical models give better results. The literature also suggested that a data set without genomic data can still give good results, and we planned to report C-statistics with 95% confidence intervals rather than prediction accuracy as a percentage. This work could be extended by adding genetic data and using a neural network model to enhance performance.
4. Planning, Design and Methodology
The heart of any study is how it is planned and designed. The components of this study's plan and design are discussed below.
a) Data Source
It is very important to choose the correct data source: it should be authentic, and the instruments used must be properly validated and calibrated. In this study, data were gathered by a telemonitoring system. The important points regarding the data source are:
- Data for this study are drawn from Tele-HF. 1653 patients were enrolled within 30 days of their discharge, and their data were captured and analyzed.
- The data comprise clinical, socioeconomic, psychosocial, and health-status information, gathered as a wide array of instrumented data from comprehensive qualitative surveys.
- The Tele-HF study showed little difference between the telemonitoring and control arms.
b) Analytic Sample Selection
- All patients whose interview was completed within 30 days of discharge were included, to ensure that the information reflected the time of admission.
- Of the 1653 enrolled patients, 36 died before the interview, 574 were interviewed more than 30 days after discharge, and 39 had missing data, leaving 1004 patients for study and analysis.
- For the 180-day outcomes, another 27 patients died, leaving 977 patients for analysis.
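The exclusion arithmetic above can be checked with a few lines of Python. This is only a sketch of the cohort flow; the helper name `cohort_flow` is hypothetical and the step labels are paraphrased from the text.

```python
# Exclusion steps for the analytic sample, using the counts reported above.
def cohort_flow(start, exclusions):
    """Apply exclusions in order; return the final size and a per-step log."""
    n = start
    log = []
    for reason, count in exclusions:
        n -= count
        log.append(f"{reason}: -{count} -> {n}")
    return n, log


ENROLLED = 1653
n_30, log_30 = cohort_flow(ENROLLED, [
    ("died before interview", 36),
    ("interviewed more than 30 days after discharge", 574),
    ("missing data", 39),
])
n_180, log_180 = cohort_flow(n_30, [("died during 180-day follow-up", 27)])

for line in log_30 + log_180:
    print(line)
print(n_30, n_180)  # 1004 977
```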
c) Prediction Techniques Used
- Logistic regression is used as the traditional statistical model.
- This traditional model is compared with random forest, boosting, and support vector machine models.
- RF creates multiple decision trees, combines their predictions, and identifies the variables most important for prediction.
- Boosting harnesses weak predictors by training them one after another and combining them into a weighted ensemble.
- SVM separates the classes with the widest possible margin, a boundary defined by the support vectors.
- In this study, the ML approaches were compared against traditional logistic regression for predicting readmission for heart failure.
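To make the boosting bullet concrete, here is a minimal AdaBoost sketch that uses one-feature threshold rules ("decision stumps") as the weak predictors. AdaBoost is only one well-known boosting variant, chosen here as an assumption; the text does not say which boosting algorithm the study used, and the study's models were built in R. Each round fits the stump with the lowest weighted error, gives it a weight `alpha`, and up-weights the samples it got wrong so that the next stump focuses on them. All function names are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best one-feature rule.

    The rule predicts `sign` when row[feature] >= threshold and -sign otherwise.
    Labels y are +1/-1 and the weights w sum to 1.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] >= thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign


def adaboost(X, y, rounds=10):
    """Train AdaBoost; return a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)  # guard against log(0) for a perfect stump
        if err >= 0.5:
            break  # no better than chance: stop adding learners
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this weak learner
        model.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1


# Toy data (not patient data): label +1 only when both features are 1.
X = [[0, 0], [1, 0], [0, 1], [1, 1]]
y = [-1, -1, -1, 1]
model = adaboost(X, y, rounds=3)
print([predict(model, row) for row in X])  # [-1, -1, -1, 1]
```

No single stump can classify this AND pattern, but three weighted stumps together fit all four points, which is the "weak predictors combined into a stronger one" idea in miniature.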
d) Hierarchical Methods
- To overcome the problem of overfitting, hierarchical models were developed using random forest, where the output of the random forest is fed to logistic regression and to a support vector machine.
- The random forest used all variables to predict the probability of readmission, and these probabilities were then given to LR and SVM to improve prediction.
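A minimal sketch of the second stage of such a hierarchical model follows, assuming the random forest's predicted probability is the only input to the second-stage logistic regression (the text does not say whether other variables were also passed on). The SVM branch is omitted, and the first-stage probabilities in the example are made-up numbers. `fit_logistic_1d` is a hypothetical helper that fits the logistic regression by plain gradient descent on the log-loss.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_1d(p, y, lr=0.5, epochs=2000):
    """Fit P(readmit) = sigmoid(a + b * p) by batch gradient descent on the mean log-loss.

    p: first-stage (random forest) probabilities; y: observed 0/1 outcomes.
    """
    a, b = 0.0, 0.0
    n = len(p)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for pi, yi in zip(p, y):
            residual = sigmoid(a + b * pi) - yi  # derivative of log-loss w.r.t. the logit
            grad_a += residual
            grad_b += residual * pi
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Made-up first-stage probabilities and outcomes, for illustration only.
rf_prob = [0.10, 0.20, 0.35, 0.60, 0.70, 0.90]
readmitted = [0, 0, 1, 0, 1, 1]

a, b = fit_logistic_1d(rf_prob, readmitted)
for pi in rf_prob:
    print(f"RF probability {pi:.2f} -> stacked LR probability {sigmoid(a + b * pi):.2f}")
```

In stacked models generally, the first-stage probabilities used to train the second stage are usually produced out-of-fold (or out-of-bag for a random forest), so that the second stage does not learn from probabilities the forest computed on its own training data.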
e) Feature Selection
- In total, 472 variables were used as inputs (called features in ML).
- 276 baseline features were available from Tele-HF, including data from medical records, hospital laboratory results, physical examination, quality of life, and socioeconomic and demographic information.
- Phenotype details of patients were captured, but the study did not include genome-wide DNA methylation or genome-wide genotype data because they were not available.
- An additional 276 dummy variables were created to indicate whether each value was missing.
- Other studies have used SNP data, as shown below, but their results were similar to those obtained with phenotype data.
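The missing-value dummy variables can be sketched as follows: for every feature, a 0/1 indicator column is added and the gap itself is filled with a constant, so each original feature yields one extra indicator. The fill value of 0, the `_missing` suffix, the helper name `add_missing_indicators`, and the example feature names are all assumptions; the text does not say how missing values were imputed.

```python
def add_missing_indicators(rows, features, fill_value=0.0):
    """Return copies of `rows` with a 0/1 '<feature>_missing' column per feature.

    rows: list of dicts, where a value of None means "missing".
    Missing values are replaced by `fill_value` (an assumed constant imputation).
    """
    out = []
    for row in rows:
        new_row = {}
        for f in features:
            value = row.get(f)
            is_missing = value is None
            new_row[f] = fill_value if is_missing else value
            new_row[f + "_missing"] = 1 if is_missing else 0
        out.append(new_row)
    return out


# Hypothetical features: age (years) and systolic blood pressure (mmHg).
patients = [{"age": 71, "sbp": None}, {"age": None, "sbp": 128}]
for row in add_missing_indicators(patients, ["age", "sbp"]):
    print(row)
# {'age': 71, 'age_missing': 0, 'sbp': 0.0, 'sbp_missing': 1}
# {'age': 0.0, 'age_missing': 1, 'sbp': 128, 'sbp_missing': 0}
```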
f) Definition of Outcomes
- Four separate outcomes were predicted:
  - 30-day all-cause readmission
  - 180-day all-cause readmission
  - 30-day readmission due to heart failure
  - 180-day readmission due to heart failure
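The four outcomes can be derived from each patient's discharge date and readmission records. In this sketch a readmission counts toward a window if it occurs 1 to N days after discharge; that boundary convention, the record layout `(admission_date, is_heart_failure)`, the helper name `readmission_labels`, and the label names are all assumptions for illustration.

```python
from datetime import date


def readmission_labels(discharge, readmissions, windows=(30, 180)):
    """Return the four binary outcomes for one patient.

    readmissions: list of (admission_date, is_heart_failure) tuples.
    A readmission counts toward a window if it falls 1..window days after discharge.
    """
    labels = {}
    for window in windows:
        in_window = [is_hf for admitted, is_hf in readmissions
                     if 0 < (admitted - discharge).days <= window]
        labels[f"all_cause_{window}d"] = int(len(in_window) > 0)
        labels[f"hf_{window}d"] = int(any(in_window))
    return labels


# Hypothetical patient: a non-HF readmission on day 10 and an HF readmission on day 73.
print(readmission_labels(
    date(2010, 1, 1),
    [(date(2010, 1, 11), False), (date(2010, 3, 15), True)],
))
# {'all_cause_30d': 1, 'hf_30d': 0, 'all_cause_180d': 1, 'hf_180d': 1}
```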
The patient data for each outcome are shown below.
5. Implementation Approach
- The approach used is shown in the figure; it was run 100 times for robust results.
- After each run, the C-statistic (a measure of the model's discrimination) was reported with a 95% confidence interval.
- For each iteration, positive predictive value, sensitivity, specificity, and F-score were calculated in the same way as the C-statistic.
- The models were developed in R, and the study was approved by the Yale School of Medicine.
- The model is shown in the accompanying figure.
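The evaluation quantities listed above can be sketched in Python (the study itself used R). The C-statistic is computed as the probability that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen non-readmitted one, which equals the area under the ROC curve. The 0.5 classification threshold for PPV, sensitivity, specificity, and F-score is an assumption. The study repeated its modeling 100 times; here that repetition is imitated by bootstrap-resampling one hypothetical set of predictions, with the 95% interval read off roughly the 2.5th and 97.5th percentiles of the repeated results. Both choices, and all function names, are assumptions rather than the study's documented procedure.

```python
import random
import statistics


def c_statistic(scores, labels):
    """Share of (positive, negative) pairs in which the positive scores higher; ties count 0.5.

    This equals the area under the ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(scores, labels, threshold=0.5):
    """PPV, sensitivity, specificity and F-score after classifying score >= threshold as positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(preds, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f_score = (2 * ppv * sensitivity / (ppv + sensitivity)
               if ppv + sensitivity else 0.0)
    return {"ppv": ppv, "sensitivity": sensitivity,
            "specificity": specificity, "f_score": f_score}


def repeated_c_statistic(scores, labels, runs=100, seed=0):
    """Mean C-statistic and approximate central 95% range over `runs` bootstrap resamples."""
    rng = random.Random(seed)
    n = len(scores)
    results = []
    while len(results) < runs:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes to have a C-statistic
            results.append(c_statistic([scores[i] for i in idx], ys))
    results.sort()
    low = results[int(0.025 * (runs - 1))]
    high = results[int(0.975 * (runs - 1))]
    return statistics.mean(results), (low, high)


# Hypothetical predicted readmission risks and observed outcomes (1 = readmitted).
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.2, 0.6, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(c_statistic(scores, labels))        # 0.875
print(threshold_metrics(scores, labels))  # every metric is 0.75 here
print(repeated_c_statistic(scores, labels))
```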
The literature shows that ML methods have the capacity and potential to improve the prediction of heart failure and readmission. If the algorithms are used efficiently and the data are pruned appropriately, the model can learn the relationships among the independent variables and in turn produce effective results. In this way, prediction based on traditional methods can be superseded by effective ML models. Future work needs to focus on further improving predictive ability through advanced methods and more judicious data, so as to facilitate better targeting of interventions to subgroups of patients at highest risk for adverse outcomes.