Assignment 3 B: Machine Learning Model Training 2
• Two multi-part, multiple-choice questions with short-answer questions
• AI in Healthcare with Phase 3 data set from (HTML file)
• Details of the Q1 & Q2 m/c questions and short answer questions are shown in the attached question sheet
• Lecture notes on Machine Learning in Healthcare for your reference
Phase 3: Model Training, Part 2
Welcome to Phase 3 of the capstone project. This section is the second of two parts that concern the model training process of the model development cycle. You continue to play the role of a bioinformatics professor. The questions relate to the various challenges faced by the teams working on the two projects introduced in the first section.
You have made recommendations (based on your answers in the prior phase) to both of the research teams. They have taken your suggestions into account, and have since refined their results. Both teams have e-mailed you summaries of recent progress, which are shown below.
Project 1: CXR-based COVID-19 Detector
Hi,
Thank you for your excellent feedback. We are now facing the opposite problem: our model is memorizing the training data and failing to generalize to new, unseen data. As a recap, below are the changes we’ve implemented since our last check-in.
We re-split the data into a training, validation, and test set. We are placing 80% of the data into the training set, 10% of the data into the validation set, and 10% of the data into the test set. We split the data by patient this time, to prevent patient overlap. We are now evaluating the model using the validation set.
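A patient-level split of this kind might look like the sketch below. It assumes each exam carries a patient identifier. The function name `split_by_patient`, the exam and patient IDs, and the fixed seed are illustrative assumptions, not the team's actual code. Because whole patients are assigned to a set, the exam counts are only approximately 80/10/10 when patients have different numbers of exams.

```python
import random
from collections import defaultdict


def split_by_patient(exam_to_patient, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole patients to train/val/test so no patient spans two sets.

    exam_to_patient: dict mapping a (hypothetical) exam ID to its patient ID.
    Returns a dict with keys 'train', 'val', 'test', each a list of exam IDs.
    """
    # Group exams under the patient they belong to.
    exams_by_patient = defaultdict(list)
    for exam_id, patient_id in exam_to_patient.items():
        exams_by_patient[patient_id].append(exam_id)

    # Shuffle patients (not exams); the fixed seed is only for reproducibility.
    patients = sorted(exams_by_patient)
    random.Random(seed).shuffle(patients)

    # Cut the shuffled patient list at the 80% and 90% marks.
    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    return {
        name: [e for p in members for e in exams_by_patient[p]]
        for name, members in groups.items()
    }


# Toy data (made-up IDs): 100 patients with 3 exams each.
toy = {f"exam_{p}_{k}": f"patient_{p}" for p in range(100) for k in range(3)}
splits = split_by_patient(toy)
```

Checking that the sets of patient IDs in the three splits are pairwise disjoint is a quick way to confirm there is no patient overlap.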
We tried out your suggestion to upsize the images from 224 by 224 pixels to 512 by 512 pixels in order to retain some of the fine-grained resolution while keeping the memory requirements manageable. We adapted the first few layers of the model architecture to accommodate this change.
We eased back on the data augmentation. Now we do a simple horizontal flip and incorporate only a slight amount of zoom.
Here are our new training curves from our model. Per your recommendation, we’ve begun oversampling the COVID-positive exams in the training set. It was helpful, and we’re starting to see some real learning occurring. However, as you can see, the loss for the training set is now far lower than that of the validation set.
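For reference, oversampling the positive class could be sketched as follows. This assumes binary labels (1 = COVID-positive) and resamples positives with replacement until the two classes are the same size. The function name `oversample_positives` and the 1:1 target ratio are assumptions for illustration; the team's actual ratio is not stated.

```python
import random


def oversample_positives(labels, seed=0):
    """Return training-set indices in which positives are resampled
    (with replacement) until they match the number of negatives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Nothing to do if there are no positives or the classes are already balanced.
    if not pos or len(pos) >= len(neg):
        return list(range(len(labels)))
    # Draw extra copies of positive examples to close the gap.
    extra = [rng.choice(pos) for _ in range(len(neg) - len(pos))]
    indices = neg + pos + extra
    rng.shuffle(indices)
    return indices


# Toy labels with the same 90/10 imbalance pattern (made-up data).
labels = [0] * 90 + [1] * 10
balanced = oversample_positives(labels)
n_pos = sum(labels[i] for i in balanced)
```

Resampling of this kind is normally applied to the training set only, so that validation and test metrics still reflect the real class balance.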
Now that our model is training, we are tracking both the AUROC and accuracy of the model during training on the training and validation sets. Here is the model’s AUROC over time:
On the epoch where the model achieves the highest validation set AUROC, we see a 0.846 AUROC on the training set and 0.692 AUROC on the validation set. However, when we visualize the accuracy of the model, we get a very different story:
On the epoch where the model achieves the highest validation set accuracy, the model attains an accuracy of 0.912 on the training set and 0.914 on the validation set. We’re not sure why its accuracy is so high. We double-checked the code and there don’t seem to be any bugs in the program.
The model is certainly performing better than it was before, but I think there are still some bugs to work out. Let me know if you have any suggestions, thanks.
Project 2: EHR-based Intubation Predictor
Hi,
Thank you for your guidance. Your suggestions were much needed and have allowed us to make significant progress.
We are now using the 40,000 exams from the “COVID-like” dataset as our training and validation sets. We are using the 3,000 exams from the COVID dataset as our test set.
Specifically, we are splitting the “COVID-like” dataset such that 70% of exams are in the training set and the remaining 30% are in the validation set. We are planning on using 10-fold cross-validation on the training set in order to choose the best hyperparameters. Once we have those, we plan on training the model on the full training set with early stopping in order to produce our final model.
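The fold construction described above can be sketched as below. The helper names `k_fold_indices` and `cross_validate`, and the user-supplied `score_fn`, are hypothetical; the sketch only builds index lists and averages a score over folds. It is not the team's pipeline.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx


def cross_validate(score_fn, n_samples, k=10):
    """Average a user-supplied score_fn(train_idx, val_idx) over the k folds."""
    scores = [score_fn(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# 70% of the 40,000 "COVID-like" exams form the training set.
n_train = 40_000 * 7 // 10
folds = list(k_fold_indices(n_train, k=10))
```

In use, `cross_validate` would be called once per candidate hyperparameter setting, with `score_fn` training a model on `train_idx` and scoring it on `val_idx`. The setting with the best average score is then used for the final training run.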
We are training both logistic regression models and random forest models. As always, let us know if you have any feedback or questions, thanks!
In the following quiz, you will answer questions that examine the issues of Team 1, as well as conceptual questions regarding the approach of Team 2.
Phase III Model Training (Data File Phase 3 HTML)
Click on the Phase 3 Model Training Scenario file (copy from HTML data) to read the scenario. After reading through the case, please review the following questions.
Select your choice by using bold, size 14, and explain your choice in a few short sentences. I have indicated my first choice in bold, size 12.
Keep the HTML scenario file open so that it is easier for you to look for the information questioned in the quiz/exercise.
Q 1
Part 1.
What learning phenomenon is the team observing now?
Convergence
Generalization
Overfitting
Underfitting
Part 2. (a)
What are some techniques that can be applied in order to improve generalization performance?
Check all that apply.
Weight decay (L2 regularization)
Increasing the number of model parameters
Dropout
Stronger data augmentation
Early stopping
Part 2. (b)
Sometimes, overfitting is attributed to the task being “too hard.” Given what we know about model behavior during overfitting, how can this explanation be justified?
Answer this in 5 sentences or fewer:
Part 3.
What is weight decay?
An added penalty in the loss function that encourages semantic clustering in feature space
An added penalty in the loss function that mitigates class imbalance
An added penalty in the loss function that ensures the model is well calibrated
An added penalty in the loss function that discourages models from becoming overly complex
Part 4.
What does dropout do?
Dropout randomly removes layers in the network during training in order to improve the rate of convergence
Dropout randomly removes neurons in the network during training in order to prevent overreliance on any one neuron
Dropout randomly removes layers in the network during training in order to prevent overreliance on any one layer
Dropout randomly removes neurons in the network during training in order to improve the rate of convergence
Part 5.
Which of the following are tunable hyperparameters?
Check all that apply.
Weight decay strength
Dropout probability
Learning rate
Model weights
Part 6.
The team is noticing counterintuitive results regarding the performance of the model when measured with accuracy and AUROC. What is likely occurring?
NOTE: There are 27,000 COVID-negative exams and 3,000 COVID-positive exams, a breakdown of 90% negative cases and 10% positive cases.
Accuracy is a poor metric for performance because of the small number of samples in the test set
Accuracy is a poor metric for performance because of the high class imbalance
AUROC is a poor metric for performance because they have a predetermined threshold in mind
AUROC is a poor metric for performance because it can only be used in multi-class settings
Part 7.
Further analysis shows that the model is predicting that every patient is COVID-negative. What can be done to mitigate this effect?
Check all that apply.
Using dropout during training to improve performance on the test set
Undersampling COVID-positive exams during training
Upweighting the loss for COVID-positive exams during training
Lowering the learning rate to improve convergence
Q 2
Part 1.
What is a pro of using k-fold cross-validation instead of a hold-out validation set for hyperparameter tuning?
Improves model convergence rates because many hyperparameters can be tested at the same time
Regularizes the model by randomly selecting training examples automatically
Requires less overall time to train a model, due to the reduced number of training samples
Produces a more reliable estimate of the generalization performance of the model
Part 2.
What is a con of using k-fold cross-validation instead of a hold-out validation set for hyperparameter tuning?
Increases the number of parameters in the overall model, which leads to overfitting
Decreases model generalization performance because the model is able to learn on the test set
Requires more overall time to train a model, due to the repeated training runs associated with each experiment
Increases the overall memory requirements of the model during training, due to the higher number of samples seen during training
Part 3.
What are common criteria used for early stopping?
Check all that apply.
Training AUROC
Test loss
Training loss
Validation loss
Validation AUROC
Test AUROC
Part 4.
Which of the following hyperparameters are exclusive to deep learning models?
Check all that apply.
Number of layers
Dropout probability
Weight decay strength
Learning rate
Class weights (loss function)
Parts 5 to 10: short answers in 5 sentences or fewer
Part 5
In the sensitivity analysis, you identify that a prediction within 2 hours gives you a much higher AUC and PPV. Does this provide a better model to deploy? Why or why not?
Part 6
What considerations must be made when applying k-fold cross-validation?
Part 7
You recall that you have another EHR dataset composed of patients with varying respiratory illnesses. One promising direction of research could be to use this dataset as the training dataset for this project and use the COVID dataset as the evaluation dataset. What conditions must be met in order for it to be useful?
Part 8
The team is employing random zoom for the data augmentation task. In general, how should data augmentation transforms be selected?
Part 9
A colleague approaches you and suggests that it would be better if you created a model that relied only on observable features and exam metadata (patient age, gender, ethnicity, etc.). What tradeoffs must be considered when using lab values as features?
Part 10
Before using the new public COVID dataset, you want to verify that there is no PHI in the data. What are some privacy issues that could come into play with imaging data?
Class notes on Machine Learning and AI in a healthcare setting
Artificial intelligence (AI) has transformed industries around the world, and has the potential to radically alter the field of healthcare. Imagine being able to analyze data on patient visits to the clinic, medications prescribed, lab tests, and procedures performed, as well as data from outside the health system (such as social media, credit card purchases, census records, and Internet search activity logs) that contain valuable health information, and you’ll get a sense of how AI could transform patient care and diagnoses.
In this course, we’ll discuss the current and future applications of AI in healthcare with the goal of learning to bring AI technologies into the clinic safely and ethically. Here is a list of the learning objectives for a quick reference.
1) Solving the problems and challenges within the U.S. healthcare system requires a deep understanding of how the system works. Successful solutions and strategies must take into account the realities of the current system.
This course explores the fundamentals of the U.S. healthcare system. It will introduce the principal institutions and participants in healthcare systems, explain what they do, and discuss the interactions between them. The course will cover physician practices, hospitals, pharmaceuticals, and insurance and financing arrangements. We will also discuss the challenges of healthcare cost management, quality of care, and access to care. While the course focuses on the U.S. healthcare system, we will also refer to healthcare systems in other developed countries.
AI in healthcare use case: Natural language processing
When subject matter experts help train AI algorithms to detect and categorize data patterns that reflect how language is actually used in their part of the health industry, natural language processing (NLP) enables the algorithm to isolate meaningful data. This gives decision-makers the information they need to make informed care or business decisions quickly.
Healthcare payers
For healthcare payers, this NLP capability can take the form of a virtual agent using conversational AI to help connect health plan members with personalized answers at scale.
Government health and human service professionals
For government health and human service professionals, a case worker can use AI solutions to quickly mine case notes for key concepts and concerns to support an individual’s care.
Clinical operations and data managers
Clinical operations and data managers executing clinical trials can use AI functionality to accelerate searches and validation of medical coding, which can help reduce the cycle time to start, amend, and manage clinical studies.
2) This course introduces you to a framework for successful and ethical medical data mining. We will explore the variety of clinical data collected during the delivery of healthcare. You will learn to construct analysis-ready datasets and apply computational procedures to answer clinical questions. We will also explore issues of fairness and bias that may arise when we leverage healthcare data to make decisions about patient care. Only by training AI to correctly perceive information and make accurate decisions based on the information provided can you ensure your AI will perform the way it’s intended.
3) Machine learning and artificial intelligence hold the potential to transform healthcare and open up a world of incredible promise. But we will never realize the potential of these technologies unless all stakeholders have basic competencies in both healthcare and machine learning concepts and principles.
This course will introduce the fundamental concepts and principles of machine learning as it applies to medicine and healthcare. We will explore machine learning approaches, medical use cases, metrics unique to healthcare, as well as best practices for designing, building, and evaluating machine learning applications in healthcare. The course will empower those with non-engineering backgrounds in healthcare, health policy, pharmaceutical development, as well as data science with the knowledge to critically evaluate and use these technologies.
4) With artificial intelligence applications proliferating throughout the healthcare system, stakeholders are faced with both opportunities and challenges of these evolving technologies. This course explores the principles of AI deployment in healthcare and the framework used to evaluate downstream effects of AI healthcare solutions.
5) This last course includes a project that takes you on a guided tour exploring all the concepts we have covered in the different classes up to now. We have organized this experience around the journey of a patient who develops some respiratory symptoms and, given the concerns around COVID-19, seeks care with a primary care provider. We will follow the patient’s journey through the lens of the data that are created at each encounter, which will bring us to a unique de-identified dataset created specially for this specialization. The dataset spans EHR as well as image data, and using this dataset, we will build models that enable risk-stratification decisions for our patient. We will review how the different choices you make (such as those around feature construction, the data types to use, how the model evaluation is set up, and how you handle the patient timeline) affect the care that would be recommended by the model. During this exploration, we will also discuss the regulatory as well as ethical issues that come up as we attempt to use AI to help us make better care decisions for our patient. This course will be a hands-on experience of a day in the life of a medical data miner.
6) How does AI training work?
AI training starts with data. While the actual size of the dataset needed is dependent on the project, all machine learning projects require high-quality, well-annotated data in order to be successful. It’s the old GIGO rule of computer science — garbage in, garbage out. If you train your AI using poor-quality or incorrectly tagged data, you’ll end up with poor-quality AI.
Once the quality assurance phase is complete, the AI training process has three key stages:
1. Training
2. Validation
3. Testing
Keys to successful AI training
You need three ingredients to train AI well: high-quality data, accurate data annotation, and a culture of experimentation.
High-quality data
Bad data skews AI’s judgment and produces undesirable results. It can even create AI that is biased.
Accurate data annotation
Not only do you need to have plenty of high-quality data, but you must also accurately annotate it. Otherwise, your AI will have no contextual guidance to help it properly interpret the data, let alone learn from it. For example, correctly annotated images can help teach AI programs to tell the difference between suspected skin cancer and benign birthmarks.
A culture of experimentation
7) Why is it important to evaluate your machine learning algorithm?
Evaluating your machine learning algorithm is an essential part of any project. Your model may give satisfying results when evaluated using one metric, such as accuracy score, but poor results when evaluated against other metrics, such as logarithmic loss.
Choose an evaluation metric depending on your use case. Different metrics work better for different purposes. Selecting the appropriate metrics also allows you to be more confident in your model when presenting your data and findings to others. On the flip side, using the wrong evaluation metric can be detrimental to a machine learning use case. A common example is focusing on accuracy when the dataset is imbalanced.
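To make the imbalanced-dataset example concrete, the sketch below scores a trivial classifier that always predicts the negative class on made-up data with 90% negatives. The helper functions `accuracy` and `auroc` are written out here for illustration only. In practice a library implementation would normally be used. AUROC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative
    (ties count as half). Quadratic time; fine for small examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up data: 900 negatives, 100 positives.
labels = [0] * 900 + [1] * 100
# A "model" that gives every example the same low score.
always_negative = [0.0] * 1000

acc = accuracy(labels, always_negative)  # 0.9
auc = auroc(labels, always_negative)     # 0.5
```

The classifier is right 90% of the time without distinguishing the classes at all, which is why its AUROC sits at chance level.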
Healthcare privacy is a central ethical concern involving the use of big data in healthcare, with vast amounts of personal information widely accessible electronically.
Medical records and prescription data are being used, and even sold, for a variety of purposes. As long as the data is de-identified, patients’ permission isn’t needed. However, some medical ethicists argue, “There is a need for sound regulation to guide and oversee this brave new world of algorithmic-based healthcare.”