Medical Imaging

In this article, we look at what medical imaging is, its different applications and use cases, and how artificial intelligence and deep learning are helping the healthcare industry move towards earlier and more accurate diagnosis. We review the literature on how machine learning is being applied in different areas of medical imaging and, at the end, implement a binary classifier to diagnose diabetic retinopathy.

What is Medical Imaging?

Medical imaging consists of a set of processes or techniques for creating visual representations of the interior of the body, such as organs or tissues, for clinical purposes: monitoring health and diagnosing and treating diseases and injuries. It also helps build a database of anatomy and physiology.

Owing to advances in the field, medical imaging can now capture information about the human body for many useful clinical applications. Different types of medical imaging technology give different information about the area of the body being studied or treated.

Organisations that use medical imaging devices include freestanding radiology and pathology facilities as well as clinics and hospitals. Major manufacturers of these devices include Fujifilm, GE, Siemens Healthineers, Philips, Toshiba, Hitachi and Samsung. With the growing use of medical imaging, the global market for these devices is estimated to reach around $48.6 billion by 2025, up from an estimated $34 billion in 2018.

Why is it so important?

Diagnostic imaging is regarded as an important means of confirming, assessing and documenting many diseases and ailments. High-quality imaging improves medical decision making and can reduce unnecessary medical procedures. For example, some surgical interventions can be avoided when imaging technologies such as ultrasound and MRI are available.

Earlier, diagnosis often required exploratory procedures to investigate problems in elderly people, children with chronic pain, or suspected early diabetes and cancer. With the advent of medical imaging, vital health information can be obtained easily and repeatedly, which helps diagnose illnesses such as pneumonia, cancer, internal bleeding, brain injuries and many more.

A study by the National Bureau of Economic Research shows that life expectancy increases with increased use of medical imaging. A basic inference is that diagnosis and treatment guided by medical imaging can avoid invasive and life-threatening procedures, which minimises the risk those procedures carry and reduces their cost and duration. A study by Harvard researchers concluded that $385 spent on medical imaging saves approximately $3000, i.e. one day of hospital stay.


Moreover, breast cancer diagnostics through medical imaging has helped medical professionals prescribe medication earlier, which has reduced breast cancer mortality by 22% to 34%. Similarly, early medication to stop blood clotting has resulted in a 20% reduction in death rates from colon cancer. Early detection through effective medical imaging therefore gives doctors the opportunity to diagnose ailments early and gives patients a better chance of living longer.

Medical imaging is an ever-changing technology. With advances in computer vision, medical imaging is improving steadily, and its benefits will keep growing as more computer vision researchers and medical professionals work together to advance it.

Who does it and for whom?

Doctors perform medical imaging to determine the state of an organ and which treatments are required for recovery. The choice of imaging technique depends on the body part being examined and the patient’s health concern. For techniques that use radiation, patients are checked beforehand to confirm that their body can tolerate it, the lowest possible dose is used, and proper shielding protects other body parts from exposure.

The end users of medical imaging are patients, doctors and computer vision researchers as explained below:

  • Doctors use it to study organs and suggest treatment schedules, and they keep the visual data in their library for future reference in other medical cases.
  • Patients are the end users of the treatments chosen on the basis of conclusions drawn from the captured images.
  • Computer vision researchers, together with doctors, can label an image dataset with the type and severity of the medical condition. Traditional image processing or modern deep learning based approaches can then capture the underlying patterns, which has high potential to speed up inference from medical images.

Medical Image Analysis

Different kinds and their corresponding approaches

Medical imaging is a part of biological imaging and incorporates radiology, which includes the following technologies:

Radiography : One of the first imaging techniques used in modern medicine. It uses a wide beam of X-rays to view non-uniformly composed material. The resulting images help assess the presence or absence of disease, damage or foreign objects.

Two forms of radiographic images are used in medical imaging:

  1. Fluoroscopy : Produces real-time images of internal body parts but requires a constant input of X-rays at a lower dose rate. It is therefore mainly used in image-guided procedures that demand continuous feedback.
  2. Projectional Radiography : The more commonly used form of X-ray imaging, used to determine the type and extent of a fracture and pathological changes in the lungs. It is also used to visualise internal areas around the stomach and intestine and can therefore help diagnose ulcers and certain types of colon cancer.

MRI – Magnetic Resonance Imaging : An MRI scanner uses powerful magnets to polarise the hydrogen nuclei of water molecules in human tissue and emits radio frequency pulses at the resonant frequency of those nuclei to excite them. MRI involves neither X-rays nor any other ionising radiation. It is widely used in hospitals and is often seen as a better choice than a CT scan because it supports diagnosis without exposing the body to ionising radiation. On the other hand, MRI scans take longer and are louder, and people with medical implants or non-removable metal inside the body may not be able to undergo an MRI scan safely.

Ultrasound : Ultrasound uses high-frequency broadband sound waves in the megahertz (MHz) range that are reflected by tissue to varying degrees to produce images, including 3D images. It is most commonly associated with imaging the foetus in pregnant women, but it is also used for imaging abdominal organs, the heart, breasts, muscles, tendons, arteries and veins. It provides less anatomical detail than CT or MRI scans. Its major advantage is that it shows the function of moving structures in real time without emitting any ionising radiation. It is very safe, quick to perform without adverse effects, and relatively inexpensive.

Endoscopy : Endoscopy uses an endoscope that is inserted directly into the body to examine a hollow organ or cavity. The type of endoscope depends on the site to be examined, and the procedure can be performed by a doctor or a surgeon. Endoscopy is used to examine the gastrointestinal tract, respiratory tract, ear, urinary tract, etc. The main risks of the procedure are infection, over-sedation, perforation, tearing of the lining and bleeding.

Thermography : Thermographic cameras detect the long-wave infrared radiation emitted by the body and create thermal images from it. The amount of radiation increases with temperature, so thermography helps reveal variations in temperature. It can capture moving objects in real time. However, thermographic cameras are quite expensive, and objects with widely varying temperatures might not be imaged accurately.

Nuclear Medicine Imaging : This type of medical imaging is done by administering radio-pharmaceuticals internally. External gamma detectors then capture the radiation emitted by the radio-pharmaceuticals and form images from it. This is the opposite of X-ray imaging, where radiation passes through the body from outside; here the gamma rays are emitted from inside the body. Although the radiation doses are small, there is still a potential risk.

Tomography : Single photon emission computed tomography (SPECT) is a tomographic technique that uses gamma rays for medical imaging. A gamma-emitting radioisotope is injected into the bloodstream. SPECT can be used for any gamma imaging study and is especially helpful for imaging tumors, leukocytes, the thyroid and bones.

We have discussed the important techniques above, but there are many more medical imaging techniques that provide solutions in various medical cases. Techniques such as electroencephalography (EEG), magnetoencephalography (MEG) and electrocardiography (ECG), which produce data as graphs over time, contain important information about the body but are not directly considered part of medical imaging.

Deep Learning for Medical Imaging

Why Deep Learning over traditional approaches

Healthcare is a high-priority sector where the majority of medical data is interpreted by medical experts. Interpretation of medical images is limited to specific experts owing to its complexity, the variety of parameters involved and, most importantly, the core subject knowledge required.

As mentioned in the section on medical imaging techniques above, advances in image acquisition devices have reduced the challenge of data collection over time. We are therefore in an age of rapid growth in medical image acquisition, as well as in challenging and interesting analysis of those images. As the amount of data increases, so does the burden on the medical experts examining it, and with it the probability of human error.

Moreover, traditional machine learning struggles with such healthcare problems owing to the complexity and importance of the subject:

  • The best we had until recently were traditional machine learning applications in computer vision, which relied heavily on features crafted by medical experts, the subject matter experts of the field concerned.
  • This is a labour-intensive process, as data varies from patient to patient and its interpretation varies with the experience of the medical expert. Traditional learning methods were therefore not reliable.
  • Earlier machine learning algorithms such as Logistic Regression, Support Vector Machines (SVMs), K-Nearest Neighbours (KNNs) and Decision Trees operated on the image data without learning any hidden representations.
  • Moreover, preprocessing was based on knowledge provided by the medical expert, which was very time consuming.
  • As mentioned above, image acquisition devices such as X-ray, CT and MRI scanners have improved over time and can capture high-resolution internal images, but automated image interpretation remains hard to achieve.

On the other hand, deep learning in computer vision has shown great progress in capturing hidden representations and extracting features from them. This feature extraction improves with better data and supervision, so much so that it can help a physician diagnose efficiently. Deep learning techniques include algorithms such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks and Generative Adversarial Networks (GANs), which do not require manual feature engineering on the raw data.
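
To make the contrast concrete, here is a minimal sketch of what a handcrafted feature can look like: a fixed Sobel edge kernel slid over an image. The toy images and the function name `correlate2d_valid` are assumptions made for illustration and do not come from any of the studies discussed here. In a CNN the same sliding operation is used, but the kernel values are learned from data instead of being fixed by hand.

```python
def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A fixed, hand-designed kernel (Sobel) that responds to vertical edges.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# Toy 4x4 images (illustrative only): one with a dark/bright boundary, one uniform.
edge_image = [[0, 0, 1, 1] for _ in range(4)]
flat_image = [[1, 1, 1, 1] for _ in range(4)]

edge_response = correlate2d_valid(edge_image, SOBEL_X)
flat_response = correlate2d_valid(flat_image, SOBEL_X)
print(edge_response)  # non-zero where the intensity changes
print(flat_response)  # zero on the uniform image
```

A handcrafted pipeline would stack many such fixed filters chosen by an expert; a CNN starts from random kernels and adjusts them during training.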

[Literature Review] Deep Learning for Healthcare

The main purpose of image-based diagnosis is to identify abnormalities, and deep learning offers efficient, state-of-the-art methods for doing so. Applications of deep learning in the healthcare industry address a variety of problems, ranging from disease diagnostics to suggestions for personalised treatment.

Various radiological imaging methods have generated a good amount of data, but we are still short of valuable, usable data that deep learning models can incorporate. Let’s discuss some of the medical imaging breakthroughs achieved using deep learning:

Diabetic Retinopathy  

Diabetes mellitus is a metabolic disorder with two main types: Type 1, in which the pancreas cannot produce insulin, and Type 2, in which the body does not respond properly to insulin. Both lead to high blood sugar. Diabetic retinopathy is an eye disorder caused by diabetes that can result in permanent blindness as the disease becomes more severe.

According to the World Health Organisation (WHO):

  • The number of people with diabetes rose from 108 million in 1980 to 422 million in 2014.
  • The disease is increasing in low- and middle-income countries.
  • Diabetes is a major cause of blindness, kidney failure, heart attacks, stroke and lower limb amputation.
  • In 2016, approximately 1.6 million deaths were directly caused by diabetes, and another 2.2 million deaths were attributable to high blood glucose in 2012.
  • Diabetic retinopathy is an important cause of blindness, and occurs as a result of long-term accumulated damage to the small blood vessels in the retina. 2.6% of global blindness can be attributed to diabetes.

Diabetic retinopathy can be controlled and treated if it is diagnosed at an early stage through a retinal screening test. Manual detection of diabetic retinopathy is time consuming owing to the limited availability of equipment and the expertise required for the test. Because the disease shows no symptoms at an early stage, ophthalmologists need a good amount of time to analyse the fundus images, which in turn delays treatment.

  • Deep learning based automated detection of diabetic retinopathy has shown promising results. Gulshan et al. applied a Deep Convolutional Neural Network (DCNN) to the classification of Referable Diabetic Retinopathy (RDR), i.e. moderate or worse diabetic retinopathy, on the following two datasets: the Eye Picture Archive Communication System (EyePACS-1) dataset, with 9963 images from 4997 patients, and the Messidor-2 dataset, with 1748 images from 874 patients. The algorithm devised by Gulshan et al. was reported to achieve 97.5% sensitivity and 93.4% specificity on EyePACS-1, and 96.1% sensitivity and 93.9% specificity on Messidor-2. The specific neural network used in this work is the Inception-v3 architecture proposed by Szegedy et al.
Inception-v3 architecture proposed by Szegedy et al.
  • The Google AI team worked closely with doctors in both India and the US to create a dataset of 128,000 images, each of which was evaluated by 3-7 ophthalmologists from a panel of 54 ophthalmologists. This dataset was used to train a deep neural network to detect RDR. The algorithm’s performance was then tested on two separate validation sets (totalling approx. 12,000 images), with the decision of a panel of 7 or 8 U.S. board-certified ophthalmologists serving as the reference standard. The performance matched that of the ophthalmologists: on the validation set the algorithm achieved an F1-score of 0.95, better than the median F1-score of 0.91 for the 8 ophthalmologists who were consulted for the research.
Examples of retinal fundus photographs that are taken to screen for DR. The image on the left is of a healthy retina (A), whereas the image on the right is a retina with referable diabetic retinopathy (B) due to a number of haemorrhages (red spots) present.
  • Kathirvel et al. trained a DCNN with dropout on the publicly available Kaggle, DRIVE and STARE datasets to classify affected and healthy fundus images, and reported an accuracy of 94%. The Kaggle dataset includes clinician-labelled images across 5 classes: No DR, Mild, Moderate, Severe and Proliferative DR.
DCNN Architecture used by Kathirvel et al
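
The studies above report sensitivity, specificity and F1-score. As a reminder of how these relate to the raw counts of a binary screening test, here is a small sketch. The counts used are hypothetical and chosen only for illustration; they are not taken from any of the papers, and the function name `screening_metrics` is our own.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute common screening metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of diseased eyes that were flagged
    specificity = tn / (tn + fp)  # share of healthy eyes that were cleared
    precision = tp / (tp + fp)    # share of flagged eyes that were diseased
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for 1000 screened images (illustration only).
metrics = screening_metrics(tp=90, fp=20, tn=880, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that sensitivity and specificity can both be high while precision, and therefore F1, stays lower when the disease is rare, which is why papers often report more than one of these numbers.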

With these advancements, automated diabetic retinopathy screening methods with strong metrics have the potential to help doctors evaluate more patients and speed up diagnosis, which in turn can reduce delays in treatment. Google is working with doctors and researchers to streamline the screening process across the world, in the hope that these methods can bring maximum benefit to both patients and doctors. It is also working with the FDA and other regulatory agencies to further evaluate these technologies in clinical studies, so that they can become a standard part of the procedure.

Histological and Microscopic Elements Detection

Histological analysis is the study of cells, groups of cells and tissues. Microscopic imaging technology and stains are used to detect the microscopic changes occurring at the cellular and tissue level. The process involves fixation, sectioning, staining and optical microscopic imaging.

Microscopic imaging is used for diseases such as squamous cell carcinoma, melanoma, gastric carcinoma, gastric epithelial metaplasia, breast carcinoma, malaria, intestinal parasites, etc. Parasites of the genus Plasmodium are the cause of malaria, and microscopic imaging is the standard method for detecting them in blood smear samples. Tuberculosis is caused by mycobacteria, which are detected in sputum. Smear microscopy with fluorescent auramine-rhodamine stain or Ziehl-Neelsen stain is the standard method for tuberculosis diagnosis.

  • In 2016, the Department of Computer Science at the University of Warwick released the CRCHistoPhenotypes – Labeled Cell Nuclei Data. The data includes 100 H&E stained histology images of colorectal adenocarcinomas, in which a total of 29,756 nuclei were labelled for detection. Of these, 22,444 nuclei also have an associated class label: epithelial, inflammatory, fibroblast or miscellaneous.
  • Sirinukunwattana et al. used the CRCHistoPhenotypes dataset to train deep learning models for the detection and classification of nuclei in colon cancer histology images. The results favoured the proposed spatially-constrained CNN for nucleus detection and the softmax CNN with the proposed neighbouring ensemble predictor for nucleus classification. The combination of the two has the potential to benefit the analysis of tissue morphology and tissue constituents, and could eventually become a useful tool for better understanding the tumor microenvironment.
Example patches of different types of nuclei found in the dataset used by Sirinukunwattana et al, 1st Row – Epithelial nuclei, 2nd Row – Inflammatory nuclei (from left to right, lymphocyte, plasma nucleus, neutrophil, and eosinophil), 3rd Row – Fibroblasts, 4th Row – Miscellaneous nuclei (from left to right, adipocyte, endothelial nucleus, mitotic figure, and necrotic nucleus).
  • Bayramoglu et al. addressed the limited availability of training data. Their research examines whether transfer learning can reduce the hurdle posed by scarce data and still yield a good hidden representation.
  • Quinn et al. employed a setup to capture images of blood smears, sputum samples and stool samples. Experts drew bounding boxes around each object of interest in every image. In thick blood smear images, plasmodium were annotated (7245 objects in 1182 images); in sputum samples, tuberculosis bacilli were annotated (3734 objects in 928 images); and in stool samples, the eggs of hookworm, Taenia and Hymenolepis nana were annotated (162 objects in 1217 images). Deep Convolutional Neural Networks (DCNNs) were trained separately for each task, which resulted in high AUCs of 1.00 for plasmodium (malaria) detection, 0.99 for tuberculosis bacilli and 0.99 for hookworm detection. The DCNN used in this work comprised four hidden layers:
  1. Convolution layer: 7 filters of size 3 × 3.
  2. Pooling layer: max-pooling, factor 2.
  3. Convolution layer: 12 filters of size 2 × 2.
  4. Fully connected layer, with 500 hidden units.
ROC and precision-recall for malaria, tuberculosis and intestinal parasites detection, showing Area Under Curve (AUC) and Average Precision (AP)
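
To get a feel for how small the four-layer network listed above is, the sketch below traces the feature-map shapes and parameter counts through the layers. Several details are assumptions that the list does not state: a 20 × 20 input patch with 3 colour channels, stride 1, no padding, and one bias per filter or unit. The function names are ours and purely illustrative.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial size after a convolution without padding."""
    return (size - kernel) // stride + 1


def describe_network(height, width, channels):
    """Return (layer name, output shape, parameter count) for the four layers."""
    layers = []

    # 1. Convolution layer: 7 filters of size 3 x 3.
    h, w = conv_output_size(height, 3), conv_output_size(width, 3)
    layers.append(("conv1", (h, w, 7), (3 * 3 * channels + 1) * 7))

    # 2. Pooling layer: max-pooling, factor 2 (no learnable parameters).
    h, w = h // 2, w // 2
    layers.append(("pool", (h, w, 7), 0))

    # 3. Convolution layer: 12 filters of size 2 x 2.
    h, w = conv_output_size(h, 2), conv_output_size(w, 2)
    layers.append(("conv2", (h, w, 12), (2 * 2 * 7 + 1) * 12))

    # 4. Fully connected layer with 500 hidden units.
    flattened = h * w * 12
    layers.append(("fc", (500,), (flattened + 1) * 500))
    return layers


# Assumed input: 20 x 20 RGB patch (an assumption, not stated in the list above).
network = describe_network(20, 20, 3)
for name, shape, params in network:
    print(f"{name:6s} output={shape} params={params}")
```

Under these assumptions almost all parameters sit in the fully connected layer, which is typical for small CNNs of this kind.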

Malaria detection is crucial. According to the World Health Organisation (WHO), an estimated 228 million cases of malaria occurred worldwide in 2018, with an estimated 405,000 deaths. Children aged under 5 years are the most vulnerable group: in 2018 they accounted for 67% (272,000) of all malaria deaths worldwide.


Gastrointestinal Diseases Detection

The gastrointestinal tract consists of all the organs involved in the digestion of food and the absorption of nutrients, from mouth to anus. These organs include the oesophagus, stomach, duodenum, small intestine (small bowel) and large intestine (colon). The oesophagus, stomach and duodenum constitute the upper gastrointestinal tract, while the small and large intestine form the lower gastrointestinal tract.

Digestion and absorption are affected by disorders such as inflammation, bleeding, infections and cancer in the gastrointestinal tract. Ulcers cause bleeding in the upper gastrointestinal tract. Polyps, cancer or diverticulitis cause bleeding from the large intestine. Coeliac disease, Crohn’s disease, tumors, ulcers and bleeding from abnormal blood vessels are the problems associated with the small intestine.

Current imaging technologies, including endoscopy, enteroscopy, wireless capsule endoscopy, tomography and MRI, play a vital role in diagnosing these disorders of the gastrointestinal tract.

  • Jia et al. trained a DCNN to detect bleeding in wireless capsule endoscopy (WCE) images. Computer-aided diagnosis of gastrointestinal bleeding is an active area of research. Traditional methods were based on handcrafted features, which capture too little information to detect blood with high accuracy. A deep convolutional neural network was therefore applied to an expanded dataset of 10,000 WCE images. With a resulting F1-score of 0.9955, the DCNN-based method outperformed the state-of-the-art approaches in WCE bleeding detection.
Examples of WCE images in the dataset used by Jia et al. (a) A normal WCE image. (b) An active bleeding WCE image. (c) An inactive bleeding WCE image.
DCNN architecture used by Jia et al to detect bleeding in wireless capsule endoscopy
  • Panpeng et al. employed a DCNN to detect intestinal haemorrhage in WCE images. WCE can painlessly capture a large number of internal intestinal images, but only a small portion of them contain haemorrhage. Data augmentation methods were therefore employed to create a balanced dataset for automated detection. As a result, the DCNN model achieved an F1-score of 0.9887.
  • Leenhardt et al. employed a DCNN to detect gastrointestinal angiectasia (GIA) in small bowel capsule endoscopy (SBCE) images. GIA is the most common small intestinal vascular lesion and carries an inherent risk of bleeding; SBCE is the currently accepted procedure for detecting it. Still frames from SBCE, annotated as typical GIA or normal, were given to a DCNN for semantic segmentation and classification. The algorithm achieved a sensitivity of 100% and a specificity of 96%. The success of this deep learning based approach paves the way for future software for automated detection of GIA in SBCE images.
  • Urban et al. employed a DCNN to localise and detect polyps in screening colonoscopies. The effectiveness of colonoscopy for colorectal cancer prevention depends on the adenoma detection rate (ADR), which varies between colonoscopists from 7% to 53%. It is estimated that each 1% increase in ADR decreases the risk of interval colorectal cancers by 3%-6%. To develop new methods for increasing the ADR, a DCNN was trained on a diverse and representative set of 8,641 images from screening colonoscopies of more than 2000 patients, labelled by expert colonoscopists. The resulting algorithm achieved an AUC of 0.991 and an accuracy of 96.4%. This success gives hope for an automated diagnostic system that increases the ADR and decreases interval colorectal cancers, but it requires validation in large multicenter trials.
Examples of dataset used by Urban et al. (Top row) Images containing a polyp with a superimposed bounding box. (Bottom row) Nonpolyp images. Three pictures on the left were taken using NBI and 3 pictures on the right include tools (eg, biopsy forceps, cuff devices, etc) that are commonly used in screening colonoscopy procedures.
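
Balancing a dataset through augmentation, as mentioned for the haemorrhage study above, can be sketched as follows. This is only an illustration of the general idea: the specific transforms (flips and a 90 degree rotation), the tiny nested-list "images" and the function names are assumptions, not the procedure used by Panpeng et al.

```python
import random


def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror the image top to bottom."""
    return [list(row) for row in img[::-1]]


def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def oversample_with_augmentation(minority_images, target_count, seed=0):
    """Add randomly transformed copies of minority-class images until
    the class reaches `target_count` images."""
    rng = random.Random(seed)
    transforms = [hflip, vflip, rot90]
    balanced = list(minority_images)
    while len(balanced) < target_count:
        source = rng.choice(minority_images)
        balanced.append(rng.choice(transforms)(source))
    return balanced


# Hypothetical toy data: 2 "bleeding" images versus 10 "normal" images.
bleeding_images = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
normal_count = 10
balanced_bleeding = oversample_with_augmentation(bleeding_images, normal_count)
print(len(balanced_bleeding))
```

In practice the transforms should be chosen so that they keep the label valid; flips and rotations are usually safe for capsule endoscopy frames because the camera orientation is arbitrary.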

Cardiac Imaging

CT and MRI scans are the most widely used technologies for cardiac imaging. A major difficulty is manual coronary artery calcium (CAC) scoring in cardiac CT scans, which takes considerable effort and is therefore time consuming in epidemiological studies.

  • Litjens et al. discuss different deep learning frameworks that can be used for different tasks in cardiovascular image analysis, as shown in the diagram below.
This flowchart highlights how certain applications can be realized by using a specific algorithm. The arrows indicate for which application an algorithm is typically used. Note that this does not mean that, for example, a fully-connected network cannot be used for segmentation, but it is not the most appropriate choice.

Tumor Detection

A tumor, or neoplasm, is an abnormal growth of cells in any part of the body that forms a separate mass of tissue. Normally, cells in our body go through a cycle of developing, ageing, dying and being replaced by new cells. This cycle is disrupted in tumors and other forms of cancer. There are two types of tumor: benign (non-cancerous) and malignant (cancerous). A benign tumor is less dangerous; it stays in one part of the body and does not spread. A malignant tumor, on the other hand, is extremely harmful and spreads to other body parts, which makes both treatment and prognosis difficult.

  • Wang et al. is one of the early studies on detecting breast cancer in digital mammography using machine learning. They used 482 mammographic images, of which 246 contained tumors. The patients were women from North-East China between the ages of 32 and 74. The image database was built using a Senographe 2000D full digital breast X-ray camera and confirmed by radiologists at the Tumor Hospital of Liao Ning Province. The machine learning approach involved two models: a single-hidden-layer neural network known as an extreme learning machine (ELM), and a traditional support vector machine (SVM). The images went through a series of preprocessing steps such as noise reduction, edge enhancement and edge segmentation, followed by geometrical and textural feature extraction. These features were fed into the models, which were trained using cross-validation. The results contained 84 errors for ELM and 96 errors for SVM out of 482 images, i.e. error rates of roughly 17.4% and 19.9%. This 2012 study did not use a CNN-based approach, but it paved the way for using deep learning for automated breast cancer detection in digital mammography.
Flowchart of the approach used by Wang et al
  • Shen et al. presented a study in 2019 in which they applied a DCNN to mammographic images to improve breast cancer detection. The DDSM database contains digitised film mammograms in a lossless-JPEG format that is now outdated, so an updated version called CBIS-DDSM, which contains the images in standard DICOM format, was used. The dataset consisted of 2478 mammography images from 1249 women, including both craniocaudal (CC) and mediolateral oblique (MLO) views. Each view was treated as a separate image, and the data was randomly split 85:15 at the patient level to create independent training and test sets. The training data was further split 90:10 to create a validation set. The splits maintained a similar proportion of cancer cases across the training, validation and test sets, which contained 1903, 199 and 376 images, respectively.
Two patch datasets were created by sampling image patches from ROIs and background regions. All patches had the same size of 224 × 224, which was large enough to cover most of the annotated ROIs. The first dataset (S1) consisted of pairs of patches, one centred on the ROI and one random background patch from the same image. The second dataset (S10) consisted of 10 patches randomly sampled from around each ROI, with a minimum overlapping ratio of 0.9 with the ROI and some background included, to capture the potentially informative region more completely, plus an equal number of background patches from the same image. All patches were classified into one of five categories: background, malignant mass, benign mass, malignant calcification and benign calcification.
The training consisted of two steps. The first step trained a DCNN patch classifier to distinguish the region of interest (ROI) from the background. The second step trained an image classifier that runs on the ROI images obtained from the first step and classifies each into one of the five categories listed above. The training for the second step was done using ResNet-50 and VGG-16, and the confusion matrices of their results are shown in the figure below.
Architecture used by Shen et al. Converting a patch classifier to an end-to-end trainable whole image classifier using an all convolutional design. The function f was first trained on patches and then refined on whole images.
Confusion matrix analysis of 5-class patch classification for Resnet50 (a) and VGG16 (b) in the S10 test set. The matrices are normalized so that each row sums to one.
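
Splitting at the patient level, as described above, means that all images of one patient end up in the same subset, so the test set contains no patient the model has already seen. Below is a minimal sketch of that idea under simplifying assumptions: the image ids and the helper name `split_by_patient` are hypothetical, and the sketch does not reproduce the stratification by cancer status mentioned in the study.

```python
import random


def split_by_patient(image_to_patient, holdout_fraction, seed=0):
    """Split image ids so that all images of a patient land on the same side."""
    patients = sorted(set(image_to_patient.values()))
    random.Random(seed).shuffle(patients)
    n_holdout = round(len(patients) * holdout_fraction)
    holdout_patients = set(patients[:n_holdout])
    kept = [img for img, p in image_to_patient.items() if p not in holdout_patients]
    held_out = [img for img, p in image_to_patient.items() if p in holdout_patients]
    return kept, held_out


# Hypothetical toy data: 20 patients, each with a CC and an MLO view.
image_to_patient = {
    f"p{p:02d}_{view}": f"p{p:02d}"
    for p in range(20)
    for view in ("CC", "MLO")
}

# 85:15 split into train+validation and test, then 90:10 into train and validation.
train_val, test = split_by_patient(image_to_patient, 0.15, seed=1)
train_val_map = {img: image_to_patient[img] for img in train_val}
train, val = split_by_patient(train_val_map, 0.10, seed=2)
print(len(train), len(val), len(test))
```

Because the split is made over patients and not over images, the image-level ratios only approximate 85:15 and 90:10, which is consistent with the slightly uneven image counts reported in the study.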

Alzheimer’s and Parkinson’s Detection

Alzheimer’s disease (AD) is an irreversible brain disorder that slowly progresses to destroy memory and thinking skills, hampering the ability to carry out simple tasks. Accurate diagnosis of AD plays an important role in patient care, particularly in the early phase of the disease.

Parkinson’s disease is a neurological disorder that causes a progressive decline of the motor system due to a disorder of the basal ganglia in the brain. The symptoms start with tremors in the hand, followed by slow movement, stiffness and loss of balance.

  • Sarraf et al. employed LeNet-5 (a basic CNN model) to detect Alzheimer’s disease in fMRI data. The model was trained on 270,900 images and tested on 90,300 images, and obtained a mean accuracy of 96.8588%.
LeNet-5 architecture adopted for fMRI data
  • Jha et al. employed sparse autoencoders for the detection of Alzheimer’s disease. The technique requires fewer labelled training examples and minimal prior knowledge. The deep learning model used in this study consisted of sparse autoencoders, scaled conjugate gradient (SCG) optimisation, a stacked autoencoder and a softmax output layer to classify the condition as being in the prodromal or mild stage. The algorithm achieved an accuracy of 91.6%, with a sensitivity of 98.09% and a specificity of 84.09%.
Structure of an Autoencoder
  • Ortiz et al. employed a DCNN to detect Parkinson’s disease. A total of 269 DaTSCAN images from the PPMI (Parkinson’s Progression Markers Initiative) database were used. Of the 269 subjects, 158 had Parkinson’s disease and 111 were normal controls. The approach was to compute isosurfaces from the DaTSCAN images, as the raw images are too complex for CNN architectures like LeNet-5 and AlexNet. This resulted in an average accuracy of 95.1% and an AUC of 0.97. The study concluded that computing isosurfaces helped the CNN converge better, which in turn resulted in better and faster automated detection of Parkinson’s disease in DaTSCAN images.
Examples of isosurfaces with threshold = 0.5 for a NC subject (A) and PD patient (B)
Examples of isolines with different threshold for a NC subject (A) and PD patient (B)
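
Accuracy, sensitivity and specificity, as quoted for the studies above, all come from the same four counts of a binary confusion matrix. The sketch below shows the arithmetic with NumPy; the helper binary_rates and the labels are hypothetical and are not data from any of the cited papers.

```python
import numpy as np

def binary_rates(y_true, y_pred):
    """Accuracy, sensitivity and specificity for 0/1 labels.

    Assumes both classes occur in y_true, otherwise a rate is undefined.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)      # diseased, predicted diseased
    tn = np.sum(~y_true & ~y_pred)    # healthy, predicted healthy
    fp = np.sum(~y_true & y_pred)     # healthy, predicted diseased
    fn = np.sum(y_true & ~y_pred)     # diseased, predicted healthy
    return {
        'accuracy': (tp + tn) / (tp + tn + fp + fn),
        'sensitivity': tp / (tp + fn),   # share of diseased cases that are caught
        'specificity': tn / (tn + fp),   # share of healthy cases that are cleared
    }

# Hypothetical ground truth and predictions for ten subjects.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
rates = binary_rates(y_true, y_pred)
print(rates)
```

A model can score high sensitivity with much lower specificity, as in the Jha et al. figures, which is why the three numbers are usually reported together.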

Current Scenario and Challenges

Deep learning is making medical imaging a disruptive technology in the field of radiology. Medical fields that have shown promise of being revolutionised by deep learning are:

  • Ophthalmology
  • Pathology
  • Cancer diagnosis

Google DeepMind Health and the National Health Service, UK have signed an agreement to process the medical data of 1 million patients.

IBM Watson has entered the imaging domain after its successful acquisition of Merge Healthcare.

Applying deep learning algorithms to medical imaging is fascinating and disruptive, but many challenges are holding back progress. Some of the major challenges are as follows:

Limited Datasets

The first and most important prerequisite for deep learning is a massive training dataset, as the quality of a deep-learning-based classifier, and of its evaluation, relies heavily on the quality and amount of data. The limited availability of medical imaging data is the biggest challenge for the success of deep learning in medical imaging.

Developing a massive training dataset is itself a laborious, time-consuming task that requires extensive time from medical experts. Therefore, more qualified experts are needed to create quality data at massive scale, especially for rare diseases. Moreover, a balanced dataset is necessary for deep learning algorithms to learn the underlying representations appropriately. In healthcare, the majority of the available datasets are unbalanced, leading to class imbalance.
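
One common way to soften class imbalance without collecting more data is to weight each class inversely to its frequency. The following is a minimal sketch of that heuristic; the helper inverse_frequency_weights and the 90/10 split are assumptions made for illustration.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical unbalanced dataset: 90 healthy scans, 10 diseased scans.
labels = [0] * 90 + [1] * 10
weights = inverse_frequency_weights(labels)
print(weights)
```

A dictionary of this form can be passed as the class_weight argument of Keras model.fit, so that mistakes on the rare class contribute more to the loss.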

Sharing medical data is considerably more complex and difficult than sharing other datasets. Data privacy is both a sociological and a technical issue, which needs to be addressed from both angles.

HIPAA (Health Insurance Portability and Accountability Act of 1996) provides legal rights to patients to protect their medical records, personal and other health related information provided to hospitals, health plans, doctors and other healthcare providers.

Therefore, as healthcare data grows, keeping patient information anonymous is a big challenge for data science researchers: discarding the core personal information makes mapping the data severely complex, yet a skilled attacker can still re-identify records by combining data associations.

Differential privacy approaches, which restrict the data released to an organisation to what it actually requires, can be adopted. Sharing sensitive data with limited disclosure is a real challenge and therefore leads to a lot of restrictions. Limited data access owing to these restrictions reduces the amount of valuable information available. Apart from that, the data is growing day by day, adding an incremental threat to data security.
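
As an illustration of limited disclosure, one standard building block of differential privacy is the Laplace mechanism: instead of releasing an exact statistic, the data holder releases it with calibrated random noise. The sketch below applies it to a simple count; the function private_count, the ages and the value of epsilon are all assumptions chosen for the example.

```python
import numpy as np

def private_count(values, predicate, epsilon, rng):
    """Release a count with Laplace noise of scale 1/epsilon.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and a smaller epsilon means more noise and more privacy.
    """
    true_count = sum(1 for v in values if predicate(v))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

rng = np.random.default_rng(0)
ages = [34, 51, 67, 72, 45, 80, 59]          # hypothetical patient ages
noisy = private_count(ages, lambda a: a >= 60, epsilon=0.5, rng=rng)
print(noisy)
```

The released value is close to the true count on average, but any single answer reveals little about whether a particular patient is in the dataset.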

Process Standardization (Data and Models)

  1. Data Standards: Standardization of data is the need of the hour for deep learning in any domain, especially healthcare. The reason is that the data varies from hardware to hardware, so the data captured (here, images) loses consistency. Healthcare is a domain where data must be aggregated from different sources for improved learning and better accuracy, so health data must be standardized and shared between providers. HIPAA, HL7 (Health Level 7), HITECH (Health Information Technology for Economic and Clinical Health Act of 2009) and other health standardization and monitoring bodies have defined some standard guidelines. An Authorized Testing and Certifying Body (ATCB) provides a third-party opinion on EHR (Electronic Health Records).
  2. Uninterpretable Black Box Model: Deep learning opened new avenues and opportunities in medical imaging, solving complexity that wasn’t tractable with traditional machine-learning-based approaches. One of the biggest roadblocks, however, is the black box model. The maths behind neural networks is crystal clear, but the weight matrices created as layer depth increases make the model uninterpretable.


[Code] Detecting Diabetic Retinopathy with Deep Learning

In this section we will create a binary classifier to detect diabetic retinopathy symptoms from retinal fundus images. The data has been taken from the Kaggle Diabetic Retinopathy repository.

The Kaggle dataset includes 35000 clinician-labelled images across 5 classes, namely:

  1. No DR
  2. Mild
  3. Moderate
  4. Severe
  5. Proliferative DR

Our objective here is to create a binary classifier that predicts No DR or DR, not a multi-class classifier for the 5 given classes. Therefore, we take the No DR data as the no-symptom class and Severe as well as Proliferative DR as the symptom class.

The training dataset has 5 files, out of which train001, train002, train003 and train004 were used for training and train005 was used for validation. Considering the huge dataset and the RAM and GPU resources available, I devised a basic approach with feasible preprocessing steps and a neural network model to create the binary classifier described above.

Preprocessing included the following steps:

  1. Image read and resizing to 512 x 512 x 3.
  2. Green channel selection resulting the tensor to be of shape 512 x 512 x 1.
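
The two preprocessing steps can be wrapped in one small helper. This is a sketch using Pillow and NumPy; the function name load_green_channel is my own, and the in-memory test image only stands in for a real fundus photograph.

```python
import io

import numpy as np
from PIL import Image

def load_green_channel(source, size=512):
    """Read an image, resize it to size x size and keep only the green channel.

    Returns a uint8 array of shape (size, size, 1).
    """
    image = Image.open(source).convert('RGB').resize((size, size))
    green = np.asarray(image)[:, :, 1]       # channel order is R, G, B
    return np.expand_dims(green, axis=-1)    # add the trailing channel axis

# Synthetic stand-in for a fundus image, written to memory instead of disk.
buffer = io.BytesIO()
Image.new('RGB', (640, 480), color=(10, 200, 30)).save(buffer, format='PNG')
buffer.seek(0)

sample = load_green_channel(buffer)
print(sample.shape, sample.dtype)
```

Keeping one channel cuts the memory needed per image to a third compared with the full 512 x 512 x 3 tensor.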

Moreover, with just 1500 images the RAM (12 GB) was reaching its limit, but the major problem was that GPU memory (12 GB) was getting totally exhausted with the addition of a few convolutional layers. Therefore, I decided to go ahead with the green channel only, along with 1000 training images (500 symptom and 500 non-symptom images) and 105 images in the validation set. Had more memory been available, image augmentation with different angular rotations would have been possible.
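
For readers who do have the memory, rotation-based augmentation could look roughly like the sketch below, which uses scipy.ndimage.rotate. The helper name rotated_copies, the angles and the random test image are assumptions; this was not part of the training run described here.

```python
import numpy as np
from scipy.ndimage import rotate

def rotated_copies(image, angles=(90, 180, 270)):
    """Return one rotated copy of a (height, width, channels) image per angle.

    reshape=False keeps the original shape, order=1 uses linear interpolation
    and mode='nearest' fills the corners with the closest edge pixels.
    """
    return [rotate(image, angle, axes=(0, 1), reshape=False, order=1, mode='nearest')
            for angle in angles]

# Random stand-in for a normalised green-channel image.
image = np.random.default_rng(0).random((64, 64, 1))
augmented = rotated_copies(image, angles=(15, 30, 45))
print(len(augmented), augmented[0].shape)
```

Each rotated copy keeps the label of its source image, so three angles would quadruple the size of the symptom class.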

Please find the link to this repository.

Let’s get started with the training by first importing the dependencies.

import numpy as np
import pandas as pd
import tensorflow as tf
import os
from matplotlib import pyplot as plt
from PIL import Image,ImageFile
from random import shuffle
from scipy.ndimage import rotate
from sklearn.metrics import accuracy_score,f1_score,precision_score,recall_score
from collections import Counter

The data has been downloaded and segregated using trainLabels.csv. The segregation of the downloaded dataset into symptoms and nosymptoms is shown separately in the diabetic_retinopathy_dataalignment.ipynb notebook.

After segregating the data into the two classes, symptoms and nosymptoms, we read the segregated dataset.

sympdata_path = 'diabetic_retinopathy/project_data/symptoms/'
nonsympdata_path = 'diabetic_retinopathy/project_data/nosymptoms/'
symptoms_files = [sympdata_path+i for i in os.listdir(sympdata_path)]
nosymptoms_files = [nonsympdata_path+i for i in os.listdir(nonsympdata_path)]

Shuffling the order of the data is highly important to avoid any bias during batch training, which is done in the following code section. As you can see, only 1000 training images are used in total, owing to the RAM constraints as well as to create a balanced dataset for training.

train_symptom = symptoms_files[:500]
train_nonsymptom = nosymptoms_files[:500]
train_X = train_symptom + train_nonsymptom
train_y = [1]*len(train_symptom) + [0]*len(train_nonsymptom)
train = list(zip(train_X,train_y))
shuffle(train)   #shuffle paths and labels together so they stay aligned
train_X,train_y = zip(*train)

Next, we map the files of the validation set to their class labels using test_labels.csv.

size = 512   #self-defined resized size of the images

testcsv = pd.read_csv('diabetic_retinopathy/project_data/test_labels.csv',header=0)
val_data = {'diabetic_retinopathy/project_data/test/'+row.image+'.jpeg':row.level for i,row in testcsv.iterrows()}
vald = {}
counter = 0
for i in val_data:
    if val_data[i]==0 and counter<50:
        vald[i] = val_data[i]   #assumed: keep at most 50 no-symptom images
        counter += 1
    elif val_data[i]==1:
        vald[i] = val_data[i]   #assumed: keep every symptom image

Thus, we now have the file names and their class mappings. In the following section, we read the images, resize them, select the green channel and normalise the pixel values.

#Reading, resizing, selecting Green channel and normalising the training set
trainX = np.array([np.expand_dims(np.array(Image.open(i).resize((size,size)))[:,:,1],-1) for i in train_X])
trainX = trainX/255.0
c = Counter(train_y)   #class counts of the training labels

#Reading, resizing, selecting Green channel and normalising the validation set
valX = np.array([np.expand_dims(np.array(Image.open(i).resize((size,size)))[:,:,1],-1) for i in vald])
valX = valX/255.0
val_y = [vald[i] for i in vald]
c = Counter(val_y)   #class counts of the validation labels

Next, we convert the labels to NumPy arrays and reshape them to (n,1), where n is the number of samples.

train_y = np.array(train_y)
val_y = np.array(val_y)

train_y = train_y.reshape(-1,1)
val_y = val_y.reshape(-1,1)


Let’s define our basic CNN model which includes the following architecture:

  1. Conv2D(kernel_size=7,strides=1,filters=64,activation='relu')
  2. MaxPooling2D(pool_size=3,strides=2)
  3. Conv2D(kernel_size=5,strides=1,filters=64,activation='relu')
  4. MaxPooling2D(pool_size=3,strides=2)
  5. Conv2D(kernel_size=5,strides=1,filters=128,activation='relu')
  6. MaxPooling2D(pool_size=3,strides=2)
  7. Conv2D(kernel_size=5,strides=1,filters=128,activation='relu')
  8. MaxPooling2D(pool_size=4,strides=4)
  9. Flatten
  10. Dense(units=1024,activation='relu')
  11. Dense(units=1,activation='sigmoid') #binary classifier

The implementation of the above architecture using Keras is shown in the code section below.

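def basemodel(modelfilename,trainX,train_y,valX,val_y,epoch=20,batchsize=10):
    callbacks = [tf.keras.callbacks.ModelCheckpoint(filepath=modelfilename,save_best_only=True,monitor='val_loss',verbose=2)]
    #layers follow the architecture listed above, one conv + pooling pair per line
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(64,kernel_size=7,strides=1,activation='relu', input_shape=(size,size,1)), tf.keras.layers.MaxPooling2D(pool_size=3,strides=2),
        tf.keras.layers.Conv2D(64,kernel_size=5,strides=1,activation='relu'), tf.keras.layers.MaxPooling2D(pool_size=3,strides=2),
        tf.keras.layers.Conv2D(128,kernel_size=5,strides=1,activation='relu'), tf.keras.layers.MaxPooling2D(pool_size=3,strides=2),
        tf.keras.layers.Conv2D(128,kernel_size=5,strides=1,activation='relu'), tf.keras.layers.MaxPooling2D(pool_size=4,strides=4),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1024,activation='relu'),
        tf.keras.layers.Dense(1,activation='sigmoid')])   #binary classifier
    opt = tf.keras.optimizers.Adam(learning_rate=0.00001)
    model.compile(loss = 'binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    #assumed fit call: train on the training set, validate on the validation set
    history = model.fit(trainX,train_y,validation_data=(valX,val_y),epochs=epoch,batch_size=batchsize,callbacks=callbacks)
    return model,history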

A summary of the above model, with the output shape of each component layer, can be seen below.
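
The spatial output sizes can also be worked out by hand. The sketch below walks through the listed layers, assuming the Keras default padding='valid' for every convolution and pooling layer; the helper valid_output_size and the layer table are written for this illustration only.

```python
def valid_output_size(side, kernel, stride):
    """Output side length of a conv or pooling layer without padding."""
    return (side - kernel) // stride + 1

# (name, kernel size, stride) for the eight spatial layers of the architecture.
layers = [
    ('conv 7x7', 7, 1), ('pool 3x3', 3, 2),
    ('conv 5x5', 5, 1), ('pool 3x3', 3, 2),
    ('conv 5x5', 5, 1), ('pool 3x3', 3, 2),
    ('conv 5x5', 5, 1), ('pool 4x4', 4, 4),
]

side = 512                      # input images are 512 x 512 x 1
sides = []
for name, kernel, stride in layers:
    side = valid_output_size(side, kernel, stride)
    sides.append(side)
    print(f'{name}: {side} x {side}')

flattened = side * side * 128               # 128 filters in the last conv layer
dense_params = flattened * 1024 + 1024      # weights plus biases of Dense(1024)
print('flattened units:', flattened)
print('parameters in the first dense layer:', dense_params)
```

Under these assumptions the last feature map is 13 x 13 x 128, and almost all of the model's parameters sit in the first dense layer, which helps explain why memory became the bottleneck.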

The metrics are plotted with the matplotlib library in the function plot_metric, shown below.

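def plot_metric(modelname,train,val,metric='accuracy'):
    plt.figure()   #new figure per metric so the curves do not overlap
    plt.plot(train,'-',val,'--')   #solid: train, dashed: validation
    plt.legend(['train','validation'])
    plt.savefig(modelname+'_'+metric+'.png', bbox_inches='tight')

def plot_acc_loss(modelname,history_train_acc,history_train_loss,history_val_acc,history_val_loss):
    plot_metric(modelname,history_train_acc,history_val_acc,metric='accuracy')
    plot_metric(modelname,history_train_loss,history_val_loss,metric='loss')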

Given the GPU memory allocated for the task, we went with a batch size of 8. You can optimise and tune the model further by loading more data, followed by augmentation to increase the symptom dataset, provided you have more RAM (if possible, use a cloud resource for the task) to read the massive dataset.

model,history = basemodel('DRmodel_NEW.h5',trainX,train_y,valX,val_y,epoch=200,batchsize=8)
history_train_acc = history.history['accuracy']
history_train_loss = history.history['loss']
history_val_acc = history.history['val_accuracy']
history_val_loss = history.history['val_loss']

The training epochs shown below are the part where my model reached the validation loss minimum.

We can plot the training process using the function we created above. The accuracy and loss plots below run up to 45 epochs, at which the best validation loss was recorded.


The resultant accuracy and loss plots:

Accuracy Plot : Train vs Validation set w.r.t. number of epochs
Loss Plot : Train vs Validation set w.r.t. number of epochs
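
The epoch with the lowest validation loss can also be read directly from the recorded history instead of from the plot. A minimal sketch follows; the helper best_epoch and the loss values are hypothetical and are not the numbers from my training run.

```python
import numpy as np

def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss and that loss."""
    index = int(np.argmin(val_losses))
    return index + 1, float(val_losses[index])

# Hypothetical validation losses for five epochs.
val_losses = [0.69, 0.61, 0.55, 0.58, 0.57]
epoch, loss = best_epoch(val_losses)
print(epoch, loss)
```

With the real run, the same call on history_val_loss would give the epoch whose weights the ModelCheckpoint callback kept.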

I also tried to incorporate transfer learning using InceptionV3, which you can check in the same IPython notebook, but the convergence wasn’t proper and overfitting set in after 10 epochs even with changes in the learning rate. Moreover, owing to the hardware resources, only 800 images of size 256 x 256 x 3 were used for training. As a result, convergence of the training was an issue and the model overfitted the training data.

Further improvements required for the transfer learning model would be:

  1. Get a good RAM and GPU configuration
  2. Data augmentation by image rotation
  3. Image preprocessing techniques like histogram equalisation, to check whether they enhance the accuracy
  4. Hyperparameter tuning (if required)
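
As a starting point for the histogram equalisation mentioned in the list, here is a plain NumPy sketch for a single 8-bit channel. The function equalise_histogram and the low-contrast test image are assumptions for illustration; I have not measured its effect on this model.

```python
import numpy as np

def equalise_histogram(channel):
    """Spread the intensities of a 2-D uint8 image over the full 0..255 range."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]            # cumulative count at the darkest pixel value
    total = channel.size
    if total == cdf_min:                 # constant image, nothing to stretch
        return channel.copy()
    # Lookup table mapping each old intensity to its equalised intensity.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = lut.clip(0, 255).astype(np.uint8)
    return lut[channel]

# Hypothetical low-contrast image with values squeezed into 100..139.
low_contrast = np.random.default_rng(0).integers(100, 140, size=(64, 64), dtype=np.uint8)
equalised = equalise_histogram(low_contrast)
print(low_contrast.min(), low_contrast.max(), equalised.min(), equalised.max())
```

Applied to the green channel before normalisation, this would stretch dark fundus images so that vessels and lesions use the whole intensity range.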

As I have shared the code repository above, you can use this code and try to modify it by implementing data augmentation, core image preprocessing steps and custom loss functions for better performance.


In this article, we learned what medical imaging is and how important it has become in the current healthcare scenario. We looked at the different kinds of medical imaging techniques, how they are performed and what kinds of disease diagnosis they help with. We delved into several diseases and the applications of deep learning to them, reviewing literature across various spheres of the sector. We looked at some regulatory concerns and important research objectives, after which we implemented a CNN model for binary classification of fundus images for the detection of diabetic retinopathy.