ML idnetification from Image Data
https://www.biorxiv.org/content/10.1101/2022.09.02.506375v1.abstract
Identification of Pseudomonas aeruginosa strains.
Uses data augmentation and transfer learning to overcome data starvation problem. Also uses DL and CNN.
Motivation: colonies, characterization in sizes, shapes, edges, textures and degrees of opacity and color. Categorization. From morphology -> genome sequencing technology.
Paper: a collection of 69 clinical and environmental P.aeruginosa isolates which present a range of phenotypic features. From these 69 isolates, generate a library of 266 P.aeruginosa images.
D-CNN and data augmentation and transfer learning to classify strains of P.aeruginosa and achieved an accuracy of 94%.
Morphological variation across strains and replicates
Isolates -> LB agar plates with Congo red -> quadruplicate and generate images.

The above figure summarizes some of the metrics (B) we use for colony complexity, especially compression ratio and sobel operator. (both in form of vector)
Fig 4- extent of variation using coefficients of variation (CV = mean / standard deviation) across replicates and strains.
classifying from image data
D-CNN. Use VGG19 DCNN on ImageNet with custom top layers on colony dataset. Apply data augmentation increase 266 images to 38304 images. 74-26 train test split.
First approach, train custom layers with learning rate of . Second approach is train all layers with learning rate of .Metrics: accuracy and loss, and use cross-entropy loss. Second approach produced the best performance across all metrics.
Note: downsampled the image to allow the use of pre-trained imagenet. So downsampling could be very important in this type of transfer learning (or, to make sure the image's dimension matches).
Last updated