Favio Vázquez
Data Scientist
@faviovaz
Webinars
Eric Feuilleaubois
PhD in Artificial Neural Networks
@Deep_In_Depth
Webinars
Deep Learner / Machine Learner
Curator of Deep_In_Depth, a news feed on Deep Learning, Machine Learning and Data Science
Writer for Medium (Towards Data Science)
Aim: Predict classes (labels) that have not been seen by the source (pre-trained) model
n07714571 - head cabbage
n07714990 - broccoli
n07715103 - cauliflower
n07716358 - zucchini, courgette
n07718472 - cucumber, cuke
n07718747 - artichoke, globe artichoke
n07720875 - bell pepper
n07730033 - cardoon
n07734744 - mushroom
n07742313 - Granny Smith
n07745940 - strawberry
n07747607 - orange
n07749582 - lemon
n07753113 - fig
n07753275 - pineapple, ananas
n07753592 - banana
n07768694 - pomegranate
Pumpkin
Tomato
Watermelon
- Availability of open CNN models with top-class performance
- CNNs detect visual features (patterns) in images
- First layers learn "basic" features
- Last layers learn "advanced" features
- Top layers classify
Basic features are very similar from one CNN model to another, so there is no need to re-learn them: it is better to re-use them.
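A minimal sketch of the idea above, in pure Python so it runs without a deep learning framework. The "pre-trained base" is a hypothetical fixed linear map plus ReLU standing in for the frozen convolutional layers; only a new logistic-regression head is trained on two classes the base never saw. `FROZEN_WEIGHTS`, `extract_features`, `train_head` and the toy data are all illustrative assumptions, not the model used in the webinar.

```python
import math

# Hypothetical stand-in for the pre-trained convolutional base.
# These weights are FROZEN: they are reused as-is and never updated below.
FROZEN_WEIGHTS = [[0.9, -0.2], [0.1, 0.8], [-0.5, 0.5]]


def extract_features(x):
    """Frozen base: a fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs=200, lr=0.5):
    """Train ONLY a new logistic-regression head on the frozen features."""
    feats = [extract_features(x) for x in samples]  # base is frozen, so compute once
    w, b = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            grad = p - y  # gradient of the log-loss w.r.t. the logit
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
            b -= lr * grad
    return w, b


def predict(x, w, b):
    f = extract_features(x)
    return int(sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) >= 0.5)


# Toy 2-D inputs for two NEW classes (1 = "pumpkin", 0 = "watermelon").
X = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 1.0], [0.1, 0.9], [-0.1, 1.1]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_head(X, y)
print([predict(x, w, b) for x in X])
```

Because the base never changes, its features can be computed once and cached; only the handful of head parameters are learned, which is why small training sets can be enough.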
Dataset | Train Acc. (%) | Validation Acc. (%) |
---|---|---|
300 - 7% | 95 | 88 |
600 - 14% | 96 | 89 |
900 - 21% | 97 | 90 |
3917 - 95% | 97 | 91 |
300 - with Data Aug. | 98 | 88 |
600 - with Data Aug. | 98 | 90 |
900 - with Data Aug. | 98 | 90 |
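The table compares training with and without data augmentation, but the slides do not say which transformations were used. As an assumption, here is a minimal sketch of two common ones (horizontal flip and a one-pixel shift) on images stored as lists of pixel rows; each augmented copy keeps its original label. The function names are hypothetical.

```python
def hflip(img):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in img]


def shift_right(img, k=1, fill=0):
    """Translate an image k pixels to the right, padding the left edge with `fill`."""
    width = len(img[0])
    if not 0 <= k <= width:
        raise ValueError("k must be between 0 and the image width")
    return [[fill] * k + row[: width - k] for row in img]


def augment(dataset):
    """Return every (image, label) pair plus a flipped and a shifted copy."""
    out = []
    for img, label in dataset:
        out.append((img, label))
        out.append((hflip(img), label))
        out.append((shift_right(img), label))
    return out


train = [([[1, 2, 3], [4, 5, 6]], "tomato")]
augmented = augment(train)
print(len(train), "->", len(augmented))  # 1 -> 3
```

Augmentation multiplies the number of training examples without collecting new images; whether it improves validation accuracy depends on the data, as the table above illustrates.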
[Figure: Pumpkin, Tomato and Watermelon examples; training dataset sizes 300, 600 and 900]
Two approaches:
1) Fine tune the convolutional part of the CNN
2) Fine tune the classification part of the CNN
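A minimal sketch of how the two approaches differ in which layers are allowed to train. The `Layer` class and the 20-layer base are hypothetical; in Keras the analogous switch is the `trainable` attribute on each layer. Reading "VGG 10% trainable" in the table below as "only the top 10% of the base layers are unfrozen" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    trainable: bool = True


def unfreeze_top(layers, trainable_fraction):
    """Freeze all layers except the top `trainable_fraction` of them.

    layers[0] is the input side (generic features); layers[-1] is the top.
    """
    n_trainable = round(len(layers) * trainable_fraction)
    cutoff = len(layers) - n_trainable
    for i, layer in enumerate(layers):
        layer.trainable = i >= cutoff
    return layers


# Approach 1: fine-tune the top of the convolutional part (hypothetical 20-layer base).
conv_base = unfreeze_top([Layer(f"conv{i}") for i in range(1, 21)], 0.10)
print([layer.name for layer in conv_base if layer.trainable])  # ['conv19', 'conv20']

# Approach 2: keep the whole base frozen and train only the classifier layers.
frozen_base = unfreeze_top([Layer(f"conv{i}") for i in range(1, 21)], 0.0)
head = [Layer("dense_1"), Layer("dense_out")]  # new layers are trainable by default
```

Lower layers stay frozen in both cases because, as noted earlier, their basic features transfer well; the approaches differ only in whether any convolutional layers are allowed to adapt.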
Dataset | Train Acc. (%) | Validation Acc. (%) |
---|---|---|
300 - Simpler | 99 | 88 |
300 - 512 - 64 | 99 | 86 |
300 - MSimpler-no BN | 98 | 84 |
300 - Simpler VGG 10% trainable | 99 | 89 |
3917 - 95% - Simpler | 98 | 93.5 |
95% - MSimpler-no BN | 40 | 35 |
95% - MSimpler- BN | 97 | 92 |
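The table contrasts classifiers with and without BN (batch normalization). For reference, a minimal sketch of the training-time BN forward pass for a single feature across a mini-batch: normalize with the batch mean and variance, then apply a learnable scale `gamma` and shift `beta`. Inference-time behaviour (running averages) is not shown.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch normalization for ONE feature across a mini-batch.

    Normalizes with the batch mean and (biased) variance, then applies the
    learnable scale `gamma` and shift `beta`.
    """
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


out = batch_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 3) for v in out])  # [-1.342, -0.447, 0.447, 1.342]
```

Keeping each layer's inputs near zero mean and unit variance tends to stabilize training, which is one plausible reason for the large gap between the BN and no-BN rows.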
And drawbacks:
https://github.com/faviovazquez
https://www.linkedin.com/in/faviovazquez/
Outline
2004 – Google
MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat research.google.com/archive/mapreduce.html
2006 – Apache
Hadoop, originating from the Nutch Project
Doug Cutting research.yahoo.com/files/cutting.pdf
2008 – Yahoo
Web-scale search indexing
Hadoop Summit, HUG, etc. developer.yahoo.com/hadoop/
2009 – Amazon AWS
Elastic MapReduce
Hadoop modified for EC2/S3, plus support for Hive, Pig, Cascading, etc. aws.amazon.com/elasticmapreduce/
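The MapReduce model from the 2004 paper splits a job into a map phase, a shuffle (group-by-key) and a reduce phase; word count is the paper's canonical example. A minimal single-process sketch of those three phases (real frameworks such as Hadoop distribute each phase across a cluster):

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(intermediate))


print(word_count(["Spark and Hadoop", "Hadoop MapReduce"]))
```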
Apache Spark is a fast and general engine for large-scale data processing.
Transformations
Actions
Caching
Typed
Scala & Java
RDD Benefits
Dataset[Row]
Optimized
Versatile
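In Spark, RDD transformations such as `map` and `filter` are lazy, actions such as `collect` and `count` trigger the computation, and `cache()` keeps a computed result for reuse. The sketch below is a hypothetical single-machine toy (`LazyDataset` is not a Spark class) that mimics those three behaviours so they can be observed directly.

```python
class LazyDataset:
    """Hypothetical single-machine toy that mimics Spark's RDD semantics."""

    def __init__(self, source, ops=()):
        self._source = source
        self._ops = tuple(ops)
        self._cache_enabled = False
        self._cached = None

    # Transformations: return a NEW dataset and compute nothing yet.
    def map(self, fn):
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    # Caching: keep the materialized result after the first action.
    def cache(self):
        self._cache_enabled = True
        return self

    def _compute(self):
        if self._cached is not None:
            return self._cached
        data = list(self._source)
        for kind, fn in self._ops:
            data = [fn(x) for x in data] if kind == "map" else [x for x in data if fn(x)]
        if self._cache_enabled:
            self._cached = data
        return data

    # Actions: trigger the computation and return a result to the caller.
    def collect(self):
        return list(self._compute())

    def count(self):
        return len(self._compute())


calls = []


def square(x):
    calls.append(x)  # record every time the function actually runs
    return x * x


evens = LazyDataset(range(5)).map(square).filter(lambda v: v % 2 == 0)
print(len(calls))       # 0: nothing has run yet
print(evens.collect())  # [0, 4, 16]
```

Recording the operations instead of running them is what lets Spark plan and optimize a whole chain of transformations before any data moves.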
Deep Learning Pipelines is an open source library created by Databricks that provides high-level APIs for scalable deep learning in Python with Apache Spark.
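As we understand it, Deep Learning Pipelines expresses transfer learning as a Spark ML pipeline: one stage turns images into features with a pre-trained network, and a later stage is an ordinary classifier. The sketch below mimics only that Transformer/Estimator/Pipeline pattern in plain Python; every class is hypothetical, the "pre-trained network" is a trivial featurizer, and a nearest-centroid classifier is swapped in for simplicity. It does not use Spark or the library's API.

```python
import math


class ImageFeaturizer:
    """Transformer stage (hypothetical stand-in for a pre-trained network).

    It has no fit(): like a frozen model, it only maps raw pixels to features.
    """

    def transform(self, rows):
        return [dict(r, features=[sum(r["pixels"]) / len(r["pixels"]), max(r["pixels"])])
                for r in rows]


class NearestCentroidModel:
    """Fitted model: predicts the label whose centroid is closest."""

    def __init__(self, centroids):
        self.centroids = centroids

    def transform(self, rows):
        return [dict(r, prediction=min(self.centroids,
                                       key=lambda lbl: math.dist(r["features"], self.centroids[lbl])))
                for r in rows]


class NearestCentroid:
    """Estimator stage: fit() learns one centroid per label from `features`."""

    def fit(self, rows):
        grouped = {}
        for r in rows:
            grouped.setdefault(r["label"], []).append(r["features"])
        centroids = {lbl: [sum(col) / len(col) for col in zip(*vecs)]
                     for lbl, vecs in grouped.items()}
        return NearestCentroidModel(centroids)


class PipelineModel:
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Chains stages: estimators are fitted in order, transformers applied as-is."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)
            rows = stage.transform(rows)
            fitted.append(stage)
        return PipelineModel(fitted)


train = [
    {"pixels": [0.9, 0.8, 1.0], "label": "pumpkin"},
    {"pixels": [0.8, 0.9, 0.7], "label": "pumpkin"},
    {"pixels": [0.1, 0.2, 0.3], "label": "watermelon"},
    {"pixels": [0.2, 0.1, 0.2], "label": "watermelon"},
]
model = Pipeline([ImageFeaturizer(), NearestCentroid()]).fit(train)
test_rows = [{"pixels": [0.85, 0.9, 0.95]}, {"pixels": [0.1, 0.1, 0.2]}]
print([r["prediction"] for r in model.transform(test_rows)])  # ['pumpkin', 'watermelon']
```

Because the featurizer has nothing to learn, fitting the pipeline only trains the classifier stage, which mirrors the frozen-base approach discussed in the first part of the webinar.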
We will learn:
Take my article on Detecting Breast Cancer with Deep Learning and solve the problem yourselves using DLS!
Write a post or blog about it: the top 10 will each win a $50 Amazon gift card.
DEMO