New datalake release
CCE
Create a new datasource in a couple of hours
You don't need to be a developer to crawl a website!
A separate process for each data processing step
Easier to maintain and to improve
Speed
Tracking of data as it flows through the platform
Easy to understand how the data is being transformed
QuickSight!
Decoupled crawler raw data from normalisation and from cleansing
Optimised geolocation process
Using temporary tables to run heavy operations
Using a dedicated geolocation DB
Uses fewer tables
Improved image handling
Images are loaded in the same transaction
Watermark handling
No longer uses complex SQL queries and many tables
Loading and deduping
Improved deduplication logic
Faster client apps
By optimising how the data is loaded for use in DoorFeed.com and the Admin portal
Finally, separate environments
Preproduction
Production
What's next
More datasources in France
5 countries in Europe
Track performance and usage of the platform
Further data analysis on how the new datalake is performing (vs. the old platform)
Geolocation improvements
Abnormality detection and opportunity service
Clean up
Cost analysis
Policies, access review
Thanks to
Joao
Michael
Amber
Vlad
And to everyone in DoorFeed
By Juan Antonio Vicaria