New datalake release

CCE

Create a new datasource in a couple of hours

You don't need to be a developer to crawl a website!

Individual process for every data processing step

Easier to maintain and to improve

Speed

Tracking of data as it flows through the platform

Easy to understand how the data is being transformed

QuickSight!

Decoupled crawler raw data from normalisation and from cleansing
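
A minimal sketch of what this decoupling could look like, assuming each step is a separate function that receives the previous step's output and records its own name so the data can be tracked as it flows through. The Record shape, the function names and the toy transformations are hypothetical illustrations, not the platform's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One listing moving through the pipeline (hypothetical shape)."""
    payload: dict
    history: list = field(default_factory=list)  # names of the stages applied so far


def crawl(raw_html: str) -> Record:
    # Stage 1: store the crawler output untouched so later stages can be re-run.
    return Record(payload={"raw": raw_html}, history=["crawl"])


def normalise(record: Record) -> Record:
    # Stage 2: map raw data to a common shape; here, just trim and lowercase.
    normalised = record.payload["raw"].strip().lower()
    payload = {**record.payload, "normalised": normalised}
    return Record(payload=payload, history=record.history + ["normalise"])


def cleanse(record: Record) -> Record:
    # Stage 3: collapse repeated whitespace in the normalised text.
    cleaned = " ".join(record.payload["normalised"].split())
    payload = {**record.payload, "clean": cleaned}
    return Record(payload=payload, history=record.history + ["cleanse"])


record = cleanse(normalise(crawl("  Flat  in   PARIS ")))
print(record.payload["clean"])  # cleansed text
print(record.history)           # stages the record went through
```

Because each stage only reads the previous stage's output and never overwrites it, a change to the cleansing rules would not require crawling again in this sketch.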

Optimised geolocation process

Using temporal tables to run heavy operations

Using dedicated geolocation DB

Avoids using so many tables

Improved image handling

Images are loaded in the same transaction

Watermark handling

No longer using complex SQL queries and many tables

Loading and deduping

Improved deduplication logic

Faster client apps

By optimising how the data is loaded for use in DoorFeed.com and the Admin portal

Finally, different environments

Preproduction
Production

What's next

More datasources in France

5 countries in Europe

Track performance and usage of the platform

Further data analysis on how the new datalake is performing (vs old platform)

Geolocation improvements

Abnormality detection and opportunity service

Clean up

Cost analysis

Policies, access review

Thanks to

Joao

Michael

Amber

Vlad

And to everyone in DoorFeed

By Juan Antonio Vicaria