How to make an app

fast and resilient...
while querying a petabyte of data

About me

Karim Pinchon

  •    Backend developer
  •    @kpn13
  •    https://blog.karimpinchon.com
  •    https://slides.com/kpn13

What are we going to talk about?

Agenda

  1. Context
  2. Data
  3. Code
  4. Conclusion

Context

Context

B2B

SAAS

Market measurement

Context

Industries

Ecom

Food

Ride

FMCG

BNPL

Context

Some clients

Context

50+ countries

Context

7M+ shoppers

Context

1.5B orders

Context

1PB of data

Data

How do we process the data?

Raw data

1 - Data acquisition

2 - Data structuring

3 - Data enrichment

4 - Flat tables

Data acquisition

Several sources

Parsers to extract data

Data structuring

Scripts to transform and structure data

Data enrichment

  1. fix bias
  2. add data
  3. ...

Flat tables

Split by industries and countries

Why this strategy?

Flat tables

  • very different data structures
  • data volume discrepancy
  • limited number
  • business compatible

Code

What architecture?

What boring architecture?

Technology

  • Apache
  • PHP
  • MySQL
  • Redis
  • BigQuery
  • SQS

Technology

  • Monolith
  • Vanilla PHP
  • Vanilla JS
  • VueJS

What can we do with that to process so much data?

The big three

The big three

Caching

Why is caching important?

Caching

Performance

Cost savings

Caching

Performance

About 15x faster

Caching

Cost savings

BigQuery is sooo expensive!

Solution?

No call, no money

Important note

Always set SQL limits!

Caching

Resilience

What if BigQuery is down or slow?

 

No problem... Almost

Bonus

Caching

Ok but how?

 

By using a trendy technology?

Caching

Well-established technology and simplicity

+ handmade code

Caching

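<?php

// compute() derives a cache key (fingerprint) from the request
$fingerprint = compute($method, $url, $parameters, ...);

// Assumes the phpredis extension; host and port are placeholders
$redis = new \Redis();
$redis->connect('127.0.0.1', 6379);

$responseCached = $redis->get($fingerprint);

// phpredis returns false (not null) when the key does not exist
if ($responseCached !== false) {
	return $responseCached;
}

// else execute request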
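The fingerprint-then-lookup idea above can be sketched as a small get-or-compute helper. This is only an illustration under stated assumptions: `computeFingerprint()` is a hypothetical stand-in for the talk's `compute()`, `ArrayCache` replaces Redis with an in-memory array so the example runs without a server, and `$runQuery` stands in for the expensive BigQuery call.

```php
<?php

/**
 * Hypothetical stand-in for the talk's compute(): builds a stable cache key
 * from the request. Parameters are sorted so that their order does not matter.
 */
function computeFingerprint(string $method, string $url, array $parameters): string
{
    ksort($parameters);

    return hash('sha256', $method . '|' . $url . '|' . json_encode($parameters));
}

/**
 * Minimal in-memory cache used instead of Redis (assumption, to keep the
 * example runnable). get() returns false on a miss, like phpredis does.
 */
final class ArrayCache
{
    private array $entries = [];

    public function get(string $key): string|false
    {
        return $this->entries[$key] ?? false;
    }

    public function set(string $key, string $value): void
    {
        $this->entries[$key] = $value;
    }
}

/** Returns the cached response, or runs the query and stores its result. */
function cachedResponse(ArrayCache $cache, string $fingerprint, callable $runQuery): string
{
    $cached = $cache->get($fingerprint);
    if ($cached !== false) {
        return $cached; // cache hit: the data warehouse is not called
    }

    $response = $runQuery(); // cache miss: execute the expensive request
    $cache->set($fingerprint, $response);

    return $response;
}

$cache = new ArrayCache();
$calls = 0;
$runQuery = function () use (&$calls): string {
    $calls++; // counts how often the "warehouse" is actually queried

    return '{"orders": 42}';
};

$key = computeFingerprint('GET', '/analysis', ['country' => 'FR', 'industry' => 'food']);
$first = cachedResponse($cache, $key, $runQuery);
$second = cachedResponse($cache, $key, $runQuery);
```

The second call is served from the cache, so the query runs only once. Sorting the parameters before hashing is a design choice of this sketch: it lets two requests that differ only in parameter order share one cache entry.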
 
  

Caching

Benefits

  • simple
  • flexible
  • fine grained management
  • it just works

Drawbacks

  • maintenance
  • less reliable?
  • slower
  • resources (RAM)

Caching

Important notes

Duration

Invalidation

Testing
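The three notes above are linked: an entry needs a duration, a way to be invalidated early, and both behaviours should be testable. Below is a minimal sketch, assuming a hypothetical `TtlCache` class that keeps entries in memory and receives its clock as a parameter, so that expiry can be tested without sleeping. With Redis, the same ideas map onto commands such as `SETEX` and `DEL`.

```php
<?php

/**
 * Hypothetical in-memory cache with a time-to-live per entry.
 * The clock is injected so that tests can move time forward without sleeping.
 */
final class TtlCache
{
    /** @var array<string, array{value: string, expiresAt: int}> */
    private array $entries = [];

    /** @var callable(): int */
    private $clock;

    public function __construct(callable $clock)
    {
        $this->clock = $clock;
    }

    public function set(string $key, string $value, int $ttlSeconds): void
    {
        $this->entries[$key] = [
            'value' => $value,
            'expiresAt' => ($this->clock)() + $ttlSeconds,
        ];
    }

    /** Returns the value, or false when the key is missing or expired. */
    public function get(string $key): string|false
    {
        $entry = $this->entries[$key] ?? null;
        if ($entry === null || $entry['expiresAt'] <= ($this->clock)()) {
            unset($this->entries[$key]);

            return false;
        }

        return $entry['value'];
    }

    /** Explicit invalidation, for example after the underlying data is refreshed. */
    public function invalidate(string $key): void
    {
        unset($this->entries[$key]);
    }
}

$now = 1000; // fake clock value, in seconds
$cache = new TtlCache(function () use (&$now): int {
    return $now;
});

$cache->set('report', 'cached-report', 60);
$beforeExpiry = $cache->get('report');

$now += 61; // move the fake clock past the 60-second duration
$afterExpiry = $cache->get('report');

$cache->set('report', 'fresh-report', 60);
$cache->invalidate('report');
$afterInvalidation = $cache->get('report');
```

Injecting the clock is the part that helps with testing: the expiry path is exercised by changing `$now` rather than by waiting.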

Asynchronous tasks

Asynchronous task

Why is async important?

Asynchronous task

Better UX

Resilience

Scalable

Asynchronous task

Message broker

+

Workers

+

PHP

Asynchronous task

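<?php

use Aws\Sns\SnsClient;

// The region is part of the client configuration, not of the publish() parameters
$client = new SnsClient([
  'region' => $region,
  'version' => 'latest',
]);

// Publish the task; subscribed workers process it asynchronously
$client->publish([
  'Message' => $body,
  'TopicArn' => $topicArn,
]);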
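The worker side of the "message broker + workers" setup can be sketched as follows. This is an illustration, not the talk's implementation: an `SplQueue` stands in for the managed broker so the example runs locally, and `publishTask()`, `runWorker()`, the task fields, and the retry limit are all hypothetical.

```php
<?php

// In-memory stand-in for the message broker (assumption: the real system
// uses a managed broker; SplQueue only makes the example runnable).
$queue = new SplQueue();

/** Producer side: enqueue the task and return immediately. */
function publishTask(SplQueue $queue, array $task): void
{
    $queue->enqueue(json_encode(['attempts' => 0, 'task' => $task]));
}

/**
 * Worker side: drain the queue, retrying a failed task until it has been
 * attempted $maxAttempts times. Returns the results of the successful tasks.
 */
function runWorker(SplQueue $queue, callable $handler, int $maxAttempts = 3): array
{
    $results = [];
    while (!$queue->isEmpty()) {
        $message = json_decode($queue->dequeue(), true);
        try {
            $results[] = $handler($message['task']);
        } catch (Throwable $e) {
            $message['attempts']++;
            if ($message['attempts'] < $maxAttempts) {
                $queue->enqueue(json_encode($message)); // put it back for a retry
            }
        }
    }

    return $results;
}

publishTask($queue, ['analysis' => 'market-share', 'country' => 'FR']);
publishTask($queue, ['analysis' => 'basket-size', 'country' => 'DE']);

// Handler that fails once for the second task, to exercise the retry path.
$failedOnce = false;
$results = runWorker($queue, function (array $task) use (&$failedOnce): string {
    if ($task['country'] === 'DE' && !$failedOnce) {
        $failedOnce = true;
        throw new RuntimeException('warehouse temporarily unavailable');
    }

    return $task['analysis'] . ':' . $task['country'] . ':done';
});
```

The producer never waits for the handler, which is the non-blocking benefit. The retry shows where resilience comes from: a transient failure delays a task instead of losing it.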

Asynchronous task

Benefits

  • scalability
  • resilience
  • non-blocking

Drawbacks

  • more complex
  • delays
  • inconsistencies
  • testability

Short Polling

Short Polling

  1. The client sends an HTTP request to the server.
  2. The server processes the request and responds (in progress / done).
  3. The client waits for a set delay before sending a new request.
  4. This cycle repeats until the task is complete.
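The cycle above can be sketched as a small loop. In the talk the client is a browser, so this PHP version only illustrates the logic. `pollUntilDone()` is a hypothetical helper, the HTTP request and the delay are injected as callables so the loop runs without a network, and the attempt cap is an assumption added so the loop always terminates.

```php
<?php

/**
 * Polls $fetchStatus until it reports "done" or $maxAttempts is reached.
 * $fetchStatus stands in for the HTTP request and response (steps 1 and 2);
 * $wait stands in for the delay between two requests (step 3).
 * Returns the final response, or null when the attempts are exhausted.
 */
function pollUntilDone(callable $fetchStatus, callable $wait, int $delayMs, int $maxAttempts): ?array
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $response = $fetchStatus();
        if ($response['status'] === 'done') {
            return $response;
        }
        if ($attempt < $maxAttempts) {
            $wait($delayMs); // step 3: wait before asking again
        }
    }

    return null;
}

// Fake server: answers "in progress" twice, then "done".
$serverResponses = [
    ['status' => 'in progress'],
    ['status' => 'in progress'],
    ['status' => 'done', 'result' => 'report-ready'],
];
$requests = 0;
$fetchStatus = function () use (&$requests, $serverResponses): array {
    $index = min($requests, count($serverResponses) - 1);
    $requests++;

    return $serverResponses[$index];
};

// Fake delay: records the waiting time instead of sleeping.
$totalWaitMs = 0;
$wait = function (int $ms) use (&$totalWaitMs): void {
    $totalWaitMs += $ms; // a real client would sleep here, e.g. usleep($ms * 1000)
};

$final = pollUntilDone($fetchStatus, $wait, 500, 10);
```

Each "in progress" answer costs one extra request and one delay. That is the server load and latency trade-off of short polling.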

Short polling

Short polling

Different analysis

Different steps

  • the steps of each analysis are configurable using a JSON file
  • a scenario can have 2 steps, 3 steps, 4 steps, etc.
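To illustrate how a JSON-configured list of steps could drive the "in progress / done" answers given to a polling client, here is a minimal sketch. The JSON schema, the step names, and `analysisStatus()` are hypothetical; the talk does not show the actual file format.

```php
<?php

// Hypothetical configuration: the schema and step names are illustrative only.
$json = <<<'JSON'
{
  "analysis": "market-share",
  "steps": ["fetch", "aggregate", "format"]
}
JSON;

$config = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

/**
 * Builds the status payload returned to a polling client:
 * "in progress" until every configured step has completed.
 */
function analysisStatus(array $config, int $completedSteps): array
{
    $total = count($config['steps']);
    $done = $completedSteps >= $total;

    return [
        'status' => $done ? 'done' : 'in progress',
        'currentStep' => $done ? null : $config['steps'][$completedSteps],
        'progress' => min($completedSteps, $total) . '/' . $total,
    ];
}

$afterOneStep = analysisStatus($config, 1);
$afterAllSteps = analysisStatus($config, 3);
```

Because the number of steps comes from the configuration, a 2-step and a 4-step scenario go through the same code path.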

Short Polling

Benefits

  • easy to implement
  • resilience

Drawbacks

  • server load
  • latency

Use SSE or WebSockets?

Benefits

  • save resources
  • less latency

Drawback

  • more complex

Not necessary for us!

Conclusion

Conclusion

  • Build a smart read model
  • Defer processing as much as possible
  • Cache what you can
  • Use technologies you really need

Thank you

How we made our application fast and resilient while running queries on terabytes of data!

By Karim PINCHON
