But first - a Dad
It's a word in Thai
Nobody knows what it means
Is it a simple process?
Relatively yes...
Good luck finding...
250 million lines of vanilla code!!!
Did you read my previous bullet?
Divide and Conquer - Large Scale!
Divide into small "computational" units
Each "unit" is sent to "Computational Engine"
Coordinate
Thread
Process
Machine
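As an illustration of the steps above (the function names and chunking scheme are my own, not Panaya's framework), the divide-and-coordinate pattern can be sketched with Python's `concurrent.futures`:

```python
# Sketch: divide work into small "computational" units and fan them out.
from concurrent.futures import ThreadPoolExecutor

def unit_job(chunk):
    """One small "computational" unit - here it just sums its chunk."""
    return sum(chunk)

def divide_and_conquer(data, n_units=4):
    # Divide: split the input into roughly equal "computational" units.
    size = max(1, len(data) // n_units)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Each unit is sent to a "Computational Engine" - a thread here;
    # swap in ProcessPoolExecutor for processes, or a cluster scheduler
    # for machines.
    with ThreadPoolExecutor(max_workers=n_units) as pool:
        partials = list(pool.map(unit_job, chunks))
    # Coordinate: combine the partial results.
    return sum(partials)

print(divide_and_conquer(list(range(1000))))  # -> 499500
```

The same divide / compute / coordinate shape holds at every level; only the "engine" (thread, process, machine) changes.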
Divide and Conquer - Large Scale!
Coordinate
In Cloud - Over EC2
We have our own Framework for Elastic Cloud Computing
Written ~ 8 years ago
Today - you have
Over
Continuous Improvement
Value to Customer
Trust
Transparency
Priority
TEAM WORK
Visit
By Uri Nativ of Klarna
Front-end? Big-Data?
Answer:
"Big Data" is a term for data sets or flows that are so large or complex that traditional "handling" is inadequate
"handling" ?!
Data
Capture
Curate
Analysis
Search
Transfer
Querying
and so much more
Data is "Big Data" if you need to hire a team of smart engineers just to handle it (distribute...)
And these are just the ones I know of...
Intuition?
Premonition?
Experience?
We have none of those
Sit with experienced people, developers, architects
Listen
Do homework
Research
Understand your expectations & limitations
Listen to Amir
Understand your "Expectations"
Understand your "Limitations"
What do you need to support for the next year (or 2)
Capture "rate"
SLA
Curation "period"
Processing time
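As a hedged illustration of how these questions drive sizing (every number below is invented for the example, not a Panaya figure), the capture "rate" and curation "period" give a back-of-envelope storage estimate:

```python
# Back-of-envelope sizing from the questions above.
# All numbers are illustrative assumptions, not real figures.
capture_rate_mb_per_min = 5      # assumed capture "rate"
curation_period_days = 365       # assumed curation "period" (1 year)

mb_per_day = capture_rate_mb_per_min * 60 * 24
total_tb = mb_per_day * curation_period_days / 1024 / 1024

print(f"{mb_per_day} MB/day -> {total_tb:.1f} TB over the curation period")
```

Even toy numbers like these show quickly whether the next year or two fits on one machine or needs distribution.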
Do we have the required knowledge?
Dev Support
DevOps support
Ops support
+
Troubleshooting
Understand principles in Scalability & Big-Data
There are a lot of good options
Choices we make now might (and should) be invalidated in the future.
SO - you can't plan for everything - but you can try
Why ?
Also...
Over
+
Is the devil we know ;)
S3 became the de facto standard for scaling data curation; it is cheap, highly available, easy to use, and has extensions in many processing frameworks
Spark over EMR is currently one of the best contenders as a "Big Data Processing FW" - it continues to remain so thanks to a large community of users and feature developers relentlessly making it better
High security requirements - in all aspects.
AWS maintains security standards and has a built-in encryption and key management solution we're currently researching.
But perhaps its biggest advantage over other tools is
Reducing the need for DevOps, as "scaling" is handled internally
Server
DB
Utils
Panaya Server
Panaya DB
Server
DB
Lambda
Kinesis
Firehose
S3
encrypted
Supervised by
IAM + KMS
API Gateway
Lambda
Kinesis
Firehose
S3
encrypted
API Gateway
"front door" for applications to access data, business logic, and functionality in the BackEnd
event-driven function; code runs in response to events from API Gateway
auto-magically buffers, then "dumps" to S3 (every MB / seconds)
It's not a file storage, it's a Key-Value storage
This is a requirement - "Key Per Customer"
IAM + KMS
Identity / Auth Management including Encrypt/Decrypt Key Management
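Since S3 has no "Append", Firehose's job in this pipeline is buffer-then-dump. A minimal sketch of that behavior (the class name, thresholds, and sink callable are illustrative, not the Firehose API):

```python
import time

class FirehoseLikeBuffer:
    """Sketch of Firehose's buffer-then-dump behavior: records accumulate
    until a size or time threshold is hit, then the whole batch is
    written to the sink (S3 in the real pipeline) as one object."""

    def __init__(self, sink, max_bytes=1024 * 1024, max_seconds=60):
        self.sink = sink              # callable that "writes to S3"
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records, self.size = [], 0
        self.started = time.monotonic()

    def put(self, record: bytes):
        self.records.append(record)
        self.size += len(record)
        if (self.size >= self.max_bytes
                or time.monotonic() - self.started >= self.max_seconds):
            self.flush()

    def flush(self):
        if self.records:
            self.sink(b"".join(self.records))  # one S3 object per dump
        self.records, self.size = [], 0
        self.started = time.monotonic()

# Demo with a tiny 10-byte threshold so the flush is visible.
dumps = []
buf = FirehoseLikeBuffer(dumps.append, max_bytes=10, max_seconds=9999)
for rec in [b"abc", b"defg", b"hij", b"k"]:
    buf.put(rec)
print(len(dumps), dumps[0])
```

The real service exposes the same two knobs: buffer size (MB) and buffer interval (seconds), whichever is hit first.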
Lambda
Kinesis
Firehose
S3
encrypted
API Gateway
"Single Point of Entry" - will allow us not to bind the "monitor" code to AWS (via the SDK). Good practice to control traffic and "Versioning"
Handling of incoming data for use cases such as "License Validation" and/or "BlackList", as well as JSON validity and more.
As S3 is a Key-Value storage (and not an FS), there's no support for ops like "Append"; so to generate a large file, a buffer is required
It's a Key-Value storage - sensitive data should be encrypted
This is a requirement - "Key Per Customer"
IAM + KMS
Identity / Auth Management including Encrypt/Decrypt Key Management
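The Lambda checks above can be sketched as a plain handler function. The event shape follows API Gateway's proxy format; the blacklist and license set are invented for the example:

```python
import json

BLACKLIST = {"badcorp"}        # assumed blacklist of customer ids
VALID_LICENSES = {"L-123"}     # assumed set of valid license keys

def handler(event, context=None):
    """Sketch of the Lambda validation step behind API Gateway:
    reject malformed JSON, blacklisted customers, invalid licenses."""
    try:
        body = json.loads(event["body"])           # JSON validity
    except (KeyError, ValueError):
        return {"statusCode": 400, "body": "invalid JSON"}
    if body.get("customer") in BLACKLIST:          # "BlackList" check
        return {"statusCode": 403, "body": "blacklisted"}
    if body.get("license") not in VALID_LICENSES:  # "License Validation"
        return {"statusCode": 403, "body": "invalid license"}
    # Valid record: this is where it would be forwarded to Firehose.
    return {"statusCode": 200, "body": "accepted"}
```

Keeping the checks in a pure function like this also keeps them testable without AWS.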
S3
encrypted
Over
"Timed Batch"
We will have 2 types
"On Demand Batch"
Prepare data for "On Demand" Batch-Processing
This is the thing we were waiting for - Run the Main Algorithm
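A minimal sketch of the two batch types (all function names and the "algorithm" are placeholders, not the real pipeline):

```python
def prepare_data():
    """Stage data for "On Demand" batch processing (placeholder)."""
    return [1, 2, 3]

def run_main_algorithm(data):
    """The job we were waiting for (placeholder: just a sum)."""
    return sum(data)

def timed_batch(interval_seconds, now, last_run):
    """"Timed Batch": run only once the interval has elapsed."""
    if now - last_run >= interval_seconds:
        return run_main_algorithm(prepare_data())
    return None

def on_demand_batch():
    """"On Demand Batch": run immediately when requested."""
    return run_main_algorithm(prepare_data())

print(on_demand_batch())                      # runs right away
print(timed_batch(60, now=100, last_run=90))  # too soon -> skipped
print(timed_batch(60, now=200, last_run=90))  # interval elapsed -> runs
```

The point of the split is that both paths share the same prepare-then-run core; only the trigger differs.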
S3
encrypted
?
Consider
Scenarios still contain "sensitive" data