Answer:
"Big Data" is a term for data sets or flows that are so large or complex that traditional "handling" are inadequate
"handling" ?!
Data:
Capture
Curation
Analysis
Search
Transfer
Querying
and so much more
Data is "Big Data" if you need to hire a team of smart engineers to just to handle it (distribute...)
And these are just I know of...
Intuition?
Premonition?
Experience?
We have none of those
Sit with experienced people, developers, architects
Listen
Do homework
Research
Understand your expectations & limitations
Listen to Amir
Understand your "Expectations"
Understand your "Limitations"
What do you need to support for the next year (or 2)
Capture "rate"
SLA
Curation "period"
Processing time
Do we have the required knowledge?
Dev support
DevOps support
Ops support
+ Troubleshooting
SECURITY
Understand the principles of Scalability & Big Data
There are a lot of good options
Choices we make now might (and should) be invalidated in the future.
Why?
Also... Spark over EMR + S3 is the devil we know ;)
S3 became the de facto standard for scaling data curation: it is cheap, highly available, easy to use, and has integrations with many processing frameworks
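To give a feel for the API, here is a minimal sketch of S3 access, assuming boto3; the bucket and key names are hypothetical:

import boto3

s3 = boto3.client("s3")

# Hypothetical names, for illustration only
BUCKET = "panaya-curated-data"
KEY = "customer-123/2016/06/events.json"

# Upload a curated object
s3.put_object(Bucket=BUCKET, Key=KEY, Body=b'{"event": "example"}')

# Read it back
obj = s3.get_object(Bucket=BUCKET, Key=KEY)
print(obj["Body"].read())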
Spark over EMR is currently one of the best contenders as a "Big Data Processing" framework - and it continues to be so thanks to a large community of users and feature developers relentlessly making it better
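A minimal PySpark sketch of what such a job might look like on EMR (the input path is hypothetical; EMR exposes S3 via the s3:// scheme):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("panaya-batch").getOrCreate()

# Hypothetical input path under the curated-data bucket
events = spark.read.json("s3://panaya-curated-data/customer-123/2016/06/")

# Example aggregation: count events per type
events.groupBy("event").count().show()

spark.stop()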
High security requirements - in all aspects.
AWS maintains security standards and has a built-in encryption and key management solution we're currently researching.
Perhaps its biggest advantage over other tools is that it reduces the DevOps requirement, as "scaling" is handled internally.
[Architecture diagram: Panaya Server, Panaya DB, Utils → API Gateway → Lambda → Kinesis Firehose → S3 (encrypted), supervised by IAM + KMS]
API Gateway - the "front door" for applications to access data, business logic and functionality in the BackEnd
Lambda - event-driven functions; code runs in response to events from API Gateway (a sketch follows this list)
Kinesis Firehose - auto-magically buffers, then "dumps" to S3 (every few MB / seconds)
S3 (encrypted) - it's not file storage, it's key-value storage; this is a requirement - "Key Per Customer"
IAM + KMS - identity / auth management, including encrypt/decrypt key management
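A minimal sketch of the Lambda piece, assuming a Python handler behind an API Gateway proxy integration; the Firehose stream name is hypothetical:

import json
import boto3

firehose = boto3.client("firehose")
STREAM_NAME = "panaya-ingest"  # hypothetical stream name

def handler(event, context):
    # API Gateway (proxy integration) passes the request body as a string
    body = event.get("body") or "{}"

    # Basic JSON validity check before forwarding
    try:
        record = json.loads(body)
    except ValueError:
        return {"statusCode": 400, "body": "invalid JSON"}

    # Firehose buffers records and "dumps" them to S3 by size / interval
    firehose.put_record(
        DeliveryStreamName=STREAM_NAME,
        Record={"Data": json.dumps(record) + "\n"},
    )
    return {"statusCode": 200, "body": "accepted"}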
API Gateway - "Single Point Of Entry": lets us avoid binding the "monitor" code to AWS (via the SDK); good practice for controlling traffic and "Versioning"
Lambda - handles incoming data for use cases such as "License Validation" and/or "BlackList", as well as JSON validity and more
Kinesis Firehose - as S3 is key-value storage (and not a file system), there is no support for operations like "Append"; to generate a large file, a buffer is required
S3 (encrypted) - it's key-value storage, so sensitive data should be encrypted; this is a requirement - "Key Per Customer" (a sketch follows this list)
IAM + KMS - identity / auth management, including encrypt/decrypt key management
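A minimal sketch of the "Key Per Customer" idea using SSE-KMS; the key ARN and bucket are hypothetical, and in practice the mapping would live in configuration:

import boto3

s3 = boto3.client("s3")

# Hypothetical customer -> KMS key mapping
CUSTOMER_KMS_KEYS = {
    "customer-123": "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
}

def put_encrypted(customer_id, key, data):
    # SSE-KMS: S3 encrypts the object server-side with this customer's key
    s3.put_object(
        Bucket="panaya-curated-data",  # hypothetical bucket
        Key=customer_id + "/" + key,
        Body=data,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId=CUSTOMER_KMS_KEYS[customer_id],
    )

Decryption on read is transparent for callers whose IAM role is allowed to use that customer's key.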
Plus/Minus :)
[Diagram: S3 (encrypted) → Spark over EMR → S3 (encrypted), triggered by a Scheduler]
We will have 2 types of batch:
"Timed Batch" - prepares data for "On Demand" batch processing (runs on a Scheduler)
"On Demand Batch" - this is the thing we were waiting for: run the Main Algorithm (a sketch follows this list)
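A minimal sketch of kicking off the "On Demand Batch" as an EMR step via boto3; the cluster ID and script path are hypothetical:

import boto3

emr = boto3.client("emr")

def run_main_algorithm(cluster_id="j-EXAMPLECLUSTER"):
    # Submit a spark-submit step to an existing EMR cluster
    emr.add_job_flow_steps(
        JobFlowId=cluster_id,
        Steps=[{
            "Name": "main-algorithm",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "spark-submit",
                    "s3://panaya-jobs/main_algorithm.py",  # hypothetical script
                ],
            },
        }],
    )

The "Timed Batch" could be the same call fired from the Scheduler (e.g. a cron-triggered Lambda).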
Consider: scenarios still contain "sensitive" data
[Full pipeline diagram: Server & DB → API Gateway → Lambda → Kinesis Firehose → S3 (encrypted) → Spark over EMR → S3 (encrypted), supervised by IAM + KMS]