Distributed Systems
&
Big Data
lxsameer
CTO at
Free software contributor
This is a very basic talk.
I skipped a huge number of details.
Agenda
- Distributed systems
- Components of a distributed system
- Attributes of a distributed system
- Big data principles & architectures
What is
"Distributed Software"?
A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages.
A program that runs in a distributed system is called distributed software.
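To make "communicate by passing messages" concrete, here is a minimal sketch. It is an assumption-laden toy: two threads in one process stand in for two networked components, and the `worker` and `run_demo` names and the message shape are hypothetical, not taken from any real framework.

```python
import queue
import threading


def worker(inbox, outbox):
    """A hypothetical worker component: it only knows its inbox and outbox."""
    while True:
        message = inbox.get()
        if message is None:  # sentinel meaning "shut down"
            break
        # Reply with a new message instead of touching shared state.
        outbox.put({"id": message["id"], "result": message["value"] ** 2})


def run_demo(values):
    """Send one message per value and collect the replies by message id."""
    inbox, outbox = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()
    for i, value in enumerate(values):
        inbox.put({"id": i, "value": value})
    inbox.put(None)  # ask the worker to stop after the queued messages
    thread.join()
    results = {}
    while not outbox.empty():
        reply = outbox.get()
        results[reply["id"]] = reply["result"]
    return results


if __name__ == "__main__":
    print(run_demo([1, 2, 3]))
```

In a real system the queues would be replaced by sockets or a message broker, and messages could be lost, delayed or reordered, which this sketch does not model.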
Think of it as a gigantic computer.
- You can run heavy processes.
- You can store a huge amount of data in it.
- You're going to have a great amount of memory to use.
A distributed system schema
Component examples
- Coordinator
- Scheduler
- Web apps
- Data storage
- Workers
- Load balancer
- Resource allocator
- Message brokers
- ....
Stateful
Components of a stateful distributed system track the state of the whole system or of each other.
Stateless
Components of a stateless distributed system don't have any clue about the state of other components.
Being stateless is easier, but being stateful is more flexible.
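The trade-off can be sketched with two request routers. This is only an illustration under assumptions: the worker names, `stateless_route` and `StatefulRouter` are hypothetical. The stateless version needs no knowledge of the other components, while the stateful version tracks which workers are up and can react to failures.

```python
import zlib

WORKERS = ["worker-a", "worker-b", "worker-c"]  # hypothetical component names


def stateless_route(key, workers=WORKERS):
    """Pick a worker from the key alone; knows nothing about worker health."""
    return workers[zlib.crc32(key.encode()) % len(workers)]


class StatefulRouter:
    """Tracks which workers are up and routes only to those."""

    def __init__(self, workers):
        self.alive = {w: True for w in workers}

    def mark_down(self, worker):
        # In a real system this would be driven by heartbeats or health checks.
        self.alive[worker] = False

    def route(self, key):
        candidates = [w for w, up in self.alive.items() if up]
        if not candidates:
            raise RuntimeError("no workers available")
        return candidates[zlib.crc32(key.encode()) % len(candidates)]


if __name__ == "__main__":
    router = StatefulRouter(WORKERS)
    router.mark_down("worker-b")
    print(stateless_route("user-42"), router.route("user-42"))
```

The stateless router is simpler to run and replicate, but it may keep sending requests to a dead worker; the stateful one avoids that at the cost of keeping its view of the system up to date.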
Distributed systems guarantees
- Fault tolerance
- Consistency
- Partition tolerance
- Availability
- ....
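NOTE: The CAP theorem states that it is impossible for a distributed computer system to simultaneously provide Availability, Consistency and Partition tolerance. In practice, when a network partition happens, the system has to give up either consistency or availability.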
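The choice CAP forces can be sketched with a toy two-replica store. Everything here is an assumption for illustration: `TwoNodeStore`, the `"CP"` and `"AP"` mode labels and the `partitioned` flag are hypothetical, and real systems are far more nuanced.

```python
class Replica:
    """One copy of a single value."""

    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Two replicas; mode is "CP" or "AP" (labels for this sketch only)."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True means A cannot reach B

    def write(self, value):
        """The client writes through replica A."""
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent by refusing the write (not available).
                raise RuntimeError("unavailable: cannot reach replica B")
            # AP: stay available by accepting the write; replicas diverge.
            self.a.value = value
            return
        self.a.value = value
        self.b.value = value

    def read_from_b(self):
        """Another client reads from replica B."""
        return self.b.value


if __name__ == "__main__":
    ap = TwoNodeStore("AP")
    ap.write(1)
    ap.partitioned = True
    ap.write(2)
    print(ap.a.value, ap.read_from_b())  # the two replicas now disagree
```

During the partition the CP store rejects writes, and the AP store accepts them but serves stale data from the other replica until the two are reconciled.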
What do you mean by
"Big data" ?
Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate to deal with them.
Big data examples
- Facebook's user activity log
- Google search
- Stock exchange market data
- ....
Big data architectures
- Lambda
- Kappa
- IoT-A
- Zeta
Lambda Arch
- Fault tolerant against both human error and hardware failure.
- The batch & speed layers are the most important layers of this architecture.
- The data layer contains immutable, append-only raw data.
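The points above can be sketched as a toy page-view counter. This is a minimal sketch under assumptions, not how any real Lambda system is built: `LambdaCounter` and its method names are hypothetical, and a Python list stands in for the immutable master data set.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda-style page-view counter."""

    def __init__(self):
        self.master_log = []         # immutable, append-only raw events
        self.batch_view = Counter()  # rebuilt from scratch by the batch layer
        self.speed_view = Counter()  # incremental counts since the last batch

    def ingest(self, page):
        # Raw events are only ever appended, never updated or deleted.
        self.master_log.append(page)
        self.speed_view[page] += 1

    def run_batch(self):
        # Recompute the view from all raw data. A bug in the view logic can
        # be fixed and the view rebuilt, because the raw data is untouched.
        self.batch_view = Counter(self.master_log)
        self.speed_view = Counter()

    def query(self, page):
        # The serving side merges the batch view with the speed view.
        return self.batch_view[page] + self.speed_view[page]


if __name__ == "__main__":
    counter = LambdaCounter()
    for page in ["home", "home", "about"]:
        counter.ingest(page)
    counter.run_batch()
    counter.ingest("home")
    print(counter.query("home"))
```

Keeping the raw events immutable is what gives the tolerance to human error: a wrong view can always be thrown away and recomputed from the log.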
Kappa Arch
Kappa Architecture is a simplification of Lambda Architecture. A Kappa Architecture system is like a Lambda Architecture system with the batch processing system removed.
Kappa exists to fix Lambda's problems, mainly the cost of maintaining the same logic in two separate code paths (batch and speed).
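A minimal sketch of the Kappa idea, assuming the log is a plain Python list and the stream job is a plain function: there is a single processing path, and "reprocessing" means replaying the same log through a new version of that function. The names `replay`, `count_v1` and `count_v2` are hypothetical.

```python
def replay(log, process):
    """Feed every event in the log, in order, through one stream function."""
    state = {}
    for event in log:
        process(state, event)
    return state


def count_v1(state, event):
    """First version of the job: count events exactly as they arrive."""
    state[event] = state.get(event, 0) + 1


def count_v2(state, event):
    """Revised job: treat page names case-insensitively."""
    key = event.lower()
    state[key] = state.get(key, 0) + 1


if __name__ == "__main__":
    log = ["Home", "home", "about"]
    print(replay(log, count_v1))
    print(replay(log, count_v2))  # same log, new logic, no batch layer
```

In practice the log would live in something like Kafka, and the replay would run alongside the old job until the new output is ready to take over.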
Famous tools & frameworks
Lambda
- Apache Hadoop
- Apache Spark
- Apache Storm
- Apache Hive
- .....
Kappa
- Apache Spark
- Apache Storm
- Onyx
- Apache Samza
- Kafka Streams
- ......
Data stores:
Datomic, Cassandra, HBase, HDFS, ....
"Story" time !
More info:
Big Data: Principles and Best Practices of Scalable Real-time Data Systems
Distributed Systems: Concepts and Design (5th Edition)
lxsameer@gnu.org
@lxsameer
Distributed Software
By Sameer Rahmani
A brief overview of distributed computing.