Distributed Systems & Big Data

lxsameer

CTO at 

Free software contributor

This is a very basic talk

I skipped a lot of details.

Agenda

  • Distributed systems
  • Components of a distributed system
  • Attributes of a distributed system
  • Big data principles & architectures

What is "Distributed Software"?

 

A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages.

 

A program that runs in a distributed system is called distributed software.
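
A minimal sketch of "communicate and coordinate by passing messages", assuming threads stand in for networked computers and `queue.Queue` objects stand in for network links. The names `worker`, `run_demo`, `inbox`, and `outbox` are illustrative and not from the talk.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """A component that only reacts to messages from its inbox."""
    while True:
        message = inbox.get()
        if message["type"] == "stop":
            break
        if message["type"] == "square":
            outbox.put({"type": "result", "value": message["value"] ** 2})


def run_demo(numbers):
    """Coordinator: send work as messages, then collect the replies."""
    inbox, outbox = queue.Queue(), queue.Queue()
    node = threading.Thread(target=worker, args=(inbox, outbox))
    node.start()
    for n in numbers:
        inbox.put({"type": "square", "value": n})
    # A single worker handles messages in FIFO order, so replies keep their order.
    results = [outbox.get()["value"] for _ in numbers]
    inbox.put({"type": "stop"})
    node.join()
    return results
```

The coordinator never reads the worker's variables directly; everything it learns arrives as a message, which is the property that carries over to real networks.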

Think of it as a gigantic computer. 

  • You can run heavy processes.
  • You can store a huge amount of data in it.
  • You have a large amount of memory to use.

A distributed system schema

Component examples

  • Coordinator
  • Scheduler
  • Web apps
  • Data storage
  • Workers
  • Load balancer
  • Resource allocator
  • Message brokers
  • ....

Stateful

Components of a stateful distributed system track the state of the whole system or of each other.

Stateless

Components of a stateless distributed system know nothing about the state of other components.

Being stateless is easier, but being stateful is more flexible.

Distributed systems guarantees

  • Fault tolerance
  • Consistency
  • Partition tolerance
  • Availability
  • ....

NOTE: The CAP theorem states that it is impossible for a distributed computer system to simultaneously provide all three of availability, consistency, and partition tolerance.
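
A toy model of the choice the CAP theorem forces, assuming only two replicas and a link that can be cut. `TwoNodeStore` and its `"CP"`/`"AP"` modes are made up for this sketch; real systems are far more nuanced.

```python
class Replica:
    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Two replicas joined by a link that can be cut to simulate a partition."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, replica: Replica, value) -> bool:
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            replica.value = other.value = value  # link is up: update both
            return True
        if self.mode == "CP":
            return False  # keep consistency by refusing the write
        replica.value = value  # keep availability; replicas now disagree
        return True
```

Once `partitioned` is set, a `"CP"` store rejects writes (it gives up availability) while an `"AP"` store accepts them and lets the two replicas diverge (it gives up consistency).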

What do you mean by "Big data"?

Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate to deal with them.

 

Big data examples

  • Facebook's user activity logs
  • Google search
  • Stock exchange market data
  • ....

Big data architectures

  • Lambda
  • Kappa
  • IoT-A
  • Zeta

Lambda Arch

  • Fault tolerant against both human error and hardware failure.
  • The batch and speed layers are the most important layers of this architecture.
  • The data layer contains immutable, append-only raw data.
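
The bullets above can be sketched in a few lines, assuming a single process and a page-view counter as the workload. `LambdaCounter` and its method names are hypothetical; a real deployment would put each layer on separate infrastructure.

```python
from collections import Counter


class LambdaCounter:
    """Page-view counts served from a batch view plus a speed view."""

    def __init__(self):
        self.raw_log = []            # immutable, append-only raw data
        self.batch_view = Counter()  # rebuilt from scratch by the batch layer
        self.speed_view = Counter()  # counts for events since the last batch run

    def ingest(self, page: str) -> None:
        self.raw_log.append(page)    # events are only ever appended
        self.speed_view[page] += 1   # speed layer updates incrementally

    def run_batch(self) -> None:
        # Recompute everything from the raw data, replacing the old view.
        self.batch_view = Counter(self.raw_log)
        self.speed_view = Counter()  # the batch view now covers these events

    def query(self, page: str) -> int:
        # Serving step: merge the two views.
        return self.batch_view[page] + self.speed_view[page]
```

Because `run_batch` always starts from the untouched raw log, a bug in an earlier view can be corrected by fixing the code and recomputing, which is one way to read the "human error" bullet.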

Kappa Arch

Kappa Architecture is a simplification of Lambda Architecture. A Kappa Architecture system is like a Lambda Architecture system with the batch processing system removed.

Kappa exists to fix Lambda's problems.
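
A minimal sketch of the Kappa idea, assuming an in-memory list plays the role of a replayable event log. `KappaCounter` and `build_view` are illustrative names, not part of any framework.

```python
class KappaCounter:
    """One stream processor over a replayable log; no separate batch layer."""

    def __init__(self):
        self.log = []  # append-only event log

    def append(self, event: str) -> None:
        self.log.append(event)

    def build_view(self, process) -> dict:
        """Replay the whole log through `process` to build a fresh view."""
        view = {}
        for event in self.log:
            key = process(event)
            view[key] = view.get(key, 0) + 1
        return view
```

When the processing logic changes, for example from `lambda e: e` to `str.lower`, the same code path replays the log to produce the new view, so there is only one implementation to maintain instead of a batch job and a streaming job.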

Famous tools & frameworks

Lambda

  • Apache Hadoop
  • Apache Spark
  • Apache Storm
  • Apache Hive
  • .....

Kappa

  • Apache Spark
  • Apache Storm
  • Onyx
  • Apache Samza
  • Kafka streams
  • ......

Data stores:

Datomic, Cassandra, HBase, HDFS, ....

"Story" time !

More info:

Big Data: Principles and Best Practices of Scalable Real-time Data Systems

Distributed Systems: Concepts and Design (5th Edition)

lxsameer@gnu.org

@lxsameer

Distributed Software

By Sameer Rahmani

A brief overview of distributed computing.
