Distributed File System

Topics

  1. DFS
  2. GFS
  3. HDFS

Philosophy

Distribute the storage while keeping local-like performance and experience.

A distributed file system (DFS) is a file system that spans multiple file servers or multiple locations.

DFS

Img Url: https://scaleyourapp.com/wp-content/uploads/2022/01/distributed-client-server-min-1200x675.png

  • Access to the same data from multiple locations
  • Transparent local access
  • Location independence
  • Scale-out capabilities
  • Fault tolerance

Why DFS?

  • Network File System (NFS)
  • Google File System (GFS)
  • Hadoop Distributed File System (HDFS)
  • Colossus

Different DFS

  • Distribution: A DFS splits datasets and spreads the pieces across multiple nodes or clusters. Each node contributes its own computing power, so the pieces can be processed in parallel.
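
As a rough illustration of the distribution idea in the bullet above, the sketch below splits a byte string into fixed-size chunks, assigns them to nodes round-robin, and then lets each node work on its own chunks in parallel. The function names, node names, and the round-robin policy are assumptions made for this example; real systems such as GFS or HDFS use their own placement logic.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut the dataset into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def assign_round_robin(chunks: list[bytes], nodes: list[str]) -> dict[str, list[bytes]]:
    """Hypothetical placement policy: chunk i goes to node i mod len(nodes)."""
    placement: dict[str, list[bytes]] = {node: [] for node in nodes}
    for index, chunk in enumerate(chunks):
        placement[nodes[index % len(nodes)]].append(chunk)
    return placement


def count_bytes_on_node(chunks: list[bytes]) -> int:
    """Stand-in for per-node work: each node only touches its own chunks."""
    return sum(len(chunk) for chunk in chunks)


def parallel_total(placement: dict[str, list[bytes]]) -> int:
    """Run the per-node work concurrently and combine the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(placement))) as pool:
        return sum(pool.map(count_bytes_on_node, placement.values()))


if __name__ == "__main__":
    chunks = split_into_chunks(b"abcdefghij", chunk_size=3)
    placement = assign_round_robin(chunks, ["node-a", "node-b"])
    print(placement)
    print(parallel_total(placement))
```

Threads stand in for separate machines here; the point is only that each node's work depends on its own chunks, so the partial results can be computed independently and then combined.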

How does it work?

  • Replication: A DFS also copies the same pieces of data onto multiple nodes or clusters. If one copy becomes unavailable, another copy can still serve the data, which is how the DFS achieves fault tolerance.
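
The replication bullet above can be sketched in the same spirit. The example below places each chunk on several distinct nodes and then checks which chunks remain readable after some nodes fail. The names `place_replicas` and `readable_chunks`, the chunk and node identifiers, and the "next nodes in the ring" placement rule are all assumptions for illustration, not the policy of any particular DFS.

```python
def place_replicas(chunk_ids: list[str], nodes: list[str],
                   replication_factor: int) -> dict[str, list[str]]:
    """Hypothetical policy: chunk i is stored on `replication_factor`
    consecutive nodes starting at position i, wrapping around the node list."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication_factor must be between 1 and len(nodes)")
    placement: dict[str, list[str]] = {}
    for index, chunk_id in enumerate(chunk_ids):
        placement[chunk_id] = [
            nodes[(index + offset) % len(nodes)]
            for offset in range(replication_factor)
        ]
    return placement


def readable_chunks(placement: dict[str, list[str]],
                    failed_nodes: set[str]) -> set[str]:
    """A chunk stays readable while at least one of its replicas is on a live node."""
    return {
        chunk_id
        for chunk_id, holders in placement.items()
        if any(holder not in failed_nodes for holder in holders)
    }


if __name__ == "__main__":
    placement = place_replicas(["c0", "c1", "c2"], ["n0", "n1", "n2"], 2)
    print(placement)
    print(sorted(readable_chunks(placement, {"n1"})))
    print(sorted(readable_chunks(placement, {"n0", "n1"})))
```

With two replicas per chunk, losing any single node leaves every chunk readable; losing two nodes can make a chunk unavailable, which is why the replication factor is a tuning choice between storage cost and fault tolerance.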

How does it work?

NFS

Img Url: https://ars.els-cdn.com/content/image/3-s2.0-B9780124201583000186-f18-01-9780124201583.jpg

GFS

Stanford Deck: Link

HDFS

U Waterloo Deck: Link

HDFS Architecture

Img Url: https://www.interviewbit.com/blog/wp-content/uploads/2022/06/HDFS-Architecture-1024x550.png

Applications of HDFS

Industry        Use Cases
Finance         Stock predictions
E-Commerce      Product recommendations
Social Media    Social graph analysis
Ads             Conversions
Healthcare      Patient data analysis

Thank You!

Questions?
