Privy Messenger

Bachelor's thesis presentation

Technical University of Cluj-Napoca, Computer Science department.

Presenter: Godra Adam

Supervisor: As. Eng. Sorin Buzura

  • peer-to-peer
  • end-to-end encrypted
  • user hosted
  • open source
  • free to use

Centralized v. Decentralized v. Distributed

Terminology is often loose and differs between domains

[Figure. Categories: open source, E2EE, decentralized, P2P. Applications shown: Meta Messenger, Instagram DM, Telegram, Signal, WhatsApp, Discord, Slack, Snapchat, Element, Skype, Wire, Privy.]

Comparison of privacy features in popular chat applications (users > 1 M.)

Why P2P?

  • No service maintenance cost (users host the system with their own bandwidth and disk storage) => free to use
  • Private data resides on the user's side => users have full control over it
  • No need to trust third parties
  • Censorship resistant: messages can be routed along millions of different paths, and no one can block them all
  • Cyber attack resistant: DDoS attacks cannot target a central bottleneck to shut down all communication

Why E2EE?

  • Most messaging services offer only in-transit encryption, i.e. between sender and server and then between server and recipient; the server can read the messages
  • Most messaging systems are free to use and instead rely on collecting private data for targeted advertising as a source of revenue; this data can be misused (and is).
  • By encrypting with the recipient's public key we can prevent any intermediary from reading our messages.

Why open source?

  • Anyone can look at the source code and verify that what the service claims is true; WYSIWYG
  • Thorough peer review; scrutiny improves security
  • Using open source components allows us to build a service without any licensing cost. Components under copyleft licenses require that the resulting larger application be open source too.

Challenges

  • Identity assignment and verification in a system without a central coordinator
  • Message delivery in an unstructured P2P system without knowledge of the recipient's address
  • Synchronization of user data across multiple devices without a central database
  • Availability

Identity: uniqueness

  • Mapping from a physical person to an online digital alias
  • Must be unique in the whole system
  • Traditionally a central entity enforces the uniqueness of aliases
  • Without it we need another mechanism to enforce uniqueness.
  • One approach is to use large random numbers as IDs: when the ID space is large enough, the probability that two users draw the same ID is negligible (birthday bound)
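The collision risk for random IDs can be estimated with the birthday bound. The sketch below is only an illustration; the function name and the ID sizes are assumptions and are not taken from Privy.

```python
import math

def collision_probability(n_ids: int, bits: int) -> float:
    """Approximate probability that at least two of n_ids uniformly random
    IDs of the given bit length collide (birthday bound):
    p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -(n_ids * (n_ids - 1)) / 2 ** (bits + 1)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(exponent)

# Ten billion users with 256-bit IDs: the probability is vanishingly small.
print(collision_probability(10**10, 256))
# For comparison, 32-bit IDs already collide often with only 65,536 users.
print(collision_probability(2**16, 32))
```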

Identity: verification

  • Traditional centralized systems require proof of ownership of a digital identity by requiring a password and optionally 2FA as well
  • In the absence of a centralized authority to verify passwords, again we need another system for proof of identity
  • Digital signatures: each user has a unique, self-generated public/private key pair. A digital signature provides mathematical proof that the signer holds the private key matching a given public key. That a public key belongs to a particular person must be verified out-of-band, e.g. face-to-face.
  • Public keys are essentially large random numbers, so they can be assumed to be globally unique and therefore satisfy the uniqueness requirement
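Proof of identity with a signature can be pictured as a challenge-response exchange: the verifier sends a random nonce, and the prover signs it with the private key. The sketch below uses textbook RSA with a small fixed toy key (two Mersenne primes) purely to stay self-contained; it has no padding, is not secure, and is not Privy's implementation. All names are hypothetical. It needs Python 3.8+ for the modular inverse via `pow`.

```python
import hashlib
import secrets

# Toy key for illustration only: both factors are known Mersenne primes.
P = 2**89 - 1
Q = 2**127 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent

def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Only the holder of the private exponent D can compute this."""
    return pow(digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Anyone who knows the public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest(message)

# Challenge-response: the verifier picks a fresh nonce, the prover signs it.
nonce = secrets.token_bytes(32)
signature = sign(nonce)
print(verify(nonce, signature))          # True
print(verify(nonce + b"x", signature))   # False
```

A fresh nonce per login attempt prevents an eavesdropper from replaying an old signature.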

Identity: device independence

  • The issue with asymmetric keys is that they are persisted only on the device where they were generated. If the user wishes to log in to their account on another device, they must import their keys from the original device (which defeats the point of logging in on another device).
  • To solve this we must use a modified method for generating keys: a deterministic approach, one that can be reproduced independent of time or place.
  • We must not compromise security of the encryption scheme

Solution

  • Use RSA encryption scheme
  • Use a deterministic, seeded PRNG for generating the prime numbers p and q, instead of drawing them directly from a CSPRNG
  • Use a CSPRNG to generate the seed and use it to create a mnemonic passphrase that the user can memorize
  • The mnemonic can be used to recreate the same seed, hence the same p and q
  • Now key pairs are device independent and security is not compromised, as long as the mnemonic stays secret
  • See BIP-39 (mnemonic seed phrases) and BIP-32 (deterministic key derivation) among the Bitcoin Improvement Proposals
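The idea of regenerating the same RSA key pair from a mnemonic can be sketched as follows. This is a minimal illustration, not Privy's actual implementation: the PBKDF2 salt and iteration count, the SHA-256 counter stream standing in for the seeded PRNG, the fixed Miller-Rabin bases and the small default key size are all assumptions chosen to keep the example short and fast.

```python
import hashlib
from math import gcd

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def seed_from_mnemonic(mnemonic: str, salt: str = "privy-demo") -> bytes:
    """Stretch the phrase into a 64-byte seed (salt and rounds are assumptions)."""
    return hashlib.pbkdf2_hmac("sha512", mnemonic.encode(), salt.encode(), 2048)

class SeededStream:
    """Deterministic bit source: SHA-256(seed || counter) for counter = 0, 1, ..."""
    def __init__(self, seed: bytes):
        self.seed, self.counter = seed, 0

    def randbits(self, bits: int) -> int:
        out = b""
        while len(out) * 8 < bits:
            out += hashlib.sha256(self.seed + self.counter.to_bytes(8, "big")).digest()
            self.counter += 1
        return int.from_bytes(out, "big") >> (len(out) * 8 - bits)

def is_probable_prime(n: int) -> bool:
    """Miller-Rabin test with fixed bases, so the result is reproducible."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d, r = d // 2, r + 1
    for a in SMALL_PRIMES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def next_prime(stream: SeededStream, bits: int) -> int:
    """Draw odd candidates of exactly `bits` bits until one passes the test."""
    while True:
        candidate = stream.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate

def rsa_keypair(mnemonic: str, bits: int = 512, e: int = 65537):
    """Return ((n, e), (n, d)); the same mnemonic always yields the same keys."""
    stream = SeededStream(seed_from_mnemonic(mnemonic))
    while True:
        p, q = next_prime(stream, bits // 2), next_prime(stream, bits // 2)
        phi = (p - 1) * (q - 1)
        if p != q and gcd(e, phi) == 1:
            return (p * q, e), (p * q, pow(e, -1, phi))

public_a, private_a = rsa_keypair("correct horse battery staple")
public_b, private_b = rsa_keypair("correct horse battery staple")
print(public_a == public_b and private_a == private_b)  # True
```

Because every step after the mnemonic is deterministic, any device that knows the phrase can rebuild the key pair without importing key files. A real deployment would use a much larger modulus and a vetted cryptographic library.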

The P2P network

  • Assume we need to deliver a message from A to B, and A does not know the address of B (meaning its IP address)
  • There is no central service with that knowledge to relay the message either
  • There is no central data store where the message can be cached until B fetches it
  • We need a routing algorithm that can deliver messages quickly (for example, flood routing works but scales poorly).
  • Privy uses the IPFS network, which is built on a Kademlia DHT.

IPFS

  • Privy uses IPFS (InterPlanetary File System) as its underlying P2P network
  • IPFS itself uses an implementation of a Kademlia DHT P2P network
  • IPFS is open source
  • IPFS itself does not encrypt the content it stores or publishes; that is left to the applications using IPFS
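Kademlia organizes peers by the XOR distance between their IDs, which is what lets a lookup home in on a target without any central directory. The sketch below illustrates only the metric and the "k closest peers" selection; the peer names and helper functions are hypothetical and this is not the IPFS implementation.

```python
import hashlib

def node_id(name: str) -> int:
    """Derive a 256-bit ID from a name (stand-in for a real peer ID)."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: bitwise XOR interpreted as an integer."""
    return a ^ b

def bucket_index(own_id: int, other_id: int) -> int:
    """Index of the routing-table bucket: position of the highest differing bit."""
    return xor_distance(own_id, other_id).bit_length() - 1

def closest_peers(target: int, peers: list, k: int = 3) -> list:
    """Return the k known peers closest to the target ID."""
    return sorted(peers, key=lambda peer: xor_distance(peer, target))[:k]

peers = [node_id(f"peer-{i}") for i in range(20)]
target = node_id("some-key")
for peer in closest_peers(target, peers):
    print(hex(peer)[:12], "bucket", bucket_index(target, peer))
```

Each lookup step asks the closest known peers for even closer ones, so the number of hops grows logarithmically with the network size.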

IPFS pubsub

  • IPFS has a feature called pubsub
  • IPFS nodes using pubsub can subscribe to topics (unique strings)
  • IPFS nodes can publish messages to topics, and all nodes that subscribed to it will receive the message, if online at the time of publishing.
  • This feature can be used to implement one-to-one as well as group messaging over IPFS.
  • IPFS pubsub is a 'gossipsub' implementation, which unlike floodsub greatly reduces network congestion and redundant message passing while the message propagates through the network
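The publish/subscribe semantics described above can be modeled with a small in-memory broker. This is a conceptual sketch only: the class and method names are hypothetical, it does not use the IPFS API, and it ignores how gossipsub actually propagates messages between peers.

```python
from collections import defaultdict
from typing import Callable

class InMemoryPubSub:
    """Toy broker: a topic is a string, a subscriber is a callback."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[bytes], None]) -> None:
        self._subscribers[topic].append(handler)

    def unsubscribe(self, topic: str, handler: Callable[[bytes], None]) -> None:
        self._subscribers[topic].remove(handler)

    def publish(self, topic: str, message: bytes) -> int:
        """Deliver to everyone subscribed right now; return the delivery count."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)

bus = InMemoryPubSub()
alice_inbox, bob_inbox = [], []

bus.subscribe("group-chat", alice_inbox.append)
bus.subscribe("group-chat", bob_inbox.append)
bus.publish("group-chat", b"hello everyone")

# Bob goes offline: messages published in the meantime never reach him.
bus.unsubscribe("group-chat", bob_inbox.append)
bus.publish("group-chat", b"bob misses this")

print(alice_inbox)
print(bob_inbox)
```

The second publish shows the limitation noted above: a node that is not subscribed at the time of publishing does not receive the message later.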

Message delivery

  • Privy uses IPFS pubsub to send messages
  • Each user is assigned an 'address' which is their topic to listen on for incoming messages
  • When sending a message, the sender publishes the message to the recipient's user address topic
  • A user address should be unique
  • A user address should not reveal who the address belongs to.

User addresses

  • A user address is the SHA-256 hash of the user's public key, appended to the 'privy://' namespacing string (to differentiate from other ongoing communication in the IPFS network)
  • This ensures that knowing a user's public key we can derive their address but not the other way around.
  • This is important because anyone can subscribe to any topic, and even if the communication is encrypted, metadata can be extracted (frequency, time of day etc.)
  • User addresses also have so-called endpoints, like /inbox, /requests... This allows the node to associate different handling methods for different types of messages, much like HTTP API endpoints would.
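The address scheme can be sketched in a few lines. The hex encoding of the hash and the byte representation of the public key are assumptions made for illustration; the slides only specify SHA-256 and the 'privy://' prefix.

```python
import hashlib

NAMESPACE = "privy://"

def user_address(public_key: bytes) -> str:
    """Namespace prefix followed by the SHA-256 hash of the public key (hex)."""
    return NAMESPACE + hashlib.sha256(public_key).hexdigest()

def endpoint(address: str, name: str) -> str:
    """Topic for one message type, e.g. 'inbox' or 'requests'."""
    return f"{address}/{name}"

# Placeholder bytes standing in for a serialized public key.
public_key = b"-----BEGIN PUBLIC KEY----- demo -----END PUBLIC KEY-----"
address = user_address(public_key)
print(endpoint(address, "inbox"))
print(endpoint(address, "requests"))
```

Anyone holding the public key can recompute the topic to publish to, while an observer who only sees the topic cannot invert the hash to recover the key.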

Data synchronization

  • Privy wants to allow users to log in on different devices
  • A user expects their previous conversations to show up on the new device
  • But messages are stored locally on the user device, so we need a way of importing the previous conversations
  • Solution: store conversations in a distributed database, that is automatically synced across instances.

OrbitDB

  • OrbitDB is an eventually consistent, distributed database, built on top of IPFS
  • It uses a CRDT (conflict-free replicated data type): an append-only log that stores the events applied to the database (such as create, update and delete events). The state of the database can be reconstructed by applying all events in the log, in order, to an initially empty database.
  • Ordering in a distributed system is achieved with logical clocks, namely a Lamport clock. Causally related events are guaranteed to be in order. In case of ties between concurrent events, the Last Writer Wins policy is applied.
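The combination of an append-only log, Lamport timestamps and a deterministic tie-break can be sketched as below. This is a simplified model, not OrbitDB's code: the `Entry` and `Replica` names are hypothetical, and breaking ties by writer ID is an assumption standing in for the Last Writer Wins rule.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Entry:
    time: int                  # Lamport timestamp
    writer: str                # replica ID, used to break ties
    op: str                    # "put" or "del"
    key: str
    value: Optional[str] = None

class Replica:
    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.clock = 0
        self.log = set()       # entries are only ever added, never changed

    def write(self, op: str, key: str, value: Optional[str] = None) -> Entry:
        self.clock += 1        # tick before every local event
        entry = Entry(self.clock, self.replica_id, op, key, value)
        self.log.add(entry)
        return entry

    def merge(self, remote_log) -> None:
        for entry in remote_log:
            # Lamport rule: never fall behind a timestamp we have seen
            self.clock = max(self.clock, entry.time)
        self.log |= set(remote_log)

    def state(self) -> dict:
        """Replay the log in (time, writer) order onto an empty database."""
        db = {}
        for entry in sorted(self.log, key=lambda e: (e.time, e.writer)):
            if entry.op == "put":
                db[entry.key] = entry.value
            else:
                db.pop(entry.key, None)
        return db

laptop, phone = Replica("laptop"), Replica("phone")
laptop.write("put", "nickname", "adam")
phone.write("put", "nickname", "godra")     # concurrent with the laptop write
laptop.merge(phone.log)
phone.merge(laptop.log)
print(laptop.state() == phone.state())      # True
```

Because both replicas sort the same set of entries by the same key, they reach the same state regardless of the order in which the logs were exchanged.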

OrbitDB

  • Privy uses OrbitDB to persist data locally, in a way that data automatically synchronizes across multiple devices
  • OrbitDB does not provide encryption by default
  • Privy implements data encryption on the application layer
  • Read and write access is based on digital signature verification

Availability

  • While traditional client-server architectures can provide high availability, P2P systems cannot guarantee it
  • Consider a user who has data persisted locally on device A and tries to log in from device B while A is offline. B now has no way to fetch previous conversations from A.
  • To remedy this problem, Privy allows users to have 'proxy nodes': nodes that host a user's data but do not necessarily belong to that user. The user can select trusted nodes (e.g. that of a friend or family member). Proxying is a two-way relationship: if user B hosts user A's data on their behalf, then user A hosts user B's data on their node.
  • The more proxy nodes a user has, the higher the availability of their data

Architecture

  • Privy consists of three main components:
  • Privy Nodes or Privy Daemons, the backend
  • Privy Router, a sort of middleware
  • Client applications, like Privy Web or Privy CLI, the frontend

Privy nodes

  • A Privy node is a daemon process that is connected to the IPFS network via an internal IPFS node.
  • It is responsible for communicating with other Privy nodes over IPFS
  • It is responsible for persisting data in a local OrbitDB instance.
  • It is responsible for dynamically generating the user's keypair based on their credentials and keeping it in memory, but not on disk
  • There can be one or more nodes running on a device: some may be proxy nodes, others may be other users' private nodes.
  • It exposes an HTTP API over localhost to facilitate communication with the Privy Router

Privy nodes

  • Privy Node is written in the TypeScript programming language
  • TypeScript was chosen because its dependencies, the IPFS node and OrbitDB, have implementations in TypeScript.
  • It uses Express to create an internal HTTP server.

Privy router

  • The Privy router connects the frontend application with the Privy nodes. It takes commands from the frontend and based on the currently logged in user, forwards those commands to the appropriate node.
  • This allows multiple users to log in on the same device.
  • It is responsible for the lifecycle of the Privy node processes, starting and stopping them as needed.
  • It has an internal HTTP server accepting requests only over localhost to facilitate communication with client applications.
  • Privy Router is written in Python, using the FastAPI framework
  • Python was chosen for development speed

Privy CLI

  • Privy CLI is a command line application that facilitates the interaction with the Privy nodes.
  • It is written in Python, using the Typer framework.

Privy Web

  • Privy Web is a graphical client application running in the browser.
  • It is a SPA, using React and TailwindCSS

Deployment

  • The backend, i.e. the Privy router and Privy node can be installed from a Docker image. This ensures compatibility across all platforms.
  • Privy CLI is packaged as an executable and can be installed as such
  • Privy Web is deployed on Netlify and is available on the internet. It can also be served locally from source, using the 'serve' Node.js package.

Testing

  • To simulate multiple computers running multiple nodes, the application was placed inside Docker containers and several instances of them were run at once. The containers used the host machine's network.

Privy Messenger

By Godra Adam
