Privy Messenger

Bachelor's Thesis presentation

Technical University of Cluj-Napoca, Computer Science department.

Presenter: Godra Adam

Supervisor: As. Eng. Sorin Buzura

peer-to-peer
end-to-end encrypted
user hosted
open source
free to use

Centralized vs. Decentralized vs. Distributed

Terminology is often loose and differs between domains

Comparison of privacy features in popular chat applications (more than 1 million users)

Why P2P?

No service maintenance cost (users host the system with their own bandwidth and disk storage) => free to use
Private data resides on the user's side => users have full control over it
No need to trust third parties
Censorship resistant: messages can be routed along millions of different paths, and no one can block them all
Cyber attack resistant: DDoS attacks cannot target a central bottleneck to shut down all communication

Why E2EE?

Most messaging services offer only in-transit encryption, i.e. between sender and server and then between server and recipient; the server can read the messages
Most messaging systems are free to use and instead rely on collecting private data for targeted advertising as their source of revenue; this data can be misused (and is).
By using the recipient's keys for encryption we can prevent any intermediary from reading our messages.

Why open source?

Anyone can look at the source code and verify that what the service claims is true; WYSIWYG
Thorough peer review; scrutiny improves security
Using open source components allows us to build the service at no cost. Components under copyleft licenses additionally require that the resulting application be open source too.

Challenges

Identity assignment and verification in a system without a central coordinator
Message delivery in an unstructured P2P system without knowledge of the recipient's address
Synchronization of user data across multiple devices without a central database
Availability

1. Identity: uniqueness

Mapping from a physical person to an online digital alias
Must be unique in the whole system
Traditionally a central entity can enforce uniqueness of aliases
Without it we need another mechanism to enforce uniqueness.
One approach is to use large random numbers as IDs; with a large enough ID space, the probability of two users drawing the same ID is negligible
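How unlikely a conflict is can be estimated with the birthday bound. The sketch below is only an illustration: the function name, the 256-bit ID size and the figure of ten billion users are assumptions, not values taken from Privy.

```python
import math


def collision_probability(num_ids: int, bits: int) -> float:
    """Approximate chance that at least two of num_ids uniformly random
    identifiers of the given bit length coincide (birthday bound)."""
    space = 2.0 ** bits
    # 1 - exp(-n(n-1) / (2N)); expm1 keeps precision for tiny probabilities
    return -math.expm1(-num_ids * (num_ids - 1) / (2.0 * space))


# Hypothetical scenario: ten billion users, 256-bit random IDs.
print(f"{collision_probability(10**10, 256):.3e}")
```

For small ID spaces the same formula gives noticeable collision rates, which is why the size of the random number matters more than the number of users.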

Identity: verification

Traditional centralized systems demand proof of ownership of a digital identity by requiring a password and, optionally, 2FA
In the absence of a centralized authority to verify passwords, we again need another mechanism for proof of identity
Digital signatures: each user has a unique, self-generated public/private key pair. A digital signature gives mathematical proof that the signer holds the private key matching a public key. That a public key belongs to a particular person must be verified out-of-band, e.g. face-to-face.
Public keys are essentially large random numbers, so they can be assumed to be globally unique and therefore satisfy the uniqueness requirement above

Identity: device independence

The issue with asymmetric keys is that they are persisted only on the user's device. If the user wishes to log in on another device, they must first import their keys from the original device (which defeats the point of logging in on another device).
To solve this we must use a modified method for generating keys: a deterministic approach, one that can be reproduced independently of time or place.
We must not compromise the security of the encryption scheme

Solution

Use RSA encryption scheme
Use a seeded, deterministic PRNG for generating the prime numbers p and q, instead of drawing them directly from a CSPRNG
Use a CSPRNG to generate the seed and use it to create a mnemonic passphrase that the user can memorize
The mnemonic can be used to recreate the same seed, hence the same p and q
Now key pairs are device independent and security is not compromised, provided the seed carries enough entropy
See BIP-32 and BIP-39 (Bitcoin Improvement Proposals on deterministic key derivation and mnemonic codes)
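The steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Privy's implementation: the deterministic generator is SHA-256 in counter mode, the mnemonic is stood in for by the hex form of the seed (a real scheme would map the seed to words from a word list), the key size is far too small for real use, and all names (`SeededRandom`, `keypair_from_seed`) are hypothetical.

```python
import hashlib
import math
import secrets


class SeededRandom:
    """Deterministic bit source: SHA-256 in counter mode over a seed."""

    def __init__(self, seed: bytes):
        self._seed = seed
        self._counter = 0
        self._buffer = b""

    def randbits(self, bits: int) -> int:
        nbytes = (bits + 7) // 8
        while len(self._buffer) < nbytes:
            block = hashlib.sha256(self._seed + self._counter.to_bytes(8, "big"))
            self._buffer += block.digest()
            self._counter += 1
        chunk, self._buffer = self._buffer[:nbytes], self._buffer[nbytes:]
        return int.from_bytes(chunk, "big") >> (nbytes * 8 - bits)


def is_probable_prime(n: int, rng: SeededRandom, rounds: int = 40) -> bool:
    """Miller-Rabin test; witnesses come from the same deterministic stream."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % small == 0:
            return n == small
    d, r = n - 1, 0
    while d % 2 == 0:
        d, r = d // 2, r + 1
    for _ in range(rounds):
        a = 2 + rng.randbits(n.bit_length()) % (n - 3)  # witness in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # no square root of 1 path found: n is composite
    return True


def next_prime(rng: SeededRandom, bits: int) -> int:
    """Draw odd candidates with the top bit set until one passes the test."""
    while True:
        candidate = rng.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate, rng):
            return candidate


def keypair_from_seed(seed: bytes, bits: int = 512):
    """Derive a textbook RSA key pair from a seed. 512 bits is a toy size."""
    rng = SeededRandom(seed)
    e = 65537
    while True:
        p = next_prime(rng, bits // 2)
        q = next_prime(rng, bits // 2)
        phi = (p - 1) * (q - 1)
        if p != q and math.gcd(e, phi) == 1:
            n = p * q
            return (n, e), (n, pow(e, -1, phi))  # modular inverse, Python 3.8+


seed = secrets.token_bytes(16)   # the only step that uses a CSPRNG
mnemonic = seed.hex()            # stand-in for a word-list mnemonic
public_a, private_a = keypair_from_seed(seed)                     # "device A"
public_b, private_b = keypair_from_seed(bytes.fromhex(mnemonic))  # "device B"
```

Both derivations yield the same key pair, so only the mnemonic has to travel between devices, never the private key itself.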

The P2P network

Assume we need to deliver a message from A to B, and A does not know the address of B (meaning its IP address)
There is no central service with that knowledge to relay the message
There is no central data store where the message can be cached until B fetches it
We need a routing algorithm that can deliver messages quickly (flood routing, for example, works but scales poorly)
Privy uses the IPFS network, which is built on a Kademlia DHT

IPFS

Privy uses IPFS (InterPlanetary File System) as the underlying P2P network
IPFS itself uses an implementation of a Kademlia DHT P2P network
IPFS is open source
IPFS itself does not offer content encryption; that is left to the applications using IPFS to implement
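The core idea of a Kademlia DHT is that "closeness" between node IDs is measured with the XOR metric, and lookups move step by step towards the peers closest to the target. A minimal sketch of that metric follows; the function names and the tiny 4-bit IDs are illustrative assumptions and not part of the IPFS API.

```python
from typing import List


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two IDs: their bitwise XOR, read as an integer."""
    return a ^ b


def closest_peers(target: int, peers: List[int], k: int = 2) -> List[int]:
    """Return the k known peers whose IDs are nearest to the target ID."""
    return sorted(peers, key=lambda peer: xor_distance(peer, target))[:k]


# Hypothetical 4-bit IDs; real networks use IDs of hundreds of bits.
known_peers = [0b0001, 0b0100, 0b0111, 0b1000]
print(closest_peers(0b0101, known_peers))
```

Because each hop roughly halves the remaining distance, a lookup needs a number of steps that grows logarithmically with network size, which is what makes it scale better than flooding.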

IPFS pubsub

IPFS has a feature called pubsub
IPFS nodes using pubsub can subscribe to topics (unique strings)
IPFS nodes can publish messages to topics, and all nodes that subscribed to it will receive the message, if online at the time of publishing.
This feature can be used to implement one-to-one as well as group messaging over IPFS.
IPFS pubsub is a 'gossipsub' implementation, which, unlike floodsub, greatly reduces network congestion and redundant message passing as a message propagates through the network
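The publish/subscribe semantics described above can be modelled with a small in-memory stand-in. This is not the IPFS API and involves no networking or gossip; the class `ToyPubSub` and its methods are hypothetical and only show that a message reaches exactly the handlers subscribed at publish time.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[bytes], None]


class ToyPubSub:
    """In-memory model of topic-based publish/subscribe."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def unsubscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].remove(handler)

    def publish(self, topic: str, message: bytes) -> int:
        """Deliver to everyone currently subscribed; return the delivery count."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = ToyPubSub()
inbox: List[bytes] = []
bus.subscribe("some-topic", inbox.append)
bus.publish("some-topic", b"hello")       # delivered: the subscriber is "online"
bus.unsubscribe("some-topic", inbox.append)
bus.publish("some-topic", b"missed")      # nobody is listening any more
```

The second message is lost, which mirrors the "if online at the time of publishing" caveat and motivates the availability measures discussed later.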

Message delivery

Privy uses IPFS pubsub to send messages
Each user is assigned an 'address' which is their topic to listen on for incoming messages
When sending a message, the sender publishes the message to the recipient's user address topic
A user address should be unique
A user address should not reveal who the address belongs to.

User addresses

A user address is the SHA-256 hash of the user's public key, appended to the 'privy://' namespacing string (to differentiate from other ongoing communication in the IPFS network)
This ensures that, knowing a user's public key, we can derive their address, but not the other way around.
This is important because anyone can subscribe to any topic, and even if the communication is encrypted, metadata can be extracted (frequency, time of day etc.)
User addresses also have so-called endpoints, like /inbox, /requests... This allows the node to associate different handling methods for different types of messages, much like HTTP API endpoints would.
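The address scheme can be sketched as follows. The hex encoding of the hash and the way the endpoint is joined to the address are assumptions; the slides only state that the SHA-256 hash of the public key follows the 'privy://' namespace.

```python
import hashlib

NAMESPACE = "privy://"


def user_address(public_key: bytes) -> str:
    """Namespace string followed by the SHA-256 hash of the public key (hex is assumed)."""
    return NAMESPACE + hashlib.sha256(public_key).hexdigest()


def endpoint_topic(public_key: bytes, endpoint: str) -> str:
    """Topic for one message type, e.g. 'inbox' or 'requests' (join format is assumed)."""
    return user_address(public_key) + "/" + endpoint.strip("/")


# Placeholder bytes standing in for a serialized public key.
example_key = b"example public key bytes"
print(endpoint_topic(example_key, "/inbox"))
```

Since SHA-256 is a one-way function, an observer who sees the topic name cannot recover the public key from it, while anyone who already holds the key can recompute the topic.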

Data synchronization

Privy wants to allow users to log in on different devices
A user expects their previous conversations to show up on the new device
But messages are stored locally on the user's device, so we need a way of importing the previous conversations
Solution: store conversations in a distributed database that is automatically synced across instances.

OrbitDB

OrbitDB is an eventually consistent, distributed database, built on top of IPFS
It uses a CRDT (conflict-free replicated data type): an append-only log that stores the events applied to the database (such as create, update and delete events). The state of the database can be reconstructed by applying all events in the log in chronological order to an initially empty database.
Chronological ordering in a distributed system is achieved with logical clocks, namely a Lamport clock. Causally related events are guaranteed to be in order. In case of ties between concurrent events, the Last Writer Wins policy is applied.
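A toy model of such a log, assuming a key-value store and a tie-break on the writer's name for events with equal clock values. This is a sketch of the general technique, not OrbitDB's code or API; `Event` and `Replica` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Event:
    clock: int             # Lamport timestamp
    writer: str            # ID of the replica that produced the event
    key: str
    value: Optional[str]   # None represents a delete


class Replica:
    def __init__(self, writer: str) -> None:
        self.writer = writer
        self.clock = 0
        self.log: Set[Event] = set()

    def write(self, key: str, value: Optional[str]) -> Event:
        self.clock += 1    # every local event advances the logical clock
        event = Event(self.clock, self.writer, key, value)
        self.log.add(event)
        return event

    def merge(self, other_log: Set[Event]) -> None:
        """Union the logs and move the clock past everything seen so far."""
        self.log |= other_log
        self.clock = max([self.clock] + [e.clock for e in other_log])

    def state(self) -> Dict[str, str]:
        """Replay all events in (clock, writer) order onto an empty store."""
        result: Dict[str, str] = {}
        for event in sorted(self.log, key=lambda e: (e.clock, e.writer)):
            if event.value is None:
                result.pop(event.key, None)
            else:
                result[event.key] = event.value
        return result


a, b = Replica("a"), Replica("b")
a.write("nickname", "from-a")   # concurrent writes, both with clock 1
b.write("nickname", "from-b")
a.merge(b.log)
b.merge(a.log)
```

After merging in either order both replicas replay the same sorted log, so they converge to the same state without any coordinator.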

OrbitDB

Privy uses OrbitDB to persist data locally, in a way that data automatically synchronizes across multiple devices
OrbitDB does not provide encryption by default
Privy implements data encryption on the application layer
Read and write access is based on digital signature verification

Availability

While traditional client-server architectures can provide high availability, P2P systems cannot guarantee it
Consider a user who has data persisted locally on device A and tries to log in from device B while A is offline. B now has no way to fetch previous conversations from A.
In order to remedy this problem, Privy allows users to have 'proxy nodes': nodes that host a user's data but do not necessarily belong to that user. The user can select trusted nodes (e.g. that of a friend or family member). Proxying is a two-way relationship: if user B hosts user A's data on their behalf, then user A hosts user B's data on their node.
The more proxy nodes a user has, the higher the availability of their data
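The effect of adding proxy nodes can be estimated with a simple model. It assumes that every node holding the data is online independently with the same probability, which is a simplification introduced here for illustration and not a measurement from Privy.

```python
def availability(p_online: float, replicas: int) -> float:
    """Probability that at least one of the replicas is online,
    assuming each is online independently with probability p_online."""
    return 1.0 - (1.0 - p_online) ** replicas


# Hypothetical nodes that are each online half of the time.
for count in (1, 2, 3, 5):
    print(count, availability(0.5, count))
```

Under this model each extra proxy node halves the remaining unavailability when nodes are online half of the time, so a handful of trusted proxies already helps considerably.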

Architecture

Privy consists of three main components:
Privy Nodes or Privy Daemons, the backend
Privy Router, a sort of middleware
Client applications, like Privy Web or Privy CLI, the frontend

Privy nodes

A Privy node is a daemon process that is connected to the IPFS network via an internal IPFS node.
It is responsible for communicating with other Privy nodes over IPFS
It is responsible for persisting data in a local OrbitDB instance.
It is responsible for dynamically generating the user's keypair based on their credentials and keeping it in memory, but not on disk
There can be one or more nodes running on a device: there could be proxy nodes, or there could be other users' private nodes.
It exposes an HTTP API over localhost to facilitate communication with the Privy Router

Privy nodes

Privy Node is written in the TypeScript programming language
TypeScript was chosen because its dependencies, the IPFS node and OrbitDB, have implementations in TypeScript.
It uses Express to create an internal HTTP server.

Privy router

The Privy router connects the frontend application with the Privy nodes. It takes commands from the frontend and, based on the currently logged-in user, forwards those commands to the appropriate node.
This allows multiple users to log in on the same device.
It is responsible for the lifecycle of the Privy node processes, starting and stopping them as needed.
It has an internal HTTP server accepting requests only over localhost to facilitate communication with client applications.
Privy Router is written in Python, using the FastAPI framework
Python was chosen for development speed
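The forwarding decision can be pictured as a lookup from the logged-in user to the localhost port of that user's node. The sketch below uses only the standard library and leaves out FastAPI and process management; the class name, ports and the '/messages' path are hypothetical and not taken from the Privy code base.

```python
from typing import Dict, Optional


class NodeRegistry:
    """Maps each logged-in user to the localhost port of their Privy node."""

    def __init__(self) -> None:
        self._ports: Dict[str, int] = {}
        self.current_user: Optional[str] = None

    def login(self, user: str, port: int) -> None:
        # In the real router this is where a node process would be started.
        self._ports[user] = port
        self.current_user = user

    def logout(self, user: str) -> None:
        self._ports.pop(user, None)
        if self.current_user == user:
            self.current_user = None

    def forward_url(self, path: str) -> str:
        """Build the node URL a frontend command should be forwarded to."""
        if self.current_user is None:
            raise LookupError("no user is logged in")
        port = self._ports[self.current_user]
        return f"http://127.0.0.1:{port}/{path.lstrip('/')}"


registry = NodeRegistry()
registry.login("alice", 6001)
print(registry.forward_url("/messages"))
```

Keeping this mapping in the router is what lets several users share one device while each node stays reachable only over localhost.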

Privy CLI

Privy CLI is a command-line application that facilitates interaction with the Privy nodes.
It is written in Python, using the Typer framework.

Privy Web

Privy Web is a graphical client application running in the browser.
It is a single-page application (SPA) built with React and TailwindCSS

Deployment

The backend, i.e. the Privy router and Privy node, can be installed from a Docker image. This ensures compatibility across all platforms.
Privy CLI is packaged as an executable and can be installed as such
Privy Web is deployed on Netlify and is available on the internet. It can also be served locally from source, using the 'serve' Node.js package.

Testing

In order to simulate multiple computers running multiple nodes, the application was placed inside Docker containers and multiple instances of them were run. The containers used the host machine's network.

Demo time
