Creating a Bittorrent Client with Asyncio

Ahmed Abdalla

PyGotham 2017 @ NYC

What is Bittorent?

  • Open, peer to peer protocol for sharing files
  • Robust, distributed protocol

Why build a Bittorrent Client?

 

...Why not?

Did a retreat @ the Recurse Center

www.recurse.com

Who uses Bittorrent?

  • Gaming companies
    • Blizzard, Eve, etc
  • Education
  • Government
  • Entertainment
    • including pirating
  • Tech companies
    • Facebook, Twitter

Agenda

  1. Explain Bittorrent protocol
  2. Define steps for writing our client
  3. Why Asyncio
  4. Look at code

 

Let's download a torrent

file.txt.torrent

Bittorrent

Client

Done!

Our Bittorrent Client

 

 

Write a program that can download a file when given file.torrent as input

$ python client.py file.txt.torrent

   file.txt has finished downloading

How does torrenting work?

Download connection

HTTP GET: Who has this file?

Resp: List of N Peers

Tracker

Peer 1

Peer N

Peer 2

.

.

.

The plan

  1. Read file.txt.torrent & get Tracker info
  2. Request list of peers from tracker
  3. Connect to each peer and download file
  4. For each file piece download, write to disk

 

Why Asyncio?

Solving for:

  • Maintain multiple concurrent connections to peers
  • Network I/O bound application - waiting for data

Asyncio

  • Single process, single threaded approach for concurrent applications
  • Application code yields control at optimal times

Asyncio Building Blocks

  • Coroutine
    • a function that can be paused and resumed
    • Can "yield" data
    • Can be "sent" data
  • Task - scheduler for coroutines
    • Tells coroutines to make an execution 'step'
  • Event loop - A queue of "tasks" to be executed

Sample Event Loop


queue = [
    Task(some_coroutine),
    Task(another_coroutine)
]


while queue:
    task = queue.pop(0)

    task.execute_coroutine()

    if task.is_coroutine_done():
        print('Finished a task')
    else:
        queue.append(task)
        


print('Event loop finished tasks')

Asyncio: Quick API Tour


async def some_coroutine():
    a = 1 + 2
    await asyncio.sleep(1)
    return a

Creating a coroutine

use "async def" to create a coroutine

Asyncio: Quick API Tour Cont'd


import asyncio

loop = asyncio.get_event_loop()

loop.run_until_complete(some_coroutine)

Running a coroutine

wrapped in Task

Asyncio: Quick API Tour Cont'd


import asyncio


async def sleep_a_lot()
    await asyncio.gather(*[some_coroutine() for i range(5)])
    print('Done')


loop = asyncio.get_event_loop()

loop.run_until_complete(sleep_a_lot)

Run multiple coroutine "simultaneously"

 Task 1

Task 2

active

inactive

Cooperative Multitasking

  • Everything runs in one process, under a single thread
  • Tasks "yield" control at optimal times

execution time

import asyncio, sys

async def download(torrent_file):
    # Read and parse ".torrent" file
    torrent = read_torrent(torrent_file)

    # Get peers list from tracker in ".torrent" file
    peer_addresses = await get_peers(torrent)

    # Object to track peer communication/state
    peers = [Peer(addr) for addr in peer_addresses]

    # Wait for all download coroutines to finish
    await asyncio.gather(
        *[peer.download() for peer in peers]  # Producer
    )


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(download(sys.argv[1])
    loop.close()
import asyncio, sys


async def download(torrent_file):
    # Read and parse ".torrent" file
    torrent = read_torrent(torrent_file)

    # Object to track peer communication/state
    peer_addresses = await get_peers(torrent)

    # Queue for storing downloaded file pieces
    file_pieces_queue = asyncio.Queue()

    # Object to coordinate writing file to disk
    file_saver = FileSaver(file_pieces_queue)

    # Object to track peer communication/state
    peers = [Peer(addr, file_pieces_queue) for addr in peer_addresses]

    await asyncio.gather(
        *([peer.download() for peer in peers] +  # Producers
          [file_saver.start()])                  # Consumers
    )

Peer 1

Peer 2

Peer 3

File Saver

Active

Inactive

Our Bittorrent client...

execution time

The plan

  1. Read file.txt.torrent & get Tracker info
  2. Request list of peers from tracker
  3. Connect to each peer and download file
  4. For each file piece download, write to disk

 

Let's open the torrent file

$ cat hello.txt.torrent

d8:announce31:http://10.0.16.36:6969/announce10:created by25:
Transmission/2.84 (14307)13:creation datei1504559097e8:encoding5
:UTF-84:infod6:lengthi35e4:name9 :hello.txt12:piece length
i32768e6:pieces20:�3�b0gŵ�{	*�$K7:privatei0eee%
  • Byte string file
  • File data is "bencoded"

BEncoding

  • Organizes data in a terse format
  • Saves bandwidth
Data type Format Example
Byte string <len>:<str> 4:spam => "spam"
Integer i<num>e i-55e => -55
list l<item>e l2:hie => ["hi"]
dictionary d<key><val>e d2:hi3:byee => {"hi": "bye"}

BDecoding the metafile file

$ cat hello.txt.torrent

d8:announce31:http://10.0.16.36:6969/announce10:created by25:Transmission/2.84
 (14307)13:creation datei1504559097e8:encoding5:UTF-84:infod6:lengthi35e4:name9
:hello.txt12:piece lengthi32768e6:pieces20:�3�b0gŵ�{	*�$K7:privatei0eee%
{b'announce': b'http://10.0.16.36:6969/announce',
 b'created by': b'Transmission/2.84 (14307)',
 b'creation date': 1504559097,
 b'encoding': b'UTF-8',
 b'info': {b'length': 35,
           b'name': b'hello.txt',
           b'piece length': 32768,
           b'pieces': b'\xad3\xd8b0g\xc5\xb5\x17\xaa{\x1a\t\x02*\xed\xbd$K\x13',
           b'private': 0}}

bdecode(   )

File Pieces

The plan

  1. Read file.txt.torrent & get tracker info
  2. Request list of peers from tracker
  3. Connect to each peer and download file
  4. For each file piece download, write to disk

 

The Tracker

The Tracker is an HTTP(S) server that responds to HTTP GET requests.

 

It keeps track of who in the swarm has a torrent.

HTTP GET: Who has this file?

RESP: List of N Peers

Tracker

Tracker: HTTP GET

Params Description
info_hash URL encoded 20-byte SHA1 of the info_dict
peer_id URL encoded 20-byte string ID that identifies our client
port Port our client is listening on
event {started, stopped, completed}
uploaded Total amount uploaded (Bytes) since 'started' event
downloaded Total amount downloaded (Bytes) since 'started' event
left Total amount uploaded (Bytes) since 'started' event
compact Bool. Denotes if client accepts compact list of peers
no_peer_id Bool. Indicates if peer can omit peer_id in peers dict response

URL: http://my-tracker.com/announce?params=params

Aiohttp GET Request

import aiohttp

torrent = get_torrent_file()

async def request_peers(self):
    async with aiohttp.ClientSession() as session:
        resp = await session.get(tracker_url, params=params)
        resp_data = await resp.read()
        peers = bencoder.decode(resp_data)
        return peers
  • Use async packages
  • If you can't, run synchronous code in a thread

Tracker Response

b'd8:completei1e10:downloadedi0e10:incomplete
i1e8:intervali1921e12:min intervali960e5:peers
12:\xc0\xa8c\x01\xe90\xc0\xa8cd\xc8\xd5e'
{
  b'complete': 1,
  b'downloaded': 0,
  b'incomplete': 1,
  b'interval': 1918,
  b'min interval': 959,
  b'peers': b'\xc0\xa8c\x01\xe90\xc0\xa8cd\xc8\xd5\xc0\xa8c\x01\xe90\xc0\xa8cd\xc8\xd5'
}
['192.168.99.100:51413', '33.67.123.201:69596']

The plan

  1. Read file.txt.torrent & get Tracker info
  2. Request list of peers from tracker
  3. Connect to each peer and download file
  4. For each file piece download, write to disk

 

Working with Peers

  • Manage Peer State
    • Choked
    • Interested
  • Handle Peer messages
    • Handshake
    • Keep Alive
    • Choke
    • Unchoke
    • Interested
    • Not Interested
    • Have
    • Bitfield
    • Request
    • Piece

Peer State

  • Choked - Not allowed to send requests.
  • Interested - Wants to send requests (download)

Initial State

  • Choked
  • Not Interested

Cooperative State

  • Unchoked
  • Interested
  1. Wait for Unchoke
  2. Send Interested

Peer State


class Peer(object):
    def __init__(self, host, port, file_queue):
        self.host = host
        self.port = port
        self.file_queue = file_queue

        # Denotes if peer is choking us
        self.peer_choking = True

        # Denotes if we've informed our peer we're interested
        self.am_interested = False


    async def download(self):
        pass

Peer Message Exchange

TCP connection

Unchoke

Interested

Have/Bitfield

Request

Piece

.

.

.

Client

Peer

Client Handshake

Peer Handshake

self.peer_choking = False
self.am_interested = True

Peer Message: Handshake


class Peer(object):
    async def download(self):
        reader, writer = await asyncio.open_connection(
            self.host, self.port
        )

        handshake = b''.join([
            chr(19).encode(),
            b'BitTorrent protocol',
            (chr(0) * 8).encode(),
            info_hash,
            PEER_ID.encode()
        ])

        # Send Handshake
        writer.write(handshake)
        await writer.drain()

        # Read and validate response
        peer_handshake = await reader.read(68)
        self.validate(peer_handshake)

        # Start exchanging messages...
Handshake format
Protocol string length
"Bittorrent protocol"
Reserved bytes
info_hash
peer id

Peer Message Formats

All remaining messages:

 

<message length> <id> <payload>

Peer Message Formats

Message Format
Keep Alive <len=0000>
Choke <len=0001><id=0>
Unchoke <len=0001><id=1>
Interested <len=0001><id=2>
Not Interested
<len=0001><id=3>
Have <len=0005><id=4><piece index>
Bitfield <len=0001+X><id=5><bitfield>
Request <len=0013><id=6><index><begin><length>
Piece <len=0009+X><id=7><index><begin><block>

Peer Message Exchange

class Peer(object):
    async def download(self):
        
        # Start exchanging messages

        buf = b''  # Holds data read from peer

        while True:
            resp = await reader.read(REQUEST_SIZE)
            buf += resp

            while True:
                if len(buf) < 4:
                    break

                msg_message_length = self.get_message_message_length(buf)

                if msg_message_length == 0:
                    # Handle Keep Alive
                    continue

                msg_id = struct.unpack('>b', buf[4:5])

Peer Message Exchange

class Peer(object):
    async def download(self):
                # ... Looping through Peer messages

                if msg_message_length == 0:
                    # Handle Keep Alive
                    continue

                msg_id = struct.unpack('>b', buf[4:5])  # 5th byte is the ID

                if msg_id == 0:
                    # Handle Choke...

                elif msg_id == 1:
                    # Handle Unchoke...
                    await self.send_interested_message()

                elif msg_id == 2:
                    # Handle Interested...

                elif msg_id == 3:
                    # Handle Not interested...

                elif msg_id == 4:
                    # Handle Have...

                elif msg_id == 5:
                    # Handle Bitfield...

                elif msg_id == 7:
                    # Handle Piece...
                    self.file_queue.enqueue(piece_data)

                await self.request_a_piece()

The plan

  1. Read file.txt.torrent & get Tracker info
  2. Request list of peers from tracker
  3. Connect to each peer and download file
  4. For each file piece download, write to disk

 

File Saver

class FileSaver(object):
    def __init__(self, file_queue):
        self.file_queue = file_queue

    async def start(self):
        while True:
            piece = await self.file_queue.get()

            if not piece: # Poison pill
                return

            await self.save(piece)
import asyncio, sys


async def download(torrent_file):
    # Read and parse ".torrent" file
    torrent = read_torrent(torrent_file)

    # Object to track peer communication/state
    peer_addresses = await get_peers(torrent)

    # Queue for storing downloaded file pieces
    file_pieces_queue = asyncio.Queue()

    # Object to coordinate writing file to disk
    file_saver = FileSaver(file_pieces_queue)

    # Object to track peer communication/state
    peers = [Peer(addr, file_pieces_queue) for addr in peer_addresses]

    await asyncio.gather(
        *([peer.download() for peer in peers] +  # Producers
          [file_saver.start()])                  # Consumers
    )

Check it out on Github

https://github.com/SimplyAhmazing/pygotham-bittorrent

Some Bittorrent Projects

  • Support file uploading
  • NAT traversal
  • Support UDP tracker
  • Support sharing of multiple files
  • Discover Peers without a tracker
    • Peer Exchange Protocol (PEX)
    • Distributed Hash Table (DHT)
  • Download Magnet Links
  • Improve download strategy
  • Play torrent videos as they're downloading

Learn more about asyncio

500 Lines or Less

A Web Crawler With asyncio Coroutines

by: Jesse Jiryu Davis & Guido van Rossum

Home Assistant

github.com/home-assistant/home-assistant

await TalkFinished('bye')

Made with Slides.com