Piotr Grzesik
dr hab. inż. Dariusz Mrozek, prof. PŚ
Silesian University of Technology
Nanopore sequencing - developed by Oxford Nanopore Technologies, it is a DNA sequencing process that works by monitoring changes in an electrical current caused by a DNA strand passing through a nanopore. The resulting signal is decoded into specific DNA or RNA sequences. This decoding process is called basecalling.
MinION Nanopore - a portable sequencing device released by Oxford Nanopore Technologies in 2014. It is the first device that enables portable sequencing at an affordable price ($1000). It is powered via USB and weighs under 100 g, which makes it usable as a field device.
Serverless computing is a computing paradigm built on simple, stateless functions (also called Functions-as-a-Service) that offer low maintenance overhead and fault tolerance, support massive parallelism, allocate resources on demand, and scale quickly both up and down. An additional benefit of this paradigm is that users pay only for actual function invocations and not for idle time.
Serverless computing is also getting more popular in the literature for bioinformatic purposes:
(Added at the end of 2020)
In the proposed workflow, the first step is uploading FAST5 files from the MinION Nanopore to an S3 bucket. Next, processing is triggered manually: a first Lambda function splits the FAST5 files into batches and schedules multiple Lambda functions that run the basecalling operation and save their results to an S3 bucket as well.
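The splitting and scheduling step can be sketched as follows. This is a minimal illustration and not the implementation used in this work: the function names, the payload layout, and the batch size are hypothetical. The only AWS call used is the boto3 Lambda client's `invoke` with `InvocationType="Event"` (asynchronous invocation); the client is passed in as a parameter so the batching logic can be checked without AWS access.

```python
import json
from typing import Any, Dict, Iterable, List


def split_into_batches(keys: Iterable[str], batch_size: int) -> List[List[str]]:
    """Group S3 object keys of FAST5 files into batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    key_list = list(keys)
    return [key_list[i:i + batch_size] for i in range(0, len(key_list), batch_size)]


def build_payloads(bucket: str, keys: Iterable[str], batch_size: int) -> List[Dict[str, Any]]:
    """Build one event payload per batch (payload layout is an assumption)."""
    return [{"bucket": bucket, "keys": batch}
            for batch in split_into_batches(keys, batch_size)]


def dispatch(lambda_client: Any, function_name: str,
             payloads: List[Dict[str, Any]]) -> int:
    """Asynchronously invoke one basecalling function per batch.

    lambda_client is expected to behave like boto3.client("lambda");
    function_name is a hypothetical name of the basecalling function.
    Returns the number of invocations that were scheduled.
    """
    for payload in payloads:
        lambda_client.invoke(
            FunctionName=function_name,
            InvocationType="Event",  # fire-and-forget, results go to S3
            Payload=json.dumps(payload).encode("utf-8"),
        )
    return len(payloads)
```

Each invoked function would then download its batch, run the basecaller, and write the output back to S3; that part is omitted here because it depends on how the basecaller is packaged.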
FAST5 files with data from sequencing runs containing material of Escherichia coli and Klebsiella pneumoniae
Measurement of samples processed per second by each basecaller and per second per MB of memory for different models
Both Guppy and Bonito were tested
Experiments were run for 256, 512, 1024, 2048, 4096, 6144, 8192 and 10240 (maximum) MB of RAM available to a single Lambda function
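The two reported metrics can be computed as in the sketch below. It assumes that each run yields a number of processed samples and a wall-clock duration in seconds; the function names are hypothetical, and the memory sizes are the Lambda configurations listed above.

```python
from typing import Dict, Tuple

# Lambda memory configurations (in MB) used in the experiments.
MEMORY_SIZES_MB = (256, 512, 1024, 2048, 4096, 6144, 8192, 10240)


def samples_per_second(samples: int, seconds: float) -> float:
    """Raw basecalling throughput of a single run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return samples / seconds


def samples_per_second_per_mb(samples: int, seconds: float, memory_mb: int) -> float:
    """Throughput normalised by the memory assigned to the function."""
    if memory_mb <= 0:
        raise ValueError("memory_mb must be positive")
    return samples_per_second(samples, seconds) / memory_mb


def summarise(runs: Dict[int, Tuple[int, float]]) -> Dict[int, Tuple[float, float]]:
    """Map memory size -> (samples/s, samples/s/MB).

    runs maps a memory size in MB to (samples processed, duration in seconds).
    """
    return {
        memory_mb: (
            samples_per_second(samples, seconds),
            samples_per_second_per_mb(samples, seconds, memory_mb),
        )
        for memory_mb, (samples, seconds) in runs.items()
    }
```

The normalised metric is useful because Lambda pricing scales with configured memory, so it indicates which configuration gives the most throughput for a given amount of allocated memory.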
Samples per second processed by Guppy with Fast model
Samples per second per MB of memory for Guppy fast model
Samples per second processed by Guppy with HAC model
Samples per second per MB of memory for Guppy high accuracy model
In the proposed workflow, the first step is to determine whether cloud offloading can speed up edge processing. The files are then split into batches for edge and serverless processing, based on the theoretical processing speeds of both approaches and on the upload speed. The batches are then processed separately, while the edge device monitors and collects the results from the cloud-based processing.
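One possible way to derive such a split is sketched below. The model is an assumption made for illustration, not the rule used in this work: it treats the upload link as the bottleneck of the cloud path (serverless functions are assumed to scale out enough that compute time is negligible) and sends to the cloud the share of data that lets both paths finish at roughly the same time. The function name and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def plan_split(file_sizes: Sequence[int],
               edge_bytes_per_s: float,
               upload_bytes_per_s: float) -> Tuple[List[int], List[int]]:
    """Return (edge_indices, cloud_indices) for the given FAST5 file sizes.

    Assumed model: the edge device basecalls edge_bytes_per_s of input data,
    and the cloud path is limited only by upload_bytes_per_s. Both paths
    finish together when the cloud share of the data equals
    upload / (edge + upload).
    """
    if edge_bytes_per_s <= 0:
        raise ValueError("edge_bytes_per_s must be positive")
    if upload_bytes_per_s < 0:
        raise ValueError("upload_bytes_per_s must not be negative")

    total = sum(file_sizes)
    cloud_budget = total * upload_bytes_per_s / (edge_bytes_per_s + upload_bytes_per_s)

    edge: List[int] = []
    cloud: List[int] = []
    cloud_bytes = 0
    for index, size in enumerate(file_sizes):
        # Greedily fill the cloud batch until its byte budget is used up.
        if cloud_bytes + size <= cloud_budget:
            cloud.append(index)
            cloud_bytes += size
        else:
            edge.append(index)
    return edge, cloud
```

With no upload bandwidth the cloud budget is zero and every file stays on the edge device, which corresponds to the case where offloading brings no benefit.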
FAST5 files with data from sequencing runs containing material of Escherichia coli and Klebsiella pneumoniae
Jetson Xavier NX as edge device, tested with lowest (10W 2 core) and highest (15W 6 core) power modes
Guppy basecaller was tested
Experiments were run for 128, 256, 512 kB/s upload speeds