chewBBACA's

Nomenclature Server

Lab Meeting 31/10/2019

Pedro Cerqueira

Summary

1. Gene-by-Gene based methods

2. chewBBACA

3. Data Privacy

4. Nomenclature Server

5. Nomenclature Server - Roadmap

6. Nomenclature Server - At the moment

7. Final remarks

Is my strain the same as theirs?

Patient A

Patient B

1. Gene-by-Gene based methods

Multilocus Sequence Typing (MLST)

  • Defined scheme of typically 7 housekeeping gene fragments
  • Robust, portable and unified method for characterizing isolates at a molecular level
  • Resolution too low to discriminate between closely related isolates
  • Whole-Genome Multilocus Sequence Typing (wgMLST)
    • Extends MLST to the whole-genome level
    • Set of genes present across a set of genomes representing a species, akin to a pan-genome
  • Core-Genome Multilocus Sequence Typing (cgMLST)
    • Gene-by-gene allelic profiling of core-genome genes in a set of isolates of the same species

PCR & Sanger Sequencing

whole genome shotgun sequencing

1. Gene-by-Gene based methods

  • BLAST Score Ratio (BSR) Based Allele Calling Algorithm
  • Open source solution for the creation of whole genome and core genome MultiLocus Sequence Typing (wg/cgMLST) schemas
  • Performs allele calls on complete or draft genomes resulting from de novo assemblers
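
A minimal sketch of the BLAST Score Ratio idea behind the name: the score of a candidate CDS aligned against a known allele is divided by the score of that allele aligned against itself, and the candidate is assigned to the locus when the ratio clears a threshold. The function names and the 0.6 cut-off are illustrative assumptions, not chewBBACA's internal API.

```python
def blast_score_ratio(hit_score: float, self_score: float) -> float:
    """Return hit_score / self_score, the BLAST Score Ratio (BSR)."""
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return hit_score / self_score


def same_locus(hit_score: float, self_score: float, threshold: float = 0.6) -> bool:
    """Assumed rule: the candidate belongs to the locus when BSR >= threshold."""
    return blast_score_ratio(hit_score, self_score) >= threshold
```

A ratio of 1.0 means the candidate scores as well as the allele scores against itself; lower ratios indicate increasingly divergent sequences.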

2. chewBBACA

  • Requirements
    • Prodigal (for CDS detection)
    • BLAST
    • Python >= 3.0.0 with numpy>=1.14.0 scipy>=0.13.3 biopython>=1.70 plotly>=1.12.9 SPARQLWrapper>=1.8.0 pandas>=0.22.0
    • ClustalW2 (optional)
    • mafft (optional)
  • Installation
git clone https://github.com/B-UMMI/chewBBACA.git
pip install chewbbaca
conda install -c bioconda chewbbaca

2. chewBBACA

  • Profiles
File            Locus 1    Locus 2
genome1.fasta   Allele 2   Allele 5
genome2.fasta   Allele 6   Allele 9
  • Profile Visualization
https://online2.phyloviz.net
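
Profiles like the ones in the table above are typically compared by counting the loci at which two genomes carry different alleles. The sketch below assumes a tab-separated matrix with a `File` column and one column per locus; the inline data and the function names are hypothetical and only mirror the example table.

```python
import csv
import io

# Hypothetical tab-separated profile matrix mirroring the example table.
PROFILES_TSV = (
    "File\tLocus 1\tLocus 2\n"
    "genome1.fasta\t2\t5\n"
    "genome2.fasta\t6\t9\n"
)


def load_profiles(text: str) -> dict:
    """Map each genome file name to its {locus: allele} profile."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    profiles = {}
    for row in reader:
        name = row.pop("File")
        profiles[name] = dict(row)
    return profiles


def allelic_distance(a: dict, b: dict) -> int:
    """Count shared loci at which the two profiles carry different alleles."""
    shared = a.keys() & b.keys()
    return sum(1 for locus in shared if a[locus] != b[locus])


profiles = load_profiles(PROFILES_TSV)
distance = allelic_distance(profiles["genome1.fasta"], profiles["genome2.fasta"])
```

Here the two genomes differ at both loci, so `distance` is 2; a distance of 0 would mean identical profiles over the shared loci.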

2. chewBBACA

Output

What do people share?

  • Allelic profiles?
  • Raw data (reads or assemblies)?

Strict privacy laws may prevent the users from sharing raw data. Unpublished data is also a matter of concern.

  • Schemas?

Schemas can be created with different parameters, requiring the users to share their configurations.

Profiles are generated from a schema that may itself contain private data, which makes it hard to share the schema and for others to reproduce the same results.

3. Data Privacy

Is my strain the same as theirs?

Goals:

  • Share and compare allele calls within the community at a global level

4. Nomenclature Server

  • Provide a public and centralized web service enabling users to:
    • Download the necessary data for the wg/cgMLST schemas
    • Query/submit results to the database
  • Allow users to perform analyses in their local machines

[Diagram: Server and Client (chewBBACA NS). Labels: Allele call results; Schemas; Profiles; Query/Submit]

4. Nomenclature Server

What data does it store?

  • Email
  • Password hash
  • Sequence hash
  • Allelic Profiles (also hash)
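
Storing a hash instead of the sequence lets two parties check whether they hold the same allele by comparing digests. The sketch below is only an illustration: SHA-256 and the normalisation step (strip whitespace, upper-case) are assumptions, since the slides do not say which hash function the server uses.

```python
import hashlib


def sequence_hash(dna: str) -> str:
    """Return a hex digest of a DNA sequence (assumed: SHA-256)."""
    # Normalise so that case and line wrapping do not change the digest.
    normalized = "".join(dna.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


same = sequence_hash("atgaaa\ntag") == sequence_hash("ATGAAATAG")
```

Identical sequences always yield identical digests, so `same` is true here, while any single-base change produces a different digest.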

4. Nomenclature Server

  • Server Front end grooming

5. Nomenclature Server - Roadmap

  • Server API Review and Documentation

5. Nomenclature Server - Roadmap

  • Client Refactoring

  • Download Schemas

  • Download Profiles

  • Synchronize Schemas

  • Upload Schemas

  • Upload metadata

5. Nomenclature Server - Roadmap

  • Is this sequence in NS?
[
  {
    "schemas": {
      "type": "uri",
      "value": "http://127.0.0.1:5000/NS/api/species/13/schemas/6"
    },
    "locus": {
      "type": "uri",
      "value": "http://127.0.0.1:5000/NS/api/loci/11358"
    },
    "alleles": {
      "type": "uri",
      "value": "http://127.0.0.1:5000/NS/api/loci/11358/alleles/3"
    },
    "uniprot": {
      "type": "uri",
      "value": "http://purl.uniprot.org/uniprot/Q70EW3"
    },
    "label": {
      "type": "typed-literal",
      "datatype": "http://www.w3.org/2001/XMLSchema#string",
      "value": "Exotoxin L"
    }
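
The response for this query wraps every field in a `{"type": ..., "value": ...}` object, as SPARQL JSON results do. A client usually wants plain key/value pairs, which the sketch below produces. The sample is a trimmed copy of the response shown on this slide; the helper name `flatten_bindings` is hypothetical.

```python
import json

# Trimmed copy of the response shown on the slide.
RESPONSE = """
[
  {
    "locus": {"type": "uri", "value": "http://127.0.0.1:5000/NS/api/loci/11358"},
    "alleles": {"type": "uri", "value": "http://127.0.0.1:5000/NS/api/loci/11358/alleles/3"},
    "label": {"type": "typed-literal",
              "datatype": "http://www.w3.org/2001/XMLSchema#string",
              "value": "Exotoxin L"}
  }
]
"""


def flatten_bindings(rows: list) -> list:
    """Replace each {"type": ..., "value": ...} wrapper with its bare value."""
    return [{key: field["value"] for key, field in row.items()} for row in rows]


hits = flatten_bindings(json.loads(RESPONSE))
# An empty list would mean the sequence is not known to the server
# (assumption: the endpoint returns no rows for unknown sequences).
found = bool(hits)
```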

6. Nomenclature Server - At the moment

  • What are the FASTA sequences of the alleles belonging to this locus?
{
  "Fasta": [
    {
      "allele_id": {
        "type": "typed-literal",
        "datatype": "http://www.w3.org/2001/XMLSchema#integer",
        "value": "1"
      },
      "nucSeq": {
        "type": "literal",
        "value": "ATGAAAAAAAATACCTTGACTTTGTTATTCCTTGTGTGTGTATCGCTTGCTCTATACACTACTGAGAGTGTCTTTTCAGATACGTACAATACAAATGATGTTAGAAATCCAAGGAACATATATGCTCCTAGATATGATAAAGACGAAATTTTGGATAATAGAAGATTAAAAGAAATATATAATAAAGAAATTATTGAAAAAAATAATATATCGATAAATGCCAAACAAGGAACGCAATTGATTTTTAATACGGATGAAAATACTACAGTTTGGAATGATAACACTTTTAAGAAAGTCATATCTAGTAATCTTTCTCCTTCACAGGAAAGAATGTTTAATGTTGGTGATCATGTGAATATTTTTGCTATAGTAAAGTCATATCATGTTGTATGCAAGGAACAATTCAATTATAGTGATGGGGGAATAATAAAAACAAGTGATGTAAAACCAGAAGAAAAAGCAATTTATATTAATATTTTTGGTGAAAAAGAATTACGAACATTAACAGCTAAAGATAAGATTACCTTTAAAAATAATATTGTAACTCTTCAGGAGATTGATGTTAGACTTAGGAAAAGTTTGATGGGGGACAGCAAAATAAAATTGTATGAGTACGATTCTTTGTATAAAAAAGGGTTTTGGGATATTCATTATAAAGACGGTGGCATTAGACACACCAATTTATTTACTTACCCCGACTATACAGATAATGAAACGATTGATATGAGTAAAGTTAGTCACTTTGATGTTCACTTAAACGAAGATTTTTCTAAAGATTAG"
      }
    }
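
A response of this shape can be turned into a FASTA file for local use. The sketch below assumes the `Fasta` / `allele_id` / `nucSeq` keys shown above; the `>locus_allele` header convention, the function name and the short test sequence are illustrative assumptions.

```python
import textwrap


def to_fasta(locus_id: str, response: dict, width: int = 60) -> str:
    """Format each allele in the response as a FASTA record."""
    records = []
    for entry in response["Fasta"]:
        allele_id = entry["allele_id"]["value"]
        sequence = entry["nucSeq"]["value"]
        # Wrap the sequence to a fixed line width, as is customary for FASTA.
        lines = textwrap.wrap(sequence, width)
        records.append("\n".join([f">{locus_id}_{allele_id}", *lines]))
    return "\n".join(records) + "\n"


# Hypothetical, shortened response used only to show the output shape.
example = {"Fasta": [{"allele_id": {"value": "1"}, "nucSeq": {"value": "ATGAAATAG"}}]}
fasta_text = to_fasta("11358", example, width=4)
```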

6. Nomenclature Server - At the moment

  • What are the UniProt annotations of the alleles belonging to this locus?
{
  "UniprotInfo": [
    {
      "UniprotLabel": {
        "type": "literal",
        "value": "Pyrogenic exotoxin SpeK"
      },
      "UniprotURI": {
        "type": "literal",
        "value": "http://purl.uniprot.org/uniprot/A0A0E1ESR1"
      }
    },
    {
      "UniprotLabel": {
        "type": "literal",
        "value": "Exotoxin L"
      },
      "UniprotURI": {
        "type": "literal",
        "value": "http://purl.uniprot.org/uniprot/Q70EW3"
      }

6. Nomenclature Server - At the moment

  • What kind of metadata is associated with a particular isolate from this species?
[
  {
    "name": {
      "type": "typed-literal",
      "datatype": "http://www.w3.org/2001/XMLSchema#string",
      "value": "SH5857.contigs.length_GCcontent_kmerCov.mappingCov.polished.fasta"
    },
    "country": {
      "type": "uri",
      "value": "http://dbpedia.org/resource/France"
    },
    "country_name": {
      "type": "literal",
      "xml:lang": "en",
      "value": "France"
    },
    "accession": {
      "type": "uri",
      "value": "https://www.ncbi.nlm.nih.gov/sra/ERR1221234"
    },
    "st": {
      "type": "typed-literal",
      "datatype": "http://www.w3.org/2001/XMLSchema#integer",
      "value": "1"
    },
    "date_entered": {
      "type": "typed-literal",
      "datatype": "http://www.w3.org/2001/XMLSchema#dateTime",
      "value": "2019-09-20T15:30:38.801202+01:00"
    },
    "host": {
      "type": "uri",
      "value": "http://purl.uniprot.org/taxonomy/9823"
    },
    "lat": {
      "type": "typed-literal",
      "datatype": "http://www.w3.org/2001/XMLSchema#long",
      "value": "-35.5"
    },
    "long": {
      "type": "typed-literal",
      "datatype": "http://www.w3.org/2001/XMLSchema#long",
      "value": "36.2222"
    },
    "isol_source": {
      "type": "typed-literal",
      "datatype": "http://www.w3.org/2001/XMLSchema#string",
      "value": "armpit"
    }
  }
]
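
The `datatype` field in each typed literal tells a client how to interpret the string in `value`. The sketch below converts a few of the datatypes seen in the record above into Python values. The mapping and the function names are assumptions. Note that `lat` and `long` are typed as `long` in the sample but carry decimal values, so the numeric caster falls back to `float`.

```python
from datetime import datetime

XSD = "http://www.w3.org/2001/XMLSchema#"


def _number(text: str):
    """Parse an integer if possible, otherwise a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


# Assumed mapping from XSD datatypes to Python converters.
CASTERS = {
    XSD + "integer": _number,
    XSD + "long": _number,
    XSD + "dateTime": datetime.fromisoformat,
}


def cast_binding(binding: dict):
    """Convert one {"type", "datatype", "value"} object to a Python value."""
    caster = CASTERS.get(binding.get("datatype"), str)
    try:
        return caster(binding["value"])
    except ValueError:
        # Leave values that do not match their declared type untouched.
        return binding["value"]


st = cast_binding({"type": "typed-literal", "datatype": XSD + "integer", "value": "1"})
lat = cast_binding({"type": "typed-literal", "datatype": XSD + "long", "value": "-35.5"})
entered = cast_binding({
    "type": "typed-literal",
    "datatype": XSD + "dateTime",
    "value": "2019-09-20T15:30:38.801202+01:00",
})
```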

6. Nomenclature Server - At the moment

7. Final Remarks

  • chewBBACA NS (client) aims to provide all the functions needed to download schemas, call alleles and send data to the server in a flexible way.
  • Users can work with their data locally, avoiding some concerns over data privacy in sharing data.
  • Possibility to synchronize schemas without sending any data to the server
  • More wookie!
