References
This workflow relies on tools developed by many different people, so please cite them accordingly. To make this easier, the lists below give the required databases and software and, where available, the associated publications.
Used databases
- SILVA FASTA sequences: SILVA_138.1_SSURef_NR99 downloaded from here (Quast et al. 2013)
- SILVA-to-NCBI taxonomy mapping: taxmap_embl-ebi_ena_ssu_ref_nr99_138.1.txt downloaded from here
- UNITE FASTA sequences: sh_general_release_all_18.07.2023 (Nilsson et al. 2019)
- NCBI taxonomy dump downloaded on 19.10.2023
Used software
The version numbers are the exact versions used to develop this workflow.
- Snakemake v7.32.4 (Mölder et al. 2021)
- Python v3.6.15
- biopython v1.79
- tabulate v0.8.10
- pandas v1.1.5
- matplotlib v3.3.4
- seaborn v0.11.2
- R v4.2
- tidyverse v1.3.2 (Wickham et al. 2019)
- phyloseq v1.42.0 (McMurdie and Holmes 2013)
- microbiome v1.20.0 (Lahti and Shetty 2012)
- NanoPack v1.1.0 (De Coster and Rademakers 2023), which comes with the following tools used in this workflow:
  - NanoStat v1.6.0
  - Chopper v0.6.0
- Pistis v0.3.3 (GitHub repository)
- Porechop v0.2.4 (GitHub repository)
- ITSx v1.1.3 (Bengtsson-Palme et al. 2013)
- Minimap2 v2.24 (Li 2018)
- Kraken2 v2.1.3 (Wood, Lu, and Langmead 2019)
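Because the versions listed above are the exact ones used during development, it can be useful to check a local environment against them before citing or rerunning the workflow. The following is a minimal sketch, not part of the workflow itself: the names `PINNED`, `installed_version`, and `find_mismatches` are hypothetical, and only the Python packages from the list are covered. It uses `importlib.metadata` when available and falls back to `pkg_resources` on older interpreters such as Python 3.6.

```python
# Pinned versions copied from the software list above (Python packages only).
# The dictionary name and the helper functions are illustrative assumptions.
PINNED = {
    "biopython": "1.79",
    "tabulate": "0.8.10",
    "pandas": "1.1.5",
    "matplotlib": "3.3.4",
    "seaborn": "0.11.2",
}


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        # Available in the standard library from Python 3.8 onwards.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Fallback for older interpreters; pkg_resources ships with setuptools.
        import pkg_resources

        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose version differs to a (wanted, found) tuple."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)  # None means the package is not installed
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in sorted(find_mismatches(PINNED).items()):
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

A differing version does not necessarily break the workflow; the check only reports where an environment departs from the versions listed here.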
Reference list
Bengtsson-Palme, Johan, Martin Ryberg, Martin Hartmann, Sara Branco, Zheng Wang, Anna Godhe, Pierre De Wit, et al. 2013. “Improved Software Detection and Extraction of ITS1 and ITS2 from Ribosomal ITS Sequences of Fungi and Other Eukaryotes for Analysis of Environmental Sequencing Data.” Methods in Ecology and Evolution 4 (10): 914–19. https://doi.org/10.1111/2041-210X.12073.
De Coster, Wouter, and Rosa Rademakers. 2023. “NanoPack2: Population-Scale Evaluation of Long-Read Sequencing Data.” Bioinformatics 39 (5): btad311. https://doi.org/10.1093/bioinformatics/btad311.
Lahti, Leo, and Sudarshan Shetty. 2012. “Microbiome R Package.”
Li, Heng. 2018. “Minimap2: Pairwise Alignment for Nucleotide Sequences.” Bioinformatics 34 (18): 3094–3100. https://doi.org/10.1093/bioinformatics/bty191.
McMurdie, Paul J., and Susan Holmes. 2013. “Phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data.” PLOS ONE 8 (4): e61217. https://doi.org/10.1371/journal.pone.0061217.
Mölder, Felix, Kim Philipp Jablonski, Brice Letcher, Michael B. Hall, Christopher H. Tomkins-Tinch, Vanessa Sochat, Jan Forster, et al. 2021. “Sustainable Data Analysis with Snakemake.” F1000Research 10 (April): 33. https://doi.org/10.12688/f1000research.29032.2.
Nilsson, Rolf Henrik, Karl-Henrik Larsson, Andy F S Taylor, Johan Bengtsson-Palme, Thomas S Jeppesen, Dmitry Schigel, Peter Kennedy, et al. 2019. “The UNITE Database for Molecular Identification of Fungi: Handling Dark Taxa and Parallel Taxonomic Classifications.” Nucleic Acids Research 47 (D1): D259–64. https://doi.org/10.1093/nar/gky1022.
Quast, Christian, Elmar Pruesse, Pelin Yilmaz, Jan Gerken, Timmy Schweer, Pablo Yarza, Jörg Peplies, and Frank Oliver Glöckner. 2013. “The SILVA Ribosomal RNA Gene Database Project: Improved Data Processing and Web-Based Tools.” Nucleic Acids Research 41 (D1): D590–96. https://doi.org/10.1093/nar/gks1219.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wood, Derrick E., Jennifer Lu, and Ben Langmead. 2019. “Improved Metagenomic Analysis with Kraken 2.” Genome Biology 20 (1): 257. https://doi.org/10.1186/s13059-019-1891-0.