Run NanoITS
NanoITS takes de-multiplexed and compressed fastq files as input, with a single fastq.gz file per sample, and generates OTU tables and some summary statistics as output.
Modify the configuration file
To run NanoITS you need to provide some information in the configuration file. In the NanoITS configuration file, also called config.yaml, you can provide the sample names and paths and modify some parameters for the different software used.
To change the default settings and tell Snakemake where your data is located, copy the config/config.yaml file found in the NanoITS folder to the folder in which you want to analyse your data, i.e. like this:
cp <path_to_NanoITS_folder>/config/config.yaml .
You can of course run your analyses in the NanoITS folder you downloaded, but often it is easier to separate software from analyses.
Next, open the config.yaml with an editor, such as nano. There are several things you can modify:
The project name
You can provide the project name in project: "run_v1". Your results will be generated in the folder in which you start the snakemake workflow, under results/<project_name> (results/run_v1 if you use the default settings). Your project name can contain letters, numbers, _ and -. Do not use other symbols, such as spaces or dots.
The mapping file
Here, you need to provide the path to a comma-separated mapping file that describes the samples you want to analyse, i.e. samples_file: "input/mapping.csv". The mapping file itself needs to contain the following columns:
- sample: The name of your sample. This ID will be used to label all files created in subsequent steps. Your sample names should be unique and only contain letters, numbers and -. Do not use other symbols, such as spaces, dots or underscores, in your sample names.
- barcode: The barcode ID. This can be empty, as it is not actively used in the workflow as of now.
- path: Path to the fastq.gz file. You can provide a relative path (i.e. relative to the working directory in which you start the snakemake workflow) or an absolute path (i.e. the location of the file from the root directory (/)). The workflow accepts one file per barcode, so if you have more than one file per sample, merge these files first, for example using the cat command (see the sketch after the example mapping file).
Example mapping file:
sample,barcode,path
bc01,barcode01,/path/barcode01.fastq.gz
bc02,barcode02,/path/barcode02.fastq.gz
...
If you use the example, ensure that the mapping.csv file resides in a folder called input or change the config.yaml accordingly.
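If a sample was sequenced over several runs and therefore has more than one fastq.gz file, concatenate the compressed files into a single file before listing it in the mapping file. A minimal sketch (the file names below are hypothetical):
#merge several gzipped fastq files for one barcode into a single input file
cat barcode01_run1.fastq.gz barcode01_run2.fastq.gz > barcode01.fastq.gz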
The classifiers to use
You can choose which classifiers you want to use in classifiers: ["minimap2", "kraken2"]. Currently, two classifiers are implemented: (a) the alignment-based classifier minimap2 and (b) the k-mer-based classifier kraken2. You can use both or either of the two classifiers.
The markers to investigate
You can select which markers you want to analyse in markers: ["SSU", "ITS1", "ITS2"]. The workflow was developed for primers targeting both the SSU and ITS1/ITS2 regions, but it will also run if only some of these markers are selected. In the future, we plan to extend the workflow to also accept the LSU rRNA gene.
Other parameters
Finally, you can change tool-specific parameters. Several parameters can be adjusted if desired, such as the number of threads to use and the settings for read filtering or classification; the configuration file provides more information on each parameter. The most important parameters to check are the read-filtering settings and the minimum length cutoff (min_its_length) for the SSU and ITS1/ITS2 sequences.
Since several steps of this workflow are quite resource-intensive, we recommend running the workflow on an HPC and setting the number of threads accordingly.
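Putting the settings above together, a minimal config.yaml could look roughly like the sketch below. It only shows the keys described in this section; the exact layout, default values and additional tool-specific options may differ, so always start from the config/config.yaml copied from the NanoITS folder rather than writing one from scratch.
project: "run_v1"                        #results will appear in results/run_v1
samples_file: "input/mapping.csv"        #comma-separated mapping file describing the samples
classifiers: ["minimap2", "kraken2"]     #use both or either classifier
markers: ["SSU", "ITS1", "ITS2"]         #markers to analyse
#min_its_length: 300                     #minimum length cutoff; the value shown is only an illustration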
Run NanoITS
Dry-run
To test whether the workflow is defined properly, do a dry-run first. To do this, change part of the snakemake command as follows:
- Provide the path to where you installed NanoITS after -s
- Provide the path to the edited config file after --configfile
- Provide the path to where you want snakemake to install all program dependencies after --conda-prefix. We recommend installing these into the folder in which you downloaded NanoITS, but you can change this if desired.
#activate conda environment with your snakemake installation, i.e.
mamba activate snakemake_7.32.4
#test if everything runs as expected (edit as described above)
snakemake --use-conda --cores 1 \
-s <path_to_NanoITS_install>/workflow/Snakefile \
--configfile config/config.yaml \
--conda-prefix <path_to_NanoITS_install>/workflow/.snakemake/conda \
--rerun-incomplete --nolock -np
Run NanoITS interactively
If the dry-run was successful, you can run snakemake interactively with the following command. Adjust the number of cores according to your system.
snakemake --use-conda --cores 1 \
-s <path_to_NanoITS_install>/workflow/Snakefile \
--configfile config/config.yaml \
--conda-prefix <path_to_NanoITS_install>/workflow/.snakemake/conda \
--rerun-incomplete --nolock
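Long runs stop if your SSH connection to the HPC drops. As an optional precaution that is not specific to NanoITS, you can start the command above inside a terminal multiplexer such as screen (the session name below is just an example):
#start a named screen session, run the snakemake command above inside it, then detach with Ctrl-a d and reattach later with screen -r nanoits
screen -S nanoits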
Run NanoITS on Crunchomics (UvA-specific option)
If you are a student or staff member at the University of Amsterdam and part of the IBED department, you can also run NanoITS via the Crunchomics HPC. Detailed instructions on how to do this can be found here.
Generate a report
After a successful run, you can create a report with some of the key output files as follows:
snakemake --report report.html \
--configfile config/config.yaml \
-s <path_to_NanoITS_install>/workflow/Snakefile
At the moment the report contains:
- A schematic of the steps executed
- Information about the reads before and after quality filtering (one file for each sample)
- Information about the number of SSU and ITS1 sequences extracted before and after quality filtering
- The results from the different classifiers represented as barplots (total counts and relative abundance) for different taxonomic ranks
- Statistics of the snakemake run
Additionally, you can find some OTU tables in these locations:
- results/<project_name>/tables/{marker}_otu_table.txt: An OTU table based on the taxonomy assignment, including the counts for each sample and classifier. The table is generated once for each marker analysed.
- results/<project_name>/tables/{marker}_otu_table_filtered.txt: A filtered OTU table based on the taxonomy assignment, including the counts for each sample and classifier. The table is generated once for each marker analysed. The filtering discards samples with <=20 reads as well as singletons; the barplots in the report were generated from the filtered OTU table.
- results/<project_name>/classification/{classifier}/{marker}.merged.outmat.tsv: An OTU table based on the taxonomy assignment, including the counts for each sample. One separate table is generated for each marker investigated and classifier used.
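For a quick look at one of these tables on the command line, something like the following works; the path assumes the default project name run_v1 and the SSU marker, so adjust it to your settings:
#show the first lines of the unfiltered SSU OTU table
head results/run_v1/tables/SSU_otu_table.txt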
Generate a report with the old report format
Snakemake v7.32.4 generates reports in a new format, which is a bit more convoluted than the older format. If you prefer the old format, you can create it by running snakemake --report with an older version, such as snakemake v6.8.0.