Run NanoITS

NanoITS will take de-multiplexed and compressed fastq files as input and generate OTU tables and some summary statistics as output.

As input, NanoITS takes a single fastq.gz file per sample.

Modify the configuration file

To run NanoITS you need to provide some information in the configuration file. In the NanoITs configuration file, also called config.yaml, you can provide the sample name, sample path and modify some parameters for the different software used.

To change the default settings and tell Snakemake where your data is located copy the config/config.yaml file found in the NanoITs folder to the folder in which you want to analyse your data, i.e. like this:

cp <path_to NanoITS_folder>/config/config.yaml .

You can of course run your analyses in NanoITS folder you downloaded, but often its easier to separate software from analyses.

Next, open the config.yaml with an editor, such as nano. There are several things you can modify:

The project name

You can provide the project name in project: "run_v1". Your results will be generated in the folder you start the snakemake workflow in and the results will be generated in results/<project_name> (results/run_v1 if you use the default settings). Your project name can contain letters, numbers, _ and -. Do not use other symbols, such as spaces or dots.

The mapping file

Here, you need to provide the path to a comma-separated mapping file that describes the samples you want to analyse, i.e. samples_file: "input/mapping.csv". The mapping file itself needs to contain the following columns:

sample: The names of your sample. This id will be used to label all files created in subsequent steps. Your sample names should be unique and only contain letters, numbers and -. Do not use other symbols, such as spaces, dots or underscores in your sample names.
barcode: The barcode ID. Can be empty as it is not actively used in the workflow as of now
path: Path to the fastq.gz files. You can provide the relative path (i.e. relative to the working directory you start the snakemake workflow in) or absolute path (i.e. the location of a file or directory from the root directory(/)). The workflow accepts one file per barcode, so if you have more than one file merge these files first, for example using the cat command.

Example mapping file:

sample,barcode,path
bc01,barcode01,/path/barcode01.fastq.gz
bc02,barcode02,/path/barcode02.fastq.gz
...

If you use the example, ensure that the mapping.csv file resides in a folder called input or change the config.yaml accordingly.

The classifiers to use

You can choose what classifiers you want to use in classifiers: ["minimap2", "kraken2"]. Currently, two classifiers are implemented: (a) the alignment-based classifier minimap2 and (b) the kmer-based classifier kraken2. You can use both or either of the two classifiers.

The markers to investigate

You can select what markers you want to analyse in markers: ["SSU", "ITS1", "ITS2"]. The workflow was developed for primers targeting both the SSU and ITS1/ITS2 but the workflow will also run for either option selected and we plan to in the future extend the workflow to also accept the LSU rRNA gene.

Other parameters

Finally, you can change tool specific parameters: If desired, there are several parameters that can be changed by the user, such as the numbers of threads to use, the settings for the read filtering or the classification and so on. The configuration file provides more information on each parameter.

The most important parameters to check out are the settings for the filtering of reads and the minimum length cutoff (min_its_length) for the SSU and ITS1/ITS2 sequences.

Since several steps of this workflow are quite resource intensive, we recommend running this workflow on an HPC and set the numbers of threads accordingly.

Run NanoITS

Dry-run

To test whether the workflow is defined properly do a dry-run first. To do this, change part of the snakemake command as follows:

Provide the path to where you installed NanoITS after --s
Provide the path to the edited config file after --configfile
Provide the path to where you want snakemake to install all program dependencies after --conda-prefix. We recommend to install these into the folder in which you downloaded NanoITS but you can change this if desired

#activate conda environment with your snakemake installation, i.e. 
mamba activate snakemake_7.32.4

#test if everything runs as extended (edit as described above)
snakemake --use-conda --cores 1 \
  -s <path_to_NanoITS_install>/workflow/Snakefile \
  --configfile config/config.yaml \
  --conda-prefix <path_to_NanoITS_install>/workflow/.snakemake/conda  \
  --rerun-incomplete --nolock -np

Run NanoITS interactively

If the dry-run was successful you can run snakemake interactively with the following command. Adjust the cores according to your system.

snakemake --use-conda --cores 1 \
  -s <path_to_NanoITS_install>/workflow/Snakefile \
  --configfile config/config.yaml \
  --conda-prefix <path_to_NanoITS_install>/workflow/.snakemake/conda \
  --rerun-incomplete --nolock

Run NanoITS on Crunchomics (Uva-specific option)

If you are a student or staff at the University of Amsterdam and part of the IBED department, you can also run NanoITS via the Crunchomics HPC. Detailed instructions how to do thos can be found here.

Generate a report

After a successful run, you can create a report with some of the key output files as follows:

snakemake --report report.html \
  --configfile config/config.yaml \
  -s <path_to_NanoITS_install>/workflow/Snakefile

At the moment the report contains:

A schematic of the steps executed
Information about the reads before and after quality filtering (one file for each sample)
Information about the number of SSU and ITS1 sequences extracted before and after quality filtering
The results from the different classifiers represented as barplots (total counts and relative abundance) for different taxonomic ranks
Statistics of the snakemake run

Additionally, you can find some OTU tables in these locations:

results/<project_name>/tables/{marker}_otu_table.txt: An OTU table based on the taxonomy assignment and including the counts for each sample and classifier. The table is generated once for each marker generated.
results/<project_name>/tables/{marker}_otu_table_filtered..txt: A filtered OTU table based on the taxonomy assignment and including the counts for each sample and classifier. The table is generated once for each marker generated. The filtering discards samples with =<20 reads and singletons and the barplots in the reports where generated from the filtered OTU table.
results/<project_name>/classification/{classifier}/{marker}.merged.outmat.tsv: An OTU table based on the taxonomy assignment and including the counts for each sample. One separate table is generated for each marker investigated and classifier used.

Generate a report with the old report format

Snakemake v7.32.4 generates reports in a new format, which is a bit more convoluted than the older format. If you prefer the old format, you can create if running snakemake –report with an older version, such as snakemake v6.8.0.