
Workflow

flowchart LR
    classDef greenfill fill:#5B888C,stroke:#333,stroke-width:1,color:#fff;
    classDef dbfill fill:#E2F0F1,stroke:#333,stroke-width:1,color:#333;

    %% Workflow boxes
    A[Raw reads] -->|FASTQ| B[Quality control]
    B -->|FASTQ| C[Quality filtering]
    C -->|FASTQ| D[Mapping to <br> Reference DB]
    D -->|PAF| E[Quality filtering]
    E -->|PAF| F[Count table]

    %% Reference database node
    DB[(16S <br> Reference  <br> Database)] -->|FASTA| D  

    %% Assign classes after nodes exist
    class A,B,C,D,E,F greenfill
    class DB dbfill

    %% Arrow styles (index starts at 0 for first arrow)
    linkStyle 0,1,2,3,4,5 stroke:#5B888C,stroke-width:2,color:#000, fill: none

Quality filtering

flowchart LR
    classDef greenfill fill:#5B888C,stroke:#333,stroke-width:1,color:#fff;
    classDef darkgreenfill fill:#365154,stroke:#333,stroke-width:1,color:#fff;
    classDef dbfill fill:#E2F0F1,stroke:#333,stroke-width:1,color:#333;

    %% Workflow boxes
    A[Raw reads] -->|FASTQ| B[Quality control]
    B -->|FASTQ| C[Quality filtering]
    C -->|FASTQ| D[Mapping to <br> Reference DB]
    D -->|PAF| E[Quality filtering]
    E -->|PAF| F[Count table]

    %% Reference database node
    DB[(16S <br> Reference  <br> Database)] -->|FASTA| D  

    %% Assign classes after nodes exist
    class A,B,C darkgreenfill
    class D,E,F greenfill
    class DB dbfill

    %% Arrow styles (index starts at 0 for first arrow)
    linkStyle 0,1,2,3,4,5 stroke:#5B888C,stroke-width:2,color:#000, fill: none

Quality filtering

Useful Tools:

Porechop (adapter removal)
Chopper
Filtlong
…
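To make concrete what these tools do, here is a minimal base-R sketch of a length and quality filter. The toy reads, the thresholds (`min_len`, `min_q`) and the function names are assumptions made for illustration. Averaging quality through error probabilities is one common convention, and the tools listed above apply their own criteria.

```r
# Toy FASTQ-like records (hypothetical data): sequence plus Phred+33 quality string
reads <- data.frame(
  id   = c("read1", "read2", "read3"),
  seq  = c("ACGTACGTAC", "ACGT", "ACGTACGTAC"),
  qual = c("IIIIIIIIII", "IIII", "##########"),
  stringsAsFactors = FALSE
)

# Mean Phred score of one quality string (Phred+33 encoding).
# Scores are averaged via their error probabilities, not arithmetically.
mean_phred <- function(q) {
  phred <- utf8ToInt(q) - 33
  p_err <- 10^(-phred / 10)
  -10 * log10(mean(p_err))
}

# Keep reads that pass both thresholds (illustrative values only)
filter_reads <- function(reads, min_len = 8, min_q = 10) {
  len <- nchar(reads$seq)
  q   <- vapply(reads$qual, mean_phred, numeric(1))
  reads[len >= min_len & q >= min_q, , drop = FALSE]
}

kept <- filter_reads(reads)
kept$id
```

In this toy set only `read1` passes: `read2` is too short and `read3` has a mean quality of about Q2.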

Read mapping

flowchart LR
    classDef greenfill fill:#5B888C,stroke:#333,stroke-width:1,color:#fff;
    classDef darkgreenfill fill:#365154,stroke:#333,stroke-width:1,color:#fff;
    classDef dbfill fill:#E2F0F1,stroke:#333,stroke-width:1,color:#333;

    %% Workflow boxes
    A[Raw reads] -->|FASTQ| B[Quality control]
    B -->|FASTQ| C[Quality filtering]
    C -->|FASTQ| D[Mapping to <br> Reference DB]
    D -->|PAF| E[Quality filtering]
    E -->|PAF| F[Count table]

    %% Reference database node
    DB[(16S <br> Reference  <br> Database)] -->|FASTA| D  

    %% Assign classes after nodes exist
    class DB,C,D,E darkgreenfill
    class A,B,F greenfill

    %% Arrow styles (index starts at 0 for first arrow)
    linkStyle 0,1,2,3,4,5 stroke:#5B888C,stroke-width:2,color:#000, fill: none

Read mapping

Multi-mappers

Multi-mapping reads (multi-mappers) are reads that map to more than one locus in the reference.
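As a rough illustration, multi-mappers can be found in a PAF file by counting alignment lines per read. The sketch below uses base R and a few made-up alignments. The column names are labels chosen here for the 12 mandatory PAF fields, and the read and reference names are hypothetical.

```r
# Labels for the 12 mandatory PAF columns (names chosen for this sketch)
paf_cols <- c("qname", "qlen", "qstart", "qend", "strand",
              "tname", "tlen", "tstart", "tend",
              "nmatch", "alnlen", "mapq")

# Toy alignments (hypothetical data), one tab-separated PAF line each
paf_lines <- c(
  paste("read1", 1500, 0, 1500, "+", "Pseudomonas_16S",    1530, 10, 1510, 1480, 1500, 60, sep = "\t"),
  paste("read2", 1500, 0, 1500, "+", "Pseudomonas_16S",    1530, 10, 1510, 1450, 1500, 0,  sep = "\t"),
  paste("read2", 1500, 0, 1500, "+", "Flavobacterium_16S", 1520, 5,  1505, 1440, 1500, 0,  sep = "\t"),
  paste("read3", 1400, 0, 1400, "-", "Streptomyces_16S",   1540, 20, 1420, 1390, 1400, 60, sep = "\t")
)

paf <- read.table(text = paf_lines, sep = "\t", col.names = paf_cols,
                  stringsAsFactors = FALSE)

# Count alignments per read; reads with more than one hit are multi-mappers
hits_per_read <- table(paf$qname)
multi_mappers <- names(hits_per_read)[hits_per_read > 1]
multi_mappers
```

Here `read2` aligns to two reference sequences and is therefore reported as a multi-mapper. Real PAF files usually carry additional optional tag columns after the first 12, which this sketch does not handle.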

Multi-mappers

We can use mismatches and differences in read coverage to select the best match.
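A minimal sketch of the mismatch part of this idea, in base R with made-up alignments: for each read, keep the alignment with the fewest non-matching positions. It assumes the PAF columns "number of matches" and "alignment block length" are available as `nmatch` and `alnlen` (names chosen here). Their difference counts mismatches and gaps together, and read coverage is not considered.

```r
# Toy alignments (hypothetical data); read2 and read3 each hit two references
aln <- data.frame(
  qname  = c("read1", "read2", "read2", "read3", "read3"),
  tname  = c("Pseudomonas_16S", "Pseudomonas_16S", "Flavobacterium_16S",
             "Streptomyces_16S", "Pseudomonas_16S"),
  nmatch = c(1480, 1490, 1440, 1390, 1300),
  alnlen = c(1500, 1500, 1500, 1400, 1400),
  stringsAsFactors = FALSE
)

# Non-matching positions in the alignment block (mismatches plus gaps)
aln$ndiff <- aln$alnlen - aln$nmatch

# Sort by read and number of differences, then keep the first row per read
aln  <- aln[order(aln$qname, aln$ndiff), ]
best <- aln[!duplicated(aln$qname), c("qname", "tname", "ndiff")]
best
```

With these toy values, `read2` is assigned to `Pseudomonas_16S` (10 differences instead of 60) and `read3` to `Streptomyces_16S` (10 instead of 100).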

Count table and what comes next

flowchart LR
    classDef greenfill fill:#5B888C,stroke:#333,stroke-width:1,color:#fff;
    classDef darkgreenfill fill:#365154,stroke:#333,stroke-width:1,color:#fff;
    classDef dbfill fill:#E2F0F1,stroke:#333,stroke-width:1,color:#333;

    %% Workflow boxes
    A[Raw reads] -->|FASTQ| B[Quality control]
    B -->|FASTQ| C[Quality filtering]
    C -->|FASTQ| D[Mapping to <br> Reference DB]
    D -->|PAF| E[Quality filtering]
    E -->|PAF| F[Count table]

    %% Reference database node
    DB[(16S <br> Reference  <br> Database)] -->|FASTA| D  

    %% Assign classes after nodes exist
    class F darkgreenfill
    class A,B,C,D,E greenfill
    class DB dbfill

    %% Arrow styles (index starts at 0 for first arrow)
    linkStyle 0,1,2,3,4,5 stroke:#5B888C,stroke-width:2,color:#000, fill: none

Count table and what comes next

Expected taxa composition:
culture	taxon
C1	Pseudomonas
C1	Flavobacterium
C2	Pseudomonas
C2	Flavobacterium
C2	Streptomyces
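For the later analysis steps, the expected composition can be entered as a small data frame. This is a sketch: the object name `expected` is an assumption, chosen to match the name used in the statistics code further down.

```r
# Expected taxa per culture, entered by hand from the table above
expected <- data.frame(
  culture = c("C1", "C1", "C2", "C2", "C2"),
  taxon   = c("Pseudomonas", "Flavobacterium",
              "Pseudomonas", "Flavobacterium", "Streptomyces"),
  stringsAsFactors = FALSE
)

# Number of expected taxa per culture
table(expected$culture)
```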

Count table
taxon	C1_rep1	C1_rep2	C1_rep3	C2_rep1	C2_rep2	C2_rep3
Pseudomonas	900	850	800	300	400	250
Streptomyces	0	0	0	800	3600	850
Flavobacterium	800	600	1200	900	4200	850

Count table
taxon	C1_rep1	C1_rep2	C1_rep3	C2_rep1	C2_rep2	C2_rep3
Pseudomonas	900	850	800	300	400	250
Streptomyces	0	0	0	800	3600	850
Flavobacterium	800	600	1200	900	4200	850
total	1700	1450	2000	2000	8200	1950
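The totals show that sequencing depth differs between samples (C2_rep2 has about four times as many reads as the others), which is why counts are converted to proportions before comparing samples. A minimal base-R sketch follows, with the table entered by hand. The object name `counts_wide` is an assumption that matches the name used in the wrangling code below.

```r
# Count table entered by hand (one row per taxon, one column per sample)
counts_wide <- data.frame(
  taxon   = c("Pseudomonas", "Streptomyces", "Flavobacterium"),
  C1_rep1 = c(900, 0, 800),
  C1_rep2 = c(850, 0, 600),
  C1_rep3 = c(800, 0, 1200),
  C2_rep1 = c(300, 800, 900),
  C2_rep2 = c(400, 3600, 4200),
  C2_rep3 = c(250, 850, 850),
  stringsAsFactors = FALSE
)

# Total reads per sample (the "total" row of the table)
totals <- colSums(counts_wide[, -1])
totals

# Share of each taxon per sample, in percent
rel_abund <- sweep(as.matrix(counts_wide[, -1]), 2, totals, "/") * 100
rownames(rel_abund) <- counts_wide$taxon
round(rel_abund, 1)
```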

Data wrangling

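# Load packages (tidyr, dplyr, forcats, ggplot2)
library(tidyverse)

# Data re-structuring
## counts_wide: the count table (one row per taxon, one column per sample)
## Convert wide to long format
## and add extra columns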
df <- counts_wide |> 
    pivot_longer(
        cols = starts_with("C"),
        names_to = "sample",
        values_to = "count"
    ) |> 
    separate_wider_delim(sample, delim = "_", names = c("culture", "rep"), cols_remove = FALSE) 

## Calculate relative abundance
## and order factors by taxa abundance 
df <- df |> 
    group_by(sample) |> 
    mutate(rel_abund = count / sum(count) * 100) |> 
    ungroup() |> 
    mutate(taxon = fct_reorder(taxon, rel_abund, .fun = sum))

# Plot data
p <- ggplot(df, aes(x = sample, y = rel_abund, fill = taxon)) +
  geom_col(width = 0.9) +
  scale_fill_manual(values = c('#CCEDB1', '#41B7C4', '#144348ff')) +
  labs(x = "", y = "Relative abundance (%)", fill = "Genus") +
  facet_wrap(~culture, scales = "free_x") +
  theme_classic()

Statistics

# Keep only the taxon/culture combinations listed in the
# expected taxa composition table (`expected`)
filtered_counts <- df |> 
  inner_join(expected, by = c("culture", "taxon"))

# Run ANOVA
res_aov <- aov(rel_abund ~ taxon * culture, data = filtered_counts)
summary(res_aov)

              Df Sum Sq Mean Sq F value   Pr(>F)    
taxon          2  924.5   462.2   9.974 0.004151 ** 
culture        1 1354.5  1354.5  29.228 0.000299 ***
taxon:culture  1 1012.6  1012.6  21.851 0.000875 ***
Residuals     10  463.4    46.3                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Run post-hoc test (Tukey's HSD) and keep the significant comparisons
res_tukey <- TukeyHSD(res_aov)
res_tukey$`taxon:culture` |> 
  as.data.frame() |> 
  filter(`p adj` < 0.05)

                                      diff       lwr       upr        p adj
Pseudomonas:C2-Flavobacterium:C1 -38.57986 -57.88592 -19.27379 0.0004143459
Pseudomonas:C2-Pseudomonas:C1    -39.62110 -58.92717 -20.31504 0.0003321700
Pseudomonas:C2-Flavobacterium:C2 -35.70356 -55.00963 -16.39750 0.0007787975
Streptomyces:C2-Pseudomonas:C2    31.59787  12.29181  50.90394 0.0020231948
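To see where the `diff` column comes from, the group means can be recomputed directly from the count table. The base-R sketch below re-enters the counts by hand, and its object names are chosen for illustration. Each `diff` is the difference between two mean relative abundances.

```r
# Counts from the count table above
counts <- rbind(
  Pseudomonas    = c(900, 850, 800, 300, 400, 250),
  Streptomyces   = c(0, 0, 0, 800, 3600, 850),
  Flavobacterium = c(800, 600, 1200, 900, 4200, 850)
)
colnames(counts) <- c("C1_rep1", "C1_rep2", "C1_rep3",
                      "C2_rep1", "C2_rep2", "C2_rep3")

# Relative abundance (%) per sample
rel_abund <- sweep(counts, 2, colSums(counts), "/") * 100

# Mean relative abundance per taxon (rows) and culture (columns)
culture <- sub("_rep[0-9]+$", "", colnames(counts))
group_means <- sapply(
  split(seq_len(ncol(rel_abund)), culture),
  function(idx) rowMeans(rel_abund[, idx, drop = FALSE])
)
round(group_means, 2)

# Difference reported for Pseudomonas:C2-Pseudomonas:C1
diff_pseudo <- group_means["Pseudomonas", "C2"] - group_means["Pseudomonas", "C1"]
diff_pseudo
```

The result is about -39.62, matching the Tukey output: Pseudomonas drops from roughly 50% of reads in C1 to roughly 11% in C2. The Streptomyces/C1 cell is 0 here. In the analysis above that combination is removed by the join with the expected composition.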

Data interpretation

flowchart LR
    classDef greenfill fill:#5B888C,stroke:#333,stroke-width:1,color:#fff;
    classDef dbfill fill:#E2F0F1,stroke:#333,stroke-width:1,color:#333;
    classDef darkgreenfill fill:#365154,stroke:#333,stroke-width:1,color:#fff;

    %% Workflow boxes
    A[Raw reads] -->|FASTQ| B[Quality control]
    B -->|FASTQ| C[Quality filtering]
    C -->|FASTQ| D[Mapping to <br> Reference DB]
    D -->|PAF| E[Quality filtering]
    E -->|PAF| F[Count table]

    %% Reference database node
    DB[(16S <br> Reference  <br> Database)] -->|FASTA| D  

    %% Assign classes after nodes exist
    class A,B,C,D,E,F darkgreenfill
    class DB dbfill

    %% Arrow styles (index starts at 0 for first arrow)
    linkStyle 0,1,2,3,4,5 stroke:#5B888C,stroke-width:2,color:#000, fill: none

Once you have visualized your data and run the statistics, check whether the results fit the hypotheses you made based on your interaction experiments.

Introduction to Data Analysis in Microbial Ecology
