graph TD
A[Raw data]-->B[Removal of low-quality reads and human reads: cutadapt, sickle, bwa mem]
A-->C[Quality control: FastQC and MultiQC]
B-->C
B-->D[Taxonomic classification of reads: Kaiju]
B-->E[Assembly: generate per-sample contigs with metaSPAdes or MEGAHIT]
E-->F[Annotation of contigs: Prokka]
F-->G[Clustering: removal of gene redundancy with CD-HIT]
G-->H[Quantification of reads for each gene cluster in each sample: featureCounts]
F-->I[Taxonomic affiliation of contigs: DIAMOND and Python parser]