Workflow analysis

Overview

Genomic Workflow

The ABRomics genomic workflow runs on Galaxy France and is launched from the ABRomics platform. It processes paired-end Illumina sequencing data from bacterial isolates in four main steps: quality and contamination control, genome assembly, genome annotation, and AMR gene detection.

Quality and Contamination Control (v1.1.9)

This initial step checks the quality of the raw paired-end Illumina reads, trims them, and screens them for contamination.

Key Steps:

Quality control and trimming
- fastp (Chen et al., 2018) for quality control and trimming
Taxonomic assignment of trimmed reads
- Kraken2 (Wood et al., 2019) for taxonomic assignment
- Bracken (Lu et al., 2017) to re-estimate abundance at the species level
- Recentrifuge (Martí, 2019) to generate a Krona chart
Aggregating outputs into a single JSON file
- ToolDistillator (ABRomics consortium, 2023) to extract information from the different tool outputs and aggregate it into parsable JSON files
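The fastp JSON report (report.json) is the main quality-control output of this step. The minimal Python sketch below summarizes read retention from it. It assumes fastp's JSON layout, with `summary.before_filtering` and `summary.after_filtering` sections that each hold `total_reads` and `q30_rate`. The function name and returned keys are illustrative only and are not part of ToolDistillator.

```python
import json


def fastp_read_retention(report_text: str) -> dict:
    """Summarize read retention from the text of a fastp JSON report.

    Assumes fastp's layout: summary.before_filtering and summary.after_filtering,
    each containing total_reads and q30_rate. Names here are illustrative.
    """
    summary = json.loads(report_text)["summary"]
    before = summary["before_filtering"]
    after = summary["after_filtering"]
    total_before = before["total_reads"]
    # Fraction of reads that survived trimming and filtering.
    retained = after["total_reads"] / total_before if total_before else 0.0
    return {
        "reads_before": total_before,
        "reads_after": after["total_reads"],
        "fraction_retained": round(retained, 4),
        "q30_rate_after": after["q30_rate"],
    }


# Usage with a tiny hand-written report (values are illustrative only).
example = json.dumps({
    "summary": {
        "before_filtering": {"total_reads": 1000, "q30_rate": 0.90},
        "after_filtering": {"total_reads": 950, "q30_rate": 0.95},
    }
})
print(fastp_read_retention(example))
```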

Outputs:

Quality control:
- Quality report
- Trimmed reads
Taxonomic assignment:
- Tabular report of identified species
- Tabular file of reads assigned to taxonomic levels
- Krona chart illustrating the species diversity of the sample
Aggregating outputs:
- JSON file with information about the outputs of fastp, Kraken2, Bracken, Recentrifuge
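For an isolate, the Bracken species report is typically used to check that a single species dominates the sample. The sketch below assumes Bracken's standard tab-separated output columns (`name`, `taxonomy_id`, `taxonomy_lvl`, `kraken_assigned_reads`, `added_reads`, `new_est_reads`, `fraction_total_reads`). The 0.9 purity cut-off is a hypothetical example, not a parameter of the ABRomics workflow.

```python
import csv
import io


def dominant_species(bracken_tsv: str, min_fraction: float = 0.9):
    """Return (species, fraction, passes) for the most abundant species.

    min_fraction is a hypothetical purity cut-off chosen for illustration.
    """
    rows = list(csv.DictReader(io.StringIO(bracken_tsv), delimiter="\t"))
    if not rows:
        raise ValueError("empty Bracken report")
    # Pick the species with the largest re-estimated share of reads.
    top = max(rows, key=lambda row: float(row["fraction_total_reads"]))
    fraction = float(top["fraction_total_reads"])
    return top["name"], fraction, fraction >= min_fraction


# Hand-written example report (values are illustrative only).
report = (
    "name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\tadded_reads\tnew_est_reads\tfraction_total_reads\n"
    "Escherichia coli\t562\tS\t9000\t300\t9300\t0.93000\n"
    "Klebsiella pneumoniae\t573\tS\t600\t100\t700\t0.07000\n"
)
print(dominant_species(report))
```

A sample with a second species at a non-negligible fraction would fail the cut-off and deserve a closer look at the Krona chart.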

Specifications:

*Tool*	*Version*	*Parameter*	*Database*
fastp	0.26.0	Default.	/
Kraken2	2.1.3	Default.	PlusPF-16 (version 2022-06-07)
Bracken	3.1	Default. Taxonomic level: Species	PlusPF-16 (version 2022-06-07)
Recentrifuge	1.16.1	Default.	NCBI-2024-06-11
ToolDistillator	0.9.3	fastp – Default: report.json – Optional: trimmed_R1.fastq, trimmed_R2.fastq, report.html; Kraken2 – Default: taxonomy_assignation.tsv – Optional: reads_assignation.txt; Bracken – Default: output.tsv – Optional: kraken_reestimated_report.tsv, prior of read for estimation (default 0), read length, taxonomic level; Recentrifuge – Default: data.tsv – Optional: report.html, stat.tsv	/
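In this step, ToolDistillator merges heterogeneous outputs into one JSON document: JSON from fastp, and TSV from Kraken2, Bracken and Recentrifuge. The following is only a conceptual sketch of that idea. It uses an invented schema (`source_file`, `results`) and hypothetical function names. It does not reproduce ToolDistillator's actual output format or command-line interface.

```python
import csv
import io
import json


def parse_output(filename: str, text: str):
    """Parse one tool output: JSON is loaded as-is, anything else is read as TSV rows."""
    if filename.endswith(".json"):
        return json.loads(text)
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def aggregate(outputs: dict[str, tuple[str, str]]) -> str:
    """Merge {tool: (filename, file_text)} into one JSON document keyed by tool name.

    The schema (source_file / results) is hypothetical, not ToolDistillator's.
    """
    merged = {
        tool: {"source_file": filename, "results": parse_output(filename, text)}
        for tool, (filename, text) in outputs.items()
    }
    return json.dumps(merged, indent=2)


doc = aggregate({
    "fastp": ("report.json", '{"summary": {"after_filtering": {"total_reads": 950}}}'),
    "bracken": ("output.tsv", "name\tfraction_total_reads\nEscherichia coli\t0.93\n"),
})
print(doc)
```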

Genome Assembly (v1.1.7)

Once the reads are cleaned, they are assembled into contigs to reconstruct the genome sequence.

Key Steps:

Assembly of reads into a final contig FASTA file
- Shovill (Seemann, 2016)
Quality control of the assembly
- CheckM2 (Chklovski et al., 2023) to predict completeness and contamination
- Quast (Gurevich et al., 2013) to compute assembly quality metrics
- Bandage (Wick et al., 2015) to plot the assembly graph
- Refseqmasher (Ondov et al., 2016) to identify the closest reference genome
Aggregating outputs into a single JSON file
- ToolDistillator (ABRomics consortium, 2023) to extract information from the different tool outputs and aggregate it into parsable JSON files
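Refseqmasher ranks RefSeq genomes with Mash (Ondov et al., 2016). Mash converts the Jaccard index j between two k-mer sets into a distance D = -(1/k) ln(2j / (1 + j)). Mash itself estimates j from small MinHash sketches. The sketch below instead computes j exactly from full canonical k-mer sets, which is only practical for short sequences. The default k = 21 matches Mash's default. Returning 1.0 when no k-mers are shared is a convention of this example.

```python
import math

VALID_BASES = frozenset("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Return canonical k-mers: the lexicographically smaller of each k-mer and its reverse complement."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID_BASES:  # skip k-mers containing N or other symbols
            reverse_complement = kmer.translate(COMPLEMENT)[::-1]
            kmers.add(min(kmer, reverse_complement))
    return kmers


def mash_distance(a: str, b: str, k: int = 21) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)), with j the exact k-mer Jaccard index."""
    kmers_a, kmers_b = canonical_kmers(a, k), canonical_kmers(b, k)
    union = kmers_a | kmers_b
    if not union:
        raise ValueError("no valid k-mers in either sequence")
    j = len(kmers_a & kmers_b) / len(union)
    if j == 0:
        return 1.0  # convention of this sketch: no shared k-mers means maximal distance
    if j == 1:
        return 0.0  # identical k-mer sets
    return -math.log(2 * j / (1 + j)) / k


s = "ACGTTGCAAGGCTTACGATCG"
print(mash_distance(s, s, k=5))
print(mash_distance(s, s[:15] + "TTTTTT", k=5))
```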

Outputs:

Assembly:
- Assembly contigs in FASTA format
- Reads mapped to the assembly in BAM format
- Assembly graph in GFA format
Assembly quality:
- Predicted completeness and contamination report
- Assembly report
- Assembly graph plot
- Tabular list of the closest reference genomes
Aggregating outputs:
- JSON file with information about the outputs of Shovill, CheckM2, Quast, Bandage, Refseqmasher
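CheckM2 reports completeness and contamination as percentages. One common way to turn these values into a quality label is to apply the completeness and contamination cut-offs of the MIMAG genome-quality standard. The sketch below does that. The labeling is not part of the ABRomics workflow. It also deliberately ignores MIMAG's rRNA/tRNA requirements for the "high" tier, and the sample names are hypothetical.

```python
def quality_tier(completeness: float, contamination: float) -> str:
    """Label a genome from CheckM2 completeness and contamination (both in %).

    Uses only the MIMAG completeness/contamination cut-offs; the rRNA/tRNA
    requirements for "high" quality are ignored in this sketch.
    """
    if contamination >= 10:
        return "fail"  # MIMAG tiers all require contamination below 10%
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50:
        return "medium"
    return "low"


# Hypothetical isolates: (name, completeness %, contamination %).
for name, comp, cont in [("isolate_A", 99.5, 0.8), ("isolate_B", 95.0, 7.0),
                         ("isolate_C", 40.0, 2.0), ("isolate_D", 98.0, 12.0)]:
    print(name, quality_tier(comp, cont))
```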

Specifications:

*Tool*	*Version*	*Parameter*	*Database*
Shovill	1.1.0	Default.	/
CheckM2	1.0.2	Default. Model options: --allmodels; Prodigal translation table: 11	/
Quast	5.3.0	Default. --min-contig 200 --contig-thresholds 0,200,500,1000	/
Bandage	2022.09	Default.	/
Refseqmasher	0.1.2	Default. Top N matches to report: 3	/
ToolDistillator	0.9.3	Shovill – Default: contigs.fasta – Optional: alignment.bam, contigs_graph.gfa; CheckM2 – Default: quality_report.tsv – Optional: DIAMOND_RESULTS.tsv; Quast – Default: output.tsv – Optional: report.html; Bandage – Default: report_info.txt – Optional: plot.svg; Refseqmasher – Default: results.txt	/
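The QUAST parameters above discard contigs shorter than 200 bp (`--min-contig 200`) and report contig counts at 0, 200, 500 and 1,000 bp. The simplified Python sketch below computes N50 and those per-threshold counts. QUAST computes many more metrics, and its exact handling of each threshold may differ. Treat this as an illustration of N50 rather than a re-implementation of QUAST. The function name is hypothetical.

```python
def assembly_stats(contig_lengths: list[int], min_contig: int = 200,
                   thresholds: tuple[int, ...] = (0, 200, 500, 1000)) -> dict:
    """Compute N50 and per-threshold contig counts (QUAST-like, simplified).

    Contig counts use all contigs; N50 and total length use only contigs
    of at least min_contig bp, mirroring the idea of --min-contig.
    """
    counts = {f"contigs_ge_{t}": sum(1 for n in contig_lengths if n >= t) for t in thresholds}
    kept = sorted((n for n in contig_lengths if n >= min_contig), reverse=True)
    total = sum(kept)
    n50 = 0
    running = 0
    for length in kept:
        running += length
        # N50: length of the contig at which half of the total assembly length is reached.
        if 2 * running >= total:
            n50 = length
            break
    return {"total_length": total, "n50": n50, **counts}


print(assembly_stats([100, 300, 600, 1200, 2000]))
```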

Genome Annotation (v1.1.9)

This step annotates the assembled genomes and identifies mobile genetic elements such as integrons, plasmid replicons and insertion sequences.

Key Steps:

Genomic annotation
- Bakta (Schwengers et al., 2021) to predict CDS and small proteins (sORF)
Integron identification
- IntegronFinder2 (Néron et al., 2022) to identify CALIN elements, In0 elements, and complete integrons
Plasmid gene identification
- Plasmidfinder (Carattoli and Hasman, 2020) to identify and type plasmid replicons
Insertion sequence (IS) detection
- ISEScan (Xie and Tang, 2017) to detect IS elements
Aggregating outputs into a single JSON file
- ToolDistillator (ABRomics consortium, 2023) to extract information from the different tool outputs and aggregate it into parsable JSON files
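IntegronFinder2 distinguishes three kinds of elements:
- complete integrons: an integrase plus at least one attC site
- In0 elements: an integrase without attC sites
- CALIN elements: clusters of attC sites lacking an integrase

The sketch below paraphrases these categories. The `Cluster` data class is hypothetical and is not IntegronFinder2's output format. The minimum of 2 attC sites for a CALIN is assumed to match the tool's default CALIN threshold.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical summary of one genomic region: integrase presence and attC count."""
    has_integrase: bool
    attc_sites: int


def integron_category(cluster: Cluster, calin_threshold: int = 2) -> str:
    """Label a region with the IntegronFinder2 categories.

    complete: integrase plus at least one attC site
    In0:      integrase without attC sites
    CALIN:    attC sites without an integrase (here, at least calin_threshold of them)
    """
    if cluster.has_integrase:
        return "complete" if cluster.attc_sites > 0 else "In0"
    if cluster.attc_sites >= calin_threshold:
        return "CALIN"
    return "none"


for region in (Cluster(True, 3), Cluster(True, 0), Cluster(False, 4), Cluster(False, 1)):
    print(region, integron_category(region))
```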

Outputs:

Genomic annotation:
- Genome annotation in tabular, GFF3 and several other formats
- Annotation plot
- Nucleotide and protein sequences of the annotated features
- Summary of the identified genomic elements
Integron identification:
- Integron identification in tabular format and a summary
Plasmid gene identification:
- Identified plasmid genes and associated BLAST hits
Insertion sequence (IS) detection:
- IS element list in tabular format
- IS hits in FASTA format
- ORF hits in protein and nucleotide FASTA format
- IS annotation in GFF3 format
Aggregating outputs:
- JSON file with information about the outputs of Bakta, IntegronFinder2, Plasmidfinder, ISEScan
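Both Bakta and ISEScan write GFF3 annotations (annotation.gff3). A quick way to summarize such a file is to count features by type, which is the third column of the nine tab-separated GFF3 fields. The sketch below skips comment and directive lines. It also stops at the `##FASTA` section that GFF3 allows at the end of a file. The function name and example records are hypothetical.

```python
from collections import Counter


def count_feature_types(gff3_text: str) -> Counter:
    """Count features by type (column 3) in GFF3 text.

    Skips comments/directives and stops at an embedded ##FASTA section.
    """
    counts = Counter()
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence data
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts


gff = (
    "##gff-version 3\n"
    "contig_1\tBakta\tCDS\t1\t300\t.\t+\t0\tID=a\n"
    "contig_1\tBakta\ttRNA\t400\t480\t.\t-\t.\tID=b\n"
    "contig_1\tBakta\tCDS\t500\t900\t.\t+\t0\tID=c\n"
    "##FASTA\n>contig_1\nACGT\n"
)
print(count_feature_types(gff))
```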

Specifications:

*Tool*	*Version*	*Parameter*	*Database*
Bakta	1.9.4	Default (“Full” annotation).	Bakta DB: V5.1_2024-01-19; AMRFinderPlus DB: V3.12_2024-05-02.2
IntegronFinder2	2.0.5	Default. Thorough local detection: Yes; Search also for promoter and attI sites: Yes	/
PlasmidFinder	2.1.6	Default.	commit 81c11f4 – 2023-12-04
ISEScan	1.7.2.3	Default.	/
ToolDistillator	0.9.3	Bakta – Default: output.json – Optional: protein.faa, nucleotide.fna, annotation.gff3, annotation.tsv, summary.txt, GenBank file, EMBL file, contigs.fasta, hypothetical_protein.fasta, hypothetical_annotation.tsv, plot.svg; IntegronFinder2 – Default: output.integrons – Optional: output.summary; PlasmidFinder – Default: output.json – Optional: genome_hits.fasta, plasmid_hits.fasta; ISEScan – Default: output.tsv – Optional: is.fna, orf.faa, orf.fna, annotation.gff3	/
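ISEScan's tabular report (output.tsv) lists one IS element per row. A per-family count is a simple summary of the IS content of a genome. The sketch below assumes that the family is stored in a column named `family`. Treat that column name as an assumption and check it against the header of your own output.tsv. The example rows are invented.

```python
import csv
import io
from collections import Counter


def is_family_counts(isescan_tsv: str, family_column: str = "family") -> Counter:
    """Count IS elements per family in an ISEScan tabular report.

    family_column is an assumption about the header; adjust it to your output.tsv.
    """
    reader = csv.DictReader(io.StringIO(isescan_tsv), delimiter="\t")
    return Counter(row[family_column] for row in reader)


# Invented example rows, reduced to a few columns.
table = (
    "seqID\tfamily\tisBegin\tisEnd\n"
    "contig_1\tIS3\t100\t1400\n"
    "contig_2\tIS3\t50\t1300\n"
    "contig_2\tIS5\t900\t2100\n"
)
print(is_family_counts(table).most_common())
```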

AMR Gene Detection (v1.1.5)

Run in parallel with genome annotation, this step detects antimicrobial resistance (AMR) genes and point mutations, as well as virulence genes.

Key Steps:

Genomic detection
- Antimicrobial resistance gene identification:
  - StarAMR (Bharat et al., 2022) to search the ResFinder (Zankari et al., 2012) and PlasmidFinder (Carattoli et al., 2014) databases with BLAST
  - AMRFinderPlus (Feldgarden et al., 2021) to find antimicrobial resistance genes and point mutations
- Virulence gene identification:
  - ABRicate (Seemann, 2016) with the VFDB_A database
Aggregating outputs into a single JSON file
- ToolDistillator (ABRomics consortium, 2023) to extract information from the different tool outputs and aggregate it into parsable JSON files
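StarAMR and AMRFinderPlus both report AMR genes, so a natural check is to see which calls they share. The sketch below makes two assumptions about column names: `Gene` for StarAMR's resfinder.tsv and `Gene symbol` for the AMRFinderPlus 3.x report.tsv. ResFinder and the NCBI Reference Gene Catalog also name alleles differently, for example blaTEM-1B versus blaTEM-1. Exact string matching therefore under-counts agreement. The example rows are invented.

```python
import csv
import io


def gene_set(tsv_text: str, column: str) -> set[str]:
    """Collect non-empty gene names from one column of a tab-separated report."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[column].strip() for row in reader if row[column].strip()}


def compare_amr_calls(staramr_tsv: str, amrfinder_tsv: str) -> dict[str, set[str]]:
    """Split gene calls into shared and tool-specific sets (exact name matching).

    Column names "Gene" (StarAMR) and "Gene symbol" (AMRFinderPlus) are assumptions.
    """
    star = gene_set(staramr_tsv, "Gene")
    amrf = gene_set(amrfinder_tsv, "Gene symbol")
    return {"both": star & amrf, "staramr_only": star - amrf, "amrfinderplus_only": amrf - star}


staramr = "Isolate ID\tGene\t%Identity\nS1\tblaTEM-1B\t100.00\nS1\ttet(A)\t100.00\n"
amrfinder = "Gene symbol\tClass\nblaTEM-1\tBETA-LACTAM\ntet(A)\tTETRACYCLINE\n"
print(compare_amr_calls(staramr, amrfinder))
```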

Outputs:

Genomic detection:
- Antimicrobial resistance gene identification:
  - AMR gene list
  - MLST typing
  - Plasmid gene identification
  - BLAST hits
  - AMR gene sequences in FASTA format (assembled nucleotide sequences)
  - Point mutation list
- Virulence gene identification:
  - Virulence gene list in tabular format
Aggregating outputs:
- JSON file with information about the outputs of StarAMR, AMRFinderPlus, ABRicate

Specifications:

*Tool*	*Version*	*Parameter*	*Database*
StarAMR	0.10.0	Default. Percent identity threshold for BLAST: 90.0	ResFinder: 2.4.0 – commit e0525f2 – 2024-09-23; PointFinder: 4.1.1 – commit 694919f – 2024-08-08; PlasmidFinder: commit 4add282 – 2024-11-14; MLST version: 2.23.0
AMRFinderPlus	3.12.8	Default.	V3.12_2024-05-02.2
ABRicate	1.0.1	Default (Minimum DNA %identity and %coverage: 80.0).	VFDB
ToolDistillator	0.9.1	StarAMR – Default: resfinder.tsv – Optional: mlst.tsv, pointfinder.tsv, plasmidfinder.tsv, settings.tsv; AMRFinderPlus – Default: report.tsv – Optional: point_mutation_report.tsv, nucleotide_sequence.fasta; ABRicate – Default: report.tsv	/
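ABRicate already applies the minimum DNA identity and coverage of 80% listed above (its `--minid` and `--mincov` options). Stricter cut-offs can still be applied afterwards to its tabular report. The sketch below assumes ABRicate's tab-separated output with `GENE`, `%COVERAGE` and `%IDENTITY` columns. The function name and example rows are hypothetical.

```python
import csv
import io


def filter_abricate(tab_text: str, min_identity: float = 80.0,
                    min_coverage: float = 80.0) -> list[str]:
    """Return gene names from ABRicate tabular output that pass identity/coverage cut-offs.

    The 80.0 defaults mirror the workflow settings; raise them to tighten filtering.
    """
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    return [
        row["GENE"]
        for row in reader
        if float(row["%IDENTITY"]) >= min_identity and float(row["%COVERAGE"]) >= min_coverage
    ]


# Invented example rows, reduced to a few columns.
tab = (
    "#FILE\tSEQUENCE\tGENE\t%COVERAGE\t%IDENTITY\tDATABASE\n"
    "asm.fa\tcontig_1\tfimH\t100.00\t98.50\tvfdb\n"
    "asm.fa\tcontig_2\tiucA\t85.00\t91.00\tvfdb\n"
)
print(filter_abricate(tab))
print(filter_abricate(tab, min_identity=95.0))
```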

Conclusion

The ABRomics workflow provides an integrated approach to bacterial genomic data analysis. It covers read quality and contamination control, assembly and annotation, and the detection of AMR and virulence genes. At each step, ToolDistillator aggregates the tool outputs into JSON files, so the results are standardized and machine-readable.

Useful Links

Galaxy France platform: a web-based platform providing access to open-source tools for large-scale genomic and metagenomic data analysis.
Learning Pathway “Detection of AMR genes in bacterial genomes”, part of the Galaxy Training Network: hands-on tutorials for researchers and students interested in detecting antimicrobial resistance (AMR) genes in bacterial genomes.

References

ABRomics consortium, 2023 ToolDistillator: a tool to extract and aggregate information from different tool outputs to JSON parsable files. https://gitlab.com/ifb-elixirfr/abromics/tooldistillator
Bharat, A., A. Petkau, B. P. Avery, J. C. Chen, J. P. Folster et al., 2022 Correlation between phenotypic and in silico detection of antimicrobial resistance in Salmonella enterica in Canada using Staramr. Microorganisms 10: 292. 10.3390/microorganisms10020292
Carattoli, A., and H. Hasman, 2020 PlasmidFinder and in silico pMLST: identification and typing of plasmid replicons in whole-genome sequencing (WGS). Horizontal gene transfer: methods and protocols 285–294. 10.1007/978-1-4939-9877-7_20
Chen, S., Y. Zhou, Y. Chen, and J. Gu, 2018 fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34: i884–i890. 10.1093/bioinformatics/bty560
Chklovski, A., D. H. Parks, B. J. Woodcroft et al., 2023 CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nature Methods 20: 1203–1212. 10.1038/s41592-023-01940-w
Feldgarden, M., V. Brover, N. Gonzalez-Escalona et al., 2021 AMRFinderPlus and the Reference Gene Catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence. Scientific Reports 11: 12728. 10.1038/s41598-021-91456-0
Gurevich, A., V. Saveliev, N. Vyahhi, and G. Tesler, 2013 QUAST: quality assessment tool for genome assemblies. Bioinformatics 29: 1072–1075. 10.1093/bioinformatics/btt086
Lu, J., F. P. Breitwieser, P. Thielen, and S. L. Salzberg, 2017 Bracken: estimating species abundance in metagenomics data. PeerJ Computer Science 3: e104. 10.7717/peerj-cs.104
Martí, J. M., 2019 Recentrifuge: robust comparative analysis and contamination removal for metagenomics. PLoS Computational Biology 15: e1006967. 10.1371/journal.pcbi.100696
Néron, B., E. Littner, M. Haudiquet, A. Perrin, J. Cury et al., 2022 IntegronFinder 2.0: identification and analysis of integrons across bacteria, with a focus on antibiotic resistance in Klebsiella. Microorganisms 10: 700. 10.3390/microorganisms10040700
Ondov, B. D., T. J. Treangen, P. Melsted, A. B. Mallonee, N. H. Bergman et al., 2016 Mash: fast genome and metagenome distance estimation using MinHash. Genome Biology 17: 132. 10.1186/s13059-016-0997-x
Schwengers, O., L. Jelonek, M. A. Dieckmann, S. Beyvers, J. Blom et al., 2021 Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microbial genomics 7: 000685. 10.1099/mgen.0.000685
Seemann, T., 2016 ABRicate: mass screening of contigs for antibiotic resistance genes. https://github.com/tseemann/abricate
Seemann, T., 2016 Shovill: assemble bacterial isolate genomes from Illumina paired-end reads. https://github.com/tseemann/shovill
Wick, R. R., M. B. Schultz, J. Zobel, and K. E. Holt, 2015 Bandage: interactive visualization of de novo genome assemblies. Bioinformatics 31: 3350–3352. 10.1093/bioinformatics/btv383
Wood, D. E., and S. L. Salzberg, 2014 Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology 15: R46. 10.1186/gb-2014-15-3-r46
Wood, D. E., J. Lu, and B. Langmead, 2019 Improved metagenomic analysis with Kraken 2. Genome biology 20: 1–13. 10.1186/s13059-019-1891-0
Xie, Z., and H. Tang, 2017 ISEScan: automated identification of insertion sequence elements in prokaryotic genomes. Bioinformatics 33: 3340–3347. 10.1093/bioinformatics/btx433
Zankari, E., H. Hasman, S. Cosentino, M. Vestergaard, S. Rasmussen et al., 2012 Identification of acquired antimicrobial resistance genes. Journal of antimicrobial chemotherapy 67: 2640–2644. 10.1093/jac/dks261

Last Modified

07/01/2025
