Introduction

NCBI AMRFinderPlus (Antimicrobial Resistance Gene Finder Plus; citation) is used to find AMR-specific genes and proteins from the NCBI Bacterial Antimicrobial Resistance Reference Gene Database (BioProject PRJNA313047). In addition, genes related to biocide and stress resistance, general efflux, virulence, or antigenicity are searched ('plus' option). In SeqSphere+, AMR proteins are identified only by the BLASTX protein search of translated nucleotide sequences of the assembly contigs against the AMR protein database (the HMMER search to detect new resistances is currently not used in SeqSphere+).

NCBI AMRFinderPlus is deployed with the SeqSphere+ installation, but it requires that the SeqSphere+ client runs on Linux or on Windows with the Windows Subsystem for Linux (WSL) installed.

The predefined task template 'NCBI AMRFinderPlus' can be downloaded with Linux or Windows clients from the Task Template Sphere (requires SeqSphere+ version 7.0 or later) for all organisms except Mycobacterium. For Escherichia coli the specific task template 'E. coli NCBI AMRFinderPlus' is listed for download in the Task Template Sphere.

Alternatively, AMRFinderPlus can be called from those clients as a standalone function in the Tools menu. Once downloaded, the task template is stored on the server and therefore also becomes available to clients without WSL installed. If such a client tries to execute a pipeline that uses an 'NCBI AMRFinderPlus' task template, an error is raised (this also happens when the Test Pipeline Script function is run). However, AMRFinderPlus results produced with an appropriate client can be viewed with all clients.

Disclaimer: Users of AMRFinderPlus or its supporting data files are cautioned that presence of a gene encoding an antimicrobial resistance (AMR) protein or resistance causing mutation does not necessarily indicate that the isolate carrying the gene is resistant to the corresponding antibiotic. AMR genes must be expressed to confer resistance. Many AMR proteins reduce antibiotic susceptibility somewhat, but not sufficiently to cross clinical breakpoints. Meanwhile, an isolate may gain or lose resistance to an antibiotic by mutational processes, such as the loss of a porin required to allow the antibiotic into the cell. For some families of AMR proteins, especially those borne by plasmids, correlations of genotype to phenotype are much more easily deciphered, but users are cautioned against over-interpretation (cited from AMRFinderPlus documentation).

Task Entry Overview

Genotyping result allele table of Task Entry Overview for AMRFinderPlus task

When an NCBI AMRFinderPlus task entry is processed, SeqSphere+ starts the deployed AMRFinderPlus. The Task Entry Overview of the processed task entry shows a table with the AMRFinderPlus output listing the target genes found. If multiple matches for a target are found at different locations, each match is listed as a separate row in the table.

The table rows for Element subtype AMR (core) and for E. coli Class STX are colored by percent identity and aligned overlap with the allele in the database, using the following thresholds:

  • Dark green row: Identity = 100% and Aligned Overlap = 100%
  • Light green row: Identity ≥ 90% and Aligned Overlap = 100%
  • Gray row: Identity ≥ 90% and Aligned Overlap ≥ 50%

If the Method column contains INTERNAL_STOP or PARTIAL_CONTIG_END the table cells are highlighted in orange to indicate a warning.
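The coloring rules above can be sketched in a few lines of Python. This is only an illustration of the listed thresholds, not the SeqSphere+ implementation: the function names are hypothetical, the restriction to Element subtype AMR (core) and E. coli Class STX rows is left out, and returning no color for rows below every threshold is an assumption.

```python
from typing import Optional


def row_color(identity: float, overlap: float) -> Optional[str]:
    """Return the row color for a hit, following the thresholds listed above.

    identity and overlap are percentages (0-100). Rows that meet none of the
    thresholds get None; that fallback is an assumption of this sketch.
    """
    if identity == 100 and overlap == 100:
        return "dark green"
    if identity >= 90 and overlap == 100:
        return "light green"
    if identity >= 90 and overlap >= 50:
        return "gray"
    return None


def cell_warning(method: str) -> bool:
    """True if the Method value triggers the orange warning highlight."""
    return method in {"INTERNAL_STOP", "PARTIAL_CONTIG_END"}
```

The checks are ordered from strictest to loosest, so a perfect match is never reported as light green or gray.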

The table contains the following columns:

  • Class - For AMR genes this is the class of drugs that this gene is known to contribute to resistance of.
  • Subclass - If more specificity about drugs within the drug class is known it is elaborated here.
  • Gene symbol - Gene or gene-family symbol for nucleotide hit. For point mutations it is a combination of the gene symbol and the SNP definition separated by "_"
  • Sequence name - Full-text name for the nucleotide.
  • Method - Type of hit found by AMRFinder.
    • ALLELE: 100% sequence match over 100% of length to a protein named at the allele level in the AMRFinderPlus database.
    • EXACT: 100% sequence match over 100% of length to a protein in the database that is not a named allele.
    • BLAST: BLAST alignment is > 90% of length and > 90% identity to a protein in the AMRFinderPlus database.
    • PARTIAL: BLAST alignment is > 50% of length, but < 90% of length and > 90% identity to the reference, and does not end at a contig boundary.
    • PARTIAL_CONTIG_END: BLAST alignment is > 50% of length, but < 90% of length and > 90% identity to the reference, and the break occurs at a contig boundary indicating that this gene is more likely to have been split by an assembly issue.
    • INTERNAL_STOP: Translated blast reveals a stop codon that occurred before the end of the protein.
    • POINT 1): Point mutation identified by BLAST.
  • % Coverage of reference sequence - % of reference covered by blast hit.
  • % Identity to reference sequence - % nucleotide identity for nucleotide reference.
  • Element type - AMRFinderPlus genes are placed into functional categories based on predominant function: AMR, STRESS, or VIRULENCE.
  • Element subtype 2) - Further elaboration of functional category (ANTIGEN, BIOCIDE, HEAT, METAL, PORIN). If no more specific category is available, the element type is repeated.
  • Scope - The AMRFinderPlus database is split into 'core' AMR proteins that are expected to have an effect on resistance and 'plus' proteins of interest added with less stringent inclusion criteria. These may or may not be expected to have an effect on phenotype.
    • Core: this subset includes highly curated AMR-specific genes and proteins from the Bacterial Antimicrobial Resistance Reference Gene Database (BioProject PRJNA313047) plus point mutations. The sources of input for this curated database include allele assignments, exchanges with other external curated resources, and reports of novel antimicrobial resistance proteins in the literature.
    • Plus: this subset includes genes related to biocide and stress resistance, general efflux, virulence, or antigenicity. These genes are only shown if the --plus option is used.

Below the table a colored threshold legend, version information, and citation(s) are stated. More details can be found in the NCBI AMRFinderPlus documentation.

1) Point mutations are only supported for the following species: Acinetobacter baumannii, Burkholderia cepacia, Burkholderia pseudomallei, Campylobacter coli/jejuni, Clostridioides difficile, Enterococcus faecalis, Enterococcus faecium, Escherichia, Klebsiella oxytoca, Klebsiella pneumoniae, Neisseria gonorrhoeae, Neisseria meningitidis, Pseudomonas aeruginosa, Salmonella, Staphylococcus aureus, Staphylococcus pseudintermedius, Streptococcus agalactiae, Streptococcus pneumoniae, Streptococcus pyogenes, Vibrio cholerae

2) For Streptococcus pneumoniae, please see the note about subtype AMR-SUSCEPTIBLE.

Result Fields

Sample result table containing the result fields of an E. coli AMRFinderPlus task entry
Sample search for result fields of an AMRFinderPlus task entry with the operator 'contains'
Sample search for result fields of an AMRFinderPlus task entry with the operator 'is not empty'
Comparison Table function to remove resistance/virulence columns containing only empty values
Choosing AMRFinderPlus for distance calculation in a Comparison Table

The stored result fields are aggregated per AMR (core) subclass as listed below. A field may therefore contain more than one target; multiple targets are delimited by '/' (e.g., "Beta-lactam" = "blaSHV-11 / blaSHV-12 (ESBL) / blaTEM-1"):

Class             Subclass/Resistance
AMINOGLYCOSIDE    APRAMYCIN
AMINOGLYCOSIDE    AMIKACIN
AMINOGLYCOSIDE    GENTAMICIN
AMINOGLYCOSIDE    HYGROMYCIN
AMINOGLYCOSIDE    KANAMYCIN
AMINOGLYCOSIDE    KASUGAMYCIN
AMINOGLYCOSIDE    SPECTINOMYCIN
AMINOGLYCOSIDE    STREPTOMYCIN
AMINOGLYCOSIDE    TOBRAMYCIN
AVILAMYCIN        AVILAMYCIN
BACITRACIN        BACITRACIN
BETA-LACTAM       AMPICILLIN (obsolete)
BETA-LACTAM       BETA-LACTAM
BETA-LACTAM       CARBAPENEM
BETA-LACTAM       CEPHALOSPORIN
BETA-LACTAM       CEPHALOTHIN
BETA-LACTAM       METHICILLIN
BLEOMYCIN         BLEOMYCIN
BLEOMYCIN         ZORBAMYCIN
COLISTIN          COLISTIN
EFFLUX            EFFLUX
FLUOROQUINOLONE   FLUOROQUINOLONE
FOSFOMYCIN        FOSFOMYCIN
FUSIDIC ACID      FUSIDIC ACID
GLYCOPEPTIDE      VANCOMYCIN
LINCOSAMIDE       LINCOSAMIDE
MACROLIDE         ERYTHROMYCIN
MACROLIDE         TELITHROMYCIN
MACROLIDE         TYLOSIN
MUPIROCIN         MUPIROCIN
NITROIMIDAZOLE    NITROIMIDAZOLE
PHENICOL          CHLORAMPHENICOL
PHENICOL          FLORFENICOL
PHENICOL          PHENICOL
OXAZOLIDINONE     OXAZOLIDINONE
OXAZOLIDINONE     LINEZOLID
PLEUROMUTILIN     PLEUROMUTILIN
PLEUROMUTILIN     TIAMULIN
QUINOLONE         QUINOLONE
RIFAMYCIN         RIFAMPIN
RIFAMYCIN         RIFAMYCIN
STREPTOGRAMIN     STREPTOGRAMIN
STREPTOTHRICIN    STREPTOTHRICIN
SULFONAMIDE       SULFONAMIDE
TETRACENOMYCIN    TETRACENOMYCIN
TETRACYCLINE      TETRACYCLINE
TETRACYCLINE      TIGECYCLINE
THIOSTREPTON      THIOSTREPTON
TRIMETHOPRIM      TRIMETHOPRIM
TUBERACTINOMYCIN  VIOMYCIN

These fields are filled from the AMRFinderPlus output column 'Subclass/Resistance' of all confident calls, i.e., green rows. 'Classes' that appear as a result in the 'Subclass/Resistance' column (like AMINOGLYCOSIDE) are resolved into the specific subclasses. If the result for class BETA-LACTAM contains the word extended in the column 'Sequence Name', the postfix (ESBL) is appended to the result field value, except when the same column also contains the text class C, in which case the postfix (AmpC) is appended. The postfix (AmpC) is also appended if only the text class C is found. Finally, if the column 'Sequence Name' contains the text carbapenem-hydrolyzing, the postfix (carbapenemase) is appended. If multiple matches for a target are found at different locations, the matched targets are concatenated with "," (e.g., "Beta-lactam" = "blaSHV-11 / blaSHV-12 (ESBL) / blaTEM-1, blaTEM-1").
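The postfix and concatenation rules can be illustrated with a short Python sketch. It is not the SeqSphere+ implementation. Several points are assumptions made for the example: the function names and the tuple layout are hypothetical, the input is taken to contain only confident (green) hits with one subclass each, the carbapenemase check is applied only when neither class C nor extended is present, matching is case-sensitive, and targets within a field are sorted alphabetically.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (class, subclass, gene symbol, sequence name); hypothetical layout
Hit = Tuple[str, str, str, str]


def postfix(drug_class: str, sequence_name: str) -> str:
    """Return the postfix for a BETA-LACTAM hit based on its sequence name."""
    if drug_class != "BETA-LACTAM":
        return ""
    if "class C" in sequence_name:  # wins even if 'extended' is also present
        return " (AmpC)"
    if "extended" in sequence_name:
        return " (ESBL)"
    if "carbapenem-hydrolyzing" in sequence_name:
        return " (carbapenemase)"
    return ""


def aggregate(hits: Iterable[Hit]) -> Dict[str, str]:
    """Build one result-field value per subclass.

    Different targets are joined with ' / '; repeated matches of the same
    target (different locations) are joined with ', '.
    """
    per_field: Dict[str, Dict[str, int]] = defaultdict(dict)
    for drug_class, subclass, gene, name in hits:
        label = gene + postfix(drug_class, name)
        per_field[subclass][label] = per_field[subclass].get(label, 0) + 1
    return {
        subclass: " / ".join(
            ", ".join([label] * n) for label, n in sorted(counts.items())
        )
        for subclass, counts in per_field.items()
    }


# Illustrative input only; sequence names are shortened examples.
hits = [
    ("BETA-LACTAM", "BETA-LACTAM", "blaSHV-11", "broad-spectrum class A beta-lactamase SHV-11"),
    ("BETA-LACTAM", "BETA-LACTAM", "blaSHV-12", "class A extended-spectrum beta-lactamase SHV-12"),
    ("BETA-LACTAM", "BETA-LACTAM", "blaTEM-1", "broad-spectrum class A beta-lactamase TEM-1"),
    ("BETA-LACTAM", "BETA-LACTAM", "blaTEM-1", "broad-spectrum class A beta-lactamase TEM-1"),
]
print(aggregate(hits))
```

Counting repeated labels instead of storing a flat list keeps the two delimiters separate: ' / ' between distinct targets and ', ' between multiple matches of one target.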

Priority AMR Targets (targets that might confer resistance to carbapenem, colistin, vancomycin, or methicillin, or that contain ESBL or AmpC in their name) are highlighted in red.

In addition to the fields listed above, the Escherichia coli specific task template 'E. coli NCBI AMRFinderPlus' defines the two result fields STX1 and STX2.

The result fields are shown in the result tab of the Sample Overview. By clicking the NCBI AMRFinderPlus category links all details for this sample can be viewed. The result fields can be selected from the NCBI AMRFinderPlus entry for searching under 'Field Criteria' in the advanced mode of the sample search dialog. The search can be done either by using the operator 'contains' for specific targets or by using the operator 'is not empty' for any target in this field.

The result fields can also be retrieved for a Comparison Table and for exporting metadata. If the NCBI AMRFinderPlus Task Template is chosen in the Create Comparison Table dialog, the result fields are not used for distance calculation but are shown with gray column headings for descriptive purposes only. For a better overview, it is recommended to use the command Columns | Remove Resistance/Virulence Genotyping Columns where All Values Are Missing to remove the columns that are empty for all samples.

To compare the resistance profiles (presence/absence) of several samples, the command Columns | Transform Resistance/Virulence Genotyping Columns to Absence/Presence (+/-) may be used. Alternatively, missing values can be handled as their own category when building trees. Next, the command Columns | Select Genotyping Schemes for Distance Calculation ... must be invoked; in the dialog that opens, the AMRFinderPlus Task Template must be selected, and all other Task Templates should be deselected from distance calculation. If the data were not transformed, the option Missing Values are Own Category must be selected in the missing values dialog that appears once the command for calculating a tree is invoked.
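What the presence/absence transformation does to a row of result fields can be sketched as follows. This is a conceptual Python illustration with hypothetical function names and made-up sample values; counting the number of differing columns as the distance is an assumption for the example and is not taken from the SeqSphere+ distance calculation.

```python
from typing import Dict, Optional


def to_presence_absence(row: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Map every result field to '+' (any target present) or '-' (empty)."""
    return {
        field: "+" if (value or "").strip() else "-"
        for field, value in row.items()
    }


def profile_distance(a: Dict[str, str], b: Dict[str, str]) -> int:
    """Count the fields in which two +/- profiles differ.

    Fields missing from one profile are treated as '-' (an assumption).
    """
    fields = a.keys() | b.keys()
    return sum(1 for f in fields if a.get(f, "-") != b.get(f, "-"))


# Made-up sample rows for illustration.
sample_1 = {"BETA-LACTAM": "blaTEM-1", "COLISTIN": "", "TETRACYCLINE": "tet(A)"}
sample_2 = {"BETA-LACTAM": "blaSHV-12 (ESBL)", "COLISTIN": "mcr-1", "TETRACYCLINE": None}

p1 = to_presence_absence(sample_1)
p2 = to_presence_absence(sample_2)
print(p1, p2, profile_distance(p1, p2))
```

After the transformation, the two samples count as identical for BETA-LACTAM even though they carry different genes. With untransformed data and missing values treated as their own category, the differing gene content would still distinguish them.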

Tools Menu Function

Tools menu function dialog for NCBI AMRFinderPlus

The NCBI AMRFinderPlus can also be invoked manually for a FASTA file using the menu function Tools | Genome Utilities | Antimicrobial Resistance Finder (AMRFinderPlus). The following options in the dialog can be set:

  • Organism: This field is set to Undefined by default and lists the supported organisms. Selecting an organism enables screening for point mutations and suppresses the reporting of some genes that are extremely common in the selected organism.
  • Provide results from 'Plus' genes: AMRFinderPlus splits the database into two subsets. The subset that includes genes related to biocide and stress resistance, general efflux, virulence, or antigenicity is shown only if this option is selected (selected by default).
  • Report genotypes at all locations screened for point mutations: For supported organisms, point mutations are identified by BLAST alignments that cover at least 50% of the reference at 90% identity (selected by default). Offsets are calculated relative to the beginning of the reference and reported in that coordinate system. That is, if there are indels within the query sequence, the coordinates of the point mutation reflect the offset from the start codon in the reference rather than in the query sequence. When enabled, the result dialog shows a second table with all locations that were screened for point mutations. The type of mutation is indicated by a keyword added to the 'Sequence name' column: this keyword is [WILDTYPE] for non-observed reference alleles and [UNKNOWN] for observed non-reference alleles (more details). Only the locations that do not contain the keywords [WILDTYPE] or [UNKNOWN] also appear in the first table. Those locations are also always reported when NCBI AMRFinderPlus is run non-interactively in a pipeline using the AMRFinder Task Template. However, this second table is then only created temporarily and is not stored.
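For orientation, the three dialog options correspond roughly to options of the standalone amrfinder command-line tool (--organism, --plus, and --mutation_all). The Python sketch below only builds such a command line; how SeqSphere+ actually invokes the deployed AMRFinderPlus is not described here, and the function name, file names, and default thread count are hypothetical.

```python
from typing import List, Optional


def build_amrfinder_command(
    fasta: str,
    organism: Optional[str] = None,
    plus: bool = True,
    mutation_all: Optional[str] = None,
    threads: int = 1,
) -> List[str]:
    """Assemble an amrfinder call for a nucleotide FASTA file.

    organism=None mirrors the 'Undefined' default of the dialog; mutation_all
    is the path of the file that receives the second (all screened locations)
    table.
    """
    cmd = ["amrfinder", "--nucleotide", fasta, "--threads", str(threads)]
    if organism:
        cmd += ["--organism", organism]
    if plus:
        cmd.append("--plus")
    if mutation_all:
        cmd += ["--mutation_all", mutation_all]
    return cmd


# Hypothetical file names for illustration.
cmd = build_amrfinder_command(
    "assembly.fasta", organism="Escherichia", mutation_all="all_mutations.tsv", threads=8
)
print(" ".join(cmd))
```

On a system where amrfinder is installed, the list could be passed to subprocess.run(cmd, check=True); building the command as a list avoids shell quoting issues with file paths.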

Chromosome and Plasmids Overview

If the Chromosome and Plasmids Overview Task Template is used for the same Sample, some AMR results are integrated there.

Runtimes

The following table contains the measured NCBI AMRFinderPlus runtimes on a quad-core (8 threads) desktop with 32 GB RAM (using Windows Subsystem for Linux). Two different thread settings were used: single-threaded and parallel using all cores/hyperthreads. The speedup from multiple threads depends on the number of (large) contigs, because the BLASTX search is parallelized per contig. Therefore, finished genomes consisting of a single contig gain no speedup from additional threads and take longer than draft genomes of comparable size.

Strain                                      Contig Count  Genome Size (Mbases)  Runtime with 1 thread  Runtime with 8 threads
Staphylococcus aureus COL (draft genome)    36            2.8                   122s                   43s
Escherichia coli Sakai (NCBI)               1             5.5                   286s                   287s
Escherichia coli Sakai (draft genome)       248           5.3                   291s                   87s
Pseudomonas aeruginosa PAO1 (draft genome)  165           6.2                   423s                   107s
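The effect of the contig count can be made explicit by computing the speedup (runtime with 1 thread divided by runtime with 8 threads) from the measurements in the table. The short Python sketch below uses only the numbers from the table; the variable and function names are chosen for illustration.

```python
# (contig count, runtime with 1 thread in s, runtime with 8 threads in s)
runtimes = {
    "Staphylococcus aureus COL (draft genome)": (36, 122, 43),
    "Escherichia coli Sakai (NCBI)": (1, 286, 287),
    "Escherichia coli Sakai (draft genome)": (248, 291, 87),
    "Pseudomonas aeruginosa PAO1 (draft genome)": (165, 423, 107),
}


def speedup(single_thread_s: float, multi_thread_s: float) -> float:
    """Speedup factor gained by running with multiple threads."""
    return single_thread_s / multi_thread_s


for strain, (contigs, t1, t8) in runtimes.items():
    print(f"{strain}: {contigs} contigs, speedup {speedup(t1, t8):.2f}x")
```

The single-contig E. coli Sakai genome stays at a factor of about 1, while the draft genomes with many contigs reach factors of roughly 2.8 to 4.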