A stacked LSTM-nested deep-autoencoder network for identification of antimicrobial resistance of nosocomial pathogens
Antimicrobial resistance (AMR) is a pressing global health concern: microbial strains are becoming resistant to conventional antibiotics, which undermines the efficacy of these drugs and increases illness, death rates, and healthcare costs. Although several computational approaches have emerged, they remain limited in generalization, scope, and scalability. Here, we developed StaLAENet (stacked LSTM-nested deep-autoencoder network) to predict the drug classes of antibiotic resistance genes (ARGs) in ESKAPE pathogens. We considered the completely annotated sequences of all strains and annotated ARGs from core orthologous gene clusters. K-mer matrices were then generated for the labelled ARGs and used as input to the StaLAENet framework. StaLAENet comprises two modules: a feature representation module and a classification module. The first module introduces a stacked LSTM-nested deep autoencoder that combines the strengths of LSTM cells and autoencoders. The second module uses the encoded latent features to build a dense network for final classification.
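To make the two-module design concrete, the following is a minimal PyTorch sketch, not the authors' implementation. It assumes each sample is presented as a sequence of feature vectors (shape: steps x features), that the encoder's final hidden state is projected to the latent code, and that the decoder reconstructs the input from that code repeated at every step. The class names, layer sizes, number of stacked layers, and number of drug classes are all hypothetical, and the exact way LSTM cells are "nested" in StaLAENet is not specified in this section.

```python
import torch
from torch import nn


class LSTMAutoencoder(nn.Module):
    """Feature representation module (illustrative): stacked LSTM encoder and decoder."""

    def __init__(self, n_features, hidden_size=64, latent_size=16, num_layers=2):
        super().__init__()
        # num_layers > 1 stacks LSTM layers on top of each other.
        self.encoder = nn.LSTM(n_features, hidden_size,
                               num_layers=num_layers, batch_first=True)
        self.to_latent = nn.Linear(hidden_size, latent_size)
        self.decoder = nn.LSTM(latent_size, hidden_size,
                               num_layers=num_layers, batch_first=True)
        self.reconstruct = nn.Linear(hidden_size, n_features)

    def encode(self, x):
        # h_n has shape (num_layers, batch, hidden); take the top layer.
        _, (h_n, _) = self.encoder(x)
        return self.to_latent(h_n[-1])

    def forward(self, x):
        z = self.encode(x)
        # Feed the latent code to the decoder at every time step.
        repeated = z.unsqueeze(1).repeat(1, x.size(1), 1)
        decoded, _ = self.decoder(repeated)
        return self.reconstruct(decoded), z


class DenseClassifier(nn.Module):
    """Classification module (illustrative): dense layers on the latent features."""

    def __init__(self, latent_size=16, n_classes=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_size, 32),
            nn.ReLU(),
            nn.Linear(32, n_classes),
        )

    def forward(self, z):
        return self.net(z)


# Toy forward pass on random data: 4 samples, 10 steps, 8 features per step.
torch.manual_seed(0)
autoencoder = LSTMAutoencoder(n_features=8)
classifier = DenseClassifier(latent_size=16, n_classes=5)
x = torch.randn(4, 10, 8)
reconstruction, latent = autoencoder(x)
logits = classifier(latent)
```

In a setup like this, the autoencoder would typically be trained on a reconstruction loss (for example, mean squared error between `reconstruction` and `x`), and the classifier on a cross-entropy loss over the drug-class labels; whether StaLAENet trains the two modules jointly or in sequence is not stated here.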
A block diagram of the proposed framework is shown in Figure 1. We acquired publicly available genome data from the National Center for Biotechnology Information (NCBI) repository. Completely annotated strains of the ESKAPE pathogens, viz. E. faecium, S. aureus, K. pneumoniae, A. baumannii, P. aeruginosa, and Enterobacter sp., were considered in the present study. To ensure data relevance and accuracy, we applied various filters during the search process. CD-HIT was used to identify clusters of orthologous sequences, followed by extraction of the core gene clusters specific to each pathogenic species. After duplicates were removed, non-redundant core orthologous gene clusters were obtained. The Resistance Gene Identifier (RGI) was used to identify ARGs within these non-redundant core orthologous clusters, with the Comprehensive Antibiotic Resistance Database (CARD) as the reference. K-mer matrices were then computed for these pathogen-specific core orthologous clusters. Each pathogen-specific dataset was subsequently split into training and testing groups. The StaLAENet framework was then designed and trained to predict pathogen-specific antibiotic-resistant strains.
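The K-mer matrix and train/test split steps can be illustrated with a short, dependency-free Python sketch. It is an assumption-laden example rather than the pipeline used in the study: it assumes nucleotide sequences over the alphabet ACGT, overlapping K-mer counts, a small K, and a label-stratified split with a 20% test fraction. None of these choices, nor the helper names `kmer_matrix` and `stratified_split`, are given in the text.

```python
import random
from itertools import product


def kmer_vocabulary(k, alphabet="ACGT"):
    """All possible k-mers over the alphabet, in lexicographic order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def kmer_counts(sequence, k, vocab_index):
    """Count overlapping k-mers in one sequence.

    Windows containing symbols outside the alphabet (e.g. N) are skipped.
    """
    row = [0] * len(vocab_index)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        j = vocab_index.get(seq[i:i + k])
        if j is not None:
            row[j] += 1
    return row


def kmer_matrix(sequences, k):
    """Return (matrix, vocabulary): one row per sequence, one column per k-mer."""
    vocab = kmer_vocabulary(k)
    vocab_index = {kmer: j for j, kmer in enumerate(vocab)}
    return [kmer_counts(s, k, vocab_index) for s in sequences], vocab


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling the test set within each label."""
    rng = random.Random(seed)
    by_label = {}
    for i, label in enumerate(labels):
        by_label.setdefault(label, []).append(i)
    train, test = [], []
    for label in sorted(by_label):
        idx = by_label[label][:]
        rng.shuffle(idx)
        # Keep at least one test sample per label when the label has more than one member.
        n_test = max(1, round(len(idx) * test_fraction)) if len(idx) > 1 else 0
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Toy data: made-up sequences and made-up drug-class labels.
sequences = ["ACGTAC", "TTTTGA", "ACGNAC", "GGGCCC", "ATATAT",
             "CGCGCG", "AACCGG", "TGCATG", "CCCAAA", "GATTAC"]
labels = ["class_a"] * 5 + ["class_b"] * 5
matrix, vocab = kmer_matrix(sequences, k=2)
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=0)
```

Each row of `matrix` has 4^K columns, so the feature dimension grows quickly with K; real pipelines often normalize the counts by sequence length before training.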
