Rapidly developing medical countermeasures (MCMs) against the growing range of chemical and biological (CB) threats to the Joint Force is increasingly challenging. Drug discovery and development are time-consuming and do not always provide the expected results. Basic research into automated machine learning (AutoML) for biological sequence data has produced a promising tool for these challenges: an AutoML framework called BioAutoMATED that analyzes biological sequences, interprets biological sequence data, and offers functionality for designing new biological sequences with desired properties.
The Defense Threat Reduction Agency’s (DTRA) Chemical and Biological Technologies Department, in its role as the Joint Science and Technology Office (JSTO) for Chemical and Biological Defense, an integral component of the Chemical and Biological Defense Program, is investing in research with Massachusetts Institute of Technology (MIT) researchers to develop rapid MCMs for CB threats using BioAutoMATED.
The rapid growth of biological datasets has created opportunities for using machine learning (ML) as a new tool in biomedical research. These datasets often contain complex sequences, such as DNA or protein sequences, and analyzing them with ML can provide valuable insights and accelerate the development of new biological tools and treatments. Building, training, and deploying ML models requires expertise and involves many critical decisions that affect the outcome. For life science researchers with limited ML experience, AutoML can handle tasks like data preprocessing, selecting the right type of ML model, tuning the model parameters, and evaluating the model’s performance. Despite the availability of various AutoML tools, few are optimized for biological sequence data.
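To make the AutoML steps listed above concrete, here is a minimal sketch in plain Python. It is not BioAutoMATED's code or API. The toy dataset, the function names, and the choice of a k-nearest-neighbor classifier are all assumptions made for illustration. The sketch preprocesses DNA strings by one-hot encoding, tries several candidate settings, scores each with cross-validation, and keeps the best one.

```python
import random

ALPHABET = "ACGT"

def one_hot(seq):
    """Preprocessing: flatten a DNA string into a 0/1 vector, 4 entries per base."""
    return [1.0 if base == letter else 0.0 for base in seq for letter in ALPHABET]

def knn_predict(train_x, train_y, x, k):
    """Candidate model: majority vote among the k nearest training vectors."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(t, x)), y)
        for t, y in zip(train_x, train_y)
    )
    votes = [y for _, y in dists[:k]]
    return max(set(votes), key=votes.count)

def cross_val_accuracy(xs, ys, k, folds=5):
    """Evaluation: held-out accuracy pooled over `folds` interleaved splits."""
    correct = 0
    for f in range(folds):
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % folds != f]
        test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % folds == f]
        tx, ty = zip(*train)
        correct += sum(knn_predict(tx, ty, x, k) == y for x, y in test)
    return correct / len(xs)

def auto_select(seqs, labels, candidate_ks=(1, 3, 5)):
    """Tuning and selection: score every candidate k and return the best."""
    xs = [one_hot(s) for s in seqs]
    scores = {k: cross_val_accuracy(xs, labels, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores

def make_toy_data(n=40, length=12, seed=0):
    """Hypothetical data: label-1 sequences carry a TATA motif at positions 4-7."""
    rng = random.Random(seed)
    seqs, labels = [], []
    for i in range(n):
        bases = [rng.choice(ALPHABET) for _ in range(length)]
        label = i % 2
        if label:
            bases[4:8] = "TATA"
        seqs.append("".join(bases))
        labels.append(label)
    return seqs, labels

if __name__ == "__main__":
    seqs, labels = make_toy_data()
    best_k, scores = auto_select(seqs, labels)
    print("cross-validated accuracy per k:", scores)
    print("selected k:", best_k)
```

A real AutoML system searches over far richer model families and encodings than this single hyperparameter, but the loop has the same shape: encode, propose, evaluate, select.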
MIT researchers designed BioAutoMATED specifically for analyzing biological sequences and optimized it for building models with nucleic acid, peptide, and glycan sequence inputs. For instance, it can identify potential regulatory sequences that are involved in gene regulation and analyze sequence features of peptides that are relevant for peptide-drug interactions. BioAutoMATED can also classify glycans (sugars found, for example, on cell membranes or bacterial cell walls) based on their sequence and predict their immunogenicity in humans, which could influence the development of vaccines and therapeutics. BioAutoMATED can also design new biological sequences with specific functions to guide the development of new and more effective drugs.
BioAutoMATED provides not just predictions but also insights into the underlying biological sequence features driving those predictions. A key feature of BioAutoMATED is interpretability: it offers tools that automatically analyze the best-performing models. These tools can help identify the specific locations and patterns within sequences that contribute most to the model’s predictions. This transparency allows researchers to gain a deeper understanding of their data and to build more robust models.
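One common way to find the sequence positions that matter most to a model is in-silico mutagenesis: mutate each position in turn and record how much the model's score changes. The sketch below illustrates that general idea only. It is not taken from BioAutoMATED, and `toy_score` is a hypothetical stand-in for a trained model.

```python
ALPHABET = "ACGT"

def toy_score(seq):
    """Hypothetical model: fraction of the motif TATA matched at positions 4-7."""
    motif = "TATA"
    return sum(seq[4 + i] == m for i, m in enumerate(motif)) / len(motif)

def mutagenesis_importance(seq, score_fn):
    """Mean absolute score change when each position is mutated to every other base."""
    base_score = score_fn(seq)
    importance = []
    for pos, original in enumerate(seq):
        deltas = []
        for alt in ALPHABET:
            if alt == original:
                continue
            mutant = seq[:pos] + alt + seq[pos + 1:]
            deltas.append(abs(score_fn(mutant) - base_score))
        importance.append(sum(deltas) / len(deltas))
    return importance

seq = "GCGCTATAGCGC"
scores = mutagenesis_importance(seq, toy_score)
# Rank positions from most to least influential on the score.
top = sorted(range(len(seq)), key=lambda i: scores[i], reverse=True)[:4]
print("most influential positions:", top)
```

Here the four motif positions receive all of the importance, because changing any other base leaves the toy score untouched. With a real trained model the same loop would highlight whichever positions that model has learned to rely on.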
Although BioAutoMATED is not a complete substitute for human expertise in ML, it is a useful starting point for merging computational results with experimental work. This simplifies the integration of ML into accelerated drug discovery against new biological threats and opens new possibilities for drug development scientists who lack extensive computational biology expertise.
Beyond its role in data analysis and synthetic biology, BioAutoMATED allows for accessible and interpretable ML applications and is a valuable tool in the field of drug discovery and drug development for emerging threats against the Joint Force.
POC: Annette von dem Bussche-Huennefeld, PhD,
annette.e.vondembussche-huennefeld.civ@mail.mil
Date Taken: 08.12.2024
Date Posted: 08.12.2024 22:58
Story ID: 478445
Location: FT. BELVOIR, VIRGINIA, US
This work, Bioautomated: An End-To-End Machine Learning Tool for Bio-Medical Science and Drug Development, must comply with the restrictions shown on https://www.dvidshub.net/about/copyright.