The Falls Informatics Group operates across four interconnected research thrusts spanning computational drug discovery, clinical data science, AI-guided molecular design, and translational applications in infectious disease and oncology.
The computational cornerstone of the group is CANDO (Computational Analysis of Novel Drug Opportunities), a multiscale drug discovery platform that integrates heterogeneous biological data, including small molecules, genes, proteins, and pathways, into a human interactomic graph network. This architecture characterizes compound behavior at the systems level, moving beyond single-target approaches to model how each drug interacts with the biological landscape as a whole. By generating proteome-wide interactomic signatures for each compound, CANDO enables systematic therapeutic repurposing and novel drug candidate prioritization for virtually any disease indication.
CANDO incorporates CANDOCK, a fragment-based molecular docking algorithm that operates at atomic resolution across thousands of protein structures from the Protein Data Bank. Each drug is scored against the full structural proteome, producing a compound interactomic signature that captures its polypharmacological behavior. Machine learning models then compare these signatures across disease contexts: compounds whose signatures most resemble those of drugs approved for a given indication are ranked as strong repurposing candidates, and highly ranked compounds with no prior association to that indication emerge as novel therapeutic predictions.
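The signature-comparison step can be illustrated with a small Python sketch. This is not the cando.py implementation: the toy signatures, the use of root-mean-square deviation (RMSD) as the distance between signatures, the `top_k` cutoff, and the voting scheme that builds a consensus list are all assumptions made here for illustration, and `signature_distance` and `consensus_ranking` are hypothetical names.

```python
import math


def signature_distance(a, b):
    """RMSD between two interaction signatures (an assumed metric)."""
    if len(a) != len(b):
        raise ValueError("signatures must cover the same proteins")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def consensus_ranking(signatures, approved, top_k=2):
    """Rank compounds not yet approved for an indication by how many approved
    drugs list them among their top_k nearest signatures (hypothetical scheme)."""
    votes, best = {}, {}
    for drug in approved:
        # The top_k compounds whose signatures are closest to this approved drug.
        neighbours = sorted(
            (signature_distance(signatures[drug], sig), name)
            for name, sig in signatures.items()
            if name != drug
        )[:top_k]
        for dist, name in neighbours:
            if name in approved:
                continue  # already approved: not a repurposing prediction
            votes[name] = votes.get(name, 0) + 1
            best[name] = min(best.get(name, math.inf), dist)
    # Most votes first; ties broken by the closest observed distance.
    return sorted(votes, key=lambda n: (-votes[n], best[n]))


# Toy signatures: made-up interaction scores against three proteins.
signatures = {
    "drugA": [0.9, 0.1, 0.8],
    "drugB": [0.8, 0.2, 0.9],
    "cmpdX": [0.85, 0.15, 0.85],
    "cmpdY": [0.1, 0.9, 0.1],
}
approved = {"drugA", "drugB"}
print(consensus_ranking(signatures, approved))  # ['cmpdX']
```

Voting across several approved drugs, rather than trusting a single nearest neighbour, makes the ranking less sensitive to one noisy signature; the metrics and list sizes used by the actual platform may differ.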
The platform is open source and actively maintained through the cando.py Python library, enabling reproducible bioanalytics workflows at scale. The group continuously benchmarks CANDO against experimental databases by measuring how many approved drugs for each indication are recovered in the platform's top-ranked similarity and consensus lists. Because it places drugs in a systems-level, multitarget context, CANDO supports prediction of both efficacy and safety. Applications span COVID-19, influenza, NSCLC, glioma, COPD, HIV, schizophrenia, and osteoarthritis.
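One way to express such a benchmark is as a leave-one-out top-k recovery rate. The sketch below is an illustration under stated assumptions rather than the group's published protocol: RMSD as the signature distance, the toy data, and the names `rmsd` and `topk_recovery` are all hypothetical.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length signatures."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def topk_recovery(signatures, indications, k):
    """Leave-one-out benchmark: for each drug approved for an indication, check
    whether another drug approved for the same indication is among its k
    nearest signatures. Returns the fraction of such checks that succeed."""
    hits = total = 0
    for approved in indications.values():
        if len(approved) < 2:
            continue  # a lone approved drug has nothing to recover
        for drug in approved:
            neighbours = sorted(
                (name for name in signatures if name != drug),
                key=lambda name: rmsd(signatures[drug], signatures[name]),
            )[:k]
            total += 1
            hits += any(name in approved for name in neighbours)
    return hits / total if total else 0.0


# Toy data: two-protein signatures and three made-up indications.
signatures = {
    "a1": [1.0, 0.0], "a2": [0.9, 0.1],
    "b1": [0.0, 1.0], "b2": [0.1, 0.9],
    "c": [0.5, 0.5],
}
indications = {"X": {"a1", "a2"}, "Y": {"b1", "b2"}, "Z": {"a1", "b1"}}
print(topk_recovery(signatures, indications, k=1))  # 4 of 6 checks succeed
```

Reporting the rate at several values of k shows how quickly recovery improves as the candidate list grows, which is the usual way to compare ranking methods of this kind.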
A major arm of the group's research uses large-scale electronic health records (EHR) and statewide Medicaid administrative claims to characterize real-world prescribing patterns, disease trajectories, and treatment outcomes. The group has curated a large corpus of client and claims data for patients treated for substance use disorders, spanning millions of encounters across New York State. This corpus supports epidemiological analyses of overdose and abuse risks at a scale that smaller clinical trials cannot reach.
Research in this thrust centers on the opioid crisis: identifying social and clinical predictors of opioid use disorder (OUD) emerging among patients treated for alcohol use disorder (AUD); quantifying how prescribing of opioids, benzodiazepines, and other controlled substances has shifted across the three waves of the opioid epidemic; and mapping disparities in access to opioid agonist therapy by race, geography, and social determinants of health. This work produces evidence that can inform providers and guide policy changes to better serve at-risk populations.
A connected line of work investigates polypharmacy-induced adverse drug reactions (ADRs). Under a NIDA K01 award, the group is developing a translational bioinformatics model that integrates drug–protein interaction data and patient-level clinical data into a deep learning architecture to accurately predict severe ADRs in patients receiving opioid therapeutics with co-prescriptions. A parallel Sinsheimer-funded project develops AI-driven proteomic ensemble models for precision clinical trial design. Together, these efforts bridge computational and clinical data to identify dangerous drug combinations and enable safer, personalized treatment strategies.
A foundational aspect of the group's work is the extraction, curation, and maintenance of diverse biomedical data from public and private sources. The group builds and maintains databases that integrate heterogeneous data types, such as structural proteomics, genomics, chemical libraries, clinical records, and biological networks, into unified computational frameworks. This data science infrastructure underpins the CANDO platform, supports multiscale drug discovery across all research thrusts, and is continually extended to bridge gaps between previously separate data sources and types.
On the deep learning front, the group has developed several novel tools that extend CANDO's predictive and generative capabilities: FusionNet for small molecule–protein binding affinity prediction; CVAE and CGGM for generative drug design with proteome-level objectives; and RNAsee for transcriptome-scale RNA editing site identification. These tools are new methodological contributions to biomedical informatics and cheminformatics, and they are complemented by machine learning-based toxicity prediction trained on large-scale bioassay data such as Tox21.
A distinct but related line of work addresses RNA biology informatics. C-to-U RNA editing mediated by APOBEC3 enzymes is a post-transcriptional regulatory mechanism with broad implications for human disease, cancer biology, and human biodiversity. The group catalogs C-to-U editing events across human transcriptomes, assesses their consequences for protein function and for the interpretation of disease variants, and evaluates their biological roles. The group also applies computational methods to small-molecule modulation of pre-mRNA splicing and to inhibiting the aggregation of cancer-associated mutant p53.
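To make "consequences for protein function" concrete, the sketch below classifies a single C-to-U edit in a coding sequence as synonymous, missense, or nonsense using the standard genetic code. It is an illustrative assumption, not RNAsee or the group's cataloguing pipeline: the function name `classify_c_to_u`, the requirement of a 0-based position in an in-frame coding sequence, and the use of T to stand for U are choices made here.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_c_to_u(cds, pos):
    """Classify a C-to-U edit at 0-based position `pos` of an in-frame coding
    sequence written in DNA letters (U is represented as T)."""
    cds = cds.upper()
    if cds[pos] != "C":
        raise ValueError("C-to-U editing requires a C at the edited position")
    start = pos - pos % 3  # first base of the codon containing the edit
    before = cds[start:start + 3]
    if len(before) < 3:
        raise ValueError("edit falls in an incomplete trailing codon")
    after = before[: pos - start] + "T" + before[pos - start + 1:]
    old, new = CODON_TABLE[before], CODON_TABLE[after]
    if old == new:
        return "synonymous"
    if new == "*":
        return "nonsense"  # the edit creates a premature stop codon
    return "missense"


print(classify_c_to_u("ATGCGATAA", 3))  # CGA (Arg) -> TGA (stop): nonsense
```

Because none of the three stop codons contains a C, a C-to-U edit can create a stop codon but never remove one, so the function needs no stop-loss case.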
Computational insights from the CANDO platform and the group's AI tools are applied to discover repurposed therapeutics and design novel chemical entities with desired biological behavior across multiple disease domains. In infectious disease, the group pioneered "shotgun drug repurposing"—proteome-wide screening of all approved drugs to identify candidates against emerging pathogens. Applications to COVID-19 produced repurposed therapeutic candidates early in the pandemic; analogous programs targeted influenza antiviral mechanisms and HIV-1 protease inhibition, demonstrating the speed and breadth of the CANDO approach.
In oncology, the group focuses on drug combination strategies that target driver mutations. For KRAS-mutant non-small cell lung cancer (NSCLC), one of the most treatment-refractory cancer subtypes, multiscale CANDO analysis identified synergistic drug pairings that were validated computationally and experimentally in collaboration with Roswell Park. For glioma, the most promising therapeutic candidates are systematically prioritized from approved and investigational compound libraries, with results published in the Journal of Cheminformatics (2026). Additional contributions span proteomics-guided discovery for COPD, aggregation inhibition of cancer-associated mutant p53, and structure-based HIV-1 inhibitor design.
A third disease focus is substance use disorder. Aligned with NIDA-funded research, the group applies CANDO to design and validate nonaddictive analgesics: analytics across all drugs and compounds define the proteomic objectives for candidate analgesics, which are then synthesized and validated preclinically. This work links the group's computational drug discovery expertise with its clinical informatics research on opioid prescribing and adverse outcomes, creating an integrated pipeline from molecular design to real-world therapeutic impact.
Active grants and awards supporting the Falls Informatics Group research program.
We welcome collaborators from wet-lab biology, clinical medicine, and data science, as well as motivated graduate students, postdoctoral fellows, and undergraduates interested in computational drug discovery and biomedical informatics.