Research — Falls Informatics Group

Key Publications

Falls et al. Front. Chemistry 2021 — HIV-1 protease inhibitor binding via CANDOCK
Mangione, Falls & Samudrala, JCIM 2020 — cando.py open source release
Falls et al. BMC Res. Notes 2019 — CANDO scoring criteria
Van Norden, Mangione, Falls & Samudrala, Bioinformatics 2025 — Platform benchmarking
Mangione, Falls & Samudrala, Front. Pharmacol. 2023 — Biological network characterization

Related Funding

NIST High-Performance Computing Drug Discovery Initiative, 2022–2024 ($1M)
NIDA K01 — Polypharmacy & Adverse Drug Reactions, 2022–2027 ($1.05M)

Collaborators

Ram Samudrala Lab (CANDO co-developer)
Roswell Park Comprehensive Cancer Center
UB Computational & Data-Enabled Science

01

Core Platform

Multiscale Drug Discovery & the CANDO Platform

The computational cornerstone of the group is CANDO (Computational Analysis of Novel Drug Opportunities), a multiscale drug discovery platform that integrates heterogeneous biological data, including small molecules, genes, proteins, and pathways, into a human interactomic graph network. This architecture characterizes compound behavior at the systems level, moving beyond single-target approaches to model how a drug engages the proteome as a whole. By generating a proteome-wide interactomic signature for each compound, CANDO supports systematic therapeutic repurposing and prioritization of novel drug candidates across a broad range of disease indications.

CANDO integrates CANDOCK, a fragment-based molecular docking algorithm that operates at atomic resolution across thousands of protein structures from the Protein Data Bank. Each drug is scored against this structural proteome, producing a compound interactomic signature that captures its polypharmacological behavior. Machine learning models then compare these signatures across disease contexts: compounds whose signatures most closely resemble those of drugs approved for a given indication are ranked as repurposing candidates, and highly ranked compounds with no prior association to that indication emerge as novel therapeutic predictions.
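The signature comparison described above can be illustrated with a minimal sketch. It does not use the cando.py API and is not CANDO's actual scoring method: the compound names, the four-protein toy signatures, and the choice of cosine similarity as the comparison metric are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length interaction signatures."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero signature is treated as dissimilar to everything
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, signatures):
    """Rank every other compound by signature similarity to the query compound."""
    query_sig = signatures[query]
    scored = [
        (name, cosine_similarity(query_sig, sig))
        for name, sig in signatures.items()
        if name != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical signatures: one interaction score per protein (toy values).
signatures = {
    "approved_drug": [0.9, 0.1, 0.8, 0.0],
    "candidate_a": [0.8, 0.2, 0.7, 0.1],
    "candidate_b": [0.0, 0.9, 0.1, 0.8],
}

ranking = rank_by_similarity("approved_drug", signatures)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy example, candidate_a ranks above candidate_b because its interaction profile resembles that of the approved drug, which is the intuition behind ranking repurposing candidates by signature similarity.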

The platform is open source and actively maintained through the cando.py Python library, enabling reproducible bioanalytics workflows at scale. The group continuously benchmarks CANDO's performance against experimental databases, measuring recovery of approved drugs across similarity and consensus lists for multiple indications. By contextualizing drugs in a systems-level, multitarget landscape, CANDO strengthens both efficacy and safety prediction. Applications span COVID-19, influenza, NSCLC, glioma, COPD, HIV, schizophrenia, and osteoarthritis.
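As a rough sketch of what measuring recovery of approved drugs can look like, the example below runs a leave-one-out check: for each drug approved for an indication, it asks whether another drug approved for the same indication appears among its top-n most similar compounds. The function names, the cosine metric, the cutoff n, and the toy data are assumptions, and this is not presented as the group's benchmarking protocol.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length signatures."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_n_recovery(signatures, approved, n):
    """Fraction of approved drugs with another approved drug in their top-n list."""
    if not approved:
        return 0.0
    hits = 0
    for drug in approved:
        # Rank all other compounds by similarity to the held-out approved drug.
        neighbors = sorted(
            (c for c in signatures if c != drug),
            key=lambda c: cosine(signatures[drug], signatures[c]),
            reverse=True,
        )
        if any(c in approved for c in neighbors[:n]):
            hits += 1
    return hits / len(approved)

# Hypothetical compounds; d1, d2 and d3 are "approved" for a toy indication.
signatures = {
    "d1": [1.0, 0.0, 0.0],
    "d2": [0.9, 0.1, 0.0],
    "d3": [0.0, 1.0, 0.1],
    "d4": [0.0, 0.9, 0.2],
}
approved = {"d1", "d2", "d3"}

recovery_at_1 = top_n_recovery(signatures, approved, n=1)
recovery_at_2 = top_n_recovery(signatures, approved, n=2)
print(recovery_at_1, recovery_at_2)
```

Widening the cutoff from one to two neighbors raises recovery here because d3's nearest neighbor is the unapproved d4, while its second-nearest neighbor is an approved drug.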

Methods & Tools

CANDO
CANDOCK
cando.py
Interactomic Graph Networks
Molecular Docking
Proteome-Wide Screening
DrugBank
Protein Data Bank
Machine Learning
Drug Repurposing

Key Publications

Falls et al. JSAD 2025 — Predictors of OUD development
Falls et al. SAR 2024 — Opioid epidemic wave analysis
Lu et al. JAPHA 2025 — Prescribing disparity by social determinants
Jacobs et al. JGIM 2022 — Opioid & benzodiazepine co-prescribing trends
Lu et al. J Biomed Informatics 2023 — NY State AUD cohort linkage
Kuo et al. DADR 2024 — High-risk opioid prescribing, 2005–2018

Related Funding

NIDA K01 — Polypharmacy & Adverse Drug Reactions, 2022–2027 ($1.05M)
NCATS CTSA — UB Clinical & Translational Science Award, 2025–2031 ($29.2M)
NLM BRIGHT Education Training Program, 2022–2027 ($2M)

Data Sources

NY OASAS Medicaid Claims (2005–2018)
SPARCS Inpatient/Outpatient Records
UB Clinical & Translational Research Center EHR

02

Real-World Evidence

Clinical & Population-Scale Informatics

A major arm of the group's research leverages large-scale electronic health records (EHR) and statewide Medicaid administrative claims to characterize real-world prescribing patterns, disease trajectories, and treatment outcomes. The group has curated a large corpus of client and claims data for patients treated for substance use disorders, spanning millions of encounters across New York State, which enables epidemiological analyses of overdose and abuse risks that are not feasible in smaller clinical trial settings.

Research in this thrust centers on the opioid crisis: identifying social and clinical predictors of opioid use disorder (OUD) emergence among patients treated for alcohol use disorder (AUD), quantifying how prescribing of opioids, benzodiazepines, and controlled substances has shifted across the three waves of the opioid epidemic, and mapping disparities in access to opioid agonist therapy by race, geography, and social determinants of health. These contributions generate evidence-based results that can inform providers and drive policy changes to better serve at-risk populations.
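One building block of prescribing analyses like these is detecting concurrent prescriptions in claims data. The sketch below flags patients whose opioid and benzodiazepine coverage windows overlap. The record layout (patient, drug class, fill date, days supply), the class labels, and the overlap threshold are hypothetical and do not reflect the schema of the Medicaid or SPARCS data or the group's actual definitions.

```python
from datetime import date, timedelta

def coverage(fill_date, days_supply):
    """Inclusive (start, end) dates covered by one prescription fill."""
    return fill_date, fill_date + timedelta(days=days_supply - 1)

def overlap_days(a, b):
    """Number of calendar days shared by two inclusive date windows."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, (end - start).days + 1)

def coprescribed_patients(claims, min_overlap_days=1):
    """Patients with any opioid/benzodiazepine overlap of at least the threshold."""
    windows = {}
    for patient, drug_class, fill_date, days_supply in claims:
        per_class = windows.setdefault(patient, {})
        per_class.setdefault(drug_class, []).append(coverage(fill_date, days_supply))
    flagged = set()
    for patient, per_class in windows.items():
        for opioid in per_class.get("opioid", []):
            for benzo in per_class.get("benzodiazepine", []):
                if overlap_days(opioid, benzo) >= min_overlap_days:
                    flagged.add(patient)
    return flagged

# Hypothetical claims: (patient id, drug class, fill date, days supply).
claims = [
    ("p1", "opioid", date(2015, 1, 1), 30),
    ("p1", "benzodiazepine", date(2015, 1, 20), 30),
    ("p2", "opioid", date(2015, 1, 1), 10),
    ("p2", "benzodiazepine", date(2015, 2, 1), 30),
]

flagged = coprescribed_patients(claims)
print(sorted(flagged))
```

Counting flagged patients per calendar year would give a simple co-prescribing trend. A real analysis would also need to handle refills, early fills, and dose, which this sketch ignores.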

A connected line of work investigates polypharmacy-induced adverse drug reactions (ADRs). Under a NIDA K01 award, the group is developing a translational bioinformatics model that integrates drug–protein interaction data and patient-level clinical data into a deep learning architecture to predict severe ADRs in patients receiving opioid therapeutics with co-prescriptions. A parallel Sinsheimer-funded project develops AI-driven proteomic ensemble models for precision clinical trial design. Together, these efforts bridge computational and clinical data to identify dangerous drug combinations and enable safer, personalized treatment strategies.

Methods & Tools

SAS
R
SQL
Deep Learning
ADR Prediction
Medicaid Claims
SPARCS
EHR Analytics
Epidemiology
Longitudinal Cohort Design
Social Determinants of Health

Key Publications

Overhoff, Falls et al. Pharmaceuticals 2021 — Proteomic-scale deep learning for drug design
Moukheiber, Mangione, Falls et al. Molecules 2022 — ML toxicity with Tox21
Van Norden, Falls et al. Comm. Biology 2024 — APOBEC3 C-to-U RNA editing & disease
Xu, Falls et al. BMC Pharmacol. Toxicol. 2025 — De novo therapeutic candidate generation
Chatrikhi, Falls et al. Cell Chem. Biol. 2021 — Small molecule pre-mRNA splicing

Related Funding

Sinsheimer Scholar Award, 2024–2026 ($60K)
NLM BRIGHT Program (AI/ML training), 2022–2027

Focus Areas

Structure-based de novo molecular generation
Proteome-scale deep learning architectures
APOBEC3-mediated RNA editing & disease variants
Computational toxicology & bioassay ML

03

Data Science & Machine Learning

AI-Guided Drug Design, Data Science & Genomics

A foundational aspect of the group's work is the extraction, curation, and maintenance of diverse biomedical data from public and private sources. The group creates and maintains unique databases that integrate heterogeneous data types (structural proteomics, genomics, chemical libraries, clinical records, and biological networks) into unified computational frameworks. This data science infrastructure underpins the CANDO platform and enables multiscale drug discovery across all research thrusts, with the ongoing aim of bridging gaps between separate data sources and data types.

On the deep learning front, the group has developed several tools that extend CANDO's predictive and generative capabilities: FusionNet for small molecule–protein binding affinity prediction; CVAE and CGGM for generative drug design guided by proteome-scale objectives; and RNAsee for transcriptome-scale RNA editing site identification. These tools are methodological contributions to biomedical informatics and cheminformatics, and are complemented by machine learning-based toxicity prediction using large-scale bioassay data such as Tox21.

A distinct but related line investigates RNA biology informatics. APOBEC3 enzyme-mediated C-to-U RNA editing is a post-transcriptional regulatory mechanism with broad implications for human disease, cancer biology, and human biodiversity. The group catalogs C-to-U editing events across human transcriptomes, assesses their consequences for protein function and disease variant interpretation, and evaluates their biological roles. The group also applies computational methods to small molecule modulation of pre-mRNA splicing and protein aggregation inhibition in cancer-associated mutant p53.

Methods & Tools

FusionNet
RNAsee
CVAE
CGGM
PyTorch
Deep Learning
Graph Neural Networks
RDKit
Tox21
De Novo Generation
RNA Editing
APOBEC3
Transcriptomics
Network Toxicology

Key Publications

Mangione, Falls & Samudrala, Front. Pharmacol. 2022 — COVID-19 CANDO candidates
Mangione, Falls et al. Drug Discov. Today 2020 — Shotgun repurposing for pandemics
Kumari, Falls et al. Clin. Microbiol. Rev. 2023 — Influenza antiviral strategies
Bruggemann, Falls et al. IJMS 2023 — KRAS drug combinations in NSCLC
Xu, Falls et al. bioRxiv 2025 — Glioma therapeutic candidates via CANDO
Falls, Fine, Chopra & Samudrala, Front. Chem. 2021 — HIV-1 protease inhibitor prediction

Disease Focus Areas

COVID-19 & SARS-CoV-2
Influenza A/B
HIV-1
KRAS-mutant NSCLC
Glioblastoma & Glioma
COPD

Wet-Lab Partners

Roswell Park Comprehensive Cancer Center
UB Jacobs School of Medicine
External virology validation collaborators

04

Disease Translation

Translational Applications: Infectious Disease, Oncology & Substance Use

Computational insights from the CANDO platform and the group's AI tools are applied to discover repurposed therapeutics and design novel chemical entities with desired biological behavior across multiple disease domains. In infectious disease, the group pioneered "shotgun drug repurposing"—proteome-wide screening of all approved drugs to identify candidates against emerging pathogens. Applications to COVID-19 produced repurposed therapeutic candidates early in the pandemic; analogous programs targeted influenza antiviral mechanisms and HIV-1 protease inhibition, demonstrating the speed and breadth of the CANDO approach.

In oncology, the group focuses on driver mutation-targeted drug combination strategies. For KRAS-mutant non-small cell lung cancer (NSCLC)—one of the most treatment-refractory cancer subtypes—multiscale CANDO analysis identified synergistic drug pairings validated through computational and experimental collaboration with Roswell Park. For glioma, optimal therapeutic candidates are systematically prioritized from approved and investigational compound libraries, with results published in the Journal of Cheminformatics (2026). Additional contributions span COPD proteomics-guided discovery, cancer-associated mutant p53 aggregation inhibition, and structure-based HIV-1 inhibitor design.

A third disease focus is substance use disorder. Aligned with NIDA-funded research, the group applies CANDO to design and validate optimal nonaddictive analgesics—performing analytics across all drugs and compounds to define proteomic objectives for candidate analgesics that are subsequently synthesized and preclinically validated. This work bridges the group's computational drug discovery expertise with its clinical informatics research on opioid prescribing and adverse outcomes, creating an integrated pipeline from molecular design to real-world therapeutic impact.

Methods & Tools

CANDO
CANDOCK
Drug Repurposing
Molecular Docking
Network Analysis
Drug Combination Synergy
Nonaddictive Analgesics
Substance Use Disorder
Proteomics
Bioinformatics
In Vitro Validation

Research Support

Interested in Collaboration or Joining?