Research Thrusts

The Falls Informatics Group operates across four interconnected research thrusts spanning computational drug discovery, clinical data science, AI-guided molecular design, and translational applications in infectious disease and oncology.

  • 🧬 Multiscale Drug Discovery & CANDO
  • 🏥 Clinical & Population Informatics
  • 🤖 AI-Guided Drug Design & Data Science
  • 🦠 Translational Disease Applications

Key Publications

  • Falls et al. Front. Chemistry 2021 — HIV-1 protease inhibitor binding via CANDOCK
  • Mangione, Falls & Samudrala, JCIM 2020 — cando.py open source release
  • Falls et al. BMC Res. Notes 2019 — CANDO scoring criteria
  • Van Norden, Mangione, Falls & Samudrala, Bioinformatics 2025 — Platform benchmarking
  • Mangione, Falls & Samudrala, Front. Pharmacol. 2023 — Biological network characterization

Related Funding

  • NIST High-Performance Computing Drug Discovery Initiative, 2022–2024 ($1M)
  • NIDA K01 — Polypharmacy & Adverse Drug Reactions, 2022–2027 ($1.05M)

Collaborators

  • Ram Samudrala Lab (CANDO co-developer)
  • Roswell Park Comprehensive Cancer Center
  • UB Computational & Data-Enabled Science
01
Core Platform

Multiscale Drug Discovery & the CANDO Platform

The computational cornerstone of the group is CANDO (Computational Analysis of Novel Drug Opportunities), a multiscale drug discovery platform that integrates heterogeneous biological data, including small molecules, genes, proteins, and pathways, into a human interactomic graph network. This architecture characterizes compound behavior at the systems level, moving beyond single-target approaches to model how a drug engages the proteome as a whole. By generating a proteome-wide interactomic signature for each compound, CANDO supports systematic therapeutic repurposing and prioritization of novel drug candidates across many disease indications.

CANDO integrates CANDOCK, a fragment-based molecular docking algorithm that operates at atomic resolution across thousands of protein structures from the Protein Data Bank. Each drug is scored against the full structural proteome, producing a compound interactomic signature that captures its polypharmacological behavior. Machine learning models then compare these signatures across disease contexts: compounds whose signatures most resemble those of drugs approved for a given indication are ranked as strong repurposing candidates, and highly ranked compounds with no prior association to that indication are put forward as novel therapeutic predictions.
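The signature comparison described above can be illustrated with a minimal sketch. It assumes each compound is represented by a vector of per-protein interaction scores and that similarity is measured with cosine similarity; the compound names, score values, and function names are hypothetical, and this is not the cando.py API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(signatures, approved, top_n=3):
    """Rank non-approved compounds by best similarity to any approved drug."""
    scored = []
    for name, signature in signatures.items():
        if name in approved:
            continue  # only compounds not yet tied to the indication are candidates
        best = max(cosine_similarity(signature, signatures[ref]) for ref in approved)
        scored.append((name, best))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Toy signatures: one interaction score per protein (hypothetical values).
signatures = {
    "drug_A": [0.9, 0.1, 0.8, 0.0],
    "drug_B": [0.8, 0.2, 0.9, 0.1],
    "drug_C": [0.0, 0.9, 0.1, 0.8],
    "drug_D": [0.7, 0.0, 0.6, 0.1],
}
approved_for_indication = {"drug_A"}

ranking = rank_candidates(signatures, approved_for_indication)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

In this toy setup the compounds whose score profiles resemble drug_A rise to the top, while drug_C, which binds a different set of proteins, falls to the bottom. A real proteome-wide signature would have thousands of entries rather than four.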

The platform is open source and actively maintained as the cando.py Python library, which supports reproducible bioanalytics workflows at scale. The group continuously benchmarks CANDO against experimental databases, measuring how well approved drugs are recovered in the similarity and consensus lists for multiple indications. By contextualizing drugs in a systems-level, multitarget landscape, CANDO strengthens both efficacy and safety prediction. Applications span COVID-19, influenza, non-small cell lung cancer (NSCLC), glioma, chronic obstructive pulmonary disease (COPD), HIV, schizophrenia, and osteoarthritis.
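One way to picture the recovery benchmark is a leave-one-out check: hold out each approved drug, rank every other compound by signature distance to it, and ask whether another drug approved for the same indication appears among the top k. The sketch below assumes cosine distance and toy three-protein signatures; the names and values are hypothetical and the function is not part of cando.py.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def top_k_recovery(signatures, indication_drugs, k):
    """Fraction of held-out approved drugs whose k nearest neighbours
    include at least one other drug approved for the same indication."""
    hits = 0
    for query in indication_drugs:
        others = [name for name in signatures if name != query]
        others.sort(key=lambda name: cosine_distance(signatures[query], signatures[name]))
        if any(name in indication_drugs for name in others[:k]):
            hits += 1
    return hits / len(indication_drugs)


# Toy data (hypothetical): A and B behave alike, C is approved but atypical.
signatures = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.9, 0.1, 0.0],
    "C": [0.0, 0.0, 1.0],
    "D": [0.0, 0.5, 0.5],
    "E": [0.1, 0.0, 0.9],
}
approved = ["A", "B", "C"]

for k in (1, 3):
    print(f"top-{k} recovery: {top_k_recovery(signatures, approved, k):.2f}")
```

Reporting the metric at several cutoffs shows how quickly approved drugs surface in each other's similarity lists; an atypical drug such as C is recovered only once the list is allowed to grow.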

Methods & Tools

CANDO, CANDOCK, cando.py, Interactomic Graph Networks, Molecular Docking, Proteome-Wide Screening, DrugBank, Protein Data Bank, Machine Learning, Drug Repurposing

Key Publications

  • Falls et al. JSAD 2025 — Predictors of OUD development
  • Falls et al. SAR 2024 — Opioid epidemic wave analysis
  • Lu et al. JAPHA 2025 — Prescribing disparity by social determinants
  • Jacobs et al. JGIM 2022 — Opioid & benzodiazepine co-prescribing trends
  • Lu et al. J Biomed Informatics 2023 — NY State AUD cohort linkage
  • Kuo et al. DADR 2024 — High-risk opioid prescribing, 2005–2018

Related Funding

  • NIDA K01 — Polypharmacy & Adverse Drug Reactions, 2022–2027 ($1.05M)
  • NCATS CTSA — UB Clinical & Translational Science Award, 2025–2031 ($29.2M)
  • NLM BRIGHT Education Training Program, 2022–2027 ($2M)

Data Sources

  • NY OASAS Medicaid Claims (2005–2018)
  • SPARCS Inpatient/Outpatient Records
  • UB Clinical & Translational Research Center EHR
02
Real-World Evidence

Clinical & Population-Scale Informatics

A major arm of the group's research leverages large-scale electronic health records (EHR) and statewide Medicaid administrative claims to characterize real-world prescribing patterns, disease trajectories, and treatment outcomes. The group has curated a large corpus of client and claims data for patients treated for substance use disorders, spanning millions of encounters across New York State. This scale enables epidemiological analyses of overdose and abuse risks that are not feasible in smaller clinical trial settings.

Research in this thrust centers on the opioid crisis. The group identifies social and clinical predictors of opioid use disorder (OUD) emergence among patients treated for alcohol use disorder (AUD), quantifies how prescribing of opioids, benzodiazepines, and other controlled substances has shifted across the three waves of the opioid epidemic, and maps disparities in access to opioid agonist therapy by race, geography, and social determinants of health. These findings provide evidence that can inform providers and drive policy changes to better serve at-risk populations.
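As a rough illustration of how a co-prescribing trend might be quantified from claims, the sketch below counts, per calendar year, the share of opioid-treated patients who also had a benzodiazepine fill overlapping an opioid fill. The record layout, field order, and all values are hypothetical, and attributing a patient to the year of the opioid fill is a simplifying assumption rather than the group's published method.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical claim records: (patient_id, drug_class, fill_date, days_supply)
claims = [
    ("p1", "opioid", date(2010, 3, 1), 30),
    ("p1", "benzodiazepine", date(2010, 3, 15), 30),
    ("p2", "opioid", date(2010, 6, 1), 10),
    ("p2", "benzodiazepine", date(2010, 8, 1), 30),
    ("p3", "opioid", date(2016, 1, 5), 30),
    ("p4", "opioid", date(2016, 2, 1), 30),
    ("p4", "benzodiazepine", date(2016, 2, 20), 15),
    ("p5", "opioid", date(2016, 5, 1), 30),
    ("p5", "benzodiazepine", date(2016, 4, 20), 30),
]


def overlaps(start_a, days_a, start_b, days_b):
    """True if the two supply windows share at least one day."""
    end_a = start_a + timedelta(days=days_a)
    end_b = start_b + timedelta(days=days_b)
    return start_a < end_b and start_b < end_a


def co_prescribing_rate_by_year(records):
    """Share of opioid-treated patients per year with an overlapping benzodiazepine fill."""
    opioid_fills = defaultdict(list)
    benzo_fills = defaultdict(list)
    for patient_id, drug_class, fill_date, days_supply in records:
        if drug_class == "opioid":
            opioid_fills[patient_id].append((fill_date, days_supply))
        elif drug_class == "benzodiazepine":
            benzo_fills[patient_id].append((fill_date, days_supply))

    treated = defaultdict(set)        # year -> patients with an opioid fill
    co_prescribed = defaultdict(set)  # year -> patients with an overlapping benzo fill
    for patient_id, fills in opioid_fills.items():
        for fill_date, days_supply in fills:
            treated[fill_date.year].add(patient_id)
            if any(
                overlaps(fill_date, days_supply, b_date, b_days)
                for b_date, b_days in benzo_fills.get(patient_id, [])
            ):
                co_prescribed[fill_date.year].add(patient_id)

    return {year: len(co_prescribed[year]) / len(treated[year]) for year in sorted(treated)}


rates = co_prescribing_rate_by_year(claims)
for year, rate in rates.items():
    print(year, f"{rate:.2f}")
```

Counting patients rather than fills keeps one heavily treated patient from dominating a year's rate; a production analysis would also need to handle enrollment gaps and refills that span the year boundary.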

A connected line of work investigates polypharmacy-induced adverse drug reactions (ADRs). Under a NIDA K01 award, the group is developing a translational bioinformatics model that integrates drug–protein interaction data and patient-level clinical data into a deep learning architecture to predict severe ADRs in patients receiving opioid therapeutics alongside co-prescribed medications. A parallel Sinsheimer-funded project develops AI-driven proteomic ensemble models for precision clinical trial design. Together, these efforts bridge computational and clinical data to identify dangerous drug combinations and enable safer, personalized treatment strategies.
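To make the idea of combining drug–protein interaction data with patient-level data concrete, here is a minimal sketch of how one model input might be assembled. It assumes the signatures of co-prescribed drugs are pooled by element-wise maximum and then concatenated with a few scaled clinical covariates. The pooling rule, field names, and values are all illustrative assumptions; the source does not describe the actual architecture or feature set.

```python
def pool_signatures(drug_names, signatures):
    """Element-wise maximum over the interaction signatures of co-prescribed drugs."""
    vectors = [signatures[name] for name in drug_names]
    return [max(values) for values in zip(*vectors)]


def build_patient_features(patient, signatures):
    """Concatenate the pooled drug signature with simple clinical covariates."""
    pooled = pool_signatures(patient["drugs"], signatures)
    clinical = [
        patient["age_years"] / 100.0,      # crude scaling to roughly [0, 1]
        float(patient["prior_overdose"]),  # binary flag as 0.0 / 1.0
    ]
    return pooled + clinical


# Hypothetical per-protein interaction scores for two co-prescribed drugs.
signatures = {
    "opioid_X": [0.9, 0.1, 0.3],
    "benzo_Y": [0.2, 0.8, 0.1],
}
patient = {"drugs": ["opioid_X", "benzo_Y"], "age_years": 50, "prior_overdose": True}

features = build_patient_features(patient, signatures)
print(features)
```

Maximum pooling keeps the strongest predicted interaction at each protein regardless of how many drugs the patient takes, so the input length stays fixed; a learned model could replace this with attention or another permutation-invariant pooling.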

Methods & Tools

SAS, R, SQL, Deep Learning, ADR Prediction, Medicaid Claims, SPARCS, EHR Analytics, Epidemiology, Longitudinal Cohort Design, Social Determinants of Health

Key Publications

  • Overhoff, Falls et al. Pharmaceuticals 2021 — Proteomic-scale deep learning for drug design
  • Moukheiber, Mangione, Falls et al. Molecules 2022 — ML toxicity with Tox21
  • Van Norden, Falls et al. Comm. Biology 2024 — APOBEC3 C-to-U RNA editing & disease
  • Xu, Falls et al. BMC Pharmacol. Toxicol. 2025 — De novo therapeutic candidate generation
  • Chatrikhi, Falls et al. Cell Chem. Biol. 2021 — Small molecule pre-mRNA splicing

Related Funding

  • Sinsheimer Scholar Award, 2024–2026 ($60K)
  • NLM BRIGHT Program (AI/ML training), 2022–2027

Focus Areas

  • Structure-based de novo molecular generation
  • Proteome-scale deep learning architectures
  • APOBEC3-mediated RNA editing & disease variants
  • Computational toxicology & bioassay ML
03
Data Science & Machine Learning

AI-Guided Drug Design, Data Science & Genomics

A foundational aspect of the group's work is the extraction, curation, and maintenance of diverse biomedical data from public and private sources. The group creates and maintains unique databases that integrate heterogeneous data types (structural proteomics, genomics, chemical libraries, clinical records, and biological networks) into unified computational frameworks. This data science infrastructure underpins the CANDO platform and enables multiscale drug discovery across all research thrusts, with the ongoing aim of bridging gaps between separate data sources and types.

On the deep learning front, the group has developed several novel tools extending CANDO's predictive and generative capabilities: FusionNet for small molecule–protein binding affinity prediction; CVAE and CGGM for proteomic-scale generative drug design using deep learning with proteomic-level objectives; and RNAsee for transcriptome-scale RNA editing site identification. These tools represent new methodological contributions to biomedical informatics and cheminformatics, and are complemented by machine learning-based toxicity prediction using large-scale bioassay data such as Tox21.

A distinct but related line investigates RNA biology informatics. APOBEC3 enzyme-mediated C-to-U RNA editing is a post-transcriptional regulatory mechanism with broad implications for human disease, cancer biology, and human biodiversity. The group catalogs C-to-U editing events across human transcriptomes, assesses their consequences for protein function and disease variant interpretation, and evaluates their biological roles. The group also applies computational methods to small molecule modulation of pre-mRNA splicing and protein aggregation inhibition in cancer-associated mutant p53.
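Assessing the protein-level consequence of a single C-to-U edit can be sketched in a few lines: locate the codon containing the edited base, translate it before and after the edit using the standard genetic code, and label the change. The function name and the example sequences are hypothetical, the input is assumed to be an in-frame coding sequence, and this is not the RNAsee implementation.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered with U, C, A, G at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_c_to_u_edit(coding_rna, position):
    """Classify a C-to-U edit at a 0-based position of an in-frame coding sequence.

    Returns (codon_before, codon_after, aa_before, aa_after, effect).
    """
    rna = coding_rna.upper().replace("T", "U")  # accept DNA-style input too
    if rna[position] != "C":
        raise ValueError("the edited base must be a C")
    codon_start = position - position % 3
    before = rna[codon_start:codon_start + 3]
    if len(before) < 3:
        raise ValueError("edit falls in an incomplete trailing codon")
    offset = position - codon_start
    after = before[:offset] + "U" + before[offset + 1:]
    aa_before, aa_after = CODON_TABLE[before], CODON_TABLE[after]
    if aa_before == aa_after:
        effect = "synonymous"
    elif aa_after == "*":
        effect = "nonsense"
    else:
        effect = "missense"
    return before, after, aa_before, aa_after, effect


print(classify_c_to_u_edit("AUGCAAGGC", 3))  # edit in the CAA codon
print(classify_c_to_u_edit("AUGCAAGGC", 8))  # edit in the third base of GGC
```

Editing the first base of a CAA codon yields UAA, turning a glutamine into a stop, whereas an edit at a wobble position is often silent. A transcriptome-scale catalog would apply the same logic per annotated transcript and reading frame.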

Methods & Tools

FusionNet, RNAsee, CVAE, CGGM, PyTorch, Deep Learning, Graph Neural Networks, RDKit, Tox21, De Novo Generation, RNA Editing, APOBEC3, Transcriptomics, Network Toxicology

Key Publications

  • Mangione, Falls & Samudrala, Front. Pharmacol. 2022 — COVID-19 CANDO candidates
  • Mangione, Falls et al. Drug Discov. Today 2020 — Shotgun repurposing for pandemics
  • Kumari, Falls et al. Clin. Microbiol. Rev. 2023 — Influenza antiviral strategies
  • Bruggemann, Falls et al. IJMS 2023 — KRAS drug combinations in NSCLC
  • Xu, Falls et al. bioRxiv 2025 — Glioma therapeutic candidates via CANDO
  • Falls, Fine, Chopra & Samudrala, Front. Chem. 2021 — HIV-1 protease inhibitor prediction

Disease Focus Areas

  • COVID-19 & SARS-CoV-2
  • Influenza A/B
  • HIV-1
  • KRAS-mutant NSCLC
  • Glioblastoma & Glioma
  • COPD

Wet-Lab Partners

  • Roswell Park Comprehensive Cancer Center
  • UB Jacobs School of Medicine
  • External virology validation collaborators
04
Disease Translation

Translational Applications: Infectious Disease, Oncology & Substance Use

Computational insights from the CANDO platform and the group's AI tools are applied to discover repurposed therapeutics and design novel chemical entities with desired biological behavior across multiple disease domains. In infectious disease, the group pioneered "shotgun drug repurposing"—proteome-wide screening of all approved drugs to identify candidates against emerging pathogens. Applications to COVID-19 produced repurposed therapeutic candidates early in the pandemic; analogous programs targeted influenza antiviral mechanisms and HIV-1 protease inhibition, demonstrating the speed and breadth of the CANDO approach.

In oncology, the group focuses on driver mutation-targeted drug combination strategies. For KRAS-mutant non-small cell lung cancer (NSCLC)—one of the most treatment-refractory cancer subtypes—multiscale CANDO analysis identified synergistic drug pairings validated through computational and experimental collaboration with Roswell Park. For glioma, optimal therapeutic candidates are systematically prioritized from approved and investigational compound libraries, with results published in the Journal of Cheminformatics (2026). Additional contributions span COPD proteomics-guided discovery, cancer-associated mutant p53 aggregation inhibition, and structure-based HIV-1 inhibitor design.

A third disease focus is substance use disorder. Aligned with NIDA-funded research, the group applies CANDO to design and validate optimal nonaddictive analgesics—performing analytics across all drugs and compounds to define proteomic objectives for candidate analgesics that are subsequently synthesized and preclinically validated. This work bridges the group's computational drug discovery expertise with its clinical informatics research on opioid prescribing and adverse outcomes, creating an integrated pipeline from molecular design to real-world therapeutic impact.

Methods & Tools

CANDO, CANDOCK, Drug Repurposing, Molecular Docking, Network Analysis, Drug Combination Synergy, Nonaddictive Analgesics, Substance Use Disorder, Proteomics, Bioinformatics, In Vitro Validation

Research Support

Active grants and awards supporting the Falls Informatics Group research program.

  • NCATS / CTSA: UB Clinical & Translational Science Award (CTSA), 2025–2031 ($29.2M)
  • NIDA: Polypharmacy Adverse Drug Reactions, Translational Bioinformatics (K01, PI), 2022–2027 ($1.05M)
  • NLM: BRIGHT Education Training Program, 2022–2027 ($2M)
  • Sinsheimer Foundation: Sinsheimer Scholar Award (PI), 2024–2026 ($60K)
  • NIST: High-Performance Computing Drug Discovery Initiative, 2022–2024 ($1M)

Interested in Collaboration or Joining?

We welcome collaborators from wet-lab biology, clinical medicine, and data science, as well as motivated graduate students, postdoctoral fellows, and undergraduates interested in computational drug discovery and biomedical informatics.

Contact Dr. Falls | View Publications