
(Soon to be) community-curated list of software packages and data resources for deep learning for genomics (DL4G)

3 GitHub stars · 54 curated resources · 5 categories

Categories: Software packages · Models · Datasets and databases · Journal articles of general interest · Similar lists and collections

What's inside

Software packages

  • BedTools (Data wrangling)

    Swiss-army knife of tools for a wide range of genomics analysis tasks.

  • BioNumPy (Data wrangling)

    A Python library for easy and efficient representation and analysis of biological data. (2022)

  • BioPython (Data wrangling)

    Biopython is a set of freely available tools for biological computation written in Python by an international team of developers.

  • Captum (Interpretability)

    [PyTorch] - General library for model interpretability in PyTorch

  • DeepAccess (DL4G Packages)

    [TensorFlow] - Training and interpreting CNNs for predicting cell type-specific accessibility. (2021)

  • DeepChem (DL4G Packages)

    [PyTorch, TensorFlow, JAX] - Open-source toolchain that democratizes the use of deep learning in drug discovery, materials science, quantum chemistry, and biology. (2019)
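Many of the data-wrangling tasks above come down to comparing genomic intervals. As a rough illustration only, here is a naive pure-Python sketch of the overlap test behind `bedtools intersect`, assuming BED's 0-based, half-open coordinates. The `intersect` function and the example intervals are hypothetical; BedTools itself is a command-line tool with far more efficient algorithms and many more options.

```python
from typing import List, Tuple

# A BED-style interval: (chrom, start, end), 0-based and half-open [start, end).
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the shared portion of every overlapping pair from a and b.

    Hypothetical helper: an O(len(a) * len(b)) illustration of the overlap
    test, not how BedTools implements it.
    """
    hits = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            # Half-open intervals overlap iff the later start precedes the earlier end.
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:
                hits.append((chrom_a, start, end))
    return hits


peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
genes = [("chr1", 150, 550), ("chr2", 50, 80)]
print(intersect(peaks, genes))  # [('chr1', 150, 200), ('chr1', 500, 550)]
```

The `chr2` pair produces no hit: with half-open coordinates, intervals that merely touch (one ends at 50, the other starts at 50) share no bases.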

Models

  • DanQ (Hybrid)

    Trained on the same dataset as DeepSEA to predict binarized epigenomic tracks from ENCODE and Roadmap. Adds a bidirectional LSTM layer after the convolutions and experiments with initializing convolutional filter weights with motifs.

  • DeepBind (Convolutional)

    One of the seminal convolution-based architectures, trained to predict the binding of transcription factors and RNA-binding proteins.
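Both models start from one-hot encoded DNA scanned by convolutional filters. The sketch below is a toy, single-filter forward pass in pure Python (convolution, ReLU, global max pooling), loosely following the convolution, rectification, and pooling stages DeepBind describes; it is not the code of either model. The motif-based initialization in `motif_filter` (+w for the consensus base, -w/3 for the others) is a hypothetical scheme meant only to echo DanQ's idea of seeding filters with motifs, and the consensus string is arbitrary.

```python
from typing import List

BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode DNA as a len(seq) x 4 matrix; N (or any other symbol) becomes all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def motif_filter(consensus: str, weight: float = 1.0) -> List[List[float]]:
    """Hypothetical init: +weight for the consensus base, -weight/3 for the other three."""
    return [[weight if base == b else -weight / 3 for b in BASES] for base in consensus.upper()]


def scan(seq: str, filt: List[List[float]], bias: float = 0.0) -> float:
    """Slide one filter along the sequence, apply ReLU, then global max-pool."""
    x = one_hot(seq)
    width = len(filt)
    best = 0.0  # ReLU outputs are >= 0, so 0.0 is the floor (also for too-short inputs)
    for i in range(len(x) - width + 1):
        act = bias + sum(x[i + j][k] * filt[j][k] for j in range(width) for k in range(4))
        best = max(best, max(act, 0.0))
    return best


filt = motif_filter("TGACTCA")
print(scan("GGTGACTCAGG", filt))  # 7.0: one window matches the consensus exactly
print(scan("GGGGGGGGGGG", filt))  # 0.0: every window scores below zero before ReLU
```

A trained model learns many such filters and passes the pooled activations to further layers; in DanQ, the bidirectional LSTM sits between the convolutional stage and the output.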

Journal articles of general interest

Datasets and databases

  • RNAcompete (RNA binding)

    In vitro binding assay of 244 RNA-binding proteins. The dataset is downloaded as a single TSV file with RNA probes as rows and RNA-binding proteins (RBPs) as columns. Each entry in the table is an intensity measurement (normalized or raw) of the binding of one protein to one probe. The table has 244 RBP columns and 241,357 probe sequences spanning two probe sets (SetA and SetB).
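A minimal sketch of loading a table shaped like the one described (probes as rows, one intensity column per RBP, plus a column recording the probe set), using only the Python standard library. The column names (`Probe_Set`, `Probe_ID`, `RNA_Seq`, `RBP_A`, `RBP_B`) and the toy values are hypothetical stand-ins, not the real file's contents, and the sketch assumes missing measurements appear as `NaN` or empty cells. Grouping by probe set makes it easy to, for example, train on SetA and evaluate on SetB.

```python
import csv
import io
import math
from typing import Dict, List, Tuple

# Toy stand-in for the TSV; all column names and values are hypothetical.
TSV = """Probe_Set\tProbe_ID\tRNA_Seq\tRBP_A\tRBP_B
SetA\tp1\tUUUUUUGC\t5.2\t0.1
SetA\tp2\tGCAUGCAU\t0.3\tNaN
SetB\tp3\tUUUUAAGC\t4.8\t0.2
SetB\tp4\tACGACGAC\t0.1\t3.9
"""

META = {"Probe_Set", "Probe_ID", "RNA_Seq"}  # non-RBP columns (assumed)


def load_rbp(text: str, rbp: str) -> Dict[str, List[Tuple[str, float]]]:
    """Return {probe set: [(sequence, intensity), ...]} for one RBP column,
    skipping missing (NaN or empty) measurements."""
    out: Dict[str, List[Tuple[str, float]]] = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        raw = row[rbp].strip()
        if not raw:
            continue
        value = float(raw)  # float("NaN") parses to nan
        if math.isnan(value):
            continue
        out.setdefault(row["Probe_Set"], []).append((row["RNA_Seq"], value))
    return out


reader = csv.DictReader(io.StringIO(TSV), delimiter="\t")
rbps = [c for c in reader.fieldnames if c not in META]
print(rbps)  # ['RBP_A', 'RBP_B']

by_set = load_rbp(TSV, "RBP_B")
print(by_set)  # {'SetA': [('UUUUUUGC', 0.1)], 'SetB': [('UUUUAAGC', 0.2), ('ACGACGAC', 3.9)]}
```

Probe `p2` is dropped from the `RBP_B` result because its measurement is `NaN`; it would still appear when loading `RBP_A`.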

Showing a sample of the 54 resources. The full list is on GitHub.