🧫 A curated list of resources relevant to doing Biomedical Information Extraction (including BioNLP)

447 GitHub stars · 114 curated resources · 11 categories

Research Overviews · Groups Active in the Field · Organizations · Journals and Events · Tutorials · Code Libraries · Tools, Platforms, and Services · Techniques and Models · Datasets · Ontologies and Controlled Vocabularies · Data Models

What's inside

Journals and Events

  • ACM-BCB · Conferences and Other Events

    The ACM Conference on Bioinformatics, Computational Biology, and Health Informatics. Held annually since 2010.

  • BIBM · Conferences and Other Events

    The IEEE International Conference on Bioinformatics and Biomedicine.

  • BioASQ · Challenges

    Challenges on biomedical semantic indexing and question answering. Challenges and workshops held annually since 2013.

  • BioCreAtIvE workshop · Challenges

    These workshops have been organized since 2004, with BioCreative VI happening February 2017 and the

  • Database · Journals

    Its subtitle is "The Journal of Biological Databases and Curation". Open access.

  • eHealth-KD · Challenges

    Challenges for encouraging "development of software technologies to automatically extract a large variety of knowledge from eHealth documents written in the Spanish Language". Previously held as part of

Datasets

  • AIMed · Protein-protein Interaction Annotated Corpora

    225 MEDLINE abstracts annotated for PPI.

  • BioC-BioGRID · Protein-protein Interaction Annotated Corpora

    120 full text articles annotated for PPI and genetic interactions. Used in the BioCreative V BioC task.

  • BioCreAtIvE 2 · Annotated Text Data

    15,000 sentences (10,000 training and 5,000 test, different from the first corpus) annotated for protein and gene names. 542 abstracts linked to EntrezGene identifiers. A variety of research articles annotated for features of protein–protein interactions.

  • BioCreAtIvE V CDR Task Corpus (BC5CDR) · Annotated Text Data

    1,500 articles (title and abstract) published in 2014 or later, annotated for 4,409 chemicals, 5,818 diseases, and 3,116 chemical–disease interactions. Requires registration.

  • BioCreative VI CHEMPROT Corpus · Annotated Text Data

    >2,400 articles annotated with chemical-protein interactions of a variety of relation types. Requires registration.

  • BioInfer · Protein-protein Interaction Annotated Corpora

    1,100 sentences from biomedical research abstracts annotated for relationships (including PPI), named entities, and syntactic dependencies.

Techniques and Models

  • Alsentzer et al. Clinical BERT · BERT models

  • BioASQ word2vec · Text Embeddings

    Word embeddings derived from biomedical text (>10 million PubMed abstracts) using the popular

  • BioBERT · BERT models

    A PubMed and PubMed Central-trained version of the

  • BioGPT · GPT-2 models

    A GPT-2 model pre-trained on 15 million PubMed abstracts, along with fine-tuned versions for several biomedical tasks.

  • BioWordVec · Text Embeddings

    Word embeddings derived from biomedical text (>27 million PubMed titles and abstracts), including a subword embedding model based on MeSH.

  • BlueBERT · BERT models

    A BERT model pre-trained on PubMed text and MIMIC-III notes.

Organizations

  • AMIA

    Many—but certainly not all—individuals studying biomedical informatics are members of the American Medical Informatics Association. AMIA publishes a journal, JAMIA (see below).

  • IMIA

    The International Medical Informatics Association. Publishes the IMIA Yearbook of Medical Informatics.

Tools, Platforms, and Services

  • Anafora · Annotation Tools

    An annotation tool with adjudication and progress tracking features.

  • brat · Annotation Tools

    The brat rapid annotation tool. Supports producing text annotations visually, through the browser. Not subject specific; appropriate for many annotation projects. Visualization is based on that of the

  • CLAMP

    A natural language processing toolkit intended for use with the text in clinical reports. Check out their

  • cTAKES

    A system for processing the text in electronic medical records. Widely used and open source.

  • DeepPhe

    A system for processing documents describing cancer presentations. Based on cTAKES (see above).

  • DNorm

    A method for disease normalization, i.e., linking mentions of disease names and acronyms to unique concept identifiers. Downloadable version includes the NCBI Disease Corpus and BC5CDR (see Annotated Text Data below).

Research Overviews

Data Models

  • Biolink

    A data model of biological entities. Provided as a

  • BioUML

    An architecture for biomedical data analysis, integration, and visualization. Conceptually based on the visual modeling language

  • OMOP Common Data Model

    A standard for observational healthcare data.

  • unmiri-ngs-fhir-schema

    Apache-2.0 JSON Schema (Draft 2020-12) API contract for cross-vendor somatic NGS interpretation output (Foundation Medicine, Tempus, Caris, Guardant), aligned with the HL7 FHIR Genomics IG. A standards-aligned target representation for biomedical information-extraction pipelines that parse oncology lab reports.

Tutorials

  • Biomedical Literature Mining · Pre-LLM Guides, Lectures, and Courses

    A (non-free) volume of Methods in Molecular Biology from 2014. Its chapters cover introductory principles of text mining, applications in the biological sciences, and potential uses in clinical or medical safety scenarios.

  • Coursera - Foundations of mining non-structured medical data · Pre-LLM Guides, Lectures, and Courses

    About three hours' worth of video lectures on working with medical data of various types and structures, including text and image data. Appears fairly high-level and intended for beginners.

  • Getting Started in Text Mining · Pre-LLM Guides, Lectures, and Courses

    A brief introduction to bio-text mining from Cohen and Hunter. More than ten years old but still quite relevant. See also an

  • JensenLab text mining exercises · Pre-LLM Guides, Lectures, and Courses

  • VIB text mining and curation training · Pre-LLM Guides, Lectures, and Courses

    This training workshop took place in 2013, but the slides are still online.

Showing a sample of the 114 resources. The full list is available on GitHub.