A curated repository of software engineering repository mining data sets

GitHub Stars: 483
Curated Resources: 72
Categories: 4
Last Refreshed: 20 hours ago
Repositories | Data Sets | Tools | Research Outlets

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me the research outlet resources from awesome-msr"

What's inside

Data Sets

  • AndroidTimeMachine

    Graph-based dataset of commit history of 8,431 real-world Android apps.

  • AndroZoo

    Collection of Android Applications.

  • Bug Prediction Dataset

    Collection of models and metrics from Eclipse JDT Core, PDE UI, Equinox Framework, Lucene, Mylyn, and their histories.

  • Code Reviews

    Code reviews of OpenStack, LibreOffice, AOSP, Qt, Eclipse.

  • CoREBench

    Collection of 70 realistically Complex Regression Errors that were systematically extracted from the repositories and bug reports of four open-source software projects: Make, Grep, Findutils, and Coreutils.

  • Cryptocurrency GitHub Activity and Market Cap Dataset

    Activity such as commits, stars, prices, and market cap of over 200 cryptocurrency projects on GitHub over time. Raw, historic data is also

Tools

  • astminer

    Library and tool for mining of path-based representations of code and other data derived from ASTs.

  • Boa

    Domain-specific language and infrastructure that eases mining software repositories.

  • buckwheat

    Multi-language tokenizer for extracting identifiers from source code.

  • ckjm

    Chidamber and Kemerer Java Metrics.

  • Coming

    A Java framework for analyzing code changes and mining instances of change patterns from Git repositories.

  • CryptOSS

    Mine GitHub activity and market cap data for cryptocurrency projects.

Resources

Showing a sample of the 72 resources. View the full list on GitHub.