awesome-msr
github.com/dspinellis/awesome-msr
A curated repository of software engineering repository mining data sets
What's inside
Research Outlets
- ACM Transactions on Software Engineering and Methodology (TOSEM)
- Empirical Software Engineering journal
- ESEC/FSE: ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
- ICSE: International Conference on Software Engineering
- IEEE Software magazine
- IEEE Transactions on Software Engineering
Data Sets
- AndroidTimeMachine
Graph-based dataset of commit history of 8,431 real-world Android apps.
- AndroZoo
Collection of Android applications.
- Bug Prediction Dataset
Collection of models and metrics from Eclipse JDT Core, PDE UI, Equinox Framework, Lucene, Mylyn, and their histories.
- Code Reviews
Code reviews of OpenStack, LibreOffice, AOSP, Qt, Eclipse.
- CoREBench
Collection of 70 realistically Complex Regression Errors that were systematically extracted from the repositories and bug reports of four open-source software projects: Make, Grep, Findutils, and Coreutils.
- Cryptocurrency GitHub Activity and Market Cap Dataset
Activity such as commits, stars, prices, and market cap of over 200 cryptocurrency projects on GitHub over time. Raw, historic data is also
Tools
- astminer
Library and tool for mining of path-based representations of code and other data derived from ASTs.
- Boa
Domain-specific language and infrastructure that eases mining software repositories.
- buckwheat
Multi-language tokenizer for extracting identifiers from source code.
- ckjm
Chidamber and Kemerer Java Metrics.
- Coming
A Java framework for analyzing code changes and mining instances of change patterns from Git repositories.
- CryptOSS
Mine GitHub activity and market cap data for cryptocurrency projects.
Resources
Repositories
- Directory of MSR Datasets
- Empirical Software Engineering
- ESEUR
- FLOSSmole
Collaborative collection and analysis of free/libre/open source project data.
- Mining Software Repositories
- PROMISE
About 20 datasets related to software engineering research.
Showing a sample of 72 resources. The full list is at github.com/dspinellis/awesome-msr.