awesome-oss-research-data
github.com/sboysel/awesome-oss-research-data
A (curated) list of empirical research and datasets in the space of Open Source Software
What's inside
Surveys
Community discourse
- Apache Mail Archives
Mailing list archives for Apache projects
- GNU Mail Archives
Mailing lists used by various GNU projects
- Linux Kernel Mailing List
The Linux kernel mailing list.
- Mailing list ARChives
- Python Mailing Lists
Mailing lists used by various Python projects
- StackExchange
Public Q&A data across the StackExchange network, available through Stack Overflow's Data Explorer and the latest data dump hosted by the Internet Archive. Older vintages of the dump can also be tracked down.
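Many of the mailing list archives above can be exported as mbox files, although formats and export options vary by archive and should be checked for each one. The following minimal sketch assumes a locally downloaded mbox file. It uses Python's standard `mailbox` module to count messages per sender address and per calendar month. The function name `summarize_mbox` is hypothetical and not part of any archive's tooling.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr, parsedate_to_datetime


def summarize_mbox(path: str) -> tuple[Counter, Counter]:
    """Count messages per sender address and per YYYY-MM month.

    Hypothetical helper: `path` is assumed to be a local mbox export.
    """
    senders: Counter = Counter()
    months: Counter = Counter()
    box = mailbox.mbox(path, create=False)  # raises if the file is missing
    try:
        for message in box:
            # parseaddr splits "Name <addr>" into ("Name", "addr")
            _, address = parseaddr(message.get("From", ""))
            if address:
                senders[address.lower()] += 1
            date_header = message.get("Date")
            if date_header:
                try:
                    sent = parsedate_to_datetime(date_header)
                except (TypeError, ValueError):
                    continue  # skip malformed Date headers
                months[sent.strftime("%Y-%m")] += 1
    finally:
        box.close()
    return senders, months
```

Lowercasing addresses merges trivial variants of the same sender. Real identity resolution across aliases is a harder problem and is not attempted here.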
General web archives
- Archive Team
A group of volunteers that archives web pages and other content. Data is available via the Wayback Machine or its API
- Common Crawl
Raw page data, metadata, and extracted text from publicly accessible segments of the internet. Timeframe: 2008 to present, monthly since March 2014. Data is hosted on Amazon S3 (see the getting started docs).
- Internet Archive
Less systematic crawls with a longer history. Access via the Wayback Machine or its API
- Wikidata
A free and open knowledge base that can be read and edited by both humans and machines. Data is available via the Wikidata Query Service or its API
- Wikimedia Commons
A free media repository. Data is available via the Commons API.
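The Archive Team and Internet Archive entries both point to the Wayback Machine API. The following minimal sketch covers one of its endpoints: the availability lookup at `https://archive.org/wayback/available`. Given a URL and an optional timestamp prefix such as `20060101`, this endpoint returns the archived snapshot closest to that time. The helper names are hypothetical. The network call is kept separate, so you can check URL building and response parsing offline.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(target: str, timestamp: Optional[str] = None) -> str:
    """Build an availability query; `timestamp` is a YYYYMMDDhhmmss prefix."""
    params = {"url": target}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload: dict) -> Optional[dict]:
    """Return the 'closest' snapshot record, or None if nothing is archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest
    return None


def fetch_closest(target: str, timestamp: Optional[str] = None,
                  timeout: float = 30.0) -> Optional[dict]:
    # Performs the HTTP request; the returned dict has 'url' and 'timestamp' keys.
    with urlopen(availability_url(target, timestamp), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

For bulk collection, the Wayback Machine's CDX server is usually a better fit than repeated availability lookups. This sketch sticks to the simpler endpoint.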
Valuation
Other Resources
Bounty platforms
- boss.dev
A platform for funding open source projects.
- Bountysource
A platform for funding open source projects.
- IssueHunt
A platform for funding open source projects.
Development activity
- Census III of Free and Open Source Software
Report and open data
- Census II of Free and Open Source Software
Survey of OSS library production usage at the application library level. Report, data appendix, and data collection scripts.
- Ecosyste.ms
Tools and open datasets to support, sustain, and secure critical digital infrastructure
- GH Archive
Records GitHub's public timeline of activity
- GHTorrent
Offline mirror of historical data offered by GitHub's REST API
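GH Archive publishes GitHub's public timeline as hourly gzip-compressed files of newline-delimited JSON events. The files live at URLs of the form `https://data.gharchive.org/YYYY-MM-DD-H.json.gz`, where the hour is not zero-padded. The following minimal sketch builds one such URL and tallies event types from the downloaded bytes. The function names are hypothetical, and only the `type` field of each event is read.

```python
import gzip
import json
from collections import Counter
from datetime import datetime
from urllib.request import urlopen


def gharchive_url(hour: datetime) -> str:
    # The hour is not zero-padded, e.g. .../2015-01-01-3.json.gz
    return f"https://data.gharchive.org/{hour:%Y-%m-%d}-{hour.hour}.json.gz"


def count_event_types(gz_bytes: bytes) -> Counter:
    """Tally the `type` field of every event in one gzipped hourly file."""
    counts: Counter = Counter()
    for line in gzip.decompress(gz_bytes).splitlines():
        if line.strip():
            event = json.loads(line)
            counts[event.get("type", "unknown")] += 1
    return counts


def fetch_hour(hour: datetime, timeout: float = 60.0) -> Counter:
    # Loads the whole hour into memory; stream to disk for bulk jobs.
    with urlopen(gharchive_url(hour), timeout=timeout) as resp:
        return count_event_types(resp.read())
```

For multi-year studies, iterate `fetch_hour` over a range of hours, or use one of GH Archive's hosted query options instead of downloading every file.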
Community and project health
- CHAOSS
Linux Foundation project to establish OSS community health metrics. Includes metric definitions.
- GitWhois
High-level glance into GitHub repositories
- isitmaintained.com
Quick status checks for public GitHub repositories (e.g. median issue resolution time, percentage of open issues).
- OpenSSF Best Practices Badge Program
Listing of projects, high-level project statistics, and high-level criteria statistics.
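isitmaintained.com reports project health metrics such as median issue resolution time and the percentage of open issues. The following minimal sketch shows how those two numbers can be computed from a list of issue records. The `Issue` record type is hypothetical, for example filled from an issue export. The site's own method may differ in details such as how it treats pull requests.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Iterable, Optional


@dataclass
class Issue:
    # Hypothetical record; populate it from whatever issue export you use.
    created_at: datetime
    closed_at: Optional[datetime] = None  # None while the issue is still open


def median_resolution_time(issues: Iterable[Issue]) -> Optional[timedelta]:
    """Median time from creation to close, over closed issues only."""
    durations = [i.closed_at - i.created_at for i in issues if i.closed_at is not None]
    return median(durations) if durations else None


def percent_open(issues: Iterable[Issue]) -> float:
    """Share of issues that are still open, as a percentage."""
    issues = list(issues)
    if not issues:
        return 0.0
    open_count = sum(1 for i in issues if i.closed_at is None)
    return 100.0 * open_count / len(issues)
```

The median is less sensitive than the mean to a few long-lived issues. That makes it a more robust single number for "how quickly are issues resolved".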
Showing a sample of 103 resources; the full list is on GitHub.