Cool links & research papers related to Machine Learning applied to source code (MLonCode)

What's inside

Software

  • 150k JavaScript Dataset

    Dataset consisting of 150,000 JavaScript files and their parsed ASTs.

  • 150k Python Dataset

    Dataset consisting of 150,000 Python ASTs.

  • 452M commits on GitHub

    Metadata for ≈452M commits from 16M GitHub repositories (as of October 2016).

  • apollo

    Research project on source code deduplication at scale.

  • bblfsh

    Self-hosted server for source code parsing.

  • card2code

    This dataset contains the language-to-code datasets described in the paper

Papers

Digests

Competitions

  • CodRep

    Competition on automatic program repair: given a source line, find its insertion point in the file.

Showing a sample of the 207 resources. The full list is available on GitHub.