awesome-machine-learning-on-source-code
github.com/ermuur/awesome-machine-learning-on-source-code
Cool links & research papers related to Machine Learning applied to source code
What's inside
Software
- 150k JavaScript Dataset
Dataset consisting of 150,000 JavaScript files and their parsed ASTs.
- 150k Python Dataset
Dataset consisting of 150,000 Python ASTs.
- 452M commits on GitHub
≈ 452M commits' metadata from 16M repositories on GitHub (October 2016).
- apollo
Source code deduplication at scale, research.
- bblfsh
Self-hosted server for source code parsing.
- card2code
This dataset contains the language to code datasets described in the paper
Conferences
- 2018 IEEE 25th International Conference on Software Analysis, Evolution, and Reengineering (SANER)
- ACM International Conference on Software Engineering, ICSE
- ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (FSE)
- AIFORSE
- CamAIML
- Learning to Code: Machine Learning for Program Induction
Alexander Gaunt.
Papers
- A Benchmark Study on Sentiment Analysis for Software Engineering Research
Nicole Novielli, Daniela Girardi, Filippo Lanubile, MSR 2018.
- Abstract Syntax Networks for Code Generation and Semantic Parsing
Maxim Rabinovich, Mitchell Stern, Dan Klein, ACL 2017.
- A Convolutional Attention Network for Extreme Summarization of Source Code
Miltiadis Allamanis, Hao Peng, Charles Sutton, ICML 2016.
- Adaptive Neural Compilation
Rudy Bunel, Alban Desmaison, Pushmeet Kohli, Philip H.S. Torr, M. Pawan Kumar, NIPS 2016.
- A deep language model for software code
Hoa Khanh Dam, Truyen Tran, Trang Pham, 2016.
- A Deep Learning Approach to Program Similarity
Niccolò Marastoni, Roberto Giacobazzi and Mila Dalla Preda, MASES 2018.
Posts
Digests
- A Survey of Machine Learning for Big Code and Naturalness
Survey and literature review on Machine Learning on Source Code.
- Learning from "Big Code"
Techniques, challenges, tools, datasets on "Big Code".
Competitions
- CodRep
Competition on automatic program repair: given a source line, find the insertion point.
Showing a sample of 257 resources. View the full list on GitHub →