A continuously updated list of resources on generative LLMs such as GPT, their analysis, and their detection.

233 GitHub Stars
642 Curated Resources
3 Categories
Last refreshed 4 hours ago
Categories: Large Scale Pre-training for Language Generation, Analysis, Detection

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me datasets resources from awesome-machine-generated-text"

What's inside

Detection

  • Alpaca (Datasets)

    52K instruction-following data generated by

  • ArguGPT (Datasets)

    4,115 human-written essays and 4,038 machine-generated essays produced by 7 GPT models

  • BUST (Datasets)

    25K texts from humans and 7 LLMs responding to instructions across 10 tasks from 3 diverse sources.

  • ChatGPT Generated Text Detection Corpus (Datasets)

    126 human essays and 126 non-human essays

  • CHEAT (Datasets)

    35,304 synthetic abstracts, with Generation, Polish, and Fusion as prominent representatives.

  • Cleaned Alpaca (Datasets)

    A cleaned version of the original Alpaca Dataset

Large Scale Pre-training for Language Generation

Analysis

Showing a sample of 642 resources. View the full list on GitHub →