
Curated list of data serialization formats — API, ML, Agentic AI, Big Data, Configuration, and beyond

GitHub Stars: 55
Curated Resources: 85
Categories: 11
Last Refreshed: 1 hour ago
API, Agentic, Machine Learning, Big Data, Configuration, Security-Focused, Scientific, Graph, Workflow, Programming, Academic

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me agentic resources from awesome-serialization"

Installation instructions →

What's inside

Agentic

  • A2A

    Agent2Agent Protocol. Google's open protocol for agent-to-agent communication and interoperability. JSON-based. Textual.

  • BAML

    Boundary AI Markup Language. Domain-specific language for defining LLM function signatures with type-safe structured output. Textual.

  • Markdown

    Lightweight markup widely used as the native "language" of LLM input/output. Highly token-efficient vs HTML/XML. Textual.

  • MCP

    Model Context Protocol. Anthropic's open standard for connecting LLM agents to tools and data sources. Built on JSON-RPC. Textual.

  • TOON

    Token-Oriented Object Notation. Compact, schema-aware JSON alternative, reported to save 30–60% of tokens in LLM prompts relative to JSON. Textual.

  • YAML

    Indentation-based format often more token-efficient than JSON for LLM contexts due to lack of braces/quotes. Textual.
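
The token-efficiency claims for indentation-based formats can be illustrated with a rough size comparison. The sketch below is only an approximation: it counts characters rather than tokens, the `to_yaml_like` helper is a hypothetical renderer covering a tiny YAML-style subset (not a real YAML library), and the sample record is invented for illustration.

```python
import json

def to_yaml_like(mapping, indent=0):
    """Render a dict of scalars, lists of scalars, and nested dicts as
    YAML-style lines. Illustrative only: strings are emitted unquoted,
    which is not safe for arbitrary values in real YAML."""
    pad = "  " * indent
    lines = []
    for key, item in mapping.items():
        if isinstance(item, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml_like(item, indent + 1))
        elif isinstance(item, list):
            lines.append(f"{pad}{key}:")
            lines.extend(f"{pad}  - {entry}" for entry in item)
        else:
            lines.append(f"{pad}{key}: {item}")
    return lines

# Hypothetical tool-call record, made up for this comparison.
record = {
    "name": "search",
    "args": {"query": "zarr", "limit": 5},
    "tags": ["docs", "formats"],
}

# Compare against the most compact JSON form to keep the comparison fair.
as_json = json.dumps(record, separators=(",", ":"))
as_yaml_like = "\n".join(to_yaml_like(record))

print(as_yaml_like)
print("JSON chars:", len(as_json), "YAML-like chars:", len(as_yaml_like))
```

Character counts are only a proxy; actual savings depend on the tokenizer and on how deeply nested the data is.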

Workflow

  • Apache Airflow DAGs

    Workflows defined in Python as Directed Acyclic Graphs (DAGs).

  • common-workflow-language

    Specification for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments.

  • Cromwell

    Scientific workflow management, compatible with WDL and CWL.

  • Nextflow

    Scalable and reproducible scientific workflows.

  • Relational Algebra and Datalog for Graphs

    Coursera course on graph data manipulation.

  • WDL

    Workflow Description Language for genomics and scientific workflows.

Big Data

  • Arrow

    Cross-language columnar data format optimized for analytics workloads. Binary.

  • Avro

    Schema embedded with the data; rich, dynamic data structures. Textual/Binary.

  • Delta Lake

    Transactional storage layer for big data workflows. Binary.

  • FlatBuffers

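    Google's zero-copy, schema-based serialization library. Similar in spirit to Protocol Buffers, but data can be read without a separate parsing/unpacking step, which suits larger datasets. Binary.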

  • Iceberg

    Open table format for large datasets. Binary.

  • Ion

    Amazon's richly typed, self-describing superset of JSON. Row-oriented; the binary encoding supports skip-scan parsing. Textual/Binary.
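
The row-oriented versus columnar distinction that separates several of the formats above can be sketched in a few lines. This is a conceptual illustration using plain Python lists, not Arrow's actual memory layout (which relies on contiguous typed buffers); the `to_columns` helper and the sample data are hypothetical.

```python
def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists.
    Assumes every row has the same keys."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

# Row-oriented: each record keeps all of its fields together.
rows = [
    {"id": 1, "city": "Oslo", "temp_c": 4.0},
    {"id": 2, "city": "Lima", "temp_c": 6.0},
    {"id": 3, "city": "Pune", "temp_c": 8.0},
]

# Columnar: each field's values are stored together, so an aggregate
# over one field only has to touch that one list.
columns = to_columns(rows)
mean_temp = sum(columns["temp_c"]) / len(columns["temp_c"])
print(columns["city"], mean_temp)
```

An analytics query that reads one field out of many benefits from the columnar layout, while record-at-a-time workloads tend to favor the row layout.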

Scientific

  • ASDF

    Advanced Scientific Data Format for astronomy and beyond. Binary/Textual.

  • HDF5®

    n-dimensional datasets, complex objects, with schema. Efficient I/O. Binary.

  • NetCDF

    Self-describing, machine-independent data format for scientific data. Binary.

  • npy

    NumPy arrays with a small header describing dtype, shape, and memory order. Binary.

  • Zarr

    Scalable storage of n-dimensional arrays. Binary.
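
To show what "self-describing" means for a binary array format, here is a minimal sketch that writes and re-reads an npy-style header using only the standard library. It follows the version 1.0 layout as described in NumPy's format documentation (magic string, version bytes, little-endian header length, then a Python dict literal), but it is an illustration for one-dimensional float64 data, not a replacement for `numpy.save`. The function names are hypothetical.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"

def write_npy_v1(values):
    """Serialize a list of floats as a 1-D little-endian float64 array."""
    header = "{'descr': '<f8', 'fortran_order': False, 'shape': (%d,), }" % len(values)
    # Pad with spaces so that magic + version + length field + header
    # ends on a 64-byte boundary, terminated by a newline.
    unpadded = len(MAGIC) + 2 + 2 + len(header) + 1
    padding = (-unpadded) % 64
    header_bytes = (header + " " * padding + "\n").encode("latin1")
    body = struct.pack("<%dd" % len(values), *values)
    return MAGIC + bytes([1, 0]) + struct.pack("<H", len(header_bytes)) + header_bytes + body

def read_npy_header(blob):
    """Return (version, metadata dict, offset of the raw array data)."""
    if blob[:6] != MAGIC:
        raise ValueError("not an npy file")
    version = (blob[6], blob[7])
    (header_len,) = struct.unpack_from("<H", blob, 8)
    meta = ast.literal_eval(blob[10:10 + header_len].decode("latin1"))
    return version, meta, 10 + header_len

blob = write_npy_v1([1.0, 2.5, -3.0])
version, meta, offset = read_npy_header(blob)
data = struct.unpack_from("<%dd" % meta["shape"][0], blob, offset)
print(version, meta, data)
```

The reader learns the dtype and shape from the file itself, which is the property the Scientific entries above describe as self-describing or schema-carrying.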

API

  • AsyncAPI

    OpenAPI equivalent for event-driven and message-driven architectures. Textual.

  • bson

    Schemaless binary encoding of JSON-like documents. Binary.

  • Cap'n Proto

    High-performance, schema-based data interchange format. Binary.

  • CBOR

    Concise Binary Object Representation. Schema-free. Binary.

  • CloudEvents

    CNCF specification for describing event data in a common way. Textual/Binary.

  • Connect

    Modern RPC framework compatible with gRPC, with HTTP/1.1, JSON, and browser support. Binary/Textual.
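
To make the "Binary" versus "Textual" labels concrete, the following sketch hand-encodes a small value as CBOR and compares its size with compact JSON. It covers only a subset of RFC 8949 (integers, text strings, arrays, maps, booleans, null) and is meant as an illustration; the `cbor_encode` and `_head` names are hypothetical, and a real project would use an established CBOR library.

```python
import json

def _head(major, n):
    """Encode the initial byte(s) for a major type and unsigned argument."""
    if n < 24:
        return bytes([(major << 5) | n])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | info]) + n.to_bytes(size, "big")
    raise OverflowError("argument does not fit in 64 bits")

def cbor_encode(value):
    """Encode a small subset of Python values as CBOR bytes."""
    # Booleans are checked before ints because bool subclasses int.
    if value is False:
        return b"\xf4"
    if value is True:
        return b"\xf5"
    if value is None:
        return b"\xf6"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(cbor_encode(v) for v in value)
    if isinstance(value, dict):
        pairs = b"".join(cbor_encode(k) + cbor_encode(v) for k, v in value.items())
        return _head(5, len(value)) + pairs
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")

message = {"a": 1, "b": [2, 3]}
as_cbor = cbor_encode(message)
as_json = json.dumps(message, separators=(",", ":"))

print(as_cbor.hex())  # a26161016162820203, matching the RFC 8949 example
print("CBOR bytes:", len(as_cbor), "JSON chars:", len(as_json))
```

No schema is needed to decode the result: each item carries its own type and length in its leading byte, which is what "schema-free" means in the CBOR entry.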

Programming

Academic

Machine Learning

  • CoreML

    Apple's on-device ML model format. Binary.

  • GGUF

    Quantized model format for llama.cpp/ggml. The de facto standard for local LLM inference. Binary.

  • GraphDef

    TensorFlow graphs. Binary.

  • MLIR

    Intermediate representation for machine learning computations. Textual/Binary.

  • MLX format

    Apple's ML framework format, safetensors-based. Optimized for Apple Silicon. Binary.

  • ONNX

    Open Neural Network Exchange. Interoperability focused. Binary.

Showing a sample of 85 resources. View the full list on GitHub →