TOON Format Specification

Complete syntax and structure guide for Token-Oriented Object Notation

Published: January 2025

TOON (Token-Oriented Object Notation) is a structured data format optimized for Large Language Model applications. This specification defines the complete syntax, data types, and structural rules for creating valid TOON documents. For an introduction to TOON, see "What is TOON Format?".

This document serves as the definitive technical reference for TOON format. Developers implementing TOON parsers, validators, or converters should use this specification. Validate your TOON syntax with our TOON Validator tool.

Design Principles

TOON format was designed with four core principles:

1. Token Efficiency

Minimize token count for LLM applications by eliminating redundant syntax. Property names are declared once per array rather than repeated for each object, achieving approximately 50% token reduction compared to JSON for arrays of uniform objects. Savings on other structures will vary.

2. Human Readability

Maintain clear, readable structure that developers can understand without specialized tools. TOON documents are plain text and can be edited in any text editor.

3. Explicit Structure

Use explicit length markers and field definitions to help LLMs understand data structure without counting. Arrays declare their length up front, as in items[5].

4. Lossless Conversion

Support bidirectional conversion with JSON without data loss. Any valid JSON can be converted to TOON and back to JSON while preserving all data and structure.

Basic Syntax Rules

Key-Value Pairs

Simple key-value pairs use colon syntax, similar to YAML:

name: "Sarah Mitchell"
age: 32
email: "[email protected]"
active: true

Indentation

TOON uses indentation to represent hierarchy. Two spaces per level is recommended:

user:
  name: "Alex Johnson"
  address:
    street: "123 Main St"
    city: "Boston"
    zip: "02101"

Comments

Comments start with # and continue to end of line:

# Customer data
name: "John Doe"  # Customer's full name
age: 45           # Age in years
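A parser has to avoid treating a # inside a quoted string as the start of a comment. The Python sketch below shows one way to strip comments from a single line. The function name strip_comment is hypothetical, and the sketch assumes that # only needs protecting inside double-quoted strings, which this specification does not state explicitly.

```python
def strip_comment(line):
    """Remove a trailing # comment, ignoring # inside double-quoted strings."""
    in_string = False
    escaped = False
    for index, char in enumerate(line):
        if escaped:
            # The previous character was a backslash inside a string.
            escaped = False
        elif char == "\\" and in_string:
            escaped = True
        elif char == '"':
            in_string = not in_string
        elif char == "#" and not in_string:
            return line[:index].rstrip()
    return line.rstrip()


print(strip_comment('name: "John Doe"  # full name'))
```

A line that consists only of a comment comes back as an empty string, so a caller can skip it.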

Data Types

TOON supports six fundamental data types:

1. String

Text values enclosed in double quotes. Supports escape sequences for special characters.

name: "Sarah Mitchell"
message: "Hello, World!"
path: "C:\\Users\\Documents"  # Escaped backslash
quote: "She said \"Hello\""     # Escaped quotes

2. Number

Integer or floating-point numbers without quotes. Supports scientific notation.

count: 42
price: 99.99
negative: -15
scientific: 1.5e10
percentage: 0.485

3. Boolean

Logical values: true or false (lowercase only).

active: true
verified: false
deleted: false

4. Null

Represents the absence of a value using the null keyword.

middleName: null
optional: null

5. Array

Ordered collections with explicit length markers. See detailed array syntax below.

tags[3]: "javascript", "frontend", "react"

6. Object

Nested structures using indentation to represent hierarchy.

address:
  street: "123 Main St"
  city: "Boston"
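The four scalar types above (string, number, boolean, null) can be told apart from the token text alone. The following Python sketch is one possible classifier, not a reference implementation. The name parse_scalar is hypothetical, and two assumptions are made: TOON string escapes are decoded with the JSON rules, and bare text that matches no other type (as in tabular rows) is kept as a string.

```python
import json
import re

# Integer or decimal with an optional exponent, as in the Number examples.
NUMBER = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")


def parse_scalar(token):
    """Convert one scalar token into a Python value (illustrative only)."""
    token = token.strip()
    keywords = {"true": True, "false": False, "null": None}
    if token in keywords:
        return keywords[token]
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        # Assumption: escape sequences follow JSON, so json.loads can decode them.
        return json.loads(token)
    match = NUMBER.fullmatch(token)
    if match:
        is_float = match.group(1) or match.group(2)
        return float(token) if is_float else int(token)
    # Assumption: unquoted text is treated as a plain string.
    return token


print(parse_scalar("1.5e10"), parse_scalar("true"), parse_scalar('"Boston"'))
```

Because the keywords are matched exactly, a capitalized True is not read as a boolean, which agrees with the lowercase-only rule.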

Array Notation

Arrays are the most distinctive feature of TOON format. The array notation is designed for maximum token efficiency.

Simple Arrays

Simple arrays contain primitive values (strings, numbers, booleans):

# Array declaration: name[length]: value1, value2, value3
tags[3]: "javascript", "react", "nodejs"
scores[5]: 95, 87, 92, 88, 91
flags[2]: true, false

Syntax: arrayName[length]: value1, value2, ...

The length marker [3] explicitly states the array has 3 elements, helping LLMs understand structure without counting.
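To make the role of the length marker concrete, here is a minimal Python sketch that parses one simple-array line and rejects it when the declared length disagrees with the number of values. The names parse_simple_array and split_values are hypothetical. The sketch assumes the values are quoted strings, numbers, booleans, or null, all of which happen to be valid JSON literals, so json.loads is used to decode them.

```python
import json
import re

# name[length]: followed by the comma-separated values.
HEADER = re.compile(r"^(\w+)\[(\d+)\]:\s*(.*)$")


def split_values(body):
    """Split on commas that are outside double-quoted strings."""
    parts, current = [], []
    in_string = escaped = False
    for char in body:
        if escaped:
            escaped = False
        elif char == "\\" and in_string:
            escaped = True
        elif char == '"':
            in_string = not in_string
        elif char == "," and not in_string:
            parts.append("".join(current).strip())
            current = []
            continue
        current.append(char)
    parts.append("".join(current).strip())
    return parts


def parse_simple_array(line):
    """Return (name, values) or raise ValueError on a malformed line."""
    match = HEADER.match(line.strip())
    if not match:
        raise ValueError("not a simple array line")
    name, length, body = match.group(1), int(match.group(2)), match.group(3)
    tokens = split_values(body) if body else []
    if len(tokens) != length:
        raise ValueError(f"{name}: declared {length} values, found {len(tokens)}")
    # Assumption: each token is a JSON-compatible literal.
    return name, [json.loads(token) for token in tokens]


print(parse_simple_array('tags[3]: "javascript", "react", "nodejs"'))
```

A trailing comma produces an extra empty token, so it is caught by the same length check.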

Structured Arrays (Tabular Data)

Structured arrays contain objects with consistent properties - the most token-efficient format:

# Array with field definitions
customers[3]{id,name,email,active}:
  1,Sarah Mitchell,[email protected],true
  2,Michael Chen,[email protected],true
  3,Jennifer Kumar,[email protected],false

Syntax: arrayName[length]{field1,field2,field3}: data rows

  • [3] - Length marker (3 objects)
  • {id,name,email,active} - Field definition (declared once)
  • Each line represents one object with values in field order
  • Values separated by commas (default delimiter)
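The structured-array syntax above can be produced from a list of uniform records with very little code. The Python sketch below is an illustration under stated assumptions: the function name to_toon_table is hypothetical, every record must have the same keys in the same order, and string cells are written unquoted as in the example rows, so cells containing a comma or newline are rejected. Character count is used only as a rough stand-in for token count.

```python
import json


def to_toon_table(name, rows):
    """Encode a non-empty list of uniform dicts as a TOON structured array."""
    if not rows:
        raise ValueError("this sketch does not handle empty arrays")
    fields = list(rows[0])
    for row in rows:
        if list(row) != fields:
            raise ValueError("all rows must share the same fields in the same order")

    def cell(value):
        if value is None:
            return "null"
        if isinstance(value, bool):  # checked before numbers: bool is an int subclass
            return "true" if value else "false"
        text = str(value)
        if "," in text or "\n" in text:
            raise ValueError("cell contains the delimiter; pick another delimiter")
        return text

    header = name + "[" + str(len(rows)) + "]{" + ",".join(fields) + "}:"
    body = ["  " + ",".join(cell(row[field]) for field in fields) for row in rows]
    return "\n".join([header] + body)


users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
toon = to_toon_table("users", users)
print(toon)
print(len(toon), "characters vs", len(json.dumps({"users": users})), "for JSON")
```

The field names appear once in the header instead of once per row, which is where the size difference comes from.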

Custom Delimiters

When data contains commas, use alternative delimiters like pipes or tabs:

# Using pipe delimiter for data containing commas
addresses[2]{street,city,country}|:
  123 Main St, Suite 100|Boston|USA
  456 Oak Ave, Apt 5B|Seattle|USA

Specify the delimiter after the field definition: {fields}|:
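A reader of structured arrays therefore has to take the delimiter from the header before splitting any row. The Python sketch below shows one way to do that. The name parse_table is hypothetical, the delimiter is assumed to be a single character placed between the closing brace and the colon, and cell values are returned as plain strings without type conversion.

```python
import re

# name[length]{fields}, an optional one-character delimiter, then a colon.
HEADER = re.compile(r"^(\w+)\[(\d+)\]\{([^}]*)\}([^:]?):$")


def parse_table(text):
    """Return (name, rows) where each row is a dict of field -> string."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty input")
    match = HEADER.match(lines[0].strip())
    if not match:
        raise ValueError("not a structured array header")
    name, length = match.group(1), int(match.group(2))
    fields = match.group(3).split(",")
    delimiter = match.group(4) or ","  # comma is the default delimiter
    rows = []
    for line in lines[1:]:
        # Strip only spaces so that a tab delimiter at a row edge survives.
        cells = [cell.strip() for cell in line.strip(" ").split(delimiter)]
        if len(cells) != len(fields):
            raise ValueError(f"expected {len(fields)} values, found {len(cells)}")
        rows.append(dict(zip(fields, cells)))
    if len(rows) != length:
        raise ValueError(f"declared {length} rows, found {len(rows)}")
    return name, rows


sample = (
    "addresses[2]{street,city,country}|:\n"
    "  123 Main St, Suite 100|Boston|USA\n"
    "  456 Oak Ave, Apt 5B|Seattle|USA"
)
print(parse_table(sample))
```

With the pipe delimiter in effect, the commas inside the street values are kept as ordinary text.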

Nested Objects and Complex Structures

TOON supports arbitrary nesting depth using indentation:

Nested Objects

user:
  id: 101
  name: "Alex Johnson"
  profile:
    avatar: "https://example.com/avatar.jpg"
    bio: "Software Engineer"
    social:
      twitter: "@alexj"
      github: "alexjohnson"
  settings:
    theme: "dark"
    notifications: true
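Indentation-based nesting like the example above is commonly parsed with a stack of open containers. The Python sketch below illustrates that approach for nested objects only (no arrays, no comments). The name parse_objects is hypothetical, two-space indentation is assumed, and scalar values are assumed to be JSON-compatible literals so that json.loads can decode them.

```python
import json


def parse_objects(text, indent=2):
    """Parse nested key/value lines into dicts using indentation depth."""
    root = {}
    stack = [(-1, root)]  # (depth of the key that opened it, container)
    for raw in text.splitlines():
        if not raw.strip():
            continue
        spaces = len(raw) - len(raw.lstrip(" "))
        if spaces % indent:
            raise ValueError(f"indentation is not a multiple of {indent}: {raw!r}")
        depth = spaces // indent
        key, _, value = raw.strip().partition(":")
        # Close every container that is at this depth or deeper.
        while stack[-1][0] >= depth:
            stack.pop()
        parent = stack[-1][1]
        value = value.strip()
        if value == "":
            # "key:" with nothing after it opens a nested object.
            child = {}
            parent[key] = child
            stack.append((depth, child))
        else:
            parent[key] = json.loads(value)
    return root


sample = 'user:\n  id: 101\n  settings:\n    theme: "dark"\n    notifications: true'
print(parse_objects(sample))
```

Splitting at the first colon only means that values such as URLs, which contain further colons, stay intact.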

Arrays of Objects with Nested Properties

orders[2]:
  - id: 1001
    customer: "Sarah Mitchell"
    items[2]:
      - product: "Laptop"
        price: 1299
      - product: "Mouse"
        price: 29
    total: 1328
  - id: 1002
    customer: "Michael Chen"
    items[1]:
      - product: "Monitor"
        price: 449
    total: 449
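The list-item layout above can be generated recursively. The Python sketch below is an illustration rather than a reference encoder: the name encode_object is hypothetical, every list is assumed to contain only non-empty dicts, and scalars are written as JSON literals (quoted strings, numbers, true, false, null).

```python
import json


def encode_object(obj, depth=0):
    """Return TOON lines for a dict, using "- " items for lists of dicts."""
    pad = "  " * depth
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(encode_object(value, depth + 1))
        elif isinstance(value, list):
            lines.append(f"{pad}{key}[{len(value)}]:")
            for item in value:
                # Assumption: every list item is a non-empty dict.
                item_lines = encode_object(item, depth + 2)
                # The first property shares its line with the "- " marker.
                item_lines[0] = "  " * (depth + 1) + "- " + item_lines[0].lstrip(" ")
                lines.extend(item_lines)
        else:
            lines.append(f"{pad}{key}: {json.dumps(value)}")
    return lines


order = {
    "orders": [
        {
            "id": 1002,
            "customer": "Michael Chen",
            "items": [{"product": "Monitor", "price": 449}],
            "total": 449,
        }
    ]
}
print("\n".join(encode_object(order)))
```

Each list item is encoded two levels deeper than its array header, and the dash marker takes the place of the last two spaces of indentation on the item's first line.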

Mixed Data Types

response:
  success: true
  data:
    users[2]{id,name}:
      1,Alice
      2,Bob
  metadata:
    timestamp: "2025-01-15T10:30:00Z"
    version: 2
  errors: null

Validation Rules

Valid TOON documents must adhere to these rules:

1. Length Accuracy

Array length markers must match actual element count. items[3] must contain exactly 3 elements.

2. Field Count Consistency

In structured arrays, each row must have the same number of values as fields defined in the header. {id,name,email} requires 3 values per row.

3. Consistent Indentation

Use consistent indentation throughout the document. Mixing spaces and tabs is not allowed. Recommended: 2 spaces per indentation level.

4. Proper String Escaping

Strings containing special characters must use proper escape sequences: \" for quotes, \\ for backslash, \n for newline.

5. Type Consistency

Boolean values must be lowercase true or false. Null must be lowercase null.

6. No Trailing Delimiters

Array rows and value lists must not have trailing delimiters. 1,2,3 is valid; 1,2,3, is invalid.
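Several of these rules are mechanical enough to check in a few lines. The Python sketch below covers rules 1, 2, and 6 for a single comma-delimited structured array and collects messages instead of stopping at the first problem. It is not the TOON Validator; the function name validate_table and the message wording are hypothetical.

```python
import re

# name[length]{fields}: with the default comma delimiter only.
HEADER = re.compile(r"^(\w+)\[(\d+)\]\{([^}]*)\}:$")


def validate_table(text):
    """Return a list of rule violations; an empty list means no issue found."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return ["document is empty"]
    match = HEADER.match(lines[0].strip())
    if not match:
        return ["header does not match name[length]{fields}:"]
    declared = int(match.group(2))
    fields = match.group(3).split(",")
    rows = lines[1:]
    errors = []
    if len(rows) != declared:
        errors.append(f"rule 1: declared {declared} rows, found {len(rows)}")
    for number, row in enumerate(rows, start=1):
        cells = row.strip().split(",")
        if row.rstrip().endswith(","):
            errors.append(f"rule 6: row {number} has a trailing delimiter")
        elif len(cells) != len(fields):
            errors.append(
                f"rule 2: row {number} has {len(cells)} values, expected {len(fields)}"
            )
    return errors


print(validate_table("users[3]{id,name}:\n  1,Alice\n  2,Bob,"))
```

Returning a list lets a caller report every violation in one pass, similar in spirit to the detailed error messages described under Validation Tools.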

Validation Tools

Use our TOON Validator to check your TOON documents against these rules. The validator provides detailed error messages for any violations.

TOON vs JSON Syntax Comparison

Understanding how TOON syntax differs from JSON helps developers transition between formats:

Feature           | JSON            | TOON
String values     | "value"         | "value"
Key-value pairs   | "key": "value"  | key: "value"
Object braces     | Required {}     | Indentation-based
Array notation    | [val1, val2]    | arr[2]: val1, val2
Array of objects  | Repeats keys    | Keys defined once
Comments          | Not supported   | # comment
Trailing commas   | Not allowed     | Not allowed
Booleans          | true, false     | true, false

Best Practices

Use Structured Arrays for Tabular Data

When representing multiple objects with consistent properties, use the structured array notation [N]{fields} for maximum token efficiency.

Consistent Indentation

Use 2 spaces for indentation consistently throughout your document. Many TOON formatters default to 2 spaces. Use our TOON Formatter to standardize indentation.

Descriptive Field Names

Use clear, descriptive field names. Since field names appear only once in TOON arrays, slightly longer names don't significantly impact token count: for example, customerEmail rather than email.

Add Length Markers

Always include explicit length markers [N] for arrays. This helps LLMs understand structure and lets validators check data integrity.

Choose Appropriate Delimiters

Use comma delimiters by default. Switch to pipes | or tabs when data naturally contains commas (addresses, descriptions, CSV-like data).
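The delimiter choice can be automated by scanning the data before encoding. The Python sketch below is one simple approach; the name choose_delimiter and the preference order of comma, pipe, then tab are assumptions based on the guidance above, not a rule of the format.

```python
def choose_delimiter(rows):
    """Pick the first candidate delimiter that appears in no cell."""
    for candidate in (",", "|", "\t"):
        if not any(candidate in cell for row in rows for cell in row):
            return candidate
    raise ValueError("every candidate delimiter occurs in the data")


addresses = [
    ["123 Main St, Suite 100", "Boston", "USA"],
    ["456 Oak Ave, Apt 5B", "Seattle", "USA"],
]
print(repr(choose_delimiter(addresses)))
```

Cells are expected to be strings here. Data in which all three candidates occur raises an error rather than guessing.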

Comment Complex Structures

Use comments to explain complex structures or data semantics. Comments are ignored by parsers but help human readers and can provide context for LLMs.
