TOON (Token-Oriented Object Notation) is a structured data format optimized for Large Language Model applications. This specification defines the complete syntax, data types, and structural rules for creating valid TOON documents. For an introduction to TOON, see What is TOON Format?
This document serves as the definitive technical reference for TOON format. Developers implementing TOON parsers, validators, or converters should use this specification. Validate your TOON syntax with our TOON Validator tool.
Design Principles
TOON format was designed with four core principles:
1. Token Efficiency
Minimize token count for LLM applications by eliminating redundant syntax. Property names are declared once per array rather than repeated for each object, achieving approximately 50% token reduction compared to JSON.
2. Human Readability
Maintain clear, readable structure that developers can understand without specialized tools. TOON documents are plain text and can be edited in any text editor.
3. Explicit Structure
Use explicit length markers and field definitions to help LLMs understand data structure without counting. Arrays declare their length upfront: items[5]
4. Lossless Conversion
Support bidirectional conversion with JSON without data loss. Any valid JSON can be converted to TOON and back to JSON while preserving all data and structure.
Basic Syntax Rules
Key-Value Pairs
Simple key-value pairs use colon syntax, similar to YAML:
name: "Sarah Mitchell"
age: 32
email: "[email protected]"
active: true
Indentation
TOON uses indentation to represent hierarchy. Two spaces per level is recommended:
user:
  name: "Alex Johnson"
  address:
    street: "123 Main St"
    city: "Boston"
    zip: "02101"
Comments
Comments start with # and continue to end of line:
# Customer data
name: "John Doe"  # Customer's full name
age: 45  # Age in years
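A parser has to tell a # that starts a comment from one that appears inside a quoted string. The rules above do not cover that case, so the following Python sketch assumes a # inside double quotes is literal text. The helper name strip_comment is hypothetical.

```python
def strip_comment(line: str) -> str:
    """Remove a trailing # comment, ignoring any # inside a quoted string.

    Assumption (not stated in the specification): a # between double
    quotes is part of the string value, not the start of a comment.
    """
    in_string = False
    escaped = False
    for index, char in enumerate(line):
        if escaped:
            # The previous character was a backslash inside a string,
            # so this character is taken literally.
            escaped = False
        elif char == "\\" and in_string:
            escaped = True
        elif char == '"':
            in_string = not in_string
        elif char == "#" and not in_string:
            return line[:index].rstrip()
    return line.rstrip()
```

Unquoted values, such as the cells in a structured array row, get no protection from this approach: a # in such a cell would be read as the start of a comment.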
Data Types
TOON supports six fundamental data types:
1. String
Text values enclosed in double quotes. Supports escape sequences for special characters.
name: "Sarah Mitchell"
message: "Hello, World!"
path: "C:\\Users\\Documents"  # Escaped backslash
quote: "She said \"Hello\""  # Escaped quotes
2. Number
Integer or floating-point numbers without quotes. Supports scientific notation.
count: 42
price: 99.99
negative: -15
scientific: 1.5e10
percentage: 0.485
3. Boolean
Logical values: true or false (lowercase only).
active: true
verified: false
deleted: false
4. Null
Represents the absence of a value using the null keyword.
middleName: null
optional: null
5. Array
Ordered collections with explicit length markers. See detailed array syntax below.
tags[3]: "javascript", "frontend", "react"
6. Object
Nested structures using indentation to represent hierarchy.
address:
  street: "123 Main St"
  city: "Boston"
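The four primitive types can be told apart from the text of a value alone. The sketch below is one possible Python reading of the rules above, not a reference implementation. The function name parse_scalar, the number pattern, and the allow_bare flag are assumptions made for illustration. Quoted strings are decoded with JSON string rules, which cover the escapes shown in the String examples.

```python
import json
import re

# Assumed number grammar: optional sign, digits, optional fraction and exponent.
NUMBER = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")


def parse_scalar(text: str, allow_bare: bool = False):
    """Convert the text of one TOON value into a Python value."""
    token = text.strip()
    if token == "true":
        return True
    if token == "false":
        return False
    if token == "null":
        return None
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        # json.loads raises ValueError on malformed escapes.
        return json.loads(token)
    if NUMBER.fullmatch(token):
        is_float = any(char in token for char in ".eE")
        return float(token) if is_float else int(token)
    if allow_bare:
        # Structured array rows in this document show unquoted text;
        # this flag (an assumption) returns such text unchanged.
        return token
    raise ValueError(f"not a valid TOON scalar: {token!r}")
```

With allow_bare left off, a value such as True is rejected rather than silently read as a string, which matches the lowercase-only rule for booleans.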
Array Notation
Arrays are the most distinctive feature of TOON format. The array notation is designed for maximum token efficiency.
Simple Arrays
Simple arrays contain primitive values (strings, numbers, booleans):
# Array declaration: name[length]: value1, value2, value3
tags[3]: "javascript", "react", "nodejs"
scores[5]: 95, 87, 92, 88, 91
flags[2]: true, false
Syntax: arrayName[length]: value1, value2, ...
The length marker [3] explicitly states the array has 3 elements, helping LLMs understand structure without counting.
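As a rough illustration of how a reader of this syntax might use the length marker, the Python sketch below splits a simple array line and checks the declared length against the values found. The names ARRAY_LINE, split_values, and parse_simple_array are hypothetical, and the splitter assumes that commas inside double quotes belong to the value.

```python
import re

# Assumed shape of a simple array line: name[length]: value1, value2, ...
ARRAY_LINE = re.compile(r"^(\w+)\[(\d+)\]:\s*(.*)$")


def split_values(text: str, delimiter: str = ",") -> list[str]:
    """Split on the delimiter, but not inside double-quoted strings."""
    parts, current = [], []
    in_string = escaped = False
    for char in text:
        if escaped:
            current.append(char)
            escaped = False
        elif char == "\\" and in_string:
            current.append(char)
            escaped = True
        elif char == '"':
            current.append(char)
            in_string = not in_string
        elif char == delimiter and not in_string:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    parts.append("".join(current).strip())
    return parts


def parse_simple_array(line: str) -> tuple[str, list[str]]:
    """Return (name, raw value tokens), checking the declared length."""
    match = ARRAY_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a simple array line: {line!r}")
    name, declared, body = match.group(1), int(match.group(2)), match.group(3)
    values = split_values(body) if body.strip() else []
    if len(values) != declared:
        raise ValueError(f"{name}: declared {declared} elements, found {len(values)}")
    return name, values
```

The values come back as raw tokens, quotes included, so type conversion can be handled as a separate step.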
Structured Arrays (Tabular Data)
Structured arrays contain objects with consistent properties - the most token-efficient format:
# Array with field definitions
customers[3]{id,name,email,active}:
  1,Sarah Mitchell,[email protected],true
  2,Michael Chen,[email protected],true
  3,Jennifer Kumar,[email protected],false
Syntax: arrayName[length]{field1,field2,field3}: data rows
- [3] - Length marker (3 objects)
- {id,name,email,active} - Field definition (declared once)
- Each line represents one object with values in field order
- Values separated by commas (default delimiter)
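To show where the savings come from, here is a minimal Python sketch that turns a list of uniform dictionaries into a structured array, writing the field names once in the header. The function names format_value and encode_table are hypothetical, and the two-space row indentation is an assumption. Values containing a comma are rejected rather than escaped.

```python
def format_value(value) -> str:
    """Render one Python value as TOON text (unquoted, as in the rows above)."""
    if value is True:
        return "true"
    if value is False:
        return "false"
    if value is None:
        return "null"
    return str(value)


def encode_table(name: str, rows: list[dict], indent: str = "  ") -> str:
    """Encode uniform dictionaries as name[N]{fields}: followed by one row each."""
    if not rows:
        raise ValueError("need at least one row to derive the field list")
    fields = list(rows[0])
    lines = [f"{name}[{len(rows)}]{{{','.join(fields)}}}:"]
    for row in rows:
        if set(row) != set(fields):
            raise ValueError("all rows must share the same fields")
        cells = [format_value(row[field]) for field in fields]
        if any("," in cell for cell in cells):
            raise ValueError("a value contains the comma delimiter")
        lines.append(indent + ",".join(cells))
    return "\n".join(lines)


users = [
    {"id": 1, "name": "Alice", "active": True},
    {"id": 2, "name": "Bob", "active": False},
]
print(encode_table("users", users))
```

Each key appears once in the header no matter how many rows follow, whereas the equivalent JSON repeats every key in every object.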
Custom Delimiters
When data contains commas, use alternative delimiters like pipes or tabs:
# Using pipe delimiter for data containing commas
addresses[2]{street,city,country}|:
  123 Main St, Suite 100|Boston|USA
  456 Oak Ave, Apt 5B|Seattle|USA
Specify delimiter after field definition: {fields}|:
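The sketch below shows one way a reader might honor the delimiter suffix in Python. It assumes the header has the shape name[length]{fields}: with an optional single delimiter character before the colon, as in the example above. HEADER and parse_table are hypothetical names, and cell values are returned as plain strings.

```python
import re

# Assumed header shape: name[length]{field1,field2}<delimiter>:
# The delimiter character is optional and defaults to a comma.
HEADER = re.compile(r"^(\w+)\[(\d+)\]\{([^}]*)\}(.?):$")


def parse_table(text: str) -> list[dict[str, str]]:
    """Parse one structured array into a list of field-to-text mappings."""
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line and not line.startswith("#")]
    if not lines:
        raise ValueError("empty input")
    match = HEADER.match(lines[0])
    if match is None:
        raise ValueError(f"not a structured array header: {lines[0]!r}")
    declared = int(match.group(2))
    fields = [field.strip() for field in match.group(3).split(",")]
    delimiter = match.group(4) or ","
    rows = lines[1:]
    if len(rows) != declared:
        raise ValueError(f"declared {declared} rows, found {len(rows)}")
    records = []
    for row in rows:
        # Stripping is fine for visible delimiters such as | but would
        # need more care if a tab delimiter framed an empty cell.
        cells = [cell.strip() for cell in row.split(delimiter)]
        if len(cells) != len(fields):
            raise ValueError(f"expected {len(fields)} values, found {len(cells)}: {row!r}")
        records.append(dict(zip(fields, cells)))
    return records
```

With the pipe delimiter, the comma in 123 Main St, Suite 100 stays inside the street value instead of splitting it into two cells.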
Nested Objects and Complex Structures
TOON supports arbitrary nesting depth using indentation:
Nested Objects
user:
  id: 101
  name: "Alex Johnson"
  profile:
    avatar: "https://example.com/avatar.jpg"
    bio: "Software Engineer"
    social:
      twitter: "@alexj"
      github: "alexjohnson"
  settings:
    theme: "dark"
    notifications: true
Arrays of Objects with Nested Properties
orders[2]:
  - id: 1001
    customer: "Sarah Mitchell"
    items[2]:
      - product: "Laptop"
        price: 1299
      - product: "Mouse"
        price: 29
    total: 1328
  - id: 1002
    customer: "Michael Chen"
    items[1]:
      - product: "Monitor"
        price: 449
    total: 449
Mixed Data Types
response:
  success: true
  data:
    users[2]{id,name}:
      1,Alice
      2,Bob
    metadata:
      timestamp: "2025-01-15T10:30:00Z"
      version: 2
  errors: null
Validation Rules
Valid TOON documents must adhere to these rules:
1. Length Accuracy
Array length markers must match actual element count. items[3] must contain exactly 3 elements.
2. Field Count Consistency
In structured arrays, each row must have the same number of values as fields defined in the header. {id,name,email} requires 3 values per row.
3. Consistent Indentation
Use consistent indentation throughout the document. Mixing spaces and tabs is not allowed. Recommended: 2 spaces per indentation level.
4. Proper String Escaping
Strings containing special characters must use proper escape sequences: \" for quotes, \\ for backslash, \n for newline.
5. Type Consistency
Boolean values must be lowercase true or false. Null must be lowercase null.
6. No Trailing Delimiters
Array rows and value lists must not have trailing delimiters. 1,2,3 is valid; 1,2,3, is invalid.
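Several of these rules can be checked one line at a time. The following Python sketch covers rules 3, 5, and 6 only, and is an illustration under stated assumptions rather than a complete validator. The names KEY_VALUE, KEYWORDS, and check_lines are hypothetical. Full-line comments are skipped, inline comments are not handled, and only a trailing comma is detected.

```python
import re

# Matches "key: value" where the value is a single unquoted token.
KEY_VALUE = re.compile(r"^\s*[\w-]+:\s*(\S+)\s*$")
KEYWORDS = {"true", "false", "null"}


def check_lines(text: str) -> list[str]:
    """Return human-readable messages for rule 3, 5, and 6 violations."""
    errors = []
    indent_chars = set()
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        # Collect the whitespace characters used for indentation (rule 3).
        indent = line[: len(line) - len(line.lstrip())]
        indent_chars.update(indent)
        # Rule 6: no trailing delimiter (comma only in this sketch).
        if line.rstrip().endswith(","):
            errors.append(f"line {number}: trailing delimiter")
        # Rule 5: true, false, and null must be lowercase.
        match = KEY_VALUE.match(line)
        if match:
            value = match.group(1)
            if value.lower() in KEYWORDS and value not in KEYWORDS:
                errors.append(f"line {number}: {value!r} must be lowercase")
    if {" ", "\t"} <= indent_chars:
        errors.append("document mixes spaces and tabs for indentation")
    return errors
```

Returning a list of messages instead of raising on the first problem lets a caller report every violation in one pass.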
Validation Tools
Use our TOON Validator to check your TOON documents against these rules. The validator provides detailed error messages for any violations.
TOON vs JSON Syntax Comparison
Understanding how TOON syntax differs from JSON helps developers transition between formats:
| Feature | JSON | TOON |
|---|---|---|
| String values | "value" | "value" |
| Key-value pairs | "key": "value" | key: "value" |
| Object braces | Required {} | Indentation-based |
| Array notation | [val1, val2] | arr[2]: val1, val2 |
| Array of objects | Repeats keys | Keys defined once |
| Comments | Not supported | # comment |
| Trailing commas | Not allowed | Not allowed |
| Booleans | true, false | true, false |
Best Practices
Use Structured Arrays for Tabular Data
When representing multiple objects with consistent properties, use structured array notation [N]{fields} for maximum token efficiency.
Consistent Indentation
Use 2 spaces for indentation consistently throughout your document. Many TOON formatters default to 2 spaces. Use our TOON Formatter to standardize indentation.
Descriptive Field Names
Use clear, descriptive field names. Since field names appear only once in TOON arrays, slightly longer names don't significantly impact token count: for example, customerEmail instead of email.
Add Length Markers
Always include explicit length markers [N] for arrays. This helps LLMs understand structure and validates data integrity.
Choose Appropriate Delimiters
Use comma delimiters by default. Switch to pipes | or tabs when data naturally contains commas (addresses, descriptions, CSV-like data).
Comment Complex Structures
Use comments to explain complex structures or data semantics. Comments are ignored by parsers but help human readers and can provide context for LLMs.
External Resources
- TOON Official GitHub - Official specification and reference implementation
- OpenAI Tokenizer - Validate your TOON syntax and token counts
- JSON Specification - Official JSON format specification for comparison
- YAML Specification - YAML syntax reference (similar indentation approach)
- What is TOON Format? - Introduction and basics
- TOON vs JSON Comparison - Token savings and use cases