Protocol Buffers (Protobuf) is a compact binary serialization format with official Python support. Whether you're building gRPC services or microservices, or you need efficient data storage, Protobuf and Python work well together.

This tutorial covers installation, creating schemas, serialization, deserialization, and real-world examples. If you're new to Protobuf, start with our What is Protobuf guide.

What You'll Learn: How to install Protobuf, create .proto files, generate Python code, serialize/deserialize data, and use Protobuf in real Python applications.

Installation

Step 1: Install the Protocol Buffer Compiler

# macOS (using Homebrew)
brew install protobuf

# Ubuntu/Debian
sudo apt-get install protobuf-compiler

# Windows (using Chocolatey)
choco install protoc

# Or download from GitHub:
# https://github.com/protocolbuffers/protobuf/releases

# Verify installation
protoc --version
# Output: libprotoc 3.21.12 (or similar)

Step 2: Install the Python Protobuf Library

# Install protobuf library
pip install protobuf

# Verify installation
python -c "import google.protobuf; print(google.protobuf.__version__)"
# Output: 4.25.1 (or similar)

Creating Your First .proto File

A .proto file defines your data structure. Let's create a simple user schema:

user.proto

syntax = "proto3";

// Define the User message
message User {
  int32 id = 1;
  string name = 2;
  string email = 3;
  bool is_active = 4;
  repeated string hobbies = 5;
}

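Key points: Each field has a type, a name, and a unique field number (often loosely called a "tag"). The field number, not the name, identifies the field in the binary encoding. The repeated keyword makes hobbies a list.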
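To make the role of field numbers concrete, here is a minimal pure-Python sketch of how the binary encoding combines a field number with a wire type into a key, and how an integer is stored as a varint. The helper names (encode_varint, field_key) are hypothetical and written only for illustration; they are not part of the protobuf library, which does this work for you.

```python
# Illustrative only: hand-rolled helpers, not the protobuf library's API.

WIRE_VARINT = 0   # used by int32, bool, enum, ...
WIRE_LEN = 2      # used by string, bytes, nested messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def field_key(field_number: int, wire_type: int) -> bytes:
    """Key written before each value: (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


# int32 id = 1, set to 1
id_bytes = field_key(1, WIRE_VARINT) + encode_varint(1)

# string name = 2, set to "Alice"
name = "Alice".encode("utf-8")
name_bytes = field_key(2, WIRE_LEN) + encode_varint(len(name)) + name

print(id_bytes)    # b'\x08\x01'
print(name_bytes)  # b'\x12\x05Alice'
```

Only the number reaches the wire, never the field name, which is why renaming a field is safe but renumbering it is not.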

Compiling to Python Code

Use the protoc compiler to generate Python code from your schema:

Compile Command

# Compile user.proto to Python
protoc --python_out=. user.proto

# This generates: user_pb2.py

The --python_out=. flag tells protoc to generate Python code in the current directory. The generated file is named user_pb2.py.
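If you would rather run the compile step from a Python build script, it could be wrapped roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and it assumes protoc is on your PATH. The --proto_path flag tells protoc which directory to resolve the .proto file against.

```python
import shutil
import subprocess
from pathlib import Path


def build_protoc_command(proto_file: str, out_dir: str = ".") -> list[str]:
    """Build the protoc argument list for generating Python code."""
    proto_path = Path(proto_file)
    return [
        "protoc",
        f"--proto_path={proto_path.parent}",  # directory containing the schema
        f"--python_out={out_dir}",            # where the *_pb2.py file is written
        str(proto_path),
    ]


def compile_proto(proto_file: str, out_dir: str = ".") -> None:
    """Run protoc; raises if protoc is missing or compilation fails."""
    if shutil.which("protoc") is None:
        raise FileNotFoundError("protoc not found on PATH")
    subprocess.run(build_protoc_command(proto_file, out_dir), check=True)


print(build_protoc_command("user.proto"))
# ['protoc', '--proto_path=.', '--python_out=.', 'user.proto']
```

Calling compile_proto("user.proto") would then have the same effect as the command shown earlier.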

Serialization: Python Object → Binary

Now let's use the generated code to create and serialize a User object:

Creating and Serializing

import user_pb2

# Create a User object
user = user_pb2.User()
user.id = 1
user.name = "Alice"
user.email = "alice@example.com"
user.is_active = True
user.hobbies.append("reading")
user.hobbies.append("coding")

# Serialize to binary format
binary_data = user.SerializeToString()
print(f"Serialized size: {len(binary_data)} bytes")
print(f"Binary data: {binary_data}")

# Output:
# Serialized size: 47 bytes
# Binary data: b'\x08\x01\x12\x05Alice\x1a\x11alice@example.com ...'

The SerializeToString() method converts the User object to a compact binary format. Despite its name, it returns a bytes object, not a str.

Deserialization: Binary → Python Object

Read binary data back into a Python object:

Deserializing Data

import user_pb2

# Assume we have binary_data from serialization
# binary_data = user.SerializeToString()

# Create a new User object
new_user = user_pb2.User()

# Parse binary data
new_user.ParseFromString(binary_data)

# Access the fields
print(f"ID: {new_user.id}")
print(f"Name: {new_user.name}")
print(f"Email: {new_user.email}")
print(f"Active: {new_user.is_active}")
print(f"Hobbies: {list(new_user.hobbies)}")

# Output:
# ID: 1
# Name: Alice
# Email: alice@example.com
# Active: True
# Hobbies: ['reading', 'coding']

The ParseFromString() method reads binary data and populates the object fields.
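To see what ParseFromString() has to work with, the following pure-Python sketch walks the wire format and lists each field's number, wire type, and raw value. It is an illustration, not the library's parser: the function names are hypothetical, and it handles only the two wire types the User message uses (varint and length-delimited).

```python
# Illustrative only: a toy reader for two wire types, not the protobuf parser.

def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next_position) for the varint starting at pos."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def scan_fields(data: bytes) -> list[tuple[int, int, object]]:
    """List (field_number, wire_type, raw_value) for each field in data."""
    fields = []
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint: int32, bool, ...
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited: string, bytes, messages
            length, pos = read_varint(data, pos)
            if pos + length > len(data):
                raise ValueError("truncated field")
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((field_number, wire_type, value))
    return fields


# Bytes for id=1, name="Alice", is_active=True, hobbies=["reading"]
sample = b"\x08\x01\x12\x05Alice \x01*\x07reading"
for field in scan_fields(sample):
    print(field)
# (1, 0, 1)
# (2, 2, b'Alice')
# (4, 0, 1)
# (5, 2, b'reading')
```

The bytes carry no field names and only coarse type information, which is why the generated user_pb2 class is needed to turn "field 4 holds 1" into is_active = True.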

Reading and Writing Protobuf Files

Writing to File

import user_pb2

# Create user
user = user_pb2.User()
user.id = 1
user.name = "Bob"
user.email = "bob@example.com"
user.is_active = True

# Write to file
with open('user.pb', 'wb') as f:
    f.write(user.SerializeToString())

print("User saved to user.pb")

Reading from File

import user_pb2

# Read from file
user = user_pb2.User()
with open('user.pb', 'rb') as f:
    user.ParseFromString(f.read())

print(f"Loaded user: {user.name}")
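The file examples above store exactly one message per file. The binary format is not self-delimiting, so if you want several messages in one file you need your own framing. One common approach, sketched below, is to prefix each serialized message with its length. The 4-byte big-endian prefix and the function names are assumptions chosen for this illustration, not a protobuf standard, and the byte strings stand in for the values returned by SerializeToString().

```python
import io
import struct
from typing import BinaryIO, Iterator


def write_framed(stream: BinaryIO, payload: bytes) -> None:
    """Write a 4-byte big-endian length, then the payload."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def read_framed(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each payload written by write_framed until end of stream."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack(">I", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated payload")
        yield payload


# Stand-ins for user.SerializeToString() results
records = [b"\x08\x01\x12\x05Alice", b"\x08\x02\x12\x03Bob"]

buffer = io.BytesIO()  # with a real file, open it in 'wb' / 'rb' mode
for record in records:
    write_framed(buffer, record)

buffer.seek(0)
restored = list(read_framed(buffer))
print(restored == records)  # True
```

With real messages, you would pass user.SerializeToString() to write_framed and call ParseFromString() on each payload that read_framed yields.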

Working with Nested Messages

Protobuf supports nested message types for complex data structures:

company.proto

syntax = "proto3";

message Company {
  string name = 1;
  
  message Address {
    string street = 1;
    string city = 2;
    string country = 3;
  }
  
  Address headquarters = 2;
  repeated Employee employees = 3;
}

message Employee {
  int32 id = 1;
  string name = 2;
  string department = 3;
}

Using Nested Messages in Python

import company_pb2

# Create company
company = company_pb2.Company()
company.name = "TechCorp"

# Set nested Address
company.headquarters.street = "123 Tech St"
company.headquarters.city = "San Francisco"
company.headquarters.country = "USA"

# Add employees (repeated field)
emp1 = company.employees.add()
emp1.id = 1
emp1.name = "Alice"
emp1.department = "Engineering"

emp2 = company.employees.add()
emp2.id = 2
emp2.name = "Bob"
emp2.department = "Sales"

# Serialize
data = company.SerializeToString()
print(f"Company data: {len(data)} bytes")

Converting Between Protobuf and JSON

The Python Protobuf library provides utilities in google.protobuf.json_format to convert between Protobuf and JSON:

Protobuf to JSON

from google.protobuf.json_format import MessageToJson
import user_pb2

# Create user
user = user_pb2.User()
user.id = 1
user.name = "Alice"
user.email = "alice@example.com"

# Convert to JSON
json_string = MessageToJson(user)
print(json_string)

# Output (fields left at their default values, such as is_active and
# hobbies here, are omitted by default):
# {
#   "id": 1,
#   "name": "Alice",
#   "email": "alice@example.com"
# }

Perfect for debugging or when you need human-readable output.

JSON to Protobuf

from google.protobuf.json_format import Parse
import user_pb2

json_string = '''
{
  "id": 2,
  "name": "Bob",
  "email": "bob@example.com",
  "isActive": true
}
'''

# Parse JSON into Protobuf object
user = Parse(json_string, user_pb2.User())

print(f"Name: {user.name}")
print(f"Active: {user.is_active}")

# Output:
# Name: Bob
# Active: True

Useful for accepting JSON input and converting to Protobuf.
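The JSON in the parsing example uses isActive while the schema declares is_active: by default, the JSON mapping converts field names to lowerCamelCase. The conversion can be approximated by the following sketch. The function name is hypothetical, created_at_ms is a made-up field used only to show a longer name, and the code covers ordinary snake_case names rather than every edge case the library handles.

```python
def to_json_name(field_name: str) -> str:
    """Approximate the default proto-to-JSON field name mapping."""
    result = []
    capitalize_next = False
    for char in field_name:
        if char == "_":
            capitalize_next = True  # drop the underscore, upper-case what follows
        elif capitalize_next:
            result.append(char.upper())
            capitalize_next = False
        else:
            result.append(char)
    return "".join(result)


# created_at_ms is hypothetical; the other names come from user.proto
for name in ["id", "is_active", "hobbies", "created_at_ms"]:
    print(f"{name} -> {to_json_name(name)}")
# id -> id
# is_active -> isActive
# hobbies -> hobbies
# created_at_ms -> createdAtMs
```

When parsing, Parse() accepts both the lowerCamelCase name and the original field name. If you prefer snake_case keys in the output, MessageToJson() takes a preserving_proto_field_name=True argument.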

Real-World Example: User Service

Let's build a complete example that saves and loads user data:

user_service.py

import user_pb2
import os

class UserService:
    def __init__(self, storage_path='users.pb'):
        self.storage_path = storage_path
    
    def save_user(self, user_id, name, email, is_active=True, hobbies=None):
        """Save a single user to file, overwriting any existing file"""
        user = user_pb2.User()
        user.id = user_id
        user.name = name
        user.email = email
        user.is_active = is_active
        
        if hobbies:
            user.hobbies.extend(hobbies)
        
        # Write to file
        with open(self.storage_path, 'wb') as f:
            f.write(user.SerializeToString())
        
        print(f"User {name} saved successfully!")
        return user
    
    def load_user(self):
        """Load user from file"""
        if not os.path.exists(self.storage_path):
            print("No user file found")
            return None
        
        user = user_pb2.User()
        with open(self.storage_path, 'rb') as f:
            user.ParseFromString(f.read())
        
        return user
    
    def display_user(self, user):
        """Display user information"""
        print("\nUser Information:")
        print(f"  ID: {user.id}")
        print(f"  Name: {user.name}")
        print(f"  Email: {user.email}")
        print(f"  Active: {user.is_active}")
        print(f"  Hobbies: {', '.join(user.hobbies)}")

# Usage
if __name__ == "__main__":
    service = UserService()
    
    # Save a user
    user = service.save_user(
        user_id=1,
        name="Alice",
        email="alice@example.com",
        is_active=True,
        hobbies=["reading", "coding", "hiking"]
    )
    
    # Load and display
    loaded_user = service.load_user()
    if loaded_user:
        service.display_user(loaded_user)

Best Practices

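Never change field numbers: Field numbers identify fields in the binary encoding, so once assigned they must never be changed or reused.

Use descriptive field names: Clear names make your schema self-documenting.

Add new fields with new numbers: To stay backward compatible, give every new field a number that has never been used before.

Handle missing fields gracefully: In proto3, an unset scalar field reads back as its default value (0, "", or False). To tell "unset" apart from "default", declare the field optional and check it with HasField().

Version your .proto files: Use comments to track changes and versions.

Use binary mode for files: Serialized messages are bytes, so always open files with 'wb' and 'rb'.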

Helpful Resources

• What is Protobuf? - Complete beginner's guide to Protocol Buffers
• Protobuf vs JSON - Detailed performance comparison
• Python JSON Parser - Working with JSON in Python
• JSON Formatter - Format and beautify JSON data

Learn More

• Official Python Tutorial - Protocol Buffers Python documentation
• Python Protobuf API - Complete API reference
• gRPC Python - Using Protobuf with gRPC in Python
• Protobuf Python on GitHub - Source code and examples

Summary

Protocol Buffers gives Python applications a compact, schema-driven way to serialize and deserialize data. Whether you're building microservices, mobile backends, or data pipelines, Protobuf typically produces smaller payloads and parses faster than text formats such as JSON.

• Install protoc and the protobuf Python library
• Define schemas in .proto files
• Generate Python code with protoc
• Use SerializeToString() and ParseFromString()
• Convert to/from JSON when needed for debugging

Next Steps: Learn how Protobuf compares to JSON or explore gRPC with Python for building high-performance APIs.
