Protobuf Python Tutorial - Complete Guide

Learn to use Protocol Buffers in Python with practical examples

Published: January 2025 • 10 min read

Protocol Buffers (Protobuf) is a fast, compact binary serialization format that works well in Python applications. Whether you're building gRPC services or microservices, or need efficient data storage, Protobuf with Python is a powerful combination.

This tutorial covers installation, creating schemas, serialization, deserialization, and real-world examples. If you're new to Protobuf, start with our What is Protobuf guide.

What You'll Learn: How to install Protobuf, create .proto files, generate Python code, serialize/deserialize data, and use Protobuf in real Python applications.

Installation

Step 1: Install Protocol Buffer Compiler

# macOS (using Homebrew)
brew install protobuf

# Ubuntu/Debian
sudo apt-get install protobuf-compiler

# Windows (using Chocolatey)
choco install protoc

# Or download from GitHub:
# https://github.com/protocolbuffers/protobuf/releases

# Verify installation
protoc --version
# Output: libprotoc 3.21.12 (or similar)

Step 2: Install Python Protobuf Library

# Install protobuf library
pip install protobuf

# Verify installation
python -c "import google.protobuf; print(google.protobuf.__version__)"
# Output: 4.25.1 (or similar)

Creating Your First .proto File

A .proto file defines your data structure. Let's create a simple user schema:

user.proto

syntax = "proto3";

// Define the User message
message User {
  int32 id = 1;
  string name = 2;
  string email = 3;
  bool is_active = 4;
  repeated string hobbies = 5;
}

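Key points: Each field has a type, a name, and a unique field number (often called a "tag"). The field number, not the name, identifies the field in the binary encoding, so it must stay fixed once data has been written. The repeated keyword makes hobbies a list of strings.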
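Because only the field number is written to the wire, it helps to see how it is encoded. The sketch below is a stdlib-only illustration of the documented wire format, not code from the protobuf library, and its helper names (field_key, varint_size) are hypothetical: each field is preceded by a varint key equal to (field_number << 3) | wire_type, so field numbers 1 to 15 produce a one-byte key, while numbers 16 through 2047 need two bytes.

```python
# Wire types from the protobuf encoding spec (the subset used by user.proto)
VARINT = 0            # int32, bool, enum, ...
LENGTH_DELIMITED = 2  # string, bytes, embedded messages


def field_key(field_number: int, wire_type: int) -> int:
    """Combine a field number and wire type into the key written before each field."""
    return (field_number << 3) | wire_type


def varint_size(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a base-128 varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


# Keys for the fields of the User message defined above
user_fields = {
    "id": (1, VARINT),
    "name": (2, LENGTH_DELIMITED),
    "email": (3, LENGTH_DELIMITED),
    "is_active": (4, VARINT),
    "hobbies": (5, LENGTH_DELIMITED),
}
for name, (number, wire_type) in user_fields.items():
    key = field_key(number, wire_type)
    print(f"{name:<10} key=0x{key:02x} ({varint_size(key)} byte)")

# A field numbered 16 or higher needs a two-byte key
print(varint_size(field_key(16, VARINT)))  # 2
```

This is why the protobuf documentation recommends giving numbers 1 to 15 to the fields you set most often.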

Compiling to Python Code

Use the protoc compiler to generate Python code from your schema:

Compile Command

# Compile user.proto to Python
protoc --python_out=. user.proto

# This generates: user_pb2.py

The --python_out=. flag tells protoc to generate Python code in the current directory. The generated file is named user_pb2.py.
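If you regenerate code often, you can run the same command from a Python build script. This is a minimal sketch under stated assumptions: compile_protos is a hypothetical helper name, protoc must be on your PATH, and all .proto files are assumed to sit in one directory. It adds protoc's --proto_path flag, which tells the compiler where to resolve the input files and any imports.

```python
import shutil
import subprocess
from pathlib import Path


def compile_protos(proto_dir: str = ".", out_dir: str = ".") -> list:
    """Hypothetical helper: run protoc --python_out on every .proto file in proto_dir.

    Returns the command that was executed, or an empty list if no .proto files exist.
    """
    proto_files = sorted(str(p) for p in Path(proto_dir).glob("*.proto"))
    if not proto_files:
        return []

    protoc = shutil.which("protoc")
    if protoc is None:
        raise FileNotFoundError("protoc not found on PATH; install the compiler first")

    # protoc does not create the output directory, so make sure it exists
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cmd = [protoc, f"--proto_path={proto_dir}", f"--python_out={out_dir}", *proto_files]
    # check=True raises CalledProcessError if protoc reports an error in a schema
    subprocess.run(cmd, check=True)
    return cmd
```

For example, compile_protos("protos", "generated") would turn protos/user.proto into generated/user_pb2.py.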

Serialization: Python Object → Binary

Now let's use the generated code to create and serialize a User object:

Creating and Serializing

import user_pb2

# Create a User object
user = user_pb2.User()
user.id = 1
user.name = "Alice"
user.email = "alice@example.com"
user.is_active = True
user.hobbies.append("reading")
user.hobbies.append("coding")

# Serialize to binary format
binary_data = user.SerializeToString()
print(f"Serialized size: {len(binary_data)} bytes")
print(f"Binary data: {binary_data}")

# Output:
# Serialized size: 47 bytes
# Binary data: b'\x08\x01\x12\x05Alice\x1a\x11alice@example.com \x01*\x07reading*\x06coding'

The SerializeToString() method converts the User object to its compact binary form. Despite the name, it returns a Python bytes object, not a str.
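To see what those bytes contain, here is a stdlib-only sketch that hand-encodes the User from the example above (with alice@example.com as the email), following the published wire format: a varint key per field, varints for integers and booleans, and a length prefix for strings. encode_user and its helpers are hypothetical illustrations, not how the protobuf runtime is implemented. For these values they should produce the same bytes as SerializeToString().

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low-order 7 bits first."""
    if value < 0:
        # Negative int32 values are encoded as 10-byte varints; not handled here
        raise ValueError("negative numbers are not handled in this sketch")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


def encode_string(field_number: int, text: str) -> bytes:
    """Wire type 2: key, then UTF-8 byte length, then the bytes themselves."""
    data = text.encode("utf-8")
    return encode_key(field_number, 2) + encode_varint(len(data)) + data


def encode_user(user_id: int, name: str, email: str, is_active: bool, hobbies: list) -> bytes:
    """Hypothetical hand-written encoder for the User message in user.proto."""
    out = bytearray()
    # proto3 omits scalar fields that hold their default value (0, "", False)
    if user_id:
        out += encode_key(1, 0) + encode_varint(user_id)
    if name:
        out += encode_string(2, name)
    if email:
        out += encode_string(3, email)
    if is_active:
        out += encode_key(4, 0) + encode_varint(1)
    for hobby in hobbies:  # repeated: one key/value pair per element
        out += encode_string(5, hobby)
    return bytes(out)


data = encode_user(1, "Alice", "alice@example.com", True, ["reading", "coding"])
print(len(data))  # 47
```

Field names never appear in the output, which is a large part of why Protobuf payloads are smaller than the equivalent JSON.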

Deserialization: Binary → Python Object

Read binary data back into a Python object:

Deserializing Data

import user_pb2

# Assume we have binary_data from serialization
# binary_data = user.SerializeToString()

# Create a new User object
new_user = user_pb2.User()

# Parse binary data
new_user.ParseFromString(binary_data)

# Access the fields
print(f"ID: {new_user.id}")
print(f"Name: {new_user.name}")
print(f"Email: {new_user.email}")
print(f"Active: {new_user.is_active}")
print(f"Hobbies: {list(new_user.hobbies)}")

# Output:
# ID: 1
# Name: Alice
# Email: alice@example.com
# Active: True
# Hobbies: ['reading', 'coding']

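The ParseFromString() method clears the object and then fills its fields from the binary data. Any field missing from the data keeps its default value (0, empty string, False, or an empty list).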
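Conceptually, parsing walks the bytes key by key: read the key, split it into field number and wire type, then read either a varint or a length-prefixed value. The stdlib-only sketch below shows that loop; decode_fields and read_varint are hypothetical names, and the sketch handles only wire types 0 and 2, without the type checks and validation the real runtime performs. It also shows why older code can read data written with a newer schema: a field number the reader does not recognize can still be skipped, because its wire type says how long it is.

```python
def read_varint(data: bytes, pos: int) -> tuple:
    """Decode a varint starting at data[pos]; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def decode_fields(data: bytes) -> list:
    """Split a serialized message into (field_number, raw_value) pairs."""
    fields = []
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint: int32, bool, enum, ...
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited: string, bytes, nested message
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        fields.append((field_number, value))
    return fields


# A User with id=1 and name="Alice", plus a field 9 that user.proto does not define
data = b"\x08\x01\x12\x05Alice\x4a\x02hi"
KNOWN = {1: "id", 2: "name"}
for number, value in decode_fields(data):
    print(KNOWN.get(number, f"unknown field {number}"), value)
```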

Reading and Writing Protobuf Files

Writing to File

import user_pb2

# Create user
user = user_pb2.User()
user.id = 1
user.name = "Bob"
user.email = "bob@example.com"
user.is_active = True

# Write to file
with open('user.pb', 'wb') as f:
    f.write(user.SerializeToString())

print("User saved to user.pb")

Reading from File

import user_pb2

# Read from file
user = user_pb2.User()
with open('user.pb', 'rb') as f:
    user.ParseFromString(f.read())

print(f"Loaded user: {user.name}")
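A serialized message does not record its own length, so if you write several User messages back to back into one file, parsing the result merges them into a single message. A common convention, used by the Java library's writeDelimitedTo() and parseDelimitedFrom(), is to prefix each message with its size as a varint. The helpers below are a hypothetical stdlib-only sketch of that framing. They work on raw bytes, so you would pass user.SerializeToString() to write_delimited and call ParseFromString() on a fresh user_pb2.User() for each chunk you read back.

```python
import io
from typing import BinaryIO, Iterator


def write_delimited(f: BinaryIO, payload: bytes) -> None:
    """Write one serialized message preceded by its length as a varint."""
    size = len(payload)
    prefix = bytearray()
    while True:
        byte = size & 0x7F
        size >>= 7
        if size:
            prefix.append(byte | 0x80)  # more length bytes follow
        else:
            prefix.append(byte)
            break
    f.write(bytes(prefix) + payload)


def read_delimited(f: BinaryIO) -> Iterator[bytes]:
    """Yield each length-prefixed payload until the end of the file."""
    while True:
        size = 0
        shift = 0
        while True:
            b = f.read(1)
            if not b:
                if shift:
                    raise EOFError("file ends inside a length prefix")
                return  # clean end of file between messages
            size |= (b[0] & 0x7F) << shift
            if not b[0] & 0x80:
                break
            shift += 7
        payload = f.read(size)
        if len(payload) != size:
            raise EOFError("file ends inside a message")
        yield payload


# In-memory demo; with real data, open 'users.pb' in 'wb' / 'rb' mode instead
buffer = io.BytesIO()
for payload in [b"first", b"second message"]:
    write_delimited(buffer, payload)
buffer.seek(0)
print(list(read_delimited(buffer)))  # [b'first', b'second message']
```

A User with every field at its default serializes to zero bytes, which this framing still handles because the prefix is simply 0.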

Working with Nested Messages

Protobuf supports nested message types for complex data structures:

company.proto

syntax = "proto3";

message Company {
  string name = 1;
  
  message Address {
    string street = 1;
    string city = 2;
    string country = 3;
  }
  
  Address headquarters = 2;
  repeated Employee employees = 3;
}

message Employee {
  int32 id = 1;
  string name = 2;
  string department = 3;
}

Using Nested Messages in Python

import company_pb2

# Create company
company = company_pb2.Company()
company.name = "TechCorp"

# Set nested Address
company.headquarters.street = "123 Tech St"
company.headquarters.city = "San Francisco"
company.headquarters.country = "USA"

# Add employees (repeated field)
emp1 = company.employees.add()
emp1.id = 1
emp1.name = "Alice"
emp1.department = "Engineering"

emp2 = company.employees.add()
emp2.id = 2
emp2.name = "Bob"
emp2.department = "Sales"

# Serialize
data = company.SerializeToString()
print(f"Company data: {len(data)} bytes")
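Nested messages need no special wire type. A message such as Company.Address is serialized on its own, and the result is written into the parent as a length-delimited field, exactly like a string. As a stdlib-only sketch with hypothetical helper names, here are the bytes for the TechCorp name and headquarters fields built by hand from the field numbers in company.proto:

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Key with wire type 2, then the payload length, then the payload itself."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


# Company.Address fields: street = 1, city = 2, country = 3
address = (
    length_delimited(1, b"123 Tech St")
    + length_delimited(2, b"San Francisco")
    + length_delimited(3, b"USA")
)
# Company fields: name = 1, headquarters = 2 (the serialized Address is the payload)
company = length_delimited(1, b"TechCorp") + length_delimited(2, address)
print(len(address), len(company))
```

Each element of the repeated employees field would be appended the same way, as its own field-3 entry wrapping a serialized Employee.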

Converting Between Protobuf and JSON

The Python protobuf library provides utilities in google.protobuf.json_format for converting between Protobuf messages and JSON:

Protobuf to JSON

from google.protobuf.json_format import MessageToJson
import user_pb2

# Create user
user = user_pb2.User()
user.id = 1
user.name = "Alice"
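user.email = "alice@example.com"

# Convert to JSON
json_string = MessageToJson(user)
print(json_string)

# Output:
# {
#   "id": 1,
#   "name": "Alice",
#   "email": "alice@example.com"
# }

Fields that still hold their default value (here is_active=False and the empty hobbies list) are left out of the output by default, and field names are converted to lowerCamelCase (is_active becomes isActive). This is handy for debugging or whenever you need human-readable output.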
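The JSON key names follow the proto3 JSON mapping: by default, each field name is converted to lowerCamelCase by dropping underscores and capitalizing the letter that follows. A minimal sketch of that rule is below. to_json_name is a hypothetical helper, and a .proto file can also override a key with the json_name field option.

```python
def to_json_name(field_name: str) -> str:
    """Approximate the default JSON name: drop underscores, capitalize the next letter."""
    result = []
    capitalize_next = False
    for ch in field_name:
        if ch == "_":
            capitalize_next = True
        elif capitalize_next:
            result.append(ch.upper())
            capitalize_next = False
        else:
            result.append(ch)
    return "".join(result)


for name in ["id", "is_active", "hobbies"]:
    print(name, "->", to_json_name(name))
```

If you would rather keep snake_case keys, MessageToJson(user, preserving_proto_field_name=True) emits is_active instead of isActive.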

JSON to Protobuf

from google.protobuf.json_format import Parse
import user_pb2

json_string = '''
{
  "id": 2,
  "name": "Bob",
  "email": "bob@example.com",
  "isActive": true
}
'''

# Parse JSON into Protobuf object
user = Parse(json_string, user_pb2.User())

print(f"Name: {user.name}")
print(f"Active: {user.is_active}")

# Output:
# Name: Bob
# Active: True

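This is useful for accepting JSON input and converting it to Protobuf. Parse() accepts either the JSON name (isActive) or the original field name (is_active), and it raises json_format.ParseError if the JSON contains a field the schema doesn't define, unless you pass ignore_unknown_fields=True.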

Real-World Example: User Service

Let's build a complete example that saves and loads user data:

user_service.py

import user_pb2
import os

class UserService:
    def __init__(self, storage_path='users.pb'):
        self.storage_path = storage_path
    
    def save_user(self, user_id, name, email, is_active=True, hobbies=None):
        """Save a user to file, overwriting any previously saved user"""
        user = user_pb2.User()
        user.id = user_id
        user.name = name
        user.email = email
        user.is_active = is_active
        
        if hobbies:
            user.hobbies.extend(hobbies)
        
        # Write to file
        with open(self.storage_path, 'wb') as f:
            f.write(user.SerializeToString())
        
        print(f"User {name} saved successfully!")
        return user
    
    def load_user(self):
        """Load the saved user from file, or return None if there is none"""
        if not os.path.exists(self.storage_path):
            print("No user file found")
            return None
        
        user = user_pb2.User()
        with open(self.storage_path, 'rb') as f:
            user.ParseFromString(f.read())
        
        return user
    
    def display_user(self, user):
        """Display user information"""
        print("\nUser Information:")
        print(f"  ID: {user.id}")
        print(f"  Name: {user.name}")
        print(f"  Email: {user.email}")
        print(f"  Active: {user.is_active}")
        print(f"  Hobbies: {', '.join(user.hobbies)}")

# Usage
if __name__ == "__main__":
    service = UserService()
    
    # Save a user
    user = service.save_user(
        user_id=1,
        name="Alice",
        email="alice@example.com",
        is_active=True,
        hobbies=["reading", "coding", "hiking"]
    )
    
    # Load and display
    loaded_user = service.load_user()
    if loaded_user:
        service.display_user(loaded_user)

Best Practices

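1. Never change field numbers: Once assigned, a field number must stay the same, because it is what identifies the field in serialized data
2. Use descriptive field names: Clear names make your schema self-documenting
3. Add new fields with new numbers: For backward compatibility, always use new field numbers, and never reuse the number of a deleted field (mark it reserved in the .proto file instead)
4. Handle missing fields gracefully: In proto3, an unset scalar field reads as its default value (0, "", False); HasField() works only for message fields, oneof members, and fields declared optional, so decide how your code should treat default values
5. Version your .proto files: Use comments to track changes and versions
6. Use binary mode for files: Always open files with 'wb' and 'rb', since serialized data is bytes, not text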

Summary

Protocol Buffers with Python provides an efficient way to serialize and deserialize structured data. Whether you're building microservices, mobile backends, or data pipelines, Protobuf generally gives you smaller payloads and faster parsing than text formats like JSON, along with a schema shared by every program that reads the data.

  • Install protoc and the protobuf Python library
  • Define schemas in .proto files
  • Generate Python code with protoc
  • Use SerializeToString() and ParseFromString()
  • Convert to/from JSON when needed for debugging

Next Steps: Learn how Protobuf compares to JSON or explore gRPC with Python for building high-performance APIs.