How to Use Protocol Buffers in Python

A practical guide to working with Protocol Buffers in Python applications

Published: January 2025 • 8 min read

Protocol Buffers (protobuf) is supported in Python through the official protobuf package. It's commonly used in data science, backend services, and microservices where compact, fast serialization matters. You describe your messages once in a .proto file, and the compiler generates the Python classes for you.

This guide shows you everything you need to get started with Protocol Buffers in Python, from installation to writing your first working examples.

What You'll Need

  • Python: Version 3.7 or higher (check with python --version)
  • pip: Python package manager (usually comes with Python)
  • Protocol Buffer Compiler: We'll install this next

Step 1: Install Protocol Buffers

Install the Python protobuf library:

pip install protobuf

Next, install the Protocol Buffer compiler. On Windows:

# Download from https://github.com/protocolbuffers/protobuf/releases
# Extract and add to PATH, or use chocolatey:
choco install protoc

On macOS:

brew install protobuf

On Linux:

sudo apt install protobuf-compiler

Verify installation:

protoc --version
# Prints the compiler version, e.g. libprotoc 3.21.12 or, on newer releases, libprotoc 25.1
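
The compiler and the Python package are installed separately, so it can help to see both versions side by side. The following is a minimal sketch using only the standard library; the function names protoc_version and runtime_version are made up for this illustration.

```python
import shutil
import subprocess

def protoc_version():
    """Return the output of `protoc --version`, or None if protoc is not on PATH."""
    protoc_path = shutil.which("protoc")
    if protoc_path is None:
        return None
    result = subprocess.run(
        [protoc_path, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() or None

def runtime_version():
    """Return the installed protobuf Python package version, or None if missing."""
    try:
        import google.protobuf
    except ImportError:
        return None
    return google.protobuf.__version__

print("protoc:", protoc_version() or "not found on PATH")
print("protobuf package:", runtime_version() or "not installed")
```

If either line reports a missing component, revisit the matching install command above before continuing.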

Step 2: Set Up Your Project

Create a simple project structure:

protobuf-python-example/
├── protos/
│   └── person.proto
├── generated/
│   └── (generated files go here)
└── main.py

You can create this manually or with these commands:

mkdir protobuf-python-example
cd protobuf-python-example
mkdir protos generated
touch main.py
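
The touch command is not available in the Windows command prompt, so here is a cross-platform sketch that builds the same layout with pathlib. The helper name create_project is an assumption for illustration; the folder names come from the structure shown above.

```python
from pathlib import Path

def create_project(root="protobuf-python-example"):
    """Create the tutorial's folders and an empty main.py; safe to re-run."""
    root_path = Path(root)
    for folder in ("protos", "generated"):
        (root_path / folder).mkdir(parents=True, exist_ok=True)
    (root_path / "main.py").touch(exist_ok=True)
    return root_path

if __name__ == "__main__":
    project = create_project()
    print(f"Created {project.resolve()}")
```

Running it a second time leaves existing files untouched, because both mkdir and touch are called with exist_ok=True.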

Step 3: Create Your .proto File

Create protos/person.proto:

syntax = "proto3";

package tutorial;

message Person {
  string name = 1;
  int64 id = 2;  // int64 because the example IDs exceed the int32 maximum (2,147,483,647)
  string email = 3;
  repeated string phone_numbers = 4;
  
  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }
  
  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }
  
  repeated PhoneNumber phones = 5;
  
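  // Additional fields. In proto3 any field may be left unset;
  // an unset field reads back as its default value ("" or false).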
  string address = 6;
  bool is_active = 7;
}

message AddressBook {
  repeated Person people = 1;
}

Key points:

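  • repeated - Becomes a list-like container in Python
  • message - Becomes a Python class
  • enum - Becomes integer constants (for example Person.MOBILE)
  • Numbers (1, 2, 3) are field tags that identify each field in the binary format, not values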
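
To make the point about field tags concrete, here is a minimal sketch in plain Python (no protobuf library needed) of how the wire format combines a field tag with a wire type into the key written before each value. The helper names encode_varint and encode_key are made up for this illustration; in practice the generated code does this for you.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)

WIRE_VARINT = 0  # int32, int64, bool, enum
WIRE_LEN = 2     # string, bytes, nested messages

def encode_key(field_number: int, wire_type: int) -> bytes:
    """Combine a field tag and wire type into the key written before a value."""
    return encode_varint((field_number << 3) | wire_type)

# "name" is field 1 and a string, so its key is (1 << 3) | 2 = 0x0A.
name_bytes = encode_key(1, WIRE_LEN) + encode_varint(len(b"Ana")) + b"Ana"
print(name_bytes.hex())  # 0a03416e61
```

Because only the tag appears on the wire, you can rename a field freely, but changing its number breaks compatibility with data that was already serialized.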

Step 4: Compile the .proto File

Run the protobuf compiler to generate Python code:

protoc -I=protos --python_out=generated protos/person.proto

This creates generated/person_pb2.py, the module that defines the Person and AddressBook classes.

Command breakdown:

  • -I=protos - Directory where protoc looks for .proto files and their imports
  • --python_out=generated - Directory where the generated Python code is written
  • protos/person.proto - The .proto file to compile
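
If you prefer to trigger compilation from a Python build script, the same command can be assembled and run with subprocess. This is a sketch under the assumption that protoc is on your PATH; build_protoc_command and compile_proto are hypothetical helper names, not part of the protobuf package.

```python
import subprocess
from pathlib import Path

def build_protoc_command(proto_file, proto_dir="protos", out_dir="generated"):
    """Build the argument list for the protoc call used in this step."""
    return [
        "protoc",
        f"-I={proto_dir}",
        f"--python_out={out_dir}",
        (Path(proto_dir) / proto_file).as_posix(),
    ]

def compile_proto(proto_file, proto_dir="protos", out_dir="generated"):
    """Run protoc and raise CalledProcessError if compilation fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_protoc_command(proto_file, proto_dir, out_dir), check=True)

if __name__ == "__main__":
    # Show the command without running it; call compile_proto() to execute.
    print(" ".join(build_protoc_command("person.proto")))
```

Keeping the command builder separate from the function that runs it makes the script easy to check without having protoc installed.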

Step 5: Use Protocol Buffers in Python

Create main.py:

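import sys

# Make the generated module importable. The path is relative to the current
# working directory, so run this script from the project root.
sys.path.append('generated')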

import person_pb2

def create_person():
    """Create a Person message"""
    person = person_pb2.Person()
    person.name = "Telecom Subscriber"
    person.id = 917123456789
    person.email = "subscriber@telecom.com"
    
    # Add simple phone numbers
    person.phone_numbers.append("+91-9876543210")
    person.phone_numbers.append("+91-9876543211")
    
    # Add structured phone numbers
    phone = person.phones.add()
    phone.number = "+91-9876543212"
    phone.type = person_pb2.Person.MOBILE
    
    # Additional fields
    person.address = "Cell Tower Sector A, Base Station 001"
    person.is_active = True
    
    return person

def serialize_example():
    """Serialize to binary"""
    person = create_person()
    
    print("Created Person:")
    print(f"Name: {person.name}")
    print(f"ID: {person.id}")
    print(f"Email: {person.email}")
    print(f"Phone numbers: {list(person.phone_numbers)}")
    
    # Serialize to bytes
    binary_data = person.SerializeToString()
    print(f"\nSerialized to {len(binary_data)} bytes")
    
    return binary_data

def deserialize_example(binary_data):
    """Deserialize from binary"""
    person = person_pb2.Person()
    person.ParseFromString(binary_data)
    
    print("\nDeserialized Person:")
    print(f"Name: {person.name}")
    print(f"ID: {person.id}")
    print(f"Email: {person.email}")
    print(f"Active: {person.is_active}")
    
    return person

def file_example():
    """Save to file and read back"""
    person = create_person()
    
    # Write to file
    with open("person.bin", "wb") as f:
        f.write(person.SerializeToString())
    print("\nSaved to person.bin")
    
    # Read from file
    person_from_file = person_pb2.Person()
    with open("person.bin", "rb") as f:
        person_from_file.ParseFromString(f.read())
    print(f"Read from file: {person_from_file.name}")

def address_book_example():
    """Work with multiple people"""
    address_book = person_pb2.AddressBook()
    
    # Add first person
    person1 = address_book.people.add()
    person1.name = "Telecom Subscriber"
    person1.id = 917123456789
    person1.email = "subscriber@telecom.com"
    
    # Add second person
    person2 = address_book.people.add()
    person2.name = "Network Admin"
    person2.id = 919876543210
    person2.email = "admin@telecom.com"
    
    print(f"\nAddress book has {len(address_book.people)} people")
    
    # Iterate through people
    for person in address_book.people:
        print(f"- {person.name} (ID: {person.id})")

if __name__ == "__main__":
    # Run examples
    binary_data = serialize_example()
    deserialize_example(binary_data)
    file_example()
    address_book_example()

Step 6: Run Your Application

Run the script:

python main.py

Expected output:

Created Person:
Name: Telecom Subscriber
ID: 917123456789
Email: subscriber@telecom.com
Phone numbers: ['+91-9876543210', '+91-9876543211']

Serialized to 142 bytes

Deserialized Person:
Name: Telecom Subscriber
ID: 917123456789
Email: subscriber@telecom.com
Active: True

Saved to person.bin
Read from file: Telecom Subscriber

Address book has 2 people
- Telecom Subscriber (ID: 917123456789)
- Network Admin (ID: 919876543210)

Common Operations in Python

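Check if Field is Set

# HasField() works only for fields that track presence: message fields,
# oneof members, and fields declared with the proto3 optional keyword.
# Calling it on a plain proto3 scalar such as email raises ValueError,
# so compare against the default value instead:
if person.email:
    print(f"Email: {person.email}")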

Clear a Field

person.ClearField('email')
# Or clear entire message
person.Clear()

Copy a Message

person2 = person_pb2.Person()
person2.CopyFrom(person1)

Merge Messages

# Merge person2 into person1: scalar fields set in person2 overwrite
# those in person1, and repeated fields are appended
person1.MergeFrom(person2)

Convert to JSON

from google.protobuf.json_format import MessageToJson

json_string = MessageToJson(person)
print(json_string)

Parse from JSON

from google.protobuf.json_format import Parse

json_str = '{"name": "Mobile User", "id": 919123456789}'
person = Parse(json_str, person_pb2.Person())

Print Debug String

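# print() shows the message in protobuf text format
print(person)
# text_format returns the same representation as a string;
# pass as_one_line=True for compact log lines
from google.protobuf import text_format
print(text_format.MessageToString(person))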

Working with Repeated Fields

Repeated fields work like Python lists:

# Add items one by one
person.phone_numbers.append("555-1111")
person.phone_numbers.append("555-2222")

# Extend with multiple items
numbers = ["555-3333", "555-4444"]
person.phone_numbers.extend(numbers)

# Get length
count = len(person.phone_numbers)

# Access by index
first_number = person.phone_numbers[0]

# Iterate
for number in person.phone_numbers:
    print(number)

# Clear all items
del person.phone_numbers[:]

# For nested messages, use add()
phone = person.phones.add()
phone.number = "555-7777"
phone.type = person_pb2.Person.HOME

Best Practices for Python

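Know When Field Presence Applies

Plain proto3 scalar fields always return a value: an unset field reads back as its default (empty string, 0, or False). HasField() works only for fields that track presence, such as message fields and fields declared with the optional keyword. Declare a field optional in the .proto file when you need to tell "unset" apart from "set to the default".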

Use Binary Mode for Files

Always open files in binary mode ('wb' or 'rb') when working with protobuf data.

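Handle DecodeError

Wrap ParseFromString() in try-except and catch google.protobuf.message.DecodeError to handle corrupted or truncated data gracefully. (ParseError is the exception raised by the JSON and text-format parsers.)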
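
Here is a minimal sketch of that try-except. ParseFromString() signals malformed binary input by raising google.protobuf.message.DecodeError. To keep the example runnable without the generated person_pb2 module, it uses the Timestamp type that ships with the protobuf package as a stand-in for Person; try_parse is a hypothetical helper name.

```python
from google.protobuf.message import DecodeError
from google.protobuf.timestamp_pb2 import Timestamp  # stand-in for person_pb2.Person

def try_parse(message, data: bytes) -> bool:
    """Parse data into message; return False instead of raising on bad input."""
    try:
        message.ParseFromString(data)
    except DecodeError as exc:
        print(f"Could not parse message: {exc}")
        return False
    return True

good_bytes = Timestamp(seconds=1700000000).SerializeToString()
print(try_parse(Timestamp(), good_bytes))  # True
print(try_parse(Timestamp(), b"\xff"))     # False: the input is truncated
```

Note that a successful parse only means the bytes are well-formed. Data written from a different message type can still parse without error, so validate important fields after parsing.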

Don't Edit Generated Files

Never modify *_pb2.py files. They're auto-generated and will be overwritten.

Using Virtual Environments

It's good practice to use a virtual environment:

# Create virtual environment
python -m venv venv

# Activate (Windows)
venv\Scripts\activate

# Activate (macOS/Linux)
source venv/bin/activate

# Install protobuf
pip install protobuf

# Create requirements.txt
pip freeze > requirements.txt

# Later, install from requirements
pip install -r requirements.txt

Using Protobuf with Web Frameworks

Here's a simple Flask example:

from flask import Flask, request, Response
import person_pb2

app = Flask(__name__)

@app.route('/api/person', methods=['POST'])
def create_person():
    # Parse binary protobuf from request
    person = person_pb2.Person()
    person.ParseFromString(request.data)
    
    # Process the person...
    print(f"Received: {person.name}")
    
    # Return binary protobuf
    return Response(
        person.SerializeToString(),
        mimetype='application/x-protobuf'
    )

@app.route('/api/person/json', methods=['POST'])
def create_person_json():
    # Accept JSON, return protobuf
    from google.protobuf.json_format import Parse
    
    person = Parse(request.data, person_pb2.Person())
    return Response(
        person.SerializeToString(),
        mimetype='application/x-protobuf'
    )
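
To call the binary endpoint, a client sends the serialized bytes as the request body. The sketch below uses only the standard library; the URL is an assumption (a Flask development server on localhost port 5000), and build_person_request and post_person are hypothetical helper names.

```python
import urllib.request

# Hypothetical URL: assumes the Flask app above runs locally on port 5000.
PERSON_URL = "http://localhost:5000/api/person"

def build_person_request(payload: bytes, url: str = PERSON_URL) -> urllib.request.Request:
    """Build a POST request that carries serialized protobuf bytes."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/x-protobuf"},
        method="POST",
    )

def post_person(payload: bytes, url: str = PERSON_URL) -> bytes:
    """Send the request and return the raw response body (also protobuf bytes)."""
    with urllib.request.urlopen(build_person_request(payload, url)) as response:
        return response.read()

# Typical use, given a person message built as in Step 5:
#   reply_bytes = post_person(person.SerializeToString())
#   reply = person_pb2.Person()
#   reply.ParseFromString(reply_bytes)
```

The response is parsed the same way as any other serialized message, so the client and server only need to share the compiled .proto definition.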

Common Issues

Issue: Cannot import person_pb2

Solution: Make sure you've compiled the .proto file and the generated folder is in your Python path. Use sys.path.append('generated') at the top of your script.
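
A relative path such as 'generated' is resolved against the current working directory, so the import can still fail when the script is started from another folder. One possible workaround, sketched here with a hypothetical helper named add_generated_to_path, is to build the path from the script's own location:

```python
import sys
from pathlib import Path

def add_generated_to_path(script_file, folder="generated"):
    """Put <script directory>/<folder> on sys.path and return that directory."""
    generated_dir = Path(script_file).resolve().parent / folder
    if str(generated_dir) not in sys.path:
        sys.path.insert(0, str(generated_dir))
    return generated_dir

# In main.py, call this before importing the generated module:
#   add_generated_to_path(__file__)
#   import person_pb2
```

With this in place, python main.py and python path/to/main.py both find the generated module.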

Issue: protoc command not found

Solution: Install the Protocol Buffer compiler using your system's package manager, or download from the official GitHub releases page.

Issue: TypeError with repeated fields

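Solution: Repeated message fields hold message objects, so appending a plain dict or string raises TypeError. Create each element with add() and then set its fields. Example: person.phones.add()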

Issue: DecodeError when reading files

Solution: Ensure you're opening files in binary mode ('rb'). Also check that the .proto schema matches the serialized data, meaning the file was written from the same message type you are parsing into.

Conclusion

Protocol Buffers fits naturally into Python's ecosystem. Generated message classes behave like ordinary Python objects, with attribute access for fields and list-like containers for repeated fields, so the API is quick to pick up.

Start with simple examples like we covered here, then expand to more complex use cases. Whether you're building microservices, data pipelines, or API clients, protobuf provides efficient serialization that scales well.