Protocol Buffers works seamlessly with Python. It's commonly used in data science, backend services, and microservices where efficient serialization matters. Python's simple syntax makes protobuf easy to learn and use.
This guide shows you everything you need to get started with Protocol Buffers in Python, from installation to writing your first working examples.
What You'll Need
- Python: version 3.9 or higher is recommended, since recent protobuf releases have dropped support for older versions (check with python --version)
- pip: Python package manager (usually comes with Python)
- Protocol Buffer Compiler (protoc): we'll install this next
Step 1: Install Protocol Buffers
Install the Python protobuf library:
pip install protobuf
Next, install the Protocol Buffer compiler (protoc). On Windows, download a release archive from https://github.com/protocolbuffers/protobuf/releases, extract it, and add its bin folder to your PATH, or use Chocolatey:
choco install protoc
On macOS:
brew install protobuf
On Debian/Ubuntu Linux:
sudo apt install protobuf-compiler
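Verify installation:
protoc --version   # prints something like libprotoc 3.21.12 or libprotoc 25.1
Code generated by a protoc older than 3.19 fails to import with current Python protobuf releases, so if your package manager installs an older compiler, use a release binary from GitHub instead.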
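To check both halves of the installation at once, here is a small sketch that reports the protoc version found on your PATH and the version of the installed Python runtime. The protobuf_versions() name is illustrative, not part of protobuf. As a rule of thumb, keep the two from a similar release: code generated by a much older or much newer protoc than the runtime can fail to import.

```python
import shutil
import subprocess


def protobuf_versions():
    """Return (protoc_version, python_runtime_version); either may be None."""
    protoc_version = None
    protoc_path = shutil.which("protoc")  # None if protoc is not on PATH
    if protoc_path:
        result = subprocess.run(
            [protoc_path, "--version"], capture_output=True, text=True, check=False
        )
        protoc_version = result.stdout.strip() or None  # e.g. "libprotoc ..."

    try:
        import google.protobuf
        runtime_version = google.protobuf.__version__
    except ImportError:  # "pip install protobuf" has not been run
        runtime_version = None

    return protoc_version, runtime_version


if __name__ == "__main__":
    protoc_version, runtime_version = protobuf_versions()
    print("protoc:", protoc_version or "not found on PATH")
    print("protobuf runtime:", runtime_version or "not installed")
```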
Step 2: Set Up Your Project
Create a simple project structure:
protobuf-python-example/
├── protos/
│   └── person.proto
├── generated/
│   └── (generated files go here)
└── main.py
You can create this manually or with these commands:
mkdir protobuf-python-example
cd protobuf-python-example
mkdir protos generated
touch main.py
Step 3: Create Your .proto File
Create protos/person.proto:
syntax = "proto3";

package tutorial;

message Person {
  string name = 1;
  int64 id = 2;  // int64: the example IDs used below exceed the int32 range
  string email = 3;
  repeated string phone_numbers = 4;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }

  repeated PhoneNumber phones = 5;

  // Optional fields: the optional keyword enables presence tracking (HasField)
  optional string address = 6;
  optional bool is_active = 7;
}

message AddressBook {
  repeated Person people = 1;
}
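Key points:
- repeated fields behave like Python lists
- each message becomes a Python class; nested messages and enums become attributes of the outer class (for example person_pb2.Person.PhoneNumber and person_pb2.Person.MOBILE)
- in a message, the numbers after = (1, 2, 3, ...) are field numbers that identify each field in the binary encoding, not default values; in an enum, the numbers are the enum values themselves, and proto3 requires the first one to be 0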
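Those field numbers, not the field names, are what gets written to the wire: each encoded field starts with a key equal to (field_number << 3) | wire_type. The sketch below computes the key byte for each Person field in plain Python; tag_byte() and the constants are illustrative names. Because numbers 1-15 fit in a single key byte, they are best kept for frequently used fields. Changing a field's number after data has been written breaks compatibility with that data, while renaming the field does not affect the binary encoding.

```python
# Wire types from the protobuf encoding rules
VARINT = 0            # int32, int64, bool, enum
LENGTH_DELIMITED = 2  # string, bytes, nested messages


def tag_byte(field_number: int, wire_type: int) -> int:
    """Compute the key byte written before each field.

    Field numbers 1-15 fit in one byte; larger numbers need a
    multi-byte varint, which this sketch does not handle.
    """
    if not 1 <= field_number <= 15:
        raise ValueError("sketch only handles field numbers 1-15")
    return (field_number << 3) | wire_type


# Field numbers and wire types taken from person.proto
person_fields = {
    "name": (1, LENGTH_DELIMITED),
    "id": (2, VARINT),
    "email": (3, LENGTH_DELIMITED),
    "phone_numbers": (4, LENGTH_DELIMITED),
    "phones": (5, LENGTH_DELIMITED),
    "address": (6, LENGTH_DELIMITED),
    "is_active": (7, VARINT),
}

for name, (number, wire_type) in person_fields.items():
    print(f"{name:>13}: field {number} -> key byte 0x{tag_byte(number, wire_type):02x}")
```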
Step 4: Compile the .proto File
Run the protobuf compiler to generate Python code:
protoc -I=protos --python_out=generated protos/person.proto
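This creates generated/person_pb2.py, which contains the generated classes (Person, Person.PhoneNumber, and AddressBook). The module name comes from the file name (person.proto becomes person_pb2), not from the package declaration.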
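Command breakdown:
- -I=protos: the import path where protoc looks for .proto files and their imports
- --python_out=generated: the directory where the generated Python module is written (it must already exist)
- protos/person.proto: the file to compile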
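If your protos folder grows beyond one file, a small helper script can regenerate everything with the same flags. This is a sketch: build_protoc_command() and compile_protos() are hypothetical helper names, and it assumes protoc is on your PATH. Because protoc does not create the output directory for you, the helper creates it first.

```python
import shutil
import subprocess
from pathlib import Path


def build_protoc_command(proto_dir: str, out_dir: str) -> list[str]:
    """Build a protoc command that compiles every .proto file in proto_dir."""
    proto_files = sorted(str(p) for p in Path(proto_dir).glob("*.proto"))
    if not proto_files:
        raise FileNotFoundError(f"no .proto files found in {proto_dir}")
    return ["protoc", f"-I={proto_dir}", f"--python_out={out_dir}", *proto_files]


def compile_protos(proto_dir: str = "protos", out_dir: str = "generated") -> None:
    """Run protoc, raising if it is missing or compilation fails."""
    if shutil.which("protoc") is None:
        raise RuntimeError("protoc not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)  # protoc won't create it
    subprocess.run(build_protoc_command(proto_dir, out_dir), check=True)
```

Call compile_protos() from the project root (for example from a small build script) whenever you change a .proto file.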
Step 5: Use Protocol Buffers in Python
Create main.py:
import sys

# Make generated/person_pb2.py importable (assumes you run this script from the project root)
sys.path.append('generated')

import person_pb2


def create_person():
    """Create a Person message"""
    person = person_pb2.Person()
    person.name = "Telecom Subscriber"
    person.id = 917123456789  # larger than int32 allows, so id must be int64 in person.proto
    person.email = "subscriber@telecom.com"
    # Add simple phone numbers
    person.phone_numbers.append("+91-9876543210")
    person.phone_numbers.append("+91-9876543211")
    # Add structured phone numbers
    phone = person.phones.add()
    phone.number = "+91-9876543212"
    phone.type = person_pb2.Person.MOBILE
    # Optional fields
    person.address = "Cell Tower Sector A, Base Station 001"
    person.is_active = True
    return person


def serialize_example():
    """Serialize to binary"""
    person = create_person()
    print("Created Person:")
    print(f"Name: {person.name}")
    print(f"ID: {person.id}")
    print(f"Email: {person.email}")
    print(f"Phone numbers: {list(person.phone_numbers)}")
    # Serialize to bytes
    binary_data = person.SerializeToString()
    print(f"\nSerialized to {len(binary_data)} bytes")
    return binary_data


def deserialize_example(binary_data):
    """Deserialize from binary"""
    person = person_pb2.Person()
    person.ParseFromString(binary_data)
    print("\nDeserialized Person:")
    print(f"Name: {person.name}")
    print(f"ID: {person.id}")
    print(f"Email: {person.email}")
    print(f"Active: {person.is_active}")
    return person


def file_example():
    """Save to file and read back"""
    person = create_person()
    # Write to file
    with open("person.bin", "wb") as f:
        f.write(person.SerializeToString())
    print("\nSaved to person.bin")
    # Read from file
    person_from_file = person_pb2.Person()
    with open("person.bin", "rb") as f:
        person_from_file.ParseFromString(f.read())
    print(f"Read from file: {person_from_file.name}")


def address_book_example():
    """Work with multiple people"""
    address_book = person_pb2.AddressBook()
    # Add first person
    person1 = address_book.people.add()
    person1.name = "Telecom Subscriber"
    person1.id = 917123456789
    person1.email = "subscriber@telecom.com"
    # Add second person
    person2 = address_book.people.add()
    person2.name = "Network Admin"
    person2.id = 919876543210
    person2.email = "admin@telecom.com"
    print(f"\nAddress book has {len(address_book.people)} people")
    # Iterate through people
    for person in address_book.people:
        print(f"- {person.name} (ID: {person.id})")


if __name__ == "__main__":
    # Run examples
    binary_data = serialize_example()
    deserialize_example(binary_data)
    file_example()
    address_book_example()
Step 6: Run Your Application
Run the script:
python main.py
Expected output:
Created Person:
Name: Telecom Subscriber
ID: 917123456789
Email: subscriber@telecom.com
Phone numbers: ['+91-9876543210', '+91-9876543211']

Serialized to 142 bytes

Deserialized Person:
Name: Telecom Subscriber
ID: 917123456789
Email: subscriber@telecom.com
Active: True

Saved to person.bin
Read from file: Telecom Subscriber

Address book has 2 people
- Telecom Subscriber (ID: 917123456789)
- Network Admin (ID: 919876543210)
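The serialized size can be reproduced by hand, which shows how compact the encoding is. This pure-Python sketch applies the wire-format rules to the values set in create_person(): each field costs one key byte (all field numbers are below 16), strings add a length prefix, integers are varints, and the nested PhoneNumber is itself length-prefixed. It assumes id is declared int64 (needed for these ID values) and that, as in proto3, the default-valued type = MOBILE (0) is not written. The helper names are illustrative.

```python
def varint_size(value: int) -> int:
    """Bytes needed to encode a non-negative integer as a base-128 varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def string_field_size(text: str) -> int:
    """1-byte key + varint length prefix + UTF-8 payload (field numbers 1-15)."""
    payload = len(text.encode("utf-8"))
    return 1 + varint_size(payload) + payload


def varint_field_size(value: int) -> int:
    """1-byte key + varint payload (field numbers 1-15)."""
    return 1 + varint_size(value)


# Values copied from create_person() in main.py
phone_entry = string_field_size("+91-9876543212")  # type=MOBILE (0) is the default, so omitted
size = (
    string_field_size("Telecom Subscriber")                       # name
    + varint_field_size(917123456789)                             # id (int64)
    + string_field_size("subscriber@telecom.com")                 # email
    + string_field_size("+91-9876543210")                         # phone_numbers[0]
    + string_field_size("+91-9876543211")                         # phone_numbers[1]
    + 1 + varint_size(phone_entry) + phone_entry                  # phones[0], nested message
    + string_field_size("Cell Tower Sector A, Base Station 001")  # address
    + varint_field_size(1)                                        # is_active = True
)
print(size)  # 142
```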
Common Operations in Python
Check if Field is Set
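HasField() only works for fields that track presence: fields declared optional, singular message fields, and oneof members. Calling it on a plain proto3 scalar such as email raises ValueError, so check such fields against their default value instead:
if person.HasField('address'):  # requires: optional string address = 6;
    print(f"Address: {person.address}")
if person.email:  # plain proto3 string: "" means unset or empty
    print(f"Email: {person.email}")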
Clear a Field
person.ClearField('email')
# Or clear the entire message
person.Clear()
Copy a Message
person2 = person_pb2.Person()
person2.CopyFrom(person1)
Merge Messages
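# Merge person2 into person1: singular fields set in person2 overwrite
# those in person1, and repeated fields from person2 are appended
person1.MergeFrom(person2)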
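The merge rules are easiest to see on a concrete message. So that the sketch runs without compiling person.proto, it uses descriptor_pb2.FileDescriptorProto, a message that ships with the protobuf package and has singular string fields (name, package) and a repeated string field (dependency). Keep in mind that for plain proto3 fields without optional, "set" means "holds a non-default value", so an empty string or 0 in the source never overwrites the target.

```python
from google.protobuf import descriptor_pb2

# FileDescriptorProto stands in for Person: name/package are singular
# strings (like email), dependency is a repeated string (like phone_numbers).
target = descriptor_pb2.FileDescriptorProto(
    name="a.proto", package="tutorial", dependency=["x.proto"]
)
source = descriptor_pb2.FileDescriptorProto(name="b.proto", dependency=["y.proto"])

target.MergeFrom(source)

print(target.name)              # b.proto  (set in source, so overwritten)
print(target.package)           # tutorial (not set in source, so kept)
print(list(target.dependency))  # ['x.proto', 'y.proto'] (repeated fields are appended)
```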
Convert to JSON
from google.protobuf.json_format import MessageToJson

json_string = MessageToJson(person)
print(json_string)
Parse from JSON
from google.protobuf.json_format import Parse

json_str = '{"name": "Mobile User", "id": 919123456789}'
person = Parse(json_str, person_pb2.Person())
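By default, MessageToJson() and MessageToDict() rename fields to lowerCamelCase, so phone_numbers appears as "phoneNumbers" in the output, and 64-bit integers such as an int64 id are written as JSON strings. The sketch below uses descriptor_pb2.FieldDescriptorProto, which ships with protobuf, as a stand-in for Person because its type_name field contains an underscore. Passing preserving_proto_field_name=True keeps the original names, and Parse() accepts either spelling.

```python
from google.protobuf import descriptor_pb2
from google.protobuf.json_format import MessageToDict, Parse

# Stand-in message whose type_name field has an underscore, like phone_numbers
field = descriptor_pb2.FieldDescriptorProto(
    name="phones", type_name=".tutorial.Person.PhoneNumber"
)

print(MessageToDict(field))
# {'name': 'phones', 'typeName': '.tutorial.Person.PhoneNumber'}
print(MessageToDict(field, preserving_proto_field_name=True))
# {'name': 'phones', 'type_name': '.tutorial.Person.PhoneNumber'}

# Parse() accepts the original proto field name as well as the camelCase one
parsed = Parse('{"type_name": ".x.Y"}', descriptor_pb2.FieldDescriptorProto())
print(parsed.type_name)  # .x.Y
```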
Print Debug String
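# print() shows the message in protobuf text format, which is handy for debugging
print(person)

# text_format produces the same format with extra options, e.g. a single line:
from google.protobuf import text_format
print(text_format.MessageToString(person, as_one_line=True))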
Working with Repeated Fields
Repeated fields work like Python lists:
# Add items one by one
person.phone_numbers.append("555-1111")
person.phone_numbers.append("555-2222")

# Extend with multiple items
numbers = ["555-3333", "555-4444"]
person.phone_numbers.extend(numbers)

# Get length
count = len(person.phone_numbers)

# Access by index
first_number = person.phone_numbers[0]

# Iterate
for number in person.phone_numbers:
    print(number)

# Clear all items
del person.phone_numbers[:]

# For nested messages, use add()
phone = person.phones.add()
phone.number = "555-7777"
phone.type = person_pb2.Person.HOME
Best Practices for Python
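Always Check Field Presence
An unset field reads as its default value ("", 0, False) instead of raising an error. When you need to tell "not set" apart from "set to the default", declare the field optional in the .proto file and check it with HasField().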
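The difference between "unset" and "set to the default" is easiest to see in code. This sketch uses two messages that ship with protobuf rather than person_pb2: FileDescriptorProto, whose package field tracks presence the way a proto3 optional field does, and Timestamp, whose seconds field is a plain proto3 scalar like Person.email.

```python
from google.protobuf import descriptor_pb2, timestamp_pb2

# Field with explicit presence (behaves like a proto3 `optional` field)
msg = descriptor_pb2.FileDescriptorProto()
print(repr(msg.package))        # '' : unset fields read as their default
print(msg.HasField("package"))  # False
msg.package = ""                # explicitly set to the default value
print(msg.HasField("package"))  # True: presence is tracked separately from the value

# Plain proto3 scalar without `optional` (like Person.email): no presence
ts = timestamp_pb2.Timestamp()
try:
    ts.HasField("seconds")
except ValueError as err:
    print("ValueError:", err)
```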
Use Binary Mode for Files
Always open files in binary mode ('wb' or 'rb') when working with protobuf data.
Handle DecodeError
Wrap ParseFromString() in a try/except that catches google.protobuf.message.DecodeError, so corrupted or truncated data is handled gracefully.
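A minimal sketch of that pattern follows. parse_or_none() is a hypothetical helper name, and FileDescriptorProto stands in for person_pb2.Person so the code runs without generated files. DecodeError only catches malformed bytes: well-formed data written with a different schema can still parse without error, so this is not a substitute for knowing which message type a payload contains.

```python
from google.protobuf import descriptor_pb2
from google.protobuf.message import DecodeError


def parse_or_none(message_class, data: bytes):
    """Parse data into a new message_class instance, or return None if it is corrupt."""
    message = message_class()
    try:
        message.ParseFromString(data)
    except DecodeError:
        return None
    return message


good = parse_or_none(descriptor_pb2.FileDescriptorProto, b"\x0a\x03abc")
# Length prefix says 5 bytes, but only 3 follow: truncated data
bad = parse_or_none(descriptor_pb2.FileDescriptorProto, b"\x0a\x05abc")
print(good.name)  # abc
print(bad)        # None
```

With the generated module you would call parse_or_none(person_pb2.Person, data).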
Don't Edit Generated Files
Never modify *_pb2.py files. They're auto-generated and will be overwritten.
Using Virtual Environments
It's good practice to use a virtual environment:
# Create virtual environment
python -m venv venv

# Activate (Windows)
venv\Scripts\activate

# Activate (macOS/Linux)
source venv/bin/activate

# Install protobuf
pip install protobuf

# Create requirements.txt
pip freeze > requirements.txt

# Later, install from requirements
pip install -r requirements.txt
Using Protobuf with Web Frameworks
Here's a simple Flask example:
from flask import Flask, request, Response
from google.protobuf.json_format import Parse, ParseError
from google.protobuf.message import DecodeError

import person_pb2  # generated module; make sure generated/ is on sys.path

app = Flask(__name__)


@app.route('/api/person', methods=['POST'])
def create_person():
    # Parse binary protobuf from the request body; reject corrupt payloads
    person = person_pb2.Person()
    try:
        person.ParseFromString(request.data)
    except DecodeError:
        return Response("Invalid protobuf payload", status=400)
    # Process the person...
    print(f"Received: {person.name}")
    # Return binary protobuf
    return Response(
        person.SerializeToString(),
        mimetype='application/x-protobuf'
    )


@app.route('/api/person/json', methods=['POST'])
def create_person_json():
    # Accept JSON, return binary protobuf
    try:
        person = Parse(request.get_data(as_text=True), person_pb2.Person())
    except ParseError:
        return Response("Invalid JSON payload", status=400)
    return Response(
        person.SerializeToString(),
        mimetype='application/x-protobuf'
    )
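On the client side, the standard library is enough to call the binary endpoint. This sketch assumes the Flask app is running locally on port 5000; build_person_request() is a hypothetical helper name.

```python
import urllib.request


def build_person_request(url: str, payload: bytes) -> urllib.request.Request:
    """Build a POST request carrying a serialized protobuf message."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/x-protobuf"},
        method="POST",
    )


# Usage (assumes the server is running and person_pb2 is importable):
# req = build_person_request("http://localhost:5000/api/person", person.SerializeToString())
# with urllib.request.urlopen(req) as resp:
#     reply = person_pb2.Person()
#     reply.ParseFromString(resp.read())
```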
Common Issues
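Issue: Cannot import person_pb2
Solution: Make sure you've compiled the .proto file and that the generated folder is on your Python path. Adding sys.path.append('generated') at the top of your script works when you run it from the project root; if you start it from another folder, build the path relative to the script instead.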
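Because sys.path.append('generated') is resolved against the current working directory, the import breaks when the script is started from another folder. A sketch of a more robust setup resolves the folder relative to a base directory you pass in; add_generated_dir() is a hypothetical helper name.

```python
import sys
from pathlib import Path


def add_generated_dir(base_dir: Path) -> Path:
    """Put <base_dir>/generated on sys.path so `import person_pb2` works from any CWD."""
    generated = (base_dir / "generated").resolve()
    if not generated.is_dir():
        raise FileNotFoundError(f"{generated} does not exist; run protoc first")
    if str(generated) not in sys.path:
        sys.path.insert(0, str(generated))
    return generated
```

In main.py you would call add_generated_dir(Path(__file__).resolve().parent) before import person_pb2.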
Issue: protoc command not found
Solution: Install the Protocol Buffer compiler using your system's package manager, or download from the official GitHub releases page.
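Issue: TypeError with repeated fields
Solution: This usually means a value of the wrong type was appended, for example a string to person.phones. For nested messages, use add() to create the element in place (person.phones.add()), or append() a person_pb2.Person.PhoneNumber instance, which is copied into the list. Assigning a list directly (person.phones = [...]) is not allowed either and raises AttributeError.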
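Issue: DecodeError when reading files
Solution: Ensure you're opening files in binary mode ('rb'). Also check that the .proto schema matches the one used to write the data; a mismatched schema does not always raise an error and may instead produce wrong field values.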
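Another frequent cause of corrupted reads is writing several messages into one file back to back. The protobuf encoding is not self-delimiting: calling ParseFromString() on two concatenated Person messages merges them into a single Person rather than returning two. The usual fix is to prefix each message with its length as a varint, the same convention Java's writeDelimitedTo() uses. The sketch below works on raw bytes, so it pairs with SerializeToString() and ParseFromString(); the function names are illustrative.

```python
def _write_varint(stream, value: int) -> None:
    """Write a non-negative int as a base-128 varint (low 7 bits first)."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            stream.write(bytes([byte | 0x80]))  # more bytes follow
        else:
            stream.write(bytes([byte]))
            return


def _read_varint(stream):
    """Read a varint; return None at a clean end of stream."""
    shift = 0
    result = 0
    while True:
        raw = stream.read(1)
        if not raw:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        result |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:
            return result
        shift += 7


def write_delimited(stream, payload: bytes) -> None:
    """Write one serialized message prefixed with its length."""
    _write_varint(stream, len(payload))
    stream.write(payload)


def read_delimited(stream):
    """Yield each length-prefixed payload until the stream ends."""
    while (size := _read_varint(stream)) is not None:
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("truncated message")
        yield payload
```

For example, call write_delimited(f, person.SerializeToString()) for each person when writing; when reading, create a fresh person_pb2.Person() for each payload yielded by read_delimited(f) and call ParseFromString() on it.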
Conclusion
Protocol Buffers fits naturally into Python's ecosystem. The generated classes behave like ordinary Python objects: fields are attributes, repeated fields act like lists, and a single method call serializes or parses a message.
Start with simple examples like the ones covered here, then expand to more complex use cases. Whether you're building microservices, data pipelines, or API clients, protobuf gives you compact binary serialization with a schema that can be shared across services and languages.