
Python XML Tutorial

Complete guide to parsing, creating, and processing XML files in Python with ElementTree and lxml


Introduction to XML in Python

Python provides excellent built-in and third-party libraries for working with XML files. Whether you need to parse configuration files, process data feeds, or interact with XML-based APIs, Python makes it straightforward and efficient.

In this comprehensive tutorial, you'll learn how to:

  • Parse XML files and extract data
  • Navigate XML tree structures
  • Create new XML documents from scratch
  • Modify existing XML elements and attributes
  • Handle namespaces and validation
  • Process large XML files efficiently

Sample XML File

We'll use this XML throughout the tutorial:

XML
<?xml version="1.0" encoding="UTF-8"?>
<library>
    <book id="001" category="fiction">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <year>1925</year>
        <price currency="USD">10.99</price>
        <available>true</available>
    </book>
    <book id="002" category="fiction">
        <title>1984</title>
        <author>George Orwell</author>
        <year>1949</year>
        <price currency="USD">8.99</price>
        <available>true</available>
    </book>
    <book id="003" category="programming">
        <title>Clean Code</title>
        <author>Robert C. Martin</author>
        <year>2008</year>
        <price currency="USD">45.99</price>
        <available>false</available>
    </book>
</library>

Save this as library.xml to follow along.

Python XML Libraries

Python offers several libraries for XML processing. Here are the main options:

xml.etree.ElementTree

Built-in library

  • Part of Python standard library
  • No installation needed
  • Simple and lightweight
  • Good for most use cases

Recommended for: Beginners, simple XML tasks

lxml

Third-party library

  • More features and faster
  • XPath and XSLT support
  • Better error handling
  • XML validation support

Install:

pip install lxml

Recommended for: Advanced features, performance

xml.dom.minidom

Built-in DOM parser

  • DOM-style API
  • Loads entire document
  • More verbose syntax

Recommended for: DOM-style parsing
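
To give a feel for the DOM-style API, here is a minimal sketch that reads an attribute and a text node with minidom. The inline sample string and the variable names are assumptions made for this illustration.

```python
from xml.dom import minidom

# Inline sample so the snippet runs without a file
sample = '<library><book id="001"><title>The Great Gatsby</title></book></library>'
dom = minidom.parseString(sample)

found = []
for book in dom.getElementsByTagName("book"):
    book_id = book.getAttribute("id")                  # attributes via getAttribute()
    title_node = book.getElementsByTagName("title")[0]
    title = title_node.firstChild.data                 # text lives in a child text node
    found.append((book_id, title))
    print(book_id, title)
```

Compared with ElementTree, the text is one level deeper (a separate text node), which is where most of the extra verbosity comes from.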

xml.sax

Event-driven parser

  • Streaming parser
  • Memory efficient
  • More complex to use

Recommended for: Very large files
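
As a rough illustration of the event-driven style, the sketch below collects book titles with xml.sax. The handler class name TitleHandler and the inline sample are assumptions made for this example, not part of the tutorial's sample code.

```python
import xml.sax

class TitleHandler(xml.sax.ContentHandler):
    """Collects the text of every <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

sample = (
    b"<library>"
    b"<book><title>1984</title></book>"
    b"<book><title>Clean Code</title></book>"
    b"</library>"
)

handler = TitleHandler()
xml.sax.parseString(sample, handler)
print(handler.titles)
```

The parser never builds a tree; the handler keeps only the state it needs, which is why this style suits very large inputs.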

Tutorial Focus:

This tutorial focuses on ElementTree (built-in), with lxml examples where they add something ElementTree lacks. ElementTree is a good place to start and covers most everyday XML tasks.

Parsing XML Files

Method 1: Parse from File

The most common way to parse XML is from a file:

Python
import xml.etree.ElementTree as ET

# Parse XML file
tree = ET.parse('library.xml')
root = tree.getroot()

# Get root element info
print(f"Root tag: {root.tag}")
print(f"Root attributes: {root.attrib}")
print(f"Number of children: {len(root)}")

# Output:
# Root tag: library
# Root attributes: {}
# Number of children: 3
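
Once the root is available, navigating the tree mostly comes down to findall(), find(), findtext(), get(), and iter(). The following is a minimal sketch using a shortened inline copy of the sample data; the variable names are chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the sample data so the snippet runs on its own
xml_string = """
<library>
    <book id="001" category="fiction">
        <title>The Great Gatsby</title>
        <year>1925</year>
    </book>
    <book id="003" category="programming">
        <title>Clean Code</title>
        <year>2008</year>
    </book>
</library>
"""
root = ET.fromstring(xml_string)

summaries = []
for book in root.findall("book"):        # direct children named <book>
    book_id = book.get("id")             # attribute lookup, None if missing
    title = book.findtext("title")       # text of the first <title> child
    year = int(book.findtext("year"))    # element text is always a string
    summaries.append((book_id, title, year))
    print(f"{book_id}: {title} ({year})")

# iter() walks the whole subtree, not only direct children
all_titles = [t.text for t in root.iter("title")]
print(all_titles)
```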

Method 2: Parse from String

Parse XML directly from a string:

Python
import xml.etree.ElementTree as ET

xml_string = """
<book id="001">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
</book>
"""

# Parse from string
root = ET.fromstring(xml_string)

print(f"Tag: {root.tag}")
print(f"ID: {root.attrib['id']}")
print(f"Title: {root.find('title').text}")

# Output:
# Tag: book
# ID: 001
# Title: The Great Gatsby

Method 3: Parse from URL

Parse XML from a remote URL:

Python
import xml.etree.ElementTree as ET
import urllib.request

# Download and parse XML from URL (the timeout keeps a stalled server from hanging the script)
url = "https://example.com/data.xml"
with urllib.request.urlopen(url, timeout=10) as response:
    xml_data = response.read()

root = ET.fromstring(xml_data)
print(f"Root element: {root.tag}")

Error Handling:

Python
import xml.etree.ElementTree as ET

try:
    tree = ET.parse('library.xml')
    root = tree.getroot()
except ET.ParseError as e:
    print(f"Parse error: {e}")
except FileNotFoundError:
    print("File not found")
except Exception as e:
    print(f"Error: {e}")
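
ParseError also reports where the problem was found, which helps when the input is large. A small sketch, assuming a deliberately malformed inline string:

```python
import xml.etree.ElementTree as ET

broken = "<library><book></library>"  # <book> is never closed

try:
    ET.fromstring(broken)
    error_position = None
except ET.ParseError as e:
    # position is a (line, column) tuple pointing at the failure
    error_position = e.position
    line, column = e.position
    print(f"Malformed XML at line {line}, column {column}: {e}")
```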

Creating XML Documents

Building XML from Scratch

Python
import xml.etree.ElementTree as ET

# Create root element
root = ET.Element('library')

# Create first book
book1 = ET.SubElement(root, 'book')
book1.set('id', '001')
book1.set('category', 'fiction')

# Add child elements
title1 = ET.SubElement(book1, 'title')
title1.text = 'The Great Gatsby'

author1 = ET.SubElement(book1, 'author')
author1.text = 'F. Scott Fitzgerald'

year1 = ET.SubElement(book1, 'year')
year1.text = '1925'

# Create second book
book2 = ET.SubElement(root, 'book')
book2.set('id', '002')
book2.set('category', 'fiction')

title2 = ET.SubElement(book2, 'title')
title2.text = '1984'

author2 = ET.SubElement(book2, 'author')
author2.text = 'George Orwell'

# Create tree and save
tree = ET.ElementTree(root)
ET.indent(tree, space="    ")  # Python 3.9+
tree.write('new_library.xml', encoding='utf-8', xml_declaration=True)

print("XML file created successfully!")

Pretty Printing XML

Python
import sys
import xml.etree.ElementTree as ET
import xml.dom.minidom as minidom

# Small tree to format (any Element works here)
root = ET.Element('library')
book = ET.SubElement(root, 'book', id='001')
ET.SubElement(book, 'title').text = 'The Great Gatsby'

if sys.version_info >= (3, 9):
    # For Python 3.9+, use ET.indent()
    tree = ET.ElementTree(root)
    ET.indent(tree, space="    ")
    tree.write('output.xml', encoding='utf-8', xml_declaration=True)
else:
    # For older Python versions, use xml.dom.minidom
    xml_string = ET.tostring(root, encoding='utf-8')
    dom = minidom.parseString(xml_string)
    pretty_xml = dom.toprettyxml(indent="    ")

    with open('output.xml', 'w', encoding='utf-8') as f:
        f.write(pretty_xml)

Creating from Dictionary

Python
import xml.etree.ElementTree as ET

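def dict_to_xml(tag, d):
    """Convert a dictionary to an XML Element"""
    elem = ET.Element(tag)
    for key, val in d.items():
        if isinstance(val, dict):
            elem.append(dict_to_xml(key, val))
        elif isinstance(val, list):
            # Naive singular form: 'books' -> 'book' (only strips a trailing 's')
            item_tag = key[:-1] if key.endswith('s') else key
            for item in val:
                if isinstance(item, dict):
                    elem.append(dict_to_xml(item_tag, item))
                else:
                    # Scalar list items become simple text elements
                    ET.SubElement(elem, item_tag).text = str(item)
        else:
            child = ET.SubElement(elem, key)
            child.text = str(val)
    return elem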

# Example usage
book_data = {
    'title': 'Python Programming',
    'author': 'John Doe',
    'year': 2025,
    'price': 39.99
}

book_elem = dict_to_xml('book', book_data)
tree = ET.ElementTree(book_elem)
tree.write('book.xml', encoding='utf-8', xml_declaration=True)

Modifying Existing XML

Update Element Values

Python
import xml.etree.ElementTree as ET

tree = ET.parse('library.xml')
root = tree.getroot()

# Update text content
first_book = root.find('book')
price = first_book.find('price')
price.text = '12.99'  # Update price

# Update attribute
first_book.set('category', 'classic')

# Save changes
tree.write('library_updated.xml', encoding='utf-8', xml_declaration=True)
print("XML updated successfully!")

Add New Elements

Python
import xml.etree.ElementTree as ET

tree = ET.parse('library.xml')
root = tree.getroot()

# Add new book
new_book = ET.SubElement(root, 'book')
new_book.set('id', '004')
new_book.set('category', 'programming')

title = ET.SubElement(new_book, 'title')
title.text = 'Python Crash Course'

author = ET.SubElement(new_book, 'author')
author.text = 'Eric Matthes'

year = ET.SubElement(new_book, 'year')
year.text = '2023'

# Add new child to existing element
first_book = root.find('book')
rating = ET.SubElement(first_book, 'rating')
rating.text = '4.5'

# Save
tree.write('library_expanded.xml', encoding='utf-8', xml_declaration=True)

Remove Elements

Python
import xml.etree.ElementTree as ET

tree = ET.parse('library.xml')
root = tree.getroot()

# Remove specific book by ID
for book in root.findall('book'):
    if book.get('id') == '003':
        root.remove(book)
        print("Book removed")

# Remove all unavailable books
for book in root.findall('book'):
    available = book.find('available')
    if available is not None and available.text == 'false':
        root.remove(book)

# Remove a child element
first_book = root.find('book')
year = first_book.find('year')
if year is not None:
    first_book.remove(year)

# Save
tree.write('library_cleaned.xml', encoding='utf-8', xml_declaration=True)

Complete CRUD Example

Python
import xml.etree.ElementTree as ET

class XMLLibrary:
    def __init__(self, filename):
        self.filename = filename
        self.tree = ET.parse(filename)
        self.root = self.tree.getroot()
    
    def add_book(self, book_id, title, author, year, price, category):
        """Create - Add new book"""
        book = ET.SubElement(self.root, 'book')
        book.set('id', book_id)
        book.set('category', category)
        
        ET.SubElement(book, 'title').text = title
        ET.SubElement(book, 'author').text = author
        ET.SubElement(book, 'year').text = str(year)
        ET.SubElement(book, 'price').text = str(price)
    
    def get_book(self, book_id):
        """Read - Get book by ID"""
        for book in self.root.findall('book'):
            if book.get('id') == book_id:
                return {
                    'id': book.get('id'),
                    'title': book.find('title').text,
                    'author': book.find('author').text,
                    'year': book.find('year').text
                }
        return None
    
    def update_book(self, book_id, **kwargs):
        """Update - Modify book details"""
        for book in self.root.findall('book'):
            if book.get('id') == book_id:
                for key, value in kwargs.items():
                    elem = book.find(key)
                    if elem is not None:
                        elem.text = str(value)
                return True
        return False
    
    def delete_book(self, book_id):
        """Delete - Remove book"""
        for book in self.root.findall('book'):
            if book.get('id') == book_id:
                self.root.remove(book)
                return True
        return False
    
    def save(self):
        """Save changes to file"""
        ET.indent(self.tree, space="    ")
        self.tree.write(self.filename, encoding='utf-8', xml_declaration=True)

# Usage
library = XMLLibrary('library.xml')

# Add new book
library.add_book('004', 'Design Patterns', 'Gang of Four', 1994, 54.99, 'programming')

# Update book
library.update_book('001', price='11.99', year='1926')

# Delete book
library.delete_book('002')

# Save all changes
library.save()
print("Library updated!")

Advanced Techniques

1. Working with Namespaces

XML
<?xml version="1.0"?>
<library xmlns="http://example.com/library"
         xmlns:pub="http://example.com/publisher">
    <book>
        <title>Example</title>
        <pub:publisher>Example Press</pub:publisher>
    </book>
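</library>

Save this as library_with_ns.xml.

Python
import xml.etree.ElementTree as ET

# Map prefixes to namespace URIs (the 'lib' prefix for the default namespace is our own choice)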
ns = {
    'lib': 'http://example.com/library',
    'pub': 'http://example.com/publisher'
}

tree = ET.parse('library_with_ns.xml')
root = tree.getroot()

# Find with namespace
books = root.findall('lib:book', ns)
for book in books:
    title = book.find('lib:title', ns).text
    publisher = book.find('pub:publisher', ns).text
    print(f"{title} - {publisher}")
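
Behind the prefix map, ElementTree stores each tag as `{namespace-uri}localname`. The sketch below, which uses an inline copy of the namespaced sample, shows that notation directly and uses register_namespace() to control the prefix written on output. Note that register_namespace() changes a global registry.

```python
import xml.etree.ElementTree as ET

xml_string = """
<library xmlns="http://example.com/library"
         xmlns:pub="http://example.com/publisher">
    <book>
        <title>Example</title>
        <pub:publisher>Example Press</pub:publisher>
    </book>
</library>
"""
root = ET.fromstring(xml_string)

# Parsed tags carry the namespace URI in braces
print(root.tag)

# The same notation works in find() without a prefix map
LIB = "{http://example.com/library}"
title = root.find(f"{LIB}book/{LIB}title")
print(title.text)

# register_namespace() sets the prefix used when serializing;
# unregistered namespaces get generated prefixes such as ns0
ET.register_namespace("pub", "http://example.com/publisher")
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```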

2. Processing Large Files (Streaming)

Python
import xml.etree.ElementTree as ET

def process_large_xml(filename):
    """Memory-efficient parsing of large XML files"""
    context = ET.iterparse(filename, events=('start', 'end'))
    context = iter(context)
    
    event, root = next(context)
    
    for event, elem in context:
        if event == 'end' and elem.tag == 'book':
            # Process book element
            title = elem.find('title').text
            author = elem.find('author').text
            print(f"{title} by {author}")
            
            # Clear element to free memory
            elem.clear()
            root.clear()

# Process without loading entire file into memory
process_large_xml('huge_library.xml')
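
To try the streaming pattern without a large file on disk, here is a self-contained sketch that feeds iterparse() an in-memory stream and keeps only running totals. The sample data and the variable names are assumptions for this illustration.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for a large file; iterparse accepts a filename or a file object
data = io.BytesIO(
    b"<library>"
    b"<book><title>A</title><price>10.00</price></book>"
    b"<book><title>B</title><price>5.50</price></book>"
    b"</library>"
)

count = 0
total = 0.0
for event, elem in ET.iterparse(data, events=("end",)):
    if elem.tag == "book":
        count += 1
        total += float(elem.findtext("price"))
        elem.clear()  # drop the children once they have been counted

print(f"{count} books, total price {total:.2f}")
```

Listening only for 'end' events is enough here because a book's children are complete by the time its closing tag is seen.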

3. XML to Dict Conversion

Python
import xml.etree.ElementTree as ET

def xml_to_dict(element):
    """Convert XML element to dictionary"""
    result = {}
    
    # Add attributes
    if element.attrib:
        result['@attributes'] = element.attrib
    
    # Add text content
    if element.text and element.text.strip():
        result['text'] = element.text.strip()
    
    # Add children
    for child in element:
        child_data = xml_to_dict(child)
        
        if child.tag in result:
            # Multiple children with same tag -> list
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(child_data)
        else:
            result[child.tag] = child_data
    
    return result

# Usage
tree = ET.parse('library.xml')
root = tree.getroot()
data = xml_to_dict(root)

import json
print(json.dumps(data, indent=2))

4. Using lxml for XPath

lxml provides full XPath 1.0 support:

Python
# First install: pip install lxml
from lxml import etree

tree = etree.parse('library.xml')
root = tree.getroot()

# Advanced XPath queries
# Find books with price > 10
expensive_books = root.xpath('//book[price > 10]/title/text()')
print(expensive_books)

# Find fiction books only
fiction_titles = root.xpath('//book[@category="fiction"]/title/text()')
print(fiction_titles)

# Get book count by category (XPath count() returns a float, so convert it)
programming_count = int(root.xpath('count(//book[@category="programming"])'))
print(f"Programming books: {programming_count}")

# Find books published after 2000
modern_books = root.xpath('//book[year > 2000]')
for book in modern_books:
    print(etree.tostring(book, pretty_print=True).decode())
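
If installing lxml is not an option, the built-in ElementTree supports a limited XPath subset that covers attribute and child-text equality, but not numeric comparisons. A minimal sketch using an inline copy of the sample data:

```python
import xml.etree.ElementTree as ET

xml_string = """
<library>
    <book id="001" category="fiction"><title>The Great Gatsby</title><year>1925</year></book>
    <book id="002" category="fiction"><title>1984</title><year>1949</year></book>
    <book id="003" category="programming"><title>Clean Code</title><year>2008</year></book>
</library>
"""
root = ET.fromstring(xml_string)

# Attribute predicate: supported by ElementTree
fiction = [b.findtext("title") for b in root.findall(".//book[@category='fiction']")]

# Child-text equality predicate: also supported
from_1949 = [b.findtext("title") for b in root.findall(".//book[year='1949']")]

# Comparisons such as [year > 2000] are not supported, so filter in Python
modern = [
    b.findtext("title")
    for b in root.iter("book")
    if int(b.findtext("year")) > 2000
]

print(fiction, from_1949, modern)
```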

5. XML Validation

Python
from lxml import etree

# Load XML Schema (XSD); parsing the file directly avoids the error
# lxml raises for str input that carries an encoding declaration
schema_doc = etree.parse('library.xsd')
schema = etree.XMLSchema(schema_doc)

# Parse and validate XML
parser = etree.XMLParser(schema=schema)
try:
    tree = etree.parse('library.xml', parser)
    print("XML is valid!")
except etree.XMLSyntaxError as e:
    print(f"Validation error: {e}")

Best Practices

Use ElementTree for Most Tasks

It's built-in, lightweight, and sufficient for most XML processing needs.

Always Handle Exceptions

XML parsing can fail. Wrap parse operations in try-except blocks.

Check for None Before Accessing

Use if elem is not None: before accessing .text or .attrib
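
A short sketch of the check, together with the findtext() and get() defaults that often make it unnecessary. The inline element is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring('<book id="001"><title>1984</title></book>')

# find() returns None for a missing child, so guard before reading .text
rating_elem = book.find("rating")
rating = rating_elem.text if rating_elem is not None else "n/a"

# findtext() and get() accept a default and avoid the explicit check
rating_short = book.findtext("rating", default="n/a")
category = book.get("category", "unknown")

print(rating, rating_short, category)
```

Compare against None explicitly rather than writing `if elem:`, since an element's truth value has historically depended on whether it has children.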

Use Streaming for Large Files

Use iterparse() instead of parse() for large XML files.

Validate External XML

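Treat XML from untrusted sources with care. Schema validation checks structure only; it does not protect against attacks such as entity expansion. The Python documentation carries a security warning for the standard XML modules, and the third-party defusedxml package is one commonly used safeguard.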

Use UTF-8 Encoding

Always specify encoding='utf-8' when writing XML files.

Pretty Print for Readability

Use ET.indent() (Python 3.9+) for human-readable output.

Consider Using lxml for Complex Tasks

If you need XPath, XSLT, or better performance, use lxml library.

❌ Common Mistakes to Avoid:

  • Not checking if element exists before accessing .text
  • Loading huge XML files entirely into memory
  • Forgetting to save changes after modifying XML
  • Not handling encoding issues with special characters
  • Using string concatenation to build XML (use ElementTree instead)
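
The last point is easiest to see with text that contains special characters. In this sketch (the title string is made up for illustration), naive concatenation yields XML that fails to parse, while ElementTree escapes the text on output:

```python
import xml.etree.ElementTree as ET

title = "Tom & Jerry <Collected>"

# Naive concatenation leaves & and < unescaped, so the result is malformed
naive = "<book><title>" + title + "</title></book>"
try:
    ET.fromstring(naive)
    naive_ok = True
except ET.ParseError:
    naive_ok = False

# ElementTree escapes special characters when serializing
book = ET.Element("book")
ET.SubElement(book, "title").text = title
safe = ET.tostring(book, encoding="unicode")
print(safe)

# Parsing the serialized form gives back the original text
round_trip = ET.fromstring(safe).findtext("title")
```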
