Introduction to XML in Python

Python offers solid XML support through the built-in xml.etree.ElementTree module and third-party libraries such as lxml. Whether you need to parse configuration files, process RSS feeds, or interact with XML-based APIs, these tools make the job straightforward.

In this comprehensive tutorial, you'll learn how to:

✓ Parse XML files and extract data
✓ Navigate XML tree structures
✓ Create new XML documents from scratch
✓ Modify existing XML elements and attributes
✓ Handle namespaces and validation
✓ Process large XML files efficiently

Sample XML File

We'll use this XML throughout the tutorial:

XML

<?xml version="1.0" encoding="UTF-8"?>
<library>
    <book id="001" category="fiction">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <year>1925</year>
        <price currency="USD">10.99</price>
        <available>true</available>
    </book>
    <book id="002" category="fiction">
        <title>1984</title>
        <author>George Orwell</author>
        <year>1949</year>
        <price currency="USD">8.99</price>
        <available>true</available>
    </book>
    <book id="003" category="programming">
        <title>Clean Code</title>
        <author>Robert C. Martin</author>
        <year>2008</year>
        <price currency="USD">45.99</price>
        <available>false</available>
    </book>
</library>

Save this as library.xml to follow along.
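If you'd rather create the file from Python, here is a minimal sketch. `SAMPLE_XML` and `write_sample()` are illustrative names, not part of any library; the helper writes the sample above to disk and parses it back to confirm it is well-formed.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# The sample document from above, stored as a string (illustrative constant)
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<library>
    <book id="001" category="fiction">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <year>1925</year>
        <price currency="USD">10.99</price>
        <available>true</available>
    </book>
    <book id="002" category="fiction">
        <title>1984</title>
        <author>George Orwell</author>
        <year>1949</year>
        <price currency="USD">8.99</price>
        <available>true</available>
    </book>
    <book id="003" category="programming">
        <title>Clean Code</title>
        <author>Robert C. Martin</author>
        <year>2008</year>
        <price currency="USD">45.99</price>
        <available>false</available>
    </book>
</library>
"""

def write_sample(path="library.xml"):
    """Write the sample XML to `path` and return the parsed root element (hypothetical helper)."""
    Path(path).write_text(SAMPLE_XML, encoding="utf-8")
    # Parse the file back to confirm it is well-formed
    return ET.parse(path).getroot()

# Usage:
# root = write_sample()
# print(root.tag, len(root))  # library 3
```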

Python XML Libraries

Python offers several libraries for XML processing. Here are the main options:

xml.etree.ElementTree

Built-in library

✓ Part of Python standard library
✓ No installation needed
✓ Simple and lightweight
✓ Good for most use cases

Recommended for: Beginners, simple XML tasks

lxml

Third-party library

✓ More features and faster
✓ XPath and XSLT support
✓ Better error handling
✓ XML validation support

Install:

pip install lxml

Recommended for: Advanced features, performance

xml.dom.minidom

Built-in DOM parser

• DOM-style API
• Loads entire document into memory
• More verbose syntax

Recommended for: DOM-style parsing
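For comparison, a short minidom sketch on an illustrative two-book snippet shows the DOM style: elements are looked up with getElementsByTagName(), attributes with getAttribute(), and text through the element's child Text node.

```python
import xml.dom.minidom as minidom

xml_string = """<library>
    <book id="001"><title>The Great Gatsby</title></book>
    <book id="002"><title>1984</title></book>
</library>"""

dom = minidom.parseString(xml_string)

# DOM-style navigation: find elements by tag name anywhere in the document
for book in dom.getElementsByTagName("book"):
    book_id = book.getAttribute("id")
    # The text lives in a child Text node, reached via firstChild.data
    title = book.getElementsByTagName("title")[0].firstChild.data
    print(f"{book_id}: {title}")

# Output:
# 001: The Great Gatsby
# 002: 1984
```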

xml.sax

Event-driven parser

• Streaming parser
• Memory efficient
• More complex to use

Recommended for: Very large files
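The tutorial doesn't otherwise use xml.sax, so here is a minimal sketch of the event-driven style. `TitleCollector` is a hypothetical handler name; the parser calls its startElement(), characters(), and endElement() methods as it streams through the input, and nothing is kept in memory except what the handler chooses to store. For a file on disk, xml.sax.parse('library.xml', handler) works the same way.

```python
import xml.sax

class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element as the parser streams through."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # characters() may be called several times for one text node, so buffer it
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

xml_bytes = b"""<library>
    <book id="001"><title>The Great Gatsby</title></book>
    <book id="002"><title>1984</title></book>
</library>"""

handler = TitleCollector()
xml.sax.parseString(xml_bytes, handler)
print(handler.titles)  # ['The Great Gatsby', '1984']
```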

Tutorial Focus:

This tutorial focuses on ElementTree (built-in), with lxml examples where lxml adds features ElementTree lacks. ElementTree is ideal for learning and covers most everyday XML tasks.

Parsing XML Files

Method 1: Parse from File

The most common way to parse XML is from a file:

Python

import xml.etree.ElementTree as ET

# Parse XML file
tree = ET.parse('library.xml')
root = tree.getroot()

# Get root element info
print(f"Root tag: {root.tag}")
print(f"Root attributes: {root.attrib}")
print(f"Number of children: {len(root)}")

# Output:
# Root tag: library
# Root attributes: {}
# Number of children: 3

Method 2: Parse from String

Parse XML directly from a string:

Python

import xml.etree.ElementTree as ET

xml_string = """
<book id="001">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
</book>
"""

# Parse from string
root = ET.fromstring(xml_string)

print(f"Tag: {root.tag}")
print(f"ID: {root.attrib['id']}")
print(f"Title: {root.find('title').text}")

# Output:
# Tag: book
# ID: 001
# Title: The Great Gatsby

Method 3: Parse from URL

Parse XML from a remote URL:

Python

import xml.etree.ElementTree as ET
import urllib.request

# Download and parse XML from URL
url = "https://example.com/data.xml"
with urllib.request.urlopen(url) as response:
    xml_data = response.read()

root = ET.fromstring(xml_data)
print(f"Root element: {root.tag}")

Error Handling:

Python

import xml.etree.ElementTree as ET

try:
    tree = ET.parse('library.xml')
    root = tree.getroot()
except ET.ParseError as e:
    print(f"Parse error: {e}")
except FileNotFoundError:
    print("File not found")
except Exception as e:
    print(f"Error: {e}")
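When parsing fails, ET.ParseError exposes a position attribute holding a (line, column) tuple, which helps locate the problem. A small sketch with a deliberately broken string (the exact column and message come from the underlying expat parser, so they aren't shown here):

```python
import xml.etree.ElementTree as ET

broken = "<library><book></library>"  # <book> is never closed

try:
    ET.fromstring(broken)
except ET.ParseError as e:
    # position is a (line, column) tuple pointing at the offending spot
    line, column = e.position
    print(f"Malformed XML at line {line}, column {column}: {e}")
```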

Navigating XML Trees

Finding Elements

1. Iterate Through Children

Python

import xml.etree.ElementTree as ET

tree = ET.parse('library.xml')
root = tree.getroot()

# Iterate through all book elements
for book in root:
    print(f"Book ID: {book.attrib['id']}")
    print(f"Title: {book.find('title').text}")
    print(f"Author: {book.find('author').text}")
    print("---")

# Output:
# Book ID: 001
# Title: The Great Gatsby
# Author: F. Scott Fitzgerald
# ---
# Book ID: 002
# Title: 1984
# Author: George Orwell
# ---
# Book ID: 003
# Title: Clean Code
# Author: Robert C. Martin
# ---
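Looping over root reaches only its direct children. To visit every descendant, Element.iter() walks the subtree depth-first, starting with the element itself, and can filter by tag. A self-contained sketch on a trimmed two-book snippet:

```python
import xml.etree.ElementTree as ET

xml_string = """<library>
    <book id="001"><title>The Great Gatsby</title><year>1925</year></book>
    <book id="002"><title>1984</title><year>1949</year></book>
</library>"""
root = ET.fromstring(xml_string)

# Iterating over root visits only its direct children (the <book> elements)
children = [child.tag for child in root]

# root.iter() walks the whole subtree depth-first; pass a tag to filter
titles = [t.text for t in root.iter("title")]
all_tags = [elem.tag for elem in root.iter()]

print(children)   # ['book', 'book']
print(titles)     # ['The Great Gatsby', '1984']
print(all_tags)   # ['library', 'book', 'title', 'year', 'book', 'title', 'year']
```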

2. Find First Match - find()

Python

# Find first book element
first_book = root.find('book')
print(first_book.find('title').text)  # The Great Gatsby

# Find nested element
title = root.find('book/title')
print(title.text)  # The Great Gatsby

# Find with attribute: find() accepts the same [@attr='value'] predicates as findall()
second_book = root.find("book[@id='002']")
print(second_book.find('title').text)  # 1984
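A related shortcut is findtext(), which returns the text of the first match directly and accepts a default for when nothing matches. A sketch on an inline snippet (isbn is a hypothetical element that doesn't exist in the data, used to trigger the default):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<library>
    <book id="001"><title>The Great Gatsby</title><notes/></book>
    <book id="002"><title>1984</title></book>
</library>""")

# findtext() returns the text of the first match directly
first_title = root.findtext("book/title")

# ...or the default when nothing matches; find() would return None here,
# and find(...).text would raise AttributeError
isbn = root.findtext("book/isbn", default="n/a")

# For a matching element with no text, findtext() returns "" rather than None
notes = root.findtext("book/notes")

print(first_title)  # The Great Gatsby
print(isbn)         # n/a
print(repr(notes))  # ''
```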

3. Find All Matches - findall()

Python

# Find all book elements
books = root.findall('book')
print(f"Found {len(books)} books")

# Find all titles
titles = root.findall('.//title')  # .// searches all descendants, at any depth
for title in titles:
    print(title.text)

# Output:
# The Great Gatsby
# 1984
# Clean Code

4. XPath-style Queries

Python

# Find books by attribute value
fiction_books = root.findall("book[@category='fiction']")
for book in fiction_books:
    print(book.find('title').text)

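# Find books with price > 10
# ElementTree supports only a subset of XPath: predicates such as
# [@attr='value'] and [tag='text'] work, but numeric comparisons don't.
# For full XPath 1.0 (including price > 10), use the lxml library.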

# Find all elements at any level
all_titles = root.findall('.//title')

# Find direct children only
direct_children = root.findall('./book')
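To make those limits concrete, here is a sketch of predicates ElementTree does support ([tag='text'] and positional [n] / [last()]), plus the usual workaround for numeric comparisons: select broadly, then filter in Python. The inline data mirrors library.xml.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<library>
    <book id="001" category="fiction"><title>The Great Gatsby</title><year>1925</year><price>10.99</price></book>
    <book id="002" category="fiction"><title>1984</title><year>1949</year><price>8.99</price></book>
    <book id="003" category="programming"><title>Clean Code</title><year>2008</year><price>45.99</price></book>
</library>""")

# [tag='text'] matches on a child's exact text content
by_year = [b.findtext("title") for b in root.findall("book[year='1949']")]

# [n] and [last()] select by position (1-based, as in XPath)
second = root.find("book[2]").findtext("title")
last = root.find("book[last()]").findtext("title")

# Numeric comparisons aren't supported in ElementTree paths,
# so filter in Python after converting the text to a number
expensive = [b.findtext("title") for b in root.findall("book")
             if float(b.findtext("price")) > 10]

print(by_year)    # ['1984']
print(second)     # 1984
print(last)       # Clean Code
print(expensive)  # ['The Great Gatsby', 'Clean Code']
```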

5. Access Attributes and Text

Python

book = root.find('book')

# Get attribute value
book_id = book.get('id')  # Recommended: returns None if the attribute is missing
# or
book_id = book.attrib['id']  # Alternative: raises KeyError if missing

# Get element text
title = book.find('title').text
author = book.find('author').text

# Get all attributes as dict
all_attrs = book.attrib
print(all_attrs)  # {'id': '001', 'category': 'fiction'}

# Check if attribute exists
if 'id' in book.attrib:
    print(f"Book ID: {book.attrib['id']}")
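A few details that commonly trip people up, shown on an inline one-book snippet (the isbn attribute and notes element are hypothetical additions): get() accepts a default, an empty element's .text is None rather than an empty string, and items() yields attribute name/value pairs.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book id="001" category="fiction"><title>The Great Gatsby</title><notes/></book>'
)

# get() returns None, or the default you supply, when the attribute is missing
isbn = book.get("isbn", "unknown")

# An empty element such as <notes/> has .text == None, not ""
notes_text = book.find("notes").text

# items() yields (name, value) pairs for all attributes
attrs = dict(book.items())

print(isbn)        # unknown
print(notes_text)  # None
print(attrs)       # {'id': '001', 'category': 'fiction'}
```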

Complete Example: Extract All Book Info

Python

import xml.etree.ElementTree as ET

tree = ET.parse('library.xml')
root = tree.getroot()

books_data = []

for book in root.findall('book'):
    book_info = {
        'id': book.get('id'),
        'category': book.get('category'),
        'title': book.find('title').text,
        'author': book.find('author').text,
        'year': int(book.find('year').text),
        'price': float(book.find('price').text),
        'currency': book.find('price').get('currency'),
        'available': book.find('available').text == 'true'
    }
    books_data.append(book_info)

# Print results
for book in books_data:
    print(f"{book['title']} by {book['author']} - ${book['price']}")

# Output:
# The Great Gatsby by F. Scott Fitzgerald - $10.99
# 1984 by George Orwell - $8.99
# Clean Code by Robert C. Martin - $45.99

Creating XML Documents

Building XML from Scratch

Python

import xml.etree.ElementTree as ET

# Create root element
root = ET.Element('library')

# Create first book
book1 = ET.SubElement(root, 'book')
book1.set('id', '001')
book1.set('category', 'fiction')

# Add child elements
title1 = ET.SubElement(book1, 'title')
title1.text = 'The Great Gatsby'

author1 = ET.SubElement(book1, 'author')
author1.text = 'F. Scott Fitzgerald'

year1 = ET.SubElement(book1, 'year')
year1.text = '1925'

# Create second book
book2 = ET.SubElement(root, 'book')
book2.set('id', '002')
book2.set('category', 'fiction')

title2 = ET.SubElement(book2, 'title')
title2.text = '1984'

author2 = ET.SubElement(book2, 'author')
author2.text = 'George Orwell'

# Create tree and save
tree = ET.ElementTree(root)
ET.indent(tree, space="    ")  # Python 3.9+
tree.write('new_library.xml', encoding='utf-8', xml_declaration=True)

print("XML file created successfully!")
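A more compact variant, as a sketch: SubElement() also accepts attributes as keyword arguments, and ET.tostring(..., encoding='unicode') returns a str you can inspect without writing a file. The exact output below assumes Python 3.8+, where attributes keep their insertion order.

```python
import xml.etree.ElementTree as ET

root = ET.Element("library")

# SubElement accepts attributes as an attrib dict or as keyword arguments
book = ET.SubElement(root, "book", id="001", category="fiction")
ET.SubElement(book, "title").text = "The Great Gatsby"

# encoding="unicode" makes tostring() return str instead of bytes
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
# <library><book id="001" category="fiction"><title>The Great Gatsby</title></book></library>
```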

Pretty Printing XML

Python

import xml.etree.ElementTree as ET

# Assumes `root` is the Element built in the previous example
# For Python 3.9+, use ET.indent()
tree = ET.ElementTree(root)
ET.indent(tree, space="    ")
tree.write('output.xml', encoding='utf-8', xml_declaration=True)

# For older Python versions, use xml.dom.minidom
import xml.dom.minidom as minidom

xml_string = ET.tostring(root, encoding='utf-8')
dom = minidom.parseString(xml_string)
pretty_xml = dom.toprettyxml(indent="    ")

with open('output.xml', 'w', encoding='utf-8') as f:
    f.write(pretty_xml)

Creating from Dictionary

Python

import xml.etree.ElementTree as ET

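def dict_to_xml(tag, d):
    """Convert a dictionary to an XML Element (keys become child tags)"""
    elem = ET.Element(tag)
    for key, val in d.items():
        if isinstance(val, dict):
            elem.append(dict_to_xml(key, val))
        elif isinstance(val, list):
            # Naive singularization for list items: 'books' -> 'book'
            item_tag = key[:-1] if key.endswith('s') else key
            for item in val:
                if isinstance(item, dict):
                    elem.append(dict_to_xml(item_tag, item))
                else:
                    # Scalar list items become simple text elements
                    ET.SubElement(elem, item_tag).text = str(item)
        else:
            child = ET.SubElement(elem, key)
            child.text = str(val)
    return elem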

# Example usage
book_data = {
    'title': 'Python Programming',
    'author': 'John Doe',
    'year': 2025,
    'price': 39.99
}

book_elem = dict_to_xml('book', book_data)
tree = ET.ElementTree(book_elem)
tree.write('book.xml', encoding='utf-8', xml_declaration=True)

Modifying Existing XML

Update Element Values

Python

import xml.etree.ElementTree as ET

tree = ET.parse('library.xml')
root = tree.getroot()

# Update text content
first_book = root.find('book')
price = first_book.find('price')
price.text = '12.99'  # Update price

# Update attribute
first_book.set('category', 'classic')

# Save changes
tree.write('library_updated.xml', encoding='utf-8', xml_declaration=True)
print("XML updated successfully!")
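To update many elements at once, combine iter() with a loop. This sketch applies a hypothetical 10% discount to every price on an inline snippet; note that .text must always be assigned a string.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<library>
    <book id="001"><price currency="USD">10.99</price></book>
    <book id="002"><price currency="USD">8.99</price></book>
</library>""")

# Apply a hypothetical 10% discount to every <price>, wherever it appears
for price in root.iter("price"):
    new_value = float(price.text) * 0.9
    price.text = f"{new_value:.2f}"   # .text must be a string
    price.set("discounted", "true")   # mark the change with an attribute

print([p.text for p in root.iter("price")])  # ['9.89', '8.09']
```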

Add New Elements

Python

import xml.etree.ElementTree as ET

tree = ET.parse('library.xml')
root = tree.getroot()

# Add new book
new_book = ET.SubElement(root, 'book')
new_book.set('id', '004')
new_book.set('category', 'programming')

title = ET.SubElement(new_book, 'title')
title.text = 'Python Crash Course'

author = ET.SubElement(new_book, 'author')
author.text = 'Eric Matthes'

year = ET.SubElement(new_book, 'year')
year.text = '2023'

# Add new child to existing element
first_book = root.find('book')
rating = ET.SubElement(first_book, 'rating')
rating.text = '4.5'

# Save
tree.write('library_expanded.xml', encoding='utf-8', xml_declaration=True)
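SubElement() always appends the new child at the end. When order matters, Element.insert(index, element) places a child at a specific position. A small sketch (the subtitle element and its text are illustrative):

```python
import xml.etree.ElementTree as ET

book = ET.fromstring("<book><title>1984</title><year>1949</year></book>")

# Build the new child separately, then insert it at index 1 (right after <title>)
subtitle = ET.Element("subtitle")
subtitle.text = "A Novel"
book.insert(1, subtitle)

print([child.tag for child in book])  # ['title', 'subtitle', 'year']
```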

Remove Elements

Python

import xml.etree.ElementTree as ET

tree = ET.parse('library.xml')
root = tree.getroot()

# Remove specific book by ID
for book in root.findall('book'):
    if book.get('id') == '003':
        root.remove(book)
        print("Book removed")

# Remove all unavailable books
for book in root.findall('book'):
    available = book.find('available')
    if available is not None and available.text == 'false':
        root.remove(book)

# Remove a child element
first_book = root.find('book')
year = first_book.find('year')
if year is not None:
    first_book.remove(year)

# Save
tree.write('library_cleaned.xml', encoding='utf-8', xml_declaration=True)
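The removal loops above iterate over root.findall(), which returns a new list, so removing elements is safe. Iterating over the element itself while removing from it is not, as this sketch on illustrative data shows:

```python
import xml.etree.ElementTree as ET

def make_root():
    """Build a small three-book library (illustrative data)."""
    return ET.fromstring(
        "<library><book id='1'/><book id='2'/><book id='3'/></library>"
    )

# Pitfall: removing children while iterating over the element itself.
# Iteration is index-based, so each removal shifts the next child into
# the slot that was just visited, and that child is skipped.
root = make_root()
for book in root:
    root.remove(book)
remaining = [book.get("id") for book in root]
print(remaining)  # ['2']

# Safe: iterate over a snapshot (findall() and list(root) both return new lists)
root = make_root()
for book in list(root):
    root.remove(book)
print(len(root))  # 0
```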

Complete CRUD Example

Python

import xml.etree.ElementTree as ET

class XMLLibrary:
    def __init__(self, filename):
        self.filename = filename
        self.tree = ET.parse(filename)
        self.root = self.tree.getroot()
    
    def add_book(self, book_id, title, author, year, price, category):
        """Create - Add new book"""
        book = ET.SubElement(self.root, 'book')
        book.set('id', book_id)
        book.set('category', category)
        
        ET.SubElement(book, 'title').text = title
        ET.SubElement(book, 'author').text = author
        ET.SubElement(book, 'year').text = str(year)
        ET.SubElement(book, 'price').text = str(price)
    
    def get_book(self, book_id):
        """Read - Get book by ID"""
        for book in self.root.findall('book'):
            if book.get('id') == book_id:
                return {
                    'id': book.get('id'),
                    'title': book.find('title').text,
                    'author': book.find('author').text,
                    'year': book.find('year').text
                }
        return None
    
    def update_book(self, book_id, **kwargs):
        """Update - Modify book details"""
        for book in self.root.findall('book'):
            if book.get('id') == book_id:
                for key, value in kwargs.items():
                    elem = book.find(key)
                    if elem is not None:
                        elem.text = str(value)
                return True
        return False
    
    def delete_book(self, book_id):
        """Delete - Remove book"""
        for book in self.root.findall('book'):
            if book.get('id') == book_id:
                self.root.remove(book)
                return True
        return False
    
    def save(self):
        """Save changes to file"""
        ET.indent(self.tree, space="    ")
        self.tree.write(self.filename, encoding='utf-8', xml_declaration=True)

# Usage
library = XMLLibrary('library.xml')

# Add new book
library.add_book('004', 'Design Patterns', 'Gang of Four', 1994, 54.99, 'programming')

# Update book
library.update_book('001', price='11.99', year='1926')

# Delete book
library.delete_book('002')

# Save all changes
library.save()
print("Library updated!")

Advanced Techniques

1. Working with Namespaces

XML

<?xml version="1.0"?>
<library xmlns="http://example.com/library"
         xmlns:pub="http://example.com/publisher">
    <book>
        <title>Example</title>
        <pub:publisher>Example Press</pub:publisher>
    </book>
</library>

Python

import xml.etree.ElementTree as ET

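# Map prefixes to namespace URIs for find()/findall() lookups.
# The default namespace (xmlns="...") still needs a prefix here, e.g. 'lib'.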
ns = {
    'lib': 'http://example.com/library',
    'pub': 'http://example.com/publisher'
}

tree = ET.parse('library_with_ns.xml')
root = tree.getroot()

# Find with namespace
books = root.findall('lib:book', ns)
for book in books:
    title = book.find('lib:title', ns).text
    publisher = book.find('pub:publisher', ns).text
    print(f"{title} - {publisher}")
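Under the hood, ElementTree stores namespaced tags in "Clark notation", {uri}localname, which is what the prefix map expands to. This sketch parses the namespaced document above from a string, shows the raw tag, uses the {*} wildcard (Python 3.8+), and calls ET.register_namespace() so that serialization reuses the original prefixes instead of generated ones such as ns0. Note that register_namespace() changes a module-wide registry.

```python
import xml.etree.ElementTree as ET

xml_string = """<library xmlns="http://example.com/library"
         xmlns:pub="http://example.com/publisher">
    <book>
        <title>Example</title>
        <pub:publisher>Example Press</pub:publisher>
    </book>
</library>"""
root = ET.fromstring(xml_string)

# Tags are stored in Clark notation: {namespace-uri}localname
print(root.tag)  # {http://example.com/library}library

# Equivalent lookup without a prefix map, spelling out the URI
title = root.find("{http://example.com/library}book/{http://example.com/library}title")
print(title.text)  # Example

# Python 3.8+: {*} matches the local name in any namespace
publisher = root.find(".//{*}publisher")
print(publisher.text)  # Example Press

# When writing, register_namespace() sets the prefixes used on output
# ("" makes it the default namespace); without it you'd get ns0, ns1, ...
ET.register_namespace("", "http://example.com/library")
ET.register_namespace("pub", "http://example.com/publisher")
output = ET.tostring(root, encoding="unicode")
print(output)
```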

2. Processing Large Files (Streaming)

Python

import xml.etree.ElementTree as ET

def process_large_xml(filename):
    """Memory-efficient parsing of large XML files"""
    # Request 'start' events so the root element can be captured first
    context = ET.iterparse(filename, events=('start', 'end'))
    
    # The first event is the 'start' of the root element
    event, root = next(context)
    
    for event, elem in context:
        if event == 'end' and elem.tag == 'book':
            # Process book element (its children are fully parsed at 'end')
            title = elem.find('title').text
            author = elem.find('author').text
            print(f"{title} by {author}")
            
            # Free memory: clear the book, then detach processed books from root
            elem.clear()
            root.clear()

# Process without loading entire file into memory
process_large_xml('huge_library.xml')
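The function above needs a real file on disk. The same pattern can be tried on an in-memory stand-in: in this sketch, io.BytesIO plays the role of huge_library.xml, and only 'end' events are requested because an element's children are complete when its 'end' event fires. Here elem.clear() empties each book, but the empty <book> stays attached to its parent; in a truly large file, the root.clear() call in the function above is what detaches those.

```python
import io
import xml.etree.ElementTree as ET

# A small in-memory stand-in for a large file (iterparse also accepts file paths)
data = io.BytesIO(b"""<library>
    <book id="001"><title>The Great Gatsby</title><price>10.99</price></book>
    <book id="002"><title>1984</title><price>8.99</price></book>
    <book id="003"><title>Clean Code</title><price>45.99</price></book>
</library>""")

total = 0.0
count = 0
# Only 'end' events: each <book> is fully parsed by the time we see it
for event, elem in ET.iterparse(data, events=("end",)):
    if elem.tag == "book":
        total += float(elem.findtext("price"))
        count += 1
        elem.clear()  # drop the book's children and text once processed

print(count, round(total, 2))  # 3 65.97
```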

3. XML to Dict Conversion

Python

import xml.etree.ElementTree as ET

def xml_to_dict(element):
    """Convert XML element to dictionary"""
    result = {}
    
    # Add attributes
    if element.attrib:
        result['@attributes'] = element.attrib
    
    # Add text content
    if element.text and element.text.strip():
        result['text'] = element.text.strip()
    
    # Add children
    for child in element:
        child_data = xml_to_dict(child)
        
        if child.tag in result:
            # Multiple children with same tag -> list
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(child_data)
        else:
            result[child.tag] = child_data
    
    return result

# Usage
tree = ET.parse('library.xml')
root = tree.getroot()
data = xml_to_dict(root)

import json
print(json.dumps(data, indent=2))

4. Using lxml for XPath

lxml provides full XPath 1.0 support:

Python

# First install: pip install lxml
from lxml import etree

tree = etree.parse('library.xml')
root = tree.getroot()

# Advanced XPath queries
# Find books with price > 10
expensive_books = root.xpath('//book[price > 10]/title/text()')
print(expensive_books)

# Find fiction books only
fiction_titles = root.xpath('//book[@category="fiction"]/title/text()')
print(fiction_titles)

# Get book count by category (XPath count() returns a float, so convert it)
programming_count = int(root.xpath('count(//book[@category="programming"])'))
print(f"Programming books: {programming_count}")

# Find books published after 2000
modern_books = root.xpath('//book[year > 2000]')
for book in modern_books:
    print(etree.tostring(book, pretty_print=True).decode())

5. XML Validation

Python

from lxml import etree

# Load the XML Schema (XSD). Parsing the file directly avoids the ValueError
# lxml raises for str input that contains an encoding declaration.
schema = etree.XMLSchema(etree.parse('library.xsd'))

# Parse and validate XML
parser = etree.XMLParser(schema=schema)
try:
    tree = etree.parse('library.xml', parser)
    print("XML is valid!")
except etree.XMLSyntaxError as e:
    print(f"Validation error: {e}")

Best Practices

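✓ Use ElementTree for Most Tasks

It's built-in, lightweight, and sufficient for most everyday XML processing needs.

✓ Always Handle Exceptions

XML parsing can fail on malformed input or missing files. Wrap parse operations in try-except blocks that catch ET.ParseError and FileNotFoundError.

✓ Check for None Before Accessing

find() returns None when nothing matches, so use if elem is not None: before accessing .text or .attrib.

✓ Use Streaming for Large Files

Use iterparse() instead of parse() for large XML files.

✓ Validate External XML

Validate the structure of XML from untrusted sources against a schema. Schema validation alone doesn't make parsing safe, though: the standard-library parsers are not secure against maliciously constructed data (such as entity-expansion attacks), and the Python documentation recommends the defusedxml package for untrusted input.

✓ Use UTF-8 Encoding

Always specify encoding='utf-8' when writing XML files; tree.write() defaults to US-ASCII.

✓ Pretty Print for Readability

Use ET.indent() (Python 3.9+) for human-readable output.

✓ Consider Using lxml for Complex Tasks

If you need full XPath, XSLT, schema validation, or better performance, use the lxml library.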

❌ Common Mistakes to Avoid:

✗ Not checking whether an element exists before accessing .text
✗ Loading huge XML files entirely into memory
✗ Forgetting to save changes after modifying XML
✗ Not handling encoding issues with special characters
✗ Using string concatenation to build XML (use ElementTree instead)
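To see why concatenation is risky, compare it with ElementTree on text containing XML special characters. In this sketch the title string is illustrative; ElementTree escapes &, <, and > in text automatically, while the concatenated version is not well-formed.

```python
import xml.etree.ElementTree as ET

title = 'Tom & Jerry <"Classic">'  # text containing XML special characters

# String concatenation inserts the raw characters, producing malformed XML
bad = "<title>" + title + "</title>"

# ElementTree escapes special characters in text automatically
elem = ET.Element("title")
elem.text = title
good = ET.tostring(elem, encoding="unicode")
print(good)  # <title>Tom &amp; Jerry &lt;"Classic"&gt;</title>

try:
    ET.fromstring(bad)
except ET.ParseError as e:
    print(f"Concatenated version fails to parse: {e}")

# The escaped version parses back to the original text
print(ET.fromstring(good).text == title)  # True
```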

