Table of Contents
Introduction to XML in Python
Python provides excellent built-in and third-party libraries for working with XML files. Whether you need to parse configuration files, process data feeds, or interact with XML-based APIs, Python makes it straightforward and efficient.
In this comprehensive tutorial, you'll learn how to:
- ✓Parse XML files and extract data
- ✓Navigate XML tree structures
- ✓Create new XML documents from scratch
- ✓Modify existing XML elements and attributes
- ✓Handle namespaces and validation
- ✓Process large XML files efficiently
Sample XML File
We'll use this XML throughout the tutorial:
<?xml version="1.0" encoding="UTF-8"?>
<library>
<book id="001" category="fiction">
<title>The Great Gatsby</title>
<author>F. Scott Fitzgerald</author>
<year>1925</year>
<price currency="USD">10.99</price>
<available>true</available>
</book>
<book id="002" category="fiction">
<title>1984</title>
<author>George Orwell</author>
<year>1949</year>
<price currency="USD">8.99</price>
<available>true</available>
</book>
<book id="003" category="programming">
<title>Clean Code</title>
<author>Robert C. Martin</author>
<year>2008</year>
<price currency="USD">45.99</price>
<available>false</available>
</book>
</library>Save this as library.xml to follow along.
Python XML Libraries
Python offers several libraries for XML processing. Here are the main options:
xml.etree.ElementTree
Built-in library
- ✓Part of Python standard library
- ✓No installation needed
- ✓Simple and lightweight
- ✓Good for most use cases
Recommended for: Beginners, simple XML tasks
lxml
Third-party library
- ✓More features and faster
- ✓XPath and XSLT support
- ✓Better error handling
- ✓XML validation support
Install:
pip install lxmlRecommended for: Advanced features, performance
xml.dom.minidom
Built-in DOM parser
- •DOM-style API
- •Loads entire document
- •More verbose syntax
Recommended for: DOM-style parsing
xml.sax
Event-driven parser
- •Streaming parser
- •Memory efficient
- •More complex to use
Recommended for: Very large files
Tutorial Focus:
This tutorial focuses on ElementTree (built-in) with examples in lxml where beneficial. ElementTree is perfect for learning and handles 90% of XML tasks.
Parsing XML Files
Method 1: Parse from File
The most common way to parse XML is from a file:
import xml.etree.ElementTree as ET
# Parse XML file
tree = ET.parse('library.xml')
root = tree.getroot()
# Get root element info
print(f"Root tag: {root.tag}")
print(f"Root attributes: {root.attrib}")
print(f"Number of children: {len(root)}")
# Output:
# Root tag: library
# Root attributes: {}
# Number of children: 3Method 2: Parse from String
Parse XML directly from a string:
import xml.etree.ElementTree as ET
xml_string = """
<book id="001">
<title>The Great Gatsby</title>
<author>F. Scott Fitzgerald</author>
</book>
"""
# Parse from string
root = ET.fromstring(xml_string)
print(f"Tag: {root.tag}")
print(f"ID: {root.attrib['id']}")
print(f"Title: {root.find('title').text}")
# Output:
# Tag: book
# ID: 001
# Title: The Great GatsbyMethod 3: Parse from URL
Parse XML from a remote URL:
import xml.etree.ElementTree as ET
import urllib.request
# Download and parse XML from URL
url = "https://example.com/data.xml"
with urllib.request.urlopen(url) as response:
xml_data = response.read()
root = ET.fromstring(xml_data)
print(f"Root element: {root.tag}")Error Handling:
import xml.etree.ElementTree as ET
try:
tree = ET.parse('library.xml')
root = tree.getroot()
except ET.ParseError as e:
print(f"Parse error: {e}")
except FileNotFoundError:
print("File not found")
except Exception as e:
print(f"Error: {e}")Creating XML Documents
Building XML from Scratch
import xml.etree.ElementTree as ET
# Create root element
root = ET.Element('library')
# Create first book
book1 = ET.SubElement(root, 'book')
book1.set('id', '001')
book1.set('category', 'fiction')
# Add child elements
title1 = ET.SubElement(book1, 'title')
title1.text = 'The Great Gatsby'
author1 = ET.SubElement(book1, 'author')
author1.text = 'F. Scott Fitzgerald'
year1 = ET.SubElement(book1, 'year')
year1.text = '1925'
# Create second book
book2 = ET.SubElement(root, 'book')
book2.set('id', '002')
book2.set('category', 'fiction')
title2 = ET.SubElement(book2, 'title')
title2.text = '1984'
author2 = ET.SubElement(book2, 'author')
author2.text = 'George Orwell'
# Create tree and save
tree = ET.ElementTree(root)
ET.indent(tree, space=" ") # Python 3.9+
tree.write('new_library.xml', encoding='utf-8', xml_declaration=True)
print("XML file created successfully!")Pretty Printing XML
import xml.etree.ElementTree as ET
# For Python 3.9+, use ET.indent()
tree = ET.ElementTree(root)
ET.indent(tree, space=" ")
tree.write('output.xml', encoding='utf-8', xml_declaration=True)
# For older Python versions, use xml.dom.minidom
import xml.dom.minidom as minidom
xml_string = ET.tostring(root, encoding='utf-8')
dom = minidom.parseString(xml_string)
pretty_xml = dom.toprettyxml(indent=" ")
with open('output.xml', 'w', encoding='utf-8') as f:
f.write(pretty_xml)Creating from Dictionary
import xml.etree.ElementTree as ET
def dict_to_xml(tag, d):
"""Convert dictionary to XML Element"""
elem = ET.Element(tag)
for key, val in d.items():
if isinstance(val, dict):
child = dict_to_xml(key, val)
elem.append(child)
elif isinstance(val, list):
for item in val:
child = dict_to_xml(key[:-1], item) # Remove 's'
elem.append(child)
else:
child = ET.SubElement(elem, key)
child.text = str(val)
return elem
# Example usage
book_data = {
'title': 'Python Programming',
'author': 'John Doe',
'year': 2025,
'price': 39.99
}
book_elem = dict_to_xml('book', book_data)
tree = ET.ElementTree(book_elem)
tree.write('book.xml', encoding='utf-8', xml_declaration=True)Modifying Existing XML
Update Element Values
import xml.etree.ElementTree as ET
tree = ET.parse('library.xml')
root = tree.getroot()
# Update text content
first_book = root.find('book')
price = first_book.find('price')
price.text = '12.99' # Update price
# Update attribute
first_book.set('category', 'classic')
# Save changes
tree.write('library_updated.xml', encoding='utf-8', xml_declaration=True)
print("XML updated successfully!")Add New Elements
import xml.etree.ElementTree as ET
tree = ET.parse('library.xml')
root = tree.getroot()
# Add new book
new_book = ET.SubElement(root, 'book')
new_book.set('id', '004')
new_book.set('category', 'programming')
title = ET.SubElement(new_book, 'title')
title.text = 'Python Crash Course'
author = ET.SubElement(new_book, 'author')
author.text = 'Eric Matthes'
year = ET.SubElement(new_book, 'year')
year.text = '2023'
# Add new child to existing element
first_book = root.find('book')
rating = ET.SubElement(first_book, 'rating')
rating.text = '4.5'
# Save
tree.write('library_expanded.xml', encoding='utf-8', xml_declaration=True)Remove Elements
import xml.etree.ElementTree as ET
tree = ET.parse('library.xml')
root = tree.getroot()
# Remove specific book by ID
for book in root.findall('book'):
if book.get('id') == '003':
root.remove(book)
print("Book removed")
# Remove all unavailable books
for book in root.findall('book'):
available = book.find('available')
if available is not None and available.text == 'false':
root.remove(book)
# Remove a child element
first_book = root.find('book')
year = first_book.find('year')
if year is not None:
first_book.remove(year)
# Save
tree.write('library_cleaned.xml', encoding='utf-8', xml_declaration=True)Complete CRUD Example
import xml.etree.ElementTree as ET
class XMLLibrary:
def __init__(self, filename):
self.filename = filename
self.tree = ET.parse(filename)
self.root = self.tree.getroot()
def add_book(self, book_id, title, author, year, price, category):
"""Create - Add new book"""
book = ET.SubElement(self.root, 'book')
book.set('id', book_id)
book.set('category', category)
ET.SubElement(book, 'title').text = title
ET.SubElement(book, 'author').text = author
ET.SubElement(book, 'year').text = str(year)
ET.SubElement(book, 'price').text = str(price)
def get_book(self, book_id):
"""Read - Get book by ID"""
for book in self.root.findall('book'):
if book.get('id') == book_id:
return {
'id': book.get('id'),
'title': book.find('title').text,
'author': book.find('author').text,
'year': book.find('year').text
}
return None
def update_book(self, book_id, **kwargs):
"""Update - Modify book details"""
for book in self.root.findall('book'):
if book.get('id') == book_id:
for key, value in kwargs.items():
elem = book.find(key)
if elem is not None:
elem.text = str(value)
return True
return False
def delete_book(self, book_id):
"""Delete - Remove book"""
for book in self.root.findall('book'):
if book.get('id') == book_id:
self.root.remove(book)
return True
return False
def save(self):
"""Save changes to file"""
ET.indent(self.tree, space=" ")
self.tree.write(self.filename, encoding='utf-8', xml_declaration=True)
# Usage
library = XMLLibrary('library.xml')
# Add new book
library.add_book('004', 'Design Patterns', 'Gang of Four', 1994, 54.99, 'programming')
# Update book
library.update_book('001', price='11.99', year='1926')
# Delete book
library.delete_book('002')
# Save all changes
library.save()
print("Library updated!")Advanced Techniques
1. Working with Namespaces
<?xml version="1.0"?>
<library xmlns="http://example.com/library"
xmlns:pub="http://example.com/publisher">
<book>
<title>Example</title>
<pub:publisher>Example Press</pub:publisher>
</book>
</library>import xml.etree.ElementTree as ET
# Register namespace
ns = {
'lib': 'http://example.com/library',
'pub': 'http://example.com/publisher'
}
tree = ET.parse('library_with_ns.xml')
root = tree.getroot()
# Find with namespace
books = root.findall('lib:book', ns)
for book in books:
title = book.find('lib:title', ns).text
publisher = book.find('pub:publisher', ns).text
print(f"{title} - {publisher}")2. Processing Large Files (Streaming)
import xml.etree.ElementTree as ET
def process_large_xml(filename):
"""Memory-efficient parsing of large XML files"""
context = ET.iterparse(filename, events=('start', 'end'))
context = iter(context)
event, root = next(context)
for event, elem in context:
if event == 'end' and elem.tag == 'book':
# Process book element
title = elem.find('title').text
author = elem.find('author').text
print(f"{title} by {author}")
# Clear element to free memory
elem.clear()
root.clear()
# Process without loading entire file into memory
process_large_xml('huge_library.xml')3. XML to Dict Conversion
import xml.etree.ElementTree as ET
def xml_to_dict(element):
"""Convert XML element to dictionary"""
result = {}
# Add attributes
if element.attrib:
result['@attributes'] = element.attrib
# Add text content
if element.text and element.text.strip():
result['text'] = element.text.strip()
# Add children
for child in element:
child_data = xml_to_dict(child)
if child.tag in result:
# Multiple children with same tag -> list
if not isinstance(result[child.tag], list):
result[child.tag] = [result[child.tag]]
result[child.tag].append(child_data)
else:
result[child.tag] = child_data
return result
# Usage
tree = ET.parse('library.xml')
root = tree.getroot()
data = xml_to_dict(root)
import json
print(json.dumps(data, indent=2))4. Using lxml for XPath
lxml provides full XPath 1.0 support:
# First install: pip install lxml
from lxml import etree
tree = etree.parse('library.xml')
root = tree.getroot()
# Advanced XPath queries
# Find books with price > 10
expensive_books = root.xpath('//book[price > 10]/title/text()')
print(expensive_books)
# Find fiction books only
fiction_titles = root.xpath('//book[@category="fiction"]/title/text()')
print(fiction_titles)
# Get book count by category
programming_count = root.xpath('count(//book[@category="programming"])')
print(f"Programming books: {programming_count}")
# Find books published after 2000
modern_books = root.xpath('//book[year > 2000]')
for book in modern_books:
print(etree.tostring(book, pretty_print=True).decode())5. XML Validation
from lxml import etree
# Load XML Schema (XSD)
with open('library.xsd', 'r') as schema_file:
schema_root = etree.XML(schema_file.read())
schema = etree.XMLSchema(schema_root)
# Parse and validate XML
parser = etree.XMLParser(schema=schema)
try:
tree = etree.parse('library.xml', parser)
print("XML is valid!")
except etree.XMLSyntaxError as e:
print(f"Validation error: {e}")Best Practices
Use ElementTree for Most Tasks
It's built-in, lightweight, and sufficient for 90% of XML processing needs.
Always Handle Exceptions
XML parsing can fail. Wrap parse operations in try-except blocks.
Check for None Before Accessing
Use if elem is not None: before accessing .text or .attrib
Use Streaming for Large Files
Use iterparse() instead of parse() for large XML files.
Validate External XML
Always validate XML from untrusted sources to prevent security issues.
Use UTF-8 Encoding
Always specify encoding='utf-8' when writing XML files.
Pretty Print for Readability
Use ET.indent() (Python 3.9+) for human-readable output.
Consider Using lxml for Complex Tasks
If you need XPath, XSLT, or better performance, use lxml library.
❌ Common Mistakes to Avoid:
- ✗Not checking if element exists before accessing .text
- ✗Loading huge XML files entirely into memory
- ✗Forgetting to save changes after modifying XML
- ✗Not handling encoding issues with special characters
- ✗Using string concatenation to build XML (use ElementTree instead)
Additional Resources
Learn More:
Official Documentation:
Related Articles:
XML Tools:
- • XML Parser - Parse and analyze XML
- • XML Validator - Validate XML syntax
- • XML Formatter - Beautify XML code
- • XML to JSON - Convert XML to JSON
- • JSON to XML - Convert JSON to XML
- • XML Viewer - Visualize XML trees
- • XML Editor - Edit XML online
- • XML Minifier - Compress XML files
Python Resources:
- • Real Python XML Tutorial
- • DataCamp Python XML
- • lxml PyPI Package
- • lxml on GitHub - Source code and issues
- • Stack Overflow Python XML - Community Q&A
- • TutorialsPoint Python XML