XPath Tutorial: Complete Guide

Master XML navigation and querying with XPath expressions

Published: January 2025 • 15 min read

XPath (XML Path Language) is a query language for selecting nodes from XML documents. It provides a powerful, concise syntax to navigate XML structure, extract data, and test conditions. XPath is used in XSLT transformations, XML Schema assertions, web scraping, and many XML processing tasks.

This comprehensive tutorial covers XPath from basics to advanced techniques. You'll learn path expressions, axes, predicates, functions, and real-world patterns. By the end, you'll be able to write efficient XPath queries for any XML document.

What is XPath?

XPath treats an XML document as a tree of nodes. Each element, attribute, and text value is a node. XPath expressions navigate this tree to select specific nodes.

Sample XML Document

We'll use this XML throughout the tutorial:

XML
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book id="1" category="fiction">
    <title lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <year>1925</year>
    <price>10.99</price>
  </book>
  
  <book id="2" category="fiction">
    <title lang="en">1984</title>
    <author>George Orwell</author>
    <year>1949</year>
    <price>8.99</price>
  </book>
  
  <book id="3" category="programming">
    <title lang="en">Clean Code</title>
    <author>Robert C. Martin</author>
    <year>2008</year>
    <price>45.99</price>
  </book>
  
  <book id="4" category="programming">
    <title lang="en">Design Patterns</title>
    <author>Gang of Four</author>
    <year>1994</year>
    <price>54.99</price>
  </book>
</bookstore>

XPath Node Types:

  • Element nodes: <book>, <title>, <author>
  • Attribute nodes: id="1", category="fiction"
  • Text nodes: "The Great Gatsby", "10.99"
  • Comment nodes: <!-- comments -->
  • Document node: The root of the entire document
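
To make the tree model concrete, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree as the parser (the tutorial itself does not prescribe one) and a trimmed copy of the sample document. ElementTree has no separate attribute or text node objects and drops comments by default, so attributes and text are read off each element.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (first book only).
XML = """
<bookstore>
  <book id="1" category="fiction">
    <title lang="en">The Great Gatsby</title>
    <price>10.99</price>
  </book>
</bookstore>
"""

root = ET.fromstring(XML)

# Walk the tree in document order and record what each element carries.
nodes = []
for elem in root.iter():
    text = (elem.text or "").strip()  # whitespace-only text becomes ""
    nodes.append((elem.tag, dict(elem.attrib), text))

for tag, attrs, text in nodes:
    print(tag, attrs, repr(text))
```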

Basic Path Expressions

XPath uses path expressions similar to file system paths. The / character separates path steps, and each expression is evaluated either from the document root or from the current node.

Absolute Paths (from root)

XPath Expression         Selects
/bookstore               Root <bookstore> element
/bookstore/book          All <book> elements (4 books)
/bookstore/book/title    All <title> elements (4 titles)
/bookstore/book[1]       First <book> element only

Relative Paths

Relative paths start from the current node (no leading /).

XPath Expression    Selects (from current node)
book                All <book> children
book/title          All <title> grandchildren
.                   Current node
..                  Parent node
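
The relative paths above can be tried with Python's standard xml.etree.ElementTree, which is an assumed choice of tool here. It supports only a subset of XPath and always evaluates paths relative to an element, so with the <bookstore> element as the current node, /bookstore/book corresponds to the relative path book. A minimal sketch on a trimmed copy of the sample document:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (ids and titles only).
XML = """
<bookstore>
  <book id="1"><title>The Great Gatsby</title></book>
  <book id="2"><title>1984</title></book>
  <book id="3"><title>Clean Code</title></book>
  <book id="4"><title>Design Patterns</title></book>
</bookstore>
"""

root = ET.fromstring(XML)  # current node: the <bookstore> element

# "book": <book> children of the current node.
books = root.findall("book")

# "book/title": <title> grandchildren of the current node.
titles = [t.text for t in root.findall("book/title")]

# ".": the current node itself.
current = root.find(".")

# "..": one step up; each <title>'s parent is its <book>.
parents = root.findall("book/title/..")

print(len(books), titles, current.tag, len(parents))
```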

Descendant Selector //

// selects nodes anywhere in the document, regardless of depth.

XPath Expression    Selects
//book              All <book> elements anywhere
//title             All <title> elements at any depth
//book/title        All <title> elements that are children of a <book>

Wildcard *

* matches any element node.

XPath Expression    Selects
/bookstore/*        All children of <bookstore> (all books)
//book/*            All children of any <book> (titles, authors, etc.)
//*                 All elements in the document
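
As a rough illustration of // and *, the sketch below uses Python's standard xml.etree.ElementTree (an assumed tool that implements a subset of XPath). Its paths are relative to an element, so //title is written .//title. One difference worth knowing: .//* returns only the descendants of the starting element, whereas XPath's //* also includes the root element itself.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (two books, two children each).
XML = """
<bookstore>
  <book id="1"><title>The Great Gatsby</title><price>10.99</price></book>
  <book id="2"><title>1984</title><price>8.99</price></book>
</bookstore>
"""

root = ET.fromstring(XML)

# //title: every <title> at any depth.
all_titles = [t.text for t in root.findall(".//title")]

# /bookstore/*: every child element of <bookstore>.
root_children = [c.tag for c in root.findall("*")]

# //book/*: every child element of any <book>.
book_children = [c.tag for c in root.findall(".//book/*")]

# .//*: all descendant elements of <bookstore> (the root itself is excluded).
descendants = root.findall(".//*")

print(all_titles, root_children, book_children, len(descendants))
```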

Predicates: Filtering Results

Predicates filter node selections using conditions inside square brackets [].

Position Predicates

XPath Expression        Selects
//book[1]               First book
//book[last()]          Last book (4th book)
//book[position()<3]    First 2 books
//book[position()>2]    Books 3 and 4

⚠ Important: XPath uses 1-based indexing

Unlike most programming languages (0-indexed), XPath counts from 1. [1] is the first element, not the second.
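
The contrast with 0-based indexing is easy to see side by side. This sketch assumes Python's standard xml.etree.ElementTree, which understands [1] and [last()] but not position() comparisons, so the "first 2 books" case is mirrored with a list slice instead.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (ids only).
XML = """
<bookstore>
  <book id="1"/>
  <book id="2"/>
  <book id="3"/>
  <book id="4"/>
</bookstore>
"""

root = ET.fromstring(XML)
books = root.findall("book")  # a plain Python list, indexed from 0

# XPath [1] is the first book; the same element is books[0] in Python.
first = root.find("book[1]")
same_element = first is books[0]

# XPath [last()] is the last book; in Python that is books[-1].
last = root.find("book[last()]")

# position()<3 is not supported by ElementTree; a slice gives the first 2 books.
first_two_ids = [b.get("id") for b in books[:2]]

print(first.get("id"), last.get("id"), first_two_ids, same_element)
```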

Attribute Predicates

XPath Expression                Selects
//book[@id]                     All books with an id attribute
//book[@id='1']                 Book with id="1"
//book[@category='fiction']     All fiction books
//title[@lang='en']             All English titles
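
Attribute predicates are among the XPath features that Python's standard xml.etree.ElementTree supports directly. The sketch below assumes that module and a trimmed copy of the sample document; only the leading // is rewritten as .// because ElementTree paths are relative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (attributes and titles only).
XML = """
<bookstore>
  <book id="1" category="fiction"><title lang="en">The Great Gatsby</title></book>
  <book id="2" category="fiction"><title lang="en">1984</title></book>
  <book id="3" category="programming"><title lang="en">Clean Code</title></book>
  <book id="4" category="programming"><title lang="en">Design Patterns</title></book>
</bookstore>
"""

root = ET.fromstring(XML)

# //book[@id]: books that have an id attribute at all.
with_id = root.findall(".//book[@id]")

# //book[@id='1']: the book whose id equals "1".
book_one = root.find(".//book[@id='1']")

# //book[@category='fiction']: all fiction books.
fiction = [b.findtext("title") for b in root.findall(".//book[@category='fiction']")]

# //title[@lang='en']: all English titles.
english = root.findall(".//title[@lang='en']")

print(len(with_id), book_one.findtext("title"), fiction, len(english))
```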

Value Comparisons

XPath Expression                  Selects
//book[price>20]                  Books with price > 20
//book[price<=10]                 Books with price ≤ 10
//book[year>2000]                 Books published after 2000
//book[author='George Orwell']    Books by George Orwell

Multiple Conditions (AND / OR)

<!-- AND: Both conditions must be true -->
//book[price>20 and @category='programming']
//  Result: Books that are programming AND price > 20
//  Matches: Clean Code ($45.99), Design Patterns ($54.99)

<!-- OR: Either condition can be true -->
//book[year<1950 or year>2000]
//  Result: Books published before 1950 OR after 2000
//  Matches: The Great Gatsby (1925), 1984 (1949), Clean Code (2008)

<!-- Complex: Combine multiple conditions -->
//book[@category='fiction' and price<10]
//  Result: Fiction books under $10
//  Matches: 1984 ($8.99)
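
The expected matches above can be cross-checked without a full XPath engine. Python's standard xml.etree.ElementTree (assumed here) does not evaluate numeric comparisons or and/or inside predicates, so this sketch selects the books with a simple path and applies the same conditions in Python. The helper name num is made up for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (category, title, year, price).
XML = """
<bookstore>
  <book category="fiction">
    <title>The Great Gatsby</title><year>1925</year><price>10.99</price>
  </book>
  <book category="fiction">
    <title>1984</title><year>1949</year><price>8.99</price>
  </book>
  <book category="programming">
    <title>Clean Code</title><year>2008</year><price>45.99</price>
  </book>
  <book category="programming">
    <title>Design Patterns</title><year>1994</year><price>54.99</price>
  </book>
</bookstore>
"""

books = ET.fromstring(XML).findall("book")


def num(book, tag):
    """Hypothetical helper: read a child element's text as a number."""
    return float(book.findtext(tag))


# Mirrors //book[price>20 and @category='programming']
pricey_programming = [b.findtext("title") for b in books
                      if num(b, "price") > 20 and b.get("category") == "programming"]

# Mirrors //book[year<1950 or year>2000]
old_or_recent = [b.findtext("title") for b in books
                 if num(b, "year") < 1950 or num(b, "year") > 2000]

# Mirrors //book[@category='fiction' and price<10]
cheap_fiction = [b.findtext("title") for b in books
                 if b.get("category") == "fiction" and num(b, "price") < 10]

print(pricey_programming, old_or_recent, cheap_fiction)
```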

XPath Axes: Navigating Relationships

Axes define the relationship between the current node and the nodes you want to select. They provide precise control over navigation.

Common Axes

Axis                   Description                                        Example
child::                Direct children (default)                          child::book
descendant::           All descendants (children, grandchildren, etc.)    descendant::title
parent::               Parent node                                        parent::bookstore
ancestor::             All ancestors (parent, grandparent, etc.)          ancestor::*
following-sibling::    Siblings after current node                        following-sibling::book
preceding-sibling::    Siblings before current node                       preceding-sibling::book
attribute::            Attributes of current node                         attribute::id

Axis Shortcuts

XPath provides shortcuts for commonly used axes:

Shortcut    Full Form                                   Description
book        child::book                                 child:: is default
@id         attribute::id                               @ selects attributes
//title     /descendant-or-self::node()/child::title    // descendant shortcut
.           self::node()                                Current node
..          parent::node()                              Parent node

<!-- Example: From a <title> node, find the parent book's price -->

<!-- Starting from: <title>The Great Gatsby</title> -->

<!-- Method 1: Using parent axis -->
parent::book/price
//  Result: <price>10.99</price>

<!-- Method 2: Using .. shortcut -->
../price
//  Result: <price>10.99</price>

<!-- Example: Find all books after the first one -->
//book[1]/following-sibling::book
//  Result: books with id 2, 3, 4
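
Not every engine implements named axes. As an illustration, Python's standard xml.etree.ElementTree (an assumed tool, Python 3.7 or later for the [.='text'] predicate) supports the .. shortcut but not parent:: or following-sibling::, so the sibling case below is emulated by slicing the list of children.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (ids, titles, prices).
XML = """
<bookstore>
  <book id="1"><title>The Great Gatsby</title><price>10.99</price></book>
  <book id="2"><title>1984</title><price>8.99</price></book>
  <book id="3"><title>Clean Code</title><price>45.99</price></book>
  <book id="4"><title>Design Patterns</title><price>54.99</price></book>
</bookstore>
"""

root = ET.fromstring(XML)

# Select the Gatsby <title>, step up to its <book> with "..", then down to <price>.
gatsby_price = root.find(".//title[.='The Great Gatsby']/../price").text

# Emulates //book[1]/following-sibling::book: the siblings after the first book.
books = root.findall("book")
first = books[0]
following_ids = [b.get("id") for b in books[books.index(first) + 1:]]

print(gatsby_price, following_ids)
```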

XPath Functions

XPath includes built-in functions for string manipulation, numeric operations, and boolean logic.

String Functions

<!-- contains(): Check if string contains substring -->
//book[contains(author, 'Orwell')]
//  Result: Books by authors containing "Orwell"

<!-- starts-with(): Check string prefix -->
//book[starts-with(title, 'The')]
//  Result: "The Great Gatsby"

<!-- string-length(): Get string length -->
//book[string-length(title) > 15]
//  Result: Books with titles longer than 15 characters (here: The Great Gatsby, 16 characters)

<!-- substring(): Extract substring -->
substring(//book[1]/title, 1, 3)
//  Result: "The" (first 3 characters)

<!-- concat(): Concatenate strings -->
concat(//book[1]/author, ' - ', //book[1]/title)
//  Result: "F. Scott Fitzgerald - The Great Gatsby"

<!-- normalize-space(): Remove extra whitespace -->
normalize-space('  Clean  Code  ')
//  Result: "Clean Code"

<!-- translate(): Character replacement -->
translate(//book[1]/title, 'aeiou', 'AEIOU')
//  Result: "ThE GrEAt GAtsby" (lowercase vowels to uppercase)
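
The results listed for the string functions can be sanity-checked with plain Python string operations. This is only a cross-check under the assumption that the Python operation behaves like the XPath function for these inputs; it is not an XPath evaluation.

```python
# Values copied from the first book of the sample document.
title = "The Great Gatsby"
author = "F. Scott Fitzgerald"

# contains(author, 'Orwell') for the second book's author
has_orwell = "Orwell" in "George Orwell"

# starts-with(title, 'The')
starts_with_the = title.startswith("The")

# string-length(title)
title_length = len(title)

# substring(title, 1, 3): XPath counts characters from 1, Python slices from 0
first_three = title[0:3]

# concat(author, ' - ', title)
joined = author + " - " + title

# normalize-space('  Clean  Code  '): trim and collapse runs of whitespace
normalized = " ".join("  Clean  Code  ".split())

# translate(title, 'aeiou', 'AEIOU'): character-by-character replacement
translated = title.translate(str.maketrans("aeiou", "AEIOU"))

print(has_orwell, starts_with_the, title_length, first_three)
print(joined, normalized, translated, sep=" | ")
```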

Numeric Functions

<!-- sum(): Add values -->
sum(//book/price)
//  Result: 120.96 (10.99 + 8.99 + 45.99 + 54.99)

<!-- count(): Count nodes -->
count(//book)
//  Result: 4

<!-- number(): Convert to number -->
//book[number(year) > 2000]
//  Result: Books after year 2000

<!-- floor(), ceiling(), round() -->
floor(45.99)   //  Result: 45
ceiling(45.99) //  Result: 46
round(45.99)   //  Result: 46
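
The numeric results above can be reproduced in Python as a quick check. The sketch assumes the standard xml.etree.ElementTree module for selecting the nodes and does the arithmetic in Python, since that module does not evaluate XPath functions. Note that XPath's round() rounds halves toward positive infinity, which differs from Python's built-in round() on values such as 2.5.

```python
import math
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (prices only).
XML = """
<bookstore>
  <book><price>10.99</price></book>
  <book><price>8.99</price></book>
  <book><price>45.99</price></book>
  <book><price>54.99</price></book>
</bookstore>
"""

root = ET.fromstring(XML)

# sum(//book/price): rounded to 2 decimals to hide binary floating-point noise.
prices = [float(p.text) for p in root.findall("book/price")]
total = round(sum(prices), 2)

# count(//book)
book_count = len(root.findall("book"))

# floor(45.99), ceiling(45.99), round(45.99)
floored = math.floor(45.99)
ceiled = math.ceil(45.99)
rounded = math.floor(45.99 + 0.5)  # XPath-style rounding (halves go up)

print(total, book_count, floored, ceiled, rounded)
```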

Boolean Functions

<!-- not(): Logical NOT -->
//book[not(@category='fiction')]
//  Result: Programming books (non-fiction)

<!-- true() / false(): Boolean literals -->
//book[price > 20 and true()]
//  Result: Books over $20

<!-- boolean(): Convert to boolean -->
//book[boolean(@id)]
//  Result: All books with id attribute

Node Functions

<!-- name(): Get element name -->
//book[1]/*[name()='title']
//  Result: <title> element

<!-- position(): Current position -->
//book[position() mod 2 = 0]
//  Result: Even-positioned books (2nd, 4th)

<!-- last(): Last position -->
//book[position() = last()]
//  Result: Last book

<!-- text(): Select text nodes (a node test rather than a function) -->
//book[1]/title/text()
//  Result: "The Great Gatsby"

Real-World XPath Patterns

Find Most Expensive Books

<!-- Books more expensive than $50 -->
//book[price > 50]
//  Result: Design Patterns ($54.99)

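<!-- Single most expensive book: no other book has a higher price -->
//book[not(price < //book/price)]
//  Result: Design Patterns ($54.99)
//  (Top-N queries need ranking or sorting, which a single XPath 1.0 path cannot
//   express cleanly; use XPath 2.0+ or sort in the host language)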
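
When the XPath engine cannot rank results, one option is to select the nodes with a simple path and sort them in the host language. The sketch below assumes Python with the standard xml.etree.ElementTree module; it is one possible approach, not the only one.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (titles and prices).
XML = """
<bookstore>
  <book><title>The Great Gatsby</title><price>10.99</price></book>
  <book><title>1984</title><price>8.99</price></book>
  <book><title>Clean Code</title><price>45.99</price></book>
  <book><title>Design Patterns</title><price>54.99</price></book>
</bookstore>
"""

root = ET.fromstring(XML)

# Select every book with a plain path, then rank by numeric price, highest first.
ranked = sorted(root.findall("book"),
                key=lambda b: float(b.findtext("price")),
                reverse=True)

# Keep the two most expensive titles.
top_two = [b.findtext("title") for b in ranked[:2]]

print(top_two)
```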

Group by Category

<!-- All fiction books -->
//book[@category='fiction']

<!-- All programming books -->
//book[@category='programming']

<!-- Count books per category -->
count(//book[@category='fiction'])
//  Result: 2

Complex Filtering

<!-- Fiction books under $10 -->
//book[@category='fiction' and price < 10]
//  Result: 1984 ($8.99)

<!-- Books from 1900s (1900-1999) -->
//book[year >= 1900 and year < 2000]
//  Result: The Great Gatsby (1925), 1984 (1949), Design Patterns (1994)

<!-- Books with specific title pattern -->
//book[contains(title, 'Code') or contains(title, 'Pattern')]
//  Result: Clean Code, Design Patterns

Web Scraping Pattern

XPath is commonly used with Selenium, lxml, or Scrapy for web scraping. (BeautifulSoup does not evaluate XPath itself; it relies on CSS selectors and its own search methods.)

<!-- Extract all product prices -->
//div[@class='product']//span[@class='price']

<!-- Find "Add to Cart" button for specific product -->
//div[contains(text(),'iPhone 15')]//button[text()='Add to Cart']

<!-- Get all links in navigation menu -->
//nav[@id='main-menu']//a/@href

<!-- Extract table data -->
//table[@id='results']//tr[position()>1]/td[2]

<!-- Find element by partial text -->
//button[contains(text(), 'Submit')]
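
A small sketch of the first and third patterns, run against a made-up, well-formed page fragment. It assumes Python's standard xml.etree.ElementTree, which only parses well-formed XML and cannot select attribute nodes, so the href values are read with .get(). Real pages are rarely well-formed, which is why a dedicated HTML parser is normally used instead; the markup and values below are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment for illustration only.
PAGE = """
<html><body>
  <nav id="main-menu"><a href="/home">Home</a><a href="/shop">Shop</a></nav>
  <div class="product"><h2>Widget</h2><span class="price">19.99</span></div>
  <div class="product"><h2>Gadget</h2><span class="price">29.99</span></div>
</body></html>
"""

page = ET.fromstring(PAGE)

# //div[@class='product']//span[@class='price']
prices = [s.text for s in
          page.findall(".//div[@class='product']//span[@class='price']")]

# //nav[@id='main-menu']//a/@href, with the attribute read in Python
links = [a.get("href") for a in page.findall(".//nav[@id='main-menu']//a")]

print(prices, links)
```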

XPath Best Practices

Use Specific Paths When Possible

Prefer /bookstore/book/title over //title for better performance

Avoid Overly Complex Expressions

Break complex queries into multiple steps or use XSLT variables

Use Predicates to Filter Early

//book[@category='fiction']/title is more efficient than filtering later

Test XPath in Browser DevTools

Chrome/Firefox console: $x("//book[@category='fiction']")

Handle Namespaces Properly

XML with namespaces requires namespace-aware XPath queries

Don't Rely on Position Alone

//book[3] breaks if document structure changes. Prefer //book[@id='3']

Avoid //* in Production

Selecting all elements is slow on large documents
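
For the namespace advice above, here is a minimal sketch of a namespace-aware query. It assumes Python's standard xml.etree.ElementTree and an invented namespace URI (http://example.com/books). The prefix used in the query is mapped to the URI in a dictionary and does not have to match the prefix used in the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaced variant of the sample document.
XML = """
<bk:bookstore xmlns:bk="http://example.com/books">
  <bk:book id="2"><bk:title>1984</bk:title></bk:book>
</bk:bookstore>
"""

root = ET.fromstring(XML)

# An unprefixed path looks for elements in no namespace and finds nothing.
unprefixed = root.findall("book")

# Map a query prefix (here "b") to the namespace URI and use it in every step.
ns = {"b": "http://example.com/books"}
titles = [t.text for t in root.findall("b:book/b:title", ns)]

print(unprefixed, titles)
```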

Conclusion

XPath is an essential tool for working with XML documents. Whether you're parsing data, transforming documents with XSLT, scraping websites, or validating XML schemas, XPath provides a powerful and concise way to navigate and query XML structure.

Start with basic path expressions and gradually master predicates, axes, and functions. Practice on real XML documents to build intuition. Use browser DevTools to test queries interactively. With XPath in your toolkit, you can efficiently extract, transform, and validate XML data in any project.