XPath (XML Path Language) is a query language for selecting nodes from XML documents. It provides a powerful, concise syntax to navigate XML structure, extract data, and test conditions. XPath is used in XSLT transformations, XML Schema assertions, web scraping, and many XML processing tasks.

This comprehensive tutorial covers XPath from basics to advanced techniques. You'll learn path expressions, axes, predicates, functions, and real-world patterns. By the end, you'll be able to write efficient XPath queries for any XML document.

What is XPath?

XPath treats an XML document as a tree of nodes. Each element, attribute, and text value is a node. XPath expressions navigate this tree to select specific nodes.

Sample XML Document

We'll use this XML throughout the tutorial:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book id="1" category="fiction">
    <title lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <year>1925</year>
    <price>10.99</price>
  </book>
  
  <book id="2" category="fiction">
    <title lang="en">1984</title>
    <author>George Orwell</author>
    <year>1949</year>
    <price>8.99</price>
  </book>
  
  <book id="3" category="programming">
    <title lang="en">Clean Code</title>
    <author>Robert C. Martin</author>
    <year>2008</year>
    <price>45.99</price>
  </book>
  
  <book id="4" category="programming">
    <title lang="en">Design Patterns</title>
    <author>Gang of Four</author>
    <year>1994</year>
    <price>54.99</price>
  </book>
</bookstore>

XPath Node Types:

• Element nodes: <book>, <title>, <author>
• Attribute nodes: id="1", category="fiction"
• Text nodes: "The Great Gatsby", "10.99"
• Comment nodes: <!-- ... --> (the sample document contains none)
• Document node: The root of the entire document

Basic Path Expressions

XPath uses path expressions similar to file system paths. / separates path steps, and you navigate from the document root or current node.

Absolute Paths (from root)

XPath Expression	Selects
/bookstore	Root `<bookstore>` element
/bookstore/book	All `<book>` elements (4 books)
/bookstore/book/title	All `<title>` elements (4 titles)
/bookstore/book[1]	First `<book>` element only
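
The paths in this table can be tried from a script. The following is a minimal sketch that assumes Python's standard xml.etree.ElementTree module, which implements only a subset of XPath and evaluates paths relative to an element rather than from the document root. The XML string is an abridged copy of the sample document.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample document (titles only).
XML = """<bookstore>
  <book id="1"><title>The Great Gatsby</title></book>
  <book id="2"><title>1984</title></book>
  <book id="3"><title>Clean Code</title></book>
  <book id="4"><title>Design Patterns</title></book>
</bookstore>"""

root = ET.fromstring(XML)  # root is the <bookstore> element itself

# ElementTree paths are relative to the element they are called on,
# so /bookstore/book becomes "book" when evaluated from the root.
books = root.findall("book")                           # /bookstore/book
titles = [t.text for t in root.findall("book/title")]  # /bookstore/book/title
first = root.find("book[1]")                           # /bookstore/book[1]

print(len(books))       # 4
print(titles)           # ['The Great Gatsby', '1984', 'Clean Code', 'Design Patterns']
print(first.get("id"))  # 1
```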

Relative Paths

Relative paths start from the current node (no leading /).

XPath Expression	Selects (from current node)
book	All `<book>` children
book/title	All `<title>` grandchildren
.	Current node
..	Parent node

Descendant Selector //

// selects nodes anywhere in the document, regardless of depth.

XPath Expression	Selects
//book	All `<book>` elements anywhere
//title	All `<title>` elements at any depth
//book/title	All `<title>` that are children of `<book>`
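
As a rough illustration of the descendant selector, here is a sketch using Python's standard xml.etree.ElementTree (an assumption; any XPath engine would do). ElementTree requires a leading "." so that the search starts at the element it is called on. The XML is an abridged copy of the sample, so element counts differ from the full document.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample document (titles only).
XML = """<bookstore>
  <book id="1"><title>The Great Gatsby</title></book>
  <book id="2"><title>1984</title></book>
  <book id="3"><title>Clean Code</title></book>
  <book id="4"><title>Design Patterns</title></book>
</bookstore>"""

root = ET.fromstring(XML)

all_titles = root.findall(".//title")        # //title
book_titles = root.findall(".//book/title")  # //book/title
all_elements = list(root.iter())             # //* (the root element is included)

print(len(all_titles))    # 4
print(len(book_titles))   # 4
print(len(all_elements))  # 9 in this abridged copy: 1 bookstore + 4 books + 4 titles
```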

Wildcard *

* matches any element node.

XPath Expression	Selects
/bookstore/*	All children of `<bookstore>` (all books)
//book/*	All children of any `<book>` (titles, authors, etc.)
//*	All elements in the document

Predicates: Filtering Results

Predicates filter node selections using conditions inside square brackets [].

Position Predicates

XPath Expression	Selects
//book[1]	First book
//book[last()]	Last book (4th book)
//book[position()<3]	First 2 books
//book[position()>2]	Books 3 and 4

⚠ Important: XPath uses 1-based indexing

Unlike most programming languages, which index from 0, XPath counts from 1: [1] is the first element, not the second.
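
The off-by-one difference is easy to see next to a 0-indexed host language. A small sketch, assuming Python's standard xml.etree.ElementTree, which supports integer positions, last(), and last()-N but not position() comparisons (list slicing stands in for those here):

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample document.
XML = """<bookstore>
  <book id="1"><title>The Great Gatsby</title></book>
  <book id="2"><title>1984</title></book>
  <book id="3"><title>Clean Code</title></book>
  <book id="4"><title>Design Patterns</title></book>
</bookstore>"""

root = ET.fromstring(XML)
books = root.findall("book")  # a plain Python list, indexed from 0

first = root.find("book[1]")              # XPath position 1
last = root.find("book[last()]")          # XPath last()
second_last = root.find("book[last()-1]")

# XPath [1] corresponds to Python index 0.
print(first is books[0])   # True
print(last is books[-1])   # True
print(first.get("id"), last.get("id"), second_last.get("id"))  # 1 4 3

# position()<3 is not available in ElementTree; slicing gives the same books.
first_two = [b.get("id") for b in books[:2]]
print(first_two)           # ['1', '2']
```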

Attribute Predicates

XPath Expression	Selects
//book[@id]	All books with an `id` attribute
//book[@id='1']	Book with `id="1"`
//book[@category='fiction']	All fiction books
//title[@lang='en']	All English titles
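
Attribute predicates are among the parts of XPath that Python's standard xml.etree.ElementTree supports directly, so the rows above can be checked with a short sketch (the choice of library is an assumption; the XML is an abridged copy of the sample):

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample document.
XML = """<bookstore>
  <book id="1" category="fiction"><title lang="en">The Great Gatsby</title></book>
  <book id="2" category="fiction"><title lang="en">1984</title></book>
  <book id="3" category="programming"><title lang="en">Clean Code</title></book>
  <book id="4" category="programming"><title lang="en">Design Patterns</title></book>
</bookstore>"""

root = ET.fromstring(XML)

with_id = root.findall(".//book[@id]")                  # //book[@id]
book_one = root.find(".//book[@id='1']")                # //book[@id='1']
fiction = root.findall(".//book[@category='fiction']")  # //book[@category='fiction']
english = root.findall(".//title[@lang='en']")          # //title[@lang='en']

fiction_titles = [b.findtext("title") for b in fiction]

print(len(with_id))               # 4
print(book_one.findtext("title")) # The Great Gatsby
print(fiction_titles)             # ['The Great Gatsby', '1984']
print(len(english))               # 4
```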

Value Comparisons

XPath Expression	Selects
//book[price>20]	Books with price > 20
//book[price<=10]	Books with price ≤ 10
//book[year>2000]	Books published after 2000
//book[author='George Orwell']	Books by George Orwell
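
In these comparisons XPath converts the element text to a number automatically. The sketch below assumes Python's standard xml.etree.ElementTree, which supports the text-equality predicate but not numeric comparisons, so the numeric filters are written in Python with an explicit float() conversion instead.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample document.
XML = """<bookstore>
  <book><title>The Great Gatsby</title><author>F. Scott Fitzgerald</author><price>10.99</price></book>
  <book><title>1984</title><author>George Orwell</author><price>8.99</price></book>
  <book><title>Clean Code</title><author>Robert C. Martin</author><price>45.99</price></book>
  <book><title>Design Patterns</title><author>Gang of Four</author><price>54.99</price></book>
</bookstore>"""

root = ET.fromstring(XML)
books = root.findall("book")

# //book[author='George Orwell'] works as written (relative to the root).
orwell = [b.findtext("title") for b in root.findall("book[author='George Orwell']")]

# //book[price>20] and //book[price<=10]: filtered in Python, converting text to float.
over_20 = [b.findtext("title") for b in books if float(b.findtext("price")) > 20]
up_to_10 = [b.findtext("title") for b in books if float(b.findtext("price")) <= 10]

print(orwell)    # ['1984']
print(over_20)   # ['Clean Code', 'Design Patterns']
print(up_to_10)  # ['1984']
```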

Multiple Conditions (AND / OR)

<!-- AND: Both conditions must be true -->
//book[price>20 and @category='programming']
//  Result: Books that are programming AND price > 20
//  Matches: Clean Code ($45.99), Design Patterns ($54.99)

<!-- OR: Either condition can be true -->
//book[year<1950 or year>2000]
//  Result: Books published before 1950 OR after 2000
//  Matches: The Great Gatsby (1925), 1984 (1949), Clean Code (2008)

<!-- Complex: Combine multiple conditions -->
//book[@category='fiction' and price<10]
//  Result: Fiction books under $10
//  Matches: 1984 ($8.99)
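
The matches listed above can be double-checked outside an XPath engine. This is a sketch under the assumption that the conditions are re-expressed as Python boolean logic over elements parsed with the standard xml.etree.ElementTree module (which has no and/or operators of its own); price() and year() are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample document.
XML = """<bookstore>
  <book category="fiction"><title>The Great Gatsby</title><year>1925</year><price>10.99</price></book>
  <book category="fiction"><title>1984</title><year>1949</year><price>8.99</price></book>
  <book category="programming"><title>Clean Code</title><year>2008</year><price>45.99</price></book>
  <book category="programming"><title>Design Patterns</title><year>1994</year><price>54.99</price></book>
</bookstore>"""

books = ET.fromstring(XML).findall("book")

def price(book):
    """Numeric value of the <price> child (hypothetical helper)."""
    return float(book.findtext("price"))

def year(book):
    """Numeric value of the <year> child (hypothetical helper)."""
    return int(book.findtext("year"))

# //book[price>20 and @category='programming']
and_match = [b.findtext("title") for b in books
             if price(b) > 20 and b.get("category") == "programming"]

# //book[year<1950 or year>2000]
or_match = [b.findtext("title") for b in books
            if year(b) < 1950 or year(b) > 2000]

# //book[@category='fiction' and price<10]
cheap_fiction = [b.findtext("title") for b in books
                 if b.get("category") == "fiction" and price(b) < 10]

print(and_match)      # ['Clean Code', 'Design Patterns']
print(or_match)       # ['The Great Gatsby', '1984', 'Clean Code']
print(cheap_fiction)  # ['1984']
```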

XPath Axes: Navigating Relationships

Axes define the relationship between the current node and the nodes you want to select. They provide precise control over navigation.

Common Axes

Axis	Description	Example
child::	Direct children (default)	child::book
descendant::	All descendants (children, grandchildren, etc.)	descendant::title
parent::	Parent node	parent::bookstore
ancestor::	All ancestors (parent, grandparent, etc.)	ancestor::*
following-sibling::	Siblings after current node	following-sibling::book
preceding-sibling::	Siblings before current node	preceding-sibling::book
attribute::	Attributes of current node	attribute::id

Axis Shortcuts

XPath provides shortcuts for commonly used axes:

Shortcut	Full Form	Description
book	child::book	child:: is default
@id	attribute::id	@ selects attributes
//title	/descendant-or-self::node()/child::title	// is the descendant shortcut
.	self::node()	Current node
..	parent::node()	Parent node

<!-- Example: From a <title> node, find the parent book's price -->

<!-- Starting from: <title>The Great Gatsby</title> -->

<!-- Method 1: Using parent axis -->
parent::book/price
//  Result: <price>10.99</price>

<!-- Method 2: Using .. shortcut -->
../price
//  Result: <price>10.99</price>

<!-- Example: Find all books after the first one -->
//book[1]/following-sibling::book
//  Result: books with id 2, 3, 4
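
Not every engine implements every axis. As an illustration, the sketch below assumes Python's standard xml.etree.ElementTree, which supports the ".." parent step inside a path but has no following-sibling axis; the sibling lookup is emulated with list slicing, which is a stand-in rather than real axis support.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample document.
XML = """<bookstore>
  <book id="1"><title>The Great Gatsby</title><price>10.99</price></book>
  <book id="2"><title>1984</title><price>8.99</price></book>
  <book id="3"><title>Clean Code</title><price>45.99</price></book>
  <book id="4"><title>Design Patterns</title><price>54.99</price></book>
</bookstore>"""

root = ET.fromstring(XML)

# Parent axis: step from the first <title> up to its <book>, then down to <price>.
# Elements do not store a parent reference, so ".." must be part of a path
# evaluated from an ancestor (here, the root).
parent_book = root.find("book/title/..")
price_text = root.findtext("book/title/../price")

# //book[1]/following-sibling::book, emulated: the children after the first book.
children = list(root)
first_book = root.find("book[1]")
following = children[children.index(first_book) + 1:]
following_ids = [b.get("id") for b in following if b.tag == "book"]

print(parent_book.get("id"))  # 1
print(price_text)             # 10.99
print(following_ids)          # ['2', '3', '4']
```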

XPath Functions

XPath includes built-in functions for string manipulation, numeric operations, and boolean logic.

String Functions

<!-- contains(): Check if string contains substring -->
//book[contains(author, 'Orwell')]
//  Result: Books by authors containing "Orwell"

<!-- starts-with(): Check string prefix -->
//book[starts-with(title, 'The')]
//  Result: "The Great Gatsby"

<!-- string-length(): Get string length -->
//book[string-length(title) > 15]
//  Result: Books with titles longer than 15 characters (only "The Great Gatsby", 16 characters; "Design Patterns" is exactly 15)

<!-- substring(): Extract substring -->
substring(//book[1]/title, 1, 3)
//  Result: "The" (first 3 characters)

<!-- concat(): Concatenate strings -->
concat(//book[1]/author, ' - ', //book[1]/title)
//  Result: "F. Scott Fitzgerald - The Great Gatsby"

<!-- normalize-space(): Remove extra whitespace -->
normalize-space('  Clean  Code  ')
//  Result: "Clean Code"

<!-- translate(): Character replacement -->
translate(//book[1]/title, 'aeiou', 'AEIOU')
//  Result: "ThE GrEAt GAtsby" (lowercase vowels to uppercase)
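
For readers who know a general-purpose language better than XPath, the string functions above map closely onto ordinary string operations. A sketch of rough Python equivalents, assuming the values are first read with the standard xml.etree.ElementTree module (the XML is an abridged copy of the sample); note that substring() counts characters from 1 while Python slices count from 0.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample document (first two books).
XML = """<bookstore>
  <book><title>The Great Gatsby</title><author>F. Scott Fitzgerald</author></book>
  <book><title>1984</title><author>George Orwell</author></book>
</bookstore>"""

root = ET.fromstring(XML)
title = root.findtext("book[1]/title")
author = root.findtext("book[1]/author")

# contains(author, 'Orwell')
by_orwell = [b.findtext("title") for b in root.findall("book")
             if "Orwell" in b.findtext("author")]
# starts-with(title, 'The')
starts = title.startswith("The")
# string-length(title)
length = len(title)
# substring(title, 1, 3): XPath is 1-based, Python slices are 0-based
prefix = title[0:3]
# concat(author, ' - ', title)
joined = author + " - " + title
# normalize-space('  Clean  Code  ')
normalized = " ".join("  Clean  Code  ".split())
# translate(title, 'aeiou', 'AEIOU')
translated = title.translate(str.maketrans("aeiou", "AEIOU"))

print(by_orwell)   # ['1984']
print(starts)      # True
print(length)      # 16
print(prefix)      # The
print(joined)      # F. Scott Fitzgerald - The Great Gatsby
print(normalized)  # Clean Code
print(translated)  # ThE GrEAt GAtsby
```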

Numeric Functions

<!-- sum(): Add values -->
sum(//book/price)
//  Result: 120.96 (10.99 + 8.99 + 45.99 + 54.99)

<!-- count(): Count nodes -->
count(//book)
//  Result: 4

<!-- number(): Convert to number -->
//book[number(year) > 2000]
//  Result: Books after year 2000

<!-- floor(), ceiling(), round() -->
floor(45.99)   //  Result: 45
ceiling(45.99) //  Result: 46
round(45.99)   //  Result: 46
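
The numeric results above can be reproduced in a host language, with one caveat worth knowing: XPath's round() resolves ties toward positive infinity, while Python's built-in round() rounds ties to the nearest even integer. The sketch below assumes Python with the standard xml.etree.ElementTree module; xpath_round is a hypothetical helper that mimics the XPath rule for ordinary finite numbers.

```python
import math
import xml.etree.ElementTree as ET

# Abridged copy of the sample document (prices only).
XML = """<bookstore>
  <book><price>10.99</price></book>
  <book><price>8.99</price></book>
  <book><price>45.99</price></book>
  <book><price>54.99</price></book>
</bookstore>"""

root = ET.fromstring(XML)

def xpath_round(x):
    """Round like XPath round(): ties go toward positive infinity (hypothetical helper)."""
    return math.floor(x + 0.5)

prices = [float(p.text) for p in root.findall("book/price")]
total = round(sum(prices), 2)        # sum(//book/price), rounded to cents
book_count = len(root.findall("book"))  # count(//book)

print(total)               # 120.96
print(book_count)          # 4
print(math.floor(45.99))   # 45
print(math.ceil(45.99))    # 46
print(xpath_round(45.99))  # 46
print(xpath_round(2.5), round(2.5))  # 3 2
```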

Boolean Functions

<!-- not(): Logical NOT -->
//book[not(@category='fiction')]
//  Result: Programming books (non-fiction)

<!-- true() / false(): Boolean literals -->
//book[price > 20 and true()]
//  Result: Books over $20

<!-- boolean(): Convert to boolean -->
//book[boolean(@id)]
//  Result: All books with id attribute

Node Functions

<!-- name(): Get element name -->
//book[1]/*[name()='title']
//  Result: <title> element

<!-- position(): Current position -->
//book[position() mod 2 = 0]
//  Result: Even-positioned books (2nd, 4th)

<!-- last(): Last position -->
//book[position() = last()]
//  Result: Last book

<!-- text(): Select text nodes (strictly a node test, not a function) -->
//book[1]/title/text()
//  Result: "The Great Gatsby"

Real-World XPath Patterns

Find Most Expensive Books

<!-- Books more expensive than $50 -->
//book[price > 50]
//  Result: Design Patterns ($54.99)

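<!-- The single most expensive book (works in XPath 1.0) -->
//book[not(price < //book/price)]
//  Result: Design Patterns ($54.99)
//  (Top 2 or top N needs sorting, which XPath 1.0 lacks: sort in XSLT or the host language, or use sort() in XPath 3.1)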
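
Picking the N most expensive books needs an ordering step, which is often simplest to do in the host language. A minimal sketch, assuming Python with the standard xml.etree.ElementTree module: the path selects the books and sorted() does the ranking.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample document.
XML = """<bookstore>
  <book><title>The Great Gatsby</title><price>10.99</price></book>
  <book><title>1984</title><price>8.99</price></book>
  <book><title>Clean Code</title><price>45.99</price></book>
  <book><title>Design Patterns</title><price>54.99</price></book>
</bookstore>"""

root = ET.fromstring(XML)

# Select every book with a path, then rank by numeric price, highest first.
ranked = sorted(root.findall("book"),
                key=lambda b: float(b.findtext("price")),
                reverse=True)
top_two = [b.findtext("title") for b in ranked[:2]]

print(top_two)  # ['Design Patterns', 'Clean Code']
```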

Group by Category

<!-- All fiction books -->
//book[@category='fiction']

<!-- All programming books -->
//book[@category='programming']

<!-- Count books per category -->
count(//book[@category='fiction'])
//  Result: 2
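
XPath 1.0 counts one category per expression. To tally every category in a single pass, the grouping can be done in the host language; the sketch below assumes Python's standard xml.etree.ElementTree and collections.Counter.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Abridged copy of the sample document.
XML = """<bookstore>
  <book category="fiction"><title>The Great Gatsby</title></book>
  <book category="fiction"><title>1984</title></book>
  <book category="programming"><title>Clean Code</title></book>
  <book category="programming"><title>Design Patterns</title></book>
</bookstore>"""

root = ET.fromstring(XML)

# Tally the category attribute of every <book>.
counts = Counter(book.get("category") for book in root.iter("book"))

print(counts["fiction"])      # 2
print(counts["programming"])  # 2
```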

Complex Filtering

<!-- Fiction books under $10 -->
//book[@category='fiction' and price < 10]
//  Result: 1984 ($8.99)

<!-- Books from 1900s (1900-1999) -->
//book[year >= 1900 and year < 2000]
//  Result: The Great Gatsby (1925), 1984 (1949), Design Patterns (1994)

<!-- Books with specific title pattern -->
//book[contains(title, 'Code') or contains(title, 'Pattern')]
//  Result: Clean Code, Design Patterns

Web Scraping Pattern

XPath is commonly used with Selenium, lxml, or Scrapy for web scraping. (BeautifulSoup does not support XPath itself; it uses its own search methods and CSS selectors.)

<!-- Extract all product prices -->
//div[@class='product']//span[@class='price']

<!-- Find "Add to Cart" button for specific product -->
//div[contains(text(),'iPhone 15')]//button[text()='Add to Cart']

<!-- Get all links in navigation menu -->
//nav[@id='main-menu']//a/@href

<!-- Extract table data -->
//table[@id='results']//tr[position()>1]/td[2]

<!-- Find element by partial text -->
//button[contains(text(), 'Submit')]
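
To show the shape of such a query end to end, here is a sketch run against a small, well-formed fragment with Python's standard xml.etree.ElementTree. The markup, class names, and URLs are hypothetical, and real pages are rarely well-formed XML, so an HTML-aware parser would normally be needed. ElementTree also cannot select attribute nodes such as @href, so the attribute is read from each matched element instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """<html><body>
  <nav id="main-menu"><a href="/home">Home</a><a href="/shop">Shop</a></nav>
  <div class="product"><h2>Phone</h2><span class="price">$799</span></div>
  <div class="product"><h2>Case</h2><span class="price">$19</span></div>
</body></html>"""

page = ET.fromstring(PAGE)

# //div[@class='product']//span[@class='price']
price_texts = [s.text for s in
               page.findall(".//div[@class='product']//span[@class='price']")]

# //nav[@id='main-menu']//a/@href: match the <a> elements, then read href.
links = [a.get("href") for a in page.findall(".//nav[@id='main-menu']//a")]

print(price_texts)  # ['$799', '$19']
print(links)        # ['/home', '/shop']
```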

XPath Best Practices

✓

Use Specific Paths When Possible

Prefer /bookstore/book/title over //title for better performance

✓

Avoid Overly Complex Expressions

Break complex queries into multiple steps or use XSLT variables

✓

Use Predicates to Filter Early

//book[@category='fiction']/title is more efficient than filtering later

✓

Test XPath in Browser DevTools

Chrome/Firefox console: $x("//book[@category='fiction']")

✓

Handle Namespaces Properly

XML with namespaces requires namespace-aware XPath queries
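
A common symptom of ignoring namespaces is a query that silently returns nothing. The sketch below assumes Python's standard xml.etree.ElementTree; the namespace URI and the "bk" prefix are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that declares a default namespace.
NS_XML = """<bookstore xmlns="http://example.com/books">
  <book id="1"><title>1984</title></book>
</bookstore>"""

root = ET.fromstring(NS_XML)

# Without a namespace mapping the path matches nothing, because every
# element name is qualified by the default namespace.
unqualified = root.findall("book")

# Map a prefix of our choosing to the namespace URI and use it in the path.
ns = {"bk": "http://example.com/books"}
qualified = root.findall("bk:book", ns)
title = root.findtext("bk:book/bk:title", namespaces=ns)

print(len(unqualified))  # 0
print(len(qualified))    # 1
print(title)             # 1984
```

The prefix used in the query does not have to match any prefix in the document; only the URI it maps to matters.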

⚠

Don't Rely on Position Alone

//book[3] breaks if document structure changes. Prefer //book[@id='3']

⚠

Avoid //* in Production

Selecting all elements is slow on large documents

Conclusion

XPath is an essential tool for working with XML documents. Whether you're parsing data, transforming documents with XSLT, scraping websites, or validating XML schemas, XPath provides a powerful and concise way to navigate and query XML structure.

Start with basic path expressions and gradually master predicates, axes, and functions. Practice on real XML documents to build intuition. Use browser DevTools to test queries interactively. With XPath in your toolkit, you can efficiently extract, transform, and validate XML data in any project.
