Correct Syntax Of Xpath Function 'substring-after' For Html That Selects Only Substring Of All Nodes?
Solution 1:
In XPath 2.0 and later (and therefore in XQuery 1.0 and later or XSLT 2.0 and later) you can use //span[@class = 'feed-date']/substring-after(., 'on ') to get a sequence of string values. XPath 1.0 does not allow a function call as a path step, and substring-after(//span[@class = 'feed-date'], 'on ') would only use the first span, so there you need to iterate over all your span elements in a host language and extract the string for each span.
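For the XPath 1.0 route, the iteration in the host language could look roughly like the following. This is only a minimal sketch using the XPathNavigator built into .NET; the class name XPath1Fallback, the helper ExtractDates and the inline sample markup are made up for illustration. It selects the span elements with a plain XPath 1.0 expression and then evaluates substring-after once per span.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class XPath1Fallback
{
    // Hypothetical helper (not part of any library): selects the spans with
    // XPath 1.0, then evaluates substring-after separately for each span.
    public static List<string> ExtractDates(string xml)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        var dates = new List<string>();
        foreach (XPathNavigator span in nav.Select("//span[@class = 'feed-date']"))
        {
            // The current span is the context node, so "." is its string value.
            dates.Add((string)span.Evaluate("substring-after(., 'on ')"));
        }
        return dates;
    }

    public static void Main()
    {
        // Assumed sample input, shortened for illustration.
        string xml = "<html><body>" +
            "<span class=\"feed-date\">Published on 2016-07-07</span>" +
            "<span class=\"feed-date\">Published on 2015-02-23</span>" +
            "</body></html>";
        Console.WriteLine(string.Join(" ", ExtractDates(xml)));
    }
}
```

With the sample input above this should print 2016-07-07 2015-02-23. The same loop should work on a navigator created from an HTMLAgilityPack HtmlDocument, since only XPath 1.0 features are used.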
As for using XPath 2.0 with HTMLAgilityPack, that looks possible with https://github.com/StefH/XPath2.Net, which is also available on NuGet. With it, the Microsoft XPathNavigator gets various extension methods such as XPath2Evaluate, which allow you to use XPath 2.0 functions both on an XPathNavigator created from Microsoft's XPathDocument and on one created from the HTMLAgilityPack's HtmlDocument.
Here is an example:
using System;
using System.Xml.XPath;
using Wmhelp.XPath2;
using HtmlAgilityPack;

namespace XPath20Net1
{
    class Program
    {
        static void Main(string[] args)
        {
            // XPath 2.0 against a navigator created from Microsoft's XPathDocument
            XPathNavigator nav = new XPathDocument("XMLFile1.xml").CreateNavigator();
            Console.WriteLine(nav.XPath2Evaluate("string-join(//span[@class = 'feed-date']/substring-after(., 'on '), ' ')"));

            // The same expression against a navigator created from HTMLAgilityPack's HtmlDocument
            HtmlDocument doc = new HtmlDocument();
            doc.Load("HTMLPage1.html");
            Console.WriteLine(doc.CreateNavigator().XPath2Evaluate("string-join(//span[@class = 'feed-date']/substring-after(., 'on '), ' ')"));
        }
    }
}
With the XML document being
<?xml version="1.0" encoding="utf-8" ?>
<html>
  <body>
    <span class="feed-date">Published on 2016-07-07</span>
    <span class="feed-date">Published on 2015-02-23</span>
    <span class="feed-date">Published on 2014-11-13</span>
    <span class="feed-date">Published on 2014-04-28</span>
  </body>
</html>
and the HTML document being
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Test</title>
  </head>
  <body>
    <p id=test>
    <span class="feed-date">Published on 2016-07-07</span>
    <span class="feed-date">Published on 2015-02-23</span>
    <span class="feed-date">Published on 2014-11-13</span>
    <span class="feed-date">Published on 2014-04-28</span>
  </body>
</html>
the output is
2016-07-07 2015-02-23 2014-11-13 2014-04-28
2016-07-07 2015-02-23 2014-11-13 2014-04-28