Copy Only HTML From Mixed XML And HTML
We have a bunch of files that are HTML pages but that contain additional XML elements (all prefixed with our company name 'TLA') to provide data and structure for an older program.
Solution 1:
Targeting HTML elements specifically would be hard, but if you just want to drop the elements in the TLA namespace while keeping any non-TLA elements they contain, this should work. The stylesheet binds the prefix mbl to the TLA namespace URI; only the URI has to match your files, not the prefix.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:mbl="http://www.tla.com" exclude-result-prefixes="mbl">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Low-priority identity template: copies attributes, text, comments
       and processing instructions (elements are handled below) -->
  <xsl:template match="@*|node()" priority="-2">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- This element-only identity template prevents the
       TLA namespace declaration from being copied to the output -->
  <xsl:template match="*">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Drop the TLA element itself and pass processing on to its child
       elements (text directly inside a TLA element is not copied) -->
  <xsl:template match="mbl:*">
    <xsl:apply-templates select="*"/>
  </xsl:template>
</xsl:stylesheet>
Alternatively, use this version if you want to exclude every element that is in any namespace at all (that is, any element with a non-empty namespace URI):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Low-priority identity template for attributes and non-element nodes -->
  <xsl:template match="@*|node()" priority="-2">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rebuild each element by name so no namespace declarations are copied -->
  <xsl:template match="*">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Drop any namespaced element and continue with its child elements -->
  <xsl:template match="*[namespace-uri()]">
    <xsl:apply-templates select="*"/>
  </xsl:template>
</xsl:stylesheet>
When either is run on your sample input, the result is:
<html>
  <head>
    <title>Highly Simplified Example Form</title>
  </head>
  <body>
    <table>
      <tr>
        <td>
          <input id="input_id_1" type="text" />
        </td>
      </tr>
    </table>
  </body>
</html>
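The sample input from the question is not reproduced here. Purely as an illustration, an input shaped like the following hypothetical document should yield the output above with either stylesheet. The element names tla:form, tla:label and tla:field and their attributes are invented for this sketch; only the namespace URI http://www.tla.com is taken from the stylesheets.

```xml
<!-- Hypothetical input: the tla:* element and attribute names are assumptions -->
<html xmlns:tla="http://www.tla.com">
  <head>
    <title>Highly Simplified Example Form</title>
  </head>
  <body>
    <!-- Wrapper in the TLA namespace: removed, its child elements are kept -->
    <tla:form name="example">
      <table>
        <tr>
          <td>
            <!-- Contains only text, so nothing from it reaches the output -->
            <tla:label>Name</tla:label>
            <tla:field ref="1">
              <input id="input_id_1" type="text" />
            </tla:field>
          </td>
        </tr>
      </table>
    </tla:form>
  </body>
</html>
```

Note that the text inside tla:label disappears because the TLA template only applies templates to child elements (select="*"), and the XML comments disappear for the same reason wherever they sit directly inside a TLA element.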