Reading the links from a webpage

I needed to list the RPMs in a YUM repository as part of a larger script. To do this, I fetched the repository's index page with wget and then applied an XSLT transform to it using the command-line tool xsltproc.

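Here is how I called it. The --html option tells xsltproc to parse its input as HTML, and the trailing - makes it read the page from standard input:

wget -q -O - http://spacewalk.redhat.com/yum/0.5/Fedora/10/x86_64/os/Packages/ | xsltproc --html showhrefs.xslt -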

And here is the XSLT file, showhrefs.xslt:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- emit plain text so that no XML declaration precedes the links -->
<xsl:output method="text"/>

<!-- shut off the default matching rule for text nodes -->
<xsl:template match="text()" />

<!-- print the href value of each hyperlink, one per line -->
<xsl:template match="a">
<xsl:value-of select="@href" />
<xsl:text>
</xsl:text>
</xsl:template>

</xsl:stylesheet>
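
The output of the transform is every link on the index page, which typically includes entries such as the parent directory and column-sorting links as well as the packages. A larger script would probably want only the RPMs, as full URLs. The following is a minimal sketch of that step, not the script I actually used: the function names rpm_urls and list_repo_rpms are hypothetical, and it assumes the hrefs on the page are relative to the repository URL.

```shell
#!/bin/sh

# Hypothetical helper: read one href per line on stdin, keep only
# those ending in .rpm, and prefix each with the base URL given as $1.
rpm_urls() {
    while IFS= read -r href; do
        case "$href" in
            *.rpm) printf '%s%s\n' "$1" "$href" ;;
        esac
    done
}

# Hypothetical wrapper around the pipeline from this post.
# Assumes showhrefs.xslt is in the current directory and that the
# links on the index page are relative to the URL passed as $1.
list_repo_rpms() {
    wget -q -O - "$1" | xsltproc --html showhrefs.xslt - | rpm_urls "$1"
}
```

With these defined, calling list_repo_rpms with the Packages URL from the command above should print one package URL per line. The case pattern is used instead of grep and sed so that characters in the URL are never interpreted as a regular expression or a replacement string.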
