
How Can I Select From Only One Table With Web::scraper?

I want to extract only the text under the heading Node Object Methods from a webpage. The relevant HTML is as follows:

Node Object Properties

The &

Solution 1:

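Web::Scraper can use the :nth-of-type pseudo-class to choose the right table. There are two tables with the class reference, and :nth-of-type counts sibling elements with the same tag, so you can say table.reference:nth-of-type(2) to get the second one: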

use v5.22;

use feature qw(postderef);
no warnings qw(experimental::postderef);

use Web::Scraper;

# Slurp the HTML stored after __DATA__
my $html = do { local $/; <DATA> };

my $methods = scraper {
    # :nth-of-type(2) picks the second <table> among its siblings (the methods table)
    process "table.reference:nth-of-type(2) > tr > td > a", 'renners[]' => 'TEXT';
};
my $res = $methods->scrape( $html );

say join "\n", $res->{renners}->@*;

And here's the same thing with Mojo::DOM:

use v5.22;    # enables say
use Mojo::DOM;

my $html = do { local $/; <DATA> };

my $dom = Mojo::DOM->new( $html );

say $dom
    ->find( 'table.reference:nth-of-type(2) > tr > td > a' )
    ->map( 'text' )
    ->join( "\n" );
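To see why :nth-of-type(2) works here where :nth-child(2) would not, here is a minimal sketch using Mojo::DOM on a small, hypothetical stand-in for the page (an h2 before each table, all four elements siblings inside one div). :nth-of-type counts only sibling table elements, while :nth-child counts every element sibling, so the interleaved h2 headings shift the positions.

```perl
use v5.22;
use Mojo::DOM;

# Hypothetical, cut-down stand-in for the page: an <h2> before each table,
# all four elements siblings inside one <div>
my $html = <<'HTML';
<div>
  <h2>Node Object Properties</h2>
  <table class="reference"><tr><td><a href="#">nodeName</a></td></tr></table>
  <h2>Node Object Methods</h2>
  <table class="reference"><tr><td><a href="#">appendChild()</a></td></tr></table>
</div>
HTML

my $dom = Mojo::DOM->new($html);

# :nth-of-type counts only <table> siblings: the methods table is table #2
my $by_type = $dom->find('table.reference:nth-of-type(2) td > a')
                  ->map('text')->join(',');

# :nth-child counts every element sibling: position 2 is the *first* table,
# because the first <h2> occupies position 1
my $by_child = $dom->find('table.reference:nth-child(2) td > a')
                   ->map('text')->join(',');

say "nth-of-type(2): $by_type";     # nth-of-type(2): appendChild()
say "nth-child(2):   $by_child";    # nth-child(2):   nodeName
```

If the real page puts other elements between the tables, the position passed to :nth-of-type still only depends on how many table siblings come first.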

I tried looking for a selector solution that could recognize the text in the h2, but my kung fu is weak here.
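One way around a pure-selector solution is to match the heading text in Perl and then step to the next sibling element. A minimal sketch with Mojo::DOM, assuming each table comes immediately after its h2 and using a small hypothetical stand-in for the page's HTML:

```perl
use v5.22;
use Mojo::DOM;

# Hypothetical, cut-down stand-in for the page's HTML
my $html = <<'HTML';
<div>
  <h2>Node Object Properties</h2>
  <table class="reference"><tr><td><a href="#">nodeName</a></td></tr></table>
  <h2>Node Object Methods</h2>
  <table class="reference">
    <tr><td><a href="#">appendChild()</a></td></tr>
    <tr><td><a href="#">cloneNode()</a></td></tr>
  </table>
</div>
HTML

my $dom = Mojo::DOM->new($html);

# Find the <h2> whose text is exactly the heading we want
my $h2 = $dom->find('h2')->first(sub { $_->text eq 'Node Object Methods' })
    or die "heading not found\n";

# next() returns the next sibling *element*; assume it is the wanted table
my $table = $h2->next;
die "no table right after the heading\n"
    unless $table && $table->tag eq 'table';

my @methods = $table->find('td > a')->map('text')->each;
say for @methods;
```

This keeps the selector simple and does not depend on how many tables precede the one you want; the trade-off is that it assumes the table is the element directly after the heading.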

Solution 2:

Web::Query provides an almost identical solution to the Mojo::DOM solution proposed by brian d foy.

use v5.22;    # enables say
use Web::Query;

my $html = do { local $/; <DATA> };

wq($html)
    ->find('table.reference:nth-of-type(2) > tr > td > a')
    ->each(sub{
        my ($i, $e) = @_;
        say $e->text();
    });

However, Mojo::DOM looks like the more robust library: for Web::Query's selector to match, I had to edit the input from the question to wrap all the other content in a single root element.

__DATA__
<html>
...
</html>

Solution 3:

You can use XPath to extract data from the table immediately following the heading Node Object Methods, like so:

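use v5.22;    # enables say
use Web::Scraper;

my $html = do { local $/; <DATA> };

my $methods = scraper {
    # An expression starting with "/" is treated as XPath rather than CSS:
    # find the h2 with that exact text, take the first <table> sibling
    # after it, and collect the first cell of every row
    process '//h2[.="Node Object Methods"]/following-sibling::table[1]//tr/td[1]',
        'renners[]' => 'TEXT';
};
my $res = $methods->scrape( $html );

say join "\n", @{ $res->{renners} };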

The output is:

appendChild()
cloneNode()
compareDocumentPosition()
getFeature(feature,version)
getUserData(key)
hasAttributes()
hasChildNodes()
insertBefore()
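To check what that XPath matches outside Web::Scraper, you can run it directly with HTML::TreeBuilder::XPath, the tree builder Web::Scraper uses by default. A minimal sketch, with a small hypothetical stand-in for the page's HTML:

```perl
use v5.22;
use HTML::TreeBuilder::XPath;

# Hypothetical, cut-down stand-in for the page's HTML
my $html = <<'HTML';
<div>
  <h2>Node Object Properties</h2>
  <table class="reference"><tr><td><a href="#">nodeName</a></td></tr></table>
  <h2>Node Object Methods</h2>
  <table class="reference">
    <tr><td><a href="#">appendChild()</a></td></tr>
    <tr><td><a href="#">cloneNode()</a></td></tr>
  </table>
</div>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# The XPath from Solution 3: the first <table> sibling after the matching <h2>,
# then the first cell of each row; as_text includes the text of the nested <a>
my @methods = map { $_->as_text } $tree->findnodes(
    '//h2[.="Node Object Methods"]/following-sibling::table[1]//tr/td[1]'
);

$tree->delete;    # free the parse tree
say for @methods;
```

Because the heading is matched by its text, this keeps working if another table is added before the methods table.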
