How Can I Select From Only One Table With Web::Scraper?
I want to extract the text only under the heading Node Object Methods from a webpage. The specific HTML part is as follows:
Node Object Properties
The &
Solution 1:
Web::Scraper can use the nth-of-type pseudo-class to choose the right table. There are two tables with the same class, so you can say table.reference:nth-of-type(2), which picks the second table among its siblings:
use v5.22;
use feature qw(postderef);
no warnings qw(experimental::postderef);
use Web::Scraper;
my $html = do { local $/; <DATA> };
my $methods = scraper {
    # collect the link text from the first-level cells of the second table
    process "table.reference:nth-of-type(2) > tr > td > a", 'renners[]' => 'TEXT';
};
my $res = $methods->scrape( $html );
say join "\n", $res->{renners}->@*;
And here's a Mojo::DOM version:
use Mojo::DOM;
my $html = do { local $/; <DATA> };
my $dom = Mojo::DOM->new( $html );
say $dom
->find( 'table.reference:nth-of-type(2) > tr > td > a' )
->map( 'text' )
->join( "\n" );
I tried looking for a selector solution that could recognize the text in the h2, but my kung fu is weak here.
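One way to key off the h2 text without a pure selector is to find the heading in Perl and then walk to the table that follows it. This is only a sketch: the HTML below is a made-up stand-in for the page in the question (the real markup isn't shown here), and it assumes the heading and the table are siblings. It uses Mojo::Collection's first with a callback and Mojo::DOM's following.

```perl
use v5.22;
use Mojo::DOM;

# Hypothetical stand-in for the page in the question: two tables with the
# same class, each preceded by its own h2 heading.
my $html = <<'HTML';
<div>
  <h2>Node Object Properties</h2>
  <table class="reference">
    <tr><td><a href="#">nodeName</a></td><td>Name of the node</td></tr>
  </table>
  <h2>Node Object Methods</h2>
  <table class="reference">
    <tr><td><a href="#">appendChild()</a></td><td>Adds a child node</td></tr>
    <tr><td><a href="#">cloneNode()</a></td><td>Clones a node</td></tr>
  </table>
</div>
HTML

my $dom = Mojo::DOM->new($html);

# Find the h2 whose own text is exactly the heading we want.
my $heading = $dom->find('h2')->first(sub { $_->text eq 'Node Object Methods' })
    or die "heading not found\n";

# following() returns the later siblings that match the selector;
# the first one is the table directly under the heading.
my $table = $heading->following('table')->first
    or die "no table after heading\n";

# Link text from the cells of that table only.
my @names = $table->find('tr > td > a')->map('text')->each;
say join "\n", @names;
```

Unlike nth-of-type(2), this does not depend on the table's position, so it should keep working if another table is added earlier on the page, as long as the heading text stays the same.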
Solution 2:
Web::Query provides an almost identical solution to the Mojo::DOM solution proposed by brian d foy.
use v5.22;
use Web::Query;
my $html = do { local $/; <DATA> };
# wq() builds a Web::Query object from the HTML string
wq($html)
->find('table.reference:nth-of-type(2) > tr > td > a')
->each(sub {
    my ($i, $e) = @_;
    say $e->text();
});
However it looks like Mojo::DOM is the more robust library. For Web::Query to correctly match with its selector I had to edit the input provided in the question to add a root node surrounding all the other content.
Solution 3:
You can use XPath to extract data from the very next table after the heading Node Object Methods, like so:
use v5.22;
use Web::Scraper;
my $html = do { local $/; <DATA> };
my $methods = scraper {
    # first table after the matching h2, first cell of every row
    process '//h2[.="Node Object Methods"]/following-sibling::table[1]//tr/td[1]',
        'renners[]' => 'TEXT';
};
my $res = $methods->scrape( $html );
say join "\n", @{ $res->{renners} };
The output will be