
Scraping Hidden HTML (when Visible = False) Using Hpricot (Ruby on Rails)

I've run into an issue that I can't seem to get past. I'm also new to Ruby on Rails, hence the number of questions. I am attempting to scrape

Solution 1:

In the particular example you have given, the elements are not hidden; they are loaded via AJAX after the page loads. So what you need is an HTTP client that can run JavaScript (in practice, a web browser) in order to see the addresses and other content.
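To tell the two cases apart, it can help to look at the raw response body before any JavaScript runs. The sketch below is only an illustration: `classify_element` is a hypothetical helper, the sample markup and the `address` id are made up, and the regex matching is deliberately crude compared with a real parser.

```ruby
# Rough check of how an element relates to the raw HTML that a plain HTTP
# client (and therefore Hpricot) receives, before any JavaScript has run.
# Returns :absent, :hidden_in_markup, or :present.
def classify_element(html, elem_id)
  # Match id="..." but not data-id="..." or similar attributes.
  id_pattern = /(?<![\w-])id\s*=\s*["']#{Regexp.escape(elem_id)}["']/i

  # Find the first opening tag whose attributes include the id.
  tag = html.scan(/<[a-z][^>]*>/i).find { |t| t =~ id_pattern }

  return :absent if tag.nil? # not in the markup at all, likely injected later
  return :hidden_in_markup if tag =~ /display\s*:\s*none|visibility\s*:\s*hidden/i

  :present
end

# Made-up sample pages for illustration.
ajax_page   = '<html><body><div id="results"></div><script src="load.js"></script></body></html>'
hidden_page = '<html><body><div id="address" style="display: none">1 Main St</div></body></html>'

puts classify_element(ajax_page, "address")   # absent
puts classify_element(hidden_page, "address") # hidden_in_markup
```

If the result is `:hidden_in_markup`, the data is already in the document and a static parser can read it. If it is `:absent`, no amount of parsing the initial response will find it.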

If you really want to automate the process and scrape data that is loaded through AJAX or JavaScript, you can try Selenium. Even though it was not developed for that purpose, it serves your needs.
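A minimal sketch of the Selenium approach might look like the following. It assumes the `selenium-webdriver` gem and a local Firefox with geckodriver are installed; `fetch_rendered_text`, the URL, and the `#address` selector are placeholders, not anything taken from the original page.

```ruby
# Sketch only: assumes the selenium-webdriver gem plus Firefox and
# geckodriver are available. The URL and selector below are placeholders.
def fetch_rendered_text(url, css_selector, timeout: 10)
  require "selenium-webdriver" # loaded lazily so the file parses without the gem

  driver = Selenium::WebDriver.for(:firefox)
  begin
    driver.get(url)
    # Wait until the AJAX-loaded element shows up in the live DOM.
    wait = Selenium::WebDriver::Wait.new(timeout: timeout)
    element = wait.until { driver.find_element(css: css_selector) }
    element.text
  ensure
    driver.quit # always close the browser, even if the wait times out
  end
end

# Hypothetical usage:
# puts fetch_rendered_text("http://example.com/listing", "#address")
```

If you would rather keep your existing Hpricot code, one option is to take `driver.page_source` after the wait succeeds and pass that string to `Hpricot(...)`, since by then it should include the content that the scripts added.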

Solution 2:

I don't have an answer to your specific question, but I thought I'd point to Ryan Bates' Railscast episode on screen scraping with Ruby: http://railscasts.com/episodes/173-screen-scraping-with-scrapi

He uses a library called scrAPI instead of ScRUBYt, since he couldn't get ScRUBYt working. scrAPI may be a bit easier to use.

I hope this helps somewhat, good luck with your assignment! :)

-John

Solution 3:

There is a good script posted in the Google group. It seems to extract the address and other fields. You may want to look at the code for the script, page.txt.
