
How To Parse An Html String In Google Apps Script Without Using Xmlservice?

I want to create a scraper using Google Spreadsheets with Google Apps Script. I know it is possible and I have seen some tutorials and threads about it. The main idea is to use:

Solution 1:

I made cheeriogs for this problem. It works on GAS like cheerio, which provides a jQuery-like API. You can use it like this:

const content = UrlFetchApp.fetch('https://example.co/').getContentText();
const $ = Cheerio.load(content);
Logger.log($('p .blah').first().text()); // blah blah blah ...

See also https://github.com/asciian/cheeriogs

Solution 2:

This has been discussed before - see this Q&A.

Unlike the old Xml service, XmlService is not very forgiving of malformed HTML. The trick in the answer by Justin Bicknell does the job. Even though the Xml service has been deprecated, it still continues to work.

Solution 3:

I have done this in vanilla JS. It is not real HTML parsing; it just pulls some content out of the string fetched from a URL:

function getLKKBTC() {
  var url = 'https://www.lykke.com/exchange';
  var html = UrlFetchApp.fetch(url).getContentText();
  var searchstring = '<td class="ask_BTCLKK">';
  // indexOf matches the literal string; search() would treat it as a regex.
  var index = html.indexOf(searchstring);
  if (index >= 0) {
    var pos = index + searchstring.length;
    // Read the 6 characters after the tag and invert the quoted rate.
    var rate = parseFloat(html.substring(pos, pos + 6));
    return 1 / rate;
  }
  throw new Error('Failed to fetch/parse data from ' + url);
}
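
The same string-search idea can be made a little more reusable. The following is only a sketch: `extractBetween` is a hypothetical helper name, and the sample markup is made up for illustration rather than taken from the real page. It reads up to the closing tag instead of assuming the value is exactly 6 characters long.

```javascript
/**
 * Returns the trimmed text between the first occurrence of startMarker
 * and the next occurrence of endMarker, or null if either is missing.
 * (Hypothetical helper, not part of Apps Script.)
 */
function extractBetween(html, startMarker, endMarker) {
  var start = html.indexOf(startMarker);
  if (start < 0) return null;
  start += startMarker.length;
  var end = html.indexOf(endMarker, start);
  if (end < 0) return null;
  return html.substring(start, end).trim();
}

// Made-up sample markup, for illustration only.
var sample = '<tr><td class="ask_BTCLKK"> 0.0125 </td></tr>';
var text = extractBetween(sample, '<td class="ask_BTCLKK">', '</td>');
var rate = parseFloat(text);
console.log(1 / rate);
```

In a real script the `sample` string would be replaced by the result of `UrlFetchApp.fetch(url).getContentText()`. This still breaks as soon as the page changes its markup, so it is only suitable for small, stable snippets.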

Solution 4:

Please be aware that certain websites may not permit automated scraping of their content, so please consult their terms of service before using Apps Script to extract the content.

The XmlService only works against valid XML documents, and most HTML (especially HTML5) is not valid XML. A previous version of the XmlService, simply called Xml, allowed for "lenient" parsing, which would allow it to parse HTML as well. This service was sunset in 2013, but for the time being still functions. The reference docs are no longer available, but this old tutorial shows its usage.

Another alternative is to use a service like Kimono, which handles the scraping and parsing parts and provides a simple API you can call via UrlFetchApp to retrieve the structured data.

Solution 5:

I've found a very neat alternative for scraping with Google Apps Script. It is called PhantomJS Cloud. You can use UrlFetchApp to access its API. This lets you run jQuery code on the pages, which makes life so much simpler.
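
Calling a hosted rendering or scraping service from Apps Script usually comes down to a single POST through UrlFetchApp. The sketch below is an assumption-heavy illustration, not the documented API of any particular service: the function names are hypothetical, the endpoint is passed in by the caller, and the `url` and `renderType` fields in the JSON body are placeholders to be replaced with whatever the chosen service's documentation specifies.

```javascript
/**
 * Builds the options object for UrlFetchApp.fetch().
 * The JSON body fields (url, renderType) are assumed placeholders;
 * check the service's documentation for the real request shape.
 */
function buildRenderRequest(targetUrl) {
  return {
    method: 'post',
    contentType: 'application/json',
    payload: JSON.stringify({ url: targetUrl, renderType: 'html' }),
    muteHttpExceptions: true // inspect the status code ourselves
  };
}

/**
 * Sends the request to the given service endpoint and returns the body.
 * Only runs inside Apps Script, where UrlFetchApp is defined.
 */
function fetchRenderedHtml(endpoint, targetUrl) {
  var response = UrlFetchApp.fetch(endpoint, buildRenderRequest(targetUrl));
  var code = response.getResponseCode();
  if (code !== 200) {
    throw new Error('Rendering service returned HTTP ' + code);
  }
  return response.getContentText();
}
```

Keeping the request construction in its own function makes it easy to log or adjust the payload before any quota is spent on a real call.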
