
hykth

Workaround for 403 error when using UrlFetchApp in Google Apps Script (external website)

I've used SO for many years (I almost always found my answers!), but I'm quite stuck on my current project, so this is the first time I'm posting here. :)

I want to get product prices from www.hermes.com using either the product URL or the product ref. For example: https://www.hermes.com/fr/fr/product/portefeuille-dogon-duo-H050896CK5E/ (ref = H050896CK5E). The URLs and refs are stored in a spreadsheet. When I call UrlFetchApp.fetch in my script, I get a 403 error. If I understand correctly, that means the hermes.com server is blocking my requests.
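In the example above, the ref is the last hyphen-separated segment of the URL slug, so a column of URLs is enough to derive the refs. Below is a minimal sketch of a pure helper. It assumes every product URL ends in that "-REF/" pattern, and the name extractProductRef is hypothetical:

```javascript
// Hypothetical helper: extract the product ref from a hermes.com product URL.
// Assumes the ref is the final "-XXXX" segment of the slug, e.g.
// .../product/portefeuille-dogon-duo-H050896CK5E/ -> H050896CK5E
function extractProductRef(url) {
  // Drop any query string or fragment, then any trailing slashes
  const path = url.split(/[?#]/)[0].replace(/\/+$/, "");
  // Keep only the last path segment (the slug)
  const slug = path.substring(path.lastIndexOf("/") + 1);
  // The ref is uppercase letters/digits after the last hyphen
  const match = slug.match(/-([A-Z0-9]+)$/);
  return match ? match[1] : null;
}
```

For the URL above it returns H050896CK5E. For a URL without such a suffix it returns null, so unexpected rows are easy to spot.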

I also tried =IMPORTXML, but it reports that the spreadsheet cannot access the URL.

Here is the workaround I found: use the Google Custom Search API to search for the URL, then iterate through the results until a result URL matches the query.
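Because the query is itself a URL, every query parameter in the search request needs URL-encoding. Below is a hedged sketch of a pure helper that builds one Custom Search JSON API request URL per result page. The name buildSearchUrl is hypothetical, and apiKey and searchId (the engine's cx) are placeholders you supply:

```javascript
// Sketch: build the Custom Search JSON API URL for one page of results.
// apiKey and searchId are placeholders; page is 0-based.
function buildSearchUrl(apiKey, searchId, query, page) {
  // "start" is 1-based and each page holds up to 10 results
  const start = page * 10 + 1;
  // encodeURIComponent escapes ":", "/", spaces, "&", etc. in each value
  return "https://www.googleapis.com/customsearch/v1" +
      "?key=" + encodeURIComponent(apiKey) +
      "&cx=" + encodeURIComponent(searchId) +
      "&q=" + encodeURIComponent(query) +
      "&start=" + start;
}
```

Page 0 maps to start=1, page 1 to start=11, and so on. The API documentation states that it never returns more than 100 results for a query, so paging through five pages stays within range.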

[Current issues]

So my question is: how would you get around the 403 error? (I don't mean bypassing security, of course, but if you have any ideas on how to retrieve hermes.com prices, please let me know!)

I've pasted the scripts below. Thank you in advance.

→ What I used for hermes.com. With muteHttpExceptions set to true, I get the CAPTCHA HTML instead of an exception:

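```javascript
// muteHttpExceptions makes fetch() return the 403 response instead of throwing
var response = UrlFetchApp.fetch("http://www.hermes.com/", {
  method: "get",
  muteHttpExceptions: true
});
Logger.log(response.getResponseCode()); // the 403 status
Logger.log(response.getContentText());  // the CAPTCHA HTML shown below
```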

→ Result of the above (a CAPTCHA page; I think hermes.com detects that I'm a bot):

```html
<html><head><title>hermes.com</title><style>#cmsg{animation: A 1.5s;}@keyframes A{0%{opacity:0;}99%{opacity:0;}100%{opacity:1;}}</style></head><body style="margin:0"><p id="cmsg">Please enable JS and disable any ad blocker</p><script>var dd={'cid':'AHrlqAAAAAMAs2XwactPh88AInQWTw==','hsh':'2211F522B61E269B869FA6EAFFB5E1','t':'fe','s':13461,'host':'geo.captcha-delivery.com'}</script><script src="https://ct.captcha-delivery.com/c.js"></script></body></html>
```
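Since muteHttpExceptions returns the block page instead of throwing, the script can detect it and fall back to the search workaround. Below is a heuristic sketch. The name looksLikeCaptchaPage is hypothetical, and the check relies only on the 403 status and the captcha-delivery.com host visible in the HTML above. The site could change either signal at any time:

```javascript
// Hypothetical heuristic: does this response look like the bot-protection page?
// Relies on the 403 status plus the captcha-delivery.com script host seen above.
function looksLikeCaptchaPage(statusCode, html) {
  return statusCode === 403 && /captcha-delivery\.com/.test(html);
}
```

In Apps Script, pass it response.getResponseCode() and response.getContentText().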

→ What I'm using now (Google Custom Search)

```javascript
// apiKey, searchId (the cx) and query (the product URL) are defined earlier
var productPrice = null;
var currency = null;
// Scan up to 5 result pages (10 results each) until the product URL is found
for (var i = 0; i < 5 && productPrice === null; i++) {
  var start = (i * 10) + 1;
  // Encode each parameter: the query contains a URL and a space
  var apiUrl = "https://www.googleapis.com/customsearch/v1" +
      "?key=" + encodeURIComponent(apiKey) +
      "&cx=" + encodeURIComponent(searchId) +
      "&q=" + encodeURIComponent("search " + query) +
      "&start=" + start;
  var responseApi = UrlFetchApp.fetch(apiUrl, { method: "get" });
  var responseJson = JSON.parse(responseApi.getContentText());
  var items = responseJson.items || []; // "items" is absent when a page has no results
  for (var v = 0; v < items.length; v++) {
    var item = items[v];
    if (item.link !== query) { continue; } // keep only the exact product URL
    var metatags = (item.pagemap && item.pagemap.metatags) || [];
    if (metatags.length > 0) {
      productPrice = metatags[0]["product:price:amount"];
      currency = metatags[0]["product:price:currency"];
    }
    break; // stop scanning this page once the URL matched
  }
}
```
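You can move the matching step into a pure function, which lets you test it without spending API quota. The sketch below makes two assumptions. First, the hypothetical helpers normalizeUrl and findPrice treat URLs that differ only in http/https or a trailing slash as the same page. Second, the price is in the first metatags entry, as in the script above:

```javascript
// Assumption: URLs differing only in scheme or trailing slash are the same page.
function normalizeUrl(url) {
  return url.trim().replace(/^https?:\/\//i, "").replace(/\/+$/, "");
}

// Sketch: search one parsed Custom Search response for the product URL and
// read its price metatags. Returns null if nothing usable was found.
function findPrice(responseJson, productUrl) {
  const target = normalizeUrl(productUrl);
  const items = (responseJson && responseJson.items) || [];
  for (const item of items) {
    if (!item.link || normalizeUrl(item.link) !== target) continue;
    const tags = (item.pagemap && item.pagemap.metatags) || [];
    if (tags.length === 0) return null; // matched page, but no metatags indexed
    return {
      amount: tags[0]["product:price:amount"],
      currency: tags[0]["product:price:currency"]
    };
  }
  return null; // no result on this page matched
}
```

Inside the paging loop, findPrice(JSON.parse(responseApi.getContentText()), query) can replace the inner loop, and paging can stop once it returns a non-null result. Metatag values in the JSON are strings, so convert amount with Number(...) before any numeric comparison.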

Tags: google-apps-script, web-scraping, http-status-code-403, google-custom-search, urlfetch
