Today a friend was trying to fix a scraping feature. Apparently the feature had been working fine in the past but suddenly stopped working. It was using Mechanize.
So the first guess on this one was that maybe we needed a secret key to scrape the site, or something like that.
The second one was that maybe the user agent was being denied by the server.
But no luck with that, so she started using WATIR to do the job, but we still ended up at the same point.
So we asked a guy we know who is smart enough to help us out, and he simply started using curl to check what was going on.
So the experiment looks like this:
    curl -I http://www.zazvick.com
    HTTP/1.1 301 Moved Permanently
    Date: Thu, 01 Jun 2017 21:59:49 GMT
    Content-Type: text/plain; charset=utf-8
    Content-Length: 54
    Connection: keep-alive
    Set-Cookie: __cfduid=da072aa13d0c9a2cb4047cc1ef2472fa91496354389; expires=Fri, 01-Jun-18 21:59:49 GMT; path=/; domain=.zazvick.com; HttpOnly
    X-Powered-By: Express
    Location: https://zazvick.com/
    Vary: Accept, Accept-Encoding
    Via: 1.1 vegur
    Server: cloudflare-nginx
    CF-RAY: 36857137b2c10962-DFW
If you look carefully at this response, there is a 301 Moved Permanently status, which means the server is doing a redirect, and the Location header points to https://zazvick.com/.
So guess what the conclusion was: the issue was caused by the site switching from http to https, so they are now using an SSL certificate.
How did we get to that conclusion? Well 😅:
    curl -I https://www.zazvick.com
    HTTP/1.1 200 OK
    Date: Thu, 01 Jun 2017 21:59:45 GMT
    Content-Type: text/html; charset=utf-8
    Connection: keep-alive
    Set-Cookie: __cfduid=d93c17e1eb58f7c242c351c309a10aa1e1496354385; expires=Fri, 01-Jun-18 21:59:45 GMT; path=/; domain=.zazvick.com; HttpOnly
    X-Powered-By: Express
    Cache-Control: public, max-age=0
    Vary: Accept-Encoding
    Via: 1.1 vegur
    Server: cloudflare-nginx
    CF-RAY: 3685711cdc97201e-DFW
If you look, this time there is a 200 status, which stands for a successful request.
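Since the scraper was written in Ruby, here is a rough sketch of how the same curl check could be scripted with the standard library only. The helper names head_status and https_upgrade? are made up for this post; they are not part of Mechanize or WATIR, and this is just one way to do it under the assumption that a plain HEAD request is enough to spot the redirect.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: send a HEAD request (what `curl -I` does) and
# return the numeric status code plus the Location header, if any.
def head_status(url)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri)
  end
  [response.code.to_i, response["location"]]
end

# Hypothetical helper: true when a redirect moves an http URL to an https one.
def https_upgrade?(original_url, location)
  return false if location.nil?

  from = URI.parse(original_url)
  to = URI.join(original_url, location) # also resolves relative Location values
  from.scheme == "http" && to.scheme == "https"
end

# Example usage (needs network access, so it is left commented out):
#   status, location = head_status("http://www.zazvick.com")
#   if status == 301 && https_upgrade?("http://www.zazvick.com", location)
#     puts "The site moved to https, point the scraper at #{location}"
#   end
```

The redirect check is kept separate from the network call so it can be tried with the values from the curl output above, without hitting the site.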
That's all folks, hopefully you find this helpful. See you in the next post.