Scraping websites and dealing with bugs: Mechanize? Watir? Curl!

Today a friend was trying to fix a scraping feature. Apparently it had been working fine in the past but suddenly stopped working. The feature was using Mechanize.

Our first guess was that maybe we needed a secret key to scrape the site, or something like that.

The second guess was that maybe the server was denying our user agent.

No luck with either, so she started using Watir to do the job, but we still ended up at the same point.

So we asked a guy we know who is smart enough to help us out, and he quickly started using curl to check what was going on.

The experiment looked like this:

curl -I http://www.zazvick.com

HTTP/1.1 301 Moved Permanently  
Date: Thu, 01 Jun 2017 21:59:49 GMT  
Content-Type: text/plain; charset=utf-8  
Content-Length: 54  
Connection: keep-alive  
Set-Cookie: __cfduid=da072aa13d0c9a2cb4047cc1ef2472fa91496354389; expires=Fri, 01-Jun-18 21:59:49 GMT; path=/; domain=.zazvick.com; HttpOnly  
X-Powered-By: Express  
Location: https://zazvick.com/  
Vary: Accept, Accept-Encoding  
Via: 1.1 vegur  
Server: cloudflare-nginx  
CF-RAY: 36857137b2c10962-DFW  

If you look carefully at this response, there is a 301 Moved Permanently status, which means the server is redirecting us. The Location header says where to: https://zazvick.com/.
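To make that reading of the headers concrete, here is a minimal Ruby sketch that takes the text printed by `curl -I` and tells you whether it is a redirect. The helpers `parse_head` and `redirect?` are hypothetical names made up for this example; they are not part of curl, Mechanize, or Watir, and the sample is just a shortened copy of the response above.

```ruby
# Hypothetical helper: turn the raw text of `curl -I` into a status code
# and a hash of headers (header names are lowercased).
def parse_head(raw)
  lines = raw.lines.map(&:strip).reject(&:empty?)
  # The status line looks like "HTTP/1.1 301 Moved Permanently".
  status = lines.first.split[1].to_i
  headers = lines.drop(1).each_with_object({}) do |line, acc|
    name, value = line.split(":", 2)
    acc[name.downcase] = value.to_s.strip
  end
  { status: status, headers: headers }
end

# Hypothetical helper: a 3xx status plus a Location header means
# the server is sending us somewhere else.
def redirect?(response)
  (300..399).cover?(response[:status]) && response[:headers].key?("location")
end

# Shortened copy of the response shown above.
sample = <<~HEAD
  HTTP/1.1 301 Moved Permanently
  Content-Type: text/plain; charset=utf-8
  Location: https://zazvick.com/
HEAD

response = parse_head(sample)
puts redirect?(response)             # prints true
puts response[:headers]["location"]  # prints https://zazvick.com/
```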

So guess what the conclusion was: the issue happened because the site had moved from http to https, so it is now served with an SSL certificate.

How did we get to that conclusion? Well 😅:

curl -I https://www.zazvick.com

HTTP/1.1 200 OK  
Date: Thu, 01 Jun 2017 21:59:45 GMT  
Content-Type: text/html; charset=utf-8  
Connection: keep-alive  
Set-Cookie: __cfduid=d93c17e1eb58f7c242c351c309a10aa1e1496354385; expires=Fri, 01-Jun-18 21:59:45 GMT; path=/; domain=.zazvick.com; HttpOnly  
X-Powered-By: Express  
Cache-Control: public, max-age=0  
Vary: Accept-Encoding  
Via: 1.1 vegur  
Server: cloudflare-nginx  
CF-RAY: 3685711cdc97201e-DFW  

As you can see, there is a 200 status, which stands for a successful request.
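If you want to repeat the two curl checks from Ruby, something like the sketch below could work. It uses only the standard library (`net/http` and `uri`), sends HEAD requests like `curl -I` does, and follows redirects until it gets a non-redirect response. The names `head_request` and `final_url`, the `fetch:` parameter, and the limit of 5 hops are assumptions for this example, not what the original scraper did.

```ruby
require "net/http"
require "uri"

# Send a HEAD request, roughly what `curl -I` does.
def head_request(uri)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri)
  end
end

# Follow redirects and return the URL that finally answers without one.
# `fetch` can be swapped for a stub, which is handy for trying this offline.
def final_url(url, limit: 5, fetch: method(:head_request))
  uri = URI.parse(url)
  limit.times do
    response = fetch.call(uri)
    return uri.to_s unless response.is_a?(Net::HTTPRedirection)
    # Location may be relative, so resolve it against the current URL.
    uri = URI.join(uri.to_s, response["location"])
  end
  raise "too many redirects for #{url}"
end

# Usage (needs network access):
#   puts final_url("http://www.zazvick.com")
```

If the URL that comes back starts with https while the scraper is still pointed at http, that is the same clue the 301 gave us.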

That's all folks! Hopefully you find this helpful. See you in the next post.

Victor Velazquez

Coder, Musician, Startups, Passionate Dancer & Life Lover. Software Engineer at MagmaLabs, Co-founder of Web Dev Talks, Co-founder of Voltaire, Co-founder of Paqkit, Ex-co-founder of Zaznova.
