2007/08/26

building a spider in python

This was the first set of keywords that someone in the US typed to reach my blog. He must have been disappointed: there isn't much here about writing a spider in Python, apart from a mention of a useful class which wasn't that useful to me. Maybe he didn't care to know more than that, maybe he did.

Prototype 1 had its spider written in Python.

It is quite easy to do. The httplib module is simple to use and has all the options one might need (or at least all that I needed), and I couldn't find any bug in it.

It gets the job done, leaving the developer free to focus on what to do with the data.
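
For the curious, here is a minimal sketch of what fetching a page and pulling out its links can look like. It is not the Prototype 1 code. It assumes Python 3, where httplib was renamed http.client, and it only handles plain HTTP. The names fetch, extract_links and LinkParser, as well as the "example-spider/0.1" user agent, are made up for illustration.

```python
import http.client
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links into absolute ones.
                    self.links.append(urljoin(self.base_url, value))


def fetch(url, timeout=10):
    """Return (status, body) for a plain-HTTP URL (hypothetical helper)."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(
        parts.hostname, parts.port or 80, timeout=timeout
    )
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # The user agent string is an illustrative placeholder.
        conn.request("GET", path, headers={"User-Agent": "example-spider/0.1"})
        resp = conn.getresponse()
        return resp.status, resp.read().decode("utf-8", errors="replace")
    finally:
        conn.close()


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in an HTML string."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links
```

A spider is then mostly a loop: call fetch on a URL, pass the body to extract_links, and queue whatever comes back that you haven't visited yet. A real one would also need to respect robots.txt, handle redirects and HTTPS, and throttle its requests, none of which this sketch does.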

If you are considering Python to write a spider, go for it. And if you don't trust my word (yet), you will find that a good number of major search engines use Python for that component.
