About This Site
Over a year ago, while looking for a Python-based web crawler, I discovered Harvestman. Harvestman did everything I wanted and more, but it was just too complicated for me and didn't quite work the way I wanted. Revisiting the code more recently, I noticed that version 2 of Harvestman (still) did exactly what I wanted and more, and now it also did it the way I wanted.
This site is an experiment that will hopefully grow out of discussion between the creator of Harvestman, Anand B Pillai, and me, Tom Smith. Anand doesn't have a lot of time to support this product, so hopefully I will be doing what I can: namely, helping to flesh out the documentation and creating a site that makes writing the documentation as easy as it can be. I also had a go at a logo :-)
I wondered about creating a WordPress or Wiki site, but I have decided to use a Trac site so that my "live" experiments with Harvestman can be distributed as examples, helping others get started with what I call "Personal Data Mining". There are still a few glitches to iron out, so bear with me. I have managed to get a WYSIWYG editor installed, but I need a little help working with .egg files on this hosted server. Still, I like the fact that Trac supports Python syntax colouring, like this...
    class DataCrawler(HarvestMan):
        """ A crawler which fetches pages by looking for matching data """

        # This is an extreme case of using events. This combines
        # three events to create a fine grained filter that downloads
        # only page which has the string 'database' in it.

        # This is a rather simple filtered crawler, but by overriding
        # the handlers below with more powerful processing which can
        # scan a page and look for regular expressions by using
        # complex grammars, it is possible to build a topic focussed
        # crawler.

        def __init__(self, keyword):
            self.keyword = keyword
            super(DataCrawler, self).__init__()
... Given that both Harvestman and Trac are Python-powered, I can see that this may be the most important feature of Trac in helping any documentation effort.
I am no Python expert, so the content will be aimed at someone like me, who knows a little Python (or is willing to learn) but likes things to be very simple. Let's see how we go...
Tom
