The following steps document the installation of HarvestMan directly from the Subversion trunk.


1. Downloading

The HarvestMan Subversion trunk is at http://harvestman-crawler.googlecode.com/svn/trunk/. To check out the source code, enter the following command in the folder where you want to keep the files.


$ svn co http://harvestman-crawler.googlecode.com/svn/trunk/HarvestMan HarvestMan-2.0

NOTE: You can also download an alpha package tarball from http://harvestmanontheweb.com/packages/2.0. Files are named in the form HarvestMan-2.0alpha<day><month><year>.tar.gz. Grab the latest file. However, tarballs are not uploaded to the website frequently, so they usually lag the Subversion trunk by several days.
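If you have several tarballs to choose from, the latest one can be picked by the date embedded in the file name. The following is only a sketch: it assumes the date is written as two digits for the day, two for the month and four for the year, which the naming form above does not actually specify. The function names are hypothetical and not part of HarvestMan.

```python
import re
from datetime import date

# ASSUMPTION: the date part is DDMMYYYY. The documented form only says
# <day><month><year>, so adjust the pattern if the real files differ.
TARBALL_RE = re.compile(r"^HarvestMan-2\.0alpha(\d{2})(\d{2})(\d{4})\.tar\.gz$")


def tarball_date(filename):
    """Return the date encoded in a tarball name, or None if it does not match."""
    match = TARBALL_RE.match(filename)
    if match is None:
        return None
    day, month, year = [int(part) for part in match.groups()]
    try:
        return date(year, month, day)
    except ValueError:
        # The digits matched the pattern but do not form a real calendar date.
        return None


def latest_tarball(filenames):
    """Return the name carrying the most recent date, or None if none match."""
    dated = []
    for name in filenames:
        stamp = tarball_date(name)
        if stamp is not None:
            dated.append((stamp, name))
    if not dated:
        return None
    return max(dated)[1]
```

Pass latest_tarball() the list of file names you see on the packages page. Names that do not follow the assumed form are ignored.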


2. Requirements and Dependencies

HarvestMan requires Python 2.4 or later. Its other dependencies (for version 2.0) are:

  1. sgmlop - Fast SGML/XML/(X)HTML parser written in C (used for HTML parsing)
  2. pyparsing - A general parsing module for Python (used in the rudimentary JavaScript parser)
  3. web.py - Simple but powerful web framework written in pure Python (used for the browser-based UI)

Of these, the HarvestMan distribution includes the sgmlop package. The setup.py script of HarvestMan pulls in the other two dependencies if they are not already installed.
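If you want to see what is already present before running setup.py, a small check like the one below can help. It is a sketch and not part of HarvestMan: the helper names are hypothetical, and it only tests whether each module can be imported under its usual import name (sgmlop, pyparsing, and web for web.py). It does not check module versions.

```python
import sys

# Import names of the dependencies listed above; web.py is imported as "web".
DEPENDENCIES = ["sgmlop", "pyparsing", "web"]

# Minimum Python version stated in the requirements.
MIN_PYTHON = (2, 4)


def python_ok(version_info=None):
    """Return True if the given (or running) interpreter is at least MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


def missing_modules(names):
    """Return the names from the list that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling python_ok() checks the running interpreter, and missing_modules(DEPENDENCIES) returns the modules that setup.py would still have to provide. An empty list means all three can already be imported.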

3. Installing to Python

Change directory to the folder "HarvestMan-2.0" and issue the following command. Make sure you have root privileges before doing so.


$ sudo python setup.py install

This checks the dependencies, pulls in any that are not satisfied, installs the code to your Python site-packages folder, and creates shortcuts for the two HarvestMan applications, harvestman and hget. It also runs the unit tests to check the installation.

Here is the command-line log of a fresh installation of HarvestMan, which installs everything, including the dependencies, from scratch.

Full Command Line Install Log

4. Installation Check

The installation script executes unit test cases immediately after installation and informs the user of the results.

Running unit tests...
Running test_connector...
test_connect...
test_connect_etag...
test_connect_lmt...
Running test_urlparser...
Ran 10 tests, 10 passes, 0 failures

NOTE: The unit tests are executed only the first time setup.py is run, not on subsequent invocations.

HarvestMan also has a "self-test" option, which is invoked by passing the argument --selftest to the program on the command line. It runs the same unit tests as above.

[anand@localhost ~]$ harvestman --selftest
Loading system configuration... 
Loading user configuration... 
Running self-test...
Running test_connector...
test_connect...
test_connect_etag...
test_connect_lmt...
Running test_urlparser...
<unittest.TestResult run=10 errors=0 failures=0>
self-test complete. All tests passed.

If something goes wrong in the tests, the program will print...

self-test failed. Please check your installation!
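
The two messages above are enough to check the self-test result from a script. The sketch below is an illustration and not part of HarvestMan: the function names are hypothetical, and because the exit code of harvestman --selftest is not documented here, it looks only at the printed text.

```python
import subprocess

# Final lines printed by the self-test, as shown in the logs above.
PASS_MARKER = "self-test complete. All tests passed."
FAIL_MARKER = "self-test failed. Please check your installation!"


def selftest_status(output):
    """Classify self-test output as 'passed', 'failed' or 'unknown'."""
    if FAIL_MARKER in output:
        return "failed"
    if PASS_MARKER in output:
        return "passed"
    return "unknown"


def run_selftest(command=("harvestman", "--selftest")):
    """Run the self-test command and classify everything it prints."""
    process = subprocess.Popen(
        list(command),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold error output into the same stream
        universal_newlines=True,   # return text rather than bytes
    )
    output, _ = process.communicate()
    return selftest_status(output)
```

run_selftest() needs the harvestman shortcut to be on your PATH. A result of "unknown" means neither message was found, which is worth treating as a failure.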