new recording is scheduled through the web site, the KiSS recorder finds out about this new
recording by polling a server on the internet. This is a really cool feature since it
allows programming the recorder when away from home. </p>
<p> After using this feature for some time, I started noticing regular patterns. Often you
are looking for the same programs and for certain types of programs. So, wouldn't it be nice
to have a program do this work for you and automatically record programs and notify you of
possibly interesting ones? </p>
<p> In its current version, the crawler can be used in two ways: </p>
<ul>
<li><strong>standalone program</strong>:
A standalone program run from the command-line or as a scheduled task.</li>
<li><strong>web application</strong>: A web application running on a java application
server. With this type of use, the crawler also features an automatic retry mechanism in
case of failures, as well as a simple web interface. </li>
</ul>
<p> The latest source can be obtained from subversion with the URL
<code>https://wamblee.org/svn/public/utils</code>. The subversion repository allows
read-only access to anyone. </p>
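<p> For example, assuming a subversion command-line client is installed, a read-only
working copy can be checked out as follows (the target directory name
<code>utils</code> is just an example): </p>
<source>
svn checkout https://wamblee.org/svn/public/utils utils
</source>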
<p> The application was developed and tested on SuSE linux 10.1 with
JBoss 4.0.4 application
server. An application server or servlet container is only required for the
web application. The crawler requires a Java Virtual Machine version 1.5
or greater to run. </p>
</section>
<table>
<tr>
<th>Pattern</th>
<th>Examples of matching field values</th>
</tr>
<tr>
<td>the.*x.*files</td>
<code>http://localhost:8080/wamblee-crawler-kissweb</code>). The screen should show an
overview of the last time it ran (if it ran before) as well as a button to run the crawler
immediately. Also, the result of the last run can be viewed. The crawler will run
automatically starting after 19:00,
and will retry at 1 hour intervals in case
of failure to retrieve programme information.
</p>

<p>
Since the crawler checks the status at 1 hour intervals, the first run can happen
anytime between 19:00 and 20:00. This is deliberate: it means that crawlers run by
different people will not all start running simultaneously, which is friendlier to
the KiSS servers. </p>
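<p> The effect of a fixed hourly check with an arbitrary phase can be sketched as
follows. This is a minimal illustration with hypothetical names, not the crawler's
actual code: the phase of the hourly check (here, minutes past the hour) determines
where in the 19:00-20:00 window the first run lands. </p>
<source>
public class JitterSketch {
    // Hypothetical helper: given the phase of the hourly check (minutes past
    // each hour, 0-59), returns the time of the first check at or after the
    // 19:00 deadline, in minutes since midnight. Because the phase differs per
    // installation, first runs are spread over [19:00, 20:00).
    static long minutesUntilFirstRun(long checkPhaseMinutes) {
        long deadline = 19 * 60; // 19:00 expressed in minutes since midnight
        // Distance (0-59 minutes) from the deadline to the next check moment.
        long offset = ((checkPhaseMinutes - deadline) % 60 + 60) % 60;
        return deadline + offset;
    }

    public static void main(String[] args) {
        // A crawler whose hourly check happens at 25 minutes past each hour
        // first runs at 19:25 (1165 minutes since midnight).
        System.out.println(minutesUntilFirstRun(25));
    }
}
</source>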
</section>
<section>
<title>Source distribution</title>
<p> With the source code, build everything with maven2 as follows:</p>
<source>
mvn -Dmaven.test.skip=true install
cd crawler
mvn package assembly:assembly
</source>
<p>
After this, locate the
binary distribution in the <code>target</code> subdirectory of the <code>crawler</code>
directory. Then
proceed as for the binary distribution.</p>
</section>
<section>