records programs for you or sends notifications about interesting ones.
</p>
<p>
- In its current version, the crawler can be used a standalone program
- only and the preferred way to run it is as a scheduled task.
+ In its current version, the crawler can be used in two ways:
</p>
+ <ul>
+ <li><strong>standalone program</strong>: A standalone program run as a scheduled task.</li>
+ <li><strong>web application</strong>: A web application running on a Java
+ application server. In this mode, the crawler also provides an automatic retry
+ mechanism in case of failures, as well as a simple web interface.</li>
+ </ul>
</section>
<section>
</p>
<p>
The easy way to start is the
- <a href="installs/crawler/kiss/kiss-crawler-bin.zip">binary version</a>.
+ <a href="installs/crawler/kiss/kiss-crawler-bin.zip">standalone program binary version</a>
+ or the <a href="installs/crawler/kissweb/wamblee-crawler-kissweb.war">web
+ application</a>.
</p>
<p>
The latest source can be obtained from subversion with the
URL <code>https://wamblee.org/svn/public/utils</code>. The subversion
repository allows read-only access to anyone.
</p>
+ <p>
+ The application was developed and tested on SuSE Linux 9.1 with the JBoss 4.0.2 application
+ server (only required for the web application). It requires a Java Virtual Machine
+ version 1.5 or later to run.
+ </p>
</section>
<section>
tailored to the KiSS electronic programme guide.</li>
<li><code>programs.xml</code>: containing a description of which
programs must be recorded and which programs are interesting.</li>
- <li><code>org.wamblee.crawler.properties</code>: Containing a configuration of
- how to notify users of results. </li>
+ <li><code>org.wamblee.crawler.properties</code>: containing general configuration properties for the crawler.</li>
</ul>
+ <p>
+ For the standalone program, all configuration files are in the <code>conf</code> directory.
+ For the web application, the properties file is located in the <code>WEB-INF/classes</code>
+ directory of the web application, and <code>crawler.xml</code> and <code>programs.xml</code>
+ are located outside of the web application at a location configured in the properties file.
+ </p>
<section>
<title>Installing and running the crawler</title>
<section>
- <title>Binary distribution</title>
+ <title>Standalone application</title>
<p>
In the binary distribution, execute the
<code>run</code> script for your operating system
</p>
</section>
+ <section>
+ <title>Web application</title>
+ <p>
+ After deploying the web application, navigate to the
+ application in your browser (e.g.
+ <code>http://localhost:8080/wamblee-crawler-kissweb</code>).
+ The screen should show an overview of the last time it ran (if
+ it ran before) as well as a button to run the crawler immediately.
+ Also, the result of the last run can be viewed.
+ The crawler will run automatically every morning at 5 AM local time.
+ </p>
+ </section>
+
<section>
<title>Source distribution</title>
<p>
<section>
<title>General usage</title>
<p>
- The crawler, as it is now, is s standalone program which is
- intended to be run from a command-line. When it runs, it
+ When the crawler runs, it
retrieves the programs for today. As a result, it is advisable
to run the program at an early point of the day as a scheduled
- task (e.g. cron on unix).
+ task (e.g. cron on Unix). For the web application, this is
+ preconfigured for 5 AM local time.
</p>
<p>
Modifying the program to allow it to investigate tomorrow's
<p>
The best example is in the distribution itself. It is my personal
- <code>programs.xml</code> file.
+ <code>programs.xml</code> file.
</p>
</section>