<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
<document>
<header>
- <title>Welcome to the KiSS crawler</title>
+ <title>Automatic Recording for KiSS Hard Disk Recorders</title>
</header>
<body>
<section id="overview">
patterns. Often you are looking for the same programs and for certain
types of programs. So, wouldn't it be nice to have a program
do this work for you and automatically record programs and notify you
- of possibly interesting ones.
+ of possibly interesting ones?
</p>
<p>
This is where the KiSS crawler comes in. This is a simple crawler which
programme information from there. Then based on that it automatically
records programs for you or sends notifications about interesting ones.
</p>
+ <p>
+ In its current version, the crawler can be used in two ways:
+ </p>
+ <ul>
+ <li><strong>standalone program</strong>: A standalone program run as a scheduled task.</li>
+ <li><strong>web application</strong>: A web application running on a Java
+ application server. With this type of use, the crawler also features an automatic retry
+ mechanism in case of failures, as well as a simple web interface.</li>
+ </ul>
</section>
<section>
<title>Downloading</title>
+
+ <p>
+ At this moment, no formal releases have been made and only the latest
+ version can be downloaded.
+ </p>
+ <p>
+ The easiest way to start is with the
+ <a href="installs/crawler/kiss/kiss-crawler-bin.zip">standalone program binary version</a>
+ or the <a href="installs/crawler/kissweb/wamblee-crawler-kissweb.war">web
+ application</a>.
+ </p>
+ <p>
+ The latest source can be obtained from subversion with the
+ URL <code>https://wamblee.org/svn/public/utils</code>. The subversion
+ repository allows read-only access to anyone.
+ </p>
+ <p>
+ The application was developed and tested on SuSE Linux 9.1 with the JBoss 4.0.2 application
+ server (only required for the web application). It requires a Java Virtual Machine
+ 1.5 or later to run.
+ </p>
</section>
<section>
<title>Configuring the crawler</title>
+
+ <p>
+ The crawler comes with three configuration files:
+ </p>
+ <ul>
+ <li><code>crawler.xml</code>: basic crawler configuration
+ tailored to the KiSS electronic programme guide.</li>
+ <li><code>programs.xml</code>: containing a description of which
+ programs must be recorded and which programs are interesting.</li>
+ <li><code>org.wamblee.crawler.properties</code>: containing the notification
+ settings and, for the web application, the location of the other two configuration files.</li>
+ </ul>
+ <p>
+ For the standalone program, all configuration files are in the <code>conf</code> directory.
+ For the web application, the properties file is located in the <code>WEB-INF/classes</code>
+ directory of the web application, and <code>crawler.xml</code> and <code>programs.xml</code>
+ are located outside of the web application at a location configured in the properties file.
+ </p>
+
+
+ <section>
+ <title>Crawler configuration <code>crawler.xml</code></title>
+
+ <p>
+ First of all, copy the <code>config.xml.example</code> file
+ to <code>config.xml</code>. After that, edit the first entry of
+ that file and replace <code>user</code> and <code>passwd</code>
+ with your personal user id and password for the KiSS Electronic
+ Programme Guide.
+ </p>
+ </section>
+
+ <section>
+ <title>Program configuration</title>
+ <p>
+ Interesting TV shows are described using <code>program</code>
+ elements. Each <code>program</code> element contains
+ one or more <code>match</code> elements that describe
+ a condition that the interesting program must match.
+ </p>
+ <p>
+ Matching can be done on the following properties of a program:
+ </p>
+ <table>
+ <tr><th>Field name</th>
+ <th>Description</th></tr>
+ <tr>
+ <td>name</td>
+ <td>Program name</td>
+ </tr>
+ <tr>
+ <td>description</td>
+ <td>Program description</td>
+ </tr>
+ <tr>
+ <td>channel</td>
+ <td>Channel name</td>
+ </tr>
+ <tr>
+ <td>keywords</td>
+ <td>Keywords/classification of the program.</td>
+ </tr>
+ </table>
+ <p>
+ The field to match is specified using the <code>field</code>
+ attribute of the <code>match</code> element. If no field name
+ is specified, then the program name is matched. Matching is done
+ by converting the field value to lowercase and then matching it
+ against the provided value as a Perl-like regular expression. As a
+ result, the content of the match element should be specified in
+ lowercase; otherwise the pattern will never match.
+ If multiple <code>match</code> elements are specified for a
+ given <code>program</code> element, then all matches must
+ apply for a program to be interesting.
+ </p>
+ <p>
+ Example patterns:
+ </p>
+ <table>
+ <tr>
+ <th>Pattern</th>
+ <th>Example of matching field values</th>
+ </tr>
+ <tr>
+ <td>the.*x.*files</td>
+ <td>"The X files", "The X-Files: the making of"</td>
+ </tr>
+ <tr>
+ <td>star trek</td>
+ <td>"Star Trek Voyager", "Star Trek: The next generation"</td>
+ </tr>
+ </table>
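+
+ <p>
+ As an illustration, a <code>program</code> entry with two
+ <code>match</code> elements might look like the sketch below. The
+ channel pattern <code>bbc</code> is a made-up value, and the layout
+ of the surrounding file is not shown; the <code>programs.xml</code>
+ in the distribution remains the authoritative reference.
+ </p>
+ <source><![CDATA[
+ <!-- Sketch only: the channel pattern is a hypothetical example value. -->
+ <program>
+   <!-- No field attribute: matched against the program name. -->
+   <match>star trek</match>
+   <!-- Both match elements must apply for the program to be interesting. -->
+   <match field="channel">bbc</match>
+ </program>
+ ]]></source>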
+
+ <p>
+ Sometimes different programs cannot all be recorded
+ because they overlap. To deal with such conflicts, it is possible
+ to specify a priority using the <code>priority</code> element.
+ Higher values mean a higher priority.
+ If two programs have the same priority, then it is (more or less)
+ unspecified which of the two will be recorded, but at least
+ one of them will be recorded. If no priority is specified, then the
+ priority is 1 (one).
+ </p>
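+
+ <p>
+ A minimal sketch of how priorities could resolve an overlap,
+ assuming that <code>priority</code> is written as a child of
+ <code>program</code> with an integer value (the value 10 is an
+ arbitrary example):
+ </p>
+ <source><![CDATA[
+ <!-- Sketch only: element nesting and the value 10 are assumptions. -->
+ <program>
+   <match>the.*x.*files</match>
+   <!-- Wins over any overlapping program with a lower priority. -->
+   <priority>10</priority>
+ </program>
+ <program>
+   <!-- No priority element: the priority defaults to 1. -->
+   <match>star trek</match>
+ </program>
+ ]]></source>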
+
+ <p>
+ Since it is not always desirable to try to record every
+ program that matches the criteria, it is also possible to
+ generate notifications for interesting programs only without
+ recording them. This is done by specifying the
+ <code>action</code> element with the content <code>notify</code>.
+ By default, the <code>action</code> is <code>record</code>.
+ To make the mail reports more readable it is possible to
+ also assign a category to a program for grouping interesting
+ programs. This can be done using the <code>category</code>
+ element. Note that if the <code>action</code> is
+ <code>notify</code>, then the <code>priority</code> element
+ is not used.
+ </p>
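+
+ <p>
+ For example, a notification-only entry might be sketched as follows.
+ The keyword pattern and the category name are hypothetical, and
+ placing <code>action</code> and <code>category</code> directly inside
+ <code>program</code> is an assumption:
+ </p>
+ <source><![CDATA[
+ <!-- Sketch only: pattern, category name and nesting are assumptions. -->
+ <program>
+   <match field="keywords">documentary</match>
+   <!-- Report the program in the mail instead of recording it. -->
+   <action>notify</action>
+   <!-- Groups this program with others of the same category in the report. -->
+   <category>documentaries</category>
+ </program>
+ ]]></source>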
+
+ </section>
+
+ <section>
+ <title>Notification configuration</title>
+ <p>
+ Edit the configuration file <code>org.wamblee.crawler.properties</code>.
+ The properties file is self-explanatory.
+ </p>
+ </section>
</section>
+
+
+
<section>
<title>Installing and running the crawler</title>
+
+ <section>
+ <title>Standalone application</title>
+ <p>
+ In the binary distribution, execute the
+ <code>run</code> script for your operating system
+ (<code>run.bat</code> for Windows, and
+ <code>run.sh</code> for Unix).
+ </p>
+ </section>
+
+ <section>
+ <title>Web application</title>
+ <p>
+ After deploying the web application, navigate to the
+ application in your browser (e.g.
+ <code>http://localhost:8080/wamblee-crawler-kissweb</code>).
+ The screen should show an overview of the last time it ran (if
+ it ran before) as well as a button to run the crawler immediately.
+ Also, the result of the last run can be viewed.
+ The crawler will run automatically every morning at 5 AM local time,
+ and will retry at 1 hour intervals in case of failure to retrieve
+ programme information.
+ </p>
+ </section>
+
+ <section>
+ <title>Source distribution</title>
+ <p>
+ With the source code, build everything with
+ <code>ant dist-lite</code>, then locate the binary
+ distribution in <code>lib/wamblee/crawler/kiss/kiss-crawler-bin.zip</code>.
+ Then proceed as for the binary distribution.
+ </p>
+ </section>
+
+ <section>
+ <title>General usage</title>
+ <p>
+ When the crawler runs, it
+ retrieves the programs for today. As a result, it is advisable
+ to run the program at an early point of the day as a scheduled
+ task (e.g. cron on Unix). For the web application this is
+ preconfigured at 5 AM.
+ </p>
+ <p>
+ Modifying the program to retrieve tomorrow's programs
+ instead would be easy, but this is not yet implemented.
+ </p>
+ </section>
+
+
</section>
<section id="examples">
<title>Examples</title>
+ <p>
+ The best example is in the distribution itself. It is my personal
+ <code>programs.xml</code> file.
+ </p>
</section>
<section>
<title>Contributing</title>
+
+ <p>
+ You are always welcome to contribute. If you find a problem, just
+ tell me about it, and if you have ideas, I am always interested to
+ hear about them.
+ </p>
+ <p>
+ If you are a programmer and have a fix for a bug, just send me a
+ patch and if you are fanatic enough and have ideas, I can also
+ give you write access to the repository.
+ </p>
</section>