limitations under the License.
-->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
-<document>
- <header>
- <title>Automatic Recording for KiSS Hard Disk Recorders</title>
- </header>
+<document>
+ <header>
+ <title>Automatic Recording for KiSS Hard Disk Recorders</title>
+ </header>
<body>
- <warning>
- KiSS makes regular updates to their site that sometimes require adaptations
- to the crawler. If it stops working, check out the most recent version here.
- </warning>
+ <warning> KiSS makes regular updates to their site that sometimes require adaptations to the
+ crawler. If it stops working, check out the most recent version here. </warning>
<section id="changelog">
<title>Changelog</title>
<section>
- <title>17 November 2006</title>
+ <title>21 November 2006</title>
<ul>
- <li>Corrected the packed distributions. The standalone distribution
- had an error in the scripts and was missing libraries </li>
+ <li>Corrected the <code>config.xml</code> again.</li>
+ <li>Corrected errors in the documentation for the web application. It starts running at 19:00
+ and not at 5:00.</li>
+ </ul>
+ </section>
+ <section>
+ <title>19 November 2006</title>
+ <ul>
+ <li>Corrected the <code>config.xml</code> file to deal with changes in the login procedure.</li>
+ </ul>
+ </section>
+ <section>
+ <title>17 November 2006</title>
+ <ul>
+          <li>Corrected the packed distributions. The standalone distribution had an error in the
+            scripts and was missing libraries.</li>
</ul>
- </section>
- <section>
+ </section>
+ <section>
<title>7 September 2006</title>
<ul>
<li>KiSS modified the login procedure. It is now working again.</li>
- <li>Generalized the startup scripts. They should now be insensitive to the specific libraries used. </li>
+ <li>Generalized the startup scripts. They should now be insensitive to the specific
+ libraries used. </li>
</ul>
</section>
<section>
<title>31 August 2006</title>
<ul>
- <li>Added windows bat file for running the crawler under windows.
- Very add-hoc, will be generalized. </li>
+          <li>Added a Windows bat file for running the crawler under Windows. Very ad hoc, will be
+            generalized. </li>
</ul>
</section>
<section>
<title>24 August 2006</title>
<ul>
<li>The crawler now uses desktop login for crawling. Also, it is much more efficient since
- it no longer needs to crawl the individual programs. This is because the channel page
+ it no longer needs to crawl the individual programs. This is because the channel page
includes descriptions of programs in javascript popups which can be used by the crawler.
- The result is a significant reduction of the load on the KiSS EPG site. Also, the delay
+ The result is a significant reduction of the load on the KiSS EPG site. Also, the delay
between requests has been increased to further reduce load on the KiSS EPG site. </li>
- <li>
- The crawler now crawls programs for tomorrow instead of for today.
- </li>
- <li>
- The web based crawler is configured to run only between 7pm and 12pm. It used to run at
- 5am.
- </li>
+ <li> The crawler now crawls programs for tomorrow instead of for today. </li>
+          <li> The web-based crawler is configured to run only between 7 PM and midnight. It used
+            to run at 5 AM. </li>
</ul>
</section>
-
+
<section>
<title>13-20 August 2006</title>
- <p>
- There were several changes to the login procedure, requiring modifications to the crawler.
- </p>
+ <p> There were several changes to the login procedure, requiring modifications to the
+ crawler. </p>
<ul>
<li>The crawler now uses the 'Referer' header field correctly at login.</li>
- <li>KiSS now uses hidden form fields in their login process which are now also handled correctly by the
- crawler.</li>
+ <li>KiSS now uses hidden form fields in their login process which are now also handled
+ correctly by the crawler.</li>
</ul>
</section>
</section>
<section id="overview">
<title>Overview</title>
-
- <p>
- In 2005, <a href="site:links/kiss">KiSS</a> introduced the ability
- to schedule recordings on KiSS hard disk recorder (such as the
- DP-558) through a web site on the internet. When a new recording is
- scheduled through the web site, the KiSS recorder finds out about
- this new recording by polling a server on the internet.
- This is a really cool feature since it basically allows programming
- the recorder when away from home.
- </p>
- <p>
- After using this feature for some time now, I started noticing regular
- patterns. Often you are looking for the same programs and for certain
- types of programs. So, wouldn't it be nice to have a program
- do this work for you and automatically record programs and notify you
- of possibly interesting ones?
- </p>
- <p>
- This is where the KiSS crawler comes in. This is a simple crawler which
- logs on to the KiSS electronic programme guide web site and gets
- programme information from there. Then based on that it automatically
- records programs for you or sends notifications about interesting ones.
- </p>
- <p>
- In its current version, the crawler can be used in two ways:
- </p>
+
+      <p> In 2005, <a href="site:links/kiss">KiSS</a> introduced the ability to schedule recordings
+        on KiSS hard disk recorders (such as the DP-558) through a web site on the internet. When a
+        new recording is scheduled through the web site, the KiSS recorder finds out about this new
+        recording by polling a server on the internet. This is a really cool feature since it
+        basically allows programming the recorder when away from home. </p>
+      <p> After using this feature for some time, I started noticing regular patterns. Often you
+        are looking for the same programs and for certain types of programs. So, wouldn't it be
+        nice to have a program do this work for you, automatically recording programs and notifying
+        you of possibly interesting ones? </p>
+      <p> This is where the KiSS crawler comes in: a simple crawler that logs on to the KiSS
+        electronic programme guide web site and retrieves programme information from there. Based
+        on that information, it automatically records programs for you or sends notifications about
+        interesting ones. </p>
+ <p> In its current version, the crawler can be used in two ways: </p>
<ul>
<li><strong>standalone program</strong>: A standalone program run as a scheduled task.</li>
- <li><strong>web application</strong>: A web application running on a java
- application server. With this type of use, the crawler also features an automatic retry
- mechanism in case of failures, as well as a simple web interface. </li>
+ <li><strong>web application</strong>: A web application running on a java application
+ server. With this type of use, the crawler also features an automatic retry mechanism in
+ case of failures, as well as a simple web interface. </li>
</ul>
</section>
-
+
<section>
<title>Downloading</title>
-
- <p>
- At this moment, no formal releases have been made and only the latest
- version can be downloaded.
- </p>
- <p>
- The easy way to start is the
- <a href="installs/crawler/target/wamblee-crawler-0.2-SNAPSHOT-kissbin.zip">standalone program binary version</a>
- or using the <a href="installs/crawler/kissweb/target/wamblee-crawler-kissweb.war">web
- application</a>.
- </p>
- <p>
- The latest source can be obtained from subversion with the
- URL <code>https://wamblee.org/svn/public/utils</code>. The subversion
- repository allows read-only access to anyone.
- </p>
- <p>
- The application was developed and tested on SuSE linux 9.1 with JBoss 4.0.2 application
+
+ <p> At this moment, no formal releases have been made and only the latest version can be
+ downloaded. </p>
+ <p> The easy way to start is the <a
+ href="installs/crawler/target/wamblee-crawler-0.2-SNAPSHOT-kissbin.zip">standalone program
+ binary version</a> or using the <a
+ href="installs/crawler/kissweb/target/wamblee-crawler-kissweb.war">web application</a>. </p>
+ <p> The latest source can be obtained from subversion with the URL
+ <code>https://wamblee.org/svn/public/utils</code>. The subversion repository allows
+ read-only access to anyone. </p>
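+      <p> For example, checking out the source tree with a subversion command-line client looks
+        like this: </p>
+      <source>svn checkout https://wamblee.org/svn/public/utils</source>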
+      <p> The application was developed and tested on SuSE linux 9.1 with the JBoss 4.0.2
+        application server (only required for the web application). It requires a Java Virtual
+        Machine 1.5 or greater to run. </p>
</section>
-
+
<section>
<title>Configuring the crawler</title>
-
- <p>
- The crawler comes with three configuration files:
- </p>
+
+ <p> The crawler comes with three configuration files: </p>
<ul>
- <li><code>crawler.xml</code>: basic crawler configuration
- tailored to the KiSS electronic programme guide.</li>
- <li><code>programs.xml</code>: containing a description of which
- programs must be recorded and which programs are interesting.</li>
- <li><code>org.wamblee.crawler.properties</code>: Containing a configuration </li>
+ <li><code>crawler.xml</code>: basic crawler configuration tailored to the KiSS electronic
+ programme guide.</li>
+ <li><code>programs.xml</code>: containing a description of which programs must be recorded
+ and which programs are interesting.</li>
+        <li><code>org.wamblee.crawler.properties</code>: containing the notification
+          configuration.</li>
</ul>
- <p>
- For the standalone program, all configuration files are in the <code>conf</code> directory.
- For the web application, the properties files is located in the <code>WEB-INF/classes</code>
- directory of the web application, and <code>crawler.xml</code> and <code>programs.xml</code>
- are located outside of the web application at a location configured in the properties file.
- </p>
-
-
+      <p> For the standalone program, all configuration files are in the <code>conf</code>
+        directory. For the web application, the properties file is located in the
+        <code>WEB-INF/classes</code> directory of the web application, and
+        <code>crawler.xml</code> and <code>programs.xml</code> are located outside of the web
+        application at a location configured in the properties file. </p>
+
+
<section>
<title>Crawler configuration <code>crawler.xml</code></title>
-
- <p>
- First of all, copy the <code>config.xml.example</code> file
- to <code>config.xml</code>. After that, edit the first entry of
- that file and replace <code>user</code> and <code>passwd</code>
- with your personal user id and password for the KiSS Electronic
- Programme Guide.
- </p>
+
+ <p> First of all, copy the <code>config.xml.example</code> file to <code>config.xml</code>.
+ After that, edit the first entry of that file and replace <code>user</code> and
+ <code>passwd</code> with your personal user id and password for the KiSS Electronic
+ Programme Guide. </p>
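+        <p> As a rough illustration only (the element names below are hypothetical; the
+          authoritative structure is in <code>config.xml.example</code> itself), the credentials
+          entry could look something like this: </p>
+        <source><![CDATA[<!-- illustrative only; copy the real entry from config.xml.example -->
+<param name="user" value="myuserid"/>
+<param name="passwd" value="mypassword"/>]]></source>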
+ </section>
+
+ <section>
+ <title>Program configuration</title>
+ <p> Interesting TV shows are described using <code>program</code> elements. Each
+ <code>program</code> element contains one or more <code>match</code> elements that
+ describe a condition that the interesting program must match. </p>
+ <p> Matching can be done on the following properties of a program: </p>
+ <table>
+ <tr>
+ <th>Field name</th>
+ <th>Description</th>
+ </tr>
+ <tr>
+ <td>name</td>
+ <td>Program name</td>
+ </tr>
+ <tr>
+ <td>description</td>
+ <td>Program description</td>
+ </tr>
+ <tr>
+ <td>channel</td>
+ <td>Channel name</td>
+ </tr>
+ <tr>
+ <td>keywords</td>
+ <td>Keywords/classification of the program.</td>
+ </tr>
+ </table>
+        <p> The field to match is specified using the <code>field</code> attribute of the
+          <code>match</code> element. If no field name is specified, then the program name is
+          matched. Matching is done by converting the field value to lowercase and then doing a
+          perl-like regular expression match of the provided value. As a result, the content of the
+          match element should be specified in lower case; otherwise, the pattern will never match.
+          If multiple <code>match</code> elements are specified for a given <code>program</code>
+          element, then all matches must apply for a program to be interesting. </p>
+ <p> Example patterns: </p>
+ <table>
+ <tr>
+ <th>Pattern</th>
+ <th>Example of matching field values</th>
+ </tr>
+ <tr>
+ <td>the.*x.*files</td>
+ <td>"The X files", "The X-Files: the making of"</td>
+ </tr>
+ <tr>
+ <td>star trek</td>
+ <td>"Star Trek Voyager", "Star Trek: The next generation"</td>
+ </tr>
+ </table>
+
+        <p> It is possible that two interesting programs cannot both be recorded because they
+          overlap. To deal with such conflicts, it is possible to specify a priority using the
+          <code>priority</code> element. Higher values mean a higher priority. If two programs have
+          the same priority, then it is (more or less) unspecified which of the two will be
+          recorded, but at least one of them will be recorded. If no priority is specified, then
+          the priority is 1 (one). </p>
+
+        <p> Since it is not always desirable to try to record every program that matches the
+          criteria, it is also possible to generate notifications for interesting programs without
+          recording them. This is done by specifying the <code>action</code> element with the
+          content <code>notify</code>. By default, the <code>action</code> is
+          <code>record</code>. To make the mail reports more readable it is possible to also assign
+          a category to a program for grouping interesting programs. This can be done using the
+          <code>category</code> element. Note that if the <code>action</code> is
+          <code>notify</code>, then the <code>priority</code> element is not used. </p>
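+        <p> To illustrate how these elements combine (the program names, channel, and element
+          order are examples only; see the <code>programs.xml</code> shipped with the distribution
+          for the authoritative format), a fragment could look like this: </p>
+        <source><![CDATA[<!-- hypothetical example; consult the distributed programs.xml -->
+<program>
+  <match>star trek</match>
+  <match field="channel">bbc1</match>
+  <priority>2</priority>
+  <category>series</category>
+</program>
+<program>
+  <action>notify</action>
+  <category>documentaries</category>
+  <match field="keywords">nature</match>
+</program>]]></source>
+        <p> The first entry records any Star Trek episode on the given channel with priority 2;
+          the second only sends a notification for matching programs, grouped under a category in
+          the mail report. </p>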
+
</section>
- <section>
- <title>Program configuration</title>
- <p>
- Interesting TV shows are described using <code>program</code>
- elements. Each <code>program</code> element contains
- one or more <code>match</code> elements that describe
- a condition that the interesting program must match.
- </p>
- <p>
- Matching can be done on the following properties of a program:
- </p>
- <table>
- <tr><th>Field name</th>
- <th>Description</th></tr>
- <tr>
- <td>name</td>
- <td>Program name</td>
- </tr>
- <tr>
- <td>description</td>
- <td>Program description</td>
- </tr>
- <tr>
- <td>channel</td>
- <td>Channel name</td>
- </tr>
- <tr>
- <td>keywords</td>
- <td>Keywords/classification of the program.</td>
- </tr>
- </table>
- <p>
- The field to match is specified using the <code>field</code>
- attribute of the <code>match</code> element. If no field name
- is specified then the program name is matched. Matching is done
- by converting the field value to lowercase and then doing a
- perl-like regular expression match of the provided value. As a
- result, the content of the match element should be specified in
- lower case otherwise the pattern will never match.
- If multiple <code>match</code> elements are specified for a
- given <code>program</code> element, then all matches must
- apply for a program to be interesting.
- </p>
- <p>
- Example patterns:
- </p>
- <table>
- <tr>
- <th>Pattern</th>
- <th>Example of matching field values</th>
- </tr>
- <tr>
- <td>the.*x.*files</td>
- <td>"The X files", "The X-Files: the making of"</td>
- </tr>
- <tr>
- <td>star trek</td>
- <td>"Star Trek Voyager", "Star Trek: The next generation"</td>
- </tr>
- </table>
-
- <p>
- It is possible that different programs cannot be recorded
- since they overlap. To deal with such conflicts, it is possible
- to specify a priority using the <code>priority</code> element.
- Higher values of the priority value mean a higher priority.
- If two programs have the same priority, then it is (more or less)
- unspecified which of the two will be recorded, but it will at least
- record one program. If no priority is specified, then the
- priority is 1 (one).
- </p>
-
- <p>
- Since it is not always desirable to try to record every
- program that matches the criteria, it is also possible to
- generate notifications for interesting programs only without
- recording them. This is done by specifying the
- <code>action</code> alement with the content <code>notify</code>.
- By default, the <code>action</code> is <code>record</code>.
- To make the mail reports more readable it is possible to
- also assign a category to a program for grouping interesting
- programs. This can be done using the <code>category</code>
- element. Note that if the <code>action</code> is
- <code>notify</code>. then the <code>priority</code> element
- is not used.
- </p>
-
- </section>
-
<section>
<title>Notification configuration</title>
- <p>
- Edit the configuration file <code>org.wamblee.crawler.properties</code>.
- The properties file is self-explanatory.
- </p>
+ <p> Edit the configuration file <code>org.wamblee.crawler.properties</code>. The properties
+ file is self-explanatory. </p>
</section>
</section>
-
-
-
-
+
+
+
+
<section>
<title>Installing and running the crawler</title>
-
+
<section>
<title>Standalone application</title>
- <p>
- In the binary distribution, execute the
- <code>run</code> script for your operating system
- (<code>run.bat</code> for windows, and
- <code>run.sh</code> for unix).
- </p>
+        <p> In the binary distribution, execute the <code>run</code> script for your operating
+          system (<code>run.bat</code> for Windows, <code>run.sh</code> for Unix). </p>
</section>
-
+
<section>
<title>Web application</title>
- <p>
- After deploying the web application, navigate to the
- application in your browser (e.g.
- <code>http://localhost:8080/wamblee-crawler-kissweb</code>).
- The screen should show an overview of the last time it ran (if
- it ran before) as well as a button to run the crawler immediately.
- Also, the result of the last run can be viewed.
- The crawler will run automatically every morning at 5 AM local time,
- and will retry at 1 hour intervals in case of failure to retrieve
- programme information.
- </p>
+        <p> After deploying the web application, navigate to the application in your browser (e.g.
+          <code>http://localhost:8080/wamblee-crawler-kissweb</code>). The screen should show an
+          overview of the last time it ran (if it ran before) as well as a button to run the
+          crawler immediately. Also, the result of the last run can be viewed. The crawler will run
+          automatically every day starting at approximately 19:00 local time, and will retry at
+          1 hour intervals in case of failure to retrieve programme information. </p>
</section>
-
+
<section>
<title>Source distribution</title>
- <p>
- With the source code, build everything with
- <code>ant dist-lite</code>, then locate the binary
- distribution in <code>lib/wamblee/crawler/kiss/kiss-crawler-bin.zip</code>.
- Then proceed as for the binary distribution.
- </p>
+ <p> With the source code, build everything with <code>ant dist-lite</code>, then locate the
+ binary distribution in <code>lib/wamblee/crawler/kiss/kiss-crawler-bin.zip</code>. Then
+ proceed as for the binary distribution. </p>
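+        <p> For example, assuming the build is run from the top-level directory of the source
+          tree: </p>
+        <source>ant dist-lite
+unzip lib/wamblee/crawler/kiss/kiss-crawler-bin.zip</source>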
</section>
-
+
<section>
<title>General usage</title>
- <p>
- When the crawler runs, it
- retrieves the programs for tomorrow. As a result, it is advisable
- to run the program at an early point of the day as a scheduled
- task (e.g. cron on unix). For the web application this is
- preconfigured at 5AM.
- </p>
- <note>
- If you deploy the web application today, it will run automatically
- on the next (!) day. This even holds if you deploy the application
- before the normal scheduled time.
- </note>
-
- <p>
- Modifying the program to allow it to investigate tomorrow's
- programs instead is easy as well but not yet implemented.
+ <p> When the crawler runs, it retrieves the programs for tomorrow.
</p>
+ <note> If you deploy the web application today, it will run automatically on the next (!)
+ day. This even holds if you deploy the application before the normal scheduled time. </note>
</section>
-
-
+
+
</section>
<section id="examples">
<title>Examples</title>
-
- <p>
- The best example is in the distribution itself. It is my personal
- <code>programs.xml</code> file.
- </p>
+
+ <p> The best example is in the distribution itself. It is my personal
+ <code>programs.xml</code> file. </p>
</section>
-
+
<section>
<title>Contributing</title>
-
- <p>
- You are always welcome to contribute. If you find a problem just
- tell me about it and if you have ideas am I always interested to
- hear about them.
- </p>
- <p>
- If you are a programmer and have a fix for a bug, just send me a
- patch and if you are fanatic enough and have ideas, I can also
- give you write access to the repository.
- </p>
+
+      <p> You are always welcome to contribute. If you find a problem, just tell me about it, and
+        if you have ideas, I am always interested in hearing about them. </p>
+      <p> If you are a programmer and have a fix for a bug, just send me a patch, and if you are
+        fanatic enough and have ideas, I can also give you write access to the repository. </p>
</section>
-
-
+
+
</body>
</document>