From: erik
This is where the KiSS crawler comes in. This is a simple crawler which
@@ -46,27 +46,232 @@
programme information from there. Then based on that it automatically records programs for you or sends notifications about interesting ones.
In its current version, the crawler can only be used as a standalone program, and the preferred way to run it is as a scheduled task.

At the moment, no formal releases have been made, and only the latest version can be downloaded.

The easiest way to start is with the binary version.

The latest source can be obtained from Subversion at the URL https://wamblee.org/svn/public/utils. The Subversion repository allows read-only access to anyone.

The crawler comes with two configuration files, namely crawler.xml and programs.xml.

crawler.xml
First of all, copy the config.xml.example file to config.xml. After that, edit the first entry of that file and replace user and passwd with your personal user ID and password for the KiSS Electronic Programme Guide.

programs.xml
The programs.xml file contains the following configuration items:

Notification is configured in the (surprise, surprise!) notification element. This element configures the sender mail address (which is also used as the reply address), the recipient address, the subject of the email, the SMTP server host and port, and an optional username and password. In addition, it contains the names of the stylesheets used to generate the HTML and text reports. These stylesheets should not be changed.

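To make these settings concrete, here is a minimal Python sketch of what a notification step could do with them, using only the standard `email` and `smtplib` modules. It is not the crawler's implementation: the settings are modelled as a plain dictionary whose keys (`sender`, `recipient`, `subject`, `host`, `port`, `username`, `password`) are hypothetical stand-ins for the children of the notification element, and the text and HTML report bodies are assumed to have already been produced by the stylesheets.

```python
import smtplib
from email.message import EmailMessage


def build_report_message(config: dict, text_report: str, html_report: str) -> EmailMessage:
    """Build a text + HTML report mail from hypothetical notification settings."""
    msg = EmailMessage()
    msg["From"] = config["sender"]
    # The sender address doubles as the reply address.
    msg["Reply-To"] = config["sender"]
    msg["To"] = config["recipient"]
    msg["Subject"] = config["subject"]
    # Plain-text body first, HTML alternative second (one per report stylesheet).
    msg.set_content(text_report)
    msg.add_alternative(html_report, subtype="html")
    return msg


def send_report(config: dict, msg: EmailMessage) -> None:
    """Send the message via the configured SMTP server; log in only if a username is set."""
    with smtplib.SMTP(config["host"], int(config.get("port", 25))) as smtp:
        if config.get("username"):
            smtp.login(config["username"], config.get("password", ""))
        smtp.send_message(msg)
```

A call such as `send_report(cfg, build_report_message(cfg, text, html))` would then deliver the report. Login is attempted only when a username is present, matching the optional credentials; a real server may additionally require `smtp.starttls()` before `login`.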

Interesting TV shows are described using program elements. Each program element contains one or more match elements, each describing a condition that a program must satisfy to be considered interesting.

Matching can be done on the following properties of a program:

Field name | Description |
---|---|
name | Program name |
description | Program description |
channel | Channel name |
keywords | Keywords/classification of the program |

The field to match is specified using the field attribute of the match element. If no field is specified, the program name is matched. Matching is done by converting the field value to lowercase and then performing a Perl-like regular expression match against the pattern given as the content of the match element. As a result, the content of the match element should be written in lowercase; otherwise the pattern will never match.

If multiple match elements are specified for a given program element, then all of them must apply for a program to be considered interesting.

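The matching rules can be summarised in a few lines of Python. This is a sketch under assumptions, not the crawler's code: a program is modelled as a dictionary keyed by the field names from the table above, each match element as a hypothetical `(field, pattern)` pair with `field` set to `None` when the attribute is omitted, and Python's `re` module stands in for the Perl-like regular expressions. `re.search` is assumed, i.e. the pattern may match anywhere in the lowercased value.

```python
import re
from typing import Mapping, Optional, Sequence, Tuple

# Hypothetical representation of one match element: (field attribute or None, pattern).
Match = Tuple[Optional[str], str]

DEFAULT_FIELD = "name"  # the program name is matched when no field attribute is given


def condition_holds(program: Mapping[str, str], match: Match) -> bool:
    field, pattern = match
    value = program.get(field or DEFAULT_FIELD, "")
    # Only the field value is lowercased, not the pattern,
    # so a pattern containing uppercase letters cannot match.
    return re.search(pattern, value.lower()) is not None


def is_interesting(program: Mapping[str, str], matches: Sequence[Match]) -> bool:
    # Every match element of a program element must apply.
    return all(condition_holds(program, m) for m in matches)
```

For example, a program named "Star Trek Voyager" with keywords "science fiction" satisfies `[(None, "star trek"), ("keywords", "science")]` but not `[(None, "Star Trek")]`.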

Example patterns:

Pattern | Example of matching field values |
---|---|
the.*x.*files | "The X files", "The X-Files: the making of" |
star trek | "Star Trek Voyager", "Star Trek: The next generation" |

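The example patterns can be checked with Python's `re` module, used here as a stand-in for the crawler's regular-expression engine. Comparing `re.search` with `re.fullmatch` suggests why a value such as "The X-Files: the making of" matches: the value continues after `files`, so a whole-value match would fail, and the crawler presumably looks for the pattern anywhere in the lowercased value. That reading is an inference from the table, not something the documentation states.

```python
import re

# (pattern, field value) pairs taken from the example table.
EXAMPLES = [
    ("the.*x.*files", "The X files"),
    ("the.*x.*files", "The X-Files: the making of"),
    ("star trek", "Star Trek Voyager"),
    ("star trek", "Star Trek: The next generation"),
]


def pattern_matches(pattern: str, value: str) -> bool:
    # Lowercase the value, then look for the pattern anywhere in it.
    return re.search(pattern, value.lower()) is not None


for pattern, value in EXAMPLES:
    searched = pattern_matches(pattern, value)
    whole = re.fullmatch(pattern, value.lower()) is not None
    print(f"{pattern!r:18} {value!r:36} search={searched} fullmatch={whole}")
```

Running it shows `search=True` for every row, while `fullmatch` succeeds only for "The X files".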

It is possible that not all interesting programs can be recorded because some of them overlap in time. To deal with such conflicts, a priority can be specified using the priority element. Higher values mean a higher priority. If two programs have the same priority, it is (more or less) unspecified which of the two will be recorded, but at least one of them will be. If no priority is specified, the priority is 1 (one).

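The documentation does not say how the crawler resolves conflicts internally. One simple scheme that satisfies the rules above is a greedy pass: consider programs in order of decreasing priority (default 1) and keep each one that does not overlap a program already selected. The Python sketch below assumes a single recorder that records one program at a time and uses a hypothetical `Program` dataclass; ties are broken by list order, which is just one of the "unspecified" outcomes the text allows.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Program:
    name: str
    start: datetime
    end: datetime
    priority: int = 1  # default priority when no priority element is given


def overlaps(a: Program, b: Program) -> bool:
    # Half-open intervals: a program ending at 22:00 does not clash with one starting at 22:00.
    return a.start < b.end and b.start < a.end


def select_recordings(candidates: List[Program]) -> List[Program]:
    """Greedily keep the highest-priority programs that do not overlap each other."""
    chosen: List[Program] = []
    # sorted() is stable even with reverse=True, so equal priorities keep their list order.
    for prog in sorted(candidates, key=lambda p: p.priority, reverse=True):
        if not any(overlaps(prog, kept) for kept in chosen):
            chosen.append(prog)
    return sorted(chosen, key=lambda p: p.start)
```

Because the first program of any overlapping group is always kept, at least one of two conflicting programs is recorded, as the text promises.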

Since it is not always desirable to try to record every program that matches the criteria, it is also possible to only generate notifications for interesting programs without recording them. This is done by specifying the action element with the content notify. By default, the action is record. Note that if the action is notify, the priority element is not used.

To make the mail reports more readable, it is also possible to assign a category to a program so that interesting programs are grouped in the report. This is done using the category element.

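Putting the per-program options together, the following Python sketch reads program definitions from a programs.xml-like fragment with the standard `xml.etree.ElementTree` module, applies the documented defaults (priority 1, action `record`), and groups the definitions by category. The element nesting, the `<programs>` root and all sample values are invented for illustration; the programs.xml shipped with the distribution is the authoritative reference for the real layout.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical programs.xml fragment; the nesting and values are assumptions.
SAMPLE = """
<programs>
  <program>
    <match>star trek</match>
    <priority>5</priority>
    <category>Science fiction</category>
  </program>
  <program>
    <match field="keywords">documentary</match>
    <match field="channel">discovery</match>
    <action>notify</action>
    <category>Documentaries</category>
  </program>
</programs>
"""


@dataclass
class Rule:
    matches: List[Tuple[Optional[str], str]]  # (field attribute or None, pattern)
    priority: int = 1       # used when no priority element is present
    action: str = "record"  # used when no action element is present
    category: str = ""      # empty when no category element is present


def parse_rules(xml_text: str) -> List[Rule]:
    rules = []
    for prog in ET.fromstring(xml_text.strip()).iter("program"):
        rule = Rule(matches=[(m.get("field"), (m.text or "").strip())
                             for m in prog.findall("match")])
        priority = prog.findtext("priority")
        if priority and priority.strip():
            rule.priority = int(priority)
        rule.action = (prog.findtext("action") or "record").strip()
        rule.category = (prog.findtext("category") or "").strip()
        rules.append(rule)
    return rules


def recording_rules(rules: List[Rule]) -> List[Rule]:
    # Only "record" rules take part in recording and priority-based conflict resolution;
    # "notify" rules only end up in the mail report, so their priority is ignored.
    return [r for r in rules if r.action == "record"]


def by_category(rules: List[Rule]) -> Dict[str, List[Rule]]:
    # Group rules by category, as a report would group the programs they match.
    groups: Dict[str, List[Rule]] = defaultdict(list)
    for rule in rules:
        groups[rule.category].append(rule)
    return dict(groups)
```

Keeping the defaults in the dataclass means a program element that omits priority or action behaves exactly as the text describes, without special cases elsewhere.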

In the binary distribution, execute the run script for your operating system (run.bat for Windows and run.sh for Unix).

With the source code, build everything with ant dist-lite and locate the binary distribution in lib/wamblee/crawler/kiss/kiss-crawler-bin.zip. Then proceed as for the binary distribution.

In its current form, the crawler is a standalone program intended to be run from the command line. When it runs, it retrieves today's programs. It is therefore advisable to run it early in the day as a scheduled task (e.g. with cron on Unix).

Modifying the program so that it examines tomorrow's programs instead would be easy, but this has not been implemented yet.

The best example is included in the distribution itself: my personal programs.xml file.

You are always welcome to contribute. If you find a problem, just tell me about it, and if you have ideas, I am always interested in hearing them.

If you are a programmer and have a fix for a bug, just send me a patch. If you are fanatical enough and have ideas, I can also give you write access to the repository.