X-Git-Url: http://wamblee.org/gitweb/?a=blobdiff_plain;f=crawler%2Fkiss%2Fdocs%2Fcontent%2Fxdocs%2Findex.xml;h=d2cc2149b0cb73ada6221d38065bc275f67d21cf;hb=616d9c8927b015cd8b652460c6227b40ee1ecd2e;hp=8886d717b14d8903228e96465ebd0edfcebf99a8;hpb=5313be7b0501c6189380555ae8d9c355d7fee145;p=utils
diff --git a/crawler/kiss/docs/content/xdocs/index.xml b/crawler/kiss/docs/content/xdocs/index.xml
index 8886d717..d2cc2149 100644
--- a/crawler/kiss/docs/content/xdocs/index.xml
+++ b/crawler/kiss/docs/content/xdocs/index.xml
@@ -18,7 +18,7 @@
 This is where the KiSS crawler comes in. This is a simple crawler which
@@ -46,27 +46,232 @@
 programme information from there. Then based on that it automatically records programs for you or sends notifications about interesting ones.
+
+ In its current version, the crawler can only be used as a standalone
+ program, and the preferred way to run it is as a scheduled task.
+
+ At this moment, no formal releases have been made and only the latest
+ version can be downloaded.
+
+
+ The easy way to start is the
+ binary version.
+
+
+ The latest source can be obtained from Subversion with the
+ URL https://wamblee.org/svn/public/utils. The Subversion
+ repository allows read-only access to anyone.
+
+ The crawler comes with two configuration files, namely
+ crawler.xml and programs.xml.
+
crawler.xml
+ First of all, copy the config.xml.example file
+ to config.xml. After that, edit the first entry of
+ that file and replace user and passwd
+ with your personal user id and password for the KiSS Electronic
+ Programme Guide.
+
programs.xml
+ The programs.xml
file contains the following
+ configuration items:
+
+ Notification is configured in the (surprise, surprise!)
+ notification element. This element configures, in order,
+ the sender mail address (= reply address), the recipient address,
+ the subject of the email, the SMTP server host and port, and an
+ optional username and password.
+ In addition, it contains the names of the stylesheets used to
+ generate the HTML and text reports. These stylesheets
+ should not be changed.
+
+ Interesting TV shows are described using program
+ elements. Each program element contains
+ one or more match elements, each describing
+ a condition that an interesting program must match.
+
+ Matching can be done on the following properties of a program:
+
+ Field name | Description
+ ---|---
+ name | Program name
+ description | Program description
+ channel | Channel name
+ keywords | Keywords/classification of the program
+ The field to match is specified using the field
+ attribute of the match element. If no field name
+ is specified, then the program name is matched. Matching is done
+ by converting the field value to lowercase and then matching it
+ against the provided Perl-like regular expression. As a
+ result, the content of the match element should be specified in
+ lower case; a pattern containing upper-case letters will never match.
+ If multiple match elements are specified for a
+ given program element, then all matches must
+ apply for a program to be interesting.
+
+ Example patterns:
+
+ Pattern | Example of matching field values
+ ---|---
+ the.*x.*files | "The X files", "The X-Files: the making of"
+ star trek | "Star Trek Voyager", "Star Trek: The next generation"
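
To make the matching rules concrete, here is a minimal sketch of what program entries in programs.xml might look like. It uses only the element and attribute names described above (program, match, field). The exact nesting, the enclosing root element (not shown), and the channel value "bbc" are assumptions made for illustration; the programs.xml file in the distribution is the authoritative example.

```xml
<!-- Hypothetical fragment of programs.xml. The surrounding root element
     is omitted because its name is not described in this document. -->

<!-- One match element without a field attribute: the program name is matched. -->
<program>
  <match>the.*x.*files</match>
</program>

<!-- Two match elements: both must apply for the program to be interesting.
     The channel value "bbc" is an invented example. Patterns are written in
     lower case because field values are lowercased before matching. -->
<program>
  <match>star trek</match>
  <match field="channel">bbc</match>
</program>
```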
+ Some programs may not be recordable because they overlap
+ with each other. To deal with such conflicts, it is possible
+ to specify a priority using the priority element.
+ Higher values mean a higher priority.
+ If two programs have the same priority, then it is (more or less)
+ unspecified which of the two will be recorded, but at least
+ one of them will be recorded. If no priority is specified, then the
+ priority is 1 (one).
+
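
As a rough illustration of conflict handling, the sketch below gives one program a higher priority than another. It assumes that the priority element is a child of program and holds a plain integer; the value 10 is arbitrary and chosen only for this example.

```xml
<!-- Hypothetical fragment of programs.xml; placing priority as a child
     of program is an assumption based on the description above. -->

<!-- Preferred when the two programs overlap (10 is higher than the default of 1). -->
<program>
  <match>the.*x.*files</match>
  <priority>10</priority>
</program>

<!-- No priority element, so the priority defaults to 1. -->
<program>
  <match>star trek</match>
</program>
```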
+ Since it is not always desirable to try to record every
+ program that matches the criteria, it is also possible to
+ generate notifications for interesting programs only, without
+ recording them. This is done by specifying the
+ action element with the content notify.
+ By default, the action is record.
+ To make the mail reports more readable, it is also possible to
+ assign a category to a program for grouping interesting
+ programs. This can be done using the category
+ element. Note that if the action is
+ notify, then the priority element
+ is not used.
+
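
The following sketch shows how a notification-only entry with a category might look. It assumes that action and category are children of program; the keyword pattern "documentary" and the category name "documentaries" are invented for illustration.

```xml
<!-- Hypothetical fragment of programs.xml; the nesting and the example
     values are assumptions, not taken from the distribution. -->
<program>
  <!-- Match on the keywords field instead of the default program name. -->
  <match field="keywords">documentary</match>
  <!-- Only send a notification; do not record. A priority element would
       not be used here. -->
  <action>notify</action>
  <!-- Groups this program with others of the same category in the mail report. -->
  <category>documentaries</category>
</program>
```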
+ In the binary distribution, execute the
+ run script for your operating system
+ (run.bat for Windows, and
+ run.sh for Unix).
+
+ With the source code, build everything with
+ ant dist-lite, then locate the binary
+ distribution in lib/wamblee/crawler/kiss/kiss-crawler-bin.zip.
+ Then proceed as for the binary distribution.
+
+ The crawler, as it is now, is a standalone program which is
+ intended to be run from the command line. When it runs, it
+ retrieves the programs for today. As a result, it is advisable
+ to run the program early in the day as a scheduled
+ task (e.g. cron on Unix).
+
+
+ Modifying the program to retrieve tomorrow's programs instead
+ would be easy, but this is not yet implemented.
+
+
+ The best example is in the distribution itself. It is my personal
+ programs.xml file.
+
+ You are always welcome to contribute. If you find a problem, just
+ tell me about it, and if you have ideas, I am always interested to
+ hear about them.
+
+
+ If you are a programmer and have a fix for a bug, just send me a
+ patch, and if you are fanatic enough and have ideas, I can also
+ give you write access to the repository.
+