From a7bac435672c2a49220443684aa7860fb8699de1 Mon Sep 17 00:00:00 2001 From: erik Date: Tue, 18 Apr 2006 21:47:33 +0000 Subject: [PATCH] documentation is now in its first acceptable form. Ready for publication. --- crawler/kiss/build.xml | 2 +- crawler/kiss/conf/kiss/config.xml.example | 22 +-- crawler/kiss/conf/kiss/programs.xml | 2 + crawler/kiss/docs/content/xdocs/index.xml | 209 +++++++++++++++++++++- crawler/kiss/docs/sitemap.xmap | 5 + 5 files changed, 226 insertions(+), 14 deletions(-) diff --git a/crawler/kiss/build.xml b/crawler/kiss/build.xml index 234ed570..59ba0ab0 100644 --- a/crawler/kiss/build.xml +++ b/crawler/kiss/build.xml @@ -53,7 +53,7 @@ - diff --git a/crawler/kiss/conf/kiss/config.xml.example b/crawler/kiss/conf/kiss/config.xml.example index e3d2d923..4aeb5af8 100644 --- a/crawler/kiss/conf/kiss/config.xml.example +++ b/crawler/kiss/conf/kiss/config.xml.example @@ -1,9 +1,19 @@ + + http://epg.kml.kiss-technology.com/login_core.php + post + login.xsl + + + + + + channels-favorites - channels-favorites.xsl + channels-fav orites.xsl @@ -27,16 +37,6 @@ - - http://epg.kml.kiss-technology.com/login_core.php - post - login.xsl - - - - - - .* get diff --git a/crawler/kiss/conf/kiss/programs.xml b/crawler/kiss/conf/kiss/programs.xml index 8c8850d7..267a6751 100644 --- a/crawler/kiss/conf/kiss/programs.xml +++ b/crawler/kiss/conf/kiss/programs.xml @@ -7,6 +7,8 @@ falcon 25 + + reportToHtml.xsl diff --git a/crawler/kiss/docs/content/xdocs/index.xml b/crawler/kiss/docs/content/xdocs/index.xml index 4afa8fd5..3dc5b257 100644 --- a/crawler/kiss/docs/content/xdocs/index.xml +++ b/crawler/kiss/docs/content/xdocs/index.xml @@ -18,7 +18,7 @@
- Automatic recording for KiSS hard disk recorders + Automatic Recording for KiSS Hard Disk Recorders
@@ -38,7 +38,7 @@ patterns. Often you are looking for the same programs and for certain types of programs. So, wouldn't it be nice to have a program do this work for you and automatically record programs and notify you - of possibly interesting ones. + of possibly interesting ones?

This is where the KiSS crawler comes in. This is a simple crawler which @@ -46,27 +46,232 @@ programme information from there. Then based on that it automatically records programs for you or sends notifications about interesting ones.

+

+ In its current version, the crawler can be used as a standalone program + only, and the preferred way to run it is as a scheduled task.

Downloading + +

+ At this moment, no formal releases have been made and only the latest + version can be downloaded. +

+

+ The easy way to start is the + binary version. +

+

+ The latest source can be obtained from Subversion with the + URL https://wamblee.org/svn/public/utils. The Subversion + repository allows read-only access to anyone.

Configuring the crawler + +

+ The crawler comes with two configuration files, namely + crawler.xml and programs.xml. +

+ +
+ Crawler configuration <code>crawler.xml</code> + +

+ First of all, copy the config.xml.example file + to config.xml. After that, edit the first entry of + that file and replace user and passwd + with your personal user id and password for the KiSS Electronic + Programme Guide. +

+
+ +
+ Program configuration: <code>programs.xml</code> + +

+ The programs.xml file contains the following + configuration items: +

+
    +
  • Notification configuration: describes how the + results of crawling the site are reported.
  • +
  • Zero or more configurations of interesting programs.
  • +
+
+ Notification configuration +

+ Notification is configured in the (surprise, surprise!) + notification element. This notification element + is used to configure the sender mail address (= reply + address), recipient address, subject of the email, SMTP server + host and port, and optional username and password. + In addition, it contains the names of the stylesheets used to + generate the HTML and text reports. These stylesheets + should not be changed.

+
+ +
+ Program configuration +

+ Interesting TV shows are described using program + elements. Each program element contains + one or more match elements that describe + a condition that the interesting program must match. +

+

+ Matching can be done on the following properties of a program: +

+ + + + + + + + + + + + + + + + + + + +
Field name | Description
name | Program name
description | Program description
channel | Channel name
keywords | Keywords/classification of the program.
+

+ The field to match is specified using the field + attribute of the match element. If no field name + is specified, then the program name is matched. Matching is done + by converting the field value to lowercase and then doing a + Perl-like regular expression match of the provided value. As a + result, the content of the match element should be specified in + lowercase; otherwise the pattern will never match. + If multiple match elements are specified for a + given program element, then all matches must + apply for a program to be interesting.

+

+ Example patterns: +

+ + + + + + + + + + + + + +
Pattern | Example of matching field values
the.*x.*files | "The X files", "The X-Files: the making of"
star trek | "Star Trek Voyager", "Star Trek: The next generation"
+ +
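A minimal sketch of how the matching rules above might look in programs.xml. The element and attribute names (program, match, field) come from the text; the surrounding structure of the file is not shown here, and the second pattern is an assumption chosen only for illustration.

```xml
<!-- Hypothetical fragment of programs.xml; enclosing elements are omitted. -->
<program>
  <!-- No field attribute: the pattern is matched against the program name. -->
  <match>the.*x.*files</match>
  <!-- Patterns are written in lowercase because field values are lowercased before matching. -->
  <match field="description">making of</match>
</program>
```

With both match elements present, a program counts as interesting only if its name and its description both match.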

+ It is possible that different programs cannot both be recorded + because they overlap. To deal with such conflicts, it is possible + to specify a priority using the priority element. + Higher values of the priority value mean a higher priority. + If two programs have the same priority, then it is (more or less) + unspecified which of the two will be recorded, but the crawler will + record at least one of them. If no priority is specified, then the + priority is 1 (one).

+ +

+ Since it is not always desirable to try to record every + program that matches the criteria, it is also possible to + generate notifications for interesting programs without + recording them. This is done by specifying the + action element with the content notify. + By default, the action is record. + To make the mail reports more readable, it is possible to + also assign a category to a program for grouping interesting + programs. This can be done using the category + element. Note that if the action is + notify, then the priority element + is not used.
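As a rough illustration of the priority, action, and category elements described above, the fragment below shows one program that is recorded and one that only triggers a notification. Placing these elements directly inside program is an assumption, as are the sample patterns, priority value, and category names.

```xml
<!-- Hypothetical fragment of programs.xml; the nesting of priority, action and category is assumed. -->
<program>
  <match>star trek</match>
  <!-- Higher value wins when recordings overlap; the default priority is 1. -->
  <priority>10</priority>
  <category>science fiction</category>
</program>
<program>
  <match field="keywords">documentary</match>
  <!-- notify: report the program by mail without recording it; the default action is record. -->
  <action>notify</action>
  <category>documentaries</category>
</program>
```

The second program omits the priority element because priority is not used when the action is notify.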

+ +
+ + +
Installing and running the crawler + +
+ Binary distribution +

+ In the binary distribution, execute the + run script for your operating system + (run.bat for Windows, and + run.sh for Unix).

+
+ +
+ Source distribution +

+ With the source code, build everything with + ant dist-lite, then locate the binary + distribution in lib/wamblee/crawler/kiss/kiss-crawler-bin.zip. + Then proceed as for the binary distribution. +

+
+ +
+ General usage +

+ The crawler, as it is now, is a standalone program which is + intended to be run from a command line. When it runs, it + retrieves the programs for today. As a result, it is advisable + to run the program at an early point of the day as a scheduled + task (e.g. cron on Unix).

+

+ Modifying the program to allow it to investigate tomorrow's + programs instead is easy as well but not yet implemented. +

+
+ +
Examples +

+ The best example is in the distribution itself. It is my personal + programs.xml file. +

Contributing + +

+ You are always welcome to contribute. If you find a problem, just + tell me about it, and if you have ideas, I am always interested to + hear about them.

+

+ If you are a programmer and have a fix for a bug, just send me a + patch and if you are fanatic enough and have ideas, I can also + give you write access to the repository. +

diff --git a/crawler/kiss/docs/sitemap.xmap b/crawler/kiss/docs/sitemap.xmap index f32221a4..262f781b 100644 --- a/crawler/kiss/docs/sitemap.xmap +++ b/crawler/kiss/docs/sitemap.xmap @@ -61,6 +61,11 @@ + + + + + -- 2.31.1