X-Git-Url: http://wamblee.org/gitweb/?a=blobdiff_plain;ds=sidebyside;f=crawler%2Fkiss%2Fdocs%2Fcontent%2Fxdocs%2Findex.xml;h=1592cea52fdee21007672e143b916763822bb291;hb=0c8e2f754881b2131725b5f2da229fe8606ba913;hp=549f58fc44fd2905db4bc18573c6d6f8c55ef2b4;hpb=7aa0c5f8bbbca8ba7d7d7b006a3b298dab6bb754;p=utils
diff --git a/crawler/kiss/docs/content/xdocs/index.xml b/crawler/kiss/docs/content/xdocs/index.xml
index 549f58fc..1592cea5 100644
--- a/crawler/kiss/docs/content/xdocs/index.xml
+++ b/crawler/kiss/docs/content/xdocs/index.xml
@@ -18,7 +18,7 @@
 This is where the KiSS crawler comes in. This is a simple crawler which
@@ -46,20 +46,256 @@
 programme information from there. Then based on that it automatically records programs for you or sends notifications about interesting ones.
+ In its current version, the crawler can be used in two ways:
+
+ At this moment, no formal releases have been made and only the latest version can be downloaded.
+
+ The easiest way to start is with the standalone program binary version or the web application.
+
+ The latest source can be obtained from subversion with the
+ URL https://wamblee.org/svn/public/utils
. The subversion
+ repository allows read-only access to anyone.
+
+ The application was developed and tested on SuSE Linux 9.1 with the JBoss 4.0.2 application server (only required for the web application). It requires a Java Virtual Machine 1.5 or later to run.
+ The crawler comes with three configuration files:
+
+ crawler.xml: basic crawler configuration tailored to the KiSS electronic programme guide.
+ programs.xml: containing a description of which programs must be recorded and which programs are interesting.
+ org.wamblee.crawler.properties: Containing a configuration
+ For the standalone program, all configuration files are in the conf
directory.
+ For the web application, the properties file is located in the WEB-INF/classes
+ directory of the web application, and crawler.xml
and programs.xml
+ are located outside of the web application at a location configured in the properties file.
+
crawler.xml
+ First of all, copy the config.xml.example
file
+ to config.xml
. After that, edit the first entry of
+ that file and replace user
and passwd
+ with your personal user id and password for the KiSS Electronic
+ Programme Guide.
+
+ Interesting TV shows are described using program
+ elements. Each program
element contains
+ one or more match
elements that describe
+ a condition that the interesting program must match.
+
+ Matching can be done on the following properties of a program:
+
+ | Field name | Description |
+ |---|---|
+ | name | Program name |
+ | description | Program description |
+ | channel | Channel name |
+ | keywords | Keywords/classification of the program. |
+ The field to match is specified using the field
+ attribute of the match
element. If no field name
+ is specified then the program name is matched. Matching is done
+ by converting the field value to lowercase and then doing a
+ Perl-like regular expression match of the provided value. As a
+ result, the content of the match element should be specified in
+ lowercase; a pattern containing uppercase letters will never match.
+ If multiple match
elements are specified for a
+ given program
element, then all matches must
+ apply for a program to be interesting.
+
+ Example patterns:
+
+ | Pattern | Example of matching field values |
+ |---|---|
+ | the.*x.*files | "The X files", "The X-Files: the making of" |
+ | star trek | "Star Trek Voyager", "Star Trek: The next generation" |
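
The matching rules described above can be illustrated with a minimal Java sketch. This is not the crawler's actual code: the class `MatchSketch`, the method `isInteresting`, and the representation of a program as a map of field values are all hypothetical. It also assumes an unanchored search, which is what the example patterns suggest. For brevity it allows only one condition per field.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not taken from the crawler source.
class MatchSketch {

    /**
     * Returns true when every condition (field name -> lowercase pattern)
     * is found somewhere in the lowercased value of that field.
     */
    static boolean isInteresting(Map<String, String> program,
                                 Map<String, String> conditions) {
        for (Map.Entry<String, String> condition : conditions.entrySet()) {
            String value = program.get(condition.getKey());
            String lowered = (value == null ? "" : value).toLowerCase(Locale.ROOT);
            // Assumption: unanchored search, as the example patterns suggest.
            if (!Pattern.compile(condition.getValue()).matcher(lowered).find()) {
                return false; // all match conditions must apply
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> program = new LinkedHashMap<>();
        program.put("name", "The X-Files: the making of");

        Map<String, String> conditions = new LinkedHashMap<>();
        conditions.put("name", "the.*x.*files");

        System.out.println(isInteresting(program, conditions)); // prints true
    }
}
```

Because the field value is lowercased before matching, a pattern written as `Star Trek` would fail against "Star Trek Voyager" in this sketch, while `star trek` succeeds.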
+ It is possible that different programs cannot be recorded
+ since they overlap. To deal with such conflicts, it is possible
+ to specify a priority using the priority
element.
+ Higher values of the priority value mean a higher priority.
+ If two programs have the same priority, then it is (more or less)
+ unspecified which of the two will be recorded, but it will at least
+ record one program. If no priority is specified, then the
+ priority is 1 (one).
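
One way to picture the priority rule is a greedy selection: consider candidates from highest to lowest priority and keep each one that does not overlap a program already chosen. The sketch below is an assumption about how such a rule could work, not the crawler's actual algorithm. The names `PrioritySketch`, `Show`, and `chooseRecordings` are hypothetical, and times are simplified to minutes since midnight.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of priority-based conflict resolution.
class PrioritySketch {

    static final class Show {
        final String name;
        final int start;    // minutes since midnight
        final int end;      // minutes since midnight, exclusive
        final int priority; // higher value means higher priority

        Show(String name, int start, int end, int priority) {
            this.name = name;
            this.start = start;
            this.end = end;
            this.priority = priority;
        }

        boolean overlaps(Show other) {
            return start < other.end && other.start < end;
        }
    }

    /** Greedily picks non-overlapping shows, highest priority first. */
    static List<Show> chooseRecordings(List<Show> candidates) {
        List<Show> sorted = new ArrayList<>(candidates);
        // The sort is stable, so shows with equal priority keep their input
        // order; the documentation leaves the tie-breaking unspecified.
        sorted.sort(Comparator.comparingInt((Show s) -> s.priority).reversed());

        List<Show> chosen = new ArrayList<>();
        for (Show candidate : sorted) {
            boolean conflict = false;
            for (Show accepted : chosen) {
                if (candidate.overlaps(accepted)) {
                    conflict = true;
                    break;
                }
            }
            if (!conflict) {
                chosen.add(candidate);
            }
        }
        return chosen;
    }
}
```

With this approach a priority 5 film that overlaps a priority 1 news broadcast is recorded and the news is dropped, while a later program that overlaps neither is still recorded.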
+
+ Since it is not always desirable to try to record every
+ program that matches the criteria, it is also possible to
+ generate notifications for interesting programs only without
+ recording them. This is done by specifying the
+ action
element with the content notify
.
+ By default, the action
is record
.
+ To make the mail reports more readable it is possible to
+ also assign a category to a program for grouping interesting
+ programs. This can be done using the category
+ element. Note that if the action
is
+ notify
, then the priority
element
+ is not used.
+
+ Edit the configuration file org.wamblee.crawler.properties
.
+ The properties file is self-explanatory.
+
+ In the binary distribution, execute the
+ run
script for your operating system
+ (run.bat
for Windows, and
+ run.sh
for Unix).
+
+ After deploying the web application, navigate to the
+ application in your browser (e.g.
+ http://localhost:8080/wamblee-crawler-kissweb
).
+ The screen should show an overview of the last time it ran (if
+ it ran before) as well as a button to run the crawler immediately.
+ Also, the result of the last run can be viewed.
+ The crawler will run automatically every morning at 5 AM local time,
+ and will retry at 1 hour intervals in case of failure to retrieve
+ programme information.
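
The schedule described above (a daily run at 5 AM local time, with a retry one hour after a failed attempt) can be sketched as a small function. This is only an illustration under assumptions: the names `ScheduleSketch` and `nextAttempt` are hypothetical, the source does not say whether retries are bounded, and the sketch uses `java.time`, which requires Java 8 rather than the Java 1.5 minimum stated for the crawler.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical illustration of the "5 AM daily, hourly retry" schedule.
class ScheduleSketch {

    static final LocalTime DAILY_RUN = LocalTime.of(5, 0);

    /**
     * Computes when the next attempt would happen.
     *
     * @param now               current local date and time
     * @param lastAttemptFailed true if the attempt made at "now" just failed
     */
    static LocalDateTime nextAttempt(LocalDateTime now, boolean lastAttemptFailed) {
        if (lastAttemptFailed) {
            // Assumption: retries continue hourly with no upper bound.
            return now.plusHours(1);
        }
        LocalDateTime todayRun = now.toLocalDate().atTime(DAILY_RUN);
        return now.isBefore(todayRun) ? todayRun : todayRun.plusDays(1);
    }
}
```

For example, a failed attempt at 05:00 leads to a retry at 06:00, and a successful run at any time after 05:00 schedules the next attempt for 05:00 the following day.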
+
+ With the source code, build everything with
+ ant dist-lite
, then locate the binary
+ distribution in lib/wamblee/crawler/kiss/kiss-crawler-bin.zip
.
+ Then proceed as for the binary distribution.
+
+ When the crawler runs, it retrieves the programs for today. As a result, it is advisable to run the program early in the day as a scheduled task (e.g. cron on Unix). For the web application this is preconfigured at 5 AM.
+
+ Modifying the program to investigate tomorrow's programs instead would be easy, but this is not yet implemented.
+
+ The best example is in the distribution itself. It is my personal
+ programs.xml
file.
+
+ You are always welcome to contribute. If you find a problem, just tell me about it, and if you have ideas, I am always interested to hear about them.
+
+ If you are a programmer and have a fix for a bug, just send me a patch, and if you are fanatic enough and have ideas, I can also give you write access to the repository.