From: erik
- There were several changes to the login procedure, requiring modifications to the crawler.
- There were several changes to the login procedure, requiring modifications to the
+ crawler.
- In 2005, KiSS introduced the ability
- to schedule recordings on KiSS hard disk recorder (such as the
- DP-558) through a web site on the internet. When a new recording is
- scheduled through the web site, the KiSS recorder finds out about
- this new recording by polling a server on the internet.
- This is a really cool feature since it basically allows programming
- the recorder when away from home.
-
- After using this feature for some time now, I started noticing regular
- patterns. Often you are looking for the same programs and for certain
- types of programs. So, wouldn't it be nice to have a program
- do this work for you and automatically record programs and notify you
- of possibly interesting ones?
-
- This is where the KiSS crawler comes in. This is a simple crawler which
- logs on to the KiSS electronic programme guide web site and gets
- programme information from there. Then based on that it automatically
- records programs for you or sends notifications about interesting ones.
-
- In its current version, the crawler can be used in two ways:
- In 2005, KiSS introduced the ability to schedule recordings
+ on KiSS hard disk recorder (such as the DP-558) through a web site on the internet. When a
+ new recording is scheduled through the web site, the KiSS recorder finds out about this new
+ recording by polling a server on the internet. This is a really cool feature since it
+ basically allows programming the recorder when away from home.
+
+ After using this feature for some time now, I started noticing regular patterns. Often you
+ are looking for the same programs and for certain types of programs. So, wouldn't it be nice
+ to have a program do this work for you and automatically record programs and notify you of
+ possibly interesting ones?
+
+ This is where the KiSS crawler comes in. This is a simple crawler which logs on to the
+ KiSS electronic programme guide web site and gets programme information from there. Then
+ based on that it automatically records programs for you or sends notifications about
+ interesting ones.
+
+ In its current version, the crawler can be used in two ways:
- At this moment, no formal releases have been made and only the latest
- version can be downloaded.
-
- The easy way to start is the
- standalone program binary version
- or using the web
- application.
-
- The latest source can be obtained from subversion with the
- URL
- The application was developed and tested on SuSE linux 9.1 with JBoss 4.0.2 application
+
+ At this moment, no formal releases have been made and only the latest version can be
+ downloaded. The easy way to start is the standalone program
+ binary version or using the web application. The latest source can be obtained from subversion with the URL
+ The application was developed and tested on SuSE linux 9.1 with JBoss 4.0.2 application
server (only required for the web application). It requires at least a Java Virtual Machine
- 1.5 or greater to run.
-
-
+ config.xml
again.
+
+ config.xml
file to deal with changes in the login procedure.
+
-
-
https://wamblee.org/svn/public/utils
. The subversion
- repository allows read-only access to anyone.
- https://wamblee.org/svn/public/utils
. The subversion repository allows
+ read-only access to anyone.
- The crawler comes with three configuration files:
+ The crawler comes with three configuration files:
- crawler.xml: basic crawler configuration
- tailored to the KiSS electronic programme guide.
- programs.xml: containing a description of which
- programs must be recorded and which programs are interesting.
- org.wamblee.crawler.properties: Containing a configuration
+ crawler.xml: basic crawler configuration tailored to the KiSS electronic
+ programme guide.
+ programs.xml: containing a description of which programs must be recorded
+ and which programs are interesting.
+ org.wamblee.crawler.properties: Containing a configuration
- For the standalone program, all configuration files are in the conf
directory.
- For the web application, the properties file is located in the WEB-INF/classes
- directory of the web application, and crawler.xml
and programs.xml
- are located outside of the web application at a location configured in the properties file.
-
For the standalone program, all configuration files are in the conf
+ directory. For the web application, the properties file is located in the
+ WEB-INF/classes
directory of the web application, and
+ crawler.xml
and programs.xml
are located outside of the web
+ application at a location configured in the properties file.
crawler.xml
- First of all, copy the config.xml.example
file
- to config.xml
. After that, edit the first entry of
- that file and replace user
and passwd
- with your personal user id and password for the KiSS Electronic
- Programme Guide.
-
First of all, copy the config.xml.example
file to config.xml
.
+ After that, edit the first entry of that file and replace user
and
+ passwd
with your personal user id and password for the KiSS Electronic
+ Programme Guide.
Interesting TV shows are described using program
elements. Each
+ program
element contains one or more match
elements that
+ describe a condition that the interesting program must match.
Matching can be done on the following properties of a program:
Field name | Description |
---|---|
name | Program name |
description | Program description |
channel | Channel name |
keywords | Keywords/classification of the program. |
The field to match is specified using the field
attribute of the
+ match
element. If no field name is specified then the program name is
+ matched. Matching is done by converting the field value to lowercase and then doing a
+ perl-like regular expression match of the provided value. As a result, the content of the
+ match element should be specified in lower case, otherwise the pattern will never match. If
+ multiple match
elements are specified for a given program
+ element, then all matches must apply for a program to be interesting.
Example patterns:
Pattern | Example of matching field values |
---|---|
the.*x.*files | "The X files", "The X-Files: the making of" |
star trek | "Star Trek Voyager", "Star Trek: The next generation" |
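The matching rules above can be sketched as follows. This is only an illustration in Python, not the crawler's own code (the crawler itself runs on Java). The function names and the dictionary layout for a program are hypothetical, and the sketch assumes the pattern may match anywhere in the field value, which is what the example patterns suggest.

```python
import re


def field_matches(pattern, field_value):
    """Lowercase the field value, then search it for the pattern.

    Assumption: the pattern may match anywhere in the value (a search,
    not a full match), as the example patterns above suggest.
    """
    return re.search(pattern, field_value.lower()) is not None


def is_interesting(matches, program):
    """Return True if every (field, pattern) pair matches the program.

    `matches` is a list of (field, pattern) tuples; a field of None
    falls back to the program name, mirroring the documented default.
    `program` is a hypothetical dict keyed by field name.
    """
    return all(
        field_matches(pattern, program.get(field or "name", ""))
        for field, pattern in matches
    )
```

Because the field value is lowercased before matching, a pattern such as `Star Trek` can never match, while `star trek` does.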
It is possible that different programs cannot be recorded since they overlap. To deal
+ with such conflicts, it is possible to specify a priority using the priority
+ element. Higher values of the priority value mean a higher priority. If two programs have
+ the same priority, then it is (more or less) unspecified which of the two will be
+ recorded, but it will at least record one program. If no priority is specified, then the
+ priority is 1 (one).
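One way to picture the conflict handling is a greedy selection by priority. The text above only states that higher priorities win and that ties are unspecified, so the greedy strategy, the function name and the program fields (`start`, `end`, `priority`) in this Python sketch are all assumptions, not the crawler's actual algorithm.

```python
def choose_recordings(candidates):
    """Pick non-overlapping programs, preferring higher priorities.

    Each candidate is a hypothetical dict with 'name', 'start' and 'end'
    (e.g. minutes since midnight) and an optional 'priority', which
    defaults to 1 as documented.
    """
    chosen = []
    # Highest priority first; equal priorities keep their input order.
    for prog in sorted(candidates, key=lambda p: -p.get("priority", 1)):
        overlaps = any(
            prog["start"] < c["end"] and c["start"] < prog["end"]
            for c in chosen
        )
        if not overlaps:
            chosen.append(prog)
    return chosen
```

With this approach at least one of any set of overlapping programs is always recorded, which is consistent with the behaviour described above.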
Since it is not always desirable to try to record every program that matches the
+ criteria, it is also possible to generate notifications for interesting programs only
+ without recording them. This is done by specifying the action
element with
+ the content notify
. By default, the action
is
+ record
. To make the mail reports more readable it is possible to also assign
+ a category to a program for grouping interesting programs. This can be done using the
+ category
element. Note that if the action
is
+ notify
, then the priority
element is not used.
- Interesting TV shows are described using program
- elements. Each program
element contains
- one or more match
elements that describe
- a condition that the interesting program must match.
-
- Matching can be done on the following properties of a program: -
Field name | Description |
---|---|
name | Program name |
description | Program description |
channel | Channel name |
keywords | Keywords/classification of the program. |
- The field to match is specified using the field
- attribute of the match
element. If no field name
- is specified then the program name is matched. Matching is done
- by converting the field value to lowercase and then doing a
- perl-like regular expression match of the provided value. As a
- result, the content of the match element should be specified in
- lower case, otherwise the pattern will never match.
- If multiple match
elements are specified for a
- given program
element, then all matches must
- apply for a program to be interesting.
-
- Example patterns: -
Pattern | Example of matching field values |
---|---|
the.*x.*files | "The X files", "The X-Files: the making of" |
star trek | "Star Trek Voyager", "Star Trek: The next generation" |
- It is possible that different programs cannot be recorded
- since they overlap. To deal with such conflicts, it is possible
- to specify a priority using the priority
element.
- Higher values of the priority value mean a higher priority.
- If two programs have the same priority, then it is (more or less)
- unspecified which of the two will be recorded, but it will at least
- record one program. If no priority is specified, then the
- priority is 1 (one).
-
- Since it is not always desirable to try to record every
- program that matches the criteria, it is also possible to
- generate notifications for interesting programs only without
- recording them. This is done by specifying the
- action
element with the content notify
.
- By default, the action
is record
.
- To make the mail reports more readable it is possible to
- also assign a category to a program for grouping interesting
- programs. This can be done using the category
- element. Note that if the action
is
- notify
, then the priority
element
- is not used.
-
- Edit the configuration file org.wamblee.crawler.properties
.
- The properties file is self-explanatory.
-
Edit the configuration file org.wamblee.crawler.properties
. The properties
+ file is self-explanatory.
- In the binary distribution, execute the
- run
script for your operating system
- (run.bat
for windows, and
- run.sh
for unix).
-
In the binary distribution, execute the run
script for your operating
+ system (run.bat
for windows, and run.sh
for unix).
- After deploying the web application, navigate to the
- application in your browser (e.g.
- http://localhost:8080/wamblee-crawler-kissweb
).
- The screen should show an overview of the last time it ran (if
- it ran before) as well as a button to run the crawler immediately.
- Also, the result of the last run can be viewed.
- The crawler will run automatically every morning at 5 AM local time,
- and will retry at 1 hour intervals in case of failure to retrieve
- programme information.
-
After deploying the web application, navigate to the application in your browser (e.g.
+ http://localhost:8080/wamblee-crawler-kissweb
). The screen should show an
+ overview of the last time it ran (if it ran before) as well as a button to run the crawler
+ immediately. Also, the result of the last run can be viewed. The crawler will run
+ automatically starting at around 19:00 (this is not exactly 19:00),
+ and will retry at 1 hour intervals in case
+ of failure to retrieve programme information.
- With the source code, build everything with
- ant dist-lite
, then locate the binary
- distribution in lib/wamblee/crawler/kiss/kiss-crawler-bin.zip
.
- Then proceed as for the binary distribution.
-
With the source code, build everything with ant dist-lite
, then locate the
+ binary distribution in lib/wamblee/crawler/kiss/kiss-crawler-bin.zip
. Then
+ proceed as for the binary distribution.
- When the crawler runs, it retrieves the programs for tomorrow. As a result, it is advisable to run the program at an early point of the day as a scheduled task (e.g. cron on unix). For the web application this is preconfigured at 5AM.
- Modifying the program to allow it to investigate tomorrow's programs instead is easy as well but not yet implemented.
When the crawler runs, it retrieves the programs for tomorrow.
+
- The best example is in the distribution itself. It is my personal
- programs.xml
file.
-
The best example is in the distribution itself. It is my personal
+ programs.xml
file.
- You are always welcome to contribute. If you find a problem, just tell me about it, and if you have ideas, I am always interested to hear about them.
- If you are a programmer and have a fix for a bug, just send me a patch, and if you are fanatic enough and have ideas, I can also give you write access to the repository.
+ You are always welcome to contribute. If you find a problem, just tell me about it, and if you have ideas, I am always interested to hear about them.
+ If you are a programmer and have a fix for a bug, just send me a patch, and if you are fanatic enough and have ideas, I can also give you write access to the repository.