X-Git-Url: http://wamblee.org/gitweb/?a=blobdiff_plain;f=crawler%2Fkiss%2Fdocs%2Fcontent%2Fxdocs%2Findex.xml;h=50397ef2a3dbc51fb0242d5626c669aa8cef9919;hb=098fb64daea94942558b8bcbde992e8ef690b759;hp=26bdd02a887dd0fbe491f7f4c4550975bffb7557;hpb=d95e4291fe90d5d37051c94bad83ff80029f8e1d;p=utils diff --git a/crawler/kiss/docs/content/xdocs/index.xml b/crawler/kiss/docs/content/xdocs/index.xml index 26bdd02a..50397ef2 100644 --- a/crawler/kiss/docs/content/xdocs/index.xml +++ b/crawler/kiss/docs/content/xdocs/index.xml @@ -18,32 +18,334 @@
- Welcome to MyProj
+ Automatic Recording for KiSS Hard Disk Recorders
KiSS makes regular updates to their site that sometimes require adaptations to the crawler. If it stops working, check out the most recent version here.
Changelog

31 August 2006
• Added a Windows .bat file for running the crawler under Windows. Very ad hoc; this will be generalized.
+
+
24 August 2006
• The crawler now uses desktop login for crawling. It is also much more efficient, since it no longer needs to crawl the individual programs: the channel page includes program descriptions in JavaScript popups, which the crawler can use directly. The result is a significant reduction of the load on the KiSS EPG site. In addition, the delay between requests has been increased to reduce the load further.
• The crawler now crawls programs for tomorrow instead of for today.
• The web-based crawler is configured to run only between 7 PM and 12 PM; it used to run at 5 AM.
+
+ +
13-20 August 2006

There were several changes to the login procedure, requiring modifications to the crawler.

• The crawler now uses the 'Referer' header field correctly at login.
• KiSS now uses hidden form fields in their login process; these are now also handled correctly by the crawler.
+
+
- Congratulations
- You have successfully generated and rendered an Apache Forrest site. This page is from the site template. It is found in src/documentation/content/xdocs/index.xml. Please edit it and replace this text with content of your own.

Overview

In 2005, KiSS introduced the ability to schedule recordings on KiSS hard disk recorders (such as the DP-558) through a web site on the internet. When a new recording is scheduled through the web site, the KiSS recorder finds out about it by polling a server on the internet. This is a very useful feature, since it basically allows you to program the recorder while away from home.

+

After using this feature for some time, I started noticing regular patterns. Often you are looking for the same programs and for certain types of programs. So wouldn't it be nice to have a program do this work for you, automatically recording programs and notifying you of potentially interesting ones?

+

This is where the KiSS crawler comes in. It is a simple crawler that logs on to the KiSS electronic programme guide web site and retrieves programme information from there. Based on that information, it automatically records programs for you or sends notifications about interesting ones.

+

In its current version, the crawler can be used in two ways: as a standalone program, or as a web application.

+ +
+ +
Downloading

At this moment, no formal releases have been made; only the latest version can be downloaded.

+

The easiest way to start is with the standalone binary version or the web application.

+

The latest source can be obtained from Subversion at the URL https://wamblee.org/svn/public/utils. The repository allows read-only access to anyone.

+

The application was developed and tested on SuSE Linux 9.1 with the JBoss 4.0.2 application server (the latter is only required for the web application). A Java Virtual Machine 1.5 or greater is required.

+
+ +
Configuring the crawler

The crawler comes with three configuration files: a properties file (org.wamblee.crawler.properties), crawler.xml, and programs.xml.

+ +

For the standalone program, all configuration files are in the conf directory. For the web application, the properties file is located in the WEB-INF/classes directory of the web application, and crawler.xml and programs.xml are located outside of the web application at a location configured in the properties file.

+ + +
Crawler configuration (crawler.xml)

First, copy the config.xml.example file to config.xml. Then edit the first entry of that file, replacing user and passwd with your personal user id and password for the KiSS Electronic Programme Guide.

+
+ +
Program configuration

Interesting TV shows are described using program elements. Each program element contains one or more match elements, each describing a condition that an interesting program must satisfy.

+

Matching can be done on the following properties of a program:

Field name    Description
name          Program name
description   Program description
channel       Channel name
keywords      Keywords/classification of the program

The field to match is specified using the field attribute of the match element. If no field name is specified, the program name is matched. Matching is done by converting the field value to lowercase and then performing a Perl-like regular expression match with the provided pattern. As a result, the content of the match element should be specified in lowercase; otherwise the pattern will never match. If multiple match elements are specified for a given program element, then all of them must match for a program to be considered interesting.
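As a sketch of how these elements fit together (only the program and match element names and the field attribute come from the text above; the surrounding file structure, the pattern, and the channel value are assumptions), an entry in programs.xml might look like:

```xml
<!-- Hypothetical sketch: both conditions must match the same program. -->
<program>
  <!-- No field attribute: the pattern is matched against the program name. -->
  <match>star trek</match>
  <!-- Patterns must be lowercase; field values are lowercased before matching. -->
  <match field="channel">ned 1</match>
</program>
```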

+

Example patterns:

Pattern          Example of matching field values
the.*x.*files    "The X files", "The X-Files: the making of"
star trek        "Star Trek Voyager", "Star Trek: The next generation"

It can happen that two interesting programs cannot both be recorded because they overlap. To deal with such conflicts, a priority can be specified using the priority element; higher values mean higher priority. If two programs have the same priority, it is (more or less) unspecified which of the two will be recorded, but at least one of them will be. If no priority is specified, the priority defaults to 1 (one).

+ +

Since it is not always desirable to try to record every program that matches the criteria, it is also possible to only generate notifications for interesting programs without recording them. This is done by specifying the action element with the content notify; by default, the action is record. To make the mail reports more readable, it is also possible to assign a category to a program for grouping interesting programs, using the category element. Note that if the action is notify, the priority element is not used.
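Combining the elements described above (program, match, priority, action, and category are the element names from the text; the values and the surrounding structure are assumptions), two entries might look like:

```xml
<!-- Hypothetical sketch of a priority and a notification-only entry. -->
<program>
  <match>the.*x.*files</match>
  <!-- Recorded in preference to overlapping priority-1 programs. -->
  <priority>2</priority>
</program>
<program>
  <match field="description">documentary</match>
  <!-- Only mentioned in the mail report, not recorded; priority is ignored. -->
  <action>notify</action>
  <category>Documentaries</category>
</program>
```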

+ +
+ +
Notification configuration

Edit the configuration file org.wamblee.crawler.properties. The properties file is self-explanatory.
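The actual property names are not listed in this document, so the fragment below is purely illustrative; consult the file itself for the real keys:

```properties
# Hypothetical example only: every key and value shown here is an assumption.
# The real org.wamblee.crawler.properties file documents its own keys.
crawler.config=/home/user/kiss/crawler.xml
crawler.programs=/home/user/kiss/programs.xml
notification.smtp.host=smtp.example.org
notification.mail.to=you@example.org
```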

+
+
+ + + + +
Installing and running the crawler

Standalone application

In the binary distribution, execute the run script for your operating system (run.bat for Windows, run.sh for Unix).

+
+ +
Web application

After deploying the web application, navigate to it in your browser (e.g. http://localhost:8080/wamblee-crawler-kissweb). The screen shows an overview of the last run (if the crawler has run before), together with a button to run the crawler immediately; the result of the last run can also be viewed. The crawler runs automatically every morning at 5 AM local time, and retries at 1-hour intervals if it fails to retrieve programme information.

+
+ +
Source distribution

From the source code, build everything with ant dist-lite, then locate the binary distribution in lib/wamblee/crawler/kiss/kiss-crawler-bin.zip. Then proceed as for the binary distribution.

+
+ +
General usage

When the crawler runs, it retrieves the programs for tomorrow. As a result, it is advisable to run it early in the day as a scheduled task (e.g. with cron on Unix). For the web application, this is preconfigured at 5 AM.
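For example, the standalone crawler could be scheduled with a crontab entry like the following (the installation path is an assumption; 5 AM mirrors the web application's preconfigured schedule):

```
# Run the KiSS crawler every day at 05:00 (hypothetical install path).
0 5 * * * /opt/kiss-crawler/run.sh
```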

If you deploy the web application today, it will run automatically on the next (!) day. This holds even if you deploy the application before the normally scheduled time.

Modifying the program to look further ahead than tomorrow would be easy as well, but this is not yet implemented.

+
+ +
- Using examples as templates
+ Examples

The best example is in the distribution itself: my personal programs.xml file.

+
+ +
Contributing

- This demo site has many examples. See the menu at the left. The sources for these examples are in the directory src/documentation/content/xdocs/
+ You are always welcome to contribute. If you find a problem, just tell me about it, and if you have ideas, I am always interested to hear them.

- The sources for the Apache Forrest website are also included in your distribution at $FORREST_HOME/site-author/
+ If you are a programmer and have a fix for a bug, just send me a patch. If you are fanatic enough and have ideas, I can also give you write access to the repository.

- You can also extend the functionality of Forrest via plugins; these will often come with more samples for you to try out.

+ +