X-Git-Url: http://wamblee.org/gitweb/?a=blobdiff_plain;f=crawler%2Fkiss%2Fdocs%2Fcontent%2Fxdocs%2Findex.xml;h=50397ef2a3dbc51fb0242d5626c669aa8cef9919;hb=098fb64daea94942558b8bcbde992e8ef690b759;hp=26bdd02a887dd0fbe491f7f4c4550975bffb7557;hpb=d95e4291fe90d5d37051c94bad83ff80029f8e1d;p=utils diff --git a/crawler/kiss/docs/content/xdocs/index.xml b/crawler/kiss/docs/content/xdocs/index.xml index 26bdd02a..50397ef2 100644 --- a/crawler/kiss/docs/content/xdocs/index.xml +++ b/crawler/kiss/docs/content/xdocs/index.xml @@ -18,32 +18,334 @@
- Welcome to MyProj
+ Automatic Recording for KiSS Hard Disk Recorders
KiSS makes regular updates to their site that sometimes require adaptations to the crawler. If it stops working, check out the most recent version here.
Changelog

31 August 2006
• Added a Windows .bat file for running the crawler under Windows. Very ad hoc; this will be generalized.
+
+
24 August 2006
• The crawler now uses desktop login for crawling. It is also much more efficient, since it no longer needs to crawl the individual programs: the channel page includes program descriptions in JavaScript popups, which the crawler can use directly. The result is a significant reduction of the load on the KiSS EPG site. In addition, the delay between requests has been increased to reduce the load further.
• The crawler now crawls programs for tomorrow instead of for today.
• The web-based crawler is configured to run only between 7 PM and 12 PM; it used to run at 5 AM.
+
+ +
13-20 August 2006

There were several changes to the login procedure, requiring modifications to the crawler.

• The crawler now uses the 'Referer' header field correctly at login.
• KiSS now uses hidden form fields in their login process; these are now also handled correctly by the crawler.
+
+
- Congratulations
- You have successfully generated and rendered an Apache Forrest site. This page is from the site template. It is found in src/documentation/content/xdocs/index.xml. Please edit it and replace this text with content of your own.

Overview

In 2005, KiSS introduced the ability to schedule recordings on KiSS hard disk recorders (such as the DP-558) through a web site on the internet. When a new recording is scheduled through the web site, the KiSS recorder finds out about it by polling a server on the internet. This is a very useful feature, since it basically allows you to program the recorder while away from home.

+

After using this feature for some time, I started noticing regular patterns. Often you are looking for the same programs and for certain types of programs. So wouldn't it be nice to have a program do this work for you, automatically recording programs and notifying you of potentially interesting ones?

+

This is where the KiSS crawler comes in. It is a simple crawler that logs on to the KiSS electronic programme guide web site and retrieves programme information from there. Based on that information, it automatically records programs for you or sends notifications about interesting ones.

+

In its current version, the crawler can be used in two ways: as a standalone program, or as a web application.

+ +
+ +
Downloading

At this moment, no formal releases have been made; only the latest version can be downloaded.

+

The easiest way to start is with the standalone binary version or the web application.

+

The latest source can be obtained from Subversion at the URL https://wamblee.org/svn/public/utils. The repository allows read-only access to anyone.

+

The application was developed and tested on SuSE Linux 9.1 with the JBoss 4.0.2 application server (the latter is only required for the web application). A Java Virtual Machine 1.5 or greater is required.

+
+ +
Configuring the crawler

The crawler comes with three configuration files: a properties file (org.wamblee.crawler.properties), crawler.xml, and programs.xml.

+ +

For the standalone program, all configuration files are in the conf directory. For the web application, the properties file is located in the WEB-INF/classes directory of the web application, and crawler.xml and programs.xml are located outside of the web application at a location configured in the properties file.

+ + +
Crawler configuration (crawler.xml)

First, copy the config.xml.example file to config.xml. Then edit the first entry of that file, replacing user and passwd with your personal user id and password for the KiSS Electronic Programme Guide.

+
+ +
Program configuration

Interesting TV shows are described using program elements. Each program element contains one or more match elements, each describing a condition that an interesting program must satisfy.

+

Matching can be done on the following properties of a program:

Field name    Description
name          Program name
description   Program description
channel       Channel name
keywords      Keywords/classification of the program

The field to match is specified using the field attribute of the match element. If no field name is specified, the program name is matched. Matching is done by converting the field value to lowercase and then performing a Perl-like regular expression match with the provided pattern. As a result, the content of the match element should be specified in lowercase; otherwise the pattern will never match. If multiple match elements are specified for a given program element, then all of them must match for a program to be considered interesting.
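As a sketch of how these elements fit together (only the program and match element names and the field attribute come from the text above; the surrounding file structure, the pattern, and the channel value are assumptions), an entry in programs.xml might look like:

```xml
<!-- Hypothetical sketch: both conditions must match the same program. -->
<program>
  <!-- No field attribute: the pattern is matched against the program name. -->
  <match>star trek</match>
  <!-- Patterns must be lowercase; field values are lowercased before matching. -->
  <match field="channel">ned 1</match>
</program>
```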

+

Example patterns:

Pattern          Example of matching field values
the.*x.*files    "The X files", "The X-Files: the making of"
star trek        "Star Trek Voyager", "Star Trek: The next generation"

It can happen that two interesting programs cannot both be recorded because they overlap. To deal with such conflicts, a priority can be specified using the priority element; higher values mean higher priority. If two programs have the same priority, it is (more or less) unspecified which of the two will be recorded, but at least one of them will be. If no priority is specified, the priority defaults to 1 (one).

+ +

Since it is not always desirable to try to record every program that matches the criteria, it is also possible to only generate notifications for interesting programs without recording them. This is done by specifying the action element with the content notify; by default, the action is record. To make the mail reports more readable, it is also possible to assign a category to a program for grouping interesting programs, using the category element. Note that if the action is notify, the priority element is not used.
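Combining the elements described above (program, match, priority, action, and category are the element names from the text; the values and the surrounding structure are assumptions), two entries might look like:

```xml
<!-- Hypothetical sketch of a priority and a notification-only entry. -->
<program>
  <match>the.*x.*files</match>
  <!-- Recorded in preference to overlapping priority-1 programs. -->
  <priority>2</priority>
</program>
<program>
  <match field="description">documentary</match>
  <!-- Only mentioned in the mail report, not recorded; priority is ignored. -->
  <action>notify</action>
  <category>Documentaries</category>
</program>
```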

+ +
+ +
Notification configuration

Edit the configuration file org.wamblee.crawler.properties. The properties file is self-explanatory.
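The actual property names are not listed in this document, so the fragment below is purely illustrative; consult the file itself for the real keys:

```properties
# Hypothetical example only: every key and value shown here is an assumption.
# The real org.wamblee.crawler.properties file documents its own keys.
crawler.config=/home/user/kiss/crawler.xml
crawler.programs=/home/user/kiss/programs.xml
notification.smtp.host=smtp.example.org
notification.mail.to=you@example.org
```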

+
+
+ + + + +
Installing and running the crawler

Standalone application

In the binary distribution, execute the run script for your operating system (run.bat for Windows, run.sh for Unix).

+
+ +
Web application

After deploying the web application, navigate to it in your browser (e.g. http://localhost:8080/wamblee-crawler-kissweb). The screen shows an overview of the last run (if the crawler has run before), together with a button to run the crawler immediately; the result of the last run can also be viewed. The crawler runs automatically every morning at 5 AM local time, and retries at 1-hour intervals if it fails to retrieve programme information.

+
+ +
Source distribution

From the source code, build everything with ant dist-lite, then locate the binary distribution in lib/wamblee/crawler/kiss/kiss-crawler-bin.zip. Then proceed as for the binary distribution.

+
+ +
General usage

When the crawler runs, it retrieves the programs for tomorrow. As a result, it is advisable to run it early in the day as a scheduled task (e.g. with cron on Unix). For the web application, this is preconfigured at 5 AM.
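For example, the standalone crawler could be scheduled with a crontab entry like the following (the installation path is an assumption; 5 AM mirrors the web application's preconfigured schedule):

```
# Run the KiSS crawler every day at 05:00 (hypothetical install path).
0 5 * * * /opt/kiss-crawler/run.sh
```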

If you deploy the web application today, it will run automatically on the next (!) day. This holds even if you deploy the application before the normally scheduled time.

Modifying the program to look further ahead than tomorrow would be easy as well, but this is not yet implemented.

+
+ +
- Using examples as templates
+ Examples

The best example is in the distribution itself: my personal programs.xml file.

+
+ +
Contributing

- This demo site has many examples. See the menu at the left. The sources for these examples are in the directory src/documentation/content/xdocs/
+ You are always welcome to contribute. If you find a problem, just tell me about it, and if you have ideas, I am always interested to hear them.

- The sources for the Apache Forrest website are also included in your distribution at $FORREST_HOME/site-author/
+ If you are a programmer and have a fix for a bug, just send me a patch. If you are fanatic enough and have ideas, I can also give you write access to the repository.

- You can also extend the functionality of Forrest via plugins; these will often come with more samples for you to try out.

+ +