<?xml version="1.0" encoding="UTF-8"?>
Copyright 2002-2004 The Apache Software Foundation or its licensors,
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
<title>Automatic Recording for KiSS Hard Disk Recorders</title>
KiSS regularly updates its site, which sometimes requires adaptations
to the crawler. If the crawler stops working, check this page for the most recent version.
<section id="changelog">
<title>Changelog</title>
<title>17 November 2006</title>
<li>Corrected the packaged distributions. The standalone distribution
had an error in its scripts and was missing libraries.</li>
<title>7 September 2006</title>
<li>KiSS modified the login procedure. The crawler works again.</li>
<li>Generalized the startup scripts. They should no longer depend on the specific libraries used.</li>
<title>31 August 2006</title>
<li>Added a Windows batch file for running the crawler under Windows.
It is very ad hoc and will be generalized.</li>
<title>24 August 2006</title>
<li>The crawler now uses the desktop login for crawling. It is also much more efficient,
since it no longer needs to crawl the individual program pages: the channel page already
includes program descriptions in JavaScript popups, which the crawler can use.
This significantly reduces the load on the KiSS EPG site. In addition, the delay
between requests has been increased to reduce that load further.</li>
The crawler now crawls programs for tomorrow instead of for today.
The web-based crawler is configured to run only between 7pm and 12pm. It used to run at
<title>13-20 August 2006</title>
There were several changes to the login procedure, requiring modifications to the crawler.
<li>The crawler now sends the 'Referer' header field correctly at login.</li>
<li>KiSS now uses hidden form fields in its login process; these are now also handled correctly by the
<section id="overview">
<title>Overview</title>
In 2005, <a href="site:links/kiss">KiSS</a> introduced the ability
to schedule recordings on KiSS hard disk recorders (such as the
DP-558) through a web site on the internet. When a new recording is
scheduled through the web site, the KiSS recorder finds out about
it by polling a server on the internet.
This is a really cool feature, since it allows you to program
the recorder while away from home.
After using this feature for some time, I started noticing recurring
patterns. Often you are looking for the same programs and for certain
types of programs. So, wouldn't it be nice to have a program
do this work for you, automatically recording programs and notifying you
of possibly interesting ones?
This is where the KiSS crawler comes in. It is a simple crawler that
logs on to the KiSS electronic programme guide (EPG) web site and retrieves
programme information from it. Based on that information, it automatically
schedules recordings for you or sends notifications about interesting programs.
In its current version, the crawler can be used in two ways:
<li><strong>standalone program</strong>: a standalone program that is run as a scheduled task.</li>
<li><strong>web application</strong>: a web application running on a Java
application server. In this mode, the crawler also provides an automatic retry
mechanism in case of failures, as well as a simple web interface.</li>
<title>Downloading</title>
At the moment, no formal releases have been made, and only the latest
version can be downloaded.
The easiest way to start is with the
<a href="installs/crawler/target/wamblee-crawler-0.2-SNAPSHOT-kissbin.zip">standalone program binary version</a>
or the <a href="installs/crawler/kissweb/target/wamblee-crawler-kissweb.war">web
The latest source can be obtained from Subversion at the
URL <code>https://wamblee.org/svn/public/utils</code>. The Subversion
repository allows read-only access to anyone.
The application was developed and tested on SuSE Linux 9.1 with the JBoss 4.0.2 application
server (the application server is only required for the web application). It requires a Java Virtual Machine
version 1.5 or later.
<title>Configuring the crawler</title>
The crawler comes with three configuration files:
<li><code>crawler.xml</code>: basic crawler configuration
tailored to the KiSS electronic programme guide.</li>
<li><code>programs.xml</code>: a description of which
programs must be recorded and which programs are interesting.</li>
<li><code>org.wamblee.crawler.properties</code>: the notification configuration and, for the web application, the location of the other two files.</li>
For the standalone program, all configuration files are in the <code>conf</code> directory.
For the web application, the properties file is located in the <code>WEB-INF/classes</code>
directory of the web application, while <code>crawler.xml</code> and <code>programs.xml</code>
are located outside of the web application, at a location configured in the properties file.
<title>Crawler configuration <code>crawler.xml</code></title>
First, copy the <code>config.xml.example</code> file
to <code>config.xml</code>. Then edit the first entry of
that file and replace <code>user</code> and <code>passwd</code>
with your personal user ID and password for the KiSS Electronic
<title>Program configuration</title>
Interesting TV shows are described using <code>program</code>
elements. Each <code>program</code> element contains
one or more <code>match</code> elements, each describing
a condition that a program must satisfy to be considered interesting.
Matching can be done on the following properties of a program:
<tr><th>Field name</th>
<th>Description</th></tr>
<td>Program name</td>
<td>Program description</td>
<td>Channel name</td>
<td>Keywords/classification of the program.</td>
The field to match is specified using the <code>field</code>
attribute of the <code>match</code> element. If no field
is specified, the program name is matched. Matching is done
by converting the field value to lowercase and then matching it
against the content of the <code>match</code> element, interpreted as a
Perl-like regular expression. As a result, the pattern must be written
in lowercase; otherwise it will never match.
If multiple <code>match</code> elements are specified for a
given <code>program</code> element, all of them must
match for a program to be considered interesting.
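As an illustration, the following fragment sketches a single <code>program</code>
element with two <code>match</code> elements, using the first pattern from the
table of examples. Neither <code>match</code> has a <code>field</code> attribute,
so both patterns are applied to the lowercased program name, and both must match.
Only the <code>program</code> element is shown; the enclosing structure of
<code>programs.xml</code> is not described here, so the <code>programs.xml</code>
in the distribution remains the reference for the full layout. The second pattern,
<code>making</code>, is a hypothetical value chosen for illustration.
<source><![CDATA[
<!-- Sketch: the program is interesting only if ALL match elements apply. -->
<program>
  <!-- No field attribute, so this is matched against the program name. -->
  <match>the.*x.*files</match>
  <!-- Hypothetical second condition; written in lowercase because the
       field value is converted to lowercase before matching. -->
  <match>making</match>
</program>
]]></source>
Assuming a pattern only needs to match part of the name, as the table of
examples suggests, this entry would select "The X-Files: the making of" but
not "The X files".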
<th>Example of matching field values</th>
<td>the.*x.*files</td>
<td>"The X files", "The X-Files: the making of"</td>
<td>"Star Trek Voyager", "Star Trek: The next generation"</td>
Sometimes not all interesting programs can be recorded
because they overlap. To resolve such conflicts, you can
specify a priority using the <code>priority</code> element.
A higher value means a higher priority.
If two programs have the same priority, it is (more or less)
unspecified which of the two will be recorded, but at least
one of them will be recorded. If no priority is specified, then the
Since it is not always desirable to record every
program that matches the criteria, it is also possible to
only generate notifications for interesting programs without
recording them. This is done by specifying the
<code>action</code> element with the content <code>notify</code>.
By default, the <code>action</code> is <code>record</code>.
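The fragment below sketches how the <code>priority</code> and <code>action</code>
elements might be used. The element names are taken from this section, but placing
them as direct children of <code>program</code>, the numeric priority value, and the
pattern <code>documentary</code> are assumptions for illustration; check the
<code>programs.xml</code> in the distribution for the exact layout.
<source><![CDATA[
<!-- Recorded (the default action). If it overlaps with another interesting
     program, the one with the higher priority value is preferred. -->
<program>
  <match>the.*x.*files</match>
  <priority>10</priority>
</program>

<!-- Hypothetical entry: reported in the notification mail but never recorded. -->
<program>
  <match>documentary</match>
  <action>notify</action>
</program>
]]></source>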
To make the mail reports more readable, it is also possible to
assign a category to a program, which is used to group interesting
programs. This is done using the <code>category</code>
element. Note that if the <code>action</code> is
<code>notify</code>, then the <code>priority</code> element
<title>Notification configuration</title>
Edit the configuration file <code>org.wamblee.crawler.properties</code>.
The properties file is self-explanatory.
<title>Installing and running the crawler</title>
<title>Standalone application</title>
In the binary distribution, execute the
<code>run</code> script for your operating system
(<code>run.bat</code> for Windows and
<code>run.sh</code> for Unix).
<title>Web application</title>
After deploying the web application, navigate to the
application in your browser (e.g.
<code>http://localhost:8080/wamblee-crawler-kissweb</code>).
The page should show when the crawler last ran (if
it has run before), as well as a button to run the crawler immediately.
The result of the last run can also be viewed.
The crawler runs automatically every morning at 5 AM local time
and, if it fails to retrieve programme information, retries at
one-hour intervals.
<title>Source distribution</title>
When building from source, build everything with
<code>ant dist-lite</code>, then locate the binary
distribution in <code>lib/wamblee/crawler/kiss/kiss-crawler-bin.zip</code>
and proceed as for the binary distribution.
<title>General usage</title>
Each time the crawler runs, it
retrieves the programs for tomorrow. It is therefore advisable
to run it early in the day as a scheduled
task (e.g. using cron on Unix). For the web application, this is
preconfigured at 5 AM.
If you deploy the web application today, it will first run automatically
on the next (!) day. This holds even if you deploy the application
before the normal scheduled time.
Modifying the program to make it retrieve today's
programs instead would also be easy, but is not yet implemented.
<section id="examples">
<title>Examples</title>
The best example is included in the distribution itself: my personal
<code>programs.xml</code> file.
<title>Contributing</title>
You are always welcome to contribute. If you find a problem, just
tell me about it, and if you have ideas, I am always interested to
If you are a programmer and have a bug fix, just send me a
patch. If you are fanatic enough and have ideas, I can also
give you write access to the repository.