<?xml version="1.0" encoding="UTF-8"?>
Copyright 2002-2004 The Apache Software Foundation or its licensors,
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
<title>Automatic Recording for KiSS Hard Disk Recorders</title>
<section id="overview">
<title>Overview</title>
In 2005, <a href="site:links/kiss">KiSS</a> introduced the ability
to schedule recordings on KiSS hard disk recorders (such as the
DP-558) through a web site on the internet. When a new recording is
scheduled through the web site, the KiSS recorder finds out about
it by polling a server on the internet.
This is a really cool feature since it allows programming
the recorder while away from home.
After using this feature for some time, I started noticing regular
patterns. Often you are looking for the same programs and for certain
types of programs. So, wouldn't it be nice to have a program
do this work for you and automatically record programs and notify you
of possibly interesting ones?

This is where the KiSS crawler comes in. It is a simple crawler which
logs on to the KiSS electronic programme guide web site and gets
programme information from there. Based on that information, it automatically
records programs for you or sends notifications about interesting ones.
In its current version, the crawler can be used in two ways:
<li><strong>standalone program</strong>: A standalone program run as a scheduled task.</li>
<li><strong>web application</strong>: A web application running on a Java
application server. Used this way, the crawler also features an automatic retry
mechanism in case of failures, as well as a simple web interface.</li>
<title>Downloading</title>
At this moment, no formal releases have been made and only the latest
version can be downloaded.

The easy way to start is the
<a href="installs/crawler/kiss/kiss-crawler-bin.zip">standalone program binary version</a>
or the <a href="installs/crawler/kissweb/wamblee-crawler-kissweb.war">web
application</a>.
The latest source can be obtained from Subversion with the
URL <code>https://wamblee.org/svn/public/utils</code>. The Subversion
repository allows read-only access to anyone.

The application was developed and tested on SuSE Linux 9.1 with the JBoss 4.0.2 application
server (only required for the web application). It requires a Java Virtual Machine
1.5 or later to run.
<title>Configuring the crawler</title>
The crawler comes with three configuration files:
<li><code>crawler.xml</code>: basic crawler configuration
tailored to the KiSS electronic programme guide.</li>
<li><code>programs.xml</code>: a description of which
programs must be recorded and which programs are interesting.</li>
<li><code>org.wamblee.crawler.properties</code>: notification settings and, for the
web application, the location of the other two configuration files.</li>

For the standalone program, all configuration files are in the <code>conf</code> directory.
For the web application, the properties file is located in the <code>WEB-INF/classes</code>
directory of the web application, and <code>crawler.xml</code> and <code>programs.xml</code>
are located outside of the web application at a location configured in the properties file.
<title>Crawler configuration <code>crawler.xml</code></title>
First of all, copy the <code>config.xml.example</code> file
to <code>config.xml</code>. After that, edit the first entry of
that file and replace <code>user</code> and <code>passwd</code>
with your personal user id and password for the KiSS Electronic
Programme Guide.
<title>Program configuration</title>
Interesting TV shows are described using <code>program</code>
elements. Each <code>program</code> element contains
one or more <code>match</code> elements that describe
a condition that the interesting program must match.

Matching can be done on the following properties of a program:
<tr><th>Field name</th>
<th>Description</th></tr>
<td>Program name</td>
<td>Program description</td>
<td>Channel name</td>
<td>Keywords/classification of the program.</td>
The field to match is specified using the <code>field</code>
attribute of the <code>match</code> element. If no field name
is specified, then the program name is matched. Matching is done
by converting the field value to lowercase and then doing a
Perl-like regular expression match of the provided value. As a
result, the content of the <code>match</code> element should be specified in
lowercase, since a pattern containing uppercase letters will never match.
If multiple <code>match</code> elements are specified for a
given <code>program</code> element, then all matches must
apply for a program to be interesting.
<th>Example of matching field values</th>
<td>the.*x.*files</td>
<td>"The X files", "The X-Files: the making of"</td>
<td>"Star Trek Voyager", "Star Trek: The next generation"</td>
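The matching rules above can be sketched as a fragment of programs.xml. This is only an illustration under stated assumptions: the element and attribute names program, match and field come from the text, while the field identifier description and the second pattern are made up for the example and may differ from what the crawler actually accepts.

```xml
<!-- Hypothetical fragment of programs.xml. Only the names program,
     match and field are taken from the documentation. -->
<program>
  <!-- No field attribute: the pattern is matched against the program name. -->
  <match>the.*x.*files</match>
  <!-- Second condition: both match elements must apply.
       "description" is an assumed field identifier, "alien" an invented pattern. -->
  <match field="description">alien</match>
</program>
```

Both patterns are written in lowercase because the field value is converted to lowercase before matching.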
It is possible that different programs cannot be recorded
since they overlap. To deal with such conflicts, it is possible
to specify a priority using the <code>priority</code> element.
A higher value means a higher priority.
If two programs have the same priority, then it is (more or less)
unspecified which of the two will be recorded, but at least
one of them will be recorded. If no priority is specified, then the
Since it is not always desirable to try to record every
program that matches the criteria, it is also possible to
generate notifications for interesting programs without
recording them. This is done by specifying the
<code>action</code> element with the content <code>notify</code>.
By default, the <code>action</code> is <code>record</code>.
To make the mail reports more readable, it is possible to
also assign a category to a program for grouping interesting
programs. This can be done using the <code>category</code>
element. Note that if the <code>action</code> is
<code>notify</code>, then the <code>priority</code> element
<title>Notification configuration</title>
Edit the configuration file <code>org.wamblee.crawler.properties</code>.
The properties file is self-explanatory.
<title>Installing and running the crawler</title>
<title>Standalone application</title>
In the binary distribution, execute the
<code>run</code> script for your operating system
(<code>run.bat</code> for Windows, and
<code>run.sh</code> for Unix).
<title>Web application</title>
After deploying the web application, navigate to the
application in your browser (e.g.
<code>http://localhost:8080/wamblee-crawler-kissweb</code>).
The screen should show an overview of the last time it ran (if
it ran before) as well as a button to run the crawler immediately.
Also, the result of the last run can be viewed.
The crawler will run automatically every morning at 5 AM local time,
and will retry at 1 hour intervals in case of failure to retrieve
programme information.
<title>Source distribution</title>
With the source code, build everything with
<code>ant dist-lite</code>, then locate the binary
distribution in <code>lib/wamblee/crawler/kiss/kiss-crawler-bin.zip</code>.
Then proceed as for the binary distribution.
<title>General usage</title>
When the crawler runs, it
retrieves the programs for today. As a result, it is advisable
to run the program early in the day as a scheduled
task (e.g. cron on Unix). For the web application this is
preconfigured at 5 AM.

Letting the program look at tomorrow's
programs instead would be easy to add, but is not yet implemented.
<section id="examples">
<title>Examples</title>
The best example is in the distribution itself. It is my personal
<code>programs.xml</code> file.
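For readers without the distribution at hand, here is a minimal sketch of what such a file might look like. It is not the file shipped with the crawler. The element names program, match, priority, action and category and the attribute field come from the configuration section. The root element name programs, the nesting of the elements inside program, the field identifier keywords and all values are assumptions made for illustration.

```xml
<!-- Hypothetical programs.xml sketch, not the author's file.
     The root element name and all values are assumed. -->
<programs>
  <!-- Recorded by default, since no action element is given. -->
  <program>
    <match>the.*x.*files</match>
    <!-- A higher value means a higher priority when recordings overlap. -->
    <priority>10</priority>
    <category>science fiction</category>
  </program>
  <!-- Only reported in the notification mail, never recorded. -->
  <program>
    <!-- "keywords" is an assumed field identifier. -->
    <match field="keywords">documentary</match>
    <action>notify</action>
    <category>documentaries</category>
  </program>
</programs>
```

The category values only group entries in the mail report, so any descriptive label could be used in their place.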
<title>Contributing</title>
You are always welcome to contribute. If you find a problem, just
tell me about it, and if you have ideas, I am always interested to
hear them.

If you are a programmer and have a fix for a bug, just send me a
patch, and if you are fanatic enough and have ideas, I can also
give you write access to the repository.