X-Git-Url: http://wamblee.org/gitweb/?a=blobdiff_plain;ds=sidebyside;f=crawler%2Fkiss%2Fdocs%2Fcontent%2Fxdocs%2Findex.xml;h=3588899625acb87bc7a7387bf98256c9629f0fd0;hb=e2d260482ddc51d1e010ae916c63d6998ab0e8f0;hp=3dc5b2572b9e54718f429ee63765dc5dd68b6fb4;hpb=ddf88dbcebb2143aa6431d6e98a7a234c7103b0a;p=utils
diff --git a/crawler/kiss/docs/content/xdocs/index.xml b/crawler/kiss/docs/content/xdocs/index.xml
index 3dc5b257..35888996 100644
--- a/crawler/kiss/docs/content/xdocs/index.xml
+++ b/crawler/kiss/docs/content/xdocs/index.xml
@@ -16,264 +16,297 @@
limitations under the License.
-->
- There were several changes to the login procedure, requiring modifications to the
+ crawler.
- In 2005, KiSS introduced the ability
- to schedule recordings on KiSS hard disk recorder (such as the
- DP-558) through a web site on the internet. When a new recording is
- scheduled through the web site, the KiSS recorder finds out about
- this new recording by polling a server on the internet.
- This is a really cool feature since it basically allows programming
- the recorder when away from home.
-
- After using this feature for some time now, I started noticing regular
- patterns. Often you are looking for the same programs and for certain
- types of programs. So, wouldn't it be nice to have a program
- do this work for you and automatically record programs and notify you
- of possibly interesting ones?
-
- This is where the KiSS crawler comes in. This is a simple crawler which
- logs on to the KiSS electronic programme guide web site and gets
- programme information from there. Then based on that it automatically
- records programs for you or sends notifications about interesting ones.
-
- In its current version, the crawler can be used a standalone program
- only and the preferred way to run it is as a scheduled task.
- In 2005, KiSS introduced the ability to schedule recordings
+ on a KiSS hard disk recorder (such as the DP-558) through a web site on the internet. When a
+ new recording is scheduled through the web site, the KiSS recorder finds out about this new
+ recording by polling a server on the internet. This is a really cool feature since it
+ basically allows programming the recorder when away from home.
+
+ After using this feature for some time, I started noticing regular patterns. Often you
+ are looking for the same programs and for certain types of programs. So, wouldn't it be nice
+ to have a program do this work for you and automatically record programs and notify you of
+ possibly interesting ones?
+
+ This is where the KiSS crawler comes in. This is a simple crawler which logs on to the
+ KiSS electronic programme guide web site and gets programme information from there. Then
+ based on that it automatically records programs for you or sends notifications about
+ interesting ones.
+
+ In its current version, the crawler can be used in two ways:
- At this moment, no formal releases have been made and only the latest
- version can be downloaded.
-
- The easy way to start is the
- binary version.
-
- The latest source can be obtained from subversion with the
- URL At this moment, no formal releases have been made and only the latest version can be
+ downloaded. The easy way to start is the standalone program
+ binary version or using the web application. The latest source can be obtained from subversion with the URL
+ The application was developed and tested on SuSE Linux 10.1 with
+ JBoss 4.0.4 application
+ server (only required for the web application). It requires a Java Virtual Machine,
+ version 1.5 or later, to run.
- The crawler comes with two configuration files, namely
- The crawler comes with three configuration files: For the standalone program, all configuration files are in the
- First of all, copy the First of all, copy the
- The
- Notification is configured in the (surprise, surprise!)
-
- Interesting TV shows are described using
- Matching can be done on the following properties of a program:
-
- The field to match is specified using the
- Example patterns:
-
- It is possible that different programs cannot be recorded at
- since they overlap. To deal with such conflicts, it is possible
- to specify a priority using the
- Since it is not always desirable to try to record every
- program that matches the criteria, it is also possible to
- generate notifications for interesting programs only without
- recording them. This is done by specifying the
- Interesting TV shows are described using Matching can be done on the following properties of a program: The field to match is specified using the Example patterns: It is possible that different programs cannot be recorded since they overlap. To deal
+ with such conflicts, it is possible to specify a priority using the Since it is not always desirable to try to record every program that matches the
+ criteria, it is also possible to generate notifications for interesting programs only
+ without recording them. This is done by specifying the Edit the configuration file
- In the binary distribution, execute the
- In the binary distribution, execute the After deploying the web application, navigate to the application in your browser (e.g.
+
+ Since the crawler checks the status at
+ 1-hour intervals, its first run can happen at any time between 19:00 and 20:00. This is done
+ on purpose: crawlers run by different people will not all start running
+ simultaneously, which is friendlier to the KiSS servers.
+
+ With the source code, build everything with maven2 as follows:
- With the source code, build everything with
-
+
+ config.xml
again.
+
+ config.xml
file to deal with changes in the login procedure.
+
+
+
+
+
+
+
+
+
+
+
https://wamblee.org/svn/public/utils
. The subversion
- repository allows read-only access to anyone.
- https://wamblee.org/svn/public/utils
. The subversion repository allows
+ read-only access to anyone. crawler.xml
and programs.xml
.
-
+
+ crawler.xml
: basic crawler configuration tailored to the KiSS electronic
+ programme guide.programs.xml
: containing a description of which programs must be recorded
+ and which programs are interesting.org.wamblee.crawler.properties
: Containing a configuration conf
+ directory. For the web application, the properties file is located in the
+ WEB-INF/classes
directory of the web application, and
+ crawler.xml
and programs.xml
are located outside of the web
+ application at a location configured in the properties file. crawler.xml
config.xml.example
file
- to config.xml
. After that, edit the first entry of
- that file and replace user
and passwd
- with your personal user id and password for the KiSS Electronic
- Programme Guide.
- config.xml.example
file to config.xml
.
+ After that, edit the first entry of that file and replace user
and
+ passwd
with your personal user id and password for the KiSS Electronic
+ Programme Guide. programs.xml
programs.xml
file contains the following
- configuration items:
-
-
- notification
element. This notification element
- is used to configure respectively sender mail address (= reply
- address), recipient address, subject of the email, smtp server
- host and port and optional username and password.
- In addition it contains the names of the stylesheets to
- generate the HTML and Text reports. These stylesheets
- should not be changed.
- program
- elements. Each program
element contains
- one or more match
elements that describe
- a condition that the interesting program must match.
-
-
-
- Field name
- Description
-
- name
- Program name
-
-
- description
- Program description
-
-
- channel
- Channel name
-
-
- keywords
- Keywords/classification of the program.
- field
- attribute of the match
element. If no field name
- is specified then the program name is matched. Matching is done
- by converting the field value to lowercase and then doing a
- perl-like regular expression match of the provided value. As a
- result, the content of the match element should be specified in
- lower case otherwise the pattern will never match.
- If multiple match
elements are specified for a
- given program
element, then all matches must
- apply for a program to be interesting.
-
-
-
-
-
- Pattern
- Example of matching field values
-
-
- the.*x.*files
- "The X files", "The X-Files: the making of"
-
-
- star trek
- "Star Trek Voyager", "Star Trek: The next generation"
- priority
element.
- Higher values of the priority value mean a higher priority.
- If two programs have the same priority, then it is (more or less)
- unspecified which of the two will be recorded, but it will at least
- record one program. If no priority is specified, then the
- priority is 1 (one).
- action
alement with the content notify
.
- By default, the action
is record
.
- To make the mail reports more readable it is possible to
- also assign a category to a program for grouping interesting
- programs. This can be done using the category
- element. Note that if the action
is
- notify
. then the priority
element
- is not used.
- program
elements. Each
+ program
element contains one or more match
elements that
+ describe a condition that the interesting program must match.
+
+
+
+ Field name
+ Description
+
+
+ name
+ Program name
+
+
+ description
+ Program description
+
+
+ channel
+ Channel name
+
+
+ keywords
+ Keywords/classification of the program.
+ field
attribute of the
+ match
element. If no field name is specified then the program name is
+ matched. Matching is done by converting the field value to lowercase and then doing a
+ perl-like regular expression match of the provided value. As a result, the content of the
+ match element should be specified in lower case; otherwise a pattern containing upper-case letters will never match. If
+ multiple match
elements are specified for a given program
+ element, then all matches must apply for a program to be interesting.
+
+
+
+
+ Pattern
+ Examples of matching field values
+
+
+ the.*x.*files
+ "The X files", "The X-Files: the making of"
+
+
+ star trek
+ "Star Trek Voyager", "Star Trek: The next generation"
+ priority
+ element. A higher value means a higher priority. If two programs have
+ the same priority, then it is (more or less) unspecified which of the two will be
+ recorded, but it will at least record one program. If no priority is specified, then the
+ priority is 1 (one). action
element with
+ the content notify
. By default, the action
is
+ record
. To make the mail reports more readable it is possible to also assign
+ a category to a program for grouping interesting programs. This can be done using the
+ category
element. Note that if the action
is
+ notify
, then the priority
element is not used. org.wamblee.crawler.properties
. The properties
+ file is self-explanatory. run
script for your operating system
- (run.bat
for windows, and
- run.sh
for unix).
- run
script for your operating
+ system (run.bat
for windows, and run.sh
for unix). http://localhost:8080/wamblee-crawler-kissweb
). The screen should show an
+ overview of the last time it ran (if it ran before) as well as a button to run the crawler
+ immediately. Also, the result of the last run can be viewed. The crawler will run
+ automatically starting after 19:00,
+ and will retry at 1 hour intervals in case
+ of failure to retrieve programme information.
+ ant dist-lite
, then locate the binary
- distribution in lib/wamblee/crawler/kiss/kiss-crawler-bin.zip
.
- Then proceed as for the binary distribution.
- target
subdirectory of the crawler
+ directory. Then
+ proceed as for the binary distribution.
- The crawler, as it is now, is s standalone program which is - intended to be run from a command-line. When it runs, it - retrieves the programs for today. As a result, it is advisable - to run the program at an early point of the day as a scheduled - task (e.g. cron on unix). -
-- Modifying the program to allow it to investigate tomorrow's - programs instead is easy as well but not yet implemented. +
When the crawler runs, it retrieves the programs for tomorrow.
+
- The best example is in the distribution itself. It is my personal
- programs.xml
file.
-
The best example is in the distribution itself. It is my personal
+ programs.xml
file.
- You are always welcome to contribute. If you find a problem just - tell me about it and if you have ideas am I always interested to - hear about them. -
-- If you are a programmer and have a fix for a bug, just send me a - patch and if you are fanatic enough and have ideas, I can also - give you write access to the repository. -
+ +You are always welcome to contribute. If you find a problem, just tell me about it, and if + you have ideas, I am always interested to hear about them.
+If you are a programmer and have a fix for a bug, just send me a patch, and if you are + fanatic enough and have ideas, I can also give you write access to the repository.
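
The matching rule described for programs.xml (lowercase the field value, apply a perl-like regular expression, and require every match element of a program element to apply) can be illustrated with a small Java sketch. This is not the crawler's actual code: the class and method names (MatchSketch, Match, isInteresting), the map-based representation of a program, and the channel value "Net 5" are all hypothetical. It also assumes that the pattern may match anywhere in the field value, which is consistent with the example patterns ("star trek" matching "Star Trek Voyager") but is not stated explicitly in the documentation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical illustration of the documented matching rule; not the crawler's own classes. */
public class MatchSketch {

    /** Stands in for one match element: an optional field name plus a regular expression. */
    public static final class Match {
        final String field;
        final Pattern pattern;

        public Match(String field, String regex) {
            // Documented default: with no field name, the program name is matched.
            this.field = (field == null) ? "name" : field;
            this.pattern = Pattern.compile(regex);
        }

        boolean applies(Map<String, String> program) {
            String value = program.getOrDefault(field, "");
            // Documented behaviour: the field value is lowercased before matching.
            // Assumption: the pattern may match anywhere in the value (find, not a full match).
            return pattern.matcher(value.toLowerCase(Locale.ROOT)).find();
        }
    }

    /** A program is interesting only if every match element applies. */
    public static boolean isInteresting(Map<String, String> program, List<Match> matches) {
        for (Match m : matches) {
            if (!m.applies(program)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> program = new HashMap<>();
        program.put("name", "The X-Files: the making of");
        program.put("channel", "Net 5"); // hypothetical channel name

        List<Match> rule = Arrays.asList(
                new Match(null, "the.*x.*files"),
                new Match("channel", "net"));
        System.out.println(isInteresting(program, rule)); // prints "true"

        // An upper-case pattern never matches, because the value is lowercased first.
        List<Match> upperCase = Arrays.asList(new Match(null, "The X"));
        System.out.println(isInteresting(program, upperCase)); // prints "false"
    }
}
```

The second call shows why the documentation asks for lower-case patterns: the field value is lowercased but the pattern is used as written.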