X-Git-Url: http://wamblee.org/gitweb/?a=blobdiff_plain;ds=inline;f=crawler%2Fkiss%2Fdocs%2Fcontent%2Fxdocs%2Findex.xml;h=e84efaa60f3642019945e0c2c3ca1c5fdcb506d1;hb=f0872e0b5600e45eee9b2ba6a9e9fa4291a2e530;hp=3dc5b2572b9e54718f429ee63765dc5dd68b6fb4;hpb=a7bac435672c2a49220443684aa7860fb8699de1;p=utils
diff --git a/crawler/kiss/docs/content/xdocs/index.xml b/crawler/kiss/docs/content/xdocs/index.xml
index 3dc5b257..e84efaa6 100644
--- a/crawler/kiss/docs/content/xdocs/index.xml
+++ b/crawler/kiss/docs/content/xdocs/index.xml
@@ -20,7 +20,57 @@
+ There were several changes to the login procedure, which required modifications to the crawler.
+
- In its current version, the crawler can be used a standalone program
- only and the preferred way to run it is as a scheduled task.
+ In its current version, the crawler can be used in two ways:
The easy way to start is the
- binary version.
+ standalone program binary version
+ or the web application.
The latest source can be obtained from subversion with the
URL https://wamblee.org/svn/public/utils. The subversion
repository allows read-only access to anyone.
+ The application was developed and tested on SuSE Linux 9.1 with the JBoss 4.0.2 application
+ server (only required for the web application). It requires a Java Virtual Machine
+ version 1.5 or later.
+
- The crawler comes with two configuration files, namely
- crawler.xml
and programs.xml
.
+ The crawler comes with three configuration files:
+
crawler.xml
: basic crawler configuration
+ tailored to the KiSS electronic programme guide.
+ programs.xml
: containing a description of which
+ programs must be recorded and which programs are interesting.
+ org.wamblee.crawler.properties
: Containing a configuration
+ For the standalone program, all configuration files are in the conf
directory.
+ For the web application, the properties file is located in the WEB-INF/classes
+ directory of the web application, and crawler.xml
and programs.xml
+ are located outside of the web application at a location configured in the properties file.
crawler.xml
programs.xml
- The programs.xml
file contains the following
- configuration items:
-
- Notification is configured in the (surprise, surprise!)
- notification
element. This notification element
- is used to configure respectively sender mail address (= reply
- address), recipient address, subject of the email, smtp server
- host and port and optional username and password.
- In addition it contains the names of the stylesheets to
- generate the HTML and Text reports. These stylesheets
- should not be changed.
-
@@ -178,7 +227,7 @@
- It is possible that different programs cannot be recorded at
+ It is possible that different programs cannot be recorded
since they overlap. To deal with such conflicts, it is possible
to specify a priority using the priority
element.
Higher values of the priority value mean a higher priority.
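The documentation only states that a higher priority wins when programs overlap; it does not describe the exact selection strategy. As an illustration only, here is a minimal Java sketch that assumes a greedy strategy: sort the candidates by priority (highest first) and keep each program unless it overlaps one that has already been chosen. The class `PriorityResolver`, the `Program` type and its minutes-since-midnight times are hypothetical and not part of the crawler's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: the crawler's real conflict resolution may work differently.
public class PriorityResolver {

    // A candidate recording; start/end are minutes since midnight (assumed representation).
    public static final class Program {
        final String title;
        final int start;
        final int end;
        final int priority;

        public Program(String title, int start, int end, int priority) {
            this.title = title;
            this.start = start;
            this.end = end;
            this.priority = priority;
        }

        // Two half-open intervals [start, end) overlap if each starts before the other ends.
        boolean overlaps(Program other) {
            return start < other.end && other.start < end;
        }
    }

    // Greedy selection: highest priority first, skipping anything that overlaps a chosen program.
    // The sort is stable, so programs with equal priority keep their input order.
    public static List<Program> select(List<Program> candidates) {
        List<Program> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt((Program p) -> p.priority).reversed());
        List<Program> chosen = new ArrayList<>();
        for (Program p : sorted) {
            boolean conflict = false;
            for (Program c : chosen) {
                if (p.overlaps(c)) {
                    conflict = true;
                    break;
                }
            }
            if (!conflict) {
                chosen.add(p);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Program> candidates = new ArrayList<>();
        candidates.add(new Program("Documentary", 20 * 60, 21 * 60, 1));
        candidates.add(new Program("Film", 20 * 60 + 30, 22 * 60, 5));
        candidates.add(new Program("News", 22 * 60, 22 * 60 + 30, 1));
        for (Program p : select(candidates)) {
            System.out.println(p.title); // prints "Film", then "News"
        }
    }
}
```

With this sample data, the film (priority 5) is selected instead of the overlapping documentary. The news is still selected because it starts exactly when the film ends.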
@@ -204,16 +253,24 @@
+ Edit the configuration file org.wamblee.crawler.properties.
+ The properties file is self-explanatory.
+
In the binary distribution, execute the
run
script for your operating system
@@ -222,6 +279,21 @@
+ After deploying the web application, navigate to the
+ application in your browser (e.g.
+ http://localhost:8080/wamblee-crawler-kissweb).
+ The screen should show when the crawler last ran (if it has run
+ before) and a button to run the crawler immediately. The result of
+ the last run can also be viewed.
+ The crawler will run automatically every morning at 5 AM local time,
+ and will retry at one-hour intervals if it fails to retrieve
+ programme information.
+
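The following minimal sketch illustrates the schedule described above: a daily run at 5 AM local time, with an hourly retry after a failed run. It assumes a simple "compute the next attempt" approach. The class and method names are hypothetical and do not represent the web application's actual scheduler. The sketch also uses java.time, which requires Java 8 or later, not the Java 1.5 minimum of the application itself.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical sketch of the documented schedule; not the web application's real scheduler.
public class CrawlerSchedule {
    static final LocalTime DAILY_RUN = LocalTime.of(5, 0);
    static final Duration RETRY_INTERVAL = Duration.ofHours(1);

    // Next 5 AM run strictly after 'now' (today if it is still before 5 AM, otherwise tomorrow).
    public static LocalDateTime nextDailyRun(LocalDateTime now) {
        LocalDateTime today = now.toLocalDate().atTime(DAILY_RUN);
        return now.isBefore(today) ? today : today.plusDays(1);
    }

    // After a failure, try again one hour later; otherwise wait for the next daily run.
    public static LocalDateTime nextAttempt(LocalDateTime now, boolean lastRunFailed) {
        return lastRunFailed ? now.plus(RETRY_INTERVAL) : nextDailyRun(now);
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2005, 6, 1, 5, 0);
        System.out.println(nextAttempt(now, true));  // 2005-06-01T06:00
        System.out.println(nextAttempt(now, false)); // 2005-06-02T05:00
    }
}
```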
@@ -235,12 +307,18 @@
- The crawler, as it is now, is s standalone program which is
- intended to be run from a command-line. When it runs, it
- retrieves the programs for today. As a result, it is advisable
+ When the crawler runs, it
+ retrieves the programs for tomorrow. As a result, it is advisable
to run the program at an early point of the day as a scheduled
- task (e.g. cron on unix).
+ task (e.g. cron on Unix). For the web application, this is
+ preconfigured to run at 5 AM.
Modifying the program to allow it to investigate tomorrow's
programs instead is easy as well but not yet implemented.
@@ -255,7 +333,7 @@
The best example is in the distribution itself. It is my personal
- programs.xml
file.
+ programs.xml
file.