X-Git-Url: http://wamblee.org/gitweb/?a=blobdiff_plain;ds=sidebyside;f=crawler%2Fkiss%2Fdocs%2Fcontent%2Fxdocs%2Findex.xml;h=0838e87c2283864c727f45ef43665e86985c3cb1;hb=84f06d23e6645876665496f1dc973013db0c1368;hp=1592cea52fdee21007672e143b916763822bb291;hpb=5fd0e77da019c4636a1b5c2c9b043801f6cab175;p=utils
diff --git a/crawler/kiss/docs/content/xdocs/index.xml b/crawler/kiss/docs/content/xdocs/index.xml
index 1592cea5..0838e87c 100644
--- a/crawler/kiss/docs/content/xdocs/index.xml
+++ b/crawler/kiss/docs/content/xdocs/index.xml
@@ -20,7 +20,58 @@
Automatic Recording for KiSS Hard Disk Recorders
-
+
+
+ KiSS makes regular updates to their site that sometimes require adaptations
+ to the crawler. If it stops working, check out the most recent version here.
+
+
+ Changelog
+
+
+ 7 September 2006
+
+ - KiSS modified the login procedure. The crawler has been adapted and works again.
+ - Generalized the startup scripts. They should now work regardless of the specific libraries used.
+
+
+
+ 31 August 2006
+
+ - Added a Windows batch file for running the crawler under Windows.
+ It is very ad hoc and will be generalized.
+
+
+
+ 24 August 2006
+
+ - The crawler now uses desktop login for crawling. It is also much more efficient because
+ it no longer needs to crawl the individual programs: the channel page
+ includes descriptions of programs in JavaScript popups, which the crawler can use.
+ The result is a significant reduction of the load on the KiSS EPG site. In addition, the delay
+ between requests has been increased to reduce that load further.
+ -
+ The crawler now crawls programs for tomorrow instead of for today.
+
+ -
+ The web-based crawler is configured to run only between 7pm and midnight. It used to run at
+ 5am.
+
+
+
+
+
+ 13-20 August 2006
+
+ There were several changes to the login procedure, requiring modifications to the crawler.
+
+
+ - The crawler now uses the 'Referer' header field correctly at login.
+ - KiSS now uses hidden form fields in its login process; the crawler handles these
+ correctly as well.
+
+
+
Overview
@@ -66,8 +117,8 @@
The easy way to start is the
- standalone program binary version
- or using the web
+ standalone program binary version
+ or using the web
application.
@@ -258,11 +309,17 @@
General usage
When the crawler runs, it
- retrieves the programs for today. As a result, it is advisable
+ retrieves the programs for tomorrow. As a result, it is advisable
to run the program at an early point of the day as a scheduled
task (e.g. cron on unix). For the web application this is
preconfigured at 5AM.
+
+ If you deploy the web application today, its first automatic run will be
+ on the next (!) day. This holds even if you deploy the application
+ before the normal scheduled time.
+
+
Modifying the program to allow it to investigate tomorrow's
programs instead is easy as well but not yet implemented.