From: erik <erik@77661180-640e-0410-b3a8-9f9b13e6d0e0> Date: Tue, 21 Nov 2006 21:42:00 +0000 (+0000) Subject: (no commit message) X-Git-Tag: MYTHTV_EAR_NO_MSG_LINKING~50 X-Git-Url: http://wamblee.org/gitweb/?a=commitdiff_plain;h=527b43192fb0be2b11b3021bbdd1c02ab72e3ee3;p=utils --- diff --git a/crawler/kiss/docs/content/xdocs/index.xml b/crawler/kiss/docs/content/xdocs/index.xml index d9089470..b33a2f05 100644 --- a/crawler/kiss/docs/content/xdocs/index.xml +++ b/crawler/kiss/docs/content/xdocs/index.xml @@ -16,350 +16,279 @@ limitations under the License. --> <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd"> -<document> - <header> - <title>Automatic Recording for KiSS Hard Disk Recorders</title> - </header> +<document> + <header> + <title>Automatic Recording for KiSS Hard Disk Recorders</title> + </header> <body> - <warning> - KiSS makes regular updates to their site that sometimes require adaptations - to the crawler. If it stops working, check out the most recent version here. - </warning> + <warning> KiSS makes regular updates to their site that sometimes require adaptations to the + crawler. If it stops working, check out the most recent version here. </warning> <section id="changelog"> <title>Changelog</title> <section> - <title>17 November 2006</title> + <title>21 November 2006</title> <ul> - <li>Corrected the packed distributions. The standalone distribution - had an error in the scripts and was missing libraries </li> + <li>Corrected the <code>config.xml</code> again.</li> + <li>Corrected errors in the documentation for the web application. It starts running at 19:00 + and not at 5:00.</li> + </ul> + </section> + <section> + <title>19 November 2006</title> + <ul> + <li>Corrected the <code>config.xml</code> file to deal with changes in the login procedure.</li> + </ul> + </section> + <section> + <title>17 November 2006</title> + <ul> + <li>Corrected the packed distributions. 
The standalone distribution had an error in the + scripts and was missing libraries </li> </ul> - </section> - <section> + </section> + <section> <title>7 September 2006</title> <ul> <li>KiSS modified the login procedure. It is now working again.</li> - <li>Generalized the startup scripts. They should now be insensitive to the specific libraries used. </li> + <li>Generalized the startup scripts. They should now be insensitive to the specific + libraries used. </li> </ul> </section> <section> <title>31 August 2006</title> <ul> - <li>Added Windows bat file for running the crawler under Windows. - Very ad hoc, will be generalized. </li> + <li>Added Windows bat file for running the crawler under Windows. Very ad hoc, will be + generalized. </li> </ul> </section> <section> <title>24 August 2006</title> <ul> <li>The crawler now uses desktop login for crawling. Also, it is much more efficient since - it no longer needs to crawl the individual programs. This is because the channel page + it no longer needs to crawl the individual programs. This is because the channel page includes descriptions of programs in javascript popups which can be used by the crawler. - The result is a significant reduction of the load on the KiSS EPG site. Also, the delay + The result is a significant reduction of the load on the KiSS EPG site. Also, the delay between requests has been increased to further reduce load on the KiSS EPG site. </li> - <li> - The crawler now crawls programs for tomorrow instead of for today. - </li> - <li> - The web based crawler is configured to run only between 7pm and 12pm. It used to run at - 5am. - </li> + <li> The crawler now crawls programs for tomorrow instead of for today. </li> + <li> The web based crawler is configured to run only between 7pm and 12pm. It used to run + at 5am. </li> </ul> </section> - + <section> <title>13-20 August 2006</title> - <p> - There were several changes to the login procedure, requiring modifications to the crawler. 
- </p> + <p> There were several changes to the login procedure, requiring modifications to the + crawler. </p> <ul> <li>The crawler now uses the 'Referer' header field correctly at login.</li> - <li>KiSS now uses hidden form fields in their login process which are now also handled correctly by the - crawler.</li> + <li>KiSS now uses hidden form fields in their login process which are now also handled + correctly by the crawler.</li> </ul> </section> </section> <section id="overview"> <title>Overview</title> - - <p> - In 2005, <a href="site:links/kiss">KiSS</a> introduced the ability - to schedule recordings on KiSS hard disk recorder (such as the - DP-558) through a web site on the internet. When a new recording is - scheduled through the web site, the KiSS recorder finds out about - this new recording by polling a server on the internet. - This is a really cool feature since it basically allows programming - the recorder when away from home. - </p> - <p> - After using this feature for some time now, I started noticing regular - patterns. Often you are looking for the same programs and for certain - types of programs. So, wouldn't it be nice to have a program - do this work for you and automatically record programs and notify you - of possibly interesting ones? - </p> - <p> - This is where the KiSS crawler comes in. This is a simple crawler which - logs on to the KiSS electronic programme guide web site and gets - programme information from there. Then based on that it automatically - records programs for you or sends notifications about interesting ones. - </p> - <p> - In its current version, the crawler can be used in two ways: - </p> + + <p> In 2005, <a href="site:links/kiss">KiSS</a> introduced the ability to schedule recordings + on KiSS hard disk recorder (such as the DP-558) through a web site on the internet. When a + new recording is scheduled through the web site, the KiSS recorder finds out about this new + recording by polling a server on the internet. 
This is a really cool feature since it + basically allows programming the recorder when away from home. </p> + <p> After using this feature for some time now, I started noticing regular patterns. Often you + are looking for the same programs and for certain types of programs. So, wouldn't it be nice + to have a program do this work for you and automatically record programs and notify you of + possibly interesting ones? </p> + <p> This is where the KiSS crawler comes in. This is a simple crawler which logs on to the + KiSS electronic programme guide web site and gets programme information from there. Then + based on that it automatically records programs for you or sends notifications about + interesting ones. </p> + <p> In its current version, the crawler can be used in two ways: </p> <ul> <li><strong>standalone program</strong>: A standalone program run as a scheduled task.</li> - <li><strong>web application</strong>: A web application running on a java - application server. With this type of use, the crawler also features an automatic retry - mechanism in case of failures, as well as a simple web interface. </li> + <li><strong>web application</strong>: A web application running on a java application + server. With this type of use, the crawler also features an automatic retry mechanism in + case of failures, as well as a simple web interface. </li> </ul> </section> - + <section> <title>Downloading</title> - - <p> - At this moment, no formal releases have been made and only the latest - version can be downloaded. - </p> - <p> - The easy way to start is the - <a href="installs/crawler/target/wamblee-crawler-0.2-SNAPSHOT-kissbin.zip">standalone program binary version</a> - or using the <a href="installs/crawler/kissweb/target/wamblee-crawler-kissweb.war">web - application</a>. - </p> - <p> - The latest source can be obtained from subversion with the - URL <code>https://wamblee.org/svn/public/utils</code>. 
The subversion - repository allows read-only access to anyone. - </p> - <p> - The application was developed and tested on SuSE linux 9.1 with JBoss 4.0.2 application + + <p> At this moment, no formal releases have been made and only the latest version can be + downloaded. </p> + <p> The easy way to start is the <a + href="installs/crawler/target/wamblee-crawler-0.2-SNAPSHOT-kissbin.zip">standalone program + binary version</a> or using the <a + href="installs/crawler/kissweb/target/wamblee-crawler-kissweb.war">web application</a>. </p> + <p> The latest source can be obtained from subversion with the URL + <code>https://wamblee.org/svn/public/utils</code>. The subversion repository allows + read-only access to anyone. </p> + <p> The application was developed and tested on SuSE linux 9.1 with JBoss 4.0.2 application server (only required for the web application). It requires at least a Java Virtual Machine - 1.5 or greater to run. - </p> + 1.5 or greater to run. </p> </section> - + <section> <title>Configuring the crawler</title> - - <p> - The crawler comes with three configuration files: - </p> + + <p> The crawler comes with three configuration files: </p> <ul> - <li><code>crawler.xml</code>: basic crawler configuration - tailored to the KiSS electronic programme guide.</li> - <li><code>programs.xml</code>: containing a description of which - programs must be recorded and which programs are interesting.</li> - <li><code>org.wamblee.crawler.properties</code>: Containing a configuration </li> + <li><code>crawler.xml</code>: basic crawler configuration tailored to the KiSS electronic + programme guide.</li> + <li><code>programs.xml</code>: containing a description of which programs must be recorded + and which programs are interesting.</li> + <li><code>org.wamblee.crawler.properties</code>: Containing a configuration </li> </ul> - <p> - For the standalone program, all configuration files are in the <code>conf</code> directory. 
- For the web application, the properties file is located in the <code>WEB-INF/classes</code> - directory of the web application, and <code>crawler.xml</code> and <code>programs.xml</code> - are located outside of the web application at a location configured in the properties file. - </p> - - + <p> For the standalone program, all configuration files are in the <code>conf</code> + directory. For the web application, the properties file is located in the + <code>WEB-INF/classes</code> directory of the web application, and + <code>crawler.xml</code> and <code>programs.xml</code> are located outside of the web + application at a location configured in the properties file. </p> + + <section> <title>Crawler configuration <code>crawler.xml</code></title> - - <p> - First of all, copy the <code>config.xml.example</code> file - to <code>config.xml</code>. After that, edit the first entry of - that file and replace <code>user</code> and <code>passwd</code> - with your personal user id and password for the KiSS Electronic - Programme Guide. - </p> + + <p> First of all, copy the <code>config.xml.example</code> file to <code>config.xml</code>. + After that, edit the first entry of that file and replace <code>user</code> and + <code>passwd</code> with your personal user id and password for the KiSS Electronic + Programme Guide. </p> + </section> + + <section> + <title>Program configuration</title> + <p> Interesting TV shows are described using <code>program</code> elements. Each + <code>program</code> element contains one or more <code>match</code> elements that + describe a condition that the interesting program must match. 
</p> + <p> Matching can be done on the following properties of a program: </p> + <table> + <tr> + <th>Field name</th> + <th>Description</th> + </tr> + <tr> + <td>name</td> + <td>Program name</td> + </tr> + <tr> + <td>description</td> + <td>Program description</td> + </tr> + <tr> + <td>channel</td> + <td>Channel name</td> + </tr> + <tr> + <td>keywords</td> + <td>Keywords/classification of the program.</td> + </tr> + </table> + <p> The field to match is specified using the <code>field</code> attribute of the + <code>match</code> element. If no field name is specified then the program name is + matched. Matching is done by converting the field value to lowercase and then doing a + perl-like regular expression match of the provided value. As a result, the content of the + match element should be specified in lower case otherwise the pattern will never match. If + multiple <code>match</code> elements are specified for a given <code>program</code> + element, then all matches must apply for a program to be interesting. </p> + <p> Example patterns: </p> + <table> + <tr> + <th>Pattern</th> + <th>Example of matching field values</th> + </tr> + <tr> + <td>the.*x.*files</td> + <td>"The X files", "The X-Files: the making of"</td> + </tr> + <tr> + <td>star trek</td> + <td>"Star Trek Voyager", "Star Trek: The next generation"</td> + </tr> + </table> + + <p> It is possible that different programs cannot be recorded since they overlap. To deal + with such conflicts, it is possible to specify a priority using the <code>priority</code> + element. Higher values of the priority value mean a higher priority. If two programs have + the same priority, then it is (more or less) unspecified which of the two will be + recorded, but it will at least record one program. If no priority is specified, then the + priority is 1 (one). 
</p> + + <p> Since it is not always desirable to try to record every program that matches the + criteria, it is also possible to generate notifications for interesting programs only + without recording them. This is done by specifying the <code>action</code> element with + the content <code>notify</code>. By default, the <code>action</code> is + <code>record</code>. To make the mail reports more readable it is possible to also assign + a category to a program for grouping interesting programs. This can be done using the + <code>category</code> element. Note that if the <code>action</code> is + <code>notify</code>, then the <code>priority</code> element is not used. </p> + + </section> - <section> - <title>Program configuration</title> - <p> - Interesting TV shows are described using <code>program</code> - elements. Each <code>program</code> element contains - one or more <code>match</code> elements that describe - a condition that the interesting program must match. - </p> - <p> - Matching can be done on the following properties of a program: - </p> - <table> - <tr><th>Field name</th> - <th>Description</th></tr> - <tr> - <td>name</td> - <td>Program name</td> - </tr> - <tr> - <td>description</td> - <td>Program description</td> - </tr> - <tr> - <td>channel</td> - <td>Channel name</td> - </tr> - <tr> - <td>keywords</td> - <td>Keywords/classification of the program.</td> - </tr> - </table> - <p> - The field to match is specified using the <code>field</code> - attribute of the <code>match</code> element. If no field name - is specified then the program name is matched. Matching is done - by converting the field value to lowercase and then doing a - perl-like regular expression match of the provided value. As a - result, the content of the match element should be specified in - lower case otherwise the pattern will never match. 
- If multiple <code>match</code> elements are specified for a - given <code>program</code> element, then all matches must - apply for a program to be interesting. - </p> - <p> - Example patterns: - </p> - <table> - <tr> - <th>Pattern</th> - <th>Example of matching field values</th> - </tr> - <tr> - <td>the.*x.*files</td> - <td>"The X files", "The X-Files: the making of"</td> - </tr> - <tr> - <td>star trek</td> - <td>"Star Trek Voyager", "Star Trek: The next generation"</td> - </tr> - </table> - - <p> - It is possible that different programs cannot be recorded - since they overlap. To deal with such conflicts, it is possible - to specify a priority using the <code>priority</code> element. - Higher values of the priority value mean a higher priority. - If two programs have the same priority, then it is (more or less) - unspecified which of the two will be recorded, but it will at least - record one program. If no priority is specified, then the - priority is 1 (one). - </p> - - <p> - Since it is not always desirable to try to record every - program that matches the criteria, it is also possible to - generate notifications for interesting programs only without - recording them. This is done by specifying the - <code>action</code> element with the content <code>notify</code>. - By default, the <code>action</code> is <code>record</code>. - To make the mail reports more readable it is possible to - also assign a category to a program for grouping interesting - programs. This can be done using the <code>category</code> - element. Note that if the <code>action</code> is - <code>notify</code>, then the <code>priority</code> element - is not used. - </p> - - </section> - <section> <title>Notification configuration</title> - <p> - Edit the configuration file <code>org.wamblee.crawler.properties</code>. - The properties file is self-explanatory. - </p> + <p> Edit the configuration file <code>org.wamblee.crawler.properties</code>. The properties + file is self-explanatory. 
</p> </section> </section> - - - - + + + + <section> <title>Installing and running the crawler</title> - + <section> <title>Standalone application</title> - <p> - In the binary distribution, execute the - <code>run</code> script for your operating system - (<code>run.bat</code> for windows, and - <code>run.sh</code> for unix). - </p> + <p> In the binary distribution, execute the <code>run</code> script for your operating + system (<code>run.bat</code> for windows, and <code>run.sh</code> for unix). </p> </section> - + <section> <title>Web application</title> - <p> - After deploying the web application, navigate to the - application in your browser (e.g. - <code>http://localhost:8080/wamblee-crawler-kissweb</code>). - The screen should show an overview of the last time it ran (if - it ran before) as well as a button to run the crawler immediately. - Also, the result of the last run can be viewed. - The crawler will run automatically every morning at 5 AM local time, - and will retry at 1 hour intervals in case of failure to retrieve - programme information. - </p> + <p> After deploying the web application, navigate to the application in your browser (e.g. + <code>http://localhost:8080/wamblee-crawler-kissweb</code>). The screen should show an + overview of the last time it ran (if it ran before) as well as a button to run the crawler + immediately. Also, the result of the last run can be viewed. The crawler will run + automatically starting at around 19:00 (this is not exactly 19:00), + and will retry at 1 hour intervals in case + of failure to retrieve programme information. </p> </section> - + <section> <title>Source distribution</title> - <p> - With the source code, build everything with - <code>ant dist-lite</code>, then locate the binary - distribution in <code>lib/wamblee/crawler/kiss/kiss-crawler-bin.zip</code>. - Then proceed as for the binary distribution. 
- </p> + <p> With the source code, build everything with <code>ant dist-lite</code>, then locate the + binary distribution in <code>lib/wamblee/crawler/kiss/kiss-crawler-bin.zip</code>. Then + proceed as for the binary distribution. </p> </section> - + <section> <title>General usage</title> - <p> - When the crawler runs, it - retrieves the programs for tomorrow. As a result, it is advisable - to run the program at an early point of the day as a scheduled - task (e.g. cron on unix). For the web application this is - preconfigured at 5AM. - </p> - <note> - If you deploy the web application today, it will run automatically - on the next (!) day. This even holds if you deploy the application - before the normal scheduled time. - </note> - - <p> - Modifying the program to allow it to investigate tomorrow's - programs instead is easy as well but not yet implemented. + <p> When the crawler runs, it retrieves the programs for tomorrow. </p> + <note> If you deploy the web application today, it will run automatically on the next (!) + day. This even holds if you deploy the application before the normal scheduled time. </note> </section> - - + + </section> <section id="examples"> <title>Examples</title> - - <p> - The best example is in the distribution itself. It is my personal - <code>programs.xml</code> file. - </p> + + <p> The best example is in the distribution itself. It is my personal + <code>programs.xml</code> file. </p> </section> - + <section> <title>Contributing</title> - - <p> - You are always welcome to contribute. If you find a problem, just - tell me about it, and if you have ideas, I am always interested to - hear about them. - </p> - <p> - If you are a programmer and have a fix for a bug, just send me a - patch, and if you are fanatic enough and have ideas, I can also - give you write access to the repository. - </p> + + <p> You are always welcome to contribute. 
If you find a problem, just tell me about it, and if + you have ideas, I am always interested to hear about them. </p> + <p> If you are a programmer and have a fix for a bug, just send me a patch, and if you are + fanatic enough and have ideas, I can also give you write access to the repository. </p> </section> - - + + </body> </document>