(no commit message)

diff --git a/crawler/kiss/docs/content/xdocs/index.xml b/crawler/kiss/docs/content/xdocs/index.xml

index 3dc5b2572b9e54718f429ee63765dc5dd68b6fb4..02d1a4668fe6609e68ac1f56835925650c8876ef 100644
--- a/crawler/kiss/docs/content/xdocs/index.xml
+++ b/crawler/kiss/docs/content/xdocs/index.xml
@@ -47,9 +47,14 @@
          records programs for you or sends notifications about interesting ones.
        </p>
        <p>
-        In its current version, the crawler can be used a standalone program
-        only and the preferred way to run it is as a scheduled task. 
+        In its current version, the crawler can be used in two ways:  
        </p>
+      <ul>
+        <li><strong>standalone program</strong>: A standalone program run as a scheduled task.</li>
+        <li><strong>web application</strong>: A web application running on a Java
+          application server. In this mode, the crawler also features an automatic retry
+          mechanism in case of failures, as well as a simple web interface. </li>
+      </ul>
      </section>
      
      <section>
@@ -61,22 +66,42 @@
        </p>
        <p>
          The easy way to start is the 
-        <a href="installs/crawler/kiss/kiss-crawler-bin.zip">binary version</a>.
+        <a href="installs/crawler/kiss/kiss-crawler-bin.zip">standalone program binary version</a>
+        or the <a href="installs/crawler/kissweb/wamblee-crawler-kissweb.war">web
+          application</a>.
        </p>
        <p>
          The latest source can be obtained from subversion with the 
          URL <code>https://wamblee.org/svn/public/utils</code>. The subversion 
          repository allows read-only access to anyone. 
        </p>
+      <p>
+        The application was developed and tested on SuSE Linux 9.1 with the JBoss 4.0.2 application
+        server (only required for the web application). It requires a Java Virtual Machine
+        version 1.5 or later. 
+      </p>
      </section>
      
      <section>
        <title>Configuring the crawler</title>
        
        <p>
-        The crawler comes with two configuration files, namely 
-        <code>crawler.xml</code> and <code>programs.xml</code>. 
+        The crawler comes with three configuration files:
+      </p>
+      <ul>
+        <li><code>crawler.xml</code>: basic crawler configuration
+          tailored to the KiSS electronic programme guide.</li>
+        <li><code>programs.xml</code>: containing a description of which 
+          programs must be recorded and which programs are interesting.</li>
+        <li><code>org.wamblee.crawler.properties</code>: notification settings and, for the web application, the location of the other two configuration files.</li>
+      </ul>
+      <p>
+        For the standalone program, all configuration files are in the <code>conf</code> directory.
+        For the web application, the properties file is located in the <code>WEB-INF/classes</code>
+        directory of the web application, and <code>crawler.xml</code> and <code>programs.xml</code>
+        are located outside of the web application at a location configured in the properties file. 
        </p>
+   
        
        <section>
          <title>Crawler configuration <code>crawler.xml</code></title>
@@ -89,33 +114,7 @@
            Programme Guide. 
          </p>
        </section>
-      
-      <section>
-        <title>Program configuration: <code>programs.xml</code></title>
-        
-        <p>
-          The <code>programs.xml</code> file contains the following 
-          configuration items: 
-        </p>
-        <ul>
-          <li>Notification configuration: Describing how to 
-            do notification of the results of crawling the site. </li>
-          <li>Zero or more configurations of interesting programs.  </li>
-        </ul>
-        <section>
-          <title>Notification configuration</title>
-          <p>
-            Notification is configured in the (surprise, surprise!) 
-            <code>notification</code> element. This notification element 
-            is used to configure respectively sender mail address (= reply 
-            address), recipient address, subject of the email, smtp server
-            host and port and optional username and password. 
-            In addition it contains the names of the stylesheets to 
-            generate the HTML and Text reports. These stylesheets 
-            should not be changed. 
-          </p>
-        </section>
-        
+
          <section>
            <title>Program configuration</title>
            <p>
@@ -178,7 +177,7 @@
            </table>
            
            <p>
-            It is possible that different programs cannot be recorded at
+            It is possible that different programs cannot be recorded 
              since they overlap. To deal with such conflicts, it is possible
              to specify a priority using the <code>priority</code> element. 
             Higher values mean a higher priority. 
@@ -204,16 +203,24 @@
            </p>
            
          </section>
-        
-        
+      
+      <section>
+        <title>Notification configuration</title>
+        <p>
+           Edit the configuration file <code>org.wamblee.crawler.properties</code>. 
+          The properties file is self-explanatory. 
+        </p>
        </section>
      </section>
      
+   
+    
+    
      <section>
        <title>Installing and running the crawler</title>
        
        <section>
-        <title>Binary distribution</title>
+        <title>Standalone application</title>
          <p>
            In the binary distribution, execute the 
            <code>run</code> script for your operating system
@@ -222,6 +229,19 @@
          </p>
        </section>
        
+      <section>
+        <title>Web application</title>
+        <p>
+          After deploying the web application, navigate to the 
+          application in your browser (e.g. 
+          <code>http://localhost:8080/wamblee-crawler-kissweb</code>).
+          The page shows when the crawler last ran (if it has run
+          before), the result of that run, and a button to run the
+          crawler immediately.
+          The crawler will run automatically every morning at 5 AM local time. 
+        </p>
+      </section>
+      
        <section>
          <title>Source distribution</title>
          <p>
@@ -235,11 +255,11 @@
        <section>
          <title>General usage</title>
          <p>
-          The crawler, as it is now, is s standalone program which is 
-          intended to be run from a command-line. When it runs, it 
+          When the crawler runs, it 
            retrieves the programs for today. As a result, it is advisable 
            to run the program at an early point of the day as a scheduled
-          task (e.g. cron on unix). 
+          task (e.g. cron on Unix). For the web application, this is 
+          preconfigured for 5 AM local time. 
          </p>
          <p>
            Modifying the program to allow it to investigate tomorrow's
@@ -255,7 +275,7 @@
      
        <p>
          The best example is in the distribution itself. It is my personal
-        <code>programs.xml</code> file. 
+        <code>programs.xml</code> file.
        </p>
      </section>
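
Note on the 5 AM schedule described in the "Web application" section: the diff states only that the web application runs the crawler every morning at 5 AM local time, not how that schedule is implemented. The following is a minimal sketch of one way to compute the delay until the next such run. The class `DailyRunDelay` and the method `untilNextRun` are hypothetical names and are not taken from the crawler sources. The sketch uses `java.time`, which requires Java 8 or later (the documentation above targets JVM 1.5), and it ignores daylight-saving transitions.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical helper, not part of the crawler: illustrates a daily schedule.
public class DailyRunDelay {

    // Returns the time from 'now' until the next occurrence of 'runAt'.
    // If 'now' is exactly at or past today's run time, tomorrow's is used.
    static Duration untilNextRun(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    public static void main(String[] args) {
        // 5 AM matches the schedule stated in the documentation.
        Duration delay = untilNextRun(LocalDateTime.now(), LocalTime.of(5, 0));
        System.out.println("Next run in " + delay.toMinutes() + " minutes");
    }
}
```

The resulting delay could be passed as the initial delay to a scheduler such as `ScheduledExecutorService.scheduleAtFixedRate` with a 24-hour period. Whether the web application does anything like this is an assumption.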