X-Git-Url: http://wamblee.org/gitweb/?a=blobdiff_plain;f=crawler%2Fbasic%2FABOUT.txt;fp=crawler%2Fbasic%2FABOUT.txt;h=b61c613d74afb9f4e4c802f96dfcc4770fe605ca;hb=a568b8d4bf3e277c31a63656f23ce59516ce3732;hp=0000000000000000000000000000000000000000;hpb=d4bb47fd284738756cd112b788a49caa1a9d5c38;p=utils

diff --git a/crawler/basic/ABOUT.txt b/crawler/basic/ABOUT.txt
new file mode 100644
index 00000000..b61c613d
--- /dev/null
+++ b/crawler/basic/ABOUT.txt
@@ -0,0 +1,9 @@
+This is a general library for implementing a web crawler.
+
+The crawler works by retrieving an HTML page and transforming the HTML
+(content + presentation) into content using XSLT stylesheets. Using a convention
+for links in the converted content, it becomes possible to build a generic interface on the retrieved pages for navigating through the content.
+
+A configuration file determines how a certain page must be retrieved and transformed.
+
+
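
The transformation step that ABOUT.txt describes (HTML in, content out, with links rewritten according to a convention) can be illustrated with a minimal sketch. This is not the library's code: the class name `XsltCrawlSketch`, the `transform` helper, the `<content>` and `<action>` element names, and the sample page are all hypothetical, and the page is assumed to be well-formed XHTML already fetched into a string (real-world HTML would need tidying first). Only the JDK's standard `javax.xml.transform` API is used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltCrawlSketch {

    // Hypothetical stylesheet: keeps the page title and turns every <a href>
    // into an <action> element. The <action> element stands in for the
    // "convention for links"; the real convention is not given in ABOUT.txt.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<content>"
      + "<title><xsl:value-of select='/html/head/title'/></title>"
      + "<xsl:for-each select='//a[@href]'>"
      + "<action name='{normalize-space(.)}' reference='{@href}'/>"
      + "</xsl:for-each>"
      + "</content>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies an XSLT stylesheet to a well-formed XHTML page and returns the
    // serialized result. Checked errors are wrapped to keep callers simple.
    static String transform(String xhtml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xhtml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample page; a real crawler would retrieve this over HTTP.
        String page = "<html><head><title>News</title></head><body>"
            + "<p class='big'>Hello</p><a href='/next'>Next page</a></body></html>";
        System.out.println(transform(page, STYLESHEET));
    }
}
```

Because every link ends up as the same kind of element regardless of the original page layout, a generic front end only has to understand that one element to offer navigation. In the library itself, a configuration file rather than a hard-coded constant decides which stylesheet applies to which page.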