====== Configuring the options in indexer.conf ======
The indexing configuration file for web search is:
**/usr/local/mnogosearch/etc/indexer.conf**
To list the command-line options described below, run:
**./indexer --help**
===== indexer program options =====
Usage: indexer [OPTIONS] [configfile]
==== Crawler options ====
* **-a** : Revisit all documents, even if they have not expired (can be limited using the -t, -u, -s, -c, -y and -f options)
* **-m** : Update expired documents, even if they have not been modified (can be limited using the -t, -u, -c, -s, -y and -f options)
* **-e** : Visit the oldest ('most expired') documents first
* **-o** : Visit documents with a lower depth (hops value) first
* **-r** : Do not try to reduce the load on remote servers by randomising the crawler queue order (faster, but less polite)
* **-n #** : Visit only # documents and exit
* **-c #** : Crawl for only # seconds and exit
* **-q** : Quick startup (do not add Server URLs)
* **-qq** : Even quicker startup
* **-b** : Prevent more than one instance of indexer from starting
* **-i** : Insert new URLs (URLs to insert must be given using -u or -f)
* **-p #** : Sleep # seconds after downloading every URL
* **-w** : Do not ask for confirmation when clearing documents from the database (e.g.: indexer -Cw)
* **-N #** : Run # crawler threads
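As an illustration of how the crawler options above can be combined, here is a minimal sketch of a "polite" limited crawl. The binary path is an assumption (adjust it to your installation), and the ''run'' helper is hypothetical: it only prints the command unless ''RUN=1'' is set, so the example can be tried safely.

```shell
#!/bin/sh
# Assumed binary location; override with INDEXER=/path/to/indexer.
INDEXER="${INDEXER:-/usr/local/mnogosearch/sbin/indexer}"

# Hypothetical helper: print the command instead of running it
# unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Limited, polite crawl:
#   -b      refuse to start if another indexer instance is running
#   -n 500  stop after visiting 500 documents
#   -p 2    sleep 2 seconds after each downloaded URL
polite_crawl() {
    run "$INDEXER" -b -n 500 -p 2
}

polite_crawl
```

The values 500 and 2 are arbitrary; set ''RUN=1'' once the printed command looks right.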
==== Subsection control options (can be combined): ====
* **-s name** : Limit indexer to documents matching status (HTTP status code)
* **-t name** : Limit indexer to documents matching tag
* **-g name** : Limit indexer to documents matching category
* **-y name** : Limit indexer to documents matching content-type
* **-L name** : Limit indexer to documents matching language
* **-u name** : Limit indexer to documents with URLs matching pattern (supports SQL LIKE wildcards '%' and '_')
* **--seed=name** : Limit indexer to documents with the given seed (0-255)
* **-D name** : Work with the n-th database only (i.e. with the n-th DBAddr)
* **-f name** : Read URLs to be visited/inserted/deleted from file (with the -a or -C option, supports the SQL LIKE wildcard '%'; has no effect when combined with the -m option)
* **-f -** : Use stdin instead of a file as the URL list
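The subsection limits are typically combined with -a or -m. The sketch below shows two such combinations: revisiting everything under a URL prefix (using the SQL LIKE wildcard '%') and revisiting documents by HTTP status. It assumes ''indexer'' is on the PATH, and the function names, the example.com URL and the ''run'' dry-run helper are all hypothetical.

```shell
#!/bin/sh
# Assumes indexer is on PATH; override with INDEXER=/path/to/indexer.
INDEXER="${INDEXER:-indexer}"

# Hypothetical helper: print the command instead of running it
# unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Revisit (-a) every document whose URL starts with the given prefix.
# The trailing '%' is the SQL LIKE wildcard for "any characters".
recrawl_prefix() {
    run "$INDEXER" -a -u "$1%"
}

# Revisit (-a) only documents stored with the given HTTP status code.
recrawl_status() {
    run "$INDEXER" -a -s "$1"
}

recrawl_prefix 'http://www.example.com/docs/'
recrawl_status 404
```

Quoting the pattern matters when typing the command by hand: an unquoted '%' or '_' is harmless to the shell, but other pattern characters in a URL (such as '?' or '&') are not.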
==== Logging options: ====
* **-l** : Do not log to stdout/stderr
* **-v #** : Verbose level (0-5)
==== Misc. options: ====
* **-F name** : Print compile configuration and exit (e.g.: indexer -F '*')
* **-h, --help** : Print help page and exit
* **-hh** : Print more help
* **-?** : Print help page and exit
* **-??** : Print more help
* **-d name** : Use the given configuration file instead of indexer.conf. This option is useful when running indexer as an interpreter, e.g.: #!/usr/local/sbin/indexer -d
* **-j name** : Set current time for statistics (use with -S), format: YYYY-MM[-DD[ HH[:MM[:SS]]]] or a time offset, e.g. 1d12h (see Period in indexer.conf)
* **--set=name** : Set variable
==== Commands (can be used with subsection control options): ====
* **--crawl** : Crawl (default command)
* **--index** : Create search index
* **--wordstat** : Create statistics for misspelled word suggestions
* **--rewriteurl** : Rewrite URL data into the current search index
* **--rewritelimits** : Recreate all Limit, UserScore, UserOrder data
* **-C, --delete** : Delete documents from the database
* **-S, --statistics** : Print statistics and exit
* **-I, --referers** : Print referers and exit
* **-R** : Crawl then calculate popularity rank
==== Other commands: ====
* **--create** : Create SQL table structure and exit
* **--drop** : Drop SQL table structure and exit
* **-Q, --sqlmon** : Run interactive SQL monitor
* **--exec=name** : Execute SQL query
* **--checkconf** : Check configuration file for good syntax
* **--hashspell** : Create hash files for the active Ispell dictionaries
* **--dumpspell** : Dump Ispell data for use with SQLWordForms
* **--dumpdata** : Dump collected data using SQL statements
* **--restoredata** : Load previously dumped data (give a filename using -f)
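The commands above can be chained into a simple maintenance cycle. The following sketch assumes the usual order (create the tables once, then crawl, build the search index and print statistics); that order is an assumption, not something the help text states. The ''setup'', ''refresh'' and ''run'' names are hypothetical, and ''run'' only prints each command unless ''RUN=1'' is set.

```shell
#!/bin/sh
# Assumes indexer is on PATH; override with INDEXER=/path/to/indexer.
INDEXER="${INDEXER:-indexer}"

# Hypothetical helper: print the command instead of running it
# unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# One-time setup: create the SQL table structure.
setup() {
    run "$INDEXER" --create
}

# Regular cycle: crawl, rebuild the search index, then print statistics.
# Each step runs only if the previous one succeeded.
refresh() {
    run "$INDEXER" --crawl || return 1
    run "$INDEXER" --index || return 1
    run "$INDEXER" -S
}

setup
refresh
```

Running ''indexer --checkconf'' beforehand is a cheap way to catch syntax errors in indexer.conf before a long crawl.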
===== Configuring the indexer.conf file =====
==== content-type ====
* UseRemoteContentType yes/no
* AddType [String|Regex] [Case|NoCase] [...]
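As a sketch of how these two commands might appear together in indexer.conf: the fragment below tells indexer not to trust the Content-Type header sent by the server and to derive the type from file name patterns instead. The MIME types and patterns are illustrative, and the argument order after the match options (MIME type, then one or more patterns) is an assumption based on the mnoGoSearch documentation; check it against your version.

```conf
# Do not rely on the Content-Type header returned by the remote server.
UseRemoteContentType no

# Map file name patterns to MIME types (string match, case-insensitive).
AddType String NoCase text/html  *.html *.htm
AddType String NoCase text/plain *.txt
```

After editing the file, ''indexer --checkconf'' can be used to verify its syntax.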
[[http://www.alsacreations.com/astuce/lire/1152-en-tetes-http.html|Information on HTTP headers (in French)]]