====== Configuring the options of the indexer.conf file ======

The indexing configuration file for web search is: **/usr/local/mnogosearch/etc/indexer.conf**

  ./indexer --help

===== indexer program options =====

Usage:

  indexer [OPTIONS] [configfile]

==== Crawler options ====

  * **-a** : Revisit all documents, even if they have not expired (can be limited using -t, -u, -s, -c, -y and -f options)
  * **-m** : Update expired documents, even if they have not been modified (can be limited using -t, -u, -c, -s, -y and -f options)
  * **-e** : Visit the oldest ('most expired') documents first
  * **-o** : Visit documents with less depth (hops value) first
  * **-r** : Do not try to reduce the load on remote servers by randomising the crawler queue order (faster, but less polite)
  * **-n #** : Visit only # documents and exit
  * **-c #** : Visit only # seconds and exit
  * **-q** : Quick startup (do not add Server URLs)
  * **-qq** : Even quicker startup
  * **-b** : Block starting more than one instance of the indexer program
  * **-i** : Insert new URLs (URLs to insert must be given using -u or -f)
  * **-p #** : Sleep # seconds after downloading every URL
  * **-w** : Do not ask for confirmation when clearing documents from the database (e.g.: indexer -Cw)
  * **-N #** : Run # crawler threads

==== Subsection control options (can be combined) ====

  * **-s name** : Limit indexer to documents matching status (HTTP status code)
  * **-t name** : Limit indexer to documents matching tag
  * **-g name** : Limit indexer to documents matching category
  * **-y name** : Limit indexer to documents matching content-type
  * **-L name** : Limit indexer to documents matching language
  * **-u name** : Limit indexer to documents with URLs matching pattern (supports SQL LIKE wildcards '%' and '_')
  * **--seed=name** : Limit indexer to documents with the given seed (0-255)
  * **-D name** : Work with the n-th database only (i.e. with the n-th DBAddr)
  * **-f name** : Read URLs to be visited/inserted/deleted from file (with -a or -C option, supports SQL LIKE wildcard '%'; has no effect when combined with -m option)
  * **-f -** : Use stdin instead of a file as a URL list

==== Logging options ====

  * **-l** : Do not log to stdout/stderr
  * **-v #** : Verbose level (0-5)

==== Misc. options ====

  * **-F name** : Print compile configuration and exit (e.g.: indexer -F '*')
  * **-h, --help** : Print help page and exit
  * **-hh** : Print more help
  * **-?** : Print help page and exit
  * **-??** : Print more help
  * **-d name** : Use the given configuration file instead of indexer.conf. This option is useful when running indexer as an interpreter, e.g.: #!/usr/local/sbin/indexer -d
  * **-j name** : Set current time for statistics (use with -S), format: YYYY-MM[-DD[ HH[:MM[:SS]]]] or time offset, e.g. 1d12h (see Period in indexer.conf)
  * **--set=name** : Set variable

==== Commands (can be used with subsection control options) ====

  * **--crawl** : Crawl (default command)
  * **--index** : Create search index
  * **--wordstat** : Create statistics for misspelled word suggestions
  * **--rewriteurl** : Rewrite URL data into the current search index
  * **--rewritelimits** : Recreate all Limit, UserScore, UserOrder data
  * **-C, --delete** : Delete documents from the database
  * **-S, --statistics** : Print statistics and exit
  * **-I, --referers** : Print referers and exit
  * **-R** : Crawl, then calculate popularity rank

==== Other commands ====

  * **--create** : Create SQL table structure and exit
  * **--drop** : Drop SQL table structure and exit
  * **-Q, --sqlmon** : Run interactive SQL monitor
  * **--exec=name** : Execute SQL query
  * **--checkconf** : Check configuration file for good syntax
  * **--hashspell** : Create hash files for the active Ispell dictionaries
  * **--dumpspell** : Dump Ispell data for use with SQLWordForms
  * **--dumpdata** : Dump collected data using SQL statements
  * **--restoredata** : Load previously dumped data (give a filename using -f)

===== Configuring the indexer.conf file =====

==== content-type ====

  * UseRemoteContentType yes/no
  * AddType [String|Regex] [Case|NoCase] [...]
  * [[http://www.alsacreations.com/astuce/lire/1152-en-tetes-http.html|Information on HTTP headers]]
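
As an illustration of how the command-line options listed earlier on this page can be combined, here is a minimal dry-run sketch. The helper ''run'', the ''DRY_RUN'' and ''INDEXER'' variables, and the example.com URL pattern are hypothetical names introduced for this example; the script also assumes that the indexer binary is on the PATH. By default it only prints the command lines it would execute, so nothing is crawled or deleted.

```shell
#!/bin/sh
# Assumption: the indexer binary is reachable as "indexer"; override INDEXER otherwise.
INDEXER="${INDEXER:-indexer}"

# Hypothetical helper: print the command line when DRY_RUN=1 (the default),
# otherwise execute indexer with the given arguments.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$INDEXER $*"
    else
        "$INDEXER" "$@"
    fi
}

# Revisit all documents (-a) whose URL matches a SQL LIKE pattern (-u),
# stopping after 100 documents (-n). The URL is a placeholder.
run -a -u 'http://www.example.com/%' -n 100

# Create the search index from the crawled data.
run --index

# Print statistics and exit.
run -S
```

Setting ''DRY_RUN=0'' makes the same script call indexer for real; it is worth reviewing the printed commands first, especially before adding destructive options such as -C.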