Configuring the options in indexer.conf

The indexing configuration file for web search is: /usr/local/mnogosearch/etc/indexer.conf

./indexer --help

Options of the indexer program

Usage: indexer OPTIONS configfile

Crawler options

  • -a : Reindex all documents, even if they have not expired (can be limited using the -t, -u, -s, -c, -y and -f options)
  • -m : Reindex expired documents, even if they have not been modified (can be limited using the -t, -u, -c, -s, -y and -f options)
  • -e : Visit the oldest ('most expired') documents first
  • -o : Visit documents with less depth (hops value) first
  • -r : Do not try to reduce the load on remote servers by randomising the crawler queue order (faster, but less polite)
  • -n # : Visit only # documents and exit
  • -c # : Visit only # seconds and exit
  • -q : Quick startup (do not add Server URLs)
  • -qq : Even quicker startup
  • -b : Block starting more than one instance of the indexer program
  • -i : Insert new URLs (URLs to insert must be given using -u or -f)
  • -p # : Sleep # seconds after downloading every URL
  • -w : Do not ask for confirmation when clearing documents from the database (e.g.: indexer -Cw)
  • -N # : Run # crawler threads
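
As an illustration of how the crawler options above combine, here is a minimal sketch. It only uses flags from the list above; the `INDEXER` path, the `run` helper and the numeric values are assumptions made for this example. The helper prints each command instead of running it unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Assumption: path to the indexer binary; adjust to your installation.
INDEXER="${INDEXER:-./indexer}"

# Hypothetical helper: print the command by default, run it when DRY_RUN=0.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Bounded, polite crawl: refuse to start a second instance (-b),
# visit at most 500 documents (-n), sleep 1 second after each download (-p).
run "$INDEXER" -b -n 500 -p 1

# Time-boxed crawl: 4 crawler threads (-N), stop after 600 seconds (-c),
# oldest ("most expired") documents first (-e).
run "$INDEXER" -N 4 -c 600 -e
```

Starting in dry-run mode makes it possible to review the exact command line before anything is crawled.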

Subsection control options (can be combined):

  • -s name Limit indexer to documents matching status (HTTP Status code)
  • -t name Limit indexer to documents matching tag
  • -g name Limit indexer to documents matching category
  • -y name Limit indexer to documents matching content-type
  • -L name Limit indexer to documents matching language
  • -u name Limit indexer to documents with URLs matching pattern (supports SQL LIKE wildcards '%' and '_')
  • --seed=name Limit indexer to documents with the given seed (0-255)
  • -D name Work with the n-th database only (i.e. with the n-th DBAddr)
  • -f name Read URLs to be visited/inserted/deleted from file (with -a or -C option, supports SQL LIKE wildcard '%'; has no effect when combined with -m option)
  • -f - Use stdin instead of a file as an URL list
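
The subsection options above can be sketched as follows. The URLs, the tag name `news`, the `INDEXER` path and the `run` helper are placeholders chosen for this example, not values from a real setup. Commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Assumption: path to the indexer binary; adjust to your installation.
INDEXER="${INDEXER:-./indexer}"

# Hypothetical helper: print the command by default, run it when DRY_RUN=0.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Reindex every known document under one URL prefix.
# '%' is the SQL LIKE wildcard; quote it so the shell leaves it alone.
run "$INDEXER" -a -u 'http://www.example.com/docs/%'

# Limits can be combined: documents with HTTP status 404 and tag "news".
run "$INDEXER" -a -s 404 -t news

# Insert new URLs (-i) read from stdin (-f -), one URL per line.
printf '%s\n' 'http://www.example.com/a.html' 'http://www.example.com/b.html' |
    run "$INDEXER" -i -f -
```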

Logging options:

  • -l Do not log to stdout/stderr
  • -v # Verbose level (0-5)

Misc. options:

  • -F name : Print compile configuration and exit (e.g.: indexer -F '*')
  • -h, --help : Print help page and exit; -hh prints more help
  • -? : Print help page and exit
  • -?? : Print more help
  • -d name : Use the given configuration file instead of indexer.conf. This option is useful when running indexer as an interpreter, e.g.: #!/usr/local/sbin/indexer -d
  • -j name : Set the current time for statistics (use with -S), format: YYYY-MM[-DD[ HH[:MM[:SS]]]] or a time offset, e.g. 1d12h (see Period in indexer.conf)
  • --set=name : Set variable

Commands (can be used with subsection control options):

  • --crawl Crawl (default command)
  • --index Create search index
  • --wordstat Create statistics for misspelled word suggestions
  • --rewriteurl Rewrite URL data into the current search index
  • --rewritelimits Recreate all Limit, UserScore, UserOrder data
  • -C, --delete Delete documents from the database
  • -S, --statistics Print statistics and exit
  • -I, --referers Print referers and exit
  • -R Crawl then calculate popularity rank
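
A typical sequence built from the commands above might look like the sketch below. It assumes the configuration file path given at the top of this page; the URL pattern, the `INDEXER` path and the `run` helper are placeholders for this example. Commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Assumptions: binary location and configuration file path; adjust as needed.
INDEXER="${INDEXER:-./indexer}"
CONF="${CONF:-/usr/local/mnogosearch/etc/indexer.conf}"

# Hypothetical helper: print the command by default, run it when DRY_RUN=0.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# 1. Crawl (the default command, spelled out here for clarity).
run "$INDEXER" --crawl "$CONF"

# 2. Build the search index from the crawled documents.
run "$INDEXER" --index "$CONF"

# 3. Print statistics and exit.
run "$INDEXER" -S "$CONF"

# Delete documents under one URL prefix without asking for confirmation (-Cw).
run "$INDEXER" -Cw -u 'http://www.example.com/old/%' "$CONF"
```

Because -C removes documents from the database, keeping the -u limit next to it and checking the printed command first is a cheap safeguard.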

Other commands:

  • --create Create SQL table structure and exit
  • --drop Drop SQL table structure and exit
  • -Q, --sqlmon Run interactive SQL monitor
  • --exec=name Execute SQL query
  • --checkconf Check configuration file for good syntax
  • --hashspell Create hash files for the active Ispell dictionaries
  • --dumpspell Dump Ispell data for use with SQLWordForms
  • --dumpdata Dump collected data using SQL statements
  • --restoredata Load previously dumped data (give a filename using -f)
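
A first-time setup using the commands above could be sketched like this. The dump file name is hypothetical and stands for a file produced earlier with --dumpdata; the `INDEXER` path and the `run` helper are also assumptions of this example. Commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Assumptions: binary location and dump file name; adjust as needed.
INDEXER="${INDEXER:-./indexer}"
DUMP="${DUMP:-mnogosearch-dump.sql}"

# Hypothetical helper: print the command by default, run it when DRY_RUN=0.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Validate the configuration first; create the SQL tables only if it passes.
run "$INDEXER" --checkconf && run "$INDEXER" --create

# Load previously dumped data; the file name is passed with -f.
run "$INDEXER" --restoredata -f "$DUMP"
```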

Configuring the indexer.conf file

content-type

  • UseRemoteContentType yes/no
  • AddType [String|Regex] [Case|NoCase] <mime type> <arg> [<arg>…]
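
To make the two directives above concrete, the sketch below writes an illustrative fragment to a scratch file (a hypothetical path, not the live indexer.conf) and prints it. The MIME types and file patterns are example values. The comments describe the usual reading of these directives: the server's Content-Type header is used when UseRemoteContentType is yes, and AddType maps file name patterns to MIME types otherwise.

```shell
#!/bin/sh
# Assumption: scratch location for the example fragment.
FRAGMENT="${TMPDIR:-/tmp}/content-type-example.conf"

# Quoted 'EOF' keeps the shell from expanding '*' or '$' inside the fragment.
cat > "$FRAGMENT" <<'EOF'
# Take the content type from the remote server's HTTP headers
UseRemoteContentType yes

# Map file name patterns to MIME types
AddType text/html *.html *.htm
# Explicit match type and case handling, following the syntax line above
AddType String NoCase image/jpeg *.jpg *.jpeg
EOF

cat "$FRAGMENT"
```

Once the lines are merged into indexer.conf, running indexer --checkconf is a reasonable way to confirm the syntax is accepted.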

HTTP header information

