Module pywikipedia.interwiki
Script to check language links for general pages. It works by downloading the
page and using its existing translations, plus hints from the command line, to
download the equivalent pages in other languages. All such pages are
downloaded as well and checked for interwiki links recursively until no new
links are encountered. A rationalization process then selects the right
interwiki links and, if the result is unambiguous, the interwiki links in the
original page are updated automatically and the modified page is uploaded.
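The download-and-follow loop described above amounts to a breadth-first traversal followed by a consistency check. The sketch below is a simplified illustration, not the script's actual code: the `LINKS` dictionary is a hypothetical stand-in for downloading pages, hinted pages are accepted without checking that they exist, and `collect` and `rationalize` are illustrative names.

```python
from collections import deque

# Hypothetical stand-in for the live wikis: maps (language, title) to the
# interwiki links found on that page. The real script downloads pages instead.
LINKS = {
    ("en", "Statement"): [("de", "Anweisung"), ("fr", "Instruction")],
    ("de", "Anweisung"): [("en", "Statement"), ("nl", "Opdracht")],
    ("fr", "Instruction"): [("en", "Statement")],
    ("nl", "Opdracht"): [("de", "Anweisung")],
}

def collect(start, hints=()):
    """Follow interwiki links breadth-first until no new pages turn up."""
    seen = {start, *hints}
    queue = deque([start, *hints])
    while queue:
        page = queue.popleft()
        for link in LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

def rationalize(pages, home_lang):
    """Return {lang: title} if every language has exactly one candidate, else None."""
    by_lang = {}
    for lang, title in pages:
        by_lang.setdefault(lang, set()).add(title)
    if any(len(titles) > 1 for titles in by_lang.values()):
        return None  # ambiguous: cannot be resolved automatically
    return {lang: titles.pop() for lang, titles in by_lang.items() if lang != home_lang}
```

With these stand-ins, `collect(("en", "Statement"))` reaches all four pages and `rationalize(pages, "en")` returns one title each for de, fr and nl; adding a second German candidate makes it return `None`, the case that cannot be handled without an operator.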
This script understands various command-line arguments:
-force: do not ask permission to make "controversial" changes,
like removing a language because none of the found
alternatives actually exists.
-always: make changes even when a single byte is changed in
the page, not only when one of the links has a significant
change.
-hint: used as -hint:de:Anweisung to give the robot a hint
where to start looking for translations. This is only
useful if you specify a single page to work on. If no
text is given after the second ':', the name of the page
itself is used as the title for the hint.
There are some special hints, trying a number of languages at once:
all: Provides the hint for all languages with at least ca. 100 pages
10: Provides the hint for ca. 10 of the largest languages
20:, 30:, 50: Analogous to 10: with ca. 20, 30 and 50 languages
cyril: Provides the hint for all languages that use the Cyrillic alphabet
-same: looks over all 'serious' languages for the same title.
-same is equivalent to -hint:all:
-name: similar to -same, but uppercases the last name for eo:
-wiktionary: similar to -same, but will ONLY accept names that are
identical to the original. Also, if the title is not
capitalized, it will only go through other wikis without
automatic capitalization.
-askhints: for each page, one or more hints are asked for. See -hint: above
for the format; one can, for example, give "en:something" or
"20:" as a hint.
-untranslated: works normally on pages with at least one interlanguage
link; asks hints for pages that have none.
-untranslatedonly: same as -untranslated, but pages which already have a
translation are skipped. Hint: do NOT use this in
combination with -start without a -number limit, because
you will go through the whole alphabet before any queries
are performed!
-file: used as -file:filename, read a list of pages to treat
from the named file
-confirm: ask for confirmation before any page is changed on the
live wiki. Without this argument, additions and
unambiguous modifications are made without confirmation.
-autonomous: run automatically, do not ask any questions. If a question
to an operator is needed, write the name of the page
to autonomous_problems.dat and continue on the next page.
-nobacklink: switch off the backlink warnings
-start: used as -start:pagename, specifies that the robot should
go alphabetically through all pages on the home wiki,
starting at the named page.
-number: used as -number:#, specifies that the robot should process
that number of pages and then stop. This is only useful in
combination with -start. The default is not to stop.
-array: used as -array:#, specifies that the robot should process
that number of pages at once, only starting to load new
pages in the original language when the total falls below
that number. The default is to process (at least) 100 pages at
once. The number of new pages loaded is equal to the number
that is loaded at once from another language (default 60).
-years: run on all year pages in numerical order. Stop at year 2050.
If the argument is given in the form -years:XYZ, it
will run from [[XYZ]] through [[2050]]. If XYZ is a
negative value, it is interpreted as a year BC. If the
argument is simply given as -years, it will run from 1
through 2050.
This implies -noredirect.
-noauto: Do not use the automatic translation feature for years and
dates; only use found links and hints.
-days: Like -years, but runs through all date pages. Stops at
Dec 31. If the argument is given in the form -days:X,
it will start at month no. X through Dec 31. If the
argument is simply given as -days, it will run from
Jan 1 through Dec 31. E.g. for -days:9 it will run
from Sep 1 through Dec 31.
-skipfile: used as -skipfile:filename, skip all links mentioned in
the given file from the list generated by -start. This
does not work with -number!
-restore: restore a set of "dumped" pages the robot was working on
when it terminated.
-continue: as restore, but after having gone through the dumped pages,
continue alphabetically starting at the last of the dumped
pages.
-warnfile: used as -warnfile:filename, reads all warnings from the
given file that apply to the home wiki language,
and reads the rest of each warning as a hint. It then
treats all the mentioned pages. A quicker way to
implement warnfile suggestions without verifying them
against the live wiki is to use the warnfile.py
robot.
-noredirect do not follow redirects (note: this option is written without a trailing colon).
-noshownew: don't show the source of every new pagelink found.
-neverlink: used as -neverlink:xx where xx is a language code:
Disregard any links found to language xx. You can also
specify a list of languages to disregard, separated by
commas.
-showpage: when asking for hints, always show the first part of the page's
text, rather than only when asked for it (by typing '?').
Only useful in combination with a hint-asking option like
-untranslated, -askhints or -untranslatedonly.
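The -hint syntax can be illustrated with a small parser. This is a sketch under assumptions: `expand_hint` and `HINT_GROUPS` are hypothetical names, and the group contents are placeholder language codes, not the lists the script actually uses for 10:, cyril: and the other special hints.

```python
# Hypothetical language groups with placeholder codes; the script's real
# lists for the special hints are not reproduced here.
HINT_GROUPS = {
    "10": ["en", "de", "fr", "ja", "pl", "it", "nl", "sv", "es", "pt"],
    "cyril": ["ru", "uk", "bg", "sr", "be", "mk"],
}

def expand_hint(hint, page_title):
    """Turn a hint such as 'de:Anweisung', 'de:' or 'cyril:' into (lang, title) pairs."""
    lang, _, title = hint.partition(":")
    # No text after the second ':' means "use the page's own name as the title".
    title = title or page_title
    # A special hint expands to a whole group; anything else is one language code.
    langs = HINT_GROUPS.get(lang, [lang])
    return [(code, title) for code in langs]
```

For a command-line argument such as `-hint:de:Anweisung`, one would strip the `-hint:` prefix and call `expand_hint("de:Anweisung", page_title)`; `-hint:cyril:` yields one candidate per Cyrillic-script language, all using the current page's title.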
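The -array refill rule can be modeled as a top-up loop. In this sketch `refill` is a hypothetical helper, `source` stands for the generator of home-wiki pages, and finished pages are assumed to be removed from the working list by the caller; the defaults mirror the documented 100 and 60.

```python
import itertools

def refill(in_progress, source, array_size=100, batch=60):
    """Top up the working list only when it drops below array_size,
    loading `batch` new home-wiki pages at a time."""
    while len(in_progress) < array_size:
        new = list(itertools.islice(source, batch))
        if not new:
            break  # the page generator is exhausted
        in_progress.extend(new)
    return in_progress
```

Starting from an empty list, two batches of 60 are loaded (120 pages); once 30 of them are finished, the remaining 90 fall below 100 and one more batch brings the total to 150. This is why the documentation says "at least" 100 pages.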
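The page lists walked by -years and -days can be generated as below. The title formats are assumptions (English-style "44 BC" and "September 1"); real wikis name these pages in their own ways. `year_titles` and `day_titles` are hypothetical names, and year 0 is skipped because the BC/AD numbering has none.

```python
import calendar

def year_titles(start=1, stop=2050):
    """Titles for -years[:start]; negative numbers are years BC."""
    for year in range(start, stop + 1):
        if year == 0:
            continue  # there is no year 0 between 1 BC and AD 1
        # Assumed English-style title format.
        yield f"{-year} BC" if year < 0 else str(year)

def day_titles(first_month=1):
    """Titles for -days[:first_month], running through Dec 31."""
    for month in range(first_month, 13):
        # 2000 is a leap year, so February 29 is included.
        days_in_month = calendar.monthrange(2000, month)[1]
        for day in range(1, days_in_month + 1):
            # calendar.month_name gives English names in the default C locale.
            yield f"{calendar.month_name[month]} {day}"
```

`list(year_titles(-2, 2))` gives `['2 BC', '1 BC', '1', '2']`, and `day_titles(9)` starts at "September 1" and ends at "December 31", matching the -days:9 example.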
A configuration option can be used to change the behavior of this robot:
interwiki_backlink: if set to True, all problems in foreign wikis will
be reported.
This option is set to True by default. It can be changed through
the user-config.py configuration file.
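For instance, user-config.py is ordinary Python, so switching the backlink reports off is a plain assignment in that file:

```python
# user-config.py: do not report problems found in foreign wikis.
interwiki_backlink = False
```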
If interwiki.py is terminated before it is finished, it will write a file
"interwiki.dump"; the program will read it if invoked with the
"-restore" or "-continue" option, and finish all the subjects in that list.
To run the interwiki-bot on all pages of a language, run it with the option
"-start:!", and if it takes so long that you have to interrupt it, use
"-continue" next time.
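The dump-and-resume cycle can be sketched as follows. The file layout (one title per line, UTF-8) and the helper names `write_dump` and `resume` are assumptions for illustration, not the script's actual dump format; -continue is modeled as resuming alphabetically after the alphabetically last dumped title.

```python
import os
import tempfile

def write_dump(titles, path):
    """Save the titles still being worked on (assumed format: one per line, UTF-8)."""
    with open(path, "w", encoding="utf-8") as f:
        for title in titles:
            f.write(title + "\n")

def resume(path, all_titles=(), continue_after=False):
    """-restore: return the dumped titles; -continue: then carry on alphabetically."""
    with open(path, encoding="utf-8") as f:
        dumped = [line.rstrip("\n") for line in f if line.strip()]
    if not continue_after:
        return dumped
    # Resume after the alphabetically last dumped title so nothing is done twice.
    last = max(dumped) if dumped else ""
    return dumped + [t for t in sorted(all_titles) if t > last]

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "interwiki.dump")
    write_dump(["Paris", "Berlin"], path)
    print(resume(path))  # ['Paris', 'Berlin']
    print(resume(path, ["Amsterdam", "Berlin", "Paris", "Rome"], continue_after=True))
    # ['Paris', 'Berlin', 'Rome']
```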
Classes
Global: Container class for global settings.
Subject: Class to follow the progress of a single 'subject' (i.e.
SubjectArray: A class keeping track of a list of subjects, controlling which pages
are queried from which languages when.
Function Summary
compareLanguages(old, new)
readWarnfile(filename, sa)
- Imported modules: codecs, pywikipedia.config, copy, pywikipedia.date,
pywikipedia.pagegenerators, re, socket, sys, time,
pywikipedia.titletranslate, pywikipedia.wikipedia
- Imported variables: __version__, globalvar, msg