
Module pywikipedia.wikipedia

Library to get and put pages on a MediaWiki.

Contents of the library (objects and functions intended for use outside the
module, as of late August 2004):

Classes:
Page: A MediaWiki page
    __init__: Page(xx,Title) - the page with title Title on language xx:
    linkname: The name of the page, in a form suitable for an interwiki link
    urlname: The name of the page, in a form suitable for a URL
    catname: The name of the page, with the namespace part removed
    section: The section of the page (the part of the name after '#')
    sectionFreeLinkname: The name without the section part
    aslink: The name of the page in the form [[Title]] or [[lang:Title]]
    site: The wiki this page is on
    encoding: The encoding the page is in

    get (*): The text of the page
    exists (*): True if the page actually exists, false otherwise
    isRedirectPage (*): True if the page is a redirect, false otherwise
    isEmpty (*): True if the page has 4 characters or fewer of content, not
        counting interwiki and category links
    interwiki (*): The interwiki links from the page (list of Pages)
    categories (*): The categories the page is in (list of Pages)
    rawcategories (*): Like categories, but if the link contains a |, the
        part after the | is included.
    linkedPages (*): The normal pages linked from the page (list of Pages)
    imagelinks (*): The pictures on the page (list of strings)
    templates(*): All templates referenced on the page (list of strings)
    getRedirectTarget (*): The page the page redirects to
    isCategory: True if the page is a category, false otherwise
    isImage: True if the page is an image, false otherwise
    isDisambig (*): True if the page is a disambiguation page
    getReferences: The pages linking to the page
    namespace: The namespace the page is in

    put(newtext): Saves the page
    delete: Deletes the page (requires being logged in)

    (*): This loads the page if it has not been loaded before

Other functions:
getall(xx,Pages): Load all pages in Pages, where Pages is a list of Page
    objects and xx is the language the pages are on
setAction(text): Use 'text' instead of "Wikipedia python library" in
    summaries
allpages(): Yield all pages in one's home language as Page objects (or all
    pages from 'Start' onwards if allpages(start='Start') is used).
checkLogin(): Returns True if the bot is logged in on the home language wiki,
    False otherwise
argHandler(text): Checks whether text is one of the global arguments defined in
    wikipedia.py (these are -family, -lang, and -log)
translate(xx, dict): dict is a dictionary, giving text depending on language,
    xx is a language. Returns the text in the most applicable language for
    the xx: wiki

output(text): Prints the text 'text' in the encoding of the user's console.
input(text): Asks input from the user, printing the text 'text' first.
showDiff(oldtext, newtext): Prints the differences between oldtext and newtext
    on the screen

getLanguageLinks(text,xx): get all interlanguage links in wikicode text 'text'
    in the form xx:pagename
removeLanguageLinks(text): Returns the wiki code 'text' without any
    interlanguage links.
replaceLanguageLinks(oldtext, new): in the wiki-code 'oldtext' remove the
    language links and replace them by the language links in new, a dictionary
    with the languages as keys and either Pages or linknames as values
getCategoryLinks(text,xx): get all category links in text 'text' (links in the
    form xx:pagename)
removeCategoryLinks(text,xx): remove all category links in 'text'
replaceCategoryLinks(oldtext,new): replace the category links in oldtext by
    those in new (new a list of category Pages)
stopme(): Call this in a bot once it no longer communicates (or never will)
    with the wiki. It removes the bot from the list of running processes,
    so it no longer slows down other bot threads.
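The translate(xx, dict) entry above picks a message for a given wiki language. The following standalone Python 3 sketch shows one way such a lookup can work; it is not the library's own code, and its fallback order (exact code, then 'en', then any available entry) is an assumption, since "most applicable language" is not defined here.

```python
def translate(code, messages):
    """Pick the message for language `code` from a {language: text} dictionary.

    Hypothetical fallback order: exact match, then English, then any entry.
    """
    if code in messages:
        return messages[code]
    if 'en' in messages:
        return messages['en']
    # Last resort: the first entry in insertion order (None if the dict is empty).
    return next(iter(messages.values()), None)


summaries = {'en': 'Robot: adding interwiki links', 'de': 'Bot: Interwiki-Links ergänzt'}
print(translate('de', summaries))  # Bot: Interwiki-Links ergänzt
print(translate('fr', summaries))  # Robot: adding interwiki links
```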

Classes
GetAll
MyURLopener
Page: A page on the wiki.
Site
Throttle
WikimediaXmlHandler

Exceptions
EditConflict: There has been an edit conflict while uploading the page
Error: Wikipedia error
IsNotRedirectPage: Wikipedia page is not a redirect page
IsRedirectPage: Wikipedia page is a redirect page
LockedPage: Wikipedia page is locked
NoNamespace: Wikipedia page is not in a special namespace
NoPage: Wikipedia page does not exist
NoSuchEntity: No entity exists for this character
NotLoggedIn: Anonymous editing of Wikipedia is not possible
PageInList: Trying to add a page to a list that already includes it
PageNotFound: Page not found in list
SectionError: The section specified by # does not exist

Function Summary
  addEntity(name)
Convert a unicode name into an ASCII name with entities
  allpages(start, site, namespace, throttle)
Generator which yields all articles in the home language in alphanumerical order, starting at a given page.
  argHandler(arg, moduleName)
Takes a commandline parameter, converts it to unicode, and returns it unless it is one of the global parameters such as -lang or -log.
  categoryFormat(links, insite)
Create a suitable string encoding all category links for a wikipedia page.
  checkLogin(site)
  Family(fam, fatal)
Import the named family.
  getall(site, pages, throttle)
  getCategoryLinks(text, site, raw)
Returns a list of category links.
  getEditPage(site, name, read_only, do_quote, get_redirect, throttle)
Get the contents of page 'name' from the 'site' wiki. Do not use this directly; for 99% of purposes you can use the Page object instead.
  getLanguageLinks(text, insite)
Returns a dictionary of other language links mentioned in the text in the form {code:pagename}.
  getSite(code, fam, user)
  getUrl(site, path)
Low-level routine to get a URL from the wiki.
  html2unicode(name, site, altsite)
  interwikiFormat(links, insite)
Create a suitable string encoding all interwiki links for a wikipedia page.
  isInterwikiLink(s, site)
Try to check whether s is in the form "xx:link" where xx: is a known language.
  link2url(name, site, insite)
Convert an interwiki link name of a page to the proper name to be used in a URL for that page.
  myencoding()
The character encoding used by the home wiki
  newpages(number, repeat, site)
Generator which yields new articles one after another.
  normalWhitespace(text)
  putPage(site, name, text, comment, watchArticle, minorEdit, newPage, token, gettoken)
Upload 'text' on page 'name' to the 'site' wiki.
  redirectRe(site)
  removeCategoryLinks(text, site)
Given the wiki-text of a page, return that page with all category links removed.
  removeEntity(name)
  removeLanguageLinks(text, site)
Given the wiki-text of a page, return that page with all interwiki links removed.
  replaceCategoryLinks(oldtext, new, site)
Replace the category links in the wikitext oldtext by the new links given in new.
  replaceLanguageLinks(oldtext, new, site)
Replace the interwiki language links in the wikitext oldtext by the new links given in new.
  setAction(s)
Set a summary to use for changed page submissions
  space2underline(name)
  underline2space(name)
  unescape(s)
Replace escaped HTML-special characters by their originals
  unicode2html(x, encoding)
Return a unicode string unchanged if it can be encoded in the given encoding, otherwise convert it to HTML # entities.
  unicodeName(name, site, altsite)
  UnicodeToAsciiHtml(s)
  url2link(percentname, insite, site)
Convert a url-name of a page into a proper name for an interwiki link. The argument 'insite' specifies the target wiki.
  url2unicode(percentname, site)
  urlencode(query)
Encode a query so that it can be sent in an HTTP POST request.

Imported modules:
codecs, pywikipedia.config, datetime, difflib, htmlentitydefs, httplib, locale, math, pywikipedia.mediawiki_messages, os, re, socket, sys, time, traceback, urllib, warnings, xml
Imported classes:
set
Imported variables:
__version__, action, edittime, generators, get_throttle, put_throttle, Rmorespaces, Rmoreunderlines

Function Details

addEntity(name)

Convert a unicode name into an ASCII name with entities
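As a standalone Python 3 sketch (not the library's implementation), this conversion can be done with the standard 'xmlcharrefreplace' codec error handler, which writes every non-ASCII character as a decimal &#NNN; entity. Whether the library emits decimal or named entities is not stated here; decimal is assumed.

```python
def add_entity(name):
    """Return `name` as pure ASCII, with other characters written as &#NNN; entities."""
    # 'xmlcharrefreplace' is a built-in codec error handler producing decimal references.
    return name.encode('ascii', 'xmlcharrefreplace').decode('ascii')


print(add_entity('Zürich'))     # Z&#252;rich
print(add_entity('Amsterdam'))  # Amsterdam
```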

allpages(start='!', site=None, namespace=0, throttle=True)

Generator which yields all articles in the home language in alphanumerical order, starting at a given page. By default, it starts at '!', so it should yield all pages.

The objects returned by this generator are all Page objects.

argHandler(arg, moduleName)

Takes a commandline parameter, converts it to unicode, and returns it unless it is one of the global parameters such as -lang or -log. If it is a global parameter, processes it and returns None.

moduleName should be the name of the module calling this function. This is required because the -help option loads the module's docstring and because the module name will be used for the filename of the log.
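The behaviour described above (consume global options, hand every other argument back to the calling bot) can be sketched in Python 3 as follows. The option syntax -lang:xx and -family:name, the SETTINGS dictionary and the '.log' file extension are hypothetical, chosen for illustration; they are not the library's actual interface.

```python
# Hypothetical global settings modified by the options below.
SETTINGS = {'lang': None, 'family': None, 'logfile': None}


def arg_handler(arg, module_name):
    """Process a global option and return None, or return any other argument."""
    if isinstance(arg, bytes):
        # Raw command-line bytes are decoded to text (UTF-8 assumed).
        arg = arg.decode('utf-8')
    if arg.startswith('-lang:'):
        SETTINGS['lang'] = arg[len('-lang:'):]
    elif arg.startswith('-family:'):
        SETTINGS['family'] = arg[len('-family:'):]
    elif arg == '-log':
        # The calling module's name determines the log file name.
        SETTINGS['logfile'] = module_name + '.log'
    else:
        return arg
    return None


remaining = []
for arg in ['-lang:nl', '-log', 'Amsterdam']:  # stands in for sys.argv[1:]
    result = arg_handler(arg, 'interwiki')
    if result is not None:
        remaining.append(result)
print(remaining)  # ['Amsterdam']
```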

categoryFormat(links, insite=None)

Create a suitable string encoding all category links for a wikipedia page.

'links' should be a list of category Page objects.

The string is formatted for inclusion in insite.

Family(fam=None, fatal=True)

Import the named family.

getCategoryLinks(text, site, raw=False)

Returns a list of the category links in the text. Do not call this routine directly; use Page objects instead.
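A regular-expression sketch of category-link extraction, in Python 3 and independent of the library. It assumes the English namespace name 'Category' (other wikis use localized names), and it assumes that raw=True keeps the sort key after '|', by analogy with the rawcategories method of Page.

```python
import re

# Assumption: English namespace name only; other wikis localize "Category".
CATEGORY_RE = re.compile(
    r'\[\[\s*Category\s*:\s*([^\]|]+?)\s*(\|[^\]]*)?\]\]', re.IGNORECASE)


def get_category_links(text, raw=False):
    """Return the category titles linked from `text`, in order of appearance."""
    links = []
    for match in CATEGORY_RE.finditer(text):
        title, sortkey = match.group(1), match.group(2) or ''
        # With raw=True the '|sort key' part is kept, as rawcategories does.
        links.append(title + sortkey if raw else title)
    return links


text = 'Amsterdam is a city.\n[[Category:Cities|Amsterdam]]\n[[category: Capitals]]'
print(get_category_links(text))            # ['Cities', 'Capitals']
print(get_category_links(text, raw=True))  # ['Cities|Amsterdam', 'Capitals']
```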

getEditPage(site, name, read_only=False, do_quote=True, get_redirect=False, throttle=True)

Get the contents of page 'name' from the 'site' wiki.
Do not use this directly; for 99% of purposes you can
use the Page object instead.

Arguments:
    site          - the wiki site
    name          - the page name
    read_only     - If true, doesn't raise LockedPage exceptions.
    do_quote      - ??? (TODO: what is this for?)
    get_redirect  - Get the contents, even if it is a redirect page

This routine returns a unicode string containing the wiki text.

getLanguageLinks(text, insite=None)

Returns a dictionary of other language links mentioned in the text in the form {code:pagename}. Do not call this routine directly; use Page objects instead.
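A minimal Python 3 sketch of how such a {code:pagename} dictionary can be built with a regular expression. It is not the library's code; the KNOWN_LANGUAGES set is a hypothetical stand-in for the language list that, presumably, the wiki family supplies.

```python
import re

# Hypothetical subset of language codes known to the home wiki.
KNOWN_LANGUAGES = {'de', 'en', 'fr', 'nl'}
LANGUAGE_LINK_RE = re.compile(r'\[\[([a-z][a-z\-]*)\s*:\s*([^\]|]+?)\s*\]\]')


def get_language_links(text, known=KNOWN_LANGUAGES):
    """Return {code: pagename} for every interlanguage link in `text`."""
    links = {}
    for code, title in LANGUAGE_LINK_RE.findall(text):
        # Prefixes that are not known language codes (e.g. namespaces) are skipped.
        if code in known:
            links[code] = title
    return links


text = 'Amsterdam is a city.\n[[Category:Cities]]\n[[de:Amsterdam]]\n[[fr:Amsterdam (ville)]]'
print(get_language_links(text))  # {'de': 'Amsterdam', 'fr': 'Amsterdam (ville)'}
```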

getUrl(site, path)

Low-level routine to get a URL from the wiki.

site is a Site object, path is the absolute path.

Returns the HTML text of the page converted to unicode.

interwikiFormat(links, insite=None)

Create a suitable string encoding all interwiki links for a wikipedia page.

'links' should be a dictionary with the language names as keys, and either Page objects or the link-names of the pages as values.

The string is formatted for inclusion in insite (defaulting to your own).

isInterwikiLink(s, site=None)

Try to check whether s is in the form "xx:link" where xx: is a known language. In such a case we are dealing with an interwiki link.
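The check can be sketched in Python 3 as a split on the first colon followed by a lookup of the prefix. This is an illustration only: the language set is hypothetical, and treating the prefix case-insensitively is an assumption.

```python
# Hypothetical subset of language codes known to the home wiki.
KNOWN_LANGUAGES = {'de', 'en', 'fr', 'nl'}


def is_interwiki_link(s, known=KNOWN_LANGUAGES):
    """Return True if s looks like 'xx:link' with xx a known language code."""
    prefix, sep, rest = s.partition(':')
    # Require a colon, a known language prefix and a non-empty link part.
    return bool(sep) and prefix.strip().lower() in known and bool(rest.strip())


print(is_interwiki_link('fr:Paris'))         # True
print(is_interwiki_link('Category:Cities'))  # False
print(is_interwiki_link('Paris'))            # False
```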

link2url(name, site, insite=None)

Convert an interwiki link name of a page to the proper name to be used in a URL for that page. The language of the link is given by 'site'.
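A standalone Python 3 sketch of the usual MediaWiki title-to-URL conversion: spaces become underscores and the result is percent-encoded in the wiki's character encoding. It is not the library's code; the UTF-8 default is an assumption (a site's own encoding may differ), and the set of characters left unescaped is an illustrative choice.

```python
from urllib.parse import quote


def link2url(name, encoding='utf-8'):
    """Convert a page's link name into the form used in its URL."""
    # MediaWiki page URLs use underscores where titles contain spaces.
    name = name.strip().replace(' ', '_')
    # Percent-encode in the target wiki's character encoding, keeping a few
    # characters that are common in titles readable.
    return quote(name, safe='/:()', encoding=encoding)


print(link2url('São Paulo'))             # S%C3%A3o_Paulo
print(link2url('São Paulo', 'latin-1'))  # S%E3o_Paulo
```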

myencoding()

The character encoding used by the home wiki

newpages(number=10, repeat=False, site=None)

Generator which yields new articles one after another. It starts with the article created 'number' articles ago (the first argument). When all of these have been yielded, it fetches NewPages again. If there is no new page, it blocks until there is one, sleeping between subsequent fetches of NewPages.

The objects yielded are dictionaries. The keys are date (datetime object), title (Page object), length (int), user_login (string, only present if the user is logged in), comment (string) and user_anon (string, only present if the user is not logged in).

Throttling is important here, so it is always enabled.

putPage(site, name, text, comment=None, watchArticle=False, minorEdit=True, newPage=False, token=None, gettoken=False)

Upload 'text' on page 'name' to the 'site' wiki. Use of this routine can normally be avoided; use Page.put instead.

removeCategoryLinks(text, site)

Given the wiki-text of a page, return that page with all category links removed.

removeLanguageLinks(text, site=None)

Given the wiki-text of a page, return that page with all interwiki links removed. If a link to an unknown language is encountered, a warning is printed.

replaceCategoryLinks(oldtext, new, site=None)

Replace the category links in the wikitext oldtext by the new links given in new.

'new' should be a list of category Page objects.
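A Python 3 sketch of the remove-then-append approach: strip every existing category link, then add the new ones at the end of the text. This is an illustration rather than the library's code; it takes plain category titles instead of Page objects and assumes the English namespace name 'Category'.

```python
import re

# Assumption: English namespace name only; other wikis localize "Category".
CATEGORY_RE = re.compile(r'\[\[\s*Category\s*:[^\]]*\]\]\s*', re.IGNORECASE)


def replace_category_links(oldtext, new):
    """Drop all category links from `oldtext` and append those in `new`.

    `new` is a list of category titles here, standing in for Page objects.
    """
    text = CATEGORY_RE.sub('', oldtext).rstrip()
    if not new:
        return text
    links = '\n'.join('[[Category:%s]]' % title for title in new)
    return text + '\n\n' + links


old = 'Amsterdam is a city.\n\n[[Category:Cities]]\n[[Category:Old]]'
print(replace_category_links(old, ['Cities', 'Capitals']))
```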

replaceLanguageLinks(oldtext, new, site=None)

Replace the interwiki language links in the wikitext oldtext by the new links given in new.

'new' should be a dictionary with the language names as keys, and either Page objects or the link-names of the pages as values.
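The same remove-then-append idea for interlanguage links, as a standalone Python 3 sketch. The KNOWN_LANGUAGES set is hypothetical, only link-names (not Page objects) are accepted as values, and sorting the new links by language code is an assumption, since wikis differ in their preferred order.

```python
import re

# Hypothetical subset of language codes known to the home wiki.
KNOWN_LANGUAGES = {'de', 'en', 'fr', 'nl'}
LANGUAGE_LINK_RE = re.compile(r'\[\[([a-z][a-z\-]*)\s*:[^\]]*\]\]\s*')


def replace_language_links(oldtext, new, known=KNOWN_LANGUAGES):
    """Remove the interlanguage links in `oldtext` and append those in `new`."""
    def drop_known(match):
        # Only prefixes that are known language codes are removed.
        return '' if match.group(1) in known else match.group(0)

    text = LANGUAGE_LINK_RE.sub(drop_known, oldtext).rstrip()
    # Sorting by language code is an assumption made for this sketch.
    links = '\n'.join('[[%s:%s]]' % (code, new[code]) for code in sorted(new))
    return text + '\n\n' + links if links else text


old = 'Amsterdam is a city.\n\n[[de:Amsterdam]]\n[[fr:Amsterdam (ville)]]\n'
print(replace_language_links(old, {'nl': 'Amsterdam', 'de': 'Amsterdam'}))
```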

setAction(s)

Set a summary to use for changed page submissions

unescape(s)

Replace escaped HTML-special characters by their originals
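A minimal Python 3 sketch of such an unescape step. Which entities the library actually handles is not stated; this version covers the five HTML-special characters, and Python's standard html.unescape is the general alternative that understands all named and numeric entities.

```python
def unescape(s):
    """Replace the escaped HTML-special characters &lt; &gt; &quot; &#39; &amp;."""
    for entity, char in (('&lt;', '<'), ('&gt;', '>'), ('&quot;', '"'), ('&#39;', "'")):
        s = s.replace(entity, char)
    # &amp; must be handled last, or '&amp;lt;' would wrongly become '<'.
    return s.replace('&amp;', '&')


print(unescape('a &lt;b&gt; &amp;lt;'))  # a <b> &lt;
```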

unicode2html(x, encoding)

Given a unicode string, try to encode it in the desired encoding. If that works, return the string unchanged; if it fails, convert it to HTML # entities (numeric character references) instead.
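A Python 3 sketch of this try-then-fall-back logic. It is not the library's code, and the choice to replace only the characters the target encoding cannot represent (rather than the whole string) is an assumption.

```python
def unicode2html(x, encoding):
    """Return x unchanged if `encoding` can represent it; otherwise replace
    the characters it cannot represent by &#NNN; entities."""
    try:
        x.encode(encoding)
    except UnicodeEncodeError:
        # Encodable characters survive the round trip; the rest become entities.
        return x.encode(encoding, 'xmlcharrefreplace').decode(encoding)
    return x


print(unicode2html('Łódź', 'latin-1'))  # &#321;ód&#378;
print(unicode2html('Łódź', 'utf-8'))    # Łódź
```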

url2link(percentname, insite, site)

Convert a url-name of a page into a proper name for an interwiki link. The argument 'insite' specifies the target wiki.
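The reverse of the URL conversion can be sketched in Python 3 by decoding the %XX escapes and turning underscores back into spaces. This sketch ignores the role of 'insite' (formatting the name for a particular target wiki), and its UTF-8 default is an assumption.

```python
from urllib.parse import unquote


def url2link(percentname, encoding='utf-8'):
    """Convert a page name taken from a URL back into a link name."""
    # Decode %XX escapes using the source wiki's encoding, then restore spaces.
    return unquote(percentname, encoding=encoding).replace('_', ' ')


print(url2link('S%C3%A3o_Paulo'))          # São Paulo
print(url2link('S%E3o_Paulo', 'latin-1'))  # São Paulo
```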

urlencode(query)

Encode a query so that it can be sent in an HTTP POST request.
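For comparison, Python 3's standard urllib.parse.urlencode produces the same kind of application/x-www-form-urlencoded body. The wrapper name encode_post_body and the form field names below are illustrative only, not the library's interface.

```python
from urllib.parse import urlencode


def encode_post_body(query, encoding='utf-8'):
    """Encode form fields as an application/x-www-form-urlencoded POST body."""
    # A sequence of pairs keeps the field order stable; values are
    # percent-encoded, with spaces written as '+'.
    return urlencode(query, encoding=encoding).encode('ascii')


body = encode_post_body([('wpTextbox1', 'Hello wiki & world'), ('wpSummary', 'Bot: test')])
print(body)  # b'wpTextbox1=Hello+wiki+%26+world&wpSummary=Bot%3A+test'
```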

Generated by Epydoc 2.1 on Sun Jul 03 17:07:35 2005 http://epydoc.sf.net