
Module pywikipedia.weblinkchecker

This bot checks external links found on the wiki. It checks several pages
at once, up to a limit set by the config variable max_external_links.
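
The bounded concurrency described above can be sketched roughly as follows.
This is only an illustration, not the module's actual code: the function name
check_all, its check callback, and the max_threads parameter (standing in for
max_external_links) are assumptions made for the example.

```python
import threading

def check_all(urls, check, max_threads=5):
    """Call check(url) for every URL, with at most max_threads running at once.

    max_threads is a hypothetical stand-in for config.max_external_links.
    Returns a dict mapping each URL to the result of check(url).
    """
    slots = threading.BoundedSemaphore(max_threads)
    results = {}
    results_lock = threading.Lock()

    def worker(url):
        try:
            outcome = check(url)
            with results_lock:
                results[url] = outcome
        finally:
            slots.release()  # free the slot even if check() raised

    threads = []
    for url in urls:
        slots.acquire()  # blocks while max_threads workers are still running
        thread = threading.Thread(target=worker, args=(url,))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return results
```

The semaphore is acquired before each thread starts and released when it
finishes, so the main loop simply waits whenever the limit is reached.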

The bot won't change any wiki pages; it only reports dead links so that
people can fix or remove the links themselves.

The bot stores all links found dead in a .dat file in the deadlinks
subdirectory. To avoid the removal of links that are only temporarily
unavailable, the bot only reports links that were found dead at least
twice, with a time lag of at least one week. Such links are stored
in a .txt file in the deadlinks subdirectory.

When a link is found alive, it will be removed from the .dat file.
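
The reporting rule above (dead at least twice, at least one week apart, and
forgotten once alive again) could look roughly like the sketch below. It keeps
the history in memory only and is not the module's History class: the class
name DeadLinkHistory and its methods are hypothetical, and the real bot
persists its data in the .dat file.

```python
import time

WEEK = 7 * 24 * 60 * 60  # one week in seconds

class DeadLinkHistory:
    """Hypothetical in-memory history of dead links (illustration only)."""

    def __init__(self):
        # Maps each URL to the timestamp at which it was first found dead.
        self.first_seen_dead = {}

    def record_dead(self, url, now=None):
        """Note that url was found dead; return True if it should be reported."""
        now = time.time() if now is None else now
        first = self.first_seen_dead.setdefault(url, now)
        # Report only if an earlier failure is at least one week old.
        return now - first >= WEEK

    def record_alive(self, url):
        """Forget url once it has been found alive again."""
        self.first_seen_dead.pop(url, None)
```

The first failure only records a timestamp; a later failure triggers a report
only if that timestamp is at least a week old.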

Syntax examples:
    python weblinkchecker.py
        Loads all wiki pages in alphabetical order using the Special:Allpages
        feature.

    python weblinkchecker.py -start:Example_page
        Loads all wiki pages using the Special:Allpages feature, starting at
        "Example page".
    
    python weblinkchecker.py Example page
        Only checks links found in the wiki page "Example page".

    python weblinkchecker.py -sql:20050516.sql
        Checks all links found in an SQL cur dump.

Classes
AllpagesPageContentGenerator  
History Stores previously found dead links.
LinkChecker Given an HTTP URL, tries to load the page from the Internet and checks if it is still online.
LinkCheckThread A thread responsible for checking one page.
SinglePageContentGenerator Pseudo-generator
SqlPageContentGenerator Using an SQL dump file, retrieves all pages that are not redirects (doesn't load them from the live wiki), and yields title/text pairs.
WeblinkCheckerRobot Robot which will use several LinkCheckThreads at once to search for dead weblinks on pages provided by the given generator.
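
As a rough illustration of what a check like LinkChecker's might involve, the
sketch below requests a URL and treats connection errors and 4xx/5xx statuses
as dead. It is an assumption-laden example, not the class's actual logic: the
function names split_url and is_alive are hypothetical, and it uses the
Python 3 modules http.client and urllib.parse rather than the httplib and
urlparse modules this module imports.

```python
import http.client
from urllib.parse import urlsplit

def split_url(url):
    """Return (scheme, host, path including query) for an HTTP(S) URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.scheme, parts.netloc, path

def is_alive(url, timeout=10):
    """Return True if the server answers with a status code below 400.

    Assumption for this sketch: redirects (3xx) count as alive and are
    not followed.
    """
    scheme, host, path = split_url(url)
    if scheme == "https":
        conn_class = http.client.HTTPSConnection
    else:
        conn_class = http.client.HTTPConnection
    try:
        conn = conn_class(host, timeout=timeout)
        try:
            conn.request("GET", path)
            status = conn.getresponse().status
        finally:
            conn.close()
    except (OSError, http.client.HTTPException):
        # DNS failures, refused connections, timeouts, malformed responses.
        return False
    return status < 400
```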

Function Summary
  main()

Imported modules:
codecs, pywikipedia.config, httplib, pywikipedia.pagegenerators, pickle, re, socket, sys, threading, time, urlparse, pywikipedia.wikipedia
Generated by Epydoc 2.1 on Sun Jul 03 17:07:38 2005 http://epydoc.sf.net