NAME¶
MediaWiki::DumpFile::Compat - Compatibility with Parse::MediaWikiDump
SYNOPSIS¶
use MediaWiki::DumpFile::Compat;
$pmwd = Parse::MediaWikiDump->new;
$pages = $pmwd->pages('pages-articles.xml');
$revisions = $pmwd->revisions('pages-articles.xml');
$links = $pmwd->links('links.sql');
ABOUT¶
This software suite provides the tools needed to process the contents of the XML
page dump files and the SQL based links dump file from a MediaWiki instance.
This is a compatibility layer between MediaWiki::DumpFile and
Parse::MediaWikiDump; instead of "use Parse::MediaWikiDump;" you
"use MediaWiki::DumpFile::Compat;". The benefit of using the new
compatibility module is increased processing speed; see the
MediaWiki::DumpFile::Benchmarks documentation for benchmark results.
MORE DOCUMENTATION¶
The original Parse::MediaWikiDump documentation is also available in this
package; it has been updated to include new features introduced by
MediaWiki::DumpFile. You can find the documentation in the following
locations:
- MediaWiki::DumpFile::Compat::Pages
- MediaWiki::DumpFile::Compat::Revisions
- MediaWiki::DumpFile::Compat::page
- MediaWiki::DumpFile::Compat::Links
- MediaWiki::DumpFile::Compat::link
USAGE¶
This module is a factory class that allows you to create instances of the
individual parser objects.
- $pmwd->pages
- Returns a Parse::MediaWikiDump::Pages object capable of parsing an article
XML dump file with one revision per article.
- $pmwd->revisions
- Returns a Parse::MediaWikiDump::Revisions object capable of parsing an
article XML dump file with multiple revisions per article.
- $pmwd->links
- Returns a Parse::MediaWikiDump::Links object capable of parsing an article
links SQL dump file.
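As a rough sketch of how the factory might be used, the following collects the
first few page titles from a dump. The collect_titles helper and the use of
@ARGV for the filename are illustrative assumptions and not part of this
package; next() and title() are the methods described in the Pages and page
documentation listed above.

```perl
use strict;
use warnings;

# Hypothetical helper: pull up to $limit titles from any parser object
# whose next() returns page objects and undef at the end of the dump.
sub collect_titles {
    my ($pages, $limit) = @_;
    my @titles;
    while (@titles < $limit && defined(my $page = $pages->next)) {
        push @titles, $page->title;
    }
    return \@titles;
}

# Run only when a dump file is named on the command line.
if (@ARGV) {
    # Loaded at run time in this sketch; a plain "use" line at the top of
    # the script, as in the SYNOPSIS, is the documented form.
    require MediaWiki::DumpFile::Compat;
    my $pmwd  = Parse::MediaWikiDump->new;
    my $pages = $pmwd->pages($ARGV[0]);
    print "$_\n" for @{ collect_titles($pages, 10) };
}
```

Keeping the loop in a helper that only relies on next() means the same code
should work whether the object came from pages() or revisions().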
General¶
Every call that creates a parser requires a source of data to parse; this
argument can be either a filename or a reference to an already open
filehandle. This entire software suite will die() upon errors in the
file or if internal inconsistencies have been detected. If this concerns you,
wrap the portion of your code that uses these calls in eval().
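One possible way to wrap parsing code in eval() is sketched below. The
try_parse helper is hypothetical and not part of this package; it simply turns
a die() into a returned error message. The module is loaded with require
inside the guarded block, on the assumption that a missing module should be
reported the same way as a broken dump file.

```perl
use strict;
use warnings;

# Hypothetical helper: run a block of parsing code and return
# ($result, undef) on success or (undef, $error) if the block died.
sub try_parse {
    my ($code) = @_;
    my $result;
    my $ok = eval { $result = $code->(); 1 };
    return (undef, $@ || 'unknown error') unless $ok;
    return ($result, undef);
}

# Count the pages in a dump, reporting a failure instead of exiting.
my ($count, $error) = try_parse(sub {
    require MediaWiki::DumpFile::Compat;
    my $pages = Parse::MediaWikiDump->new->pages('pages-articles.xml');
    my $n = 0;
    while (defined(my $page = $pages->next)) {
        $n++;
    }
    return $n;
});

if (defined $error) {
    warn "parse failed: $error";
}
else {
    print "parsed $count pages\n";
}
```

Because the parser can die() part way through a file, any results gathered
before the error should be treated as incomplete.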
COMPATIBILITY¶
Any deviation in the behavior of MediaWiki::DumpFile::Compat from
Parse::MediaWikiDump that is not listed below is a bug. Please report it so
that this package can act as a near-perfect stand-in for the original.
Compatibility is verified by using the existing Parse::MediaWikiDump test
suite with the following adjustments:
Parse::MediaWikiDump::Pages¶
- Parse::MediaWikiDump did not need to load all revisions of an article into
memory when processing dump files that contain more than one revision but
this compatibility module does. The API does not change but the memory
requirements for parsing those dump files certainly do. It is, however,
highly unlikely that you will notice this as most of the documents with
many revisions per article are so large that Parse::MediaWikiDump would
not have been able to parse them in any reasonable timeframe.
- The results from namespaces() are now sorted by namespace ID instead of
being in document order.
- Values from next() are now returned in the same order as they appear in the
SQL file.
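Since the ordering of namespaces() results differs from Parse::MediaWikiDump,
code that relied on document order may be safer looking names up by ID. The
sketch below assumes each entry is an array reference of the form [id, name],
as described in the Parse::MediaWikiDump documentation; the
namespaces_by_id helper and the sample data are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: index [id, name] pairs by namespace ID so lookups do
# not depend on the order in which namespaces() returns them.
sub namespaces_by_id {
    my ($namespaces) = @_;
    my %name_for;
    $name_for{ $_->[0] } = $_->[1] for @$namespaces;
    return \%name_for;
}

# Illustrative data only, deliberately not in ID order.
my $sample   = [ [ 1, 'Talk' ], [ -1, 'Special' ], [ 0, '' ] ];
my $name_for = namespaces_by_id($sample);

# Print the namespaces in numeric ID order.
for my $id (sort { $a <=> $b } keys %$name_for) {
    printf "%d => '%s'\n", $id, $name_for->{$id};
}
```

With a real parser object, the array reference returned by namespaces() would
take the place of $sample.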
LIMITATIONS¶
- This compatibility layer is not yet well tested.