MediaWiki extractor

Wandora's [http://www.mediawiki.org/wiki/MediaWiki MediaWiki] extractor lets you gather topics and associations from large knowledge repositories such as [http://www.wikipedia.org Wikipedia]. The extractor cannot handle the HTML version of a MediaWiki page; it requires the XML export of the page. The extractor reads the XML export and creates a topic for the page. The page content is attached to the topic as a text data occurrence.

The extractor is started with '''File > Extract > Wiki > [[MediaWikiExtractor|MediaWiki extractor]]'''. You can extract data from local XML files or directly from a MediaWiki site using the export URL of the page. For example, the export URL of this page is http://www.wandora.net/wandora/wiki/index.php?title=Special:Export/MediaWiki_extractor

'''Note: Neither Wandora nor its authors can grant you permission to use the content of any MediaWiki site. Wandora provides only the technology to create topic maps from MediaWiki pages. Read the content license of the MediaWiki site carefully before using the extractor.'''

== Postprocessing MediaWiki extracted topics ==

The MediaWiki extractor does not process the content of extracted pages. However, associations can be created from page content with another Wandora tool. The context menu has a tool called '''Topics > Associations > [[FindAssociationInOccurrence|Find associations in occurrence...]]''' that extracts associations from text data.
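
The idea of recognizing topics in occurrence text with a regular expression can be illustrated with a minimal Python sketch. This is not Wandora's implementation: the pattern <code>WIKI_LINK</code> and the functions <code>find_link_targets</code> and <code>build_associations</code> are hypothetical names introduced here, and the sketch assumes that related topics appear in the page text as internal wiki links such as <code><nowiki>[[Target|label]]</nowiki></code>.

```python
import re

# Hypothetical pattern (an assumption, not taken from Wandora): captures the
# target of an internal wiki link such as [[Target]] or [[Target|label]].
# An optional #section suffix is matched but not captured.
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def find_link_targets(text, pattern=WIKI_LINK):
    """Return distinct link targets in text, in order of first appearance."""
    targets = []
    for match in pattern.finditer(text):
        target = match.group(1).strip()
        if target and target not in targets:
            targets.append(target)
    return targets


def build_associations(page_title, text, pattern=WIKI_LINK):
    """Pair the page with every recognized target as (page, target) tuples."""
    return [(page_title, target) for target in find_link_targets(text, pattern)]


if __name__ == "__main__":
    sample = (
        "See [[FindAssociationInOccurrence|Find associations in occurrence...]] "
        "and the separate [[Wikipedia extractor]]."
    )
    for association in build_associations("MediaWiki extractor", sample):
        print(association)
```

External links written with single brackets are deliberately not matched, so only links to other wiki pages become association players in this sketch.
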
The '''Find associations in occurrence...''' tool requires the type and scope of the occurrence to process, the role of the topic in the new associations, and a regular expression that recognizes the topics to extract from the text data.

== See also ==

Wandora also contains a separate [[Wikipedia extractor]], which is a graphical front end for the MediaWiki extractor described here.