How long do you think it'll take to come up with a well-formatted version (and what exactly does "nicely formatted" mean)? Will there be hyperlinks within the text, or will that be left out? Also, given the size of the files, am I right in assuming that this will be a "do it yourself" kind of project (i.e. you write the scripts, and the people who want it download the raw Wikipedia data and then run them to create the EPWING dictionary version)? If so, will it require a Linux computer? I only have access to Windows and Mac boxes.
I have already generated an unformatted Wikipedia (in English and, as a test, in Japanese), which uses only the FreePWING library's text encoder. The hyperlinks will not be active, since I can't find documentation that explains how to make inter-dictionary and internet hyperlinks. Everything else works.
The issue with distributing the program is that I packaged up Loeffler's parser and the FreePWING libraries with a commercial program into a .NET library. I don't think this package can be legally distributed, because I don't have the source to the packaging of the Perl runtime inside it. The programs I wrote require Windows, this library, and Cygwin (a Linux-like environment on Windows). I'm going to distribute the generated Wikipedia over BitTorrent. Eventually, somebody with bandwidth could host it.
Currently, I'm stuck on an encoding bug. The FreePWING parser wants ASCII text, but the Wikipedia data is encoded in UTF-8. That means I can format all day, but an accented character, which UTF-8 stores as two bytes, is seen by the parser as two characters, one of them invalid, instead of one character.
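To make the bug concrete, here is a minimal sketch in Python (not the Perl/.NET toolchain described above) of what happens when UTF-8 text is read one byte per character, along with one possible workaround. The helper names `as_single_byte_view` and `to_ascii` are made up for illustration, and Latin-1 only stands in for "any one-byte-per-character reading"; I am assuming that is roughly how the parser treats its input.

```python
import unicodedata


def as_single_byte_view(text: str) -> str:
    """Show how UTF-8 bytes look to a reader that assumes one byte per character.

    Latin-1 is used only as a stand-in for such a reader (an assumption).
    """
    return text.encode("utf-8").decode("latin-1")


def to_ascii(text: str) -> str:
    """Lossy workaround: decompose accented letters (NFKD), then drop
    every code point that has no ASCII representation."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


word = "caf\u00e9"  # "café": four characters, the last one accented

# The accented letter takes two bytes in UTF-8, so the byte count is
# one more than the character count.
print(len(word), len(word.encode("utf-8")))

# A byte-per-character reader sees two characters where there was one.
print(len(as_single_byte_view(word)))

# After transliteration the text is plain ASCII.
print(to_ascii(word))
```

Note that this kind of transliteration simply discards anything without an ASCII equivalent, so it would empty out Japanese text entirely. It could only help with the English dump, and only if losing the accents is acceptable.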