Author Topic: Anyone Got A Working Wiki2bedic.pl (Read 14732 times)

rafm · « **Reply #45 on:** April 22, 2005, 08:04:22 am »

Quote

I'll give it a go today. Hopefully the wireless is up to it - I'll be downloading straight to the Z.

EDIT: Nope. Not a chance. I'll be downloading this one at home.

Just let me know if it works for you.

tovarish · « **Reply #46 on:** April 22, 2005, 06:45:53 pm »

It worked for me, but a lot of the text had "\n"s in it.
It's nice to have it on the Z, though.

tovarish

BarryW · « **Reply #47 on:** April 22, 2005, 10:49:14 pm »

Fixed version works great. Quick question though: which set of fonts has all the cool extras like the pi symbol and stuff like that?

ZDevil · « **Reply #48 on:** April 23, 2005, 08:06:18 am »

Thanks for the effort! To me, the Wikipedia dump itself is quite a selling point for getting a Z.
Got the same issue as posted by others: quite a number of links and texts become either /n or /n*. I also find differences between the entries on the website and the dump.

Please keep it up! Looking forward to seeing an improved version!

rafm · « **Reply #49 on:** May 06, 2005, 12:01:56 pm »

Quote

Fixed version works great. Quick question though: which set of fonts has all the cool extras like the pi symbol and stuff like that?

Math symbols probably won't work if they are shown in Wikipedia as images.

iamasmith · « **Reply #50 on:** May 06, 2005, 12:13:27 pm »

There are quite a few blank articles (try CoventGarden) and still quite a few embedded line feeds that haven't been interpreted in the Wikipedia stuff.

Every time I look at these scripts I think it's an uphill struggle because I don't know Perl well enough... might have a go at something in C++.

kahm · « **Reply #51 on:** May 06, 2005, 01:02:25 pm »

Another blank article is A-10ThunderboltII. It has some on-screen corruption under the title as well.

rafm · « **Reply #52 on:** May 11, 2005, 10:10:38 am »

Does anyone have a version of wiki2bedic.pl that works on the latest SQL dumps? My version (marked in the comments as 0.9 (7.1.2004)) just goes into an infinite loop when run on de.wikipedia or pl.wikipedia.

It may be a good idea to put wiki2bedic.pl into the CVS of the bedic SourceForge project. I would also add some links from the zbedic home page to that file.

lucho · « **Reply #53 on:** May 11, 2005, 10:39:52 am »

I have a version that is (somewhat) working. At least it doesn't go into an infinite loop. The content of the entries is not perfect (I see '\n', {sa} etc.), but I don't have the time (or the free space on my laptop) to fix it.
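
If the stray '\n' sequences are backslash escapes carried over from the string literals in the SQL dump (an assumption; nothing in this thread confirms the cause), a clean-up pass along these lines might help. This is only a sketch in Python rather than Perl. The name `unescape_sql` and the escape table are hypothetical and are not part of wiki2bedic.pl.

```python
import re

# Common backslash escapes found in SQL string literals (illustrative table,
# an assumption about what the dump contains).
_SQL_ESCAPES = {
    "n": "\n",
    "r": "\r",
    "t": "\t",
    "0": "\0",
    "\\": "\\",
    "'": "'",
    '"': '"',
}


def unescape_sql(text: str) -> str:
    """Turn backslash escapes such as a literal backslash + 'n' into real characters.

    Unknown escapes simply drop the backslash (a simplification).
    """
    def replace(match):
        char = match.group(1)
        return _SQL_ESCAPES.get(char, char)

    # Match a backslash followed by any single character, newline included.
    return re.sub(r"\\(.)", replace, text, flags=re.DOTALL)
```

Scanning left to right matters here: an escaped backslash followed by "n" has to stay a backslash plus "n" rather than become a line break, which a naive search-and-replace of "\n" would get wrong.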

chrisg · « **Reply #54 on:** May 12, 2005, 11:35:49 am »

I am currently working on new dumps for Wikipedia (check out http://www.crispy-cow.de/wikimedia/). I hope Rafal ("rafm") and I can work together on improving things.

BarryW · « **Reply #55 on:** May 12, 2005, 12:01:07 pm »

Quote

Quote
Fixed version works great. Quick question though, which set of fonts has all the cool extra's like the pi symbol and stuff like that??

Math symbols probably won't work if they are shown in Wikipedia as images.

They were there with the last version, maybe it's changed.

rafm · « **Reply #56 on:** May 12, 2005, 12:22:01 pm »

Quote

I have a version that is (somewhat) working. At least it doesn't go into an infinite loop. The content of the entries is not perfect (I see '\n', {sa} etc.), but I don't have the time (or the free space on my laptop) to fix it.

Could you put your version of the script into the CVS of the bedic SF project?

Thanks.

iamasmith · « **Reply #57 on:** May 12, 2005, 12:24:35 pm »

Quote

I am currently working on new dumps for Wikipedia (check out http://www.crispy-cow.de/wikimedia/). I hope Rafal ("rafm") and I can work together on improving things.

I have been looking at the quality of some of the dumps in BEDIC format and have seen some of the \n and {} type artifacts, blank articles etc. I have always felt a little powerless to help, given that I really don't have the Perl skills needed to understand the scripts in depth. I am thinking about producing something in C++ capable of doing this, with extensible markup translation parsers for this project. Initially I have written an extending ring-buffer module capable of reading articles into memory for processing, using the minimum amount of RAM but accommodating some of the larger articles.

I have tested this ring buffer technique by letting it read the current dumps, which include archive articles approximately 3Mb in size (giving ~1500 x 512-byte buffers in the ring).
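
The module itself is not posted in this thread, so the following is only a rough sketch of the general idea of a ring buffer that extends itself when an article outgrows it. It is written in Python rather than the C++ mentioned above, and it uses one contiguous buffer sized in whole 512-byte blocks instead of a ring of separate buffers. The class name `GrowingRingBuffer` and all of its methods are hypothetical.

```python
class GrowingRingBuffer:
    """Byte ring buffer whose capacity grows in whole blocks when it fills up."""

    def __init__(self, block_size=512, blocks=4):
        self.block_size = block_size
        self.buf = bytearray(block_size * blocks)
        self.head = 0  # index of the oldest unread byte
        self.size = 0  # number of unread bytes currently stored

    def _contents(self):
        # Return the unread bytes in order, unwrapping around the end.
        first = min(self.size, len(self.buf) - self.head)
        return (bytes(self.buf[self.head:self.head + first])
                + bytes(self.buf[:self.size - first]))

    def _grow(self, needed):
        # At least double the capacity, rounded up to a whole number of blocks.
        target = max(2 * len(self.buf), needed)
        blocks = -(-target // self.block_size)  # ceiling division
        data = self._contents()
        self.buf = bytearray(blocks * self.block_size)
        self.buf[:len(data)] = data
        self.head = 0

    def write(self, data: bytes):
        if self.size + len(data) > len(self.buf):
            self._grow(self.size + len(data))
        cap = len(self.buf)
        tail = (self.head + self.size) % cap
        first = min(len(data), cap - tail)
        self.buf[tail:tail + first] = data[:first]
        # Whatever did not fit before the end wraps to the start.
        self.buf[:len(data) - first] = data[first:]
        self.size += len(data)

    def read(self, n: int) -> bytes:
        n = min(n, self.size)
        cap = len(self.buf)
        first = min(n, cap - self.head)
        out = (bytes(self.buf[self.head:self.head + first])
               + bytes(self.buf[:n - first]))
        self.head = (self.head + n) % cap
        self.size -= n
        return out
```

Small articles reuse the same memory by wrapping around, and the buffer is only enlarged when a single article needs more room than the ring currently has.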

My next step is to work on the markup translation, so I am going to need a complete understanding of the markup used in the Wikipedia articles (this should be fairly easy, as I expect it uses the documented Wiki tags directly) and of the ZBedic markup tags.

I know that the libbedic/doc directory describes the database format and the tags used in the markup, but I wanted first of all to check whether this is up to date or whether I should be pulling the ZBedic markup renderer apart to find new tags... or, if anyone has a more up-to-date list of markup tags, could they possibly share it please?

- Andy

zuli · « **Reply #58 on:** May 28, 2005, 09:21:58 am »

There are new wikis in German and English made by Christian Geyer on his homepage: http://www.crispy-cow.de/wikimedia/

Uli

kahm · « **Reply #59 on:** May 28, 2005, 03:02:33 pm »

It doesn't look like he's got the English Wikipedia up there yet. Just the German one.
