OESF Portables Forum
Everything Else => General Support and Discussion => Zaurus General Forums => Archived Forums => Software => Topic started by: iamasmith on December 12, 2004, 03:11:53 pm
-
Hi, I have a couple of versions of wiki2bedic.pl and neither of them converts current Wikipedia databases.
I think the latest prebuilt Wikipedia for zbedic is from around 10th July and there's been a lot of activity since.
Anyone got a working version of wiki2bedic.pl for the current database format?
- Andy
-
More specifically, if I run the wiki2bedic script on a current database it produces a bedic.dic file which gives an 'Integrity Failure' when opened on the desktop using bedic (yep, I want to move it to the Z, but I guess it should also work using desktop bedic).
Also there doesn't seem to be an index property at the beginning of the file with the version that I have.
Anyone able to point me at a working version?
-
Hi,
you are on the right track.
You need this:
http://www.freedict.de/download/wiki2bedic.pl
Then you need libbedic, from here:
http://sourceforge.net/project/showfiles.php?group_id=51673&package_id=56566
Then you have to replace dictionary.cpp with this:
http://www.freedict.de/download/dictionary.cpp
(I've heard this may not work with the newest version of libbedic, but I once had success with version 0.9.1.)
Then:
1) make
2) make xerox
After that you will get a binary named xerox. Then:
xerox -d wikipedia.dic wikipedianew.dic
After that you should pack wikipedianew.dic with dictzip.
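For reference, the whole run looks roughly like this as a shell session (the archive name, paths and .dic file names below are only examples/guesses - adjust them to your setup):
# fetch the converter script and the replacement dictionary.cpp
wget http://www.freedict.de/download/wiki2bedic.pl
wget http://www.freedict.de/download/dictionary.cpp
# unpack libbedic 0.9.1 (archive name is a guess) and drop in the patched file,
# replacing the original dictionary.cpp wherever it lives in the tree
tar xzf libbedic-0.9.1.tar.gz
cp dictionary.cpp libbedic-0.9.1/
cd libbedic-0.9.1
make
make xerox
cd ..
# run the converter on the Wikipedia SQL dump (see the script itself for the
# paths it expects), then sort/index the result with xerox
perl wiki2bedic.pl
./libbedic-0.9.1/xerox -d wikipedia.dic wikipedianew.dic   # the script may name its output bedic.dic
# finally compress for zbedic
dictzip wikipedianew.dic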
Please tell me if it worked. I have to do this again for the German Wikipedia but I deleted my environment, because another guy promised to provide the German community with up-to-date Wikipedias, and the folks out there are now running out of Wikipedia updates.
Cheers,
Sam
-
Then you have to replace dictionary.cpp with this:
http://www.freedict.de/download/dictionary.cpp
(I've heard this may not work with the newest version of libbedic, but I once had success with version 0.9.1.)
Could you briefly explain the changes in dictionary.cpp, so that I can add them to the latest version of zbedic? Thanks.
-
Hi,
I am not the author of dictionary.cpp.
It comes from Horst at freedict.de; the only thing I did was translate his posts for the German community and add a few missing details from my own experience.
I think Horst is a nice guy and you can send him an email. If he doesn't respond, I can also write to him in German, since zbedic development seems important to me.
About the file itself: I think Horst made/changed it to be able to convert that Wikipedia format for zbedic.
Hope this helps,
Cheers,
Sam
-
OK, I see xerox builds the index too when it sorts the dictionary.
Tried the versions you suggested, which resulted in some success. However, some articles don't seem to have correctly processed format strings and have \n embedded in the text (the two literal characters, not processed), and if you start to type 'zaurus' in bedic the thing just hangs.
I've now reverted to the unpatched libbedic and a modified version of wiki2bedic.pl, so I'll post an update with the results.
- Andy
-
No, the older version of xerox fails with a segmentation fault before completion.
The newer version of xerox does complete; however, searching for anything in the range X, Y or Z makes bedic crash. Is there an upper limit on the index size, I wonder? The cur (en) version of the Wikipedia database now has 425325 entries...
-
OK, the unpatched version of xerox just runs and runs, consuming memory. I set up my system with 3 GB of swap and 1 GB of RAM, and it eventually failed with a segmentation fault before the swap was depleted.
The patched version seems to run quite nicely without consuming all that memory; however, I think the wiki2bedic.pl script is not doing all that it could. It's producing some blank articles, and other articles have pretty poor formatting (I think there are now extra markup characters in the articles, and \n appears quite a lot in the text).
I'm not really a Perl programmer, but I will take a look to see if there's a sensible way of adding the extra markup (bullet lists particularly seem to fail).
Lookups using qbedic still hang when typing in the article name, and a few articles seem to lack trimming on the article name (they have a leading space).
So, again if anyone is maintaining this script and has a later version I could try then that would be good.
- Andy
-
Hi,
you are on the right track.
You need this:
http://www.freedict.de/download/wiki2bedic.pl
...
I know this is a quite old issue, but maybe somebody can help.
I would like to build a bedic file from the latest Wikipedia dumps. I downloaded the latest dumps from http://download.wikimedia.org/ and wiki2bedic.pl from http://www.freedict.de/download/wiki2bedic.pl, but on running wiki2bedic.pl I got either:
'Cannot opendir /usr/src/packages/zaurus/wikip/1/wiki/de/ : No such file or directory'
or the script ran forever.
Is there a newer version of wiki2bedic.pl? Do I need any other software? Am I using the right Wikipedia dump files?
Thanks.
-
I know this is a quite old issue, but maybe somebody can help.
...
'Cannot opendir /usr/src/packages/zaurus/wikip/1/wiki/de/ : No such file or directory'
...
You have to either change the script or create the mentioned directory before you run the script.
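For example, using the path from the error message above (adjust it if you edit the script instead):
mkdir -p /usr/src/packages/zaurus/wikip/1/wiki/de/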
-
Hi,
I've used the wiki2bedic.pl script (it took 10 hours on the English Wikipedia) and I get a bedic.dic file of 1.3 GB
When I do xerox -d bedic.dic bedic2.dic it works, no errors...
But the resulting file is only about 28 MB.
I'm using libbedic version 0.9.1 with the dictionary.cpp patch.
Version 0.9.4 doesn't compile with the dictionary.cpp patch, and unpatched it produces the same 28 MB file as before...
Any suggestions?
Cheers
Yannick
-
I've used the wiki2bedic.pl script (it took 10 hours on the English Wikipedia) and I get a bedic.dic file of 1.3 GB.
...
But the resulting file is only about 28 MB.
0.9.4 already contains the patch from dictionary.cpp. 28 MB seems too good a result for the compression. Could you send me your wiki2bedic.pl script and the URL you downloaded the Wikipedia dump from, so I can take a look at what goes wrong in xerox.
-
Hi,
thanks for your help :-) Yeah, 28 megs sounds like a revolutionary compression. Maybe we should patent it? arg, no, forget it
Here is where I downloaded the dump:
http://download.wikimedia.org/archives/en/20050209_cur_table.sql.bz2
The script I'm using is the following:
http://elelome.files5.free.fr/wiki2bedic.pl
Thanks so much
Cheers
Yannick
PS: in case it's needed, my e-mail is yannickd AT gmail DOT com or (for gmail haters)
yannick.dutertre AT enst-bretagne DOT fr
-
thanks for your help :-) Yeah, 28 megs sounds like a revolutionary compression.
I have the same problem using the wiki2bedic.pl from freedict; I end up with a file that is waaaay too small...
-
While I'm at it, is zbedic able to display pictures (meaning, do I have to bother with the images and the LaTeX things)?
Thanks
Yannick
-
While I'm at it, is zbedic able to display pictures (meaning, do I have to bother with the images and the LaTeX things)?
A zbedic dictionary file cannot contain any images, but since it displays HTML text, Wikipedia articles can theoretically refer to external image files. It may work if you use only absolute paths to images. I have never tried it, so there is no guarantee it works. And of course you would need huge storage space.
So far I haven't found the time to check why xerox fails with the Wikipedia file, but it is on my todo list.
-
I've used the wiki2bedic.pl script (it took 10 hours on the English Wikipedia) and I get a bedic.dic file of 1.3 GB.
...
But the resulting file is only about 28 MB.
OK, I found the problem. There is an entry in Wikipedia that is longer than 500000 bytes, which is the limit set by the wiki2bedic.pl script. If this limit is exceeded, xerox fails without printing any error :-( (I am currently working on a new version of xerox, which should be more informative about errors).
To fix the problem, just change the line in wiki2bedic.pl from:
print PAGE "max-entry-length=500000\n";
to
print PAGE "max-entry-length=1024000\n";
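(Or as a one-liner, assuming GNU sed with in-place editing:)
sed -i 's/max-entry-length=500000/max-entry-length=1024000/' wiki2bedic.pl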
Have fun!
-
Could someone host the English Wikipedia .dic file somewhere? The one on zbedic's site is a bit outdated.
tovarish
-
Thanks so much rafm, I'm converting the English file of February the ninth. I hope it will work.
I've noticed something though: apparently the new dumps (from 2005/03/09) are not compatible with wiki2bedic.pl, whether bunzipped or not, which would imply a slight change in the SQL format Wikimedia uses.
Cheers
Yannick
-
Thanks so much rafm, I'm converting the English file of February the ninth. I hope it will work.
Thanks from me too, I'm busy converting the exact same file :-) It's on to the xerox stage now... thanks rafm
-
Could someone host the English Wikipedia .dic file somewhere? The one on zbedic's site is a bit outdated.
tovarish
If someone makes it I'll put it up on my site, it's a .mac site so no worries on bandwidth!!
-
If someone makes it I'll put it up on my site, it's a .mac site so no worries on bandwidth!!
Yes, I would really appreciate it; I don't have the resources (disk space and RAM) to convert it myself.
tovarish
-
I've made a wikipedia.dic.dz from the February 9th 2005 dump - I needed about 5 GB of space: 2 GB for the original SQL dump, 1.3 GB for the wiki2bedic .dic and another 1.3 GB for the xeroxed wikipedia.dic version, then a final 0.5 GB for the compressed version.
The file came to 1.3 GB after the xerox process; dictzip wikipedia.dic gave me a 412 MB file.
This loads into zbedic and passes the integrity check. I noticed some textual problems, but mainly the program seems to lock any time I search for something past N in the alphabet. I have a 32 MB swap file activated; I'll experiment some more when I get back home later and see if it is actually usable.
If it works well, I don't mind uploading it somewhere - I can do it later this week from university on a very fast (hopefully) connection. I'll keep the thread posted if anyone is interested.
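(In case anyone wants to try the same and has no swap yet, a swap file on a card can be set up roughly like this - the 32 MB size and the CF mount point are just examples, use whatever fits your card:)
dd if=/dev/zero of=/mnt/cf/swapfile bs=1024 count=32768
mkswap /mnt/cf/swapfile
swapon /mnt/cf/swapfile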
-
Do you have to download both the old and the current dumps, or will just the current do? Also, what are the steps for converting? I haven't really been able to find a how-to.
-
I've made a wikipedia.dic.dz from the February 9th 2005 dump ... If it works well, I don't mind uploading it somewhere - I can do it later this week from university on a very fast (hopefully) connection.
Did this ever make it anywhere? I'd love a current Wikipedia - especially with all the free HD space on the C3000.
-
Sooner or later I would like to put Wikipedia files for English and other languages on the zbedic web page.
I will also check why zbedic locks up past the letter "N".
-
I'm not sure it's letter-related... I noticed this too with an earlier version of Wikipedia... it may have something to do with size, or I may be wrong.
This is why I ended up running Wikipedia as a set of static web pages rather than as a dictionary, but it would be nice to get it sorted.
-
I'm currently running a June 2004 Wikipedia without issue. It's addictive - I wasted a couple of hours reading random entries after I installed it.
This one seems too old to have a Zaurus entry, though. =(
-
I'm currently running a June 2004 Wikipedia without issue. ...
What's the current status of Wikipedia for the Zaurus? How can I set it up as a noob?
All the reading around the Internet didn't help me. :-(
-
What's the current status of Wikipedia for the Zaurus? How can I set it up as a noob? ...
Update: I found and fixed the problem in zbedic with the latest Wikipedia dump (the file size caused an overflow in some arithmetic operations). The fix will be included in the upcoming 0.9.5 release. The latest Wikipedia for zbedic will probably (if quota allows) be available at the zbedic home page.
-
nice
-
ZBEDic 0.9.5, with the lock-on-huge-files bug fixed, is available at the zbedic home page (http://bedic.sf.net/). A newer English Wikipedia file can be downloaded from the SourceForge project page as well.
Have fun!
-
I'm glad you've been updating this app, it's one of the highlights of having my Z. For those wanting the newer wiki, be aware that it's grown a little... from 191 MB to a whopping 412 MB.
-
For those wanting the newer wiki, be aware that it's grown a little... from 191 MB to a whopping 412 MB.
That's what gigabyte+ flash cards and C3000's were made for.
-
I've been downloading it for the past 4 hours!! 2 left. Good thing I have a 1 GB SD and a 1 GB CF card!
-
I've been downloading it for the past 4 hours!! 2 left. ...
Most of the North American SourceForge mirrors don't seem to have the file yet. I had to grab it from somewhere in Europe at ~40kb/s.
I haven't installed it yet, as I'm currently mucking about with everything, so I don't have a stable platform to bother copying a 420 MB file to. Definitely looking forward to it though!
-
Most of the North American SourceForge mirrors don't seem to have the file yet. I had to grab it from somewhere in Europe at ~40kb/s. ...
The one from Paris, France is not bad - keeping steady for me at ~150kb/s right now.
-
I cannot use the Wikipedia dictionary. It shows status: 'entry too long' and the checkbox is grayed out.
tovarish
-
Same here. Crap, I already deleted the old one....
-
Same here. Crap, I already deleted the old one....
You probably have to install the newer zbedic too.
-
Nope, I had the new version already installed. Tried redownloading the new Wikipedia, twice. Same result: file too big. I'm redownloading the old one now.
-
anyone got the new wikipedia to work?
-
anyone got the new wikipedia to work?
It might be that I uploaded a broken file to SF. I will check it today. Meanwhile I have hidden the file so people don't waste bandwidth and time. Sorry.
The latest Wikipedia requires ZBEDic 0.9.5.
-
OK, the file with the English Wikipedia at sf.net should be correct now. The previous one was actually damaged.
I uploaded the corrected file, downloaded and checked it. Everything seemed to work.
File: en-wikipedia_0.9.5_20050209.dic.dz
Size: 432311254 bytes
MD5SUM: 3216ee0a94009621522f42d5a5e6b48e
You need zbedic 0.9.5 to use that wikipedia file!
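(To check your download - plain md5sum, nothing zbedic-specific:)
md5sum en-wikipedia_0.9.5_20050209.dic.dz
# should print: 3216ee0a94009621522f42d5a5e6b48e  en-wikipedia_0.9.5_20050209.dic.dz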
-
OK, the file with the English Wikipedia at sf.net should be correct now. The previous one was actually damaged.
...
You need zbedic 0.9.5 to use that wikipedia file!
I'll give it a go today. Hopefully the wireless is up to it - I'll be downloading straight to the Z.
EDIT: Nope. Not a chance. I'll be downloading this one at home.
-
I'll give it a go today. ... EDIT: Nope. Not a chance. I'll be downloading this one at home.
Just let me know if it works for you.
-
It worked for me, but a lot of the text has "\n"s in it.
It's nice though to have it on the Z.
tovarish
-
The fixed version works great. Quick question though: which set of fonts has all the cool extras like the pi symbol and stuff like that?
-
Thanks for the efforts! To me the Wikipedia dump itself is quite a killer factor for getting a Z.
Got the same issue as posted by others: quite a number of links and texts become either \n or \n*. And I also see differences between the entries on the website and the dump.
Please keep it up!! Looking forward to seeing an improved version!
-
Quick question though: which set of fonts has all the cool extras like the pi symbol and stuff like that?
Math symbols probably won't work if they are shown in Wikipedia as images.
-
There are quite a few blank articles (try CoventGarden), and still quite a few embedded line feeds that haven't been interpreted in the Wikipedia content.
Every time I look at these scripts I think it's an uphill struggle, because I don't know Perl well enough... I might have a go at something in C++.
-
Another blank article is A-10ThunderboltII. It has some on-screen corruption under the title as well.
-
Does anyone have a version of wiki2bedic.pl that would work on the latest SQL dumps? My version (marked in the comments as 0.9 (7.1.2004)) only goes into an infinite loop when run on de.wikipedia or pl.wikipedia.
It may be a good idea to put wiki2bedic.pl under the CVS of the bedic SourceForge project. I would also add some links from the zbedic home page to that file.
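For anyone with developer access, adding the script would look roughly like this (the SourceForge CVS host and the 'bedic' module name are assumptions - substitute your own login):
export CVS_RSH=ssh
cvs -d:ext:username@cvs.sourceforge.net:/cvsroot/bedic checkout bedic
cp wiki2bedic.pl bedic/
cd bedic
cvs add wiki2bedic.pl
cvs commit -m "add wiki2bedic.pl converter script" wiki2bedic.pl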
-
I have a version that is (somewhat) working. At least it doesn't go into an infinite loop. The content of the entries is not perfect -- I see '\n', {sa}, etc. -- but I don't have the time (or the free space on my laptop) to fix it.
-
I am currently working on new dumps for Wikipedia (check out http://www.crispy-cow.de/wikimedia/). I hope Rafal ("rafm") and I can work together on improving things.
-
Math symbols probably won't work if they are shown in Wikipedia as images.
They were there with the last version; maybe it's changed.
-
I have a version that is (somewhat) working. At least it doesn't go into an infinite loop. ...
Could you put your version of the script into the CVS of the bedic SF project?
Thanks.
-
I am currently working on new dumps for Wikipedia (check out http://www.crispy-cow.de/wikimedia/). Hope Rafal ("rafm") and I can work together on improving things.
I have been looking at the quality of some of the dumps in bedic format, have seen some of the \n and {} type artifacts, blank articles, etc., and have always felt a little powerless to help, given that I really don't have the prerequisite Perl skills necessary to intimately understand the scripts. I am thinking about producing something in C++ capable of doing this, with extensible markup-translation parsers, for this project. Initially I have written an extending ring-buffer module capable of reading articles into memory for processing, using the minimum amount of RAM while accommodating some of the larger articles.
I have tested this ring-buffer technique by letting it read the current dumps, which include archived articles approximately 3 MB in size (giving ~1500 x 512-byte buffers in the ring).
My next step is to work on the markup translation, and therefore I am going to need a complete understanding of the markup used in the Wikipedia articles (this should be fairly easy - I expect it uses the documented wiki tags directly) and of the zbedic markup tags.
I know that the libbedic/doc directory describes the database format and the tags used in the markup, but I wanted first of all to check whether this is up to date or whether I should be pulling the markup renderer in zbedic apart to determine new tags... or if anyone has a more up-to-date list of markup tags, could they possibly share it please?
- Andy
-
There are new wikis in German and English made by Christian Geyer on his
homepage: http://www.crispy-cow.de/wikimedia/
Uli
-
It doesn't look like he's got the English Wikipedia up there yet. Just the German one.
-
It doesn't look like he's got the English Wikipedia up there yet. Just the German one.
I'm currently working on this. I will announce it when I am finished.
-
I know that the libbedic/doc directory describes the database format and the tags used in the markup, but I wanted first of all to check whether this is up to date... or if anyone has a more up-to-date list of markup tags, could they possibly share it please?
libbedic/doc/bedic-format.txt has an up-to-date specification of all the tags (everything that is available in zbedic 0.9.5). However, I am currently working on an extended set of tags. Send me your email address as a PM, so I can send you a draft of the new specification.
Additionally, zbedic can display most HTML tags (everything QTextBrowser can show), so converting Wikipedia articles to zbedic format should not be that difficult.
To consolidate the work on Wikipedia -> zbedic converters, I would strongly suggest developing them under the CVS of the bedic SF project (or perhaps as a new SF project). There seem to be quite a few people interested in developing such software, but there is not much coordination and the sources are not published, so most of the effort is unfortunately lost. If anyone is interested in developing such a converter under the bedic CVS (sources should be under the GPL), please send me a PM.