Full Version: Anyone Got A Working Wiki2bedic.pl
iamasmith
Hi, I have a couple of versions of wiki2bedic.pl and neither of them converts current Wikipedia databases.

I think the latest prebuilt Wikipedia for zbedic is from around 10th July and there's been a lot of activity since.

Anyone got a working version of wiki2bedic.pl for the current database format?

- Andy
iamasmith
More specifically, if I run the wiki2bedic script on a current database, it produces a bedic.dic file which gives 'Integrity Failure' when opened on the desktop using bedic (yes, I want to move it to the Z, but I guess it should also work using bedic).

Also, there doesn't seem to be an index property at the beginning of the file with the version that I have.

Anyone able to point me at a working version?
Teletubbie
Hi,
you are on the right track.
You need this:
http://www.freedict.de/download/wiki2bedic.pl
Then:
libbedic from here:
http://sourceforge.net/project/showfiles.p...ackage_id=56566

Then you have to replace dictionary.cpp
with this:
http://www.freedict.de/download/dictionary.cpp

(I have heard this may not work with the newest version of libbedic, but I once had success with version 0.9.1.)

Then:
1) make
2) make xerox

After that you will get a binary named xerox. Then:
xerox -d wikipedia.dic wikipedianew.dic

After that you should compress wikipedianew.dic with dictzip.
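
The steps above can be sketched as a small Python script. This is only a sketch under assumptions: the function names are made up for illustration, the commands are assumed to run inside the libbedic source directory with the replacement dictionary.cpp already in place, the xerox binary is assumed to end up in the current directory, and the compression tool is assumed to be dictzip (the name used later in this thread). By default it only prints the commands; the line that would actually run them is commented out.

```python
import subprocess


def build_commands(source_dic, sorted_dic):
    """Return the build and conversion steps as argument lists.

    Assumes the current directory is the libbedic source tree with the
    replacement dictionary.cpp already copied in (hypothetical layout).
    """
    return [
        ["make"],                                   # 1) build libbedic
        ["make", "xerox"],                          # 2) build the xerox tool
        ["./xerox", "-d", source_dic, sorted_dic],  # sort and index the dictionary
        ["dictzip", sorted_dic],                    # compress the result
    ]


def run_all(commands):
    """Run each step in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    for argv in build_commands("wikipedia.dic", "wikipedianew.dic"):
        print(" ".join(argv))
    # Uncomment to really run the steps:
    # run_all(build_commands("wikipedia.dic", "wikipedianew.dic"))
```

Using check=True means a failing make or xerox step stops the run instead of silently passing a broken file on to the next step.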

Please tell me if it worked. I have to do this again for the German Wikipedia, and I deleted my environment because another guy promised to provide the German community with up-to-date Wikipedias; the folks out there are now left without a current one.
Cheers,
Sam
rafm
QUOTE(Teletubbie @ Dec 12 2004, 10:56 PM)
Then you have to replace dictionary.cpp
with this:
http://www.freedict.de/download/dictionary.cpp

(I have heard this may not work with the newest version of libbedic, but I once had success with version 0.9.1.)

Could you briefly explain the changes in dictionary.cpp, so that I can add those changes to the latest version of zbedic? Thanks.
Teletubbie
Hi,
I am not the author of dictionary.cpp.
It comes from Horst of freedict.de; the only thing I did was translate his posts from the German community and add some missing details from my own experience.
I think Horst is a nice guy and you can send him an email. If he doesn't respond, I can also write a German email to him, since zbedic development seems important to me.
About the file itself: I think Horst made or changed it to be able to convert that Wikipedia format to zbedic.
Hope this helps,
Cheers,
Sam
iamasmith
OK, I see xerox builds the index too when it sorts the dictionary.

Tried the versions you suggested, which resulted in some success. However, some articles don't seem to have correctly processed format strings and have \n embedded in the text (the two literal characters, not a processed newline), and if you start to type zaurus in bedic the thing just hangs.

Now reverted to the unpatched libbedic and a modified version of wiki2bedic.pl, so I'll post an update with the results.

- Andy
iamasmith
No, the older version of xerox fails with a segmentation fault before completion.
The newer version of xerox does complete; however, searching for anything in the range X, Y or Z makes bedic crash. Is there an upper limit on the index size, I wonder? The cur, en version of the Wikipedia database now has 425325 entries...
iamasmith
OK, the unpatched version of xerox just runs and runs, consuming memory. I set up my system with 3 GB of swap and 1 GB of RAM and it eventually failed with a segmentation fault before the swap was depleted.

The patched version seems to run quite nicely without consuming all that memory; however, I think the wiki2bedic.pl script is not doing all that it could. It's producing some blank articles, and other articles have pretty duff formatting (I think there are now extra markup characters in the articles, and \n appears quite a lot in the text).

I'm not really a Perl programmer, but I will take a look to see if there's a sensible way of adding support for the extra markup (bullet lists particularly seem to fail).

Lookups using qbedic still hang when typing in the article name, and a few articles seem to lack trimming on the article name (they have a leading space).
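
As a rough illustration of the two cleanup problems described above (stray literal \n sequences and untrimmed article names), here is a small Python sketch. It is not a patch for wiki2bedic.pl, which is Perl. The function name clean_entry is made up, and turning the two-character sequence into a real line break is an assumption; the real script may need a different replacement.

```python
def clean_entry(title, body):
    """Tidy one converted article (hypothetical helper).

    - strips stray leading/trailing whitespace from the article name
    - turns the literal two characters backslash + 'n' into a real newline
    """
    clean_title = title.strip()
    clean_body = body.replace("\\n", "\n")
    return clean_title, clean_body


if __name__ == "__main__":
    title, body = clean_entry(" Zaurus", "First line\\nSecond line")
    print(repr(title))
    print(repr(body))
```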

So, again if anyone is maintaining this script and has a later version I could try then that would be good.

- Andy
rafm
QUOTE(Teletubbie @ Dec 12 2004, 10:56 PM)
Hi,
you are on the way.
You need this:
http://www.freedict.de/download/wiki2bedic.pl

...

*


I know this is quite an old issue, but maybe somebody can help.

I would like to build a bedic file from the latest wikipedia dumps. I downloaded the latest dumps from http://download.wikimedia.org/, wiki2bedic.pl from http://www.freedict.de/download/wiki2bedic.pl, but on running wiki2bedic.pl, I got either:

'Cannot opendir /usr/src/packages/zaurus/wikip/1/wiki/de/ : No such file or directory'

or the script was running forever.

Is there a newer version of wiki2bedic.pl? Do I need any other software? Am I using the right Wikipedia dump files?

Thanks.
holck
QUOTE(rafm @ Mar 3 2005, 01:50 AM)
I know this is quite an old issue, but maybe somebody can help.

I would like to build a bedic file from the latest wikipedia dumps. I downloaded the latest dumps from http://download.wikimedia.org/, wiki2bedic.pl from http://www.freedict.de/download/wiki2bedic.pl, but on running wiki2bedic.pl, I got either:

'Cannot opendir /usr/src/packages/zaurus/wikip/1/wiki/de/ : No such file or directory'

or the script was running forever.

Is there a newer version of wiki2bedic.pl? Do I need any other software? Am I using the right Wikipedia dump files?

Thanks.
*



You have to either change the script or create the mentioned directory before you run the script.
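
For the second option, creating the directory first, a minimal Python sketch could look like this. The helper name ensure_directory is made up; the path is the one from the error message, and writing under /usr/src usually needs root, so changing the path in the script may be the easier route.

```python
from pathlib import Path


def ensure_directory(path):
    """Create the directory (and any missing parents) if it does not exist."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


if __name__ == "__main__":
    # Path taken from the error message quoted above. Nothing is created
    # here; call ensure_directory(target) once the path suits your setup.
    target = "/usr/src/packages/zaurus/wikip/1/wiki/de/"
    print("would create:", target)
```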
Cuivienor
Hi,
I've used the wiki2bedic.pl script (it took 10 hours on the English Wikipedia) and I get a bedic.dic file of 1.3 GB.
When I do xerox -d bedic.dic bedic2.dic it works, no errors...
But the resulting file is only about 28 MB.
I'm using libbedic version 0.9.1 with the dictionary.cpp patch.
Version 0.9.4 doesn't compile with the dictionary.cpp patch, and unpatched it runs but produces the same 28 MB file as before...

Any suggestions?

Cheers
Yannick
rafm
QUOTE(Cuivienor @ Mar 10 2005, 09:02 AM)
Hi,
I've used the wiki2bedic.pl script (it took 10 hours on the English Wikipedia) and I get a bedic.dic file of 1.3 GB.
When I do xerox -d bedic.dic bedic2.dic it works, no errors...
But the resulting file is only about 28 MB.
I'm using libbedic version 0.9.1 with the dictionary.cpp patch.
Version 0.9.4 doesn't compile with the dictionary.cpp patch, and unpatched it runs but produces the same 28 MB file as before...

Any suggestions?

Cheers
Yannick
*


0.9.4 already contains the patch from dictionary.cpp. 28 MB seems too good a result for the compression :-). Could you send me your wiki2bedic.pl script and the URL from which you downloaded the Wikipedia dump, so I can take a look at what goes wrong in xerox?
Cuivienor
Hi,
thanks for your help :-) Yeah, 28 megs sounds like a revolutionary compression. Maybe we should patent it? Argh, no, forget it ;-)
Here is where I downloaded the dump:

http://download.wikimedia.org/archives/en/...r_table.sql.bz2

The script I'm using is the following:

http://elelome.files5.free.fr/wiki2bedic.pl

Thanks so much :-)

Cheers
Yannick

PS: in case it's needed, my e-mail is yannickd AT gmail DOT com or (for gmail haters ;-) )
yannick.dutertre AT enst-bretagne DOT fr
anonuk
QUOTE(Cuivienor @ Mar 10 2005, 11:34 AM)
thanks for your help :-) Yeah, 28 megs sounds like a revolutionary compression.


I have the same problem: using the wiki2bedic.pl from freedict, I end up with a file that is way too small.
Cuivienor
While I'm at it, is zbedic able to display pictures (meaning, do I have to bother with the images and the LaTeX things)?

Thanks
Yannick
rafm
QUOTE(Cuivienor @ Mar 11 2005, 07:45 PM)
While I'm at it, is zbedic able to display pictures (meaning, do I have to bother with the images and the LaTeX things)?

Thanks
Yannick
*


A zbedic dictionary file cannot contain any images, but since it displays HTML text, Wikipedia articles can theoretically refer to external image files. It may work if you use only absolute paths to the images. I have never tried it, so there is no guarantee it works. And of course you need huge storage space.

So far I haven't found time to check why xerox fails with the Wikipedia file, but it is on my todo list.
rafm
QUOTE(Cuivienor @ Mar 10 2005, 10:02 AM)
Hi,
I've used the wiki2bedic.pl script (it took 10 hours on the English Wikipedia) and I get a bedic.dic file of 1.3 GB.
When I do xerox -d bedic.dic bedic2.dic it works, no errors...
But the resulting file is only about 28 MB.
I'm using libbedic version 0.9.1 with the dictionary.cpp patch.
Version 0.9.4 doesn't compile with the dictionary.cpp patch, and unpatched it runs but produces the same 28 MB file as before...

Any suggestions?

Cheers
Yannick
*


Ok, I found the problem. There is an entry in Wikipedia that is longer than 500000 bytes, which is the limit set by the wiki2bedic.pl script. If this limit is exceeded, xerox fails without printing any error :-( (I am currently working on a new version of xerox, which should be more informative about errors).

To fix the problem, just change the line in wiki2bedic.pl from:
print PAGE "max-entry-length=500000\n";
to
print PAGE "max-entry-length=1024000\n";
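
If you would rather not edit the script by hand, the same one-line change can be sketched in Python. This is just an illustration: the function name raise_entry_limit is made up, and it assumes the limit appears in the script text exactly as max-entry-length= followed by digits, as in the line quoted above.

```python
import re


def raise_entry_limit(script_text, new_limit=1024000):
    """Return script_text with every max-entry-length value set to new_limit."""
    return re.sub(r"(max-entry-length=)\d+", rf"\g<1>{new_limit}", script_text)


if __name__ == "__main__":
    # The Perl line as it appears in wiki2bedic.pl (the \n stays literal here).
    old_line = 'print PAGE "max-entry-length=500000\\n";'
    print(raise_entry_limit(old_line))
```

To patch the real file you would read wiki2bedic.pl as text, pass it through the function and write the result back, keeping a backup copy of the original.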

Have fun!
tovarish
Could someone host the English Wikipedia dic file somewhere? The one on zbedic's site is a bit outdated.

tovarish
Cuivienor
Thanks so much rafm, I'm converting the English file of February the ninth. I hope it will work.

I've noticed something though: apparently, the new dumps (from 2005/03/09) are not compatible with wiki2bedic.pl, whether bunzipped or not, which would imply a slight change in the SQL format Wikimedia uses.

Cheers
Yannick
anonuk
QUOTE(Cuivienor @ Mar 14 2005, 02:51 PM)
Thanks so much rafm, I'm converting the English file of February the ninth. I hope it will work.


Thanks from me too, I'm busy converting the exact same file :-) It's onto the xerox stage now... thanks rafm
BarryW
QUOTE(tovarish @ Mar 14 2005, 12:09 PM)
Could someone host the English Wikipedia dic file somewhere? The one on zbedic's site is a bit outdated.

tovarish
*


If someone makes it I'll put it up on my site. It's a .mac site, so no worries about bandwidth!! ;-)
tovarish
QUOTE(BarryW @ Mar 15 2005, 06:26 AM)
If someone makes it I'll put it up on my site. It's a .mac site, so no worries about bandwidth!! ;-)
*



Yes, I would really appreciate it; I don't have the resources (disk space and RAM) to convert it myself.

tovarish
anonuk
I've made a wikipedia.dic.dz from February 9th 2005. I needed about 5 GB of space: 2 GB for the original SQL dump, 1.3 GB for the wikibedic.dic and another 1.3 GB for the xeroxed wikipedia.dic version, then a final 0.5 GB for the compressed version.

The file came to 1.3 GB after the xerox process. dictzip wikipedia.dic gave me a 412 MB file.
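
Adding up the figures above as a quick sanity check (the numbers are taken from this post; the names in the code are just for illustration):

```python
# Disk space in GB for each step, as reported in the post above.
SPACE_GB = {
    "original SQL dump": 2.0,
    "wikibedic.dic from wiki2bedic.pl": 1.3,
    "xeroxed wikipedia.dic": 1.3,
    "compressed version": 0.5,
}


def total_space_gb(parts):
    """Sum the per-step disk usage."""
    return sum(parts.values())


def compression_ratio(compressed_mb, original_mb):
    """Fraction of the original size left after compression."""
    return compressed_mb / original_mb


if __name__ == "__main__":
    print(f"total: {total_space_gb(SPACE_GB):.1f} GB")
    # 412 MB after dictzip versus roughly 1300 MB (1.3 GB) before it.
    print(f"ratio: {compression_ratio(412, 1300):.2f}")
```

The total comes to roughly 5.1 GB, in line with the "about 5 GB" estimate, and the compressed file is a bit under a third of the uncompressed size.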

This loads into zbedic and passes the integrity check. I noticed some textual problems, but mainly the program seems to lock any time I search for something past N in the alphabet. I have a 32 MB swap file activated; I'll experiment some more when I get back home later and see if it is actually usable.

If it works well, I don't mind uploading it somewhere. I can do it later this week from university on a very fast (hopefully) connection. I'll keep the thread posted if anyone is interested.
BarryW
Do you have to download both the old and current or will just the current do? Also what are the steps for converting? Haven't really been able to find a how-to.
kahm
QUOTE(anonuk @ Mar 15 2005, 04:27 PM)
I've made a wikipedia.dic.dz from February 9th 2005. I needed about 5 GB of space: 2 GB for the original SQL dump, 1.3 GB for the wikibedic.dic and another 1.3 GB for the xeroxed wikipedia.dic version, then a final 0.5 GB for the compressed version.

The file came to 1.3 GB after the xerox process. dictzip wikipedia.dic gave me a 412 MB file.

This loads into zbedic and passes the integrity check. I noticed some textual problems, but mainly the program seems to lock any time I search for something past N in the alphabet. I have a 32 MB swap file activated; I'll experiment some more when I get back home later and see if it is actually usable.

If it works well, I don't mind uploading it somewhere. I can do it later this week from university on a very fast (hopefully) connection. I'll keep the thread posted if anyone is interested.
*


Did this ever make it anywhere? I'd love a current Wikipedia, especially with all the free HD space in the 3000 :-)
rafm
Sooner or later I would like to put all Wikipedia files for English and other languages on the zbedic web page.

I will also check why zbedic locks past the letter "N".
iamasmith
I'm not sure it's letter related. I noticed this too with an earlier version of Wikipedia. It may have something to do with size, or I may be wrong.

This is why I ended up running Wikipedia as a set of static web pages rather than a dictionary but it would be nice to get it sorted.
kahm
I'm currently running a June 2004 Wikipedia without issue. It's addictive - I wasted a couple of hours reading random entries after I installed it.

This one seems too old to have a Zaurus entry, though. =(
Cryssli
QUOTE(kahm @ Apr 3 2005, 01:27 AM)
I'm currently running a June 2004 Wikipedia without issue. It's addictive - I wasted a couple of hours reading random entries after I installed it.

This one seems too old to have a Zaurus entry, though. =(
*



What's the current status of Wikipedia for the Zaurus? How can I set it up as a noob?
All my reading around the Internet didn't help me. :-(
rafm
QUOTE(Cryssli @ Apr 3 2005, 06:24 PM)
What's the current status of Wikipedia for the Zaurus? How can I set it up as a noob?
All my reading around the Internet didn't help me. :-(
*


Update: I found and fixed the problem in zbedic with the latest Wikipedia dump (the file size caused an overflow in some arithmetic operations). The fix will be included in the upcoming 0.9.5 release. The latest Wikipedia for zbedic will probably (if quota allows) be available on the zbedic home page.
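
The post does not say which arithmetic operations overflowed, so the following is only a guess at the general kind of bug, not the actual zbedic code. It simulates 32-bit signed arithmetic in Python and shows how adding two byte offsets in a file larger than 1 GB can wrap to a negative number, for example in a naive binary-search midpoint. All helper names and offset values are hypothetical.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Simulate storing a value in a 32-bit signed integer (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def naive_midpoint(lo, hi):
    """Midpoint computed as (lo + hi) / 2 with a 32-bit intermediate sum."""
    return wrap_int32(lo + hi) // 2


def safe_midpoint(lo, hi):
    """Midpoint computed without ever forming lo + hi."""
    return lo + (hi - lo) // 2


if __name__ == "__main__":
    # Two offsets near the end of a roughly 1.3 GB file (illustrative values).
    lo, hi = 1_200_000_000, 1_300_000_000
    print("naive:", naive_midpoint(lo, hi))
    print("safe: ", safe_midpoint(lo, hi))
```

In this scenario, offsets in the first part of the file stay below the 32-bit limit when added, while offsets late in the file do not. That would fit the reports of lookups locking up only for words late in the alphabet, but that link is speculation.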
tovarish
Nice :-)
rafm
ZBEDic 0.9.5 with the lock-at-huge-file bug fixed is available on the zbedic home page. A newer English Wikipedia file can be downloaded from the SourceForge project page as well.

Have fun!
ken
I'm glad you've been updating this app, it's one of the highlights of having my Z. For those wanting the newer wiki, be aware that it's grown a little .... from 191M to a whopping 412M
kahm
QUOTE(ken @ Apr 17 2005, 07:22 PM)
I'm glad you've been updating this app, it's one of the highlights of having my Z.  For those wanting the newer wiki, be aware that it's grown a little .... from 191M to a whopping 412M
*


That's what gigabyte+ flash cards and C3000s were made for. :-)
BarryW
I've been downloading it for the past 4 hours!! 2 left. Good thing I have a 1 GB SD and a 1 GB CF card!
kahm
QUOTE(BarryW @ Apr 17 2005, 10:15 PM)
I've been downloading it for the past 4 hours!! 2 left. Good thing I have a 1 GB SD and a 1 GB CF card!
*


Most of the North American sourceforge mirrors don't seem to have the file yet. I had to grab it from somewhere in Europe at ~40kb/s.

I haven't installed it yet, as I'm currently mucking with everything, so I don't have a stable platform to bother copying a 420 MB file to. Definitely looking forward to it though! :-)
xjqian
QUOTE(kahm @ Apr 17 2005, 06:31 PM)
Most of the North American sourceforge mirrors don't seem to have the file yet. I had to grab it from somewhere in Europe at ~40kb/s.

I haven't installed it yet, as I'm currently mucking with everything, so I don't have a stable platform to bother copying a 420 MB file to. Definitely looking forward to it though! :-)
*


The one from Paris, France is not bad. It's keeping steady for me at ~150kb/s right now.
tovarish
I cannot use the Wikipedia dictionary. It shows status: entry too long, and the checkbox is grayed out.

tovarish
BarryW
Same here. Crap, I already deleted the old one.... >:-(
ken
QUOTE(BarryW @ Apr 17 2005, 03:16 PM)
Same here. Crap, I already deleted the old one.... >:-(
*


You probably have to install the newer zbedic too.
BarryW
Nope, I had the new version already installed. Tried redownloading the new Wikipedia, twice. Same result, file too big. Am redownloading the old one now.
tovarish
Has anyone got the new Wikipedia to work?
rafm
QUOTE(tovarish @ Apr 18 2005, 10:19 AM)
Has anyone got the new Wikipedia to work?
*


It might be that I uploaded a broken file to SF. I will check it today. Meanwhile I have hidden the file so people don't waste bandwidth and time. Sorry.

The latest Wikipedia requires ZBEDic 0.9.5.
rafm
Ok, the file with the english wikipedia at sf.net should be correct now. The previous one was actually damaged.

I uploaded the corrected file, downloaded and checked it. Everything seemed to work.

File: en-wikipedia_0.9.5_20050209.dic.dz
Size: 432311254 bytes
MD5SUM: 3216ee0a94009621522f42d5a5e6b48e

You need zbedic 0.9.5 to use that wikipedia file!
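
To check a download against the size and MD5SUM given above, something like the following Python sketch can be used. The helper names are made up for illustration, and it assumes the file sits in the current directory under the name listed above.

```python
import hashlib
import os

EXPECTED_NAME = "en-wikipedia_0.9.5_20050209.dic.dz"
EXPECTED_SIZE = 432311254
EXPECTED_MD5 = "3216ee0a94009621522f42d5a5e6b48e"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_is_intact(path, expected_size, expected_md5):
    """True if both the byte count and the MD5 digest match."""
    if os.path.getsize(path) != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    if os.path.exists(EXPECTED_NAME):
        print(download_is_intact(EXPECTED_NAME, EXPECTED_SIZE, EXPECTED_MD5))
    else:
        print("file not found:", EXPECTED_NAME)
```

Checking the size first is cheap and catches truncated downloads before the whole file has to be hashed.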
kahm
QUOTE(rafm @ Apr 20 2005, 08:02 AM)
Ok, the file with the english wikipedia at sf.net should be correct now. The previous one was actually damaged.

I uploaded the corrected file, downloaded and checked it. Everything seemed to work.

File: en-wikipedia_0.9.5_20050209.dic.dz
Size: 432311254 bytes
MD5SUM: 3216ee0a94009621522f42d5a5e6b48e

You need zbedic 0.9.5 to use that wikipedia file!
*


I'll give it a go today. Hopefully the wireless is up to it - I'll be downloading straight to the Z.

EDIT: Nope. Not a chance. I'll be downloading this one at home. :-(
rafm
QUOTE(kahm @ Apr 20 2005, 03:55 PM)
I'll give it a go today. Hopefully the wireless is up to it - I'll be downloading straight to the Z.

EDIT: Nope. Not a chance. I'll be downloading this one at home. :-(
*


Just let me know if it works for you.
tovarish
It worked for me, but a lot of the text had "\n"s in it.
It's nice though to have it on the Z.

tovarish
BarryW
Fixed version works great. Quick question though: which set of fonts has all the cool extras like the pi symbol and stuff like that?
ZDevil
Thanks for the efforts! For me the Wikipedia dump alone is quite a killer argument for getting a Z.
Got the same issue as posted by others: quite a number of links and texts become either /n or /n*. And I also find differences between the entries on the website and the dump.

Please keep it up!! Looking forward to seeing an improved version!
rafm
QUOTE(BarryW @ Apr 23 2005, 03:49 AM)
Fixed version works great. Quick question though: which set of fonts has all the cool extras like the pi symbol and stuff like that?
*


Math symbols probably won't work if they are shown in Wikipedia as images.