Author Topic: Anyone Got A Working Wiki2bedic.pl  (Read 14731 times)

iamasmith

  • Hero Member
  • *****
  • Posts: 1248
« on: December 12, 2004, 03:11:53 pm »
Hi, I have a couple of versions of wiki2bedic.pl and neither of them converts current Wikipedia databases.

I think the latest prebuilt Wikipedia for zbedic is from around 10th July and there's been a lot of activity since.

Anyone got a working version of wiki2bedic.pl for the current database format?

- Andy
OpenBSD 4.2 -current on full 4Gb of SL-C3000
Microdrive replaced with 4Gb SanDisk Extreme III card

iamasmith

  • Hero Member
  • *****
  • Posts: 1248
« Reply #1 on: December 12, 2004, 05:17:21 pm »
More specifically, if I run the wiki2bedic script on a current database, it produces a bedic.dic file which gives 'Integrity Failure' when opened on the desktop using bedic (yes, I want to move it to the Z, but I guess it should also work with bedic).

Also there doesn't seem to be an index property at the beginning of the file with the version that I have.

Anyone able to point me at a working version?

Teletubbie

  • Sr. Member
  • ****
  • Posts: 252
« Reply #2 on: December 12, 2004, 05:56:58 pm »
Hi,
you are on the right track.
You need this:
http://www.freedict.de/download/wiki2bedic.pl
Then:
libbedic from here:
http://sourceforge.net/project/showfiles.p...ackage_id=56566

Then you have to replace dictionary.cpp
with this:
http://www.freedict.de/download/dictionary.cpp

(I heard this may not work with the newest version of libbedic, but I once had success with version 0.9.1.)

Then:
1) make
2) make xerox

After that you will get a binary named xerox. Then:
xerox -d wikipedia.dic wikipedianew.dic

After that you should pack wikipedianew.dic with dictzip.

Please tell me if it worked. I have to do this again for the German Wikipedia, but I deleted my build environment because another guy had promised to provide the German community with up-to-date Wikipedias, and the folks out there are now left without a current one.
Cheers,
Sam
SL-5500G
OZ 3.3.6-pre1
Opie 1.1.4

rafm

  • Full Member
  • ***
  • Posts: 145
« Reply #3 on: December 13, 2004, 04:04:46 am »
Quote
Then you have to replace dictionary.cpp
with this:
http://www.freedict.de/download/dictionary.cpp

(I heard this may not work with the newest version of libbedic, but I once had success with version 0.9.1.)
Could you briefly explain the changes in dictionary.cpp, so that I can add them to the latest version of zbedic? Thanks.
SL-C1000 w/ Cacko ROM 1.23

Teletubbie

  • Sr. Member
  • ****
  • Posts: 252
« Reply #4 on: December 13, 2004, 06:07:28 am »
Hi,
I am not responsible for dictionary.cpp.
It comes from Horst at freedict.de; the only thing I did was translate his posts from the German community and add some missing points from my own experience.
I think Horst is a nice guy and you can send him an email. If he doesn't respond, I can also write him an email in German, since zbedic development seems important to me.
About the file itself: I think Horst made or changed it to be able to convert the Wikipedia format to zbedic.
Hope this helps,
Cheers,
Sam
« Last Edit: December 13, 2004, 06:08:57 am by Teletubbie »

iamasmith

  • Hero Member
  • *****
  • Posts: 1248
« Reply #5 on: December 13, 2004, 12:47:07 pm »
OK, I see xerox builds the index too when it sorts the dictionary.

I tried the versions you suggested, with some success. However, some articles don't seem to have correctly processed format strings and have \n embedded in the text (the two literal characters, not a processed newline), and if you start to type zaurus in bedic the thing just hangs.

I've now reverted to the unpatched libbedic and a modified version of wiki2bedic.pl, so I'll post an update with the results.

- Andy

iamasmith

  • Hero Member
  • *****
  • Posts: 1248
« Reply #6 on: December 14, 2004, 08:50:29 am »
No, the older version of xerox fails with a segmentation fault before completion.
The newer version of xerox does complete; however, searching for anything in the range X, Y or Z makes bedic crash. I wonder whether there is an upper limit on the index size? The English (en) cur version of the Wikipedia database now has 425325 entries...

iamasmith

  • Hero Member
  • *****
  • Posts: 1248
« Reply #7 on: December 14, 2004, 12:18:28 pm »
OK, the unpatched version of xerox just runs and runs, consuming memory. I set up my system with 3 GB of swap and 1 GB of RAM, and it eventually failed with a segmentation fault before the swap was depleted.

The patched version seems to run quite nicely without consuming all that memory; however, I think that the wiki2bedic.pl script is not doing all that it could. It's producing some blank articles, and other articles have pretty duff formatting (I think there are now extra markup characters in the articles, and \n appears quite a lot in the text).

I'm not really a Perl programmer, but I will take a look to see if there's a sensible way of handling the extra markup (bullet lists in particular seem to fail).

Lookups using qbedic still hang when typing in the article name, and a few articles seem to lack trimming on the article name (they have a leading space).
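
For anyone picking this up, the cleanup described above could look roughly like the sketch below. It is written in Python rather than as a patch to wiki2bedic.pl, and the helper names (clean_title, clean_body, clean_entry) are made up for illustration. It assumes the stray \n is a literal backslash followed by n, and that bullet lists use the usual MediaWiki leading-asterisk markup; how bedic itself wants lists formatted is not covered here, so plain dashes are used as a stand-in.

```python
import re


def clean_title(title):
    """Trim stray leading/trailing whitespace from an article name."""
    return title.strip()


def clean_body(body):
    """Tidy article text (hypothetical helper, not part of wiki2bedic.pl)."""
    # Turn the literal two-character sequence backslash + n into a real newline.
    body = body.replace("\\n", "\n")

    # Rewrite MediaWiki bullet lines ("* item", "** item") as indented dashes.
    def bullet(match):
        depth = len(match.group(1))
        return "  " * (depth - 1) + "- "

    body = re.sub(r"^(\*+)[ \t]*", bullet, body, flags=re.MULTILINE)
    return body.strip()


def clean_entry(title, body):
    """Return a cleaned (title, body) pair, or None for a blank article."""
    title, body = clean_title(title), clean_body(body)
    if not title or not body:
        return None
    return title, body
```

Dropping entries that come back as None would keep blank articles out of the .dic file, and the trimmed titles should sort where you expect them.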

So, again if anyone is maintaining this script and has a later version I could try then that would be good.

- Andy

rafm

  • Full Member
  • ***
  • Posts: 145
« Reply #8 on: March 03, 2005, 04:50:44 am »
Quote
Hi,
you are on the right track.
You need this:
http://www.freedict.de/download/wiki2bedic.pl

...


I know this is quite an old issue, but maybe somebody can help.

I would like to build a bedic file from the latest Wikipedia dumps. I downloaded the latest dumps from http://download.wikimedia.org/ and wiki2bedic.pl from http://www.freedict.de/download/wiki2bedic.pl, but on running wiki2bedic.pl, I got either:

'Cannot opendir /usr/src/packages/zaurus/wikip/1/wiki/de/ : No such file or directory'

or the script was running forever.

Is there a newer version of wiki2bedic.pl? Do I need any other software? Am I using the right Wikipedia dump files?

Thanks.

holck

  • Newbie
  • *
  • Posts: 34
« Reply #9 on: March 04, 2005, 09:28:58 am »
Quote
Quote
Hi,
you are on the right track.
You need this:
http://www.freedict.de/download/wiki2bedic.pl

...


I know this is quite an old issue, but maybe somebody can help.

I would like to build a bedic file from the latest Wikipedia dumps. I downloaded the latest dumps from http://download.wikimedia.org/ and wiki2bedic.pl from http://www.freedict.de/download/wiki2bedic.pl, but on running wiki2bedic.pl, I got either:

'Cannot opendir /usr/src/packages/zaurus/wikip/1/wiki/de/ : No such file or directory'

or the script was running forever.

Is there a newer version of wiki2bedic.pl? Do I need any other software? Am I using the right Wikipedia dump files?

Thanks.

You have to either change the path in the script or create the mentioned directory before you run the script.
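
If you go the second route, creating the directory is a one-liner in most languages. Here is a minimal Python sketch; ensure_output_dir is a made-up helper name, and the only thing taken from the thread is the path in the error message.

```python
import os


def ensure_output_dir(path):
    """Create path (and any missing parents); do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)
```

Calling ensure_output_dir("/usr/src/packages/zaurus/wikip/1/wiki/de/") before starting wiki2bedic.pl should get rid of the opendir error, provided you have write permission under /usr/src. Whether the script then finds the dump files it expects in that directory is a separate question.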
SL760, tkcROM, ASUS WiFi

Cuivienor

  • Newbie
  • *
  • Posts: 4
« Reply #10 on: March 10, 2005, 04:02:10 am »
Hi,
I've used the wiki2bedic.pl script (it took 10 hours on the English Wikipedia) and I get a bedic.dic file of 1.3 GB.
When I do xerox -d bedic.dic bedic2.dic it works, no errors...
But the resulting file is only about 28 MB.
I'm using libbedic version 0.9.1 with the dictionary.cpp patch.
Version 0.9.4 doesn't compile with the dictionary.cpp patch, and unpatched it produces the same 28 MB file as before...

Any suggestions?

Cheers
Yannick

rafm

  • Full Member
  • ***
  • Posts: 145
« Reply #11 on: March 10, 2005, 10:30:50 am »
Quote
Hi,
I've used the wiki2bedic.pl script (it took 10 hours on the English Wikipedia) and I get a bedic.dic file of 1.3 GB.
When I do xerox -d bedic.dic bedic2.dic it works, no errors...
But the resulting file is only about 28 MB.
I'm using libbedic version 0.9.1 with the dictionary.cpp patch.
Version 0.9.4 doesn't compile with the dictionary.cpp patch, and unpatched it produces the same 28 MB file as before...

Any suggestions?

Cheers
Yannick

0.9.4 already contains the patch from dictionary.cpp. 28 MB seems too good a result for the compression. Could you send me your wiki2bedic.pl script and the URL from which you downloaded the Wikipedia dump, so I can take a look at what goes wrong in xerox?
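
To put a number on "too good": 28 MB out of 1.3 GB is roughly 2% of the input. A quick way to flag that kind of result automatically is sketched below in Python. The function names and the 10% threshold are assumptions chosen for illustration, not anything defined by libbedic or xerox.

```python
import os


def size_ratio(source_path, output_path):
    """Return the output file size as a fraction of the source file size."""
    source_size = os.path.getsize(source_path)
    if source_size == 0:
        raise ValueError("source file is empty")
    return os.path.getsize(output_path) / source_size


def looks_truncated(ratio, min_ratio=0.10):
    """Flag an output that is suspiciously small relative to its input.

    min_ratio is an assumed rule of thumb, not a libbedic constant.
    """
    return ratio < min_ratio
```

With the figures from this thread, looks_truncated(28 / 1300) comes out True, which fits the suspicion that entries are being dropped somewhere rather than compressed.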

Cuivienor

  • Newbie
  • *
  • Posts: 4
« Reply #12 on: March 10, 2005, 02:34:36 pm »
Hi,
thanks for your help :-) Yeah, 28 megs sounds like a revolutionary compression. Maybe we should patent it? arg, no, forget it
Here is where I downloaded the dump :

http://download.wikimedia.org/archives/en/...r_table.sql.bz2

The script I'm using is the following :

http://elelome.files5.free.fr/wiki2bedic.pl

Thanks soo much

Cheers
Yannick

PS: in case it's needed, my e-mail is yannickd AT gmail DOT com or (for gmail haters)
yannick.dutertre AT enst-bretagne DOT fr

anonuk

  • Full Member
  • ***
  • Posts: 176
« Reply #13 on: March 10, 2005, 08:35:16 pm »
Quote
thanks for your help :-) Yeah, 28 megs sounds like a revolutionary compression.

I have the same problem: using the wiki2bedic.pl from freedict, I end up with a file that is waaaay too small.
* C3100 with Cacko 1.23 and debian (pocketworkstation) - 1Gb SD / 1Gb CF / Prism Wifi
* C-860 with Cacko 1.21b/pdaXrom dualboot with 256Mb CF / 512 Mb SD / Prism Wifi
* SL-5500 with Cacko rom with 128Mb SD home on SD / 96 Mb CF

Cuivienor

  • Newbie
  • *
  • Posts: 4
« Reply #14 on: March 11, 2005, 01:45:42 pm »
While I'm at it, is zbedic able to display pictures (meaning, do I have to bother with the images and the LaTeX things)?

Thanks
Yannick