Author Topic: Anyone Got A Working Wiki2bedic.pl  (Read 14764 times)

rafm

  • Full Member
  • ***
  • Posts: 145
« Reply #15 on: March 14, 2005, 03:52:17 am »
Quote
While I'm at it, is zbedic able to display pictures (that is, do I have to bother with the images and the LaTeX things)?

Thanks
Yannick

A zbedic dictionary file cannot contain any images, but since zbedic displays HTML text, Wikipedia articles can theoretically refer to external image files. It may work if you use only absolute paths to the images. I have never tried it, so there is no guarantee that it works. And of course you would need a huge amount of storage space.

So far I haven't found time to check why xerox fails with the Wikipedia file, but it is on my to-do list.
SL-C1000 w/ Cacko ROM 1.23

rafm

  • Full Member
  • ***
  • Posts: 145
« Reply #16 on: March 14, 2005, 10:10:21 am »
Quote
Hi,
I've used the wiki2bedic.pl script (it took 10 hours on the English Wikipedia) and I get a bedic.dic file of 1.3 GB.
When I do xerox -d bedic.dic bedic2.dic it works, no errors...
But the resulting file is only about 28 MB.
I'm using libbedic version 0.91 with the dictionary.cpp patch.
Version 0.94 doesn't compile with the dictionary.cpp patch, and unpatched it runs but gives the same 28 MB file as before...

Any suggestions?

Cheers
Yannick

OK, I found the problem. There is an entry in Wikipedia that is longer than 500000 bytes, which is the limit set by the wiki2bedic.pl script. If this limit is exceeded, xerox fails without printing any error :-( (I am currently working on a new version of xerox, which should be more informative about errors.)

To fix the problem, just change the line in wiki2bedic.pl from:
  print PAGE "max-entry-length=500000\n";
to:
  print PAGE "max-entry-length=1024000\n";
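
Before picking a new limit, it may help to measure how long the longest entry in the intermediate file really is. The following is only a rough sketch in Python, not part of wiki2bedic.pl or xerox. It assumes that each entry in the file ends with a single NUL byte, which is an assumption about the file layout, and the function name longest_record is made up for this example. The length it reports may also not match exactly what xerox counts as the entry length.

```python
import io


def longest_record(stream, sep=b"\0", chunk_size=1 << 20):
    """Return the length in bytes of the longest sep-terminated record.

    The stream is read in chunks, so a file of more than 1 GB does not
    have to fit in memory. `sep` must be a single byte (assumed to be
    the entry terminator; this is an assumption, not a documented fact).
    """
    longest = 0
    current = 0  # bytes counted so far in the record being scanned
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        start = 0
        while True:
            pos = chunk.find(sep, start)
            if pos == -1:
                # No terminator left in this chunk: the record continues.
                current += len(chunk) - start
                break
            current += pos - start
            longest = max(longest, current)
            current = 0
            start = pos + 1
    # Account for a trailing record that has no terminator.
    return max(longest, current)


# Tiny in-memory demonstration with three records: "ab", "cdef", "g".
demo = io.BytesIO(b"ab\0cdef\0g")
print(longest_record(demo, chunk_size=3))  # prints 4
```

To try it on a real file, open it in binary mode, for example open("bedic.dic", "rb"), and pass the file object to longest_record. If the result is above the value written to max-entry-length, that would be consistent with the silent failure described above.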

Have fun!
SL-C1000 w/ Cacko ROM 1.23

tovarish

  • Sr. Member
  • ****
  • Posts: 297
« Reply #17 on: March 14, 2005, 04:09:10 pm »
Could someone host the English Wikipedia dic file somewhere? The one on zbedic's site is a bit outdated.

tovarish

Cuivienor

  • Newbie
  • *
  • Posts: 4
« Reply #18 on: March 14, 2005, 05:51:44 pm »
Thanks so much rafm, I'm converting the English file of February the ninth. I hope it will work.

I've noticed something, though: apparently the new dumps (from 2005/03/09) are not compatible with wiki2bedic.pl, whether bunzipped or not, which would imply a slight change in the SQL format Wikimedia uses.

Cheers
Yannick

anonuk

  • Full Member
  • ***
  • Posts: 176
« Reply #19 on: March 14, 2005, 06:49:40 pm »
Quote
Thanks so much rafm, I'm converting the English file of February the ninth. I hope it will work.

Thanks from me too, I'm busy converting the exact same file :-) It's on to the xerox stage now... thanks rafm
* C3100 with Cacko 1.23 and debian (pocketworkstation) - 1Gb SD / 1Gb CF / Prism Wifi
* C-860 with Cacko 1.21b/pdaXrom dualboot with 256Mb CF / 512 Mb SD / Prism Wifi
* SL-5500 with Cacko rom with 128Mb SD home on SD / 96 Mb CF

BarryW

  • Hero Member
  • *****
  • Posts: 690
« Reply #20 on: March 15, 2005, 12:26:44 am »
Quote
Could someone host the English Wikipedia dic file somewhere? The one on zbedic's site is a bit outdated.

tovarish
 
 
If someone makes it I'll put it up on my site. It's a .mac site, so no worries about bandwidth!!
What's this button do??

C3100
Distro changes almost weekly...

C3200
Distro also changes almost weekly...  :)

Hardware hacks and stuff.

tovarish

  • Sr. Member
  • ****
  • Posts: 297
« Reply #21 on: March 15, 2005, 10:41:59 am »
Quote
Quote
Could someone host the English Wikipedia dic file somewhere? The one on zbedic's site is a bit outdated.

tovarish

If someone makes it I'll put it up on my site. It's a .mac site, so no worries about bandwidth!!

Yes, I would really appreciate it. I don't have the resources (disk space and RAM) to convert it myself.

tovarish

anonuk

  • Full Member
  • ***
  • Posts: 176
« Reply #22 on: March 15, 2005, 11:27:13 am »
I've made a wikipedia.dic.dz from February 9th 2005. I needed about 5 GB of space: 2 GB for the original SQL dump, 1.3 GB for the wikibedic.dic and another 1.3 GB for the xeroxed wikipedia.dic version, then a final 0.5 GB for the compressed version.

The file came to 1.3 GB after the xerox process. dictzip wikipedia.dic gave me a 412 MB file.
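
For anyone planning disk space, the figures above can be added up in a few lines of Python. This is just a back-of-the-envelope sketch: the stage labels are my own, the sizes are the rounded numbers quoted in this post, and treating 1.3 GB as 1300 MB is an assumption.

```python
# Approximate space needed at each stage, in GB (figures from the post above).
stages_gb = {
    "original SQL dump": 2.0,
    "wiki2bedic.pl output": 1.3,
    "xerox output": 1.3,
    "dictzip output (rounded up)": 0.5,
}

total_gb = sum(stages_gb.values())
print(f"Total working space: about {total_gb:.1f} GB")

# Rough compression ratio of dictzip, assuming 1.3 GB is about 1300 MB.
uncompressed_mb = 1300
compressed_mb = 412
ratio = compressed_mb / uncompressed_mb
print(f"Compressed file is about {ratio:.0%} of the xerox output")
```

All four files only need to exist at the same time if nothing is deleted between stages, so the peak requirement can be lower if earlier files are removed as you go.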

This loads into zbedic and passes the integrity check. I noticed some textual problems, but mainly the program seems to lock up any time I search for something past N in the alphabet. I have a 32 MB swap file activated. I'll experiment some more when I get back home later and see if it is actually usable.

If it works well, I don't mind uploading it somewhere. I can do it later this week from university on a (hopefully) very fast connection. I'll keep the thread posted if anyone is interested.
* C3100 with Cacko 1.23 and debian (pocketworkstation) - 1Gb SD / 1Gb CF / Prism Wifi
* C-860 with Cacko 1.21b/pdaXrom dualboot with 256Mb CF / 512 Mb SD / Prism Wifi
* SL-5500 with Cacko rom with 128Mb SD home on SD / 96 Mb CF

BarryW

  • Hero Member
  • *****
  • Posts: 690
« Reply #23 on: March 18, 2005, 12:09:28 pm »
Do you have to download both the old and current or will just the current do?  Also what are the steps for converting?  Haven't really been able to find a how-to.
What's this button do??

C3100
Distro changes almost weekly...

C3200
Distro also changes almost weekly...  :)

Hardware hacks and stuff.

kahm

  • Hero Member
  • *****
  • Posts: 657
« Reply #24 on: March 31, 2005, 09:22:00 pm »
Quote
I've made a wikipedia.dic.dz from February 9th 2005. I needed about 5 GB of space: 2 GB for the original SQL dump, 1.3 GB for the wikibedic.dic and another 1.3 GB for the xeroxed wikipedia.dic version, then a final 0.5 GB for the compressed version.

The file came to 1.3 GB after the xerox process. dictzip wikipedia.dic gave me a 412 MB file.

This loads into zbedic and passes the integrity check. I noticed some textual problems, but mainly the program seems to lock up any time I search for something past N in the alphabet. I have a 32 MB swap file activated. I'll experiment some more when I get back home later and see if it is actually usable.

If it works well, I don't mind uploading it somewhere. I can do it later this week from university on a (hopefully) very fast connection. I'll keep the thread posted if anyone is interested.

Did this ever make it anywhere? I'd love a current Wikipedia, especially with all the free HD space in the 3000.
Fujitsu U8240 "Stormtrooper" -  Zaurus Supplement
Libretto U100 | Sony Librie, Sony Reader
SL-C3100: Sharp 1.11JP (Kanji Dictionary/Translator) - LCD Top swap with C1000.
SL-C3000: pdaXii13 5.4.7, SL-C3000 5.4.9 - microdrive replaced with 8gb Sandisk
SL-C1000: PDAXRom Beta3 | SL-6000L: Sharp 1.12 | SL-5500: Cacko, 64-0 kernel | SL-5000D: OZ-Opie
Linksys WCF12; Sharp CE-AG06, CE-RH2, CE-170TS; iRiver USB OTG Host cable; Socket BT rev.E CF; Hitachi 6gb Microdrive

rafm

  • Full Member
  • ***
  • Posts: 145
« Reply #25 on: April 02, 2005, 02:33:25 pm »
Sooner or later I would like to put all Wikipedia files for English and other languages on the zbedic web page.

I will also check why zbedic locks up past the letter "N".
SL-C1000 w/ Cacko ROM 1.23

iamasmith

  • Hero Member
  • *****
  • Posts: 1248
« Reply #26 on: April 02, 2005, 02:41:23 pm »
I'm not sure it's letter related. I noticed this too with an earlier version of Wikipedia. It may have something to do with size, or I may be wrong.

This is why I ended up running Wikipedia as a set of static web pages rather than a dictionary but it would be nice to get it sorted.
OpenBSD 4.2 -current on full 4Gb of SL-C3000
Microdrive replaced with 4Gb SanDisk Extreme III card

kahm

  • Hero Member
  • *****
  • Posts: 657
« Reply #27 on: April 02, 2005, 08:27:14 pm »
I'm currently running a June 2004 Wikipedia without issue. It's addictive - I wasted a couple of hours reading random entries after I installed it.

This one seems too old to have a Zaurus entry, though. =(
Fujitsu U8240 "Stormtrooper" -  Zaurus Supplement
Libretto U100 | Sony Librie, Sony Reader
SL-C3100: Sharp 1.11JP (Kanji Dictionary/Translator) - LCD Top swap with C1000.
SL-C3000: pdaXii13 5.4.7, SL-C3000 5.4.9 - microdrive replaced with 8gb Sandisk
SL-C1000: PDAXRom Beta3 | SL-6000L: Sharp 1.12 | SL-5500: Cacko, 64-0 kernel | SL-5000D: OZ-Opie
Linksys WCF12; Sharp CE-AG06, CE-RH2, CE-170TS; iRiver USB OTG Host cable; Socket BT rev.E CF; Hitachi 6gb Microdrive

Cryssli

  • Newbie
  • *
  • Posts: 36
    • http://www.cryss.net
« Reply #28 on: April 03, 2005, 01:24:30 pm »
Quote
I'm currently running a June 2004 Wikipedia without issue. It's addictive - I wasted a couple of hours reading random entries after I installed it.

This one seems too old to have a Zaurus entry, though. =(


What's the current status of Wikipedia for the Zaurus? How can I set it up as a noob?
All the reading around the Internet didn't help me. :-(
Zaurus C-750
Cacko ROM 2.21b
1 GB Transcend SD-Card

http://www.cryss.net
JabberID: Jabberwocky@amessage.de

rafm

  • Full Member
  • ***
  • Posts: 145
« Reply #29 on: April 12, 2005, 03:55:00 am »
Quote
What's the current status of Wikipedia for the Zaurus? How can I set it up as a noob?
All the reading around the Internet didn't help me. :-(

Update: I found and fixed the problem in zbedic with the latest Wikipedia dump (the file size caused an overflow in some arithmetic operations). The fix will be included in the upcoming 0.9.5 release. The latest Wikipedia for zbedic will probably (if quota allows) be available on the zbedic home page.
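
To give a feel for how a file of about 1.3 GB can overflow 32-bit arithmetic, here is a hypothetical illustration in Python. It is not the actual zbedic code, and the post above does not say which operation overflowed. The sketch simply assumes file offsets stored in signed 32-bit integers and a midpoint computed as (lo + hi) / 2, a pattern often seen in binary searches. The helper names to_int32, naive_midpoint_int32 and safe_midpoint are invented for this example.

```python
INT32_MAX = 2**31 - 1  # 2147483647


def to_int32(value):
    """Wrap an integer the way a signed 32-bit two's-complement int would."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def naive_midpoint_int32(lo, hi):
    """Midpoint as (lo + hi) / 2, with the sum wrapped to 32 bits first."""
    wrapped = to_int32(lo + hi)
    # Truncate toward zero, as integer division does in C.
    return -((-wrapped) // 2) if wrapped < 0 else wrapped // 2


def safe_midpoint(lo, hi):
    """Midpoint as lo + (hi - lo) / 2, which stays within [lo, hi]."""
    return lo + (hi - lo) // 2


# Offsets in the later part of a file of roughly 1.3 GB (illustrative values).
lo, hi = 900_000_000, 1_300_000_000

print(lo + hi > INT32_MAX)           # True: the sum no longer fits in 32 bits
print(naive_midpoint_int32(lo, hi))  # a negative, meaningless offset
print(safe_midpoint(lo, hi))         # 1100000000
```

With arithmetic of this kind, lookups near the start of the file behave normally, because the sum of the two offsets stays below 2^31. Trouble starts only once the offsets get large, so a problem like this would only show up with very big dictionary files.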
SL-C1000 w/ Cacko ROM 1.23