Author Topic: Wikipedia projects  (Read 8325 times)

iamasmith

  • Hero Member
  • *****
  • Posts: 1248
    • View Profile
Wikipedia projects
« on: November 07, 2004, 08:38:37 am »
Hi,

I have used the zbedic Wikipedia stuff and found that it's not quite as nice as the web-page-based delivery of something like the wiki2zaurus stuff, which builds .tgz tarballs of the pages and uses a CGI script to decompress them on the fly.

Having recently attempted to process a dump, I have found that the markup on the pages doesn't work with the current wiki2static.pl listed on the wiki2zaurus page: it leaves unprocessed MediaWiki markup inline with the page... that's a shame and it's pretty ugly.

There are newer versions of wiki2static.pl that produce good markup. However, the wiki-tar script from the wiki2zaurus project (plus the CGI scripts) needs rework to cope with the new version, and wiki2static.pl itself needs some heavy mods to generate versions of the pages that work in this infrastructure.

Looking at the process involved in the conversion, wiki2static.pl creates reusable caches, but a run of wiki-tar renders them unusable for a second pass (next month ?).

I'm just wondering if there isn't a better way of doing all this.

I haven't experimented with this yet and was hoping to get some feedback from someone that might have worked on a similar exercise.

My idea is to take the SQL dump from Wikipedia and preprocess it into a normalized version, so the delivery pages could remain relatively static even when the MediaWiki markup language or the database structure changes, and then produce a database-driven website... possibly running off MySQL like the MediaWiki site does.

However, my idea is to possibly compress the pages using bzip2 and store them as blobs in the database, storing only the WikiWords uncompressed in a table that could be used as a search index.
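
A minimal sketch of that storage idea, purely for illustration. It assumes SQLite as a stand-in for MySQL, and the table layout and function names (open_store, put_page, get_page, search_titles) are made up here rather than taken from any existing tool:

```python
import bz2
import sqlite3


def open_store(path=":memory:"):
    """Open the page store; titles stay uncompressed so they can be searched."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " title TEXT PRIMARY KEY,"  # plain text, doubles as the search index
        " body BLOB NOT NULL)"      # bzip2-compressed page text
    )
    return conn


def put_page(conn, title, text):
    """Compress one page with bzip2 and store it as a blob."""
    blob = bz2.compress(text.encode("utf-8"))
    conn.execute(
        "INSERT OR REPLACE INTO pages (title, body) VALUES (?, ?)",
        (title, blob),
    )


def get_page(conn, title):
    """Return the decompressed page text, or None if the title is unknown."""
    row = conn.execute(
        "SELECT body FROM pages WHERE title = ?", (title,)
    ).fetchone()
    if row is None:
        return None
    return bz2.decompress(row[0]).decode("utf-8")


def search_titles(conn, prefix):
    """List titles starting with prefix (case-sensitive), in sorted order."""
    rows = conn.execute(
        "SELECT title FROM pages WHERE substr(title, 1, ?) = ? ORDER BY title",
        (len(prefix), prefix),
    )
    return [r[0] for r in rows]
```

Only the title column is ever scanned during a search, so the compressed bodies are touched just once, when a page is actually displayed.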

Any feedback on this ?

- Andy
« Last Edit: November 07, 2004, 01:07:21 pm by iamasmith »
OpenBSD 4.2 -current on full 4Gb of SL-C3000
Microdrive replaced with 4Gb SanDisk Extreme III card

iamasmith

  • Hero Member
  • *****
  • Posts: 1248
    • View Profile
Wikipedia projects
« Reply #1 on: November 07, 2004, 09:43:54 am »
OR, perhaps there's no need quite yet ! (goes away to experiment with squashfs and cloop stuff)...

iamasmith

  • Hero Member
  • *****
  • Posts: 1248
    • View Profile
Wikipedia projects
« Reply #2 on: November 07, 2004, 10:52:24 am »
Oooh, got the squashfs2 module to build into a Cacko kernel. The current patch sets are only for kernel 2.4.20 onwards, but a slight tweak to the Makefile in the fs directory gets the module onto 2.4.18 and it seems to work.

Just building my squashfs containing the wiki2static.pl output (doing away with wiki-tar completely) with the intent of giving that to my Z as a directory to run apache against..... original size before turning into squashfs (du -s -k) = 3111073k or ~2.97Gb...

Will post results of mksquashfs when it completes !

iamasmith

  • Hero Member
  • *****
  • Posts: 1248
    • View Profile
Wikipedia projects
« Reply #3 on: November 07, 2004, 11:35:16 am »
That produces a squashfs of just a tad over 533Mb ! I will post the module to this thread if it works well.. just freeing up my 1Gb Microdrive to transfer it onto (man I so want a 2Gb SD card now !).
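
For a rough sense of the saving, the two figures quoted in this thread can be compared directly. This is only a back-of-envelope sketch: it assumes the du -s -k figure is in 1024-byte blocks and that "533Mb" means roughly 533 MiB.

```python
original_kib = 3111073  # du -s -k figure quoted earlier in the thread
squashfs_mib = 533      # approximate squashfs size reported above (assumed MiB)

original_mib = original_kib / 1024
original_gib = original_mib / 1024
ratio = squashfs_mib / original_mib

print(f"original size: {original_gib:.2f} GiB")          # original size: 2.97 GiB
print(f"squashfs is {ratio:.1%} of the original size")   # squashfs is 17.5% of the original size
```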

iamasmith

  • Hero Member
  • *****
  • Posts: 1248
    • View Profile
Wikipedia projects
« Reply #4 on: November 07, 2004, 12:07:15 pm »
Oh wow !, it works, no more wiki-tar !, just use the standard wiki2static.pl that seems to be maintained at http://www.tommasoconforti.com - man I've just gotta have a 2Gb SD card now (gotta free up my CF slot !).

All I did after running this script was to turn its output into a squashfs, using the mksquashfs tool built from the squashfs2 archive available from http://squashfs.sf.net and run on my desktop box to create the file system. As I said, it took the original 3Gb of data down to just over 550Mb !

Then I took the 2.4.20 patch and applied it against the Cacko ROM kernel source. I got a reject on fs/Makefile, so I added the squashfs directory there manually, added the CONFIG_SQUASHFS=m line to the .config file and built the kernel module.

For those of you that are interested I have attached the Kernel module as an IPK (just uudecode it first) to get you on your way.

OOPS: Nearly forgot to mention, once you install the IPK either reboot or run depmod whilst logged in as root. You need to do this for the kernel to spot the module in the lib directory tree.

Have fun,

Andy
« Last Edit: November 07, 2004, 01:07:44 pm by iamasmith »

iamasmith

  • Hero Member
  • *****
  • Posts: 1248
    • View Profile
Wikipedia projects
« Reply #5 on: November 07, 2004, 01:08:30 pm »
Thread has now become...

On Sharp ROMs forum
« Last Edit: November 07, 2004, 01:09:47 pm by iamasmith »

Srono

  • Jr. Member
  • **
  • Posts: 83
    • View Profile
Wikipedia projects
« Reply #6 on: June 05, 2006, 03:30:14 am »
Hi all,

I need a new version of Wikipedia offline. I searched and read quite a lot of topics about it, but they seem to be quite outdated. May I ask how you use Wikipedia on your Zaurus now?
I think zbedic should be the easiest way. Does anyone have a good converter from SQL (download from http://download.wikimedia.org/enwiki/20060518/, 1.3GB version) to a zbedic dictionary?
Is there any better way? I can host a file to share with everybody if I am successful.

Any help is appreciated! Thanks!
C3100 /w Meanie's pdaxii13 5.2alpha
SL5500 collecting dust.

kurochka

  • Sr. Member
  • ****
  • Posts: 301
    • View Profile
Wikipedia projects
« Reply #7 on: August 14, 2006, 04:42:34 pm »
Quote
Hi all,

I need a new version of Wikipedia offline. I searched and read quite a lot of topics about it, but they seem to be quite outdated. May I ask how you use Wikipedia on your Zaurus now?
I think zbedic should be the easiest way. Does anyone have a good converter from SQL (download from http://download.wikimedia.org/enwiki/20060518/, 1.3GB version) to a zbedic dictionary?
Is there any better way? I can host a file to share with everybody if I am successful.

Any help is appreciated! Thanks!

Srono,

Did you receive any responses to your question?  The Wikipedia available for zbedic is quite dated now.
SL-C3100 (from PriceJapan.com): modified Sharp Rom (couldn't make Japanese input work in Cacko Rom)

ex-SL-C3000; ex-SL-5600; ex-Simpad

rolf

  • Full Member
  • ***
  • Posts: 105
    • View Profile
    • http://home.arcor.de/leggewie/
Wikipedia projects
« Reply #8 on: August 27, 2006, 01:57:21 pm »
I am interested in this, too.  I think it would be good if people joined forces and created some kind of "reference implementation" to create an offline dump that is pretty, has a small size, includes images and links, etc.  But I guess, I am day-dreaming ;-)

Overgauss

  • Newbie
  • *
  • Posts: 14
    • View Profile
Wikipedia projects
« Reply #9 on: September 01, 2006, 01:40:47 pm »
I'm dreaming too.  Especially since we are now able to use SD cards greater than 1 gig!

rolf

  • Full Member
  • ***
  • Posts: 105
    • View Profile
    • http://home.arcor.de/leggewie/
Wikipedia projects
« Reply #10 on: September 01, 2006, 06:23:19 pm »
Quote
Especially since we are now able to use SD cards greater than 1 gig!
On what device, what ROM and why?  What changed?
« Last Edit: September 01, 2006, 06:23:42 pm by rolf »

Cresho

  • Hero Member
  • *****
  • Posts: 1609
    • View Profile
    • http://home.earthlink.net/~cresho/
Wikipedia projects
« Reply #11 on: September 01, 2006, 07:38:13 pm »
Zaurus C-3200 (internal 8gb seagate drive) with buuf icon theme, cacko 1.23 full,  and also Meanie's pdaxqtrom-Debian/Open Office
Zaurus SL-5500 Sharp Rom 3.13 with steel theme
pretec pocket pc wi fi
ambicom bt2000-cf bluetooth-made in taiwan
simpletech 1gb cf
pny 1gb sd
patriot 2gb
ocz or patriot 4gb sd(failed after 2 weeks)only on z
creative csw-5300 speakers in stereo
DigiLife DDV-1000 for video, Audio, Picture recording playable on the zaurus
Mustek DV4500-video recorder, pictures, voice record on sd for z

zaurusthemes.biz | ZaurusVideo | Zaurus Software

rafm

  • Full Member
  • ***
  • Posts: 145
    • View Profile
Wikipedia projects
« Reply #12 on: September 04, 2006, 09:31:16 pm »
iamasmith, this is an impressive description and a smart way of making Wikipedia run as static web pages. But I would still regard browsing Wikipedia in zbedic as more convenient than installing apache and squashfs2 and running a web browser.

I wonder if anyone would like to invest time in writing a script that would convert Wikipedia dumps to the zbedic format. zbedic offers only a simplified HTML syntax, but it also consumes less memory than a web browser. The script should:
* separate articles, extract a key-word for each article
* optionally truncate / remove large articles to reduce the total size
* change HTML links to zbedic links
* optionally simplify HTML (e.g. tables shown on the right which are usually too big and overlap text)

I can help with the zbedic syntax and generating the final dictionary, but I don't have time to help with the script.
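
The steps listed above can be sketched roughly as follows. This is only an outline under several assumptions: it uses Python rather than any language named in this thread, it expects the articles to be already available as (title, HTML) pairs, and LINK_TEMPLATE is a placeholder, not real zbedic link syntax, which would have to be filled in from the zbedic format documentation. All function names are made up for illustration.

```python
import re

MAX_CHARS = 20000                       # hypothetical size cap per article
LINK_TEMPLATE = "[[{target}|{label}]]"  # placeholder, NOT real zbedic syntax

LINK_RE = re.compile(r'<a\s+[^>]*href="([^"]*)"[^>]*>(.*?)</a>', re.I | re.S)
# Non-greedy match: good enough for a sketch, but it mishandles nested tables.
TABLE_RE = re.compile(r"<table\b.*?</table>", re.I | re.S)


def keyword(title):
    """Derive the lookup key for an article from its title."""
    return " ".join(title.replace("_", " ").split()).lower()


def simplify(html):
    """Drop tables, which tend to be too wide for a small screen."""
    return TABLE_RE.sub("", html)


def rewrite_links(html):
    """Replace HTML anchors with the placeholder dictionary link form."""
    def repl(match):
        # Use the last path component of the href as the target article.
        target = keyword(match.group(1).rsplit("/", 1)[-1])
        return LINK_TEMPLATE.format(target=target, label=match.group(2))
    return LINK_RE.sub(repl, html)


def convert(articles, max_chars=MAX_CHARS):
    """Yield (keyword, body) pairs, skipping articles over the size cap."""
    for title, html in articles:
        body = rewrite_links(simplify(html))
        if len(body) > max_chars:
            continue  # the "remove large articles" option
        yield keyword(title), body
```

Writing the resulting pairs out in the actual dictionary format is left open here, since that part depends on the zbedic syntax itself.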
SL-C1000 w/ Cacko ROM 1.23

gsgmx

  • Newbie
  • *
  • Posts: 32
    • View Profile
Wikipedia projects
« Reply #13 on: September 09, 2006, 06:22:36 am »
Quote
iamasmith, this is an impressive description and a smart way of making Wikipedia run as static web pages. But I would still regard browsing Wikipedia in zbedic as more convenient than installing apache and squashfs2 and running a web browser.

I wonder if anyone would like to invest time in writing a script that would convert Wikipedia dumps to the zbedic format. zbedic offers only a simplified HTML syntax, but it also consumes less memory than a web browser. The script should:
* separate articles, extract a key-word for each article
* optionally truncate / remove large articles to reduce the total size
* change HTML links to zbedic links
* optionally simplify HTML (e.g. tables shown on the right which are usually too big and overlap text)

I can help with the zbedic syntax and generating the final dictionary, but I don't have time to help with the script.

I would if I were able to program in any scripting language.  The only programming language I know is Business Basic; maybe it could be done with that, but it would not be of much use.  What scripting language are you thinking about?

George
SL-C1000 with Cacko 1.23
DLlink CF WiFi
Socket CF Ethernet

rafm

  • Full Member
  • ***
  • Posts: 145
    • View Profile
Wikipedia projects
« Reply #14 on: September 17, 2006, 02:28:17 pm »
Quote
I would if I were able to program in any scripting language.  The only programming language I know is Business Basic; maybe it could be done with that, but it would not be of much use.  What scripting language are you thinking about?

George

Most of the scripts for converting Wikipedia are in Perl, which, however, I consider a write-only scripting language (no way to understand somebody else's code). Some people write quite complex conversion scripts in Python or XSLT. For simple conversion tasks, I prefer awk. Certainly there are a lot of choices, especially if you work under Linux.