Wikipedia projects, Better delivery
iamasmith
post Nov 7 2004, 05:38 AM
Post #1

Hi,

I have used the zbedic Wikipedia stuff and found it not quite as nice as the web-page-based delivery of something like wiki2zaurus, which builds .tgz tarballs of the pages and uses a CGI script to decompress them on the fly.

Having recently attempted to process a dump, I have found that the current wiki2static.pl listed on the wiki2zaurus page doesn't cope with the markup on the pages and leaves unprocessed MediaWiki markup inline in the output... that's a shame, and it's pretty ugly.

There are newer versions of wiki2static.pl that produce good markup; however, the wiki-tar script from the wiki2zaurus project (plus the CGI scripts) needs rework to cope with the new version (and wiki2static.pl itself needs some heavy modifications to generate pages that work in this infrastructure).

Looking at the conversion process, wiki2static.pl creates reusable caches, but these are rendered unusable for a second pass (next month?) once wiki-tar has been run.

I'm just wondering if there isn't a better way of doing all this.

I haven't experimented with this yet and was hoping to get some feedback from someone who might have worked on a similar exercise.

My idea is to take the SQL dump from Wikipedia, preprocess it into a normalized form so that the delivery pages could remain relatively static (even when the MediaWiki markup language or the database structure changes), and serve it as a database-driven website... possibly running off MySQL like the MediaWiki site does.

I would probably compress the pages with bzip2 and store them as blobs in the database, keeping only the WikiWords uncompressed in a table that could be used as a search index.
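
Something like the following is what I have in mind, just to illustrate the layout (a rough sketch in Python with SQLite standing in for MySQL and for the real converter; the table and column names are only examples):

    import bz2
    import sqlite3

    db = sqlite3.connect("wikipedia.db")
    db.execute("""
        CREATE TABLE IF NOT EXISTS page (
            title TEXT PRIMARY KEY,   -- the WikiWord, kept uncompressed and indexed
            html  BLOB                -- bzip2-compressed rendered page
        )
    """)

    def store(title, html):
        # compress the page body; only the title stays searchable as plain text
        db.execute("INSERT OR REPLACE INTO page VALUES (?, ?)",
                   (title, bz2.compress(html.encode("utf-8"))))

    def fetch(title):
        # decompress on the fly when a page is requested
        row = db.execute("SELECT html FROM page WHERE title = ?",
                         (title,)).fetchone()
        return bz2.decompress(row[0]).decode("utf-8") if row else None

The CGI side would then just call something like fetch() and hand the HTML straight to the browser.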

Any feedback on this?

- Andy
iamasmith
post Nov 7 2004, 06:43 AM
Post #2

Or perhaps there's no need quite yet! (Goes away to experiment with squashfs and cloop stuff...)
iamasmith
post Nov 7 2004, 07:52 AM
Post #3

Oooh, got the squashfs2 module to build against a Cacko kernel. The current patch sets only cover kernels from 2.4.20 onwards, but a slight tweak to the Makefile in the fs directory gets the module onto 2.4.18, and it seems to work.

I'm now building a squashfs containing the wiki2static.pl output (doing away with wiki-tar completely), with the intent of giving that to my Z as a directory to run Apache against... original size before turning it into a squashfs (du -s -k) = 3111073k, or ~2.97 GB.

Will post the results of mksquashfs when it completes!
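
For anyone who wants to follow along, that step boils down to something like this (a sketch in Python wrapping the mksquashfs tool from the squashfs2 archive; the directory and image names are just examples):

    import os
    import subprocess

    SRC = "wiki2static-output"   # the page tree produced by wiki2static.pl
    IMG = "wikipedia.sqsh"       # squashfs image to hand over to the Z

    # rough equivalent of 'du -s -k SRC': total size of the uncompressed tree
    kbytes = sum(os.path.getsize(os.path.join(root, name))
                 for root, dirs, files in os.walk(SRC)
                 for name in files) // 1024
    print("uncompressed tree: %d kB" % kbytes)

    # pack the whole tree into a single compressed, read-only squashfs image
    subprocess.run(["mksquashfs", SRC, IMG], check=True)
    print("squashfs image: %d kB" % (os.path.getsize(IMG) // 1024))

On the Z the image can then be loop-mounted (e.g. mount -t squashfs -o loop wikipedia.sqsh /mnt/wiki) once the squashfs module is loaded, and Apache pointed at the mount point.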
iamasmith
post Nov 7 2004, 08:35 AM
Post #4

Produces a squashfs just a tad over 533 MB! I'll post the module to this thread if it works well... just freeing up my 1 GB Microdrive to transfer it onto (man, I so want a 2 GB SD card now!).
iamasmith
post Nov 7 2004, 09:07 AM
Post #5

Oh wow, it works! No more wiki-tar! Just use the standard wiki2static.pl, which seems to be maintained at http://www.tommasoconforti.com - man, I've just gotta have a 2 GB SD card now (gotta free up my CF slot!).

All I did after running the script was turn its output into a squashfs using the mksquashfs tool, built on my desktop box from the squashfs2 archive available from http://squashfs.sf.net. As I said, it took the original ~3 GB of data down to just over 550 MB!

Then I took the 2.4.20 patch and applied it against the Cacko ROM kernel source. I got a reject on fs/Makefile, so I added the squashfs directory there manually, added the CONFIG_SQUASHFS=m line to the .config file, and built the kernel module.

For those of you who are interested, I have attached the kernel module as an IPK (just uudecode it first) to get you on your way.

OOPS: I nearly forgot to mention that once you install the IPK you should either reboot or run depmod while logged in as root. You need to do this for the kernel to spot the module in the /lib/modules directory tree.

Have fun,

Andy
iamasmith
post Nov 7 2004, 10:08 AM
Post #6

This thread has now continued on the Sharp ROMs forum.
Srono
post Jun 4 2006, 11:30 PM
Post #7

Hi all,

I need a new version of Wikipedia offline. I searched and read quite a lot of topics about it, but they seem to be quite outdated. May I ask how you use Wikipedia on your Zaurus now?
I think zbedic should be the easiest way. Does anyone have a good converter from the SQL dump (downloaded from http://download.wikimedia.org/enwiki/20060518/, the 1.3 GB version) to a zbedic dictionary?
Is there a better way? I can host a file to share with everybody if I am successful.

Any help is appreciated! Thanks!
kurochka
post Aug 14 2006, 12:42 PM
Post #8

QUOTE(Srono @ Jun 4 2006, 11:30 PM)
Hi all,

I need a new version of Wikipedia offline. I searched and read quite a lot of topics about it, but they seem to be quite outdated. May I ask how you use Wikipedia on your Zaurus now?
I think zbedic should be the easiest way. Does anyone have a good converter from the SQL dump (downloaded from http://download.wikimedia.org/enwiki/20060518/, the 1.3 GB version) to a zbedic dictionary?
Is there a better way? I can host a file to share with everybody if I am successful.

Any help is appreciated! Thanks!
*

Srono,

Did you receive any responses to your question? The Wikipedia available for zbedic is quite dated now.
rolf
post Aug 27 2006, 09:57 AM
Post #9

I am interested in this, too. I think it would be good if people joined forces and created some kind of "reference implementation" for producing an offline dump that is pretty, small, and includes images, links, etc. But I guess I am day-dreaming ;-)
Overgauss
post Sep 1 2006, 09:40 AM
Post #10

I'm dreaming too. Especially since we are now able to use SD cards greater than 1 gig!
rolf
post Sep 1 2006, 02:23 PM
Post #11

QUOTE(Overgauss @ Sep 1 2006, 07:40 PM)
Especially since we are now able to use SD cards greater than 1 gig!

On what device, what ROM and why? What changed?
Cresho
post Sep 1 2006, 03:38 PM
Post #12

http://www.oesf.org/forums/index.php?showtopic=18523
rafm
post Sep 4 2006, 05:31 PM
Post #13

iamasmith, this is an impressive description and a smart way of getting Wikipedia running as static web pages. But I would still regard browsing Wikipedia in zbedic as more convenient than installing Apache, squashfs2 and running a web browser.

I wonder if anyone would like to invest time in writing a script that would convert Wikipedia dumps to the zbedic format. zbedic offers only a simplified HTML syntax, but it also consumes less memory than a web browser. The script should:
* separate the articles and extract a keyword for each article
* optionally truncate or remove large articles to reduce the total size
* change HTML links to zbedic links
* optionally simplify the HTML (e.g. tables shown on the right, which are usually too big and overlap the text)

I can help with the zbedic syntax and generating the final dictionary, but I don't have time to help with the script.
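
Very roughly, I imagine the skeleton looking something like this (a Python sketch; it assumes the pages-articles .xml.bz2 dump rather than the SQL dump, it flattens links instead of converting them to zbedic links, and the final write-out is left as a TODO, since those are exactly the zbedic-specific bits):

    import bz2
    import re
    import xml.etree.ElementTree as ET

    def tag(elem):
        # strip the MediaWiki export namespace from an element's tag name
        return elem.tag.rsplit("}", 1)[-1]

    def articles(dump_path):
        # yield (keyword, wikitext) pairs from a pages-articles .xml.bz2 dump
        with bz2.open(dump_path, "rb") as f:
            for _event, elem in ET.iterparse(f):
                if tag(elem) != "page":
                    continue
                title, text = None, ""
                for child in elem.iter():
                    if tag(child) == "title":
                        title = child.text
                    elif tag(child) == "text":
                        text = child.text or ""
                elem.clear()                      # keep memory usage flat
                if title and ":" not in title:    # skip Talk:, Image:, etc.
                    yield title, text

    LINK = re.compile(r"\[\[([^|\]]+)(?:\|([^\]]+))?\]\]")

    def simplify(wikitext, max_len=32768):
        # crude cleanup: drop templates, flatten [[links]], truncate huge articles
        text = re.sub(r"\{\{[^{}]*\}\}", "", wikitext)
        text = LINK.sub(lambda m: m.group(2) or m.group(1), text)
        return text[:max_len]

    for keyword, raw in articles("enwiki-pages-articles.xml.bz2"):
        entry = simplify(raw)
        # TODO: emit (keyword, entry) in zbedic's dictionary format here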
gsgmx
post Sep 9 2006, 02:22 AM
Post #14

QUOTE(rafm @ Sep 5 2006, 03:31 AM)
iamasmith, this is an impressive description and a smart way of getting Wikipedia running as static web pages. But I would still regard browsing Wikipedia in zbedic as more convenient than installing Apache, squashfs2 and running a web browser.

I wonder if anyone would like to invest time in writing a script that would convert Wikipedia dumps to the zbedic format. zbedic offers only a simplified HTML syntax, but it also consumes less memory than a web browser. The script should:
* separate the articles and extract a keyword for each article
* optionally truncate or remove large articles to reduce the total size
* change HTML links to zbedic links
* optionally simplify the HTML (e.g. tables shown on the right, which are usually too big and overlap the text)

I can help with the zbedic syntax and generating the final dictionary, but I don't have time to help with the script.
*

I would if I were able to program in any scripting language. The only programming language I know is Business BASIC; maybe it could be done with that, but it would not be of much use. What scripting language are you thinking about?

George
rafm
post Sep 17 2006, 10:28 AM
Post #15

QUOTE(gsgmx @ Sep 9 2006, 11:22 AM)
I would if I were able to program in any scripting language. The only programming language I know is Business BASIC; maybe it could be done with that, but it would not be of much use. What scripting language are you thinking about?

George
*

Most of the scripts for converting Wikipedia are in Perl, which, however, I consider a write-only scripting language (there is no way to understand somebody else's code). Some people write quite complex conversion scripts in Python or XSLT. For simple conversion tasks I prefer awk. There are certainly a lot of choices, especially if you work under Linux.
