OESF Portables Forum
Topic started by: spartan on April 09, 2006, 06:53:18 pm
-
Does anyone have an accurate definition of the EPWING format that is compatible with the C3000's ZDict?
Or, does anyone have a detailed description of what the bedic project's xerox application does and how it does it?
-
What do you mean by a "definition"? It's a format used by Japanese CD-ROM dictionaries.
-
I mean specification; a description of the file format.
-
EPWING works very well with Zten, which is a nice app for the Zaurus. I use it with Kojien without problems.
-
I mean specification; a description of the file format.
I'm still not sure if I follow you. Some dictionary programs read text files that are written in a particular format (such as "word // definition") but that's not what EPWING is, just so we're clear. It's basically only used for CD-ROMs that are commercially sold in Japan, although there are a few programs that can convert (with varying degrees of success) between formats like System Soft and EPWING. But in any case, it's not the kind of thing that would allow you to easily make your own dictionary files. What is it that you want to do?
-
Or, does anyone have a detailed description of what the bedic project's xerox application does and how it does it?
A search would have turned up these links, which give a detailed description of the bedic format and the nuts and bolts of making bedic dictionaries:
https://www.oesf.org/forums/index.php?showtopic=16160&st=0
http://cvs.sourceforge.net/viewcvs.py/*checkout*/bedic/libbedic/doc/bedic-format.txt?rev=1.5
http://cvs.sourceforge.net/viewcvs.py/bedic/libbedic/doc/
http://bedic.sourceforge.net/index.html
-
Thanks. I tried reading the source for Xerox to figure out what it does with a file in 'simplified bedic' format. Unfortunately, I don't really understand C.
I'm trying to write a program that will transform the new Wikipedia XML files into bedic and EPWING dictionaries (preferably EPWING). Since the Wikipedia-to-simplified-bedic conversion produces a file too large for Xerox to handle, I'm just building a C#/vb.net program to let people build an updated Wikipedia for themselves whenever they please. In order to do that, I need to know how Xerox constructs the index and calculates the remaining fields.
It would be better to use the EPWING format for the Zaurus considering that I can put pictures and hyperlinks into it. I couldn't find anything under the libeb project that actually documents the construction of an EPWING file, so I'm wondering if anyone knows where I can find the specifications of the format.
Thanks again
-
First of all, let me say that I would be very, very interested if you could get the Wikipedia converted to EPWING format. But my hunch is that the specification is only available to commercial dictionary makers.
-
FreePWING lets you make EPWING-compatible files.
-
Does anyone have an accurate definition of the EPWING format that is compatible with the C3000's ZDict?
You could have gotten your answer by googling for "epwing format". It's in the very first hit (the creator of that website, Hannes Löffler, happens to be a member of this forum).
-
Maybe things have changed since you performed your search, but it's not in the first hit now, or even on the first page. But here it is in any case:
http://www.hloeffler.info/epwing/
Does this mean that making an EPWING version of the Wikipedia is doable? I would love something like that. Right now I have an older version of the Wikipedia for ZBEDic, but I don't like that program very much and all of my other dictionaries are in EPWING format. I also use an EPWING-compatible dictionary program on my Mac in my work as a translator, and being able to add Wikipedia to that would be a great resource. Fingers crossed...
-
What are you talking about? It's right there in the first hit of the first page:
http://www.google.com/search?client=opera&rls=en&q=epwing+format&sourceid=opera&ie=utf-8&oe=utf-8
How much simpler can it get?
-
Well...
First I tried both "epwing format" (with quotes) and epwing format (without quotes) on google.co.jp (which is the site I usually search from). Neither way results in his site being first. I then tried Google.com and typed in "epwing format" with the quotes (as you wrote in your message above). This also doesn't result in the site being first. Finally, I found that going to google.com and typing epwing format without quotes does make it the first result. The point is that deriding someone for not noticing the first search result is not such a good idea, because the search results vary widely depending on the portal you use and whether you use quotes or not.
-
For you to have implied that my info was wrong just because you couldn't get the same results isn't such a good idea either. Who else uses google.jp here? And you don't need to do phrase searching (i.e. using quotation marks) unless the search results get too general.
Anyway I think my reply to the OP was clear and simple enough so I'll not belabour the point.
-
For you to have implied that my info was wrong just because you couldn't get the same results isn't such a good idea either.
At the time, I thought it was wrong, because I didn't realize that using a different portal would give different results. But please, let's get back to the topic at hand.
Does an EPWING version of the wikipedia look doable?
-
Yes, an EPWING version is very doable. If I use Mr. Löffler's markup-parser scripts, the issue will be the Perl code that builds the actual EPWING dictionary. It would not be difficult to transform the 4.8 GB English Wikipedia XML into a document in this markup.
I should mention that I have built a .Net 1.1-compatible library for manipulating EPWING files based on FreePWING (it will only run on Windows because of how the Perl interpreter is packaged).
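As a rough illustration of the first step described above (reading the large Wikipedia XML file before converting it to any dictionary markup), here is a minimal Python sketch that streams a MediaWiki export one page at a time instead of loading it whole. It is not the poster's C#/vb.net program. The sample data and the namespace URL in it are made up for the example, and the output is just (title, text) pairs, not Löffler's markup.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_articles(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export, page by page."""
    title, text = None, ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # drop the finished page's contents to limit memory use


# Tiny illustrative sample; a real dump is a multi-gigabyte file on disk.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Zaurus</title>
    <revision><text>The Zaurus is a PDA.</text></revision>
  </page>
  <page>
    <title>EPWING</title>
    <revision><text>EPWING is a dictionary format.</text></revision>
  </page>
</mediawiki>"""

articles = list(iter_articles(io.BytesIO(SAMPLE)))
for title, text in articles:
    print(title, "->", text)
```

For a real dump, pass the file path (or an open binary file) to `iter_articles` and write each pair out in whatever markup the dictionary builder expects.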
-
Here is the problem:
I've made the "eword", "head", "text", "textref", "texttag", and "word" files. What do I do to turn them into a "honmon" and "catalogs"? The Google translation of the FreePWING documentation isn't much good.
(http://www.sra.co.jp/people/m-kasahr/freepwing/doc/freepwing.html)
I'm under the impression that I use "fpwmake" with a specially crafted Makefile to produce a "honmon" and "catalogs".
(http://www.sra.co.jp/people/m-kasahr/freepwing/doc/freepwing-02.html#Makefile)
I add "include fpwutils.mk" into the Makefile and perform...
% perl /usr/local/libexec/freepwing/fpwsort
% perl /usr/local/libexec/freepwing/fpwindex
% perl /usr/local/libexec/freepwing/fpwcontrol
% perl /usr/local/libexec/freepwing/fpwlink
...in the directory with the "eword" et cetera files.
I end up with...
ctrl      eword    text
ctrlref   head     textref
eidx0     idx0     texttag
eidxref0  idxref0  word
esort     sort
...but running "fpwmake" yields...
test -d work || /usr/local/libexec/freepwing/mkdirhier work
/usr/local/libexec/freepwing/perl.sh /usr/local/libexec/freepwing/fpwhalfchar \
    -workdir work
/usr/local/libexec/freepwing/perl.sh /usr/local/libexec/freepwing/fpwfullchar \
    -workdir work
/usr/local/libexec/freepwing/perl.sh /usr/local/libexec/freepwing/fpwparser \
    -workdir work
Can't open perl script "/usr/local/libexec/freepwing/fpwparser": No such file or directory
make: *** [work/parse.dep] Error 2
Since I used Mr. Loffler's markup parser, I don't think I need to run fpwparser.
I'm running this inside of Cygwin and performed a normal "./configure & make & make install" procedure on the FreePWING utilities. Is there something I'm missing?
-
I read Japanese, but unfortunately my knowledge of this kind of thing is pretty limited. Is there a specific sentence or sentences in the google translation that you would like translated into real English?
-
Problem solved: a Makefile for a Loeffler-markup processed EPWING dictionary should read...
FPWPARSER = null.pl
include fpwutils.mk
...where null.pl is an empty file.
Then, create a catalogs.txt with the following...
[Catalog]
FileName = catalogs
Type = EPWING1
Books = 1
[Book]
Title = "Wikipedia-English"
BookType = 6001
Directory = "WIKI"
...replacing the title and directory as you see fit. The title must be EUC-JP encoded, so the above text would produce an error. Leaving the title space empty seems to work fine.
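For anyone who wants to script this setup, the three files described above could be generated like this. This is only a sketch based on the contents quoted in this post. The function name `write_build_files` is made up, and writing an empty title as `Title = ""` is an assumption about what "leaving the title space empty" means.

```python
from pathlib import Path

# Makefile contents exactly as given in the post.
MAKEFILE = "FPWPARSER = null.pl\ninclude fpwutils.mk\n"

# catalogs.txt contents as given in the post, with title and directory left variable.
CATALOGS_TEMPLATE = """[Catalog]
FileName = catalogs
Type = EPWING1
Books = 1
[Book]
Title = "{title}"
BookType = 6001
Directory = "{directory}"
"""


def write_build_files(workdir, title="", directory="WIKI"):
    """Create Makefile, an empty null.pl and catalogs.txt in workdir."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    (workdir / "Makefile").write_text(MAKEFILE, encoding="ascii")
    (workdir / "null.pl").write_text("", encoding="ascii")  # empty stand-in parser
    catalogs = CATALOGS_TEMPLATE.format(title=title, directory=directory)
    # The post says the title must be EUC-JP encoded, so write the file that way.
    (workdir / "catalogs.txt").write_bytes(catalogs.encode("euc_jp"))
    return workdir
```

Calling `write_build_files("wiki-build")` in the directory that already holds the "eword", "head", "text" and other files would leave it ready for fpwmake, assuming FreePWING is installed as described earlier in the thread.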
-
That's good news. Does that mean that you're close to success? How big do you think the resulting files will be? Obviously, it would be preferable if it were under 4GB, so it could fit on the microdrive of the older Zaurus models, or on a 4GB SD card. I think most people want to avoid using the CF card slot for memory.
By the way, do you know if this same process can be done for the Japanese language version of the Wikipedia?
-
I could make an ugly Wikipedia now, but it would be better to have a nicely formatted Wikipedia. I've confirmed with a test dictionary that the EPWING system works.
This process will work for the Japanese, and for that matter any, Wikipedia. It should work with all the other Wikis with the code I have now and could be easily extended to support any XML document.
For a size estimate, the bz2-compressed text-only English Wikipedia is about 1 GB.
-
How long do you think it'll take to come up with a well-formatted version (and what exactly does nicely formatted mean)? Will there be hyperlinks within the text, or will that be left out? Also, given the size of the files, am I right in assuming that this will be a "do it yourself" kind of project (i.e. you write the scripts and the people who want it download the raw Wikipedia data and then run them to create the EPWING dictionary version)? If so, will it require a Linux computer? I only have access to Windows and Mac boxes.
-
I have already generated an unformatted Wikipedia (in English and, for a test, in Japanese), which uses only the FreePWING library's text encoder. The hyperlinks will not be active, since I can't understand the documentation on how to make inter-dictionary and internet hyperlinks. Everything else works.
The issue with distributing the program is that I packaged up Loeffler's parser and the FreePWING libraries with a commercial program into a .Net library. I don't think this package can be legally distributed, because I don't have the source to the packaging of the Perl runtime inside this package. The programs I wrote require Windows, this library, and Cygwin (Linux on Windows). I'm going to BitTorrent the Wikipedia. Eventually, somebody with bandwidth could host it.
Currently, I'm stuck on an encoding bug. The FreePWING parser wants ASCII text, but the Wikipedia dump is UTF-8 encoded Unicode. That means I can format all day, but an accented character is seen by the parser as two characters, one of them invalid, instead of one character.
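To make the problem concrete, the short Python sketch below shows an accented letter taking two bytes in UTF-8, and one possible workaround: folding accented letters down to plain ASCII. The thread does not say how the bug was eventually handled, so `fold_to_ascii` is a hypothetical helper, not the poster's fix. It also silently drops characters with no ASCII equivalent, so it would not be suitable for the Japanese Wikipedia.

```python
import unicodedata


def fold_to_ascii(text):
    """Decompose accented letters and drop anything that is not plain ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


word = "L\u00f6ffler"  # "Löffler" with a precomposed o-umlaut
utf8_bytes = word.encode("utf-8")

# The umlaut occupies two bytes in UTF-8, so a byte-oriented parser
# sees one more "character" than the word really has.
print(len(word), len(utf8_bytes))
print(fold_to_ascii(word))
```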
-
Unfortunately, the FreePWING Perl library breaks at about 250 MB worth of articles. I'll have a version reworked for the simplified bedic format and I'll try it with Xerox.
-
Is it possible to break up the Wikipedia by letter to make each file smaller?
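A split by letter along these lines could look roughly like the Python sketch below. It groups (title, text) pairs by the first letter of the title so that each group can be built as its own smaller dictionary. The function names and the "other" bucket for titles that do not start with A to Z are assumptions for the example. Whether each bucket stays under the roughly 250 MB limit mentioned above would still need checking.

```python
import string
from collections import defaultdict


def bucket_key(title):
    """Return 'A'..'Z' for titles starting with an ASCII letter, else 'other'."""
    first = title[:1].upper()
    if len(first) == 1 and first in string.ascii_uppercase:
        return first
    return "other"


def split_by_letter(articles):
    """Group (title, text) pairs into one list per starting letter."""
    buckets = defaultdict(list)
    for title, text in articles:
        buckets[bucket_key(title)].append((title, text))
    return dict(buckets)


articles = [("Zaurus", "..."), ("EPWING", "..."), ("zten", "..."), ("2006", "...")]
buckets = split_by_letter(articles)
for key in sorted(buckets):
    print(key, [title for title, _ in buckets[key]])
```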
-
Thanks. I tried reading the source for Xerox to figure out what it does with a file in 'simplified bedic' format. Unfortunately, I don't really understand C.
I'm trying to write a program that will transform the new Wikipedia XML files into bedic and EPWING dictionaries (preferably EPWING). Since the Wikipedia-to-simplified-bedic conversion produces a file too large for Xerox to handle, I'm just building a C#/vb.net program to let people build an updated Wikipedia for themselves whenever they please.
It is a bad idea to duplicate the work of xerox or mkbedic. If mkbedic fails with your file in the simplified zbedic format, you can put the file somewhere (FTP/HTTP) so I can download it and check what's wrong.
-
So have you given up on the idea of the EPWING wikipedia? Even if it has to be split up into a bunch of different subdictionaries, I'm not sure it would make much difference in terms of usability, since programs like Zten can search multiple dictionaries at once.
-
Sorry about the belated response, icruise; that is a great idea. The encoding problem was solved, which means there will be no accented characters in the dictionary. I'll have it finished even sooner. I was already refactoring it to work with bedic, so I'll make a Wikipedia in both formats.
-
Any news about this?
-
I'd be interested to hear about this as well.
If other people are interested in joining forces to make this happen, come on over to http://gakusei.sf.net (I want the Japanese Wikipedia as a "Kojien replacement"). I have not yet decided which format is best. Plucker seems to be nice as well. The creator (?) of Plucker seems to have been able to create a very nice Plucker ebook out of Wikipedia (http://code.plkr.org/ep/), but he has not responded to my mail, so I am afraid the project might be dead.