> Various Questions About Zbedic, its format, etc.
kurochka
post Dec 20 2005, 02:01 PM
Post #16





Group: Members
Posts: 303
Joined: 6-February 04
Member No.: 1,740



QUOTE(ludo @ Dec 19 2005, 09:03 PM)
I can already answer some of my own questions.
I'm reporting my findings here in case anyone is interested. I'm sure most of you know this already, but there may be newbies (like me) who don't.

First of all, the zbedic home page:
http://bedic.sourceforge.net/index.html

To make a dictionary:
Very good explanations are given here, with step by step at the end of the page:
http://cvs.sourceforge.net/viewcvs.py/bedi...1.5&view=markup

The ability to edit an existing dictionary will be developed in the near future.
See news:
http://bedic.sourceforge.net/index.html#news

Now I need to find out how to create a Japanese dictionary, which seems to be trickier.

Can anyone help?

Ludo
*



Hi, ludo!!! So glad you enjoy your 3000!

RAFM made it much easier to create new dictionaries with the mkbedic program.

First, you need to create a dictionary source file in any text editor, using the Zbedic format (the link you provided above). It should be a plain text file in Unicode; I am not sure whether UTF-8 will support Japanese, though. Please see this example dictionary:
http://cvs.sourceforge.net/viewcvs.py/bedi...1.2&view=markup

The file may need to contain a char-precedence entry (it is not necessary under some circumstances). However, with kanji I do not see how that would be possible. Please see an example of char-precedence here:
http://cvs.sourceforge.net/viewcvs.py/bedi...1.1&view=markup

Second, you will need to compile the mkbedic program on your Linux system; see the mkbedic man page for how to use it. mkbedic will tell you about important errors in your dictionary source file. Note that it may miss some problems (missing closing tags, for example) that you will only discover when some of the entries are not displayed correctly.

Third, after you run mkbedic on your dictionary source file, you will need to compress the output with dictzip, which must be installed on your system. See http://www.die.net/doc/linux/man/man1/dictzip.1.html for more info. The resulting file will have the .dic.dz extension.
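
Since mkbedic may not catch a missing closing tag, a small pre-check on the source text can help. The following is only a sketch, not part of mkbedic or zbedic: it assumes tags look like {name} and {/name} (as with {em} and {/bla} mentioned in this thread), and it assumes a self-closing {name/} form that should simply be skipped. The function name find_unbalanced_tags is made up for this illustration.

```python
import re

# Assumed tag shapes: {name}, {/name}, and a self-closing {name/}.
TAG_RE = re.compile(r"\{(/?)([a-z0-9]+)(/?)\}")


def find_unbalanced_tags(entry_text):
    """Return a list of human-readable problems found in one entry's text."""
    problems = []
    stack = []  # names of tags that are currently open
    for match in TAG_RE.finditer(entry_text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # nothing to balance for a self-closing tag
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append("unexpected {/%s}" % name)
    # Anything still on the stack was opened but never closed.
    problems.extend("unclosed {%s}" % name for name in stack)
    return problems
```

Running it over each entry before mkbedic and printing any non-empty result would point at the entries that are likely to display incorrectly.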

I will add more details later.

I also believe that there is a way to unpack the old dic.dz files (they used a different form of compression) and then modify the contents of the file to add new entries. I have never done it, though. I always work with my source files and make any changes there.
ludo
post Dec 22 2005, 07:42 PM
Post #17





Group: Members
Posts: 53
Joined: 9-December 05
Member No.: 8,688



Rafm,

Is it possible to create an English-Japanese-English dictionary that works with zbedic? What would the issue be with character encoding? From what I understand, zbedic accepts only UTF-8, and it seems that a lot of Japanese text is encoded in EUC-JP. What do you think?

Thanks, Kurochka.

In the meantime I have been looking into the EPWING format, which seems to be dedicated to Japanese. But I am coming back to zbedic anyway.

I have no Linux machine right now. Do I need one to run mkbedic, or can I do it on the Z? Which Linux distribution should I choose?

As for Japanese, I have seen that there is a Japanese-English dictionary available for bedic. I am downloading it now and will install it soon to see how it works.

The Japanese-English dictionary for zbedic works all right, and the display is nice. Do you know how the Japanese is encoded there?

Ludo
kurochka
post Dec 23 2005, 08:58 AM
Post #18





Group: Members
Posts: 303
Joined: 6-February 04
Member No.: 1,740



QUOTE(ludo @ Dec 22 2005, 07:42 PM)
Rafm,

Is it possible to create an English-Japanese-English dictionary that works with zbedic? What would the issue be with character encoding? From what I understand, zbedic accepts only UTF-8, and it seems that a lot of Japanese text is encoded in EUC-JP. What do you think?

Thanks, Kurochka.

In the meantime I have been looking into the EPWING format, which seems to be dedicated to Japanese. But I am coming back to zbedic anyway.

I have no Linux machine right now. Do I need one to run mkbedic, or can I do it on the Z? Which Linux distribution should I choose?

As for Japanese, I have seen that there is a Japanese-English dictionary available for bedic. I am downloading it now and will install it soon to see how it works.

The Japanese-English dictionary for zbedic works all right, and the display is nice. Do you know how the Japanese is encoded there?

Ludo
*


I don't know much about Japanese. I know that Zten, Zdic and some other dictionaries (including the Zaurus built-in dictionary) that are based on the EPWING format are popular among learners of Japanese. Moreover, I understand that there are a bunch of dictionaries available in those formats.

I am not sure about the encoding for Japanese-English zbedic dictionary. I suspect it is unicode of some sort (UTF-16?). You may actually try different encodings on a test dictionary and see whether it displays properly using a font that supports that encoding.
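
One way to try different encodings on a test file is to check which ones can decode its bytes at all. This is a rough sketch only; the function name guess_encoding and the candidate list are assumptions made for this example, and a successful decode does not prove the guess is right (the text still has to look correct on screen).

```python
def guess_encoding(data, candidates=("utf-8", "euc_jp", "shift_jis", "utf-16")):
    """Return the first candidate encoding that decodes `data` without error.

    `data` is the raw bytes of a dictionary source file. Returns None when
    no candidate fits.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        return name
    return None
```

Reading the file with open(path, "rb").read() and passing the bytes in gives a first hint about what the file contains.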

As I understand it, mkbedic will compile and run only under Linux (there are some special projects that try to port Linux programs to the Windows API, but I have not tried those). I have never compiled anything on the Zaurus, but I've heard it is possible. Given that you need to jump through hoops to install the development environment, and that the Zaurus is much slower than a desktop, you are better off compiling dictionaries on a desktop. If your dictionary is not too big, you could make the txt file on the Zaurus first and then run mkbedic and dictzip on it on a desktop.

You could look at knoppix.net for a simple LiveCD Linux that you don't have to install on your Windows computer. You just boot your computer from the CD without installing anything. When you are done using Linux, you are back in Windows with no modifications. I hope that Knoppix has all the libraries and programs needed to compile mkbedic. It should.

I use SUSE Linux 10 and Windows XP on the same system (dual boot). All Linux flavors can be downloaded for free and burned to a CD or DVD.
kurochka
post Dec 24 2005, 06:40 AM
Post #19





Group: Members
Posts: 303
Joined: 6-February 04
Member No.: 1,740



QUOTE(kurochka @ Nov 29 2005, 08:58 AM)
I will keep this thread going by asking other questions and making suggestions and comments:

1. Emphasis Tag. I've noticed that {em} tag is hardly seen on the screen.  I understand it makes letters bold but somehow they are almost indistinguishable from normal letters.  I wonder if {em} could instead make the tagged text a different color to make it stand out.  I am using this tag for showing accent in the words.  BTW, if the full text search is implemented, will the {de} tags within the word break the search?  Any other useful tags that can work for showing word accent (stress)?

2. Ways to Show Word Accent/Stress.  As indicated earlier, I use {em} tags to show word stress in the words in the translation portion.  This seems somewhat awkward now but it works for the translation portion.  How can I show word accent for words for the keywords without ruining the search mechanism?  One solution that might work (I am not sure though) --  I could probably use the Unicode stress symbol (I think there is a special symbol in Unicode for word accent) and put it in the ignore char list.  Will this work?  For example, if the keyword is "a'rmy" (showing stress on A) will the search for "army" locate the keyword if I use the above described approach?

*


Just a few comments on my earlier notes.

Emphasis Tag. Emphasis {em} tag works. The problem of it not showing was somehow connected to the size of the font I used. I decreased the size of the font and the emphasis started to show.

Showing Word Accent. I am still struggling with this. The proper way would be to use "combining diacritical marks" from the Unicode standard (see http://www.unicode.org/charts/PDF/U0300.pdf). 0301 looks like the one I would like to use (however, 02c8 or 02b9 could potentially also be used, I guess). However, it turns out that a lot of programs cannot combine these marks with other glyphs. Microsoft Word kind of works, but it is still a problem. I have not tested whether Zbedic supports this because I am having a hard time creating a test dictionary with these marks. In a perfect world, these combining diacritical marks would combine with the preceding glyph to form one character for display purposes. I would then list only the main character in char-precedence and put the diacritical mark 0301 into the ignore list. This way I could see the stress of all words (not just in the transcription), but the stress mark would be ignored for searching purposes. Does this make sense? Does anyone have any ideas or suggestions?
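
To make the idea concrete, here is a sketch of what "ignoring" the stress mark for search could look like. It is not how zbedic actually implements its ignore list; the function name search_key and the default ignore set are assumptions for illustration. It decomposes the keyword so that a precomposed accented letter is split into base letter plus mark, drops the ignored marks, and recomposes the rest.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # U+0301, the mark proposed above for word stress


def search_key(keyword, ignore_chars=(COMBINING_ACUTE,)):
    """Return `keyword` with the ignored characters removed, for lookup only."""
    # NFD splits e.g. U+00E1 into "a" + U+0301 so the mark can be dropped.
    decomposed = unicodedata.normalize("NFD", keyword)
    stripped = "".join(ch for ch in decomposed if ch not in ignore_chars)
    # NFC recombines marks that were not ignored (e.g. the breve in "й").
    return unicodedata.normalize("NFC", stripped)
```

With this, a keyword stored as "a" + U+0301 + "rmy" and a query typed as "army" would produce the same key. Note that it would also strip an acute accent that is part of a word's normal spelling, so it only suits languages where the acute is used purely as a stress mark.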

{ph} tag? I have just noticed a {ph} tag. I don't see a description of this tag other than that it is somehow used for "description" field. What is it supposed to do? What are the suggestions for its use? If nobody knows for sure, I will try it and report back.

One other question. Will the old Linux bedic program (qbedic?) work with the files created by mkbedic (by ignoring the new features), or will it fail to open the new dictionaries? Again, I will try to test this. I fear that it will fail and we will have to wait for a zbedic port to Linux.

RAFM, do you have plans for zbedic to go in parallel with TEI XML (making it easy to convert between the two)? Is TEI XML something worth learning?

I presume that there could be new tags in the future. Is it possible for Zbedic to ignore all tags that it does not yet know (for example, {bla}, {/bla})? This way I could put some placeholder tags into my dictionaries now and hope that the new tags will be implemented one day. In the meantime, the dictionaries could be used as is. Mostly, these tags would deal with grammatical categories, word forms, etc.


Otherwise, I am moving along and getting better with using zbedic format.
kurochka
post Dec 29 2005, 09:43 AM
Post #20





Group: Members
Posts: 303
Joined: 6-February 04
Member No.: 1,740



A couple of suggestions:

1. Internal Comments. I think it would be great to have a tag to comment out text in the dic file. This way we could put comments or yet unimplemented text in the dictionaries without it showing.

2. Another tag could be for comments that are displayed or hidden depending on the user's choice (similar to how the pronunciation tag works).

These are just ideas.
rafm
post Jan 4 2006, 12:32 PM
Post #21





Group: Members
Posts: 145
Joined: 13-November 04
Member No.: 5,449



QUOTE(ludo @ Dec 23 2005, 04:42 AM)
Rafm,

Is it possible to create an English-Japanese-English dictionary that works with zbedic? What would the issue be with character encoding? From what I understand, zbedic accepts only UTF-8, and it seems that a lot of Japanese text is encoded in EUC-JP. What do you think?

Thanks, Kurochka.

In the meantime I have been looking into the EPWING format, which seems to be dedicated to Japanese. But I am coming back to zbedic anyway.

I have no Linux machine right now. Do I need one to run mkbedic, or can I do it on the Z? Which Linux distribution should I choose?

As for Japanese, I have seen that there is a Japanese-English dictionary available for bedic. I am downloading it now and will install it soon to see how it works.

The Japanese-English dictionary for zbedic works all right, and the display is nice. Do you know how the Japanese is encoded there?

Ludo
*



* You need either Linux (gcc 3.x) or cygwin on Windows to compile mkbedic. It is also possible to cross-compile mkbedic for the Zaurus. I will consider including mkbedic for the Zaurus in a future release of zbedic.

* All bedic dictionaries must be in utf-8. You can use "iconv" under Linux to convert from whatever encoding to utf-8. I use emacs to edit utf-8 text files.
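
For anyone without iconv at hand, the same conversion can be sketched in Python. On the command line the equivalent would be along the lines of `iconv -f EUC-JP -t UTF-8 in.txt > out.txt`. The function name convert_to_utf8 and the file names are made up for this example, and EUC-JP as the source encoding is just the case raised earlier in the thread.

```python
def convert_to_utf8(src_path, dst_path, src_encoding="euc_jp"):
    """Re-encode a text file from `src_encoding` to UTF-8.

    newline="" keeps the original line endings untouched.
    Raises UnicodeDecodeError if the file is not valid `src_encoding`.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

The whole file is read into memory, which should be fine for typical dictionary sources but may need chunking for very large ones.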
rafm
post Jan 4 2006, 12:51 PM
Post #22





Group: Members
Posts: 145
Joined: 13-November 04
Member No.: 5,449



QUOTE(kurochka @ Dec 24 2005, 03:40 PM)
Just several comments to my earlier notes.
Emphasis Tag.  Emphasis {em} tag works.  The problem of it not showing was somehow connected to the size of the font I used.  I decreased the size of the font and the emphasis started to show.
*


Some fonts or font sizes may be missing an italic variant, which is what is used for the emphasis.

QUOTE
Showing Word Accent. I am still struggling with this. The proper way would be to use "combining diacritical marks" from the Unicode standard (see http://www.unicode.org/charts/PDF/U0300.pdf). 0301 looks like the one I would like to use (however, 02c8 or 02b9 could potentially also be used, I guess). However, it turns out that a lot of programs cannot combine these marks with other glyphs. Microsoft Word kind of works, but it is still a problem. I have not tested whether Zbedic supports this because I am having a hard time creating a test dictionary with these marks. In a perfect world, these combining diacritical marks would combine with the preceding glyph to form one character for display purposes. I would then list only the main character in char-precedence and put the diacritical mark 0301 into the ignore list. This way I could see the stress of all words (not just in the transcription), but the stress mark would be ignored for searching purposes. Does this make sense? Does anyone have any ideas or suggestions?


As far as I know, combining diacritical marks do not work under Qtopia.

QUOTE
{ph} tag? I have just noticed a {ph} tag.  I don't see a description of this tag other than that it is somehow used for "description" field.  What is it supposed to do?  What are the suggestions for its use?  If nobody knows for sure, I will try it and report back.


This tag does not / should not work. At some point I decided that this tag was a bad idea and removed it. If there is a trace of this tag somewhere in the documentation, I should remove it.

QUOTE
One other question. Will the old Linux bedic program (qbedic?) work with the files created by mkbedic (by ignoring the new features), or will it fail to open the new dictionaries? Again, I will try to test this. I fear that it will fail and we will have to wait for a zbedic port to Linux.


If you don't use any new features and your dictionaries are <2GB, files generated with mkbedic should work with qbedic. But I haven't checked this.

QUOTE
RAFM, do you have plans for zbedic to go in parallel with TEI XML (making it easy to convert between the two)?  Is TEI XML something worth learning? 


The FreeDict project has a script for converting TEI XML to bedic. TEI XML could be very useful if more software supported it.

QUOTE
I presume that there could be new tags in the future.  Is it possible for Zbedic to ignore all tags that it does not yet know (for example, {bla}, {/bla})?  This way I could put some placeholding tags into my dictionaries for the future and then one day hope that the new tags will be implemented.  In the meantime, the dictionaries could be used as is.  Mostly, these tags would deal with grammatical categories or wordforms, etc.


Currently zbedic will "ignore" unknown tags by displaying them as they are.
kurochka
post Jan 11 2006, 01:31 PM
Post #23





Group: Members
Posts: 303
Joined: 6-February 04
Member No.: 1,740



QUOTE(rafm @ Jan 4 2006, 12:51 PM)
This tag ({ph}) does not  / should not work.  Once I decided that this tag was a bad idea and I removed it. If there is a trace of this tag somewhere in the documentation, I should remove it.


A reference to this tag is in one of the examples in the bedic-format.txt file.

QUOTE
Currently zbedic will "ignore" unknown tags by displaying them as they are.
*

Understood. But this would look ugly. Is it hard to implement hiding unrecognized tags without taking any action?

By the way, I have ended up using a non-combining grave accent (can't remember the Unicode code point) as the word stress mark. Now I can even show word stress in the entry words. I have put the grave accent in the ignore-chars list.

By the way, I wonder how Wikipedia for zbedic works. In particular, I am interested in inserting pictures in the body of the translation field. How can it be done (HTML?)? Where should the pictures be? Will mkbedic compile this type of dictionary?

Thanks.
rafm
post Jan 15 2006, 10:26 AM
Post #24





Group: Members
Posts: 145
Joined: 13-November 04
Member No.: 5,449



QUOTE(kurochka @ Jan 11 2006, 10:31 PM)
QUOTE
Currently zbedic will "ignore" unknown tags by displaying them as they are.
*

Understood. But this would look ugly. Is it hard to implement hiding unrecognized tags without taking any action?


Is there any point in keeping information in a dictionary file that is never displayed? You can easily remove unwanted tags from a source file with an awk/perl script.
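
As a rough illustration of such a clean-up script (written in Python here rather than awk or perl), the sketch below removes the opening and closing forms of selected tags while leaving their content and all other tags in place. The function name strip_tags and the {bla} tag are placeholders taken from the example earlier in the thread, not real bedic tags.

```python
import re


def strip_tags(text, tag_names):
    """Remove {name}, {/name} and {name/} for every name in `tag_names`.

    Only the tag markers are removed; the text between them is kept.
    """
    if not tag_names:
        return text
    names = "|".join(re.escape(name) for name in tag_names)
    pattern = re.compile(r"\{/?(?:%s)/?\}" % names)
    return pattern.sub("", text)
```

For example, strip_tags(line, ["bla"]) applied to each line of the source file before running mkbedic would leave a file without the placeholder tags, while the tagged master copy stays untouched.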

Another important thing about zbedic parser: if parsing fails, zbedic will display the entry without any parsing.

QUOTE
By the way, I have ended up using a non-combining grave accent (can't remember the Unicode code point) as the word stress mark. Now I can even show word stress in the entry words. I have put the grave accent in the ignore-chars list.


If you give more details on how to get those accented characters working under Qtopia, more people could benefit from this. I could put it somewhere in the documentation.

QUOTE
By the way, I wonder how Wikipedia for zbedic works. In particular, I am interested in inserting pictures in the body of the translation field. How can it be done (HTML?)? Where should the pictures be? Will mkbedic compile this type of dictionary?
*


Wikipedia is formatted as HTML. mkbedic does not check syntax, so there is no problem if entries are HTML without any proper zbedic syntax. Images should theoretically be possible, but I have not checked where the files should be located.

I thought about an extension for efficient storage of images in zbedic dictionary files. For example, images could be wavelet compressed (DjVu compression?) with a lower bit depth, as the Zaurus screen can handle only 6 bits per r,g,b anyway. The question is whether this is worth the effort.
kurochka
post Apr 10 2006, 09:06 AM
Post #25





Group: Members
Posts: 303
Joined: 6-February 04
Member No.: 1,740



QUOTE(rafm @ Jan 15 2006, 10:26 AM)
I thought about an extension for efficient storage of images in zbedic dictionary files. For example, images could be wavelet compressed (DjVu compression?) with a lower bit depth, as the Zaurus screen can handle only 6 bits per r,g,b anyway. The question is whether this is worth the effort.



Having this functionality (images in the dictionaries themselves) would allow the creation of encyclopedias for zbedic. In addition, images could be used for inserting information in forms not supported by the bedic format (tables, etc.). The question is how hard it is to implement. Is it worth the effort?

One of these days I will gather my tips about creating bedic dictionaries (e.g., word stress, etc.) and put them in this thread.
ShiroiKuma
post Apr 10 2006, 10:20 AM
Post #26





Group: Members
Posts: 902
Joined: 22-May 04
Member No.: 3,385



kurochka, I'm trying to make a simplified dictionary too. However, I can't seem to compile mkbedic.

Could you send me or post your compiled mkbedic binary?
ludo
post Apr 11 2006, 06:19 PM
Post #27





Group: Members
Posts: 53
Joined: 9-December 05
Member No.: 8,688



Hi

I'm also interested in putting images in a dictionary. That would be a great thing to have. Worth the effort? I don't know how difficult it would be, but the result would be worth it for sure!

Ludo
kurochka
post Apr 12 2006, 01:15 PM
Post #28





Group: Members
Posts: 303
Joined: 6-February 04
Member No.: 1,740



QUOTE(ShiroiKuma @ Apr 10 2006, 10:20 AM)
kurochka, I'm trying to make a simplified dictionary too. However, I can't seem to compile mkbedic.

Could you send me or post your compiled mkbedic binary?
*


Will do. It may take a while, though. Did you mention in another thread that you may have Korean dictionaries? If so, could you share them with me?
keisangi
post Aug 28 2006, 10:40 AM
Post #29





Group: Members
Posts: 3
Joined: 20-October 04
Member No.: 5,112



QUOTE(kurochka @ Apr 12 2006, 09:15 PM)
QUOTE(ShiroiKuma @ Apr 10 2006, 10:20 AM)
kurochka, I'm trying to make a simplified dictionary too. However, I can't seem to compile mkbedic.

Could you send me or post your compiled mkbedic binary?
*


Will do. It may take a while, though. Did you mention in another thread that you may have Korean dictionaries? If so, could you share them with me?
*




I'm interested in the Korean dictionary files too.
I searched a lot but couldn't find anything for the Zaurus except a rather limited one,
here: http://dbrechalov.narod.ru/zaurus/koen.dic.dz

Could you share the Korean dictionary files, pretty please?
rafm
post Aug 28 2006, 10:54 AM
Post #30





Group: Members
Posts: 145
Joined: 13-November 04
Member No.: 5,449



QUOTE(kurochka @ Apr 10 2006, 06:06 PM)
Having this functionality (images in the dictionaries themselves) would allow the creation of encyclopedias for zbedic. In addition, images could be used for inserting information in forms not supported by the bedic format (tables, etc.). The question is how hard it is to implement. Is it worth the effort?
*


Tables can be put in dictionaries using HTML syntax. It works fine unless the table is too large for the small screen.

zbedic can handle images starting from v1.1. Both .jpg and .png are supported. To get better compression, one can experiment with reducing the bit depth and scaling. The Linux 'convert' command from ImageMagick can be useful for these operations.
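
To show what reducing the bit depth means numerically, here is a toy sketch that works on plain (r, g, b) tuples rather than on image files. It is an illustration only: real images would go through a tool such as ImageMagick, and the function name reduce_bit_depth and the default of 6 bits (the figure quoted earlier in this thread for the Zaurus screen) are assumptions of this example.

```python
def reduce_bit_depth(pixels, bits=6):
    """Quantize 8-bit-per-channel pixels down to `bits` bits per channel.

    `pixels` is an iterable of (r, g, b) tuples with values 0..255.
    The low-order bits are zeroed, so values stay in the 0..255 range
    but only 2**bits distinct levels per channel remain.
    """
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    shift = 8 - bits
    return [tuple((channel >> shift) << shift for channel in px) for px in pixels]
```

Fewer distinct levels per channel generally give a compressor less to encode, which is why experimenting with bit depth can shrink the files.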
