OESF Portables Forum
Software => Topic started by: kurochka on November 17, 2005, 02:59:00 pm
-
I have spent the last couple of days reading all the documents about the ZBedic format (the old one and the new simplified one), the man page for mkbedic, and the example dictionary file. Here is what I did:
1. I compiled mkbedic.
2. I processed the example dic file with mkbedic (mkbedic example.dic dictionary.dic).
3. I installed the new dictionary on the Zaurus, into the directory where my other dictionaries are.
4. I used the "search for dictionaries" function of ZBedic (alternatively, I wrote the path to dictionary.dic into the zbedic conf file).
Nothing happened. For some reason, ZBedic could not recognize the resulting file. I know I didn't compress the new dic, but I understand that it is not necessary. Then, just in case, I processed the dictionary.dic (the file resulting from mkbedic) with xerox (although I understand that mkbedic is a replacement for xerox); this did not work either.
Could somebody walk me through the process and explain what I did wrong? Please either use the example.dic or a very simple dic file, like:
id= Dictionary
Word
{s}{ss}meaning{/ss}{/s}
Give me examples, please.
Thanks in advance
-
I am afraid that it may be necessary to compress the dictionary. At some point I removed the ".dic" extension from the MIME types, since some people had complained that zbedic was finding too many system files with the ".dic" extension. Now it can recognize only ".dic.dz" files. It may still read ".dic" files if you add ".dic" back to the MIME types, but I wouldn't bet on that.
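The safest route, then, is to dictzip the output of mkbedic before copying it to the Zaurus. Something along these lines should give you a file zbedic will pick up (just a sketch, assuming dictzip is installed; see its man page):
mkbedic example.dic dictionary.dic
dictzip dictionary.dic    # produces dictionary.dic.dz, which zbedic recognizes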
-
I think I am making progress (but very slow). Here is another problem that I am facing.
I need to make a ZBedic dictionary (this is still a test dictionary, to figure out the inner workings). I've prepared the text .dic file (size 3.7 MB). The mkbedic command runs without any errors, but the resulting dictionary is exactly 0 bytes.
I've re-read the man page for mkbedic and it says that the command cannot process "very large files." When I make the .dic files smaller (just a couple of pages), then everything works and zbedic can access the dictionary.
My intention is to make a large dictionary (tens of thousands of entries, about 40 MB or more in txt format) after I figure out how everything works. So, I need to come up with a solution.
Does this mean I have to use xerox? If so, it will be tough for me. It took me a while to figure out the simplified format, and I am not sure I can manage the original format. Can I use a text editor to prepare the original-format dictionary file? If so, how do I enter a zero byte, etc.?
Can it be that the problem is not the size but something else? But there is no error. What is meant by the "very large files"?
-
Ok, one more question for those in the know.
I need to put the pronunciation (or transcription) of entry words in IPA (International Phonetic Alphabet); see http://en.wikipedia.org/wiki/IPA and http://www.phon.ucl.ac.uk/home/wells/ipa-unicode.htm
There is a tag for pronunciation, {pr} {/pr}. However, I think that none of the Unicode fonts for the Z have IPA in them, right? I can't even find a font for Windows with IPA.
Does somebody know whether there is a font for the Z that would work on VGA screens and include IPA? I haven't tested; maybe the unifont for ZBedic includes IPA. Does anybody know for sure?
-
I should have been more precise: "very large files" means >2GB, so a few-megabyte dictionary should work fine.
If mkbedic does not show any error and you still get a 0-byte file, this may be a bug. If possible, please send me the file at rafm at users.sourceforge.net, so I can check what goes wrong.
-
Thanks, rafm. That is good news. None of my dictionaries will exceed 2GB (I guess that only matters for Wikipedia and other such big projects).
The original problem may not actually be connected to mkbedic. The files showed 0 bytes at first; when I looked at them again after a while, they were of normal size. It's weird, but now everything works -- if the size shows 0, I just wait and it fixes itself.
I will keep this thread going by asking other questions and making suggestions and comments:
1. Emphasis Tag. I've noticed that the {em} tag is hardly visible on the screen. I understand it makes letters bold, but somehow they are almost indistinguishable from normal letters. I wonder if {em} could instead make the tagged text a different color to make it stand out. I am using this tag for showing accent in words. BTW, if full-text search is implemented, will the {de} tags within a word break the search? Are there any other useful tags that could work for showing word accent (stress)?
2. Ways to Show Word Accent/Stress. As indicated earlier, I use {em} tags to show word stress in the translation portion. This seems somewhat awkward, but it works there. How can I show word accent for the keywords without ruining the search mechanism? One solution that might work (I am not sure, though) -- I could probably use the Unicode stress symbol (I think there is a special symbol in Unicode for word accent) and put it in the ignore-char list. Will this work? For example, if the keyword is "a'rmy" (showing stress on the A), will a search for "army" locate the keyword if I use the approach described above?
3. Category Tag. The format description document indicates that each sense or subsense can have zero or one category-tagged text. I have noticed that this is not enforced by zbedic, and a sense or subsense can have zero, one, or more than one category-tagged text. This is great news, as words/meanings can belong to multiple categories (e.g., medicine and chemistry at the same time). So, please leave this as it is. I think the official format definition should also be amended to allow zero or any number of categories.
4. Part of Speech Tag. The format description states that each sense can have zero or one part-of-speech-tagged text {ps}. This is enforced by zbedic: if a sense has more than one {ps}, then only the first one is shown and the others disappear. This makes sense (pun intended). However, I just want to note that when converting from other dictionary formats it is too burdensome (it probably has to be done manually) to convert the entries that have multiple part-of-speech tags in one sense. The solution for me was to just use {de} instead of {ps}, because a sense or subsense can have any number of {de}'s. I have noticed that the Mueller English-Russian dictionary uses the {ps} tag to put a Roman numeral on each sense (e.g., "a I" then a line and "a II" in the translation window). I think I will also use it for this purpose.
5. Strict Order of Opening/Closing Tags. Some dictionary formats (e.g., DSL for Lingvo) allow any order of tags as long as the closing tag follows the corresponding opening tag (e.g., [ex][cl]any word[/ex][/cl] in DSL). Zbedic enforces the order of closing tags depending on the order of opening tags -- the outer (inner) opening tag must have a corresponding outer (inner) closing tag (e.g., {ex}{de}Text{/de}{/ex} and not {ex}{de}Text{/ex}{/de}). This is just a note for others (some of my entries did not work because of this). I think it makes sense to enforce the order.
6. Pronunciation Tag. I know that a lot of dictionaries use IPA (International Phonetic Alphabet) for the pronunciation/transcription of words. So far, I have not seen a font that supports IPA for the Zaurus. Therefore, when converting a dictionary I just deleted the pronunciation portion, which is a shame. Maslovsky, do you know of any fonts that have both IPA and Cyrillic in them?
7. Use of Senses and Subsenses. When converting dictionaries, I went the easier route of keeping the hardcoded separation of senses and subsenses (no tags, just the text "1." "2." and "1)" "2)"). I just put the whole thing into one sense and subsense. The better way would be to replace it with the Zbedic separation into {s} and {ss}, but it works anyway. Does anybody see a problem with this approach?
8. Conversion Process. Since I do not know any programming language, I just used find and replace (including regular expressions) to convert dictionaries in other formats into the Zbedic format. I know that there are some scripts available, but they are specific to the format from which the conversion takes place (Wikipedia, Mueller). I would appreciate it if people would share their scripts with the community here or at the Zbedic SF site. Maybe I could adapt them for my use.
9. Making the Source Files for Dictionaries Available. I know that a .dic.dz can be opened and modified, but I think it would be more accessible for those who want to learn the format and/or modify the text of the dictionaries if regular text .dic files were made available on the SF site (with the new mkbedic, the source files are plain text).
10. New Line Break Tag. Looking at the example.dic, I see that there is a new tag available for line breaks, {br/}. I guess this would be the only tag that does not have/need a corresponding closing tag.
11. Just a Sense Without Subsenses? I don't know why I have not tried it yet, but I wonder if there can be an entry with just a simple sense (e.g., {s}meaning{/s}) without subsenses? There are lots of words that require a simple one- or two-word translation, and the {ss} tag seems redundant.
-
1. Emphasis Tag. I've noticed that the {em} tag is hardly visible on the screen. I wonder if {em} could instead make the tagged text a different color to make it stand out.
I know that there is a problem with the SL5500 -- its HTML widget does not handle colors well. I will take a look at why there is not much difference on the SL-C series.
2. Ways to Show Word Accent/Stress. How can I show word accent for the keywords without ruining the search mechanism? I could probably use the Unicode stress symbol and put it in the ignore-char list. Will this work?
This should work. But wouldn't it be better to include stress in the pronunciation?
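If you do try the ignore-char route, a minimal sketch of the source would be a header property listing the stress mark among the ignored characters (I believe the property is called search-ignore-chars, but please double-check bedic-format.txt):
id=Stress test
search-ignore-chars=-.'
a'rmy
{s}{ss}a large body of soldiers{/ss}{/s}
Searching for "army" should then still find the entry, while the displayed headword keeps the stress mark.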
3. Category Tag. I think the official format definition should also be amended to allow zero or any number of categories.
OK. I will update the specification.
4. Part of Speech Tag. If a sense has more than one {ps}, then only the first one is shown and the others disappear. The solution for me was to just use {de} instead of {ps}. I have noticed that the Mueller English-Russian dictionary uses the {ps} tag to put a Roman numeral on each sense (e.g., "a I" then a line and "a II" in the translation window).
This is specific to zbedic -- each entry should be unique, otherwise search does not work. Currently it would be too much work to change it.
You should take a look at http://www.freedict.org/en/ -- they store dictionaries in XML and have scripts to convert from XML to multiple dictionary formats, including bedic. The scripts can handle merging multiple parts of speech into one entry with multiple "senses". You could contribute your dictionary to this project.
5. Strict Order of Opening/Closing Tags. Zbedic enforces the order of closing tags depending on the order of opening tags (e.g., {ex}{de}Text{/de}{/ex} and not {ex}{de}Text{/ex}{/de}).
zbedic has a very simple parser, which may fail if the syntax is wrong. mkbedic should perform a syntax check in the future.
7. Use of Senses and Subsenses. I went the easier route of keeping the hardcoded separation of senses and subsenses and just put the whole thing into one sense and subsense. Does anybody see a problem with this approach?
There should be no problem, but using {ss} tags is recommended.
8. Conversion Process. I would appreciate it if people would share their conversion scripts with the community.
9. Making the Source Files for Dictionaries Available.
freedict has perhaps the right set of tools and can store a "source" version of the dictionary.
10. New Line Break Tag. I guess {br/} would be the only tag that does not have/need a corresponding closing tag.
Yes.
11. Just a Sense Without Subsenses? I wonder if there can be an entry with just a simple sense (e.g., {s}meaning{/s}) without subsenses?
A single sense without subsenses should work.
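For example, an entry along these lines (an untested sketch) should display as a single sense with no subsense numbering:
army
{s}a large body of soldiers{/s}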
-
Thanks for your responses. I am not complaining. I think zbedic is one of the greatest programs for Zaurus and you are kind enough to keep it going.
-
Here is another quirk that I have noticed.
If {ct}-tagged text appears within an example ({ex}), then that {ct} text is displayed on the first line next to the keyword (similar to how {ps} text is displayed), instead of in the body of the example.
If there are several examples and each contains {ct}-tagged text, only the last {ct} text will be displayed next to the keyword; the others will not be displayed at all. The solution is to move the {ct} outside of {ex}.
-
Category, {ct}, is usually associated with {ss}. It tells you that the particular meaning of a word is only used, for example, in mathematics.
-
I understand.
With the dictionary that I am working to convert, some examples (which are phrases containing the keyword) belong to a different {ct} category from the {ss} meaning (which may have no specific {ct}). It's as if the phrase given as an example ({ex}) is only used in politics, while the word itself is in general use.
Well, there is no need for modifications to Zbedic; I will just modify the source of the dictionaries. This is something to be mindful of when making dic files.
-
I've tested it now, and it does not make sense. As indicated earlier in this thread, {ct} does not behave like {ps} anywhere except within an example. As I said earlier, one {ss} can have multiple {ct}. Therefore, I think the behaviour of {ct} within an example is anomalous. Here is an example from my upcoming English-Ukrainian dictionary:
misadventure 1
{s}{ss}{ps}n{/ps}
нещ{em}а{/em}стя, нещ{em}а{/em}сний в{em}и{/em}падок
{ex}homicide by misadventure - {ct}юр.{/ct} ненавм{em}и{/em}сне вб{em}и{/em}вство{/ex}{/ss}{/s}
misadventure 2
{s}{ss}{ps}n{/ps}
нещ{em}а{/em}стя, нещ{em}а{/em}сний в{em}и{/em}падок
{ct}юр.{/ct} {ex}homicide by misadventure - ненавм{em}и{/em}сне вб{em}и{/em}вство{/ex}{/ss}{/s}
I could not attach a compiled version for some reason. But if you compile it, you will see that in "misadventure 2" {ct} is displayed in the body of the subsense, as I think it should be. In "misadventure 1" {ct} is displayed the same way as {ps}, which makes the entry erroneous, because only the example belongs to the legal ("юр." means legal) category, while the word by itself is in general use. The only difference between "misadventure 1" and "misadventure 2" is the position of {ct}: in 1 it's within the example; in 2 it's outside the example. Therefore, I advocate modifying the way {ct} within an example is displayed, but leaving {ct} outside an example as it is.
-
I have converted English-Ukrainian and Ukrainian-English dictionaries from the Lingvo format to the zbedic format. Although the Lingvo files for these dictionaries are freely available, I could not trace what sort of license is attached to them. If somebody needs them, send me a PM.
-
Hello kurochka and rafm,
This is a very interesting thread. Though I am far from your level of knowledge and couldn't pretend to create a dictionary, I have a few questions as well:
- I want to add entries to an existing dictionary: can I do it, and how? I am not asking for a detailed step-by-step answer, just hints to get started. I would like to add entries to the Chinese-English / English-Chinese dictionary, and would also be interested in adding pinyin pronunciation for the Chinese.
- Then I would like to make a mini specialized Japanese-English-Japanese dictionary of woodworking and wood-related terms. Where do I start?
Thanks for any help. I am with you in promoting ZBedic.
kurochka, I am making good progress with your C3000!
Ludo from Taiwan
-
I can already answer some of my own questions.
I am reporting my findings here in case anyone is interested; I'm sure most of you know this already, but there may be newbies (as I am) who find it useful.
First of all, the ZBedic home page:
http://bedic.sourceforge.net/index.html
To make a dictionary:
Very good explanations are given here, with a step-by-step guide at the end of the page:
http://cvs.sourceforge.net/viewcvs.py/bedic/libbedic/doc/bedic-format.txt?rev=1.5&view=markup
The feature to edit an existing dictionary will be developed in the near future.
See news:
http://bedic.sourceforge.net/index.html#news
Now I need to find out how to create a Japanese dictionary, which seems to be trickier.
Any help from anyone?
Ludo
-
Hi, ludo!!! So glad you are enjoying your 3000!
RAFM made it much easier to create new dictionaries with the mkbedic program.
First, you need to create a dictionary source file in any text editor (it should be plain UTF-8 text; I am not sure how well Japanese will work in practice, though) using the Zbedic format (the link you provided above). Please see this example dictionary:
http://cvs.sourceforge.net/viewcvs.py/bedic/libbedic/doc/example.dic?rev=1.2&view=markup
The file may need to contain a char-precedence entry (it is not necessary under some circumstances); however, with kanji I do not see how that would be possible. Please see an example of char-precedence here:
http://cvs.sourceforge.net/viewcvs.py/bedic/libbedic/doc/char-precedence_wikipedia.dic?rev=1.1&view=markup
Second, you will need to compile the mkbedic program on your Linux system; see the man page for mkbedic on how to use it. mkbedic will tell you about important errors in your dic source file, but note that it may miss some problems that you will only discover when entries are not displayed correctly (missing closing tags, etc.).
Third, after you run mkbedic on your dictionary source file, you will need to dictzip it; see http://www.die.net/doc/linux/man/man1/dictzip.1.html for more info. You will need to install dictzip on your system. The resulting file will be a .dic.dz.
I will add more details later.
I also believe that there is a way to unpack the old .dic.dz files (they used a different form of compression) and then modify the contents of the file to add new entries. I have never done it, though; I always work with my source files and make any changes there.
-
Rafm,
Is it possible to create an English-Japanese-English dictionary that works with zbedic? What would be the issue with character encoding? From what I understand, ZBEDic accepts only UTF-8 encoding, and it seems that a lot of Japanese is encoded in EUC-JP. What do you think?
Thanks, Kurochka.
In the meantime I have been investigating the EPWING format, which seems dedicated to Japanese. But I come back to zbedic anyway.
I have no Linux machine now; do I need one to run mkbedic, or can I do it on the Z? What Linux distribution should I choose?
As for Japanese, I have seen there is a Japanese-English dictionary available for bedic. I am downloading it now and will install it soon to see how it works.
The Japanese-English dictionary for ZBEDic works all right, and the display is nice. Do you know how the Japanese is encoded there?
Ludo
-
I don't know much about Japanese. I know that Zten, Zdic and some other dictionary programs (including the Zaurus built-in dictionary) based on the EPWING format are popular among Japanese learners. Moreover, I understand that there are a bunch of dictionaries available in those formats.
I am not sure about the encoding of the Japanese-English zbedic dictionary. I suspect it is Unicode of some sort (UTF-16?). You may actually try different encodings on a test dictionary and see whether it displays properly using a font that supports that encoding.
mkbedic will compile and run only under Linux, as I understand it (there are some projects that try to port Linux programs to the Windows API, but I have not tried those). I have never compiled anything on the Zaurus, but I've heard it is possible. Given that you need to jump through hoops to install the development environment, and that the Zaurus is much slower than a desktop, you are better off compiling dictionaries on a desktop. If your dictionary is not too big, you could write the txt file on the Zaurus first and then mkbedic and dictzip it on a desktop.
You could look at knoppix.net for a simple LiveCD Linux that you don't have to install on your Windows computer: you just boot the computer from the CD, and when you are done using Linux you are back in Windows without any modifications. I hope that Knoppix has all the necessary libraries and programs to compile mkbedic; it must.
I use Suse Linux 10 and Windows XP on the same system (dual boot). All Linux flavors can be downloaded for free and burned to a CD or DVD.
-
Just a few comments on my earlier notes.
Emphasis Tag. The emphasis {em} tag works. The problem of it not showing was somehow connected to the size of the font I used; I decreased the font size and the emphasis started to show.
Showing Word Accent. I am still struggling with this. The proper way would be to use "combining diacritical marks" from the Unicode standard (see http://www.unicode.org/charts/PDF/U0300.pdf). U+0301 looks like the one I would like to use (although U+02C8 or U+02B9 could potentially also be used, I guess). However, it turns out that a lot of programs cannot combine these marks with other glyphs; Microsoft Word kind of works, but it is still a problem. I have not tested whether Zbedic supports this, because I have a hard time creating a test dictionary with these marks. In a perfect world, these combining diacritical marks would combine with the preceding glyph to form one character for display purposes. I would then list only the main character in the char-precedence while putting the diacritical mark U+0301 into the ignore list. This way I could see the stress of all words (not just in the transcription), but the word stress would be ignored for searching purposes. Does this make sense? Does anyone have ideas or suggestions?
{ph} tag? I have just noticed a {ph} tag. I don't see a description of this tag other than that it is somehow used for the "description" field. What is it supposed to do? What are the suggestions for its use? If nobody knows for sure, I will try it and report back.
One other question. Will the old Linux bedic program (qbedic?) work with the files created by mkbedic (by ignoring the new features), or will it fail to open the new dictionaries? Again, I will try to test this. I fear that it will fail and we will have to wait for a zbedic port to Linux.
RAFM, do you have plans for zbedic to go in parallel with TEI XML (making it easy to convert between the two)? Is TEI XML something worth learning?
I presume that there could be new tags in the future. Is it possible for Zbedic to ignore all tags that it does not yet know (for example, {bla}, {/bla})? This way I could put some placeholder tags into my dictionaries now and hope that the new tags will be implemented one day; in the meantime, the dictionaries could be used as is. Mostly, these tags would deal with grammatical categories, wordforms, etc.
Otherwise, I am moving along and getting better with using zbedic format.
-
A couple of suggestions:
1. Internal Comments. I think it would be great to have a tag to comment out text in the dic file. This way we could put comments or not-yet-implemented text in the dictionaries without it showing.
2. Another tag could be for comments that can be displayed or hidden depending on the user's choice (similar to how the pronunciation tag works).
These are just ideas.
-
* You need either Linux (gcc 3.x) or cygwin on Windows to compile mkbedic. It is also possible to cross-compile mkbedic for the Zaurus. I will consider including mkbedic for the Zaurus in a future release of zbedic.
* All bedic dictionaries must be in utf-8. You can use "iconv" under Linux to convert from whatever encoding to utf-8. I use emacs to edit utf-8 text files.
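For example, if your source happens to be in EUC-JP (the file names here are just placeholders):
iconv -f EUC-JP -t UTF-8 jedict-euc.txt > jedict.dic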
-
Emphasis Tag. The problem of it not showing was somehow connected to the size of the font I used; I decreased the font size and the emphasis started to show.
Some fonts or font sizes may be missing an italic, which is used for the emphasis.
Showing Word Accent. The proper way would be to use "combining diacritical marks" from the Unicode standard (U+0301). I have not tested whether Zbedic supports this.
As far as I know, combining diacritical marks do not work under Qtopia.
{ph} tag? I have just noticed a {ph} tag. What is it supposed to do?
This tag does not / should not work. At some point I decided that it was a bad idea and removed it. If there is a trace of this tag somewhere in the documentation, I should remove it.
Will the old Linux bedic program (qbedic?) work with the files created by mkbedic, or will it fail to open the new dictionaries?
If you don't use any new features and your dictionaries are <2GB, files generated with mkbedic should work with qbedic. But I haven't checked this.
RAFM, do you have plans for zbedic to go in parallel with TEI XML (making it easy to convert between the two)? Is TEI XML something worth learning?
The freedict project has a script for converting TEI XML -> bedic. TEI XML could be very useful if there were more software supporting it.
Is it possible for Zbedic to ignore all tags that it does not yet know (for example, {bla}, {/bla})?
Currently zbedic will "ignore" unknown tags by displaying them as they are.
-
If there is a trace of this tag somewhere in the documentation, I should remove it.
Reference to this tag is in one of the examples in the bedic-format.txt file.
Currently zbedic will "ignore" unknown tags by displaying them as they are.
Understood. But this would look ugly. Is it hard to implement hiding unrecognized tags without taking any action?
By the way, I have ended up using a non-combining grave accent (can't remember the Unicode code point) as the word-stress mark. Now I can even have word accent on the entry words. I have put the grave accent in the ignore-chars list.
Also, I wonder how the Wikipedia for zbedic works. In particular, I am interested in inserting pictures in the body of the translation field. How can it be done (HTML?)? Where should the pictures be? Will mkbedic compile this type of dictionary?
Thanks.
-
Understood. But this would look ugly. Is it hard to implement hiding unrecognized tags without taking any action?
Is there a point in keeping information in a dictionary file that is never displayed? You can easily remove unwanted tags from the source file with an awk/perl script.
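For example, to strip a placeholder tag like {bla} from the source before running mkbedic, a one-liner along these lines would do (a sketch using sed; {bla} is just the hypothetical tag mentioned above):
sed -e 's/{bla}//g' -e 's|{/bla}||g' source.dic > source-clean.dic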
Another important thing about the zbedic parser: if parsing fails, zbedic will display the entry without any parsing.
I have ended up using a non-combining grave accent as the word-stress mark and have put it in the ignore-chars list.
If you give more details on how you got those accented characters working under Qtopia, more people could benefit from it; I could put it somewhere in the documentation.
I wonder how the Wikipedia for zbedic works. In particular, I am interested in inserting pictures in the body of the translation field. How can it be done (HTML?)? Where should the pictures be?
Wikipedia is formatted as HTML. mkbedic does not check syntax, so there is no problem if entries are HTML without any proper zbedic syntax. Images should be theoretically possible, but I have not checked where the files should be located.
I thought about an extension for efficient storage of images in zbedic dictionary files. For example, images could be wavelet compressed (DjVu compression? http://www.lizardtech.com/download/) with a lower bit depth, as the Zaurus screen can only handle 6 bits per R/G/B channel anyway. The question is whether this is worth the effort.
-
Having this functionality (images in the dictionaries themselves) would allow the creation of encyclopedias for zbedic. In addition, images could be used for inserting information in forms not supported by the bedic format (tables, etc.). The question is: how hard is it to implement? Is it worth the effort?
One of these days I will gather my tips about creating bedic dictionaries (e.g., word stress, etc.) and put them in this thread.
-
kurochka, I'm trying to make a simplified dictionary too. However I can't seem to compile the mkbedic.
Could you send me or post your compiled mkbedic binary?
-
Hi
I'm also interested in putting images in a dictionary. That would be a great thing to do. Worth the effort? I don't know how difficult it would be, but it would certainly be worth the result!
Ludo
-
kurochka, I'm trying to make a simplified dictionary too. However I can't seem to compile the mkbedic.
Could you send me or post your compiled mkbedic binary?
Will do. It may take a while, though. Did you indicate in another thread that you may have Korean dics? If so, could you share them with me?
-
I'm interested in the Korean dictionary files too. I searched a lot but couldn't find anything for the Zaurus, except a rather limited one, here:
http://dbrechalov.narod.ru/zaurus/koen.dic.dz
Can you share the Korean dictionary files? Pretty please.
-
Having images in the dictionaries themselves would allow the creation of encyclopedias for zbedic. In addition, images could be used for inserting information in forms not supported by the bedic format (tables, etc.).
Tables can be put in dictionaries using HTML syntax. It works fine unless the table is too large for the small screen.
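For example, an entry body can be plain HTML (a rough sketch, not tested in this exact form; keep the table narrow):
mouse
A small rodent.<br><table border=1><tr><td>singular</td><td>plural</td></tr><tr><td>mouse</td><td>mice</td></tr></table>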
zbedic can handle images starting from v1.1. They can be both .jpg and .png. To get better compression, one can experiment with reducing the bit depth and scaling; the Linux 'convert' command from ImageMagick can be useful for these operations.
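For example (a sketch; tune the numbers to your images):
convert picture.png -resize 50% -colors 64 picture-small.png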
-
I am trying to make text within an entry a different color, using the following HTML code:
TEXT
It does not work on the Z (but it does work on desktop Linux). Can somebody suggest what is wrong? Does the HTML widget used by zbedic support font color?
Thanks
-
Does it work if you do this instead?
TEXT
-
Thank you for the suggestion. I didn't think that should make a difference but it did.
Hex colors work!
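For anyone else trying this, the kind of markup in question is a <font> tag with a hex value inside the entry body, roughly like this (a sketch, not the exact code from above):
{s}{ss}normal text <font color="#ff0000">highlighted text</font>{/ss}{/s}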