I should have been more precise: by "very large files" I mean files over 2 GB, so a dictionary of a few megabytes should work fine.
If mkbedic does not show any errors and you still get a 0-byte file, this may be a bug. If possible, please send me the file at rafm at users.sourceforge.net so I can check what goes wrong.
Thanks, rafm, that's good news. None of my dictionaries will exceed 2 GB (I guess the limit only matters for Wikipedia and other big projects like it).
The original problem may not actually be connected to mkbedic. The files were 0 bytes at first, but when I looked at them again after a while, they were of normal size. It's weird, but now everything works: if the size is 0, I just have to wait and it fixes itself.
I will keep this thread going by asking other questions and making suggestions and comments:
1.
Emphasis Tag. I've noticed that the {em} tag is hard to see on screen. I understand that it makes letters bold, but somehow they are almost indistinguishable from normal letters. I wonder if {em} could instead display the tagged text in a different color so it stands out. I am using this tag to show stress in words. BTW, if full-text search is implemented, will the {de} tags within a word break the search? Are there any other tags that would work for showing word accent (stress)?
2.
Ways to Show Word Accent/Stress. As mentioned above, I use {em} tags to show word stress in the translation portion. This seems somewhat awkward, but it works there. How can I show word stress in the keywords without breaking the search mechanism? One solution that might work (I am not sure, though): I could use the Unicode stress symbol (I think Unicode has a special symbol for word accent) and put it in the ignore char list. Will this work? For example, if the keyword is "a'rmy" (showing stress on the A), will a search for "army" find the keyword if I use this approach?
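To illustrate the question, here is a minimal sketch of how a search that honors an ignore-char list would treat a stressed keyword. It assumes the search simply deletes the ignored characters from both the query and the keyword before comparing them; I don't know that zbedic works exactly this way, and the IGNORE_CHARS set and both function names are hypothetical.

```python
# Hypothetical ignore-char list: the ASCII apostrophe used in "a'rmy" and
# U+0301 COMBINING ACUTE ACCENT, one common way to mark stress in Unicode text.
IGNORE_CHARS = frozenset({"'", "\u0301"})


def search_key(word: str) -> str:
    """Fold a word for comparison: drop ignored characters, then lowercase."""
    return "".join(ch for ch in word if ch not in IGNORE_CHARS).lower()


def find_keyword(query: str, keywords: list) -> list:
    """Return every keyword whose folded form equals the folded query."""
    target = search_key(query)
    return [kw for kw in keywords if search_key(kw) == target]


keywords = ["a'rmy", "arm", "a\u0301rmy"]
print(find_keyword("army", keywords))  # both stressed spellings match
```

If the real search works along these lines, the stress mark only has to be in the ignore list; the keyword itself can keep it for display.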
3.
Category Tag. The format description document says that each sense or subsense can have zero or one category-tagged text. I have noticed that zbedic does not enforce this: a sense or subsense can have zero, one, or more than one category-tagged text. This is great news, because words and meanings can belong to multiple categories (e.g., medicine and chemistry at the same time). So, please leave this as it is. I think the official format definition should also be amended to allow any number of categories, including zero.
4.
Part of Speech Tag. The format description states that each sense can have zero or one part-of-speech-tagged text {ps}. This is enforced by zbedic: if a sense has more than one {ps}, only the first one is shown and the others disappear. This makes sense (pun intended). However, I want to note that when converting from other dictionary formats, it is too burdensome (it would probably have to be done manually) to convert entries that have multiple part-of-speech tags in one sense. My solution was to use {de} instead of {ps}, because a sense or subsense can have any number of {de}'s. I have noticed that the Mueller English-Russian dictionary uses the {ps} tag to put a Roman numeral on each sense (e.g., "a I", then a line, then "a II" in the translation window). I think I will also use it for this purpose.
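The {ps}-to-{de} workaround can also be scripted. A minimal sketch of one variant: keep the first {ps} in a sense (the one zbedic would show anyway) and rewrite any later ones as {de}. It assumes closing tags are spelled {/ps} and {/de} and that the input string holds a single sense; demote_extra_ps is a hypothetical helper name.

```python
import re

# Matches one {ps}...{/ps} element; the {/ps} closing spelling is an assumption.
PS_TAG = re.compile(r"\{ps\}(.*?)\{/ps\}", re.DOTALL)


def demote_extra_ps(sense: str) -> str:
    """Keep the first {ps} in a sense and rewrite every later one as {de}."""
    seen_first = False

    def replace(match: re.Match) -> str:
        nonlocal seen_first
        if not seen_first:
            seen_first = True
            return match.group(0)  # leave the first part of speech untouched
        return "{de}" + match.group(1) + "{/de}"

    return PS_TAG.sub(replace, sense)


print(demote_extra_ps("{ps}n{/ps} noun sense {ps}v{/ps} verb sense"))
```

Converting every {ps} to {de}, as described above, would be an even simpler single substitution.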
5.
Strict Order of Opening/Closing Tags. Some dictionary formats (e.g., DSL for Lingvo) allow tags to close in any order, as long as each closing tag follows its opening tag (e.g., [ex][cl] any word[/ex][/cl] in DSL). Zbedic requires closing tags to mirror the order of the opening tags: the inner tag must be closed before the outer one (e.g., {ex}{de}Text{/de}{/ex}, not {ex}{de}Text{/ex}{/de}). This is just a note for others (some of my entries did not work because of this). I think it makes sense to enforce the order.
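A small stack-based checker can catch out-of-order closing tags before a file goes through mkbedic. This is a sketch under a few assumptions: closing tags are written {/tag}, empty tags such as {br/} end with a slash, and tag names are lowercase letters. first_nesting_error is a hypothetical helper, not part of mkbedic.

```python
import re
from typing import Optional

# Matches {tag}, {/tag} and empty {tag/} markers (lowercase names assumed).
TAG = re.compile(r"\{(/?)([a-z]+)(/?)\}")


def first_nesting_error(text: str) -> Optional[str]:
    """Return a description of the first nesting problem, or None if tags nest properly."""
    stack = []  # names of currently open tags, innermost last
    for m in TAG.finditer(text):
        is_close, name, is_empty = m.group(1) == "/", m.group(2), m.group(3) == "/"
        if is_empty:
            continue  # e.g. {br/} has no partner
        if not is_close:
            stack.append(name)
            continue
        if not stack:
            return f"{{/{name}}} at offset {m.start()} has no opening tag"
        if stack[-1] != name:
            return f"{{/{name}}} at offset {m.start()} closes {{{stack[-1]}}} out of order"
        stack.pop()
    if stack:
        return f"{{{stack[-1]}}} is never closed"
    return None


print(first_nesting_error("{ex}{de}Text{/ex}{/de}"))
```

A stack is the natural structure here because properly nested tags must close in exactly the reverse order in which they opened.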
6.
Pronunciation Tag. I know that a lot of dictionaries use the IPA (International Phonetic Alphabet) for the pronunciation/transcription of words. So far, I have not seen a font for the Zaurus that supports IPA, so when converting a dictionary I just deleted the pronunciation portion, which is a shame. Maslovsky, do you know of any fonts that include both IPA and Cyrillic?
7.
Use of Senses and Subsenses. When converting dictionaries, I took the easier route of keeping the hardcoded separation of senses and subsenses (no tags, just the text "1.", "2." and "1)", "2)") and put the whole thing into one sense and one subsense. A better way would be to replace this numbering with Zbedic's own separation into {s} and {ss}, but it works anyway. Does anybody see a problem with this approach?
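For the "better way", the numbering can be turned into tags mechanically. A minimal sketch, assuming senses are numbered "1.", "2." and subsenses "1)", "2)" (each followed by a space), and that the target structure is {s}{ss}...{/ss}{/s} with every subsense inside its sense. The patterns and the wrap_senses name are hypothetical.

```python
import re

# Hypothetical numbering patterns: "1. " for senses, "1) " for subsenses,
# at the start of the text or after whitespace.
SENSE_NUMBER = re.compile(r"(?:^|\s+)\d+\.\s+")
SUBSENSE_NUMBER = re.compile(r"(?:^|\s+)\d+\)\s+")


def split_numbered(text: str, pattern: re.Pattern) -> list:
    """Split text at numbering markers, dropping the markers and empty pieces."""
    return [piece.strip() for piece in pattern.split(text) if piece.strip()]


def wrap_senses(translation: str) -> str:
    """Rewrite hardcoded "1." / "1)" numbering as {s} and {ss} tags."""
    out = []
    for sense in split_numbered(translation, SENSE_NUMBER):
        subsenses = split_numbered(sense, SUBSENSE_NUMBER)
        out.append("{s}" + "".join("{ss}" + ss + "{/ss}" for ss in subsenses) + "{/s}")
    return "".join(out)


print(wrap_senses("1. 1) army 2) troops 2. crowd"))
```

Text with no numbering ends up as a single sense with a single subsense, which matches the current one-sense approach. A sentence like "in 1990. Then" would be split wrongly, so the output still needs a look.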
8.
Conversion Process. Since I do not know any programming language, I just used find and replace (including regular expressions) to convert dictionaries from other formats into the Zbedic format. I know there are some scripts available, but they are specific to the source format (Wikipedia, Mueller). I would appreciate it if people would share their scripts with the community here or on the Zbedic SF site. Maybe I could adapt them for my use.
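A find-and-replace session like the one described can be saved as a script, so it can be re-run and shared. A minimal sketch: an ordered list of regex rules applied to the whole text. The DSL-to-Zbedic mappings below ([b] to {em}, [p] to {ps}) are hypothetical examples, not an official correspondence.

```python
import re

# Hypothetical rule list mirroring a manual find-and-replace session:
# each (pattern, replacement) pair is applied to the whole text, in order.
RULES = [
    (r"\[ex\](.*?)\[/ex\]", r"{ex}\1{/ex}"),  # DSL example -> {ex}
    (r"\[b\](.*?)\[/b\]", r"{em}\1{/em}"),    # DSL bold -> {em} (assumed mapping)
    (r"\[p\](.*?)\[/p\]", r"{ps}\1{/ps}"),    # DSL label -> {ps} (assumed mapping)
]


def convert(text: str, rules=RULES) -> str:
    """Apply every (pattern, replacement) rule to the text, one after another."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text, flags=re.DOTALL)
    return text


print(convert("[p]n[/p] [b]a[/b]rmy [ex]the army[/ex]"))
```

Because each rule matches an opening tag with its own closing tag, overlapping DSL tags such as [ex][cl]...[/ex][/cl] come out mis-nested and still have to be fixed by hand.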
9.
Making the Source Files for Dictionaries Available. I know that dic.dz files can be opened and modified, but I think it would be more accessible for those who want to learn the format and/or modify the dictionaries if regular text .dic files were available on the SF site (with the new mkbedic, the source files are pure text).
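In the meantime, getting the text out of a dic.dz file should be easy. Assuming the file is in dictzip format (an ordinary gzip file with an extra header field), Python's standard gzip module can decompress it back to plain text. A minimal sketch; unpack_dz is a hypothetical helper name and the paths are placeholders.

```python
import gzip
import shutil


def unpack_dz(src: str, dst: str) -> None:
    """Decompress a gzip-compatible .dz file (dictzip is gzip plus an extra header field)."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream in chunks; big files need not fit in memory


# Example call (placeholder file names):
# unpack_dz("mydict.dic.dz", "mydict.dic")
```

Compressing the edited file back into a seekable .dic.dz would still need dictzip or mkbedic itself; plain gzip output is not guaranteed to have the random-access index.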
10.
New Line Break Tag. Looking at example.dic, I see that there is a new tag available for line breaks, {br/}. I guess this is the only tag that does not have or need a matching closing tag.
11.
Just A Sense Without Subsenses? I don't know why I haven't tried it yet, but I wonder whether an entry can have just a simple sense (e.g., {s}meaning{/s}) without subsenses. Lots of words need only a simple one- or two-word translation, and the {ss} tag seems redundant there.