Author Topic: New Japanese Dictionaries For Zbedic  (Read 18870 times)

koan

  • Sr. Member
  • ****
  • Posts: 370
    • View Profile
    • http://www.lyndonhill.com
New Japanese Dictionaries For Zbedic
« on: July 27, 2007, 08:14:43 am »
I have converted EDICT to bedic format so that it can be used with zbedic. The Nihongo-philes among you will probably say "that's not new". It's true, other people have converted EDICT before me and those conversions have been satisfactory until now, but for several reasons I thought I could improve upon them. Mine are new because:
  • I'm using a newer version of EDICT. The last ja-en dictionary on the zbedic page was several years old and EDICT is updated regularly. They even released a new version today so already my conversion is old! I convert using a script so I can easily download it again and re-run the conversion process.
  • I added a version where the keywords are kana. I don't know about you, but I don't know all the kanji yet. I also made sure the kana is included in the pronunciation of the kanji keyword version.
  • My scripts mark up all the information such as part of speech or category and don't miss it out. I deal with duplicated entries properly (which are otherwise lost).
  • Unfortunately (in the sense of fair play) I have to recommend you de-install the previous conversions by other people as they do not acknowledge EDICT properly as required by its distribution licence.
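To make the kanji-keyword and kana-keyword variants concrete, here is a minimal sketch of parsing one EDICT line and choosing the keyword. It is not the conversion script described in this post. It assumes the classic EDICT line layout `KANJI [KANA] /gloss/gloss/` (or `KANA /gloss/` for kana-only entries); the names `parse_line`, `kana_key` and `kanji_key` are made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed EDICT line layout: "KANJI [KANA] /gloss 1/gloss 2/" or "KANA /gloss 1/".
EDICT_LINE = re.compile(
    r"^(?P<head>\S+)\s+(?:\[(?P<reading>[^\]]+)\]\s+)?/(?P<glosses>.*)/\s*$"
)


class Entry(NamedTuple):
    kanji: Optional[str]  # None when the entry is written in kana only
    kana: str
    glosses: list


def parse_line(line: str) -> Optional[Entry]:
    """Split one EDICT line into kanji, kana and glosses (None if it does not match)."""
    m = EDICT_LINE.match(line)
    if m is None:
        return None
    glosses = [g for g in m.group("glosses").split("/") if g]
    if m.group("reading"):
        return Entry(m.group("head"), m.group("reading"), glosses)
    # No [reading] part: the head itself is the kana form.
    return Entry(None, m.group("head"), glosses)


def kana_key(entry: Entry) -> str:
    """Keyword for the kana-keyword dictionary."""
    return entry.kana


def kanji_key(entry: Entry) -> str:
    """Keyword for the kanji-keyword dictionary; falls back to kana."""
    return entry.kanji or entry.kana
```

With this split, the kana is always available to put into the pronunciation field of the kanji-keyword version.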
I don't know how the previously available en-ja dictionary was created; it looks like it could have been made manually due to the strange formatting and it only covers about 50% of the entries in EDICT. However, it is created by reverse look up (like my version) which is decidedly dodgy. I've tried to make up some sensible rules to group together entries that should be listed together, like "run" with "to run", "good" with "to be good" etc etc.
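The reverse look-up idea can be sketched roughly like this. The grouping rules below (drop EDICT-style parenthesised markers, then a leading "to be " or "to ") are hypothetical stand-ins for the real rules, and `english_key` and `invert` are illustrative names only.

```python
import re
from collections import defaultdict

# Hypothetical normalisation rules, not the actual ones used for the en-ja file:
# strip "(...)" markers such as "(v5r,vi)", then a leading "to be " or "to ".
MARKER = re.compile(r"\([^)]*\)\s*")
LEADING = re.compile(r"^(?:to be |to )")


def english_key(gloss: str) -> str:
    """Reduce a gloss to the English keyword it should be filed under."""
    text = MARKER.sub("", gloss).strip().lower()
    return LEADING.sub("", text)


def invert(entries):
    """entries: iterable of (japanese, [gloss, ...]) pairs.

    Returns {english keyword: [(japanese, original gloss), ...]}.
    """
    index = defaultdict(list)
    for japanese, glosses in entries:
        for gloss in glosses:
            key = english_key(gloss)
            if key:  # glosses that were only markers, e.g. "(P)", are skipped
                index[key].append((japanese, gloss))
    return dict(index)
```

Under these rules "to run" and "run" land under the same keyword, as do "good" and "to be good". Whether a given rule groups too much or too little can only be judged against real data.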

I'm using Japanese every day so as time passes I hope to improve the conversion (particularly en-ja) as experience prompts me to make improvements.

You can find the new files and some details here.
Screenshots: ja-en Kanji, ja-en Kana, en-ja.

I have a Shepherd with 1 GB SD card so I just slip these files (~3 MB each) onto the SD card, no problems.

Regarding the aforementioned duplicated entries problem, bedic also has a problem with repeated headwords. I will be making my scripts to fix such problems available shortly, in the hope of encouraging other people to put together useful dictionaries and glossaries.

Comments welcome!
« Last Edit: July 27, 2007, 08:20:21 am by koan »
Zocalo Feed Reader : Thai on Zaurus : Dictionaries for zbedic : Sharp ROM package feed
HELUX Handheld Embedded Linux Blog
SL-C3200 Multiboot : SL-C750  Sharp ROM

kurochka

  • Sr. Member
  • ****
  • Posts: 301
    • View Profile
New Japanese Dictionaries For Zbedic
« Reply #1 on: July 27, 2007, 09:02:29 am »
Thank you, Koan!!!

I am studying Japanese on and off.  I've got a good EPWING ja-en dictionary but I am having a problem with en-ja since the pricey EPWING dictionaries do not include kana.

I looked at the previous EDICT versions and was not satisfied.  I'll check out yours.

Will you open your conversion script?

What sort of problem with headwords are you referring to?  Lots of dictionary software does not allow two or more identical headwords to be used.  In zbedic, headwords can turn out to be the "same" in many cases where some characters are in the ignore-char list.

I also wish that zbedic would support ~ in the text of the translation (by substituting the ~ with the headword).  This way the size of the dictionaries could be reduced further.
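As a rough illustration of the substitution being asked for, the sketch below expands `~` to the headword. The function name `expand_tilde` is hypothetical and this is not something zbedic is known to do; it only shows the intended behaviour.

```python
def expand_tilde(headword: str, text: str) -> str:
    """Replace every "~" in the translation text with the entry's headword."""
    return text.replace("~", headword)


# Example: the stored text stays short, the displayed text is complete.
print(expand_tilde("run", "~ away; ~ into"))
```

Any size benefit would only appear if the viewer did this at display time; expanding at conversion time would put the full headword back into the file.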
« Last Edit: July 27, 2007, 09:03:46 am by kurochka »
SL-C3100 (from PriceJapan.com): modified Sharp Rom (couldn't make Japanese input work in Cacko Rom)

ex-SL-C3000; ex-SL-5600; ex-Simpad

Frederic Bergeron

  • Full Member
  • ***
  • Posts: 150
    • View Profile
New Japanese Dictionaries For Zbedic
« Reply #2 on: July 27, 2007, 10:55:34 am »
I've just installed the dictionaries.  They look good so far.  I can see that the data has been updated and that's very appreciated. Thanks for your good work.
toMOTko Flashcard Project
SL-C1000 Sharp ROM

koan

  • Sr. Member
  • ****
  • Posts: 370
    • View Profile
    • http://www.lyndonhill.com
New Japanese Dictionaries For Zbedic
« Reply #3 on: July 28, 2007, 09:32:00 am »
Quote
What sort of problem with headwords are you referring to?  Lots of dictionary software does not allow two or more identical headwords to be used.  In zbedic, headwords can turn out to be the "same" in many cases where some characters are in the ignore-char list.

It's nothing to do with ignore-char; headwords are the part marked up with {hw} and {/hw}.

If you have an entry with 3 senses and the middle sense has a headword, the last sense is rendered by zbedic with the middle sense's headword even though none is set. e.g.
Code:
entry
{s}
this is sense 1
{/s}
{s}
{hw}new headword{/hw}
this is sense 2
{/s}
{s}
this is sense 3
{/s}

the result looks like

Quote
entry
----
sense 1
----
/new headword/
sense 2
----
/new headword/
sense 3

This situation can easily arise when you have a dictionary with duplicate entries, e.g.
Code:
entry
{s}
sense 1
{/s}
{s}
{hw}new headword{/hw}
sense 2
{/s}

... (other entries)

entry
{s}
sense 3
{/s}

I have a script to detect and join up these "duplicate entries" but the headword is repeated as above.
Does this make sense? I think it is a bug in zbedic. Anyway, I wrote another script that fixes the above problem by shuffling entries.
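A minimal sketch of the two steps, joining duplicate entries and then working around the repeated headword, could look like the following. It is not the actual script: it assumes entries are held as `(keyword, [sense text, ...])` pairs, and the reordering rule (senses without `{hw}` first) is just one possible way to "shuffle" so that no plain sense follows a sense with a headword.

```python
from collections import OrderedDict


def merge_duplicates(entries):
    """Join entries that share a keyword; senses keep their first-seen order."""
    merged = OrderedDict()
    for keyword, senses in entries:
        merged.setdefault(keyword, []).extend(senses)
    return list(merged.items())


def has_headword(sense: str) -> bool:
    return "{hw}" in sense


def is_affected(senses) -> bool:
    """True if a sense without {hw} comes after a sense that has one."""
    seen_headword = False
    for sense in senses:
        if has_headword(sense):
            seen_headword = True
        elif seen_headword:
            return True
    return False


def reorder(senses):
    """Stable reorder: plain senses first, senses carrying {hw} last."""
    plain = [s for s in senses if not has_headword(s)]
    with_hw = [s for s in senses if has_headword(s)]
    return plain + with_hw
```

Applied to the example above, merging gives "entry" three senses with the headword in the middle, and `reorder` moves "sense 3" ahead of "sense 2" so the headword is no longer carried over.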

This will all be explained on the bedic tips web page I am writing.
« Last Edit: August 14, 2007, 05:01:00 am by koan »

koan

  • Sr. Member
  • ****
  • Posts: 370
    • View Profile
    • http://www.lyndonhill.com
New Japanese Dictionaries For Zbedic
« Reply #4 on: August 05, 2007, 02:55:35 am »
As promised, my tips and scripts for making bedic format dictionaries can now be found: here


Quote
Will you open your conversion script?

Not at this time.

thanks!

lucho

  • Jr. Member
  • **
  • Posts: 57
    • View Profile
New Japanese Dictionaries For Zbedic
« Reply #5 on: August 07, 2007, 10:56:39 pm »
Quote
I also wish that zbedic would support ~ in the text of the translation (by substituting the ~ with the headword).  This way the size of the dictionaries could be reduced further.

I am not sure that's true. zlib probably takes care of headword repetitions. Anyway, zbedic supports the hw tag, which in some cases could take less space than the headword itself.
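One way to check this rather than guess is to compress the same text both ways and compare. The sketch below uses Python's `zlib` module on synthetic entries; the entry template and the `word0000`-style headwords are invented for the test, so the numbers only indicate the kind of comparison to make, not what a real dictionary would show.

```python
import zlib


def build(words, use_tilde: bool) -> bytes:
    """Build a toy dictionary text, repeating either the headword or "~"."""
    lines = []
    for w in words:
        ref = "~" if use_tilde else w
        lines.append(f"{w}\n  {ref} up; {ref} down; to {ref} away\n")
    return "".join(lines).encode("utf-8")


def sizes(words):
    """Raw and zlib-compressed sizes for both variants."""
    full = build(words, use_tilde=False)
    tilde = build(words, use_tilde=True)
    return {
        "full_raw": len(full),
        "tilde_raw": len(tilde),
        "full_zlib": len(zlib.compress(full, 9)),
        "tilde_zlib": len(zlib.compress(tilde, 9)),
    }


if __name__ == "__main__":
    words = [f"word{i:04d}" for i in range(500)]  # synthetic headwords
    for name, size in sizes(words).items():
        print(f"{name}: {size} bytes")
```

Comparing the gap between the two raw sizes with the gap between the two compressed sizes shows how much of the `~` saving survives compression.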

koan

  • Sr. Member
  • ****
  • Posts: 370
    • View Profile
    • http://www.lyndonhill.com
New Japanese Dictionaries For Zbedic
« Reply #6 on: August 07, 2007, 11:11:56 pm »
Quote
I also wish that zbedic would support ~ in the text of the translation (by substituting the ~ with the headword).  This way the size of the dictionaries could be reduced further.

I missed this part until lucho's post.

Doesn't the {hw/} tag accomplish this ? See the examples section in bedic-format.txt and check out sense 3. Or is this not actually implemented in zbedic itself ?

rolf

  • Full Member
  • ***
  • Posts: 105
    • View Profile
    • http://home.arcor.de/leggewie/
New Japanese Dictionaries For Zbedic
« Reply #7 on: August 13, 2007, 08:34:12 pm »
Thanks, koan, for your work.  I hope you consider joining the gakusei community.  FOSS is all about focal points and critical mass.

koan

  • Sr. Member
  • ****
  • Posts: 370
    • View Profile
    • http://www.lyndonhill.com
New Japanese Dictionaries For Zbedic
« Reply #8 on: December 16, 2007, 09:05:24 am »
It's been some months since I uploaded these dictionary files so here's a little Christmas gift: I've updated to today's version of EDICT. The latest version of my files is 1.2.

You can find the new files in the usual place here. Comments and suggestions still welcome, especially now as I'm starting to think about my plans for next year.

snk4ever

  • Jr. Member
  • **
  • Posts: 52
    • View Profile
New Japanese Dictionaries For Zbedic
« Reply #9 on: December 16, 2007, 04:00:36 pm »
Quote from: koan
It's been some months since I uploaded these dictionary files so here's a little Christmas gift: I've updated to today's version of EDICT. The latest version of my files is 1.2.

You can find the new files in the usual place here. Comments and suggestions still welcome, especially now as I'm starting to think about my plans for next year.
I have some questions: is it possible to have Japanese/Korean/Chinese input methods at the same time on a Zaurus?
Do you know of a Chinese dictionary that would be free?

koan

  • Sr. Member
  • ****
  • Posts: 370
    • View Profile
    • http://www.lyndonhill.com
New Japanese Dictionaries For Zbedic
« Reply #10 on: December 16, 2007, 07:34:41 pm »
Quote from: snk4ever
I have some questions: is it possible to have Japanese/Korean/Chinese input methods at the same time on a Zaurus?
Yes, no problem.

Quote
Do you know of a Chinese dictionary that would be free?

It would be better to start a new thread to ask that question.

Take a look here: http://bedic.sourceforge.net/dict-list-keyword-lang.html

koan

  • Sr. Member
  • ****
  • Posts: 370
    • View Profile
    • http://www.lyndonhill.com
New Japanese Dictionaries For Zbedic
« Reply #11 on: October 12, 2009, 04:39:47 pm »
I just made an update to these dictionary files; you can find them on my website at the usual place.

I updated my conversion scripts to handle some new parts of speech and categories
in EDICT. I also wrote a mega script that downloads EDICT, decompresses it,
converts the coding, calls all 3 conversion scripts, runs my other scripts to fix
duplications and broken headwords and finally converts from simple bedic
to final bedic output.
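For anyone building a similar pipeline, the "decompress and convert the coding" steps might look like the sketch below. It assumes the downloaded file is gzip-compressed and encoded as EUC-JP, which is not stated in this post, and the function names are hypothetical. The download and the conversion scripts themselves are left out.

```python
import gzip


def decode_edict(gz_bytes: bytes, encoding: str = "euc_jp") -> str:
    """Decompress a gzipped EDICT file and decode it to text.

    The default encoding is an assumption; pass another codec name if the
    source file uses a different one.
    """
    return gzip.decompress(gz_bytes).decode(encoding)


def to_utf8(gz_bytes: bytes) -> bytes:
    """Return the dictionary re-encoded as UTF-8, ready for later stages."""
    return decode_edict(gz_bytes).encode("utf-8")
```

Keeping each stage as a small function makes it easy to chain them in one driver script and re-run the whole conversion whenever EDICT is updated.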

This is version 1.3; the last version was almost 2 years ago so there are many
thousands of new entries. It's definitely worth updating.

koan

  • Sr. Member
  • ****
  • Posts: 370
    • View Profile
    • http://www.lyndonhill.com
New Japanese Dictionaries For Zbedic
« Reply #12 on: June 19, 2010, 05:55:55 pm »
It's update time. Version 1.4 of my conversion of EDICT is now available from my website, freshly converted last night. Several thousand words have been added since I made version 1.3. By the way, my version numbers are something I made up so you can tell if you have the latest file. EDICT is updated much more regularly.

I noticed in my server logs the other day that there were some attempts to download older versions, e.g. 1.2. I have to delete old versions because I have limited space on my server and these files are quite large. Please make sure you update your links or link to my dictionaries page instead.

koan

  • Sr. Member
  • ****
  • Posts: 370
    • View Profile
    • http://www.lyndonhill.com
New Japanese Dictionaries For Zbedic
« Reply #13 on: July 26, 2011, 05:02:46 pm »
Seeing a recent post about using Zaurus as a Japanese dictionary reminded me to update my EDICT conversion. Version 1.5 is available for downloading; there are more than 10,000 new entries since version 1.4.

Yes, I said 10,000:

Japanese-English (Kanji keywords)
version 1.4: 170,827 items
version 1.5: 184,404 items

These counts are as reported by zbedic: select the dictionary and clear the input field to get the dictionary info.
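The difference between the two counts can be checked with a couple of lines. The item counts are the ones quoted above for the kanji-keyword dictionary; the `added` helper is just for illustration.

```python
# Item counts for the Japanese-English (Kanji keywords) dictionary, as quoted above.
COUNTS = {"1.4": 170_827, "1.5": 184_404}


def added(old: str, new: str) -> int:
    """Number of items gained between two versions."""
    return COUNTS[new] - COUNTS[old]


print(added("1.4", "1.5"))  # 13577
```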

koan

  • Sr. Member
  • ****
  • Posts: 370
    • View Profile
    • http://www.lyndonhill.com
New Japanese Dictionaries For Zbedic
« Reply #14 on: November 10, 2013, 06:58:05 am »
I'm continuing to update my conversion of EDICT for bedic format and I just uploaded version 1.8 to my website.

For comparison to previous versions, 1.8 has 216,014 Kanji keywords. That's over 31,000 definitions added since my last post in this thread.

I was wondering if EDICT was converging to the point of having almost all of modern Japanese covered. Something to bear in mind is that there are a number of brand names and expressions in EDICT but mostly it remains fairly high quality.
« Last Edit: November 10, 2013, 06:58:29 am by koan »