Korean-English Dictionary (ldic)
koan
post Sep 18 2006, 01:56 AM
Post #16





Group: Members
Posts: 328
Joined: 25-February 04
From: UK
Member No.: 2,025



QUOTE(paka @ Aug 31 2006, 06:38 AM)
So I have an offer, if anyone can get the dictionary files from this site (they are for a program that is open source with code on the site):
http://kldp.net/frs/?group_id=73
into zbedic format, I will pay them $100 for their time....just a little incentive to maybe contribute something to the community.  I'd do it myself, but my C skills are kind of lacking....
Anyone who thinks they would be willing to take on the project, PM and we can work out the details...


It really isn't that difficult.

I have converted several Thai dictionaries to zbedic format and am in the process of converting some more. I write a Perl script to convert from the original format to the zbedic basic format and then use the programs supplied with bedic to turn it into a dictionary file.

I use Perl to extract the different parts of each word definition and generate the output.

However, the difficult bit is that you need to know something about the language in order to make a decent quality conversion.
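As a rough illustration of that extract-and-emit step, here is a minimal Python sketch (koan's actual scripts were Perl and are not shown in this thread). The input layout `headword [part-of-speech] definition` is purely hypothetical, not the real ldic format, and the output is just a list of (headword, description) pairs; the exact input layout the bedic tools expect is not reproduced here and would need to be taken from the bedic documentation.

```python
import re

# Hypothetical source layout (NOT the real ldic format):
#   headword [part-of-speech] definition
ENTRY_RE = re.compile(r"^(?P<head>\S+)\s+\[(?P<pos>[^\]]+)\]\s+(?P<defn>.+)$")


def parse_line(line):
    """Split one source line into (headword, pos, definition), or return None."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return match.group("head"), match.group("pos"), match.group("defn")


def convert(lines):
    """Turn source lines into (headword, description) pairs, sorted by headword."""
    entries = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that the extraction rules don't recognise
        head, pos, defn = parsed
        entries.append((head, f"({pos}) {defn}"))
    entries.sort(key=lambda entry: entry[0])
    return entries


if __name__ == "__main__":
    sample = ["사과 [n] apple", "not an entry", "먹다 [v] to eat"]
    for head, desc in convert(sample):
        print(f"{head}\t{desc}")
```

Unrecognised lines are skipped rather than guessed at, which matches the point above: the rules only get good once someone who knows the language has checked them.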
 
koan
post Sep 18 2006, 02:10 AM
Post #17

OK, I had a quick look at the dictionary file and the ldic source. It doesn't look difficult to extract the definitions from the dictionary. However, the original dictionary file looks like it might be in markup, and I can't quite tell whether the encoding is UTF-8 or something Korean-specific.

I think it would be quite easy to hack together a short program to load the dictionary and spew out bedic simple format, but you'd need to know more about the character encoding in order to complete the conversion.
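One quick way to narrow down the encoding question is to try decoding the raw bytes with a few candidate codecs and see which one succeeds. The Python sketch below assumes the candidates are UTF-8, EUC-KR and CP949 (the usual Korean encodings). A successful decode only means the bytes are valid in that encoding, not that it is the right one, and pure-ASCII data will always report UTF-8, so the result still needs a human check. The file name in the usage comment is hypothetical.

```python
import sys

# Candidate encodings, tried in order. UTF-8 goes first because a
# successful UTF-8 decode of non-ASCII data is unlikely to be a fluke.
# CP949 is a superset of EUC-KR, so it is tried last.
CANDIDATES = ("utf-8", "euc_kr", "cp949")


def guess_encoding(data: bytes) -> str:
    """Return the first candidate codec that decodes `data` without errors."""
    for encoding in CANDIDATES:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return "unknown"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python guess_encoding.py dictionary.dat  (hypothetical file name)
    with open(sys.argv[1], "rb") as handle:
        print(guess_encoding(handle.read()))
```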
 
Guest_ttkman_*
post Sep 18 2006, 04:53 AM
Post #18





Guests






QUOTE(koan @ Sep 18 2006, 02:10 AM)
OK, I had a quick look at the dictionary file and the ldic source. It doesn't look difficult to extract the definitions from the dictionary. However, the original dictionary file looks like it might be in markup, and I can't quite tell whether the encoding is UTF-8 or something Korean-specific.

I think it would be quite easy to hack together a short program to load the dictionary and spew out bedic simple format, but you'd need to know more about the character encoding in order to complete the conversion.


Dear koan,

As you might have read earlier in the whole thread, we know how to make that dictionary. The problem is just extracting it from the ldic source, and that is problematic mainly because those of us interested in this dictionary lack C knowledge. So if you could hack something together, please do :D We would be very thankful.

BTW, I made a zbedic dictionary from that PDF I wrote about. It's not perfect yet, but quite usable; if someone wants it, send me a PM. I also have "other" Korean-English dictionaries, which I just got from a nice Korean guy. But all together they are quite big, and as I don't know whether they are copyright-protected or not, I don't intend to share them officially. So if someone is interested, send me a PM too and we will work out a way.

greetings
Thomas

BTW: I will be in Japan from Wednesday this week, and I don't know when I will be able to get online, so please be patient.
 
koan
post Sep 19 2006, 04:56 AM
Post #19

QUOTE(ttkman @ Sep 18 2006, 04:53 AM)
As you might have read earlier in the whole thread, we know how to make that dictionary. The problem is just extracting it from the ldic source, and that is problematic mainly because those of us interested in this dictionary lack C knowledge. So if you could hack something together, please do :D We would be very thankful.


I managed to compile the ldic program, although I got a bit confused because I don't have a Korean font installed (I couldn't see any output). Apart from that it looks OK; it has a very basic GUI.

I'm thinking that extracting the dictionary info should be straightforward but putting it in a sensible bedic file might be tricky because I only know 2 phrases in Korean.

koan
 
Guest_ttkman_*
post Sep 19 2006, 05:29 AM
Post #20

If we are able to extract the whole dictionary out of the ldic program, we will have a basic structure, and then we should be able to use a bash or Perl script to convert it. But really, my C skills are sooo bad... I would be glad if you could just review the ldic code and change it to send the whole data to stdout. I don't know whether you have time to do this, or whether your skills are good enough. Perhaps I will try it myself sometime, but right now I am busy learning Korean and Japanese, so I don't really have time to focus on that.

So, people, please do something... :D

thomas
 
koan
post Sep 20 2006, 12:39 AM
Post #21

QUOTE(ttkman @ Sep 19 2006, 05:29 AM)
So, people, please do something... :D


I'll have a go but it's not top priority for me - I'm converting 3 Thai dictionaries at the moment.

By the way, how big is the "small" dictionary already available from the bedic site?

koan
 
koan
post Sep 30 2006, 12:59 PM
Post #22

Hi guys

Here is a screenshot of the current status:

[Screenshot]

Can someone tell me whether it's anywhere near correct?
This is a quick attempt at parsing the file, so it doesn't utilise all the bedic features; hence some strange things like "2. 1." etc.

thanks

koan
 
kurochka
post Oct 2 2006, 10:14 AM
Post #23





Group: Members
Posts: 303
Joined: 6-February 04
Member No.: 1,740



ttkman,

Where can I find the dictionaries that you mentioned earlier?
 
koan
post Oct 16 2006, 02:02 PM
Post #24

Check the screenshot in my previous post; it has been updated.
(Work in Progress)

cheers

koan
 
kurochka
post Oct 17 2006, 12:22 PM
Post #25

QUOTE(koan @ Sep 30 2006, 12:59 PM)
Hi guys

Here is a screenshot of the current status:



Can someone tell me whether it's anywhere near correct?
This is a quick attempt at parsing the file, so it doesn't utilise all the bedic features; hence some strange things like "2. 1." etc.

thanks

koan


Well, the characters are definitely Korean and it looks good. However, I can't tell you whether those characters make correct Korean words :)
 
coklat
post Oct 19 2006, 03:24 AM
Post #26





Group: Members
Posts: 11
Joined: 11-March 06
From: Brisbane, Australia
Member No.: 9,346



QUOTE(koan @ Oct 1 2006, 06:59 AM)
Hi guys

Here is a screenshot of the current status:



Can someone tell me whether it's anywhere near correct?
This is a quick attempt at parsing the file, so it doesn't utilise all the bedic features; hence some strange things like "2. 1." etc.

thanks

koan



Well, my Korean housemate said that it is correct :) Where can I download the file? I would like to try it. Thanks :)
 
koan
post Oct 24 2006, 10:17 AM
Post #27

QUOTE(coklat @ Oct 19 2006, 03:24 AM)
Where can I download the file? I would like to try it. Thanks :)


There are still many issues to fix that are not visible in the screenshot.

I am trying to sensibly separate the different sub-senses, parts of speech, categories etc. by developing a set of rules for the script that does the conversion. Also, I am trying to make the best conversion between the original format and bedic format.
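To give an idea of what one such rule might look like, the Python sketch below splits a definition into numbered sub-senses and renumbers them. It assumes, hypothetically, that the source marks senses inline as `1. ... 2. ...`; a numeral followed by a dot and whitespace only counts as a marker at the start of the text or after whitespace, so something like `3.5 cm` is left alone. Nested markers such as the "2. 1." mentioned earlier would need further rules on top of this one.

```python
import re

# Assumed sense marker: digits, a dot, then whitespace, appearing either
# at the very start of the text or after whitespace.
SENSE_MARKER = re.compile(r"(?:^|\s+)\d+\.\s+")


def split_senses(definition):
    """Split '1. a 2. b' into ['a', 'b']; unnumbered text becomes one sense."""
    parts = SENSE_MARKER.split(definition.strip())
    return [part.strip() for part in parts if part.strip()]


def format_senses(senses):
    """Renumber the senses consistently, one per line."""
    return "\n".join(f"{number}. {sense}" for number, sense in enumerate(senses, 1))


if __name__ == "__main__":
    print(format_senses(split_senses("1. apple 2. apple tree")))
```

Keeping each rule as a small, separately testable function makes it easier to add exceptions one at a time as a Korean speaker points out mistakes.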

Please understand: I think it is better to do a good job of the conversion than to upload a half-baked mess that gets distributed widely. Do it once, properly, and everyone can use a good-quality dictionary.

It may take a little bit of time but the wait will be worth it.

koan
 
kurochka
post Oct 24 2006, 11:54 AM
Post #28

QUOTE(koan @ Oct 24 2006, 10:17 AM)
There are still many issues to fix that are not visible in the screenshot.

I am trying to sensibly separate the different sub-senses, parts of speech, categories etc. by developing a set of rules for the script that does the conversion. Also, I am trying to make the best conversion between the original format and bedic format.

Please understand: I think it is better to do a good job of the conversion than to upload a half-baked mess that gets distributed widely. Do it once, properly, and everyone can use a good-quality dictionary.

It may take a little bit of time but the wait will be worth it.

koan


That's the best approach. Good luck.

Do you think your script could be useful for converting other formats into zbedic format?
 
koan
post Jan 2 2007, 06:34 AM
Post #29

Hi

paka and I managed to finish the conversion of these dictionary files.

If you are interested in downloading, please go to my Zaurus Dictionaries Page.

thanks

koan