All times are UTC


PostPosted: Tue 01 Apr 2014 4:25 pm

Joined: Thu 26 Dec 2013 3:21 pm
Posts: 155
I was wondering if there is anything comparable in Irish to the General Service List for English. I know Buntús Gaeilge (published in the early 60s, I believe, and the forerunner to Buntús Cainte) contained information on word frequency, but I was hoping that research had been distilled into a list of roughly the 2,000 most common words. Thanks for taking the time to read!


PostPosted: Tue 01 Apr 2014 8:43 pm

Joined: Sat 01 Jun 2013 8:46 pm
Posts: 466
There is the New Corpus for Ireland, which has thousands of entries, and you could make a list of the top 2,000 from it, although I'm not sure if that's exactly what you're looking for?

The frequency is taken from texts with a total of 30 million words. It will most likely be different from the 2,000 most common spoken words.
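
For anyone who wants to try this on text they have to hand, here is a minimal sketch of building a top-N word list by counting. It assumes you already have the text as a plain string; the function name `top_words` and the tokenising regex are illustrative choices, not anything the corpus itself provides. Lowercasing is a simplification.

```python
import re
from collections import Counter

def top_words(text, limit=2000):
    """Return the `limit` most frequent words in `text` as (word, count) pairs.

    Assumption: a "word" is a run of letters, optionally joined by a
    hyphen or apostrophe. Real corpus tooling will tokenise differently.
    """
    tokens = re.findall(r"[^\W\d_]+(?:[-'’][^\W\d_]+)*", text.lower())
    return Counter(tokens).most_common(limit)

# Tiny illustrative sample, not corpus data.
sample = "Tá an lá go maith agus tá an oíche go maith"
print(top_words(sample, limit=4))
```

On written text this gives written-language frequencies, so the caveat about spoken words still applies.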

I looked for such a list a few years ago and couldn't find anything.

_________________
Bíonn rudaí maithe mall


PostPosted: Sun 06 Apr 2014 8:29 pm

Joined: Thu 26 Dec 2013 3:21 pm
Posts: 155
Go raibh maith agat (thank you). I think I am going to work on combining that with the vocabulary from BC.


PostPosted: Tue 08 Apr 2014 11:22 am

Joined: Mon 25 Feb 2013 12:44 pm
Posts: 80
Here's my list of the most common n-grams in Irish based on their frequency:

https://docs.google.com/spreadsheet/ccc ... sp=sharing

If you filter out every n-gram whose length isn't 1, you're left with single words only, which is effectively a General Service List for Irish!

This list is based on the largest available Irish corpus, which has 30 million words.
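
A rough sketch of that filtering step, in case you export the sheet and want to do it in code. The row shape (n-gram text plus a frequency) and the sample values are assumptions for illustration; the actual spreadsheet columns may differ.

```python
# Hypothetical rows standing in for the spreadsheet: (n-gram, frequency).
# The values are made up for illustration only.
rows = [
    ("a", 900),
    ("an", 800),
    ("agus", 700),
    ("go raibh", 300),
    ("tá", 650),
    ("go raibh maith agat", 50),
]

def ngram_length(ngram):
    """Number of whitespace-separated tokens in the n-gram."""
    return len(ngram.split())

def single_words(rows, limit=2000):
    """Keep only 1-grams, most frequent first, capped at `limit` entries."""
    unigrams = [(text, freq) for text, freq in rows if ngram_length(text) == 1]
    unigrams.sort(key=lambda pair: pair[1], reverse=True)
    return unigrams[:limit]

word_list = single_words(rows)
print(word_list)
```

Setting `limit=2000` matches the size asked about at the top of the thread.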

I also have a list of 70,000 Irish sentences with their translations:

https://docs.google.com/spreadsheets/d/ ... sp=sharing

These were extracted from focloir.ie.

I'm interested in creating a tutorial for Irish as outlined here:

viewtopic.php?f=28&t=2949

Please contact me if you feel you can help. My Irish is poor...


PostPosted: Fri 16 Jan 2015 2:00 am

Joined: Fri 16 Jan 2015 1:43 am
Posts: 3
Barra79, this is extraordinarily important work you have done. I wonder if you could contact me as I'm interested in this project and I may well be able to help you with your tutorial?


PostPosted: Fri 16 Jan 2015 1:40 pm

Joined: Wed 19 Dec 2012 3:58 pm
Posts: 488
How is the score column calculated?

I was assuming for individual types it's relative frequency to the most common type ("a") but there's no "1" score for bigrams, trigrams etc, so I don't suppose it's that simple....

_________________
A language belongs to its native speakers, and when you speak it, you are a guest in their homes.
If you are not a good guest, you have no right to complain about receiving poor hospitality.


PostPosted: Sun 18 Jan 2015 12:09 pm

Joined: Mon 25 Feb 2013 12:44 pm
Posts: 80
NiallBeag wrote:
How is the score column calculated?

I was assuming for individual types it's relative frequency to the most common type ("a") but there's no "1" score for bigrams, trigrams etc, so I don't suppose it's that simple....


The most frequent n-gram gets a score of 1. The second most common n-gram gets a score of 1-x, the next 1-2x, and so on. The least common n-gram gets a score of x. x is equal to 1/n, where n is the number of unique frequencies in the list, if I remember correctly. I don't distinguish between n-grams of different lengths when calculating the scores.
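
In code, the scheme as described would look roughly like the sketch below. This is a reconstruction from the description only, not the actual spreadsheet formula; the function name `rank_scores` and the sample frequencies are made up, and it assumes n-grams with the same frequency share a score.

```python
from fractions import Fraction

def rank_scores(freqs):
    """Map each n-gram to a score in (0, 1] based on its frequency rank.

    n is the number of distinct frequency values and x = 1/n. The highest
    frequency scores 1, the next 1 - x, and so on down to x for the lowest.
    """
    if not freqs:
        return {}
    distinct = sorted(set(freqs.values()), reverse=True)
    x = Fraction(1, len(distinct))
    position = {freq: i for i, freq in enumerate(distinct)}
    return {ngram: 1 - position[freq] * x for ngram, freq in freqs.items()}

# Illustrative frequencies only; n-grams of any length are scored together.
example = {"a": 900, "an": 800, "go raibh": 300, "agus": 300}
for ngram, score in rank_scores(example).items():
    print(ngram, score)
```

With three distinct frequencies x is 1/3, so "a" gets 1, "an" gets 2/3, and the two tied entries both get 1/3. Exact fractions are used here just to keep the arithmetic easy to check.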


PostPosted: Tue 20 Jan 2015 1:46 pm

Joined: Wed 19 Dec 2012 3:58 pm
Posts: 488
barra79 wrote:
The most frequent n-gram gets a score of 1. The second most common n-gram gets a score of 1-x, the next 1-2x, and so on. The least common n-gram gets a score of x. x is equal to 1/n, where n is the number of unique frequencies in the list, if I remember correctly. I don't distinguish between n-grams of different lengths when calculating the scores.

I see. Maybe I'm just being thick, but I don't see how this offers any benefit over calling the most common "1", the second most common "2" etc up to nth most common "n". After all, your score is currently just an arbitrary enumeration of the order.
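
To make that concrete, here is a small sketch, assuming the score really is 1 - (k-1)/n for the kth distinct frequency as described above. The helper names are made up for illustration. It shows that score and rank convert into each other exactly, so neither carries information the other lacks.

```python
from fractions import Fraction

def score_from_rank(rank, n):
    """Score of the rank-th most common frequency (rank starts at 1)."""
    return 1 - Fraction(rank - 1, n)

def rank_from_score(score, n):
    """Invert score_from_rank: recover the 1-based rank from a score."""
    return int((1 - score) * n) + 1

n = 5
for rank in range(1, n + 1):
    score = score_from_rank(rank, n)
    # Round trip: the rank is fully recoverable from the score.
    print(rank, score, rank_from_score(score, n))
```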

_________________
A language belongs to its native speakers, and when you speak it, you are a guest in their homes.
If you are not a good guest, you have no right to complain about receiving poor hospitality.


PostPosted: Fri 06 Feb 2015 1:32 am

Joined: Fri 16 Jan 2015 1:43 am
Posts: 3
Barra, I have received an email from this site stating that you sent me a private message. Unfortunately, however, I am unable to read private messages, presumably because my post count is too low at this time.

