It is currently Mon 15 Apr 2024 1:55 am

All times are UTC





Post new topic Reply to topic  [ 22 posts ]  Go to page Previous  1, 2, 3  Next
Posted: Sun 03 Mar 2024 6:22 pm

Joined: Mon 01 Sep 2014 10:03 pm
Posts: 518
Location: SAM
iambullivant wrote:

Which native speakers or sources in the Munster dialect would you recommend be it on the radio, the television or online?



An Saol Ó Dheas is all you need. Munster Irish, outside of Kerry, is very weak.

As for Abair, I dunno. I know they're working on speech-to-text models too, but I also know they opened the recording challenge to everyone with no filter for quality of Irish. That immediately makes me skeptical of the quality of their models, as they let people self-identify as natives and choose what dialect they speak, which won't cause any issues, I'm sure. I can name many who self-identify as natives and as having a dialect when they clearly don't. Now, supposedly that's only for their speech-to-text models, but I'd still want some assurance that their actual text-to-speech models are trained solely on competent, Gaeltacht-raised natives.


Posted: Sun 03 Mar 2024 7:44 pm

Joined: Thu 22 Dec 2011 6:28 am
Posts: 373
Location: Corcaigh
I agree. An Saol Ó Dheas is the first thing that came to my mind too.

galaxyrocker wrote:
...

I know they're working on speech-to-text models too, but I also know they opened the recording challenge to everyone with no filter for quality of Irish. That immediately makes me skeptical of the quality of their models, as they let people self-identify as natives and choose what dialect they speak, which won't cause any issues, I'm sure. ... Now, supposedly that's only for their speech-to-text models, but I'd still want some assurance that their actual text-to-speech models are trained solely on competent, Gaeltacht-raised natives.


If that's only for their speech-to-text models, then it's the right move to include all manner of speakers. Such models need to be able to correctly interpret even the pronunciation of non-native speakers. Siri or Alexa wouldn't be considered very good if they couldn't understand the accents of non-native English speakers, and it's arguably non-native speakers who could benefit most from speech-to-text systems. Such systems would let them ask a digital assistant "what's the definition of X?" or "how do I use Y in a sentence?"

From a technical perspective, too, training AI models on large amounts of data, even if some of it is not of good quality, is better than training on a small amount of high-quality data. Repeated studies have shown the benefit of transfer learning: even training on data from other languages can improve models for less-resourced languages that lack a sufficient quantity of training data of their own.

Of course, ideally many native speakers would take part also, but it's entirely up to individual native speakers whether or not they want to be involved. There's certainly not much point complaining (not that I'm accusing anyone here of complaining) that systems like this don't understand native speech if native speakers aren't interested or inclined to help in their development.


Posted: Sun 03 Mar 2024 10:06 pm

Joined: Mon 01 Sep 2014 10:03 pm
Posts: 518
Location: SAM
Ade wrote:

If that's only for their speech-to-text models, then it's the right move to include all manner of speakers. Such models need to be able to correctly interpret even the pronunciation of non-native speakers. Siri or Alexa wouldn't be considered very good if they couldn't understand the accents of non-native English speakers, and it's arguably non-native speakers who could benefit most from speech-to-text systems. Such systems would let them ask a digital assistant "what's the definition of X?" or "how do I use Y in a sentence?"


See, I tend to think the opposite way. If their Irish pronunciation is so bad that it can't be understood correctly by systems trained on native speech, maybe they'd realise it's not just their 'dialect' or a case of 'my English accent came from Irish', and that the pronunciation is supposed to be entirely different.

Quote:
From a technical perspective, too, training AI models on large amounts of data, even if some of it is not of good quality, is better than training on a small amount of high-quality data. Repeated studies have shown the benefit of transfer learning: even training on data from other languages can improve models for less-resourced languages that lack a sufficient quantity of training data of their own.

Of course, ideally many native speakers would take part also, but it's entirely up to individual native speakers whether or not they want to be involved. There's certainly not much point complaining (not that I'm accusing anyone here of complaining) that systems like this don't understand native speech if native speakers aren't interested or inclined to help in their development.


The issue is when non-native speech vastly outweighs native speech, as it likely does with Irish. That's a huge issue. It's also one of my issues with Corpas Náisiúnta na Gaeilge, which only compiles Irish from 2000 onwards (and with Foclóir's corpus, where they give non-natives the same weight; perhaps even worse, as it's what they use for dictionaries! But at least their corpus can be filtered by native speakers).


Posted: Mon 04 Mar 2024 4:50 am

Joined: Thu 22 Dec 2011 6:28 am
Posts: 373
Location: Corcaigh
galaxyrocker wrote:
See, I tend to think the opposite way. If their Irish pronunciation is so bad that it can't be understood correctly by systems trained on native speech, maybe they'd realise it's not just their 'dialect' or a case of 'my English accent came from Irish', and that the pronunciation is supposed to be entirely different.


A hard line to take. Reminds me of this. :LOL:

Still, I think they probably just wouldn't use it in that case.

galaxyrocker wrote:
The issue is when non-native speech vastly outweighs native speech, as it likely does with Irish. That's a huge issue. It's also one of my issues with Corpas Náisiúnta na Gaeilge, which only compiles Irish from 2000 onwards (and with Foclóir's corpus, where they give non-natives the same weight; perhaps even worse, as it's what they use for dictionaries! But at least their corpus can be filtered by native speakers).


Well, we don't know whether the non-native speech used for training a speech-to-text model vastly outweighs native speech. If it does, we really need to assess why so few native speakers would want to be involved in a project like this. But I doubt that non-native speakers are so over-represented that it could negatively impact an AI model.

In any case, I don't think you can compare a dictionary or text corpus to an AI speech-to-text model. It's a big issue if a non-native term is presented matter-of-factly in a dictionary or text as if it were something a native speaker would use. If a speech-to-text model learns to recognise a mispronunciation which is common among non-native speakers, that just makes the model more accessible to those speakers. It doesn't outweigh the correct pronunciation, or make the model less capable of recognising the correct pronunciation.


Posted: Mon 04 Mar 2024 2:44 pm

Joined: Tue 09 Jan 2024 8:15 pm
Posts: 35
Thank you for all the considered and interesting comments. It is obviously a very difficult, complicated area. In the meantime I will definitely give An Saol Ó Dheas a listen and delve into the wealth of resources gathered on Cork Irish.


Posted: Mon 04 Mar 2024 5:25 pm

Joined: Tue 09 Jan 2024 8:15 pm
Posts: 35
Ade wrote:
galaxyrocker wrote:
See, I tend to think the opposite way. If their Irish pronunciation is so bad that it can't be understood correctly by systems trained on native speech, maybe they'd realise it's not just their 'dialect' or a case of 'my English accent came from Irish', and that the pronunciation is supposed to be entirely different.


A hard line to take. Reminds me of this. :LOL:

Still, I think they probably just wouldn't use it in that case.

galaxyrocker wrote:
The issue is when non-native speech vastly outweighs native speech, as it likely does with Irish. That's a huge issue. It's also one of my issues with Corpas Náisiúnta na Gaeilge, which only compiles Irish from 2000 onwards (and with Foclóir's corpus, where they give non-natives the same weight; perhaps even worse, as it's what they use for dictionaries! But at least their corpus can be filtered by native speakers).


Well, we don't know whether the non-native speech used for training a speech-to-text model vastly outweighs native speech. If it does, we really need to assess why so few native speakers would want to be involved in a project like this. But I doubt that non-native speakers are so over-represented that it could negatively impact an AI model.

In any case, I don't think you can compare a dictionary or text corpus to an AI speech-to-text model. It's a big issue if a non-native term is presented matter-of-factly in a dictionary or text as if it were something a native speaker would use. If a speech-to-text model learns to recognise a mispronunciation which is common among non-native speakers, that just makes the model more accessible to those speakers. It doesn't outweigh the correct pronunciation, or make the model less capable of recognising the correct pronunciation.


The elevator sketch is very funny. It had me laughing out loud. Thank you for sharing.


Posted: Mon 04 Mar 2024 5:27 pm

Joined: Tue 09 Jan 2024 8:15 pm
Posts: 35
A supplemental question if I may: Do Fuaimeanna.ie and Teanglann.ie have the same or separate challenges when it comes to reliability?


Last edited by iambullivant on Tue 05 Mar 2024 11:58 am, edited 1 time in total.

Posted: Mon 04 Mar 2024 5:51 pm

Joined: Mon 01 Sep 2014 10:03 pm
Posts: 518
Location: SAM
iambullivant wrote:
A supplemental question if I may: Do Fuaimeanna.ie and Teanglann.ie have the same or separate challenges when it comes to reliability?



Fuaimeanna is better in general as Teanglann sometimes has weaker speakers for some words. But both use fully recorded audio, not AI.


Posted: Mon 04 Mar 2024 6:34 pm

Joined: Tue 09 Jan 2024 8:15 pm
Posts: 35
galaxyrocker wrote:
iambullivant wrote:
A supplemental question if I may: Do Fuaimeanna.ie and Teanglann.ie have the same or separate challenges when it comes to reliability?



Fuaimeanna is better in general as Teanglann sometimes has weaker speakers for some words. But both use fully recorded audio, not AI.


That is helpful, thank you.


Posted: Tue 05 Mar 2024 12:04 am

Joined: Thu 27 May 2021 3:22 am
Posts: 1105
galaxyrocker wrote:
iambullivant wrote:
A supplemental question if I may: Do Fuaimeanna.ie and Teanglann.ie have the same or separate challenges when it comes to reliability?



Fuaimeanna is better in general as Teanglann sometimes has weaker speakers for some words. But both use fully recorded audio, not AI.


Yes, and Dara Ó Cinnéide's pronunciation on Fuaimeanna is really the gold standard for Munster...

