KnowBrainer Speech Recognition
Topic Title: How to improve recognition accuracy for a user whose speech is not 'standard'
Topic Summary: how to improve recognition accuracy for a user whose speech is affected by cerebral palsy
Created On: 05/18/2023 06:12 PM
 05/18/2023 06:12 PM
SueW
New Member

Posts: 7
Joined: 01/16/2019

I am assisting a client whose speech is affected by cerebral palsy. He has DPI V 15.61.200.010, using a high-quality noise-cancelling microphone and the Australian accent model. His computer is high spec, running Windows 11.

Recognition accuracy is very poor. I have tested Dragon myself on his computer, using the same microphone (set up my own user profile) and I get good results, so it seems clear that the reason for the poor recognition is my client’s speech. We have tried using the Vocabulary tools such as adding words and phrases to the vocabulary, training words and phrases, and Learn from specific documents, which help a little but not enough. 

Client is adamant that he achieved much better recognition accuracy on an older version of Dragon (V 11). He did lots of the readings, and after each reading Dragon’s recognition would improve.

Client is really keen to use Dragon on his computer as he has lots of things to do; he is CEO of an online business. He doesn't have many other assistive technology options that would work for him.

My question is…

Is there a way to create and import your own readings for Dragon 15 or 16? I have found some posts on this subject but from many years ago, relating to earlier versions of Dragon. 

Other thoughts:

The level of recognition accuracy with Dragon 15 and 16 is very good, but is it possible that removing the additional readings has made it more difficult for users with 'non-standard' speech to improve recognition accuracy?

It is also interesting that my client is getting better results when using Siri and Voice Control on iPhone. This is useful for speech to text; however, my client really needs to be able to use Dragon on his computer.

I'm wondering why he would be getting significantly better recognition accuracy on his iPhone compared to using Dragon on his computer. It's usually the other way round. Could it be that Siri and Voice Control on iPhone make more use of the probability (language) model than the acoustic model? That is the only way I can explain why someone with non-standard speech gets better recognition accuracy on his iPhone than with Dragon on his computer.

Sue



-------------------------


Cheers
SueW
 05/18/2023 08:01 PM


Alan Cantor
Top-Tier Member

Posts: 4616
Joined: 12/08/2007

Hi Sue,

I'm experimenting with a technique to improve Dragon accuracy for someone with a non-standard accent and manner of speaking. The technique is labour-intensive and persnickety. I've tested it only once, and I could see ways to make the system work a little better. Accuracy was initially about 50%, and jumped to 70% in about 90 minutes.

I'll be testing again in a few weeks. Not sure the technique is ready for prime-time.
 05/18/2023 08:23 PM
Lunis Orcutt
Top-Tier Member

Posts: 40912
Joined: 10/01/2006

It's not unusual for someone with a verbal disability to experience better accuracy in Siri than in Dragon. By design, Dragon can be a bit picky. Rather than delving deeper, let's jump into it:

1. Open the DragonBar Settings / Microphone / Choose Microphone menu

2. Remove the checkmark from Automatically adjust microphone level as I speak

3. DPI 15 will not prompt you to rerun the Microphone Check, but you will need to do so; v16 is smarter about this

4. Use the KnowBrainer Train Dragon command (courtesy of Monkey8). Those MIA training scripts were never removed; Nuance only removed the menu. We have heard that this training can help end users with abnormal voices. Because of your client's disability, he has the option of receiving a free or discounted copy of KnowBrainer 2022. There is also a 30-day trial in our signature tag. We recommend downloading and installing the 30-day trial of KnowBrainer 2022, and then saying Train Dragon

PS: If your client hasn't already purchased the Dragon Professional 16 upgrade: when anyone with a permanent disability purchases from us, they receive a complimentary copy of KnowBrainer 2022 and a copy of our installation guide, which pictorially details fixing 3 dozen sandtraps



-------------------------

Change "No" to "Know" w/KnowBrainer 2022
Trial Downloads
Dragon/Sales@KnowBrainer.com 
(615) 884-4558 ex 1



 05/19/2023 04:00 AM


wheels496
Top-Tier Member

Posts: 218
Joined: 10/01/2008

Hello

I have no idea how your client's speech compares with mine, so I don't know how useful the following will be.

I also have cerebral palsy and a slight speech defect. I tried to train various (maybe three) versions of Dragon (meaning NaturallySpeaking), without success. Because of this, until about seven years ago I was still using DragonDictate for Windows.

When I read that the speech recognition engine of Dragon 15 had been rewritten (deep learning), and saw something on the Dragon website suggesting it could handle speech like mine, I arranged to give it a go, and this time it was successful.

Admittedly, I do not recall how I trained it up initially seven years ago. However, I have trained it up again in the past year (the microphone I was using was discontinued, and my voice profile was useless with any other microphone).

A Dragon trainer pointed me to the "Rainbow Passage", which seemingly includes every sound of British English. I had about three sessions with a work colleague, who helped me with the training. Initially, I just dictated the passage, with my colleague helping me to correct and train every misrecognised utterance. I also trained up the alphabet (alpha, bravo, et cetera).

In the second or third session, I started making corrections myself, and by the end of the third session recognition was still poor, but I was able to make corrections by speech myself.

From there it was just a question of dictating, correcting, and training until I was confident enough to do my work. In my situation this is important, since a lot of the software I use is to some degree incompatible with Dragon, and you cannot correct utterances in it.

Hope this helps.



-------------------------

DP 16

 05/20/2023 05:49 PM
SueW
New Member

Posts: 7
Joined: 01/16/2019

Thanks for your replies.
Alan, I look forward to hearing more.
Thanks for your tips Wheels496.
Lunis, I will send you an email.
Sue W


-------------------------


Cheers
SueW
 05/20/2023 06:07 PM
SueW
New Member

Posts: 7
Joined: 01/16/2019

I followed Lunis' suggestion and downloaded the free trial of KnowBrainer. I then used the Train Dragon command and, voilà, the readings appeared.
Thanks Lunis! And thanks to others for your suggestions.
Alan, I would be interested to hear more about your method when it is ready.

-------------------------


Cheers
SueW
 07/22/2023 05:28 PM


Alan Cantor
Top-Tier Member

Posts: 4616
Joined: 12/08/2007

Yesterday I tested my new training protocol. The process took a little over an hour. Accuracy climbed from around 50% to around 80%.

(I also tried the old training module -- the one that can be invoked via a custom command. It helped, but not as much as my method.)

The protocol is driven mostly by two macros. In outline, here's how it works:

I began by generating a set of 225 phrases, mostly four to ten words each. The number 225 was based on the amount of data Dragon must collect before acoustic and language model optimization can be done. The lower limit appears to be about 1000 words.

Examples:

The drain is clogged.
Could you please call a plumber?
Hello, my name is Elizabeth. How are you?
etc.

(I "borrowed" many phrases from web sites that purport to teach English!)

1. I press a hotkey to invoke a macro (written in Macro Express) that outputs one phrase into a blank document.

2. The learner reads the phrase out loud. Their words land on a separate line in the document. (I encouraged the learner to include punctuation marks when dictating, in the hope that the extra acoustical context might help.)

3. I press a second hotkey to select the phrase just spoken, bring up the Spelling Window, output the correct phrase, choose it, and close the Spelling Window. The macro then deletes everything in the document, so we start again with a blank canvas.

4. I press the first hotkey to output the next of the 225 phrases into the empty document, and we return to Step 2, looping until the user has read and corrected all 200+ phrases. (We took a short break every ten minutes or so.)

5. Run the Optimizer. It took about 20 minutes.
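To make the loop concrete: the real protocol runs on Macro Express hotkeys driving Dragon's Spelling Window, which aren't shared here, but the bookkeeping behind it (present a phrase, capture what was recognized, queue mismatches for correction, and check the roughly 1000-word floor before optimizing) can be sketched in Python. Everything below, including the function names and the simulated recognizer, is hypothetical illustration, not part of the actual macros:

```python
# Rough model of the present/dictate/correct loop described above.
# Hypothetical illustration only: the real workflow is hotkey-driven.

def needs_correction(target: str, recognized: str) -> bool:
    """A phrase needs correction when the dictation differs from the
    target (ignoring case and surrounding whitespace)."""
    return target.strip().lower() != recognized.strip().lower()

def run_session(phrases, recognize):
    """Present each phrase, capture the recognizer's output, and log the
    (target, recognized) pairs that would go to the Spelling Window."""
    corrections = []
    for target in phrases:
        recognized = recognize(target)  # the learner reads the phrase aloud
        if needs_correction(target, recognized):
            corrections.append((target, recognized))
    return corrections

def enough_training_data(phrases, floor=1000):
    """Optimization reportedly needs roughly 1000 dictated words."""
    return sum(len(p.split()) for p in phrases) >= floor

phrases = [
    "The drain is clogged.",
    "Could you please call a plumber?",
    "Nice to meet you too.",
]

# Simulated recognizer that mishears one phrase.
heard = {"Nice to meet you too.": "Niece to be with YouTube."}
log = run_session(phrases, lambda p: heard.get(p, p))
print(log)  # [('Nice to meet you too.', 'Niece to be with YouTube.')]
```

With only three phrases, `enough_training_data` is nowhere near satisfied; the full 225-phrase set is what pushes the word count past the floor.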

Result: a noticeable improvement in accuracy, from about 50% before to about 80% afterwards. Accuracy is sometimes better than that; many utterances now come out perfectly.

My guess is that accuracy can be improved slightly, even now, by manually correcting frequently used individual words; e.g., my client says "pick" but Dragon outputs "peek." We corrected this once in the Spelling Window, and then Dragon got "pick" right the next time. Not sure whether this improvement will persist!

An example: One of the presented phrases was:

"Nice to meet you too"

The user read it, and Dragon outputted:

"Niece to be with YouTube"

The macro selects the above text, calls up the Spelling Window, outputs "nice to meet you too" in the text field, and then does the equivalent of "Choose 1".

Without macros, this process would be extremely labour intensive: selecting, copying, navigating, and pasting, again and again. The macros do all the heavy lifting. Step 3, the most complicated sequence of steps, runs in about three seconds.
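The post doesn't say how the 50% and 80% figures were measured. One conventional metric is word error rate: the word-level edit distance between the target phrase and what the recognizer produced, with accuracy taken as 1 minus WER. The sketch below is an illustration of that metric, not part of Alan's protocol, applied to the "Niece to be with YouTube" misrecognition:

```python
# Word error rate via word-level Levenshtein distance.
# Illustrative metric only; not how the figures above were obtained.

def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum substitutions, insertions, and deletions needed to turn
    the hypothesis word sequence into the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))  # standard dynamic program over words
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]

def accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy = 1 - WER, floored at zero."""
    n = len(reference.split())
    return max(0.0, 1 - word_errors(reference, hypothesis) / n)

print(word_errors("nice to meet you too", "niece to be with YouTube"))  # 4
print(round(accuracy("nice to meet you too", "niece to be with YouTube"), 2))  # 0.2
```

Only "to" survives intact, so four of the five words count as errors: 20% accuracy on that utterance.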

The macros are not 100% reliable. At least one macro needs tweaking: Step 3 didn't quite work 10% or 15% of the time. Because of this unreliability, I'm not planning to share the macros. The only reason I was able to get through the protocol was that I knew what the macros were doing, recognized when they were acting flaky, and knew how to get back on track. (I actually built two additional macros to quickly fix things when they went awry!)

Total development time for the macros: about 20 hours. A lot of work. I did it because it seemed like an interesting project!

Macros are not the best solution to the problem that Dragon does a poor job of handling "non-standard" accents and manners of speaking. The functionality I developed via macros should be built into Dragon: a system that presents users with short phrases to dictate, corrects them regardless of the user's pronunciation, and then incorporates the data into the acoustic model.

Having been through this process, I would say that correcting hundreds of phrases (not individual words, but words as they appear in the context of phrases) does slightly improve the acoustic model. But it's impractical to correct hundreds of misrecognitions without an automated or purpose-built system. I estimate that without macros, dictating and correcting 200 phrases would have chewed up many hours, and the task would have been exhausting!



 07/23/2023 01:12 PM
R. Wilke
Top-Tier Member

Posts: 8104
Joined: 03/04/2007

Alan,

This is really marvellous testing indeed. It might be worthwhile automating it via the API, although that would take longer than just 20 hours, not even accounting for debugging.

Nonetheless, great job, specifically as it clearly demonstrates some of the underlying concepts.

-------------------------


The New Game in Town: DragonConnect

 07/23/2023 02:27 PM
ax
Top-Tier Member

Posts: 777
Joined: 03/22/2012

Your approach is methodical, and the validation of the principle is impressive, Alan!

On DMO, recognition improvement from going through the corrections menu always seems marginal at best. It's perceptible if one zooms in on a single correction that's repeatedly made. But I can't really be sure how long after the correction(s) any improvement would kick in, or how durable it is.

Out of curiosity: on desktop Dragon at least, does any tangible improvement from your exercise get captured in a discrete file (or files) that can be exported and preserved? Or does an individual have to go through the "curated corrections" exercise all over again if they are forced into a new profile?



 07/23/2023 04:43 PM


Alan Cantor
Top-Tier Member

Posts: 4616
Joined: 12/08/2007

The only way I can think of to preserve the results is to make a backup copy of the entire user profile. That's probably easier than backing up just the acoustic model; besides, who, if anybody, knows which file (or files) hold the acoustic model!
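A whole-profile backup amounts to copying the profile directory somewhere safe. As a minimal sketch: the profile location varies by Dragon version and installation, so the commented-out paths below are placeholders, not the actual location:

```python
# Hypothetical profile backup: copy the whole profile tree into a
# timestamped folder. The placeholder paths must be adjusted.
import shutil
import time
from pathlib import Path

def backup_profile(src: Path, dest_root: Path) -> Path:
    """Copy the entire profile directory to dest_root/<name>-<stamp>."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_root / f"{src.name}-{stamp}"
    shutil.copytree(src, dest)  # creates dest (and any parents) itself
    return dest

# Placeholder paths -- not the real Dragon profile location:
# profile = Path(r"C:\...\Dragon\Users\MyProfile")
# backup_profile(profile, Path(r"D:\DragonBackups"))
```

Restoring would be the reverse copy, done while Dragon is closed so no profile files are in use.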

If I needed to go through the exercise again, I might make some changes:

1. Tweak the macro that occasionally messes up. My guess is the fix is to increase the length of one wait.

2. Simplify the sentences. My client stumbled on British- and American-centric phrases such as "lo and behold!" and "Carnegie Hall."

3. Draw on texts that she has written to create the list of phrases. In other words, include more of the words that she tends to use.

I'll be curious to see whether the accuracy improvements last. Perhaps Dragon's algorithms will nudge the acoustic model back toward more standard pronunciations!


