KnowBrainer Speech Recognition


Topic Title: "We're" and "were"
Topic Summary: How to improve recognition
Created On: 06/04/2012 01:22 PM

- SusanG | 06/04/2012 01:22 PM
- Alan Cantor | 06/04/2012 02:02 PM
- SusanG | 06/04/2012 02:17 PM
- Lunis Orcutt | 06/04/2012 02:44 PM
- Alan Cantor | 06/04/2012 03:25 PM
- SusanG | 06/04/2012 03:39 PM
- brainybanana | 06/05/2012 06:27 PM
- Alan Cantor | 06/05/2012 10:20 PM
- Chucker | 06/06/2012 08:38 AM
- Alan Cantor | 06/06/2012 04:02 PM
- maxr | 06/08/2012 03:03 AM

Hi- I have DNS 11.5 and generally the recognition is outstanding. However, when I dictate "we're" more often than not DNS types "were" even when I'd think the context would be clear. Sometimes it even types "where" instead and Word will flag it. I pronounce the former as "weer" and the latter as "wur" and I've confirmed this by playing back my dictation. Is there a way to get DNS to distinguish between the two more reliably?
Thanks,
Susan

Are you dictating words in isolation, or long phrases? DNS won't always get it right, but you improve your chances of accurate recognition if you dictate entire phrases without pausing.
Training the words individually is unlikely to have much of an effect.
When I dictated "We're going to where we were before" in one fell swoop, DNS didn't get it 100% right, but it certainly had the right idea: "We are going to where we were before."

Hi Alan,

Welcome to the World's Most Popular Speech Recognition Forum
In theory, NaturallySpeaking should be able to differentiate between “were” and “we're” from your use of the word in a phrase, which DNS compares against its internal tables (think of it as a pseudo-grammar checker). However, we also experience this recognition error a little too frequently and haven't managed to come up with a workaround either.

I suggest creating a writing sample that contains sentences that include several examples of the phrases you dictate, with emphasis on phrases that contain the words "we're" and "were."
Five or ten pages is probably plenty. Save it as a plain text file.
Create a new profile, and skip training. When DNS prompts you to allow it to analyze documents and email, decline the offer.
After you have created the profile, choose "Learn from specific documents," which is on the "Vocabulary" menu. Let DNS analyze your writing sample.
Test your new profile. How does it work?
If the problem is resolved, or mostly resolved, switch back to your original profile, and export your custom words and commands. Review the word list before proceeding. There will be one word (or phrase) per line. If you find words you don't need or want, delete them. Finally, switch to your new profile and import your custom words and commands.

Hi Alan, That sounds like a plan! I'll try it next chance and report back.
Thanks!
Susan

-------------------------
DNS 12.0 Professional, Windows 7, Intel Core i7 2630QM, 16GB of RAM. Second-Generation SpeechWare 6-in-1.

Brainy Banana and Susan,

Alan,
Bravo!!!
I'm glad to see someone else is positing the difference between human speech and computer speech recognition.
One additional point in emphasizing the difference between the way the human brain works and the way that computer-based speech recognition works: speech recognition attempts to convert speech to text. When two people enter into a conversation, there is no results box displaying in your mind. That is, when you talk to another person, neither you nor the other person sees words flashing before your eyes. Words are transparent in human conversation. Context in human speech is based on perception and meaning. Context in speech recognition is based on the relationship of each word to every other word in an utterance. In this sense, for human beings, transcription is instantaneous, and because of this we are much better at understanding what another person is saying, even if we don't know what a particular word or words mean. Human speech recognition does not require sophisticated thrashing to find the "BestMatch". We do this instantly and automatically.
The human brain functions on the basis of what are called autonomous ego functions (i.e., memory, motility, perception, and judgment). These do not require conscious thought. For example, how many times have you driven through a stoplight and then looked in your rearview mirror to see if the light was green? Our conscious perception in such cases is based on our perceptual focus, i.e., on the traffic around us and the other circumstances that are more important for conscious concentration. Even in this context, we don't consciously think about the perceptual input of everything around us. It's all subconscious and automatic.
Speech recognition simply cannot do this. Perhaps someday, when better forms of speech rhythms and artificial intelligence programming are incorporated into speech recognition, the results will be much more like human speech. However, even in this context, computer speech recognition will never absolutely equal the amazing characteristics and capabilities of the human brain.
Many users wonder why Dragon gives them some bizarre results that don't seem to make sense. This is simply because, as you put it so aptly, they are anthropomorphizing speech recognition. Even a three-year-old child just learning to use speech understands basically what others are saying to them significantly better than contemporary speech recognition. As long as users misinterpret speech recognition as being equivalent to carrying on a conversation with another person, they're going to continue to be baffled and confused when it doesn't work the way they expect it to. This is simply because their expectations are inappropriate to the context of how speech recognition works.
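To make the point about word-to-word context concrete, here is a toy sketch in Python. It is not how Dragon works internally (nothing in this thread describes Dragon's actual models). It assumes a simple bigram model with add-one smoothing, a made-up six-sentence sample, and hypothetical helper names (`train_bigrams`, `score`, `best_match`). It only illustrates why a candidate word can be ranked when it has neighbors to be compared against.

```python
from collections import Counter

# Made-up writing sample (an assumption for illustration only).
SAMPLE = (
    "we're going to the store . "
    "we're happy to help . "
    "we were happy before . "
    "they were going home . "
    "this is where we were before . "
    "we're going to where we were before ."
)


def tokenize(text):
    """Lowercase and split on whitespace; apostrophes stay inside words."""
    return text.lower().split()


def train_bigrams(text):
    """Count single words and adjacent word pairs in the sample."""
    tokens = tokenize(text)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def score(phrase, unigrams, bigrams):
    """Product of add-one-smoothed bigram probabilities over the phrase."""
    tokens = tokenize(phrase)
    vocab_size = len(unigrams)
    probability = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        probability *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
    return probability


def best_match(template, candidates, unigrams, bigrams):
    """Fill the '_' slot with each candidate and keep the highest-scoring one."""
    return max(
        candidates,
        key=lambda word: score(template.replace("_", word), unigrams, bigrams),
    )


if __name__ == "__main__":
    unigrams, bigrams = train_bigrams(SAMPLE)
    candidates = ["we're", "were", "where"]
    print(best_match("_ going to the store", candidates, unigrams, bigrams))
    print(best_match("this is where we _ before", candidates, unigrams, bigrams))
    # A word dictated in isolation has no neighbors, so every candidate ties.
    print([score(word, unigrams, bigrams) for word in candidates])
```

With this sample, the first phrase resolves to "we're" and the second to "were", while each word scored on its own gets the same score as the others. That is one way to see why dictating whole phrases gives the recognizer more to work with than dictating single words.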
Chuck Runquist

"I'm glad to see someone else is positing the difference between human speech and computer speech recognition."
Chucker, some of the things I know about speech recognition have come about from reading your posts on this forum!

Nice one Chuck! Bravo to you too. Perfect speech recognition will require the development of AI as intelligent as we are because of all the conscious and subconscious mechanisms at work when interpreting speech. However, I think we can get 90% there with just a few more environmental cues. Just knowing a few basic things will make a tremendous difference even without ever touching AI or handling these hard machine learning issues. Just knowing if the user is looking at the PC (rather than distracted, on the phone, talking to someone else, etc.) will make a tremendous difference. Face tracking and lip reading via a webcam will likely solve a lot of misrecognition issues. As will simpler operating systems that remove a lot of clutter and complexity. A good byproduct of Metro, for instance, is having just one app at a time. This will greatly simplify the speech interface. Right now we basically have to do kung fu to navigate an operating system that was never intended to be speech friendly. Simplification will narrow the scope and provide a better voice experience. I don't think we are that far off from having really great voice recognition and bridging the divide between computer speech and human speech interaction. I think people will mistake relatively simple tricks for AI before long.
Max Roth
Maker of DynamicKeyboardOne for Naturally Speaking
-------------------------
ErgoArchitect Assistive Technologies - ShowNumbers Plus! Addon to Naturally Speaking - www.ergoarchitect.com