KnowBrainer Speech Recognition

Topic Title: Question about new dictation source
Topic Summary: Is writing style adaptation shared by a new dictation source?
Created On: 04/12/2008 05:44 PM
- pmaddern, 04/12/2008 05:44 PM
- Lunis Orcutt, 04/12/2008 08:48 PM
- pmaddern, 04/13/2008 04:26 AM
- Chucker, 04/13/2008 06:46 AM
- pmaddern, 04/13/2008 07:00 AM
- R. Wilke, 04/13/2008 08:49 AM
- Chucker, 04/13/2008 09:47 AM
- R. Wilke, 04/13/2008 10:15 AM
- Chucker, 04/13/2008 10:38 AM
- matthewls, 04/13/2008 01:27 PM
- R. Wilke, 04/13/2008 02:02 PM
- matthewls, 04/13/2008 02:06 PM
- Lunis Orcutt, 04/13/2008 03:39 PM
- R. Wilke, 04/13/2008 04:54 PM
- Lunis Orcutt, 04/13/2008 06:07 PM
I'm helping someone get started with DNS Professional 9.5 on a Windows XP Professional PC. He has created a headset profile. Next week, I will guide him through setting up a profile for an Olympus DS 4000 digital recorder.

If we create a profile for the Olympus DS 4000 as a "new dictation source" for the headset profile (selecting "digital recorder using sound files - wav, mp3, wma - on disk"), Dragon will prompt the user to read at least 15 minutes of recorded text, and then he will be all set.

I understand that the advantage of doing this is that new vocabulary and commands added to the headset profile are automatically carried over to the digital recorder profile. I also understand that the reverse is true: new vocabulary and commands added to the DS 4000 profile are automatically carried over to the headset profile.

But what about when we run the headset profile's "Add words from your documents to the vocabulary" tool, which finds new words and adapts to the user's writing style? Is the adaptation to the user's writing style also "shared" by the Olympus DS 4000 profile? And does it work the other way round, i.e. is any writing-style adaptation that results from processing documents through the DS 4000 profile "shared" by the headset profile?

If so, it would seem beneficial to operate this way, i.e. setting up the DS 4000 as a new dictation source for the headset profile rather than maintaining two independent user profiles.

Peter

-------------------------
DNS Professional 12 UK English version, Windows 8 64-bit with 8 Gb RAM plus 8 Gb ReadyBoost, Audio Technica ATH-COM2 headset microphone/Buddy 7G USB sound adapter, VoicePower Ultimate. Skype user name peter.maddern www.speechempoweredcomputing.co.uk
Well, if it is the case that when you add a new Dictation Source, your custom vocabulary, custom commands and language models are the same as those of the base user dictation source, then this seems to me to be a strong argument for setting up a new recorder profile as a new dictation source rather than creating a brand new, stand-alone recorder user profile.

Apart from the initial need to give the new recorder profile data for its new acoustic model by reading at least 15 minutes of your printed-out enrollment text, we agree that the new recorder profile gets the "benefit" of custom words/commands and (very importantly) the language model improvements from the base headset profile, and vice versa.

This inter-profile synchronisation between a "new dictation source" for a new digital recorder and its base headset profile looks like a real plus, and looks like it's the way to go.

Peter
Quote: Well, if it is the case that when you add a new Dictation Source, your custom vocabulary, custom commands and language models are the same as those of the base user dictation source, then this seems to me to be a strong argument for setting up a new recorder profile as a new dictation source rather than creating a brand new, stand-alone recorder user profile. Apart from the initial need to give the new recorder profile data for its new acoustic model by reading at least 15 minutes of your printed-out enrollment text, we agree that the new recorder profile gets the "benefit" of custom words/commands and (very importantly) the language model improvements from the base headset profile, and vice versa. This inter-profile synchronisation between a "new dictation source" for a new digital recorder and its base headset profile looks like a real plus, and looks like it's the way to go.

Peter,

Your assumptions are only partially correct.

First, each dictation source creates its own Acoustic Model. The Acoustic Model from the standard microphone input user and that from the digital voice recorder dictation source do not mix and match. They are distinct and never the twain shall meet. DNS does this because the developers know that you can't mix Acoustic Models.

Second, the two dictation sources will share a common custom Vocabulary, but they will not share the same Language Models. It simply doesn't happen. They can also share the same MyCmds.dat, but each dictation source retains its own Acoustic Model and Language Models.

Third, when using the DVR dictation source, you can't use your microphone to correct your transcriptions. The standard microphone/headset input is not available to the DVR dictation source. You still have to switch dictation sources. Try it out and take a look at your user profile and you will see.

On the other hand, I agree with you that, in terms of what you want to share, the two dictation sources will share those things that they have in common and can have in common. They will not share those components that cannot be shared or are not compatible with one another. This makes it a more efficient way of dealing with the standard user profile than creating a separate DVR user profile, but you will not get the crossovers where you think you will. This also works very nicely in situations where the user is using DNS Preferred because, as Lunis points out, you can't export and import vocabularies in DNS Preferred. Nevertheless, creating two dictation sources in this manner will not share apples and oranges. They will only share apples and apples, or oranges and oranges.

Chuck Runquist
"It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so." -- Mark Twain
-------------------------
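The split Chuck describes can be pictured with a small data model. This is only an illustrative sketch, not how DNS stores anything internally: the class names, the `adapt_writing_style` method, the `acoustic_model` labels and the document name are all hypothetical. It assumes what the post states: vocabulary and commands live at the profile level, while each dictation source keeps its own acoustic and language model data.

```python
from dataclasses import dataclass, field


@dataclass
class DictationSource:
    """Per-source data: per the post, never shared between sources."""
    name: str
    acoustic_model: str  # hypothetical label standing in for the source's acoustic.enh
    language_adaptations: list = field(default_factory=list)


@dataclass
class UserProfile:
    """Profile-level data: shared by every dictation source in the profile."""
    vocabulary: set = field(default_factory=set)
    commands: set = field(default_factory=set)  # stands in for MyCmds.dat
    sources: dict = field(default_factory=dict)

    def add_source(self, name):
        # Every new source starts with its own acoustic and language model data.
        source = DictationSource(name, acoustic_model=f"{name}/acoustic.enh")
        self.sources[name] = source
        return source

    def add_word(self, word):
        # Vocabulary is stored once, so every source sees the new word.
        self.vocabulary.add(word)

    def adapt_writing_style(self, source_name, document):
        # Language model adaptation touches only the named source.
        self.sources[source_name].language_adaptations.append(document)


profile = UserProfile()
headset = profile.add_source("headset")
recorder = profile.add_source("recorder")

profile.add_word("KnowBrainer")
profile.adapt_writing_style("headset", "reports.doc")

print(sorted(profile.vocabulary))     # shared vocabulary
print(headset.language_adaptations)   # adapted on the headset source only
print(recorder.language_adaptations)  # the recorder source is unchanged
```

In this model, running the writing-style adaptation for the headset leaves the recorder's language data untouched, which is why the tool would need to be run again for the recorder source.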
Chuck,

I understood (as stated in my post) that the new recorder source doesn't share the Acoustic Model of the headset base user. I also understood that the recorder "new dictation source" shares the custom vocabulary and commands created in the headset base user. I guess I thought that the language model in the headset base user profile (built by processing your concatenated documents so that it adapts to your writing style) was also shared. This was my original question, and you have pointed out that the language model is NOT shared.

So, in other words, from what you've said, I assume that anyone who creates a new user as a "new dictation source" piggybacking onto an existing user profile should still run the "Add words from your documents to the vocabulary" tool in the new recorder profile.

Thanks for the clarification.

Peter
Chuck, on this occasion I would like to ask two more questions.

First, from what I understood, the vocabulary doesn't contain information regarding the Acoustic/Language models. The vocabulary is kept in those files you can export and import in the Pro version, 6 files with different extensions (.top being the one referenced via import), right? If my understanding of this is correct, in which files are the Acoustic/Language models, or parts of them, stored?

Second, when it comes to what is called "user file corruption", is it a matter of something going wrong with one or two of these models only (and if so, which one), or is the vocabulary also affected? When starting a new user from scratch and importing the vocabulary from the previous one (the whole vocabulary as described above, and not just the user-defined words), can we be sure that the faults are not transferred?

This is something I've been wishing to know for so long, and I'm looking forward to an answer so much.

Rüdiger Wilke

-------------------------
Well, it's past the point where we can make any changes in the code, but we can still make changes to the Easter Egg!
Quote: First, from what I understood, the vocabulary doesn't contain information regarding the Acoustic/Language models. The vocabulary is kept in those files you can export and import in the Pro version, 6 files with different extensions (.top being the one referenced via import), right? If my understanding of this is correct, in which files are the Acoustic/Language models, or parts of them, stored?

None of them. These files are pure Vocabulary. There is no acoustic or language model information contained in any of these files.

The Acoustic Model is located in the following folder:

C:\Documents and Settings\All Users\Application Data\Nuance\NaturallySpeaking9\Users\\current\voice

The Acoustic Model is "acoustic.enh", along with its associated pronunciation tables. This file also incorporates any general training. In previous versions, you could not create a new user with the "Skip initial training for this user" option, because on installation there was no acoustic.enh file until you performed general training. In DNS 9, Nuance has provided a speaker-independent acoustic.enh, which can then be trained further, but which is based on Nuance's speech data.

When you send your speech data out to Nuance using the data collection tool, what you get back is an updated set of Vocabulary files, along with a new acoustic.enh file and new Language Model files if they are updated using the data that you send to Nuance. Nuance's data collection adaptation process is much more sophisticated than what is currently available in DNS via the Acoustic and Language Model Optimizer. Nevertheless, this is also why some people report dramatic improvement and others report no improvement at all.

The Language Model files are located in the following folder:

C:\Documents and Settings\All Users\Application Data\Nuance\NaturallySpeaking9\Users\\current\General__container

lmo.slt = middle slot Language Model (professional versions only)

When you export and import the Vocabulary using one of the professional versions, only the Vocabulary is exported and imported. Some of the files that you refer to contain your custom words, and some of them contain the pronunciation tables for written form/spoken form. However, the pronunciation tables for your Acoustic Model are located in the associated files in the folder where the Acoustic Model file is located. This is where any training that you do for individual words in the Vocabulary editor is stored (i.e., along with the acoustic.enh). Training that you do in the Correction window or Spell dialog is stored in dra files that are associated with the Acoustic and Language Model Optimizer, but which are also incorporated into the Acoustic Model when you save your user(s).

PS, I should have added that one of the Vocabulary files is the linguistic pronunciation table that is based on standard linguistics (pronunciations) for each specific language (i.e., English contains all of the lexicon phonetics that you can find in any dictionary, stored in a standard format that you can find on the web using the IPA pronunciation format, e.g. /'laɪni/).

If you add a new word that does not exist in either the active Vocabulary or the background dictionary, DNS uses the standard IPA pronunciation table to assign a pronunciation to the new word. If you train a new word that is not in either the active Vocabulary or the background dictionary, then DNS uses the IPA pronunciation table to add your pronunciation to that word and, in addition, also adds it to your Acoustic Model so that DNS can identify it when you pronounce it the way you trained it.

For example, the initial word "liney" would be entered with the IPA pronunciation table entry "/'laɪni/", but if you train it, for example, as "lina", then DNS would add (append) that pronunciation: "/'laɪni/ /'laɪna/". It also adds that pronunciation to your Acoustic Model so that when you speak the word, pronouncing it the way that you want to, DNS recognizes the word via the appended pronunciation.

If the word already exists in either the active Vocabulary or the background Vocabulary/dictionary, then DNS only adds the standard IPA pronunciation table entry. Any training that you do under this condition is only added to the Acoustic Model and linked to the IPA pronunciation.

If I go into any greater detail, I will be asked so many questions about how this works. So, please don't ask, because I've given you enough to understand how it works.

Chuck Runquist
If you hear the sound of hoofbeats, think horses not zebras.
-------------------------
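The "append" behaviour in the "liney" example can be sketched as a toy lookup table. This is a hedged illustration of the idea only: the `add_word` and `train_word` helpers and the dictionary layout are hypothetical, the pronunciation strings are illustrative, and nothing here reflects the actual DNS file format.

```python
def add_word(table, word, standard_pron):
    """Enter a new word with its standard (dictionary) pronunciation."""
    if word not in table:
        table[word] = [standard_pron]


def train_word(table, word, trained_pron):
    """Append the user's trained pronunciation, keeping the standard one first."""
    prons = table[word]
    if trained_pron not in prons:
        prons.append(trained_pron)


table = {}
add_word(table, "liney", "/'laɪni/")    # standard entry for the new word
train_word(table, "liney", "/'laɪna/")  # user trains it as "lina"

print(table["liney"])
```

After training, the word carries both pronunciations, so either spoken form can be matched to the same written word.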
Thank you very much for the quick response.

Rüdiger Wilke
Roger,

Chuck Runquist
I know that you believe you understand what you think I said, but I am not sure you realize that what you heard is not what I meant.
-------------------------
Quote: None of them. These files are pure Vocabulary. There is no acoustic or language model information contained in any of these files. The Acoustic Model is located in the following folder: C:\Documents and Settings\All Users\Application Data\Nuance\NaturallySpeaking9\Users\\current\voice The Acoustic Model is "acoustic.enh", along with its associated pronunciation tables.

In my ..\Users\Matthew\current\ directory are several numbered folders (e.g. 0_1, 0_10, 0_17) that contain different acoustic.enh files (comp shows mismatches). I'm not sure how to relate these directory names/numbers to the names in the "Manage users" list, but it seems they're listed in order in the acoustic.ini file.
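The `comp` check described here can be repeated for every numbered folder at once. The following is a minimal sketch, assuming only what the post states: that each numbered folder under a profile's `current` directory holds its own `acoustic.enh`. The function names are hypothetical, and the script only reads files; it makes no assumption about the layout of acoustic.ini.

```python
import hashlib
from pathlib import Path


def hash_acoustic_models(current_dir):
    """Map each acoustic.enh found under current_dir to its SHA-256 digest."""
    root = Path(current_dir)
    digests = {}
    for enh in sorted(root.rglob("acoustic.enh")):
        # Key by the path relative to current_dir, e.g. "0_1/acoustic.enh".
        key = enh.relative_to(root).as_posix()
        digests[key] = hashlib.sha256(enh.read_bytes()).hexdigest()
    return digests


def all_distinct(digests):
    """True when no two acoustic.enh files have identical contents."""
    return len(set(digests.values())) == len(digests)
```

Calling `hash_acoustic_models` with the path to a profile's `current` folder returns one entry per acoustic.enh file. If `all_distinct` returns True, every dictation source has a different acoustic model file, which matches the mismatches that `comp` reported.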
Quote: In my ..\Users\Matthew\current\ directory are several numbered folders (e.g. 0_1, 0_10, 0_17) that contain different acoustic.enh files (comp shows mismatches). I'm not sure how to relate these directory names/numbers to the names in the "Manage users" list, but it seems they're listed in order in the acoustic.ini file.

From what I remember of having different dictation sources within one user profile, this is the way the folders corresponding to the sources are numbered. And, as Chuck pointed out further above, each source has its own acoustic model, hence the different .enh files, I presume.

Rüdiger Wilke
My understanding exactly.
Lunis, I know it works, and I did it myself following your workaround two years ago, when I occasionally needed input from both headset and digital recorder. But what does it mean for the acoustic models? Are they getting mixed up? Apples and oranges?

Rüdiger Wilke