KnowBrainer Speech Recognition
Topic Title: Is UniVoice 9.5 an improvement over 9.1 for interviews?
Topic Summary: The interviewee voice is the one I really care about.
Created On: 01/29/2008 09:17 PM
 01/29/2008 09:17 PM


coloradoguy
Junior Member

Posts: 2
Joined: 01/29/2008

Please forgive me if this question has been asked before (I did do a forum search first, and it doesn't look like the question has been asked since 9.5 was released), but is UniVoice 9.5 an improvement over the previous versions when it comes to transcribing more than one voice? I'm a journalist who bases a lot of my work on recorded interviews. The vast majority involve only one person, and capturing their words accurately is far more important than getting mine. Therefore, can I do anything with UniVoice or my recorder to maximize the accuracy for my subjects, since I can generally recall what I asked anyway simply based on their answers? And when it comes to recording interviews, am I better off purchasing a device like the new Olympus DS-50 Digital Stereo Pro Conference Recorder? I saw it offered on your site, so I was wondering if it is really optimized for multiple voices in a way that will work well with UniVoice. (I've been transcribing cassette tapes of my interviews for years, always with the hope that someone will someday create a digital solution to my transcription nightmares.)

I'm not looking for 98 percent accuracy, mind you, but the interview transcriptions at least need to be readable. I haven't made a DNS or KB purchase as yet, but I'm really hoping that UniVoice is the solution I've been waiting and hoping for. Finally, if I go ahead and take the financial plunge, is there any sort of refund/guarantee if the software can't do at all what I'm asking?

 01/29/2008 10:19 PM
Chucker
Top-Tier Member

Posts: 11237
Joined: 10/10/2006

The basic answer to your question is NO.  There are 5 basic speech models in DNS: (1) Acoustic Model, (2) Vocabulary, (3) bigram Language Model, (4) trigram Language Model, and (5) quadgram Language Model.
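For readers unfamiliar with the terms, a bigram Language Model predicts a word from the one word before it, a trigram model from the two words before it, and so on. The following is a minimal sketch of that idea using raw counts over a made-up sentence. The function names and the sample text are assumptions for illustration only; it does not show how DNS builds or stores its Language Models.

```python
from collections import Counter


def ngram_counts(words, n):
    """Count every run of n consecutive words in the list."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def next_word_probability(words, context, candidate):
    """Estimate P(candidate | context) from raw counts, with no smoothing.

    A one-word context gives a bigram estimate, a two-word context
    gives a trigram estimate, and so on.
    """
    context = tuple(context)
    counts = ngram_counts(words, len(context) + 1)
    joint = counts[context + (candidate,)]
    # Total number of times this context is followed by any word at all.
    prior = sum(c for gram, c in counts.items() if gram[:-1] == context)
    return joint / prior if prior else 0.0


# Hypothetical sample text, not real training data.
corpus = ("please write the report then read the letter "
          "then write the report").split()

bigram = next_word_probability(corpus, ["the"], "report")
trigram = next_word_probability(corpus, ["write", "the"], "report")
print(f"bigram  P(report | the)       = {bigram:.2f}")
print(f"trigram P(report | write the) = {trigram:.2f}")
```

In this toy text the longer context makes the prediction more certain, which is the general reason for keeping several n-gram models alongside the Acoustic Model.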

UniVoice is basically the Acoustic Model.  No change made to the Acoustic Model (i.e., between UniVoice 9.1 and 9.5) can alter the way that the Acoustic Model interacts with the other speech models.  That is, there is nothing that can be done to the Acoustic Model that will improve the accuracy with regard to multiple speakers.  DNS is simply not designed to do this, even though you will achieve a certain amount of success in doing so.

There would have to be major changes to the other speech models, which only Nuance can do.  UniVoice uses the basic DNS user profile creation procedures simply using different speech data.  Although a speaker independent user profile (Acoustic Model – UniVoice) will achieve somewhat better results, it won't change the fact that DNS is not designed to handle multiple speakers.

In addition, DNS also provides you with an independent speaker profile.  When you create a new user in DNS 9.0/9.1/9.5 and you select the option to "Skip initial training for this user", you are creating an independent speaker profile that is essentially the same as UniVoice, except that it is based on the speech data that Nuance uses to create such profiles as opposed to the speech data that KnowBrainer has used.  Which one is better for standard single speaker use is a matter of personal experience.  Some users find UniVoice better suited for their purposes.  Others find the speaker independent profile created with the above option better suited for theirs.  There are pros and cons in using either approach.  Nevertheless, you won't find any better recognition of multiple speakers regardless of which one you choose.

Chuck Runquist
Former Dragon NaturallySpeaking SDK & Senior Technical Solutions PM for DNS

"We are all victims of mythology in one way or another. We are the inheritors, and many times the propagators, of a desire to believe what we want to believe, regardless of whether or not it is true." -- J.V. Stewart

 



-------------------------

 01/30/2008 12:31 AM


coloradoguy
Junior Member

Posts: 2
Joined: 01/29/2008

Thanks for the thoughtful reply, Chuck. I asked the question after reading online -- on more than one site -- that UniVoice is the first VR software designed to handle multiple speakers (http://speechwiki.org/SR/UniVoice.html) -- but now I'm not so convinced. After reading your reply, it sounds like some people prefer to use DNS "untrained" for each new speaker, while others prefer using UniVoice. But if that's the case, then what, exactly, is the benefit of UniVoice over DNS alone? Why spend the money on UniVoice if I might actually prefer DNS alone? And if DNS 9 is actually better than 97 percent accurate without any training at all (http://www.nytimes.com/2006/07/20/technology/20pogue.html), then as long as it can perform at such a reasonable level for each new interview, I might decide that's good enough. Again, I read that UniVoice was designed with tasks like interviews in mind (http://www.knowbrainer.com/PubForum/index.cfm?page=viewForumTopic&topicId=41&pageNo=1#thread193), though I'm beginning to think that those initial tests have not held up.
 01/30/2008 01:49 AM
Chucker
Top-Tier Member

Posts: 11237
Joined: 10/10/2006

Let me explain a couple of things about speaker independent user profiles.

First, a user profile, while it includes the vocabulary, the Language Models, and the Acoustic Model, consists primarily of the Acoustic Model.  This is a collection of speech data that includes the following components:
 
1.  A large collection of speech from a variety of speakers dictating for at least 15 to 20 minutes.
 
2.  The corresponding text, proofed and corrected, for the dictation of each of the speakers used to compile this Acoustic Model.  The text must correspond exactly to the speaker's dictation.

Second, the degree of speaker independence and the accuracy of the resulting Acoustic Model depend on the number of speakers, as well as their gender, topic, age, and general dictation style (diction, enunciation, pronunciation, and clarity of their speech).

I don't know how many speakers, how much speech data, what topics (i.e., business, technology, sports, broadcast news, financial, etc. ad infinitum), what age/gender of speakers, or what general dictation style were used in the creation of UniVoice.  However, one of the main reasons why ScanSoft acquired Nuance (and it is still ScanSoft, even though they are using the Nuance name and logo, because ScanSoft acquired Nuance and not the other way around) was the tremendous amount of speech data that Nuance had collected over time.

As regards the speaker independent user profile which you can create in DNS 9 using the "Skip initial training for this user" option, the total combined speech data collected as far back as 1999 (Dragon Systems) from DARPA and all the speech data collected by L&H and Nuance consists of over 15,000 speakers and the corresponding text as noted above, which consists of documents containing a total of over 10 million words.  The speakers from whom this data was compiled are a combination of end users, newscasters, professional speakers, and men and women of all ages (including teenagers).  I would rather doubt that KnowBrainer had or has access to this much speech data.  That said, I'm not saying that a good Acoustic Model for an independent speaker user profile can't be compiled from a smaller source of speech data.  What I am saying is two things:
 
1.  ScanSoft/Nuance had been working on a speaker independent user profile (Acoustic Model) for several years.  The project was started at L&H prior to the acquisition by ScanSoft, but put on the back burner because of the bankruptcy issues and cutbacks resulting from a year and a half of Chapter 11.  It was picked up again after the completion of the reorganization of the ScanSoft acquisition of the L&H assets.  It should also be noted that L&H had approximately 4000 linguists doing language translation in 23 different languages, from which a lot of speech data was gathered.  Regardless of all this, ScanSoft released their speaker independent Acoustic Model with the initial release of the Beta of DNS 9 in February of 2006, as well as the final release in June of 2006, both of which predated UniVoice.
 
2.  You can't make a claim for absolute speaker independence because there are too many variables that impact the overall compilation of the Acoustic Model under these circumstances.  For example, if you have a collection of 10 speakers that are between 40 and 60, and 10 speakers that are between 10 and 21, you introduce some age related variables.  It's unavoidable and inevitable.  The same applies to gender, accent, and any other factor introduced in the speech data.  What you can claim, and what is generally true, is that the proper balancing of all of these variables will tend to create an Acoustic Model that is reasonably speaker independent.  However, any claim that such a model is absolutely speaker independent is simply and flat out false.  In other words, you can't combine 86 octane gasoline and 93 octane gasoline and claim that the result will work in all cars under all conditions.  That's the simplest analogy I can find.  The proof of the pudding is that UniVoice and the DNS speaker independent Acoustic Models work well for some users and not for others.  They work well under some conditions and not others.

As regards multiple speakers, a speaker independent user profile (Acoustic Model) may work reasonably well for multiple speakers under the following conditions:
 
1.  Only one speaker is speaking at a time.  If two or more speakers speak over each other, no amount of speaker independence is going to work.
 
2.  Each speaker is speaking in terms of "structured speech" and not conversational speech.  No speaker independent user profile (Acoustic Model) will accurately transcribe conversational speech.  That won't come until the advent of neural networking and artificial intelligence, which will require significantly more computer power than is currently available.  This should be obvious to every user.
 
3.  Even given that #s 1 and 2 above are optimal as specified, you still have the problems associated with the quality of the hardware and of the speech itself: how the speakers are recorded, what hardware is used, the distance of the microphone from the speaker, the quality of the hardware used for transcription, the volume and clarity of the speaker's voice, how clearly the speaker enunciates their words, and whether or not the speaker mumbles, slurs their words, or runs their words together.

Given all of this, will a speaker independent Acoustic Model work with multiple speakers?  Yes, to the degree that 1, 2, and 3 above are carefully monitored and controlled to provide optimal recording and transcription.  However, the accuracy is very likely to be significantly less than 92 or 93%.  I would guess that the overall accuracy would be in the range of between 80 and 85%.  Nevertheless, it's worth experimenting with.
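Anyone experimenting along these lines can put a number on the result by comparing a hand-corrected transcript of a short recording against what the software produced. The sketch below counts word-level substitutions, insertions, and deletions with a standard edit-distance calculation and reports accuracy as one minus the error rate. It is an assumption about how such percentages can be measured, not a description of how DNS or KnowBrainer calculate their own figures, and the function names and sample sentences are made up for illustration.

```python
def word_errors(reference, hypothesis):
    """Return (errors, reference_length) using word-level edit distance.

    Errors are the minimum number of word substitutions, insertions,
    and deletions needed to turn the hypothesis into the reference.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # previous[j] = edit distance between the ref words seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # word missing from hypothesis
                               current[j - 1] + 1,       # extra word in hypothesis
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1], len(ref)


def word_accuracy(reference, hypothesis):
    """Word accuracy as 1 - (errors / number of reference words)."""
    errors, total = word_errors(reference, hypothesis)
    return 1.0 - errors / total if total else 0.0


# Hypothetical example: what was said versus what was transcribed.
spoken = "the quick brown fox jumps over the lazy dog today"
transcribed = "the quick brown box jumps over lazy dog today"
print(f"word accuracy: {word_accuracy(spoken, transcribed):.0%}")
```

Punctuation and capitalization differences are not normalized beyond lowercasing, so stripping punctuation from both texts first gives a fairer comparison.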

In addition, the information contained on the SpeechWiki website is somewhat outdated and not completely accurate as I've already explained above.

Is the DNS (Skip initial training for this user) user profile a viable and less expensive alternative to UniVoice that is worth trying?  Absolutely.  Unfortunately, I have yet to see even UniVoice 9.5 perform adequately with regard to SilentAdapt.  I'm not claiming that it doesn't work with SilentAdapt.  However, in my testing of both the DNS independent speaker profile and the UniVoice user profile, the DNS profile seems to improve in accuracy at a rate of about 30 to 40% faster than UniVoice.  This applies even to UniVoice 9.5.  However, there are some caveats here on both sides of the fence.  My testing is my voice with my knowledge and skill with speech recognition, based on over 10 years of experience both as part of the programming team for DNS and as an independent user, trainer, consultant, and demonstrator to over 3000 user groups nationwide.  So, that's a variable that has to be taken into consideration.  The other side of the coin is that some users seem to be able to get UniVoice to work exceedingly well for them.  However, that only proves my point: any setup or configuration will work well for some, and not so well for others.  So is there any advantage of a DNS independent speaker profile over the UniVoice independent speaker profile?  Absolutely not.  Each is equally viable, has its own place, and will work well for some and not for others.  So the bottom line is, as I have always said, "If it ain't broke, don't fix it."  Whatever works for you is what you should use, and if it is broke (i.e., it doesn't work for you), either fix it or don't use it.

My purpose here is not to laud one product over the other, or to put down and criticize either.  My purpose here is to point out that regardless of which speaker independent Acoustic Model you use, how it was created and how well it works depend upon the factors that I have outlined above.  Whether one works better than the other, or vice versa, is a matter of which variables impact each.  The main point here is simply that speaker independent Acoustic Models will only get you so far with regard to recording and transcribing interviews, and that there are limitations to this process which neither DNS nor UniVoice will or can overcome or compensate for.  In other words, which one is better is irrelevant, moot, and a complete non sequitur.

Chuck Runquist
Former Dragon NaturallySpeaking SDK & Senior Technical Solutions PM for DNS




-------------------------

 01/30/2008 03:05 AM
RockinOut
Senior Member

Posts: 485
Joined: 07/27/2007

Chucker's input is definitely a reason why this forum is so awesome, and why I'm addicted. I got weaned from Headfi.org, now I'm addicted to the KnowBrainer forum.  Thanks a lot people.

Knowing how something works is so important, especially if we want to make things work better or more customized for our own needs and goals. A big problem I find with DNS is the lack of information from the manufacturer. The same thing can be said about much software nowadays. Remember when things actually came with a comprehensive, written, hard-copy user manual? Those were the days.

Unless people are aware of this forum, they wouldn't even know that an "independent user file", as Chucker calls it, exists, much less that it would improve over time.

Most newbies, including me, would "logically" think that more training would lead to better user profiles. Obviously those who surf this forum know this to be false.

To coloradoguy:

What Chucker states is congruent with my experiences with UniVoice. Don't forget it is designed for "North American" English accents only. I have a very "network TV" voice.

I love voice recognition, and I love to talk about it with friends and acquaintances. So whenever people come over to my house or my work, I let them play with a "fresh" UniVoice profile. I haven't taken "numbers" on this because research is my work, and I leave that at work.

But for most of my friends and acquaintances, UniVoice is excellent, with better accuracy than if they go through the initial DNS training. I never knew about the untrained "independent user file" that Chucker discusses because I'm the typical end-user.

I've tried this UniVoice demonstration with males, females, and kids. Some have Southern accents, Boston accents, New York accents, and Asian-American and Mexican-American accents (first generation immigrant English speakers). All do well, except that the first generation Asian- and Mexican-American speakers have a very hard time (I live in a very diverse area). The thicker the accent, the worse the results, but not always. There are always exceptions; why this is, I have no clue.

I don't have any experience with recorders and those DNS features. Zero, zilcho.

But if I were required to record and transcribe as you do, I would definitely give UniVoice a try, depending on your needs for accuracy. There are some caveats, and Chucker shares this "insider info" so well.

In addition, KnowBrainer has a good return and satisfaction policy. I don't think this forum and the KnowBrainer brand would last long if all they cared about was taking your money without delivering a good product. Your satisfaction depends on the amount of accuracy you require. It depends on hardware. It depends on the diction and enunciation of the person being interviewed. These are issues that are present in any type of voice recognition regardless of UniVoice usage.

I do research in public health, and often conduct interviews and health assessments in the community. I think I would get reasonably good results if I gave the person a good microphone connected to a proper recorder (Again I'm admitting my ignorance in this area, I'm just explaining what I would try if I had this need) and asked the person to speak clearly and enunciate since I would be recording. Conversational dictation is difficult with any voice recognition system, but if all you require is "readable" then it might work.

Also, the more "standard" the accents and verbal language, the better the results. I would not get good results with some of the clients that I serve as they utilize non-standard verbal language: local/regional/cultural slang, inflections, tone, and accessory sounds/noises (grunts, wheezes, lisps, etc.).

I base the above paragraph on your requirement to have things merely readable, which I think can be accomplished if attention is paid to all the caveats mentioned in this thread.

Again it depends on what your requirements are for accuracy.

When I perform the "here give my voice recognition system a try" demonstration with visitors, I usually just tell them to speak clearly and enunciate. The results are excellent and everyone's impressed. No, it's not perfect, but pretty darn good, and like I said, I've had more luck and more fun with it than having the visitor do the DNS initial training. But for those with Asian or Mexican Spanish accents, results are not good. I've also tried with my native British and Irish friends and coworkers. Again, not great results. The thicker the accent, the worse the results. This is not new information; it is well disclosed in many of Lunis's responses to customers' pre-purchase concerns and questions.

For the type of interview situations in my work, I personally use my own system of alphabetic and symbolic shorthand and utilize "standard" medical abbreviation. I've never used a recorder as my shorthand was always good enough for my needs in school and work.

That's my experience with UniVoice.

I'm sure Lunis (KnowBrainer) will let you know his views and recommendations regarding your needs. Ultimately, the success of the company depends on the satisfaction of its customers, and the existence and liveliness of this forum itself shows its commitment to potential customers, loyal customers, and the entire voice recognition community.



-------------------------
MacBook Pro, Mac Pro, or Mac Mini --> OS X Leopard --> VMware Fusion --> XP SP2 Home --> KnowBrainer 2007 Command Software, DNS 9.5 Preferred, Steelcase Leap Chair. Current: Revolabs xTag. Previous: KnowBrainer Hybrid Plantronics CS55.
 01/30/2008 06:59 PM
Lunis Orcutt
Top-Tier Member

Posts: 25850
Joined: 10/01/2006

UniVoice is a non-gender specific pretrained multiuser profile. The idea is to make it universal, and although we initially toyed with the idea of using UniVoice in an interview situation, it simply didn't work well. We actually got it to work reasonably well in a couple of experiments, but when we got serious with our experimentation we found the accuracy to be unacceptable, and we think you'd probably be just as well off using Chuck's recommendation of creating a new user profile and skipping training. UniVoice will work better in situations where you would otherwise have multiple single user profiles, such as a classroom.

-------------------------

