KnowBrainer Speech Recognition

Topic Title: Is UniVoice 9.5 an improvement over 9.1 for interviews?
Topic Summary: The interviewee voice is the one I really care about.
Created On: 01/29/2008 09:17 PM
Status: Post and Reply

- coloradoguy - 01/29/2008 09:17 PM
- Chucker - 01/29/2008 10:19 PM
- coloradoguy - 01/30/2008 12:31 AM
- Chucker - 01/30/2008 01:49 AM
- RockinOut - 01/30/2008 03:05 AM
- Lunis Orcutt - 01/30/2008 06:59 PM

coloradoguy - 01/29/2008 09:17 PM

Please forgive me if this question has been asked before (I did do a forum search first, and it doesn't look like the question has been asked since 9.5 was released), but is UniVoice 9.5 an improvement over the previous versions when it comes to transcribing more than one voice?

I'm a journalist who bases a lot of my work on recorded interviews. The vast majority involve only one person, and capturing their words accurately is far more important than getting mine. Therefore, can I do anything with UniVoice or my recorder to maximize the accuracy for my subjects, since I can generally recall what I asked anyway simply based on their answers?

And when it comes to recording interviews, am I better off purchasing a device like the new Olympus DS-50 Digital Stereo Pro Conference Recorder? I saw it offered on your site, so I was wondering if it is really optimized for multiple voices in a way that will work well with UniVoice. (I've been transcribing cassette tapes of my interviews for years, always with the hope that someone will someday create a digital solution to my transcription nightmares.) I'm not looking for 98 percent accuracy, mind you, but the interview transcriptions at least need to be readable.

I haven't made a DNS or KB purchase as yet, but I'm really hoping that UniVoice is the solution I've been waiting and hoping for. Finally, if I go ahead and take the financial plunge, is there any sort of refund/guarantee if the software can't do at all what I'm asking?

Chucker - 01/29/2008 10:19 PM

The basic answer to your question is NO.

There are 5 basic speech models in DNS: (1) Acoustic Model, (2) Vocabulary, (3) bigram Language Model, (4) trigram Language Model, and (5) quadgram Language Model. UniVoice is basically the Acoustic Model. No matter what changes are made to the Acoustic Model (i.e., between UniVoice 9.1 and 9.5), they cannot change the way that the Acoustic Model interacts with the other speech models. That is, there is nothing that can be done to the Acoustic Model that will improve the accuracy with regard to multiple speakers. DNS is simply not designed to do this, even though you will achieve a certain amount of success in doing so. There would have to be major changes to the other speech models, which only Nuance can make.

UniVoice uses the basic DNS user profile creation procedures, simply with different speech data. Although a speaker independent user profile (Acoustic Model – UniVoice) will achieve somewhat better results, it won't change the fact that DNS is not designed to handle multiple speakers.

In addition, DNS also provides you with an independent speaker profile. When you create a new user in DNS 9.0/9.1/9.5 and elect the option to "Skip initial training for this user," you are creating an independent speaker profile that is essentially the same as UniVoice, except that it is based on the speech data that Nuance uses to create such profiles as opposed to the speech data that KnowBrainer has used.

Which one is better for standard single speaker use is a matter of personal experience. Some users find UniVoice better suited for their purposes. Others find the speaker independent profile created with the above option better suited for theirs. There are pros and cons in using either approach. Nevertheless, you won't find any better recognition of multiple speakers regardless of which one you choose.

Chuck Runquist

"We are all victims of mythology in one way or another. We are the inheritors, and many times the propagators, of a desire to believe what we want to believe, regardless of whether or not it is true." -- J.V. Stewart
-------------------------

coloradoguy - 01/30/2008 12:31 AM

Thanks for the thoughtful reply, Chuck. I asked the question after reading online -- on more than one site -- that UniVoice is the first VR software designed to handle multiple speakers (http://speechwiki.org/SR/UniVoice.html) -- but now I'm not so convinced.

After reading your reply, it sounds like some people prefer to use DNS "untrained" for each new speaker, while others prefer using UniVoice. But if that's the case, then what, exactly, is the benefit of UniVoice over DNS alone? Why spend the money on UniVoice if I might actually prefer DNS alone? And if DNS 9 is actually better than 97 percent accurate without any training at all (http://www.nytimes.com/2006/07/20/technology/20pogue.html), then as long as it can perform at such a reasonable level for each new interview, I might decide that's good enough.

Again, I read that UniVoice was designed with tasks like interviews in mind (http://www.knowbrainer.com/PubForum/index.cfm?page=viewForumTopic&topicId=41&pageNo=1#thread193), though I'm beginning to think that those initial tests have not held up.

Chucker - 01/30/2008 01:49 AM

Let me explain a couple of things about speaker independent user profiles.

First, a user profile, while consisting of vocabulary, Language Models, and the Acoustic Model, consists primarily of the Acoustic Model. This is a collection of speech data that includes the following components:

Second, the degree of speaker independence and the accuracy of the corresponding Acoustic Model depend on the number of speakers, as well as their gender, topic, age, and general dictation style (diction, enunciation, pronunciation, and clarity of their speech). I don't know how many speakers, how much speech data, what topics (i.e., business, technology, sports, broadcast news, financial, etc. ad infinitum), what age/gender of speakers, or what general dictation style were used in the creation of UniVoice. However, one of the main reasons why ScanSoft acquired Nuance (and it is still ScanSoft, even though they are using the Nuance name and logo, because ScanSoft acquired Nuance, not the other way around) was the tremendous amount of speech data that Nuance had collected over time.

As regards the speaker independent user profile which you can create in DNS 9 using "Skip initial training for this user," the total combined speech data collected as far back as 1999 (Dragon Systems) from DARPA, plus all the speech data collected by L&H and Nuance, consists of over 15,000 speakers and the corresponding text as noted above, which consists of documents containing a total of over 10 million words. The speakers from whom this data was compiled are a combination of end users, newscasters, professional speakers, and men and women of all ages (including teenagers). I rather doubt that KnowBrainer had or has access to this much speech data. That said, I'm not saying that a good Acoustic Model for an independent speaker user profile can't be compiled from a smaller source of speech data.

What I am saying is two things:

As regards multiple speakers, a speaker independent user profile (Acoustic Model) may work reasonably well for multiple speakers under the following conditions:

Given all of this, will a speaker independent Acoustic Model work with multiple speakers? Yes, to the degree that 1, 2, and 3 above are carefully monitored and controlled to provide optimal recording and transcription. However, the accuracy is very likely to be significantly less than 92 or 93%. I would guess that the overall accuracy would be in the range of 80 to 85%. Nevertheless, it's worth experimenting with. In addition, the information contained on the SpeechWiki website is somewhat outdated and not completely accurate, as I've already explained above.

Is the DNS ("Skip initial training for this user") user profile worth trying as a viable and less expensive alternative to UniVoice? Absolutely. Unfortunately, I have yet to see even UniVoice 9.5 perform adequately with regard to SilentAdapt. I'm not claiming that it doesn't work with SilentAdapt. However, in my testing of both the DNS independent speaker profile and the UniVoice user profile, the DNS profile seems to improve in accuracy about 30 to 40% faster than UniVoice. This applies even to UniVoice 9.5.

However, there are some caveats here on both sides of the fence. My testing is my voice with my knowledge and skill with speech recognition, based on over 10 years of experience as both part of the programming team for DNS and as an independent user, trainer, consultant, and demonstrator to over 3000 user groups nationwide. So, that's a variable that has to be taken into consideration. The other side of the coin is that some users seem to be able to get UniVoice to work exceedingly well for them. However, that only proves my point: any setup or configuration will work well for some, and not so well for others.

So is there any advantage of a DNS independent speaker profile over the UniVoice independent speaker profile? Absolutely not. Each is equally viable, has its own place, and will work well for some and not for others. So the bottom line is, as I have always said, "If it ain't broke, don't fix it." Whatever works for you is what you should use, and if it is broke (i.e., it doesn't work for you), either fix it or don't use it.

My purpose here is not to laud one product over the other, or to put down and criticize either. My purpose here is to point out that regardless of which speaker independent Acoustic Model you use, how it was created and how well it works depend upon the factors that I have outlined above. Whether one works better than the other, or vice versa, is a matter of which variables impact each. The main point here is simply that speaker independent Acoustic Models will only get you so far with regard to recording and transcribing interviews, and that there are limitations to this process which neither DNS nor UniVoice will or can overcome or compensate for. In other words, which one is better is irrelevant, moot, and a complete non sequitur.

"We are all victims of mythology in one way or another. We are the inheritors, and many times the propagators, of a desire to believe what we want to believe, regardless of whether or not it is true." -- J.V. Stewart
-------------------------

RockinOut - 01/30/2008 03:05 AM

Chucker's input is definitely a reason why this forum is so awesome, and why I'm addicted. I got weaned from Headfi.org, now I'm addicted to the KnowBrainer forum. Thanks a lot, people. Knowing how something works is so important, especially if we want to make things work better or more customized for our own needs and goals. A big problem I find with DNS is that there is a lack of information from the manufacturer on things. The same thing can be said about much software nowadays. Remember when things actually came with a comprehensive written, hard-copy user manual? Those were the days. Unless people are aware of this forum, they wouldn't even know that an "independent user file," as Chucker calls it, exists.

To coloradoguy: I love voice recognition, and I love to talk about it with friends and acquaintances. So whenever people come over to my house or my work, I let them play with a "fresh" UniVoice profile. I haven't taken "numbers" on this because research is my work, and I leave that at work. But for most of my friends and acquaintances, UniVoice is excellent, with better accuracy than if they go through the initial DNS training. I never knew about the untrained "independent user file" that Chucker discusses because I'm the typical end-user.

I've tried this UniVoice demonstration with males, females, and kids. Some have Southern accents, Boston accents, New York accents, and Asian-American and Mexican-American accents (first-generation immigrant English speakers). All do well, except that those with first-generation Asian- and Mexican-American accents have a very hard time (I live in a very diverse area). The thicker the accent, the worse the results, but not always. There are always exceptions; why this is, I have no clue.

I don't have any experience with recorders and those DNS features. Zero, zilch. But if I were required to record and transcribe as you do, I would definitely give UniVoice a try, depending on your needs for accuracy. There are some caveats, and Chucker shares this "insider info" so well. In addition, KnowBrainer has a good return and satisfaction policy. I don't think this forum and the KnowBrainer brand would last long if all they cared about was taking your money without delivering a good product.

Your satisfaction depends on the amount of accuracy you require. It depends on hardware. It depends on the diction and enunciation of the person being interviewed. These are issues that are present in any type of voice recognition regardless of UniVoice usage. I do research in public health, and often conduct interviews and health assessments in the community. I think I would get reasonably good results if I gave the person a good microphone connected to a proper recorder (again, I'm admitting my ignorance in this area, I'm just explaining what I would try if I had this need) and asked the person to speak clearly and enunciate since I would be recording. Conversational dictation is difficult with any voice recognition system, but if all you require is "readable" then it might work. Also, the more "standard" the accents and verbal language, the better the results. I would not get good results with some of the clients that I serve, as they use non-standard verbal language: local/regional/cultural slang, inflections, tone, and accessory sounds/noises (grunts, wheezes, lisps, etc.).

I base the above paragraph on your requirement to have things merely readable, which I think can be accomplished if attention is paid to all the caveats mentioned in this thread. Again, it depends on what your requirements are for accuracy.

When I perform the "here, give my voice recognition system a try" demonstration with visitors, I usually just tell them to speak clearly and enunciate. The results are excellent and everyone's impressed. No, it's not perfect, but it's pretty darn good, and like I said, I've had more luck and more fun with it than having the visitor do the DNS initial training. But for those with Asian or Mexican Spanish accents, results are not good. I've also tried with my native British and Irish friends and coworkers. Again, not great results. The thicker the accent, the worse the results. This is not new information, and it is well disclosed in many responses by Lunis to customers' pre-purchase concerns and questions.

For the type of interview situations in my work, I personally use my own system of alphabetic and symbolic shorthand and "standard" medical abbreviations. I've never used a recorder, as my shorthand was always good enough for my needs in school and work.

That's my experience with UniVoice. I'm sure Lunis (KnowBrainer) will let you know his views and recommendations regarding your needs. Ultimately, the success of the company depends on the satisfaction of its customers, and the existence and liveliness of this forum itself shows its commitment to potential customers, loyal customers, and the entire voice recognition community.
-------------------------
MacBook Pro, Mac Pro, or Mac Mini --> OS X Leopard --> VMware Fusion --> XP SP2 Home --> KnowBrainer 2007 Command Software, DNS 9.5 Preferred, Steelcase Leap Chair. Current: Revolabs xTag. Previous: KnowBrainer Hybrid Plantronics CS55.

Lunis Orcutt - 01/30/2008 06:59 PM

UniVoice is a non-gender-specific, pretrained multiuser profile. The idea is to make it universal, and although we initially toyed with the idea of using UniVoice in an interview situation, it simply didn't work well. We actually got it to work reasonably well in a couple of experiments, but when we got serious with our experimentation we found the accuracy to be unacceptable, and we think you'd probably be just as well off using Chuck’s recommendation of creating a new user profile and skipping training. UniVoice will work better in situations where you have multiple single-user profiles, such as a classroom.
-------------------------