KnowBrainer Speech Recognition
Topic Title: test
Topic Summary:
Created On: 02/27/2022 07:22 PM
 test   - David.P - 02/27/2022 07:22 PM  
 test   - R. Wilke - 02/28/2022 10:38 AM  
 02/27/2022 07:22 PM
Top-Tier Member

Posts: 639
Joined: 10/05/2006


Forum member GeoffinDarwin has suggested that a new thread should be created in order to separate the present subject from the recent latency thread, and in order to better differentiate between measuring performance of NaturallySpeaking (using the Speakometer tool) on one hand, and measuring the accuracy of NaturallySpeaking (using for example the method below) on the other hand.


I'm happy to follow this suggestion. Below is therefore a method for objectively measuring the actual (relative) accuracy of different DNS user files.


There is no reproducible way to reliably measure absolute accuracy figures and to compare them between different people using NaturallySpeaking. This is because the absolute accuracy level depends much more on the respective user's current "enunciating fitness" than on anything else.


Therefore, at one time a given person might get 99.9% accuracy, and the next day, using the same text and the same equipment, that same person might get only 99.7% (which already means three times as many errors, i.e. a 200% increase in error rate compared to 99.9%), or even a much lower accuracy figure. The error ranges of such absolute accuracy measurements are therefore far too large for them to have much significance or validity.
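The arithmetic behind the 99.9% vs. 99.7% comparison is easier to follow when both figures are converted to error rates. The following is only a minimal sketch; the function names are made up for illustration and are not part of any Dragon tool.

```python
def error_rate(accuracy_percent: float) -> float:
    """Convert an accuracy percentage into an error percentage."""
    return 100.0 - accuracy_percent


def error_ratio(accuracy_a: float, accuracy_b: float) -> float:
    """Return how many times more errors run B produced than run A."""
    errors_a = error_rate(accuracy_a)
    if errors_a == 0:
        raise ValueError("run A has no errors, so the ratio is undefined")
    return error_rate(accuracy_b) / errors_a


ratio = error_ratio(99.9, 99.7)
print(f"{ratio:.1f} times as many errors")
print(f"{(ratio - 1) * 100:.0f}% increase in error rate")
```

A drop of 0.2 percentage points in accuracy looks small, but it corresponds to roughly three times as many errors to correct.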


It is however perfectly possible to produce objective and reproducible, relative measurements between several user files of the same person. This way, it is actually quite easy to find out, for example, whether the accuracy of a user file has deteriorated, or whether it has improved, over time.


The method works as follows. This is the method at its current stage, so expect some changes or improvements over time.


  1. Open or create a document with one or two pages of typical text of yours. Make sure it does not contain unknown words (e.g. by running it through the DNS voctool).
  2. Make a high quality sound recording of yourself reading that text (using e.g. the free Audacity recorder).
  3. Save the recording as a standard WAV file.
  4. Now use different sets or versions of your NaturallySpeaking user files and have NaturallySpeaking produce one automatic transcription of that recording for each of these user files. Save each transcription's text result in a separate file.
  5. After this, compare each of the transcription files (from step 4) to the original document (from step 1), which serves as the reference, preferably using the Microsoft Word markup function "Tools -> Compare documents".
  6. Then count the number of differences (=recognition errors) in each of the markup documents that Word has created. This already shows which user file is better. Next, you can calculate exactly how much better.

  7. Define one of your user files as the reference or starting point for the accuracy comparison. Then calculate the ratio (as a fraction or a percentage) of the number of errors in this reference user file's transcription to the number of errors in each of the other transcriptions (from step 4). (Absolute values will not have much significance, since they depend more on the way you enunciated when you made the recording than on anything else.)

  8. Done. What you get are values (fractions or percentages, if you like) that show how your different user files perform relative to each other in terms of accuracy.
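For anyone who would rather script steps 5 to 7 than count Word markup by hand, here is a rough sketch. It assumes the reference text and each transcription are available as plain text, and it uses Python's standard difflib module in place of Word's compare function, so the counts may differ somewhat from what Word reports. The function names and the sample sentences are hypothetical.

```python
import difflib


def count_errors(reference: str, transcript: str) -> int:
    """Count word-level differences between reference and transcript."""
    ref_words = reference.split()
    hyp_words = transcript.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    errors = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A changed block counts as the larger of its two sides.
            errors += max(i2 - i1, j2 - j1)
    return errors


def relative_accuracy(baseline_errors: int, other_errors: int) -> float:
    """Return how many times fewer errors the other user file produced."""
    if other_errors == 0:
        return float("inf")
    return baseline_errors / other_errors


# Made-up sample data, for illustration only.
reference = "the quick brown fox jumps over the lazy dog"
old_profile = "the quick brown box jumps over lazy dock"
new_profile = "the quick brown fox jumps over the lazy dock"

old_errors = count_errors(reference, old_profile)
new_errors = count_errors(reference, new_profile)
print(old_errors, new_errors, relative_accuracy(old_errors, new_errors))
```

In this toy example the old profile produces 3 errors and the new one 1, so the new profile comes out as 3 times as accurate relative to the old one.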

Using this method, I found my current user files to be about twice as accurate (2.09 times, to be exact) as they were in their original state. In other words, my current user files produced less than half the number of errors they produced right after the user had been created.


The interesting part is that, before carrying out the above measurements, I had the subjective feeling that my user files had actually gotten worse over time. Other users often report similar experiences -- "user file degradation" has almost become an accepted and unavoidable (mythical) fact for many. In my case, however, the measurements proved this "feeling" of accuracy degradation wrong: the quality of my user files had actually doubled!


Therefore, my suggestion is to introduce a new term, "Usage Degradation" (in addition to, and in contrast with, "User File Degradation"). This is of course a "soft" and mainly psychological variable. It might be explained by a person's gradually decreasing effort, over time, to enunciate properly and clearly when using NaturallySpeaking, every single day anew. Such "Usage Degradation" effects aren't intentional, but I'm sure that they do happen, unconsciously.


To better understand such effects, one might compare them to the greater "communication effort" we initially make when we meet someone for the first time (which applies not only to a person, but also to a new user file or a new version of NaturallySpeaking), and to the way this effort subtly and imperceptibly "degrades" over time as we get to know that person better and better.


Anyway, this natural "communication effort degradation" of course only (sort of) works with people, and surely not with computer programs. It might therefore be one possible explanation for the "user file degradation" myth, or feeling.


BTW, my measurements also showed that (with my current user files) it didn't make any difference regarding accuracy where the speed vs. accuracy slider was set -- only regarding speed. Therefore, it turned out to be perfectly OK to set the slider to the position where NaturallySpeaking speed is fastest, which turned out to be not 0% but 25% -- the latter measurements carried out using NaturallySpeakometer.


Hth & Regards David.P




Sennheiser MKH Mic
Visual & Acoustic Feedback + Automatic Mic Control

 02/28/2022 10:38 AM
R. Wilke
Top-Tier Member

Posts: 7840
Joined: 03/04/2007

Therefore, at one time a given person might get 99.9% accuracy, and the next day, using the same text and the same equipment, that same person might get only 99.7% (which already means three times as many errors, i.e. a 200% increase in error rate compared to 99.9%), or even a much lower accuracy figure. The error ranges of such absolute accuracy measurements are therefore far too large for them to have much significance or validity.

Effectively, you can't speak the same thing twice. So let's do away with real-world, free-form testing, unless you want to benchmark the user, and turn to using "canned" input for viable, reproducible testing instead.

Your description of the methodology required to accomplish this is spot on. However, why reinvent the wheel? It has all been done already. Many years ago, power user Phil Schaadt made an incredible effort, spending hours and days doing all the testing with tooling I provided. The results have been published here on this forum, and I'm just too lazy to dig up the particular threads. Basically, he used two different tools: my benchmarking tool, which you should already be aware of, and another one called TextCompare, which does all the heavy lifting when it comes to calculating the word error rate.

Both utilities have in common that they are based on interfaces and methods provided by the Dragon SDK, thus getting it straight from the horse's mouth. And no, it isn't anything I invented, I just put one and one together, many ones for that matter.

To illustrate how the text comparing utility works, have a look at the screenshot below. It starts from a "reference text" and compares it to a "draft text" (technical terms introduced by the SDK), but basically works exactly along the lines you have described. In addition to this, it accounts for differences falling into three categories: deletions, insertions, and substitutions (technical terms used in speech recognition research, Wikipedia will be your friend).
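For readers unfamiliar with those three categories, the sketch below shows how a word error rate is commonly computed from a reference text and a draft text, using a standard word-level edit-distance alignment. It is not the TextCompare tool and does not use the Dragon SDK; all names and the sample sentences are hypothetical, and the real utility may count differently.

```python
from typing import NamedTuple


class WerResult(NamedTuple):
    substitutions: int
    deletions: int
    insertions: int
    reference_words: int

    @property
    def wer(self) -> float:
        """Word error rate: all errors divided by reference length."""
        errors = self.substitutions + self.deletions + self.insertions
        return errors / self.reference_words if self.reference_words else 0.0


def word_error_rate(reference: str, draft: str) -> WerResult:
    ref = reference.split()
    hyp = draft.split()
    n, m = len(ref), len(hyp)

    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # delete every reference word
    for j in range(1, m + 1):
        cost[0][j] = j  # insert every draft word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back through the table to classify each edit.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and ref[i - 1] != hyp[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            subs += mismatch  # a match adds 0, a substitution adds 1
            i -= 1
            j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1  # reference word missing from the draft
            i -= 1
        else:
            ins += 1  # extra word in the draft
            j -= 1
    return WerResult(subs, dels, ins, n)


# Made-up sample data, for illustration only.
result = word_error_rate(
    "the quick brown fox jumps over the lazy dog",
    "the quick brown box jumps over lazy dog today",
)
print(result, f"WER = {result.wer:.1%}")
```

In this toy example there is one substitution ("fox" became "box"), one deletion (the second "the") and one insertion ("today"), giving 3 errors on 9 reference words.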

Actually, I first created this tool in 2014 and hadn't touched it for years until last year, when I gave it to someone and it needed a minor fix. Astonishingly, it still works in DPI and DPG 15, although there may be problems with certain audio file formats, which were less critical in previous Dragon versions.

If you're interested in a copy, let me know.

However, to emphasise one of your findings, a user profile typically does not degrade, but improves, at least potentially, over time if treated correctly by correcting, saving and optimising. This is what all the relevant testing has shown.




No need to buy if all you want to do is try ...
