KnowBrainer Speech Recognition
Topic Title: Whisper vs Dragon
Created On: 11/24/2022 03:21 AM
 11/24/2022 03:21 AM
marc_vie
Power Member

Posts: 54
Joined: 07/01/2014

Has anyone here had a chance to look at or trial this automatic speech recognition (ASR) system?  Seems to hold much promise but I wonder how well it will stack up against Dragon in terms of accuracy and of course price.

https://openai.com/blog/whisper/



 11/25/2022 02:46 PM
Ag
Top-Tier Member

Posts: 967
Joined: 07/08/2019

Isn't whisper open source?

-------------------------

DPG15.6 (also DPI 15.3) + KB, Sennheiser MB Pro 1 UC ML, BTD 800 dongle, Windows 10 Pro, MS Surface Book 3, Intel Core i7-1065G7 CPU @ 1.3/1.5GHz (4 cores, 8 logical), GPU = NVIDIA Quadro RTX 3000 with Max-Q Design.

 11/28/2022 02:57 AM
Stephan Kuepper
Top-Tier Member

Posts: 2348
Joined: 10/04/2006

Above all, it is an API, not a product.

-------------------------

www.egs-vertrieb.de - www.spracherkennungscloud.de

 01/30/2023 02:06 PM


rjwilmsi
Power Member

Posts: 77
Joined: 08/24/2008

I've been playing with whisper more or less since it was first released on GitHub: https://github.com/openai/whisper/. It's freely available and, after installation and download of the models, it runs entirely offline. The installation is easy if you're a Linux user used to fiddling with Python utilities from GitHub, probably a bit challenging if you are new to that sort of setup. I haven't tested the Windows installation of whisper, though; maybe that is packaged up and so easier. I use it in conjunction with a "whisper_mic" utility to get live dictation: https://github.com/mallorbc/whisper_mic

Compared to the last versions of DNS that I was using regularly (DNS 12 and DNS 13), for general speech and dictation the accuracy of whisper is much better. I don't know if DNS 15 is much more accurate than 12 or 13; if it's just incrementally better then I'd say whisper would be clearly better. Whisper has 5 different models trading time versus accuracy. On the fastest two models its error rate is much lower than I got with DNS (for live dictation of general speech in English). On the slower models (which are too slow on CPU for real-time dictation) the accuracy on everything I've played with (YouTube video audio etc.) has been so good that differences versus a transcript I'd do myself are nearly all differences about punctuation (how do you split sentences etc. from a speaker speaking off the cuff).

Because whisper is done by machine learning on big datasets of various audio sources it supports multiple languages, accents etc. so there is no concept of training it for your voice or specifying your accent. It means that it is very good at all sorts of accents and you don't have to cultivate your own custom trained profile etc. Also there is no big focus on your mic quality like with DNS.

The downside of whisper is that it's much more resource demanding than DNS 12: on CPU you need a modern 6+ core CPU and have to use the tiny or base model (the fastest two). For the larger three models you need a workstation CPU (16 threads etc.) or, better, a reasonable NVIDIA graphics card to run in CUDA mode (e.g. a GTX 1060 or better), and unless you have a high-end GPU (RTX etc.) transcription would still be slower than real time on those larger models. Also there is no "training" you can do and no custom vocabulary, so if you have specific terms that it hasn't got in its model it may not work so well there, though there is a "prompt" option where you can pre-feed it words likely to be in the audio (so a sort-of custom vocab option). And of course its core is just speech recognition, so it doesn't of itself provide any computer automation / macro functionality.
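To make the model choice and the "prompt" option a bit more concrete, here is a rough sketch, assuming the stock command-line tool from the openai/whisper repository, where the prompt is passed as `--initial_prompt`. The helper names `pick_model` and `build_command` are made up for illustration, and the thresholds in `pick_model` are only my rule of thumb from the paragraph above, not anything official.

```python
import shlex

# Model names as listed in the openai/whisper README, fastest first.
MODELS = ["tiny", "base", "small", "medium", "large"]


def pick_model(cpu_threads: int, has_cuda_gpu: bool) -> str:
    """Hypothetical helper: pick a model from the rule of thumb above."""
    if has_cuda_gpu or cpu_threads >= 16:
        return "small"   # first of the three larger models
    if cpu_threads >= 6:
        return "base"    # fine on a modern 6+ core CPU
    return "tiny"        # assumption: fall back to the fastest model


def build_command(audio_path, model, vocab_terms=(), use_cuda=False):
    """Hypothetical helper: build the argument list for the whisper CLI."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    cmd = ["whisper", audio_path, "--model", model]
    if use_cuda:
        cmd += ["--device", "cuda"]
    if vocab_terms:
        # Pre-feed words likely to be in the audio (sort-of custom vocab).
        cmd += ["--initial_prompt", ", ".join(vocab_terms)]
    return cmd


if __name__ == "__main__":
    model = pick_model(cpu_threads=8, has_cuda_gpu=False)
    cmd = build_command("notes.wav", model, ["KnowBrainer", "Nuance"])
    # Print a shell-ready line; run it yourself once whisper is installed.
    print(shlex.join(cmd))
```

The script only prints the command rather than running it, so it works without whisper installed. The prompt nudges the recognizer toward those spellings; it is not a trained vocabulary, so results may vary.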

So I think if you use DNS for general dictation / transcription and find DNS's accuracy lacking then whisper is very much worth looking at. If you use DNS tied into the environment of automation, macros and specific software integration then whisper doesn't cover that.



 01/30/2023 03:14 PM
Lunis Orcutt
Top-Tier Member

Posts: 40525
Joined: 10/01/2006

You really can't make a fair comparison until you compare it to DPI 15.61. For example, DPI 15 is less than half the size of Ver. 12 and notably more accurate.



-------------------------

Change "No" to "Know" w/KnowBrainer 2022
Trial Downloads
Dragon/Sales@KnowBrainer.com 
(615) 884-4558 ex 1

 01/31/2023 03:14 AM
Stephan Kuepper
Top-Tier Member

Posts: 2348
Joined: 10/04/2006

Thank you rjwilmsi for your fair and unbiased evaluation of Whisper.

I have tested a few speech recognition engines that have been made available over the last year. Although recognition quality has been impressive for many of them, I have yet to find one that a) does Command and Control and b) allows for easy editing of the vocabulary.

As long as these features are absent, Dragon is still the main player as far as dictation is concerned. IMHO other engines are better suited to contact center monitoring, live captioning, and other applications where Command and Control doesn't matter, and where prime accuracy isn't the first consideration. I also have a hunch that dictation doesn't monetize as well as other applications, or someone would have created an alternative to Dragon long ago. However, even the cloud-based transcription services that I've tested either integrate Dragon, or are a pain to use.

Just my 2 eurocents, Stephan

-------------------------

www.egs-vertrieb.de - www.spracherkennungscloud.de

 01/31/2023 10:13 AM
george_pat
New Member

Posts: 3
Joined: 10/22/2021

rjwilmsi - where in Whisper can I find the "prompt" option where you can pre-feed it words likely to be in the audio, or how do I add a custom list of words, please? I'm interested in doing more research on this.

 02/08/2023 07:09 AM
Matt_Chambers
Top-Tier Member

Posts: 686
Joined: 08/09/2018

 02/11/2023 07:03 PM
ax
Top-Tier Member

Posts: 676
Joined: 03/22/2012

^^^ Nice write-up. Interesting to read. Thanks for sharing.

I regret having bought my "thin-and-light" ThinkPad Carbon in late 2021. I should've exercised better sense to acquire a proper "workstation replacement" laptop with dedicated GPU ...

Not sure if anyone is going to "package up" this Whisper gig to make it conducive to install/run on Windows for "everyday enthusiasts" (whatever that means ... ChatGPT would know, which I have yet to sign up).

 02/12/2023 02:18 PM
drrunev
Junior Member

Posts: 40
Joined: 01/23/2015

I have been using Dragon for several years. Version 15. Difficult to compare Dragon and whisper because Dragon is a full product.
Dragon is superior to all other speech recognition systems. But when it comes to the speech recognition engine of Whisper I was blown away.
Using it on my mobile and it is just incredibly accurate. Close to unbelievable compared to Google voice typing or other speech solutions on mobile.
I am using it all the time to make notes if I get an idea, dictating outside when hiking. I found it extremely useful. You always get full sentences, so I suspect it might be using a Transformer architecture like GPT-3, predicting what you are going to say informed by your speech. Sometimes there are sentences at the end which I did not say.
Since 2020 I have been thinking that Nuance should integrate large language model technology into their product. Given what you have been writing so far, it should predict the upcoming text (the next most probable text token), highly informed by the audio (what you are actually saying).
I think this would boost Dragon to a new level. Since Microsoft bought Nuance and Microsoft has tight collaboration with OpenAI they should do this.



-------------------------

Rune Vabø
