KnowBrainer Speech Recognition
Topic Title: Whisper vs Dragon
Created On: 11/24/2022 03:21 AM
 11/24/2022 03:21 AM
marc_vie
Power Member

Posts: 54
Joined: 07/01/2014

Has anyone here had a chance to look at or trial this automatic speech recognition (ASR) system?  Seems to hold much promise but I wonder how well it will stack up against Dragon in terms of accuracy and of course price.

https://openai.com/blog/whisper/



 11/25/2022 02:46 PM
Ag
Top-Tier Member

Posts: 1207
Joined: 07/08/2019

Isn't whisper open source?

-------------------------

DPG15.6 (also DPI 15.3) + KB, Sennheiser MB Pro 1 UC ML, BTD 800 dongle, Windows 10 Pro, MS Surface Book 3, Intel Core i7-1065G7 CPU @ 1.3/1.5GHz (4 cores, 8 logical), GPU=NVIDIA Quadro RTX 3000 with Max-Q Design.

 11/28/2022 02:57 AM
Stephan Kuepper
Top-Tier Member

Posts: 2460
Joined: 10/04/2006

Above all, it is an API, not a product.

-------------------------

www.egs-vertrieb.de - www.spracherkennungscloud.de

 01/30/2023 02:06 PM
rjwilmsi
Power Member

Posts: 77
Joined: 08/24/2008

I've been playing with whisper more or less since it was first released on GitHub (https://github.com/openai/whisper/). It's freely available, and after installation and download of the models it runs entirely offline. The installation is easy if you're a Linux user used to fiddling with Python utilities from GitHub, but probably a bit challenging if you are new to that sort of setup. I haven't tested the Windows installation of whisper; maybe that is packaged up and so easier. I use it in conjunction with a "whisper_mic" utility to get live dictation: https://github.com/mallorbc/whisper_mic

Compared to the last versions of DNS that I was using regularly (DNS 12 and DNS 13), for general speech and dictation the accuracy of whisper is much better. I don't know if DNS 15 is much more accurate than 12 or 13; if it's just incrementally better then I'd say whisper would be clearly better. Whisper has 5 different models trading time versus accuracy. On the fastest two models its error rate is much lower than I got with DNS (for live dictation of general speech in English). On the slower models (which are too slow on CPU for real-time dictation) the accuracy on everything I've played with (YouTube video audio etc.) has been so good that differences versus a transcript I'd do myself are nearly all differences about punctuation (how do you split sentences etc. from a speaker speaking off the cuff).
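
For anyone who has not tried it, transcribing a file with the openai-whisper Python package looks roughly like the sketch below. It assumes the package and ffmpeg are installed. `load_model()` and `transcribe()` are the package's own calls; the helper name `transcribe_file` and the file name in the usage comment are made up for illustration.

```python
# The five multilingual model sizes, fastest first. English-only variants
# such as "small.en" also exist and are left out here for brevity.
MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]


def transcribe_file(audio_path: str, model_name: str = "base") -> str:
    """Transcribe one audio file and return the recognized text.

    Illustrative helper, not part of the whisper package.
    """
    if model_name not in MODEL_SIZES:
        raise ValueError(
            f"unknown model {model_name!r}; expected one of {MODEL_SIZES}"
        )

    # Imported lazily so this module still loads where whisper is missing.
    import whisper

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = model.transcribe(audio_path)   # dict with "text", "segments", "language"
    return result["text"].strip()


# Usage (placeholder file name):
#     print(transcribe_file("interview.mp3", model_name="small"))
```

Moving from "tiny" towards "large" trades speed for accuracy, which is the time-versus-accuracy trade-off described above.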

Because whisper is done by machine learning on big datasets of various audio sources it supports multiple languages, accents etc. so there is no concept of training it for your voice or specifying your accent. It means that it is very good at all sorts of accents and you don't have to cultivate your own custom trained profile etc. Also there is no big focus on your mic quality like with DNS.

The downside of whisper is that it's much more resource-demanding than DNS 12: on CPU you need a modern 6+ core CPU and have to use the tiny or base model (the fastest two). For the larger 3 models you need a workstation CPU (16 threads etc.) or, better, a reasonable NVIDIA graphics card to run in CUDA mode (e.g. a GTX 1060 or better), and unless you have a high-end GPU (RTX etc.) transcription would still be slower than real time on those larger models. Also there is no "training" you can do and no custom vocabulary, so if you have specific terms that it hasn't got in its model it may not work so well there, though there is a "prompt" option where you can pre-feed it words likely to be in the audio (a sort-of custom vocab option). And of course its core is just speech recognition, so it doesn't of itself provide any computer automation / macro functionality.
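
In the openai-whisper Python package, the "prompt" option is the `initial_prompt` argument of `transcribe()` (the command-line tool has a matching `--initial_prompt` flag). A rough sketch, assuming the package is installed; `build_prompt`, `transcribe_with_terms` and the example terms are invented for illustration. The prompt only nudges the decoder toward those spellings and is not a true custom vocabulary.

```python
from __future__ import annotations


def build_prompt(terms: list[str]) -> str:
    """Join likely-to-occur terms into one prompt string (illustrative helper)."""
    seen = set()
    unique = []
    for term in terms:
        cleaned = term.strip()
        # Skip blanks and case-insensitive duplicates, keeping first-seen order.
        if cleaned and cleaned.lower() not in seen:
            seen.add(cleaned.lower())
            unique.append(cleaned)
    if not unique:
        return ""
    return "Glossary: " + ", ".join(unique) + "."


def transcribe_with_terms(audio_path: str, terms: list[str],
                          model_name: str = "base") -> str:
    """Transcribe a file while hinting at expected words via initial_prompt."""
    import whisper  # lazy import; requires the openai-whisper package

    model = whisper.load_model(model_name)
    prompt = build_prompt(terms) or None  # None means "no prompt"
    result = model.transcribe(audio_path, initial_prompt=prompt)
    return result["text"].strip()


# Usage (placeholder file name and terms):
#     transcribe_with_terms("dictation.wav", ["KnowBrainer", "memoQ"])
```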

So I think if you use DNS for general dictation / transcription and find DNS's accuracy lacking then whisper is very much worth looking at. If you use DNS tied into the environment of automation, macros and specific software integration then whisper doesn't cover that.



 01/30/2023 03:14 PM
Lunis Orcutt
Top-Tier Member

Posts: 40984
Joined: 10/01/2006

You really can't make a fair comparison until you compare it to DPI 15.61. For example, DPI 15 is less than half the size of Ver. 12 and notably more accurate.



-------------------------

Change "No" to "Know" w/KnowBrainer 2022
Trial Downloads
Dragon/Sales@KnowBrainer.com 
(615) 884-4558 ex 1

 01/31/2023 03:14 AM
Stephan Kuepper
Top-Tier Member

Posts: 2460
Joined: 10/04/2006

Thank you rjwilmsi for your fair and unbiased evaluation of Whisper.

I have tested a few speech recognition engines that have been made available over the last year. Although for many of them recognition quality has been impressive, I have yet to find one that a) does Command and Control and b) allows for easy editing of the vocabulary.

As long as these features are absent, Dragon is still the main player as far as dictation is concerned. IMHO other engines are better suited to contact center monitoring, live captioning, and other applications where Command and Control doesn't matter, and where prime accuracy isn't the first consideration. I also have a hunch that dictation doesn't monetize as well as other applications, or someone would have created an alternative to Dragon long ago. However, even the cloud-based transcription services that I've tested either integrate Dragon, or are a pain to use.

Just my 2 eurocents, Stephan

-------------------------

www.egs-vertrieb.de - www.spracherkennungscloud.de

 01/31/2023 10:13 AM
george_pat
New Member

Posts: 3
Joined: 10/22/2021

rjwilmsi - where in Whisper can I find the "prompt" option where you can pre-feed it words likely to be in the audio or how do I add a custom list of words, please? I'm interested in doing more research on this.
 02/08/2023 07:09 AM
Matt_Chambers
Top-Tier Member

Posts: 839
Joined: 08/09/2018

 02/11/2023 07:03 PM
ax
Top-Tier Member

Posts: 792
Joined: 03/22/2012

^^^ Nice write-up. Interesting to read. Thanks for sharing.

I regret having bought my "thin-and-light" ThinkPad Carbon in late 2021. I should've exercised better sense to acquire a proper "workstation replacement" laptop with dedicated GPU ...

Not sure if anyone is going to "package up" this Whisper gig to make it conducive to install/run on Windows for "everyday enthusiasts" (whatever that means ... ChatGPT would know, which I have yet to sign up).

 02/12/2023 02:18 PM
drrunev
Junior Member

Posts: 40
Joined: 01/23/2015

I have been using Dragon (version 15) for several years. It is difficult to compare Dragon and Whisper because Dragon is a full product.
As a product, Dragon is superior to all other speech recognition systems. But when it comes to the speech recognition engine of Whisper, I was blown away.
I am using it on my mobile and it is just incredibly accurate, close to unbelievable compared to Google voice typing or other speech solutions on mobile.
I use it all the time to make notes when I get an idea, dictating outside when hiking, and I find it extremely useful. You always get full sentences, so I suspect it might be using a Transformer architecture like GPT-3, predicting what you are going to say informed by your speech. Sometimes there are sentences at the end which I did not say.
Since 2020 I have been thinking that Nuance should integrate large language model technology into their product. Given what you have written so far, it should predict the upcoming text (the most probable next text token), highly informed by the audio (what you are actually saying).
I think this would boost Dragon to a new level. Since Microsoft bought Nuance and Microsoft has a tight collaboration with OpenAI, they should do this.



-------------------------

Rune Vabø

 04/14/2023 04:26 AM
Stephan Kuepper
Top-Tier Member

Posts: 2460
Joined: 10/04/2006

I was alerted to this app:

 

https://apps.apple.com/us/app/whisperboard/id1661442906

 

Whisperboard can do offline recognition on your iPhone / iPad (iOS 16 needed). The quality is astonishing. Unfortunately, the app is still quite in beta: the medium and large models make the app crash every time, at least on my phone (admittedly not the latest model). I'd love to see what the larger models can do.



-------------------------

www.egs-vertrieb.de - www.spracherkennungscloud.de

 04/14/2023 07:56 AM
Matt_Chambers
Top-Tier Member

Posts: 839
Joined: 08/09/2018

Originally posted by: Stephan Kuepper

I was alerted to this app:

https://apps.apple.com/us/app/whisperboard/id1661442906

Whisperboard can do offline recognition on your iPhone / iPad (iOS 16 needed). The quality is astonishing. Unfortunately, the app is still quite in beta: the medium and large models make the app crash every time, at least on my phone (admittedly not the latest model). I'd love to see what the larger models can do.


Thanks for posting about this. The app seems to run without crashing on my not very new iPhone 11. Accuracy, however, is somewhat disappointing. Also, how do you do punctuation?

 04/14/2023 08:41 AM
Stephan Kuepper
Top-Tier Member

Posts: 2460
Joined: 10/04/2006

Try a different language model for better accuracy. The default model seems to be "tiny"; "small" already gives much better results.

Punctuation is inserted automatically. Things like "open paren", "colon" etc. seem to be an issue, though, as is "new paragraph". But maybe I'm just spoiled by Dragon.

Obviously, the target is not dictation, but transcription of recordings. They are doing amazingly well on subtitles and interview transcriptions, from what I can find on YouTube.
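
Because Whisper punctuates on its own and does not treat phrases like "open paren" or "new paragraph" as commands, one possible workaround is to post-process the transcript text. The following is only a sketch: the phrase table and the function name are assumptions made for illustration, and it cannot tell a dictated "colon" from the same word used in an ordinary sentence.

```python
import re

# Spoken phrase -> replacement text. Illustrative table, far from complete.
SPOKEN_PUNCTUATION = {
    "new paragraph": "\n\n",
    "open paren": "(",
    "close paren": ")",
    "colon": ":",
    "comma": ",",
}


def apply_spoken_punctuation(text: str) -> str:
    """Replace spoken punctuation phrases in a transcript with symbols."""
    # Try longer phrases first so multi-word phrases are matched whole.
    phrases = sorted(SPOKEN_PUNCTUATION, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    text = pattern.sub(lambda m: SPOKEN_PUNCTUATION[m.group(1).lower()], text)

    # Tidy the spacing left behind by the replacements.
    text = re.sub(r"[ \t]+([,:)])", r"\1", text)      # no space before , : )
    text = re.sub(r"\(\s+", "(", text)                # no space after (
    text = re.sub(r"[ \t]*\n\n[ \t]*", "\n\n", text)  # trim around paragraph breaks
    return text


sample = "see the note open paren page two close paren colon it works new paragraph next topic"
print(apply_spoken_punctuation(sample))
```

A real dictation front end would also have to deal with the punctuation and capitalization that Whisper inserts by itself around such phrases.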

-------------------------

www.egs-vertrieb.de - www.spracherkennungscloud.de

 04/14/2023 10:29 AM
Matt_Chambers
Top-Tier Member

Posts: 839
Joined: 08/09/2018

Yes, I realized this after I posted, and have now downloaded the medium model.  Will post about results when I have experimented some more.

 04/14/2023 06:07 PM
Matt_Chambers
Top-Tier Member

Posts: 839
Joined: 08/09/2018

I have now tried WhisperBoard with the medium speech model. The accuracy is greatly improved. Punctuation is still an issue, of course, since, as you say, this app is really intended to transcribe recordings.
 04/14/2023 09:21 PM
MDH
Top-Tier Member

Posts: 2339
Joined: 04/02/2008

I tried it this morning on my iPhone XR, iOS version 16.3.1, using the "small" version, as opposed to tiny or base. (First I tried the "tiny" version, which was awful.) The "small" version was only minimally better for me than the "tiny" version, so I deleted the app.

 

MDH



-------------------------
 04/14/2023 08:35 PM
Lunis Orcutt
Top-Tier Member

Posts: 40984
Joined: 10/01/2006

Just for fun, I tried WhisperBoard, on my iPhone, with the large speech model. It took 44 seconds to transcribe 100 words. I dictated the commas but let WhisperBoard add the periods. I ran 2 rapid-fire dictation tests, and you probably know that most of my sentences are exceedingly long. I should point out that Dragon wouldn't have a prayer with autopunctuation, but WhisperBoard scored 100% accuracy with perfect punctuation, and it's FREE. Of course you cannot edit the vocabulary, so "KnowBrainer" is always going to be "no-brainer", but I was surprised how well it handled abbreviations like VLA. I couldn't get Dragon to recognize VLA until I removed BLA and some other abbreviation from my personal vocabulary. WhisperBoard even capitalized Dragon and Nuance. I couldn't make that work in Dragon until I removed the lowercase versions. Not bad for free, and it works on my iPad.
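
Whisper has no vocabulary editor, but a consistent mis-recognition such as "no-brainer" for "KnowBrainer" can be patched after the fact when the transcript is available as plain text. A minimal sketch, with a hypothetical correction table and function name; a blanket replacement like this will also rewrite legitimate uses of "no-brainer".

```python
from __future__ import annotations

import re

# Mis-recognized form -> intended form. Hypothetical entries for illustration.
CORRECTIONS = {
    "no-brainer": "KnowBrainer",
    "no brainer": "KnowBrainer",
}


def apply_corrections(text: str, corrections: dict[str, str] = CORRECTIONS) -> str:
    """Replace known mis-recognitions with the intended spelling."""
    for wrong, right in corrections.items():
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement keeps `right` literal (no backslash escapes).
        text = pattern.sub(lambda _match, fixed=right: fixed, text)
    return text


print(apply_corrections("I use No-brainer with Dragon."))
```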



-------------------------

Change "No" to "Know" w/KnowBrainer 2022
Trial Downloads
Dragon/Sales@KnowBrainer.com 
(615) 884-4558 ex 1



 04/18/2023 08:47 PM
WriterGuy
Power Member

Posts: 53
Joined: 10/27/2010

I just transcribed a 35-minute interview featuring 3 voices (mine and two interviewees, one of whom had a German accent) using the Small English-specific version of the model in WhisperBoard on my iPhone SE. It was nearly flawless. I wouldn't use this to dictate the articles I write, but it could be a real game-changer in terms of transcription. I've tried some of the online automated transcription services, but none were accurate enough for my work. WhisperBoard shocked me. I pay a fair amount of money to have real human beings transcribe my interviews for me, and turnaround times can be a problem; when I need something fast (and cheap), I do it myself. WhisperBoard did this much faster than I could have, and nearly as well.

-------------------------

DPI 16/Windows 11/Parallels Desktop 18 on a MacBook Air M2 (24GB RAM) 

 04/19/2023 04:40 AM
Stephan Kuepper
Top-Tier Member

Posts: 2460
Joined: 10/04/2006

Yes, as far as I can see transcription is the real strength of Whisper as opposed to dictation. Call me spoiled but a dictation solution without the option to add words on the fly is useless at worst, and a stopgap at best (think Siri for dictation of text messages).

The thing with Whisper is that you have to take the language models as they are, with no mechanisms to tweak them - which is probably just as well, considering the amount of time and text that went into them. As a developer, I wouldn't want any old user to mess around with my model, either.

-------------------------

www.egs-vertrieb.de - www.spracherkennungscloud.de

 04/19/2023 08:49 AM
WriterGuy
Power Member

Posts: 53
Joined: 10/27/2010

Yeah, I can't really imagine using Whisper for dictation without the ability to edit. But like Rune, I wish Nuance would integrate this kind of model into Dragon; I bet it would supercharge dictation accuracy. And thank you for letting us all know about WhisperBoard, Stephan! I intend to use it for transcription purposes whenever possible.

-------------------------

DPI 16/Windows 11/Parallels Desktop 18 on a MacBook Air M2 (24GB RAM) 

 04/19/2023 09:20 AM
Stephan Kuepper
Top-Tier Member

Posts: 2460
Joined: 10/04/2006

Absolutely. Punctuation should be the first thing to go. Dragon can do it when transcribing interviews, so why not during dictation?



-------------------------

www.egs-vertrieb.de - www.spracherkennungscloud.de



 05/18/2023 02:44 PM
Matt_Chambers
Top-Tier Member

Posts: 839
Joined: 08/09/2018

 05/20/2023 01:05 PM
Elvis143BRA
New Member

Posts: 6
Joined: 09/20/2021

Whisper is pretty good, but on a PC there isn't a UI that works like Dragon, and you have to know your way around Python and CMD to try some of the repos on GitHub. It's not getting a lot of attention, so it isn't a substitute for Dragon yet. I wish it were, because it's so much better at speech recognition out of the box.
 05/23/2023 08:34 AM
WriterGuy
Power Member

Posts: 53
Joined: 10/27/2010

I'm using WhisperBoard, an app for my iPhone, that has a nice interface. Transcriptions using even the small model are as accurate as most human ones (I do a lot of interviews with scientists, and the app is shockingly good at transcribing technical language, presumably because it was trained on such a large corpus) and much, much faster (usually less than a minute of processing time for every minute of audio). My use of human transcription has fallen precipitously.
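
"Less than a minute of processing time for every minute of audio" corresponds to what is usually called a real-time factor (processing time divided by audio duration) below 1. A small sketch of the arithmetic; the function name and the sample numbers are hypothetical, not measurements from this thread.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration.

    Values below 1.0 mean transcription finishes faster than playback.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical example: a 35-minute recording processed in 28 minutes.
rtf = real_time_factor(28 * 60, 35 * 60)
print(f"RTF = {rtf:.2f}")  # prints "RTF = 0.80"
```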

-------------------------

DPI 16/Windows 11/Parallels Desktop 18 on a MacBook Air M2 (24GB RAM) 

 12/10/2023 11:05 AM
michaelbeijer
Top-Tier Member

Posts: 280
Joined: 12/07/2014

Just came across a GUI, but it is only for audio/video files, with no on-the-fly dictation: https://grisk.itch.io/whisper-gui

-------------------------

Dragon Professional 16 + Speech Productivity + KnowBrainer
Win 11 – 64-bit, i9, 64GB RAM
Logitech webcam mic 


 


 

 12/10/2023 11:11 AM
michaelbeijer
Top-Tier Member

Posts: 280
Joined: 12/07/2014

It would be amazing if someone would take Whisper and wrap a GUI around it that would allow on-the-fly dictation as well as commands. For as long as I can remember I have been fighting with Dragon, trying to get it to work in my main translation program (memoQ), but everything always ends up slowing down and getting really annoying, until I usually kill Dragon and just get on with my work without dictation. Every time a new version comes out I pay the money and get it, and it's always the same story: everything works amazingly well for about 10 to 15 minutes (extremely good quality dictation, all kinds of cool commands work really well, I even have Select-and-Say functionality in memoQ now) ... and then my computer turns to molasses, and I close KnowBrainer and Dragon in frustration once again.

I would pay good money for a dictation and command solution that ACTUALLY WORKED in both memoQ and on my computer in general.

-------------------------

Dragon Professional 16 + Speech Productivity + KnowBrainer
Win 11 – 64-bit, i9, 64GB RAM
Logitech webcam mic 


 


 
