KnowBrainer Speech Recognition
Topic Title: Whisper vs Dragon
Created On: 11/24/2022 03:21 AM
Status: Post and Reply

- marc_vie | 11/24/2022 03:21 AM
- Ag | 11/25/2022 02:46 PM
- Stephan Kuepper | 11/28/2022 02:57 AM
- rjwilmsi | 01/30/2023 02:06 PM
- Lunis Orcutt | 01/30/2023 03:14 PM
- Stephan Kuepper | 01/31/2023 03:14 AM
- george_pat | 01/31/2023 10:13 AM
- Matt_Chambers | 02/08/2023 07:09 AM
- ax | 02/11/2023 07:03 PM
- drrunev | 02/12/2023 02:18 PM
- Stephan Kuepper | 04/14/2023 04:26 AM
- Matt_Chambers | 04/14/2023 07:56 AM
- Stephan Kuepper | 04/14/2023 08:41 AM
- Matt_Chambers | 04/14/2023 10:29 AM
- Matt_Chambers | 04/14/2023 06:07 PM
- MDH | 04/14/2023 09:21 PM
- Lunis Orcutt | 04/14/2023 08:35 PM
- WriterGuy | 04/18/2023 08:47 PM
- Stephan Kuepper | 04/19/2023 04:40 AM
- WriterGuy | 04/19/2023 08:49 AM
- Stephan Kuepper | 04/19/2023 09:20 AM
- Matt_Chambers | 05/18/2023 02:44 PM
- Elvis143BRA | 05/20/2023 01:05 PM
- WriterGuy | 05/23/2023 08:34 AM
- michaelbeijer | 12/10/2023 11:05 AM
- michaelbeijer | 12/10/2023 11:11 AM
|
Has anyone here had a chance to look at or trial this automatic speech recognition (ASR) system? Seems to hold much promise but I wonder how well it will stack up against Dragon in terms of accuracy and of course price. https://openai.com/blog/whisper/ |
|
|
|
|
Isn't Whisper open source?
------------------------- DPG15.6 (also DPI 15.3) + KB, Sennheiser MB Pro 1 UC ML, BTD 800 dongle, Windows 10 Pro, MS Surface Book 3, Intel Core i7-1065G7 CPU @ 1.3/1.5GHz (4 cores, 8 logical), GPU=NVIDIA Quadro RTX 3000 with Max-Q Design
|
|
|
|
Above all, it is an API, not a product.
------------------------- |
|
|
|
|
I've been playing with Whisper more or less since it was first released on GitHub: https://github.com/openai/whisper/. It's freely available and, after installation and download of the models, it runs entirely offline. The installation is easy if you're a Linux user used to fiddling with Python utilities from GitHub, but probably a bit challenging if you are new to that sort of setup. I haven't tested the Windows installation of Whisper, though; maybe that is packaged up and so easier. I use it in conjunction with a "whisper_mic" utility to get live dictation: https://github.com/mallorbc/whisper_mic
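For readers who want to try the offline workflow described in this post, here is a minimal sketch of driving the `whisper` command-line tool from Python. It assumes the package has been installed with `pip install -U openai-whisper` and that `ffmpeg` is on the path. The helper names `whisper_command` and `transcribe_file` and the file name `interview.wav` are illustrative assumptions, not something from the post.

```python
import shutil
import subprocess


def whisper_command(audio_path, model="small", language=None, output_dir="."):
    """Build the argument list for the `whisper` CLI (hypothetical helper)."""
    cmd = ["whisper", audio_path, "--model", model, "--output_dir", output_dir]
    if language:
        # Skipping --language lets Whisper detect the language itself.
        cmd += ["--language", language]
    return cmd


def transcribe_file(audio_path, **options):
    """Run the CLI. The model is downloaded on first use; later runs work offline."""
    if shutil.which("whisper") is None:
        raise RuntimeError("whisper CLI not found; try `pip install -U openai-whisper`")
    subprocess.run(whisper_command(audio_path, **options), check=True)


if __name__ == "__main__":
    # Only prints the command line; call transcribe_file("interview.wav") to run it.
    print(" ".join(whisper_command("interview.wav", model="small", language="en")))
```

This only covers transcription of an existing audio file. Live dictation, as with whisper_mic, needs microphone capture on top of it.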
|
|
|
|
You really can't make a fair comparison until you compare it to DPI 15.61. For example, DPI 15 is less than half the size of Ver. 12 and notably more accurate.
------------------------- Change "No" to "Know" w/KnowBrainer 2022
|
|
|
|
Thank you, rjwilmsi, for your fair and unbiased evaluation of Whisper.
I have tested a few speech recognition engines that have been made available over the last year. Although recognition quality has been impressive for many of them, I have yet to find one that a) does Command and Control and b) allows for easy editing of the vocabulary. As long as these features are absent, Dragon is still the main player as far as dictation is concerned. IMHO other engines are better suited to contact center monitoring, live captioning, and other applications where Command and Control doesn't matter, and where prime accuracy isn't the first consideration.
I also have a hunch that dictation doesn't monetize as well as other applications, or someone would have created an alternative to Dragon long ago. However, even the cloud-based transcription services that I've tested either integrate Dragon or are a pain to use.
Just my 2 eurocents,
Stephan
-------------------------
|
|
|
|
rjwilmsi - where in Whisper can I find the "prompt" option where you can pre-feed it words likely to be in the audio or how do I add a custom list of words, please? I'm interested in doing more research on this.
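On the "prompt" question: in the open-source Python package, `transcribe()` accepts an `initial_prompt` string, and the command-line equivalent is `--initial_prompt`. It conditions the decoder on the text you supply for the first window of audio. That makes it a hint, not an editable vocabulary, and it does not guarantee that the words will be recognized. A minimal sketch, assuming `openai-whisper` is installed; `build_prompt`, `transcribe_with_vocabulary` and the example word list are hypothetical:

```python
def build_prompt(words):
    """Join custom terms into one prompt string (hypothetical helper)."""
    terms = [w.strip() for w in words if w.strip()]
    if not terms:
        return ""
    return "Glossary: " + ", ".join(terms) + "."


def transcribe_with_vocabulary(audio_path, words, model_name="small"):
    """Transcribe a file, hinting Whisper with terms likely to occur in the audio."""
    import whisper  # third-party; imported lazily so build_prompt works without it

    model = whisper.load_model(model_name)
    prompt = build_prompt(words)
    # An empty prompt is passed as None, which means no hint at all.
    result = model.transcribe(audio_path, initial_prompt=prompt or None)
    return result["text"]


if __name__ == "__main__":
    print(build_prompt(["KnowBrainer", "Nuance", "memoQ"]))
```

The "Glossary:" wording is just one way to phrase the prompt. Any text containing the terms in the spelling you want can serve the same purpose.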
|
|
|
|
|
Interesting article on Whisper in the New Yorker: https://www.newyorker.com/tech/annals-of-technology/whispers-of-ais-modular-future |
|
|
|
|
^^^ Nice write-up. Interesting to read. Thanks for sharing.
|
|
|
|
|
I have been using Dragon for several years (version 15). It is difficult to compare Dragon and Whisper because Dragon is a full product.
Dragon is superior to all other speech recognition systems. But when it comes to the speech recognition engine of Whisper, I was blown away. I am using it on my mobile and it is just incredibly accurate, close to unbelievable compared to Google voice typing or other speech solutions on mobile. I use it all the time to make notes when I get an idea, dictating outside when hiking. I have found it extremely useful.
You always get full sentences, so I suspect it might be using a Transformer architecture like GPT-3, predicting what you are going to say informed by your speech. Sometimes there are sentences at the end which I did not say.
Since 2020 I have been thinking that Nuance should integrate large language model technology into their product. Given what you have written so far, it should predict the upcoming text (the next most probable text token), highly informed by the audio (what you are actually saying). I think this would boost Dragon to a new level. Since Microsoft bought Nuance and Microsoft has a tight collaboration with OpenAI, they should do this.
------------------------- Rune Vabø
|
|
|
|
I was alerted to this app:
https://apps.apple.com/us/app/whisperboard/id1661442906
Whisperboard can do offline recognition on your iPhone / iPad (iOS 16 needed). The quality is astonishing. Unfortunately, the app is still quite in beta: the medium and large models make the app crash every time, at least on my phone (admittedly not the latest model). I'd love to see what the larger models can do. ------------------------- |
|
|
|
|
https://apps.apple.com/us/app/whisperboard/id1661442906
Whisperboard can do offline recognition on your iPhone / iPad (iOS 16 needed). The quality is astonishing. Unfortunately, the app is still quite in beta: the medium and large models make the app crash every time, at least on my phone (admittedly not the latest model). I'd love to see what the larger models can do.
Thanks for posting about this. The app seems to run without crashing on my not very new iPhone 11. Accuracy, however, is somewhat disappointing. Also, how do you do punctuation?
|
|
|
|
Try a different language model for better accuracy. The default model seems to be "tiny"; "small" already gives much better results.
Punctuation is inserted automatically. Things like "open paren", "colon" etc. seem to be an issue, though, as is "new paragraph". But maybe I'm just spoiled by Dragon. Obviously, the target is not dictation, but transcription of recordings. They are doing amazingly well on subtitles and interview transcriptions, from what I can find on YouTube.
-------------------------
|
|
|
|
Yes, I realized this after I posted, and have now downloaded the medium model. Will post about results when I have experimented some more. |
|
|
|
|
I have now tried WhisperBoard with the medium speech model. The accuracy is greatly improved. Punctuation is still an issue, of course, since, as you say, this app is really intended to transcribe recordings.
|
|
|
|
|
I tried it this morning on my iPhone XR (iOS 16.3.1), using the "small" version, as opposed to tiny or base. (First I tried the "tiny" version, which was awful.) The "small" version was only minimally better for me than the "tiny" version, so I deleted the app.
MDH
-------------------------
|
|
|
|
Just for fun, I tried WhisperBoard on my iPhone with the large speech model. It took 44 seconds to transcribe 100 words. I dictated the commas but let WhisperBoard add the periods. I ran 2 rapid-fire dictation tests, and you probably know that most of my sentences are exceedingly long. I should point out that Dragon wouldn't have a prayer with autopunctuation, but WhisperBoard scored 100% accuracy with perfect punctuation, and it's FREE. Of course, you cannot edit the vocabulary, so "KnowBrainer" is always going to be "no-brainer", but I was surprised how well it handled abbreviations like VLA. I couldn't get Dragon to recognize VLA until I removed BLA and some other abbreviation from my personal vocabulary. WhisperBoard even capitalized Dragon and Nuance. I couldn't make that work in Dragon until I removed the lowercase versions. Not bad for free, and it works on my iPad.
------------------------- Change "No" to "Know" w/KnowBrainer 2022
|
|
|
|
I just transcribed a 35-minute interview featuring 3 voices (mine and two interviewees, one of whom had a German accent) using the Small English-specific version of the model in WhisperBoard on my iPhone SE. It was nearly flawless. I wouldn't use this to dictate the articles I write, but it could be a real game-changer in terms of transcription. I've tried some of the online automated transcription services, but none were accurate enough for my work. WhisperBoard shocked me. I pay a fair amount of money to have real human beings transcribe my interviews for me, and turnaround times can be a problem; when I need something fast (and cheap), I do it myself. WhisperBoard did this much faster than I could have, and nearly as well.
------------------------- DPI 16/Windows 11/Parallels Desktop 18 on a MacBook Air M2 (24GB RAM) |
|
|
|
|
Yes, as far as I can see transcription is the real strength of Whisper as opposed to dictation. Call me spoiled but a dictation solution without the option to add words on the fly is useless at worst, and a stopgap at best (think Siri for dictation of text messages).
The thing with Whisper is that you have to take the language models as they are, with no mechanisms to tweak them - which is probably just as well, considering the amount of time and text that went into them. As a developer, I wouldn't want any old user to mess around with my model, either. ------------------------- |
|
|
|
|
Yeah, I can't really imagine using Whisper for dictation without the ability to edit. But like Rune, I wish Nuance would integrate this kind of model into Dragon; I bet it would supercharge dictation accuracy. And thank you for letting us all know about WhisperBoard, Stephan! I intend to use it for transcription purposes whenever possible.
------------------------- DPI 16/Windows 11/Parallels Desktop 18 on a MacBook Air M2 (24GB RAM) |
|
|
|
|
Absolutely. Punctuation should be the first thing to go. Dragon can do it when transcribing interviews, so why not during dictation? ------------------------- www.egs-vertrieb.de - www.spracherkennungscloud.de |
|
|
|
|
Whisper is pretty good, but on a PC there isn't a UI that works like Dragon, and you have to know your way around Python and CMD to try some of the repos on GitHub. It's not getting a lot of attention, so it isn't a substitute for Dragon yet. I wish it were, because it's so much better at speech recognition out of the box.
|
|
|
|
|
I'm using WhisperBoard, an app for my iPhone, that has a nice interface. Transcriptions using even the small model are as accurate as most human ones (I do a lot of interviews with scientists, and the app is shockingly good at transcribing technical language, presumably because it was trained on such a large corpus) and much, much faster (usually less than a minute of processing time for every minute of audio). My use of human transcription has fallen precipitously.
------------------------- DPI 16/Windows 11/Parallels Desktop 18 on a MacBook Air M2 (24GB RAM) |
|
|
|
|
Just came across a GUI, but it is only for audio/video files, with no on-the-fly dictation: https://grisk.itch.io/whisper-gui
------------------------- Dragon Professional 16 + Speech Productivity + KnowBrainer
|
|
|
|
|
It would be amazing if someone would take Whisper and wrap a GUI around it that would allow on-the-fly dictation as well as commands. For as long as I can remember I have been fighting with Dragon, trying to get it to work in my main translation program (memoQ), but everything always ends up slowing down and getting really annoying, until I usually kill Dragon and just get on with my work without dictation. Every time a new version comes out I pay the money and get it, and it's always the same story: everything works amazingly well for about 10 to 15 minutes (extremely good quality dictation, all kinds of cool commands work really well, I even have Select-and-Say functionality in memoQ now) ... and then my computer turns to molasses, and I close KnowBrainer and Dragon in frustration once again.
I would pay good money for a dictation and command solution that ACTUALLY WORKED in both memoQ and on my computer in general. ------------------------- Dragon Professional 16 + Speech Productivity + KnowBrainer
|
|
|