KnowBrainer Speech Recognition
Topic Title: Whisper vs Dragon
Created On: 11/24/2022 03:21 AM
Status: Post and Reply
|
- marc_vie | - 11/24/2022 03:21 AM |
- Ag | - 11/25/2022 02:46 PM |
- Stephan Kuepper | - 11/28/2022 02:57 AM |
- rjwilmsi | - 01/30/2023 02:06 PM |
- Lunis Orcutt | - 01/30/2023 03:14 PM |
- Stephan Kuepper | - 01/31/2023 03:14 AM |
- george_pat | - 01/31/2023 10:13 AM |
- Matt_Chambers | - 02/08/2023 07:09 AM |
- ax | - 02/11/2023 07:03 PM |
- drrunev | - 02/12/2023 02:18 PM |
|
Has anyone here had a chance to look at or trial this automatic speech recognition (ASR) system? It seems to hold much promise, but I wonder how well it will stack up against Dragon in terms of accuracy and, of course, price. https://openai.com/blog/whisper/
|
|
|
|
Isn't Whisper open source?
------------------------- DPG15.6 (also DPI 15.3) + KB, Sennheiser MB Pro 1 UC ML, BTD 800 dongle, Windows 10 Pro, MS Surface Book 3, Intel Core i7-1065G7 CPU @ 1.3/1.5GHz (4 cores, 8 logical), GPU=NVIDIA Quadro RTX 3000 with Max-Q Design.
|
|
|
|
Above all, it is an API, not a product.
------------------------- |
|
|
|
|
I've been playing with Whisper more or less since it was first released on GitHub: https://github.com/openai/whisper/. It's freely available, and after installation and download of the models it runs entirely offline. The installation is easy if you're a Linux user used to fiddling with Python utilities from GitHub, but probably a bit challenging if you are new to that sort of setup. I haven't tested the Windows installation of Whisper, though; maybe that is packaged up and so easier. I use it in conjunction with a "whisper_mic" utility to get live dictation: https://github.com/mallorbc/whisper_mic
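For readers who have not tried it, here is a minimal sketch of the offline workflow described above. It assumes the `openai-whisper` package has been installed with pip and that ffmpeg is on the path. `whisper.load_model` and `model.transcribe` are the package's own entry points; the helper names and the default model size are only illustrative.

```python
import importlib.util


def whisper_available() -> bool:
    """Return True if a module named 'whisper' can be imported."""
    return importlib.util.find_spec("whisper") is not None


def transcribe_file(path: str, model_name: str = "base") -> str:
    """Transcribe one audio file locally and return the recognized text."""
    # Imported lazily so this file still loads on a machine without Whisper.
    import whisper

    # The first call downloads the model weights; later calls run offline.
    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    return result["text"].strip()
```

Calling `transcribe_file("note.wav")` (a made-up file name) would return the transcript as a string. Larger models such as "small" or "medium" trade speed for accuracy.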
|
|
|
|
You really can't make a fair comparison until you compare it to DPI 15.61. For example, DPI 15 is less than half the size of Ver. 12 and notably more accurate.
------------------------- Change "No" to "Know" w/KnowBrainer 2022
|
|
|
|
Thank you, rjwilmsi, for your fair and unbiased evaluation of Whisper.
I have tested a few speech recognition engines that have been made available over the last year. Although recognition quality has been impressive for many of them, I have yet to find one that a) does Command and Control and b) allows for easy editing of the vocabulary. As long as these features are absent, Dragon is still the main player as far as dictation is concerned. IMHO other engines are better suited to contact center monitoring, live captioning, and other applications where Command and Control doesn't matter and where prime accuracy isn't the first consideration. I also have a hunch that dictation doesn't monetize as well as other applications, or someone would have created an alternative to Dragon long ago. However, even the cloud-based transcription services that I've tested either integrate Dragon or are a pain to use.
Just my 2 eurocents, Stephan
-------------------------
|
|
|
|
rjwilmsi - where in Whisper can I find the "prompt" option where you can pre-feed it words likely to be in the audio or how do I add a custom list of words, please? I'm interested in doing more research on this.
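On the "prompt" question: in the openai/whisper Python package the relevant option appears to be the `initial_prompt` argument of `transcribe` (exposed as `--initial_prompt` on the command line). It is free text that conditions the decoder rather than a true custom vocabulary, so the following is only a sketch. The "Glossary:" prefix, the helper names, and the default model size are assumptions made for illustration.

```python
def build_prompt(words):
    """Join likely terms into one prompt string, dropping blanks and duplicates."""
    seen = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.append(word)
    # The "Glossary:" prefix is an arbitrary choice, not something Whisper requires.
    return "Glossary: " + ", ".join(seen)


def transcribe_with_hints(path, words, model_name="base"):
    """Transcribe a file while nudging Whisper toward the given terms."""
    # Imported lazily so build_prompt can be used without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path, initial_prompt=build_prompt(words))
    return result["text"].strip()
```

The prompt only biases the output toward those spellings; it does not guarantee them the way a Dragon vocabulary entry would.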
|
|
|
|
|
Interesting article on Whisper in the New Yorker: https://www.newyorker.com/tech/annals-of-technology/whispers-of-ais-modular-future |
|
|
|
|
^^^ Nice write-up. Interesting to read. Thanks for sharing.
|
|
|
|
|
I have been using Dragon (version 15) for several years. It is difficult to compare Dragon and Whisper because Dragon is a full product.
Dragon is superior to all other speech recognition systems. But when it comes to the speech recognition engine of Whisper, I was blown away. I am using it on my mobile and it is just incredibly accurate, close to unbelievable compared to Google voice typing or other speech solutions on mobile. I use it all the time to make notes when I get an idea, dictating outside when hiking, and I have found it extremely useful. You always get full sentences, so I suspect it might be using a Transformer architecture like GPT-3, predicting what you are going to say informed by your speech. Sometimes there are sentences at the end which I did not say. Since 2020 I have been thinking that Nuance should integrate large language model technology into their product. Given what you have written so far, it should predict the upcoming text (the most probable next text token), highly informed by the audio (what you are actually saying). I think this would boost Dragon to a new level. Since Microsoft bought Nuance and Microsoft has a tight collaboration with OpenAI, they should do this.
------------------------- Rune Vabø
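The idea in this post, choosing the next word from both the audio evidence and the text written so far, can be illustrated with a toy rescoring step. This is not how Dragon or Whisper is actually implemented; the candidate words, the probabilities, and the `lm_weight` parameter are all invented for illustration.

```python
import math


def pick_next_word(acoustic_probs, lm_probs, lm_weight=0.5):
    """Pick the candidate maximizing log P_audio + lm_weight * log P_lm."""
    def score(word):
        # Words the language model has no estimate for get a small floor probability.
        lm_p = lm_probs.get(word, 1e-6)
        return math.log(acoustic_probs[word]) + lm_weight * math.log(lm_p)

    return max(acoustic_probs, key=score)


# Made-up numbers: the audio alone slightly favors "their",
# but after the context "over ..." the language model strongly favors "there".
acoustic = {"there": 0.48, "their": 0.52}
language_model = {"there": 0.9, "their": 0.1}

with_context = pick_next_word(acoustic, language_model)
audio_only = pick_next_word(acoustic, language_model, lm_weight=0.0)
```

With the language model switched off (`lm_weight=0.0`) the audio score decides on its own; with it switched on, the context can overrule a near-tie in the audio.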
|
|