KnowBrainer Speech Recognition
Topic Title: VoiceMacro
Topic Summary: Let's dig
Created On: 07/31/2021 04:39 PM
Status: Post and Reply
- ax, 07/31/2021 04:39 PM
- ax, 07/31/2021 04:56 PM
- ax, 07/31/2021 05:27 PM
- ax, 07/31/2021 05:44 PM
- ax, 07/31/2021 05:59 PM
- ax, 07/31/2021 06:21 PM
- ax, 07/31/2021 06:40 PM
- R. Wilke, 08/01/2021 06:27 AM
- ax, 08/01/2021 03:09 PM
- kkkwj, 08/01/2021 04:43 PM
- ax, 08/01/2021 05:16 PM
- ax, 12/30/2021 04:21 PM
- Ag, 01/13/2022 05:56 PM
- ax, 01/15/2022 10:25 PM
- michaelbeijer, 01/17/2022 09:34 AM
- ax, 01/17/2022 04:24 PM
- michaelbeijer, 05/06/2022 01:18 PM
- ax, 05/06/2022 06:20 PM
- michaelbeijer, 06/02/2022 04:00 PM
- ax, 06/02/2022 04:59 PM
- michaelbeijer, 06/02/2022 05:51 PM
- ax, 06/02/2022 06:05 PM
- michaelbeijer, 06/02/2022 06:25 PM
- michaelbeijer, 06/02/2022 06:01 PM
- kkkwj, 06/13/2022 05:17 PM
- dilligence, 06/13/2022 08:36 PM
- ax, 09/15/2022 11:31 PM
- kkkwj, 09/16/2022 11:21 AM
- PG LTU, 09/27/2022 10:47 AM
First off, let's get "smooth scroll" out of the way.
Setting a variable "loop_large_scroll" is probably unnecessary, but doing so allows the scrolling distance to be changed on the fly. Even that is not so useful, since you can use a voice command "stop scrolling" to abort all running macros, thereby stopping the scroll.
One can add a small pause before each mouse scroll increment of +/- 1 (the smallest), making it even slower than in this video.
A scroll step of 5 is plenty fast for me. You can always up that to your heart's content.
Two caveats about variables:
1. They are case sensitive.
2. They are "space-sensitive" when you declare them! I.e., "variable" and "variable " are two different variables!! Oh well.
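To make the two caveats concrete, here is a tiny Python model that treats variable names as exact-string dictionary keys, which is what the caveats imply VM does. This is an illustration, not VM's code; the `normalized` helper is a hypothetical guard of my own, not a VoiceMacro feature.

```python
# Hypothetical model: variable names as exact-string keys (not VoiceMacro's code).
variables: dict = {}

variables["loop_large_scroll"] = 5
variables["loop_large_scroll "] = 10   # trailing space: a *different* variable
variables["Loop_Large_Scroll"] = 20    # different case: yet another variable


def normalized(name: str) -> str:
    """Strip stray leading/trailing spaces so 'x ' and 'x' collide on purpose."""
    return name.strip()


# Rebuild the store with normalized names; the first value seen wins.
clean: dict = {}
for raw_name, value in variables.items():
    clean.setdefault(normalized(raw_name), value)

print(len(variables))  # 3
print(sorted(clean))   # ['Loop_Large_Scroll', 'loop_large_scroll']
```

Note that stripping spaces fixes only the second caveat; case differences still produce separate variables.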
Otherwise, VM provides basic built-in control structures such as loop, if/else, and toggle. Toggling, and setting the "toggle state" from a separate macro, is quite useful.
P.S., I moved the scrolling macro below to a different profile and changed "Abort all running macros" to "Abort all macros from this profile" so as to preserve my listening-indicator OSD when I terminate scrolling. Alternatively, one can probably combine "IgnoreCommands" with "IgnoreExceptions" to use voice to stop scrolling.
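A rough Python analogue of the smooth-scroll macro logic: a loop that sends a fixed scroll increment, optionally pauses, and checks an abort flag on every pass. The `scroll` callback and the `threading.Event` standing in for the "stop scrolling" abort are assumptions for illustration; in VoiceMacro itself this is done with its loop action and the abort-macros action described above.

```python
import threading
import time
from typing import Callable, List, Optional


def smooth_scroll(
    scroll: Callable[[int], None],
    stop: threading.Event,
    step: int = 5,
    pause_s: float = 0.0,
    max_steps: Optional[int] = None,
) -> int:
    """Send `step` scroll increments until `stop` is set (or max_steps is hit).

    `scroll` stands in for the macro's mouse-wheel action and `stop` for the
    "stop scrolling" abort. Returns how many increments were sent.
    """
    sent = 0
    while not stop.is_set():
        if max_steps is not None and sent >= max_steps:
            break
        scroll(step)
        sent += 1
        if pause_s > 0:
            time.sleep(pause_s)  # a small pause per increment = slower scroll
    return sent


# Usage: a fake scroll action that "hears" stop scrolling after 3 increments.
calls: List[int] = []
stop_event = threading.Event()


def fake_scroll(amount: int) -> None:
    calls.append(amount)
    if len(calls) == 3:
        stop_event.set()


sent = smooth_scroll(fake_scroll, stop_event, step=-1, pause_s=0.01)
print(sent, calls)  # 3 [-1, -1, -1]
```

In a real setup the event would be set from another thread (the voice command handler) while the loop runs; the pause and the step size are the two knobs the post mentions for speed.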
Next up, basic Recognizer Settings:
See this VM author comment on why a 0% (or too low a) dictionary weight is undesirable (I have changed mine accordingly). Apparently this whole "weight" thing under WSR could be a wee bit "voodoo" ... thus requiring some trial and error.
I choose to uncheck the default 'Process "failed" recognition': I'd rather it do nothing than do the wrong thing.
Carrying on: the NATO alphabet speller, an annoyingly weak link in my otherwise fit-for-purpose cloud Dragon in its designated browser sessions.
Here VM shines.
Refer to the VM author's own explanation of this:
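At its core, a NATO speller is just a word-to-letter lookup over the recognized phrase. A minimal Python sketch, assuming the recognizer returns the spoken words as one space-separated string; the "cap" modifier word for capitals is a hypothetical convention of mine, not something taken from VM.

```python
NATO = {
    "alpha": "a", "bravo": "b", "charlie": "c", "delta": "d", "echo": "e",
    "foxtrot": "f", "golf": "g", "hotel": "h", "india": "i",
    "juliett": "j", "juliet": "j",  # accept both common spellings
    "kilo": "k", "lima": "l", "mike": "m", "november": "n", "oscar": "o",
    "papa": "p", "quebec": "q", "romeo": "r", "sierra": "s", "tango": "t",
    "uniform": "u", "victor": "v", "whiskey": "w", "x-ray": "x", "xray": "x",
    "yankee": "y", "zulu": "z",
}


def spell(utterance: str) -> str:
    """Turn e.g. 'cap hotel echo lima lima oscar' into 'Hello'.

    'cap' is a hypothetical modifier word that capitalizes the next letter.
    """
    letters = []
    capitalize_next = False
    for word in utterance.lower().split():
        if word == "cap":
            capitalize_next = True
            continue
        if word not in NATO:
            raise ValueError(f"not a NATO alphabet word: {word!r}")
        letter = NATO[word]
        letters.append(letter.upper() if capitalize_next else letter)
        capitalize_next = False
    return "".join(letters)


print(spell("cap hotel echo lima lima oscar"))  # Hello
```

Raising on an unknown word, rather than skipping it, matches the "do nothing rather than the wrong thing" preference stated earlier in the thread.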
Numeral Enunciation:
Prepending a word such as "numeral" increases recognition reliability, given that most single digits are one-syllable words.
There are no similarly compact macros for pressing F1 - F12, however, short of resorting to 12 If statements, which I don't see any advantage in. As far as I can figure out, one needs a separate macro for each F(unction) key press. It is doable, though, to combine, say, F1, Shift-F1, and Alt-F1 into one macro through string manipulation of RecCommand and one or two conditional statements.
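The string-manipulation route for combining F-key variants can be sketched in Python: pull the optional modifier and the F-key number out of the recognized text with one regular expression, then branch once on the modifier. This assumes the recognized command text looks like "Shift F5" or "F12"; how VM actually exposes RecCommand, and how the keystroke is then sent, are outside this sketch.

```python
import re
from typing import Optional, Tuple

# One optional modifier, then "F" and a number 1-12, e.g. "Shift F5", "f 12".
_FKEY = re.compile(r"^(?:(shift|alt|ctrl)\s+)?f\s*(1[0-2]|[1-9])$", re.IGNORECASE)


def parse_function_key(rec_command: str) -> Tuple[Optional[str], str]:
    """Split recognized text into (modifier or None, 'F<n>')."""
    match = _FKEY.match(rec_command.strip())
    if match is None:
        raise ValueError(f"not a function-key command: {rec_command!r}")
    modifier, number = match.groups()
    return (modifier.lower() if modifier else None), f"F{number}"


# The "one or two conditionals" step: branch on whether a modifier was spoken.
for spoken in ["F1", "Shift F1", "alt f 10"]:
    modifier, key = parse_function_key(spoken)
    if modifier is None:
        print(key)
    else:
        print(f"{modifier}+{key}")
```

This prints `F1`, `shift+F1`, and `alt+F10`; the `1[0-2]` alternative is tried first so that "F10" to "F12" are not cut short at "F1".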
VM Control:
On the odd occasion when the native CapsLock function gets triggered, pressing "Ctrl-CapsLock" will toggle it off.
The "RunOtherMacro" action just switches over to a different profile so I can run AHK programs with different, machine-dependent paths. This is purely for portability; otherwise it would be gratuitous.
My main work profile is named "Production". By relegating machine-dependent elements to a separate profile, I can keep my main "Production" profile in sync among the different machines I use by exporting and importing its associated XML file.
In the voice command "stop listening", one can use the thoughtfully implemented "Set Toggle State" action to reset any running toggles to an "off" state, so that the keyboard toggle doesn't go out of sync. Also, a voice prompt announcing whether the toggle is on or off quickly gets old. I find it much more useful to write AHK two-liners that pop an icon into the tray so I know whether VM is listening.
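Why resetting the toggle matters: a toggle only remembers its own last flip, so anything that stops listening behind its back leaves it one key press out of step. A small Python model of that; the class and method names are hypothetical, and only the idea of forcing a known state comes from VM's "Set Toggle State".

```python
class Toggle:
    """Hypothetical model of a VoiceMacro-style toggle (not VM's code)."""

    def __init__(self) -> None:
        self.on = False

    def flip(self) -> bool:
        """Keyboard hotkey: flip the remembered state and return it."""
        self.on = not self.on
        return self.on

    def set_state(self, on: bool) -> None:
        """Analogue of "Set Toggle State": force a known state."""
        self.on = on


# Without a reset: "stop listening" stops VM, but the toggle still says "on".
stale = Toggle()
stale.flip()                  # hotkey -> listening on
stale_after = stale.flip()    # next press flips to False: out of sync

# With a reset inside the "stop listening" command.
synced = Toggle()
synced.flip()                 # hotkey -> listening on
synced.set_state(False)       # "stop listening" resets the toggle too
synced_after = synced.flip()  # next press correctly turns listening back on

print(stale_after, synced_after)  # False True
```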
(Channelling the VM author: an alternative to a custom tray icon for indicating the listening state is to take advantage of the built-in OSD functionality, which I am now using instead; I have revised the screenshots accordingly.)
Of course, this improvised tray indicator icon could be avoided if VM's own icon changed to a bright colour when listening is activated. That could be a "feature request", I suppose.
Finally, see the VM author's own pro tip on controlling whether VM listens and heeds commands through "wake-up words".
P.S., VoiceMacro comes preloaded with a slew of demo macros in its "Demo" profile. A fast way to learn the ropes is simply to modify them. Some of them have "Only when target window active" checked; uncheck that when modifying and executing macros intended for any active window/process.
Now a few generic comments from the VM Author regarding SAPI and that sort of thing:
- SAPI compatibility (also see RW's clarification in the post following)
- Dragon will always be better at "free dictation" (which we take for granted)
- Author's comment on "pseudo-list commands" (my paraphrase)
Moreover, combining group affixes with profile switches / window targeting will likely deliver well-organized, application-specific command deployment. I am speculating here, as I haven't tested out the possibilities myself.
Just going by the few examples I outlined above, it is easy to conclude that VoiceMacro is CAPABLE. It goes without saying that VM also has basic mouse control and coordinate-focusing capabilities built in.
In fact, even sans the voice component, VM's keyboard hotkey implementation alone can probably give something like Macro Express a run for its money.
Anyway, I hope this scratching of the surface helps someone.
Thanks, RW, for clarifying the dependencies under the hood.
Moreover, the author welcomes bug/crash reports through email or the forum, or preferably through VM's built-in crash reporter.
Lastly, the latest builds are usually more stable, incorporating more bug fixes (they are not mere "betas").
Nice thread, ax! It must have taken a lot of work to create it so that others could learn about VM. Thank you.
------------------------- Win10/11/x64, AMD Ryzen 7 3700X/3950X, 64/128GB RAM, Dragon 15.3, SP 7 Standard, SpeechStart, Office 365, KB 2017, Dragon Capture, Samson Meteor USB Desk Mic, Amazon YUWAKAYI headset, Klim and JUKSTG earbuds with microphones, excellent Sareville Wireless Mono Headset, 3 BenQ 2560x1440 monitors, Microsoft Sculpt Keyboard and Logitech G502 awesome gaming mouse.
Glad you appreciate my effort to introduce this, Kevin!
See this VM author comment on why a 0% (or too low a) dictionary weight is undesirable (I have changed mine accordingly). Apparently this whole "weight" thing under WSR could be a wee bit "voodoo" ... thus requiring some trial and error.
Recognition threshold: 85-90% is suitable for me, as I prefer specificity over sensitivity.
I choose to uncheck the default 'Process "failed" recognition': I'd rather it do nothing than do the wrong thing.
A field update after months of production use:
Despite what I wrote above about the recommended low but non-zero "Dictionary weight", in my case a 0% "Dictionary weight" still works out best for specificity, in conjunction with a "Recognition threshold" of 85 to 90%.
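How a recognition threshold combined with an unchecked 'Process "failed" recognition' behaves can be sketched as a simple gate on the recognizer's confidence score. Assumptions, loudly: confidence is modeled on a 0.0-1.0 scale (VM's setting is a percentage), and "process failed" is modeled as acting on the below-threshold best guess anyway; check VM's own documentation for the exact semantics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recognition:
    text: str
    confidence: float  # 0.0-1.0, as reported by the recognizer (assumed scale)


def command_to_run(result: Recognition, threshold: float = 0.85,
                   process_failed: bool = False) -> Optional[str]:
    """Return the command to execute, or None to do nothing.

    ASSUMPTION: 'process failed' means acting on the best guess even when it
    falls below the threshold.
    """
    if result.confidence >= threshold:
        return result.text
    return result.text if process_failed else None


print(command_to_run(Recognition("stop scrolling", 0.91)))  # stop scrolling
print(command_to_run(Recognition("stop scrolling", 0.62)))  # None
```

Raising `threshold` trades sensitivity for specificity: fewer false triggers, at the cost of more commands that must be repeated.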
NB: if the default Windows Speech Recognition profile, usually named after the current login, is the only one in use, you can't delete it until you create a second one, which could be named "temp". After creating the "temp" profile and deleting your default profile (you need to press the "Apply" button for the change to stick), you can go on to re-create another profile, whose name Windows will default back to your login name.
P.S., the VoiceMacro author's recommendation to use a non-zero Dictionary Weight was most likely based on user experience such as that described here (partly in German, nothing Google Translate couldn't handle with aplomb). But that user's mother was quite debilitated and had no "sovereign control" over the noise in her environment. Nor could she be expected to know how to "reset" the speech recognition profile in Windows once in a while as necessary.
I only just noticed, 6 months later, @ax taking my name in vain - or at least my chemical symbol "Ag". :-)
Pretending to be Ag (minus that IEEE-certified engineering mindset) for a day, let's ask some elementary questions (from a "certifiable mindset", perhaps):
Q: If AR-15 represents the "pinnacle" of small-arms design, why would anyone still carry a Glock?
A: Seriously, how would one expect a dodo north of the 49th parallel (and one who has never even beheld either up close) to be able to answer that?
Hey, @ax, *some* Canadians were or are in the Army reserve.
Neither AR-15 nor Glock are on https://en.wikipedia.org/wiki/List_of_equipment_of_the_Canadian_Army.
I was sad when Canada switched to the C7 member of the M-16 family.
For my money, as an assault rifle the AK-47 is better suited to Canadian conditions, ranging from Arctic to muskeg to boreal forest. If you want distance, accuracy, and the ability to stop polar/grizzly/pizzly bears or moose, the good old FN FAL (Canadian C1). Not as good as a real hunting rifle, but better than the AK-47 and much better than the AR-15.
:-)
------------------------- DPG15.6 (also DPI 15.3) + KB, Sennheiser MB Pro 1 UC ML, BTD 800 dongle, Windows 10 Pro, MS Surface Book 3, Intel Core i7-1065G7 CPU @ 1.3/1.5GHz (4 cores, 8 logical), GPU=NVIDIA Quadro RTX 3000 with Max-Q Design.
I only just noticed, 6 months later, @ax taking my name in vain - or at least my chemical symbol "Ag". :-)
Pretending to be Ag (minus that IEEE-certified engineering mindset) for a day, let's ask some elementary questions (from a "certifiable mindset", perhaps):
Q: If AR-15 represents the "pinnacle" of small-arms design, why would anyone still carry a Glock?
Excuse my "certifiable mindset" for transgressing your sterling call sign, Captain! I count myself a fan of the Socratic style of discourse.
The only alphanumerically monikered "equipment" that has regrettably become a requisite in my daily existence is stamped "N95" ... I'm not a huge enthusiast of this style of "nom de guerre".
At least "Glock" sounds colloquial ... not that I have anything else (knowledge or desire) to add to this subject.
Wow, great post! I have been using VoiceMacro extensively (and very happily) for the past few weeks in my actual work, having uninstalled Dragon again recently for the millionth time in disgust. I'm a technical/patent translator, and so use speech recognition mainly to control my translation software (memoQ), and to dictate the occasional bit of text. VM is much lighter/quicker on my computer and doesn't bring things to a crawl like Dragon invariably does. I haven't had much time to add commands, but here is what I currently have: (basically all the stuff I need to do when working: add selected terms to termbase, run concordance search, insert matches from termbases/translation memories, search in termbases, etc.) ------------------------- Dragon Professional 16 + Speech Productivity + KnowBrainer
..., and to dictate the occasional bit of text.
Nice to hear you were able to dictate some prose with it. That's something I haven't tried myself. Here's hoping that the new and improved MS Speech Recognizer on the horizon will "kick it up a notch" in that regard.
Here's a quick test of me dictating some flowing text with VoiceMacro: https://www.youtube.com/watch?v=AE5Y1Pcu5o4
Your video definitely piqued my interest, and the output you showed was indeed quite "not bad".
Hi Ax,
No, my copy of Windows 11 doesn't have Voice Access yet. VoiceMacro is using "Microsoft Speech Recognizer 8.0 for Windows (English - UK)". I'm running: Windows 11 Pro
Note that I am also using the microphone built into my Logitech ("logi") webcam, so I'm not using any fancy microphones. I am actually finding VoiceMacro insanely useful lately, and am using it in ALL my work, which usually consists of working in my main CAT tool (= translation software), where I use it to do all kinds of crazy things. I am also using it to do all my dictation in emails, etc. Bye-bye, big, heavy, cumbersome Dragon!
Thanks for sharing the good news and your screenshots, Michael!
Did you then import or construct a dictionary of your own custom vocabulary?
No, I didn't do any kind of customization at all. In fact, I have no idea how to do it. I don't really understand Windows speech engines, since there seem to be various versions of them.
For example, if I press Win+H, this little thing pops up:
Within VoiceMacro, if I go to:
Menu > Windows Speech Recogniser > Recognizer dictionary
and then "Add new word", I can train new words.
However, any training I do here doesn't seem to have any effect on the Win+H dictation route. It only works if I use the old-fashioned WSR dictation thingee:
Still, I have VoiceMacro set up so I can say "Wake up", and the Win+H dialogue will appear and I can dictate flowing text.
So, apparently, at least until the new Voice Access is finally released, Windows 11 has two different speech recognition systems:
(1) a device-based speech recognition feature (the old WSR "Listening" dialogue)
(2) cloud-based (online) speech recognition technologies (what appears when you press Win+H)
See e.g.: https://privacy.microsoft.com/en-gb/privacystatement#mainspeechinkingtypingmodule
Now it makes sense, Michael. I sure as heck don't know the ins and outs of the various MS recognizers. For that we would have to wait for tier 1 experts such as Lindsay to chime in.
But I am fairly certain that "Windows-H" does NOT bring up the resident Recognizer 8.0, which underpins WSR and VoiceMacro, but rather "online dictation", which might just be Cortana's half-sister (or transgender step-brother). No wonder it worked out so well for you ... the prose dictation is handled by a semblance of cloud Dragon in Redmond sheep's clothing.
Interesting and innovative workflow you've got, nonetheless!
P.S., just as I was posting the above, I see that you came to the same conclusion.
Yes, my current system is a bit of a Frankenstein, but it actually works better than Dragon + KnowBrainer/Vocola, which is what I used to use. Plus, it's free.
Okay, I think I am starting to figure it out: if a command isn't working in VoiceMacro, you could train the old WSR system it uses to recognise the specific command word(s). Not that I have ever needed to do this, mind you. It works flawlessly at recognising short command phrases without any training whatsoever. However, if you are using Windows+H to dictate flowing text and want to teach it a specific word, you are screwed.
Yes, the trend for the past decade has been toward non-trainable speech systems that work with a wide variety of speakers. WSR, aka Microsoft Speech Recognizer 8.0, for the desktop, creates a cursory training profile on Windows machines. Then the "new" Microsoft Speech Platform 11.0 joined the desktop and server flavors into one product. You can't train it either, but it has separate recognizers for different languages, which is a plus for developers of international products. Since 11.0 (circa 2012, I think), the action has moved to the cloud: Microsoft Azure (non-trainable again), Google Cloud, Dragon Anywhere cloud, and so on. Even Lunis recommends not training Dragon, so the new speech recognizers must be getting pretty smart. It helps *a lot* to be working with grammar-supported command utterances; free-form recognition is much more difficult.
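Why grammar-supported commands are so much easier than free-form recognition can be illustrated with a toy: once the set of valid utterances is small and known, even a sloppy transcription can be snapped to the nearest allowed phrase. This sketch uses Python's `difflib` string similarity as a stand-in; real recognizers constrain the search with the grammar inside acoustic decoding, which this does not model. The command list is illustrative only.

```python
import difflib
from typing import Optional, Sequence

# A small, closed "grammar" of allowed command phrases (illustrative only).
COMMANDS = ["stop scrolling", "scroll down", "scroll up", "stop listening", "wake up"]


def snap_to_grammar(heard: str, grammar: Sequence[str] = COMMANDS,
                    cutoff: float = 0.8) -> Optional[str]:
    """Return the closest allowed phrase, or None if nothing is close enough."""
    best = difflib.get_close_matches(heard.lower().strip(), grammar, n=1, cutoff=cutoff)
    return best[0] if best else None


print(snap_to_grammar("stop scroling"))        # stop scrolling
print(snap_to_grammar("the quick brown fox"))  # None
```

Free dictation has no such closed list to snap to, which is one intuition for why it is the harder problem.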
The Windows cloud dictation system indeed works very well (albeit a bit slowly), particularly when it comes to recognizing words/phrases that are not well known to Dragon® but are known on the Internet, similar to specific recognitions in "Hey Google".
However, the big downside is that Select-and-Say capability is not available... although I do seem to remember that one of the previous Windows Insider builds had some brief Select-and-Say capability in the Edge address bar with Voice Access (I could not reproduce that at the time in other browsers). ------------------------- Turbocharge your Dragon productivity with 40+ Power Addons
... VoiceMacro is "EDC" in a manner desktop Dragon in its current incarnation could not claim to be.
Adding a "smidgen" to my own "sage" assessment: after using DMO tonight for a bit and trialing/thinking about my workflow, I realize that even with its step-by-step command capabilities, cloud Dragon can't replace VoiceMacro in terms of what the latter currently does for me.
......
P.S., the attraction of running one voice app as opposed to two is just too great. Plus, I couldn't run VoiceMacro on the hospital's VDI system due to some language-setting issues.
I reluctantly gave up the slick "2C" and embraced DMO's more cumbersome "single/left click" and "double click". In step-by-step, assigning "to see" to a mouse action (through an AHK workaround, as step-by-step has no direct mouse-click call) doesn't lead to command recognition, even with the stipulated spoken form. Ditto "1C".
Probably not unexpected.
Anyway, that was the price to pay. Otherwise, I transferred the majority of my VM macros to DMO on my third day of using the latter.
I like that abbreviation for a double click! It reminds me of the 2ic abbreviation for second-in-command.
Another excellent use for VoiceMacro for me has been a command to "cancel dictation" and a similar command to "cancel and resume dictation."
It's a little tricky, because you either have to anchor the Dragon results box showing preliminary results or otherwise figure out where it is on the screen. But once you know, you just send the mouse there, right-click, and select the menu choice to cancel recognition. Cancelling also turns off the mic and, to me anyway, *appears to flush the speech buffer*. That makes cancelling and resuming dictation by turning the mic back on (the 2nd command) the quickest way to get quick dictation going again when the preliminary results are showing in the tooltip but no longer landing on your screen quickly.
Do any of you know how to issue that command via the API (so I don't have to move the mouse to the tooltip)? There is the Dragon mic option constant "dgnmicoptionChangeStateImmediately" which, after pausing and resuming the mic, "sets the microphone pause count to zero (cancel all pending pauses)." Does it do the same thing or offer any help?
LMK your thoughts, pls and thx,
PG