KnowBrainer Speech Recognition
Topic Title: Speech command set design for Markup Languages
Topic Summary: Like Markdown, AsciiDoc, MediaWiki, ...
Created On: 03/16/2021 02:14 PM
Status: Post and Reply
 03/16/2021 02:14 PM
Ag
Top-Tier Member

Posts: 775
Joined: 07/08/2019

---+ BRIEF: Does anyone have a set of speech commands/shortcuts for markup languages?

 

I am most interested in the design principles. It seems sensible to have similar speech command set patterns for different markup languages.  The actual commands are fairly straightforward.

In particular, I frequently switch between different markup languages. E.g. on GitHub I will use all of AsciiDoc and MediaWiki and Markdown - and there are a few other markup languages supported by the GitHub wiki. Sometimes different pages in the same wiki are written in different markup languages. I have not yet figured out a good way of inferring or controlling which markup to use from my speech commands.

 

---+ DETAIL: 

 

Sigh. Every few weeks I need to start adding a new class of applications for speech commands.

 

Today: markup languages like Markdown, AsciiDoc, MediaWiki, ...

 

My job requires me to maintain documents in AsciiDoc, and to interact with people on wikis using markup languages like Markdown, MediaWiki, and a few others.  Sometimes on GitHub, sometimes elsewhere.

 

One quickly gets tired of saying "equal sign; equal sign; Heading #2; equal sign; equal sign" in MediaWiki, and "sharp sign; sharp sign; Heading #2" in Markdown.

 

Obviously, I will have to use a set of commands like "Heading <Level> <Dictation>": similar syntax across markup languages, different implementations.
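To make the idea concrete, here is a sketch in Python (purely illustrative; the table and function names are invented, and in practice the dispatch would live in a Dragon/KnowBrainer or AHK script). One "Heading <Level> <Dictation>" command, several per-language implementations:

```python
# Hypothetical dispatch table: one spoken command, per-language markup.
# The prefixes follow the usual Markdown/MediaWiki/AsciiDoc conventions.
HEADING_SYNTAX = {
    "markdown":  lambda level, text: "#" * level + " " + text,
    "mediawiki": lambda level, text: "=" * level + " " + text + " " + "=" * level,
    # AsciiDoc: a level-1 section is "==", so level + 1 equals signs.
    "asciidoc":  lambda level, text: "=" * (level + 1) + " " + text,
}

def heading(language, level, text):
    """Render a heading in the currently active markup language."""
    return HEADING_SYNTAX[language](level, text)
```

So heading("mediawiki", 2, "Overview") gives "== Overview ==" while heading("markdown", 2, "Overview") gives "## Overview" - the speech grammar stays identical, only the table entry changes.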

 

One part of my question is asking about patterns for the syntax of such a common markup speech command set.

 

A second part of my question is how to infer or control which markup language is in use. Application and window/webpage name do not help. I believe I need persistence.

 

---+ Inferring and controlling markup language to use

So far the main problem is detecting when to use which markup language.

 

I and my coworker/collaborators frequently switch between different markup languages. E.g. on GitHub I will use all of AsciiDoc and MediaWiki and Markdown - and there are a few other markup languages supported by the GitHub wiki. Sometimes different pages in the same wiki are written in different markup languages.

 

I have not yet figured out a good way of inferring which markup language should be used. On GitHub, all of the markups use the same webpage text box editing system, so I cannot use application name or window name to figure out which command set is appropriate.

 

As I am writing this, I realize one thing that might help: I can make a local clone of the GitHub wiki and repo, and edit the files locally in my favorite text editor (Emacs, in my case). When editing locally, the markup language is already known from the file suffix, like .md or .asciidoc or .adoc. There are already Emacs modes for many of these markup languages, and I have already begun generic interfacing of speech commands to Emacs commands. In fact, I could leave most of the intelligence in Emacs, keeping it out of the speech commands.

 

But I would still like to be able to do quick edits on the fly using the webpage editor for wikis and other markup systems.

 

If I cannot use application or window name context in the speech commands, perhaps I need persistence: a command that says "use MediaWiki syntax from here on". But that requires persistent state.

 

I have asked about persistent state before on this forum. Edgar's recommendation is to use files, which I think will amount to opening a file every time I bounce to a new wiki webpage editing tab or window. I have not tried it yet, but it sounds expensive.

 

Mostly the sort of persistence that I need is "state that persists from one speech command to another".  Probably not across reboot or restart of the threads that are maintaining the persistence. 

 

I already have an unsatisfactory implementation of persistence, sending hotkeys to a persistent AHK script.  

 

I have seen AHK scripts that send text messages to each other using Windows messaging interprocess communication APIs.  It should be straightforward to have a transient speech command, whether in AHK or Dragon/KnowBrainer basic, send such a message.

 

More generically, I could use standard networking APIs like Berkeley sockets. This would have the advantage that speech commands on Windows could communicate with command servers in different operating system environments like Linux, whether in different virtual machines on the same PC or across the network (with my usual paranoia about security).
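As a sketch of the socket idea (Python for illustration, with an invented port number and a trivial set/get text protocol; a real transient speech command would play the client role):

```python
import socket
import threading

HOST, PORT = "127.0.0.1", 47123   # arbitrary, hypothetical private port

# Bind and listen up front, so clients can connect as soon as the
# accept loop starts in the background thread.
srv = socket.create_server((HOST, PORT))

def state_server():
    state = {"markup": "markdown"}          # default until changed
    while True:
        conn, _ = srv.accept()
        with conn:
            msg = conn.recv(1024).decode().strip()
            if msg.startswith("set "):
                state["markup"] = msg[4:]   # e.g. "set mediawiki"
                conn.sendall(b"ok")
            elif msg == "get":
                conn.sendall(state["markup"].encode())

def send(msg):
    """What a transient speech command would do: one short request."""
    with socket.create_connection((HOST, PORT)) as c:
        c.sendall(msg.encode())
        return c.recv(1024).decode()

threading.Thread(target=state_server, daemon=True).start()
send("set mediawiki")      # "use MediaWiki syntax from here on"
```

A "Heading 2" command would then call send("get") first and pick the right markup implementation. The same client code works whether the server is on localhost or on another machine.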

 

Does anyone have better or alternative ideas? As always, example code is appreciated.

 

 



-------------------------

DPG15.6 (also DPI 15.3) + KB, Sennheiser MB Pro 1 UC ML, BTD 800 dongle, Windows 10 Pro, MS Surface Book 3, Intel Core i7-1065G7 CPU @ 1.3/1.5GHz (4 cores, 8 logical, GPU=NVIDIA Quadro RTX 3000 with Max-Q Design).

 03/16/2021 10:33 PM
kkkwj
Top-Tier Member

Posts: 865
Joined: 11/05/2015

I'm with Edgar. Write a few commands to store your persistent state in files. Low effort, high reliability, great extensibility. A file only takes a split second to read the first time, and even less once the OS caches the tiny file in RAM because you access it repeatedly with each voice command.
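Something like this is all it takes (Python just to show the shape; the file location and helper names are made up):

```python
from pathlib import Path

# Hypothetical location for the tiny state file.
STATE_FILE = Path.home() / ".current_markup"

def set_markup(language):
    """Backs a command like 'use MediaWiki syntax from here on'."""
    STATE_FILE.write_text(language)

def get_markup(default="markdown"):
    """Called at the start of every markup-emitting command."""
    try:
        return STATE_FILE.read_text().strip()
    except FileNotFoundError:
        return default
```

The state survives reboots for free, and you can inspect or fix it with any text editor, which is more than you can say for a background server process.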

The only other way is to have your scripts talk to a persistent app (like a GUI app that stays open) or a server process in the background. But if you go that way, you've got all the headaches of sockets or pipes or windows messages or IPC or whatever. Ugh.

My guess is that if you get a persistent file mechanism going, you'll find other uses for the persistent state besides storing your current markup language.

-------------------------

Win10/11/x64, AMD Ryzen 7 3700X/3950X, 64/128GB RAM, Dragon 15.3, SP 7 Standard, SpeechStart, Office 365, KB 2017, Dragon Capture, Samson Meteor USB Desk Mic, Amazon YUWAKAYI headset, Klim and JUKSTG earbuds with microphones, 3 BenQ 2560x1440 monitors, Microsoft Sculpt Keyboard and fat mouse

 03/16/2021 10:51 PM
Ag
Top-Tier Member

Posts: 775
Joined: 07/08/2019

You're right. My thinking was premature optimization, with a 1990s hat on. Heck, I'm using AHK scripts and loading a dozen files on many commands, and am reasonably happy with the performance. (Although I expect to get annoyed at some point, and have my Dragon commands send messages to a persistent command server.)






 03/16/2021 10:59 PM
Ag
Top-Tier Member

Posts: 775
Joined: 07/08/2019

So that just brings me back to the markup language command set design. I'm looking for a reasonably universal command set that maps to several markup languages and works in a reasonable number of wiki and other text editors.

E.g.

Heading <Level> <Dictation> -- insert dictation as a heading of the specified level.

Heading -- make the current line or paragraph into a heading.
-- Possibly blind, assuming there is currently no heading. Bonus points if it can parse the text already there, probably by copy/cut into the clipboard for the script to look at.

Promote/Demote Heading

Similarly for lists, bulleted or numbered.

The usual textual effects:

(bold|italic|emphasis|underline|...)
--> *foo*, ~foo~, _foo_ etc.

...

Basically, pretty much the same command syntax as for Word, and much in common with email, gmail, PowerPoint, ...
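The textual effects are equally mechanical to table-drive. Another illustrative Python sketch, with invented names, using the usual Markdown/MediaWiki/AsciiDoc delimiters:

```python
# Hypothetical table: the same spoken effect name resolved per markup
# language to a pair of delimiters wrapped around the dictation.
EFFECTS = {
    "markdown":  {"bold": ("**", "**"),   "italic": ("*", "*")},
    "mediawiki": {"bold": ("'''", "'''"), "italic": ("''", "''")},
    "asciidoc":  {"bold": ("*", "*"),     "italic": ("_", "_")},
}

def wrap(language, effect, text):
    """Apply a textual effect in the active markup language."""
    left, right = EFFECTS[language][effect]
    return left + text + right
```

So "bold foo" would produce **foo**, '''foo''', or *foo* depending on which language the persistent state currently names.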





 03/18/2021 12:37 PM
wristofdoom
Top-Tier Member

Posts: 320
Joined: 09/03/2020

I don't know about design principles but I have created markdown commands that I use all day in Trello and Obsidian and any other markdown-friendly environment that comes up.

I've exported my commands as an XML file if you want to take a look:

https://www.dropbox.com/s/dleb3lm4minn5y9/2021.03.18%20markdown%20commands.xml?dl=0



-------------------------

Dragon Professional Individual v15.6. Windows 10. Knowbrainer 2017.



 03/18/2021 06:30 PM
kkkwj
Top-Tier Member

Posts: 865
Joined: 11/05/2015

Wrist, thanks for posting your XML file. My guess is that the design Ag is looking for is near the bottom of your file, which defines a list of markdown commands for headings, lists, bold, italics, and so on. I would guess that not much design skill is required to identify a set of markdown commands, because markup languages (and the operations/keystrokes to create their effects) are so simple: you can only do a limited number of things in markdown. In contrast, there are many hundreds of things to do in Word or other advanced word-processing "languages."

It's always nice to see how someone else views the world and how they have solved the problem. Thank you for sharing your knowledge.


 12/22/2021 08:18 AM
benTalks
Junior Member

Posts: 36
Joined: 04/27/2020

Hello,

I just saw this, and wanted to mention, if folks are not already aware, there is an add-on for Dragon called Talon, specifically designed for voice coding. It's definitely worth checking out.
https://talonvoice.com/
The website is nothing special, but they use a Slack channel for support and discussion.
 12/22/2021 11:53 AM
wristofdoom
Top-Tier Member

Posts: 320
Joined: 09/03/2020

Originally posted by: benTalks Hello, I just saw this, and wanted to mention, if folks are not already aware, there is an add-on for Dragon called Talon, specifically designed for voice coding. It's definitely worth checking out. https://talonvoice.com/ The website is nothing special, but they use a Slack channel for support and discussion.

 

I have looked at Talon, but haven't taken the plunge because the setup process seems pretty involved and I'm not sure what the advantage would be.



Will you name a few things that you can do with Talon that you cannot do with Dragon on its own? 




 12/22/2021 12:07 PM
dilligence
Top-Tier Member

Posts: 1499
Joined: 08/16/2010

Ag,

 

The upcoming SP 7 PRO will have several interesting tools for voice coders like CodePrep©, Transit© and TopicNotes©.

 

Here's a preview of that last one, which auto-converts any input immediately to Markdown. TopicNotes© is the big brother of SP HyperNotes©, but it adds a lot more (formatting) functionality and also spreadsheet support:

 

 



-------------------------

https://speechproductivity.eu


Turbocharge your Dragon® productivity with 40 Power Addons



 12/22/2021 08:19 AM
benTalks
Junior Member

Posts: 36
Joined: 04/27/2020

To clarify, Talon is not technically an add-on, but it acts as one.
 12/23/2021 10:30 PM
kkkwj
Top-Tier Member

Posts: 865
Joined: 11/05/2015

Here's a link that shows what Talon can do in a coding environment. It's stunning. https://twitter.com/i/status/1378159234861264896. I installed the default Talon and tried out the commands that he used to dictate the code in the video. If you play the video two or three times, with pausing, you can start to anticipate what he's going to say. I tried out an exercise/practice site for special characters and the alphabet, and Talon was pretty much flawless at recognizing my single-syllable commands to produce individual letters or {}'@ characters. I was impressed. And I got *instant* help on the Slack channel. That was great!

 

Will Talon replace Dragon any time soon? Probably not. I found out pretty quickly that I still depend on Dragon's autotranscribe agent, and some other Dragon addons.

 

I chatted with the Talon developer for a while, and Talon can actually use Dragon (connect to Dragon and do stuff). I'm not sure exactly how that chain operates, but probably Talon is using Dragon as a recognizer. Talon has a complete grammar, recognizer, and execution system of its own, too.



