KnowBrainer Speech Recognition
Topic Title: FindText
Topic Summary: Really?
Created On: 08/29/2022 10:31 PM
Status: Post and Reply
 FindText   - ax - 08/29/2022 10:31 PM  
 FindText   - ax - 08/29/2022 11:49 PM  
 FindText   - Ag - 08/30/2022 11:51 PM  
 FindText   - ax - 08/31/2022 02:45 AM  
 FindText   - ax - 08/31/2022 03:29 PM  
 08/29/2022 10:31 PM

Top-Tier Member

Posts: 697
Joined: 03/22/2012

I mean, really! Have I not heard of Ctrl-F? Well, every day one learns something and forgets a little more.


While I do try to keep Ctrl-F in mind, I am here to introduce an AHK ImageSearch refinement going by the intuitive name of "FindText", whose primary utility, I think, is to search for text on screen as images.


The inspiration to try this came from none other than the Chief Ideas Officer, Ag.  See thread here please:


Most AHKaky things are community endeavours to varying extents, and this one is no exception. It's open source, of course. Due to the author's ESL status, the introductions were written by other users!


The package seems fairly functional, though I only started playing with it in the last hour. 90% of the menu I still don't quite understand. Over the next little while I will try to digest some of the tutorials; they are hard to get a grasp on without going hands-on.


But it's packaged in such a way that not a lot of "hand-holding" is needed just to get started.  Let's say I am looking for the "word" "DMO:" as seen in the following screenshot of this forum:





NB: the above is not the search area (the haystack). My haystack is the entirety of my 3 monitors, and it's that kind of speed over such a large area that makes this utility useful. I set a 30 x 12 pixel box and crop my "text" (or rather, my needle) so that it can later be searched for in the haystack.
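To make the needle/haystack idea concrete, here is a minimal Python sketch of a brute-force search over the ASCII-art form of an image ('0' = text pixel, '_' = background pixel). The function name and the list-of-strings representation are my own assumptions for illustration; FindText itself stores needles in its own encoded form and is far more optimized, so this only shows the concept of sliding a small needle over a large haystack.

```python
def find_needle(haystack, needle):
    """Return (x, y) of the first exact match of needle in haystack, or None.

    Hypothetical helper: both arguments are lists of equal-length strings made
    of '0' (text pixel) and '_' (background pixel).
    """
    nh, nw = len(needle), len(needle[0])
    hh, hw = len(haystack), len(haystack[0])
    for y in range(hh - nh + 1):          # slide the needle down...
        for x in range(hw - nw + 1):      # ...and across the haystack
            if all(haystack[y + dy][x:x + nw] == needle[dy] for dy in range(nh)):
                return (x, y)             # top-left corner of the match
    return None


haystack = ["______",
            "__0_0_",
            "___0__",
            "______"]
print(find_needle(haystack, ["0_0", "_0_"]))  # (2, 1)
```

Every candidate position is checked row by row, so the cost grows with haystack area times needle area; that is why searching 3 whole monitors quickly is the impressive part.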




One then gets something like the below after the mandatory "ASCII-artification", so to speak, which converts the cropped image to "black and white" (I just clicked the "GrayDiff2Two" button; I'll read the "manual" later). I haven't a clue about the ins and outs here.
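As a rough illustration of what "ASCII-artification" means, the Python sketch below turns RGB pixels into gray levels (ITU-R BT.601 luma weights) and then applies a single fixed threshold. This is an assumption-laden stand-in: a plain global threshold is the simplest possible binarization, and FindText's GrayDiff2Two mode works differently and is not reproduced here. The function names are hypothetical.

```python
def rgb_to_gray(rgb):
    """Convert an (r, g, b) tuple of 0-255 values to a 0-255 gray level."""
    r, g, b = rgb
    return (r * 299 + g * 587 + b * 114) // 1000


def to_ascii_art(gray, threshold=128):
    """Convert a 2-D grid of gray levels into rows of '0' and '_'.

    Pixels darker than `threshold` become '0' (assumed: dark text on a light
    background); everything else becomes '_'.
    """
    return ["".join("0" if v < threshold else "_" for v in row) for row in gray]


print(to_ascii_art([[250, 20, 250],
                    [30, 240, 35]]))  # ['_0_', '0_0']
```

A fixed threshold like this is exactly what breaks when the colour theme changes, which matches the limitations described later in the thread.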




Now if I click the "Test" button above to search for this ASCII-art "DMO:" across all 3 of my screens (2 x WUXGA + 1 x UXGA), it takes an average of a mere 130 ms to find it.


13 ms would be better, but anything under 300 ms is, I think, reasonably acceptable from a "UX" point of view.
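For anyone wanting to compare machines the same way, an average over repeated runs is steadier than a single measurement. Below is a minimal Python timing helper; it is a generic sketch, and I am not assuming anything about how FindText's "Test" button computes its own figure.

```python
import time


def average_ms(func, *args, runs=10):
    """Call func(*args) `runs` times and return the mean wall-clock time in ms."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    start = time.perf_counter()          # high-resolution monotonic clock
    for _ in range(runs):
        func(*args)
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / runs


print(f"{average_ms(sum, range(100_000), runs=20):.2f} ms per call")
```

Swapping in a real search function for `sum` would give a per-search average comparable across GPUs or RDP sessions.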


And that's with a feeble Quadro M620 in my 5-year-old Z2 Mini G3. I was kind of hoping there might be justification to upgrade to something more modern, like an RTX A2000, before a certain strait boils over and computers and parts go the way of natural gas... I mean, a nerd may not have to shower, but no computers? ¯\_(`;`)_/¯

 08/29/2022 11:49 PM

Top-Tier Member

Posts: 697
Joined: 03/22/2012

I just did a quick test over RDP on my office rig, into which I shoved a T400 GPU a few months ago to better handle triple monitors. There it actually takes longer to find the "word", at around 230 ms.

Plenty of testing is needed. The entry-level T400 and the M620 are rather close in benchmarks anyway.

But the potential here seems enormous.

I already tested it on my Citrix "controls" and it took less than 80ms to find some of the menu items.  Very, very encouraging.

Limitations so far:

1. The window containing the "needle" has to be visible for search to be successful.  Can't be behind other windows.

2. It handles different monitors and different degrees of night light with aplomb, using just the default settings. If I invert the colours with "dark mode", however, it just won't work (without "re-cropping"). Perhaps that could be addressed by swapping the "0"s and the underscores on the fly. I'm not sure how well it handles different scaling; that's potentially a biggie. I use the same scaling on all my monitors, even the laptop.
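The "swap the 0s and underscores" idea from point 2 could look like the Python sketch below. It is hypothetical: it works on the ASCII-art rows as displayed, not on FindText's encoded needle string, and anti-aliasing in dark mode may still differ enough that an inverted needle fails to match exactly.

```python
def invert_needle(rows):
    """Swap text ('0') and background ('_') pixels in ASCII-art rows.

    Meant for a needle captured in a light theme that should now match the
    same glyphs rendered light-on-dark. Only the pixel roles are swapped;
    glyph shapes and needle size stay the same.
    """
    table = str.maketrans({"0": "_", "_": "0"})
    return [row.translate(table) for row in rows]


print(invert_needle(["0_0", "_0_"]))  # ['_0_', '0_0']
```

Because the swap is its own inverse, a single light-theme needle could in principle serve both themes by trying it as-is and then inverted.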

Anyway, I could certainly see myself delving into this utility more "fulsomely". It is welcome to have something with the potential to contend with that stupid Citrix a bit more meaningfully, without relying on inflexible screen coordinates or the slower and more rickety ImageSearch.


@Ag, I could see the utility being at least partially useful for your desired workflow, which I must say seems quite "high-end" (your workflow, that is).

 08/30/2022 11:51 PM

Top-Tier Member

Posts: 1036
Joined: 07/08/2019

Looks very cool! Or perhaps I should say sounds very cool, because I haven't looked at the code yet... OK, now it looks cool.

"The window containing the "needle" has to be visible for search to be successful. Can't be behind other windows."

Yeah, this limitation is why I didn't use the macOS version for very long: it worked really well as an alternative to "number buttons" and "click button number", but it wasn't really all that good for automating, especially with modern apps, where what is on the screen (the menus, etc.) depends very much on screen size.

But I suppose I'll need to play with it this weekend.



DPG15.6 (also DPI 15.3) + KB, Sennheiser MB Pro 1 UC ML, BTD 800 dongle, Windows 10 Pro, MS Surface Book 3, Intel Core i7-1065G7 CPU @ 1.3/1.5GHz (4 cores, 8 logical), GPU=NVIDIA Quadro RTX 3000 with Max-Q Design.

 08/31/2022 02:45 AM

Top-Tier Member

Posts: 697
Joined: 03/22/2012

Due to unforeseen circumstances, I actually got a bit of unscheduled "free time" on my hands this week, so I have spent the last 2 hours going through the tutorials.

I was sorely stuck on Example #9, the "Notepad OCR" one. I finally ascertained that even though the super-helpful tutorial writer had stipulated 100% DPI scaling for the purposes of the examples, the Notepad OCR script without a doubt works only at a scaling factor of 150%.

Anyhow, once the correct scaling was understood, the speed and accuracy of that OCR search simply amazed me. Again, with just the M620 GPU.

The pseudo-OCR function should dovetail more naturally with voicification.
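Roughly, pseudo-OCR means searching for each character's needle and then reading the hits left to right. The Python sketch below shows only that assembly step under my own assumptions (one text line, hits given as `(x, width, char)` tuples, and a hypothetical `space_gap` pixel threshold for word breaks); it is not FindText's own OCR routine.

```python
def assemble_text(hits, space_gap):
    """Join matched glyphs on one text line into a string, left to right.

    hits: list of (x, width, char) tuples, one per glyph needle found.
    A gap wider than `space_gap` pixels between the end of one glyph and the
    start of the next is read as a space.
    """
    out = []
    prev_end = None
    for x, width, char in sorted(hits):   # sort by x position
        if prev_end is not None and x - prev_end > space_gap:
            out.append(" ")
        out.append(char)
        prev_end = x + width
    return "".join(out)


hits = [(20, 6, "O"), (0, 8, "D"), (9, 10, "M"), (40, 3, "x")]
print(assemble_text(hits, space_gap=5))  # DMO x
```

Once the screen can be read back as plain text like this, matching it against a spoken phrase is a simple string comparison, which is why it should suit voice commands.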

Anyway, 3 quick points:

1. It's not true that the T400 was slower than the M620. I was inadvertently searching with a different algorithm (Gray vs GrayDiff) and not paying attention (having not bothered with the tutorials last night).

2. It can search overlapped (but not minimized) windows through the "BindWindow" options. For my intended struggle against Citrix, this is slightly less consequential, but I think it will still be quite limiting when dealing with any "menu-on-top-of-menu" type of interface. Perhaps looping and waiting for topmost controls to "peel away" might be a workaround?

3. It is not only scaling-limited, meaning one cannot change desktop scaling and still expect the same "needles" to work, but also colour theme-limited. This is a bit disappointing but not surprising. If one changes the Windows display theme or transparency, or a webpage's colour scheme, in any way, shape, or form, the "needle" won't work anymore either. Again, I would accept this as an intrinsic limitation.
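The "loop and wait for topmost controls to peel away" workaround from point 2 amounts to polling with a timeout. Here is a minimal, generic Python sketch; `find` stands for any search function that returns a position or None, and the timeout and interval values are arbitrary assumptions.

```python
import time


def wait_for(find, timeout=5.0, interval=0.1):
    """Call find() repeatedly until it returns something other than None.

    Returns the first non-None result, or None once `timeout` seconds have
    passed. Intended for targets temporarily covered by another menu/window.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)              # give the covering menu time to close
```

Keeping the interval short keeps the reaction fast once the target is exposed, while the timeout stops a voice command from hanging forever if the menu never closes.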

Otherwise I have high hopes for this package.

Do check out that "Notepad OCR Search" example I mentioned.  Nothing short of astounding to me.

 08/31/2022 03:29 PM

Top-Tier Member

Posts: 697
Joined: 03/22/2012

Above is the location of the tutorial thread, just to be sure.

And I clarified with the author of the excellent tutorials that the examples should indeed be run at 150% scaling.


P.S. Very preliminary field testing showed that the search can in fact handle a small amount of shading, which I suppose makes sense as long as there is enough of a "gradient" between the characters and the background (again, colour inversion requires a whole new set of "needles").


With the "OCR-style" search, I do find the initial "definition" of the alphanumerics quite tedious and time-consuming. But once set up, the speed of searching even long strings is simply mind-bogglingly fast, at least for the menu items I tend to deal with through Citrix.


There have been some false positives and false negatives, e.g. "ff" and "i" in the middle of words (in Verdana 14 font), hyphens, and underscores. Generally I have been able to work around them.
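False positives and negatives like these are a trade-off in how many mismatched pixels a match may contain. The Python sketch below counts misses separately for text pixels and background pixels; it is my own illustration, and while I believe FindText has its own fault-tolerance settings, this does not claim to reproduce them. Tighter limits cut false positives (a thin "i" turning up inside other letters), looser ones help with shading.

```python
def match_with_tolerance(window, needle, max_text_miss, max_bg_miss):
    """Compare an equal-sized haystack window to a needle, allowing misses.

    max_text_miss: how many needle '0' pixels may be '_' in the window.
    max_bg_miss:   how many needle '_' pixels may be '0' in the window.
    """
    text_miss = bg_miss = 0
    for wrow, nrow in zip(window, needle):
        for w, n in zip(wrow, nrow):
            if n == "0" and w != "0":
                text_miss += 1            # part of the glyph is missing
            elif n == "_" and w != "_":
                bg_miss += 1              # extra ink where background expected
    return text_miss <= max_text_miss and bg_miss <= max_bg_miss


needle = ["0_0", "_0_"]
print(match_with_tolerance(["0_0", "___"], needle, 0, 0))  # False
print(match_with_tolerance(["0_0", "___"], needle, 1, 0))  # True
```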
