KnowBrainer Speech Recognition
Topic Title: "add words from your doc..." finding unknown capitalization in clean txt file
Topic Summary: Carefully cleaned my sample but it still finds first word in sentences
Created On: 07/17/2008 12:56 PM
 07/17/2008 12:56 PM


dontdont
Junior Member

Posts: 5
Joined: 04/06/2007

It has been many months and I'm coming up with words that aren't in my existing dictionary.  So I take the last 90 days of documents, concatenate them, convert to ASCII text, clean until absolutely squeaky, and ask it to find unknown words in the document, find known words with unknown capitalization (since there are cases where I do and do not want special capitalization), and preview the list.

It finds lots of words I do want it to find.  Then it finds and counts every instance of "Pupils", for example, as in "...vessel.  Pupils are equally reactive..."  So it appears to be flagging lots of capitalized words at the beginning of sentences, each preceded by a period and two spaces, as having capitalization it does not expect.

Is there any chance I've left something in the txt file that I cannot see that is triggering this?  Or is part of the "unknown capitalization" search something that I don't understand?  I assumed the first word in a sentence would always be capitalized and ignored by any search for unknown capitalization, unless it was NOT capitalized perhaps.  If I don't check "unknown capitalization" it ignores these, but ignores the other special capitalization that I do want to manually decide whether to add to the dictionary or not.
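
One way to rule out invisible leftovers in the text file is to scan it for any character that is not printable ASCII (tabs, carriage returns, non-breaking spaces and so on). The following is only a minimal sketch in Python; the function name find_hidden_chars is made up for illustration and has nothing to do with NaturallySpeaking itself.

```python
import sys


def find_hidden_chars(text):
    """Return (line_no, column, code_point) for every character that is
    not printable ASCII (space through tilde). Lines are split on "\n",
    so a stray carriage return is reported as well."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if not (32 <= ord(ch) <= 126):
                hits.append((line_no, col, ord(ch)))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # newline="" keeps "\r" characters so they show up in the report
    with open(sys.argv[1], encoding="utf-8", errors="replace", newline="") as f:
        for line_no, col, code in find_hidden_chars(f.read()):
            print(f"line {line_no}, column {col}: U+{code:04X}")
```

If this reports nothing, the file contains only plain printable characters and line feeds, so hidden characters are unlikely to be the trigger.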

I've searched KnowBrainer and the web trying to find an answer before asking here.

Thanks
Preferred, 2ghz, 1g memory, xp, nothing else running, stable and fairly accurate.
 07/17/2008 04:00 PM
Lunis Orcutt
Top-Tier Member

Posts: 22636
Joined: 10/01/2006

We suspect you are getting nailed by something we have never liked about NaturallySpeaking: a setting that automatically adds unknown words and partially spelled words to your vocabulary against your wishes. Say "show options" and note the default checkmark in Automatically add words to vocabulary. You can prevent this by removing that checkmark, but there is a downside. Normally, when you make a correction via Spell That, NaturallySpeaking automatically adds unknown words to your vocabulary. With the checkmark removed, this no longer occurs during correction, and the only ways to add vocabulary are directly via the Vocabulary Editor or through the DNS Add Word feature. However, this setting caused us so much grief that we think it's well worth removing the checkmark from Automatically add words to vocabulary, and we recommend that everyone do so. Because we believe it is best to turn off this feature, KnowBrainer 2007 includes an Add to Vocabulary command that reduces the process of adding words or phrases to your vocabulary to a single efficient step.

-------------------------


Click KB 2012 REV D to Download a 30 Day Evaluation of KnowBrainer 2012 

 07/17/2008 07:43 PM


dontdont
Junior Member

Posts: 5
Joined: 04/06/2007

I believe I've seen the sort of unintended additions you describe (usually when I later say "how did that meaningless string of letters end up in the dictionary?!"), but I thought this only happened during "normal dictation."

I didn't think this behavior happened with the steps I had taken:
1: find one or more finished documents
2: concatenate into a single document
3: strip out formatting, etc
4: dns->Tools->Accuracy Center...->Add words from your documents to the vocabulary
5: Check Find Unknown Words, Check Find Known Words with unknown capitalization, Check Preview the list
6: Next->Add Document->...->Next
and it appears to (roughly) complain about the first word of many of my sentences.  Then I have to go through the whole list trying to determine which actually should be added to the dictionary and which seem to be false positives.  Initially I had thought the problem was because of my inadequate cleaning of the sample text file, trailing white space after periods, lack of periods, etc., but I've looked really carefully this time and I don't think I see any such flaws in the sample document I gave it.  For false positives, it only appears to find capitalized first words of sentences when I look at the list of potential new words to add to the dictionary.
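
To speed up sorting through the preview list, a small script can pre-screen the same text file for words that are capitalized only at the start of a sentence and never mid-sentence. This is a rough sketch under stated assumptions: it does not reproduce whatever logic NaturallySpeaking uses, the sentence split (a period, exclamation mark or question mark followed by whitespace) is deliberately simple, and the name likely_false_positives is hypothetical.

```python
import re

# A word: starts with a letter, may contain letters, apostrophes, hyphens
WORD = re.compile(r"[A-Za-z][A-Za-z'-]*")


def likely_false_positives(text):
    """Return the sorted list of capitalized words that occur only as the
    first word of a sentence and never capitalized mid-sentence."""
    initial, mid = set(), set()
    # Split after ., ! or ? when followed by one or more whitespace characters
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for i, word in enumerate(WORD.findall(sentence)):
            if word[0].isupper():
                (initial if i == 0 else mid).add(word)
    return sorted(initial - mid)


sample = ("The vessel is normal.  Pupils are equally reactive.  "
          "The Glasgow score is fine.  Glasgow was used.")
print(likely_false_positives(sample))
```

Words on the resulting list ("Pupils" and "The" in the sample) are probably ordinary sentence starts, while a word such as "Glasgow" that also appears capitalized mid-sentence is left off and still deserves a manual decision.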

Thanks for any insight


 07/17/2008 08:10 PM
Lunis Orcutt
Top-Tier Member

Posts: 22636
Joined: 10/01/2006

In theory you're right but if you even open a Select-&-Say enabled document that has a misspelled word in it, there is a fairly high probability that NaturallySpeaking will see that word and add it to your vocabulary. Even when you don't type or dictate a word, NaturallySpeaking may see it in a Select-&-Say document. The only way to prevent unwanted vocabulary is to remove the checkmark from Automatically add words to vocabulary.
