KnowBrainer Speech Recognition

Topic Title: "add words from your doc..." finding unknown capitalization in clean txt file
Topic Summary: Carefully cleaned my sample but it still finds first word in sentences
Created On: 07/17/2008 12:56 PM

- dontdont | 07/17/2008 12:56 PM
- Lunis Orcutt | 07/17/2008 04:00 PM
- dontdont | 07/17/2008 07:43 PM
- Lunis Orcutt | 07/17/2008 08:10 PM

It has been many months, and I'm coming up with words that aren't in my existing dictionary. So I take the last 90 days of documents, concatenate them, convert to ASCII txt, clean until absolutely squeaky, and ask it to find unknown words in the document, find known words with unknown capitalization (since there are cases where I do and do not want special capitalization), and preview the list.

It finds lots of words I do want it to find. Then it finds and counts every instance of "Pupils", for example, as in "...vessel. Pupils are equally reactive..." So it appears that it is flagging lots of capitalized words at the beginning of sentences, each preceded by a period and two spaces, as capitalization it does not expect.

Is there any chance I've left something in the txt file that I cannot see that is triggering this? Or is part of the "unknown capitalization" search something that I don't understand? I assumed the first word in a sentence would always be capitalized and ignored by any search for unknown capitalization, unless perhaps it was NOT capitalized. If I don't check "unknown capitalization" it ignores these, but it also ignores the other special capitalization that I do want to review manually before deciding whether to add it to the dictionary.

I've searched KnowBrainer and the web trying to find an answer before asking here. Thanks

Preferred, 2 GHz, 1 GB memory, XP, nothing else running, stable and fairly accurate.
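
One way to rule out invisible characters in the cleaned text file is to scan it for anything outside plain printable ASCII. The following is a minimal Python sketch, not a feature of NaturallySpeaking or KnowBrainer. The function name `find_hidden_characters` is made up for illustration, and whether such characters actually influence the "unknown capitalization" search is an open question in this thread, not an established fact.

```python
import unicodedata


def find_hidden_characters(text):
    """Return (line_no, column, codepoint, name) for each suspicious character.

    "Suspicious" here means any non-ASCII character, or any control/format
    character other than a tab. Line and column numbers are 1-based.
    """
    findings = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            is_control = unicodedata.category(ch) in ("Cc", "Cf") and ch != "\t"
            is_non_ascii = ord(ch) > 127
            if is_control or is_non_ascii:
                # Control characters have no Unicode name, so use a fallback.
                name = unicodedata.name(ch, "UNNAMED")
                findings.append((line_no, col, f"U+{ord(ch):04X}", name))
    return findings


# Hypothetical sample: a no-break space hides after the period.
sample = "vessel.\u00a0 Pupils are equally reactive."
for finding in find_hidden_characters(sample):
    print(finding)
```

To check a real file, read it with something like `open(path, encoding="latin-1").read()` (the encoding is an assumption; use whatever the file was saved in) and pass the result to the function. An empty result means the file contains only plain ASCII text, tabs, and line breaks, which would point away from hidden characters as the cause.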
We suspect you are getting nailed by something that we have never liked about NaturallySpeaking: a setting that automatically adds unknown words and partially spelled words to your vocabulary against your wishes. Say "show options" and note the default checkmark in Automatically add words to vocabulary. You can prevent this by removing that checkmark, but there is a downside. Normally, when you make a correction via Spell That, NaturallySpeaking automatically adds any unknown words to your vocabulary. With the checkmark removed, this no longer occurs during correction, and the only way to add vocabulary is directly via the Vocabulary Editor or through the DNS Add Word feature. However, this setting has caused us so much grief that we think it's well worth removing the checkmark from Automatically add words to vocabulary, and we recommend that everyone do so. Because we believe it is best to turn off this feature, KnowBrainer 2007 includes an Add to Vocabulary command that reduces the process of adding words or phrases to your vocabulary to a single efficient step.
I believe I've seen the sort of unintended additions you describe (usually when I later say "how did that meaningless string of letters end up in the dictionary?!").

I didn't think this behavior happened with the steps I had taken:

1. Find one or more finished documents
2. Concatenate into a single document
3. Strip out formatting, etc.
4. dns->Tools->Accuracy Center...->Add words from your documents to the vocabulary
5. Check Find Unknown Words, check Find Known Words with unknown capitalization, check Preview the list
6. Next->Add Document->...->Next

and it appears to (roughly) complain about the first word of many of my sentences. Then I have to go through the whole list trying to determine which words actually should be added to the dictionary and which seem to be false positives.

Initially I had thought the problem was my inadequate cleaning of the sample text file (trailing white space after periods, missing periods, etc.), but I've looked really carefully this time and I don't think I see any such flaws in the sample document I gave it. As for false positives, when I look at the list of potential new words to add to the dictionary, it only appears to find capitalized first words of sentences.

Thanks for any insight
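
Going through the preview list by hand could be narrowed down with a small script that separates words capitalized only at the start of sentences from words that are also capitalized mid-sentence. The Python sketch below is a rough heuristic run on the cleaned text file, outside NaturallySpeaking; it makes no claim about how the Accuracy Center itself decides. The function names are hypothetical, and the sentence-boundary rule (a word counts as sentence-initial if it starts the text or follows `.`, `!`, or `?`) is an assumption that will misjudge abbreviations such as "Dr. Smith".

```python
import re
from collections import Counter


def _previous_visible_char(text, index):
    """Return the nearest non-whitespace character before index, or ''."""
    i = index - 1
    while i >= 0 and text[i].isspace():
        i -= 1
    return text[i] if i >= 0 else ""


def capitalization_profile(text):
    """Count sentence-initial and mid-sentence uses of each capitalized word."""
    initial = Counter()
    mid = Counter()
    for match in re.finditer(r"[A-Za-z][A-Za-z'-]*", text):
        word = match.group()
        if not word[0].isupper():
            continue
        # Assumed rule: start of text or after ., ! or ? means sentence start.
        if _previous_visible_char(text, match.start()) in ("", ".", "!", "?"):
            initial[word] += 1
        else:
            mid[word] += 1
    return initial, mid


def likely_false_positives(text):
    """Words that are capitalized only at the start of sentences."""
    initial, mid = capitalization_profile(text)
    return sorted(word for word in initial if mid[word] == 0)


# Hypothetical sample text in the style of the documents described above.
sample = ("Exam of the vessel.  Pupils are equally reactive.  "
          "The patient takes Tylenol.  Tylenol was stopped.")
print(likely_false_positives(sample))
```

Words returned by `likely_false_positives` are candidates to skip in the preview list, while words that also appear capitalized mid-sentence (like "Tylenol" in the sample) are the ones more likely to deserve a manual decision.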
In theory you're right, but if you so much as open a Select-&-Say enabled document that has a misspelled word in it, there is a fairly high probability that NaturallySpeaking will see that word and add it to your vocabulary. Even when you don't type or dictate a word, NaturallySpeaking may see it in a Select-&-Say document. The only way to prevent unwanted vocabulary is to remove the checkmark from Automatically add words to vocabulary.