KnowBrainer Speech Recognition
Topic Title: iPhone dictation system versus Dragon
Topic Summary: When I use Siri iPhone it appears that it is almost perfect in its dictation. It appears that it is better voice recognition than Dragon. Is this Dragon voice recognition or another system. It seems to me that the iPhone dictates very well in fact better
Created On: 04/10/2012 02:03 PM
Status: Post and Reply

- chas | 04/10/2012 02:03 PM
- Rag | 04/10/2012 03:51 PM
- Lunis Orcutt | 04/10/2012 05:53 PM
- Chucker | 04/11/2012 06:45 AM
- Matt Chambers | 04/11/2012 08:37 AM
- Chucker | 04/11/2012 09:16 AM
- Matt Chambers | 04/11/2012 09:55 AM
- monkey8 | 04/11/2012 02:20 PM
- GDS | 04/11/2012 04:57 PM
- Chucker | 04/11/2012 05:28 PM
- bmac | 04/12/2012 11:03 AM
- Chucker | 04/11/2012 10:31 AM
- wgoren | 08/06/2012 04:48 PM
- Lunis Orcutt | 08/06/2012 10:06 PM
- NeuroDoc | 08/07/2012 12:27 AM

chas | 04/10/2012 02:03 PM
When I use Siri on the iPhone, its dictation appears to be almost perfect. Its voice recognition appears to be better than Dragon's. Is this Dragon voice recognition or another system?
It seems to me that the iPhone dictates very well, in fact with better recognition than the downloaded Dragon dictation system. Has anyone had any experience with this?

Rag | 04/10/2012 03:51 PM
Apparently it's the DNS 11 engine with a massive vocabulary and a voice-independent model. Nuance, it seems, has its fingers in many pies. You have to be connected to the net to use it, which could be one downfall. I'm sure that will change in years to come.
R

Lunis Orcutt | 04/10/2012 05:53 PM
Note that you can use any existing analog microphone on your iPad or iPhone to increase your accuracy when you add the new iPad/iPhone Adapter.
-------------------------

Chucker | 04/11/2012 06:45 AM
Quote: It seems to me that the iPhone dictates very well in fact better recognition than the downloaded Dragon dictation system has anyone had any experience with this.

chas,

First, be careful mixing apples and oranges. Siri uses the Dragon recognizer via access to the Nuance Dragon NaturallySpeaking server, but it is a query application, which is much simpler in terms of accurate recognition of your queries than simply dictating text using a large vocabulary continuous speech recognition application. The latter is many times more complex because there are many more unknown outcomes that Dragon NaturallySpeaking has to interpret correctly. Querying for information is a much simpler process.

Second, you do not have to be connected to the Internet to use Siri. Siri uses either a wireless connection or 3G/4G, and the Siri app automatically sends the query to the Nuance server via those protocols.

Also, what you say has limitations in terms of the length of your query. Dragon NaturallySpeaking has no limitations. That is, you can dictate for as long as you want. But with Siri, if you ask a very long question, you may not get what you expect, which is not a question of accuracy as much as it is a matter of understanding what it is that you're asking in terms of querying the Siri database.

There will come a time in the future when smartphones will be obsolete because we'll carry our computers around in our pockets, and such technology will be many times more sophisticated and powerful than even the current desktop/laptop technologies.

Chuck Runquist
-------------------------

Matt Chambers | 04/11/2012 08:37 AM
Siri is not simply a "query application". On the iPhone 4S, you can use it to enter text into documents, e-mails, and text messages. There is a special key on the keyboard that takes dictation. There is an article today in the Wall Street Journal in the Personal Technology column about its efficacy.

It is correct that the speech recognition engine is supplied by Nuance. I'm sure that it is very similar to the engine that we use in Dragon NaturallySpeaking, but the recognition is performed at central servers, as Chuck said. That should allow a lot more processing power to be applied.
-------------------------

Chucker | 04/11/2012 09:16 AM
Matt,

Thank you for bringing me up to date. I ditched my previous iPhone because it was only 3G. I'm waiting until next month when I get my new iPhone with 4G. So, I only know Siri through Nuance and their development team that worked on Siri SR. I knew that they were using a central server. However, I didn't know that Siri was more extensive in terms of being able to accept dictation. It appears that Siri is similar to FlexT9, which I have on my Android smartphone and am using temporarily.

Question for you: how long can you dictate with Siri? It takes a lot of bandwidth to transmit a significant amount of audio information to the Nuance backend server, even if the audio stream is compressed. FlexT9 breaks it up into utterances. Does Siri do the same? Is it capable of doing the same?

Thanks,
Chuck Runquist
A creative man is motivated by the desire to achieve, not by the desire to beat others. Ayn Rand
-------------------------

Matt Chambers | 04/11/2012 09:55 AM
Chuck,
I don't really know how long the utterances can be with Siri. I haven't used it a lot, for two reasons. First, as you say, it uses a ton of bandwidth and data, so I don't want to use it very often unless I am connected by Wi-Fi. Second, there is no easy way to add custom words and no way to customize your vocabulary, so I find it less accurate than dictating on my computer. I do find that it is pretty accurate for short text messages and e-mails, as long as you're using fairly standard language and not much jargon. Maybe I will have to experiment some more.
Matt
-------------------------

monkey8 | 04/11/2012 02:20 PM
Chuck,
I don't know which Android phone you are using, but if you can update to Android 4 you are no longer restricted to short utterances, and the Google servers can now deal with any length (more or less) of utterance. It's starting to get difficult to keep up.
Lindsay
-------------------------

GDS | 04/11/2012 04:57 PM
Quote: Chuck, I don't know which Android phone you are using but if you can update to Android 4 you are no longer restricted to short length utterances and the Google servers can now deal with any length (more or less) of utterance. It's starting to get difficult to keep up.

That, and Google's done something remarkable: the transcription of dictated text is accurate and instant. I know those of us with flawless dictation styles and supercomputers have been taking this for granted in Dragon NaturallySpeaking for a while, but the bottom line is that instantaneous transcription of "from my mouth to the screen" is available to the average end user for the first time.

I'm long on Nuance specifically, and in general I'm long on speech as the "future" and best method of human-computer interaction. I've got some quibbles with Nuance as a company -- mostly in how it strategically leverages its technology. But that technology is best in class. That said, Google will be a serious competitor. It's no coincidence that the core of Google's speech team is made up of former Nuance staffers (and patent holders).

Quote: I'm waiting until next month when I get my new iPhone with 4G.

You might have to wait longer than that. Apple will be studied for generations as a masterclass in marketing, business, vertical and horizontal integration, yada yada yada. But for all of Apple's deserved kudos in marketing, they're confusing as hell sometimes. The iPhone 4S does not have 4G cellular data speeds. The only Apple device that can access 4G data networks is the new iPad (which is officially referred to as The New iPad).

I don't watch Apple all that closely. I don't know when we'll see a 4G iPhone. But I suspect that it's a few months out, yet... mostly because refreshes of the MacBooks are coming before the end of this month, its new operating system is coming in June, and it likes to stagger its major consumer releases.
-------------------------
Eric Wright
At work: DNS 12 Pro.
At home: DNS 11.5 Pro, KnowBrainer 2011, and Utter Command by RedStart Systems; Dragon Dictate 3 for Mac
Appetite for Dictation - My Blog

Chucker | 04/11/2012 05:28 PM
Eric,
I don't know what other vendors are doing relative to the iPhone 4, but Verizon has given me a guarantee in writing that I will have mine by the first week in May. The guarantee says that if they don't deliver, I get it free. I hope they don't make it.
Chuck Runquist
-------------------------

bmac | 04/12/2012 11:03 AM
Quote: I don't know what other vendors are doing relative to the iPhone 4

Chuck - I believe you meant iPhone 5...
-------------------------
Bill

Chucker | 04/11/2012 10:31 AM
Matt et al.,

Here is an article from Speech Technology Magazine, today's issue, that should clarify many of the questions that anyone has about Siri. The link requires a subscription, which is free, but rather than burdening everyone with the process of obtaining a subscription before being able to read the article, I'm reproducing it here with full credit to the author.

Most consumers have encountered speech recognition largely in call center automation, where the speech recognition can be annoyingly overstructured. Callers might feel they are being prevented from speaking to an agent by the automated system, a further source of annoyance. Part of the negative view of speech recognition has also been its limitations compared to human speech recognition.

Today, the technology seems to be at a tipping point, with both the perception of it and capabilities rapidly moving speech recognition toward an everyday experience.

Apple's Siri is a big part of the attitude change. The model of a friendly personal assistant, easily available, seemingly always with you in the form of a mobile phone, and apparently responding to unstructured speech (natural language), has changed perceptions both of how useful the technology can be and how far speech recognition has come.

The friendly part of the perception is in part due to Apple's marketing genius. When the company emphasized the naturalness of the interaction rather than reminding users they were talking to a computer, I initially thought that the natural language model would encourage pushing the service beyond its capabilities. But Apple foresaw this issue, and made it an advantage. They put in canned clever answers to many of the testing questions that Siri might be asked (from "What is the meaning of life?" to "Will you marry me?"). As a marketing tool and confidence builder, this insight is proving tremendously effective.

The speech technology in Siri is remarkable.
In speech recognition, the mobile phone environment is one of the most difficult, with background noise a typical issue. The iPhone includes noise cancellation, which helps. Beyond that, the speech recognition accuracy for unconstrained speech with very few context restrictions is remarkable.

What accounts for this apparent quantum leap in the capabilities of speech recognition? Part of the accuracy can be attributed to the speech recognition itself, and part to the natural language processing of the transcribed speech. The transcription of the speech to text is displayed so one can see what Siri "heard," and it is remarkably accurate (based on personal experience and the reaction of the marketplace).

What adds to the experience, however, is the post-processing, which can compensate for recognition errors. For example, in a personal experience, the iPhone responded to one spoken request with the text interpretation, "Fries electronics near here," but then, without further interaction, displayed the location of a Fry's Electronics store nearby, the correct interpretation of intent. The natural language processing either made its own match to similar-sounding words or is working from output of the recognizer that includes more than the highest-scoring option.

Another aspect of this performance is the infrastructure used. The speech recognition and natural language processing are done in the network, so they can use the processing power and memory resources of a server, rather than the limitations of the small device. The processing also has access to constantly updated large databases, e.g., local businesses. The core speech recognition probably uses more than a pure statistical language model, with entries such as business names or street addresses represented by a list within the statistical language model, making it easy to update without rebuilding the entire model.
Beyond the significant technology advances, this tipping point is supported by consumer enthusiasm for the personal assistant model of interaction. This attitude change is important because it leads to consumer tolerance of inaccurate responses when they do occur, and a willingness to repeat or restate a request. A subsidiary effect of the personal assistant model is that call centers will face even more resistance to automation if they don't adopt a less-structured natural language approach in their operations, since consumers know now that it is possible. Conversely, they will witness more acceptance of automation when they do adopt it. Companies have a chance to build on this major change in attitudes by recognizing a paradigm shift and adopting the assistant model.

Article from Speech Technology by William Meisel, Ph.D., president of TMA Associates

Finally, whether it be Apple or Nuance, or both, someone has finally grasped what is important and how to properly market speech recognition so as to move it in the direction of mainstream technology. This is how speech recognition should be marketed. It would appear that there is hope after all!!!

Chuck Runquist
-------------------------
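
The "Fries electronics" example in the article quoted above describes post-processing that matches recognizer output against a list of business names. The following is only a rough sketch of that idea, not how Siri or the Nuance servers actually work: the business list, the `best_business_match` helper, the 0.6 threshold, and the use of plain string similarity (instead of real phonetic matching) are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical business list; a real service would query a large,
# constantly updated database instead.
BUSINESS_NAMES = ["Fry's Electronics", "Best Buy", "RadioShack"]


def normalize(text):
    """Lowercase the text and drop everything except letters, digits and spaces."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch == " ").strip()


def best_business_match(hypotheses, names=BUSINESS_NAMES, threshold=0.6):
    """Return (name, score) for the listed business that best matches the
    leading words of any recognizer hypothesis, or None if nothing is close.

    `hypotheses` stands in for the recognizer's N-best list (assumed here).
    """
    best = None
    for hyp in hypotheses:
        words = normalize(hyp).split()
        for name in names:
            name_norm = normalize(name)
            # Compare the name with as many leading words as the name has.
            candidate = " ".join(words[: len(name_norm.split())])
            score = SequenceMatcher(None, candidate, name_norm).ratio()
            if score >= threshold and (best is None or score > best[1]):
                best = (name, score)
    return best


if __name__ == "__main__":
    print(best_business_match(["Fries electronics near here"]))
    print(best_business_match(["what is the meaning of life"]))
```

In this sketch the first call resolves to the Fry's Electronics entry and the second finds no business at all. Passing several hypotheses at once mirrors the article's other possibility, that the language processing sees more than the single highest-scoring recognition result.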

wgoren | 08/06/2012 04:48 PM
Will be getting an iPhone 4S this week. What will DNS do for me? Does it help with hands-free use?
Bill

Lunis Orcutt | 08/06/2012 10:06 PM
We haven't heard any rumblings for a while, but supposedly the iPhone 5 is due to be released any day now. You might want to check on the ETA before purchasing the soon-to-be-dated iPhone 4S technology.
-------------------------

NeuroDoc | 08/07/2012 12:27 AM
I am impressed with the quality of transcription using Siri or the corresponding function on the Nexus 7. The Nexus 7 will work in slo-mo if there is no connection to the servers. What is most surprising to me is the fact that despite the time devoted to training, the masses of documents fed to the program for analysis, and the body of corrections available, Dragon does not perform discernibly better. I would wonder what advantage, if any, the backend servers could have. I am running Dragon on machines with up to 12 GB of RAM, SSDs, and fast 4-core processors, and one overclocked machine.