Now that VoiceXML has been accepted as a powerful way to develop and deploy dial-up voice access applications, readers are starting to ask how easy VoiceXML really is to use, and where to go to learn how to "do it right." In pursuit of answers, I found a company, and a person, with the right voice portal application experience.
Todd Elvins, Ph.D. (yes, he's a real doctor) co-founded Indicast (www.indicast.com), a company at the forefront of VoiceXML application development. Indicast provides private-label, user-customizable voice portal services to telecom, web, and enterprise customers and has amassed a big (they say the biggest) database of professionally produced audio content from sources such as ABCNEWS.com, The Wall Street Journal, etc. Todd's company was the first provider to sell voice portal services to a wireless carrier, including voice-activated dialing, unified messaging, business finder services, driving directions, and other voice-activated telephone services. In other words, they do lots of VoiceXML programming and are uniquely qualified to give app development advice.
Todd says that building a commercial-grade VoiceXML application can take almost as much time and effort as programming it directly against the automatic speech recognition (ASR) engine API (Nuance, SpeechWorks, etc.). Since VoiceXML abstracts the ASR and text-to-speech (TTS) APIs away from the programmer, this may seem a little surprising. But remember, we're talking about what it takes to produce a commercially polished application, not a quickie demo. When you're done, of course, VoiceXML offers additional benefits, including maintainability and (in principle) the ability to port your app to different portal providers and to different ASR and TTS platforms. Such portability can only be ensured, however, if you resist using extensions to the core VoiceXML language, an easier rule to follow with the VoiceXML 2.0 standard that was recently released.
Here is a list of things Todd feels should be considered when it comes time to develop your own application. Follow these guidelines and you should end up with an application that callers will enjoy using:
Attend a course on ASR. Several ASR companies offer classes on developing speech-based applications and building and tuning grammars. This step will save you months of trial and error. A list of speech recognition companies can be found at the VoiceXML Forum's website (www.voicexml.org).
Design your application and voice interface. Defining a usable voice interface is challenging but well worth the effort. Consider hiring a linguist (any ASR company can help) with a solid background in voice interface design to help you define a "voice interaction philosophy," which should be dictated by the type of data being delivered to callers. Next, conduct paper and live caller studies to fine-tune your "content-specific" voice interfaces.
Make use of available VoiceXML tools. These are usually available on voice portal websites and will at least get you started. These tools include VoiceXML debuggers, editors, and grammar builder/checker modules. Expect to hard code some portions of your application, though.
Tune end points. An utterance is a spoken command by a caller. End points are parameters that describe the ASR listening window. Examples include the expected length range of an utterance and expected amount of silence on either end of the utterance. Tuning end point parameters can have a dramatic effect on ASR accuracy. By recording a few dozen utterances from each of a few hundred callers (and in varying environments from quiet to noisy), and then physically listening to and/or transcribing the utterances, you can determine an optimal listening window. You may want to enlist professional help from your ASR provider the first time.
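To make the end point tuning step concrete, here is a minimal Python sketch of how a listening window might be derived once the recorded utterances have been listened to and hand-measured. Everything in it is an assumption for illustration: the `Utterance` record, the `suggest_window` helper, the choice of a nearest-rank percentile, and the default 95 percent coverage figure are hypothetical and do not come from Todd or from any ASR vendor. How the resulting numbers map onto real end point parameters depends on your ASR platform.

```python
from typing import NamedTuple


class Utterance(NamedTuple):
    """Hand-measured timings for one recorded caller utterance (hypothetical layout)."""
    leading_silence_ms: int   # silence before the caller starts speaking
    speech_ms: int            # length of the spoken command itself
    trailing_silence_ms: int  # silence after the caller stops speaking


def nearest_rank(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("need at least one measurement")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, clamped to at least rank 1.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def suggest_window(utterances, coverage=95):
    """Suggest listening-window bounds that cover `coverage` percent of callers.

    This is an illustrative heuristic, not a vendor-recommended procedure.
    """
    low = 100 - coverage
    speech = [u.speech_ms for u in utterances]
    return {
        "max_leading_silence_ms": nearest_rank(
            [u.leading_silence_ms for u in utterances], coverage),
        "min_speech_ms": nearest_rank(speech, low),
        "max_speech_ms": nearest_rank(speech, coverage),
        "max_trailing_silence_ms": nearest_rank(
            [u.trailing_silence_ms for u in utterances], coverage),
    }
```

In practice you would feed `suggest_window` the measurements from all of your recordings, perhaps once per environment (quiet office, car, street), and compare the suggested bounds before settling on values to try with your ASR engine. Percentiles are used rather than the raw minimum and maximum so that a handful of unusual callers does not stretch the window for everyone.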
Next month, we continue Todd's VoiceXML checklist.
Chris Bajorek is co-founder of CT Labs, an independent full-service converged communications and IP Telephony product testing and certification lab. He can be reached at cbajorek@ct-labs.com.
CommWeb Special: Voice Portals
CommWeb has commissioned extensive lab tests on the industry's leading VoiceXML voice portals. You can read the complete results, plus our analysis, online at www.commweb.com/article/COM20010912S0007.