Speech Recognition Goes Mainstream
By Masha Zager
In the past year, speech recognition technology seems to have made its long-awaited breakthrough to mainstream acceptance. “It’s not a science-fair project anymore,” says Brian Garr, program director for conversational solutions with IBM (Armonk, NY). Other vendors agree that speech technology has made major inroads into call centers. More important, they say that companies are beginning to develop strategies for implementing speech technology rather than experimenting with a simple application or two.
What accounts for this turnaround? Vendors offer competing explanations. Not surprisingly, they have also reached differing conclusions about how to design, enhance and market their products.
The speech recognition industry is a complex ecosystem of products and services. At its base is the speech recognition engine, the software that converts digitized sound waves into written words. Other significant software products include:
- text-to-speech engines;
- voice platforms, or development and runtime environments;
- toolkits for building, maintaining and tuning speech recognition applications;
- tools for monitoring and testing applications; and
- “packaged” speech applications that call centers can customize and implement within a few months.
Speech recognition service providers include integrators, designers, developers, testers and other specialists who help companies implement speech recognition in their call centers.
The one component that has changed very little is the speech recognition engine itself. Vendors have tweaked their engines to work more efficiently and reliably, but they haven’t needed to make substantial improvements recently. Kevin Shaughnessy, product manager with Microsoft (Redmond, WA), whose Speech Server includes a speech recognition engine, says that the accuracy of these engines is now a nonissue.
“The technology is a commodity,” explains Bruce Balentine, executive vice president of Enterprise Integration Group (EIG; San Ramon, CA). “The value is in the application, not in the technology.”
This article outlines trends in the development of speech recognition software. In another feature article next year, we’ll focus on services that enable call centers to deploy speech recognition as a hosted or network-wide service.
Increasing Call Automation
Speech recognition vendors have traditionally marketed their software as a way to automate certain types of customer service calls and cut the labor costs associated with them. Some of the packaged speech applications coming onto the market have very high call completion rates. For example, TuVox (Cupertino, CA) sells speech applications that let callers look up fares and schedules, make payments and return products. Their call completion rates range from 14% to 43% above those of the touchtone applications they replace. Either the caller or the TuVox system can decide to route a call to an agent. When that happens, the agent sees a transcript of the automated dialogue that preceded the transfer.
Another reason more calls can be automated is that the range of speech recognition applications is steadily, if slowly, increasing. One well-received new application is the customer satisfaction survey. Nuance’s (Burlington, MA) packaged survey application, for example, can survey customers over the phone after a live conversation or an automated transaction. Vendors report that other successful new speech applications include activation of wireless add-on services and simple technical troubleshooting.
But interest in packaged applications may have peaked, says Peter Mahoney, vice president of worldwide marketing for Nuance, which sells packaged applications to the health, utilities and automotive industries.
In the last six to twelve months, according to Mahoney, enterprises have shown more interest in raising call completion rates for their customized applications. Nuance has accommodated these customers by making it easier to create foolproof applications. After analyzing millions of calls, Nuance concluded that callers tend to trip up speech scripts in two situations. One is when they offer more information than the system requested (System: “Where are you flying from?” Customer: “I’m leaving from Boston on Tuesday.”). The other is when they attempt corrections the system doesn’t understand (“No, not Austin – Boston!”). The most recent version of Nuance’s OpenSpeech Dialog design tool helps designers create applications that handle these and other common scenarios gracefully. (Nuance merged with another speech recognition software developer, ScanSoft, last month; the combined company is known as Nuance.)
Other vendors have also tried to boost call completion rates. For instance, LumenVox (San Diego, CA) recently improved its Speech Platform’s end-of-speech detection and error-handling capabilities. Voxify (Alameda, CA) uses its Conversation Engine to create its packaged applications, and the engine lets callers choose their own paths through transactions.
Design tools alone can’t create foolproof applications; designer expertise is also required. After several years of experience, the industry has developed best practices and designers have become more skilled. Vendors can now offer expert design advice, such as that given by Edify’s (Santa Clara, CA) “design collaborative” of human factors specialists.
Finally, better and earlier testing has also increased completion rates. Calls can fail for many technical reasons, including incorrectly configured equipment, poorly designed databases or problems with telecom networks.
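Nuance’s airline example illustrates what designers call mixed-initiative dialogue. The application should accept whatever the caller volunteers, not just the answer to the current question, and it should handle explicit corrections. The Python sketch that follows is a toy illustration of that idea. It is not Nuance’s method or the OpenSpeech Dialog API. The sketch assumes the engine has already produced a text transcript. The city and day lists, the regular-expression matching and the function names `fill_slots` and `next_prompt` are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical toy vocabularies for illustration only. A production system
# would rely on the speech engine's grammars and semantic interpretation,
# not on regular expressions over a transcript.
CITIES = ["Austin", "Boston", "Denver"]
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def fill_slots(utterance: str, slots: dict) -> dict:
    """Fill every slot the caller mentions, not only the one the prompt asked for."""
    updated = dict(slots)
    # A correction such as "No, not Austin - Boston!" names the rejected value
    # right after "not"; remember it so it is not re-selected below.
    correction = re.search(r"\bnot\s+(\w+)", utterance, re.IGNORECASE)
    rejected = correction.group(1).lower() if correction else None
    for city in CITIES:
        mentioned = re.search(rf"\b{city}\b", utterance, re.IGNORECASE)
        if mentioned and city.lower() != rejected:
            updated["origin"] = city
    for day in DAYS:
        if re.search(rf"\b{day}\b", utterance, re.IGNORECASE):
            updated["date"] = day
    return updated


def next_prompt(slots: dict) -> Optional[str]:
    """Ask only for information that is still missing; None means the form is complete."""
    if "origin" not in slots:
        return "Where are you flying from?"
    if "date" not in slots:
        return "What day are you leaving?"
    return None


# Caller volunteers both origin and date in answer to the first question.
slots = fill_slots("I'm leaving from Boston on Tuesday.", {})
print(slots, next_prompt(slots))  # {'origin': 'Boston', 'date': 'Tuesday'} None
# Caller corrects a misrecognized origin.
print(fill_slots("No, not Austin - Boston!", {"origin": "Austin"}))  # {'origin': 'Boston'}
```

`next_prompt` asks only for missing slots. A caller who answers the first question with both city and day therefore skips the second prompt entirely. A real application would also confirm low-confidence recognitions rather than accept them silently.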
Testing suites, such as Empirix’s (Bedford, MA) Hammer systems, typically verify that call centers can route high volumes of calls and handle large numbers of IVR transactions accurately. Empirix now includes speech recognition among the types of applications that the Hammer systems test. Empirix also offers OneSight Voice Watch, a hosted service that tracks whether your call center’s speech recognition, routing and IVR systems are behaving as you expect. Empirix also provides a nonhosted variant of OneSight Voice Watch, called OneSight Voice Engine, that you implement within your call center.
Intervoice (Dallas, TX), a developer of speech recognition tools, offers a usability testing service. Through this service, panels of consumers who are demographically similar to your customers report how easy your call center’s speech recognition applications are to use.
Despite all the effort that call centers have invested in increasing call completion rates, some observers believe that the business case for speech recognition has little or nothing to do with reducing the time customers spend on the phone with live agents. IBM, for instance, estimates that call completion rates for speech recognition average 10% to 15% higher than those for touchtone. EIG doesn’t find that they are any higher. Dan Miller, co-founder and senior analyst at the San Francisco-based research and analysis firm Opus Research, argues that touchtone applications already handle most of the transactions that lend themselves to automation. He adds that longstanding customers are often expert, and satisfied, touchtone users. Miller even makes a case for building speech applications that mimic the touchtone applications customers already know, so that “the shock of the new” won’t drive customers away.
Delighting Customers
If the prospect of automating more calls doesn’t tell the whole story, what else is driving the adoption of speech recognition? “Companies are getting more serious about the customer experience,” says Lynda Smith, chief marketing officer with Nuance. Several other vendors agree.
Some research backs up this theory. A recent study commissioned by Genesys (Daly City, CA) found the following:
- 85% of respondents find speech recognition satisfactory or better.
- 65% prefer speech recognition to touchtone.
- Among the surveyed organizations that used speech recognition, 62% saw an increase in customer satisfaction.
Smith admits that callers sometimes need education before they appreciate the benefits of speech recognition. “People still want to talk to people,” she says. To get customers over this hurdle, some companies mail them instructions for using speech recognition systems. Others enlist call center agents to promote these systems and encourage callers to use them.
Marie Jackson, Edify’s vice president of marketing, reports that many companies use speech applications to raise customer satisfaction. They do so by reducing wait times for high-volume inquiries and by keeping service available around the clock. The “persona,” or the voice character with whom callers speak, presents another “opportunity for customers to confirm that they’re doing business with the right people,” Jackson says.
As with touchtone IVR systems, callers typically reach speech recognition systems before they reach agents.
Touchtone IVR systems present callers with a menu of digits to press to reach specific departments. Speech recognition systems, by contrast, generally pose a more open-ended question: “How can I help you?” If the caller doesn’t know how to respond, the system offers examples of possible answers, such as a request for technical support or for an account balance.
A new way of using speech recognition to boost customer satisfaction is to integrate it with information about callers from your customer relationship management (CRM) software. With this approach, your speech recognition system could greet callers by name and speak to them in their preferred languages. It could also keep track of the products and services callers use. In addition, it could recall which types of automated transactions callers most often conduct by phone or on-line, and when those transactions occurred. A recent Intervoice survey of consumers found the following:
- 88% of respondents liked being greeted by name.
- 72% liked having their language preference remembered.
- 85% were pleased that speech recognition systems knew about transactions they had completed on-line.
Even if callers like speech recognition, it’s not clear that they are happy enough to justify investments in speech technology. The notion that the business case for speech recognition rests on customer loyalty has its naysayers. “The vendors want to sell user delight, but they can’t find callers who are delighted,” says EIG’s Balentine. In his view, there is a “conflict of interest between the company and the caller. The end user wants service, and the enterprise wants the advantage of delivering service without actually having to deliver it.”
Easier Deployment
Speech recognition applications are notoriously difficult to deploy.
Up to half the work in any deployment involves “tuning,” or adding key words and phrases based on what callers actually say. Tuning continues indefinitely and must be repeated every time the application changes. As a result, many companies remain permanently tied to the consultants who designed and installed their applications.
There are ways to reduce deployment costs and time, such as implementing packaged applications. TuVox, for example, says its speech recognition applications can typically be deployed in 60 to 90 days, and that customers can manage the applications themselves once they are installed. TuVox has also added tools such as a dialog generator that generates code from companies’ knowledge bases and call-flow diagrams. “What takes weeks to be custom coded, we can automate in several hours,” says Azita Martin, TuVox’s vice president of marketing.
Speech recognition vendors are also expanding their grammars, or the lists of utterances that speech engines can recognize, to reduce the amount of tuning required. These vendors offer grammars that account for the different ways callers might answer basic questions. (Speech applications need to be told that “Definitely not!” means “No.”) Vendors can also provide grammars that include terminology specific to certain industries, such as financial services.
Yet another strategy for speeding up deployment and maintenance is to make design and tuning tools simpler to use. IBM’s third-generation tools, according to Garr, enable subject matter experts to take part in creating applications. Microsoft has adapted its Visual Studio development environment for use with Speech Server, hoping to take advantage of widespread Visual Studio expertise. As a result, some Microsoft customers have begun maintaining their own speech applications with in-house Visual Studio developers. “Developers who do Web applications are 90% of the way there,” Shaughnessy says.
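The idea of grammars and tuning can be made concrete with a small sketch. It assumes a grammar is simply a mapping from each canonical meaning to the phrasings that express it, and that tuning means adding phrasings heard in real calls. Real platforms use dedicated grammar formats, such as the W3C’s Speech Recognition Grammar Specification (SRGS), and the engine does the matching. `YES_NO_GRAMMAR`, `interpret` and `tune` are hypothetical names used only for illustration.

```python
import string
from typing import Optional

# Hypothetical yes/no grammar: each canonical meaning maps to the phrasings
# that should be interpreted as it. Real voice platforms express grammars in
# their own formats with semantic tags; a Python dict stands in for that here.
YES_NO_GRAMMAR = {
    "yes": {"yes", "yeah", "yep", "sure", "that's right", "absolutely"},
    "no": {"no", "nope", "no way", "definitely not", "not really"},
}

# Strip punctuation but keep apostrophes so "that's right" still matches.
_PUNCTUATION = str.maketrans("", "", string.punctuation.replace("'", ""))


def normalize(utterance: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace."""
    return " ".join(utterance.lower().translate(_PUNCTUATION).split())


def interpret(utterance: str, grammar: dict) -> Optional[str]:
    """Return the canonical meaning, or None if the utterance is out of grammar."""
    text = normalize(utterance)
    for meaning, phrases in grammar.items():
        if text in phrases:
            return meaning
    return None


def tune(grammar: dict, meaning: str, heard: str) -> None:
    """A tuning step: add a phrasing that callers were actually heard using."""
    grammar.setdefault(meaning, set()).add(normalize(heard))


print(interpret("Definitely not!", YES_NO_GRAMMAR))  # no
print(interpret("Nah, forget it.", YES_NO_GRAMMAR))  # None (out of grammar)
tune(YES_NO_GRAMMAR, "no", "Nah, forget it.")
print(interpret("Nah, forget it.", YES_NO_GRAMMAR))  # no
```

The exact-match lookup is deliberately simple. It shows why an out-of-grammar reply such as “Nah, forget it” fails until someone tunes the grammar. That repeated, manual step is the ongoing labor the article describes.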
With more user-friendly tools, touchtone application developers can move into speech application development more easily. As Opus Research’s Miller points out, companies can now tap a pool of personnel whose touchtone expertise carries over to speech recognition applications.
EIG’s Balentine, however, believes vendors are still far from creating the ideal platform. “There’s too much technology and not enough thinking about how conversations take place,” he says. EIG, which currently provides only professional services, is now developing its own voice platform. The company hopes this software will support speech application developers much as Microsoft Windows supports PC application developers.
Convergence
The biggest change in the speech recognition industry over the last few years has been the move toward open standards and convergence with other IT assets. Traditional touchtone IVR systems were proprietary and unrelated to enterprise IT (and to each other). They existed largely in silos that IT departments paid little attention to. That isolation is ending as IT departments insist that call center technology follow their organizations’ IT standards.
One trend in the evolution of IT is that, as technologies mature, software takes over functions that used to require dedicated hardware. As more hardware supports Internet Protocol (IP) standards, a company can also share IT and telephony resources on the same network. Deploying speech recognition software over a network reduces the amount of equipment needed and supports business continuity. For example, the latest release of the IP-based Avaya Voice Portal from Avaya (Basking Ridge, NJ) can automatically move licenses and workload from a failed server to other servers.
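The article does not describe how Avaya Voice Portal reassigns licenses and workload after a failure. The following is therefore only a generic sketch of the idea, and it rests on three assumptions. First, each voice server holds a count of speech licenses drawn from a shared pool. Second, a failed server’s licenses are spread as evenly as possible over the survivors. Third, new calls go to the server with the most spare licenses. The server names and both functions are hypothetical.

```python
from typing import Dict, Optional


def redistribute_licenses(licenses: Dict[str, int], failed: str) -> Dict[str, int]:
    """Move a failed server's licenses to the survivors as evenly as possible."""
    survivors = sorted(name for name in licenses if name != failed)
    if not survivors:
        raise ValueError("no surviving servers to take over the licenses")
    result = {name: licenses[name] for name in survivors}
    share, remainder = divmod(licenses[failed], len(survivors))
    for i, name in enumerate(survivors):
        # The first `remainder` survivors each take one extra license.
        result[name] += share + (1 if i < remainder else 0)
    return result


def route_call(active: Dict[str, int], licenses: Dict[str, int]) -> Optional[str]:
    """Pick the server with the most spare licenses; None if every server is full."""
    spare = {name: licenses[name] - active.get(name, 0) for name in licenses}
    if not spare:
        return None
    best = max(spare, key=lambda name: spare[name])
    return best if spare[best] > 0 else None


pool = {"vp1": 10, "vp2": 10, "vp3": 11}
after = redistribute_licenses(pool, failed="vp2")
print(after)                                      # {'vp1': 15, 'vp3': 16}
print(route_call({"vp1": 15, "vp3": 4}, after))   # vp3
```

Keeping the license total constant is what preserves capacity after a failure. Spreading licenses evenly is just one simple policy; a real platform might instead weight servers by hardware capacity.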
Just as IP telephony systems depend on standards to communicate over IP networks, vendors in different niches of the speech recognition ecosystem need standards to ensure their products work together. During the last year, the VoiceXML Forum has certified twelve vendors’ voice platforms. The Forum is an industry organization that oversees the Voice Extensible Markup Language (VoiceXML) standard. Other vendors say they support VoiceXML. (Microsoft is committed to SALT, or Speech Application Language Tags, another XML-based markup.) Last year, the World Wide Web Consortium (W3C) also adopted Call Control XML, or CCXML, which provides telephony call control support for VoiceXML and other systems that automate transactions over the phone. Vendors are now beginning to use this standard as well.
According to IBM’s Garr, VoiceXML and CCXML matter for more than software interoperability. They also move the business logic off proprietary IVR hardware and onto less expensive off-the-shelf software. “The IVR [system] used to do the heavy lifting,” Garr says. “Now it answers the phone and passes it off to the application server [where the voice platform now resides].” This, in turn, means that speech applications can be located together with other customer service applications.
VoiceXML is based on XML, the markup language widely used to integrate Web-based applications. As a result, speech applications can join what IBM calls the “service-oriented architecture,” a common framework for delivering information over multiple channels. Through such an architecture, companies can reuse the XML-based Web self-service scripts they have developed in their speech recognition applications. Ideally, a customer who looks up a bank account balance receives the same answer from a Web site, from a Palm Pilot, at an automated teller machine or over the phone.
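A minimal Python sketch of the “same answer on every channel” idea follows. It assumes a single shared function (`get_balance`, hypothetical) holds the business logic, and each channel only formats that function’s result. The voice renderer emits a minimal VoiceXML 2.0 document that a voice platform could play. JSON stands in for the Web channel. The account number and balance are made-up sample data.

```python
import json
from xml.sax.saxutils import escape

# Hypothetical in-memory system of record; in a real deployment this would be
# a shared back-end service that every channel calls.
BALANCES = {"12345": 1520.75}


def get_balance(account_id: str) -> float:
    """The one piece of business logic, shared by every channel."""
    return BALANCES[account_id]


def render_web(account_id: str) -> str:
    """Web or mobile channel: return the answer as JSON."""
    return json.dumps({"account": account_id, "balance": get_balance(account_id)})


def render_voice(account_id: str) -> str:
    """Voice channel: wrap the same answer in a minimal VoiceXML 2.0 document."""
    text = f"Your balance is {get_balance(account_id):.2f} dollars."
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        f"  <form><block><prompt>{escape(text)}</prompt></block></form>\n"
        "</vxml>\n"
    )


print(render_web("12345"))
print(render_voice("12345"))
```

Both renderers call the same function. A change to how balances are computed therefore reaches the Web site and the phone channel at once. That is the practical payoff of moving business logic off proprietary IVR hardware.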
In the end, the recent adoption of speech recognition may result from two developments coming together. One is the cyclical upgrading of call center technology. The other is the convergence of speech recognition with the rest of IT. As call centers and IT operations establish themselves as crucial parts of companies’ business strategies, the two are more interdependent than ever. That may explain why IT departments increasingly insist that technology deployed in call centers follow their companies’ overall IT standards. “If it doesn’t fit into the IT strategy, you’re not going to get the money,” says Lawrence Byrd, director of communications applications with Avaya, referring to call center technology. “It’s not someone else’s problem; you need to be familiar and comfortable with it.”
Nuance’s Mahoney points out that companies that upgraded their call centers for Y2K are finding the technology they bought nearing the end of its useful life. Any new technology call centers buy now will have to meet not only the requirements of IT but also the goals of the business and the needs of customers. That is especially true of technology that callers use directly. In an environment where automation must not only help companies save money but also help them retain customers, speech recognition is a worthwhile investment.
Masha Zager is a freelance writer.
Enhancing Revenue
Can speech recognition technology help companies sell more products? Some companies attempt to up-sell entirely through speech applications. Others use these applications to promote additional products or services to existing customers. The speech recognition system gauges a caller’s interest and then directs him or her to an agent, who is responsible for closing the sale.
Voxify (Alameda, CA), for instance, plans to add business analytics as a complementary offering in the near future. The plan rests on the premise that mining automated calls for business intelligence can also help call centers generate revenue. Adeeb Shana, Voxify’s CEO, cites the example of a speech recognition application that lets callers book hotel reservations. “We found that customers would look at a certain number of room rates before hanging up,” he says. “If you know they’re only going to look at three rates, you’d better make the third one your best offer!”
More Speech Rec Resources
Want to learn more about speech recognition software, services and trends? Check out our e-mail newsletter about the latest developments in speech recognition. Simply visit www.callcentermagazine.com. The list of free newsletters you can sign up for, including the speech recognition newsletter, appears directly to the right of the Call Center Magazine logo at the top of the Web page.
Do You Hear What I Hear?
The following companies offer software, equipment and advice for implementing speech recognition in call centers.
Aspect Software
Avaya
Edify
Empirix
Enterprise Integration Group (EIG)
Genesys
IBM
Intervoice
LumenVox
Microsoft
Nuance (merged with ScanSoft)
Sterling Audits
Syntellect
TuVox
Unveil Technologies
Voxify