Will speech analytics revolutionize the call center? Or is it overhyped? We dig into the details and cut through the hype for you.
Speech analytics has the potential to transform certain aspects of the call center business. Labor costs are the primary driver of a call center operation's financial success, and speech analytics can be used to reduce those costs and improve efficiency.
Speech analytics in the call center has also been overhyped for some time, with promises of a magic voice whispering in an agent's ear, telling the agent exactly how to handle an irate customer.
Let’s dig in and separate the hype from the reality of speech analytics in the call center.
What is speech analytics?
First, what is speech analytics? Speech analytics is the use of artificial intelligence techniques to convert human speech into text so the text can be analyzed and some action can be taken (or recommended) based on that analysis.
There are a few major concepts wrapped up in the term "speech analytics," and it's important to grasp each of them to see how speech analytics can be used in a call center environment.
In general, speech analytics = ASR + NLP (or, in plain language, speech analytics is speech-to-text conversion plus text analysis using machine learning or natural language processing).
Automatic speech recognition (or “ASR”)
In order for speech analytics to work in a call center, there needs to be some automatic speech recognition software that is used to convert speech into text (either in real time or in a batch).
Luckily, automatic speech recognition software has advanced by leaps and bounds over the past 10 years, and there are now a large number of ASR solutions that work well in a call center environment. Just a few years ago, ASR models needed to be trained on specific voices to achieve a high degree of accuracy. In a call center environment, that kind of training isn't practical: each call may involve a different caller and any of a variety of agents.
Currently available ASR software models are able to perform highly accurate speech recognition in call center environments.
Once speech has been converted to text, what happens next?
Converting speech to text is an important first step in speech analytics. At this point, we essentially have a transcript of a call. For some call center applications, this may be sufficient. For example, having a transcript of every call allows a call center to perform text-based searches to find particular calls.
But it's likely that your call center application will need further processing to extract real value from these transcripts.
Natural Language Processing (or “NLP”)
Most call center speech analytics applications use some form of natural language processing, not only to improve the quality of the speech recognition but also to allow the software to "understand" the meaning or intent of each speaker.
For example, in a phone call involving a call center agent and a caller, one or both parties may not be speaking clearly or using full sentences, making it difficult for the speech analytics software to properly convert the speech to text or understand the intent of either party.
NLP is used to assist in speech recognition as well as to tag words by "part of speech" usage (also called "grammatical tagging"). Part-of-speech tagging is the process of determining how a word or phrase is used (e.g., is it a noun? A verb?).
Natural language processing can also include performing sentiment analysis, which is an attempt to extract subjective information from a conversation. Many NLP models are able to predict a speaker’s attitude, emotion, or other feelings from a text.
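To make sentiment analysis concrete, here is a deliberately simplified sketch in Python. A real NLP model predicts sentiment with trained classifiers; the keyword lists below are just a hypothetical stand-in to show the shape of the input and output.

```python
# Toy sentiment scorer: a stand-in for a real NLP model, which would
# use trained classifiers rather than hand-picked keyword lists.
NEGATIVE = {"angry", "frustrated", "cancel", "terrible", "refund"}
POSITIVE = {"thanks", "great", "helpful", "perfect", "appreciate"}

def score_sentiment(utterance: str) -> str:
    """Return 'positive', 'negative', or 'neutral' for one utterance."""
    words = set(utterance.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A production system would score whole conversations (and track how sentiment shifts over the call), but the contract is the same: text in, predicted attitude out.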
NLP processing is generally performed after (or in conjunction with) the conversion of speech to text.
A subset of natural language processing is natural language understanding (or “NLU”). NLU performs analyses of text and speech to determine the intent or meaning of a sentence or paragraph.
Natural language processing and understanding also may be used to identify speakers in a conversation. For example, in a typical call center call, the speakers include an agent and a customer. It is important (from a speech analytics perspective) to identify the speech of the agent as well as the speech of the customer. An NLP concept called “speaker diarization” is used to perform this separate identification of each participant in the call.
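In many call platforms the agent and the caller actually arrive on separate audio channels, so the "who said what" part of diarization can be sketched as simple grouping by channel label. (True diarization on mixed audio is much harder, typically using voice embeddings and clustering; the function names below are illustrative, not from any particular library.)

```python
from collections import defaultdict

def group_by_speaker(segments):
    """Group transcript segments by speaker.

    segments: list of (speaker_label, text) tuples in call order,
    e.g. the per-channel output of an ASR engine.
    """
    by_speaker = defaultdict(list)
    for speaker, text in segments:
        by_speaker[speaker].append(text)
    return dict(by_speaker)
```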
Once a speech analytics application has performed automatic speech recognition (to generate text) and operated on that text using NLP (and its subset, NLU), a phone call between an agent and a customer becomes actionable.
The ASR resulted in the generation of a transcript, which can be searched or analyzed.
The natural language processing provided context and meaning so that a computer can understand the discussion and can be programmed to perform different actions based on the context of the discussion.
Speech analytics can work in “real time” (to analyze a phone call in progress) as well as in a “batch” mode of operation (to analyze recordings of phone calls).
In summary, speech analytics in a call center generally produces a transcript of a call plus some machine-tagged understanding or description of it (typically including parts of speech, intent, and speaker diarization).
For those who want to dig deeper into the tech behind NLP and ASR, this video is an excellent introduction.
How does speech analytics work?
We won’t go down the rabbit hole of details here. Instead, let’s take a practical view of things and explain how a typical call center may implement speech analytics.
Batch speech analysis
Batch speech analysis processing is the most straightforward way to implement speech analysis in a call center. Most call centers already record their calls. Batch speech analysis simply takes those call recordings and converts the recordings to text and associated natural language processing attributes.
Often, batch speech analysis is done using a third-party system outside of a call center's existing software platform.
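The batch workflow can be sketched as a simple loop over stored recordings. Here `transcribe()` and `analyze()` are hypothetical stubs standing in for whatever ASR engine and NLP service your platform actually calls.

```python
def transcribe(recording_path: str) -> str:
    # In practice this would call an ASR engine on the audio file;
    # stubbed here for illustration only.
    return f"transcript of {recording_path}"

def analyze(transcript: str) -> dict:
    # In practice this would run NLP (sentiment, intent, tagging).
    return {"transcript": transcript, "sentiment": "neutral"}

def process_batch(recording_paths):
    """Transcribe and analyze a batch of stored call recordings."""
    return [analyze(transcribe(path)) for path in recording_paths]
```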
Real-time speech analysis
Real-time speech analysis in a call center is more complex. The speech analysis software needs to have access to the audio stream as it occurs. This can be performed by bridging the speech analysis software into a call (e.g., via a conference) or by streaming the audio over a separate port to the speech analysis software.
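The streaming idea can be sketched as feeding small audio chunks to an incremental recognizer while the call is live. `recognize_chunk()` below is a hypothetical stand-in; real systems stream audio to an ASR service over RTP or a WebSocket and receive partial transcripts back.

```python
def recognize_chunk(chunk: bytes) -> str:
    # Hypothetical incremental recognizer; a real one would return
    # partial ASR results for this slice of audio.
    return chunk.decode(errors="ignore")

def stream_analysis(audio_chunks):
    """Yield a running partial transcript as each audio chunk arrives."""
    partial = []
    for chunk in audio_chunks:
        partial.append(recognize_chunk(chunk))
        yield " ".join(partial)
```

The key difference from batch mode is latency: each partial transcript is available while the call is still in progress, which is what makes in-call alerts and agent assistance possible.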
We will review the technical details of batch and real-time speech analysis in a separate post.
For the purposes of this article, think of the two options this way: Are you looking to perform after-the-fact analysis on calls? Then batch processing is probably the way to go. Are you looking to provide in-call assistance or information? Then you'll need real-time speech analysis.
Speech analytics uses in a call center
Here are a number of speech analytics use cases that actually work today and that you can implement in your call center. (These are real, not hype.)
- Perform automated quality assurance (“AQA”). You can use speech analytics to analyze some or all of your calls and automatically apply quality assurance rules to identify calls that violate those rules. AQA is commonly performed on stored call recordings, which are transcribed for text analysis.
- Perform sentiment analysis. You can use speech analytics to predict the sentiment of a caller. Are they angry? Happy? Frustrated? Real-time speech analytics can signal the caller's sentiment to the agent (or to a supervisor), and sentiment analysis can also trigger special scripts or messaging the agent can use in response.
- Perform content moderation. You can use speech analytics in your call center to bleep or cut out sensitive or offensive content. For example, you can establish rules to redact curse words or sensitive data (e.g., PCI data or health care-related data).
- Automatically identify trigger words. Real-time speech analytics can identify trigger words (words that require special handling). For example, a sales call center may monitor for words like “returns” or “attorney” or “attorney general”. When those words are detected, a supervisor can be alerted or a special script can be presented to the agent to handle the situation properly.
- Translate calls. Speech processing can be used in a call center to perform real-time translation. Call center agents are sometimes faced with callers who speak a different language; real-time speech processing can translate the caller's (and the agent's) speech so the call can be handled or transferred to the appropriate agent or destination.
- Enable self service. Speech enabled IVRs are a great use of speech recognition technology.
- Assist agents. Speech analytics tools are often used to provide agent assistance (e.g., prompting agents with information during a call using call whispering).
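Two of the use cases above, redaction and trigger words, can be sketched together as rules applied to a transcript. The patterns below are deliberately crude illustrations (a run of four or more digits standing in for PCI data); real deployments use far richer rule sets.

```python
import re

# Illustrative rule set: redact long digit runs (a crude stand-in for
# card numbers / PCI data) and flag trigger words for a supervisor alert.
TRIGGER_WORDS = {"returns", "attorney", "attorney general"}
DIGIT_RUN = re.compile(r"\b\d{4,}\b")

def moderate(transcript: str):
    """Return (redacted transcript, set of trigger words found)."""
    redacted = DIGIT_RUN.sub("[REDACTED]", transcript)
    triggers = {w for w in TRIGGER_WORDS if w in transcript.lower()}
    return redacted, triggers
```

In a real-time deployment the same rules would run on each partial transcript as the call progresses, so the alert fires while the caller is still on the line.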