What marketers must know about natural language processing

This is the first post of a series on language technologies. Why do marketers need to know about them? Because their business relies on how well they understand their customers. In today’s digital and mobile world, this means listening to what customers say on social media, in free-form messages, even in search queries. Businesses are drowning in text. This is where language processing solutions come in handy, extracting actionable signal from the noise.

In the spirit of the upcoming Father’s Day, did you know that daughters send more Father’s Day tweets to their dads than sons do? And that apparel, special meals, alcohol, and barbecue tools and meat are the top four gift ideas mentioned in those tweets? Uncovering insights like these, which are invaluable for any retailer counting on a big Father’s Day, would not be possible without Natural Language Processing (NLP).

Natural language processing, also called “text analytics”, is the technology that allows machines to understand what people write or say conversationally, at a scale and speed that greatly exceed the abilities of human experts. In its Father’s Day Twitter analysis, the text analytics company Luminoso looked at 92,450 tweets. It would take an average person, working eight hours a day, about 21 days, or an entire work month, to read through all those tweets. And that is before any of the work of generating insights.
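To make the reading estimate concrete, here is a small back-of-the-envelope sketch in Python. It only uses the figures quoted above (92,450 tweets, 21 days, eight hours a day); the function name is made up for illustration, and the result is the reading pace those figures imply, not a measured one.

```python
# Figures quoted in the article.
TWEETS = 92_450
WORK_DAYS = 21
HOURS_PER_DAY = 8


def implied_reading_pace(tweets, days, hours_per_day):
    """Return (tweets per hour, seconds per tweet) implied by the estimate."""
    total_hours = days * hours_per_day
    tweets_per_hour = tweets / total_hours
    seconds_per_tweet = 3600 / tweets_per_hour
    return tweets_per_hour, seconds_per_tweet


per_hour, sec_per_tweet = implied_reading_pace(TWEETS, WORK_DAYS, HOURS_PER_DAY)
print(f"{per_hour:.0f} tweets per hour, about {sec_per_tweet:.1f} seconds per tweet")
```

In other words, the estimate assumes a person who reads a tweet roughly every six and a half seconds, all day, for a month, without pausing to think about any of them.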

Big Data became a buzzword in the business world a long time ago. Since then, organizations have started learning how to draw actionable insights from data that’s quantitative and structured. Now the time has come for Big Text.

Big Text is this enormous universe of documents, e-mails, free-text forms, social media posts, product reviews, call center logs, you name it, generated in natural language. These documents are inherently vague, ambiguous and context-dependent, because that’s what human communication is. This is unstructured data: it does not come in any kind of pre-defined data model readily understood by machines. According to a new IDC study, unstructured data makes up as much as 90 percent of all digital information. Can you say that you read a book if you only understood 10 percent of it? Can you say you know how your business operates if you can only measure 10 percent of it? Organizations that are not leveraging unstructured data risk losing business to more data-focused competitors. And, again, most unstructured data comes in the form of natural human communication.



How hard is it for a machine to make sense of a phrase in natural language? Imagine you are selling dresses and your potential customer types “fabulous cocktail dress for a wedding” into Twitter, Facebook, her blog or your site search field. We humans understand what this person is looking for because we know that “dress” is a piece of clothing, that “cocktail” is its attribute and not a drink, and that “wedding” is a formal occasion, which usually has guests who are expected to dress up. How do we teach machines to understand this query so they can facilitate discovery?

When people think about machines understanding human communication, the first thing that usually comes to mind is a virtual personal assistant like Siri or a smart question-answering system like the Jeopardy!-winning IBM Watson. However, natural language processing is much more pervasive. When a customer routinely searches for products online, for example, language technologies are quietly working in the background to generate relevant search results. To understand a search query, machines do “query parsing”: they break the query up into words and work out how those words relate to each other using statistical and machine learning heuristics.
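As a rough illustration of what query parsing means, here is a deliberately simplified Python sketch. It is not how a production search engine works: real systems learn word roles statistically, whereas this toy uses a hand-written lookup table. The lexicon, the role labels and the function name are all invented for the example.

```python
import re

# Hypothetical mini-lexicon; a real system would learn these roles from data.
LEXICON = {
    "dress": "PRODUCT",
    "cocktail": "STYLE",
    "wedding": "OCCASION",
    "fabulous": "SENTIMENT",
}
# Small filler words that carry little meaning in a product query.
STOPWORDS = {"for", "a", "an", "the"}


def parse_query(query):
    """Split a query into lowercase words and group the words by role."""
    tokens = re.findall(r"[a-z]+", query.lower())
    parsed = {}
    for token in tokens:
        if token in STOPWORDS:
            continue
        role = LEXICON.get(token, "UNKNOWN")
        parsed.setdefault(role, []).append(token)
    return parsed


result = parse_query("fabulous cocktail dress for a wedding")
print(result)
```

Once the query is broken down this way, a search engine can look for dresses first and then use the style and the occasion to rank the results, instead of matching the word “cocktail” against drinks.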

For instance, to decide whether a product called “cocktail dress” would be relevant for a customer shopping online for a bridesmaid dress, the machine may calculate the similarity of these two phrases by comparing the contexts in which each phrase is used. This approach is called “distributional similarity”: the more contexts the two phrases share, the higher the probability that they have similar meanings.
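The idea can be sketched in a few lines of Python. This is a toy under stated assumptions: the five-sentence corpus is invented, multi-word phrases are pre-joined with underscores so they count as single words, and cosine similarity over neighboring-word counts is just one common way to compare contexts. Real systems work with millions of sentences and more refined statistics.

```python
from collections import Counter
from math import sqrt

# Invented toy corpus; multi-word phrases are pre-joined with underscores.
CORPUS = [
    "she wore a cocktail_dress to the wedding reception",
    "she wore a bridesmaid_dress to the wedding ceremony",
    "the cocktail_dress looked elegant at the party",
    "the bridesmaid_dress looked elegant in the photos",
    "he used the barbecue_tongs to flip the meat",
]
# Filler words are dropped so they do not dominate the comparison.
STOPWORDS = {"a", "the", "to", "in", "at", "she", "he"}


def context_vector(phrase, sentences, window=2):
    """Count the content words found within `window` words of the phrase."""
    counts = Counter()
    for sentence in sentences:
        tokens = [t for t in sentence.split() if t not in STOPWORDS]
        for i, token in enumerate(tokens):
            if token == phrase:
                start = max(0, i - window)
                counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


cocktail = context_vector("cocktail_dress", CORPUS)
bridesmaid = context_vector("bridesmaid_dress", CORPUS)
tongs = context_vector("barbecue_tongs", CORPUS)

dress_similarity = cosine(cocktail, bridesmaid)
tongs_similarity = cosine(cocktail, tongs)
print(f"cocktail dress vs bridesmaid dress: {dress_similarity:.2f}")
print(f"cocktail dress vs barbecue tongs:   {tongs_similarity:.2f}")
```

In this toy corpus the two dress phrases share most of their neighboring words (“wore”, “wedding”, “looked”, “elegant”) and score high, while the barbecue tongs share none and score zero. That gap is the signal a search engine could use to show cocktail dresses to someone looking for a bridesmaid dress.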

There are some great natural language processing service providers in the market that can handle all sorts of tasks related to text analytics. But for businesses, having the technology available is only part of the solution. The largest challenge lies in the organization’s need to identify use cases and answer the question of how it could benefit from language technologies.

The market understands this and tries to sell business solutions instead of pure language processing technologies. Seth Grimes, a renowned expert in this field, observes: “Text-analytics technology is increasingly delivered embedded in applications and solutions, for customer experience, market research, investigative analysis, social listening, and many, many other business needs. These solutions do not bear the text-analytics label”.

Voice of the customer analytics/customer experience management remains the biggest driver for adoption of language processing solutions. Beyond analyzing Father’s Day tweets, text analytics can bring businesses far more tangible results. With its help, a leading hardware technology company managed to reduce the number of social media messages that a specialist at its social media command center has to go through daily from 5,000 to 450 (the rest being replied to automatically or filtered out as spam). That’s a workload reduction of more than 90 percent!

It’s no surprise that Seth Grimes predicts that the first text analytics unicorns (startups with a valuation of $1 billion or higher) will be in the social media analytics or customer experience management space.

More organizations are expected to benefit from natural language processing solutions on a larger scale in the years to come. According to a recent market report, the natural language processing market is expected to grow at an annual rate of 18.4% and be worth $13.4 billion by 2020. Given how quickly the volume of unstructured data is growing, it’s in the best interest of organizations to be able to convert this data from waste into an asset. Whenever businesses have to take action based on vague, ambiguous, ever-changing human communication, natural language processing technology will have to be part of the solution. And all the better if, in the process, Father’s Day becomes a little happier for dads and retailers alike.

Photo of Twitter by Jeff Turner published under Creative Commons license.