Via supervised classification techniques in Python.


Dealing with high volume IT service requests? Interested in reducing operational costs? Looking to elevate the user experience? Look no further.

This article serves as a guide for data science aficionados to deploy production-level Machine Learning solutions in the IT Service Management (ITSM) environment. The prescribed ML solution will help us peer into the black-box of IT service requests and coalesce ITSM strategy across business management silos.

We’ll be using a supervised, classification algorithm to categorize new tickets based on input text. I employ Python, RESTful API framework, Scikit-Learn and SpaCy to accomplish this task; however, there are many solutions that could more efficiently fit your organization. I’ll do my best to address opportunities for divergence as well as provide dedicated reasoning for why I chose specific methodology.

Susan Li provides an excellent overview on Machine Learning for Text Classification using SpaCy. My process, code and demonstration (as outlined in this article) are influenced by her contributions. I strongly recommend subscribing to her channel if you find any of my content helpful/interesting.

The final model is more than 85% accurate in making predictions for all tickets flowing into the production environment. SLA response times have been cut in half, and annual cost savings come to nearly $600,000.

Background

IT Service Management (ITSM) is an important corporate function responsible for leveraging innovation to increase value, maximizing user productivity, providing end-to-end tech services, and so much more. Despite such resonant enterprise responsibility, front-end IT interactions are often defined by long, arduous conversations (via web or phone) with support specialists. Whether you’re requesting a new password, submitting configuration changes for an application, or simply asking for help, you’ll be enduring a frustrating road ahead. The stigma persists because IT leaders struggle to staff and support a comprehensive help-desk management solution that serves the entire enterprise.

Despite good intentions, support organizations often miss the mark for efficient ITSM. While on a project for a recent global medical device client of mine, I was tasked with remedying the frustrating outcomes that accompany enterprise incident ticket triage. Leadership made the decision to pour resources into a high-cost legacy solution using ServiceNow (a popular ITSM workflow platform) and an outside vendor to improve response times for incoming tickets, with little success. Generalized use and stringent SLA restrictions led to inaccuracy across the board, where business groups played hopscotch with tickets that landed in their queue. Users were exasperated, support professionals were callous, and leadership was stumped. Time for a fresh perspective!

In 2019, more than 60,000 tickets were submitted to my client’s ServiceNow platform, intended to reach nearly 15 different business groups. Every ticket cost the IT organization $13, despite an average accuracy score (chance of reaching the desired target) of only 40%. Incorrectly assigned tickets bounced between business groups for an average of 21 days before landing in the right place. Cost, latency and accuracy were huge concerns, and led to poor user experiences.

When a user submits a support ticket, it flows into the platform via email, phone or an embedded portal. Each ticket contains a bit of text about the problem or request, which is quickly reviewed by a support professional and sent on its way. Once the correct assignment group picks up the ticket, some amount of work gets completed and the incident state is set to closed.

This opportunity seems ripe for Multinomial Classification via Supervised Machine Learning to categorize support tickets based on a fixed number of business groups. We can easily scrape text and category from each ticket and train a model to associate certain words and phrases with a particular category. My hypothesis is simple: machine learning can provide immediate cost savings, better SLA outcomes, and more accurate predictions than the human counterpart. Let’s get started!

Data Gathering & Exploration

Before selecting and training machine learning models, we need to take a look at the data to better understand trends within incident tickets. ServiceNow provides a robust Table API framework for us to grab the data directly from the platform.

# Import Statements
import requests
import json
import pandas as pd

# Initialize url
url = "https://{instance.service-now.com}/api/now/table/incident"

# Set simple authorization
user = "{username}"
pwd = "{password}"

# Set proper headers
headers = {"Content-Type": "application/json",
           "Accept": "application/json",
           "Accept-Encoding": "gzip, deflate, br"}
# Initialize GET response
response = requests.get(url, auth=(user, pwd), headers=headers)
data = json.loads(response.text)
dataframe = pd.DataFrame(data['result'])

ServiceNow provides you with fantastic opportunity to explore the nuances of RESTful API tuning with their embedded API Explorer. This tool helps the user build custom API requests from scratch, whittling down query parameters, fields, etc. into easy-to-understand increments. Furthermore, you can hit popular tables (incident, task, etc.) or create complex queries. It’s an awesome tool for any data professional!
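To get a feel for how those query parameters compose, here’s a small sketch; the instance name, query string and field list are hypothetical, so swap in whatever the API Explorer builds for you:

```python
from urllib.parse import urlencode

# Illustrative parameters you might assemble with the API Explorer
url = "https://dev12345.service-now.com/api/now/table/incident"
params = {
    "sysparm_query": "active=true^ORDERBYDESCsys_created_on",
    "sysparm_fields": "number,short_description,description,u_portfolio",
    "sysparm_limit": "10000",
}

# Preview the fully-encoded request URL before sending it with requests.get
full_url = url + "?" + urlencode(params)
print(full_url)
```

Passing the same dictionary as `params=` to `requests.get` produces an identical URL, so previewing it this way is a cheap sanity check.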

Let’s take a look at our dataframe:
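If you’d like to follow along without a ServiceNow instance, here’s a toy stand-in with the columns this article cares about (the values are invented):

```python
import pandas as pd

# Toy stand-in for the ServiceNow pull above; column names follow the
# fields discussed in this article
dataframe = pd.DataFrame({
    "short_description": ["Password reset", "SAP access request", "Report broken"],
    "description": ["User locked out of laptop", "Need S/4 role added",
                    "Sales dashboard will not load"],
    "u_portfolio": ["Global Support Services", "SAP S/4 Security", None],
})
print(dataframe.shape)
print(dataframe["u_portfolio"].value_counts(dropna=False))
```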

Since we’re interested in associating text with a relevant classifier, we can use a categorical variable like “u_portfolio” to label each row in our dataframe. Given a pretty serious class imbalance (“Global Support Services” holds almost 65% of all records) and more than 2,000 missing values, we’ll eliminate the categories with fewer than 100 tickets to reduce noise and ensure we’re only using relevant categories. Let’s also create a pure text column called “text” by concatenating “short_description” and “description”, and then visualize the updated dataframe!

import matplotlib.pyplot as plt
import seaborn as sns
# Eliminate categories with fewer than 100 tickets
classifier = "u_portfolio"
ticket_threshold = 100
df_classifiers = df[df.groupby(classifier)[classifier].transform(len) > ticket_threshold]
# Print number of relevant categories & shape
print("Categories: " + str(df_classifiers[classifier].nunique()))
print("Shape: " + str(df_classifiers.shape))
# Plot the classifiers
fig = plt.figure(figsize=(10,6))
sns.barplot(x=df_classifiers[classifier].value_counts().index,
            y=df_classifiers[classifier].value_counts())
plt.xticks(rotation=20)
plt.show()

It looks like we dropped more than 5 categories after setting the threshold to 100 tickets, returning only those categories with relevant business value. After digging into the data and asking around a bit, I confirmed the dropped categories hadn’t been used in more than a year and could be comfortably eliminated.

A note on class imbalance and the wonderful world of enterprise consulting:

Global Support Services (GSS) accounts for more than 60% of total support tickets. That means we could write a simple program to assign GSS to every incoming ticket, and we’d be right more than half the time!

Without doing any advanced analysis, we’ve identified a major issue. The 3rd-party vendor, which charges $13 for every ticket interaction and averages 40% accuracy, performs worse than if my client took no action at all… Imagine breaking that news to the CIO!

The remaining categories will be used as the labels to train/test the model. Let’s save them as a list:

category_labels = list(df_classifiers[classifier].value_counts().index)

Now that we have our category_labels, we need to better understand the text patterns for each type of ticket. By peeking into the ServiceNow platform, I can quickly gather a few themes by category: GSS handles a lot of password resets and hardware issues; Business Intelligence covers reporting functions and data questions; Customer deals with SalesForce and other customer apps; SAP S/4 Security manages all ERP related access/configuration. If you’ve worked in this corporate arena before, these motifs sound familiar. It’s easy for a human to identify a few keywords for each category by studying the data — let’s see if a computer can do it too!
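As a sketch of that keyword check, here’s one way a computer might surface the most common words per category; the N and STOPLIST values, and the toy tickets, are illustrative only:

```python
from collections import Counter

# Illustrative values; tune N and STOPLIST for your corpus
N = 3
STOPLIST = {"the", "a", "to", "and", "of", "for", "is", "my", "please"}

def top_terms(texts, n=N):
    """Return the n most common non-stoplist tokens across texts."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in text.lower().split() if w not in STOPLIST)
    return counts.most_common(n)

# Toy tickets grouped by category
tickets = {
    "Global Support Services": ["password reset for laptop", "laptop will not boot"],
    "Business Intelligence": ["report is missing data", "dashboard data refresh failed"],
}
for category, texts in tickets.items():
    print(category, top_terms(texts))
```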

Once we run the code, we can inspect the output:

Unfortunately, there’s not much here, as the most common words barely differ by category. I investigated and found that emails account for more than 75% of ticket submissions; all internal employees have some version of a confidentiality notice below their signature that drowns out meaningful differences between the categories. We could increase N to see if other patterns emerge further down the list, or hard-code the email signature into the STOPLIST variable to keep it from showing up, but neither would fix the root cause. Instead, we want to find words/phrases that correlate with each label in our list of categories. This is called Term Selection, and it can help us identify the most relevant terms by label in our dataset.

Let’s explore some ML solutions for measuring and evaluating correlation!

Building the Model

Natural Language Processing (NLP) sits at the nexus of computer science and linguistics, defining the solutions for how machine and human languages can interact with one another. Functionally, NLP consumes human language by analyzing and manipulating data (often in the form of text) to derive meaning. To do this, we need to convert data that’s passed between humans into a numeric format that is machine readable. This process of encoding text is called Vectorization, and catalyzes computational processes like applying mathematical rules and performing matrix operations that can produce valuable insights.

Although there are some super cool, burgeoning methods for vectorizing text data for NLP, like Transfer Learning and advanced Neural Networks, we’re going to use a simpler technique called Term Frequency-Inverse Document Frequency (tf-idf). A tf-idf value increases proportionally to the number of times a word/phrase (n-gram) appears in a document, offset by the number of documents in the corpus that contain it. Although it sounds complex, it basically reflects how important an n-gram is to a document without favoring words that appear frequently everywhere. This is especially powerful for processing text documents where class imbalance exists, like ours! You might use a Count Vectorizer instead if your text is well-balanced.

Now that we understand how computers consume text data, we can experiment using different models! Here’s some starter code to test out a few options:
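As a sketch of that starter code, we can compare a few common scikit-learn classifiers over tf-idf features; the toy texts and labels stand in for the real ticket data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

# Toy data; in practice use the "text" column and category labels
X = ["reset my password", "password locked out", "new laptop please",
     "sales report broken", "report data missing", "dashboard will not load"]
y = ["GSS", "GSS", "GSS", "BI", "BI", "BI"]

# Cross-validate each candidate model inside an identical tf-idf pipeline
for clf in [LogisticRegression(max_iter=1000), MultinomialNB(), LinearSVC()]:
    pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", clf)])
    scores = cross_val_score(pipe, X, y, cv=3)
    print(type(clf).__name__, scores.mean().round(2))
```

Swapping models in and out of a shared pipeline keeps the comparison fair, since every candidate sees exactly the same features.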

Let’s use Logistic Regression as our model of best fit. As a data scientist, you should be able to apply multiple techniques to a project and choose one favorably suited to the opportunity. In my experience, human psychology plays a major part in the success of a use-case; coaching an enterprise to accept emerging, disruptive technology takes time and energy! It’s just as important to market, brand and sell your solution as it is to build an elegant algorithm. Let’s build our model!
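Here’s a minimal sketch of what that pipeline might look like, with tf-idf features feeding Logistic Regression; the training texts and labels are toy stand-ins for the real tickets:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

# Toy stand-ins for the ticket text and category labels
texts = ["reset my password", "password locked out", "new laptop please",
         "laptop will not boot", "sales report broken", "report data missing",
         "dashboard will not load", "report totals wrong"]
labels = ["GSS", "GSS", "GSS", "GSS", "BI", "BI", "BI", "BI"]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42)

# Vectorize with tf-idf, then classify with Logistic Regression
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
model = pipe.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

The `ngram_range` and solver settings are assumptions worth tuning against your own corpus.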

For this project, I had ample opportunity to socialize concepts methodically to address questions in real time. A major part of my success criterion for this use-case was the uptake by leadership to both understand and spread these data science concepts in context. Because there are many other algorithms/models that can optimize model performance based on the data, I encourage you to experiment.

Production Deployment

When a user submits a ticket, it’s easy to grab the text and pump it through the model. In doing so, we can determine…

a) if the model finds the text relevant

b) which category best fits the text

# Save the fitted pipeline to variable 'model'
model = pipe.fit(X_train, y_train)
# Save predictions for a given list of text strings, TEXT
predict_category = model.predict(TEXT)
# Save prediction probabilities (one per category) for TEXT
predict_probability = model.predict_proba(TEXT)

Both prediction variables return arrays. predict_category returns the best-fit category for each text string passed in, while predict_probability returns an array of 8 numbers corresponding to our 8 categories, each representing the probability that the text belongs to that category. If the text string was “I need a new corporate laptop”, we should expect “Global Support Services” as the predicted category, with the largest probability in the corresponding position. We can use predict_probability to see how strong the prediction result for GSS is in context; at 98% for this particular text string, it’s safe to say we trust the model 😃.

We can use the same Table API that we employed to scrape the data, replacing our GET response with a PUT request, to update the ticket in ServiceNow. In real-time, a user submits a ticket and the model updates ServiceNow with the predicted category in less than a minute. Let’s pat ourselves on the back for implementing an effective machine learning solution!
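As a sketch of that update, here’s how the PUT request might be assembled; the instance name, sys_id and assignment field are hypothetical, so match them to your own Table API setup:

```python
import json

# Hypothetical ticket identifier and predicted category
sys_id = "1c741bd70b2322007518478d83673af3"
predicted = "Global Support Services"

# Table API update endpoint for a single incident record
url = "https://dev12345.service-now.com/api/now/table/incident/" + sys_id
payload = json.dumps({"u_portfolio": predicted})

# With the credentials and headers from the GET example earlier:
# response = requests.put(url, auth=(user, pwd), headers=headers, data=payload)
print(url)
print(payload)
```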

Deploying a model in production depends on what technology stack your particular enterprise subscribes to. My client is an AWS shop and manages a great relationship with access to the full suite of AWS tools.

I played around with Lambda and SageMaker to automate support ticket assignment in a serverless AWS environment. However, it was considerably easier to spin up an EC2 instance to host the model and interact with ServiceNow directly. ServiceNow has built-in ‘Business Rules’ that can be configured to trigger API calls against the model and perform updates. The final EC2 deployment was slightly cheaper and much easier to update, and relies on AWS and ServiceNow communicating effectively. AWS documentation is legendary for its depth and breadth; I strongly recommend consulting the appropriate resources before diving in.

If these terms mean nothing to you — don’t fret! Basically, the machine learning pipeline needs to be hosted in an environment agnostic of the people and technology involved. If new developers come aboard, ticket volume triples overnight, or leadership elects to use KNN in R instead of LogReg in Python, the environment needs to accommodate variable scale. The entire pipeline was developed on my local machine, but production deployment can’t rely on my computing resources and/or availability in the long term. Keeping it on a server (hosted in the cloud) ensures sustainability and efficacy. This is a critical shift between the build phase and actual deployment.

Evaluating the Model

After all this work, what did we accomplish? To start, we have an awesome ML solution that uses Natural Language Processing to categorize all incident tickets for a global company. We save the enterprise nearly $600,000 annually by automating ticket assignment and circumventing the 3rd party vendor. We improved average accuracy from 40% to 85% and cut SLA response times in half! Anecdotally, the user experience is significantly more positive and trust in the ITSM environment has skyrocketed.

Unexpectedly, my intense focus on data and process improvement in ServiceNow helped to coalesce department strategy and inform better decision making. IT leadership, composed of VPs, Directors and Senior Managers, was excited about saving money, improving response times, and bolstering the user experience. Despite initial trepidation around sustaining new technology, operationalizing the model in production, refactoring platform workflows away from the vendor, etc., leadership finally accepted the change. Deployment offered the opportunity to democratize decision making and evangelize complex data topics. I believe leadership is better equipped to strategize and staff data science projects in the future, pointing to this use-case as a springboard to convey the success of predictive analytics and data science.

If you have any questions or comments about the methods outlined in the article, please drop a line below or message me directly! I’d love to hear from you 😄.


Predict IT Support Tickets with Machine Learning and NLP was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.