The purpose of this document is to present a set of specific use cases, by no means an exhaustive list, where the power of data science, artificial intelligence and machine learning can be applied at scale in the retail industry.

In a digital world, customer-centricity is key, and any retailer that focuses on the customer and the customer experience stands to gain over its competitors. There are several reasons for this. First, the retail industry is shifting from a “push strategy” to a “pull strategy”. A pull strategy tries to pull customers to the retailer, whereas a push strategy pushes the retailer and its products to the customer. The pull strategy is not a new concept: TV adverts, which have long been in existence, are considered a pull strategy since they try to induce customers to show interest in a product even though those customers are unknown to the advertiser. An example of a push strategy would be sending a brochure of products directly to a customer.

Now, let us consider the challenge faced by a retailer versus a manufacturer or vendor/supplier to the retailer. For instance, Gillette, a manufacturer of shaving razors, would use TV adverts to advertise its products. What about a retailer like Walmart or Target that sells thousands upon thousands of products to end customers (including all Gillette products)? It is not clear how to effectively implement a pull strategy in this case. Of course, they would still use TV adverts to gain “mindshare” among potential customers, but that is still far from an effective pull strategy that attracts customers to specific products in the store and delivers a full customer experience.

Next, we look at how the life cycle of a customer purchase of a product has evolved. Previously, there were three stages as shown below:

Stimulus (Customer is interested in the product) → First Moment of Truth (Customer buys the product) → Second Moment of Truth (Customer experiences the product).

Now, with widespread access to the internet, Google came up with the “Zero Moment of Truth”, where the customer researches the product online. This includes information about pricing and sentiments from previous customers. At this stage, there is also a possibility that a customer who was initially “stimulated” to buy a Honda car may end up actually buying a Toyota car (a competitor). Thus, we can represent the new life cycle as shown below:

Stimulus (Customer is interested in the product) → Zero Moment of Truth (Customer researches the product as well as competitor products on the internet) → First Moment of Truth (Customer buys the product) → Second Moment of Truth (Customer experiences the product).

According to Google’s Zero Moment of Truth research, the percentage of customers who research a product online is as high as 70%, which means this aspect of the life cycle of a product purchase can no longer be ignored. Also, at the First Moment of Truth stage, the customer could very well purchase the product online or at a brick-and-mortar store.

In short, implementing a customer-centric pull strategy in the retail industry is a complex problem. However, the rewards can be considerable, not only on the customer front but also in inventory management and more efficient supply chain coordination with the myriad suppliers/vendors.

Now that we are focused on customer-centricity, it is clear that a 360-degree view of the customer would be very valuable: a complete picture of the customer built from historical interactions and product purchases as well as the different touchpoints the customer uses to interact with the retailer (email, phone, social media, website, in the store, etc.). This can greatly aid the retailer in implementing an effective pull strategy tailored to the individual customer, as well as in allocating inventory and shelf space to its different products.

Even though our focus is on customer-centricity, the complexity of retail operations requires us to understand the “multi-channel” concept that exists in today’s digital era. Multi-channel retailing can be broadly classified into two categories: offline and online. Each of these can be sub-classified further. For instance, offline channels include brick-and-mortar stores, mail-order catalogues and direct-to-consumer telemarketing, while online channels include e-commerce sites, social media, email and marketplaces.

We close out our introduction with the most desirable attribute in the value chain of delivering a true customer experience: enabling “omnichannel retailing”. This is different from multi-channel retailing, which focuses on enabling the customer to interact with the retailer through multiple channels as described in the prior paragraph. Omnichannel retailing focuses on “connectivity” among the different channels, providing a seamless, continuous experience across the many channels through which the customer may wish to interact and creating a personalized brand experience. The three main attributes of omnichannel retailing are that it is seamless, integrated and consistent.

Even before we get into the data science aspect of omnichannel retailing, it should be noted that the IT systems and infrastructure needed to achieve it will be very sophisticated, given the dynamic nature of new channels that may crop up in the future (what if the customer wants to interact with the retailer through an IoT device such as a smart watch?). The general consensus is to adopt an API-led connectivity approach from an IT implementation perspective. To understand the challenge, consider that a customer may switch between multiple channels to view prices and other attributes before finally deciding to purchase. This information should then flow back to the ERP systems and update inventory in real time based on the channel used.

While the focus of this document is on the impact of data science and analytics on the retail industry, it should be noted that to truly achieve the ultimate benefits of analytics, one should have a robust and flexible IT infrastructure in place.

Having introduced our thoughts on how a typical retail client should position itself for competitive advantage in the current world, we now focus on what is required to deliver the full analytics potential. We believe there are three stages to achieving this goal, as listed below:

  • Data Warehousing
  • Application of Artificial Intelligence/Machine Learning/Optimization Algorithms
  • Presentation of Actionable Business Insights

Data Warehousing

While our focus is not on the mechanics of how the data warehouse (or a data lake architecture) is created, a repository of data is the starting point of any analytics journey. Having said that, a data scientist should be skilled enough to bring in data from different tables or sources through complex SQL queries, as well as by scraping websites and social media sites for relevant data. It should also be noted that some of the data may be unstructured and may need to be processed using natural language processing techniques. Nevertheless, our focus will be entirely on the next two stages.

Application of AI/ML/Optimization Algorithms

This is the area where we would like to highlight our expertise. This stage is closely related to the final stage, and the two need to interact iteratively for successful execution of the project at hand.

Typically, once we have access to the data, we perform “descriptive analysis”. This essentially involves understanding the features in the data, their correlations and summary statistics. The results can be presented visually to the domain experts either using a dashboard (Tableau, Qlik or Power BI) or even a PowerPoint presentation. This step then generates feedback on how we should go about the modeling.
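As a minimal illustration of this first pass, assuming a hypothetical transactions extract with columns such as order_date, channel, unit_price and quantity (the file and column names are ours, not from any specific client system), a descriptive analysis in Python with pandas might look like this:

```python
import pandas as pd

# Hypothetical transactions extract; file and column names are illustrative only.
df = pd.read_csv("transactions.csv", parse_dates=["order_date"])

# Summary statistics for the numeric features (price, quantity, discount, ...).
print(df.describe())

# Pairwise correlations among numeric features.
print(df.select_dtypes("number").corr())

# A simple aggregate that often goes into a first dashboard:
# weekly revenue by sales channel.
weekly_revenue = (
    df.assign(revenue=df["unit_price"] * df["quantity"])
      .groupby([pd.Grouper(key="order_date", freq="W"), "channel"])["revenue"]
      .sum()
      .unstack("channel")
)
print(weekly_revenue.head())
```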

Once we understand the data well enough with the aid of domain experts, we are ready to embark on “predictive analysis”. This is where we set up the AI/ML framework to obtain the model that gives us the best predictions. Standard procedures such as K-fold cross-validation to compare different algorithmic approaches, splitting the data into training and test sets, and choosing appropriate metrics of model performance are all embedded in this framework. The choice of the right technology for the algorithm to be implemented is also made here (R, Python, TensorFlow, H2O, etc.).
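A minimal sketch of this framework in Python with scikit-learn, using synthetic data as a stand-in for prepared retail features and a numeric target such as weekly sales, might look like this:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic stand-in for engineered retail features and a numeric target.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Hold out a test set for the final, unbiased performance estimate.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Compare candidate algorithms with 5-fold cross-validation on the training data.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
for name, model in [("ridge", Ridge()),
                    ("random_forest", RandomForestRegressor(random_state=0))]:
    scores = cross_val_score(model, X_train, y_train, cv=cv,
                             scoring="neg_mean_absolute_error")
    print(name, -scores.mean())

# The best model is then refit on the full training set and scored once on the test set.
```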

In many use cases, the final result may just be the predictions (for instance, predicting footfall in a retail store). However, there are many use cases where we can move further up the value chain into the arena of “prescriptive analysis” (which typically involves optimization) and aid the client in making the right decisions (for instance, a recommendation engine or shelf-space allocation in a retail store).

Presentation of Actionable Business Insights

Inevitably, any data science project will fail if its output cannot be consumed by the end user and made actionable. Actionability by the end user is only one aspect of the success of the AI implementation. The other aspect is the ability to measure the impact of the decisions made and whether such decisions have made a positive impact on the top and bottom line of the retailer in consideration. For instance, once we provide a recommendation engine that is executed periodically or in real-time, one must measure what percentage of the recommendations resulted in a success.

The mode of presentation again depends on the context. Dashboards are great for real-time snapshots, while in some cases PowerPoint presentations may suffice. The context also differs by use case. For the same retailer, one audience might be the CMO’s office, whereas another might be the vendors/suppliers who, by integrating the output into their supply chain operations, can bring in efficiencies that benefit the retailer itself.

Next, we present a few use cases specific to the retail industry and lay out the algorithmic approaches to solve them. As mentioned earlier in this document, these use cases do not form an exhaustive list, but serve the purpose of showcasing our capabilities in solving several challenging analytical problems that arise in the retail industry.

  1. Recommendation Engines

This is a very important use case in the digital world, where the customer reigns supreme and needs to be engaged constantly. Its importance is corroborated by the success of entities like Amazon, which claims that 35% of its online sales come through its recommendation engine. Thus, it is critical for retailers to be able to invoke the power of machine learning and AI to provide personalized recommendations to their customers.

Essentially, there are two categories of approaches: content-based filtering and collaborative filtering. Content-based filtering is user-specific in the sense that it looks at the user’s preferences in historical data and makes recommendations based on similarity with past behavior. The disadvantage of this approach is that the retail entity needs access to some historical behavior for every user, so it cannot in general be applied to new users.

The other category is collaborative filtering, which is based on interactions between and among users. There are two collaborative filtering sub-techniques. The first is user-to-user filtering, which finds similarities among users and, based on the behavior of users in a segment, makes recommendations to another user in the same segment. As an example, suppose Users A and B are in the same segment and we are building a recommendation engine to suggest a product to buy. For discussion purposes, assume there are four items {Item 1, Item 2, Item 3, Item 4}, and that User A bought Items 1, 2 and 3 while User B bought Items 1, 2 and 4. The recommendation engine would recommend Item 4 to User A and Item 3 to User B. The second type of collaborative filtering is item-to-item similarity. Here we still utilize other users’ data, but only to compute similarities between the items they bought; these similarities are then used to make recommendations for a particular customer. In practice, most entities implement a hybrid approach, combining the nuances of content-based and collaborative filtering.

The machine learning techniques applied to this use case are fairly standard, based on distance metrics such as cosine similarity, Euclidean distance, Pearson correlation and even simple regression. The real challenge is the scale of the data that needs to be processed, given the number of potential customers and the large number of potential products to recommend.
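As a minimal, illustrative sketch in Python, here is an item-to-item cosine similarity computation on the toy purchase data above (the matrix and the simple scoring rule are purely for illustration):

```python
import numpy as np

# User-item purchase matrix from the toy example above:
# rows = User A, User B; columns = Item 1..4 (1 = bought).
R = np.array([
    [1, 1, 1, 0],   # User A bought Items 1, 2, 3
    [1, 1, 0, 1],   # User B bought Items 1, 2, 4
])

def cosine_sim(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Item-to-item similarity: compare the column vectors of the matrix.
n_items = R.shape[1]
sim = np.array([[cosine_sim(R[:, i], R[:, j]) for j in range(n_items)]
                for i in range(n_items)])

# Score unpurchased items for User A by similarity to items already bought.
user = 0
bought = np.flatnonzero(R[user])
scores = {j: sim[j, bought].sum() for j in range(n_items) if R[user, j] == 0}
print(scores)  # Item 4 (index 3) gets a positive score, so it is recommended to User A.
```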

  2. Market Basket Analysis

This retail use case is closely connected to recommender systems, in that we are interested in finding out which items usually appear together in a market basket. We keep it as a separate use case because of the existence of a specialized approach known as association rule mining, a rule-based machine learning technique that can be applied to this problem.

As an example, consider point-of-sale data at a supermarket. We could consider the rule {onions, potatoes} => {burger} to test whether a customer who bought onions and potatoes is likely to buy burgers. The entire application of association rule mining is to test the “strength” of such rules against the underlying data and come up with actionable insights. Here, the action taken may not be a personalized recommendation to a customer, but rather decisions on marketing campaigns such as pricing, product placement and designing deals that combine multiple items. These insights can also be utilized for inventory management.

Once we have fixed a rule to evaluate, we generally look at three attributes of the rule to understand the strength of the relationship: support, confidence and lift. Support indicates how often an itemset (the set of items appearing in the rule) occurs in the data set. Confidence indicates how often the rule has been found to be true, that is, how often transactions containing the left-hand side also contain the right-hand side. Support and confidence alone may not tell the whole story; the other common metric, lift, compares the confidence of the rule with how often the right-hand side occurs in the data set overall, so a lift above 1 indicates the two sides appear together more often than chance alone would suggest.
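A minimal sketch of these three metrics in Python, computed on a handful of made-up baskets for the {onions, potatoes} => {burger} rule:

```python
# Toy point-of-sale baskets; real data would come from transaction logs.
baskets = [
    {"onions", "potatoes", "burger"},
    {"onions", "potatoes", "burger", "beer"},
    {"onions", "potatoes"},
    {"potatoes", "beer"},
    {"onions", "burger"},
]

def support(itemset):
    """Fraction of baskets containing every item in the itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

lhs, rhs = {"onions", "potatoes"}, {"burger"}

sup = support(lhs | rhs)                  # how often all items appear together
conf = support(lhs | rhs) / support(lhs)  # P(rhs | lhs)
lift = conf / support(rhs)                # confidence relative to rhs's base rate

print(f"support={sup:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```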

  3. Price Optimization (Markdown Optimization)

This is an area of great interest in retail due to the burgeoning application of AI/ML techniques. Previously, price optimization was confined to industries with limited inventory, such as airlines and hotels, but with the growing availability of internal and external data and increases in computation speed, it is now possible to implement real-time price optimization strategies in retail.

Essentially, there are three steps in executing an efficient price optimization strategy. The first step is forecasting. This is not exactly straightforward, since price affects demand and this elasticity must be predicted. It involves looking at the sales history of similar products and building a “regression tree” model that helps us infer the demand-price curve. A regression tree is simply an ML technique (based on if-then-else rules) that can be used to predict price based on the different levels of demand observed in the past.

The prediction phase is typically followed by a learning phase in which we test our price against actual sales: since we now know exactly how well the product sold, we can refine our demand-price curve accordingly. The last step involves applying the refined curve and optimizing pricing across hundreds of products and time periods. For instance, a simple rule of thumb would be to price each product at a level “p” such that revenue “p*D(p)” is maximized on the demand-price curve, where “D(p)” denotes demand at price “p” (maximizing the top line). However, since we can now optimize across multiple products and over time, we can also take into account other complex constraints such as limited shelf space, as well as maximize the bottom line.
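As a minimal sketch of the forecasting-plus-optimization idea (the price/demand history is purely illustrative, and here the tree is fit to approximate demand as a function of price so that p*D(p) can be maximized on a grid):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative historical (price, units sold) observations for similar products.
prices = np.array([[5.0], [6.0], [7.0], [8.0], [9.0], [10.0], [11.0], [12.0]])
demand = np.array([950, 900, 780, 640, 560, 430, 300, 220])

# Fit a regression tree to approximate the demand-price curve D(p).
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(prices, demand)

# Simple optimization step: evaluate candidate prices on a grid and pick the one
# maximizing revenue p * D(p). A real implementation would add constraints
# (shelf space, margin targets) and optimize many products jointly.
candidates = np.linspace(5, 12, 71).reshape(-1, 1)
predicted_demand = tree.predict(candidates)
revenue = candidates.ravel() * predicted_demand
best = revenue.argmax()
print(f"best price ~ {candidates[best, 0]:.2f}, expected revenue ~ {revenue[best]:.0f}")
```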

The above three-step process is only a set of general guidelines. In a real-world setting, one may not have the freedom to implement all three steps. The learning phase is a luxury, since it involves observing actual behavior after a price change, and many retail clients may not be inclined towards this because they would like to lock the price after the forecasting phase for a fixed time period (say, the next 48 hours). There may also be instances where the forecasting process is not applicable, in which case the learning phase is critical since it provides feedback on price-demand elasticities.

One of the challenges for any retailer is implementing price optimization for “first exposure items”: items that have never been sold before. Say there are two items, Item A and Item B, priced at P(A) and P(B) respectively, and assume the offer was available for a time period T (say 48 hours). Now, if we observe after the fact that Item A stocked out well before the end of the period T while Item B had half its inventory left after T, then we could argue that better pricing would have been to increase P(A) and decrease P(B). This is exactly what a price optimization technique attempts to achieve. One approach is to create two groups from history, where one group consists of “item-event” combinations that had a stock-out and the other consists of combinations that had inventory left over at the end of the offer period. For each group we infer item-specific price-demand curves, and then apply clustering to combine these into a smaller set of representative price-demand curves. Again, we apply the regression tree technique to predict the price of the item.
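As a hedged sketch of the clustering step only (the curves below are simulated stand-ins for the item-level price-demand curves inferred from history, and the cluster count is arbitrary):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Simulated item-level demand-price curves, each represented by predicted demand
# at a common grid of price points (one row per historical item-event).
price_grid = np.linspace(5, 15, 11)
base = rng.uniform(200, 800, size=(60, 1))          # demand level per item
slope = rng.uniform(10, 60, size=(60, 1))           # price sensitivity per item
curves = np.maximum(base - slope * price_grid, 0)   # shape: (60 items, 11 prices)

# Cluster similar curves so that a new "first exposure" item can be priced off
# the pooled curve of the cluster it is assigned to.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(curves)
representative_curves = kmeans.cluster_centers_      # one pooled curve per cluster

# A new item would be mapped to a cluster using its attributes or early sales
# signals, and its pricing would use that cluster's pooled demand-price curve.
print(representative_curves.shape)
```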

As a side note, it is worth explaining why we choose a regression tree approach over a standard regression technique. The reason is that the regression tree approach is more general and can handle more real-world nuances. For instance, it can model “inverted price relationships”, which sometimes arise in the market for luxury goods where demand may actually go up as the price goes up. We close out this section by noting that the same techniques discussed above can be applied in the event of a “markdown”, where one seeks to reduce the price from its current level with the goal of clearing inventory over a certain period of time.

  4. Inventory Management

Inventory management is not a new area; it forms a core part of efficient supply chain operations. One of the earliest inventory management approaches, known as “just-in-time” (JIT) and first championed by Toyota Motors, became so popular that such techniques have been applied with some success to the retail industry. JIT systems aim to do exactly what we would like to achieve: forecast demand accurately enough that production schedules can be fixed in advance and items from suppliers/vendors ordered only as needed. In an ideal world there are no inventory holding costs and no stock-outs of any supplier part. One major disadvantage of this system is that it is not robust, in the sense that any small disruption in this symbiosis can cause a massive breakdown in supply chain operations.

However, due to the existence of multiple channels of engagement in the retail digital world, forecasting the demand for a product well in advance is always a challenge, and thus new approaches based on AI/ML techniques have found application in this area. Demand forecasting is still a key piece, but even more important is that this is an area where one needs to move up the value chain from predictive analytics to prescriptive analytics: the AI/ML engine should not stop at the predictions but should make the optimal order/re-order decisions based on them.

Some of the most sophisticated AI/ML algorithms have been applied in this arena. In particular, we highlight reinforcement learning as a technique that can be applied for smooth inventory management and control. Reinforcement learning builds on the framework of Markov decision processes, which originated in the field of dynamic programming.
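As a minimal, illustrative sketch (all costs, capacities, action sets and the demand distribution below are assumed purely for illustration), a tabular Q-learning agent can learn a simple re-order policy in a simulated store:

```python
import numpy as np

rng = np.random.default_rng(0)

MAX_STOCK = 20            # shelf/warehouse capacity (assumed)
ORDER_SIZES = [0, 5, 10]  # possible re-order quantities (the actions)
PRICE, UNIT_COST, HOLD_COST, STOCKOUT_PENALTY = 5.0, 3.0, 0.1, 2.0

def step(stock, order):
    """One simulated day: the order arrives, then random demand is served."""
    stock = min(stock + order, MAX_STOCK)
    demand = rng.poisson(4)
    sold = min(stock, demand)
    lost = demand - sold
    reward = PRICE * sold - UNIT_COST * order - HOLD_COST * stock - STOCKOUT_PENALTY * lost
    return stock - sold, reward

# Tabular Q-learning: states = stock on hand, actions = index into ORDER_SIZES.
Q = np.zeros((MAX_STOCK + 1, len(ORDER_SIZES)))
alpha, gamma, eps = 0.1, 0.95, 0.1

stock = 10
for _ in range(100_000):
    a = rng.integers(len(ORDER_SIZES)) if rng.random() < eps else Q[stock].argmax()
    next_stock, r = step(stock, ORDER_SIZES[a])
    Q[stock, a] += alpha * (r + gamma * Q[next_stock].max() - Q[stock, a])
    stock = next_stock

# Learned policy: recommended order quantity for each stock level.
policy = {s: ORDER_SIZES[int(Q[s].argmax())] for s in range(MAX_STOCK + 1)}
print(policy)
```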

There are two types of inventory management processes: Retailer-Managed Inventory (RMI) systems and Vendor-Managed Inventory (VMI) systems. Walmart is a major entity that has successfully implemented a VMI system. While one could argue that a VMI system means the retailer relinquishes control over ordering decisions, the advantages of implementing a VMI system are many. Firstly, it requires full transparency of retailer inventory data to the vendor/supplier, allowing for seamless supply chain operations. Secondly, it allows the retailer to focus entirely on the customer rather than worrying about stock-outs of an item or, alternatively, idle inventory occupying shelf space.

  5. Lifetime Value Prediction

It is important for any retailer to be able to estimate the “customer lifetime value” (CLV), which can be defined as the discounted value of future profits generated from a customer. Profits involve both costs and revenue, but we will focus more on revenue, since that is harder to predict for an individual customer. There are also “tangible” and “intangible” parts. The tangible part includes all direct purchases by the customer. The intangible part, especially in the era of social media, refers to “customer influence” (such as a “Like” for a product on the product’s Facebook page) that prompts other customers to make direct purchases with the retailer. For simplicity, we will focus only on the tangible part.

Before we introduce the algorithms needed to achieve this goal, let us understand the different types of business contexts. The business context can be defined along two dimensions: contractual vs. non-contractual, and continuous vs. discrete (in terms of when purchases are made). We will focus on the combination that applies to most retailers, namely “non-contractual and continuous” (we note that membership stores like Costco fall under the “contractual and continuous” category).

Before we introduce the ML models that can be applied to this use case, there are three latent (unobserved) parameters that need to be estimated (probabilistically) for each customer:

  • Lifetime — The period of interaction between the customer and the retail entity.
  • Purchase Rate — The number of purchases the customer will make during the Lifetime.
  • Monetary Value — Dollar amount assigned to each future transaction.

The common model applied here is the “Pareto/NBD” model, where NBD stands for “negative binomial distribution”. The Pareto/NBD model is used to estimate the first two of the three latent parameters above. The Lifetime is assumed to follow an exponential distribution with parameter “mu”, and the Purchase Rate is assumed to follow a Poisson distribution with parameter “lambda”. The latent parameters “mu” and “lambda” are in turn governed by two gamma prior distributions that represent our belief about how these parameters are distributed across the population of customers. The two gamma distributions have their own parameters that need to be estimated. The estimation process is fairly standard and requires the parameters to be estimated on a training data set and then validated on a test/holdout data set. As you can see, we make distributional assumptions up front, so it is worth noting that other frameworks can be applied as well, such as “BG/NBD” (where BG stands for “beta geometric”) and “BG/BB” (where BB stands for “beta binomial”). We can compare these frameworks and choose the one that best fits the data provided.

We now need to augment the above model with another model that estimates the Monetary Value associated with each predicted purchase for a customer. One such model is the “Gamma-Gamma” model, which has three underlying assumptions: the monetary value of a given transaction varies randomly around the customer’s average transaction value; the average transaction value varies across customers but does not vary over time for a particular customer; and the distribution of average transaction values across customers is independent of the transaction process.

Combining the two models, we obtain estimates of the three latent parameters for each customer, which directly give us an estimate of the CLV for that customer.
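As a hedged sketch, the open-source lifetimes Python package implements these models; the example below uses the BG/NBD variant mentioned above together with the Gamma-Gamma model on the package’s bundled sample data (the penalizer values, horizon and discount rate are illustrative choices):

```python
from lifetimes import BetaGeoFitter, GammaGammaFitter
from lifetimes.datasets import load_cdnow_summary_data_with_monetary_value

# RFM-style summary: one row per customer with frequency, recency, T (customer age)
# and average monetary value, here taken from the package's sample data set.
summary = load_cdnow_summary_data_with_monetary_value()

# BG/NBD fit: estimates the purchase-rate and dropout (lifetime) parameters.
bgf = BetaGeoFitter(penalizer_coef=0.01)
bgf.fit(summary["frequency"], summary["recency"], summary["T"])

# Gamma-Gamma fit: estimates the expected monetary value per transaction.
# It applies to repeat customers with positive spend.
repeat = summary[(summary["frequency"] > 0) & (summary["monetary_value"] > 0)]
ggf = GammaGammaFitter(penalizer_coef=0.01)
ggf.fit(repeat["frequency"], repeat["monetary_value"])

# Combine both models into a discounted CLV estimate over the next 12 months.
clv = ggf.customer_lifetime_value(
    bgf,
    repeat["frequency"], repeat["recency"], repeat["T"], repeat["monetary_value"],
    time=12,             # horizon in months
    discount_rate=0.01,  # monthly discount rate
)
print(clv.sort_values(ascending=False).head())
```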

  6. Market Mix Optimization

The marketing concept of the “market mix” is very relevant in today’s world, when marketing departments of retail entities need to optimize their marketing spend across multiple channels to ensure a positive impact on their top and bottom lines. The 4 P’s, popularized by the legendary marketing guru Philip Kotler, essentially define the market mix: ‘product’, ‘price’, ‘place’ (that is, distribution) and ‘promotion’ (advertisement).

The goal is as follows: given a fixed marketing budget, the retail entity needs to figure out how much to allocate across the 4 P’s. This is a complex problem, as the channels can be so varied: TV, mail order, search engine optimization (SEO), social media sites, email, displays, magazines, etc. Moreover, some marketing initiatives need not involve all 4 P’s. For instance, a retailer could advertise on TV without mentioning a product (and hence without price or place) just to build awareness. The same retailer could advertise a special discount on a certain product on TV (for instance, Best Buy advertising a sale on a particular range of its laptops; here both product and price are in context, but perhaps not place, as the offer could be available both online and at its brick-and-mortar stores). The only P that is always present is the ‘promotion’ or ‘channel’ that is used.

Thus, the first step in understanding the framework for market mix optimization is that aggregation is involved across different dimensions. The second step is to integrate this with a forecasting system that predicts demand as a function of marketing spend (this could be at the product level and then aggregated, or directly at the aggregate level). No special algorithms are needed here; the challenge is to optimize over different marketing scenarios, whose number can explode due to the number of products and promotion channels. The last step is the execution of the optimization engine. The objective is to maximize return on investment (ROI) subject to specific constraints such as the total budget, budget ceilings by promotion channel, etc. The ROI-based framework is the most popular framework used in practice for designing the optimal market mix spend.
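As a minimal sketch of the optimization step only (assuming, purely for illustration, a linear response of predicted incremental revenue to spend in each channel; the channel names, returns, budget and ceilings are all made up):

```python
import numpy as np
from scipy.optimize import linprog

# Assumed inputs: predicted incremental revenue per dollar spent on each
# promotion channel (this would come from the demand forecasting step).
channels = ["tv", "search", "social", "email", "display"]
return_per_dollar = np.array([1.8, 2.6, 2.2, 3.1, 1.4])

total_budget = 1_000_000
channel_ceiling = [600_000, 300_000, 250_000, 100_000, 200_000]

# linprog minimizes, so negate the objective to maximize predicted return.
res = linprog(
    c=-return_per_dollar,
    A_ub=np.ones((1, len(channels))), b_ub=[total_budget],  # total budget constraint
    bounds=[(0, ceiling) for ceiling in channel_ceiling],   # per-channel ceilings
)

for name, spend in zip(channels, res.x):
    print(f"{name:8s} {spend:12,.0f}")
print("predicted return:", f"{-res.fun:,.0f}")
```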

  7. Facility Location

The decision to open a new store is not merely a challenge for a brick-and-mortar retail company; it is also relevant for a pure e-commerce company interested in where to locate its warehouses so as to minimize transportation costs as well as the high initial fixed cost of opening a facility. The problem is not new, but what has changed is the wealth of data available in the current digital era. One such source is mobile device data, or mobile trace data, based on a geofenced area (such as the vicinity of a mall). Another decision worth investigating is potential brick-and-mortar store closings for a retail entity that does business both offline and online; sometimes the best decision is to close a store at a particular location and focus on generating revenue from that population through online sales. Retailers can use a combination of demographic, psychographic, competitor-activity and other key customer information to enable better decision-making about where to locate a new facility.

The modeling framework involves a predictive part, which draws upon AI/ML techniques to predict footfall, demand, etc., which can be translated into the potential upside of building a facility at a particular location. This upside is offset by the fixed cost of building the facility as well as the variable costs of maintaining it. The facility location analysis is then driven by an optimization framework that can optimize across many potential locations in a single model (for instance, it is straightforward to formulate a model that picks at most 10 locations from a set of 200 potential locations across the country so as to maximize the bottom line).
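As a hedged sketch of that selection model using the open-source PuLP package (the per-location upside and cost figures are randomly generated stand-ins for the outputs of the predictive models):

```python
import numpy as np
from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum

rng = np.random.default_rng(0)
n_locations, max_open = 200, 10

# Illustrative inputs: predicted annual upside per location would come from the
# footfall/demand models, and annual costs from finance. Units here are $M/year.
net_value = (rng.uniform(1.0, 5.0, n_locations) - rng.uniform(0.5, 3.0, n_locations)).tolist()

prob = LpProblem("facility_location", LpMaximize)
open_loc = LpVariable.dicts("open", range(n_locations), cat=LpBinary)

# Objective: maximize the total net contribution of the opened facilities.
prob += lpSum(net_value[i] * open_loc[i] for i in range(n_locations))

# Constraint: open at most 10 of the 200 candidate locations.
prob += lpSum(open_loc.values()) <= max_open

prob.solve()
chosen = [i for i in range(n_locations) if open_loc[i].value() == 1]
print("locations selected:", chosen)
```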