Dawn of a New Era in Machine Learning

Food for thought: machine learning on dynamic, high-frequency datasets

Photo by Andrea De Santis on Unsplash

Recently, I came across an inquiry from a telecom giant that owns several TV channels. The client wanted us to develop a machine learning model to display context-sensitive ads during every TV show they broadcast. The requirement was this: in a live TV show, if a world-renowned speaker or a celebrity is drinking Coke, the company would like to display a Coke ad. Note that the speaker may decide to drink Pepsi in the next speech. This time, the company would like to display a Pepsi ad.

As you would agree, this is a non-trivial problem. We need to understand and resolve many issues. The first challenge is to understand the dynamics. You need to identify whether it is Coke or Pepsi in a split second. Nor is the problem limited to Coke and Pepsi. It could be a Maserati driven by a celebrity at a live event. The number of classes you are required to detect is very large, as the client has a long list of brands to advertise.

The second challenge for us was that the client claimed to collect several terabytes of data every day. How do you train a classical ML algorithm on such enormous data? Even if you decide to use artificial neural networks (ANNs), it would be practically impossible to load such a large volume of data in memory for training, even when you consider batch processing.

These are just the two immediately noticeable challenges. There are many more. So how do we develop an AI solution to this problem? I will provide you with a few pointers for tackling such problems and introduce you to dynamic modeling.

Limitations of Current ML Models

In the last couple of years, we have seen many machine learning models deployed in production across several industries; no vertical has stayed away. Put yourself in the shoes of a businessperson. Would you depend on predictions made on today’s data by a model developed a year ago? Let us say you are a mall operator. With the ongoing pandemic, an analysis of customer visiting patterns and buying trends produced by your old static model will not give you decent predictions. Re-training the machine learning algorithm on the new data helps, but a one-off re-train is only an interim solution. What people have now realized is that we need a dynamic machine learning model: a model that is continuously re-trained so that it keeps up with pattern variations over time. Are we talking about time-series analysis? The answer to this question is yes.

Challenges in Dynamic Modeling

Can we implement time-series analysis on highly dynamic, high-frequency data? Time-series analysis works fine when we observe periodic patterns in the data. For example, during Christmas the sale of toys shoots up, and during winter there are few outdoor sports. Nature and festivals decide such periodicity, and it is reasonably predictable. Nobody anticipated the economic changes that we are observing today with the current pandemic. Even after several phases of Unlock, the footfall in the malls is low. Customer buying patterns have changed considerably, beyond what we imagined. You will now need to re-develop existing models and re-train them more frequently as we move towards a new normal.

The second challenge for us is to address high-frequency data. All online businesses, whether they deliver essential or non-essential goods, now handle many times more transactions every day. To do a time-series analysis, we need to fix the window size. For the telecom client that I mentioned above, or for companies like Amazon and Swiggy, a one-second window would contain several thousand data points, as they collect terabytes of data every 24 hours. Your model now needs to be trained on these thousands of data points in less than a second. You also need to use the latest copy of the trained model for real-time inference. In short, you need to provide a super-fast interface to these models. The challenge is to find a web framework that meets these requirements.
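To make the idea of a fixed window concrete, here is a minimal Python sketch that groups a stream of timestamped events into one-second windows. The function name `bucket_by_window`, the `(timestamp, value)` event shape, and the sample values are all assumptions made for illustration; a real system at this scale would likely do the same thing inside a stream processor rather than on an in-memory list.

```python
from collections import defaultdict


def bucket_by_window(events, window_seconds=1.0):
    """Group (timestamp, value) pairs into fixed-size time windows.

    Returns a dict mapping window index -> list of values in that window.
    """
    windows = defaultdict(list)
    for timestamp, value in events:
        # Floor division assigns each event to the window it falls in.
        index = int(timestamp // window_seconds)
        windows[index].append(value)
    return dict(windows)


# Hypothetical events: (seconds since start, observed value).
events = [(0.10, 5), (0.75, 7), (1.20, 6), (1.90, 9), (2.05, 4)]
windows = bucket_by_window(events)

for index in sorted(windows):
    print(index, windows[index])
```

Each window can then be handed to a training step on its own, which is what keeps the per-step data volume small.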

Having outlined the challenges, let us look at the benefits that dynamic modeling would provide us.

Benefits of Dynamic Models

Switching to time-series analysis provides certain benefits in model development. Conventionally, we have preferred a very large dataset for model training. Loading such datasets required large amounts of memory, and the training required enormous computing resources. Training times ran into hours, if not days. With dynamic data analysis, we would now work with much smaller datasets: a dataset that fits into a time window of our choosing. Thus, many of the traditional problems mentioned above would largely disappear. Second, as we re-train the model periodically, it will learn new patterns in the data and could thus make more accurate predictions that meet the customer’s needs. Indeed, the customer would be thrilled to get predictions from an ML model on live data.
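As a rough illustration of re-training on a small window, the sketch below keeps only the most recent points and refits a simple straight line every time a new point arrives. The class name `WindowedLinearModel`, the window size of 4, and the two synthetic patterns are assumptions chosen to keep the example short; a straight-line fit stands in for whatever model you would actually use.

```python
from collections import deque


class WindowedLinearModel:
    """Fits y = slope * x + intercept on only the most recent points."""

    def __init__(self, window_size):
        # A deque with maxlen silently drops the oldest point when full.
        self.points = deque(maxlen=window_size)
        self.slope = 0.0
        self.intercept = 0.0

    def update(self, x, y):
        """Add one observation and refit on the current window."""
        self.points.append((x, y))
        self._refit()

    def _refit(self):
        # Closed-form least-squares fit for a single feature.
        n = len(self.points)
        mean_x = sum(x for x, _ in self.points) / n
        mean_y = sum(y for _, y in self.points) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in self.points)
        if var_x == 0:
            # Not enough spread in x to estimate a slope yet.
            self.slope, self.intercept = 0.0, mean_y
            return
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in self.points)
        self.slope = cov_xy / var_x
        self.intercept = mean_y - self.slope * mean_x

    def predict(self, x):
        return self.slope * x + self.intercept


model = WindowedLinearModel(window_size=4)

# Old pattern (synthetic): y = 2x.
for x in range(4):
    model.update(x, 2 * x)
old_slope = model.slope

# The pattern changes (synthetic): y = -x + 20.
for x in range(4, 8):
    model.update(x, -x + 20)
new_slope = model.slope

print(old_slope, new_slope)
```

Because the window holds only four points, the old pattern is fully forgotten once four new observations have arrived. A larger window would adapt more slowly but be less sensitive to noise.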

Final Thoughts

Do you see the challenges in developing ML applications that meet these new requirements? Not that the problem is unsolvable; several companies have implemented solutions for high-frequency dynamic data. In a later post, I will provide you with a few solutions and guidelines for meeting these new data requirements.

Conclusion

Whether you are a businessperson seeking analysis of your live data or a data scientist developing solutions for clients with dynamically changing, high-frequency datasets, I hope I have set your mind spinning.

Credits

Pooja Gramopadhye — Copy editing