Traditional vs Deep Learning Algorithms in the Telecom Industry — Cloud Architecture and Algorithm Categorization

Google Cloud Architecture for Machine Learning Algorithms in the Telecom Industry


The unprecedented growth of mobile devices, applications, and services has placed heavy demand on mobile and wireless networking infrastructure. Rapid research and development of 5G systems has found ways to support mobile traffic volumes, real-time extraction of fine-grained analytics, and agile management of network resources, so as to maximize user experience.

Moreover, inference on heterogeneous mobile data from distributed devices is challenging because of computational and battery power limitations. As a result, models deployed at the edge are constrained to be lightweight, trading model complexity against accuracy. Model compression, pruning, and quantization are also widely used.
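To make pruning and quantization concrete, here is a minimal sketch in plain Python. It assumes magnitude-based pruning and symmetric 8-bit linear quantization, which are common choices but not necessarily the ones any particular mobile framework uses. The function names and the weight values are hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(len(weights) * sparsity)  # number of weights to drop
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, dropped = [], 0
    for w in weights:
        if abs(w) <= threshold and dropped < k:
            pruned.append(0.0)
            dropped += 1
        else:
            pruned.append(w)
    return pruned


def quantize_int8(weights):
    """Symmetric linear quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    return [round(w / scale) for w in weights], scale


def dequantize(q_weights, scale):
    """Map the integers back to approximate float weights."""
    return [q * scale for q in q_weights]


# Hypothetical layer weights, for illustration only.
weights = [0.82, -0.03, 0.40, 0.01, -0.65, 0.07]
pruned = prune_by_magnitude(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
```

Each restored weight differs from its pruned value by at most half a quantization step (`scale / 2`), which is the accuracy given up in exchange for storing one byte per weight instead of a float.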

In this blog, we try to understand the different use-cases, problems, and solutions that can be leveraged with ML as follows:

  • Different telecom use cases solved by traditional ML models to improve customer satisfaction and end-user experience, and in turn ROI.
  • Limitations of traditional models, the evolution of deep learning models, and their usage in the telecom industry.
  • Categorization of different ML models and how they fit in an end-to-end cloud architecture, from app-level data ingestion to running predictive models in the pipeline.

Use cases of traditional Machine Learning algorithms

In this section, let’s look at the different use cases in the telecom industry where different ML and AI algorithms have played a significant role in network traffic prediction, customer retention, and fraud analysis.

Smart traffic prediction and path optimization

The network and service control layer contains multi-dimension convergent management and control functions to manage and control traditional and SDN/NFV cloud networks.

Adding AI reasoning capability would allow intelligent network operations and management. Network performance data can help identify sleeping cells and trigger an automatic restart, and can support network optimization (coverage optimization, capacity optimization, Massive MIMO optimization), Root Cause Analysis (RCA), intelligent transmission route optimization, and network strategy optimization.


Features governing Network security include:

  • Fast tracing and filtering records with Naïve Bayesian Classification, Support Vector Machine, K Nearest Neighbor, Neural Network.
  • Rule extraction with Ensemble Methods like Aggregated Decision Trees.
  • Identification and interception of malicious behaviors, prevention of attacks, etc. with Naive Bayes, Multilayer Perceptron Neural Networks (MLPNNs), Radial Basis Function Neural Networks (RBFNNs), and SVM algorithms.

Sentiment analysis with social media

Network operators have turned to Machine Learning to analyze brand coverage and customer sentiment. Social posts help them monitor language patterns and sentiment to identify trends, such as the factors driving new customers to subscribe or the point at which subscribers seek out a competitor.

System design and architecture for highly accurate customer churn modeling 

Customer Service Recommendation and Business Personalization

Service recommenders may also be used to boost existing services or to identify why users do not adopt some services and, in turn, suggest value-added services to them based on their profile and choices. In addition, they can predict churn based on the usage patterns of past churners and changes in other usage profiles.

The below figure illustrates an SVM (Support Vector Machine)-based music recommendation system that extracts personal user-level information, timing, location, activity records, along with musical context to suggest suitable music services.

Music recommendation system, a VAS leveraged by telecom operators

With customer-generated network data, it is easier to automate the process of grouping customers into segments, like profiling customers based on their calling and messaging behavior.

Personalized ads

Operators try to present product/service advertisements that are tailored to an individual, situation, and device. This type of targeted advertising, when directed at the intended customer base, helps operators and advertisers zero in on customers with ads that fit their needs and interests.

Customer segmentation on call-records

Clustering and classification techniques such as K-means group mobile customers based on their call detail records (CDRs) so that their consumer behavior can be analyzed. PCA-based dimensionality reduction can be used to identify relevant and recurrent patterns (e.g. location, to identify common presence patterns) among the CDRs of a given user. Further, matrix factorization is employed to infer location preferences from sparse CDR data and generate location-based recommendations.
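As an illustration of the clustering step, the sketch below runs K-means on a handful of per-subscriber features derived from CDRs. The feature choice (calls per day, SMS per day), the sample values, and the deterministic farthest-point initialisation are all assumptions made for this example. A real pipeline would scale the features first and would normally use a library implementation.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-means with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical per-subscriber features: (calls per day, SMS per day).
subscribers = [(2, 30), (3, 28), (1, 35), (25, 2), (30, 1), (27, 4)]
labels, centroids = kmeans(subscribers, k=2)
```

With this toy data the first three subscribers (heavy messaging, few calls) land in one segment and the last three (heavy calling) in the other.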

Clustering and Classification phase for predicting user-churn

Customer Churn Prediction

The above figure illustrates the application of SVM, Naive Bayes, Decision Tree, Boosting, Bagging, Random Forest in Customer Churn Prediction through supervised/unsupervised (clustering) techniques.

Traffic Flow Prediction

  • k-NN, Linear Discriminant Analysis (LDA), SVM, and Decision Trees are used to map network traffic into different classes of interest based on QoS requirements. The traffic classification framework uses statistics that are insensitive to the application protocol, including both packet-level and flow-level features.
  • Flow clustering using Expectation Maximization (EM): Based on flow features (packet length, inter-arrival time, byte count, etc.), the EM algorithm groups the traffic into a small number of clusters.
  • AutoClass: An unsupervised Bayesian classifier that uses the EM algorithm to select the best clusters from a set of training data. Because EM only converges to a local maximum, it repeats the EM search multiple times to improve the chance of reaching the global maximum.
  • K-means: Unsupervised ML using the first few packets of a traffic flow. The assumption is that the first few packets capture the application negotiation phase, which is distinct among applications.
  • Density-based spatial clustering (DBSCAN) can handle noisy data, in contrast to K-means and AutoClass.
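The noise handling that sets DBSCAN apart can be seen in a short sketch. The flow features below (normalised mean packet length and mean inter-arrival time) and the `eps` / `min_pts` values are hypothetical. The implementation is a straightforward, unoptimised version of the algorithm, intended only to show how outlying flows end up labelled as noise instead of being forced into a cluster.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster id per point; NOISE marks outliers."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nbrs)
    return labels


# Hypothetical flows: (normalised mean packet length, normalised inter-arrival time).
flows = [
    (0.10, 0.10), (0.12, 0.11), (0.11, 0.13),
    (0.80, 0.82), (0.82, 0.80), (0.81, 0.79),
    (0.45, 0.95),
]
flow_labels = dbscan(flows, eps=0.05, min_pts=3)
```

The last flow sits far from both dense groups and is labelled `NOISE`, whereas K-means would have assigned it to the nearest centroid.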

Profiling by association: PBA takes as input an IP-to-IP connectivity graph and information about a small subset of IP hosts and produces a prediction about the class of all the flows (edges) in the graph.

Topic Models for Mobile Short Message Service Communication

Latent Dirichlet Allocation (LDA), a generative topic modeling technique, is used to extract latent features arising from mobile Short Messaging Service (SMS) communication for automatic discovery of user interest. The mobile SMS documents are partitioned into segments, wherein the discovered topics in each segment are propagated to influence the discovery of latent features. This technique filters malicious mobile SMS communication. Topic models can effectively detect distinctive latent features to support automatic content filtering and remove security threats to mobile subscribers and operators.

Customer Segmentation

Segmenting customer profiles by clustering requires complex models based on multivariate time series analysis, which have limitations around scalability and the ability to accurately represent the temporal behavior sequences (TBS) of users, illustrated in the figures below. TBS may be short, noisy, and non-stationary; here the LDA model is well suited to representing the temporal behavior of mobile subscribers as compact and interpretable profiles, relaxing the strict temporal ordering of user preferences.

Categorization of Deep Learning algorithms and their use cases in the Telecom Industry, Source –

The model generating subscriber behavior documents (left); LDA model to generate interpretable subscriber profiles (right). MTS: Multivariate Time Series.

Categorization of Deep Learning algorithms and their use cases in the Telecom Industry

Advantages of Deep Learning in Mobile and Wireless Networking

The Telecom industry acknowledges several benefits of employing Deep Learning to address network engineering problems:

  • Traditional ML algorithms require feature engineering, which is expensive. Deep learning can automatically extract high-level features from data that has a complex structure and inner correlations. Automating feature engineering matters particularly in mobile networks, as mobile data is generated by heterogeneous sources, is often noisy, and exhibits non-trivial spatial/temporal patterns whose labeling requires substantial human effort.
  • Deep Learning is capable of handling large amounts of data and controlling model over-fitting. Deep models are suited to the high volumes of different types of data generated by mobile networks at a fast pace. Training traditional ML algorithms, e.g., Support Vector Machine (SVM) and Gaussian Process (GP), sometimes requires storing all the data in memory, which is computationally infeasible under big data scenarios. In contrast, Stochastic Gradient Descent (SGD), which is used to train NNs, only requires a subset of the data at each training step.
  • Traditional supervised learning is only effective when sufficient labeled data is available. However, most current mobile systems generate unlabeled or semi-labeled data, where Deep Learning approaches such as the Restricted Boltzmann Machine (RBM), Generative Adversarial Network (GAN), and one/zero-shot learning have wider applicability to telecom domain problems.
  • Compressive representations learned by deep neural networks can be shared across different networks/telecom providers, while this is limited or difficult to achieve in other ML paradigms (e.g., linear regression, random forest, etc.). Therefore, a single model can be trained to fulfill multiple objectives without requiring complete retraining for different tasks, thereby saving CPU and memory in mobile networks.
  • Deep Learning is effective in handling multivariate geometric mobile data, such as user location represented by coordinates, topology, metrics, and order, through dedicated Deep Learning architectures such as PointNet++ and Graph CNN.
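The memory argument for SGD made in the list above can be illustrated with a tiny mini-batch training loop. This sketch fits a one-variable linear model, not a neural network, and the data, learning rate, and batch size are hypothetical. The point is only that each update reads `batch_size` samples, so the full dataset never has to be processed in a single step.

```python
import random


def sgd_linear_regression(xs, ys, batch_size=10, lr=0.1, epochs=200, seed=0):
    """Fit y = w * x + b by mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]  # only this subset is used
            grad_w = grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2 * err * xs[i] / len(batch)
                grad_b += 2 * err / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [i / 99 for i in range(100)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_regression(xs, ys)
```

In a streaming setting the index list would be replaced by batches read from storage or a message queue, which is what makes the approach workable at mobile-network data volumes.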

PointNet++ Architecture (left) and Graph CNN Architecture (right)

  • Hierarchical neural network similar to conventional CNNs
  • Applies PointNet recursively on a nested partitioning of the input point set
  • Better able to capture local structures and finer details

Although Deep Learning models pose challenges, emerging tools and technologies make them feasible in mobile networks, as illustrated in the figures below: (i) Advanced Parallel Computing, (ii) Distributed Machine Learning Systems, (iii) Dedicated Deep Learning libraries, (iv) Fast optimization algorithms, and (v) Fog Computing.

CPU/GPU/Processing capability to support Deep Learning Architectures

Deep Learning has a wide range of applications in mobile and wireless networks.

  • Mobile big data collected within the network helps in traffic classification, and Call Detail Record (CDR) mining.
  • Deep Learning-Driven App-Level Mobile Data Analysis shifts the attention towards mobile data analytics on edge devices.
  • Deep Learning-Driven User Mobility Analysis identifies movement patterns of mobile users, either at group or individual levels.
  • Deep Learning-Driven User Localization helps localize users in indoor or outdoor environments, based on different signals received from mobile devices or wireless channels.
  • Deep Learning-Driven Wireless Sensor Networks find application in centralized vs. de-centralized sensing, WSN data analysis, WSN localization and other applications.
  • Deep Learning-Driven Network Control finds the usage of deep reinforcement learning and deep imitation learning on network optimization, routing, scheduling, resource allocation, and radio control.
  • Deep Learning-Driven Network Security leverages Deep Learning to improve network security, which we cluster by focus as infrastructure, software, and privacy-related.
  • Deep Learning-Driven Signal Processing scrutinizes physical layer aspects that benefit from Deep Learning.
  • Deep Learning-based RCNN and Fast-RCNN algorithms are used in telecom inventory management via object recognition and localization on Google Street View images.
  • Media recognition (applied on pictures, sound, video and traffic bursts)/Photo-tagging helps subscribers learn and classify known patterns in a collaborative image-classification system and then use this to identify the category to which previously unseen patterns belong. Transfer Deep Learning approach with ontology priors provides effective means of discovering intermediate image representations from deep networks and ensures good generalization abilities across two different domains (Web images as the source domain and personal photos as the target).

Now, let’s take a quick look at the different Deep Learning platforms available, along with the mobile hardware they support, their speed, and their mobile compatibility.

Comparison of Mobile Deep Learning Models

Cloud Architecture with mobile data ingestion and Model Training, Prediction

The figure below depicts the different components involved in building the ML platform:

Network Monitoring/Optimization, Media Settlement, Advertising, Audience Orientation, Pattern Recognition, Sensor Data Mining and Mobility Analytics

  • Incoming real-time data from mobile SDK
  • Real-time data collection and computing engine receiving data from SDK, with a messaging pipeline to cache frequently received records
  • Offline Computing and Analysis Engine
  • BI and Data Warehousing Engine

Cloud Architecture with GCP for telecom Machine Learning and AI algorithms


Network Monitoring and Optimization

Network State Prediction refers to inferring mobile network traffic or performance indicators from historical cellular measurements at the eNodeB, sector, and carrier level. MLPs and LSTM-based Deep Learning techniques are used to predict users’ QoE and to select the best beam for transmission based on:

  • Average user throughput
  • Number of active users in a cell
  • Average data volume per user
  • Channel quality indicators (uplink and downlink)
  • Beam Index (BI)
  • Beam Reference Signal Received Power (BRSRP)
  • Distance (of UE to serving cell site)
  • Position (GPS location of UE)
  • Speed (UE mobility)
  • Channel Quality Indicator (CQI)
  • Historic values based on past events and measurements including previous serving beam information, time spent on each serving beam, and distance trends
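Before any of these indicators can be fed to an MLP or LSTM, the historical measurements have to be framed as supervised samples. The sketch below shows one common way to do that with a sliding window. The function name, window length, and throughput values are hypothetical, and a real pipeline would stack several KPIs per time step where this example uses a single series.

```python
def make_windows(series, window, horizon=1):
    """Turn a KPI time series into (history, target) training pairs."""
    samples = []
    for t in range(len(series) - window - horizon + 1):
        history = series[t:t + window]               # model input
        target = series[t + window + horizon - 1]    # value to predict
        samples.append((history, target))
    return samples


# Hypothetical average user throughput per interval, in Mbps.
throughput_mbps = [12.0, 14.5, 13.2, 15.1, 16.0, 15.4]
pairs = make_windows(throughput_mbps, window=3)
```

Each pair holds the last three measurements and the value that followed them, which is the shape a sequence model expects for one-step-ahead prediction.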

By leveraging sparse coding and max-pooling, semi-supervised Deep Learning models have been developed to classify received frame/packet patterns and infer the original properties of flows in a WiFi network.

Mobility metrics for network capacity estimation

Further, AI-capable 5G networks aid in:

  • Building a panoramic data map of each network slice based on user subscription, network performance, QoS, and event logs
  • Forecasting network resources
  • Anticipating network outages, equipment failures, and performance degradation
  • Predicting UE mobility in 5G networks, allowing Access and Mobility Management Function (AMF) to update mobility patterns based on user subscription, historical statistics, and instantaneous radio conditions.
  • Enhancing security in 5G networks, preventing attacks and frauds by recognizing user patterns, and tagging certain events to prevent similar attacks in the future.

Predicting Mobile traffic at city scale

  • Spatio-temporal correlations of geographic mobile traffic can be predicted with an AE-based architecture and LSTMs. Global and multiple local stacked autoencoders (AEs) are used for spatial feature extraction, dimension reduction, and training parallelism, while the compressed representations they extract are subsequently processed by LSTMs to perform the final forecasting. The following figure illustrates a typical AE-LSTM architecture, where the autoencoder extracts features and the LSTM predicts the traffic flow:

AE-LSTM for traffic flow prediction

  • Hybrid Multimodal Deep Learning method can be used for short-term traffic flow forecasting. The model, as illustrated in the figure below, is composed of one-dimensional Convolutional Neural Networks (1D CNN) and Gated Recurrent Units (GRU) with the attention mechanism, and can jointly and adaptively learn the spatial-temporal correlation features and long temporal interdependence of multi-modality traffic data.

Hybrid Multimodal Deep Learning framework for traffic flow forecasting

  • Multiple 3D Convolutional Neural Networks (3D-CNNs) learn the spatio-temporal correlation features of traffic data jointly, from low-level to high-level layers.

Multiple 3D CNN architecture

  • Other commonly used traditional ML models for modeling Spatio-temporal characteristics include SVM and the Autoregressive Integrated Moving Average (ARIMA).
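For comparison with the deep models, a classical autoregressive baseline is easy to write down. The sketch below fits an AR(1) model, i.e. the simplest member of the ARIMA family with no differencing and no moving-average term, by ordinary least squares. It is a simplified stand-in chosen for brevity, not a full ARIMA implementation, and the traffic values are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]."""
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


def forecast(series, c, phi, steps):
    """Roll the fitted recurrence forward from the last observation."""
    out, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out


# Hypothetical traffic volumes generated by x[t] = 10 + 0.5 * x[t-1].
traffic = [4.0, 12.0, 16.0, 18.0, 19.0, 19.5]
c, phi = fit_ar1(traffic)
next_values = forecast(traffic, c, phi, steps=2)
```

A model like this only sees one cell's own history, which is why it struggles with the cross-region correlations that the CNN and LSTM architectures are designed to capture.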

An ST-DenNetFus-based Deep Learning framework is used to predict network demand (i.e. uplink and downlink throughput) in every region of a city, as illustrated in the figure below. The ST-DenNetFus architecture captures unique properties of spatio-temporal data (e.g., temporal closeness, period, and trend) through several branches of dense neural networks (CNNs). It also introduces extra branches for fusing external data sources of various dimensionalities (e.g., crowd mobility patterns, temporal functional regions, and the day of the week) that had not previously been considered in the network demand prediction problem.

ST-DenNetFus Architecture

  • Mobile Traffic Super-Resolution (MTSR) technique is used to infer network-wide fine-grained mobile traffic consumption given coarse-grained counterparts obtained by probing. MTSR works on the principle of image super-resolution, designed with a dedicated CNN with multiple skip connections between layers, named deep zipper network, along with a Generative Adversarial Network (GAN). This helps perform precise MTSR, reduces traffic measurement overheads and improves the fidelity of inferred traffic snapshots.

GAN operating principle in the MTSR problem. The generator is employed in the prediction phase.

  • MLPs, CNNs, and LSTMs perform encrypted mobile traffic classification, as deep NNs can automatically extract complex features (e.g., identify protocols in a TCP flow dataset). CNNs have also been used to identify malware traffic, where traffic is represented as images and the unusual patterns that malware traffic exhibits are classified by representation learning.
  • CDR Mining involves extracting knowledge from specific instances of telecommunication transactions such as phone number, cell ID, session start/end time, traffic consumption, etc. Using Deep Learning to mine useful information from CDR data can serve a variety of functions, including:
  1. Estimating metro density from streaming CDR data using RNNs. The trajectory of a mobile phone user is treated as a sequence of locations, which is fed to RNN-based models that handle sequential data.
  2. Studying demographics, where a CNN is used to predict the age and gender of mobile users.
  3. Predicting tourists’ next locations.
  4. Generating human activity chains with an input-output HMM-LSTM generative model.
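The sequence framing used in these CDR tasks can be shown with a much simpler model than an RNN. The sketch below is a first-order Markov baseline for next-location prediction: it counts observed cell-to-cell transitions and predicts the most frequent successor. It is a deliberately simple substitute for the RNN-based predictors mentioned above, and the cell identifiers and trajectories are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(trajectories):
    """Count how often each cell is followed by each other cell."""
    counts = defaultdict(Counter)
    for cells in trajectories:
        for here, nxt in zip(cells, cells[1:]):
            counts[here][nxt] += 1
    return counts


def predict_next(counts, current):
    """Most frequent successor of `current`, or None if it was never seen."""
    if current not in counts:
        return None
    return counts[current].most_common(1)[0][0]


# Hypothetical per-user sequences of serving cell IDs extracted from CDRs.
trajectories = [
    ["cell_A", "cell_B", "cell_C"],
    ["cell_A", "cell_B", "cell_D"],
    ["cell_E", "cell_B", "cell_C"],
]
model = fit_transitions(trajectories)
```

Here `predict_next(model, "cell_B")` returns `"cell_C"`, the successor seen twice out of three times. An RNN replaces the count table with a learned hidden state, so it can condition on the whole preceding trajectory and not only the current cell.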

CDR Analysis Pipeline

RNN-based predictors significantly outperform traditional ML methods, including Naive Bayes, SVM, RF and MLP.

Deep Learning-Driven App-level Mobile Data Analysis

Analysis of mobile data has become an important and popular research direction in the mobile networking domain, as the rapid emergence of IoT sensors and their data collection strategies provides a powerful basis for app-level data mining.

App-level mobile data analysis follows two approaches: (i) cloud-based computing and (ii) edge-based computing. In the former, mobile devices act as data collectors and messengers that constantly send data to cloud servers via local points of access with limited data pre-processing capabilities. In edge-based computing, pre-trained models are offloaded from the cloud to individual mobile devices. The primary applications include mobile healthcare, mobile pattern recognition, mobile Natural Language Processing (NLP), and Automatic Speech Recognition (ASR).

Mobile Health: Wearable health monitoring devices now on the market incorporate medical sensors that capture the physical condition of their wearers and provide real-time feedback (e.g., heart rate, blood pressure, breath status, etc.) or trigger alarms to remind users to take medical action.

The Deep Learning-driven MobiEar, which helps deaf people stay aware of emergencies, operates efficiently on smartphones and only requires infrequent communication with servers for updates. UbiEar, an acoustic event sensing and notification system built on a lightweight CNN architecture, operates on the Android platform and assists hard-of-hearing users in recognizing acoustic events without requiring location information.

Deep Learning (DL) models such as CNNs and RNNs can classify the lifestyle and environmental traits of volunteers and perform different types of Human Activity Recognition on heterogeneous, high-dimensional mobile sensor data, including accelerometer, magnetometer, and gyroscope measurements. ConvLSTMs are used to fuse data gathered from multiple sensors and perform activity recognition.

Mobile motion sensors capture the specific actions and activities that a human subject performs, through video capture, accelerometer readings, and Passive Infra-Red (PIR) motion sensing. Such models, trained on a server for domain-specific tasks through federated learning, ultimately serve a broad range of devices.

Mobile Pattern Recognition relies on patterns observed in the output of the mobile camera or other sensors. All these DL models demonstrate superior prediction accuracy over RFs and logistic regression.

Object Classification is widely used on mobile devices, which take photos and rely on a photo-tagging process. One such DL-based framework is DeepCham, which generates high-quality domain-aware training instances for adaptation from in-situ mobile photos. It has a distributed algorithm that identifies qualifying images stored on each mobile device for training, and a user labeling process for recognizable objects identified from qualifying images, using suggestions automatically generated by a generic deep model.

Mobile classifiers can also assist Virtual Reality (VR) applications, where Deep Learning object detectors are incorporated into a mobile Augmented Reality (AR) system. CNN-based frameworks are also used for facial expression recognition when users wear head-mounted displays in the VR environment.

The figure below demonstrates a lightweight Deep Learning-based object detection framework that combines spatial relations for:

  • Training and detection with the lightweight Single Shot Detector (SSD)
  • Combination of vision-based detection results and spatial relationships
  • Registration, geo-visualization and interaction

Mobile Outdoor Augmented Reality method

The figure below demonstrates app-level data collection and transfer from edge devices to the cloud for algorithm training and prediction.

Deep Learning-Driven Mobility Analysis

Mobility data is usually subject to stochasticity, loss, and noise, which makes precise modeling difficult. As Deep Learning is able to perform automatic feature extraction, it becomes a strong candidate for human mobility modeling. CNNs and RNNs are the most successful architectures in such applications, as they can effectively exploit spatial and temporal correlations.

  • The “DeepSpace” model, built with a hierarchical CNN structure, predicts individuals’ trajectories/moving paths with much higher accuracy than naive CNNs, stacked RNNs and LSTMs, n-grams, and the k-nearest neighbor method. The framework runs two kinds of prediction models in parallel, a coarse prediction model and fine prediction models, to deal with the continuous mobile data stream, and supports online training and learning to extract the optimal feature set size for the online data.

Hierarchical framework with coarse model and fine models, suited for spatial mobile data in an online learning system

  • The “DeepMove” model predicts human mobility from lengthy and sparse trajectories using an attentional recurrent network. DeepMove is first designed as a multi-modal embedding recurrent neural network that captures complicated sequential transitions by jointly embedding the multiple factors that govern human mobility. It is then extended with a historical attention model to capture multi-level periodicity. As illustrated in the following figure, the historical attention module is equipped with an auto-selector, comprised of two components:

(i) an attention candidate generator, which generates the candidates, i.e., the regularities of the mobility, and (ii) an attention selector, which matches the candidate vectors with the query vector, i.e., the current mobility status.
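The matching step performed by an attention selector can be sketched in a few lines. The example below assumes dot-product scoring followed by a softmax, which is a common attention formulation. The scoring function DeepMove actually uses may differ, and the vectors here are hypothetical two-dimensional stand-ins for learned embeddings.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, candidates):
    """Weight candidate vectors by their similarity to the query."""
    scores = [sum(q * c for q, c in zip(query, cand)) for cand in candidates]
    weights = softmax(scores)
    context = [
        sum(w * cand[d] for w, cand in zip(weights, candidates))
        for d in range(len(query))
    ]
    return weights, context


# Hypothetical embeddings: current mobility status and three historical candidates.
query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = attend(query, candidates)
```

The candidate most similar to the current status receives the largest weight, and the resulting context vector is what gets combined with the recurrent state to make the prediction.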

Architecture of DeepMove

  • GPS records and traffic accident data are combined to understand the correlation between human mobility and traffic accidents. The design includes a stacked de-noising autoencoder that learns a compact representation of human mobility, which is subsequently used to predict traffic accident risk.
  • DBNs (Deep Belief Networks) are employed to predict and simulate human emergency behavior and mobility in a natural disaster, learning from GPS records of 1.6 million users.
  • A Deep Learning-based approach called ST-ResNet (illustrated in the figure below) is used to collectively forecast the inflow and outflow of crowds in each and every region of a city. The architecture of the ST-ResNet (residual neural network framework) is based on unique properties of Spatio-temporal data, to model the temporal closeness, period, and trend properties of crowd traffic. Each property is designed to have a branch of residual convolutional units, which models the spatial properties of crowd traffic. ST-ResNet learns to dynamically aggregate the output of the three residual neural networks based on data, assigning different weights to different branches and regions, along with external factors, such as weather and day of the week.
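The branch aggregation in ST-ResNet can be sketched as an element-wise weighted sum. The version below assumes the fusion takes the form tanh(W_c ∘ X_c + W_p ∘ X_p + W_q ∘ X_q + X_ext), where ∘ is element-wise multiplication over the city grid. The branch outputs and weights shown are hypothetical. In the real model the weights are learned and the branch outputs come from stacks of residual units.

```python
import math


def fuse(closeness, period, trend, w_c, w_p, w_q, external):
    """Per-region weighted fusion of three branch outputs plus external factors."""
    fused = []
    for row in range(len(closeness)):
        fused_row = []
        for col in range(len(closeness[0])):
            res = (
                w_c[row][col] * closeness[row][col]
                + w_p[row][col] * period[row][col]
                + w_q[row][col] * trend[row][col]
            )
            # tanh squashes the prediction into (-1, 1), matching scaled flow values.
            fused_row.append(math.tanh(res + external[row][col]))
        fused.append(fused_row)
    return fused


# Hypothetical 1 x 2 city grid with one value per region and branch.
closeness = [[1.0, 2.0]]
period = [[0.5, 0.0]]
trend = [[0.0, 1.0]]
w_c, w_p, w_q = [[0.5, 0.5]], [[1.0, 1.0]], [[0.0, 1.0]]
external = [[0.0, -2.0]]
fused = fuse(closeness, period, trend, w_c, w_p, w_q, external)
```

Because every region has its own weights, a business district can lean on the period branch while a residential area leans on recent closeness, which is the "different weights to different branches and regions" behaviour described above.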

ST-ResNet architecture. Conv: Convolution; ResUnit: Residual Unit; FC: Fully-connected.

Deep Learning Driven User Localization

Location-based services and applications (e.g. mobile AR, GPS) demand precise individual positioning technology. Deep Learning can enable high localization accuracy with both device-free and device-based localization services.

Limitations of Deep Learning in Mobile and Wireless Networking

Although Deep Learning has unique advantages when addressing mobile network problems, it also has several shortcomings, which partially restrict its applicability in this domain. Specifically:

  • Deep Learning (including deep reinforcement learning) is vulnerable to adversarial/cyber attacks (especially CNNs), in which inputs are intentionally crafted by an attacker to fool Machine Learning models into making mistakes. Such inputs can trigger mis-adjustments of a model with high likelihood.
  • Deep Learning algorithms are largely black boxes and have low interpretability. This limits the applicability of Deep Learning, e.g. in network economics. Still, businesses continue to employ statistical methods that have high interpretability, whilst sacrificing on accuracy that could be attainable from Deep Learning models.
  • Deep Learning is heavily reliant on data, and models further benefit from training data augmentation. This creates an opportunity for mobile networking, as networks generate tremendous amounts of data. However, data collection may be costly and raise privacy concerns, so it may be difficult to obtain sufficient information for model training.
  • Deep Learning can be computationally demanding and heavily relies on advanced parallel computing (e.g., GPUs, high-performance chips). Deploying Neural Networks on embedded and mobile devices has additional constraints on energy and capability.
  • Deep neural networks usually have many hyperparameters (e.g., for a CNN, the number, shape, stride, and dilation of filters, as well as the residual connections), and finding their optimal configuration can be difficult. The AutoML platform provides a first solution to this problem by employing progressive neural architecture search.
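To give a feel for why the hyperparameter problem is hard, the sketch below runs a plain random search over a small CNN configuration space. Random search is a much simpler technique than the progressive neural architecture search mentioned above and is used here only as an illustration. The search space and the `validation_loss` function are hypothetical stand-ins for actually training and evaluating a network.

```python
import random

# Hypothetical search space for one convolutional layer.
SEARCH_SPACE = {
    "num_filters": [16, 32, 64],
    "kernel_size": [3, 5, 7],
    "stride": [1, 2],
    "dilation": [1, 2, 4],
}


def validation_loss(config):
    """Stand-in for training a CNN with `config` and measuring validation loss."""
    return (
        abs(config["num_filters"] - 32) / 32
        + abs(config["kernel_size"] - 5) / 5
        + (config["stride"] - 1)
        + abs(config["dilation"] - 2) / 2
    )


def random_search(trials=50, seed=0):
    """Sample configurations at random and keep the best one seen."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(trials):
        config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        loss = validation_loss(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


best_config, best_loss = random_search()
```

Even this tiny space has 54 combinations for a single layer. With each evaluation costing a full training run, the search budget grows quickly, which is the motivation for automated architecture search.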


In this blog, we discussed traditional vs Deep Learning algorithms, DL-based architectures, their pros and cons, and their applications in the telecom industry. We also explored the data ingestion, categorization, and model deployment architecture in production. We looked at recent advances in ML-driven mobile-app development (in object detection, speaker identification, emotion recognition, stress detection, and ambient scene analysis), technologies for conserving limited mobile battery by building memory- and energy-efficient apps, and model compression techniques.


  1. Machine-learning technologies in telecommunications
  2. Deep Learning in Mobile and Wireless Networking: A Survey