

For those who understand its real-world applications and its potential, artificial intelligence is among the most valuable tools we have today. From disease detection and drug discovery to climate change models, AI is continually offering the insights and solutions that are helping us address the most pressing challenges of our time. 

In financial services, one of the main problems we face is inequality in financial inclusion. This inequality is driven by many factors, but the common denominator in most cases is data, or the lack of it. Data is the lifeblood of most organizations, and especially of those seeking to implement advanced automation through AI and machine learning. It therefore falls to financial services organizations and the data science community to understand how models can be used to create a more inclusive financial services landscape.

Lending a hand

Lending is an essential financial service today. It drives revenue for banks and loan providers, and it provides a core service for both individuals and businesses. Loans can offer a lifeline during difficult times, or the boost a fledgling start-up needs. But in each case, loan risk must be evaluated.

The majority of loan default risk today is calculated by automated tools. Increasingly, this automation is provided by algorithms that greatly expedite the decision-making process. The data that informs these models is extensive, but as with any decision-making algorithm, they tend to deliver accurate outcomes for the majority group while leaving certain individuals and minority groups disadvantaged, depending on the model used.

This business model is, of course, unsustainable, which is why loan providers must consider the more nuanced factors behind making “the right decision”. With the demand for loans booming, particularly as point-of-sale loans such as buy-now-pay-later offer new and flexible ways to gain credit, there is now a wealth of competition in the industry, with traditional lenders, challengers and fintechs all vying for market share. As regulatory and social pressure continues to grow around fairness and equitable outcomes, organizations that prioritize and codify these principles within their business and data science models will become increasingly attractive to customers. 

Building for fairness

When a loan risk model rejects applications, it’s possible that many of the unsuccessful applicants will implicitly understand the logic behind the decision. They may have applied knowing they were unlikely to meet the acceptance criteria, or simply miscalculated their eligibility. But what happens when an applicant is rejected because they fall outside the majority group on which the model was trained?

Customers do not have to be data scientists to understand when unfairness — algorithmic or otherwise — has occurred. If a small business owner has the means to pay back their loan, but is rejected for no discernible reason, they will quite rightly be upset at their mistreatment and may seek a competitor to provide the services they require. Furthermore, if customers from a similar background are also rejected unfairly, then there is potentially something wrong with the model. The most common explanation here is that bias has crept into the model in some way. 

Recent history has shown insurance companies using machine learning to set premiums that discriminated against the elderly, online pricing discrimination, and product personalization that steered minorities into higher rates. The cost of these glaring mistakes has been severe reputational damage and customer trust that is irretrievably lost.

This is why the data science and financial services communities must now refocus their priorities, elevating equitable outcomes for all above high-performing models that work only for the majority. We must prioritize people in addition to model performance.

Eliminating bias in models

Despite regulations that rightly prevent sensitive information from being used in decision-making algorithms, unfairness can still creep in through biased data. To illustrate how this happens, here are five examples of how data bias can occur:

  • Missing data — The dataset used is missing values in certain fields for particular groups in the population. 
  • Sample bias — The datasets chosen to train the model do not accurately represent the population it is meant to serve, leaving the model largely blind to certain minority groups and individuals.
  • Exclusion bias — This is when data is deleted or not included because it is deemed unimportant. This is why robust data validation and diverse data science teams are essential.
  • Measurement bias — This occurs when the data collected for training does not accurately represent the target population, or when faulty measurements result in data distortion. 
  • Label bias — A common pitfall at the data labeling stage of a project, label bias occurs when similar types of data are labeled inconsistently. Again, this is more a validation issue.  

While none of the biases in this list is malicious, it’s easy to see how bias can find its way into models if a robust framework that builds in fairness is not in place from the start of a data science project.
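
To make a couple of these failure modes concrete, here is a minimal sketch of the kind of audit that can be run before any modeling begins, checking for missing data and sample bias in a hypothetical loan-application dataset. The file name, the "region" column and the population shares are illustrative assumptions, not a prescribed schema.

```python
import pandas as pd

def missing_data_by_group(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Share of missing values in each column, broken down by group."""
    return df.groupby(group_col).apply(lambda g: g.isna().mean())

def sample_bias_report(df: pd.DataFrame, group_col: str,
                       population_shares: dict) -> pd.DataFrame:
    """Compare each group's share of the training sample with its share
    of the population the model is meant to serve."""
    sample_shares = df[group_col].value_counts(normalize=True)
    report = pd.DataFrame({
        "sample_share": sample_shares,
        "population_share": pd.Series(population_shares),
    })
    report["gap"] = report["sample_share"] - report["population_share"]
    return report

# Illustrative usage with made-up figures:
# applications = pd.read_csv("loan_applications.csv")
# print(missing_data_by_group(applications, group_col="region"))
# print(sample_bias_report(applications, group_col="region",
#                          population_shares={"urban": 0.55, "rural": 0.45}))
```

Large gaps between sample and population shares, or missing-value rates concentrated in one group, are early warnings that a model trained on this data will serve some customers better than others.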

Data scientists and machine learning engineers are used to very specific pipelines that have traditionally favored high performance. Data is at the heart of modeling, so we start each data science project by exploring our datasets and identifying relationships. We go through exploratory data analysis so that we can understand our data. Then we enter the preprocessing stage, where we wrangle and clean the data before the intensive process of feature generation, which helps us create more useful descriptions of the data. We then experiment with different models, tune parameters and hyperparameters, validate our models and repeat this cycle until we’ve met our desired performance metrics. Once this is done, we can productize and deploy our solutions, which we then maintain in production environments.
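
As a rough illustration of that performance-first cadence, a conventional pipeline might look something like the following scikit-learn sketch. The dataset, column names, model and metric are all assumptions for illustration, not a description of any production lending system.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# 1. Load data (the exploratory analysis itself is omitted here).
df = pd.read_csv("loan_applications.csv")            # hypothetical file
X, y = df.drop(columns=["defaulted"]), df["defaulted"]

numeric = ["income", "loan_amount", "credit_history_length"]   # assumed columns
categorical = ["employment_type", "region"]

# 2. Preprocess: impute, scale and encode features.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

# 3. Experiment, tune hyperparameters and validate against a performance metric.
model = Pipeline([("prep", preprocess), ("clf", GradientBoostingClassifier())])
search = GridSearchCV(
    model,
    {"clf__learning_rate": [0.05, 0.1], "clf__n_estimators": [100, 300]},
    scoring="roc_auc", cv=5,
)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
search.fit(X_train, y_train)
print("Test AUC:", roc_auc_score(y_test, search.predict_proba(X_test)[:, 1]))

# 4. Productize and deploy once the performance target is met. Note that
#    nothing above measures how outcomes differ across groups of customers.
```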

It’s a lot of work, but a significant problem goes unaddressed in this traditional approach: at no point in this cycle is model fairness assessed, nor is data bias seriously explored. We need to work with domain experts, including legal and governance teams, to understand what fairness means for the problem at hand, and we need to mitigate bias at the root of our modeling, i.e., the data. 
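
As a hedged example of what such an assessment could add, the sketch below computes per-group approval rates and false-negative rates for a sensitive attribute. The attribute, the sample values and the convention that a prediction of 1 means approval are assumptions; which attributes and metrics actually matter should be settled with those domain, legal and governance experts.

```python
import pandas as pd

def group_fairness_report(y_true: pd.Series, y_pred: pd.Series,
                          sensitive: pd.Series) -> pd.DataFrame:
    """Per-group approval rate and false-negative rate, assuming a
    prediction of 1 denotes a favorable (approval) decision."""
    frame = pd.DataFrame({"y_true": y_true, "y_pred": y_pred, "group": sensitive})
    rows = {}
    for group, g in frame.groupby("group"):
        approval_rate = (g["y_pred"] == 1).mean()
        # False-negative rate: creditworthy applicants wrongly rejected.
        eligible = g[g["y_true"] == 1]
        fnr = (eligible["y_pred"] == 0).mean() if len(eligible) else float("nan")
        rows[group] = {"approval_rate": approval_rate, "false_negative_rate": fnr}
    report = pd.DataFrame(rows).T
    report.index.name = "group"
    return report

# Illustrative usage with made-up labels, predictions and group membership:
# report = group_fairness_report(
#     y_true=pd.Series([1, 1, 0, 1, 0, 1]),
#     y_pred=pd.Series([1, 0, 0, 1, 0, 0]),
#     sensitive=pd.Series(["a", "b", "a", "a", "b", "b"]),
# )
# print(report)
# print("Approval-rate gap:",
#       report["approval_rate"].max() - report["approval_rate"].min())
```

Reporting a gap like this alongside accuracy or AUC is one simple way to make fairness a first-class validation criterion rather than an afterthought.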

Simply understanding how bias can find its way into models is a good start when it comes to bringing about a more inclusive financial services environment. By checking ourselves against the above points and reassessing how we approach data science projects, we can seek to create models that work for everybody. 

Adam Lieberman is the head of artificial intelligence and machine learning at Finastra 
