Design Thinking for Artificial Intelligence Projects

How IBM adapted design thinking principles to build a workflow for AI projects

Why should you read this article?

IBM uses an interesting workflow for AI projects that is based on design thinking principles, presented in detail in its specialisation on Coursera. It's a long MOOC that goes into thorough detail on implementing AI applications, and it's worth taking if you have the time and technical background. If you are only interested in learning the workflow and how to apply it to AI projects, this article can help you.

What is design thinking?

Design thinking is a type of thinking used when developing design concepts (for example when designing a new building or tech product), which seeks to understand the product from the end user’s point of view. This is done not only by empathising with the user, but also by generating multiple ideas in brainstorming sessions, prototyping, and then testing those ideas. It is, thus, an iterative and hands-on approach by design. The main phases of the process are: empathise, define, ideate, prototype and test. I’ll not get into the details of design thinking for product creation, but will focus instead on how this process can be adapted to AI projects.

The workflow

The workflow presented by IBM for AI projects is the following, with the corresponding name in design thinking:

  • Data collection (empathise)
  • Exploratory analysis (define)
  • Transformation (ideate)
  • Modelling (prototype)
  • Testing (test)

Let's now look at each of these steps in more detail.

Data collection

This is where the data scientists talk to the people closest to the data to articulate the business opportunity and translate it into a testable hypothesis or hypotheses. It includes defining a timeline, cost, feasibility, and so on. Finally, you proceed to gather the data for the project.

  1. Get as close to the source of the data as possible, usually by interviewing the people involved
  2. Identify the business problem
  3. Articulate the business question: enumerate possible questions, then prioritise them according to domain knowledge, feasibility and impact (impact can be estimated with a back-of-the-napkin ROI calculation)
  4. Obtain all of the relevant data
  5. Translate the business problem into a testable hypothesis or hypotheses
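The prioritisation in step 3 can be sketched in a few lines of Python. The questions, impact figures and feasibility scores below are purely hypothetical, made up for illustration:

```python
# Hypothetical back-of-the-napkin prioritisation: score each candidate
# business question by estimated annual impact (in euros) x feasibility (0-1).
candidate_questions = [
    # (question, estimated_impact_eur, feasibility)
    ("Can we predict churn a month in advance?", 120_000, 0.8),
    ("Can we forecast demand per store?",        200_000, 0.4),
    ("Can we auto-tag support tickets?",          40_000, 0.9),
]

# Rank highest expected value first
ranked = sorted(candidate_questions,
                key=lambda q: q[1] * q[2], reverse=True)

for question, impact, feasibility in ranked:
    print(f"{impact * feasibility:>9,.0f}  {question}")
```

Crude as it is, a table like this makes the trade-off between ambition and feasibility explicit before any data is collected.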

Exploratory data analysis

At this point, you try to visualise your data, check for missing values (and decide how to deal with each of them) and potentially test hypotheses. This part consists of data visualisation and hypothesis testing.

Start your analysis with simple csv files, to make sure your model will have value, before building a full data ingestion pipeline.
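As a minimal sketch of that advice, assuming pandas and using a small in-memory sample in place of a real CSV export:

```python
import io

import pandas as pd

# Start from a flat CSV export (here an in-memory toy sample) and
# sanity-check the data before investing in a full ingestion pipeline.
csv_data = io.StringIO("""customer_id,age,monthly_spend,churned
1,34,120.5,0
2,,80.0,1
3,45,,0
""")

df = pd.read_csv(csv_data)
print(df.shape)          # (rows, columns)
print(df.isna().sum())   # missing values per column
```

If a model trained on a file like this shows no promise, you have lost very little; if it does, the ingestion pipeline is worth building.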

Data visualisation

Check for missing data, understand which features and observations have missing values, and why (go back to "empathise" and talk to the client).

Look at how the missing observations behave, compared to the rest:

  • MCAR (Missing Completely At Random): missing cases are, on average, identical to non-missing cases, with respect to the feature matrix. Complete case analysis (removing those observations from the dataset) will reduce the power of the analysis, but will not affect bias
  • MAR (Missing At Random): missing data often have some dependence on measured values, and models can be used to help impute what the likely data would be. For example, in an MLB survey, there may be a gender bias when it comes to completing all of the questions
  • MNAR (Missing Not At Random): missing data depend on unmeasured or unknown variables. There is no information available to account for the missingness.
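One simple diagnostic for the MCAR case is to compare the observed features of rows with and without missing values. A toy sketch, assuming NumPy and pandas, with synthetic data where income is knocked out completely at random:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy data: 'income' will sometimes be missing; is it MCAR with respect to 'age'?
df = pd.DataFrame({
    "age": rng.integers(20, 70, size=500),
    "income": rng.normal(50_000, 10_000, size=500),
})

# Knock out roughly 20% of incomes completely at random
mask = rng.random(500) < 0.2
df.loc[mask, "income"] = np.nan

missing = df["income"].isna()
# Under MCAR, the age distribution should look the same in both groups
print(df.loc[missing, "age"].mean(), df.loc[~missing, "age"].mean())
```

A large gap between the two group means would suggest the missingness depends on age, pointing towards MAR (or MNAR, if the driver is unmeasured) rather than MCAR.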

If needed, try a few different imputation methods, but make sure to come back and reassess them during the modelling phase, when you will know which ones yielded the best results. (Strictly speaking, multiple imputation means generating several plausible imputed datasets and pooling the results, which also accounts for the uncertainty the imputation introduces.)

  • Univariate imputation: mean or median of the missing feature
  • Multivariate imputation: use other variables to predict the missing feature
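Both flavours are one-liners if you assume scikit-learn; the tiny matrix below is made up for illustration:

```python
import numpy as np
# IterativeImputer is still experimental and must be enabled explicitly
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

X = np.array([[1.0, 2.0],
              [3.0, np.nan],
              [5.0, 6.0],
              [np.nan, 8.0]])

# Univariate: replace each missing value with its column's median
uni = SimpleImputer(strategy="median").fit_transform(X)

# Multivariate: model each feature from the others (round-robin regression)
multi = IterativeImputer(random_state=0).fit_transform(X)

print(uni)
print(multi)
```

Wrapping the imputer of choice in the modelling pipeline (rather than imputing once, up front) makes it easy to swap methods and compare them fairly later on.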

Create visual summaries of the data, including missing values, outliers and class imbalance issues; try to identify factors that can be useful for your specific business problem and start formulating hypotheses.
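A compact tabular starting point for such a summary, assuming pandas and a toy labelled dataset:

```python
import pandas as pd

# Toy data standing in for the real dataset
df = pd.DataFrame({
    "feature": [1.2, 3.4, None, 2.2, 5.1, 4.0],
    "label":   [0, 0, 0, 0, 1, 0],
})

# One row per column: how much is missing, and what type is it?
summary = pd.DataFrame({
    "missing": df.isna().sum(),
    "dtype": df.dtypes.astype(str),
})
print(summary)

# Class balance check: a heavily skewed target changes the whole approach
print(df["label"].value_counts(normalize=True))
```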

Use plots and tables to create a first presentation or report telling a story related to your business problem (make sure each of them has at least one or two lines explaining the main point or conclusion). Finish with conclusions and suggestions for next steps.


Transformation

The goal here is to transform your data so that it becomes consumable by models. This is where all the feature engineering magic happens.
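A minimal sketch of this step, assuming scikit-learn and a toy customer table: bundling imputation, scaling and encoding into one pipeline means the exact same transformation is reused at training and inference time.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy raw data: one numeric column (with a gap) and one categorical column
df = pd.DataFrame({
    "age": [34, None, 45, 23],
    "plan": ["basic", "pro", "basic", "pro"],
})

numeric = Pipeline([("impute", SimpleImputer(strategy="median")),
                    ("scale", StandardScaler())])

transform = ColumnTransformer([
    ("num", numeric, ["age"]),     # impute then standardise
    ("cat", OneHotEncoder(), ["plan"]),  # expand categories into indicators
])

X = transform.fit_transform(df)
print(X.shape)  # 4 rows; 1 scaled numeric column + 2 one-hot columns
```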


Modelling

At this point you define an evaluation metric and compare different models for your problem. Start with simple models, and build up from there.
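The "simple first" idea can be made concrete with a trivial baseline, assuming scikit-learn and synthetic data in place of the real problem:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy classification data standing in for the real problem
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Fix one evaluation metric up front (here F1), then compare models
# against a trivial baseline before reaching for anything complex.
results = {}
for name, model in [("baseline", DummyClassifier(strategy="most_frequent")),
                    ("logistic", LogisticRegression(max_iter=1000))]:
    results[name] = cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    print(f"{name}: {results[name]:.3f}")
```

Any candidate that cannot clearly beat the dummy baseline on the agreed metric is not worth the added complexity.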


Testing

Here the goal is not only to perform unit tests on your code, but also to criticise the choices you have previously made and go back if needed (it usually is). It culminates with the deployment of the chosen solution, which is itself subject to testing: even running models face scrutiny and ongoing performance monitoring.
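Unit tests for a model look a little different from ordinary code tests: they check contracts on the model's outputs rather than exact values. A small sketch, assuming scikit-learn and synthetic data, of the kind of checks that can run in CI and again against a deployed model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train(X, y):
    """Hypothetical training entry point for the project."""
    return LogisticRegression(max_iter=1000).fit(X, y)

# Synthetic data with a known, learnable signal in the first feature
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)

model = train(X, y)
proba = model.predict_proba(X)

assert proba.shape == (100, 2)              # output shape contract holds
assert np.allclose(proba.sum(axis=1), 1.0)  # rows are valid probabilities
assert model.score(X, y) > 0.9              # clears a sanity threshold
print("all checks passed")
```

Re-running checks like these on fresh production data is a cheap first layer of the ongoing performance monitoring mentioned above.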