The Only Domain AI Can’t Crack
Source: https://medium.com/towards-artificial-intelligence/the-only-domain-ai-cant-crack-b750e54f74cd?source=rss------artificial_intelligence-5
Recipes can’t be made by an AI: the data is all wrong!
Every AI expert has, sooner or later, thought about developing the perfect formula for cookies, brownies, or some other dessert.
Let me walk you through the technical problems a developer will run into in this task. First of all, the data would be hard to collect and preprocess.
Issues with Standardization
All recipes should be expressed in grams rather than volume, because volume is a very imprecise measure. Every ingredient would need to be converted to its equivalent weight in grams, and the conversion is unlikely to be 100% accurate.
The volume-to-weight conversion depends on the kind of ingredient, and the instructions themselves sometimes leave too much room for interpretation. A recipe may call for a stick of butter, for instance: that is about 115g in the US but only 100g in Europe. You will likely run into the same problem with tablespoons and cups.
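As a rough sketch of what this standardization step could look like, the snippet below converts volume-based quantities to grams. The helper `to_grams` and both lookup tables are hypothetical, and the grams-per-cup figures are approximate assumptions that change with brand, humidity, and how the cup is filled, which is exactly the problem described above.

```python
# Approximate grams per US cup. These are illustrative assumptions,
# not authoritative values; real densities vary from kitchen to kitchen.
GRAMS_PER_CUP = {
    "flour": 125.0,
    "sugar": 200.0,
    "butter": 227.0,
}

# Fraction of a US cup represented by each volume unit.
CUPS_PER_UNIT = {
    "cup": 1.0,
    "tablespoon": 1.0 / 16.0,
    "teaspoon": 1.0 / 48.0,
}

# Weight of "one stick of butter" by region, using the article's figures
# (a US stick is nominally 4 oz, a little under 115 g).
STICK_OF_BUTTER_GRAMS = {"US": 115.0, "EU": 100.0}


def to_grams(ingredient, quantity, unit, region="US"):
    """Convert a quantity given in a volume unit (or sticks) to grams."""
    if unit == "stick":
        if ingredient != "butter":
            raise ValueError("'stick' is only defined for butter")
        return quantity * STICK_OF_BUTTER_GRAMS[region]
    return quantity * CUPS_PER_UNIT[unit] * GRAMS_PER_CUP[ingredient]


print(to_grams("flour", 2, "cup"))
print(to_grams("butter", 1, "stick", region="US"))
print(to_grams("butter", 1, "stick", region="EU"))
```

The same written instruction yields a different weight depending on `region`, so a scraped recipe with no stated origin cannot be converted without guessing.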
Collecting data
Data collection would also require either APIs or web scraping. To run an AI algorithm you need to collect sufficient data. Unfortunately, I have not been able to find a single API that offers recipe data that is FREE to download and experiment with. Web scraping is still a possibility, but recipe pages look very messy at first sight.
What AI algorithm could we apply?
Clustering is definitely the first candidate. Let me take brownies as an example. To discover how many kinds of brownies exist (fudgy, chewy, cakey) and how widely their ingredients vary, we would need a fair amount of statistical modeling.
We can represent each recipe as a collection of Features (the ingredient quantities), using the class of brownie as the Label. We can then plot the data in a multidimensional Cartesian space and estimate the size of each cluster. Any recipe that falls within a cluster’s region can then be classified as that class of brownie.
What you are seeing in the image above are three regions that enclose all the possible recipes. Each recipe is a single point in this mathematical space. Thousands of them form a giant cluster that you can picture as a sphere.
If you are wondering what happens when you choose a combination of ingredients that places your recipe outside the clusters, the answer is simple: the recipe won’t work. Perhaps it will turn out too liquid or too solid, or maybe too sugary. Only those defined domains represent functioning recipes.
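The idea of spherical clusters with an "outside" can be sketched in a few lines of Python. This is a simplified stand-in, not necessarily the method the author has in mind: each class is summarized by a centroid and a radius, and a new recipe is either assigned to the nearest sphere that contains it or rejected. The function names, the ingredient list, and the sample quantities are all hypothetical placeholders, not real recipes.

```python
from math import dist

# Feature axes (an assumption; a real dataset would define its own).
INGREDIENTS = ("butter", "sugar", "flour", "eggs", "chocolate")


def to_vector(recipe):
    """Map {ingredient: grams} to a vector of ingredient shares."""
    total = sum(recipe.get(name, 0.0) for name in INGREDIENTS)
    if total == 0:
        raise ValueError("recipe contains none of the known ingredients")
    return tuple(recipe.get(name, 0.0) / total for name in INGREDIENTS)


def fit_cluster(vectors):
    """Summarize a class as (centroid, radius of its farthest member)."""
    n = len(vectors)
    center = tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))
    radius = max(dist(v, center) for v in vectors)
    return center, radius


def classify(vector, clusters):
    """Return the nearest cluster that contains the vector, else None."""
    inside = [
        (dist(vector, center), label)
        for label, (center, radius) in clusters.items()
        if dist(vector, center) <= radius
    ]
    return min(inside)[1] if inside else None


# Placeholder training data, invented purely for illustration.
labeled = {
    "fudgy": [
        {"butter": 120, "sugar": 200, "flour": 60, "eggs": 100, "chocolate": 220},
        {"butter": 130, "sugar": 190, "flour": 70, "eggs": 100, "chocolate": 210},
    ],
    "cakey": [
        {"butter": 100, "sugar": 180, "flour": 140, "eggs": 150, "chocolate": 100},
        {"butter": 110, "sugar": 170, "flour": 150, "eggs": 160, "chocolate": 90},
    ],
}
clusters = {
    label: fit_cluster([to_vector(r) for r in recipes])
    for label, recipes in labeled.items()
}

candidate = {"butter": 10, "sugar": 400, "flour": 20, "eggs": 10, "chocolate": 10}
print(classify(to_vector(candidate), clusters))
```

A result of `None` corresponds to a point that lands outside every sphere, which in the article's terms is a recipe that will not work.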
So far, it may seem like a very standard way to solve a clustering problem, but this is where things get messy.
The fundamental flaw in the data
You can trust the words of an expert: all the recipes published on the web by non-professional pastry chefs are WRONG!
The reason for this harsh statement is the gap between the systems used by experienced pastry makers and those used by regular people, both of whom publish recipes.
There is no data!
This is the main issue: if I discard all the data on the internet that was not created by a professional, we end up with an insignificant number of recipes (at least among the ones we can collect), too few to reveal, at least with AI, the functions that pastry chefs use. Every pastry chef uses a function to create a recipe. If I interpolated between different points on the same function (different recipes for the same dessert), I could derive that unique function, the same one pastry chefs use. That, technically speaking, is not AI.
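To make the interpolation idea concrete, here is a minimal sketch that blends two known formulas ingredient by ingredient. The function name `interpolate` and the parameter `t` are hypothetical, and the two sample formulas are the first two light-cake variants listed later in this article. Whether two such points really lie on one professional "function" is an assumption; the pastry chef's actual equations are not given here.

```python
def interpolate(recipe_a, recipe_b, t):
    """Linear blend of two recipes: t=0 gives recipe_a, t=1 gives recipe_b."""
    names = sorted(set(recipe_a) | set(recipe_b))
    return {
        name: (1 - t) * recipe_a.get(name, 0.0) + t * recipe_b.get(name, 0.0)
        for name in names
    }


# Quantities in grams, taken from the light-cake list in this article.
cake_1 = {"eggs": 100, "sugar": 45, "flour": 55}
cake_2 = {"eggs": 90, "sugar": 45, "flour": 65}

halfway = interpolate(cake_1, cake_2, 0.5)
print(halfway)  # {'eggs': 95.0, 'flour': 60.0, 'sugar': 45.0}
```

This needs only two trusted points and plain arithmetic, with no training step, which is the sense in which the approach is not AI.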
How to create a recipe: the professional way
Recipes are not invented from scratch or by trial and error. That is not debatable. At least in baking, there are several rules that must be obeyed when creating a recipe.
One example: the Genoise Cake
One of the basic building blocks of pastry is the Genoise. This cake was discovered accidentally by a pastry chef from Genova during a visit to the court of a famous Spanish marquise. In a hurry, he panicked and overwhipped the eggs, and so discovered what is now known as Genoise.
Pastry chefs, as much as they enjoy claiming to possess inhuman creativity, use those numbers as a base (with their own system of equations to determine the boundaries of each cluster). The numbers above represent specific clusters in an R⁴ Cartesian space. Each cluster represents a class of cakes. However, if we allow some degrees of freedom, we can obtain a variety of cakes that belong to the same cluster.
Light_Cake_1: 100g Eggs, 45g Sugar, 55g Flour
Light_Cake_2: 90g Eggs, 45g Sugar, 65g Flour
Light_Cake_3: 85g Eggs, 55g Sugar, 60g Flour, 5g Butter
Light_Cake_4: ...
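One simple way to read the "degrees of freedom" in the three formulas listed above is as a per-ingredient range. The sketch below computes those ranges and checks whether a new formula stays inside them. Treating the four axes of the R⁴ space as eggs, sugar, flour, and butter, and treating the cluster as an axis-aligned box, are both assumptions made for illustration; the names `ingredient_ranges` and `within_ranges` are hypothetical.

```python
# Assumed axes of the four-dimensional space.
INGREDIENTS = ("eggs", "sugar", "flour", "butter")

# The three light-cake formulas listed above, in grams.
LIGHT_CAKES = [
    {"eggs": 100, "sugar": 45, "flour": 55},
    {"eggs": 90, "sugar": 45, "flour": 65},
    {"eggs": 85, "sugar": 55, "flour": 60, "butter": 5},
]


def ingredient_ranges(recipes):
    """Return {ingredient: (min grams, max grams)} across the recipes."""
    recipes = list(recipes)
    return {
        name: (
            min(r.get(name, 0) for r in recipes),
            max(r.get(name, 0) for r in recipes),
        )
        for name in INGREDIENTS
    }


def within_ranges(recipe, ranges):
    """True if every ingredient quantity falls inside its observed range."""
    return all(
        low <= recipe.get(name, 0) <= high
        for name, (low, high) in ranges.items()
    )


ranges = ingredient_ranges(LIGHT_CAKES)
print(ranges)
# {'eggs': (85, 100), 'sugar': (45, 55), 'flour': (55, 65), 'butter': (0, 5)}
print(within_ranges({"eggs": 95, "sugar": 50, "flour": 60}, ranges))  # True
```

With only three points the box is a crude estimate of the cluster, but it shows how narrow the allowed variation is: no ingredient moves by more than 15g.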
Recipes with no variance
There are also recipes, such as Croissants or Macarons, where there is almost no variance among the ingredients. Every pastry chef has used the same recipe for Macarons since Pierre Hermé defined its classification.
In these cases, the gap between the clustering approximation and the real recipe would be at its largest, with terrible results.
Conclusion
Using AI to create recipes from recipe data will fail because the data itself is approximate and biased. If we compared the clusters obtained from that data with the recipes of the grandmasters (the professionally defined clusters), there would be no match.