Machine Learning

We’ll explore a very simple machine learning project thoroughly so we completely understand the basis of machine learning and demystify it so we don’t confuse machine learning and artificial intelligence (THEY ARE NOT THE SAME).

Here, our first project will entail classifying RGB hexadecimal codes as colors that humans would perceive. We’ll thoroughly investigate each step we take and, in so doing, come to understand how one can introduce bias into the process in at least two primary ways. Machine learning rest squarely on the theories of linear algebra and multivariable calculus; that is, one performs a matrix operation followed by a nonlinear operation on the result. One may iterate this process many times. The result at the end of the process is generally some sort of classification into a predefined category. There is a sort of fundamental theorem of machine learning that states if the data contains decision boundaries then sufficient iteration with this process will arrive at separated data (for those more familiar with machine learning, I am aware this is a misstatement of the theorem; however, both the actual statement and the proof will be beyond the scope of this course).

If we examine the diagram given to the right, we see blue fruit and orange fruit graphed on a line by weight in grams. The grey point in between the two clumps of data represents a reasonable decision boundary. If we have access to only the mass of the fruit, we can say with good confidence fruit weighing less than 11.5 grams is blue and fruit weighing more than 11.5 grams is orange. We didn’t need any machine learning to come up with that decision boundary because people are intrinsically very good with linear relationships and linear responses- especially in low dimensionality.

Certainly, data can be much messier and harder for a human to parse than this. Look at similar spread for a pair of fruits, red and yellow, that grow according to conditions of rainfall and pH and yield a particular mass given that rainfall and that pH. There’s three variables to sort through here. The first figure seems hopeless, but a subtle rotation reveals a clear decision boundary where I might place a separating plane. There third figure demonstrates that this rotation was not unique, but also that not every bound is created equal. Machines learning goes a step forward and reorients the space the points sit in to make such boundaries clear.