Our First Impression of the Random-Probability Baseline Is Wrong
I wrote an article on probability last year, and I'd like to expand on another mistake we make with probability: our first impression of the random-probability baseline is wrong. Only deeper thought reveals what the actual "random probability" is. I'll use machine learning in my examples because that's where I first noticed the mistake.
Before I start: the values here are fake. I made them up, and exaggerated them, to make the example clearly presented.
Consider an untrained machine learning model that tries to differentiate a cat from a dog. Its predictive power is 5 percent correct. After training it for 10 cycles (you could treat "10 cycles" as a kid going to school to attend 10 lessons), its predictive power rises to 33 percent. It's certainly doing better, right? But can we use it in real life? Probably not. Now consider what random guessing would do: tell whether it's a cat or a dog by flipping a coin (assume a fair, two-sided coin). There's a 50 percent chance of heads (which we assign to dog) and a 50 percent chance of tails (assigned to cat). Now, compare that to your machine learning model. Do you still think the model is useful? Probably not: flipping a coin does better than predicting with the machine learning model. The baseline for random guesses is 50 percent. We had fallen into a trap called "anchoring" (you may have read about it in "Thinking, Fast and Slow" by Daniel Kahneman), where we thought the baseline was 5 percent, when it should actually be 50 percent, since when we first build a model, we want it to do better, perhaps far better, than random guesses.
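To convince yourself that the coin-flip baseline really sits at 50 percent, you can simulate it. This is a minimal sketch with made-up labels (not a real dataset, and not the model from the example):

```python
import random

random.seed(42)  # fixed seed so the demo is reproducible

# Made-up ground truth: each picture is a cat or a dog.
true_labels = [random.choice(["cat", "dog"]) for _ in range(10_000)]

# "Predict" by flipping a fair coin for every picture.
guesses = [random.choice(["cat", "dog"]) for _ in range(10_000)]

correct = sum(t == g for t, g in zip(true_labels, guesses))
accuracy = correct / len(true_labels)
print(f"Coin-flip accuracy: {accuracy:.3f}")  # hovers around 0.50
```

Run it a few times with different seeds and the accuracy wobbles around 0.50, never near 0.33.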
Now consider a similar example. A machine learning model tries to predict between 6 classes: Cat, Dog, Elephant, Eagle, Cow, Goat. Again, we have 5 percent predictive power before training, and 10 cycles of training raise it to 33 percent. Is it doing better? Yes. Can we use it in real life? Probably... yes...??? What is the baseline? It can't be 5 percent, except by coincidence. The clue has already been given: get a fair 6-sided die. Assign each of the classes above to 1, 2, 3, 4, 5, and 6 respectively, then roll the die to get your prediction. That's random probability. For a fair die, the probability of each face is 16.667 percent (the 6s repeat forever after the decimal, but we round; we get the probability by dividing 100 percent by 6). Since 33 percent fares better than 16.667 percent, the model predicts better than random, so it's usable in real life. Though perhaps we could build a model that does even better.
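The arithmetic behind the die baseline is a single division, and Python's `Fraction` shows that the repeating 6s are exact, not a rounding artifact:

```python
from fractions import Fraction

# One face out of six equally likely faces.
baseline = Fraction(1, 6)

print(float(baseline))           # 0.16666666666666666
print(f"{float(baseline):.3%}")  # 16.667%
```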
Anyways, the point is: our first impression (the anchor) of the base probability is wrong. By being careful not to be anchored, and by calculating the base probability ourselves, we can know what a random outcome would be: divide 100 percent by the number of classes the thing predicts, assuming the classes are equally likely. (Unequally likely classes are possible, but that's not something that I wanna talk about here, nor something I'm experienced/expert enough to talk about.) So, mind your probabilities!
Remember to Like and Subscribe.