Monday, January 9, 2017

SOMETIMES CHERRYPICKING DATA CAN BE USEFUL : THE CASE FOR PAVEL DATSYUK


"Cherrypicking data" is a colloquial way of saying "mining subsets of data for desired information". It is a procedure used in scientific studies, or in interpretations of those studies, to show or hide a tendency in the data.

For example, I could do a study that purports to demonstrate the existence of ESP by having a subject guess the outcome of a coin flip. Let's say that after one thousand trials, I have exactly the number of positive results predicted by chance (my subjects got the right answer 500 times out of 1000). That's kind of a boring result, and it would be pretty hard to get it published and draw attention to it (and to myself). But within that dataset of 1000, let's say my subjects hit a "lucky streak" and guessed right 12 times in a row. There's the data I want to cherrypick : when I present my study, I can title my article "Subjects guess coinflip 12 times in a row!" and go on to rant about how there's only one chance in 4096 of that ever happening, and so on and so forth.

I have just manufactured a positive result out of absolutely nothing by cherrypicking within my sample, and now every science reporter is after me for an interview. That's why cherrypicking is generally frowned upon in a scientific context.
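To put a number on how unremarkable such a streak is, here is a minimal Python sketch. It assumes fair, independent coin flips and a subject who guesses right with probability 0.5; the function name `prob_streak_at_least` is made up for this illustration. It compares the chance that one pre-chosen block of 12 guesses is all correct with the chance that a run of 12 shows up somewhere in 1000 guesses.

```python
def prob_streak_at_least(n_flips: int, streak: int, p_correct: float = 0.5) -> float:
    """Probability of at least one run of `streak` correct guesses in `n_flips` tries.

    Assumes independent guesses, each correct with probability `p_correct`.
    """
    # state[j] = probability that the current run of correct guesses has
    # length j AND no run of length `streak` has happened so far.
    state = [0.0] * streak
    state[0] = 1.0
    for _ in range(n_flips):
        new = [0.0] * streak
        # A wrong guess resets the current run to zero.
        new[0] = sum(state) * (1.0 - p_correct)
        # A correct guess extends the run by one; a run that would reach
        # `streak` is dropped from `new`, i.e. counted as "streak happened".
        for j in range(streak - 1):
            new[j + 1] = state[j] * p_correct
        state = new
    return 1.0 - sum(state)


# One specific block of 12 guesses all being right: (1/2)^12 = 1/4096.
single_block = 0.5 ** 12

# A run of 12 appearing anywhere in 1000 guesses.
anywhere = prob_streak_at_least(1000, 12)

print(f"one fixed block of 12: {single_block:.6f}")
print(f"somewhere in 1000:     {anywhere:.6f}")
```

Under these assumptions, the second number comes out far larger than the first, which is the whole trick: the headline quotes the odds for one block of guesses, while the researcher got to search the entire dataset for it.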

But in other contexts, I want to argue that cherrypicking of a certain type is actually the more honest way of presenting statistics.

Let me tell you about Pavel Datsyuk, a former star player of the Detroit Red Wings. Datsyuk was an insanely talented player who entered the league as a rookie in the 2001-02 season and retired in 2016. Now, because the league media hates everything that is both elite and non-Crosby, the prevailing narrative was always that Datsyuk was a poor playoff performer because (and this is the key stat) he had a low points-per-game average. In 2008, for example, commentators were going on and on and on about how he "only had 11 goals in his last 59 games". It was virtually impossible to read or hear anything about Datsyuk without having that "fact" regurgitated in your face. But let's look at his actual playoff stats.
Season     GP     G     A   Pts
2001-02    21     3     3     6
2002-03     4     0     0     0
2003-04    12     0     6     6
2005-06     5     0     3     3
2006-07    18     8     8    16
2007-08    22    10    13    23

From that data, you can see that he had 3 very slow years and then he took off. Now, it's important to understand that "games played" says nothing about "minutes played per game", and that the Red Wings have a philosophy of bringing new players into the game very slowly. Rookies only play a few shifts a game, if any, during the playoffs, because the pressure is so high. They're there mostly to learn the ropes and watch how the old guys do it. Later on, in their 4th or 5th seasons, those former rookies become the new major players and get a lot more ice time.

So the "fact" that Datsyuk only scored 3 goals in his first 37 playoff games does not show that he is a choker or has limited talent : while he was on the scoresheet for those games, he saw almost no time on ice. He had a 3-year learning period from the bench. When he got promoted to the first line in 2006, he immediately started producing at an elite pace, around a point per game.

Faced with all these facts, you can either choose to be a dishonest analyst (which all of the league's talking heads decided to do) and push the easy narrative of "Datsyuk sucks because he only has X points in Y games"...or you can choose to be more thorough and explain at length that "Datsyuk had a long mentoring period as an observer, after which he hopped on the ice and became a major contributor overnight". Cherrypicking your data to consider only his time as a first-liner actually gives you a better idea of his general ability.
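For anyone who wants to check the arithmetic, here is a small Python sketch that computes points per game from the playoff table above, first over every season and then over the two subsets discussed in this post. The cutoff `FIRST_LINE_FROM = "2006-07"` is my own reading of "promoted to the first line in 2006", and the variable and function names are just for illustration.

```python
# (season, games played, goals, assists), copied from the table in this post.
PLAYOFFS = [
    ("2001-02", 21, 3, 3),
    ("2002-03", 4, 0, 0),
    ("2003-04", 12, 0, 6),
    ("2005-06", 5, 0, 3),
    ("2006-07", 18, 8, 8),
    ("2007-08", 22, 10, 13),
]

# Assumption: the first-line era starts with the 2006-07 season.
FIRST_LINE_FROM = "2006-07"


def points_per_game(rows):
    """Total points (goals + assists) divided by total games played."""
    games = sum(gp for _, gp, _, _ in rows)
    points = sum(g + a for _, _, g, a in rows)
    return points / games


# The "first 37 playoff games" are the first three seasons in the table.
bench_years = PLAYOFFS[:3]
# Season labels of the form "YYYY-YY" sort correctly as plain strings.
first_line_years = [row for row in PLAYOFFS if row[0] >= FIRST_LINE_FROM]

career_ppg = points_per_game(PLAYOFFS)
bench_ppg = points_per_game(bench_years)
first_line_ppg = points_per_game(first_line_years)

print(f"all seasons:      {career_ppg:.3f} points per game")
print(f"first 37 games:   {bench_ppg:.3f} points per game")
print(f"first-line years: {first_line_ppg:.3f} points per game")
```

With the numbers in the table, that works out to roughly 0.66 points per game overall, about 0.32 over the first 37 games, and just under a point per game once he was on the first line.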

Addendum : the opposite narrative was applied to Alexei Kovalev, another Russian star. He had the reputation of a playoff beast, even though his numbers showed he had one crazy rookie year and was quite mediocre after that.
