RangerDave wrote:
DFK! wrote:
Hmm, really, LA has none whatsoever? Weird. RD: based on the methodology, this could be erroneous, given the complete dearth of data from LA, LV, ABQ, and a half dozen western states.
They address that a bit in part of the post that I didn't quote before:
Article wrote:
For example, Orange County, California has the highest absolute number of tweets mentioning many of the slurs, but because of its significant overall Twitter activity, such hateful tweets are less prominent and therefore do not appear as prominently on our map.
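To make the article's point concrete, here is a minimal sketch of the difference between ranking by absolute count and ranking by share of overall activity. The county names and numbers are invented purely for illustration and do not come from the article, and the helper `flagged_rate` is hypothetical.

```python
# Hypothetical counts, invented for illustration only (not the article's data).
counties = {
    "big_county": {"flagged": 500, "total": 1_000_000},
    "small_county": {"flagged": 50, "total": 10_000},
}


def flagged_rate(flagged, total):
    """Share of a county's geotagged tweets that were flagged."""
    return flagged / total


rates = {
    name: flagged_rate(c["flagged"], c["total"])
    for name, c in counties.items()
}

# The county with the most flagged tweets in absolute terms...
most_by_count = max(counties, key=lambda name: counties[name]["flagged"])
# ...need not be the county where flagged tweets are most prominent.
most_by_rate = max(rates, key=rates.get)

print(most_by_count, most_by_rate)
```

With these made-up numbers the large county leads on raw count while the small county leads on rate, which is the effect the article describes for Orange County.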
See my second post about bias.
RD wrote:
Yeah, their FAQ acknowledges that potential source of bias:
Article wrote:
This map includes ALL geotagged tweets for each of these words that were determined as negative. This is not a sample of tweets containing these words, but rather the entire population that meets our criteria. That being said, only around 1.5 % of all tweets are geotagged, as it requires opting-in to Twitter's location services. Sure enough, that subset might be biased in a multitude of ways when compared with the entire body of tweets or even with the general population. But that does not mean that the spatial patterns we discover based on geotagged tweets should automatically be discarded - see for example some of our earlier posts on earthquakes and flooding.
Actually, that is exactly what it means: they should automatically be discarded. Unless one first demonstrates how geotagging opt-in is distributed, and then corrects the data for that variable, the data is fundamentally unusable. The fact that they knew about this issue and failed to correct for it makes me more dubious, not less.
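As a rough sketch of what such a correction could look like, assume (hypothetically) that someone had measured the opt-in probability separately for flagged and other tweets in each region. Each observed tweet can then be weighted by the inverse of its opt-in probability. Every number below is invented for illustration, the region names and `corrected_rate` are made up, and nothing here is the article's method.

```python
# Hypothetical observations and opt-in probabilities, invented for illustration.
# obs_*: geotagged tweets actually observed; p_*: assumed opt-in probability.
regions = {
    "east": {"obs_flagged": 30, "obs_other": 970, "p_flagged": 0.03, "p_other": 0.01},
    "west": {"obs_flagged": 15, "obs_other": 985, "p_flagged": 0.015, "p_other": 0.015},
}


def raw_rate(r):
    """Flagged share among geotagged tweets, with no correction."""
    return r["obs_flagged"] / (r["obs_flagged"] + r["obs_other"])


def corrected_rate(r):
    """Flagged share after inverse-probability weighting by opt-in rate."""
    est_flagged = r["obs_flagged"] / r["p_flagged"]  # estimated true flagged count
    est_other = r["obs_other"] / r["p_other"]        # estimated true other count
    return est_flagged / (est_flagged + est_other)


raw = {name: raw_rate(r) for name, r in regions.items()}
corrected = {name: corrected_rate(r) for name, r in regions.items()}

for name in regions:
    print(name, round(raw[name], 4), round(corrected[name], 4))
```

With these made-up inputs the raw rates put the east ahead, while the weighted rates put the west ahead. Without knowing the real opt-in distribution there is no way to tell which picture is closer to the truth.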
RD wrote:
Totally agreed. Just thought it was an interesting east/west division that makes me go, "Hm. I wonder what's going on here?"
Well, based on my point above, it should instead make you say "that was an interesting opportunity to learn why statistics are easy to lie with," and then discard the study entirely, rather than wondering what's going on.
"Scientists" who accidentally overlook inherent biases are called: flawed, mistaken, or "lacking rigor."
"Scientists" who purposely disregard inherent biases while presenting the data as factual are called: liars, frauds, or quacks.
If a pharma company were to present the same type of erroneous findings, they would be (and have been in the past) liable for huge amounts of money.