The Glade 4.0 https://gladerebooted.net/ |
The Geography of Hate https://gladerebooted.net/viewtopic.php?f=8&t=9964 |
Page 1 of 3 |
Author: | RangerDave [ Tue May 14, 2013 10:16 am ] |
Post subject: | The Geography of Hate |
You westerners sure are a tolerant bunch. Relatedly, wtf is wrong with the eastern half of the country!? Interactive version is here. Racist tweets follow roughly the same east/west split. Article wrote: Using DOLLY to search for all geotagged tweets in North America between June 2012 and April 2013, we discovered 41,306 tweets containing the word ‘nigger’, 95,123 referenced ‘homo’, among other terms. In order to address one of the earlier criticisms of our map of racism directed at Obama, students at Humboldt State manually read and coded the sentiment of each tweet to determine if the given word was used in a positive, negative or neutral manner. This allowed us to avoid using any algorithmic sentiment analysis or natural language processing, as many algorithms would have simply classified a tweet as ‘negative’ when the word was used in a neutral or positive way....Only those tweets used in an explicitly negative way are included in the map. All together, the students determined over 150,000 geotagged tweets with a hateful slur to be negative. Hateful tweets were aggregated to the county level and then normalized by the total number of tweets in each county. This then shows a comparison of places with disproportionately high amounts of a particular hate word relative to all tweeting activity. *Edit: Added link to article/blog in the quote. |
Author: | Nitefox [ Tue May 14, 2013 10:22 am ] |
Post subject: | |
They sure are selective in what they search for to find "hate". Want hate? Read the tweets after Andrew Breitbart died. |
Author: | DFK! [ Tue May 14, 2013 10:26 am ] |
Post subject: | |
Hmm, really, LA has none whatsoever? Weird. RD: based on methodology, this could be erroneous data given the complete dearth of data from LA, LV, ABQ, or a half dozen other western states. It could also just indicate that western staters don't geotag, don't use twitter as much, or that the "negative/neutral/positive" algorithm they're running was flawed. Very interesting, but without a lot more insight into methodology, I'll be having a truckload of salt to go with this. |
Author: | DFK! [ Tue May 14, 2013 10:31 am ] |
Post subject: | |
Wait, I also just noticed in the bottom right it says the numbers were aggregated and then normalized. So that's another confounding variable: this is essentially saying these terms, run through the algorithm, show up a higher proportion of times than in areas that aren't bright red. So LA, LV, ABQ and others could have lots more "hateful" tweets than the rest of the country put together, but they have a lower percentage. This could then be disproportionately representative because the tweets were not looking in the southwest for Spanish-language hate speech, for higher populations of individuals identifying as homosexual, or maybe just because racists are more likely to be twitterers in the eastern US than the western (not that any of these are concrete, I'm just speculating). Any of these would confound the results enough to be useless. In other words: lies, damn lies, and statistics, once again. |
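To make the count-versus-proportion point concrete, here is a minimal Python sketch. The county names and numbers are invented purely for illustration (they are not from the article), and `rate_per_10k` is a hypothetical helper, not the researchers' actual method. It only shows how a county with the most flagged tweets can still have the lowest normalized rate.

```python
from dataclasses import dataclass


@dataclass
class CountyCounts:
    name: str
    negative_tweets: int  # tweets hand-coded as negative (made-up numbers)
    total_tweets: int     # all geotagged tweets from the county (made-up numbers)


def rate_per_10k(county: CountyCounts) -> float:
    """Flagged tweets per 10,000 geotagged tweets in the county."""
    return 10_000 * county.negative_tweets / county.total_tweets


counties = [
    CountyCounts("Big metro county", 900, 3_000_000),
    CountyCounts("Small rural county", 30, 20_000),
]

# Ranking by raw count and ranking by normalized rate pick different counties.
by_count = max(counties, key=lambda c: c.negative_tweets)
by_rate = max(counties, key=rate_per_10k)

for c in counties:
    print(f"{c.name}: {c.negative_tweets} flagged, {rate_per_10k(c):.1f} per 10k tweets")
```

With these invented numbers the metro county has 30 times as many flagged tweets but one fifth of the rate, which is the pattern the article describes for Orange County.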
Author: | Khross [ Tue May 14, 2013 10:34 am ] |
Post subject: | Re: The Geography of Hate |
Hmmms, confirmation bias results in unusable data applications once again ... |
Author: | RangerDave [ Tue May 14, 2013 10:34 am ] |
Post subject: | Re: |
DFK! wrote: Hmm, really, LA has none whatsoever? Weird. RD: based on methodology, this could be erroneous data given the complete dearth of data from LA, LV, ABQ, or a half dozen other western states. They address that a bit in part of the post that I didn't quote before: Article wrote: For example, Orange County, California has the highest absolute number of tweets mentioning many of the slurs, but because of its significant overall Twitter activity, such hateful tweets are less prominent and therefore do not appear as prominently on our map. DFK! wrote: It could also just indicate that western staters don't geotag, don't use twitter as much, or that the "negative/neutral/positive" algorithm they're running was flawed. Yeah, their FAQ acknowledges that potential source of bias: Article wrote: This map includes ALL geotagged tweets for each of these words that were determined as negative. This is not a sample of tweets containing these words, but rather the entire population that meets our criteria. That being said, only around 1.5% of all tweets are geotagged, as it requires opting-in to Twitter's location services. Sure enough, that subset might be biased in a multitude of ways when compared with the entire body of tweets or even with the general population. But that does not mean that the spatial patterns we discover based on geotagged tweets should automatically be discarded - see for example some of our earlier posts on earthquakes and flooding. DFK! wrote: Very interesting, but without a lot more insight into methodology, I'll be having a truckload of salt to go with this. Totally agreed. Just thought it was an interesting east/west division that makes me go, "Hm. I wonder what's going on here?" |
Author: | Talya [ Tue May 14, 2013 10:37 am ] |
Post subject: | |
Note the lack of hate at Disney World. I'm sure that means something. Somehow. To someone. |
Author: | Diamondeye [ Tue May 14, 2013 10:40 am ] |
Post subject: | Re: The Geography of Hate |
Besides the fact that a lot of the west is near-uninhabited compared to the east? Also, how about people using those terms to refer to themselves, or to the derogatory use of the term? If some black guy tweets the n-word in quotes while talking about imaginary prejudice, does that count as hate speech? |
Author: | Khross [ Tue May 14, 2013 10:44 am ] |
Post subject: | Re: The Geography of Hate |
They don't have that important piece of data; Twitter isn't legally allowed to collect and store it. Consequently, the entire study is rather meaningless and guilty of the self-same reductivism it's decrying. |
Author: | Rorinthas [ Tue May 14, 2013 10:52 am ] |
Post subject: | |
Would homo include homosexual? Homogeneous? Also, a lot of people of African descent use the n-word in a non-hateful way. The red spots do correspond with areas where people of such descent live. |
Author: | Rorinthas [ Tue May 14, 2013 10:58 am ] |
Post subject: | |
Never mind, I reread it. |
Author: | DFK! [ Tue May 14, 2013 11:34 am ] |
Post subject: | Re: Re: |
RangerDave wrote: DFK! wrote: Hmm, really, LA has none whatsoever? Weird. RD: based on methodology, this could be erroneous data given the complete dearth of data from LA, LV, ABQ, or a half dozen other western states. They address that a bit in part of the post that I didn't quote before: Article wrote: For example, Orange County, California has the highest absolute number of tweets mentioning many of the slurs, but because of its significant overall Twitter activity, such hateful tweets are less prominent and therefore do not appear as prominently on our map. See my second post about bias. RD wrote: Yeah, their FAQ acknowledges that potential source of bias: Article wrote: This map includes ALL geotagged tweets for each of these words that were determined as negative. This is not a sample of tweets containing these words, but rather the entire population that meets our criteria. That being said, only around 1.5% of all tweets are geotagged, as it requires opting-in to Twitter's location services. Sure enough, that subset might be biased in a multitude of ways when compared with the entire body of tweets or even with the general population. But that does not mean that the spatial patterns we discover based on geotagged tweets should automatically be discarded - see for example some of our earlier posts on earthquakes and flooding. Actually, that is what it means: that they should automatically be discarded. Unless one is able to first demonstrate dispersal of geotagging opt-in, and then correct the data for that variable, it means the data is fundamentally unusable. The fact that they knew this issue and failed to correct for it actually makes me more dubious, not less. RD wrote: Totally agreed. Just thought it was an interesting east/west division that makes me go, "Hm. I wonder what's going on here?" 
Well, based on my sentence above, at this point it should make you instead state "that was an interesting opportunity to learn why statistics are easy to lie with," and then to discard the study entirely. That is, rather than wondering what's going on. "Scientists" who accidentally overlook inherent biases are called: flawed, mistaken, or "lacking rigor." "Scientists" who purposely disregard inherent biases while presenting the data as factual are called: liars, frauds, or quacks. If a pharma company were to present the same type of erroneous findings, they would be (and have been in the past) liable for huge amounts of money. |
Author: | Amanar [ Tue May 14, 2013 11:48 am ] |
Post subject: | |
Almost all of the problems you guys are finding with this data are addressed if you simply read the two paragraphs that go along with it. There's no algorithm to mess things up; each tweet was looked at individually to determine if the word was used in a positive, negative, or neutral manner. It's normalized by the overall number of tweets in each county so that population density (or the popularity of twitter varying between regions) is not a factor. If you actually look at the links, this is all explained in detail (including all the words that are used, their variations, which ones had to be left out, etc.) The only thing I couldn't find is how they deal with non-English tweets. If they normalize by the overall volume of tweets regardless of language, and then look at only English hate speech, then areas with higher proportions of non-English twitter users would be skewed towards being "non-hateful." This would be pretty trivial to correct by using only the volume of English-language tweets, which I'm sure there is data for. I suspect they've already thought of this, but it wouldn't hurt to suggest it to them. |
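The denominator issue raised above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the numbers are invented, `negative_share` is a hypothetical helper, and the article does not say which denominator the researchers actually used. The sketch just compares dividing by all tweets with dividing by English-language tweets only.

```python
import math


def negative_share(negative_english: int, total_tweets: int,
                   english_fraction: float = 1.0) -> float:
    """Share of flagged English tweets among the tweets counted in the denominator.

    english_fraction=1.0 divides by all tweets regardless of language;
    a smaller value restricts the denominator to the English-language share.
    """
    return negative_english / (total_tweets * english_fraction)


# Two made-up counties whose English-language tweeters behave identically.
# County B just has half of its tweets in another language.
a_all = negative_share(100, 100_000)                             # all English
b_all = negative_share(50, 100_000)                              # language ignored
b_english_only = negative_share(50, 100_000, english_fraction=0.5)

print(f"A: {a_all:.4f}  B (all tweets): {b_all:.4f}  B (English only): {b_english_only:.4f}")
```

Under these assumptions county B looks half as "hateful" as county A when all tweets are in the denominator, and identical once only English-language tweets are counted.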
Author: | DFK! [ Tue May 14, 2013 12:03 pm ] |
Post subject: | Re: |
Amanar wrote: Almost all of the problems you guys are finding with this data are addressed if you simply read the two paragraphs that go along with it. There's no algorithm to mess things up; each tweet was looked at individually to determine if the word was used in a positive, negative, or neutral manner. It's normalized by the overall number of tweets in each county so that population density (or the popularity of twitter varying between regions) is not a factor. If you actually look at the links, this is all explained in detail (including all the words that are used, their variations, which ones had to be left out, etc.) Apologies. I thought I misread the OP. That said, what they've instead introduced (and the very reason you use algorithms in the first place) is confirmation bias, selection bias, and perception bias. Additionally, normalization by density doesn't really help if you don't correct for other factors, as I mentioned above. |
Author: | Amanar [ Tue May 14, 2013 12:28 pm ] |
Post subject: | |
What confirmation bias? I don't see how that could be a factor unless the students who were classifying the tweets were given the location of the tweets they were classifying, which would just be retarded. Selection bias? Of course there's a **** selection bias. They're looking at tweets. Most of the country doesn't even use twitter. But guess what, there's a selection bias in every study like this. That doesn't make the data unusable. Take election polling for example. There are selection biases against people who don't use landlines, don't answer calls from strangers, don't like answering questions from strangers, etc. But we still get very usable data to the point where we can predict the results of an election before it takes place with surprising accuracy. Not sure what you mean by perception bias in this case. DFK wrote: This could then be disproportionately representative because the tweets were not looking in the southwest for Spanish-language hate speech, for higher populations of individuals identifying as homosexual, or maybe just because racists are more likely to be twitterers in the eastern US than the western (not that any of these are concrete, I'm just speculating). Any of these would confound the results enough to be useless. I guess these are your problems with their normalization technique? Most of your speculations are retarded (except the spanish language one that I responded to above). How would the number of homosexuals in an area be a factor? How would you "correct" for the population of homosexuals? Why would racists be more likely to use twitter (compared to non-racists in their area) in one area of the US vs another? Of course it could be a factor, but you need to at least provide some evidence or an explanation of why you think it would be a significant one. You seem to be upset that their data and their methodology isn't perfect, but this kind of data never is. That doesn't mean it's not useful. |
Author: | Khross [ Tue May 14, 2013 12:28 pm ] |
Post subject: | Re: The Geography of Hate |
Again, because they can't access demographic data about the Twits (people who tweet), they can't provide useful information. |
Author: | Amanar [ Tue May 14, 2013 12:33 pm ] |
Post subject: | |
Khross, I'm genuinely confused as to what you mean. What demographic data do they need to make their information useful? |
Author: | Khross [ Tue May 14, 2013 12:35 pm ] |
Post subject: | Re: |
Amanar wrote: Khross, I'm genuinely confused as to what you mean. What demographic data do they need to make their information useful? All sorts of things: ethnicity, income, disabled status, gender, sex, etc. |
Author: | DFK! [ Tue May 14, 2013 12:38 pm ] |
Post subject: | Re: |
Edit: Deleted due to coming to civil terms. |
Author: | Amanar [ Tue May 14, 2013 12:50 pm ] |
Post subject: | |
I'm sorry if my tone was a little harsh DFK, I'm just in a weird mood right now. I really didn't mean to be so grating, but rereading my post I can see how it turned out that way. I shouldn't have called your reasoning retarded. Khross, yeah, that would make it a lot more useful for making statements about the population as a whole, as opposed to just twitter users. I still think it's interesting given that limitation. |
Author: | Khross [ Tue May 14, 2013 12:53 pm ] |
Post subject: | Re: |
Amanar wrote: Khross, yeah, that would make it a lot more useful for making statements about the population as a whole, as opposed to just twitter users. I still think it's interesting given that limitation. I'd still say it is pretty useless, given we're trying to track intent without even getting ALL of the stereotypes we're using to track this on the page ... |
Author: | Rorinthas [ Tue May 14, 2013 12:56 pm ] |
Post subject: | |
Read this a little better. My problem is that a lot of the country probably doesn't tweet about every little thing the way more media-driven culture centers (New York, CA, LV) do. This means, as the pollster explained to his credit, that the ratio of "hate tweets" to all tweets would be larger there. I'm not saying it's utter garbage, but it's something that needs to be considered. There are lots of hate tweets in CA, NY, etc., but they are being lost in the load of "garbage tweets" that certain demographic areas probably have a lot more of. |
Author: | FarSky [ Tue May 14, 2013 12:56 pm ] |
Post subject: | |
Flyover tweets? |
Author: | Rorinthas [ Tue May 14, 2013 12:57 pm ] |
Post subject: | |
Perhaps. I'm just saying, where do you think the people live who tweet about what their cat had for lunch and other pointless stuff like that? |
Author: | Lenas [ Tue May 14, 2013 12:58 pm ] |
Post subject: | Re: The Geography of Hate |
I'm sure you guys are right and this data doesn't actually correlate to anything. It's just a statistical illusion and the nation is actually more homogenous than it seems. |
All times are UTC - 6 hours [ DST ] |