In China, pretty much everyone knows that the Internet is heavily policed. The people know. The government knows the people know. The people know the government knows the people know.
In fact, the “open secret” of the Great Firewall is surely an important part of the way censorship works in China. Precisely because people know Internet censorship exists, the party-state benefits from the efficiency of self-policing as a means of control rather than relying exclusively on external enforcement in real time. When Internet users know they might be monitored and internalize the potentially observing gaze of the state, they are more likely to adjust their behavior accordingly. Even when individuals go around the censorship, or perhaps most especially when they do (and most do so regularly), the act of circumvention makes users even more acutely aware of risk.
Of course, for this to work well, those being surveilled should know they are being watched but remain ignorant of the precise details (the keywords or behaviors) that might actually attract unwanted attention. While everyone knows they are potentially being watched online, nobody knows exactly what might tip someone off. In fact, the very uncertainty surrounding keywords that might attract attention is indicated by their name in Chinese. They are not “banned words” or “censored words” or “illegal words”; they are minganci, or “sensitive words.” They are sensitive like landmines with hair-triggers that might go off if one is not careful. (A few years back some friends in China shared with me a little ditty, I Love Beijing’s Sensitive Words!, a rewrite of a classic Communist song.)
Sure, we can guess some of the big, hot-button words that we know are sensitive: words referencing June 4, 1989, Falungong, or the Dalai Lama are surely among them. There is also a Wikipedia page that has a list of Blacklisted Keywords in China. Most of those terms, however, are pretty standard fare. It would be much more interesting to have a larger list of sensitive words, one that stretches from the just-a-bit-sensitive through the kind-of-sensitive, all the way to the much-more-sensitive and almost-super-sensitive. Such a list would be very valuable for understanding contemporary China. It would be like having a topographical map of the tender places, the weak spots and insecurities of the party-state.
Well, we now have just such a list of sensitive words.
Jeffrey Knockel, a computer science graduate student at the University of New Mexico at Albuquerque, has been compiling lists of sensitive words as part of his work on the Chinese version of Skype. A few days ago Bloomberg Businessweek published an article about Mr. Knockel’s work. For years those who work between the US and China have known that the Chinese version of Skype deployed by Tom.com was different from the US version because it enabled Chinese surveillance. As part of a computer science project, Mr. Knockel cracked it open to uncover the way that it monitors users’ activities. The Bloomberg article recounts the fascinating way that he figured out the monitoring and how he used the commonly censored F-word as his Rosetta Stone to decode the program’s encrypted keyword list.
While Mr. Knockel’s project focused on the way the censorship works, the lists of words that he has decoded and collected are fascinating in their own right. In the spirit of open scholarship, he has posted everything on his personal website. He has lists, updated daily, of words in Chinese for various versions of Tom Skype, some with machine-generated English translations. For folks who don’t speak Chinese, he has two files (here and here) of human translations from the Chinese, complete with short explanations. He also has copies of some related publications and talks.
This is some seriously interesting stuff. It is a treasure trove of information about what counts as sensitive in China these days. Of course, we don’t know exactly how the sensitive words on Skype relate to the party-state’s broader efforts to monitor its netizens, but given Tom.com’s high profile, the list must be a fairly authoritative one.
Each of the words on the lists has its own interesting historical and social context: a reason for being singled out for inclusion. I wasn’t at all surprised to see many familiar political words, references to sensitive historical moments, and search terms for foreign news media or pornographic content. The list is peppered with the personal names of dissidents, political figures, and their family members, along with a wide variety of interesting and entertaining euphemisms.
One seemingly random entry that caught my eye was for neopets.com, a fantasy pet and gaming website that has been around since 1999. Why would that be on the banned list? Perhaps because people can play networked games? It seems almost farcical that the Chinese government would be uncomfortable with games like 200m Peanut Dash, Biscuit Brigade, or Gummi Dice.
As I looked over the lists, I thought about how many interesting research papers they could spawn. Because he has already offered English translations of some of the lists, Mr. Knockel’s data would be great for an undergraduate student to dig into for a paper, for learning about contemporary China, or even as a vocabulary list for learning all kinds of new words.
Of course, many of those words might only be useful among salty-talking sailors!