
Search and Deletion of Wikipedia, part 3

Posted March 13, 2009

Besides Speedy Deletion and Proposed Deletion, there is a third, even more deliberative and time-intensive process for removing Wikipedia articles. This third and most formal deletion process involves the article being Nominated for Deletion, at which point a section is created for the article on the page listing articles under consideration for deletion (AfD – Articles for Deletion). On the AfD page, editors debate the relative merits of keeping the article on Wikipedia or deleting it. This debate is often extensive and can be dense with Wikipedia shorthand, jargon, and abbreviated references to Wikipedia policy pages. Wikipedia policy tells us that this debate is an attempt to reach “rough consensus” rather than a “majority vote.” (Wikipedia:Guide to deletion) In practice, this means that administrators have considerable latitude to weigh the relative merits of the arguments made for or against deletion by various editors, rather than being bound simply to follow majority opinion. When an administrator determines that consensus has been reached, he or she will “close” the debate on the AfD page, typically as either “Keep” (to retain the article) or “Delete” (to remove it). If the result is “Delete,” that administrator then carries out the deletion of the page.

Google and other search engines clearly play an explicit role in many debates on the AfD page. Of the roughly 30 articles in my sample from October 1 and 2, 2008 that were Nominated for Deletion, 8 have AfD entries in which one or more editors cite Google or other search engine results as evidence for either retaining or deleting an article. In several cases, editors say they use Google Scholar, Google Books, Google News, and other specialized search products to find sources and either establish or discount the notability of the article’s subject. In two cases, editors use the shorthand “ghits” for “Google hits.”

Since AfD discussions are retained on Wikipedia indefinitely, the AfD process provides a rich supply of data with which I can easily expand my original sample to study the role of Google in AfD debates more extensively. The complete AfD log for October 1, 2008 lists 108 articles that were Nominated for Deletion on that date. (Wikipedia:Articles for deletion/Log/2008 October 1) Of these 108 AfD debates, 35 include comments by editors that explicitly reference Google search products as a means of establishing whether or not a given page should be deleted. Interestingly, only 5 AfD debates appear to discuss search testing without mentioning Google products, either by name or through the “ghits” shorthand. The AfD listings for the first days of the remaining two months of 2008, November and December, suggest that the October 1 listing is fairly typical for this period on Wikipedia. The November 1, 2008 listing gives 99 AfD debates, 19 of which explicitly cite Google products; the December 1, 2008 listing gives 119 AfD debates, 29 of which explicitly cite Google products. (Wikipedia:Articles for deletion/Log/2008 November 1, Wikipedia:Articles for deletion/Log/2008 December 1) In both cases, only a handful of debates discuss search tests without invoking Google at some point.

These numbers support the broad assertion that Google plays an important role in deciding whether material on Wikipedia will be deleted or retained. In about 1 in 4 (83 of 326) of the AfD discussions listed on these three days, at least one editor invoked Google as evidence. To get a better idea of exactly what role Google plays in these debates, I closely examined the arguments made by editors in the 35 debates from the October 1, 2008 AfD listing that explicitly invoked Google products. This examination shows that, while Google is clearly an important element in the AfD process, editors are not simply counting Google hits to determine whether or not a given subject “exists.” Rather, editors engage in a relatively sophisticated process of reading and interpreting the output of Google searches as they make their decisions.

Wikipedia editors may be guided in this process, at least in part, by a section of a Wikipedia essay entitled “Arguments to avoid in deletion discussions.” (Wikipedia:Arguments to avoid in deletion discussions) (Wikipedia “essays” are on-site commentary pieces written by editors that offer advice on best practices for editing articles and maintaining the site.) One Wikipedia editor involved in the AfD discussions of October 1, 2008 explicitly references the subsection of this essay that provides guidance on using Google as evidence in AfD debates. This subsection, titled “Google test,” gives Wikipedia editors the following advice:

Although using a search engine like Google can be useful in determining how common or well-known a particular topic is, a large number of hits on a search engine is no guarantee that the subject is suitable for inclusion in Wikipedia. Similarly, a lack of search engine hits may only indicate that the topic is highly specialized or not generally sourceable via the internet. One would not expect to find thousands of hits on an ancient Estonian god. The search-engine test may, however, be useful as a negative test of popular culture topics which one would expect to see sourced via the Internet. A search on an alleged “Internet meme” that returns only one or two distinct sources is a reasonable indication that the topic is not as notable as has been claimed.

Overall, the quality of the search engine results matters more than the raw number. A more detailed description of the problems that can be encountered using a search engine to determine suitability can be found here: Wikipedia:Search engine test.

Note further that searches using Google’s specialty tools, such as Google Book Search, Google Scholar, and Google News are more likely to return reliable sources that can be useful in improving articles than the default Google web search.

(Wikipedia:Arguments to avoid in deletion discussions)

Several things about this language are significant. First, the mere presence of a subsection within this essay devoted solely to the “Google test” speaks to how important and prominent a tool Google is for Wikipedia editors. Not only is Google the only search engine mentioned by name in “Arguments to avoid in deletion discussions,” it is the only source mentioned by name. The language endorsing the use of “Google’s specialty tools” can only serve to increase the influence of services like Google Books and Google Scholar within Wikipedia.

Second, “Arguments to avoid in deletion discussions” includes specific language attempting to dissuade Wikipedia editors from using raw Google hits to establish whether or not a given article should be retained on Wikipedia. Of course, this does not mean that editors never invoke Google hits in deletion debates. For example, an editor calling for the deletion of an article on “Magic Bars” writes, “No sources to indicate notability. All I could find on Google News were articles about bars where magicians work.” (Wikipedia:Articles for deletion/Log/2008 October 1) In another example, an editor suggests that an article on “cat repellers” be renamed “cat repellants,” as this term “gets more G[oogle] hits.” (Wikipedia:Articles for deletion/Log/2008 October 1) However, the discussions often indicate that such arguments do not rely solely on Google to judge the worthiness of a given article, but are instead connected to doubts editors have about the text of the article itself. In the case of “Magic Bars,” the discussion indicates that the article was a recipe for a kind of food, raising doubts among Wikipedia editors who see recipes as outside the scope of Wikipedia’s stated goal of encyclopedic knowledge production. In another case, an article on “Maxbashing” was nominated for deletion with the rationale, “Fails Notability, Google yields few results. Written more as an advertisement rather than a substantial encyclopedic article.” (Wikipedia:Articles for deletion/Log/2008 October 1) Here the article’s advertisement-like tone (another editor calls the article “self-promotion”) is cited as an important consideration alongside the lack of Google results.

Furthermore, the guidelines in “Arguments to avoid in deletion discussions” do a fairly good job of noting Google’s biases (especially its propensity to give greater weight to recent popular culture) and advising editors on where hit counting may or may not be useful. The discussions on the AfD list for October 1, 2008 suggest that editors take this guidance into consideration. Many of the AfD entries in which the raw number of Google results for a subject is advanced as an argument for deleting or retaining an article involve subjects drawn from recent popular culture, especially living artists and recently released or upcoming works. In one particularly interesting example, an editor arguing for the deletion of an article on the Harvard University “Bionumbers” project writes that the article should be removed, “[f]or now. It appears to be a legit project run by a Harvard lab […] and it seems to be creating some sort of a buzz based on plain google search results […]. But as I understand it, the project is very new and was started in the Spring 2008. A more careful look at the google search result show that there is no sibstantial [sic] coverage yet by reliable sources.” (Wikipedia:Articles for deletion/Log/2008 October 1) Here, a Wikipedia editor clearly argues that raw Google hits may be unduly inflated by “buzz” about a very recent topic, and that this bias should be corrected by a close reading of the search results.

There is one more interesting facet of the “Google test” language in “Arguments to avoid in deletion discussions.” This language appears in a section on “Notability fallacies,” further demonstrating the link between search engines (especially Google) and Wikipedia editors’ perceived need to establish that subjects are notable enough to warrant inclusion in an encyclopedia. Many discussions on the October 1, 2008 AfD list reinforce this link, drawing on Google while engaging with the issue of notability. In fact, of the 35 discussions on the October 1, 2008 AfD page that explicitly reference Google, only 4 do not center on a debate over the subject’s notability.

These deletion discussions also offer important clues as to what might be driving Wikipedia editors’ perceived need to establish that article subjects are sufficiently notable to merit inclusion in Wikipedia. After all, as Wikipedia itself notes, “Wiki is not paper”; that is to say, the physical limits on what may be stored in Wikipedia are far less pressing than those of a traditional paper encyclopedia. Unlike a paper encyclopedia, Wikipedia need not worry about printing costs, how much shelf space it takes up, or the ability of readers to find information with only an index of subjects. Instead, Wikipedia is distributed through inexpensive digital means, and readers can use internal and external hyperlinks, Wikipedia’s internal search engine, and external search engines (like Google) to find the information they need. Wikipedia’s official policy, of course, limits this theoretically endless capacity to collect and organize information; in the typically flippant prose of Wikipedia policy, “Wikipedia is not an indiscriminate collection of information.” (Wikipedia:What Wikipedia is not) The general guideline is that information on Wikipedia should be “encyclopedic,” but since Wikipedia has already expanded to cover many topics ignored by traditional encyclopedias (such as the pages devoted to the major characters from the popular cartoon “Transformers”), the meaning of what is and is not “encyclopedic” is clearly being constantly renegotiated by Wikipedia editors.

The contents of deletion discussions suggest that Wikipedia editors may be driven to police articles on the grounds of notability by a desire to establish and maintain Wikipedia’s status as a reliable and accurate source of information. While they often cite notability concerns when deleting articles on subjects without a significant presence in reliable, third-party news publications, editors are clearly also worried that such articles may be outright hoaxes. In one example, an editor arguing for the deletion of an article on a movie entitled “Tattoos: A Scattered History” writes that the movie’s entries on IMDB (the Internet Movie Database) should not be counted as establishing its notability, since,

Anyone can add anything they want for IMDB. Someone once wrote that Saw IV would star Jessica Alba and feature Jigsaw’s baby. That stayed up there for at least a week. If anything, it’s worse than Wikipedia as it’s a lot easier to remove false information from Wikipedia than it is for IMDB. On another note, having an IMDB entry doesn’t equal notability…I can think of a lot of IMDB entries that if they were to become articles on Wikipedia they would fail an AFD.

(Wikipedia:Articles for deletion/Log/2008 October 1)

While the nomination of this article for deletion cited concerns about notability, the editor above shows how these concerns connect to editors’ attempts to keep false information off Wikipedia. Notability becomes a means for editors to remove suspected falsehoods without violating Wikipedia’s central “Neutral Point of View” policy, which holds that Wikipedia does not, by definition, present “one point of view as ‘the truth’” (Wikipedia:Five pillars). Instead of asserting that a given article is “false,” editors may assert that its subject is simply not notable because it lacks a presence in large mainstream media sources.

In addition, editors seem particularly concerned with preventing Wikipedia from becoming a space in which small artists, businesspeople, and others promote their own projects using information that cannot be confirmed in reliable sources. Several debates on the October 1, 2008 AfD list use “Myspace musician” as a derogatory label for a musician who lacks recognition outside of self-promotional material posted on sites like the social networking site Myspace. One such case of self-promotional media being discounted is the article on “Carlos Sepuluveda,” which was deleted after being nominated with the rationale, “Clearly non-notable, as a google search turns up nothing aside from Youtube/blogs/Myspace.” (Wikipedia:Articles for deletion/Log/2008 October 1) In another case, an editor writes that an article on the company CRG West should be retained despite the fact that “Much of the GoogleNews hits are press releases, which are not usually considered to count for notability,” since “there are a few gems buried within.” (Wikipedia:Articles for deletion/Log/2008 October 1) In these examples, we see Wikipedia editors attempting to prevent companies from shaping their coverage on Wikipedia through press releases, and artists from shaping their coverage through self-promotion.

The above cases demonstrate one technique Wikipedia editors use to read Google results critically: they scan these results for patterns suggesting self-promotion, such as a disproportionate presence of press releases or of sources like Myspace or IMDB, where subjects may post information about themselves. This ability of Wikipedia editors to read, rather than simply count, Google search results tends to undermine some of Arno’s claims about how Wikipedia editors use Google. In his article, Arno says he intends to follow up on his original experiment, in which a hoax text was rejected from Wikipedia on the grounds that its subject lacked Google results. “Later on,” he writes, “I’ll try to create an other hoax. This time, I’ll make sure I use (fake) sources and there will be something about it to be found on Google. I have to use an other computer, Wikipedia files your IP-address.” (Arno, 2008) Apparently Arno believes such a hoax will have a better chance of being retained on Wikipedia. However, since Wikipedia editors tend to discount the sort of Google results Arno would be able to generate (message board pages, social networking pages, blogs), it seems likely that these results would be disregarded and the hoax article would again be deleted. It is perhaps telling that Arno has yet to publish the results of this follow-up.

The ability of Wikipedia editors to read Google results critically guards against those who would attempt to insert false information into Wikipedia, even if would-be hoaxers were equipped with the sort of motivated network that might succeed in generating Google results. It also guards against those who would attempt to crudely manipulate Wikipedia content for profit. Take, for example, Ron Goodden, who describes himself as an “Atlanta-based freelance copywriter and editor” and advertises on his website that writing “custom Wikipedia articles” is one service he can provide. (Goodden, 2008) He has also advertised this service on the classified-ads site Craigslist under the headline “Let Me Put You In Wikipedia,” bragging, “as a long-time Wikipedia contributor I have been able to consistently help individuals, companies and organizations gain the recognition and advantage they deserve from this new-age encyclopedia – and usually within 48 hours of placing their order!” (Goodden, 2009) Goodden’s website provides a sample Wikipedia article for which he claims responsibility. This article, which documents New York fashion designer Junko Yoshioka, is still available on Wikipedia, perhaps because it cites major publications such as People magazine and the Spanish newspaper El Mundo. (Junko Yoshioka) It is, however, less than 3 months old as of this writing, and may yet be challenged on notability grounds. The user account responsible for this article, presumably Mr. Goodden’s, dates back to October 2005 but has created only 4 articles, including the article on Ms. Yoshioka, and has edited only a handful of others, suggesting that Mr. Goodden’s advertisement may overstate his abilities. (User Contributions for ChulaOne)

Mr. Goodden’s bravado aside, the presence of people like him, who might attempt to insert information into Wikipedia for profit, is a concern for Wikipedia editors. The ability of editors to read Google results critically guards against this sort of manipulation, since editors check for patterns in Google results, such as an overwhelming presence of press releases, promotional material, or material from blogs or other easily manipulated sites, that might indicate that a subject, or someone hired by a subject, was attempting to use Wikipedia for promotional purposes. Unless such an editor can produce reliable sources attesting to the subject’s significance, it seems unlikely that prominent Google results alone would give Wikipedia editors sufficient evidence to retain an article.

However, while this protects Wikipedia from manipulation by for-profit and self-promoting editors, it also, ironically, ties Wikipedia to many of the traditional media organizations to which it is often seen as being opposed. It is the presence or absence of these traditional media organizations within search results that often decides whether an article is deleted. For example, in successfully arguing for the retention of an article on the “Milwaukee Ale House,” one editor writes, “there is detailed coverage of the pub in several books. […] There is also substantial coverage in local newspaper, Milwaukee Journal Sentinel: 246 hits in googlenews [sic]” (Wikipedia:Articles for deletion/Log/2008 October 1) This reliance on traditional print media for establishing notability is also seen in the widespread use of Google Books and Google Scholar within deletion debates, and in the explicit approval given to these tools by “Arguments to avoid in deletion discussions.”

While this reliance on print media may make Wikipedia more reliable, it may also prevent Wikipedia from properly assessing the notability of artists and works of art hailing from some subcultures. The best example of this in the AfD list for October 1, 2008 is the article on a punk band known as “Bankrupt,” which was deleted after a long and passionate discussion. The editor nominating the article for deletion writes, “I still believe the article fails WP:MUSIC because almost all of the links provided are for very niche type sites. Also, none of the albums the band has released have articles. They are on a minor label and haven’t charted as far as I can see. I did a Google search but many were for entirely different bands called Bankrupt.” An editor arguing for retaining the article attempts to refute these arguments, writing,

Five new sources have been added, and quotes from reviews suggesting that the band also qualifies for notability criterion no.7. of WP:MUSIC

– 7. Has become the most prominent representative of a notable style or of the local scene of a city; besides – 1. It has been the subject of multiple non-trivial published works whose source is independent from the musician/ensemble itself and reliable

It is not stated here that the reference cannot be a “niche” publication. Several of these publications are considered as reliable sources in the punk community. Ox fanzine is the No.1 punk rock magazine of Germany.

I’ll create pages for the band’s albums. They may be on a minor label, but their recent releases are available worldwide on iTunes and Amazon.

Please note that a band is notable if it meets ANY of the notability criteria, therefore charting is not an obligatory criterion.

Regarding your argument of Google search: please do a search on last.fm. The only band called Bankrupt that comes up with over 15,000 listeners is this one. You can also search MySpace for Bankrupt for similar results.

(Wikipedia:Articles for deletion/Log/2008 October 1)

These arguments, however, fail to convince the other editors, who continue to contend that these sources are unreliable, even after the above editor attempts to explain that “Ox Fanzine has published 80 issues since 1988, and is the largest punk rock fanzine in Germany. Moloko Plus is another major German punk rock fanzine with over 30 issues released. Distorted Magazine from the UK is a very unique flash-based online magazine with over 20 issues published. Est.hu is a major Hungarian entertainment portal. Southspace.de, thepunksite.com, and kvakpunkrock.cz are all punk music portals with hundreds of reviews published, and having a significant readership. Also, Left Of The Dial (USA) was originally a respected print zine, before the author decided to go on as a blog.” (Wikipedia:Articles for deletion/Log/2008 October 1) One editor arguing for deletion writes, “Running a quick google search turns up the classic ‘myspace own website irrelevant’ results,” suggesting that the same critical reading of Google results that editors use to guard against for-profit and self-promotional articles may here be discounting subcultural sources that lack mainstream media acceptance. Ultimately, the article is deleted.

under: Diss Fragments
