Wikipedia and Shapps: Sockpuppetry, Conflict of Interest, or None?

Taha Yasseri

Will the real Grant Shapps please stand up? ViciousCritic/Totally Socks, (CC BY-NC-SA)

You must have heard about the recent accusations against Grant Shapps in the Guardian. In short, the Guardian claims that Shapps has been editing his own Wikipedia page and that “Wikipedia has blocked a user account on suspicions that it is being used by the Conservative party chairman, Grant Shapps, or someone acting on his behalf”.

In a short piece that I wrote for The Conversation, I try to explain how these things work on Wikipedia, what they mean, and how unreliable these accusations are.

There are two issues here:

First, conflict of interest, for which Wikipedia guidelines suggest that “You should not create or edit articles about yourself, your family, or friends.” But this is more moral advice than an enforceable rule, because it is technically impossible to know the real identity of editors unless they disclose their personal information deliberately.

The second point is that the account under discussion was banned by a Wikipedia admin not because of conflict of interest (which in any case is not a reason to ban a user), but because of sockpuppetry: “The use of multiple Wikipedia user accounts for an improper purpose is called sock puppetry”. But sockpuppetry is not in itself grounds for banning a user either. It is prohibited only when used to mislead the editorial community or to violate some other regulation.

Sock puppets are detected by a small group of editors who have limited access to confidential user data such as IP addresses, computer and operating-system settings, and browser details. This type of editor is called a CheckUser, and I used to serve as a CheckUser on Wikipedia for several years.

In this case the accounts that were “detected” as sock puppets have not been active simultaneously — there is a gap of about three years between their active periods. This not only makes it very hard to claim that any rule or regulation was violated; across such a long gap it is also technically impossible for a CheckUser to establish any relation between the accounts under discussion.

In fact, the admin who carried out the ban admits that the action was based mostly on behavioural similarity (similarity between the edits performed by the two accounts and their shared political interests).

Altogether, I believe the ban has no reliable grounds and rests on pure speculation, and the Guardian’s accusations go well beyond what can logically be inferred from the facts and evidence.

Brief History of Political Wikipedia

Wikipedia places among the top Google results for almost all topics – including political parties and politicians. This is why the Social Election Prediction Project exists; when voters seek information before an election they may turn to Wikipedia. Yet the earliest days of Wikipedia featured little political content. While the site itself was founded on January 11th 2001, the first page for a political party appears to be that for the Green Party of the United States, created months later, on September 19th 2001. Early contributors were perhaps more interested in, or more interested in spreading information about, fringe parties: fellow minority party the Libertarian Party in the United States also had a Wikipedia page before one was created for the Republican or Democratic parties.

Today contributors are quick to update Wikipedia political pages after elections, and there is often even a Wikipedia page dedicated to the election itself. Several weeks ahead of the UK General Election, for example, the Wikipedia page for it is already thousands of words long, full of descriptions of the leader debates and various seat predictions. However, Wikipedia did not cover the UK General Election back in 2001. The pages for the Conservative and Labour parties were not created until well after the June election of that year. (Interestingly, the pages for those two parties and the page for the Liberal Democrats were all created on the same day, October 11th 2001.) While Wikipedia might currently be a quickly updated source for political information, the medium’s open-editing policy has created some controversies, as political figures around the world have been accused of favorably editing their own pages.
In 2014, the Hindustan Times covered a number of Indian politicians with suspiciously clean Wikipedia pages ahead of state elections, writing: “The profile of former Mayor and Shiv Sena corporator Shraddha Jadhav, who has been eyeing the Sion Koliwada assembly seat, mentions that she is known for her ‘elegant dressing’, her ‘fashion sense’ and ‘her crisp cotton sarees’, along with describing her as an articulate corporator.” The Wikipedia page, however, “has no mention of the controversies that plagued her term as well as that she lost a by-poll she had contested in 2006.” The descriptors still remain on Jadhav’s page, perhaps because they are cited to an article from the Hindustan Times itself. In 2006, the Massachusetts newspaper the Lowell Sun reported that a staff member in the office of U.S. Representative Marty Meehan had tried to replace the congressman’s entire Wikipedia page with a staff-written bio.

Wikipedia political pages can be edited to troll as well as to mislead. In 2014, users from inside the US Congress were briefly banned from editing Wikipedia altogether after a contributor added content that “accuse[d] Donald Rumsfeld of being an alien lizard and Cuba of faking the moon landings.” Tools such as ParliamentEdits and CongressEdits, in the UK and US respectively, help monitor politically motivated edits. These tools are automated Twitter accounts that tweet whenever a user associated with an IP address within the legislatures edits Wikipedia. Inspired by the first such tool, ParliamentEdits, people have created similar Twitter bots for the legislatures in Australia, Israel and Greece. The openness of Wikipedia is what allows for political tampering, but the site’s transparency is also what enables watchdogs to pinpoint the actors behind fishy edits. There are many countries where government officials hold influence over the press, but no information source allows users to trace the trail of an article’s creation as easily as Wikipedia.
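At their core, bots like ParliamentEdits work by checking whether an anonymous edit’s IP address falls inside the published IP ranges of a legislature. A minimal sketch of that check in Python, using placeholder documentation ranges rather than any legislature’s real ones:

```python
import ipaddress

# Placeholder networks (RFC 5737 documentation ranges); the real bots
# watch the published IP ranges of the respective legislatures.
LEGISLATURE_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_legislature_edit(editor_ip: str) -> bool:
    """True if an edit's IP address falls inside a monitored range."""
    ip = ipaddress.ip_address(editor_ip)
    return any(ip in net for net in LEGISLATURE_RANGES)
```

A bot built this way simply listens to the stream of anonymous edits and tweets whenever this check returns True.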

Ethics of Wikipedia Research

Ethics of Editing

The election results on this Wikipedia page are wrong, I can tell. As we collect data for the Social Election Prediction Project, I am reviewing many a Wikipedia political party page, and every so often I see mistakes. For this project I am checking that the page exists and ensuring that it existed before the date of the election, so that a voter could have used it to find out political information beforehand. I am not, it should be noted, checking the accuracy of the information. Yet sometimes there are errors that glare. As an occasional Wikipedia editor and a stickler for correcting errors, I feel a strong urge to correct the mistakes I come across. Yet, as an academic looking at these pages in a research context, I am hesitant to alter that which I am studying. What are the ethical boundaries for academics conducting research on Wikipedia?

In 2012, Okoli et al. wrote an overview of scholarship on Wikipedia, a huge and varied field totaling almost 700 articles in peer-reviewed journals, in disciplines ranging from Computer Science to Economics to Philosophy (Okoli et al., 2012). The Okoli article, titled “The people’s encyclopedia under the gaze of the sages: A systematic review of scholarly research on Wikipedia,” is comprehensive on the subject of all Wikipedia research up to that date, but does not deal extensively with ethics. The ethical issues that are addressed are those linked with the privacy concerns of studying the Wikipedia community. In their article on using wikis for research, Gerald Kane and Robert Fichman note that while all Wikipedia data is available under the General Public License, or GPL, and so can be used without copyright concerns, researchers should still be cognizant of the privacy of Wikipedia editors (Kane & Fichman, 2009). For example, many of the editors Kane and Fichman interacted with were hesitant to connect their real-world identity with their identity on Wikipedia, and so did not want to conduct conversations through email or any other platform.

Of course, acting as a part of a community is not always a research taboo. Participatory action research, a method that arose from psychologist Kurt Lewin’s action research, emphasizes collaboration between researchers and the communities at hand. However, while participatory action research could apply to someone editing a Wikipedia article, studying the behavior of other editors and working with them to define the study, Wikipedia editors are not the subjects of the Social Election Prediction Project. The Social Election Prediction Project is a study of Wikipedia as an informational object. The subjects are voters seeking information before an election, and Wikipedia is simply a tool to help us measure their information-seeking behavior.

The ethical ambiguities of researching Wikipedia are just a symptom of Web 2.0, where everyone is a potential contributor. The same question could be asked of researchers studying Twitter, for example: should they tweet? It depends on the objective of the study. For the Social Election Prediction Project, I have not edited any Wikipedia page that I am looking at for research purposes. While I could not alter the outcome of this specific project, as we are looking at past elections and thus historic page views, in some small way improving political Wikipedia pages could make more people turn to Wikipedia for political news. However, I will continue to make minor edits to the Wikipedia pages I read in my own time. While not acting as researcher, I can be collaborator and reader both.

Kane, G., & Fichman, R. (2009). The Shoemaker’s Children: Using Wikis for Information Systems Teaching, Research, and Publication. Management Information Systems Quarterly, 33(1).
Okoli, C., Mehdi, M., Mesgari, M., Nielsen, F. Å., & Lanamäki, A. (2012). The People’s Encyclopedia Under the Gaze of the Sages: A Systematic Review of Scholarly Research on Wikipedia.

Subjectivity and Data Collection in a “Big Data” Project


“There remains a mistaken belief that qualitative researchers are in the business of interpreting stories and quantitative researchers are in the business of producing facts.” (boyd & Crawford, 2012) The Social Election Prediction Project is once again in the data collection phase, and we’re here to discuss some of the data collection decision points we have encountered thus far or, in other words, the subjective aspect of big data research. This is not to denigrate this type of quantitative research. The benefits of big data for social science research are too numerous to list here, and any reader of this blog is likely more than familiar with them. In the era of big data, human behaviour that was previously only theorized is now observable at scale and quantifiable. This is particularly true for the topic of this project: information-seeking behaviour around elections. While social scientists have long studied voting behaviour, historically they have had to rely on self-reported surveys for signals as to how individuals sought information related to an election.

Now, certain tools such as Wikipedia and Google Trends provide an outside indication of how and when people search for information on political parties and politicians. However, although Wikipedia page views are not self-reported, this does not mean that they are objective. Wikipedia data collection requires the injection of personal interpretation, the typical measure of subjectivity. These decisions tend to fall into two general categories: the problem of individuation and the problem of delimitation.

When is something considered a separate entity, and when should it be grouped? This is a frequently occurring question in big data collection. For this project, it has recurred with party alliances and two-round elections. If we are collecting Wikipedia pages to study information-seeking behaviour related to elections, should we consider views only of the page of a party alliance, or of the individual parties as well? This is a problem of individuation: deciding when to consider discrete entities as disparate and when to count them as a single unit. The importance of party alliances varies by country, but big data collection necessitates uniformity for the analysis stage. So a decision must be made. The same issue arises with two-round elections. Should they be considered as one election instance or two? Again, a uniform decision is necessary for the next step of data analysis.
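Whichever way the individuation question is settled, the choice can be made explicit in the data pipeline. A small sketch, with invented party names and view counts, that counts alliance members either separately or as one unit:

```python
# Invented daily page-view series for two allied parties.
daily_views = {
    "Party A": [120, 150, 400],
    "Party B": [80, 90, 210],
}
alliances = {"Alliance AB": ["Party A", "Party B"]}

def views_by_unit(daily_views, alliances, group_alliances=True):
    """Return page-view series per analysis unit.

    If group_alliances is True, member parties are summed into a single
    alliance series; otherwise each party is kept as its own unit.
    """
    if not group_alliances:
        return dict(daily_views)
    grouped = {}
    for alliance, members in alliances.items():
        member_series = [daily_views[m] for m in members]
        grouped[alliance] = [sum(day) for day in zip(*member_series)]
    return grouped
```

The point is not the code but the flag: making `group_alliances` a single parameter forces one uniform decision across all countries, which is exactly what the analysis stage requires.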

For decisions of delimitation, one must set a logical boundary on something continuous. Think: time. For the Social Election Prediction Project, we are collecting the dates of all of the elections under consideration, so that we can compare the Wikipedia page views for the various political parties involved prior to the election. For most electoral systems the date of an election is simple, but for countries like Italy and the Czech Republic with two-day elections, the question of when to end the information-seeking period arises. The day before the election begins? After the first day? There is no uniform solution to this question, only yet another subjective decision by the data collector.
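Whatever rule is chosen, it helps to encode it once so the same cut-off applies to every country. A sketch of the two candidate rules just mentioned (the rule names are mine, not the project’s settled convention):

```python
from datetime import date, timedelta

def window_end(first_voting_day: date, rule: str = "before_first_day") -> date:
    """Last day counted in the pre-election information-seeking window."""
    if rule == "before_first_day":
        return first_voting_day - timedelta(days=1)
    if rule == "after_first_day":  # include the first of two voting days
        return first_voting_day
    raise ValueError(f"unknown rule: {rule}")
```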

In the article quoted above, boyd and Crawford question the objectivity of data analysis, but the subjective strains in big data research begin even earlier, at the collection stage. Data is defined in the collection stage, and these definitions, as with the analysis, can be context-specific. Social media research faces the same definitional problems, but many of the collection decisions have already been made by the social media platform. Of course, the same criticisms could be raised about traditional statistical analysis as well. While there may be unique benefits to big data research, it faces many of the same problems as previous research methods. Big data is often seen as some sort of “black box”, but the process of building that box can be just as subjective as qualitative research.

#indyref on Wikipedia

My colleague Taha Yasseri and I are currently working on a Fell Fund project on social media data and election prediction, looking especially at data from Google and Wikipedia (first paper out soon; will also be presenting on that at IPP 2014 which should be great). As part of that we thought we’d have a bit of fun looking at Scotland’s independence referendum on Wikipedia.

For election prediction the method is relatively straightforward: examine readership stats on the party Wikipedia pages of the country in question, and see which page is read the most (of course that doesn’t correspond straight away to election results – would that life were so simple – and the idea of the project is to see what corrections and biases need to be accounted for to make it work). It isn’t quite so clear how to do that for Scotland, but (just for fun really) we compared the following pages:


First we look at the UK and Scotland: interesting how Scotland has leapfrogged the UK in the last days of the independence campaign. Does this point to a Yes victory?


In terms of flags, though, the Union Jack is well ahead of the Saltire, peaking in the last few days. Is it a last minute outbreak of unionism?


In terms of national dishes, meanwhile, Haggis has been dominating Fish and Chips for the full period of the campaign, with interest in Haggis especially spiking in the last couple of days.

Well, one of these graphs will predict the winner of the referendum: we just don’t know which one ;-) More seriously, I think it’s interesting how most of these terms are spiking in the days before the vote, showing again how the social web really responds to political events.
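For the election-prediction case proper, the comparison described at the top of this post boils down to ranking party pages by readership over some pre-vote window. A toy version with invented totals:

```python
def rank_by_views(page_views):
    """Return (page, total_views) pairs, most-read first."""
    return sorted(page_views.items(), key=lambda kv: kv[1], reverse=True)

# Invented view totals over a hypothetical pre-vote window.
views = {"Party X": 52_000, "Party Y": 48_500, "Party Z": 7_200}
```

The interesting research question, as noted above, is everything this toy skips: which corrections and biases must be layered on top of the raw ranking before it tracks actual results.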

UPDATE: Taha has passed me the comparison of the Yes and No campaign pages, as below. Yes for a narrow win following months of No dominance – you heard it here first.


Creating transnational political links with euandi & Facebook

Jonathan Bright

euandi front

Just wanted to put up a quick plug for the euandi voting advice application [VAA], which has recently been launched by the European University Institute. I was one of the 100 or so political scientists across Europe who got together to produce the application. Fill in a short questionnaire and it will tell you the extent to which other parties share your values (both in your country and across Europe), as well as which other areas in Europe contain like-minded individuals.

My political europe

There are lots of VAAs around at the moment; the novelty of this one is the option to then go on and connect with people who hold similar political views through Facebook, with the aim of (for example) getting a European Citizens’ Initiative started. The overall aim is to promote more transnational politics in the European Parliament elections, which are at the moment almost overwhelmingly dominated by national political concerns. Pretty neat stuff, and it will be interesting to see what comes of it over the next month or so.

Media effect or media replacement?

Taha Yasseri, Jonathan Bright

Online political information seeking, at least in the data we’ve gathered so far, happens in short, concentrated bursts. When we began the project, I (JB) was hoping that these bursts would tell us something about how people inform themselves about contemporary democratic politics. However we quickly saw in our first post that the peak of information seeking activity falls after the election itself takes place.

How can this be explained? So far we’ve been toying with two theories, developed out of the observations below. One: this behaviour is driven by news media coverage. People see the elections reported on TV or in the papers, then look them up online to find out more. If the peak in news coverage coincides with the day of the election and its aftermath, then it’s logical that the peak of information seeking would occur shortly after that.

The second is that this behaviour instead acts as a kind of replacement for news coverage. People want to know the result of the election, especially if they participated, but for whatever reason the news media does an ineffective job of informing them of the result, so they look online instead.

One way of trying to distinguish between these two theories is by looking at information seeking activity during the European Parliament elections in countries with different election dates. The European elections in 2009 ran from the 4th to the 7th of June, but the final results were only announced on the 7th. Countries voting on the 4th, 5th and 6th would therefore have had a kind of information gap, whereby voters couldn’t find out the precise result of the elections. If information seeking is driven by a media effect, we might expect these countries to peak on the 8th (when the results are reported). If it is driven by a media replacement strategy, we would expect it to come the day after the relevant country’s election. Right?
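Under this (admittedly crude) reading, the two theories predict different peak days, which can be checked mechanically. A sketch, assuming daily page views keyed by date:

```python
from datetime import date, timedelta

def classify_peak(views_by_day, election_day, results_day):
    """Label the single largest page-view peak according to the two theories."""
    peak_day = max(views_by_day, key=views_by_day.get)
    if peak_day == election_day + timedelta(days=1):
        return "media replacement"  # peak right after the country's own vote
    if peak_day == results_day + timedelta(days=1):
        return "media effect"       # peak right after the results are reported
    return "inconclusive"
```

With the 2009 dates above (the Netherlands voting on June 4th, results announced on the 7th), an invented series peaking on June 5th would be labelled media replacement, and one peaking on June 8th a media effect.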

Below are the info seeking graphs for the Netherlands and the Czech republic. As usual we are looking at page views of the Wikipedia page for the 2009 European Parliament elections in the language of the country of interest (so the Dutch and Czech versions). The Netherlands voted on the 4th of June, while the Czech Republic voted on the 5th.




Interestingly, these graphs offer support for both theories, because they contain two peaks: one the day after the elections, and one the day after the results were reported. None of this is perfect, of course. Just because the media can’t report the final result doesn’t mean they can’t report regional results within their country (I believe), and I think exit polls are also allowed. They may also simply cover the elections on the day, even without reporting the results in detail. So media effects could still be the driving force. More importantly, perhaps, the two theories aren’t really mutually exclusive.

In future work we are going to look at other Wikipedia pages which are more specific to the country in question. This will allow us to look at other early voting countries which don’t have a unique language (Austria, the UK, Ireland and perhaps Cyprus if the stats are high enough).