Tag Archives: election

Not Just #GE2015: Other 2015 Polling Failures

The failure of pollsters in #GE2015 was covered widely, including in a previous post here, but it was not the only opinion polling failure that year. Polls also failed to predict results by wide margins in the national elections in Israel and Poland.

The history of polling as we know it now is pretty short, dating back only to George Gallup in the US in the 1930s. Gallup successfully predicted Franklin D. Roosevelt’s win in the 1936 election using a representative sample. The popular magazine The Literary Digest had predicted that Alfred Landon would win by a landslide. Unlike Gallup, The Literary Digest based its poll on a nonrepresentative sample – the magazine provided postcards for any reader to mail in with their preference.

The methods used by pollsters today vary, but most rely on some kind of automated selection of landline telephones. It is possible for polling firms to call cellphones as well, but it is much more expensive, so few firms do. Of course, as people drop landlines in favor of cellphones, the pool of people responding to polls becomes less and less representative of the general population.

Israel and the Surprise Reelection of Binyamin Netanyahu

“It wasn’t a good night for Israel’s pollsters. The average of pre-election polls showed Binyamin Netanyahu’s Likud party on 21 seats, trailing the centre-left Zionist Union led by Isaac Herzog by four seats,” Alberto Nardelli wrote in The Guardian of Israel’s March 2015 election. Instead, Netanyahu was comfortably reelected. What happened?

Nardelli notes that it could be due to last-minute changes of heart in the electorate…or to systematic error in the polls. In Israel, polls cannot be published in the four days leading up to the election, and it is possible that voters decided for Netanyahu and his Likud party in those final hours. Avi Degani, a professor at Tel Aviv University and a pollster himself, blamed the poll errors on Internet-based methodologies in an interview with CNN, noting that not all voters in Israel are equally represented online.

Poland and the Unexpected Loss of Bronisław Komorowski

Just as the British Polling Council announced that it intended to carry out an investigation into the failures of opinion polls leading up to #GE2015, the Polish Association of Market and Opinion Research Organizations also stated that it planned to investigate why opinion polls failed to predict the results of the May presidential election. Contrary to predictions that the incumbent Komorowski would be comfortably reelected, the relatively unknown Andrzej Duda won both the first and second rounds of voting. In an interview with the Associated Press, Miroslawa Grabowska, director of the CBOS polling agency in Warsaw, said that undecided voters feel forced to state a preference when polled and so point to the household name of the sitting president. Jan Kujawski, director of research with Millward Brown in Poland, blamed the fall in the number of households with landlines.
Time will tell if this pattern holds true for upcoming national elections, or if more polling firms improve their results by contacting prospective voters through cellphones or other methods.

Image credit to Flickr user Mortimer62. 

What if mentions were votes?

The last post looked at mention activity for each British constituency. What would happen if we took these mentions to be votes? Does this reaction from social media offer any potential insight into what might happen in the election? In the image below (top), using the same week of Twitter data from Datasift and YourNextMP, we identify which party “won” the Twitter mention battle in each constituency. The blank constituency on the map is Buckingham (the Speaker’s constituency), and we have of course excluded Northern Ireland and Plaid Cymru entirely, purely to limit the number of parties and hence make the job a bit more feasible in real time.

Of course, as we highlighted in the previous post, there is a strong relationship between the number of times a candidate tweeted and the number of mentions they received: and we don’t want to measure just how much effort candidates have been putting in online, but the relative level of attention they generate. Hence in the map we divide the overall number of mentions of a candidate by the number of tweets they published themselves, giving us a kind of relative measure of a candidate’s impact on Twitter.
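As a concrete (and entirely hypothetical) sketch of this measure, the snippet below divides each candidate’s mention count by their own tweet count. The names and numbers are invented for illustration, and the zero-tweet convention is one of several possible choices:

```python
# Hypothetical illustration of the relative-impact measure: mentions
# received divided by tweets sent. All names and counts are invented.

def relative_impact(mentions, tweets):
    """Mentions per tweet; candidates who never tweeted score 0.0
    (one possible convention for avoiding division by zero)."""
    return mentions / tweets if tweets > 0 else 0.0

candidates = [
    {"name": "Candidate A", "mentions": 900, "tweets": 300},
    {"name": "Candidate B", "mentions": 400, "tweets": 50},
    {"name": "Candidate C", "mentions": 120, "tweets": 0},
]

for c in candidates:
    c["impact"] = relative_impact(c["mentions"], c["tweets"])

# The constituency "winner" under this measure is the candidate with
# the highest impact score, not the most raw mentions.
winner = max(candidates, key=lambda c: c["impact"])
print(winner["name"], winner["impact"])  # Candidate B 8.0
```

Note that under this measure Candidate B “wins” despite receiving fewer mentions than Candidate A, because those mentions were earned with far fewer tweets.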

The map below ours is a constituency-level forecast based on polling data, included for purposes of comparison and lifted straight from our colleagues at electionforecast.co.uk.


Constituency level Twitter winners


Constituency level prediction from http://www.electionforecast.co.uk/

As you can see, the number of seats “won” in the Twitter vote diverges significantly from the electionforecast.co.uk model (which is, of course, much closer to what is actually going to happen), but is nevertheless not entirely unrealistic. Labour are understated to a large degree, whilst the reverse is true for UKIP and the Green Party. Labour, the Liberal Democrats and the SNP are somewhere within the ballpark (+/- 30).

Of course, we didn’t really expect this type of method to offer a perfect “prediction” of the election: it would be a major surprise (and probably a coincidence) if it did. My guess is that it indicates something about the loyal/activist base present in a constituency rather than about voter levels. Hence it will be interesting to see whether the seats given to some of the more minor parties using this method are areas where these parties do surprisingly well or beat the national trend. For example, are the 35 Green Party constituencies we highlight places where the Greens manage a major improvement on their vote share?

Social Media + Elections: A Recap


From Jonathan Bright and Scott Hale’s blog post on Twitter Use.

In the run-up to the general election we conducted a number of investigations into relative candidate and party use of social media and other online platforms. The site elections.oii.ox.ac.uk has served as our hub for elections-related data analysis. There is much to look over, but this blog post can guide you through it.


“What if mentions were votes?” by Jonathan Bright and Scott Hale

“Which parties are having the most impact on Twitter?” by Jonathan Bright and Scott Hale

“The (Local) General Election on Twitter” by Jonathan Bright

“Where do people mention candidates on Twitter?” by Jonathan Bright

Twitter + Wikipedia 

“Online presence of the General Election Candidates: Labour Wins Twitter while Tories take Wikipedia” by Taha Yasseri


“Which parties were most read on Wikipedia?” by Jonathan Bright

“Does anyone read Wikipedia around election time?” by Taha Yasseri

Google Trends

“What does it mean to win a debate anyway?: Media Coverage of the Leaders’ Debates vs. Google Search Trends” by Eve Ahearn

Social Media Overall

“Could social media be used to forecast political movements?” by Jonathan Bright

“Social Media are not just for elections” by Helen Margetts

Does anyone read Wikipedia around election time?

I have already written about the Wikipedia-Shapps story, so that is not the main topic of this post! But when that topic was still hot, some people asked me whether anyone ever actually reads the Wikipedia articles about politicians. Why should what is written in those articles matter at all? This post tackles that question: how much do people refer to Wikipedia to read about politics, especially around election time?

Let’s again consider Shapps’ case. Below, you can see the number of daily page views of the Wikipedia article about him.

Daily page views of the Wikipedia article about Grant Shapps.

As you see, there are two HUGE peaks of around 7,000 and 14,500 views per day on top of a rather steady baseline of under 1,000 daily views. The first peak appeared when “he admitted that he had [a] second job as ‘millionaire web marketer’ while [he was] MP“, and the second when the Wikipedia incident happened. What interests me is that while the first peak relates to a much more important event, the second peak, related to what I tend to call a minor event, is more than twice as large as the first. OK, so this might just be the case of Shapps, and mostly due to the media attention surrounding the controversy. How about other politicians, say the party leaders? See the diagrams below.

Daily page views of the Wikipedia articles about the party leaders.

A very large peak is evident in the curves for all the party leaders, reaching 22,000 views per day for Natalie Bennett, the leader of the Green Party. Yes, that’s due to the ITV leaders’ debate on the 2nd of April. If you saw our previous post on search behaviour, you shouldn’t be surprised; what is surprising is the absence of a second peak around the BBC leaders’ debate on the 16th of April, especially when you see the diagrams from our other post on Google search volumes.

How about the parties? How many people read about them on Wikipedia? Check it out below.

Daily page views of the Wikipedia articles about the parties.

Here, there does seem to be a second increase in page views after the BBC debate on the 16th of April. Moreover, there is an ever-widening separation between the Tory-Labour-UKIP curves and the LibDem-Green-SNP curves. This is very interesting, as the Tories and Labour are the most established English parties, whereas UKIP is among the newest. That’s very much related to our project on understanding the patterns of online information seeking around election times.
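For readers who want to explore page-view counts like these themselves: daily figures per article can be retrieved from the Wikimedia REST pageviews API. One caveat: that public API only serves data from mid-2015 onward, so figures for earlier periods come from other sources such as the raw page-view dumps. A minimal URL-building sketch:

```python
# Sketch: build the Wikimedia REST pageviews URL for one article's
# daily view counts. The API only covers data from mid-2015 onward,
# so earlier figures must come from other sources.
from urllib.parse import quote

def pageviews_url(article, start, end, project="en.wikipedia"):
    """Daily page views for `article` between YYYYMMDD dates `start`
    and `end` (inclusive)."""
    return (
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        f"{project}/all-access/all-agents/{quote(article, safe='')}"
        f"/daily/{start}/{end}"
    )

url = pageviews_url("Grant_Shapps", "20150701", "20150801")
print(url)
# Fetching this URL returns JSON with one {"timestamp": ..., "views": ...}
# entry per day.
```

The article title is percent-encoded so that titles containing spaces or punctuation also produce valid URLs.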

Online presence of the General Election Candidates: Labour wins Twitter while Tories take Wikipedia

Some have called the forthcoming UK general election a Social Media Election. That might be a bit of an exaggeration, but there is no doubt that both candidates and voters are very active on social media these days and take it seriously. The Wikipedia-Shapps story of last week is a good example of how important online presence is for candidates, journalists, and of course voters. We don’t know how important this presence is in shaping votes, but we can at least look into the data and gauge the presence of the candidates and the activity of their supporters. In this post and the ones that follow, we present statistics on the online activity of parties, candidates, and of course voters. For an example, see the previous post on the search behaviour of citizens around debate times.

Who is on Twitter?

Candidates and parties are much discussed by supporters on social media, particularly Facebook and Twitter. But how active are candidates themselves on these platforms? In this post we simply show how many candidates from each party, and in which constituencies, have a Twitter account. Some might be more active than others and some might tweet very rarely; we will analyse this activity in the next posts. Here we count only who has any kind of publicly known account.


Geographical distribution of candidates who have a Twitter account.

The figure above shows the geographical distribution of candidates for each party and whether they have a Twitter account. There are some interesting results in there. For example, Labour has the largest number of Twitter-active candidates, and ALL the SNP candidates tweet. While the LibDem and Green parties have the same number of accounts, once normalised by the overall number of constituencies each is standing in, the Greens seem more Twitter-enthusiastic. UKIP loses the Twitter game both in absolute numbers and in proportion.

Who is on Wikipedia?

Having a Twitter account is something of a personal decision. A candidate decides to have one, and it’s entirely up to them what to tweet. The difference in the case of Wikipedia is that, ideally, candidates would not create or edit an article about themselves. Also, the type of information you can learn about a candidate from their Wikipedia page is very different from what you can gain by reading their tweets.

Geographical distribution of the candidates about whom Wikipedia has an article.


The figure above shows the constituencies in which the standing candidates are featured in the largest online encyclopaedia, Wikipedia. Here, the Tories are the absolute winners in terms of the number of articles. The Greens are the least “famous” candidates, and the LibDems are well behind the big two. In the next post we will explore how often voters turn to Wikipedia to learn about the parties and candidates, and I’m sure that by reading it you’ll be convinced that being featured on Wikipedia is important!


All right: so far, Labour has won Twitter presence and the Tories have taken Wikipedia (remember that all the SNP candidates have Twitter accounts too). But how about the gender of the candidates? Is there any gender-related pattern in the candidates’ social media presence?

First let’s have a look at the gender distribution of the candidates.

Geographical distribution of the candidates colour-coded by gender.


As you can see in the figure above, there are fewer female candidates than male ones across all the parties. Only 12% of UKIP candidates are female, while the Greens have the highest proportion at 38%. The Tories sit right next to UKIP on the list of the most male-dominated parties. There is also a clear pattern that most of the constituencies in the centre have male candidates.

How about social media?

Among all the candidates, 20% of male candidates are featured on Wikipedia, compared with about 17% of female candidates. Almost half of the Tories’ male candidates are on Wikipedia, whereas the figure drops to 28% for their female counterparts. Only Labour’s female candidates have more Wikipedia coverage than the males of their party, and even then the difference is marginal. In all the other parties, males have a higher coverage rate. Wikipedia’s tendency to pay more attention to male figures is a very well-known fact.

Twitter is different. Slightly more female candidates (76%) have a Twitter account than male candidates (69%). Almost all (96%) of Labour’s female candidates tweet, and Tory female candidates are more active than their male counterparts. This pattern, however, breaks down for UKIP: 52% of its male candidates are on Twitter compared to only 44% of its female candidates (the lowest rate among all the party-gender groups).
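The percentages in this section are simple grouped proportions. Here is a small sketch of the computation, using invented records rather than the actual candidate data:

```python
# Sketch of the grouped-proportion calculations in this section: the
# share of candidates with a Twitter account by (party, gender). The
# records below are invented, not the real candidate data.
from collections import defaultdict

candidates = [
    {"party": "Labour", "gender": "female", "has_twitter": True},
    {"party": "Labour", "gender": "male",   "has_twitter": True},
    {"party": "UKIP",   "gender": "female", "has_twitter": False},
    {"party": "UKIP",   "gender": "male",   "has_twitter": True},
    {"party": "UKIP",   "gender": "male",   "has_twitter": False},
]

totals = defaultdict(int)      # candidates per (party, gender)
on_twitter = defaultdict(int)  # of which, how many have an account

for c in candidates:
    key = (c["party"], c["gender"])
    totals[key] += 1
    if c["has_twitter"]:
        on_twitter[key] += 1

rates = {key: 100.0 * on_twitter[key] / totals[key] for key in totals}
print(rates[("UKIP", "male")])  # 50.0
```

The same grouping applies unchanged to the Wikipedia-coverage figures, swapping `has_twitter` for a has-article flag.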


The data that we used to produce the maps and figures come mainly from a very interesting crowd-sourced project called YourNextMP. However, we further validated the data using the Wikipedia and Twitter APIs. If you would like a copy, just get in touch!

Wikipedia and Shapps: Sockpuppetry, Conflict of Interest, or None?

Taha Yasseri

Will the real Grant Shapps please stand up? ViciousCritic/Totally Socks, (CC BY-NC-SA)

You must have heard about the Guardian’s recent accusations against Grant Shapps. Basically, the Guardian claims that Shapps has been editing his own Wikipedia page and that “Wikipedia has blocked a user account on suspicions that it is being used by the Conservative party chairman, Grant Shapps, or someone acting on his behalf”.

In a short piece that I wrote for The Conversation, I try to explain how these things work on Wikipedia, what they mean, and basically how unreliable these accusations are.

There are two issues here:

First, conflict of interest, for which Wikipedia’s guidelines suggest that “You should not create or edit articles about yourself, your family, or friends.” But this is more moral advice than enforceable policy, because it is technically impossible to know the real identity of editors unless they deliberately disclose their personal information.

The second point is that the account under discussion was banned by a Wikipedia admin not because of conflict of interest (which is in any case not a reason to ban a user), but because of sockpuppetry: “The use of multiple Wikipedia user accounts for an improper purpose is called sock puppetry”. BUT sockpuppetry is not generally good cause for banning a user either. It is prohibited only when used to mislead the editorial community or to violate some other regulation.

Sock puppets are detected by a certain type of editor with very limited access to confidential user data, such as IP addresses, computer and operating system settings, and browser details. This type of editor is called a CheckUser, and I served as a CheckUser on Wikipedia for several years.

In this case, the accounts “detected” as sock puppets have not been active simultaneously — there is a gap of about three years between their active periods. This not only makes it very hard to claim that any rule or regulation was violated; given such a long gap, it is also technically impossible for a CheckUser to observe any relation between the accounts in question.

In fact, the admin who imposed the ban admits that his action was based mostly on behavioural similarity (similarity between the edits performed by the two accounts and their shared political interests).

Altogether, I believe the ban has no reliable grounds and is based on pure speculation, and the Guardian’s accusations go well beyond what can logically be inferred from the facts and evidence.


This post has been cross-posted to the Oxford Internet Institute’s  Elections and the Internet blog.

Subjectivity and Data Collection in a “Big Data” Project


“There remains a mistaken belief that qualitative researchers are in the business of interpreting stories and quantitative researchers are in the business of producing facts.” (boyd & Crawford, 2012) The Social Election Prediction project is once again in the data collection phase, and we’re here to discuss some of the data collection decision points we have encountered thus far or, in other words, the subjective aspect of big data research. This is not to denigrate this type of quantitative research: the benefits of big data for social science research are too numerous to list here, and any reader of this blog is likely more than familiar with them. In the era of big data, human behaviour that was previously only theorized is now observable at scale and quantifiable. This is particularly true for the topic of this project, information-seeking behaviour around elections. While social scientists have long studied voting behaviour, historically they have had to rely on self-reported surveys for signals as to how individuals sought information related to an election.

Now, certain tools such as Wikipedia and Google Trends provide an outside indication of how and when people search for information on political parties and politicians. However, although Wikipedia page views are not self-reported, this does not mean they are objective. Wikipedia data collection requires the interjection of personal interpretation, the typical marker of subjectivity. These decisions tend to fall into two general categories: the problem of individuation and the problem of delimitation.

When is something considered a separate entity, and when should it be grouped? This is a frequently occurring question in big data collection. For this project, it has recurred with party alliances and two-round elections. If we are collecting Wikipedia pages to study information-seeking behaviour related to elections, should we count views only of the page of a party alliance, or of the individual parties as well? This is a problem of individuation: deciding when to treat discrete entities as disparate and when to count them as a single unit. The importance of party alliances varies by country, but big data collection necessitates uniformity for the analysis stage, so a decision must be made. The same issue arises with two-round elections: should they be considered one election instance or two? Again, a uniform decision is necessary for the next step of data analysis.

Decisions of delimitation require setting a logical boundary on something continuous: think of time. For the Social Election Prediction project, we are collecting the dates of all of the elections under consideration, so that we can compare the Wikipedia page views for the various political parties involved prior to each election. For most electoral systems, the date of an election is simple, but for countries like Italy and the Czech Republic with two-day elections, the question arises of when to end the information-seeking period. The day before the election begins? After the first day? There is no uniform data solution to this question, only yet another subjective decision by the data collector.
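One pragmatic response is to make whichever delimitation rule is chosen explicit in code, so that it is at least applied uniformly across all elections in the dataset. The rule below (the window ends the day before voting starts) is an illustrative choice, not necessarily the project’s actual one:

```python
# Sketch: encode the delimitation rule as a function so the same
# cut-off applies to every election. Rule shown here: the
# information-seeking window ends the day before voting begins
# (an illustrative choice among several possible conventions).
from datetime import date, timedelta

def seeking_window_end(first_voting_day):
    """End of the pre-election information-seeking period, defined the
    same way for one-day and two-day elections."""
    return first_voting_day - timedelta(days=1)

# Italy's two-day 2013 vote began on 24 February; a one-day election
# is handled by exactly the same rule.
print(seeking_window_end(date(2013, 2, 24)))  # 2013-02-23
```

Whatever convention is picked, writing it down as a function documents the subjective decision and keeps it consistent through the analysis stage.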

In the article quoted above, boyd and Crawford question the objectivity of data analysis, but the subjective strains in big data research begin even earlier, at the collection stage. Data is defined in the collection stage, and these definitions, as with the analysis, can be context-specific. Social media research faces the same definitional problems, although many of the collection decisions have already been made by the social media platform. Of course, the same criticisms could be raised about traditional statistical analysis as well. While there may be unique benefits to big data research, it faces many of the same problems as previous research methods. Big data is often seen as some sort of “black box”, but the process of building that box can be just as subjective as qualitative research.


This post has been cross-posted to the Oxford Internet Institute’s Elections and the Internet blog.