Author Archives: Eve Ahearn

Not Just #GE2015: Other 2015 Polling Failures

The failure of pollsters in #GE2015 was covered widely, including in a previous post here, but it was not the only opinion polling failure that year. Polls also failed to predict results by wide margins in the national elections in Israel and Poland.

The history of polling as we know it now is pretty short, dating back only to George Gallup in the US in the 1930s. Gallup successfully predicted Franklin D. Roosevelt’s win in the 1936 election, using a representative sample. The popular magazine The Literary Digest had predicted that Alfred Landon would win by a landslide. Unlike Gallup’s, The Literary Digest’s poll was based on a nonrepresentative sample – the magazine provided postcards for any reader to mail in with their preference.

The actual method used now by pollsters varies but most rely on some kind of automated selection of landline telephones. It is possible for polling firms to call cellphones as well but it is much more expensive, so few firms do. Of course, as people drop landlines in favor of cellphones, the pool of people responding to polls becomes less and less representative of the general population.

Israel and the Surprise Reelection of Binyamin Netanyahu

“It wasn’t a good night for Israel’s pollsters. The average of pre-election polls showed Binyamin Netanyahu’s Likud party on 21 seats, trailing the centre-left Zionist Union led by Isaac Herzog by four seats,” Alberto Nardelli wrote in The Guardian of Israel’s March 2015 election. Instead, Netanyahu was comfortably reelected. What happened?

Nardelli notes that it could be due to last-minute changes of heart in the electorate…or systematic error in the polls. In Israel, polls cannot be published in the four days leading up to the election, and it is possible that voters decided for Netanyahu and his Likud party in those final days. Avi Degani, a professor at Tel Aviv University and a pollster himself, blamed the poll errors on Internet-based methodologies in an interview with CNN, noting that not all voters in Israel are equally represented online.

Poland and the Unexpected Loss of Bronisław Komorowski

Just as the British Polling Council announced that it intended to carry out an investigation into the failures of opinion polls leading up to #GE2015, the Polish Association of Market and Opinion Research Organizations stated too that it planned to investigate why opinion polls failed to predict the results of the May presidential election. Contrary to predictions that the incumbent Komorowski would be comfortably reelected, the relatively unknown Andrzej Duda won instead in both the first and second rounds of voting. In an interview with the Associated Press, Miroslawa Grabowska, director of the CBOS polling agency in Warsaw, said that undecided voters would feel forced to state a preference when polled and so would point to the household name of the sitting president. Jan Kujawski, director of research with Millward Brown in Poland, pointed the blame at the fall in number of households with landlines.
Time will tell if this pattern holds true for upcoming national elections, or if more polling firms improve their results by contacting prospective voters through cellphones or other methods.

Image credit to Flickr user Mortimer62. 


GE2015: Polling Problems vs. Information Seeking Biases

By my bedtime on the night of the last UK General Election, May 7 2015, one thing, at least, was clear. No matter who won, the pollsters lost. This is what makes the possibility of the Social Election Prediction Project so exciting – the flaws of traditional polling. Leading up to the election, the polls showed the Conservative Party and the Labour Party neck and neck; yet the Tories went on to win handily. What in the world happened with the polls? In contrast, the Social Election Prediction Project focuses on predicting elections from online information seeking behavior such as search trends, a source with its own set of biases. How do the biases of this method compare to the issues with the polls leading up to the election?

Problems with Traditional Polling
First, it is important to note that the polls leading up to GE2015 were not just wrong on the whole; they were wrong in a consensus. All predicted a tie, or close to it, between the two major parties when the outcome was anything but. Why were the polls all so biased in similar ways? A few ideas…

Because the polls report percentages, not seats. ComRes, for example, in their final predictions set the Conservatives at 35 percent, Labour at 34 percent and UKIP at 12 percent nationally. Of course, due to the first-past-the-post system, voting percentages do not translate neatly to seats in the House of Commons.
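
To see why national vote shares need not map neatly onto seats, consider a minimal sketch with made-up numbers. The constituency names, party names and vote counts below are hypothetical and are not drawn from any poll or result; the only assumption is that each seat goes to the party with the most votes in that constituency.

```python
from collections import Counter

# Hypothetical vote counts per constituency (illustrative only, not real data).
constituencies = {
    "Seat A": {"Party X": 510, "Party Y": 490},
    "Seat B": {"Party X": 520, "Party Y": 480},
    "Seat C": {"Party X": 530, "Party Y": 470},
    "Seat D": {"Party X": 200, "Party Y": 800},
}


def national_vote_share(results):
    """Return each party's share of all votes cast, as a fraction."""
    totals = Counter()
    for votes in results.values():
        totals.update(votes)  # adds this constituency's counts to the totals
    all_votes = sum(totals.values())
    return {party: count / all_votes for party, count in totals.items()}


def seats_won(results):
    """Award each seat to the party with the most votes there."""
    return Counter(max(votes, key=votes.get) for votes in results.values())


shares = national_vote_share(constituencies)
seats = seats_won(constituencies)

for party in sorted(shares):
    print(f"{party}: {shares[party]:.0%} of votes, {seats[party]} seats")
```

In this toy example Party Y takes 56 percent of the votes but only one of the four seats, because its support is concentrated in a single constituency.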

…Because of how the polls ask about specific constituency preference. FiveThirtyEight noted that their model would have been far more accurate if they used a more “generic” question about party preference, as opposed to the question they used regarding preference for candidate in the respondent’s specific constituency.

…Because voters changed their minds. Peter Kellner, the President of the polling firm YouGov, insinuated as much in an interview with The Telegraph.

…Because of the effect of earlier polls showing the strength of the SNP. Early polls showed the SNP gaining in strength and, as the New York Times noted, later in the election the “Conservatives had adroitly exploited fears among voters that the Labour Party would be able to govern only in coalition with the Scottish National Party.”

…Because of the perennial issue of shy Tories? Long a known factor in UK polling, more people will vote Conservative than will declare as much to a pollster.

…Because of the way the poll participants were recruited? Poll respondents are representative by age, sex and social class, but as The Guardian notes, there still might be other divides between people who will respond to an online or telephone poll and those who will not.

…Because of the results of the other polls. Market research firm Survation noted that their final poll predicted a Conservative victory much in line with the final result but the poll seemed so out of sync with all the others that they declined to publish it.

This is not the first time in recent memory that UK opinion polling was notably inaccurate. In 1992, opinion polls leading up to the election predicted a Labour victory and yet the Conservatives won handily. A group of pollsters convened an inquiry following the embarrassment of the 1992 election and pinned the problem on voters switching late in the election, unrepresentativeness in the people polled and shy Tories.

Biases in Online Information Seeking
Using information from online information seeking, such as search trends, can remedy some of the above problems. One does not have to adjust for shy Tories, for example, or for the wording of the question, as search trend data constitutes demonstrated, not reported, behavior. Furthermore, search trend data would change as a voter considered new options and so could accommodate strategic voting. However, using online information seeking data presents its own issues. While the majority of the UK population uses the Internet, according to the 2013 Oxford Internet Survey, there is still 22 percent of the population that does not. These people will not be represented at all in search trends.

Furthermore, UK voters who do use the Internet may not use our data sources, such as Wikipedia, when they are looking for information. Yet the Social Election Prediction Project encompasses far more countries than just the UK; the next blog post will discuss how polling practices – and polling reliability – vary around the world.

Image credit to Flickr user ThePictureDrome

Social Media + Elections: A Recap

From Jonathan Bright and Scott Hale’s blog post on Twitter Use.

In the run-up to the general election we conducted a number of investigations into relative candidate and party use of social media and other online platforms. The site has served as our hub for elections-related data analysis. There is much to look over, but this blog post can guide you through.


“What if mentions were votes?” by Jonathan Bright and Scott Hale

“Which parties are having the most impact on Twitter?” by Jonathan Bright and Scott Hale

“The (Local) General Election on Twitter” by Jonathan Bright

“Where do people mention candidates on Twitter?” by Jonathan Bright

Twitter + Wikipedia 

“Online presence of the General Election Candidates: Labour Wins Twitter while Tories take Wikipedia” by Taha Yasseri

“Which parties were most read on Wikipedia?” by Jonathan Bright

“Does anyone read Wikipedia around election time?” by Taha Yasseri

Google Trends

“What does it mean to win a debate anyway?: Media Coverage of the Leaders’ Debates vs. Google Search Trends” by Eve Ahearn

Social Media Overall

“Could social media be used to forecast political movements?” by Jonathan Bright

“Social Media are not just for elections” by Helen Margetts

What does it mean to win a debate anyway?: Media Coverage of the Leaders’ Debates vs. Google Search Trends

Following the April 2nd Leaders’ Debate, the media portrayed Nicola Sturgeon, the leader of the Scottish National Party, as the victor of the night, or at least a victor. “Cameron was robotic but Sturgeon impressed” ran a headline for one of The Guardian’s post-debate pieces. The headline for The Independent went even further, stating “’Can I vote for the SNP?’ voters ask after Nicola Sturgeon’s winning performance.”

Based on Google Trends data, though, one can see that Sturgeon did not dominate in this arena. She only briefly topped the Google Trends results, for a few days of the week ending on Saturday the 4th; Sturgeon did not top the trends on the day of the debate itself. In fact, it was Natalie Bennett who was the most searched for party leader, not only in absolute value but also in the relative increase from the weeks before.

Google Trends is not, of course, equivalent to voting likelihood. A high rate of searches could simply mean that the politician had little name recognition prior to the debate. However, it does indicate information seeking behavior. Political predictions have historically been based on polls of user behavior. Information seeking, as in people searching out more info on a candidate or politician, would have been overly difficult to trace. Of course, the Internet changes that.

Now we can see that while Nicola Sturgeon did have an overall increase in Google searches after the April 2nd debate, along with every other party leader, her bump was far greater in Scotland, as compared to searches in England.



While the candidates themselves inspired interest (or some of them at least), almost no one, it seems, was prompted to search for information about the parties. The patterns relating to the debates are barely discernible in the search trends. The Tories and the Lib Dems had particularly low rates of searches, and they were the only parties without any visible peak in interest whatsoever during the debates. This is in line with the findings so far of the Social Election Prediction Project: voters are more likely to seek information online about minority parties or new parties than they are to look up information on the political parties in power. This would explain the peak in searches for the minority Green Party on April 2nd.


*Note that the y-axis for Google search trends is not an absolute number of searches, but rather the relative number of searches for that term as compared to other terms searched for simultaneously. The relative scale and the timeframe are important in discussing these results. The graphs above all have the relative rate of search results averaged over each day. Following the April 2nd debate, Google provided search trend information to The Guardian that does not exactly match up with the above. According to that data, Leanne Wood was the most searched for politician “through the debate.” It’s possible that the difference with the above is that Google was measuring only searches that took place during the exact time of the debate, and was not including anyone prompted to seek new information afterwards. Apparently the most-Googled question during the debate, though, is a key one: “Who is winning the debate?”
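
A rough sketch of what a relative scale implies, under simplifying assumptions: the daily counts and leader labels below are hypothetical, and the rescaling (largest value across the compared terms set to 100) is an illustration of relative indexing in general, not a description of Google's exact computation.

```python
def relative_index(raw_counts):
    """Rescale raw daily counts so the largest value across all terms is 100."""
    peak = max(max(series) for series in raw_counts.values())
    return {
        term: [round(100 * value / peak) for value in series]
        for term, series in raw_counts.items()
    }


# Hypothetical daily search counts over three days (not real data).
raw = {
    "Leader A": [200, 1000, 400],
    "Leader B": [50, 250, 100],
}

indexed = relative_index(raw)
print(indexed)

# Doubling every raw count leaves the relative index unchanged.
doubled = {term: [2 * value for value in series] for term, series in raw.items()}
print(relative_index(doubled) == indexed)
```

The point of the sketch is that an indexed series shows how terms compare with one another within the chosen window, and says nothing on its own about how many searches there were.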


This post has been cross-posted to the Oxford Internet Institute’s Elections and the Internet blog.

Brief History of Political Wikipedia

Wikipedia places among the top Google results for almost all topics – including political parties and politicians. This is why this Social Election Prediction Project exists; when voters seek information before an election they may turn to Wikipedia. Yet the earliest days of Wikipedia featured little political content. While the site itself was founded on January 11th 2001, the first page for a political party appears to be that for the Green Party of the United States, created months later, on September 19th 2001. Early contributors were perhaps more interested in, or more interested in spreading information about, fringe parties, as fellow minority party the Libertarian Party in the United States also had a Wikipedia page before one was created for the Republican or Democratic parties.

Today contributors are quick to update Wikipedia political pages after elections and there is even often a Wikipedia page dedicated to the election itself. Several weeks ahead of the UK General Election, for example, the Wikipedia page for it is already thousands of words long, full of descriptions of the leader debate and various seat predictions. However, Wikipedia did not cover the UK General Election back in 2001. The pages for the Conservative and Labour parties were not created until well after the June election of that year. (Interestingly the pages for those two parties and the page for the Liberal Democrats were all created on the same day, October 11th 2001.)

While Wikipedia might currently be a quickly updated source for political information, the medium’s open-editing policy has created some controversies as political figures around the world have been accused of favorably editing their own pages.
In 2014, Hindustan Times covered a number of Indian politicians with suspiciously clean Wikipedia pages ahead of state elections, writing “The profile of former Mayor and Shiv Sena corporator Shraddha Jadhav, who has been eyeing the Sion Koliwada assembly seat, mentions that she is known for her ‘elegant dressing’, her ‘fashion sense’ and ‘her crisp cotton sarees’, along with describing her as an articulate corporator.” but that the Wikipedia page “has no mention of the controversies that plagued her term as well as that she lost a by-poll she had contested in 2006.” The descriptors still remain on Jadhav’s page, perhaps because they are cited to an article from the Hindustan Times itself. In 2006, the Massachusetts newspaper the Lowell Sun reported that a staff member in the office of the U.S. Representative Marty Meehan had tried to replace the congressman’s entire Wikipedia page with a staff-written bio.

Wikipedia political pages can be edited to troll as well as just to mislead. In 2014, users from inside the US Congress were briefly banned from editing Wikipedia altogether after a contributor added content that “accuse[d] Donald Rumsfeld of being an alien lizard and Cuba of faking the moon landings.”

Tools such as ParliamentEdits and CongressEdits, in the UK and US respectively, help monitor politically motivated edits. The tools are automated Twitter accounts that tweet out any time a user associated with an IP address within the legislatures edits Wikipedia. Inspired by the first such tool, ParliamentEdits, people have created similar Twitter bots for the legislatures in Australia, Israel and Greece.

The openness of Wikipedia is what allows for political tampering but the site’s transparency is also then what enables watchdogs to pinpoint actors behind fishy edits. There are many countries where government officials hold influence over the press, but no information source that allows users to trace the trail of the article creation as easily as Wikipedia.


This post has been cross-posted to the Oxford Internet Institute’s Elections and the Internet blog.


Ethics of Wikipedia Research

Ethics of Editing

The election results on this Wikipedia page are wrong, I can tell. As we collect data for the Social Election Prediction Project, I am reviewing many a Wikipedia political party page and every so often I see mistakes. For this project I am checking that the page exists, ensuring that the page existed before the date of the election so that a voter could have used it to find out political information beforehand. I am not, it should be noted, checking for accuracy of information. Yet sometimes there are errors that glare. As an occasional Wikipedia editor and a stickler for correcting errors, I feel a strong urge to correct the mistakes I come across. Yet, as an academic looking at this page in a research context I am hesitant to alter that which I am studying. What are the ethical boundaries for academics conducting research on Wikipedia?

In 2012, Okoli et al. wrote an overview of scholarship on Wikipedia, a huge and varied field, totaling almost 700 articles in peer-reviewed journals in disciplines ranging from Computer Science to Economics to Philosophy (Okoli et al., 2012). The Okoli article, titled “The people’s encyclopedia under the gaze of the sages: A systematic review of scholarly research on Wikipedia,” is comprehensive on the subject of all Wikipedia research up to that date, but does not deal extensively with ethics. The ethical issues that are addressed are those that are linked with privacy concerns of studying the Wikipedia community. In their article on using wikis for research, Gerald Kane and Robert Fichman note that while all Wikipedia data is available under General Public License, or GPL, and so can be used without copyright concerns, researchers should still be cognizant of the privacy of Wikipedia editors (Kane & Fichman, 2009). For example, many of the editors Kane and Fichman interacted with were hesitant to connect their real world identity with their identity on Wikipedia, and so did not want to conduct conversations through email or any other platform.

Of course, acting as a part of a community is not always a research taboo. Participatory action research, a method that arose from psychologist Kurt Lewin’s action research, emphasizes collaboration between researchers and the communities at hand. However, while participatory action research could apply for someone editing a Wikipedia article, studying the behavior of other editors and working with other editors to define the study, Wikipedia editors are not the subjects of the Social Election Prediction Project. The Social Election Prediction Project is a study of Wikipedia as an informational object. The subjects are voters seeking information before an election, and Wikipedia is simply a tool to help us measure their information-seeking behavior.

The ethical ambiguities of researching Wikipedia are just a symptom of Web 2.0, where everyone is a potential contributor. The same question could be asked of researchers studying Twitter, for example: should they tweet? It depends on the objective of the study. For the Social Election Prediction Project, I have not edited any Wikipedia page that I am looking at for research purposes. While I could not alter the outcome for this specific project, as we are looking at past elections and so historic page views, in some small way improving political Wikipedia pages could make more people turn to Wikipedia for political news. However, I will continue to do minor edits for the Wikipedia pages I read in my own time. While not acting as researcher, I can be collaborator and reader both.

Kane, G., & Fichman, R. (2009). The Shoemaker’s Children: Using Wikis for Information Systems Teaching, Research, and Publication. Management Information Systems Quarterly, 33(1).
Okoli, C., Mehdi, M., Mesgari, M., Nielsen, F. Å., & Lanamäki, A. (2012). The People’s Encyclopedia Under the Gaze of the Sages: A Systematic Review of Scholarly Research on Wikipedia. Retrieved from
This post has been cross-posted to the Oxford Internet Institute’s Elections and the Internet blog.

Subjectivity and Data Collection in a “Big Data” Project


“There remains a mistaken belief that qualitative researchers are in the business of interpreting stories and quantitative researchers are in the business of producing facts.” (boyd & Crawford, 2012) The Social Election Prediction project is once again in the data collection phase and we’re here to discuss some of the data collection decision points we have encountered thus far or, in other words, the subjective aspect of big data research. This is not to denigrate this type of quantitative research. The benefits of big data for social science research are too numerous to list here and likely any reader of this blog is more than familiar with them. In the era of big data, human behaviour that was previously only theorized is now observable at scale and quantifiable. This is particularly true for the topic of this project, information seeking behaviour around elections. While social scientists have long studied voting behaviour, historically they have had to rely on self-reported surveys for signals as to how individuals sought information related to an election.

Now, certain tools such as Wikipedia and Google Trends provide an outside indication as to how and when people search for information on political parties and politicians. However, although Wikipedia page views are not self-reported, this does not mean that they are objective. Wikipedia data collection requires the interjection of personal interpretation, the typical measure of subjectivity. These decisions tend to fall into two general categories: the problem of individuation and the problem of delimitation.

The first is a frequently occurring question in big data collection: when is something considered a separate entity and when should it be grouped? For this project, this question has recurred with party alliances and two-round elections. If we are collecting Wikipedia pages to study information-seeking behaviour related to elections, should we consider views only of the page of a party alliance or of the individual parties as well? This is a problem of individuation, deciding when to consider discrete entities as disparate and when to count them as a single unit. The import of party alliances varies by country but big data collection necessitates uniformity for the analysis stage. So, a decision must be made. The same issue arises with two-round elections. Should they be considered as one election instance or two? Again, a uniform decision is necessary for the next step of data analysis.

For decisions of delimitation one must set a logical boundary on something continuous. Think, time. For the Social Election Prediction project, we are collecting the dates of all of the elections under consideration, so that we can compare the Wikipedia page views for the various political parties involved prior to the election. For most electoral systems, the date of an election is simple, but for countries like Italy and the Czech Republic with two-day elections, the question of when to end the information-seeking period arises. The day before the election begins? After the first day? There is no uniform data solution to this question, only yet another subjective decision by the data collector.
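
One way to keep such a decision visible is to write the cutoff rule down as an explicit parameter. The sketch below is illustrative only: the election dates, page-view counts, rule names and function names are all hypothetical and are not taken from the project's data or code.

```python
from datetime import date, timedelta


def window_end(first_day, rule):
    """Return the last date counted as pre-election information seeking."""
    if rule == "before_first_day":
        return first_day - timedelta(days=1)
    if rule == "after_first_day":
        return first_day
    raise ValueError(f"unknown rule: {rule}")


def views_in_window(daily_views, start, end):
    """Sum page views for every day from start to end, inclusive."""
    return sum(views for day, views in daily_views.items() if start <= day <= end)


# Hypothetical two-day election and hypothetical daily page views.
election_first_day = date(2020, 1, 10)
daily_views = {
    date(2020, 1, 8): 100,
    date(2020, 1, 9): 150,
    date(2020, 1, 10): 300,  # first day of voting
    date(2020, 1, 11): 50,   # second day of voting
}

start = date(2020, 1, 8)
totals = {
    rule: views_in_window(daily_views, start, window_end(election_first_day, rule))
    for rule in ("before_first_day", "after_first_day")
}
print(totals)
```

With these made-up numbers the two rules give 250 and 550 page views for the same party page, which is the sense in which the choice of boundary is part of the data rather than a neutral step before it.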

In the article quoted above, boyd and Crawford question the objectivity of data analysis but the subjective strains in big data research begin even earlier, with the collection stage. Data is defined in the collection stage, and these definitions, as with the analysis, can be context specific. Social media research faces the same definitional problems but many of the collection decisions have already been made by the social media platform. Of course, the same criticisms could be raised about traditional statistical analysis as well. While there may be unique benefits to big data research, it faces many of the same problems as previous research methods. Big data is often seen as some sort of “black box” but the process of building that box can be just as subjective as qualitative research.


This post has been cross-posted to the Oxford Internet Institute’s Elections and the Internet blog.