Could social media forecast political movements?

GE2015 turned out to be a bad night for some. Beyond the obvious political parties, the reputation of polling firms took a big hit: while the exit poll got more or less in the ball park, none of the pre-election polls were anywhere near. This, combined with the advance of the SNP, UKIP and Greens, lent the whole election a real “earthquake” feel, with people like David Dimbleby questioning whether politicians would ever take polling seriously again.

Considering the weaknesses of conventional polling, could social media have filled a gap in terms of forecasting the earthquake that was to come? Were people on Twitter in advance of the opinion polls?

The data we produced last night paints a mixed picture. We were able to show that the Liberal Democrats were much weaker than the Tories and Labour on Twitter, whilst the SNP were much stronger; we also showed more Wikipedia interest for the Tories than Labour. Both findings chime with the overall results. But a simple summing of mention counts per constituency produces a highly inaccurate picture, to say the least (reproduced below), generally understating large parties and overstating small ones. And it’s certainly striking that the clearly greater levels of effort Labour were putting into Twitter did not translate into electoral success: a warning for campaigns which focus solely on the “online” element.
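To make the “simple summing” concrete, here is a minimal sketch of the naive rule: count mentions per party in each constituency and hand the seat to whichever party has the most. The constituency labels, party names and counts are made up for illustration, and the function names are my own; this is not the code behind our maps.

```python
from collections import Counter, defaultdict

# Hypothetical records, one per tweet mentioning a candidate:
# (constituency, party of the mentioned candidate). Not real data.
mentions = [
    ("A", "Labour"), ("A", "Labour"), ("A", "Conservative"),
    ("B", "Green"), ("B", "Green"), ("B", "Green"), ("B", "Conservative"),
    ("C", "Conservative"), ("C", "Conservative"), ("C", "Labour"),
]

def winners_by_mentions(records):
    """Return {constituency: party with the most mentions}.

    Ties go to whichever tied party was seen first, which is arbitrary.
    """
    counts = defaultdict(Counter)
    for constituency, party in records:
        counts[constituency][party] += 1
    return {c: tally.most_common(1)[0][0] for c, tally in counts.items()}

def seats_by_party(winners):
    """Count how many constituencies each party 'wins' under the naive rule."""
    return Counter(winners.values())

winners = winners_by_mentions(mentions)
seats = seats_by_party(winners)
```

A rule like this ignores how many people are behind the mentions and whether the mentions are supportive at all, which is one plausible reason it flatters small, vocal parties.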


In terms of prediction the problem here, of course, is that there are many potential statistics which could be produced by social media, and many potential metrics to predict (from vote shares, to swings, to turnouts etc.). Some of them are bound to be “right” after the fact. In response to this, Taha Yasseri and I have recently written a draft paper trying to produce social election predictions more systematically using Wikipedia data. The main premise is that we need a theory-informed model to drive social media predictions, one which is based on an understanding of how the data is generated and hence enables us to correct for certain biases.

How could we apply this reasoning to our Twitter data? Well, one of the suggestions we made last night was that, even though we were sure the Green Party wasn’t going to win the 46 constituencies shown on our Twitter map, perhaps these areas were nevertheless places where the Green vote was going to spike upwards disproportionately (they might, for instance, indicate a highly organised local party machine which would be capable of delivering extra votes). In order to check this, I took results data for the Green Party and UKIP from 50 constituencies in England and Wales (good data tables for the election results still haven’t been released, so I’m limited to the amount I could quickly collect by hand). The graph below plots the number of percentage points by which each party’s result increased against the number of Twitter mentions its candidates received in the run-up to the election in each constituency.

Percentage point vote increase vs Twitter Mentions

Overall, the graph shows little apparent correlation for UKIP candidates; Green Party candidates, by contrast, show a rough, though by no means perfect, positive correlation. In other words, for the Green Party the Twitter mentions have a little predictive power, whereas for UKIP they have none at all. What is more striking is that the points on the graph group clearly into two sections: UKIP increasing more than their mentions would suggest, whilst the reverse is true for the Greens. This highlights one of the major difficulties in making predictions from social media: voters of different parties make different uses of social media, and a predictive model would need to take these differences into account.
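For anyone wanting to repeat this check once full results are out, the comparison can be sketched as a Pearson correlation computed separately for each party. The rows below are invented placeholders, not the 50 constituencies I collected, and the function and variable names are assumptions for illustration only.

```python
from collections import defaultdict
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical rows: (party, Twitter mentions, percentage point vote increase).
rows = [
    ("Green", 120, 3.1), ("Green", 300, 4.0), ("Green", 800, 6.2),
    ("UKIP", 50, 9.5), ("UKIP", 200, 11.0), ("UKIP", 400, 8.7),
]

# Group by party first: pooling the parties would hide the fact that
# each party sits at a different level on the graph.
by_party = defaultdict(lambda: ([], []))
for party, mention_count, increase in rows:
    by_party[party][0].append(mention_count)
    by_party[party][1].append(increase)

r_by_party = {party: pearson(xs, ys) for party, (xs, ys) in by_party.items()}
```

Computing the coefficient per party rather than on the pooled data is the simplest way to respect the two distinct groups on the graph; a fuller model would presumably give each party its own baseline.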

Once the results are announced in full, over the next few weeks we will be looking into this in more detail, for all parties, and across a wider range of metrics.


Social Media + Elections: A Recap

From Jonathan Bright and Scott Hale’s blog post on Twitter Use.

In the run-up to the general election we conducted a number of investigations into relative candidate and party use of social media and other online platforms. The site has served as our hub for elections-related data analysis. There is much to look over, but this blog post can guide you through.


“What if mentions were votes?” by Jonathan Bright and Scott Hale

“Which parties are having the most impact on Twitter?” by Jonathan Bright and Scott Hale

“The (Local) General Election on Twitter” by Jonathan Bright

“Where do people mention candidates on Twitter?” by Jonathan Bright

Twitter + Wikipedia

“Online presence of the General Election Candidates: Labour Wins Twitter while Tories take Wikipedia” by Taha Yasseri

“Which parties were most read on Wikipedia?” by Jonathan Bright

“Does anyone read Wikipedia around election time?” by Taha Yasseri

Google Trends

“What does it mean to win a debate anyway?: Media Coverage of the Leaders’ Debates vs. Google Search Trends” by Eve Ahearn

Social Media Overall

“Could social media be used to forecast political movements?” by Jonathan Bright

“Social Media are not just for elections” by Helen Margetts

Does anyone read Wikipedia around the election time?

I have already written about the Wikipedia-Shapps story, so that is not the main topic of this post! But when that topic was still hot, some people asked me whether I think anyone ever actually reads the Wikipedia articles about politicians. Why should it matter at all what is written in those articles? This post tackles that question. How much do people turn to Wikipedia to read about politics, especially around election time?

Let’s again consider the Shapps case. Below, you can see the number of daily page views of the Wikipedia article about him.

Daily page views of the Wikipedia article about Grant Shapps.

As you can see, there are two HUGE peaks of around 7,000 and 14,500 views per day on top of a rather steady daily page view count of under 1,000. The first peak appeared when “he admitted that he had [a] second job as ‘millionaire web marketer’ while [he was] MP”, and the second one when the Wikipedia incident happened. What interests me is that while the first peak is related to a much more important event, the second peak, related to what I tend to call a minor event, is more than twice as large as the first one. OK, so this might be just the case of Shapps and mostly due to media effects surrounding the controversy. How about the other politicians, say the party leaders? See the diagrams below.

Daily page views of the Wikipedia articles about the party leaders.

A very large peak is evident in the curves for all the party leaders, reaching 22,000 views per day for Natalie Bennett, the leader of the Green party. Yes, that’s due to the ITV leaders’ debate on the 2nd of April. If you saw our previous post on search behaviour, you shouldn’t be surprised; what is surprising is the absence of a second peak around the BBC leaders’ debate on the 16th of April, especially when you see the diagrams from our other post on Google search volumes.

How about the parties? How many people read about them on Wikipedia? Check it out below.

Daily page views of the Wikipedia articles about the parties.

Here, there seems to be a second increase in the page views after the BBC debate on the 16th of April. Moreover, there is an ever-widening separation between the Tory-Labour-UKIP curves and the LibDem-Green-SNP curves. This is very interesting, as the Tories and Labour are the most established English parties, whereas UKIP is among the newest ones. That’s very much related to our project on understanding the patterns of online information seeking around election times.

Online presence of the General Election Candidates: Labour wins Twitter while Tories take Wikipedia

Some have called the forthcoming UK general election a Social Media Election. It might be a bit of an exaggeration, but there is no doubt that both candidates and voters are very active on social media these days and take them seriously. The Wikipedia-Shapps story of last week is a good example showing how important online presence is for candidates, journalists, and of course voters. We don’t know how important this presence is in terms of shaping the votes, but at least we can look into the data and gauge the presence of the candidates and the activity of the supporters. In this post and some others we present statistics on the online activity of parties, candidates, and of course voters. For an example, see the previous post on the searching behaviour of citizens around the debate times.

Who is on Twitter?

Candidates and parties are very much debated by supporters on social media, particularly Facebook and Twitter. But how active are candidates themselves on these platforms? In this post we show simply how many candidates from each party and in which constituencies have a Twitter account. Some of them might be more active than others and some might tweet very rarely, and we will analyse this activity in the next posts. Here we count only who has any kind of publicly known account.


Geographical distribution of candidates who have a Twitter account.

The figure above shows the geographical distribution of candidates for each party and whether they have a Twitter account. There are some interesting results in there. For example, Labour has the largest number of Twitter-active candidates, whereas ALL the SNP candidates tweet. While the LibDem and Green parties have the same number of accounts, once normalised by the number of constituencies they are standing in, the Greens seem to be more Twitter-enthusiastic. UKIP loses the Twitter game both in absolute number and in proportion.

Who is on Wikipedia?

Having a Twitter account is something of a personal decision. A candidate decides to have one and it’s totally up to them what to tweet. The difference in the case of Wikipedia is that ideally candidates would not create or edit an article about themselves. Also, the type of information that you can learn about a candidate on their Wikipedia page is very different to what you can gain by reading their tweets.

Geographical distribution of the candidates about whom Wikipedia has an article.

The figure above shows the constituencies in which the candidates standing are featured in the largest online encyclopaedia, Wikipedia. Here, the Tories are the absolute winners in terms of the number of articles. Green candidates are the least “famous” and the LibDems are well behind the big two. In the next post we will explore how often voters turn to Wikipedia to learn about the parties and candidates, and I’m sure that by reading that you’ll be convinced that being featured on Wikipedia is important!


All right, so far, Labour won Twitter presence and the Tories took Wikipedia (remember that all the SNP candidates also have a Twitter account). But how about the gender of the candidates? Is there any gender-related feature in the social presence pattern of the candidates?

First let’s have a look at the gender distribution of the candidates.

Geographical distribution of the candidates colour-coded by gender.

As you can see in the figure above, there are fewer female candidates than male ones across all the parties. Only 12% of the UKIP candidates are female, while the Greens have the highest proportion at 38%. The Tories sit right next to UKIP on the list of the most male-dominated parties. There is also a clear pattern that most of the constituencies in the centre have male candidates.

How about social media?

Among all the candidates, 20% of male candidates are featured in Wikipedia, whereas this is about 17% for female candidates. Almost half of the Tory male candidates are in Wikipedia, whereas this goes down to 28% for their female counterparts. Only Labour female candidates have more coverage in Wikipedia compared to the males of the party, but the difference is marginal. In all the other parties, males have a higher coverage rate. The tendency of Wikipedia to pay more attention to male figures is a very well-known fact.

Twitter is different. Slightly more female candidates (76%) have a Twitter account than male candidates (69%). Almost all (96%) of Labour females tweet, and Tory female candidates are more active than their male candidates. This pattern however is lost for the UKIP candidates, as 52% of their males are on Twitter compared to only 44% of their female candidates (who have the lowest rate among all the party-gender groups).
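The percentages above are just presence rates within each party-gender group. As a rough sketch of that calculation, assuming one record per candidate with a party, a gender and two yes/no flags (the records and field layout below are hypothetical, not the yournextmp schema):

```python
from collections import defaultdict

# Hypothetical candidate records: (party, gender, has_twitter, has_wikipedia).
candidates = [
    ("Party A", "female", True, False),
    ("Party A", "female", True, True),
    ("Party A", "male", True, False),
    ("Party A", "male", False, False),
    ("Party B", "female", False, False),
    ("Party B", "male", True, True),
]

def presence_rates(records):
    """Return {(party, gender): (twitter_rate, wikipedia_rate)}.

    Each rate is the share of candidates in the group that has an account
    or an article, so groups of different sizes can be compared directly.
    """
    totals = defaultdict(int)
    twitter = defaultdict(int)
    wikipedia = defaultdict(int)
    for party, gender, has_twitter, has_wikipedia in records:
        key = (party, gender)
        totals[key] += 1
        twitter[key] += has_twitter      # True counts as 1, False as 0
        wikipedia[key] += has_wikipedia
    return {
        key: (twitter[key] / n, wikipedia[key] / n)
        for key, n in totals.items()
    }

rates = presence_rates(candidates)
```

Dividing by the group size is the same normalisation that makes the Greens look more Twitter-enthusiastic than the LibDems despite an equal number of accounts.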


The data that we used to produce the maps and figures come mainly from a very interesting crowd-sourced project called yournextmp. However, we further validated the data using the Wikipedia and Twitter APIs. If you want to have a copy, just get in touch!

What does it mean to win a debate anyway?: Media Coverage of the Leaders’ Debates vs. Google Search Trends

Following the April 2nd Leaders’ Debate the media portrayed Nicola Sturgeon, the leader of the Scottish National Party, as the victor of the night, or at least a victor. “Cameron was robotic but Sturgeon impressed” ran a headline for one of The Guardian’s post-debate pieces. The headline for The Independent went even further, stating “’Can I vote for the SNP?’ voters ask after Nicola Sturgeon’s winning performance.”

Based on Google Trends data though, one can see that Sturgeon did not dominate in this arena. She only briefly topped the Google trends for a few days of the week ending on Saturday the 4th; Sturgeon did not top the trends on the day of the debate itself. In fact, it was Natalie Bennett who was the most searched for party leader, not only in absolute value but also in the relative increase from the weeks before.

Google trends is not, of course, equivalent to voting likelihood. A high rate of searches could simply mean that the politician had little name recognition prior to the debate. However it does indicate information seeking behavior. Political predictions have historically been based on polls of user behavior.  Information seeking, as in people searching out more info on a candidate or politician, would have been overly difficult to trace. Of course, the Internet changes that.

Now we can see that while Nicola Sturgeon did have an overall increase in Google searches after the April 2nd debate, along with every other party leader, her bump was far greater in Scotland, as compared to searches in England.



While the candidates themselves inspired interest (or some of them at least), almost no one, it seems, was prompted to search for information about the parties. The patterns relating to the debates are barely discernible in the search trends. The Tories and the Lib Dems had particularly low rates of searches, and they were the only parties without any visible peak in interest whatsoever during the debates. This is in line with the findings so far of the Social Election Prediction Project: voters are more likely to seek information online about minority parties or new parties than they are to look up information on the political parties in power. This would explain the peak in searches for the minority Green Party on April 2nd.


*Note that the y-axis for Google search trends is not an absolute number of searches, but rather the relative number of searches for that term as compared to other terms searched for simultaneously. The relativity and the timeframe are important in discussing these results. The graphs above all have the relative rate of search results averaged over each day. Following the April 2nd debate, Google provided search trend information to The Guardian that does not exactly match up with the above. According to that data, Leanne Wood was the most searched for politician “through the debate.” It’s possible that the difference with the above is that Google was measuring only searches that took place during the exact time of the debate, and was not including anyone prompted to seek new information afterwards. Apparently the most-Googled question during the debate though is a key one: “Who is winning the debate?”
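To see why a relative index hides absolute volume, here is a small sketch that rescales a series so that its peak within the chosen timeframe equals 100. This is a simplification of how a Trends-style index behaves, written under my own assumptions; it is not Google’s actual method, and the numbers are invented.

```python
def relative_index(values):
    """Rescale a series so its largest value in the timeframe becomes 100."""
    peak = max(values)
    if peak <= 0:
        return [0 for _ in values]
    return [round(100 * v / peak) for v in values]

# Two hypothetical daily series with the same shape but volumes that differ
# tenfold: after rescaling they become indistinguishable.
small_party = [1, 4, 2]
large_party = [10, 40, 20]

small_index = relative_index(small_party)
large_index = relative_index(large_party)
```

Because the peak depends on the window, changing the timeframe changes every value in the index, which is one reason figures taken over different periods need not agree.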


This post has been cross-posted to the Oxford Internet Institute’s Elections and the Internet blog.

Coverage of European parties in European language Wikipedia editions

Reading niche political party Wikipedia pages, as one does when working on the Social Election Prediction project, one might wonder if there are any trends in which languages have articles about political parties of different countries. I did. Most major political parties in Europe have Wikipedia pages in dozens of languages. This makes sense: they are important globally. But the same is not true for minority parties or party leaders. What does it mean that there are articles about this center-left Hungarian political alliance only in Czech, German, French, Flemish and Polish, in addition to Hungarian and English? Does the page of this ChristianUnion Dutch politician have coverage in Indonesian (in addition to German and English) because of the Netherlands’ long history with Indonesia?

We downloaded the data to find out.

We downloaded the data for European countries with a single national language (or one overwhelmingly dominant language), so there would be something of a one-to-one relationship between language and country. We then grouped the countries into communities based on the number of links between the political party Wikipedia pages, so as to minimize the links between communities and maximize the links within them.
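One common way to score such a grouping is modularity, which compares the share of link weight that falls inside communities with what would be expected if links were placed at random. The sketch below is an illustration under my own assumptions (an undirected weighted graph with no self-links, made-up nodes and weights); it is not necessarily the exact procedure we used.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity Q of a partition of an undirected weighted graph.

    edges: {(u, v): weight}, each undirected link listed once, no self-links.
    community: {node: community label}.
    Q = sum over communities of (internal weight / m) - (total degree / 2m)^2,
    where m is the total link weight.
    """
    m = sum(edges.values())
    degree = defaultdict(float)
    internal = defaultdict(float)
    for (u, v), w in edges.items():
        degree[u] += w
        degree[v] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    community_degree = defaultdict(float)
    for node, d in degree.items():
        community_degree[community[node]] += d
    return sum(
        internal[c] / m - (community_degree[c] / (2 * m)) ** 2
        for c in community_degree
    )

# Hypothetical country-languages A..F: two tightly linked triangles
# joined by a single link between C and D.
edges = {
    ("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 1,
    ("D", "E"): 1, ("E", "F"): 1, ("D", "F"): 1,
    ("C", "D"): 1,
}
two_groups = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
one_group = {node: 0 for node in "ABCDEF"}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

The two-triangle partition scores higher than lumping everything together, which is the sense in which a good grouping keeps links inside communities; a community detection algorithm searches for the partition with the best such score.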

Would countries cluster by historic ties? By geographical proximity? By political sympathies? Or would they just cluster completely randomly?

The first two observations that came from the graph were:

1. Political Wikipedia is influenced by geography

Just look at the clusters grouped together by color – these are “communities” of languages, or countries that are closely interlinked.

Clusters of European country-languages based on the coverage of their political parties in Wikipedia editions of other languages.

2. Everyone is reading about Greece

All of the news about Syriza, the 2015 Greek election and the possibility of a Grexit has apparently made fellow Europeans very interested in reading about Greek political parties. Greek political party pages have one of the highest rates of coverage among European parties.

The position of Greek parties is very special with a high rate of coverage in most of the other European languages.

Stay tuned. More observations from this dataset to come.


This post has been cross-posted to the Oxford Internet Institute’s Elections and the Internet blog.

Wikipedia and Shapps: Sockpuppetry, Conflict of Interest, or None?

Taha Yasseri

Will the real Grant Shapps please stand up? ViciousCritic/Totally Socks, (CC BY-NC-SA)

You must have heard about the Guardian’s recent accusation against Grant Shapps. Basically, the Guardian claims that Shapps has been editing his own Wikipedia page and that “Wikipedia has blocked a user account on suspicions that it is being used by the Conservative party chairman, Grant Shapps, or someone acting on his behalf”.

In a short piece that I wrote for The Conversation I try to explain how these things work in Wikipedia, what they mean, and basically how unreliable these accusations are.

There are two issues here:

First, conflict of interest, for which Wikipedia guidelines suggest that “You should not create or edit articles about yourself, your family, or friends.” But basically it’s more a piece of moral advice, because it’s technically impossible to know the real identity of editors unless they deliberately disclose their personal information.

The second point is that the account under discussion was banned by a Wikipedia admin not because of conflict of interest (which is in any case not a reason to ban a user), but because of sockpuppetry: “The use of multiple Wikipedia user accounts for an improper purpose is called sock puppetry”. BUT, sockpuppetry is not generally a good cause for banning a user either. It’s prohibited only when used to mislead the editorial community or violate another regulation.

Sock puppets are detected by a certain type of editor who has very limited access to confidential data of users such as their IP addresses, their computer and operating system settings and their browser. This type of editor is called a CheckUser, and I used to serve as a CheckUser on Wikipedia for several years.

In this case the accounts that are “detected” as sock puppets have not been active simultaneously: there is a gap of about 3 years between their active periods. This not only makes it very hard to claim that any rule or regulation has been violated but, given such a long gap, also makes it technically impossible for the CheckUser to observe any relation between the accounts under discussion.

Actually, the admin who did the banning admits that his action was based mostly on behavioural similarity (similarity between the edits performed by the two users and their shared political interests).

Altogether, I believe the banning has no reliable grounds and it’s based on pure speculation, and also the Guardian accusations are way beyond what you can logically infer from the facts and evidence.


This post has been cross-posted to the Oxford Internet Institute’s Elections and the Internet blog.