Recent electoral events (the Brexit referendum; Trump’s victory) have highlighted a phenomenon that is increasingly relevant in recent elections: the presence of an information gap in the first hours after polls close. As first results start to flow in from specific local areas, uncertainty emerges. This is especially true when results come from areas with a distinctive political tradition, so that the partial count is not representative of what the final result might be. This often produces deep uncertainty, with relevant effects not only on public opinion and politicians’ reactions, but also (and perhaps more relevantly) in terms of turbulence on financial markets. Only as the results of the official count start to stabilize (usually, many hours later) does uncertainty decrease.
With the Italian constitutional referendum approaching (and with some turbulence on financial markets already emerging in reaction to uncertainty over the referendum results), we at CISE wondered whether we could help reduce this information gap in the few hours between the inflow of the first results and the stabilization of the final count.
As a result, we decided to set up a “nowcasting” experiment: forecasts of the final result, based on the inflow of local results, that will be available (and updated) in real time on the CISE website during the referendum night (likely 30-45 minutes after polls close, although the ease of the counting process for a referendum might allow an earlier inflow of first results). It is an experimental procedure that we have not tested on actual election results; its main aims are to collect data for future applications and to reveal problems and challenges.
Our experiment is based on simple assumptions, and the structure of the algorithm is relatively simple (although its development and implementation presented several challenges).
The fundamental intuition behind our approach is that poll data, collected before the election, allow us to formulate predictions even at the local level. This is not done using the actual poll results (which are only relatively reliable, given all the biases that affect polls, and only at the national level), but rather by resorting to a more complex piece of information: the flow matrix relating vote choice in the last elections (both the 2013 general election and the 2014 European election) to vote intention in the referendum. In other words, we use the last CISE poll (conducted a few weeks before the election) to estimate the referendum choices of voters who had voted for different parties in past elections. By applying this flow matrix to previous electoral results at the local level, we are able to formulate a referendum vote expectation for each local area. This of course rests on two assumptions: 1) that past vote choice is a good predictor of referendum choice; 2) that the estimation of the flow matrix is not affected by dramatic biases. The first assumption appears reasonable for this particular referendum, which was heavily politicized around the figure of Matteo Renzi; the second appears plausible, as bivariate relationships (such as a flow matrix) are less affected by sample bias than the headline referendum estimate.
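To make the idea concrete, here is a minimal sketch in Python of how a flow matrix could be applied to past local results. It is an illustration only, not the code used for the experiment: the party labels, the transition rates and the function name predict_local are all hypothetical, and abstention is ignored for simplicity.

```python
# Hypothetical flow matrix: for each past-vote group, the share expected
# to vote YES or NO in the referendum. All numbers are invented for illustration.
FLOW = {
    "party_a": {"yes": 0.80, "no": 0.20},
    "party_b": {"yes": 0.10, "no": 0.90},
    "party_c": {"yes": 0.30, "no": 0.70},
}


def predict_local(past_votes, flow=FLOW):
    """Return expected YES/NO shares for one local area.

    past_votes maps each party to the votes it received in that area
    in a previous election.
    """
    expected = {"yes": 0.0, "no": 0.0}
    for party, votes in past_votes.items():
        for choice, rate in flow[party].items():
            expected[choice] += votes * rate
    total = expected["yes"] + expected["no"]
    return {choice: value / total for choice, value in expected.items()}


# One hypothetical area with its past results by party.
area = {"party_a": 5000, "party_b": 3000, "party_c": 2000}
prediction = predict_local(area)
print(prediction)
```

In this invented example the area is expected to lean slightly towards NO, simply because its past electorate is dominated by groups that the (hypothetical) poll places closer to NO.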
Once all these local-level pre-referendum predictions are made (which, by construction, predict a 54% victory for NO at the national level, mirroring the results of the poll conducted a few weeks before the election), the procedure is ready to accept the first results that flow in from local areas. As the first local areas report results, each result is compared with the pre-electoral prediction, thus producing a correction vector that identifies how far the poll-based local prediction was off, compared to the actual results. Correction vectors from local areas are then averaged together, and the final averaged correction is applied to the pre-electoral predictions for all local areas where actual results are not yet available. As a result, a national estimate is obtained.
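The correction step can be sketched along the same lines. The description above does not specify the exact form of the correction, so this sketch makes several assumptions of its own: the correction is additive on the NO share, errors from reporting areas are averaged without weights, and the national figure weights each area by its expected number of valid votes. The function name nowcast and all numbers are hypothetical.

```python
def nowcast(predicted, reported, voters):
    """Estimate the national NO share from partial local results.

    predicted: area -> pre-referendum predicted NO share
    reported:  area -> actual NO share (only areas that have reported)
    voters:    area -> expected valid votes, used as weights (an assumption)
    """
    if reported:
        # Average prediction error over the areas that have reported so far.
        errors = [reported[a] - predicted[a] for a in reported]
        correction = sum(errors) / len(errors)
    else:
        correction = 0.0

    no_votes = 0.0
    for area, pred in predicted.items():
        if area in reported:
            share = reported[area]  # actual result available
        else:
            # Corrected prediction, clamped to a valid share.
            share = min(1.0, max(0.0, pred + correction))
        no_votes += share * voters[area]
    return no_votes / sum(voters.values())


# Hypothetical data: four areas, two of which have reported.
predicted = {"A": 0.50, "B": 0.60, "C": 0.55, "D": 0.45}
voters = {"A": 1000, "B": 1000, "C": 2000, "D": 1000}
reported = {"A": 0.54, "B": 0.62}
estimate = nowcast(predicted, reported, voters)
print(round(estimate, 4))
```

With these invented numbers, NO outperforms the prediction by an average of three points in the two reporting areas, so the predictions for the two missing areas are shifted up by the same amount before the national share is computed.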
In a way, the above procedure is simply a systematized application of the elementary reasoning that both politicians and observers perform when looking at first results, which might be exemplified by a typical sentence: “if they lost in one of their strongest constituencies, this doesn’t sound good for the national result”.
Obviously, such a procedure is inevitably vulnerable to the specificity of local phenomena. Especially when the number of reporting local areas is still relatively small, there is a strong risk of generalizing to the whole country a prediction error that might be the result of specific local dynamics. If, for example, the result in one of the first cities was produced by a very aggressive and successful local referendum campaign, this might be incorrectly projected to the national level. Assessing this is precisely one of the goals of the experiment: to determine the extent to which such local biases can be dealt with, or whether they are so strong that the prediction model performs even worse than the partial official count. The experiment will also allow the collection of a large amount of data about the geographical pattern in the inflow of actual results. All this information will allow us to improve the model for future elections.
The live real-time estimates produced by the forecasting model will be available on the CISE website on Sunday, Dec 4, shortly after polls close (11 PM Italian time).