What if the Forecaster Knew: Assessing Forecast Reliability Via Simulation

M. Inacio, R. Izbicki, D. Lopes, M. A. Diniz, L. E. Salasar, J. C. P. Ferreira


The FIFA Men’s World Cup (FWC) is the most important football (soccer) competition and attracts worldwide attention. A popular practice among football fans in Brazil is to organize contests in which each participant guesses the final score of each match. The participants are then ranked according to some scoring rule. Inspired by these contests, we created a website to host an online contest in which participants were asked for their probabilities on the outcomes of upcoming FWC matches. After each round of the tournament, we published the ranking of all participants based on a proper scoring rule. In this article we use simulations to estimate the ability of the best forecasters in our contest, taking into account that their good performance could be due to chance. We also study the performance of several methods for aggregating individual forecasts, to investigate whether a wisdom of crowds (WOC) phenomenon occurred in the contest.
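As a rough illustration of the ideas in the abstract, the following minimal Python sketch scores probabilistic forecasts with the Brier score (one well-known proper scoring rule; the abstract does not state which rule the contest used), averages individual forecasts as a simple aggregation method, and simulates a contest among equally skilled forecasters to show how far the top-ranked score can drift from the typical score by chance alone. All function names, the noise model, and the parameter values are assumptions made for this example and are not taken from the article.

```python
import math
import random


def brier_score(forecast, outcome):
    """Brier score of one forecast over mutually exclusive outcomes; lower is better."""
    return sum((p - (1.0 if k == outcome else 0.0)) ** 2
               for k, p in enumerate(forecast))


def mean_brier(forecasts, outcomes):
    """Average Brier score over a sequence of matches."""
    return sum(brier_score(f, o) for f, o in zip(forecasts, outcomes)) / len(outcomes)


def average_forecast(forecasts):
    """Aggregate several forecasts for one match by a plain arithmetic mean."""
    n = len(forecasts)
    return [sum(f[k] for f in forecasts) / n for k in range(len(forecasts[0]))]


def noisy_forecast(truth, noise, rng):
    """Hypothetical forecaster model: true probabilities with multiplicative noise."""
    weights = [p * math.exp(noise * rng.gauss(0.0, 1.0)) for p in truth]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_contest(n_forecasters, n_matches, truth, noise, seed=0):
    """Return (best, average) mean Brier score among equally skilled forecasters."""
    rng = random.Random(seed)
    # Every match shares the same assumed true probabilities (win, draw, loss).
    outcomes = rng.choices(range(len(truth)), weights=truth, k=n_matches)
    scores = []
    for _ in range(n_forecasters):
        forecasts = [noisy_forecast(truth, noise, rng) for _ in range(n_matches)]
        scores.append(mean_brier(forecasts, outcomes))
    return min(scores), sum(scores) / len(scores)


if __name__ == "__main__":
    best, typical = simulate_contest(
        n_forecasters=200, n_matches=64, truth=[0.45, 0.25, 0.30], noise=0.5
    )
    print(f"best score: {best:.3f}, average score: {typical:.3f}")
```

Because every simulated forecaster here has the same underlying skill, any gap between the best and the average score comes purely from randomness, which is the kind of effect a simulation-based assessment has to account for before crediting the top of a ranking with superior ability.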


Forecast; Risk; Soccer Contest





DOI: https://doi.org/10.5540/tcam.2022.023.01.00175


Trends in Computational and Applied Mathematics

A publication of the Brazilian Society of Applied and Computational Mathematics (SBMAC)

