A System for High Performance Mining on GDELT Data

Published in IEEE Workshop on Parallel and Distributed Processing for Computational Social Systems, 2020

We design a system for efficient in-memory analysis of data from the GDELT database of news events. The specialization of the system allows us to avoid the inefficiencies of existing alternatives and to make full use of modern parallel high-performance computing hardware. We then present a series of experiments showcasing the system's ability to analyze correlations in the entire GDELT 2.0 database, which contains more than a billion news items. The results reveal large-scale trends in the world of today's online news.



@inproceedings{DBLP:conf/ipps/PogorelovSFL20,
  author       = {Konstantin Pogorelov and
                  Daniel Thilo Schroeder and
                  Petra Filkukova and
                  Johannes Langguth},
  title        = {A System for High Performance Mining on GDELT Data},
  booktitle    = {2020 IEEE International Parallel and Distributed Processing Symposium
                  Workshops, IPDPSW 2020, New Orleans, LA, USA, May 18-22, 2020},
  pages        = {1101--1111},
  publisher    = {IEEE},
  year         = {2020},
  url          = {https://doi.org/10.1109/IPDPSW50202.2020.00182},
  doi          = {10.1109/IPDPSW50202.2020.00182},
  timestamp    = {Thu, 14 Oct 2021 10:37:33 +0200},
  biburl       = {https://dblp.org/rec/conf/ipps/PogorelovSFL20.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}