A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.





Local authentication and authorization system for immediate setup of cloud environments

Published in International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2012

Discusses the difficulties in setting up complex cloud solutions, proposing a concept to simplify the process through the mapping of cloud resources to the permission management of a Unix-like operating system and a separation of cloud middleware and operating system interactions.

Visually programming dataflows for distributed data analytics

Published in IEEE International Conference on Big Data (Big Data), 2016

The paper discusses the use of visual programming in the development of parallel dataflow programs for distributed dataflow systems such as Flink. A prototypical visual programming environment called Flision was built and evaluated through qualitative user testing, indicating that visual programming can be a valuable tool for users of scalable data analysis tools.

Fact: a framework for analysis and capture of twitter graphs

Published in 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), 2019

In recent years, online social networks have become crucial for news and political debates, but the spread of fake news and harmful misinformation is a growing concern. Understanding social networks is essential to comprehend the phenomenon of fake news, particularly the connectivity between participants, which reveals communication patterns that impact the spread of ideas. Twitter, due to its public nature, offers a research opportunity without privacy concerns, but gathering sufficient data poses a challenge. This paper presents a scalable framework for collecting follower networks, posts, and profiles to enable high-performance social network analysis.

Graph-Based Feature Selection Filter Utilizing Maximal Cliques

Published in Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), 2019

The paper proposes a novel graph-based feature selection filter to address the challenge of reducing feature vectors for binary decision problems. The approach considers both feature importance and correlation, and evaluation shows that it outperforms existing baseline feature selection approaches in approximately 69% of cases, delivering the highest accuracy while reducing the number of features.

A Scalable System for Bundling Online Social Network Mining Research

Published in 2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS), 2019

Online social networks like Facebook and Twitter have become integral to people’s daily lives, serving as platforms for interaction, information acquisition, and knowledge gain. However, accessing and extracting data from these networks is often constrained by complex APIs and data scraping limitations, which leads to restricted research access and shared quotas. This paper presents a proxy server that enables cooperative management of researchers’ data contingents, so that multiple API-accessing programs within the same research group can be integrated seamlessly without incurring performance penalties or implementation overhead.

A System for High Performance Mining on GDELT Data

Published in IEEE Workshop on Parallel and Distributed Processing for Computational Social Systems, 2020

The paper introduces an efficient system for in-memory analysis of data from the GDELT database, utilizing modern parallel high-performance computing hardware. Our experiments with the GDELT 2.0 database, containing over a billion news items, revealed significant trends in online news.

Resource efficient algorithms for message sampling in online social networks

Published in 2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS), 2020

This paper proposes an optimized data structure and user selection strategy for efficiently sampling contentful messages on social networks, using features of the Twitter API. The combination of the data structure and the algorithm results in a 92% sampling efficiency over long timeframes.

FakeNews Corona Virus and 5G Conspiracy Task at MediaEval 2020

Published in MediaEval, 2020

This paper introduces the FakeNews Corona Virus and 5G Conspiracy task at MediaEval 2020, which aims to classify tweet texts and retweet cascades for detecting fast-spreading misinformation using natural language processing and graph analysis.

A Framework for Interaction-based Propagation Analysis in Online Social Networks

Published in Complex Networks and their applications, 2021

The paper presents a method to calculate an acquaintance score between Twitter users for analyzing information propagation. The proposed method considers response time and enables time-based spreading comparisons. This approach addresses the need for obtaining weighted edges from communication on social networks and enables the detection of unusual communication patterns.

Don’t Trust Your Eyes: Image Manipulation in the Age of DeepFakes

Published in Frontiers in Communication, 2021

We review the phenomenon of deepfakes, a novel technology enabling inexpensive manipulation of video material through the use of artificial intelligence, in the context of today’s wider discussion on fake news. We discuss the foundations and recent developments of the technology as well as its differences from earlier manipulation techniques, and investigate technical countermeasures. While the threat of deepfake videos with substantial political impact has been widely discussed in recent years, so far, the political impact of the technology has been limited. We investigate reasons for this and extrapolate the types of deepfake videos we are likely to see in the future.

iPUG: Accelerating Breadth-First Graph Traversals Using Manycore Graphcore IPUs

Published in ISC High Performance 2021, 2021

The Graphcore Intelligence Processing Unit (IPU) is a newly developed processor type whose architecture does not rely on the traditional caching hierarchies. Developed to meet the need for more and more data-centric applications, such as machine learning, IPUs combine a dedicated portion of SRAM with each of their numerous cores, resulting in high memory bandwidth at the price of capacity. The proximity of processor cores and memory makes the IPU a promising field of experimentation for graph algorithms since it is the unpredictable, irregular memory accesses that lead to performance losses in traditional processors with pre-caching.

WICO Graph: A Labeled Dataset of Twitter Subgraphs based on Conspiracy Theory and 5G-Corona Misinformation Tweets

Published in International Conference on Agents and Artificial Intelligence (ICAART) 2021, 2021

In the wake of the COVID-19 pandemic, a surge of misinformation has flooded social media and other internet channels, and some of it has the potential to cause real-world harm. To counteract this misinformation, reliably identifying it is a principal problem to be solved. However, the identification of misinformation poses a formidable challenge for language processing systems since the texts containing misinformation are short, work with insinuation rather than explicitly stating a false claim, or resemble other postings that deal with the same topic ironically. Accordingly, for the development of better detection systems, it is essential not only to use hand-labeled ground truth data but also to extend the analysis with methods beyond Natural Language Processing to consider the characteristics of the participants’ relationships and the diffusion of misinformation. This paper presents a novel dataset that deals with a specific piece of misinformation: the idea that the 5G wireless network is causally connected to the COVID-19 pandemic. We have extracted the subgraphs of 3,000 manually classified Tweets from Twitter’s follower network and divided them into three categories: subgraphs of Tweets that propagate the specific 5G misinformation, those that spread other conspiracy theories, and Tweets that do neither. We created the WICO (Wireless Networks and Coronavirus Conspiracy) dataset to support experts in machine learning, graph processing, and related fields in studying the spread of misinformation. Furthermore, we provide a series of baseline experiments using both Graph Neural Networks and other established classifiers that use simple graph metrics as features.

WICO Text: A Labeled Dataset of Conspiracy Theory and 5G-Corona Misinformation Tweets

Published in Proceedings of the 2021 Workshop on Open Challenges in Online Social Networks, 2021

The COVID-19 pandemic has been accompanied by a flood of misinformation on social media, which has been labeled an “infodemic”. While a large part of such fake news is ultimately inconsequential, some of it has the potential to cause real-world harm, but due to the massive amount of social media content, it is impossible to find this misinformation manually. Thus, conventional fact-checking can typically only counteract misinformation narratives after they have gained significant traction. Only automated systems can provide warnings in advance. However, the automatic detection of misinformation narratives is very challenging since the texts that spread misinformation may be short messages on Twitter. They may also transmit misinformation by implication rather than by stating counterfactual information outright, and satirical messages complicate the issue further. Thus, there is a need for highly sophisticated detection systems. In order to support their development, we created substantial ground truth data by human annotation. In this paper, we present a dataset that deals with a specific piece of misinformation: the idea that the COVID-19 pandemic is causally connected to the 5G wireless network. We selected more than 10,000 tweets that deal with COVID-19 and 5G and labeled them manually, distinguishing between tweets that propagate the specific 5G misinformation, those that spread other conspiracy theories, and tweets that do neither. We provide the human-annotated dataset along with an additional large-scale, automatically labelled dataset (using the human-annotated dataset as the training set) consisting of more than 100,000 tweets.

FakeNews: Corona Virus and Conspiracies Multimedia Analysis Task at MediaEval 2021

Published in MediaEval 2021 Workshop, 2021

The FakeNews: Corona Virus and Conspiracies Multimedia Analysis task, running for the second time as part of MediaEval 2021, focuses on the classification of tweet texts with the aim of detecting fast-spreading misinformation. This year’s task extends the number of target conspiracy theories and introduces new challenges in terms of the analysis complexity of the imbalanced dataset. This paper describes the task, including use case and motivation, challenges, the dataset with ground truth, the required participant runs, and the evaluation metrics.

Explaining news spreading phenomena in social networks

Published in Technische Universitaet Berlin (Germany), 2022

When a high-ranking British politician was falsely accused of child abuse by the BBC in November 2012, a wave of short messages followed on the online social network Twitter, leading to considerable damage to his reputation. The politician’s image suffered considerably, and he was subsequently able to sue the BBC for £185,000 in damages. In the relatively new medium of the internet, and specifically in online social networks, digital wildfires, i.e., fast-spreading, counterfactual, or even intentionally misleading information, occur on a regular basis and lead to severe repercussions. Although the example of the British politician is a simple digital wildfire that only damaged the reputation of a single person, there are more complex digital wildfires whose consequences are more far-reaching. This thesis deals with the capture, automatic processing, and investigation of a complex digital wildfire, namely, the Corona and 5G misinformation event: the idea that the COVID-19 outbreak is somehow connected to the introduction of the 5G wireless technology. In this context, we present a system whose application allows us to acquire large amounts of data from the online social network Twitter and thus create the database from which we extract the digital wildfire in its entirety. Furthermore, we present a framework that provides the playing field for investigating the spread of digital wildfires. The main findings that emerge from the study of the 5G and corona misinformation event can be summarised as follows. Although published work suggests that a purely structure-based analysis of the information spread allows for early detection, there is no way of predictively labelling spreading information as probably leading to a digital wildfire. Digital wildfires do not emerge out of nowhere but find their origin in a multitude of already existing ideas and narratives that are reinterpreted and recomposed in the light of a new situation.
It does not matter if ideas and explanations contradict each other. On the contrary, it seems that it is the existence of contradictory explanations that unites supporters from different camps to support a new idea. Finally, it has been shown that the spread of a digital wildfire is not the result of an information cascade in the sense of single, particularly influential short messages within a single medium. Rather, a multitude of small cascades with partly contradictory statements are responsible for the rapid spread. The dissemination media are diverse, and even more so, it is precisely the mix of different media that makes a digital wildfire possible.

The connectivity network underlying the German’s Twittersphere: a testbed for investigating information spreading phenomena

Published in Multimedia Benchmark Workshop 2022, 2022

Online social networks are ubiquitous, have billions of users, and produce large amounts of data. While platforms like Reddit are based on a forum-like organization where users gather around topics, Facebook and Twitter implement a concept in which individuals represent the primary entity of interest. This makes them natural testbeds for exploring individual behavior in large social networks. Underlying these individual-based platforms is a network whose “friend” or “follower” edges are of binary nature only and therefore do not necessarily reflect the level of acquaintance between pairs of users. In this paper, we present the network of acquaintance “strengths” underlying the German Twittersphere. To that end, we make use of the full non-verbal information contained in tweet–retweet actions to uncover the graph of social acquaintances among users, beyond pure binary edges. The social connectivity between pairs of users is weighted by keeping track of the frequency of shared content and the time elapsed between publication and sharing. Moreover, we also present a preliminary topological analysis of the German Twitter network. Finally, making available the data describing the weighted German Twitter network of acquaintances, we discuss how to apply this framework as a ground basis for investigating spreading phenomena of particular contents.
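The weighting idea in this abstract (more frequent and faster retweets imply a stronger acquaintance) can be illustrated with a minimal sketch. The abstract does not state the actual formula, so the exponential decay, the time constant `tau`, and the function name `acquaintance_weights` are assumptions made purely for illustration, not the paper's method.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (author, retweeter, seconds elapsed between publication and retweet)
Retweet = Tuple[str, str, float]


def acquaintance_weights(
    retweets: Iterable[Retweet], tau: float = 3600.0
) -> Dict[Tuple[str, str], float]:
    """Accumulate a weight per (author, retweeter) pair.

    ASSUMPTION: each retweet contributes exp(-delay / tau), so frequent
    and quick retweets yield a larger weight. The decay function and
    tau are illustrative choices, not taken from the paper.
    """
    weights: Dict[Tuple[str, str], float] = defaultdict(float)
    for author, retweeter, delay in retweets:
        weights[(author, retweeter)] += math.exp(-delay / tau)
    return dict(weights)


# Hypothetical sample data: "b" retweets "a" twice and quickly,
# "c" retweets "a" once and late.
sample = [("a", "b", 0.0), ("a", "b", 3600.0), ("a", "c", 7200.0)]
edge_weights = acquaintance_weights(sample)
```

Under these assumptions the pair ("a", "b") ends up with a larger weight than ("a", "c"), which matches the intuition of frequency and response time described above.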

Implementing Spatio-Temporal Graph Convolutional Networks on Graphcore IPUs

Published in 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2022

Artificial neural networks have been used for a multitude of regression tasks, and their descendants have expanded the domain to many applications such as image and speech recognition, filtering of social networks, and machine translation. While conventional and recurrent neural networks work well on data represented in Euclidean space, they struggle with data in non-Euclidean space. Graph Neural Networks (GNN) expand recurrent neural networks to directly process sparse representations of graphs, but they are computationally expensive, which invites the use of powerful hardware accelerators. In this paper, we investigate the viability of the Graphcore Intelligence Processing Unit (IPU) for efficient implementation of Spatio-Temporal Graph Convolutional Networks. The results show that IPUs are well suited for this task.

A Streaming System for Large-scale Temporal Graph Mining of Reddit Data

Published in 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2022

The study of online social networks has become a major topic of research within the last decade, and many aspects of the networks and the behavior of their users have been investigated. The majority of research efforts has been directed at Twitter, which grants limited data access to researchers and provides detailed information. However, recently, important social and economic phenomena such as WallStreetBets or Antiwork have originated on Reddit, which has thus become an important field of investigation in its own right, and, due to its open nature, all Reddit data is available to study. As a consequence, in contrast to Twitter, where it is difficult to obtain large amounts of data, the main challenge of researching Reddit is to handle the vast amounts of data that are freely available. Here, we present the Reddit Dataset Stream Pipeline (RDSP), a simple and efficient parallel system based on Akka Streams that is capable of processing the entire Reddit dataset. We demonstrate how to build massive temporal graphs between subreddits from a parallel streamed dataset. We investigate the generated graphs and present experimental results. Moreover, we publish both the datasets as well as the codebase in order to invite researchers from different fields to contribute and profit from this work.

Efficient Minimum Weight Vertex Cover Heuristics Using Graph Neural Networks

Published in 20th International Symposium on Experimental Algorithms (SEA 2022), 2022

Minimum weighted vertex cover is the NP-hard graph problem of choosing a subset of vertices incident to all edges such that the sum of the weights of the chosen vertices is minimum. Previous efforts for solving this in practice have typically been based on search-based iterative heuristics or exact algorithms that rely on reduction rules and branching techniques. Although exact methods have shown success in solving instances with up to millions of vertices efficiently, they are limited in practice due to the NP-hardness of the problem. We present a new hybrid method that combines elements from exact methods, iterative search, and graph neural networks (GNNs). More specifically, we first compute a greedy solution using reduction rules whenever possible. If no such rule applies, we consult a GNN model that selects a vertex that is likely to be in or out of the solution, potentially opening up for further reductions. Finally, we use an improved local search strategy to enhance the solution further. Extensive experiments on graphs of up to a billion edges show that the proposed GNN-based approach finds better solutions than existing heuristics. Compared to exact solvers, the method produced solutions that are, on average, 0.04% away from the optimum while taking less time than all state-of-the-art alternatives.
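To make the problem statement concrete, here is a minimal sketch of what a weighted vertex cover is, together with a simple greedy baseline. This is not the hybrid method described in the abstract: it uses no reduction rules, no GNN, and no local search. The function names and the example graph are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def is_vertex_cover(edges: List[Edge], cover: Set[int]) -> bool:
    """True if every edge has at least one endpoint in `cover`."""
    return all(u in cover or v in cover for u, v in edges)


def cover_weight(weights: Dict[int, float], cover: Set[int]) -> float:
    """Sum of the weights of the chosen vertices (the value to minimize)."""
    return sum(weights[v] for v in cover)


def greedy_cover(edges: List[Edge], weights: Dict[int, float]) -> Set[int]:
    """Baseline heuristic (ASSUMPTION, not the paper's algorithm):
    repeatedly pick the vertex with the lowest weight per uncovered
    incident edge until all edges are covered."""
    uncovered = {tuple(sorted(e)) for e in edges}
    cover: Set[int] = set()
    while uncovered:
        degree: Dict[int, int] = {}
        for u, v in uncovered:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        # Ties are broken by vertex id to keep the result deterministic.
        best = min(degree, key=lambda x: (weights[x] / degree[x], x))
        cover.add(best)
        uncovered = {e for e in uncovered if best not in e}
    return cover


# Hypothetical star graph: center 0 (weight 3), leaves 1..3 (weight 2 each).
star_edges = [(0, 1), (0, 2), (0, 3)]
star_weights = {0: 3.0, 1: 2.0, 2: 2.0, 3: 2.0}
star_cover = greedy_cover(star_edges, star_weights)
```

On this star graph the greedy choice is the center vertex, which covers all three edges at weight 3, whereas covering with the leaves would cost 6. A greedy baseline like this gives no optimality guarantee in general, which is the gap that more elaborate heuristics aim to close.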

Combining Tweets and Connections Graph for FakeNews Detection at MediaEval 2022

Published in Multimedia Benchmark Workshop 2022, 2022

The FakeNews Detection task at MediaEval 2022, running for the third time as part of the challenge, focuses on the detection of misinformation tweets and their spreaders. As in the 2021 task, conspiracy theories related to COVID-19 in nine different categories have to be detected, along with the authors’ stance towards them. For the 2022 challenge, the size of the dataset has approximately doubled. Furthermore, we also provide a large interaction graph along with vertex features derived from the same Twitter dataset in which misinformation spreaders should be classified. As a final subtask, participants are asked to combine text and graph information to refine their classifications. This paper describes the tasks, including use case and motivation, challenges, the dataset with ground truth, the required participant runs, and the evaluation metrics.

Graph Neural Network for Fake News Detection and Classification of Unlabelled Nodes at MediaEval 2022

Published in Multimedia Benchmark Workshop 2022, 2022

In this paper, we describe our approach to fake news detection for the MediaEval 2022 challenge, which has run for the third time. As in the previous editions, the goal of the challenge is the detection of misinformation tweets, but in this edition, both text and graph data are provided. We focus on the classification of unlabelled nodes (users) in the graph, utilizing graph neural networks to classify each as either a fake news spreader or an ordinary node, i.e., a non-spreader. Beyond those labels, the classification also applies to unlabelled nodes across nine different categories of conspiracy theories related to COVID-19. Furthermore, graph-based node classification is carried out for whole categories, since this leads to a more comprehensive classification analysis than merely labelling nodes as spreaders or non-spreaders of fake news.

Understanding the Evolution of Reddit in Temporal Networks induced by User Activity

Published in Complex Networks and their Applications 2022, 2022

Online social networks are ubiquitous and have become an essential part of our daily lives. They not only mirror society but act as petri dishes for discourse, sometimes with fatal consequences. Understanding the dynamics underlying such platforms promises new models for predicting, e.g., the consequences of misinformation. Compared to individual-based platforms like Twitter or Facebook, Reddit is inherently topic-based. The Reddit universe consists of subreddits that represent sets of individuals gathering around certain themes. This paper aims to understand the dynamics of evolving fields of common interest by analyzing temporal networks induced by Reddit user activity using community detection.

COVID-19 and 5G conspiracy theories: long term observation of a digital wildfire

Published in International Journal of Data Science and Analytics, 2023

The COVID-19 pandemic has severely affected the lives of people worldwide, and consequently, it has dominated world news since March 2020. Thus, it is no surprise that it has also been the topic of a massive amount of misinformation, which was most likely amplified by the fact that many details about the virus were not known at the start of the pandemic. While a large amount of this misinformation was harmless, some narratives spread quickly and had a dramatic real-world effect. Such events are called digital wildfires. In this paper, we study a specific digital wildfire: the idea that the COVID-19 outbreak is somehow connected to the introduction of 5G wireless technology, which caused real-world harm in April 2020 and beyond. By analyzing early social media content, we investigate the origin of this digital wildfire and the developments that led to its wide spread. We show how the initial idea was derived from existing opposition to wireless networks, how videos rather than tweets played a crucial role in its propagation, and how commercial interests can partially explain the wide distribution of this particular piece of misinformation. We then illustrate how the initial events in the UK were echoed several months later in different countries around the world.

COCO: an annotated Twitter dataset of COVID‑19 conspiracy theories

Published in 2023 Journal of Computational Social Science, 2023

The COVID-19 pandemic has been accompanied by a surge of misinformation on social media which covered a wide range of different topics and contained many competing narratives, including conspiracy theories. To study such conspiracy theories, we created a dataset of 3495 tweets with manual labeling of the stance of each tweet w.r.t. 12 different conspiracy topics. The dataset thus contains almost 42,000 labels, each of which was determined by majority vote among three expert annotators. The dataset was selected from COVID-19 related Twitter data spanning from January 2020 to June 2021 using a list of 54 keywords. The dataset can be used to train machine learning based classifiers for both stance and topic detection, either individually or simultaneously. BERT was used successfully for the combined task. The dataset can also be used to further study the prevalence of different conspiracy narratives. To this end, we qualitatively analyze the tweets, discussing the structure of conspiracy narratives that are frequently found in the dataset. Furthermore, we illustrate the interconnection between the conspiracy categories as well as the keywords.
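The label count and the majority rule mentioned in this abstract can be checked with a short sketch. The tweet and topic counts come from the abstract; the function name `majority_label` and the example label strings are hypothetical placeholders, and the tie handling (returning `None`) is an assumption, since the abstract does not say how disagreements among all three annotators were resolved.

```python
from collections import Counter
from typing import Optional, Sequence

NUM_TWEETS = 3495  # from the abstract
NUM_TOPICS = 12    # from the abstract
# One stance label per (tweet, topic) pair: 41,940, i.e. "almost 42,000".
TOTAL_LABELS = NUM_TWEETS * NUM_TOPICS


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators.

    ASSUMPTION: returns None when no label has more than half of the
    votes; the real annotation protocol may resolve ties differently.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count * 2 > len(votes) else None


# Hypothetical votes from three annotators for one (tweet, topic) pair.
agreed = majority_label(["promotes", "promotes", "unrelated"])
split = majority_label(["promotes", "discusses", "unrelated"])
```

With three annotators, a strict majority means at least two matching votes, so `agreed` resolves to a label while `split` stays undecided under this sketch.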