Visually programming dataflows for distributed data analytics
Published in IEEE International Conference on Big Data (Big Data), 2016
Distributed dataflow systems like Spark and Flink allow to analyze large datasets using clusters of computers. These frameworks provide automatic program parallelization and manage distributed workers, including worker failures. Moreover, they provide high-level programming abstractions and execute programs efficiently. Yet, the programming abstractions remain textual while the dataflow model is essentially a graph of transformations. Thus, there is a mismatch between the presented abstraction and the underlying model here. One can also argue that developing dataflow programs with these textual abstractions requires needless amounts of coding and coding skills. A dedicated programming environment could instead allow constructing dataflow programs more interactively and visually. In this paper, we therefore investigate how visual programming can make the development of parallel dataflow programs more accessible. In particular, we built a prototypical visual programming environment for Flink, which we call Flision. Flision provides a graphical user interface for creating dataflow programs, a code generation engine that generates code for Flink, and seamless deployment to a connected cluster. Users of this environment can effectively create jobs by dragging, dropping, and visually connecting operator components. To evaluate the applicability of this approach, we interviewed ten potential users. Our impressions from this qualitative user testing strengthened our believe that visual programming can be a valuable tool for users of scalable data analysis tools.
Bibtex
@inproceedings{ThamsenRBPSB16,
author = {Lauritz Thamsen and
Thomas Renner and
Marvin Byfeld and
Markus Paeschke and
Daniel Schroder and
Felix Bohm},
editor = {James Joshi and
George Karypis and
Ling Liu and
Xiaohua Hu and
Ronay Ak and
Yinglong Xia and
Weijia Xu and
Aki{-}Hiro Sato and
Sudarsan Rachuri and
Lyle H. Ungar and
Philip S. Yu and
Rama Govindaraju and
Toyotaro Suzumura},
title = {Visually programming dataflows for distributed data analytics},
booktitle = {2016 IEEE International Conference on Big Data IEEE BigData 2016,
Washington DC, USA, December 5-8, 2016},
pages = {2276--2285},
publisher = {IEEE Computer Society},
year = {2016},
url = {https://doi.org/10.1109/BigData.2016.7840860},
doi = {10.1109/BigData.2016.7840860},
timestamp = {Fri, 19 Nov 2021 16:08:20 +0100},
biburl = {https://dblp.org/rec/conf/bigdataconf/ThamsenRBPSB16.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}