Publications

2024

Full Paper

Athena: Smart Order Routing on Centralized Crypto Exchanges using a Unified Order Book

Robert Henker, Daniel Atzberger, Jan Ole Vollmer, Willy Scheibel, Jürgen Döllner, and Markus Bick
Wiley International Journal of Network Management
doi:10.1002/nem.2266

Most cryptocurrency spot trading occurs on centralized crypto exchanges, where offers for buying and selling are organized via an order book. In liquid markets, the price achieved for buying and selling deviates only slightly from the assumed reference price, i.e., trading is associated with low implicit costs. However, compared to traditional finance, crypto markets are still illiquid, and consequently the reduction of implicit costs is crucial for any trading strategy and of high interest, especially for institutional investors. This paper describes the design and implementation of Athena, a system that automatically splits orders across multiple exchanges to minimize implicit costs. For this purpose, order books are collected from several centralized crypto exchanges and merged into an internal unified order book. In addition to price and quantity, the entries in the unified order book are enriched with information about the exchange. This enables a smart order routing algorithm to split an order into several slices and execute these on several exchanges to reduce implicit costs and achieve a better price.

Publisher Record, Author Version, Data
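
The idea in the Athena abstract (merge per-exchange order books into a unified book, then split an order into slices) can be illustrated with a minimal sketch. This is not the routing algorithm from the paper: it assumes a simple greedy fill of a buy order from the cheapest asks and ignores fees, latency, and minimum order sizes. The names `Ask`, `unify_asks`, and `route_buy`, as well as the exchange labels, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ask:
    exchange: str    # hypothetical exchange label
    price: float     # quote currency per unit
    quantity: float  # units offered at this price


def unify_asks(books):
    """Merge per-exchange ask levels into one list sorted by ascending price."""
    unified = [
        Ask(exchange, price, quantity)
        for exchange, levels in books.items()
        for price, quantity in levels
    ]
    return sorted(unified, key=lambda ask: ask.price)


def route_buy(unified, quantity):
    """Greedily fill a buy order from the cheapest asks (assumption: no fees).

    Returns the quantity routed to each exchange and the average price paid.
    """
    slices = {}
    remaining = quantity
    cost = 0.0
    for ask in unified:
        if remaining <= 0:
            break
        take = min(ask.quantity, remaining)
        slices[ask.exchange] = slices.get(ask.exchange, 0.0) + take
        cost += take * ask.price
        remaining -= take
    if remaining > 0:
        raise ValueError("not enough liquidity in the unified order book")
    return slices, cost / quantity


if __name__ == "__main__":
    books = {
        "A": [(100.0, 1.0), (102.0, 5.0)],
        "B": [(101.0, 2.0), (103.0, 5.0)],
    }
    print(route_buy(unify_asks(books), 4.0))
```

Keeping the exchange label on every entry of the unified book is what allows the fill to be translated back into one slice per exchange.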

Full Paper

Quantifying Topic Model Influence on Text Layouts based on Dimensionality Reductions

Daniel Atzberger, Tim Cech, Willy Scheibel, Jürgen Döllner, and Tobias Schreck
Best Paper Award
15th International Conference on Information Visualization Theory and Applications (IVAPP '24)
doi:10.5220/0012391100003660

Text spatializations for text corpora often rely on two-dimensional scatter plots generated from topic models and dimensionality reductions. Topic models are unsupervised learning algorithms that identify clusters, so-called topics, within a corpus, representing the underlying concepts. Furthermore, topic models transform documents into vectors, capturing their association with topics. A subsequent dimensionality reduction creates a two-dimensional scatter plot, illustrating semantic similarity between the documents. A recent study by Atzberger et al. has shown that topic models are beneficial for generating two-dimensional layouts. However, in their study, the hyperparameters of the topic models are fixed, and thus the study does not analyze the impact of the topic models’ quality on the resulting layout. Following the methodology of Atzberger et al., we present a comprehensive benchmark comprising (1) text corpora, (2) layout algorithms based on topic models and dimensionality reductions, (3) quality metrics for assessing topic models, and (4) metrics for evaluating two-dimensional layouts’ accuracy and cluster separation. Our study involves an exhaustive evaluation of numerous parameter configurations, yielding a dataset that quantifies the quality of each dataset-layout algorithm combination. Through a rigorous analysis of this dataset, we derive practical guidelines for effectively employing topic models in text spatializations. As a main result, we conclude that the quality of a topic model measured by coherence is positively correlated to the layout quality in the case of Latent Semantic Indexing and Non-Negative Matrix Factorization.

Publisher Record, Author Version, Slides, Experiments and Results

Short Paper

Bringing Objects to Life: Supporting Program Comprehension through Animated 2.5D Object Maps from Program Traces

Christoph Thiede, Willy Scheibel, and Jürgen Döllner
15th International Conference on Information Visualization Theory and Applications (IVAPP '24)
doi:10.5220/0012393900003660

Program comprehension is a key activity in software development. Several visualization approaches such as software maps have been proposed to support programmers in exploring the architecture of software systems. However, for the exploration of program behavior, programmers still rely on traditional code browsing and debugging tools to build a mental model of a system’s behavior. We propose a novel approach to visualizing program behavior through animated 2.5D object maps that depict particular objects and their interactions from a program trace. We describe our implementation and evaluate it for different program traces through an experience report and performance measurements. Our results indicate that our approach can benefit program comprehension tasks, but further research is needed to improve scalability and usability.

Publisher Record, Author Version, Slides, Poster, Prototype, Github Project

Full Paper

Integrated Visual Software Analytics on the GitHub Platform

Willy Scheibel, Jasper Blum, Franziska Lauterbach, Daniel Atzberger, and Jürgen Döllner
Issue Cover
MDPI Computers
doi:10.3390/computers13020033

Readily available software analysis and analytics tools are often operated within external services, where the measured software analysis data is kept internally and no external access to the data is available. We propose an approach to integrate visual software analysis on the GitHub platform by leveraging GitHub Actions and the GitHub API, covering both analysis and visualization. The process is to perform software analysis for each commit, e.g., static source code complexity metrics, and augment the commit by the resulting data, stored as git objects within the same repository. We show that this approach is feasible by integrating it into 64 open source TypeScript projects. Further, we analyze the impact on Continuous Integration (CI) run time and repository storage. The stored software analysis data is externally accessible to allow for visualization tools, such as software maps. The effort to integrate our approach is limited to enabling the analysis component within a project's CI on GitHub and embedding an HTML snippet into the project's website for visualization. This gives a large number of projects access to software analysis and provides a means to communicate the current status of a project.

Publisher Record, Author Version, Prototype, Github Project

2023

Full Paper

Visual Counterfactual Explanations Using Semantic Part Locations

Florence Böttger, Tim Cech, Willy Scheibel, and Jürgen Döllner
Best Student Paper Award
Proceedings of the 15th International Conference on Knowledge Discovery and Information Retrieval
doi:10.5220/0012179000003598

As machine learning models are becoming more widespread and see use in high-stakes decisions, the explainability of these decisions is getting more relevant. One approach to explainability is counterfactual explanations. They are defined as changes to a data point such that it appears as a different class. Their close connection to the original dataset aids their explainability. However, existing methods of creating counterfactual explanations often rely on other machine learning models, which adds an additional layer of opacity to the explanations. We propose additions to an established pipeline for creating visual counterfactual explanations by using an inherently explainable algorithm that does not rely on external models. Using annotated semantic part locations, we replace parts of the counterfactual creation process. We evaluate the approach on the CUB-200-2011 dataset. Our approach outperforms the previous results: we improve (1) the average number of edits by 0.1 edits, (2) the key point accuracy of editing within any semantic parts of the image by an average of at least 7 percentage points, and (3) the key point accuracy of editing the same semantic parts by at least 17 percentage points.

Publisher Record, Author Version

Full Paper

unCover: Identifying AI Generated News Articles by Linguistic Analysis and Visualization

Lucas Liebe, Jannis Baum, Tilman Schütze, Tim Cech, Willy Scheibel, and Jürgen Döllner
Candidate for Best Paper
Proceedings of the 15th International Conference on Knowledge Discovery and Information Retrieval
doi:10.5220/0012163300003598

Text synthesis tools are becoming increasingly popular and better at mimicking human language. In trust-sensitive decisions, such as plagiarism and fraud detection, identifying AI-generated texts poses larger difficulties: decisions need to be made explainable to ensure trust and accountability. To support users in identifying AI-generated texts, we propose the tool unCover. The tool analyses texts through three explainable linguistic approaches: stylometric writing style analysis, topic modeling, and entity recognition. The result of the tool is a decision and visualizations of the analysis results. We evaluate the tool on news articles by means of accuracy of the decision and an expert study with 13 participants. The final prediction is based on classification of stylometric and evolving topic analysis. It achieved an accuracy of 70.4 % and a weighted F1-score of 85.6 %. The participants preferred to base their assessment on the prediction and the topic graph. However, they found the entity recognition to be an ineffective indicator. Moreover, five participants highlighted the explainable aspects of unCover, and overall the participants achieved 69 % true classifications. Eight participants expressed interest in continuing to use unCover for identifying AI-generated texts.

Publisher Record, Author Version, Slides, Demo, Github Project

Full Paper

Large-Scale Evaluation of Topic Models and Dimensionality Reduction Methods for 2D Text Spatialization

Daniel Atzberger, Tim Cech, Matthias Trapp, Rico Richter, Willy Scheibel, Jürgen Döllner, and Tobias Schreck
Transactions on Visualization and Computer Graphics, 28th Conference on Visualization (VIS '23)
doi:10.1109/TVCG.2023.3326569

Topic models are a class of unsupervised learning algorithms for detecting the semantic structure within a text corpus. Together with a subsequent dimensionality reduction algorithm, topic models can be used for deriving spatializations for text corpora as two-dimensional scatter plots, reflecting semantic similarity between the documents and supporting corpus analysis. Although the choice of the topic model, the dimensionality reduction, and their underlying hyperparameters significantly impact the resulting layout, it is unknown which particular combinations result in high-quality layouts with respect to accuracy and perception metrics. To investigate the effectiveness of topic models and dimensionality reduction methods for the spatialization of corpora as two-dimensional scatter plots (or basis for landscape-type visualizations), we present a large-scale, benchmark-based computational evaluation. Our evaluation consists of (1) a set of corpora, (2) a set of layout algorithms that are combinations of topic models and dimensionality reductions, and (3) quality metrics for quantifying the resulting layout. The corpora are given as document-term matrices, and each document is assigned to a thematic class. The chosen metrics quantify the preservation of local and global properties and the perceptual effectiveness of the two-dimensional scatter plots. By evaluating the benchmark on a computing cluster, we derived a multivariate dataset with over 45 000 individual layouts and corresponding quality metrics. Based on the results, we propose guidelines for the effective design of text spatializations that are based on topic models and dimensionality reductions. As a main result, we show that interpretable topic models are beneficial for capturing the structure of text corpora. We furthermore recommend the use of t-SNE as a subsequent dimensionality reduction.

Publisher Record, Preprint, Slides, Github Project, Pre-trained Topic Models (I), Pre-trained Topic Models (II)

Extended Abstract

Constructing Hierarchical Continuity in Hilbert & Moore Treemaps

Willy Scheibel and Jürgen Döllner
25th EG Conference on Visualization (EuroVis '23)
doi:10.2312/evp20231060

The Hilbert and Moore treemap layout algorithms are based on the space-filling Hilbert and Moore curves, respectively, to map tree-structured datasets to a 2D treemap layout. Considering multiple snapshots of a time-variant dataset, one of the design goals for Hilbert and Moore treemaps is layout stability, i.e., low changes in the layout for low changes in the underlying tree-structured data. For this, their underlying space-filling curve is expected to be continuous across all nodes and hierarchy levels, which has to be considered throughout the layouting process. We propose optimizations to subdivision templates, their orientation, and discuss the continuity of the underlying space-filling curve. We show real-world examples of Hilbert and Moore treemaps for small and large datasets with continuous space-filling curves, allowing for improved layout stability.

Publisher Record, Author Version, Supplemental Material, Poster, Source Code, Github Project
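
The continuity property mentioned in the Hilbert and Moore treemap abstract can be made concrete with the plain Hilbert curve on a square grid. The sketch below uses the well-known index-to-coordinate conversion for a grid whose side length is a power of two; it is only an illustration of the space-filling curve, not the treemap layout algorithm or the subdivision templates from the paper. The function name `hilbert_d2xy` is hypothetical.

```python
def hilbert_d2xy(side, index):
    """Map a position along the Hilbert curve to grid coordinates (x, y).

    `side` is the grid side length and must be a power of two;
    `index` runs from 0 to side * side - 1.
    """
    x = y = 0
    t = index
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate or flip the quadrant so consecutive sub-curves connect.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


if __name__ == "__main__":
    cells = [hilbert_d2xy(4, d) for d in range(16)]
    # Continuity: each step along the curve moves to a directly adjacent cell.
    steps = [abs(x1 - x0) + abs(y1 - y0)
             for (x0, y0), (x1, y1) in zip(cells, cells[1:])]
    print(cells)
    print(set(steps))
```

Because consecutive indices always land in neighboring cells, items that are adjacent in the curve order stay spatially close, which is the basis for the layout stability argument.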

Extended Abstract

A Dashboard for Interactive Convolutional Neural Network Training And Validation Through Saliency Maps

Tim Cech, Furkan Simsek, Willy Scheibel, and Jürgen Döllner
25th EG Conference on Visualization (EuroVis '23)
doi:10.2312/evp20231054

Quali-quantitative methods provide ways for interrogating Convolutional Neural Networks (CNN). To this end, we propose a dashboard using a quali-quantitative method based on quantitative metrics and saliency maps. By those means, a user can discover patterns during the training of a CNN. With this, they can adapt the training hyperparameters of the model, obtaining a CNN that learned patterns desired by the user. Furthermore, they can disregard CNNs that learned undesirable patterns. This improves users' agency over the model training process.

Publisher Record, Author Version, Supplemental Material, Poster

Full Paper

Outlier Mining Techniques for Software Defect Prediction

Tim Cech, Daniel Atzberger, Willy Scheibel, Sanjay Misra, and Jürgen Döllner
15th International Conference on Software Quality (SWQD '23)
doi:10.1007/978-3-031-31488-9_3

Software metrics measure aspects related to the quality of software. Using software metrics as a method of quantification of software, various approaches were proposed for locating defect-prone source code units within software projects. Most of these approaches rely on supervised learning algorithms, which require labeled data for adjusting their parameters during the learning phase. Usually, such labeled training data is not available. Unsupervised algorithms do not require training data and can therefore help to overcome this limitation. In this work, we evaluate the effect of unsupervised learning, especially outlier mining algorithms, for the task of defect prediction, i.e., locating defect-prone source code units. We investigate the effect of various class balancing and feature compressing techniques as preprocessing steps and show how sliding windows can be used to capture time series of source code metrics. We evaluate the Isolation Forest and Local Outlier Factor as representatives of outlier mining techniques. Our experiments on three publicly available datasets, containing a total of 11 software projects, indicate that the consideration of time series can improve static examinations by up to 3%. The results further show that supervised algorithms can outperform unsupervised approaches on all projects. Among all unsupervised approaches, the Isolation Forest achieves the best accuracy on 10 out of 11 projects.

Publisher Record, Author Version
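
The sliding-window step named in the abstract above can be sketched as follows. The abstract does not specify the window construction, so this is an assumption: each window concatenates the metric values of consecutive revisions of one source code unit into a single feature vector, which could then be passed to an outlier detector. The function name `sliding_windows` and the example metrics are hypothetical.

```python
def sliding_windows(snapshots, width):
    """Turn a per-revision metric history into overlapping feature vectors.

    `snapshots` is a list of metric tuples, one per revision, ordered by time.
    Each returned vector concatenates the metrics of `width` consecutive
    revisions, so it captures how the metrics changed over that span.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    return [
        [value for snapshot in snapshots[start:start + width] for value in snapshot]
        for start in range(len(snapshots) - width + 1)
    ]


if __name__ == "__main__":
    # Hypothetical history of (lines of code, cyclomatic complexity) for one file.
    history = [(10, 1), (12, 2), (30, 7)]
    print(sliding_windows(history, 2))
```

A width of 1 reduces to the static case, where each revision is examined on its own.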

Full Paper

Examining Liquidity of Exchanges and Assets and the Impact of External Events in Centralized Crypto Markets: A 2022 Study

Adrian Jobst, Daniel Atzberger, Robert Henker, Jan Ole Vollmer, Willy Scheibel, and Jürgen Döllner
1st International Workshop on Cryptocurrency Exchanges (CryptoEx '23)
doi:10.1109/ICBC56567.2023.10174905

Most cryptocurrencies are bought and sold on centralized exchanges that manage supply and demand via an order book. Besides trading fees, the high liquidity of a market is the most relevant reason for choosing one exchange over the other. However, as the different liquidity measures rely on the order book, external events that cause people to sell or buy a cryptocurrency can significantly impact a market's liquidity. To investigate the effect of external events on liquidity, we measure various liquidity measures for nine different order books comprising three currency pairs across three exchanges covering the entire year 2022. The resulting multivariate time series is then analyzed using different correlations. From the results, we can infer that as a cryptocurrency's market capitalization and the exchange's trading volume increase, so does its liquidity. At the same time, only a moderate correlation of liquidity between exchanges can be observed. Furthermore, our statistical observations show that external events, particularly the events around FTX and the Terra Luna crash, caused significant changes in liquidity. However, depending on the exchange's size and the cryptocurrency's market cap, the liquidity took a shorter or longer time to recover.

Publisher Record, Author Version
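
The abstract above does not name the liquidity measures used in the study. As an illustration of how such measures are derived from an order book, the sketch below computes two common ones, the relative bid-ask spread and the depth within a price band around the mid price. Both the choice of measures and the names `relative_spread` and `depth_within` are assumptions made for this example.

```python
def relative_spread(best_bid, best_ask):
    """Bid-ask spread as a fraction of the mid price (lower means more liquid)."""
    mid = (best_bid + best_ask) / 2
    return (best_ask - best_bid) / mid


def depth_within(levels, mid, band):
    """Total quantity of order book levels priced within `band` of the mid price.

    `levels` is a list of (price, quantity) pairs; `band` is a fraction,
    e.g. 0.02 for two percent on either side of the mid price.
    """
    return sum(quantity for price, quantity in levels
               if abs(price - mid) <= band * mid)


if __name__ == "__main__":
    bids = [(99.0, 2.0), (95.0, 4.0)]
    asks = [(101.0, 1.0), (110.0, 3.0)]
    mid = (bids[0][0] + asks[0][0]) / 2
    print(relative_spread(bids[0][0], asks[0][0]))
    print(depth_within(bids + asks, mid, 0.02))
```

Evaluating such functions on every order book snapshot over time yields the kind of multivariate time series the abstract refers to.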

Short Paper

OrderBookVis: A Visualization Approach for Comparing Order Books from Centralized Crypto Exchanges

Adrian Jobst, Daniel Atzberger, Robert Henker, Willy Scheibel, and Jürgen Döllner
1st International Workshop on Cryptocurrency Exchanges (CryptoEx '23)
doi:10.1109/ICBC56567.2023.10174944

Trading for a currency pair on centralized crypto exchanges is organized via an order book, which collects all open buy and sell orders at any given time and thus forms the basis for price formation. Usually, the exchanges provide basic visualizations, which show the accumulated buy and sell volume in an animated 2D representation. However, this visualization does not allow the user to compare different order books, e.g., several order book snapshots. In this work, we present OrderBookVis, a 2.5D representation that shows a discrete set of order books comparatively. For this purpose, the individual snapshots are displayed as a 2D representation as usual and placed one after the other on a 2D reference plane. As possible use cases, we discuss the analysis of the temporal evolution of the order book for a fixed market and the comparison of different order books across multiple markets.

Publisher Record, Author Version

Full Paper

Hephaistos: A Management System for Massive Order Book Data from Multiple Centralized Crypto Exchanges with an Internal Unified Order Book

Robert Henker, Daniel Atzberger, Jan Ole Vollmer, Willy Scheibel, Jürgen Döllner, and Markus Bick
1st International Workshop on Cryptocurrency Exchanges (CryptoEx '23)
doi:10.1109/ICBC56567.2023.10174923

Offers to buy and sell cryptocurrencies on exchanges are collected in an order book as pairs of amount and price provided with a timestamp. Contrary to tick data, which only reflects the last transaction price on an exchange, the order book reflects the market’s actual price information and the available volume. Until now, no system has been presented that can capture many different order books across several markets. This paper presents Hephaistos, a system for processing, harmonizing, and storing massive spot order book data from 22 centralized crypto exchanges and 55 currency pairs. After collecting the data, Hephaistos aggregates several order books in a so-called Unified Order Book, which is the foundation for a Smart Order Routing algorithm. As a result, an order is split across several exchanges, which results in a better execution price. As a component of a high-frequency trading system, Hephaistos captures 32 % of the total daily spot trading volume. We provide examples with data from two exchanges that show that the Smart Order Routing algorithm significantly reduces the slippage.

Publisher Record, Author Version

Short Paper

Real Estate Tokenization in Germany: Market Analysis and Concept of a Regulatory and Technical Solution

Robert Henker, Daniel Atzberger, Willy Scheibel, and Jürgen Döllner
5th International Conference on Blockchain and Cryptocurrency (ICBC '23)
doi:10.1109/ICBC56567.2023.10174954

Real estate is the largest asset class and is equally popular with professional and retail investors. However, this asset class has the disadvantage that it is very illiquid, and investments have a high entry barrier in terms of equity. The adoption of the Electronic Securities Act in 2021 by the German Bundestag has created the legal framework for tokenizing real estate assets and their management using digital ledger technology in Germany. In this paper we describe a business concept for managing ownership and business transactions for real estate in Germany using blockchain technology. Besides its possibilities, we present a market analysis that comprises existing approaches and discusses legal limitations specific to Germany.

Publisher Record, Author Version

Short Paper

Detecting Outliers in CI/CD Pipeline Logs using Latent Dirichlet Allocation

Daniel Atzberger, Tim Cech, Willy Scheibel, Rico Richter, and Jürgen Döllner
18th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE '23)
doi:10.5220/0011858500003464

Continuous Integration and Continuous Delivery are best practices used during the DevOps phase. By using automated pipelines for building and testing small software changes, possible risks are intended to be detected early. Those pipelines continuously generate log events that are collected in semi-structured log files. In an industry context, these log files can amass 100,000 events and more. However, the relevant sections in these log files must be manually tagged by the user. This paper presents an online-learning approach for detecting relevant log events using Latent Dirichlet Allocation. After grouping a fixed number of log events in a document, our approach prunes the vocabulary to eliminate words without semantic meaning. A sequence of documents is then described as a discrete sequence by applying Latent Dirichlet Allocation, which allows the detection of outliers within the sequence. Our approach provides an explanation of the results by integrating the latent variables of the model. The approach is tested on log files that originate from a CI/CD pipeline of a large German company. Our results indicate that whether or not a log event is marked as an outlier heavily depends on the chosen hyperparameters of our model.

Publisher Record, Author Version
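
The two preprocessing steps described in the abstract above (grouping a fixed number of log events into a document and pruning the vocabulary) can be sketched in a few lines. The abstract does not state the pruning rule, so the rule used here is an assumption: tokens containing digits (timestamps, hashes, identifiers) and tokens shorter than three characters are dropped. The function names and the log lines are hypothetical, and the Latent Dirichlet Allocation step itself is not shown.

```python
import re


def group_events(events, size):
    """Group consecutive log events into documents of at most `size` events."""
    return [events[start:start + size] for start in range(0, len(events), size)]


def prune(document):
    """Tokenize one document and keep only purely alphabetic tokens of length >= 3.

    Assumption: tokens with digits or very short tokens carry no semantic meaning.
    """
    tokens = []
    for event in document:
        for token in re.split(r"[^A-Za-z0-9]+", event.lower()):
            if len(token) >= 3 and token.isalpha():
                tokens.append(token)
    return tokens


if __name__ == "__main__":
    events = [
        "2023-01-01 12:00:01 INFO build started",
        "2023-01-01 12:00:02 ERROR test failed id=af31",
        "2023-01-01 12:00:03 INFO upload done",
    ]
    for document in group_events(events, 2):
        print(prune(document))
```

The resulting token lists are the bag-of-words input that a topic model would be fitted on.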

Chapter

Evaluating Probabilistic Topic Models for Bug Triaging Tasks

Daniel Atzberger, Jonathan Schneider, Willy Scheibel, Matthias Trapp, and Jürgen Döllner
ENASE 2022: Evaluation of Novel Approaches to Software Engineering
doi:10.1007/978-3-031-36597-3_3

During the software development process, occurring problems are collected and managed as bug reports using bug tracking systems. Usually, a bug report is specified by a title, a more detailed description, and additional categorical information, e.g., the affected component or the reporter. It is the task of the triage owner to assign open bug reports to developers with the required skills to fix them. However, the bug assignment task is time-consuming, especially in large software projects with many involved developers. This observation motivates using (semi-)automatic algorithms for assigning bugs to developers. Various approaches have been developed that rely on a machine learning model trained on historical bug reports. Thereby, the textual components are mostly modeled using topic models, mainly Latent Dirichlet Allocation (LDA). Although different variants, inference techniques, and libraries for LDA exist and various hyperparameters can be specified, most works treat topic models as a black box without exploring them in detail. In this work, we extend a study of Atzberger and Schneider et al. on the use of the Author-Topic Model (ATM) for bug triaging tasks. We demonstrate the influence of the underlying topic model, the used library and inference techniques, and the hyperparameters on the bug triaging results. The results of our conducted experiments on a dataset from the Mozilla Firefox project provide guidelines for applying LDA for bug triaging tasks effectively.

Publisher Record, Author Version

Short Paper

Evaluating Architectures and Hyperparameters of Self-supervised Network Projections

Tim Cech, Daniel Atzberger, Willy Scheibel, Rico Richter, and Jürgen Döllner
14th International Conference on Information Visualization Theory and Applications (IVAPP '23)
doi:10.5220/0011699700003417

Self-Supervised Network Projections (SSNP) are dimensionality reduction algorithms that produce low-dimensional layouts from high-dimensional data. By combining an autoencoder architecture with neighborhood information from a clustering algorithm, SSNP intend to learn an embedding that generates visually separated clusters. In this work, we extend an approach that uses cluster information as pseudo-labels for SSNP by taking outlier information into account. Furthermore, we investigate the influence of different autoencoders on the quality of the generated two-dimensional layouts. We report on two experiments on the autoencoder's architecture and hyperparameters, respectively, measuring nine metrics on eight labeled datasets from different domains, e.g., Natural Language Processing. The results indicate that the model's architecture and the choice of hyperparameter values can influence the layout with statistical significance, but none achieves the best result over all metrics. In addition, we found that using outlier information for the pseudo-labeling approach can maintain global properties of the two-dimensional layout while trading off local properties.

Publisher Record, Author Version, Slides, Github Project

Chapter

Visualization of Source Code Similarity using 2.5D Semantic Software Maps

Daniel Atzberger, Tim Cech, Willy Scheibel, Daniel Limberger, and Jürgen Döllner
VISIGRAPP 2021: Computer Vision, Imaging and Computer Graphics Theory and Applications
doi:10.1007/978-3-031-25477-2_8

For various program comprehension tasks, software visualization techniques can be beneficial by displaying aspects related to the behavior, structure, or evolution of software. In many cases, the question is related to the semantics of the source code files, e.g., the localization of files that implement specific features or the detection of files with similar semantics. This work presents a general software visualization technique for source code documents, which uses 3D glyphs placed on a two-dimensional reference plane. The relative positions of the glyphs capture their semantic relatedness. Our layout originates from applying Latent Dirichlet Allocation and Multidimensional Scaling on the comments and identifier names found in the source code files. Though different variants for 3D glyphs can be applied, we focus on cylinders, trees, and avatars. We discuss various mappings of data associated with source code documents to the visual variables of 3D glyphs for selected use cases and provide details on our visualization system.

Publisher Record, Author Version

2022

Full Paper

Hardware-accelerated Rendering of Web-based 3D Scatter Plots with Projected Density Fields and Embedded Controls

Lukas Wagner, Daniel Limberger, Willy Scheibel, and Jürgen Döllner
Best Paper Award
27th International Conference on 3D Web Technology (Web3D '22)
doi:10.1145/3564533.3564566

3D scatter plots depicting massive data suffer from occlusion, which makes it difficult to get an overview and perceive structure. This paper presents a technique that facilitates the comprehension of heavily occluded 3D scatter plots. Data points are projected to axial planes, creating x-ray-like 2D views that support the user in analyzing the data's density and layout. We showcase our open-source web application with a hardware-accelerated rendering component written in WebGL. It allows for interactive interaction, filtering, and navigation with datasets up to hundreds of thousands of nodes. The implementation is detailed and discussed with respect to challenges posed by API and performance limitations.

Publisher Record, Author Version, Slides, Demo, Github Project

Full Paper

Procedural Texture Patterns for Encoding Changes in Color in 2.5D Treemap Visualizations

Daniel Limberger, Willy Scheibel, Jan van Diecken, and Jürgen Döllner
Journal of Visualization
doi:10.1007/s12650-022-00874-3

Treemaps depict tree-structured data while maintaining flexibility in mapping data to different visual variables. This work explores how changes in data mapped to color can be represented with rectangular 2.5D treemaps using procedural texture patterns. The patterns are designed to function for both static images and interactive visualizations with animated transitions. During rendering, the procedural texture patterns are superimposed onto the existing color mapping. We present a pattern catalog with seven exemplary patterns having different characteristics in representing the mapped data. This pattern catalog is implemented in a WebGL-based treemap rendering prototype and is evaluated using performance measurements and case studies on two software projects. As a result, this work extends the toolset of visual encodings for 2.5D treemaps by procedural texture patterns to represent changes in color. It serves as a starting point for user-centered evaluation.

Publisher Record, Author Version

Thumbnail of CodeCV: Mining Expertise of GitHub Users from Coding Activities
Short Paper

CodeCV: Mining Expertise of GitHub Users from Coding Activities

Daniel Atzberger, Nico Scordialo, Tim Cech, Willy Scheibel, Matthias Trapp, and Jürgen Döllner
22nd International Working Conference on Source Code Analysis and Manipulation (SCAM '22)
BibTeX , Abstract , doi:10.1109/SCAM55253.2022.00021

The number of software projects developed collaboratively on social coding platforms is steadily increasing. One of the motivations for developers to participate in open-source software development is to make their development activities more easily accessible to potential employers, e.g., in the form of a resume for their interests and skills. However, manual review of source code activities is time-consuming and requires detailed knowledge of the technologies used. Existing approaches are limited to a small subset of actual source code activity and metadata and do not provide explanations for their results. In this work, we present CodeCV, an approach to analyzing the commit activities of a GitHub user concerning the use of programming languages, software libraries, and higher-level concepts, e.g., Machine Learning or Cryptocurrency. Skills in using software libraries and programming languages are analyzed based on syntactic structures in the source code. Based on Labeled Latent Dirichlet Allocation, an automatically generated corpus of GitHub projects is used to learn the concept-specific vocabulary in identifier names and comments. This enables the capture of expertise on abstract concepts from a user's commit history. CodeCV further explains the results through links to the relevant commits in an interactive web dashboard. We tested our system on selected GitHub users who mainly contribute to popular projects to demonstrate that our approach is able to capture developers' expertise effectively.

Publisher Record, Author Version, Slides

Thumbnail of Visual Variables and Configuration of Software Maps
Full Paper

Visual Variables and Configuration of Software Maps

Daniel Limberger, Willy Scheibel, Jürgen Döllner, and Matthias Trapp
Journal of Visualization
BibTeX , Abstract , doi:10.1007/s12650-022-00868-1

Software maps provide a general-purpose interactive user interface and information display in software analytics. This paper classifies software maps as a containment-based treemap embedded into a 3D attribute space and introduces respective terminology. It provides a comprehensive overview of advanced visual metaphors and techniques, each suitable for interactive visual analytics tasks. The metaphors and techniques are briefly described, located within a visualization pipeline model, and considered within a software map design space. The general expressiveness and applicability of visual variables are detailed and discussed. Consequent applications and use cases w.r.t. different types of software system data and software engineering data are discussed, arguing for versatile use of software maps in visual software analytics.

Publisher Record, Author Version

Thumbnail of A Benchmark for the Use of Topic Models for Text Visualization Tasks
Extended Abstract

A Benchmark for the Use of Topic Models for Text Visualization Tasks

Daniel Atzberger, Tim Cech, Willy Scheibel, Daniel Limberger, Matthias Trapp, and Jürgen Döllner
15th International Symposium on Visual Information Communication and Interaction (VINCI '22)
BibTeX , Abstract , doi:10.1145/3554944.3554961

Publisher Record, Author Version, Slides

Thumbnail of Efficient GitHub Crawling using the GraphQL API
Full Paper

Efficient GitHub Crawling using the GraphQL API

Adrian Jobst, Daniel Atzberger, Tim Cech, Willy Scheibel, Matthias Trapp, and Jürgen Döllner
22nd International Conference on Computational Science and Its Applications (ICCSA '22)
BibTeX , Abstract , doi:10.1007/978-3-031-10548-7_48

The number of publicly accessible software repositories on online platforms is growing rapidly. With more than 128 million public repositories (as of March 2020), GitHub is the world’s largest platform for hosting and managing software projects. Where it used to be necessary to merge various data sources, it is now possible to access a wealth of data using the GitHub API alone. However, collecting and analyzing this data is not an easy endeavor. In this paper, we present Prometheus, a system for crawling and storing software repositories from GitHub. Compared to existing frameworks, Prometheus follows an event-driven microservice architecture. By separating functionality on the service level, there is no need to understand implementation details or use existing frameworks to extend or customize the system, only data. Prometheus consists of two components, one for fetching GitHub data and one for data storage which serves as a basis for future functionality. Unlike most existing crawling approaches, the Prometheus fetching service uses the GitHub GraphQL API. As a result, Prometheus can significantly outperform alternatives in terms of throughput in some scenarios.

Publisher Record, Author Version, Slides

Thumbnail of Tooling for Time- and Space-efficient git Repository Mining
Short Paper

Tooling for Time- and Space-efficient git Repository Mining

Fabian Heseding, Willy Scheibel, and Jürgen Döllner
19th International Conference on Mining Software Repositories – Data and Tool Showcase Track (MSR '22)
BibTeX , Abstract , doi:10.1145/3524842.3528503

Software projects under version control grow with each commit, accumulating up to hundreds of thousands of commits per repository. Especially for such large projects, the traversal of a repository and data extraction for static source code analysis poses a trade-off between granularity and speed. We showcase the command-line tool pyrepositoryminer that combines a set of optimization approaches for efficient traversal and data extraction from git repositories while being adaptable to third-party and custom software metrics and data extractions. The tool is written in Python and combines bare repository access, in-memory storage, parallelization, caching, change-based analysis, and optimized communication between the traversal and custom data extraction components. The tool allows for both metrics written in Python and external programs for data extraction. A single-thread performance evaluation based on a basic mining use case shows a mean speedup of 15.6× compared to other freely available tools across four mid-sized open source projects. A multi-threaded execution allows for load distribution among cores and, thus, a mean speedup of up to 86.9× using 12 threads.

Publisher Record, Author Version, Github Project, Source Code Archive, Preprint

Thumbnail of Augmenting Library Development by Mining Usage Data from Downstream Dependencies
Full Paper

Augmenting Library Development by Mining Usage Data from Downstream Dependencies

Christoph Thiede, Willy Scheibel, Daniel Limberger, and Jürgen Döllner
Candidate for Best Student Paper
17th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE '22)
BibTeX , Abstract , doi:10.5220/0011093700003176

In the dependency graph of a software ecosystem, downstream dependencies are the nodes that depend on a package. Apart from end-user APIs, these dependencies make up the bulk of a library’s usage for most packages. Other than for upstream dependencies, tools that provide individual package developers with this kind of information rarely exist to date. This paper makes two contributions: (i) We propose an approach for gathering downstream dependencies of a single package efficiently and extracting usage samples from them using a static type analyzer. (ii) We present a tool that allows npm package developers to survey the aggregated usage data directly in their IDE in an interactive and context-sensitive way and that further supports them in understanding which packages use specific package members and why and how they use these members. This can help prioritize and steer development and uncover unexpected usage patterns, inappropriate member signatures, or misleading interface design. Our methods return over 8 000 dependencies for popular packages and process about 12 dependencies per minute while requiring about 500 MB memory in total and less than 30 MB storage per package, but tend to exclude unpopular dependencies. Usage sample extraction is very precise but not easily available for repositories with complex build configurations or metaprogramming patterns. We show that usage data from downstream dependency repositories is a promising and diverse source of information for mining software repositories and that our approach supports package developers in maintaining their APIs.

Publisher Record, Author Version, Github Project, Source Code Archive

Thumbnail of Mining Developer Expertise from Bug Tracking Systems using the Author-Topic Model
Full Paper

Mining Developer Expertise from Bug Tracking Systems using the Author-Topic Model

Daniel Atzberger, Jonathan Schneider, Willy Scheibel, Daniel Limberger, Matthias Trapp, and Jürgen Döllner
Best Student Paper Award
17th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE '22)
BibTeX , Abstract , doi:10.5220/0011045100003176

During software development processes, software defects, so-called bugs, are captured in a semi-structured manner in bug tracking systems using textual components and categorical features. It is the task of the triage owner to assign open bugs to developers with the required skills and expertise. This task, known as bug triaging, requires in-depth knowledge about a developer’s skills. Various machine learning techniques have been proposed to automate this task; most of these approaches apply topic models, especially Latent Dirichlet Allocation (LDA), for mining the textual components of bug reports. However, none of the proposed approaches explicitly models a developer’s expertise. In most cases, these algorithms are treated as a black box, as they allow no explanation of their recommendations. In this work, we show how the Author-Topic Model (ATM), a variant of LDA, can be used to capture a developer’s expertise in the latent topics of a corpus of bug reports from the model itself. Furthermore, we present three novel bug triaging techniques based on the ATM. We compare our approach against a baseline model that is based on LDA, on a dataset of 18269 bug reports from the Mozilla Firefox project collected between July 1999 and June 2016. The results show that the ATM can outperform the LDA-based approach in terms of the Mean Reciprocal Rank (MRR).

Publisher Record, Author Version, Slides

Thumbnail of Visualization of Knowledge Distribution across Development Teams using 2.5D Semantic Software Maps
Short Paper

Visualization of Knowledge Distribution across Development Teams using 2.5D Semantic Software Maps

Daniel Atzberger, Tim Cech, Adrian Jobst, Willy Scheibel, Daniel Limberger, Matthias Trapp, and Jürgen Döllner
13th International Conference on Information Visualization Theory and Applications (IVAPP '22)
BibTeX , Abstract , doi:10.5220/0010991100003124

In order to detect software risks at an early stage, various software visualization techniques have been developed for monitoring the structure, behaviour, or the underlying development process of software. One of the greatest risks for any IT organization consists in an inappropriate distribution of knowledge among its developers, as a project’s success mainly depends on assigning tasks to developers with the required skills and expertise. In this work, we address this problem by proposing a novel Visual Analytics framework for mining and visualizing the expertise of developers based on their source code activities. Under the assumption that a developer’s knowledge about code is represented directly through comments and the choice of identifier names, we generate a 2D layout using Latent Dirichlet Allocation together with Multidimensional Scaling on the commit history, thus displaying the semantic relatedness between developers. In order to capture a developer’s expertise in a concept, we utilize Labeled LDA trained on a corpus of Open Source projects. By mapping aspects related to skills onto the visual variables of 3D glyphs, we generate a 2.5D visualization that we call KnowhowMap. We exemplify this approach with an interactive prototype that enables users to analyze the distribution of skills and expertise in an explorative way.

Publisher Record, Author Version, Slides

2021

Thumbnail of Software Galaxies: Displaying Coding Activities using a Galaxy Metaphor
Extended Abstract

Software Galaxies: Displaying Coding Activities using a Galaxy Metaphor

Daniel Atzberger, Willy Scheibel, Daniel Limberger, and Jürgen Döllner
14th International Symposium on Visual Information Communication and Interaction (VINCI '21)
BibTeX , Abstract , doi:10.1145/3481549.3481573

Software visualization uses metaphors to depict software and software development data that usually has no gestalt. The choice of a metaphor and visual depiction is researched broadly, but deriving a layout based on similarity is still challenging. We present a novel approach to 3D software visualization called Software Galaxy. Our layout is based on applying Latent Dirichlet Allocation to source code. We utilize a metaphor inspired by astronomy for depicting software metrics for single files and clusters. Our first experiments indicate that a 3D visualization capturing semantic relatedness can be beneficial for standard program comprehension tasks.

Publisher Record, Author Version, Slides

Thumbnail of Interactive Simulation and Visualization of Long-Term, ETF-based Investment Strategies
Short Paper

Interactive Simulation and Visualization of Long-Term, ETF-based Investment Strategies

Martin Büßemeyer, Daniel Limberger, Willy Scheibel, and Jürgen Döllner
14th International Symposium on Visual Information Communication and Interaction (VINCI '21)
BibTeX , Abstract , doi:10.1145/3481549.3481568

Personal, long-term investment products, especially ones for retirement savings, require thorough understanding to use them profitably. Even simple savings plans based on exchange-traded funds (ETFs) are subject to many variables and uncertainties to be considered for expected and planned-upon returns. We present an interactive simulation of an ETF-based savings plan that combines forecasts, risk awareness, taxes and costs, inflation, and dynamic inflows and outflows into a single visualization. The visualization consists of four parts: a form-fill interface for configuration, a savings and payout simulation, a cash flow chart, and a savings chart. Based on a specific use case, we discuss how private investors can benefit from using our visualization after a short training period.

Publisher Record, Author Version, Demo

Thumbnail of Visualization of Data Changes in 2.5D Treemaps using Procedural Textures and Animated Transitions
Short Paper

Visualization of Data Changes in 2.5D Treemaps using Procedural Textures and Animated Transitions

Daniel Limberger, Willy Scheibel, Jan van Dieken, and Jürgen Döllner
14th International Symposium on Visual Information Communication and Interaction (VINCI '21)
BibTeX , Abstract , doi:10.1145/3481549.3481570

This work investigates the extent to which animated procedural texture patterns can be used to support the representation of changes in 2.5D treemaps. Changes in height, color, and area of individual nodes can easily be visualized using animated transitions. Especially for changes in the color attribute, plain animated transitions are not able to directly communicate the direction of change itself. We show how procedural texture patterns can be superimposed to the color mapping and support transitions. To this end, we discuss qualitative properties of each pattern, demonstrate their ability to communicate change direction both with and without animation, and conclude which of the patterns are more likely to increase effectiveness and correctness of the change mapping in 2.5D treemaps.

Publisher Record, Author Version, Slides

Thumbnail of Algorithmic Improvements on Hilbert and Moore Treemaps for Visualization of Large Tree-structured Datasets
Short Paper

Algorithmic Improvements on Hilbert and Moore Treemaps for Visualization of Large Tree-structured Datasets

Willy Scheibel, Christopher Weyand, Joseph Bethge, and Jürgen Döllner
23rd EG Conference on Visualization (EuroVis '21)
BibTeX , Abstract , doi:10.2312/evs.20211065

Hilbert and Moore treemaps are based on the same named space-filling curves to lay out tree-structured data for visualization. One main component of them is a partitioning subroutine, whose algorithmic complexity poses problems when scaling to industry-sized datasets. Further, the subroutine allows for different optimization criteria that result in different layout decisions. This paper proposes conceptual and algorithmic improvements to this partitioning subroutine. Two measures for the quality of partitioning are proposed, resulting in the min-max and min-variance optimization tasks. For both tasks, linear-time algorithms are presented that find an optimal solution. The implementation variants are evaluated with respect to layout metrics and run-time performance against a previously available greedy approach. The results show significantly improved run time and no deterioration in layout metrics, suggesting effective use of Hilbert and Moore treemaps for datasets with millions of nodes.

Publisher Record, Author Version, Slides, Github Project

Thumbnail of Software Forest: A Visualization of Semantic Similarities in Source Code using a Tree Metaphor
Full Paper

Software Forest: A Visualization of Semantic Similarities in Source Code using a Tree Metaphor

Daniel Atzberger, Tim Cech, Merlin de la Haye, Maximilian Söchting, Willy Scheibel, Daniel Limberger, and Jürgen Döllner
Candidate for Best Student Paper
12th International Conference on Information Visualization Theory and Applications (IVAPP '21)
BibTeX , Abstract , doi:10.5220/0010267601120122

Software visualization techniques provide effective means for program comprehension tasks as they allow developers to interactively explore large code bases. A frequently encountered task during software development is the detection of source code files with similar semantics. To assist with this task, we present Software Forest, a novel 2.5D software visualization that enables interactive exploration of semantic similarities within a software system, illustrated as a forest. The underlying layout results from the analysis of the vocabulary of the software documents using Latent Dirichlet Allocation and Multidimensional Scaling and therefore reflects the semantic similarity between source code files. By mapping properties of a software entity, e.g., size metrics or trend data, to visual variables encoded by various, figurative tree meshes, aspects of a software system can be displayed. This concept is complemented with implementation details as well as a discussion on applications.

Publisher Record, Author Version, Slides, Demo

2020

Thumbnail of Survey of Treemap Layout Algorithms
Full Paper

Survey of Treemap Layout Algorithms

Willy Scheibel, Daniel Limberger, and Jürgen Döllner
13th International Symposium on Visual Information Communication and Interaction (VINCI '20)
BibTeX , Abstract , doi:10.1145/3430036.3430041

This paper provides an overview of published treemap layout algorithms from 1991 to 2019 that were used for information visualization and computational geometry. First, a terminology is outlined for the precise communication of tree-structured data and layouting processes. Second, an overview and classification of layout algorithms is presented and application areas are discussed. Third, the use-case-specific adaption process is outlined and discussed. This overview targets practitioners and researchers by providing a starting point for their own research, visualization design, and applications.

Publisher Record, Author Version, Slides

Thumbnail of Survey on User Studies on the Effectiveness of Treemaps
Full Paper

Survey on User Studies on the Effectiveness of Treemaps

Carolin Fiedler, Willy Scheibel, Daniel Limberger, Matthias Trapp, and Jürgen Döllner
13th International Symposium on Visual Information Communication and Interaction (VINCI '20)
BibTeX , Abstract , doi:10.1145/3430036.3430054

Treemaps are a commonly used tool for the visual display and communication of tree-structured, multi-variate data. In order to confidently know when and how treemaps can best be applied, the research community uses usability studies and controlled experiments to "understand the potential and limitations of our tools" (Plaisant, 2004). To support the community's understanding and usage of treemaps, this survey provides a comprehensive review and detailed overview of 69 user studies related to treemaps. However, due to pitfalls and shortcomings in design, conduct, and reporting of the user studies, there is little that can be reliably derived or accepted as a generalized statement. Fundamental open questions include configuration, compatible tasks, use cases, and perceptional characteristics of treemaps. The reliability of findings and statements is discussed and common pitfalls of treemap user studies are identified.

Publisher Record, Author Version, Companion Website

Thumbnail of A Framework for Interactive Exploration of Clusters in Massive Data using 3D Scatter Plots and WebGL
Extended Abstract

A Framework for Interactive Exploration of Clusters in Massive Data using 3D Scatter Plots and WebGL

Lukas Wagner, Willy Scheibel, Daniel Limberger, Matthias Trapp, and Jürgen Döllner
25th International Conference on 3D Web Technology (Web3D '20)
BibTeX , Abstract , doi:10.1145/3424616.3424730

This paper presents a rendering framework for the visualization of massive point datasets in the web. It includes highly interactive point rendering, cluster visualization, basic interaction methods, and importance-based labeling, while being available for both mobile and desktop browsers. The rendering style is customizable, as shown in figure 1. Our evaluation indicates that the framework facilitates interactive visualization of tens of millions of raw data points even without dynamic filtering or aggregation.

Publisher Record, Author Version, Slides, Demo

Thumbnail of A Taxonomy of Treemap Visualization Techniques
Position Paper

A Taxonomy of Treemap Visualization Techniques

Willy Scheibel, Matthias Trapp, Daniel Limberger, and Jürgen Döllner
11th International Conference on Information Visualization Theory and Applications (IVAPP '20)
BibTeX , Abstract , doi:10.5220/0009153902730280

A treemap is a visualization that has been specifically designed to facilitate the exploration of tree-structured data and, more generally, hierarchically structured data. The family of visualization techniques that use a visual metaphor for parent-child relationships based “on the property of containment” (Johnson, 1993) is commonly referred to as treemaps. However, as the number of variations of treemaps grows, it becomes increasingly important to distinguish clearly between techniques and their specific characteristics. This paper proposes to discern between Space-filling Treemap, Containment Treemap, Implicit Edge Representation Tree, and Mapped Tree for classification of hierarchy visualization techniques and highlights their respective properties. This taxonomy is created as a hyponymy, i.e., its classes have an is-a relationship to one another. With this proposal, we intend to stimulate a discussion on a more unambiguous classification of treemaps and, furthermore, broaden what is understood by the concept of treemap itself.

Publisher Record, Author Version, Slides

Thumbnail of Visualization of Tree-structured Data using Web Service Composition
Chapter

Visualization of Tree-structured Data using Web Service Composition

Willy Scheibel, Judith Hartmann, Daniel Limberger, and Jürgen Döllner
VISIGRAPP 2019: Computer Vision, Imaging and Computer Graphics Theory and Applications
BibTeX , Abstract , doi:10.1007/978-3-030-41590-7_10

This article reiterates on the recently presented hierarchy visualization service HiViSer and its API. It illustrates its decomposition into modular services for data processing and visualization of tree-structured data. The decomposition is aligned to the common structure of visualization pipelines and, in this way, facilitates attribution of the services' capabilities. Suitable base resource types are proposed and their structure and relations as well as a subtyping concept for specifics in hierarchy visualization implementations are detailed. Moreover, state-of-the-art quality standards and techniques for self-documentation and discovery of components are incorporated. As a result, a blueprint for Web service design, architecture, modularization, and composition is presented, targeting fundamental visualization tasks of tree-structured data, i.e., gathering, processing, rendering, and provisioning. Finally, the applicability of the service components and the API is evaluated in the context of exemplary applications.

Publisher Record, Author Version

2019

Thumbnail of Advanced Visual Metaphors and Techniques for Software Maps
Full Paper

Advanced Visual Metaphors and Techniques for Software Maps

Daniel Limberger, Willy Scheibel, Matthias Trapp, and Jürgen Döllner
12th International Symposium on Visual Information Communication and Interaction (VINCI '19)
BibTeX , Abstract , doi:10.1145/3356422.3356444

Software maps provide a general-purpose interactive user interface and information display for software analytics tools. This paper systematically introduces and classifies software maps as a treemap-based technique for software cartography. It provides an overview of advanced visual metaphors and techniques, each suitable for interactive visual analytics tasks, that can be used to enhance the expressiveness of software maps. Thereto, the metaphors and techniques are briefly described, located within a visualization pipeline model, and considered within the software map design space. Consequent applications and use cases w.r.t. different types of software system data and software engineering data are discussed, arguing for a versatile use of software maps in visual software analytics.

Publisher Record, Author Version

Thumbnail of Design and Implementation of Web-Based Hierarchy Visualization Services
Full Paper

Design and Implementation of Web-Based Hierarchy Visualization Services

Willy Scheibel, Judith Hartmann, and Jürgen Döllner
Candidate for Best Paper
10th International Conference on Information Visualization Theory and Applications (IVAPP '19)
BibTeX , Abstract , doi:10.5220/0007693201410152

There is a rapidly growing, cross-domain demand for interactive, high-quality visualization techniques as components of web-based applications and systems. In this context, a key question is how visualization services can be designed, implemented, and operated based on Software-as-a-Service as the software delivery model. In this paper, we present concepts and design of a SaaS framework and API of visualization techniques for tree-structured data, called HiViSer. Using representational state transfer (REST), the API supports different data formats, data manipulations, visualization techniques, and output formats. In particular, the API defines base resource types for all components required to create an image or a virtual scene of a hierarchy visualization. We provide a treemap visualization service as a prototypical implementation for which subtypes of the proposed API resources have been created. The approach generally serves as a blueprint for fully web-based, high-end visualization services running on thin clients in a standard browser environment.

Publisher Record, Author Version, Slides, Homepage

2018

Thumbnail of EvoCells – A Treemap Layout Algorithm for Evolving Tree Data
Full Paper

EvoCells – A Treemap Layout Algorithm for Evolving Tree Data

Willy Scheibel, Christopher Weyand, and Jürgen Döllner
9th International Conference on Information Visualization Theory and Applications (IVAPP '18)
BibTeX , Abstract , doi:10.5220/0006617102730280

We propose the rectangular treemap layout algorithm EvoCells that maps changes in tree-structured data onto an initial treemap layout. Changes in topology and node weights are mapped to insertion, removal, growth, and shrinkage of the layout rectangles. Thereby, rectangles displace their neighbors and stretch their enclosing rectangles with a run-time complexity of O(n log n). An evaluation using layout stability metrics on the open source ElasticSearch software system suggests EvoCells as a valid alternative for stable treemap layouting.

Publisher Record, Author Version, Slides

2017

Thumbnail of Mixed-Projection Treemaps: A Novel Approach Mixing 2D and 2.5D Treemaps
Full Paper

Mixed-Projection Treemaps: A Novel Approach Mixing 2D and 2.5D Treemaps

Daniel Limberger, Willy Scheibel, Matthias Trapp, and Jürgen Döllner
21st International Conference on Information Visualisation (iV '17)
BibTeX , Abstract , doi:10.1109/iV.2017.67

2D treemaps are a space-filling visualization technique that facilitates exploration of non-spatial, attributed, tree-structured data using the visual variables size and color. In extension thereto, 2.5D treemaps introduce height for additional information display. This extension entails challenges such as increased rendering effort, occlusion, or the need for navigation techniques that counterbalance the advantages of 2D treemaps to a certain degree. This paper presents a novel technique for combining 2D and 2.5D treemaps using multi-perspective views to leverage the advantages of both treemap types. It enables a new form of overview+detail visualization for complex treemaps and contributes new concepts for real-time rendering of and interaction with mixed-projection treemaps. The technique operates by tilting up inner nodes using affine transformations and animated state transitions. The mixed use of orthogonal and perspective projections is discussed and application examples that facilitate exploration of multi-variate data and benefit from the reduced interaction overhead are demonstrated.

Publisher Record, Author Version, Slides

Thumbnail of Attributed Vertex Clouds
Chapter

Attributed Vertex Clouds

Willy Scheibel, Stefan Buschmann, Matthias Trapp, and Jürgen Döllner
GPU Zen
BibTeX , Abstract

In today's computer graphics applications, large 3D scenes are rendered which consist of polygonal geometries such as triangle meshes. Using state-of-the-art techniques, this geometry is often represented on the GPU using vertex and index buffers, as well as additional auxiliary data such as textures or uniform buffers. For polygonal meshes of arbitrary complexity, the described approach is indispensable. However, there are several types of simpler geometries (e.g., cuboids, spheres, tubes, or splats) that can be generated procedurally. We present an efficient data representation and rendering concept for such geometries, denoted as attributed vertex clouds (AVCs). Using this approach, geometry is generated on the GPU during execution of the programmable rendering pipeline. Each vertex is used as the argument for a function that procedurally generates the target geometry. This function is called a transfer function, and it is implemented using shader programs and therefore executed as part of the rendering process. This approach allows for compact geometry representation and results in reduced memory footprints in comparison to traditional representations. By shifting geometry generation to the GPU, the resulting volatile geometry can be controlled flexibly, i.e., its position, parameterization, and even the type of geometry can be modified without requiring state changes or uploading new data to the GPU. Performance measurements suggest improved rendering times and reduced memory transmission through the rendering pipeline.

Author Version, GitHub Project

Thumbnail of Reducing Visual Complexity in Software Maps using Importance-based Aggregation of Nodes
Full Paper

Reducing Visual Complexity in Software Maps using Importance-based Aggregation of Nodes

Daniel Limberger, Willy Scheibel, Sebastian Hahn, and Jürgen Döllner
8th International Conference on Information Visualization Theory and Applications (IVAPP)
BibTeX , Abstract , doi:10.5220/0006267501760185

Depicting massive software system data using software maps can result in visual clutter and increased cognitive load. This paper introduces an adaptive level-of-detail (LoD) technique that uses scoring for interactive aggregation on a per-node basis. The scoring approximates importance by degree-of-interest measures as well as screen and user-interaction scores. The technique adheres to established aggregation guidelines and was evaluated by means of two user studies. The first user study investigates task completion time in visual search. The second evaluates the readability of the presented nesting level contouring for aggregates. With the adaptive LoD technique, software maps allow for multi-resolution depictions of software system information. It facilitates efficient identification of important nodes and allows for additional annotation.

Publisher Record, Author Version, Slides

2016

Thumbnail of Dynamic 2.5D Treemaps using Declarative 3D on the Web
Short Paper

Dynamic 2.5D Treemaps using Declarative 3D on the Web

Daniel Limberger, Willy Scheibel, Stefan Lemme, and Jürgen Döllner
21st International Conference on Web3D Technology (Web3D '16)
BibTeX , Abstract , doi:10.1145/2945292.2945313

The 2.5D treemap represents a general purpose visualization technique to map multi-variate hierarchical data in a scalable, interactive, and consistent way used in a number of application fields. In this paper, we explore the capabilities of Declarative 3D for the web-based implementation of 2.5D treemap clients. Particularly, we investigate how X3DOM and XML3D can be used to implement clients with equivalent features that interactively display 2.5D treemaps with dynamic mapping of attributes. We also show a first step towards a glTF-based implementation. These approaches are benchmarked focusing on their interaction capabilities with respect to rendering and speed of dynamic data mapping. We discuss the results for our representative example of a complex 3D interactive visualization technique and summarize recommendations for improvements towards operational web clients.

Publisher Record, Author Version, Slides, GitHub Project, Demo

Thumbnail of Interactive Revision Exploration using Small Multiples of Software Maps
Short Paper

Interactive Revision Exploration using Small Multiples of Software Maps

Willy Scheibel, Matthias Trapp, and Jürgen Döllner
7th International Conference on Information Visualization Theory and Applications (IVAPP '16)
BibTeX , Abstract , doi:10.5220/0005694401310138

Exploring and comparing different revisions of complex software systems is a challenging task, as it requires constantly switching between different revisions and the corresponding information visualization. This paper proposes to combine the concept of small multiples and focus+context techniques for software maps to facilitate the comparison of multiple software map themes and revisions simultaneously on a single screen. This approach reduces the number of switches and helps to preserve the mental map of the user. Given a software project, the small multiples are based on a common dataset but are specialized by specific revisions and themes. The small multiples are arranged in a matrix where rows and columns represent different themes and revisions, respectively. To ensure scalability of the visualization technique, we also discuss two rendering pipelines to ensure interactive frame rates. The capabilities of the proposed visualization technique are demonstrated in a collaborative exploration setting using a high-resolution, multi-touch display.

Publisher Record, Author Version, Poster (Portrait), Poster (Landscape)

2014

Thumbnail of Interaktive Visualisierung von hierarchischen, multivariaten und zeitlich variierenden Daten
Thesis

Interaktive Visualisierung von hierarchischen, multivariaten und zeitlich variierenden Daten

Willy Scheibel
Master's Thesis at the Hasso Plattner Institute, University of Potsdam
BibTeX , Abstract , doi:10.13140/RG.2.2.33837.33763

A treemap recursively subdivides a two-dimensional area in order to encode a hierarchy and enables the visualization of multiple attributes e.g. with the size, the extruded height or the color of a node. Traditional treemap layout algorithms and rendering techniques can only be used for the comparison of two data sets at different points in time to some extent, as (1) no comparison between nodes in a treemap and between different states is possible and (2) there are no rendering techniques for the size differences of a node over time. This thesis introduces the techniques EvoCell-Layouting, Change Map, and Change Hints. EvoCell-Layouting is a novel treemap layout algorithm that iteratively changes a given treemap layout. Change Maps are density maps to locate changes in attribute values disregarding the difference and the size of the node. Change Hints visualize spatial changes between two states of a treemap. These three techniques enhance the comprehension of the evolution of temporal hierarchical data. A prototypical implementation, a discussion about alternatives, and performance and memory analyses demonstrate real data applicability. An additional case study reveals distinctive changes in the software system of a monitored open-source project that are hard to detect with traditional hierarchy visualizations.

Document

Academic Service

Logo of IEEE Computer Society / VGTC
since 2021

Reviewer

IEEE Computer Society / VGTC

  • Transactions on Visualization and Computer Graphics (TVCG)
  • Visualization Conference (VIS)
  • Pacific Visualization Symposium (PacificVis)
Logo of IEEE International Conference on Mining Software Repositories (MSR)
2023–2024

Program Committee, Technical Program

IEEE International Conference on Mining Software Repositories (MSR)

Logo of EG
since 2023

Reviewer

EG

  • European Conference on Visualization (EuroVis)
Logo of International Symposium on Visual Information Communication and Interaction
since 2022

Reviewer

International Symposium on Visual Information Communication and Interaction

  • VINCI
Logo of IEEE Working Conference on Software Visualization
in 2022

Reviewer

IEEE Working Conference on Software Visualization

  • VISSOFT
Logo of Elsevier
in 2021

Reviewer

Elsevier

  • Information Sciences (INS)
Logo of International Symposium on Visual Information Communication and Interaction: VINCI 2021
in 2021

Organization Committee, Publication Chair

International Symposium on Visual Information Communication and Interaction: VINCI 2021


Committee Work

Logo of Digital Engineering Faculty (Hasso Plattner Institute and University of Potsdam)
since 2017

Faculty Council, Voting Member

Digital Engineering Faculty (Hasso Plattner Institute and University of Potsdam)

Logo of Digital Engineering Faculty (Hasso Plattner Institute and University of Potsdam)
since 2017

Study Commission, Voting Member

Digital Engineering Faculty (Hasso Plattner Institute and University of Potsdam)

Logo of University of Potsdam
2018–2019

Working Group for the Principles in Teaching, Voting Member

University of Potsdam

Logo of Digital Engineering Faculty, designated (Hasso Plattner Institute and University of Potsdam)
in 2017

Founding Committee, Voting Member

Digital Engineering Faculty, designated (Hasso Plattner Institute and University of Potsdam)

Logo of Zweite Neue Grundschule Ludwigsfelde
since 2021

School Conference, Chairman

Zweite Neue Grundschule Ludwigsfelde


Scholarships & Grants

in 2023

DDSA Scholarship for the VCG '23 Workshop

2nd International Workshop on Visualization and Visual Computing, Aarhus, DK

Workshop Website
in 2014

HPI Scholarship for Doctoral Studies

in 2014

HPI Award Best Master Graduate 2014

Awarded for graduation with top grades and subsequent doctoral studies at HPI.

Press Release

Projects

Thumbnail of cmake-init

cmake-init

Role: Maintainer
Type: Project Template
Homepage , Source Code

Template for reliable, cross-platform C++ project setup using CMake.

Thumbnail of globjects

globjects

Role: Maintainer
Type: OpenGL Library
Homepage , Source Code

A cross-platform C++ wrapper for OpenGL API objects.

Thumbnail of glbinding

glbinding

Role: Maintainer
Type: OpenGL Library
Homepage , Source Code

A C++ binding for the OpenGL API, generated using the gl.xml specification.

Thumbnail of glkernel

glkernel

Role: Maintainer
Type: Math Library
Source Code

C++ library for pre-computing noise and random sample kernels.

Thumbnail of Attributed Vertex Clouds Demo

Attributed Vertex Clouds Demo

Role: Maintainer
Type: Demo
Source Code

Demo for the article "Attributed Vertex Clouds" from GPU Zen: Advanced Rendering Techniques.

Thumbnail of Unified Memory Demo

Unified Memory Demo

Role: Maintainer
Type: Demo
Source Code

Unified Memory Heterogeneous Computing Showcase.

HiViSer

Role: Maintainer
Type: Research Project
Homepage

A Web API specification for the management and provisioning of tree-structured data and their visualization using information visualization techniques.

openll

Role: Maintainer
Type: Research Project
Homepage , Source Code

Open Label Library – API specification and reference implementations for glyph rendering in 2D and 3D graphics environments.

Thumbnail of cppexpose

cppexpose

Role: Maintainer
Type: Utility Library
Homepage , Source Code

C++ library for type introspection, reflection, and scripting interface.

Thumbnail of cppassist

cppassist

Role: Maintainer
Type: Utility Library
Homepage , Source Code

C++ sanctuary for small but powerful and frequently required stand-alone features.

Thumbnail of cpplocate

cpplocate

Role: Maintainer
Type: Utility Library
Homepage , Source Code

Cross-platform C++ library providing tools for applications to locate themselves, their data assets as well as dependent modules.

Thumbnail of HPI Schul-Cloud

HPI Schul-Cloud

Role: Contributor
Type: Service
Source Code

HPI Schul-Cloud Core Server.

CG Internals PPA

Role: Maintainer
Type: Binary Package Archive
Homepage

The Ubuntu Package Archive for the Open Source software of CG Internals.

khrbinding-generator

Role: Maintainer
Type: Utility Tools
Source Code

A Python generator for the Khronos APIs OpenGL, OpenGL ES, OpenGL SC, and EGL.

eglbinding

Role: Maintainer
Type: EGL Library
Source Code

A C++ binding for the EGL API, generated using the egl.xml specification.

glscbinding

Role: Maintainer
Type: OpenGL SC Library
Source Code

A C++ binding for the OpenGL SC API, generated using the gl.xml specification.

glesbinding

Role: Maintainer
Type: OpenGL ES Library
Homepage , Source Code

A C++ binding for the OpenGL ES API, generated using the gl.xml specification.

Thumbnail of gloperate

gloperate

Role: Maintainer
Type: OpenGL Framework
Source Code

C++ library for defining and controlling modern GPU rendering/processing operations.

Thumbnail of cppfs

cppfs

Role: Contributor
Type: Utility Library
Homepage , Source Code

Cross-platform C++ file system library supporting multiple backends.

qmltoolbox

Role: Contributor
Type: Utility Library
Source Code

QML item library for cross-platform graphics applications.

Thumbnail of webgl-operate

webgl-operate

Role: Contributor
Type: WebGL Framework
Homepage , Source Code

A TypeScript based WebGL rendering framework.

Widelands

Role: One-time Contributor
Type: Video Game
Homepage

A free, Settlers II inspired, open source real-time strategy game with singleplayer campaigns and a multiplayer mode.

Thumbnail of glm

glm

Role: One-time Contributor
Type: Math Library
Homepage , Source Code

C++ library for OpenGL Mathematics.

Tasteful Framework

Role: Maintainer
Type: Web Framework
Source Code

A framework to create web application servers written in C++ based on the Tasteful Server.

Tasteful Server

Role: Maintainer
Type: Web Framework
Source Code

A multithreaded web server written in C++ (using Qt).

Presentations

06/12/23

Constructing Hierarchical Continuity in Hilbert & Moore Treemaps
EuroVis 2023, Leipzig, Germany

Poster (Landscape, PDF)

06/18/21

Algorithmic Improvements on Hilbert and Moore Treemaps for Visualization of Large Tree-structured Datasets
EuroVis 2021, Virtual Attendance at Zürich, Switzerland

Slides (PDF)

12/08/20

A Survey of Treemap Visualization Techniques
VINCI 2020, Virtual Attendance at Eindhoven, The Netherlands

Slides (PDF)

02/29/20

A Taxonomy on Treemap Visualization Techniques
IVAPP 2020, Valletta, Malta

Slides (PDF)

02/26/19

Rendering Procedural Textures for Visualization of Thematic Data in 3D Geovirtual Environments
IVAPP 2019, Prague, Czech Republic

02/26/19

Design and Implementation of Web-Based Hierarchy Visualization Services
IVAPP 2019, Prague, Czech Republic

Slides (PDF)

03/07/18

Einführung in die Shared-Memory-Programmierung heterogener Systeme
parallel 2018, Heidelberg, Germany

Handout (PDF, 5.1 MiB), Slides (PDF)

01/28/18

EvoCells – A Treemap Layout Algorithm for Evolving Tree Data
IVAPP 2018, Funchal, Madeira, Portugal

Slides (PDF)

02/28/16

Interactive Revision Exploration using Small Multiples of Software Maps
IVAPP 2016, Rome, Italy

Poster (Landscape, PDF), Poster (Portrait, PDF)

Teaching

2023/24 : Winter Term

Computer Graphics I
Lecture (B.Sc.), Co-Lecturer

3D Computer Graphics: Extending the Three.js Framework
Project Seminar (B.Sc.), Seminar Lead and Tutor

Explaining and Visualizing AI
Project Seminar (M.Sc.), Tutor

2023 : Summer Term

Computer Graphics II
Lecture (B.Sc.), Co-Lecturer

Methods & Techniques for Visual Analytics
Project Seminar (M.Sc.), Seminar Lead and Tutor

2022/23 : Winter Term

Computer Graphics I
Lecture (B.Sc.), Co-Lecturer

Explainable AI by Visual Analytics
Project Seminar (M.Sc.), Tutor

2022 : Summer Term

Computer Graphics II
Lecture (B.Sc.), Co-Lecturer

Advanced Techniques for Analysis and Visualization of Software Data
Project Seminar (M.Sc.), Seminar Lead and Tutor

Programming Techniques II
Lecture (B.Sc.), Co-Lecturer

2021/22 : Winter Term

Computer Graphics I
Lecture (B.Sc.), Co-Lecturer

Introduction to Data Visualization
Lecture (M.Sc.), Tutor

Design and Construction of AI-based Interactive Systems with a 'Dark Side of AI'
Project Seminar (B.Sc.), Seminar Lead and Tutor

Advanced Techniques for Visual Analytics of High-dimensional Data
Project Seminar (M.Sc.), Tutor

2021 : Summer Term

Systems Engineering and Data Processing with C++
Lecture (M.Sc.), Lecturer and Tutor

Software Mining and Applications
Seminar (M.Sc.), Seminar Lead and Tutor

Visualization and Analysis of Multidimensional Data
Seminar (M.Sc.), Tutor

2020/21 : Winter Term

Computer Graphics I
Lecture (B.Sc.), Co-Lecturer

Programming User Interfaces
Lecture (B.Sc.), Co-Lecturer and Tutor

Analysis and Visualization of Similarities of Software Systems
Project (M.Sc.), Supervisor

2020 : Summer Term

Computer Graphics II
Lecture (B.Sc.), Co-Lecturer

Analysis and Visualization of Software Data
Seminar (M.Sc.), Seminar Lead and Tutor

Visual Analytics on Multi-dimensional Data using Topic Maps
Project (M.Sc.), Supervisor

2019/20 : Winter Term

Game Programming
Seminar (B.Sc.), Seminar Lead and Tutor

Selected Topics in Visual Analytics
Seminar (M.Sc.), Tutor

Selected Topics in Data Analytics
Seminar (M.Sc.), Tutor

2019 : Summer Term

Fundamentals of Software Analytics
Lecture (M.Sc.), Co-Lecturer and Tutor

2018/19 : Winter Term

Introduction to Software Analytics
Lecture (B.Sc./M.Sc.), Co-Lecturer and Tutor

Visualization Algorithms & Techniques
Seminar (M.Sc.), Tutor

2018 : Summer Term

Game Programming
Project Seminar (B.Sc.), Seminar Lead and Tutor

2017/18 : Winter Term

Advanced Development in C++
Project Seminar (M.Sc.), Seminar Lead and Tutor

2017 : Summer Term

Advanced Information Visualization
Seminar (M.Sc.), Tutor

2016/17 : Winter Term

Advanced Programming in C++
Lecture (M.Sc.), Co-Lecturer and Tutor

Methods and Techniques of Information Visualization
Seminar (M.Sc.), Tutor

Real-time Monitoring of Massive Filesystems
Project (M.Sc.), Supervisor

2016 : Summer Term

Computer Graphics II
Lecture (B.Sc.), Tutor

Methods and Techniques of Software Visualization
Seminar (M.Sc.), Tutor

Massive Information Mining for Software Analytics
Project (B.Sc.), Supervisor

2015/16 : Winter Term

Computer Graphics I
Lecture (B.Sc.), Tutor

Systems Engineering for Software Analytics
Seminar (M.Sc.), Tutor

2015 : Summer Term

Computer Graphics I
Lecture (B.Sc.), Tutor

Visual Software Analytics
Seminar (M.Sc.), Tutor

Programming of Computer Graphics Techniques using C++ and OpenGL
Seminar (B.Sc.), Tutor and Co-Lecturer

Software Analytics
Project (B.Sc.), Supervisor

2014/15 : Winter Term

Computer Graphics II
Lecture (B.Sc.), Tutor

Visualization for Interactive Software Analytics
Seminar (M.Sc.), Tutor

Automated Visual Software Analytics
OpenHPI Course, Tutor

Software Analytics
Project (B.Sc.), Supervisor

2014 : Summer Term

Information Visualization
Seminar (M.Sc.), Co-Tutor

Graphics Programming using OpenGL and C++
Seminar (B.Sc.), Co-Tutor and Co-Lecturer

2013/14 : Winter Term

Software Analysis and Visualization
Seminar (M.Sc.), Co-Tutor

Contact

E-Mail:

willyscheibel@gmx.de

Other: willy.scheibel@hpi.de, willy.scheibel@cginternals.com

vCard: