As an ensemble model that consists of many independent decision trees, random forests generate predictions by feeding the input to internal trees and summarizing their outputs. The ensemble nature of the model helps random forests outperform any individual decision tree. However, it also leads to poor model interpretability, which significantly hinders the model from being used in fields that have little or zero tolerance for errors, such as medical diagnosis and financial fraud detection. The interpretation challenges stem from the variety and complexity of the contained decision trees. Each decision tree has its own structure and properties, such as the features used in the tree and the feature threshold in each tree node. Thus, a data input may lead to a variety of decision paths. To understand how a final prediction is reached, one needs to understand and compare all decision paths in the context of all tree structures, which is a huge challenge for any user. In this paper, we propose a visual analytic system aiming at interpreting random forest models and predictions. In addition to providing users with all the tree information, we summarize the decision paths in random forests, which eventually reflect the working mechanism of the model and reduce users’ mental burden of interpretation. To demonstrate the usefulness and effectiveness of our system, we present two usage scenarios and conduct a qualitative user study.
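The prediction mechanism described above (each tree routes the input along its own decision path, and the forest summarizes the tree outputs) can be illustrated with a minimal sketch. The toy trees, the dictionary layout, the majority-vote summary, and the names `decision_path` and `forest_predict` are all assumptions made for illustration; they are not taken from the paper or from any library.

```python
from collections import Counter

# Hypothetical toy forest. Internal nodes test `x[feature] <= threshold`;
# leaves carry a class label. Real forests are learned from data.
TREES = [
    {"feature": 0, "threshold": 2.5,
     "left": {"label": "A"},
     "right": {"feature": 1, "threshold": 1.0,
               "left": {"label": "A"}, "right": {"label": "B"}}},
    {"feature": 1, "threshold": 0.5,
     "left": {"label": "A"}, "right": {"label": "B"}},
    {"feature": 0, "threshold": 4.0,
     "left": {"label": "A"}, "right": {"label": "B"}},
]

def decision_path(tree, x):
    """Walk one tree; return (label, [(feature, threshold, went_left), ...])."""
    path = []
    node = tree
    while "label" not in node:
        went_left = x[node["feature"]] <= node["threshold"]
        path.append((node["feature"], node["threshold"], went_left))
        node = node["left"] if went_left else node["right"]
    return node["label"], path

def forest_predict(trees, x):
    """Summarize the per-tree outputs by majority vote (an assumed rule)."""
    results = [decision_path(tree, x) for tree in trees]
    votes = Counter(label for label, _ in results)
    return votes.most_common(1)[0][0], results
```

Calling `forest_predict(TREES, [3.0, 2.0])` returns the voted label together with one decision path per tree. Even in this three-tree example the paths differ in length and in the features they test, which hints at why comparing them across a full forest is demanding.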
Data analysis novices often encounter barriers in executing low-level operations for pairwise comparisons. They may also run into barriers in interpreting the artefacts (e.g., visualizations) created as a result of the operations. We developed Duet, a visual analysis system designed to help data analysis novices conduct pairwise comparisons by addressing execution and interpretation barriers. To reduce the barriers in executing low-level operations during pairwise comparison, Duet employs minimal specification: when one object group (i.e. a group of records in a data table) is specified, Duet recommends object groups that are similar to or different from the specified one; when two object groups are specified, Duet recommends the similar and different attributes between them. To lower the barriers in interpreting its recommendations, Duet explains the recommended groups and attributes using both visualizations and textual descriptions. We conducted a qualitative evaluation with 8 participants to understand the effectiveness of Duet. The results suggest that minimal specification is easy to use and Duet's explanations are helpful for interpreting the recommendations.
Getting the overall picture of how a large number of ego-networks evolve is a common yet challenging task. Existing techniques often require analysts to inspect the evolution patterns of individual ego-networks one after another. In this study, we explore an approach that allows analysts to interactively create spatial layouts in which each dot is a dynamic ego-network. By providing an overview of the data, this technique enables analysts to see, at once, the evolution patterns of a large collection of ego-networks and how these patterns relate to one another. To let analysts construct various spatial layouts, we developed a data transformation pipeline, with which analysts can adjust the distance between dynamic ego-networks and hence the spatial layouts to reveal different global patterns such as trends, clusters and outliers. Based on this transformation pipeline, we developed Segue, a visual analysis system that supports thorough exploration of the evolution patterns of ego-networks. Through two usage scenarios, we demonstrate how analysts can gain insights into the overall evolution patterns of a large group of ego-networks by interactively creating different spatial layouts.
With the rapid development of e-commerce, there is an increasing number of online review websites, such as Yelp, to help customers make better purchase decisions. Viewing online reviews, including the rating scores and text comments by other customers, and comparing different businesses are key to making an optimal decision. However, due to the massive amount of online reviews, the potential difference in user rating standards, and the significant variance in review time, length, detail and quality, it is difficult for customers to achieve a quick and comprehensive comparison. In this paper, we present E-Comp, a carefully designed visual analytics system based on online reviews, to help customers compare local businesses at different levels of detail. More specifically, intuitive glyphs overlaid on maps are designed for quick candidate selection. A grouped Sankey diagram visualizing the rating difference by common customers is chosen for more reliable comparison of two businesses. An augmented word cloud showing adjective-noun word pairs, combined with a temporal view, is proposed to facilitate in-depth comparison of businesses in terms of different time periods, rating scores and features. The effectiveness and usability of E-Comp are demonstrated through a case study and in-depth user interviews.
Skyline queries have wide-ranging applications in fields that involve multi-criteria decision making, including tourism, the retail industry, and human resources. By automatically removing uncompetitive candidates, skyline queries allow users to focus on a subset of superior data items (i.e., the skyline), thus reducing the decision-making overhead. However, users are still required to interpret and compare these superior items manually before making a successful choice. This task is challenging because of two issues. First, people usually have fuzzy, unstable, and inconsistent preferences when presented with multiple candidates. Second, skyline queries do not reveal the reasons for the superiority of certain skyline points in a multi-dimensional space. To address these issues, we propose SkyLens, a visual analytic system aiming at revealing the superiority of skyline points from different perspectives and at different scales to aid users in their decision making. Two scenarios demonstrate the usefulness of SkyLens on two datasets with a dozen attributes. A qualitative study is also conducted to show that users can efficiently accomplish skyline understanding and comparison tasks with SkyLens.
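As a rough illustration of what a skyline query computes, the sketch below keeps every item that no other item dominates. It assumes that larger values are better on every attribute and uses a simple quadratic scan; the function names are illustrative, and this is not the query engine used by SkyLens.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` on every attribute
    and strictly better on at least one (larger is assumed better)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def skyline(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]
```

For example, with two hypothetical attributes per item, `skyline([(5, 1), (4, 4), (3, 3), (1, 5), (2, 2)])` drops `(3, 3)` and `(2, 2)`, both dominated by `(4, 4)`. The result says which items survive, but not why each one is superior or how the survivors trade off against each other, which is the gap the abstract describes.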
Graph sampling is frequently used to address scalability issues when analyzing large graphs. Many algorithms have been proposed to sample graphs, and the performance of these algorithms has been quantified through metrics based on graph structural properties preserved by the sampling: degree distribution, clustering coefficient, and others. However, a perspective that is missing is the impact of these sampling strategies on the resultant visualizations. In this paper, we present the results of three user studies that investigate how sampling strategies influence node-link visualizations of graphs. In particular, five sampling strategies widely used in the graph mining literature are tested to determine how well they preserve visual features in node-link diagrams. Our results show that, depending on the sampling strategy used, different visual features are preserved. These results provide a complementary view to metric evaluations conducted in the graph mining literature and provide an impetus to conduct future visualization studies.
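To make the metric-based evaluation concrete, the following sketch samples a graph in two common ways and computes the degree distribution of the result. Random node sampling and random edge sampling are used here only as familiar examples; the abstract does not name the five strategies studied, so these should not be read as the paper's choices. All function names are illustrative.

```python
import random
from collections import Counter

def random_node_sample(edges, k, rng):
    """Keep k random nodes and the edges induced between them."""
    nodes = sorted({u for edge in edges for u in edge})
    kept = set(rng.sample(nodes, k))
    return [(u, v) for u, v in edges if u in kept and v in kept]

def random_edge_sample(edges, k, rng):
    """Keep k random edges (and, implicitly, their endpoints)."""
    return rng.sample(edges, k)

def degree_distribution(edges):
    """Map each degree value to the number of nodes having that degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())
```

Comparing `degree_distribution(edges)` with the distribution of each sample, using a seeded `random.Random` for repeatability, gives one structural measure of how faithful a sample is. Such a measure does not say whether the sampled node-link diagram still looks like the original, which is the question the user studies address.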
In this paper, we present a novel visual analytics system called NameClarifier to interactively disambiguate author names in publications by keeping humans in the loop. Specifically, NameClarifier quantifies and visualizes the similarities between ambiguous names and those that have been confirmed in digital libraries. The similarities are calculated using three key factors, namely, co-authorships, publication venues, and temporal information. Our system estimates all possible allocations, and then provides visual cues to users to help them validate every ambiguous case. By looping users into the disambiguation process, our system can achieve more reliable results than general data mining models for highly ambiguous cases. In addition, once an ambiguous case is resolved, the result is instantly added back to our system and serves as an additional cue for all the remaining unidentified names. In this way, we open up the black box in traditional disambiguation processes, and help intuitively and comprehensively explain why the corresponding classifications should hold. We present two use cases and an expert review to demonstrate the effectiveness of NameClarifier.
The egocentric analysis of dynamic networks focuses on discovering the temporal patterns of a subnetwork around a specific central actor (i.e., an ego-network). These types of analyses are useful in many application domains, such as social science and business intelligence, providing insights about how the central actor interacts with the outside world. We present EgoLines, an interactive visualization to support the egocentric analysis of dynamic networks. Using a “subway map” metaphor, a user can trace an individual actor over the evolution of the ego-network. The design of EgoLines is grounded in a set of key analytical questions pertinent to egocentric analysis, derived from our interviews with three domain experts and general network analysis tasks. We demonstrate the effectiveness of EgoLines in egocentric analysis tasks through a controlled experiment and a case study with a domain expert.
The ego-network, which represents relationships between a specific individual, i.e., the ego, and the people connected to it, i.e., the alters, is a critical target of study in social network analysis. Evolutionary patterns of ego-networks over time provide valuable insights for many domains such as sociology, anthropology, and psychology. However, the analysis of dynamic ego-networks remains challenging due to their complicated time-varying graph structures; for example, alters come and leave, ties grow stronger and fade away, and alter communities merge and split. Most existing dynamic graph visualization techniques focus on topological changes of the entire network, which is not adequate for egocentric analytical tasks. In this paper, we present egoSlider, a visual analysis system for exploring and comparing dynamic ego-networks. egoSlider provides a holistic picture of the data through multiple interactively coordinated views, revealing ego-network evolutionary patterns at three different levels: a macroscopic level for summarizing the entire ego-network data, a mesoscopic level for overviewing specific individuals’ ego-network evolutions, and a microscopic level for displaying detailed temporal information of egos and their alters. We demonstrate the effectiveness of egoSlider with a usage scenario based on the DBLP publication records. In addition, a controlled user study indicates that, in general, egoSlider outperforms a baseline visualization of dynamic networks for completing egocentric analytical tasks.
In this paper, we introduce a novel visualization method which allows people to explore, compare and refine the major communities in a large network. We first detect major communities in a network using data mining and community analysis methods. Then, the statistical attributes of each community, the relational strength between communities, and the boundary nodes connecting those communities are computed and stored. We propose a novel method based on the Voronoi treemap that encodes each community as a polygon and uses the relative positions of the polygons to encode their relational strengths. Different community attributes can be encoded by polygon shapes, sizes and colors. A corner-cutting method is further introduced to adjust the smoothness of polygons based on a certain community attribute. To accommodate the boundary nodes, the gaps between the polygons are widened by a polygon-shrinking algorithm such that the boundary nodes can be conveniently embedded into the newly created spaces. The method is very efficient, enabling users to test different community detection algorithms, fine-tune the results, and explore the fuzzy relations between communities interactively. The case studies with two real data sets demonstrate that our approach can provide a visual summary of major communities in a large network, and help people better understand the characteristics of each community and inspect various relational patterns between communities.
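The two polygon operations mentioned above can be sketched under stated assumptions. The abstract does not specify the corner-cutting or shrinking algorithms, so the code below substitutes Chaikin's corner-cutting scheme and a uniform scaling toward the vertex centroid as stand-ins; the function names and the `factor` parameter are illustrative, not the paper's.

```python
def chaikin(polygon, iterations=1):
    """Smooth a closed polygon with Chaikin's corner cutting: each edge
    is replaced by the two points at 1/4 and 3/4 of its length."""
    pts = list(polygon)
    for _ in range(iterations):
        new_pts = []
        n = len(pts)
        for i in range(n):
            (x0, y0), (x1, y1) = pts[i], pts[(i + 1) % n]
            new_pts.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            new_pts.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        pts = new_pts
    return pts

def shrink(polygon, factor):
    """Scale a polygon toward the mean of its vertices; a factor below 1
    pulls every vertex inward and opens a gap around the polygon."""
    cx = sum(x for x, _ in polygon) / len(polygon)
    cy = sum(y for _, y in polygon) / len(polygon)
    return [(cx + factor * (x - cx), cy + factor * (y - cy))
            for x, y in polygon]
```

In this sketch, the number of `iterations` plays the role of a smoothness control that could be tied to a community attribute, and `factor` controls how much space is freed between neighboring polygons for boundary nodes.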