
Data Visualization: Data visualization is the process o


Author: 禅与计算机程序设计艺术

1. Introduction

Data visualization represents complex information graphically so that it is easier to interpret. It helps businesses and researchers gain insights from large datasets by revealing statistical trends, patterns, correlations, and outliers. This article looks at four well-known data visualization tools: Tableau, D3.js, Matplotlib, and ggplot2 (Tableau is a commercial product; the other three are open source). It provides examples of using them to create engaging and informative visualizations. We also cover essential concepts such as color theory, scales, and design principles, and consider how these tools can support decision-making and brand awareness.

2. Concepts and Terminology

Before examining each tool, it helps to understand some fundamental terminology and concepts of data visualization.

2.1 Types of Visual Variables

Three main types of variables appear in data visualization:

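Quantitative: These variables measure things with numeric values, such as height, weight, or sales figures. They are usually represented along an axis and plotted against other variables. Typical charts include bar charts, line charts, scatter plots, histograms, box plots, and heat maps.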

Categorical: These variables denote groups or characteristics rather than numerical values. They are typically depicted through colors, shapes, line styles, text, symbols, and images. Examples include pie charts, treemaps, and word clouds.

Ordinal: These variables are ordered categories, like the ratings used in surveys. The values have a meaningful order, but the distance between adjacent levels is not defined, so they establish relative rankings rather than measurements. Examples include star ratings, ranking scales, and qualitative rating schemes.

The three variable types can be combined in many ways to present complex datasets clearly. For example, stacked bar charts show how different categories contribute to a total across another dimension, while grouped scatter plots highlight specific subsets, such as points above a threshold chosen by the analyst.
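As a rough illustration of how the three variable types behave differently, the sketch below sorts an ordinal field by its declared rank, sums a quantitative field, and lists the distinct values of a categorical field. The records, the field names, and the `RATING_ORDER` list are all assumptions made up for this example.

```python
# Hypothetical survey records; field names and values are made up for illustration.
records = [
    {"region": "North", "sales": 120.5, "rating": "good"},
    {"region": "South", "sales": 98.0, "rating": "poor"},
    {"region": "East", "sales": 143.2, "rating": "excellent"},
    {"region": "West", "sales": 110.0, "rating": "fair"},
]

# Ordinal values carry an order, but it has to be declared explicitly;
# alphabetical sorting would put "excellent" first.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def sort_by_rating(rows):
    """Sort rows by the rank of their ordinal rating, lowest first."""
    rank = {label: position for position, label in enumerate(RATING_ORDER)}
    return sorted(rows, key=lambda row: rank[row["rating"]])


def total_sales(rows):
    """Quantitative values support arithmetic such as sums and means."""
    return sum(row["sales"] for row in rows)


def distinct_regions(rows):
    """Categorical values only support grouping and counting."""
    return sorted({row["region"] for row in rows})


print([row["rating"] for row in sort_by_rating(records)])
print(total_sales(records))
print(distinct_regions(records))
```

The same distinction decides the visual encoding: quantitative fields go on an axis, categorical fields map to hue or shape, and ordinal fields need an encoding that preserves their order.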

2.2 Color

Color plays a crucial role in data visualization because it carries much of the message. Choosing an effective palette is difficult, however. The following guidelines help select color schemes that improve clarity and visual appeal.

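Use color sparingly. Keeping a chart to no more than five distinct colors reduces visual load and improves readability, so key information stands out without the chart becoming cluttered. Within that limit, consider a set of clearly distinguishable, high-contrast colors.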

Select colors based on context. Ensure that your color choices reflect what the data represents. To show a trend across an ordered variable such as time, a sequential palette (for example, light to dark blue) is easy to read. To emphasize differences among groups, choose clearly contrasting hues, but avoid relying on red versus green alone, since that pair is hard to distinguish for many people with color vision deficiency.

Test the color scheme. Use analytical tools to assess candidate palettes, compare alternative combinations side by side, and check them with members of your intended audience. Color vision deficiency simulators can show whether the chosen colors still work for viewers who perceive color differently.

Avoid excessive saturation. Saturation reflects a color's purity, and fully saturated colors across a whole chart compete for attention. Prefer softer tints of your base hues, but do not desaturate so far that data points blend indistinguishably into neutral tones.
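To make the idea of saturation concrete, here is a minimal sketch that softens a color with Python's standard `colorsys` module. The `desaturate` helper and the sample color are assumptions for illustration, not part of any tool covered in this article.

```python
import colorsys


def desaturate(rgb, factor):
    """Scale the saturation of an (r, g, b) color with channels in [0, 1].

    factor=1.0 leaves the color unchanged; factor=0.0 yields a neutral gray.
    """
    hue, lightness, saturation = colorsys.rgb_to_hls(*rgb)
    return colorsys.hls_to_rgb(hue, lightness, saturation * factor)


pure_red = (1.0, 0.0, 0.0)
softer_red = desaturate(pure_red, 0.5)   # a muted red that keeps its hue
gray = desaturate(pure_red, 0.0)         # saturation removed entirely

print(softer_red)
print(gray)
```

A factor somewhere between the two extremes usually keeps the hue recognizable while reducing visual intensity; the right value depends on the background and on how many series share the chart.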

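2.3 Scales

Scales map quantitative data onto one dimension of the visualization canvas. Different scales produce different output: a linear scale represents changes in quantity along a straight line, a logarithmic scale shows exponential growth rates (equal ratios occupy equal distances), and a categorical scale orders elements alphabetically or by frequency.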

Most data visualization tools provide a default scale that is adequate for general use. You can customize the scale to suit the nature of your dataset and the kind of visualization you need. For historical stock prices, for example, a logarithmic scale can be useful because equal percentage changes take up equal distance, so movements at low price levels remain visible next to later, much larger values.
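The difference between a linear and a logarithmic scale can be sketched in a few lines of Python. The function names, the price values, and the 300-pixel range below are assumptions chosen for illustration; real tools expose scales through their own APIs.

```python
import math


def linear_scale(value, domain, pixel_range):
    """Map value from domain (d0, d1) to pixel_range (r0, r1) proportionally."""
    d0, d1 = domain
    r0, r1 = pixel_range
    return r0 + (value - d0) / (d1 - d0) * (r1 - r0)


def log_scale(value, domain, pixel_range):
    """Map value using base-10 logarithms; all domain values must be positive."""
    d0, d1 = domain
    log_domain = (math.log10(d0), math.log10(d1))
    return linear_scale(math.log10(value), log_domain, pixel_range)


# Hypothetical prices spanning three orders of magnitude, drawn on a 300-pixel axis.
for price in [1, 10, 100, 1000]:
    print(
        price,
        round(linear_scale(price, (1, 1000), (0, 300)), 1),
        round(log_scale(price, (1, 1000), (0, 300)), 1),
    )
```

On the linear scale the first three prices are squeezed into the first tenth of the axis, while the logarithmic scale gives each tenfold increase the same distance.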

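2.4 Design Principles

Design guidelines are practical recommendations for producing clear and attractive visualizations. The core principles include balance, clarity, attention to detail, hierarchy, contrast, and unity. Following them and applying them consistently across visualizations improves clarity and reduces ambiguity.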

Some guidelines for designing data visualizations include:

Keep it simple. Use straightforward visuals free of distracting elements. Avoid intricate or elaborate designs unless they are required, and prioritize simplicity and ease of use.

Be consistent. Keep appearance and encoding uniform across all visualizations, and adhere to a standard set of fonts, layouts, and typographic conventions within the organization.

Give useful feedback. Tell users what actions they can take and what they should expect to see in response. Interactive features such as tooltips, hints, and zooming allow quick exploration of the data.

Design for accessibility. Consider your target audience and plan for accessibility requirements from the start. Check your visualizations against the Web Content Accessibility Guidelines (WCAG), which define standards for making web content accessible to people with disabilities.
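One accessibility check that is easy to automate is color contrast. The sketch below follows the relative luminance and contrast ratio definitions from WCAG 2.x; the function names and the sample colors are assumptions for this example. WCAG asks for a ratio of at least 4.5:1 for normal-size text and 3:1 for graphical objects.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB color given as 0-255 integers (WCAG 2.x definition)."""

    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(channel) for channel in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    luminances = (relative_luminance(color_a), relative_luminance(color_b))
    lighter, darker = max(luminances), min(luminances)
    return (lighter + 0.05) / (darker + 0.05)


WHITE = (255, 255, 255)
label_gray = (119, 119, 119)  # hypothetical axis-label color

ratio = contrast_ratio(label_gray, WHITE)
print(f"{ratio:.2f}", "passes 4.5:1" if ratio >= 4.5 else "fails 4.5:1")
```

A check like this can run over every text and background pair in a chart theme before it is published.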

3. Implementation

We now walk through the implementation steps for each data visualization tool.

3.1 Tableau

Tableau is a business intelligence platform focused on data visualization. With an intuitive drag-and-drop interface and strong analytics features, it is one of the most popular tools in the field. To begin, install Tableau Desktop.

Once it is installed, create a new workbook and name it as you like. Click "Connect to Data" and select your data source. Once connected, click "Sheet 1" at the bottom left to open a worksheet. Drag fields from the Data pane onto the sheet to build a visualization, then adjust its properties and formatting through the "Format" options. Save the file and share it with colleagues or stakeholders for review and feedback. Exact menu and pane labels vary between Tableau versions.

Below is a summary of the distinct functionalities and operations commonly offered by Tableau:

Creating New Workbooks and Sharing Dashboards

You can create new workbooks and save them locally or remotely for sharing. A straightforward way is to go to the Home tab and start a new workbook with the "New" button. To share a dashboard with others, you must choose a server option.

Connecting to Data Sources

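Tableau supports many kinds of data sources and is not limited to a single storage format. Click the "Connect to Data" button in the top menu and select the corresponding data source file or database table from the drop-down menu. Depending on the properties and type of the selected source, you may be asked for additional credentials or connection settings.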

Manipulating and Transforming Data

Tableau includes built-in transformations such as filtering, grouping, sorting, and calculations. Set up calculated fields, filters, and measures to analyze your data as required. You can also connect directly to external data sources through plugins, or import CSV and Excel files.

Customizing Visualizations

Each visualization comprises multiple layers that can be configured individually. Click a layer and select Edit from the context menu to modify its properties. The Format tab lets you adjust font styles, colors, sizes, and other display preferences.

Publishing Your Workbook Online

Distribute your completed workbook to colleagues in your organization or to the public by publishing it to Tableau Server or Tableau Online. Configure permissions, access controls, and scheduling options according to your needs.

Overall, Tableau offers a highly versatile environment that allows users to create interactive and dynamic visualizations from a variety of datasets. With its intuitive user interface, users can easily manage and transform raw data, enabling them to generate stunning and insightful visualizations without requiring programming knowledge.

3.2 D3.js

D3.js is a JavaScript library for creating data visualizations with HTML, SVG, and CSS. Rather than shipping pre-built chart types, it provides low-level building blocks such as selections, data binding, scales, and event handling, which is why it is widely used for custom, interactive visualizations. Let's set up D3.js in a project and explore what it can do.

First, load the library. Copy the following snippet into your HTML file; the script tag in the head section loads D3.js from its CDN.

    <!DOCTYPE html>
    <html lang="en">
      <head>
        <meta charset="UTF-8" />
        <title>My First D3.js Chart</title>

        <!-- Load D3.js -->
        <script src="https://d3js.org/d3.v6.min.js"></script>
      </head>

      <body>
        ...
      </body>
    </html>

Inside the body tag, create our first SVG container element by adding the following markup:

    <svg width="960" height="500"></svg>

This creates an SVG element 960 pixels wide and 500 pixels high. Next, we define the dataset and the base selection for the chart. Place the following JavaScript in a script tag after the SVG element, so that the element exists when the code runs:

    const dataset = [
      { name: "Alice", age: 28 },
      { name: "Bob", age: 35 },
      { name: "Charlie", age: 40 },
      { name: "David", age: 30 },
      { name: "Eve", age: 27 }
    ];

    // Define the margins for the chart
    const margin = { top: 50, right: 50, bottom: 50, left: 50 };

    // Calculate the inner width and height of the chart (SVG size minus margins)
    const svgWidth = +document.querySelector("svg").getAttribute("width") - margin.left - margin.right;
    const svgHeight = +document.querySelector("svg").getAttribute("height") - margin.top - margin.bottom;

    // Create a wrapper group for the chart area, offset by the left and top margins
    const chartGroup = d3.select("svg")
      .append("g")
      .attr("transform", `translate(${margin.left}, ${margin.top})`);

The sample dataset has two attributes per record: name and age. We then compute the size of the drawing area and append a dedicated group element for the chart contents; everything drawn later is appended to this group. Note that, despite their names, svgWidth and svgHeight hold the inner dimensions: the SVG container's width and height minus the margins on each side (860 by 400 pixels here). Because the group is translated by the left and top margins, coordinates inside it are measured from the top-left corner of that inner area.
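The margin arithmetic is easy to check outside the browser. The following Python sketch restates it with the same numbers; the `Margin` class and the `inner_size` helper are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Margin:
    top: int
    right: int
    bottom: int
    left: int


def inner_size(outer_width, outer_height, margin):
    """Return the (width, height) of the drawing area left after removing the margins."""
    return (
        outer_width - margin.left - margin.right,
        outer_height - margin.top - margin.bottom,
    )


margin = Margin(top=50, right=50, bottom=50, left=50)
print(inner_size(960, 500, margin))  # (860, 400)
```

Keeping the margins in one object means axis labels and titles can be placed in the margin area without touching the code that positions the data marks.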

Next, we build the chart elements from the dataset and selections defined above. Add the following code after the previous snippet.

    // Calculate the maximum age first: the X positions below depend on it
    const maxAge = d3.max(dataset, (d) => d.age);

    // Create the tooltip div, hidden until a circle is hovered
    const tooltip = d3.select("body")
      .append("div")
      .attr("class", "tooltip")
      .style("position", "absolute")
      .style("z-index", "10")
      .style("pointer-events", "none")
      .style("opacity", 0);

    // Add a title to the chart. chartGroup is already translated by the
    // margins, so coordinates here are relative to the inner chart area.
    chartGroup.append("text")
      .attr("x", svgWidth / 2)
      .attr("y", -margin.top / 4)
      .attr("text-anchor", "middle")
      .style("font-size", "32px")
      .text("Age vs Name");

    // Draw vertical gridlines and labels across the age range
    for (let i = 0; i <= Math.ceil(maxAge); i += 10) {
      const x = svgWidth * (i / maxAge);

      chartGroup.append("line")
        .attr("x1", x)
        .attr("y1", 0)
        .attr("x2", x)
        .attr("y2", svgHeight)
        .style("stroke", "#ccc");

      chartGroup.append("text")
        .attr("x", x)
        .attr("y", svgHeight + 30)
        .style("text-anchor", "middle")
        .text(i);
    }

    // Append one circle per person
    const circles = chartGroup.selectAll("circle")
      .data(dataset)
      .enter()
      .append("circle")
      .attr("cx", (d) => svgWidth * (d.age / maxAge)) // Map age to horizontal position
      .attr("cy", (_, i) => svgHeight / dataset.length * (i + 1)) // Stack circles vertically
      .attr("r", 20); // Fixed radius for each circle

    // Show a tooltip with the person's name and age while hovering
    circles.on("mouseover", (event, datum) => {
      tooltip
        .style("opacity", 1)
        .style("left", `${event.pageX + 10}px`)
        .style("top", `${event.pageY + 10}px`)
        .text(`Name: ${datum.name}, Age: ${datum.age}`);
    });

    // Hide the tooltip when the pointer leaves the circle
    circles.on("mouseout", () => tooltip.style("opacity", 0));

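In this block, a text element provides the chart title. We then create a set of circles, one for each person in the dataset. data() binds the array to the selection, and enter() yields a placeholder for every record that has no element yet, so elements are added dynamically per record. Each circle's cx attribute maps the age value to a horizontal position in the SVG container, cy sets the vertical position from the record's index in the dataset, and the radius is fixed at 20 pixels. Each circle also gets mouseover and mouseout handlers: hovering shows a tooltip with the person's name and age, and moving the pointer away hides it again.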

The block also sets up the supporting pieces of the visualization: it computes the maximum age, which scales positions along the X axis, creates the tooltip div, and draws vertical gridlines with labels across the age range.
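The positioning arithmetic used for the circles and gridlines can be verified independently of the browser. The Python sketch below restates those formulas for the sample dataset, assuming the 860 by 400 pixel inner area computed earlier; the function names are hypothetical and this is not D3 code.

```python
dataset = [
    {"name": "Alice", "age": 28},
    {"name": "Bob", "age": 35},
    {"name": "Charlie", "age": 40},
    {"name": "David", "age": 30},
    {"name": "Eve", "age": 27},
]

INNER_WIDTH = 860   # assumed: 960 minus left and right margins of 50
INNER_HEIGHT = 400  # assumed: 500 minus top and bottom margins of 50


def circle_positions(rows, width, height):
    """Return (name, cx, cy): age scaled to the width, index spread over the height."""
    max_age = max(row["age"] for row in rows)
    return [
        (row["name"], width * row["age"] / max_age, height / len(rows) * (index + 1))
        for index, row in enumerate(rows)
    ]


def gridline_positions(max_age, width, step=10):
    """Return (age, x) pairs for gridlines every `step` years from 0 up to max_age."""
    return [(age, width * age / max_age) for age in range(0, max_age + 1, step)]


for name, cx, cy in circle_positions(dataset, INNER_WIDTH, INNER_HEIGHT):
    print(f"{name}: cx={cx:.1f}, cy={cy:.1f}")
print(gridline_positions(40, INNER_WIDTH))
```

Because positions are divided by the maximum age, the oldest person always lands on the right edge of the inner area. In a larger project, a D3 scale such as d3.scaleLinear would normally replace this hand-written proportion.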

When the page loads, users see a simple dot plot of age by name. Hovering over a circle shows a tooltip with that person's name and age. The visualization can be extended with further elements such as labels, markers, gradients, animations, and transitions.

Overall, D3.js offers a low-level API for building highly customizable data visualizations with HTML, SVG, and CSS. Its building blocks and interactive features make it well suited to bespoke designs, although a chart usually takes more code than in a higher-level charting library. D3.js remains one of the most widely used tools for dynamic data visualization on the web.

3.3 Matplotlib

Matplotlib is a Python library for creating 2D graphics. It can produce a variety of charts, including line graphs, bar charts, scatter plots, and histograms. This section shows how to generate several common chart types with Matplotlib.

First, install Matplotlib with pip. Open a terminal and run:

    pip install matplotlib

After installing, import the pyplot module and define some data to visualize. Run the following code to produce a scatter plot.

    import matplotlib.pyplot as plt

    x_values = [1, 2, 3, 4, 5]
    y_values = [1, 4, 9, 16, 25]

    plt.scatter(x_values, y_values)

    plt.show()

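This code produces a scatter plot of the points given by x_values and y_values. Calling plt.show() displays the figure on screen. Try changing the arguments to scatter() to explore different marker styles, or call plt.savefig() before plt.show() to save the chart to a file.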

To create a histogram, try running the following code:

    import matplotlib.pyplot as plt
    import numpy as np

    np.random.seed(0)  # Seed the random number generator for reproducibility
    population_ages = np.random.randint(low=20, high=60, size=500)

    # 20 equal-width bins, with black outlines so adjacent bars are distinguishable
    plt.hist(population_ages, bins=20, edgecolor='black')

    plt.xlabel('Age')
    plt.ylabel('Frequency')
    plt.title('Population Ages Histogram')

    plt.show()

This produces a histogram of 500 randomly generated ages using 20 equal-width bins with black-edged bars. NumPy is imported to seed the random number generator and generate the ages; note that randint excludes the upper bound, so the values range from 20 to 59. You can adjust the bin count or other arguments passed to hist().
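To see what equal-width binning does, here is a minimal pure-Python sketch of the counting step. It is an illustration written for this article, not Matplotlib's implementation; the `histogram_counts` helper is a hypothetical name, and the ages are generated with the standard `random` module, so the values differ from the NumPy example.

```python
import random


def histogram_counts(values, bins):
    """Count values in `bins` equal-width intervals spanning min(values)..max(values).

    Each interval includes its left edge; the last one also includes its right
    edge so that the maximum value is counted.
    """
    lowest, highest = min(values), max(values)
    if lowest == highest:
        raise ValueError("values must span a non-zero range")
    width = (highest - lowest) / bins
    counts = [0] * bins
    for value in values:
        index = min(int((value - lowest) / width), bins - 1)
        counts[index] += 1
    edges = [lowest + i * width for i in range(bins + 1)]
    return counts, edges


random.seed(0)  # Seed for reproducibility
ages = [random.randrange(20, 60) for _ in range(500)]  # 20 to 59 inclusive

counts, edges = histogram_counts(ages, bins=20)
print(counts)
print(edges[0], edges[-1])
```

Changing the number of bins changes the width of every interval, which is why the same data can look smooth or jagged depending on the bin count.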

Finally, let's produce a line graph from a synthetic dataset. Run the following code:

    import matplotlib.pyplot as plt

    time = ['Day 1', 'Day 2', 'Day 3', 'Day 4']
    cases = [1, 2, 3, 4]

    # String x values are treated as categories and labeled automatically
    plt.plot(time, cases)

    # Show only whole-number ticks on the y axis
    plt.yticks([1, 2, 3, 4])

    plt.xlabel('Time')
    plt.ylabel('Number of Cases')
    plt.title('Coronavirus Case Count')

    plt.show()

This produces a line plot of synthetic COVID-19 case counts over time, with axis labels and a title. You can change the input data or adjust the plot styling as you like.

Overall, Matplotlib makes it easy to produce basic chart types with little code. It offers less built-in interactivity than D3.js or Tableau, but it remains a flexible choice for small to medium-sized datasets and for concise, straightforward reports and visualizations.
