Get inspired - Move beyond lines and bar graphs with this guide.
The best chart for your data depends on the data types of the variables in your visualisation. There are 3 main data types most people use:
For a summary of these data types, you may read this article here.
For the purposes of this guide, it may be useful to think of ordinal data as categorical data. Time series data will not be discussed in this guide.
The combination of these two data types in a visualisation determines which type of chart works best. In this guide, I will focus on the best charts for visualising large datasets with:
A histogram shows the distribution of a numeric variable, and it requires only one numeric variable as input. The x-axis is split into bins of equal width (e.g. 1-10, 11-20, 21-30, ...) and the y-axis shows the number of observations in each bin.
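The sketch below shows the counting a histogram performs before anything is drawn. It uses plain Python and no plotting library. The function name `histogram_counts` and the sample ages are hypothetical. Bins are assumed to be half-open, so with `start=1` and `bin_width=10` on integer data, the first bin covers 1-10, as in the example above.

```python
from collections import Counter

def histogram_counts(values, start, bin_width):
    """Return {bin_lower_edge: count} for equal-width bins [edge, edge + bin_width)."""
    counts = Counter()
    for v in values:
        k = int((v - start) // bin_width)  # index of the bin that v falls into
        counts[start + k * bin_width] += 1
    return dict(sorted(counts.items()))

# Hypothetical sample data
ages = [3, 7, 12, 15, 18, 21, 22, 29, 35]
print(histogram_counts(ages, start=1, bin_width=10))
# {1: 2, 11: 3, 21: 3, 31: 1}
```

Plotting libraries do this step for you. For example, matplotlib's `pyplot.hist` accepts a `bins` argument. In practice, you usually only choose the bin width or the number of bins.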
The shape of a histogram reveals how the dataset is distributed. Here are some of the most commonly seen distributions. Each distribution tells you something different about the variable. For example, an edge peak distribution could mean either that there is an outlier you should take note of or that your dataset was processed incorrectly.
If you would like more granular insights into the distribution, a variation of the histogram is to layer a separate histogram for each value of a categorical variable, each in its own color. This shows how the distribution of values differs across the categories of that variable. Alternatively, you may use a box plot, as shown in the section below.
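For a layered histogram, the data is binned once per category. Each category's counts become one colored layer. This is a minimal plain-Python sketch. The function name `layered_histogram_counts`, the half-open bins and the category labels "A" and "B" are all hypothetical.

```python
from collections import defaultdict

def layered_histogram_counts(rows, start, bin_width):
    """rows: iterable of (category, value). Returns {category: {bin_edge: count}}."""
    layers = defaultdict(lambda: defaultdict(int))
    for category, value in rows:
        # Same equal-width binning for every category, so the layers line up
        edge = start + int((value - start) // bin_width) * bin_width
        layers[category][edge] += 1
    return {cat: dict(sorted(bins.items())) for cat, bins in sorted(layers.items())}

rows = [("A", 3), ("A", 12), ("B", 14), ("B", 18), ("B", 25)]
print(layered_histogram_counts(rows, start=1, bin_width=10))
# {'A': {1: 1, 11: 1}, 'B': {11: 2, 21: 1}}
```

Using the same bin edges for every category matters. Without shared edges, the overlaid bars would not be comparable.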
A scatterplot is used to study the relationship between 2 numeric variables. It is often used to analyse linear relationships, and is therefore often accompanied by a correlation coefficient.
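The correlation coefficient usually shown next to a scatterplot is Pearson's r. Its value runs from -1 (perfect negative linear relationship) to 1 (perfect positive linear relationship). Below is a minimal plain-Python sketch of it. The function name `pearson_r` and the sample data are hypothetical. Python 3.10+ also ships `statistics.correlation`, which computes the same quantity.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel in the ratio)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

hours = [1, 2, 3, 4, 5]        # hypothetical study hours
scores = [52, 55, 61, 64, 70]  # hypothetical test scores
print(round(pearson_r(hours, scores), 3))
```

An r close to 0 only rules out a *linear* relationship. A curved pattern in the scatterplot can still have a low r, which is one reason to look at the plot and not just the number.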
If you are using discrete numerical data, you might want to consider using a ridge plot instead.
Adding marginal distributions to your scatterplot (e.g. a histogram along each axis) also lets you see the distribution of each variable along the x- and y-axes.
Scatterplots are most useful to your end-users when they are interactive. Ideally, end-users should be able to hover their mouse over a single data point to find out more about it.
A violin plot lets you analyse the distribution of a numeric variable across several categories or ordinal variables. Its shape represents the density estimate of the variable: the more observations there are around a given value, the wider the violin is at that value.
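The density estimate behind a violin plot is typically a kernel density estimate (KDE). Below is a minimal Gaussian KDE sketch in plain Python. It assumes a hand-picked bandwidth, whereas plotting libraries usually choose one with a rule of thumb such as Scott's rule. The function name `kde_density` and the sample values are hypothetical. This is not SciPy's `scipy.stats.gaussian_kde`.

```python
import math

def kde_density(values, bandwidth):
    """Return a function estimating the density of `values` with a Gaussian kernel."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, centred on that observation
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density

# One category's values; a violin plot mirrors this curve around a vertical axis,
# so the violin is widest where the density is highest.
sample = [2.0, 2.5, 3.0, 3.1, 3.2, 6.0]
f = kde_density(sample, bandwidth=0.5)
for x in [1, 3, 6]:
    print(x, round(f(x), 3))
```

A larger bandwidth gives a smoother, blobbier violin. A smaller one follows individual observations more closely.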
It is very similar to a boxplot, but allows a deeper understanding of the distribution.
While the violin plot gives you a very granular view of the distribution, including a boxplot within the violin plot has the added benefit of showing the median and quartiles at a glance.
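The boxplot drawn inside a violin summarises the data with a handful of numbers. The sketch below computes them with Python's `statistics.quantiles`. It assumes the common convention of whiskers extending to the most extreme points within 1.5 × IQR of the box. Tools differ slightly in how they interpolate quartiles, which is why `method="inclusive"` is chosen explicitly here. The function name `box_summary` and the data are hypothetical.

```python
import statistics

def box_summary(values):
    """Median, quartiles, 1.5*IQR whisker ends and outliers, as a typical box plot draws them."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # most extreme points still within the fences
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

print(box_summary([1, 2, 3, 4, 5, 6, 7, 8, 30]))
```

Here the value 30 falls beyond the upper fence, so it would be drawn as an individual outlier point, not as part of the whisker.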
A heatmap shows the magnitude of a phenomenon as color in two dimensions. The variation in color may be in hue or intensity. This gives your end-user a clear picture of how the value varies across different categories.
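Before a heatmap can be colored, the raw records have to be aggregated into a grid: one cell per pair of categories. Each cell value is then scaled to a color intensity. This is a minimal plain-Python sketch that assumes the mean as the aggregate and a linear 0-1 intensity scale. The function names `heatmap_grid` and `to_intensity` and the region/weekday data are hypothetical.

```python
from collections import defaultdict

def heatmap_grid(records):
    """records: (row_category, col_category, value). Returns (rows, cols, grid of cell means)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r, c, v in records:
        sums[(r, c)] += v
        counts[(r, c)] += 1
    rows = sorted({r for r, _ in sums})
    cols = sorted({c for _, c in sums})
    # Cells with no records stay empty (None) instead of being treated as zero
    grid = [
        [sums[(r, c)] / counts[(r, c)] if (r, c) in counts else None for c in cols]
        for r in rows
    ]
    return rows, cols, grid

def to_intensity(grid):
    """Scale cell values linearly to [0, 1]; a plotting library maps this to a color."""
    vals = [v for row in grid for v in row if v is not None]
    lo, hi = min(vals), max(vals)
    span = (hi - lo) or 1.0  # avoid division by zero when all cells are equal
    return [[None if v is None else (v - lo) / span for v in row] for row in grid]

records = [("North", "Mon", 10), ("North", "Mon", 14), ("North", "Tue", 20),
           ("South", "Mon", 5), ("South", "Tue", 30)]
rows, cols, grid = heatmap_grid(records)
print(rows, cols, grid)      # ['North', 'South'] ['Mon', 'Tue'] [[12.0, 20.0], [5.0, 30.0]]
print(to_intensity(grid))
```

In practice, a library such as matplotlib (`imshow`) or seaborn (`heatmap`) takes a grid like this and applies the color map for you.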
Similar to the scatterplot above, a grouped scatterplot uses the color of each dot to represent a categorical variable, allowing your end-user to see how each category is distributed across the 2 numerical variables.
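One common way to draw a grouped scatterplot is to split the points into one series per category and give each series its own color. This is a minimal plain-Python sketch of that grouping step. The function name `group_points` and the sample points are hypothetical. The palette entries are matplotlib's "tab:" color names, and colors are cycled when there are more categories than palette entries.

```python
def group_points(points, palette):
    """points: (x, y, category). Returns {category: (color, xs, ys)}, one series per color."""
    series = {}
    for x, y, cat in points:
        if cat not in series:
            # Assign colors in order of first appearance, cycling through the palette
            color = palette[len(series) % len(palette)]
            series[cat] = (color, [], [])
        series[cat][1].append(x)
        series[cat][2].append(y)
    return series

palette = ["tab:blue", "tab:orange", "tab:green"]
points = [(1, 2, "A"), (3, 4, "B"), (5, 6, "A")]
print(group_points(points, palette))
# {'A': ('tab:blue', [1, 5], [2, 6]), 'B': ('tab:orange', [3], [4])}
```

Each series would then be drawn with its own scatter call (for example, matplotlib's `Axes.scatter`) and labelled with its category, so the legend explains the colors.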