Calculating Percentages with dplyr and geom_text in R
=====================================================================
This article will explore how to calculate percentages using the popular data manipulation library dplyr and visualization library ggplot2. We’ll use a sample dataset to demonstrate the process of grouping, calculating proportions, and displaying results as percentages.
Introduction
The following example uses two popular R packages, dplyr and ggplot2, and a small data frame with two factor variables: Language and Agegrp. We first calculate the percentage of each age group within each language with dplyr, then plot the result with ggplot2.
Prerequisites
Before proceeding, ensure you have the necessary R packages installed:
install.packages("dplyr")
install.packages("ggplot2")
This installs the packages, which only needs to be done once. It does not load them; they are loaded with library() at the start of the next code block.
Sample Data
We’ll use a small sample dataset for this example. The data frame df contains two factor variables: Language, with the levels “GER” and “ENG”, and Agegrp, with four age groups (10-19, 20-29, 30-39 and 40+). There are 10 rows per language.
library(dplyr)
library(ggplot2)
df <- data.frame(
  # 10 GER rows followed by 10 ENG rows
  Language = factor(rep(1:2, each = 10), levels = 1:2, labels = c("GER", "ENG")),
  Agegrp = factor(c(1, 2, 3, 1, 2, 4, 1, 2, 3, 2, 3, 3, 3, 3, 1, 1, 2, 1, 1, 4),
                  levels = 1:4, labels = c("10-19", "20-29", "30-39", "40+"))
)
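Before computing percentages, it can help to look at the raw counts they are built from. The sketch below uses only base R's table() (no extra packages) and recreates df so it runs on its own; counts is an illustrative variable name.

```r
# Recreate the sample data so this snippet runs on its own
df <- data.frame(
  Language = factor(rep(1:2, each = 10), levels = 1:2, labels = c("GER", "ENG")),
  Agegrp = factor(c(1, 2, 3, 1, 2, 4, 1, 2, 3, 2, 3, 3, 3, 3, 1, 1, 2, 1, 1, 4),
                  levels = 1:4, labels = c("10-19", "20-29", "30-39", "40+"))
)

# Cross-tabulate: rows are age groups, columns are languages
counts <- table(df$Agegrp, df$Language)
print(counts)

# Column totals: the number of rows per language
print(colSums(counts))
```

Each language column sums to 10 here, so every within-language percentage is simply that cell's count divided by 10.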
Calculating Percentages
To calculate percentages within each group, we can use the dplyr library. Here’s a step-by-step example of how to do it:
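# Count each Agegrp x Language combination, then turn the counts into proportions
# within each Language. The result replaces df, so df now has one row per
# combination, with n holding the share of that age group within its language.
df <- df %>%
  count(Agegrp, Language) %>%    # Count occurrences for each Agegrp and Language combination
  group_by(Language) %>%         # Group the results by Language
  mutate(n = prop.table(n)) %>%  # Divide each count by its Language total (proportion from 0 to 1)
  ungroup()                      # Remove grouping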
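To see what prop.table(n) does inside each group, here is a sketch that keeps the raw counts and stores the proportion in a separate column, pct (an illustrative name, not from the original code). For a numeric vector, prop.table(x) is the same as x / sum(x), and inside group_by() the sum is taken per Language.

```r
library(dplyr)

# Recreate the sample data so this snippet runs on its own
df <- data.frame(
  Language = factor(rep(1:2, each = 10), levels = 1:2, labels = c("GER", "ENG")),
  Agegrp = factor(c(1, 2, 3, 1, 2, 4, 1, 2, 3, 2, 3, 3, 3, 3, 1, 1, 2, 1, 1, 4),
                  levels = 1:4, labels = c("10-19", "20-29", "30-39", "40+"))
)

df_pct <- df %>%
  count(Agegrp, Language) %>%   # raw counts per combination, kept in n
  group_by(Language) %>%        # each Language is the 100% base
  mutate(pct = n / sum(n)) %>%  # equivalent to prop.table(n) within the group
  ungroup()

print(df_pct)
```

For GER this gives 0.3, 0.4, 0.2 and 0.1 across the four age groups; for ENG it gives 0.4, 0.1, 0.4 and 0.1. Keeping n alongside pct makes it easy to see which count each proportion came from.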
Visualizing Results
We can use ggplot2 to visualize our results as a bar chart. The following code snippet shows how to do it:
# Create a new ggplot object with the calculated proportions
ggplot(df, aes(x = Agegrp, y = n, fill = Language)) +
  geom_col(position = 'dodge') +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Age structure: German vs. English",
       y = "Percentage of persons")
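This draws a grouped (dodged) bar chart with one bar per language inside each age group. The x-axis shows the age groups, and the y-axis shows each age group's share of that language's respondents, formatted as percentages by scales::percent. Because the proportions were computed within Language, the bars of each colour add up to 100%.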
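The title mentions geom_text(), which the chart above does not use yet. A minimal sketch of printing the percentage on top of each bar follows; it recomputes the summary so it runs on its own and names the proportion column pct (an illustrative name). position_dodge(width = 0.9) matches geom_col()'s default bar width, so each label sits over its own bar; the vjust value and the 10% headroom are arbitrary choices.

```r
library(dplyr)
library(ggplot2)

# Recreate the sample data so this snippet is self-contained
df <- data.frame(
  Language = factor(rep(1:2, each = 10), levels = 1:2, labels = c("GER", "ENG")),
  Agegrp = factor(c(1, 2, 3, 1, 2, 4, 1, 2, 3, 2, 3, 3, 3, 3, 1, 1, 2, 1, 1, 4),
                  levels = 1:4, labels = c("10-19", "20-29", "30-39", "40+"))
)

# Within-Language proportions
df_pct <- df %>%
  count(Agegrp, Language) %>%
  group_by(Language) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

p <- ggplot(df_pct, aes(x = Agegrp, y = pct, fill = Language)) +
  geom_col(position = position_dodge(width = 0.9)) +
  # Label each bar with its percentage, dodged the same way as the bars
  geom_text(aes(label = scales::percent(pct, accuracy = 1)),
            position = position_dodge(width = 0.9),
            vjust = -0.3) +
  # Extra headroom at the top so the labels are not clipped
  scale_y_continuous(labels = scales::percent,
                     expand = expansion(mult = c(0, 0.1))) +
  labs(title = "Age structure: German vs. English",
       y = "Percentage of persons")

print(p)
```

The labels are formatted from the same pct column that sets the bar heights, so they always agree with the plotted values.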
Conclusion
In this article, we’ve shown how to calculate within-group percentages with dplyr and plot them with ggplot2. The recipe is: count each combination with count(), group by the variable that defines the 100% base (here Language), and divide each count by its group total with prop.table(). The resulting proportions can be passed directly to geom_col() and displayed as percentages with scales::percent. The same pattern carries over to other grouping variables and datasets.
Additional Tips
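- Check your calculated percentages: within each group they should add up to 100% (a sum of 1 before formatting).
- Adjust the code as needed for different datasets or requirements. For example, group by Agegrp instead of Language to get the language split within each age group.
- Practice using dplyr and ggplot2 with other sample datasets to improve your skills.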
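One way to act on the first tip is to recompute the shares independently and confirm that each language sums to 1. The sketch below uses only base R: prop.table() with margin = 2 divides each column of the contingency table by its column total, which is the same within-Language proportion the dplyr pipeline computes. The name shares is illustrative.

```r
# Recreate the sample data so this snippet runs on its own
df <- data.frame(
  Language = factor(rep(1:2, each = 10), levels = 1:2, labels = c("GER", "ENG")),
  Agegrp = factor(c(1, 2, 3, 1, 2, 4, 1, 2, 3, 2, 3, 3, 3, 3, 1, 1, 2, 1, 1, 4),
                  levels = 1:4, labels = c("10-19", "20-29", "30-39", "40+"))
)

# Column proportions: each Language column is divided by its own total
shares <- prop.table(table(df$Agegrp, df$Language), margin = 2)
print(round(100 * shares, 1))

# Each language should add up to 100%
stopifnot(isTRUE(all.equal(unname(colSums(shares)), c(1, 1))))
```

If the dplyr results and these base R shares disagree, the grouping variable in group_by() is the first thing to check.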
Last modified on 2024-01-09