Labeling all or some of your data points with text can help tell a story, even when your graph is using other cues like color and size. ggplot has a couple of built-in ways of doing this, and the ggrepel package adds some more functionality to those options.
For this demo, I’ll start with a scatter plot looking at the percentage of adults with at least a four-year college degree vs. known Covid-19 cases per capita in Massachusetts counties. (The theory: A college education might mean you’re more likely to have a job that lets you work safely from home. Of course there are plenty of exceptions, and many other factors affect infection rates.)
If you want to follow along, you can get the code to re-create my sample data on page two of this article.
Creating a scatter plot with ggplot
To start, the code below loads several libraries and sets scipen = 999 so I don’t get scientific notation in my graphs:
library(ggplot2)
library(ggrepel)
library(dplyr)
options(scipen = 999)
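To see what scipen changes, here is a quick check you can run in a fresh R session. It uses only base R’s format(); the number 1e10 is just an arbitrary example.

```r
# With the default penalty (scipen = 0), R prefers the shorter scientific form
options(scipen = 0)
print(format(1e10))  # "1e+10"

# A large scipen penalty makes R choose the fixed-point form instead
options(scipen = 999)
print(format(1e10))  # "10000000000"
```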
Here is the data structure for the ma_data data frame:
head(ma_data)
                Place AdultPop Bachelors PctBachelors CovidPer100K Positivity    Region
1          Barnstable   165336     70795    0.4281887          7.0     0.0188 Southeast
2           Berkshire    92946     31034    0.3338928          9.0     0.0095      West
3             Bristol   390230    109080    0.2795275         30.8     0.0457 Southeast
4 Dukes and Nantucket    20756      9769    0.4706591         25.3     0.0294 Southeast
5               Essex   538981    212106    0.3935315         29.5     0.0406 Northeast
6            Franklin    53210     19786    0.3718474          4.7     0.0052      West
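If you don’t have the page-two code handy, a minimal stand-in can be rebuilt from the six rows printed above. This is only a sketch: it covers six counties rather than the full data set, and it recomputes PctBachelors as Bachelors / AdultPop, which reproduces the printed values.

```r
# Six-row stand-in for ma_data, copied from the head() output above
ma_data <- data.frame(
  Place = c("Barnstable", "Berkshire", "Bristol",
            "Dukes and Nantucket", "Essex", "Franklin"),
  AdultPop = c(165336, 92946, 390230, 20756, 538981, 53210),
  Bachelors = c(70795, 31034, 109080, 9769, 212106, 19786),
  CovidPer100K = c(7.0, 9.0, 30.8, 25.3, 29.5, 4.7),
  Positivity = c(0.0188, 0.0095, 0.0457, 0.0294, 0.0406, 0.0052),
  Region = c("Southeast", "West", "Southeast", "Southeast", "Northeast", "West")
)

# Share of adults with at least a four-year degree
ma_data$PctBachelors <- ma_data$Bachelors / ma_data$AdultPop

head(ma_data)
```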
The next group of code creates a ggplot scatter plot with that data, sizing points by county adult population and coloring them by region. geom_smooth() adds a linear regression line, and I also tweak a couple of ggplot design defaults. The graph is stored in a variable called ma_graph.
ma_graph <- ggplot(ma_data, aes(x = PctBachelors, y = CovidPer100K,
                                size = AdultPop, color = Region)) +
  geom_point() +
  scale_x_continuous(labels = scales::percent) +
  geom_smooth(method = "lm", se = FALSE, color = "#0072B2", linetype = "dotted") +
  theme_minimal() +
  guides(size = "none")
That creates a basic scatter plot:
However, it’s currently impossible to know which points represent which counties. ggplot’s geom_text() function adds labels to all the points:
ma_graph +
geom_text(aes(label = Place))
geom_text() uses the same color and size aesthetics as the graph by default. But sizing the text based on point size makes the small points’ labels hard to read. I can stop that behavior by setting size = NULL.
It can also be a bit hard to read labels when they’re right on top of the points. geom_text() lets you “nudge” them a bit higher with the nudge_y argument.
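Putting those two tweaks together might look like the sketch below. So that it runs on its own, it rebuilds a six-row stand-in for ma_data from the head() output (not the full data set) and drops the smoother and theme; the 0.7 nudge mirrors the value used with geom_label() later in this article. ggplot_build() is called only to confirm that the text layer ends up with one constant size and shifted y positions.

```r
library(ggplot2)

# Six-row stand-in for ma_data, taken from the head() output shown earlier
ma_data <- data.frame(
  Place = c("Barnstable", "Berkshire", "Bristol",
            "Dukes and Nantucket", "Essex", "Franklin"),
  AdultPop = c(165336, 92946, 390230, 20756, 538981, 53210),
  Bachelors = c(70795, 31034, 109080, 9769, 212106, 19786),
  CovidPer100K = c(7.0, 9.0, 30.8, 25.3, 29.5, 4.7),
  Region = c("Southeast", "West", "Southeast", "Southeast", "Northeast", "West")
)
ma_data$PctBachelors <- ma_data$Bachelors / ma_data$AdultPop

text_graph <- ggplot(ma_data, aes(x = PctBachelors, y = CovidPer100K,
                                  size = AdultPop, color = Region)) +
  geom_point() +
  # size = NULL removes the inherited size mapping for this layer only;
  # nudge_y shifts every label 0.7 y-axis units above its point
  geom_text(aes(label = Place, size = NULL), nudge_y = 0.7) +
  guides(size = "none")

# Layer 2 is the text layer: all labels share the default text size
built <- ggplot_build(text_graph)
length(unique(built$data[[2]]$size))  # 1
```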
There’s another built-in ggplot labeling function called geom_label(), which is similar to geom_text() but adds a box around the text. The following code using geom_label() produces the graph shown below.
ma_graph +
  geom_label(aes(label = Place, size = NULL), nudge_y = 0.7)
These functions work well when points are spaced out. But if data points are closer together, labels can end up on top of each other, especially in a smaller graph. I added a fake data point close to Middlesex County in the Massachusetts data. If I re-run the code with the new data, Fake blocks part of the Middlesex label.
ma_graph2 <- ggplot(ma_data_fake, aes(x = PctBachelors, y = CovidPer100K, size = AdultPop, color = Region)) +
geom_point() +
  scale_x_continuous(labels = scales::percent) +
  geom_smooth(method = "lm", se = FALSE, color = "#0072B2", linetype = "dotted") +
  theme_minimal() +
  guides(size = "none")
ma_graph2
ma_graph2 +
  geom_label(aes(label = Place, size = NULL, color = NULL), nudge_y = 0.75)
Enter ggrepel.
Creating non-overlapping labels with ggrepel
The ggrepel package has its own versions of ggplot’s text and label geom functions: geom_text_repel() and geom_label_repel(). Using these functions’ defaults will automatically move one of the labels below its point so it doesn’t overlap with the other one.
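Here is a self-contained sketch of that default behavior. Like the article, it adds a fake point next to a real county, but this point’s values are invented purely for illustration and sit next to Essex, since Middlesex isn’t among the six stand-in rows copied from head(ma_data). The seed argument only makes the label placement repeatable between runs.

```r
library(ggplot2)
library(ggrepel)

# Six-row stand-in for ma_data, from the head() output shown earlier
ma_data <- data.frame(
  Place = c("Barnstable", "Berkshire", "Bristol",
            "Dukes and Nantucket", "Essex", "Franklin"),
  AdultPop = c(165336, 92946, 390230, 20756, 538981, 53210),
  Bachelors = c(70795, 31034, 109080, 9769, 212106, 19786),
  CovidPer100K = c(7.0, 9.0, 30.8, 25.3, 29.5, 4.7),
  Region = c("Southeast", "West", "Southeast", "Southeast", "Northeast", "West")
)
ma_data$PctBachelors <- ma_data$Bachelors / ma_data$AdultPop

# Hypothetical point placed right next to Essex; its values are made up for this demo
fake_point <- data.frame(Place = "Fake", AdultPop = 500000, Bachelors = 195000,
                         CovidPer100K = 29.8, Region = "Northeast")
fake_point$PctBachelors <- fake_point$Bachelors / fake_point$AdultPop
demo_data <- rbind(ma_data, fake_point)

repel_graph <- ggplot(demo_data, aes(x = PctBachelors, y = CovidPer100K, color = Region)) +
  geom_point() +
  # Default repel settings push overlapping labels apart when the plot is drawn
  geom_text_repel(aes(label = Place), seed = 42)

repel_graph
```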
As with ggplot’s geom_text() and geom_label(), the ggrepel functions let you set color to NULL and size to NULL. You can also use the same nudge_y argument to create more space between the labels and the points.
ma_graph2 +
  geom_label_repel(aes(label = Place, size = NULL, color = NULL), nudge_y = 0.75)
The graph above has the Middlesex label above the point and the Fake label below it, so there’s no risk of overlap.
Focusing attention on subsets of data with ggrepel
Sometimes you may want to label only a few points of special interest and not all of your data. You can do so by specifying a subset of data in the data argument of geom_label_repel():
ma_graph2 + geom_label_repel(data = subset(ma_data_fake, Region == "MetroBoston"),
                             aes(label = Place, size = NULL, color = NULL),
                             nudge_y = 2,
                             segment.size = 0.2,
                             segment.color = "grey50",
                             direction = "x"
)
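The data argument doesn’t have to be a region filter. As a sketch, here is the same idea on the six-row stand-in data, labeling only counties above an arbitrary cutoff of 25 cases per 100K (the cutoff is an assumption chosen for illustration, not part of the article’s analysis). The unlabeled points still appear because geom_point() uses the full data set, and color = NULL keeps the labels black.

```r
library(ggplot2)
library(ggrepel)

# Six-row stand-in for ma_data, from the head() output shown earlier
ma_data <- data.frame(
  Place = c("Barnstable", "Berkshire", "Bristol",
            "Dukes and Nantucket", "Essex", "Franklin"),
  AdultPop = c(165336, 92946, 390230, 20756, 538981, 53210),
  Bachelors = c(70795, 31034, 109080, 9769, 212106, 19786),
  CovidPer100K = c(7.0, 9.0, 30.8, 25.3, 29.5, 4.7),
  Region = c("Southeast", "West", "Southeast", "Southeast", "Northeast", "West")
)
ma_data$PctBachelors <- ma_data$Bachelors / ma_data$AdultPop

# Hypothetical cutoff: label only counties with more than 25 cases per 100K
high_rate <- subset(ma_data, CovidPer100K > 25)

subset_graph <- ggplot(ma_data, aes(x = PctBachelors, y = CovidPer100K, color = Region)) +
  geom_point() +
  geom_label_repel(data = high_rate,
                   aes(label = Place, color = NULL),
                   nudge_y = 2,
                   seed = 42)

subset_graph
```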
Customizing labels and lines with ggrepel
There is more customization you can do with ggrepel. For example, you can set the width and color of labels’ pointer lines with segment.size and segment.color.
You can even turn label lines into arrows with the arrow argument:
ma_graph2 + geom_label_repel(aes(label = Place, size = NULL),
                             arrow = arrow(length = unit(0.03, "npc"),
                                           type = "closed", ends = "last"),
                             nudge_y = 3,
                             segment.size = 0.3
)
And you can use ggrepel to label lines in a multi-series line graph as well as points in a scatter plot.
For this demo, I’ll use another data frame, mydf, which has some quarterly unemployment data for four US states. The code for that data frame is also on page two. mydf has three columns: Rate, State, and Quarter.
In the graph below, I find it a little hard to see which line goes with which state, because I have to look back and forth between the lines and the legend.
graph2 <- ggplot(mydf, aes(x = Quarter, y = Rate, color = State, group = State)) +
geom_line() +
  theme_minimal() +
  scale_y_continuous(expand = c(0, 0), limits = c(0, NA))
graph2
In the next code block, I’ll add a label for each line in the series, and I’ll have geom_label_repel() point to the second-to-last quarter and not the last quarter. The code calculates what the second-to-last quarter is and then tells geom_label_repel() to use filtered data for only that quarter. The code uses the State column as the label, “nudges” the labels 0.75 horizontally, sets na.rm = TRUE so missing values are dropped without a warning, and removes the graph’s default legend.
# Largest Quarter value that isn't the overall maximum
second_to_last_quarter <- max(mydf$Quarter[mydf$Quarter != max(mydf$Quarter)])
graph2 +
  geom_label_repel(data = filter(mydf, Quarter == second_to_last_quarter),
                   aes(label = State),
                   nudge_x = 0.75,
                   na.rm = TRUE) +
  theme(legend.position = "none")
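Because mydf lives on page two, the sketch below applies the same second-to-last labeling trick to R’s built-in Orange data set (trunk circumference of five trees at seven ages), standing in for quarterly state unemployment rates. The nudge_x value of 75 is an assumption scaled to the age axis (days), where the article’s 0.75 was scaled to its quarter axis.

```r
library(ggplot2)
library(ggrepel)
library(dplyr)

# Plain data frame copy of the built-in Orange data (Tree, age, circumference)
trees <- data.frame(Tree = factor(as.character(Orange$Tree)),
                    age = Orange$age,
                    circumference = Orange$circumference)

# Largest age that isn't the overall maximum, i.e. the second-to-last time point
second_to_last_age <- max(trees$age[trees$age != max(trees$age)])

# One row per tree at that time point, used only for the labels
label_data <- filter(trees, age == second_to_last_age)

tree_graph <- ggplot(trees, aes(x = age, y = circumference, color = Tree, group = Tree)) +
  geom_line() +
  theme_minimal() +
  geom_label_repel(data = label_data,
                   aes(label = Tree),
                   nudge_x = 75,
                   na.rm = TRUE,
                   seed = 42) +
  theme(legend.position = "none")

tree_graph
```

Pointing at an interior time point keeps each label’s pointer line attached partway along the series, which is the same reason the article avoids the last quarter.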
Why not label the last quarter instead of the second-to-last one? I tried that first, and the pointer lines ended up looking like a continuation of the graph’s data:
The top two lines shouldn’t be starting to trend downward at the end!
If you want to find out more about ggrepel, check out the ggrepel vignette with
vignette("ggrepel", "ggrepel")