How to create ggplot labels in R

Maria J. Danford

Labeling all or some of your data points with text can help tell a story, even when your graph is using other cues like color and size. ggplot has a couple of built-in ways of doing this, and the ggrepel package adds some more functionality to those options.

For this demo, I’ll start with a scatter plot looking at the percentage of adults with at least a four-year college degree vs. known Covid-19 cases per capita in Massachusetts counties. (The theory: A college education might mean you’re more likely to have a job that lets you work safely from home. Of course there are lots of exceptions, and many other factors affect infection rates.)

If you want to follow along, you can get the code to re-create my sample data on page two of this article.

Creating a scatter plot with ggplot

To start, the code below loads several libraries and sets scipen = 999 so I don’t get scientific notation in my graphs:

library(ggplot2)
library(ggrepel)
library(dplyr)
options(scipen = 999)
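To see what the scipen setting changes, here is a small base-R sketch that is not part of the original code. The number 100000 is only an illustrative value, and the sketch restores the previous option when it finishes.

```r
# Illustrative only: compare how R formats a large number
# with the default penalty (scipen = 0) and with scipen = 999.
old_opts <- options(scipen = 0)   # old_opts remembers the previous setting
default_fmt <- format(100000)     # scientific form is shorter, so R prefers it
options(scipen = 999)
fixed_fmt <- format(100000)       # a large penalty forces fixed notation
options(old_opts)                 # restore whatever was set before

print(c(default = default_fmt, fixed = fixed_fmt))
```

The same option affects axis labels that ggplot builds from plain numbers, which is why it is set once at the top of the script.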

Here is the data structure for the ma_data data frame:

                Place AdultPop Bachelors PctBachelors CovidPer100K Positivity    Region
1          Barnstable   165336     70795    0.4281887          7.0     0.0188 Southeast
2           Berkshire    92946     31034    0.3338928          9.0     0.0095      West
3             Bristol   390230    109080    0.2795275         30.8     0.0457 Southeast
4 Dukes and Nantucket    20756      9769    0.4706591         25.3     0.0294 Southeast
5               Essex   538981    212106    0.3935315         29.5     0.0406 Northeast
6            Franklin    53210     19786    0.3718474          4.7     0.0052      West
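If you just want something to experiment with, a minimal sketch is to type in the six rows printed above. The name ma_sample is made up for this example, and the real ma_data has more counties than this. For the rows shown, PctBachelors appears to equal Bachelors divided by AdultPop, so the sketch computes it that way; treat that as an assumption.

```r
# The six rows printed above, entered by hand (not the full data set).
ma_sample <- data.frame(
  Place = c("Barnstable", "Berkshire", "Bristol",
            "Dukes and Nantucket", "Essex", "Franklin"),
  AdultPop = c(165336, 92946, 390230, 20756, 538981, 53210),
  Bachelors = c(70795, 31034, 109080, 9769, 212106, 19786),
  CovidPer100K = c(7.0, 9.0, 30.8, 25.3, 29.5, 4.7),
  Positivity = c(0.0188, 0.0095, 0.0457, 0.0294, 0.0406, 0.0052),
  Region = c("Southeast", "West", "Southeast",
             "Southeast", "Northeast", "West"),
  stringsAsFactors = FALSE
)

# Assumption: share of adults with a bachelor's degree
ma_sample$PctBachelors <- ma_sample$Bachelors / ma_sample$AdultPop

str(ma_sample)
```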

The next group of code creates a ggplot scatter plot with that data, including sizing points by total county population and coloring them by region. geom_smooth() adds a linear regression line, and I also tweak a couple of ggplot design defaults. The graph is stored in a variable called ma_graph.

ma_graph <- ggplot(ma_data, aes(x = PctBachelors, y = CovidPer100K,
                                size = AdultPop, color = Region)) +
  geom_point() +
  scale_x_continuous(labels = scales::percent) +
  geom_smooth(method = "lm", se = FALSE, color = "#0072B2", linetype = "dotted") +
  theme_minimal() +
  guides(size = "none")

That creates a basic scatter plot:

Basic scatter plot with ggplot2: percent college education on the x axis, Covid-19 infection rates on the y axis. (Image: Sharon Machlis, IDG)

However, it’s currently impossible to know which points represent what counties. ggplot’s geom_text() function adds labels to all the points:

ma_graph +
  geom_text(aes(label = Place))

ggplot scatter plot with default text labels. (Image: Sharon Machlis)

geom_text() uses the same color and size aesthetics as the graph by default. But sizing the text based on point size makes the small points’ labels hard to read. I can stop that behavior by setting size = NULL.

It can also be a bit difficult to read labels when they’re right on top of the points. geom_text() lets you “nudge” them a bit higher with the nudge_y argument.
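As a rough illustration of those two tweaks with geom_text(), here is a self-contained sketch. The toy data frame and its values are invented for the example; only the column names mirror the article’s data.

```r
library(ggplot2)

# Hypothetical toy data standing in for ma_data
toy <- data.frame(
  Place = c("A", "B", "C"),
  PctBachelors = c(0.28, 0.39, 0.47),
  CovidPer100K = c(30, 29, 25),
  AdultPop = c(390000, 540000, 21000)
)

toy_graph <- ggplot(toy, aes(x = PctBachelors, y = CovidPer100K,
                             size = AdultPop)) +
  geom_point()

# size = NULL stops the labels from inheriting the point-size mapping;
# nudge_y shifts every label up by 0.7 in y-axis units.
toy_labeled <- toy_graph +
  geom_text(aes(label = Place, size = NULL), nudge_y = 0.7)

print(toy_labeled)
```

Because nudge_y is expressed in the units of the y axis, a value that looks right for one data set may need adjusting for another.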

There’s another built-in ggplot labeling function called geom_label(), which is similar to geom_text() but adds a box around the text. The following code using geom_label() produces the graph shown below.

ma_graph +
  geom_label(aes(label = Place, size = NULL), nudge_y = 0.7)

ggplot scatter plot with geom_label(). (Image: Sharon Machlis, IDG)

These functions work well when points are spaced out. But if data points are closer together, labels can end up on top of each other, especially in a smaller graph. I added a fake data point close to Middlesex County in the Massachusetts data. If I re-run the code with the new data, Fake blocks part of the Middlesex label.

ma_graph2 <- ggplot(ma_data_fake, aes(x = PctBachelors, y = CovidPer100K,
                                      size = AdultPop, color = Region)) +
  geom_point() +
  scale_x_continuous(labels = scales::percent) +
  geom_smooth(method = "lm", se = FALSE, color = "#0072B2", linetype = "dotted") +
  theme_minimal() +
  guides(size = "none")

ma_graph2 +
  geom_label(aes(label = Place, size = NULL, color = NULL), nudge_y = 0.75)

ggplot2 scatter plot with default geom_label() labels on top of each other. (Image: Sharon Machlis, IDG)

Enter ggrepel.

Creating non-overlapping labels with ggrepel

The ggrepel package has its own versions of ggplot’s text and label geom functions: geom_text_repel() and geom_label_repel(). Using these functions’ defaults will automatically move one of the labels below its point so it doesn’t overlap with the other one.

As with ggplot’s geom_text() and geom_label(), the ggrepel functions allow you to set color to NULL and size to NULL. You can also use the same nudge_y argument to create more space between the labels and the points.

ma_graph2 +
  geom_label_repel(data = subset(ma_data_fake, Region == "MetroBoston"),
                   aes(label = Place, size = NULL, color = NULL), nudge_y = 0.75)

Scatter plot with geom_label_repel(): labels for close points no longer overlap. (Image: Sharon Machlis, IDG)

The graph above has the Middlesex label above the point and the Fake label below, so there’s no risk of overlap.
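The text-only counterpart, geom_text_repel(), works the same way. The sketch below uses an invented three-row data frame in which two points nearly coincide; the coordinates are made up, and the seed argument is included only so repeated runs place the labels identically.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical toy data: the first two points sit almost on top of each other
toy <- data.frame(
  Place = c("Middlesex", "Fake", "Essex"),
  x = c(0.55, 0.56, 0.39),
  y = c(28.0, 28.2, 29.5)
)

# geom_text_repel() pushes overlapping labels apart and draws a thin
# segment back to the point when a label has moved far enough away.
toy_repel <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  geom_text_repel(aes(label = Place), nudge_y = 0.75, seed = 42)

print(toy_repel)
```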

Focusing attention on subsets of data with ggrepel

Sometimes you may want to label only a few points of special interest and not all of your data. You can do so by specifying a subset of data in the data argument of geom_label_repel():

ma_graph2 +
  geom_label_repel(data = subset(ma_data_fake, Region == "MetroBoston"),
                   aes(label = Place, size = NULL, color = NULL),
                   nudge_y = 2,
                   segment.size = 0.2,
                   segment.color = "grey50",
                   direction = "x")

Scatter plot with only some points labeled. (Image: Sharon Machlis, IDG)
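The subsetting step on its own can be checked in base R before it goes into a plot. This is a small sketch with invented rows; only the column names and the "MetroBoston" value come from the code above, and which places actually belong to that region in the real data is not something this example claims.

```r
# Hypothetical toy rows; only the column names follow the article's data
toy <- data.frame(
  Place = c("Middlesex", "Fake", "Essex", "Franklin"),
  Region = c("MetroBoston", "MetroBoston", "Northeast", "West"),
  stringsAsFactors = FALSE
)

# subset() keeps only the rows where the condition is TRUE;
# this is the data frame the label layer would receive.
boston_only <- subset(toy, Region == "MetroBoston")

print(boston_only$Place)
```

Only the label layer gets the reduced data; the points layer still draws every row because it uses the data passed to ggplot().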

Customizing labels and lines with ggrepel

There is more customization you can do with ggrepel. For example, you can set the width and color of labels’ pointer lines with segment.size and segment.color.

You can even turn label lines into arrows with the arrow argument:

ma_graph2 +
  geom_label_repel(aes(label = Place, size = NULL),
                   arrow = arrow(length = unit(0.03, "npc"),
                                 type = "closed", ends = "last"),
                   nudge_y = 3,
                   segment.size = 0.3)

Scatter plot with ggrepel labels and arrows. (Image: Sharon Machlis, IDG)

And you can use ggrepel to label lines in a multi-series line graph as well as points in a scatter plot.

For this demo, I’ll use another data frame, mydf, which has some quarterly unemployment data for four US states. The code for that data frame is also on page two. mydf has three columns: Rate, State, and Quarter.

In the graph below, I find it a little hard to see which line goes with what state, because I have to look back and forth between the lines and the legend.

graph2 <- ggplot(mydf, aes(x = Quarter, y = Rate, color = State, group = State)) +
  geom_line() +
  theme_minimal() +
  scale_y_continuous(expand = c(0, 0), limits = c(0, NA))

ggplot line graph with four lines and a legend to the right. (Image: Sharon Machlis, IDG)

In the next code block, I’ll add a label for each line in the series, and I’ll have geom_label_repel() point to the second-to-last quarter and not the last quarter. The code calculates what the second-to-last quarter is and then tells geom_label_repel() to use filtered data for only that quarter. The code uses the State column as the label, “nudges” the labels 0.75 horizontally, removes all the other data points, and gets rid of the graph’s default legend.

second_to_last_quarter <- max(mydf$Quarter[mydf$Quarter != max(mydf$Quarter)])

graph2 +
  geom_label_repel(data = filter(mydf, Quarter == second_to_last_quarter),
                   aes(label = State),
                   nudge_x = 0.75,
                   na.rm = TRUE) +
  theme(legend.position = "none")

Line graph with ggrepel labels, one for each line. (Image: Sharon Machlis, IDG)
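The second-to-last calculation can be tried in isolation. In this sketch the quarter values are invented and stored as decimal years, which is an assumption; the real mydf$Quarter column may use a different type, though the same max() logic applies to any sortable values.

```r
# Hypothetical quarter values for two series, written as decimal years
quarters <- c(2019.00, 2019.25, 2019.50, 2019.75,
              2019.00, 2019.25, 2019.50, 2019.75)

# Same logic as the code above: drop the latest quarter,
# then take the maximum of what is left.
second_to_last <- max(quarters[quarters != max(quarters)])

# An equivalent alternative: sort the distinct values in descending
# order and take the second one.
second_to_last_alt <- sort(unique(quarters), decreasing = TRUE)[2]

print(second_to_last)
print(identical(second_to_last, second_to_last_alt))
```

Both approaches handle repeated quarter values across states, since they work on the set of distinct quarters rather than on row positions.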

Why not label the last quarter instead of the second-to-last one? I tried that first, and the pointer lines ended up looking like a continuation of the graph’s data:

Line graph with confusing label pointer lines at the end of each line. (Image: Sharon Machlis, IDG)

The top two lines should not be starting to trend downward at the end!

If you want to find out more about ggrepel, check out the ggrepel vignette with

vignette("ggrepel", "ggrepel")