library(igraph)
library(ggraph)
library(graphlayouts)
library(networkdata)
9 Advanced Layouts
library(igraph)
library(ggraph)
library(graphlayouts)
library(ggforce)
library(networkdata)
data("got")
<- got[[1]]
gotS1
<- c(
got_palette "#1A5878", "#C44237", "#AD8941", "#E99093",
"#50594B", "#8968CD", "#9ACD32"
)
## compute a clustering for node colors
V(gotS1)$clu <- as.character(membership(cluster_louvain(gotS1)))
## compute degree as node size
V(gotS1)$size <- degree(gotS1)
While “stress” is the key layout algorithm in graphlayouts
, there are other, more specialized layouts that can be used for different purposes. In this part, we work through some examples with concentric layouts and learn how to disentangle extreme “hairball” networks.
9.1 Large Networks
The stress layout also works well with medium to large graphs.
The network shows the biggest componentn of the co-authorship network of R package developers on CRAN (~12k nodes)
If you want to go beyond ~20k nodes, then you may want to switch to layout_with_pmds()
or layout_with_sparse_stress()
which are optimized to work with large graphs.
These are capable to deal with networks with several 100,000 nodes.
9.2 Concentric Layouts
Circular layouts are generally not advisable. Concentric circles, on the other hand, help to emphasize the position of certain nodes in the network. The graphlayouts
package has two function to create concentric layouts, layout_with_focus()
and layout_with_centrality()
.
The first one allows to focus the network on a specific node and arrange all other nodes in concentric circles (depending on the geodesic distance) around it. Below we focus on the character Ned Stark.
ggraph(gotS1, layout = "focus", focus = 1) +
geom_edge_link0(aes(edge_linewidth = weight), edge_colour = "grey66") +
geom_node_point(aes(fill = clu, size = size), shape = 21) +
geom_node_text(aes(filter = (name == "Ned"), size = size, label = name),
family = "serif"
+
) scale_edge_width_continuous(range = c(0.2, 1.2)) +
scale_size_continuous(range = c(1, 5)) +
scale_fill_manual(values = got_palette) +
coord_fixed() +
theme_graph() +
theme(legend.position = "none")
The parameter focus
in the first line is used to choose the node id of the focal node. The function coord_fixed()
is used to always keep the aspect ratio at one (i.e. the circles are always displayed as a circle and not an ellipse).
The function draw_circle()
can be used to add the circles explicitly.
ggraph(gotS1, layout = "focus", focus = 1) +
draw_circle(col = "#00BFFF", use = "focus", max.circle = 3) +
geom_edge_link0(aes(width = weight), edge_colour = "grey66") +
geom_node_point(aes(fill = clu, size = size), shape = 21) +
geom_node_text(aes(filter = (name == "Ned"), size = size, label = name),
family = "serif"
+
) scale_edge_width_continuous(range = c(0.2, 1.2)) +
scale_size_continuous(range = c(1, 5)) +
scale_fill_manual(values = got_palette) +
coord_fixed() +
theme_graph() +
theme(legend.position = "none")
layout_with_centrality()
works in a similar way. You can specify any centrality index (or any numeric vector for that matter), and create a concentric layout where the most central nodes are put in the center and the most peripheral nodes in the biggest circle. The numeric attribute used for the layout is specified with the cent
parameter. Here, we use the weighted degree of the characters.
ggraph(gotS1, layout = "centrality", cent = graph.strength(gotS1)) +
geom_edge_link0(aes(edge_linewidth = weight), edge_colour = "grey66") +
geom_node_point(aes(fill = clu, size = size), shape = 21) +
geom_node_text(aes(size = size, label = name), family = "serif") +
scale_edge_width_continuous(range = c(0.2, 0.9)) +
scale_size_continuous(range = c(1, 8)) +
scale_fill_manual(values = got_palette) +
coord_fixed() +
theme_graph() +
theme(legend.position = "none")
(Concentric layouts are not only helpful to focus on specific nodes, but also make for a good tool to visualize ego networks.)
9.3 Backbone Layout
layout_as_backbone()
is a layout algorithm that can help emphasize hidden group structures. To illustrate the performance of the algorithm, we create an artificial network with a subtle group structure using sample_islands()
from igraph
.
<- sample_islands(9, 40, 0.4, 15)
g <- simplify(g)
g V(g)$grp <- as.character(rep(1:9, each = 40))
The network consists of 9 groups with 40 vertices each. The density within each group is 0.4 and there are 15 edges running between each pair of groups. Let us try to visualize the network with what we have learned so far.
ggraph(g, layout = "stress") +
geom_edge_link0(edge_colour = "black", edge_linewidth = 0.1, edge_alpha = 0.5) +
geom_node_point(aes(fill = grp), shape = 21) +
scale_fill_brewer(palette = "Set1") +
theme_graph() +
theme(legend.position = "none")
As you can see, the graph seems to be a proper “hairball” without any special structural features standing out. In this case, though, we know that there should be 9 groups of vertices that are internally more densely connected than externally. To uncover this group structure, we turn to the “backbone layout”.
<- layout_as_backbone(g, keep = 0.4)
bb E(g)$col <- FALSE
E(g)$col[bb$backbone] <- TRUE
The idea of the algorithm is as follows. For each edge, an embeddedness score is calculated which serves as an edge weight attribute. These weights are then ordered and only the edges with the highest score are kept. The number of edges to keep is controlled with the keep
parameter. In our example, we keep the top 40%. The parameter usually requires some experimenting to find out what works best. Since this may result in an unconnected network, we add all edges of the union of all maximum spanning trees. The resulting network is the “backbone” of the original network and the “stress” layout algorithm is applied to this network. Once the layout is calculated, all edges are added back to the network.
The output of the function are the x and y coordinates for nodes and a vector that gives the ids of the edges in the backbone network. In the code above, we use this vector to create a binary edge attribute that indicates if an edge is part of the backbone or not.
ggraph(g, layout = "backbone", keep = 0.4) +
geom_edge_link0(aes(edge_colour = backbone), edge_linewidth = 0.1) +
geom_node_point(aes(fill = grp), shape = 21) +
scale_fill_brewer(palette = "Set1") +
scale_edge_color_manual(values = c(rgb(0, 0, 0, 0.3), rgb(0, 0, 0, 1))) +
theme_graph() +
theme(legend.position = "none")
The groups are now clearly visible! Of course the network used in the example is specifically tailored to illustrate the power of the algorithm. Using the backbone layout in real world networks may not always result in such a clear division of groups. It should thus not be seen as a universal remedy for drawing hairball networks. Keep in mind: It can only emphasize a hidden group structure if it exists.
The plot below shows an empirical example where the algorithm was able to uncover a hidden group structure. The network shows facebook friendships of a university in the US. Node colour corresponds to dormitory of students. Left is the ordinary stress layout and right the backbone layout.
9.4 Longitudinal Networks
Longitudinal network data usually comes in the form of panel data, gathered at different points in time. We thus have a series of snapshots that need to be visualized in a way that individual nodes are easy to trace without the layout becoming to awkward.
For this part of the tutorial, you will need two additional packages.
library(gganimate)
library(ggplot2)
library(patchwork)
We will be using the 50 actor excerpt from the Teenage Friends and Lifestyle Study from the RSiena data repository as an example. The data is part of the networkdata
package.
data("s50")
The dataset consists of three networks with 50 actors each and a vertex attribute for the smoking behavior of students. The function layout_as_dynamic()
from graphlayouts
can be used to visualize the three networks. The implemented algorithm calculates a reference layout which is a layout of the union of all networks and individual layouts based on stress minimization and combines those in a linear combination which is controlled by the alpha
parameter. For alpha=1
, only the reference layout is used and all graphs have the same layout. For alpha=0
, the stress layout of each individual graph is used. Values in-between interpolate between the two layouts.
<- layout_as_dynamic(s50, alpha = 0.2) xy
Now you could use ggraph
in conjunction with patchwork
to produce a static plot with all networks side-by-side.
<- vector("list", length(s50))
pList
for (i in 1:length(s50)) {
<- ggraph(s50[[i]], layout = "manual", x = xy[[i]][, 1], y = xy[[i]][, 2]) +
pList[[i]] geom_edge_link0(edge_linewidth = 0.6, edge_colour = "grey66") +
geom_node_point(shape = 21, aes(fill = as.factor(smoke)), size = 6) +
geom_node_text(label = 1:50, repel = FALSE, color = "white", size = 4) +
scale_fill_manual(
values = c("forestgreen", "grey25", "firebrick"),
guide = ifelse(i != 2, "none", "legend"),
name = "smoking",
labels = c("never", "occasionally", "regularly")
+
) theme_graph() +
theme(legend.position = "bottom") +
labs(title = paste0("Wave ", i))
}
wrap_plots(pList)
This is nice but of course we want to animate the changes. This is where we have to get inventive, because ggraph does not (yet) work with gganimate
out of the box. For the time being, the function below provides a hacky workaround to produce a data structure that can be passed to gganimate.
<- function(gList, alpha = 0.2, nodes = NULL) {
aninet_data # check for absent nodes and add them
if (is.null(nodes)) {
<- unique(unlist(sapply(gList, function(x) vertex_attr(x, "name"))))
all_nodes for (i in 1:length(gList)) {
<- which(!all_nodes %in% V(gList[[i]])$name)
idx if (length(idx) > 0) {
<- add_vertices(gList[[i]], length(idx), attr = list(name = all_nodes[idx]))
gList[[i]]
}
}else if (nodes >= 0) {
} <- unlist(sapply(gList, function(x) vertex_attr(x, "name")))
all_nodes <- names(which(table(all_nodes) >= nodes * length(gList)))
all_nodes
for (i in 1:length(gList)) {
<- which(!all_nodes %in% V(gList[[i]])$name)
idx if (length(idx) > 0) {
<- add_vertices(gList[[i]], length(idx), attr = list(name = all_nodes[idx]))
gList[[i]]
}
}
for (i in 1:length(gList)) {
<- which(!V(gList[[i]])$name %in% all_nodes)
idx if (length(idx) > 0) {
<- delete_vertices(gList[[i]], idx)
gList[[i]]
}
}
}
<- graphlayouts::layout_as_dynamic(gList, alpha = alpha)
xy <- lapply(1:length(gList), function(i) {
nodes_lst cbind(igraph::as_data_frame(gList[[i]], "vertices"),
x = xy[[i]][, 1], y = xy[[i]][, 2], frame = i
)
})
<- lapply(1:length(gList), function(i) cbind(igraph::as_data_frame(gList[[i]], "edges"), frame = i))
edges_lst
<- lapply(1:length(gList), function(i) {
edges_lst $x <- nodes_lst[[i]]$x[match(edges_lst[[i]]$from, nodes_lst[[i]]$name)]
edges_lst[[i]]$y <- nodes_lst[[i]]$y[match(edges_lst[[i]]$from, nodes_lst[[i]]$name)]
edges_lst[[i]]$xend <- nodes_lst[[i]]$x[match(edges_lst[[i]]$to, nodes_lst[[i]]$name)]
edges_lst[[i]]$yend <- nodes_lst[[i]]$y[match(edges_lst[[i]]$to, nodes_lst[[i]]$name)]
edges_lst[[i]]$id <- paste0(edges_lst[[i]]$from, "-", edges_lst[[i]]$to)
edges_lst[[i]]$status <- TRUE
edges_lst[[i]]
edges_lst[[i]]
})
<- do.call("rbind", lapply(gList, get.edgelist))
all_edges <- all_edges[!duplicated(all_edges), ]
all_edges <- cbind(all_edges, paste0(all_edges[, 1], "-", all_edges[, 2]))
all_edges
<- lapply(1:length(gList), function(i) {
edges_lst <- which(!all_edges[, 3] %in% edges_lst[[i]]$id)
idx if (length(idx != 0)) {
<- data.frame(from = all_edges[idx, 1], to = all_edges[idx, 2], id = all_edges[idx, 3])
tmp $x <- nodes_lst[[i]]$x[match(tmp$from, nodes_lst[[i]]$name)]
tmp$y <- nodes_lst[[i]]$y[match(tmp$from, nodes_lst[[i]]$name)]
tmp$xend <- nodes_lst[[i]]$x[match(tmp$to, nodes_lst[[i]]$name)]
tmp$yend <- nodes_lst[[i]]$y[match(tmp$to, nodes_lst[[i]]$name)]
tmp$frame <- i
tmp$status <- FALSE
tmp<- which(!names(edges_lst[[i]]) %in% names(tmp))
idy if (length(idy) > 0) {
names(edges_lst[[i]])[idy]] <- NA
tmp[
}<- rbind(edges_lst[[i]], tmp)
edges_lst[[i]]
}
edges_lst[[i]]
})
<- do.call("rbind", edges_lst)
edges_df <- do.call("rbind", nodes_lst)
nodes_df list(nodes = nodes_df, edges = edges_df)
}
#| label: build
<- do.call("rbind", edges_lst)
edges_df <- do.call("rbind", nodes_lst)
nodes_df
<- aninet(s50) dat
And that’s it in terms of data wrangling. All that is left is to plot/animate the data.
ggplot() +
geom_segment(
data = dat$edges_df,
aes(x = x, xend = xend, y = y, yend = yend, group = id, alpha = status),
show.legend = FALSE
+
) geom_point(
data = dat$nodes_df, aes(x, y, group = name, fill = as.factor(smoke)),
shape = 21, size = 4, show.legend = FALSE
+
) scale_fill_manual(values = c("forestgreen", "grey25", "firebrick")) +
scale_alpha_manual(values = c(0, 1)) +
ease_aes("quadratic-in-out") +
transition_states(frame, state_length = 0.5, wrap = FALSE) +
labs(title = "Wave {closest_state}") +
theme_void()
9.5 Multilevel networks
In this section, you will get to know layout_as_multilevel()
, a layout algorithm in the raphlayouts
package which can be use to visualize multilevel networks.
A multilevel network consists of two (or more) levels with different node sets and intra-level ties. For instance, one level could be scientists and their collaborative ties and the second level are labs and ties among them, and inter-level edges are the affiliations of scientists and labs.
The graphlayouts
package contains an artificial multilevel network which will be used to illustrate the algorithm.
data("multilvl_ex")
The package assumes that a multilevel network has a vertex attribute called lvl
which holds the level information (1 or 2).
The underlying algorithm of layout_as_multilevel()
has three different versions, which can be used to emphasize different structural features of a multilevel network.
Independent of which option is chosen, the algorithm internally produces a 3D layout, where each level is positioned on a different y-plane. The 3D layout is then mapped to 2D with an isometric projection. The parameters alpha
and beta
control the perspective of the projection. The default values seem to work for many instances, but may not always be optimal. As a rough guideline: beta
rotates the plot around the y axis (in 3D) and alpha
moves the POV up or down.
9.5.1 Complete layout
A layout for the complete network can be computed via layout_as_multilevel()
setting type = "all"
. Internally, the algorithm produces a constrained 3D stress layout (each level on a different y plane) which is then projected to 2D. This layout ignores potential differences in each level and optimizes only the overall layout.
<- layout_as_multilevel(multilvl_ex, type = "all", alpha = 25, beta = 45) xy
To visualize the network with ggraph
, you may want to draw the edges for each level (and inter level edges) with a different edge geom. This gives you more flexibility to control aesthetics and can easily be achieved with a filter.
ggraph(multilvl_ex, "manual", x = xy[, 1], y = xy[, 2]) +
geom_edge_link0(
aes(filter = (node1.lvl == 1 & node2.lvl == 1)),
edge_colour = "firebrick3",
alpha = 0.5,
edge_linewidth = 0.3
+
) geom_edge_link0(
aes(filter = (node1.lvl != node2.lvl)),
alpha = 0.3,
edge_linewidth = 0.1,
edge_colour = "black"
+
) geom_edge_link0(
aes(filter = (node1.lvl == 2 &
== 2)),
node2.lvl edge_colour = "goldenrod3",
edge_linewidth = 0.3,
alpha = 0.5
+
) geom_node_point(aes(shape = as.factor(lvl)), fill = "grey25", size = 3) +
scale_shape_manual(values = c(21, 22)) +
theme_graph() +
coord_cartesian(clip = "off", expand = TRUE) +
theme(legend.position = "none")
9.5.2 Separate layouts for both levels
In many instances, there may be different structural properties inherent to the levels of the network. In that case, two layout functions can be passed to layout_as_multilevel()
to deal with these differences. In our artificial network, level 1 has a hidden group structure and level 2 has a core-periphery structure.
To use this layout option, set type = "separate"
and specify two layout functions with FUN1
and FUN2
. You can change internal parameters of these layout functions with named lists in the params1
and params2
argument. Note that this version optimizes inter-level edges only minimally. The emphasis is on the intra-level structures.
<- layout_as_multilevel(multilvl_ex,
xy type = "separate",
FUN1 = layout_as_backbone,
FUN2 = layout_with_stress,
alpha = 25, beta = 45
)
Again, try to include an edge geom for each level.
<- c(
cols2 "#3A5FCD", "#CD00CD", "#EE30A7", "#EE6363",
"#CD2626", "#458B00", "#EEB422", "#EE7600"
)
ggraph(multilvl_ex, "manual", x = xy[, 1], y = xy[, 2]) +
geom_edge_link0(
aes(
filter = (node1.lvl == 1 & node2.lvl == 1),
edge_colour = col
),alpha = 0.5, edge_linewidth = 0.3
+
) geom_edge_link0(
aes(filter = (node1.lvl != node2.lvl)),
alpha = 0.3,
edge_linewidth = 0.1,
edge_colour = "black"
+
) geom_edge_link0(
aes(
filter = (node1.lvl == 2 & node2.lvl == 2),
edge_colour = col
),edge_linewidth = 0.3, alpha = 0.5
+
) geom_node_point(aes(
fill = as.factor(grp),
shape = as.factor(lvl),
size = nsize
+
)) scale_shape_manual(values = c(21, 22)) +
scale_size_continuous(range = c(1.5, 4.5)) +
scale_fill_manual(values = cols2) +
scale_edge_color_manual(values = cols2, na.value = "grey12") +
scale_edge_alpha_manual(values = c(0.1, 0.7)) +
theme_graph() +
coord_cartesian(clip = "off", expand = TRUE) +
theme(legend.position = "none")
9.5.3 Fix only one level
This layout can be used to emphasize one intra-level structure. The layout of the second level is calculated in a way that optimizes inter-level edge placement. Set type = "fix1"
and specify FUN1
and possibly params1
to fix level 1 or set type = "fix2"
and specify FUN2
and possibly params2
to fix level 2.
<- layout_as_multilevel(multilvl_ex,
xy type = "fix2",
FUN2 = layout_with_stress,
alpha = 25, beta = 45
)
ggraph(multilvl_ex, "manual", x = xy[, 1], y = xy[, 2]) +
geom_edge_link0(
aes(
filter = (node1.lvl == 1 & node2.lvl == 1),
edge_colour = col
),alpha = 0.5, edge_linewidth = 0.3
+
) geom_edge_link0(
aes(filter = (node1.lvl != node2.lvl)),
alpha = 0.3,
edge_linewidth = 0.1,
edge_colour = "black"
+
) geom_edge_link0(
aes(
filter = (node1.lvl == 2 & node2.lvl == 2),
edge_colour = col
),edge_linewidth = 0.3, alpha = 0.5
+
) geom_node_point(aes(
fill = as.factor(grp),
shape = as.factor(lvl),
size = nsize
+
)) scale_shape_manual(values = c(21, 22)) +
scale_size_continuous(range = c(1.5, 4.5)) +
scale_fill_manual(values = cols2) +
scale_edge_color_manual(values = cols2, na.value = "grey12") +
scale_edge_alpha_manual(values = c(0.1, 0.7)) +
theme_graph() +
coord_cartesian(clip = "off", expand = TRUE) +
theme(legend.position = "none")
9.5.4 3D with threejs
Instead of the default 2D projection, layout_as_multilevel()
can also return the 3D layout by setting project2d = FALSE
. The 3D layout can then be used with e.g. threejs
to produce an interactive 3D visualization.
library(threejs)
<- layout_as_multilevel(multilvl_ex,
xyz type = "separate",
FUN1 = layout_as_backbone,
FUN2 = layout_with_stress,
project2D = FALSE
)$layout <- xyz
multilvl_exV(multilvl_ex)$color <- c("#00BFFF", "#FF69B4")[V(multilvl_ex)$lvl]
V(multilvl_ex)$vertex.label <- V(multilvl_ex)$name
graphjs(multilvl_ex, bg = "black", vertex.shape = "sphere")