12 Basics of tidygraph

12.1 Graph structures

We’ll use the famous Florentine Family marriage dataset as a running example. The dataset is in igraph format but can be converted to a tbl_graph object with as_tbl_graph().

data("flo_marriage")
flo_tidy <- as_tbl_graph(flo_marriage)
flo_tidy
This graph was created by an old(er) igraph version.
  Call upgrade_graph() on it to use with the current igraph version
  For now we convert it on the fly...
# A tbl_graph: 16 nodes and 20 edges
#
# An undirected simple graph with 2 components
#
# Node Data: 16 × 4 (active)
   name         wealth `#priors` `#ties`
   <chr>         <dbl>     <dbl>   <dbl>
 1 Acciaiuoli       10        53       2
 2 Albizzi          36        65       3
 3 Barbadori        55         0      14
 4 Bischeri         44        12       9
 5 Castellani       20        22      18
 6 Ginori           32         0       9
 7 Guadagni          8        21      14
 8 Lamberteschi     42         0      14
 9 Medici          103        53      54
10 Pazzi            48         0       7
11 Peruzzi          49        42      32
12 Pucci             3         0       1
13 Ridolfi          27        38       4
14 Salviati         10        35       5
15 Strozzi         146        74      29
16 Tornabuoni       48         0       7
#
# Edge Data: 20 × 2
   from    to
  <int> <int>
1     1     9
2     2     6
3     2     7
# ℹ 17 more rows

The tbl_graph class is a thin subclass of igraph that represents the network in a tidy fashion: it prints as two data frames, one for nodes and one for edges.

class(flo_tidy)
[1] "tbl_graph" "igraph"   

Any function in R that expects an igraph object as input will also accept a tbl_graph.
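As a minimal sketch of this interoperability, the snippet below converts a small igraph object with as_tbl_graph() and passes the result straight to ordinary igraph functions. The ring graph and the object name ring_tidy are illustrative assumptions and not part of the Florentine example.

```r
library(tidygraph)

# Illustrative graph (an assumption, not from the dataset): a ring of 6 nodes
ring_tidy <- as_tbl_graph(igraph::make_ring(6))

# Plain igraph functions accept the tbl_graph unchanged
ring_degree <- igraph::degree(ring_tidy)
ring_diameter <- igraph::diameter(ring_tidy)
```

Because the object still inherits from igraph, no conversion back is needed before calling such functions.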

The function tbl_graph() creates a network from scratch from two data frames, one for the nodes and one for the edges. It is essentially the tidygraph counterpart of igraph's graph_from_data_frame().
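A small sketch of building a graph from two data frames might look as follows. The node names, the relation column, and the object names are made up for illustration. Character values in from and to are matched against the name column of the node data frame.

```r
library(tidygraph)

# Hypothetical toy data: three nodes and two edges
toy_nodes <- data.frame(name = c("A", "B", "C"))
toy_edges <- data.frame(
  from = c("A", "B"),
  to = c("B", "C"),
  relation = c("x", "y") # extra columns become edge attributes
)

toy_graph <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = FALSE)
toy_graph
```

Printing toy_graph shows the same two-table layout as flo_tidy, with nodes active by default.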

To create graphs with the usual generators, check out the create_*() family for deterministic structures (rings, stars, lattices, and so on) and the play_*() family for random graph models.
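As a rough illustration of the two families, the sketch below creates a ring and a star with create_*() functions and one random graph with play_barabasi_albert(). The sizes, the seed, and the object names are arbitrary choices for this example.

```r
library(tidygraph)

# Deterministic structures from the create_*() family
ring_graph <- create_ring(10)
star_graph <- create_star(10)

# A random graph from the play_*() family (preferential attachment);
# the seed only makes this illustrative run reproducible
set.seed(42)
pa_graph <- play_barabasi_albert(n = 20, power = 1)
```

All three results are tbl_graph objects, so the verbs discussed in the next section apply to them directly.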

12.2 Standard verbs

The tidy framework, and dplyr in particular, provides verbs that solve common data manipulation tasks, such as mutate(), select(), filter(), and summarise(). The challenge for tbl_graph objects is that these verbs need to work with two different data frames. tidygraph solves this with a pointer to the data frame that is to be manipulated. This pointer can be changed with the verb activate(). By default the nodes are activated, which is also visible in the printed output of flo_tidy, where the node table is labeled (active). To activate the edge data frame, simply use activate("edges").

flo_tidy %>% activate("edges")
# A tbl_graph: 16 nodes and 20 edges
#
# An undirected simple graph with 2 components
#
# Edge Data: 20 × 2 (active)
    from    to
   <int> <int>
 1     1     9
 2     2     6
 3     2     7
 4     2     9
 5     3     5
 6     3     9
 7     4     7
 8     4    11
 9     4    15
10     5    11
11     5    15
12     7     8
13     7    16
14     9    13
15     9    14
16     9    16
17    10    14
18    11    15
19    13    15
20    13    16
#
# Node Data: 16 × 4
  name       wealth `#priors` `#ties`
  <chr>       <dbl>     <dbl>   <dbl>
1 Acciaiuoli     10        53       2
2 Albizzi        36        65       3
3 Barbadori      55         0      14
# ℹ 13 more rows

Any data manipulation would now be done on the edge data frame.
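To make this concrete, here is a minimal sketch on a toy ring graph rather than the Florentine data. The graph, the weight column, and the object names are assumptions for illustration. A mutate() issued while edges are active adds a column to the edge table only.

```r
library(tidygraph)

# Toy graph (illustrative): a ring with 5 nodes
weighted_ring <- create_ring(5) %>%
  activate("edges") %>%
  mutate(weight = 1) # hypothetical edge attribute

# Extract each table to inspect its columns
edge_cols <- weighted_ring %>% activate("edges") %>% as_tibble() %>% names()
node_cols <- weighted_ring %>% activate("nodes") %>% as_tibble() %>% names()
```

Here edge_cols contains weight next to from and to, while node_cols does not.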

Once a data frame is “activated”, most of the familiar dplyr verbs can be used to manipulate it. Activation might suggest that edges and nodes can only be manipulated separately, which would be undesirable. It is, however, possible to access the edge data frame while nodes are activated via .E(). Similarly, the node data frame can be accessed via .N() while edges are activated. In the example below, we activate the edges and create a new edge attribute that indicates whether an edge involves the Medici, that is, whether either of its endpoints is the Medici node.

flo_tidy <- flo_tidy %>% 
  activate("edges") %>% 
  mutate(to_medici=(.N()$name[from]=="Medici" | .N()$name[to]=="Medici"))

This particular use case is helpful for visualizations.

ggraph(flo_tidy, "stress") +
    geom_edge_link0(aes(edge_color = to_medici)) +
    geom_node_point(shape = 21, size = 10, fill = "grey66") +
    geom_node_text(aes(label = name)) +
    theme_graph()
Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
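The opposite direction, reading edge data while the nodes are active, works through .E(). The sketch below uses a made-up four-node graph (all names are illustrative) and flags every node that appears as an endpoint of at least one edge.

```r
library(tidygraph)

# Hypothetical toy graph: node D has no edges
small_graph <- tbl_graph(
  nodes = data.frame(name = c("A", "B", "C", "D")),
  edges = data.frame(from = c("A", "A"), to = c("B", "C")),
  directed = FALSE
)

# With nodes active, .E() exposes the edge table inside mutate()
small_graph <- small_graph %>%
  activate("nodes") %>%
  mutate(has_edge = seq_along(name) %in% c(.E()$from, .E()$to))

has_edge <- small_graph %>% as_tibble() %>% dplyr::pull(has_edge)
```

The from and to columns hold node row indices, which is why they are compared against seq_along(name) rather than against the names themselves.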

The dplyr verb filter() can be used to obtain a subgraph that satisfies given conditions on the nodes or on the edges. Note that filtering on nodes also affects the edges: if a node does not satisfy the condition, all edges connected to that node are removed as well. The reverse does not hold: filtering on edges leaves every node in the graph, including nodes that end up without any edges. The example below keeps only the edges that involve the Medici.

flo_tidy %>%
    activate("edges") %>%
    filter(to_medici) %>%
    ggraph("stress", bbox = 10) +
    geom_edge_link0(edge_color = "black") +
    geom_node_point(shape = 21, size = 10, fill = "grey66") +
    geom_node_text(aes(label = name)) +
    theme_graph()
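The asymmetry between node and edge filtering can be checked on a small path graph. This is a sketch with made-up nodes and a hypothetical kind attribute, not part of the Florentine data.

```r
library(tidygraph)

# Hypothetical path graph A - B - C with a labeled edge attribute
path_graph <- tbl_graph(
  nodes = data.frame(name = c("A", "B", "C")),
  edges = data.frame(from = c("A", "B"), to = c("B", "C"), kind = c("x", "y")),
  directed = FALSE
)

# Filtering nodes: dropping B also drops both edges attached to it
without_b <- path_graph %>%
  activate("nodes") %>%
  filter(name != "B")

# Filtering edges: all three nodes stay, only the edge set shrinks
only_x <- path_graph %>%
  activate("edges") %>%
  filter(kind == "x")

c(nodes = igraph::gorder(without_b), edges = igraph::gsize(without_b))
c(nodes = igraph::gorder(only_x), edges = igraph::gsize(only_x))
```

After the node filter the graph has two nodes and no edges. After the edge filter it still has three nodes but only one edge, so node C remains as an isolate.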

12.3 Joins

12.4 New Verbs