Network Analysis in R

David Schoch

GESIS - Leibniz Institute for the Social Sciences

My R universe

A Short Introduction to Network Analysis

Why study networks?

conventional research methods are often individual-based, and our models tend to describe relations between variables

but nature and culture are structured as networks

  • society
  • brain (neural networks)
  • organizations (who reports to whom)
  • economies (who sells to whom)
  • ecologies (who eats whom)

Position within a network is important for predicting outcomes

From “ordinary” to network data


atomic data
individuals or entities

dyadic data
dependent pairs of individuals (e.g. couples)
but treated as independent entities

networks
interdependent and overlapping dyads
usual (statistical) independence assumptions do not hold

Levels of Analysis


dyad level
Fundamental unit of network data collection
(“Does sharing offices lead to friendship?”)

node level
Aggregation of dyad level measurement
(“Do actors with more friends have a stronger immune system?”)

network level
Assessing the overall structure of a network
(“Do well connected networks diffuse ideas faster?”)

more levels are possible (triads, groups, …)

Types of relations I


Relational states

  • Similarities: location, participation, attribute
  • Relational roles: kinship, other roles
  • Relational cognition: affective, perceptual

Relational events

  • Interactions: sold to, talked to, helped, …
  • Flows: information, belief, money

Types of relations II


undirected
symmetric relation

directed
asymmetric relation, but can be bi-directional

valued
strength of relation, frequency of contact, etc.

signed
positive and negative relations

or a mixture thereof
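
A minimal sketch of how these relation types can be stored in an igraph object. The toy graph and all attribute values are made up for illustration; keeping signs in an edge attribute called sign is assumed here because that is the convention signnet uses.

```r
library(igraph)

# hypothetical toy graph: 1 -> 2 and 2 -> 1 (bi-directional pair), 2 -> 3
g_rel <- make_graph(c(1,2, 2,1, 2,3), directed = TRUE)

is_directed(g_rel)              # directed relation
mutual <- which_mutual(g_rel)   # TRUE for edges that are reciprocated

# valued: store strength or frequency in the edge attribute "weight"
E(g_rel)$weight <- c(3, 1, 2)
is_weighted(g_rel)

# signed: store +1 / -1 in an edge attribute "sign"
E(g_rel)$sign <- c(1, 1, -1)
```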

Goals of analysis

Network variables as independent/explanatory

Using network theory to explain the consequences of network properties

social capital, brokerage, adoption of innovation

Network variables as dependent/outcomes

Using ______ theory to explain the antecedents of a network

homophily, balance theory

R Ecosystem for Network Analysis

What is “base R” for networks?

CRAN dependencies on igraph, graph, network

Which package to choose?


use igraph if

  • you need speed (large networks)
  • you need to use other SNA packages

use network/sna if

  • you need to do modeling (e.g. ERGMs and RSiena)

in most cases it does not make a difference, but never load both at the same time (several function names clash)!

Creating simple networks

library(igraph)

g1 <- make_graph(c(1,2, 1,3, 2,3, 2,4, 3,5, 4,5), n = 5, directed = FALSE)
g2 <- graph_from_literal(Joey-Chandler:Monica-Ross, Joey-Ross-Rachel)

Special graphs

g3 <- make_full_graph(n = 10)
g4 <- make_ring(n = 10)
g5 <- make_empty_graph(n = 10)

ls("package:igraph", pattern = "^make_")

Random graphs

g6 <- sample_gnp(n = 100,p = 0.1)
g7 <- sample_pa(n = 100, power = 1.5, m = 1, directed = FALSE)

ls("package:igraph", pattern = "^sample_")

igraph objects

g2
IGRAPH 4972f0f UN-- 5 6 -- 
+ attr: name (v/c)
+ edges from 4972f0f (vertex names):
[1] Joey    --Chandler Joey    --Monica   Joey    --Ross     Chandler--Ross    
[5] Monica  --Ross     Ross    --Rachel  


library(netUtils)
str(g2)
-----------------------------------------------------------
UNNAMED NETWORK (undirected, unweighted, one-mode network)
-----------------------------------------------------------
Nodes: 5, Edges: 6, Density: 0.6, Components: 1, Isolates: 0
-Vertex Attributes:
 name(c): Joey, Chandler, Monica, Ross, Rachel ...
---
-Edges: 
 Joey--Chandler Joey--Monica Joey--Ross Chandler--Ross Monica--Ross
Ross--Rachel

Attributes

node attributes

V(g2)$name
[1] "Joey"     "Chandler" "Monica"   "Ross"     "Rachel"  
V(g2)$gender <- c("M","M","F","M","F") 
# g2 <- set_vertex_attr(g2, "gender", value = c("M","M","F","M","F"))

edge attributes

E(g2)
+ 6/6 edges from 4972f0f (vertex names):
[1] Joey    --Chandler Joey    --Monica   Joey    --Ross     Chandler--Ross    
[5] Monica  --Ross     Ross    --Rachel  
E(g2)$weight <- sample(1:5,size = 6, replace = TRUE)
# g2 <- set_edge_attr(g2, "weight", value = sample(1:5,size = 6, replace = TRUE))

Attributes

g2
IGRAPH 4972f0f UNW- 5 6 -- 
+ attr: name (v/c), gender (v/c), weight (e/n)
+ edges from 4972f0f (vertex names):
[1] Joey    --Chandler Joey    --Monica   Joey    --Ross     Chandler--Ross    
[5] Monica  --Ross     Ross    --Rachel  


str(g2)
---------------------------------------------------------
UNNAMED NETWORK (undirected, weighted, one-mode network)
---------------------------------------------------------
Nodes: 5, Edges: 6, Density: 0.6, Components: 1, Isolates: 0
-Vertex Attributes:
 name(c): Joey, Chandler, Monica, Ross, Rachel ...
 gender(c): M, M, F, M, F ...
---
-Edge Attributes:
 weight(n): 2, 2, 1, 3, 1, 1 ...
---
-Edges: 
 Joey--Chandler Joey--Monica Joey--Ross Chandler--Ross Monica--Ross
Ross--Rachel

Network representations: adjacency matrix

A <- matrix(
  c(0, 1, 1,
    1, 0, 1,
    1, 1, 0),
  nrow = 3, ncol = 3, byrow = TRUE)
rownames(A) <- c("Bob","Ann","Steve")
colnames(A) <- c("Bob","Ann","Steve")
A
      Bob Ann Steve
Bob     0   1     1
Ann     1   0     1
Steve   1   1     0
  • \(A_{ij}=1\) if there is an edge between \(i\) and \(j\)
  • \(A\) is symmetric for undirected networks
  • If \(A_{ij}>1\) then the values are interpreted as weights
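
The properties listed above can be checked directly in R. This is a small sketch with a made-up weighted matrix W (not part of the original example); it assumes igraph is loaded.

```r
library(igraph)

# hypothetical symmetric matrix with entries > 1, read as edge weights
W <- matrix(
  c(0, 2, 0,
    2, 0, 5,
    0, 5, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Bob","Ann","Steve"), c("Bob","Ann","Steve")))

isSymmetric(W)   # TRUE: consistent with an undirected network

# weighted = TRUE turns the matrix entries into the edge attribute "weight"
g_w <- graph_from_adjacency_matrix(W, mode = "undirected", weighted = TRUE)
E(g_w)$weight
```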

Network representation: edgelist


el <- matrix(
  c("Bob","Ann",
    "Bob","Steve",
    "Ann","Steve"),
  nrow = 3,ncol = 2, byrow = TRUE)
el
     [,1]  [,2]   
[1,] "Bob" "Ann"  
[2,] "Bob" "Steve"
[3,] "Ann" "Steve"

more efficient for sparse data (null edges aren’t stored)

Networks from matrices and lists

adjacency matrix

graph_from_adjacency_matrix(
  A,
  mode = "undirected",
  weighted = NULL,
  diag = FALSE)
IGRAPH b56c006 UN-- 3 3 -- 
+ attr: name (v/c)
+ edges from b56c006 (vertex names):
[1] Bob--Ann   Bob--Steve Ann--Steve

edgelist

graph_from_edgelist(el, directed = FALSE)
IGRAPH 0637008 UN-- 3 3 -- 
+ attr: name (v/c)
+ edges from 0637008 (vertex names):
[1] Bob--Ann   Bob--Steve Ann--Steve
ls("package:igraph", pattern = "^graph_from_")

Reading network data


Data is already in R (e.g. networkdata)
No extra work

Data was processed in another SNA tool

read_graph(file, format = c("edgelist", "pajek", "ncol", "lgl",
  "graphml", "dimacs", "graphdb", "gml", "dl"), ...)

Some extra work (with some issues)

Data is in a csv/spreadsheet/..
read.table(), read.csv(), readxl, readr,…

Preparing network data with attributes

Organize network data in two separate files

edge list (first file)

from             to
Arizona Robbins  Leah Murphy
Alex Karev       Leah Murphy
Arizona Robbins  Lauren Boswell
Arizona Robbins  Callie Torres
Erica Hahn       Callie Torres
Alex Karev       Callie Torres

node attributes (second file)

name                sex  birthyear
Addison Montgomery  F    1967
Adele Webber        F    1949
Teddy Altman        F    1969
Amelia Shepherd     F    1981
Arizona Robbins     F    1976
Rebecca Pope        F    1975


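# el: edge list data frame (first two columns: from, to)
# vertices: node attribute data frame (first column: node name)
# directed: TRUE or FALSE
graph_from_data_frame(el, directed = FALSE, vertices = vertices)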
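
A self-contained sketch of the two-file workflow. The data frames edges_df and nodes_df and their contents are hypothetical stand-ins for the two files; in practice they would come from read.csv() or similar.

```r
library(igraph)

# hypothetical edge list (stand-in for the first file)
edges_df <- data.frame(from = c("A", "A", "B"),
                       to   = c("B", "C", "C"))

# hypothetical node attributes (stand-in for the second file);
# "D" has no edges and only enters the network via this table
nodes_df <- data.frame(name  = c("A", "B", "C", "D"),
                       group = c("x", "x", "y", "y"))

g_df <- graph_from_data_frame(d = edges_df, directed = FALSE, vertices = nodes_df)

vcount(g_df)     # 4 nodes, including the isolate "D"
V(g_df)$group    # attributes are attached to the nodes
```

Every name that occurs in the edge list has to appear in the first column of the attribute table, otherwise graph_from_data_frame() stops with an error.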

Descriptive Statistics and some graph theory

Toy example

data("greys")

Density


The density of a network is defined as the fraction of the potential edges in a network that are actually present.

c(edge_density(make_empty_graph(10)), 
  edge_density(greys), 
  edge_density(make_full_graph(10)))
[1] 0.0000000 0.0398323 1.0000000
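
For an undirected network without loops, the definition amounts to m / (n(n-1)/2) for n nodes and m edges. A quick check on a toy ring (the ring is an illustrative assumption, not one of the datasets used here):

```r
library(igraph)

g_ring <- make_ring(10)
n <- vcount(g_ring)   # 10 nodes
m <- ecount(g_ring)   # 10 edges

# present edges divided by the number of potential edges
dens_manual <- m / (n * (n - 1) / 2)

all.equal(dens_manual, edge_density(g_ring))
```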

Shortest Paths

A shortest path is a path that connects two nodes in a network with a minimal number of edges. The length of a shortest path is called the distance between two nodes.

shortest_paths(greys,from = "Alex Karev",to = "Owen Hunt",output = "vpath")
$vpath
$vpath[[1]]
+ 5/54 vertices, named, from f7716f1:
[1] Alex Karev         Addison Montgomery Mark Sloan         Teddy Altman      
[5] Owen Hunt         


$epath
NULL

$predecessors
NULL

$inbound_edges
NULL

Shortest Paths

Distances

distances(greys)[1:5,1:5]
                   Addison Montgomery Adele Webber Teddy Altman Amelia Shepherd
Addison Montgomery                  0          Inf            2               2
Adele Webber                      Inf            0          Inf             Inf
Teddy Altman                        2          Inf            0               2
Amelia Shepherd                     2          Inf            2               0
Arizona Robbins                     3          Inf            3               3
                   Arizona Robbins
Addison Montgomery               3
Adele Webber                   Inf
Teddy Altman                     3
Amelia Shepherd                  3
Arizona Robbins                  0

The Grey’s Anatomy network is disconnected (\(4\) connected components)
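
The Inf entries in a distance matrix mark pairs of nodes in different components. A sketch on a small made-up graph, assuming igraph is loaded:

```r
library(igraph)

# hypothetical graph with three components: {1,2,3}, {4,5} and the isolate {6}
g_toy <- make_graph(c(1,2, 2,3, 4,5), n = 6, directed = FALSE)

d_toy <- distances(g_toy)
d_toy[1, 3]   # 2: path 1 - 2 - 3
d_toy[1, 4]   # Inf: no path between different components

n_comp <- components(g_toy)$no   # number of connected components
```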

Diameter

The length of the longest shortest path is called the diameter of the network.

diameter(greys)
[1] 8

Transitivity

Transitivity is a measure of the degree to which nodes in a graph tend to cluster together. This is also called the clustering coefficient.

local
gives an indication of the embeddedness of single nodes

global
indication of the clustering in the network

\[ \frac{3 \times \text{number of triangles} }{\text{total number of connected triplets}} \]
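
The global formula can be reproduced by hand. The sketch below uses a made-up graph (a triangle with one pendant node) and counts connected triplets as pairs of neighbors around each node:

```r
library(igraph)

# hypothetical graph: triangle 1-2-3 plus the pendant edge 3-4
g_tri <- make_graph(c(1,2, 2,3, 1,3, 3,4), directed = FALSE)

# each triangle is counted once per corner, hence divide by 3
n_triangles <- sum(count_triangles(g_tri)) / 3

# a node with degree d is the center of choose(d, 2) connected triplets
n_triplets <- sum(choose(degree(g_tri), 2))

trans_manual <- 3 * n_triangles / n_triplets   # 3 * 1 / 5 = 0.6

all.equal(trans_manual, transitivity(g_tri, type = "global"))
```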

Transitivity

data("coleman")
g <- as.undirected(coleman[[1]])
transitivity(g, type = "global")
[1] 0.440083
transitivity(g, type = "local", isolates = "zero")
 [1] 0.400000 1.000000 0.000000 0.500000 0.666667 1.000000 0.000000 0.000000
 [9] 0.533333 0.000000 0.400000 1.000000 0.333333 0.333333 0.000000 0.333333
[17] 0.000000 0.400000 0.333333 0.428571 0.303030 0.377778 0.305556 0.000000
[25] 0.000000 0.266667 0.333333 0.000000 0.000000 1.000000 0.166667 0.400000
[33] 0.700000 1.000000 0.333333 0.700000 0.666667 0.285714 0.200000 0.200000
[41] 0.466667 0.600000 0.266667 0.000000 0.700000 0.500000 0.600000 0.466667
[49] 0.666667 0.571429 0.333333 0.571429 0.666667 0.266667 0.357143 0.666667
[57] 0.600000 1.000000 0.500000 0.600000 0.666667 0.500000 0.800000 0.700000
[65] 0.600000 0.714286 0.866667 0.244444 0.714286 0.377778 0.488889 0.000000
[73] 0.000000


In empirical networks, we often observe a tendency towards high transitivity (“the friend of a friend is a friend”)

Degree distribution


The degree of a node in a network is the number of connections it has to other nodes.

The degree distribution is the probability distribution of the degrees over the whole network.

Empirical degree distributions are generally right skewed.
(Many nodes have few connections and a few have many) “preferential attachment”, “Matthew effect”, “the rich get richer”
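
A small sketch of how degree() and degree_distribution() relate, using a made-up star graph as an extreme case of a skewed distribution:

```r
library(igraph)

# hypothetical star: one hub connected to four leaves
g_star <- make_star(5, mode = "undirected")

degree(g_star)   # 4 1 1 1 1

# relative frequencies for degree 0, 1, ..., max degree;
# dd[k + 1] is the share of nodes with degree k
dd <- degree_distribution(g_star)
dd               # 0.0 0.8 0.0 0.0 0.2
```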

Degree distribution


er <- sample_gnp(n = 5000, p = 0.01)
pa <- sample_pa(n = 5000, power = 1.5, m = 2, directed = FALSE)


plot(degree_distribution(er))

plot(degree_distribution(pa),log = "xy")

Centrality

A measure of centrality is an index that assigns a numeric value to each node of the network. The higher the value, the more central the node.

“Being central” is a very ambiguous term, hence there exists a large variety of indices that assess centrality based on very different structural properties.

Standard indices

Degree
Number of direct neighbors (“popularity”)

Closeness
Reciprocal of the sum of the lengths of the shortest paths to all other nodes

Betweenness
Number of shortest paths that pass through a node (“brokerage”)

Eigenvector
Being central means being connected to other central nodes

PageRank
Similar to eigenvector, just for directed networks
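
A sketch of the standard indices on a made-up star graph, where the hub should come out on top for every index. The values in the comments follow from the definitions above (unnormalized versions, which are the igraph defaults).

```r
library(igraph)

# hypothetical star: node 1 is the hub, nodes 2-5 are leaves
g_cent <- make_star(5, mode = "undirected")

deg <- degree(g_cent)        # hub: 4 direct neighbors
clo <- closeness(g_cent)     # hub: 1 / (1+1+1+1) = 0.25, leaves: 1 / (1+2+2+2)
btw <- betweenness(g_cent)   # hub: lies on all choose(4, 2) = 6 leaf-to-leaf paths
eig <- eigen_centrality(g_cent)$vector
pr  <- page_rank(g_cent)$vector

# node id of the most central node according to each index
sapply(list(degree = deg, closeness = clo, betweenness = btw,
            eigenvector = eig, pagerank = pr), which.max)
```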

Implemented indices

igraph contains the following 11 indices:

  • degree (degree())
  • weighted degree (strength())
  • betweenness (betweenness())
  • closeness (closeness())
  • eigenvector (eigen_centrality())
  • alpha centrality (alpha_centrality())
  • power centrality (power_centrality())
  • PageRank (page_rank())
  • eccentricity (eccentricity())
  • hubs and authorities (authority_score() and hub_score())
  • subgraph centrality (subgraph_centrality())

Indices in the sna package

The sna package implements roughly the same indices as igraph but adds:

  • flow betweenness (flowbet())
  • load centrality (loadcent())
  • Gil-Schmidt Power Index (gilschmidt())
  • information centrality (infocent())
  • stress centrality (stresscent())

Dedicated packages

centiserve, CINNA

library(centiserve)
as.character(lsf.str("package:centiserve"))
 [1] "averagedis"            "barycenter"            "bottleneck"           
 [4] "centroid"              "closeness.currentflow" "closeness.freeman"    
 [7] "closeness.latora"      "closeness.residual"    "closeness.vitality"   
[10] "clusterrank"           "communibet"            "communitycent"        
[13] "crossclique"           "decay"                 "diffusion.degree"     
[16] "dmnc"                  "entropy"               "epc"                  
[19] "geokpath"              "hubbell"               "katzcent"             
[22] "laplacian"             "leaderrank"            "leverage"             
[25] "lincent"               "lobby"                 "markovcent"           
[28] "mnc"                   "pairwisedis"           "radiality"            
[31] "salsa"                 "semilocal"             "topocoefficient"      

centiserver lists more than 400 indices

CINNA:
Computing and comparing top informative centrality measures

Alternatives?

netrankr:
Tools which allow an index-free assessment of centrality, including:

  • partial centrality
  • expected centrality
  • probabilistic centrality

Help

Tutorials and material for using netrankr can be found at https://schochastics.github.io/netrankr/

Cohesive groups

Cohesive subgroups are subsets of actors among whom there are relatively strong, direct, intense, frequent, or positive ties.

Methods that formalize the intuitive and theoretical notion of social group using social network properties

Cliques

A clique in a network is a set of nodes that form a complete subnetwork within a network (called a complete subgraph).

A maximal clique is a clique that cannot be extended to a bigger clique by adding more nodes to it.
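
The difference between a clique and a maximal clique can be seen on a tiny made-up graph (a triangle with one pendant node), assuming igraph is loaded:

```r
library(igraph)

# hypothetical graph: triangle 1-2-3 plus the pendant edge 3-4
g_cl <- make_graph(c(1,2, 2,3, 1,3, 3,4), directed = FALSE)

# {1,2} is a clique but not maximal, because node 3 can be added;
# the maximal cliques are {1,2,3} and {3,4}
all_max <- max_cliques(g_cl)
big_max <- max_cliques(g_cl, min = 3)   # only maximal cliques with >= 3 nodes

length(all_max)    # 2
clique_num(g_cl)   # 3: size of the largest clique
```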

data("clique_graph")

Cliques

Maximal cliques can be calculated with max_cliques()

# only return cliques with three or more nodes
cl <- max_cliques(clique_graph,min = 3)
cl
[[1]]
+ 3/30 vertices, from 0193e05:
[1]  9 17 18

[[2]]
+ 3/30 vertices, from 0193e05:
[1] 7 4 5

[[3]]
+ 3/30 vertices, from 0193e05:
[1] 7 4 8

[[4]]
+ 3/30 vertices, from 0193e05:
[1] 10  2 11

[[5]]
+ 3/30 vertices, from 0193e05:
[1] 16 12 15

[[6]]
+ 3/30 vertices, from 0193e05:
[1] 6 1 5

[[7]]
+ 4/30 vertices, from 0193e05:
[1] 12 13 15 14

[[8]]
+ 3/30 vertices, from 0193e05:
[1] 12  2  1

[[9]]
+ 5/30 vertices, from 0193e05:
[1] 1 2 5 4 3

k-core decomposition

A k-core is a subgraph in which every node has at least k neighbors within the subgraph. A k-core is thus a relaxed version of a clique.

kcore <- coreness(clique_graph)
kcore
 [1] 4 4 4 4 4 3 2 2 2 2 2 3 3 3 3 3 2 2 1 1 1 1 1 1 1 1 1 1 1 1
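
To see how coreness() relates to the definition, here is a sketch on a made-up graph where the result can be verified by eye:

```r
library(igraph)

# hypothetical graph: triangle 1-2-3 plus the pendant edge 3-4
g_core <- make_graph(c(1,2, 2,3, 1,3, 3,4), directed = FALSE)

# nodes 1-3 each have 2 neighbors inside the triangle (2-core);
# node 4 has a single neighbor and only belongs to the 1-core
core_toy <- coreness(g_core)
core_toy   # 2 2 2 1

# extract the 2-core as a subgraph
g_2core <- induced_subgraph(g_core, which(core_toy >= 2))
```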

Clustering/Community detection

Minimum-cut method
cut the graph into partitions such that some metric is minimized

Hierarchical clustering
Agglomerative/Divisive methods to build a hierarchy of clusters
based on node similarity

Modularity Maximization
Modularity is defined as the fraction of edges that fall within given groups minus the expected fraction if edges were placed at random

Statistical inference
stochastic blockmodeling based on generative models
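
The modularity definition can be made concrete with a hand computation. The sketch assumes the usual null model in which the expected fraction for a group is (sum of its degrees / 2m)^2; the graph (two triangles joined by a bridge) and the partition are made up for illustration.

```r
library(igraph)

# hypothetical graph: triangles 1-2-3 and 4-5-6 joined by the bridge 3-4
g_mod <- make_graph(c(1,2, 2,3, 1,3, 4,5, 5,6, 4,6, 3,4), directed = FALSE)
grp <- c(1, 1, 1, 2, 2, 2)   # one group per triangle

m      <- ecount(g_mod)
deg    <- degree(g_mod)
el_mod <- as_edgelist(g_mod)

# per group: observed fraction of within-group edges minus expected fraction
Q_manual <- sum(sapply(unique(grp), function(k) {
  e_in <- sum(grp[el_mod[, 1]] == k & grp[el_mod[, 2]] == k)
  d_k  <- sum(deg[grp == k])
  e_in / m - (d_k / (2 * m))^2
}))

Q_manual   # 2 * (3/7 - 1/4) = 5/14
all.equal(Q_manual, modularity(g_mod, grp))
```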

Clustering with igraph


  • There is no agreed upon best method
  • Modularity maximization is still widely considered “state-of-the-art”
  • Generative models are, however, a strong contender
    (not implemented in R yet)
ls("package:igraph",pattern = "cluster_")
 [1] "cluster_edge_betweenness"  "cluster_fast_greedy"      
 [3] "cluster_fluid_communities" "cluster_infomap"          
 [5] "cluster_label_prop"        "cluster_leading_eigen"    
 [7] "cluster_leiden"            "cluster_louvain"          
 [9] "cluster_optimal"           "cluster_spinglass"        
[11] "cluster_walktrap"         

Clustering workflow

data("karate")

Clustering workflow

# compute clustering
clu <- cluster_louvain(karate)

# cluster membership vector
mem <- membership(clu)
mem
 [1] 1 1 1 1 2 2 2 1 3 3 2 1 1 1 3 3 2 1 3 1 3 1 3 4 4 4 3 4 4 3 3 4 3 3
# clusters as list
com <- communities(clu)
com
$`1`
 [1]  1  2  3  4  8 12 13 14 18 20 22

$`2`
[1]  5  6  7 11 17

$`3`
 [1]  9 10 15 16 19 21 23 27 30 31 33 34

$`4`
[1] 24 25 26 28 29 32
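
The same workflow as a self-contained sketch. It uses the Zachary karate club graph that ships with igraph (make_graph("Zachary")) as a stand-in for the karate dataset above, so node order and attributes may differ; the seed value is arbitrary.

```r
library(igraph)

g_z <- make_graph("Zachary")   # 34 nodes, bundled with igraph

set.seed(1)                    # Louvain is randomized; fix the seed to reproduce
clu_z <- cluster_louvain(g_z)

mem_z <- membership(clu_z)     # cluster id per node
sizes(clu_z)                   # number of nodes per cluster
q_z <- modularity(g_z, mem_z)  # modularity of the detected partition
```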

Clustering workflow


imc <- cluster_infomap(karate)
lec <- cluster_leading_eigen(karate)
loc <- cluster_louvain(karate)
sgc <- cluster_spinglass(karate)
wtc <- cluster_walktrap(karate)

scores <- c(infomap = modularity(karate,membership(imc)),
            eigen = modularity(karate,membership(lec)),
            louvain = modularity(karate,membership(loc)),
            spinglass = modularity(karate,membership(sgc)),
            walk = modularity(karate,membership(wtc)))

scores
  infomap     eigen   louvain spinglass      walk 
 0.402038  0.393409  0.418803  0.419790  0.353222 

Beyond “standard” networks

Two-mode networks

A two-mode network is a network that consists of two disjoint sets of nodes (like people and events)

Common examples include:

  • Affiliation networks (Membership in institutions)
  • Voting/Sponsorship networks (politicians and bills)
  • Citation network (authors and papers)
  • Co-Authorship networks (authors and papers)

Toy example

data("southern_women")

Analyzing two-mode networks


The adjacency matrix of a two-mode network is called the incidence matrix (rows: one node set, columns: the other)

A <- as_incidence_matrix(southern_women)
A[1:8, ]
          6/27 3/2 4/12 9/26 2/25 5/19 3/15 9/16 4/8 6/10 2/23 4/7 11/21 8/3
EVELYN       1   1    1    1    1    1    0    1   1    0    0   0     0   0
LAURA        1   1    1    0    1    1    1    1   0    0    0   0     0   0
THERESA      0   1    1    1    1    1    1    1   1    0    0   0     0   0
BRENDA       1   0    1    1    1    1    1    1   0    0    0   0     0   0
CHARLOTTE    0   0    1    1    1    0    1    0   0    0    0   0     0   0
FRANCES      0   0    1    0    1    1    0    1   0    0    0   0     0   0
ELEANOR      0   0    0    0    1    1    1    1   0    0    0   0     0   0
 [ reached getOption("max.print") -- omitted 1 row ]

tnet and bipartite offer some methods to analyse two-mode networks directly, by adapting tools for standard networks.

Projecting two-mode networks

B <- A%*%t(A)
B[1:5,1:5]
          EVELYN LAURA THERESA BRENDA CHARLOTTE
EVELYN         8     6       7      6         3
LAURA          6     7       6      6         3
THERESA        7     6       8      6         4
BRENDA         6     6       6      7         4
CHARLOTTE      3     3       4      4         4
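
How to read the projected matrix is easiest to see on a small made-up incidence matrix (three persons, three events; not the southern women data). Only base R is needed.

```r
# hypothetical incidence matrix: rows are persons, columns are events
A_toy <- matrix(
  c(1, 1, 0,
    1, 1, 1,
    0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3"), c("e1", "e2", "e3")))

B_toy <- A_toy %*% t(A_toy)

diag(B_toy)          # events attended per person: 2 3 1
B_toy["p1", "p2"]    # 2: p1 and p2 share the events e1 and e2
B_toy["p1", "p3"]    # 0: no shared event
```

Transposing the other way (t(A_toy) %*% A_toy) gives the event-by-event projection instead.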

Filtering projections

naïve
delete all edges with a weight less than x

advanced
statistical tools using null models: backbone
Introduction to the package
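
A minimal sketch of the naïve threshold filter, assuming a weighted projection is already available. The matrix B_w and the threshold x = 2 are made up for illustration; the statistical alternatives from backbone are not shown here.

```r
library(igraph)

# hypothetical weighted projection (off-diagonal: number of shared events)
B_w <- matrix(
  c(0, 2, 0,
    2, 0, 1,
    0, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3"), c("p1", "p2", "p3")))

g_proj <- graph_from_adjacency_matrix(B_w, mode = "undirected",
                                      weighted = TRUE, diag = FALSE)

x <- 2   # arbitrary threshold
g_filtered <- delete_edges(g_proj, E(g_proj)[weight < x])

ecount(g_proj)       # 2 edges before filtering
ecount(g_filtered)   # 1 edge left (p1 -- p2, weight 2)
```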

Signed networks


Signed networks include two types of relations:
positive (“friends”) and negative (“foes”)

typical research questions involve (implemented in signnet):

  • structural balance
  • blockmodeling
  • (centrality)

Structural balance theory

Beyond triangles
A network is balanced if it can be partitioned into two groups such that all intra-group edges are positive and all inter-group edges are negative

Extended form of balance (Davis 1960s)
A network is balanced if it can be partitioned into k groups …

Toy example


library(signnet)
data("tribes")
ggsigned(tribes)

ggsigned(tribes,weights = TRUE)

Measuring structural balance

  • triangles: Fraction of balanced triangles.
  • walks: fraction of signed to unsigned walks
  • frustration: optimal partition such that the sum of intra group negative and inter group positive edges is minimized
balance_score(tribes,method = "triangles")
[1] 0.867647
balance_score(tribes,method = "walk")
[1] 0.357576
balance_score(tribes,method = "frustration")
[1] 0.758621
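
To illustrate the triangle-based score: a triangle is balanced when the product of its three edge signs is positive. The sketch below computes the fraction by hand for a made-up complete signed network, given as a signed adjacency matrix. It is not the signnet implementation and only works as written for complete networks, where every triple of nodes forms a triangle.

```r
# hypothetical complete signed network: groups {1,2} and {3,4},
# positive ties within groups, negative ties between groups
S <- matrix(
  c( 0,  1, -1, -1,
     1,  0, -1, -1,
    -1, -1,  0,  1,
    -1, -1,  1,  0),
  nrow = 4, byrow = TRUE)

balanced_fraction <- function(S) {
  tri <- combn(nrow(S), 3)   # all node triples (= triangles in a complete graph)
  sign_prod <- apply(tri, 2, function(v) {
    S[v[1], v[2]] * S[v[2], v[3]] * S[v[1], v[3]]
  })
  mean(sign_prod > 0)        # share of triangles with positive sign product
}

score_balanced <- balanced_fraction(S)   # 1: perfectly balanced

# turn the tie between nodes 1 and 2 negative: two of four triangles break
S2 <- S
S2[1, 2] <- S2[2, 1] <- -1
score_perturbed <- balanced_fraction(S2)   # 0.5
```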

Blockmodeling

In signed blockmodeling, the goal is to determine \(k\) blocks of nodes such that all intra-block edges are positive and inter-block edges are negative

set.seed(141)
bl <- signed_blockmodel(tribes,3)
bl
$membership
 [1] 3 3 1 1 2 1 1 1 2 2 1 1 2 2 3 3

$criterion
[1] 2
ggblock(tribes,blocks = bl$membership,show_blocks = TRUE,show_labels = TRUE)

Generalized blockmodeling

The diagonal block structure is not always the best representation of the data

Generalized blockmodeling

The function signed_blockmodel_general() allows you to specify arbitrary block structures.

set.seed(424) #for reproducibility
blockmat <- matrix(c(1,-1,-1,-1,1,1,-1,1,-1),3,3,byrow = TRUE)
blockmat
     [,1] [,2] [,3]
[1,]    1   -1   -1
[2,]   -1    1    1
[3,]   -1    1   -1
general <- signed_blockmodel_general(g,blockmat,alpha = 0.5)
traditional <- signed_blockmodel(g,k = 3,alpha = 0.5,annealing = TRUE)

c(general$criterion,traditional$criterion)
[1] 0 6

Is there a “tidyverse” for networks?

tidygraph I

This package provides a tidy API for graph/network manipulation. While network data itself is not tidy, it can be envisioned as two tidy tables, one for node data and one for edge data. tidygraph provides a way to switch between the two tables and provides dplyr verbs for manipulating them.

tidygraph II

It more or less wraps the full functionality of igraph in a tidy API giving you access to almost all of the dplyr verbs plus a few more, developed for use with relational data.
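
A short sketch of what switching between the node and edge table looks like. The ring graph and the column names deg and weight are illustrative assumptions; the functions used (as_tbl_graph(), activate(), centrality_degree()) are part of tidygraph.

```r
library(igraph)
library(tidygraph)
library(dplyr)

tg <- as_tbl_graph(make_ring(5)) %>%
  activate(nodes) %>%                     # work on the node table
  mutate(deg = centrality_degree()) %>%   # igraph's degree(), wrapped for mutate()
  activate(edges) %>%                     # switch to the edge table
  mutate(weight = 1)

# pull each table out as a regular tibble
node_tbl <- tg %>% activate(nodes) %>% as_tibble()
edge_tbl <- tg %>% activate(edges) %>% as_tibble()
```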

Network Visualization

Visualizing networks is hard(er)

ordinary data

  • clear data format (rows: observations, columns: variables)
  • plot style dependent on variable scale (barchart, scatterplot, boxplot,…)
  • illustrate relations between variables
  • given relative positions

network data

  • different data formats (adjacency matrix, edgelist, adjacency list, …)
  • how to choose a proper layout algorithm?
  • more degrees of freedom
  • can we draw any conclusions from a network plot?

which package(s) to choose?


why ggraph?



maintained by Posit
stable and reliable (many other packages have been abandoned)
plays well with other packages

thoughtful API
extension of ggplot2
grammar of graphics

First full example (don't panic)

# load packages and manipulate data
library(igraph)
library(ggraph)
library(networkdata)

data("starwars")
sw1 <- starwars[[1]]
sw_palette <- c("#1A5878", "#C44237", "#AD8941", "#E99093", "#50594B")
V(sw1)$interactions <- strength(sw1)

#plot
ggraph(graph = sw1,layout = "stress") + 
  geom_edge_link0(edge_colour = "grey25",
                  aes(edge_width = weight)) +
  geom_node_point(shape = 21, color = "black",stroke = 1,
                  aes(fill = sex,size = interactions)) +
  geom_node_text(color = "black", size = 4, repel = FALSE, 
                 aes(filter = (interactions>=65),label = name))+
  scale_edge_width(range = c(0.1,1.5),guide = "none")+
  scale_size(range = c(3,10),guide = "none")+
  scale_fill_manual(values = sw_palette, na.value = "grey",name = "")+
  coord_fixed()+
  theme_graph() +
  theme(legend.position = "bottom") +
  guides(fill = guide_legend(override.aes = list(size=6)))

1) layout

ggraph(graph = sw1,layout = "stress", ...)

  • graph: igraph object sw1 (can also be a tidygraph object)
  • layout: used algorithm
  • …: additional parameters depend on algorithm

2) edges: geoms

geom_edge_link0(edge_colour = "grey25", aes(edge_width = weight))

ls("package:ggraph", pattern = "^geom_edge_")
 [1] "geom_edge_arc"       "geom_edge_arc0"      "geom_edge_arc2"     
 [4] "geom_edge_bend"      "geom_edge_bend0"     "geom_edge_bend2"    
 [7] "geom_edge_density"   "geom_edge_diagonal"  "geom_edge_diagonal0"
[10] "geom_edge_diagonal2" "geom_edge_elbow"     "geom_edge_elbow0"   
[13] "geom_edge_elbow2"    "geom_edge_fan"       "geom_edge_fan0"     
[16] "geom_edge_fan2"      "geom_edge_hive"      "geom_edge_hive0"    
[19] "geom_edge_hive2"     "geom_edge_link"      "geom_edge_link0"    
[22] "geom_edge_link2"     "geom_edge_loop"      "geom_edge_loop0"    
[25] "geom_edge_parallel"  "geom_edge_parallel0" "geom_edge_parallel2"
[28] "geom_edge_point"     "geom_edge_span"      "geom_edge_span0"    
[31] "geom_edge_span2"     "geom_edge_tile"     

geom_edge_type: generate n points, draw path
geom_edge_type0: direct line
geom_edge_type2: can interpolate node parameters

geom_edge_link0() and geom_edge_parallel0() suffice

2) edges: aesthetics

geom_edge_link0(edge_colour = "grey25", aes(edge_width = weight))


mapping aesthetics

  • global: all edges have the same appearance
    (e.g. edge_colour = "grey25")
  • via attributes: appearance depends on attribute
    (e.g. aes(edge_width = weight))

available aesthetics
edge_colo(u)r, edge_width, edge_linetype, edge_alpha

2) edges: aesthetics examples

# create a simple graph of three nodes where all nodes are connected
g <- make_full_graph(3)
E(g)$weight <- c(1, 2, 0.5)

ggraph(g, "stress") +
  geom_edge_link0(edge_linetype = "dotted")

2) edges: aesthetics examples

# create a simple graph of three nodes where all nodes are connected
g <- make_full_graph(3)
E(g)$weight <- c(1, 2, 0.5)

ggraph(g, "stress") +
  geom_edge_link0(edge_alpha = 0.5)

2) edges: aesthetics examples

# create a simple graph of three nodes where all nodes are connected
g <- make_full_graph(3)
E(g)$weight <- c(1, 2, 0.5)

ggraph(g, "stress") +
  geom_edge_link0(aes(edge_alpha = weight, edge_width = weight))

2) edges: misc

g <- make_full_graph(3, directed = TRUE)

ggraph(g, "stress") +
  geom_edge_parallel0(edge_width = 0.5,
    arrow = arrow(angle = 15, length = unit(0.15, "inches"),
                  ends = "last", type = "closed"))

3) nodes: geoms

geom_node_point(shape = 21, color = "black",stroke = 1, aes(fill = sex,size = interactions))

ls("package:ggraph", pattern = "^geom_node_")
[1] "geom_node_arc_bar" "geom_node_circle"  "geom_node_label"  
[4] "geom_node_point"   "geom_node_range"   "geom_node_text"   
[7] "geom_node_tile"    "geom_node_voronoi"


  • geom_node_point(): draw nodes as a simple point

3) nodes: aesthetics

geom_node_point(shape = 21, color = "black",stroke = 1, aes(fill = sex,size = interactions))


mapping aesthetics

  • global: all nodes have the same appearance
    (e.g. shape = 21)
  • via attributes: appearance depends on attribute
    (e.g. aes(fill = sex))

available aesthetics
alpha, colo(u)r, fill, shape, size, stroke
(usage of colour, fill, and stroke depend on shape)

3) nodes: aesthetic examples

g <- make_full_graph(3)

ggraph(g, "stress") +
  geom_edge_link0() +
  geom_node_point(size = 5, color = "red")

3) nodes: aesthetic examples

g <- make_full_graph(3)

ggraph(g, "stress") +
  geom_edge_link0() +
  geom_node_point(size = 5, shape = 21, color = "red", fill = "black", stroke = 2)

3) nodes: aesthetic examples

g <- make_full_graph(3)

ggraph(g, "stress") +
  geom_edge_link0() +
  geom_node_point(size = 5, shape = 22, color = "red", fill = "black", stroke = 2)

4) labels: geoms

geom_node_text(color = "black", size = 4, repel = FALSE, aes(filter = (interactions>=65),label = name))


  • geom_node_text(): add text to node
  • geom_node_label(): add text to node with frame



geom_node_text is the preferred choice

4) labels: aesthetics

geom_node_text(color = "black", size = 4, repel = FALSE, aes(filter = (interactions>=65),label = name))


mapping aesthetics

  • global: specify font properties
  • via attributes: set label to name attribute of node
  • filter: only display for nodes (or edges!) that fulfil a given criterion


available aesthetics
many! but most important: label, colour, family, size, and repel

4) labels: aesthetics examples

g <- make_full_graph(3)
V(g)$name <- c("David", "Termeh", "Luna")

ggraph(g, "stress") +
  geom_edge_link0() +
  geom_node_point(size = 5, color = "grey66")+
  geom_node_text(aes(label = name), color = "black") + coord_fixed(clip = "off")

4) labels: aesthetics examples

g <- make_full_graph(3)
V(g)$name <- c("David", "Termeh", "Luna")

ggraph(g, "stress") +
  geom_edge_link0() +
  geom_node_point(size = 5, color = "grey66")+
  geom_node_label(aes(label = name), color = "black") + coord_fixed(clip = "off")

5) scales

scale_edge_width(range = c(0.1,1.5),guide = "none")
scale_size(range = c(3,10),guide = "none")
scale_fill_manual(values = sw_palette, na.value = "grey",name = "")


control aesthetics that are mapped within aes()
although optional, set one scale_* per parameter in any aes()

form of scale functions
scale_<aes>_<variable type>()

additional options
guide (show legend or not), name (label in legend), na.value (value for NAs)

5) scales: variable types

node size and edge width (and node/edge alpha)
scale_size() and scale_edge_width()
most relevant parameter is range = c(min,max)

continuous variable to colour
scale_(edge_)colour_gradient(low = ...,high = ...)

categorical variable to colour
scale_colour_brewer()
scale_colour_manual(values = ...)

misc: scale_shape() and scale_edge_linetype()

5) scales: examples

ggraph(graph = sw1,layout = "stress") + 
  geom_edge_link0(aes(edge_width = weight)) +
  geom_node_point(size = 5, shape = 21, aes(fill = sex)) +
  scale_edge_width(range = c(0.1,1.5),guide = "none")+
  scale_fill_manual(values = sw_palette, na.value = "grey",name = "")+
  theme(legend.position = "bottom") 

5) scales: examples

ggraph(graph = sw1,layout = "stress") + 
  geom_edge_link0(aes(edge_width = weight), show.legend = FALSE) +
  geom_node_point(size = 5, shape = 21, aes(fill = sex)) +
  theme(legend.position = "bottom") 

5) scales: examples

ggraph(graph = sw1,layout = "stress") + 
  geom_edge_link0(aes(edge_width = weight)) +
  geom_node_point(size = 5, shape = 21, aes(fill = sex)) +
  scale_edge_width(range = c(0.1,1.5),guide = "none")+
  scale_fill_brewer(palette = "Set1", na.value = "grey",name = "")+
  theme(legend.position = "bottom") 

5) scales: examples

ggraph(graph = sw1,layout = "stress") + 
  geom_edge_link0(aes(edge_width = weight)) +
  geom_node_point(size = 5, fill = "grey25", aes(shape = sex)) +
  scale_edge_width(range = c(0.1,1.5),guide = "none")+
  scale_shape_manual(values=21:24, na.value = 25,name = "")+
  theme(legend.position = "bottom") 

6) themes

theme_graph() + theme(legend.position = "bottom")

control the overall look of the plot

  • theme() has a lot of options but we really don’t need them (except legend.position)
  • theme_graph() erases all defaults (e.g. axis, grids, etc.)


guides(fill = guide_legend(override.aes = list(size=6)))

change appearance of geoms in legend (highly optional!)

summary

layout
ggraph(graph,layout = "stress") +

edges
geom_edge_link0(<global>,aes(<via variables>)) +

nodes
geom_node_point(<global>,aes(<via variables>)) +
geom_node_text(<global>,aes(<via variables>)) +

scales
scale_<aes>_<variable type>() + (one per variable in aes())

themes
theme_graph()
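
The template above, filled in with a made-up toy graph so that it runs without external data. The node attribute grp and the edge weights are illustrative assumptions.

```r
library(igraph)
library(ggraph)

# hypothetical toy data: complete graph on four nodes (6 edges)
g_demo <- make_full_graph(4)
V(g_demo)$name <- c("a", "b", "c", "d")
V(g_demo)$grp  <- c("x", "x", "y", "y")
E(g_demo)$weight <- 1:6

p <- ggraph(g_demo, layout = "stress") +                             # layout
  geom_edge_link0(edge_colour = "grey25", aes(edge_width = weight)) + # edges
  geom_node_point(shape = 21, size = 5, aes(fill = grp)) +            # nodes
  geom_node_text(aes(label = name)) +                                 # labels
  scale_edge_width(range = c(0.1, 1.5), guide = "none") +             # scales
  scale_fill_brewer(palette = "Set1", name = "") +
  theme_graph()                                                       # theme

p   # printing the object draws the plot
```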

go beyond the standard layout

layout stress is sufficient for most network visualization tasks

ego centric layout

Emphasize the position of certain nodes in the network.

ggraph(sw1,layout = "focus",focus = 19)+
  draw_circle(col = "#00BFFF", use = "focus",max.circle = 3)+
  geom_edge_link0(edge_colour = "grey25",edge_alpha = 0.5)+
  geom_node_point(shape = 21,size = 5,fill = "grey66")+
  geom_node_text(aes(filter = (name=="ANAKIN"),label = name))+
  theme_graph()+
  coord_fixed()


  • focus=... : id to be put in the center (other nodes are on concentric circles around it)
  • draw_circle: draw the concentric circles

centrality layout


concentric circle layout according to a centrality index

strength <- igraph::strength(sw1)
ggraph(sw1,layout = "centrality",cent = strength)+
  draw_circle(col = "#00BFFF", use = "cent")+
  geom_edge_link0(edge_colour = "grey25",edge_alpha = 0.5)+
  geom_node_point(shape = 21,size = 5,fill = "grey66")+
  geom_node_text(aes(filter = (strength>=45),label = name),repel = TRUE)+
  theme_graph()+
  coord_fixed()

backbone layout

layout_as_backbone() can help emphasize hidden group structures

g <- sample_islands(9,40,0.4,15)
g <- simplify(g)
V(g)$grp <- as.character(rep(1:9,each = 40))


try the standard first

ggraph(g,layout = "stress")+
  geom_edge_link0(edge_colour = "black",edge_width = 0.1, edge_alpha = 0.5)+
  geom_node_point(shape = 21, size = 3, aes(fill = grp))+
  scale_fill_brewer(palette = "Set1")+
  theme_graph()+
  theme(legend.position = "none")

backbone layout



try to reveal the hidden group structure with layout="backbone"

ggraph(g, layout = "backbone")+
  geom_edge_link0(edge_colour = "black",edge_width = 0.1, edge_alpha = 0.5)+
  geom_node_point(shape = 21,size = 3, aes(fill = grp))+
  scale_fill_brewer(palette = "Set1")+
  theme_graph()+
  theme(legend.position = "none")

backbone layout

facebook friendships of a university. Node colour corresponds to dormitory of students

large networks

layout summary

most useful layouts
layout = "stress": all purpose layout algorithm
layout = "focus": ego-centric type layouts
layout = "centrality": concentric centrality layout
layout = "backbone": emphasize a group structure (if it exists)
layout = "sparse_stress": large networks

not covered here
multilevel layouts
dynamic layouts
constrained stress layout algorithm

miscellaneous

Do not recompute layout continuously

lay <- create_layout(g,"stress")

ggraph(lay) + 
  geom_edge_link0() +
  geom_node_point()

But the layout needs to be recomputed if attributes change! Alternatively, store only the coordinates and pass them to a manual layout:

xy <- layout_with_stress(g)

ggraph(g, "manual", x = xy[,1],y = xy[,2]) + 
  geom_edge_link0() +
  geom_node_point()

edgebundling & flow maps

Using the package edgebundle

Summary of the R Ecosystem

  • Analyze “standard” networks with igraph

  • Analyze two-mode network projections with backbone

  • Analyze signed networks with signnet

  • Missing something? try netUtils

  • Need toy data? use networkdata

  • Need a GUI to analyze networks in RStudio? try snahelper

  • “I like my networks tidy” tidygraph

  • “Is there a ggplot2 for networks?” ggraph

  • “I cannot do x because it is not implemented in R” netUtils

Resources beyond this talk