g1 <- make_graph(c(1,2, 1,3, 2,3, 2,4, 3,5, 4,5), n = 5, dir = FALSE)
g2 <- graph_from_literal(Joey-Chandler:Monica-Ross, Joey-Ross-Rachel)
GESIS - Leibniz Institute for the Social Sciences


conventional research methods are often individual-based and our models tend to describe relations between variables
but nature and culture are structured as networks
Position within a network is important for predicting outcomes
atomic data
individuals or entities
dyadic data
dependent pairs of individuals (e.g. couples)
but treated as independent entities
networks
interdependent and overlapping dyads
usual (statistical) independence assumptions do not hold
dyad level
Fundamental unit of network data collection
(“Does sharing offices lead to friendship?”)
node level
Aggregation of dyad level measurement
(“Do actors with more friends have a stronger immune system?”)
network level
Assessing the overall structure of a network
(“Do well connected networks diffuse ideas faster?”)
more levels are possible (triads, groups, …)
Relational states
Relational events
undirected
symmetric relation
directed
asymmetric relation, but can be bi-directional
valued
strength of relation, frequency of contact, etc.
signed
positive and negative relations
or a mixture thereof
Network variables as independent/explanatory
Using network theory to explain the consequences of network properties
social capital, brokerage, adoption of innovation
Network variables as dependent/outcomes
Using ______ theory to explain the antecedents of a network
homophily, balance theory

CRAN dependencies on igraph, graph, network
use igraph if
use network/sna if
does not make a difference in most cases, but never load both at the same time!







IGRAPH 4972f0f UN-- 5 6 --
+ attr: name (v/c)
+ edges from 4972f0f (vertex names):
[1] Joey --Chandler Joey --Monica Joey --Ross Chandler--Ross
[5] Monica --Ross Ross --Rachel
-----------------------------------------------------------
UNNAMED NETWORK (undirected, unweighted, one-mode network)
-----------------------------------------------------------
Nodes: 5, Edges: 6, Density: 0.6, Components: 1, Isolates: 0
-Vertex Attributes:
name(c): Joey, Chandler, Monica, Ross, Rachel ...
---
-Edges:
Joey--Chandler Joey--Monica Joey--Ross Chandler--Ross Monica--Ross
Ross--Rachel
node attributes
[1] "Joey" "Chandler" "Monica" "Ross" "Rachel"
IGRAPH 4972f0f UNW- 5 6 --
+ attr: name (v/c), gender (v/c), weight (e/n)
+ edges from 4972f0f (vertex names):
[1] Joey --Chandler Joey --Monica Joey --Ross Chandler--Ross
[5] Monica --Ross Ross --Rachel
---------------------------------------------------------
UNNAMED NETWORK (undirected, weighted, one-mode network)
---------------------------------------------------------
Nodes: 5, Edges: 6, Density: 0.6, Components: 1, Isolates: 0
-Vertex Attributes:
name(c): Joey, Chandler, Monica, Ross, Rachel ...
gender(c): M, M, F, M, F ...
---
-Edge Attributes:
weight(n): 2, 2, 1, 3, 1, 1 ...
---
-Edges:
Joey--Chandler Joey--Monica Joey--Ross Chandler--Ross Monica--Ross
Ross--Rachel
A <- matrix(
c(0, 1, 1,
1, 0, 1,
1, 1, 0),
nrow = 3, ncol = 3, byrow = TRUE)
rownames(A) <- c("Bob","Ann","Steve")
colnames(A) <- c("Bob","Ann","Steve")
A
      Bob Ann Steve
Bob     0   1     1
Ann     1   0     1
Steve   1   1     0
[,1] [,2]
[1,] "Bob" "Ann"
[2,] "Bob" "Steve"
[3,] "Ann" "Steve"
more efficient for sparse data (null edges aren’t stored)
adjacency matrix
IGRAPH b56c006 UN-- 3 3 --
+ attr: name (v/c)
+ edges from b56c006 (vertex names):
[1] Bob--Ann Bob--Steve Ann--Steve
edgelist
IGRAPH 0637008 UN-- 3 3 --
+ attr: name (v/c)
+ edges from 0637008 (vertex names):
[1] Bob--Ann Bob--Steve Ann--Steve
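The two printouts above can be reproduced from the two representations roughly as follows. This is a minimal sketch: the object names `el`, `g_adj`, and `g_el` are illustrative and do not come from the slides.

```r
library(igraph)

# adjacency matrix of the three-person example
A <- matrix(
  c(0, 1, 1,
    1, 0, 1,
    1, 1, 0),
  nrow = 3, ncol = 3, byrow = TRUE,
  dimnames = list(c("Bob", "Ann", "Steve"), c("Bob", "Ann", "Steve"))
)

# the same network as an edgelist (one row per edge)
el <- matrix(
  c("Bob", "Ann",
    "Bob", "Steve",
    "Ann", "Steve"),
  ncol = 2, byrow = TRUE
)

# both constructors should yield the same undirected graph
g_adj <- graph_from_adjacency_matrix(A, mode = "undirected")
g_el  <- graph_from_edgelist(el, directed = FALSE)
```

The edgelist stores only the three existing ties, which is why it scales better for sparse networks.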
Data is already in R (e.g. networkdata)
No extra work
Data was processed in another SNA tool
Some extra work (with some issues)
Organize network data in two separate files
| from | to |
|---|---|
| Arizona Robbins | Leah Murphy |
| Alex Karev | Leah Murphy |
| Arizona Robbins | Lauren Boswell |
| Arizona Robbins | Callie Torres |
| Erica Hahn | Callie Torres |
| Alex Karev | Callie Torres |
| name | sex | birthyear |
|---|---|---|
| Addison Montgomery | F | 1967 |
| Adele Webber | F | 1949 |
| Teddy Altman | F | 1969 |
| Amelia Shepherd | F | 1981 |
| Arizona Robbins | F | 1976 |
| Rebecca Pope | F | 1975 |
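A hedged sketch of how two such tables could be combined into one igraph object with `graph_from_data_frame()`. The edge rows are the six shown above. The node table here is derived from the edge table only to keep the example self-contained; in practice it would be the second file with its attribute columns. Treating the ties as undirected is an assumption.

```r
library(igraph)

# edge table: the six rows of the from/to table
edges <- data.frame(
  from = c("Arizona Robbins", "Alex Karev", "Arizona Robbins",
           "Arizona Robbins", "Erica Hahn", "Alex Karev"),
  to   = c("Leah Murphy", "Leah Murphy", "Lauren Boswell",
           "Callie Torres", "Callie Torres", "Callie Torres")
)

# node table: one row per node; the first column must hold the node names
# (stand-in for the attribute file, which would add sex, birthyear, ...)
nodes <- data.frame(name = unique(c(edges$from, edges$to)))

# every name used in the edge table must appear in the node table
g <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)
```

Any additional columns in `nodes` become vertex attributes, and additional columns in `edges` become edge attributes.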
The density of a network is defined as the fraction of the potential edges in a network that are actually present.
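As a quick check of this definition, the density of the small Friends network from earlier can be computed by hand and compared with `edge_density()`. The variable names are illustrative.

```r
library(igraph)

g <- graph_from_literal(Joey-Chandler:Monica-Ross, Joey-Ross-Rachel)

n <- vcount(g)  # number of nodes
m <- ecount(g)  # number of edges actually present

# an undirected network without loops has n * (n - 1) / 2 potential edges
dens_manual <- m / (n * (n - 1) / 2)
dens_igraph <- edge_density(g)
```

Both values should agree with the density of 0.6 reported in the network summary above (6 of 10 possible edges).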
A shortest path is a path that connects two nodes in a network with a minimal number of edges. The length of a shortest path is called the distance between two nodes.
Addison Montgomery Adele Webber Teddy Altman Amelia Shepherd
Addison Montgomery 0 Inf 2 2
Adele Webber Inf 0 Inf Inf
Teddy Altman 2 Inf 0 2
Amelia Shepherd 2 Inf 2 0
Arizona Robbins 3 Inf 3 3
Arizona Robbins
Addison Montgomery 3
Adele Webber Inf
Teddy Altman 3
Amelia Shepherd 3
Arizona Robbins 0
The Grey’s Anatomy network is disconnected (\(4\) connected components)
The length of the longest shortest path is called the diameter of the network.
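The following sketch illustrates distances, components, and diameter on a small hypothetical network with two components (the node labels A to F are made up for illustration, not taken from the Grey's Anatomy data).

```r
library(igraph)

# a path A-B-C-D and a separate pair E-F: two connected components
g <- graph_from_literal(A-B-C-D, E-F)

# matrix of shortest path lengths; unreachable pairs get Inf
d <- distances(g)

# number of connected components
n_comp <- components(g)$no

# longest finite shortest path (igraph's default ignores unreachable pairs)
diam <- diameter(g)
```

Here `d["A", "D"]` is 3 and `d["A", "E"]` is `Inf`, mirroring the `Inf` entries in the distance matrix above.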
Transitivity is a measure of the degree to which nodes in a graph tend to cluster together. This is also called the clustering coefficient.
local
gives an indication of the embeddedness of single nodes
global
indication of the clustering in the network
\[ \frac{3 \times \text{number of triangles} }{\text{total number of triplets}} \]
[1] 0.400000 1.000000 0.000000 0.500000 0.666667 1.000000 0.000000 0.000000
[9] 0.533333 0.000000 0.400000 1.000000 0.333333 0.333333 0.000000 0.333333
[17] 0.000000 0.400000 0.333333 0.428571 0.303030 0.377778 0.305556 0.000000
[25] 0.000000 0.266667 0.333333 0.000000 0.000000 1.000000 0.166667 0.400000
[33] 0.700000 1.000000 0.333333 0.700000 0.666667 0.285714 0.200000 0.200000
[41] 0.466667 0.600000 0.266667 0.000000 0.700000 0.500000 0.600000 0.466667
[49] 0.666667 0.571429 0.333333 0.571429 0.666667 0.266667 0.357143 0.666667
[57] 0.600000 1.000000 0.500000 0.600000 0.666667 0.500000 0.800000 0.700000
[65] 0.600000 0.714286 0.866667 0.244444 0.714286 0.377778 0.488889 0.000000
[73] 0.000000
In empirical networks, we often observe a tendency towards high transitivity (“the friend of a friend is a friend”)
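To make the local and global versions concrete, here is a small sketch on a hypothetical four-node network: a triangle A, B, C with one pendant node D attached to C.

```r
library(igraph)

g <- graph_from_literal(A-B-C-A, C-D)

# global: 3 * number of triangles / number of connected triplets
# here: 3 * 1 / 5
trans_global <- transitivity(g, type = "global")

# local: share of a node's neighbor pairs that are themselves connected
# (NaN for nodes with fewer than two neighbors, such as D)
trans_local <- transitivity(g, type = "local")
names(trans_local) <- V(g)$name
```

Nodes A and B sit entirely inside the triangle (local value 1), while C has three neighbors of which only one pair is connected (local value 1/3).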
The degree of a node in a network is the number of connections it has to other nodes.
The degree distribution is the probability distribution of the degrees over the whole network.
Empirical degree distributions are generally right skewed.
(Many nodes have few connections and few nodes have many) “preferential attachment”, “Matthew effect”, “the rich get richer”
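A minimal sketch of both notions, using a star network as an extreme (hypothetical) case of a right-skewed degree distribution: one hub with many ties and several nodes with a single tie.

```r
library(igraph)

# star with one center and four leaves
g <- make_star(5, mode = "undirected")

# degree of every node
deg <- degree(g)

# relative frequency of each degree; the first entry belongs to degree 0
deg_dist <- degree_distribution(g)
```

In this example 80% of the nodes have degree 1 and 20% have degree 4.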
A measure of centrality is an index that assigns a numeric value to each node of the network. The higher the value, the more central the node.
“Being central” is a very ambiguous term, hence there exists a large variety of indices that assess centrality based on very different structural properties.
Degree
Number of direct neighbors (“popularity”)
Closeness
Reciprocal of the sum of the length of the shortest paths
Betweenness
Number of shortest paths that pass through a node (“brokerage”)
Eigenvector
Being central means being connected to other central nodes
PageRank
Similar to eigenvector, just for directed networks
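The five indices above can be computed side by side with igraph. This is a sketch on the small Friends network from earlier; collecting the results in a data frame called `cent` is just one convenient choice.

```r
library(igraph)

g <- graph_from_literal(Joey-Chandler:Monica-Ross, Joey-Ross-Rachel)

cent <- data.frame(
  degree      = degree(g),
  closeness   = closeness(g),
  betweenness = betweenness(g),
  eigenvector = eigen_centrality(g)$vector,  # scores are in $vector
  pagerank    = page_rank(g)$vector,         # scores are in $vector
  row.names   = V(g)$name
)

# node with the highest degree
top_degree <- rownames(cent)[which.max(cent$degree)]
```

In this tiny network Ross is the most central node on degree and betweenness; in larger empirical networks the indices usually disagree much more.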
igraph contains the following indices:
- degree()
- strength()
- betweenness()
- closeness()
- eigen_centrality()
- alpha_centrality()
- power_centrality()
- page_rank()
- eccentricity()
- authority_score() and hub_score()
- subgraph_centrality()
The sna package implements roughly the same indices but adds:
- flowbet()
- loadcent()
- gilschmidt()
- infocent()
- stresscent()
 [1] "averagedis" "barycenter" "bottleneck"
[4] "centroid" "closeness.currentflow" "closeness.freeman"
[7] "closeness.latora" "closeness.residual" "closeness.vitality"
[10] "clusterrank" "communibet" "communitycent"
[13] "crossclique" "decay" "diffusion.degree"
[16] "dmnc" "entropy" "epc"
[19] "geokpath" "hubbell" "katzcent"
[22] "laplacian" "leaderrank" "leverage"
[25] "lincent" "lobby" "markovcent"
[28] "mnc" "pairwisedis" "radiality"
[31] "salsa" "semilocal" "topocoefficient"
centiserver lists more than 400 indices
CINNA:
Computing and comparing top informative centrality measures
netrankr:
Tools which allow an index-free assessment of centrality, including:
Help
Tutorials and material for using netrankr can be found at https://schochastics.github.io/netrankr/
Cohesive subgroups are subsets of actors among whom there are relatively strong, direct, intense, frequent, or positive ties.
Methods that formalize the intuitive and theoretical notion of social group using social network properties

A clique in a network is a set of nodes that form a complete subnetwork within a network (called a complete subgraph).
A maximal clique is a clique that cannot be extended to a bigger clique by adding more nodes to it.
Maximal cliques can be calculated with max_cliques()
[[1]]
+ 3/30 vertices, from 0193e05:
[1] 9 17 18
[[2]]
+ 3/30 vertices, from 0193e05:
[1] 7 4 5
[[3]]
+ 3/30 vertices, from 0193e05:
[1] 7 4 8
[[4]]
+ 3/30 vertices, from 0193e05:
[1] 10 2 11
[[5]]
+ 3/30 vertices, from 0193e05:
[1] 16 12 15
[[6]]
+ 3/30 vertices, from 0193e05:
[1] 6 1 5
[[7]]
+ 4/30 vertices, from 0193e05:
[1] 12 13 15 14
[[8]]
+ 3/30 vertices, from 0193e05:
[1] 12 2 1
[[9]]
+ 5/30 vertices, from 0193e05:
[1] 1 2 5 4 3
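A small sketch of `max_cliques()` on a hypothetical four-node network (a triangle A, B, C plus a pendant node D attached to C), where the result is easy to verify by hand.

```r
library(igraph)

g <- graph_from_literal(A-B-C-A, C-D)

# all maximal cliques: the triangle {A, B, C} and the single edge {C, D}
mc <- max_cliques(g)

# sizes of the maximal cliques and size of the largest clique
mc_sizes <- sort(lengths(mc))
largest  <- clique_num(g)
```

The edge {C, D} counts as a maximal clique because no third node is adjacent to both C and D.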
A k-core is a subgraph in which every node has at least k neighbors within the subgraph. A k-core is thus a relaxed version of a clique.
[1] 4 4 4 4 4 3 2 2 2 2 2 3 3 3 3 3 2 2 1 1 1 1 1 1 1 1 1 1 1 1
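A vector like the one above can be obtained with `coreness()`, which returns for each node the largest k such that the node belongs to a k-core. A sketch on a hypothetical triangle with one pendant node:

```r
library(igraph)

g <- graph_from_literal(A-B-C-A, C-D)

# A, B, C form a 2-core (each has two neighbors inside the triangle);
# D has only one neighbor and therefore belongs to the 1-core only
core <- coreness(g)
```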
Minimum-cut method
cut graph into partitions which minimizes some metric
Hierarchical clustering
Agglomerative/Divisive methods to build a hierarchy of clusters
based on node similarity
Modularity Maximization
Modularity is defined as the fraction of edges that fall within the given groups minus the expected fraction if edges were placed at random
Statistical inference
stochastic blockmodeling based on generative models
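To make the modularity definition concrete, the following sketch evaluates a hand-picked partition on a hypothetical network of two triangles joined by a single bridge, and compares `modularity()` with the value computed from the definition.

```r
library(igraph)

# two triangles (A, B, C) and (D, E, F) connected by the edge C-D
g <- graph_from_literal(A-B-C-A, D-E-F-D, C-D)

# partition: one group per triangle (vertex order is A, B, C, D, E, F)
mem <- c(1, 1, 1, 2, 2, 2)
q_igraph <- modularity(g, mem)

# by hand: each group holds 3 of the 7 edges and has total degree 7 of 14
m <- ecount(g)
q_manual <- 2 * (3 / m - (7 / (2 * m))^2)
```

Both values equal 5/14, roughly 0.357. Community detection algorithms such as those used below search for the partition that makes this score as large as possible.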
# compute clustering
clu <- cluster_louvain(karate)
# cluster membership vector
mem <- membership(clu)
mem
 [1] 1 1 1 1 2 2 2 1 3 3 2 1 1 1 3 3 2 1 3 1 3 1 3 4 4 4 3 4 4 3 3 4 3 3
$`1`
[1] 1 2 3 4 8 12 13 14 18 20 22
$`2`
[1] 5 6 7 11 17
$`3`
[1] 9 10 15 16 19 21 23 27 30 31 33 34
$`4`
[1] 24 25 26 28 29 32
imc <- cluster_infomap(karate)
lec <- cluster_leading_eigen(karate)
loc <- cluster_louvain(karate)
sgc <- cluster_spinglass(karate)
wtc <- cluster_walktrap(karate)
scores <- c(infomap = modularity(karate,membership(imc)),
eigen = modularity(karate,membership(lec)),
louvain = modularity(karate,membership(loc)),
spinglass = modularity(karate,membership(sgc)),
walk = modularity(karate,membership(wtc)))
scores
  infomap     eigen   louvain spinglass      walk
 0.402038  0.393409  0.418803  0.419790  0.353222
A two-mode network is a network that consists of two disjoint sets of nodes (like people and events)
Common examples include:
The (rectangular) adjacency matrix of a two-mode network is called the incidence matrix
6/27 3/2 4/12 9/26 2/25 5/19 3/15 9/16 4/8 6/10 2/23 4/7 11/21 8/3
EVELYN 1 1 1 1 1 1 0 1 1 0 0 0 0 0
LAURA 1 1 1 0 1 1 1 1 0 0 0 0 0 0
THERESA 0 1 1 1 1 1 1 1 1 0 0 0 0 0
BRENDA 1 0 1 1 1 1 1 1 0 0 0 0 0 0
CHARLOTTE 0 0 1 1 1 0 1 0 0 0 0 0 0 0
FRANCES 0 0 1 0 1 1 0 1 0 0 0 0 0 0
ELEANOR 0 0 0 0 1 1 1 1 0 0 0 0 0 0
[ reached getOption("max.print") -- omitted 1 row ]
tnet and bipartite offer some methods to analyse two-mode networks directly, by adapting tools for standard networks.
naïve
delete all edges with a weight less than x
advanced
statistical tools using null models: backbone
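The naïve approach can be sketched with base R and igraph alone. The incidence matrix `B`, the names in it, and the threshold `x` are hypothetical; the statistical alternative from the backbone package is not shown here.

```r
library(igraph)

# hypothetical incidence matrix: 3 people (rows) x 3 events (columns)
B <- matrix(
  c(1, 1, 1,
    1, 1, 0,
    0, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Ann", "Bob", "Cat"), c("e1", "e2", "e3"))
)

# one-mode projection: entry (i, j) counts the events shared by i and j
P <- B %*% t(B)
proj <- graph_from_adjacency_matrix(P, mode = "undirected",
                                    weighted = TRUE, diag = FALSE)

# naive backbone: drop all edges with weight below the threshold x
x <- 2
bb <- delete_edges(proj, which(E(proj)$weight < x))
```

With `x = 2` only the tie between Ann and Bob survives, since they are the only pair sharing two events. The choice of `x` is arbitrary, which is the main weakness of this approach.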
Introduction to the package
Signed networks include two types of relations:
positive (“friends”) and negative (“foes”)
typical research questions involve (implemented in signnet):

Beyond triangles
A network is balanced if it can be partitioned into two groups such that all intra-group edges are positive and all inter-group edges are negative
Extended form of balance (Davis 1960s)
A network is balanced if it can be partitioned into k groups …
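One simple way to quantify how close a signed network is to balance is the share of balanced triangles (triangles with an even number of negative edges). The sketch below uses only base R on a hypothetical signed adjacency matrix `S`; it is an illustration of the idea, not the signnet implementation.

```r
# hypothetical signed adjacency matrix (+1 friend, -1 foe, 0 no tie)
S <- matrix(
  c( 0,  1, -1,  1,
     1,  0, -1,  0,
    -1, -1,  0,  1,
     1,  0,  1,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(LETTERS[1:4], LETTERS[1:4])
)

# trace of S^3 counts each triangle 6 times, weighted by the product of
# its signs: +1 for balanced, -1 for unbalanced triangles
signed_tri <- sum(diag(S %*% S %*% S)) / 6

# the same trace on |S| counts all triangles regardless of sign
total_tri <- sum(diag(abs(S) %*% abs(S) %*% abs(S))) / 6

# share of balanced triangles
balanced_share <- (signed_tri + total_tri) / (2 * total_tri)
```

Here triangle A, B, C has two negative edges (balanced) and triangle A, C, D has one (unbalanced), so the share is 0.5.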
In signed blockmodeling, the goal is to determine \(k\) blocks of nodes such that all intra-block edges are positive and inter-block edges are negative
$membership
[1] 3 3 1 1 2 1 1 1 2 2 1 1 2 2 3 3
$criterion
[1] 2

The diagonal block structure is not always the best representation of the data
The function signed_blockmodel_general() allows you to specify arbitrary block structures.
set.seed(424) #for reproducibility
blockmat <- matrix(c(1,-1,-1,-1,1,1,-1,1,-1),3,3,byrow = TRUE)
blockmat
     [,1] [,2] [,3]
[1,]    1   -1   -1
[2,]   -1    1    1
[3,]   -1    1   -1
general <- signed_blockmodel_general(g,blockmat,alpha = 0.5)
traditional <- signed_blockmodel(g,k = 3,alpha = 0.5,annealing = TRUE)
c(general$criterion,traditional$criterion)
[1] 0 6

This package provides a tidy API for graph/network manipulation. While network data itself is not tidy, it can be envisioned as two tidy tables, one for node data and one for edge data. tidygraph provides a way to switch between the two tables and provides dplyr verbs for manipulating them.

It more or less wraps the full functionality of igraph in a tidy API giving you access to almost all of the dplyr verbs plus a few more, developed for use with relational data.
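A brief sketch of what this looks like in practice, assuming the tidygraph package is installed. The Friends network and the column name `degree` are illustrative choices.

```r
library(igraph)
library(tidygraph)

g <- graph_from_literal(Joey-Chandler:Monica-Ross, Joey-Ross-Rachel)

# convert to a tbl_graph, switch to the node table, and add a column
tg <- as_tbl_graph(g) %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree())

# extract the active (node) table as an ordinary tibble
node_tbl <- as_tibble(tg)
```

`activate(edges)` switches to the edge table in the same way, so the usual verbs can be applied to either table.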

ordinary data
network data
maintained by Posit
stable and reliable (many other packages have been abandoned)
plays well with other packages
thoughtful API
extension of ggplot2
grammar of graphics
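# load packages (networkdata is assumed to provide the starwars data)
library(igraph)
library(ggraph)
library(networkdata)
# load and manipulate data
data("starwars")
sw1 <- starwars[[1]]
sw_palette <- c("#1A5878", "#C44237", "#AD8941", "#E99093", "#50594B")
V(sw1)$interactions <- strength(sw1)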
#plot
ggraph(graph = sw1,layout = "stress") +
geom_edge_link0(edge_colour = "grey25",
aes(edge_width = weight)) +
geom_node_point(shape = 21, color = "black",stroke = 1,
aes(fill = sex,size = interactions)) +
geom_node_text(color = "black", size = 4, repel = FALSE,
aes(filter = (interactions>=65),label = name))+
scale_edge_width(range = c(0.1,1.5),guide = "none")+
scale_size(range = c(3,10),guide = "none")+
scale_fill_manual(values = sw_palette, na.value = "grey",name = "")+
coord_fixed()+
theme_graph() +
theme(legend.position = "bottom") +
guides(fill = guide_legend(override.aes = list(size=6)))
ggraph(graph = sw1,layout = "stress", ...)

geom_edge_link0(edge_colour = "grey25", aes(edge_width = weight))
[1] "geom_edge_arc" "geom_edge_arc0" "geom_edge_arc2"
[4] "geom_edge_bend" "geom_edge_bend0" "geom_edge_bend2"
[7] "geom_edge_density" "geom_edge_diagonal" "geom_edge_diagonal0"
[10] "geom_edge_diagonal2" "geom_edge_elbow" "geom_edge_elbow0"
[13] "geom_edge_elbow2" "geom_edge_fan" "geom_edge_fan0"
[16] "geom_edge_fan2" "geom_edge_hive" "geom_edge_hive0"
[19] "geom_edge_hive2" "geom_edge_link" "geom_edge_link0"
[22] "geom_edge_link2" "geom_edge_loop" "geom_edge_loop0"
[25] "geom_edge_parallel" "geom_edge_parallel0" "geom_edge_parallel2"
[28] "geom_edge_point" "geom_edge_span" "geom_edge_span0"
[31] "geom_edge_span2" "geom_edge_tile"
geom_edge_type: generate n points, draw path
geom_edge_type0: direct line
geom_edge_type2: can interpolate node parameters
geom_edge_link0() and geom_edge_parallel0() suffice
geom_edge_link0(edge_colour = "grey25", aes(edge_width = weight))
mapping aesthetics
- global: edge_colour = "grey25"
- via attributes: aes(edge_width = weight)
available aesthetics
edge_colo(u)r, edge_width, edge_linetype, edge_alpha
geom_node_point(shape = 21, color = "black",stroke = 1, aes(fill = sex,size = interactions))
geom_node_point(): draw nodes as a simple point
mapping aesthetics
- global: shape = 21
- via attributes: aes(fill = sex)
available aesthetics
alpha, colo(u)r, fill, shape, size, stroke
(usage of colour, fill, and stroke depends on shape)
geom_node_text(color = "grey25", size = 4, repel = FALSE, aes(filter = (interactions>=65),label = name))
geom_node_text(): add text to node
geom_node_label(): add text to node with frame
geom_node_text is the preferred choice
mapping aesthetics
- global: specify font properties
- via attributes: set label to name attribute of node
- filter: only display for nodes (or edges!) that fulfil a given criterion
available aesthetics
many! but most important: label, colour, family, size, and repel
scale_edge_width(range = c(0.1,1.5),guide = "none")
scale_size(range = c(3,10),guide = "none")
scale_fill_manual(values = sw_palette, na.value = "grey",name = "")
control aesthetics that are mapped within aes()
although optional, set one scale_* per parameter in any aes()
form of scale functions
scale_<aes>_<variable type>()
additional options
guide (show legend or not), name (label in legend), na.value (value for NAs)
node size and edge width (and node/edge alpha)
scale_size() and scale_edge_width()
most relevant parameter is range = c(min,max)
continuous variable to colour
scale_(edge_)colour_gradient(low = ...,high = ...)
categorical variable to colour
scale_colour_brewer()
scale_colour_manual(values = ...)
misc: scale_shape() and scale_edge_linetype()
theme_graph() + theme(legend.position = "bottom")
control the overall look of the plot
theme() has a lot of options but we really don’t need them (except legend.position)
theme_graph() erases all defaults (e.g. axis, grids, etc.)
guides(fill = guide_legend(override.aes = list(size=6)))
change appearance of geoms in legend (highly optional!)
layout
ggraph(graph,layout = "stress") +
edges
geom_edge_link0(<global>,aes(<via variables>)) +
nodes
geom_node_point(<global>,aes(<via variables>)) +
geom_node_text(<global>,aes(<via variables>)) +
scales
scale_<aes>_<variable type>() + (one per variable in aes())
themes
theme_graph()
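Put together, the recipe above could look like this for a small example. This is a sketch on the Friends network rather than the Star Wars data; the attribute name `deg` and the colours are arbitrary choices.

```r
library(igraph)
library(ggraph)

g <- graph_from_literal(Joey-Chandler:Monica-Ross, Joey-Ross-Rachel)
V(g)$deg <- degree(g)  # node attribute to map to size

p <- ggraph(g, layout = "stress") +                    # layout
  geom_edge_link0(edge_colour = "grey25") +            # edges
  geom_node_point(shape = 21, fill = "grey66",         # nodes
                  aes(size = deg)) +
  geom_node_text(aes(label = name), repel = TRUE) +    # labels
  scale_size(range = c(3, 8), guide = "none") +        # scales
  theme_graph()                                        # theme
```

Printing `p` draws the plot. Each layer corresponds to one line of the recipe, so layers can be swapped or dropped independently.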
layout stress is sufficient for most network visualization tasks
Emphasize the position of certain nodes in the network.
ggraph(sw1,layout = "focus",focus = 19)+
draw_circle(col = "#00BFFF", use = "focus",max.circle = 3)+
geom_edge_link0(edge_colour = "grey25",edge_alpha = 0.5)+
geom_node_point(shape = 21,size = 5,fill = "grey66")+
geom_node_text(aes(filter = (name=="ANAKIN"),label = name))+
theme_graph()+
coord_fixed()
focus=... : id to be put in the center (other nodes are on concentric circles around it)
draw_circle: draw the concentric circles
concentric circle layout according to a centrality index
strength <- strength(sw1)
ggraph(sw1,layout = "centrality",cent = strength)+
draw_circle(col = "#00BFFF", use = "cent")+
geom_edge_link0(edge_colour = "grey25",edge_alpha = 0.5)+
geom_node_point(shape = 21,size = 5,fill = "grey66")+
geom_node_text(aes(filter = (strength>=45),label = name),repel = TRUE)+
theme_graph()+
coord_fixed()
layout_as_backbone() can help emphasize hidden group structures
try the standard first

try to reveal the hidden group structure with layout="backbone"

facebook friendships of a university. Node colour corresponds to dormitory of students

most useful layouts
layout = "stress": all purpose layout algorithm
layout = "focus": ego-centric type layouts
layout = "centrality": concentric centrality layout
layout = "backbone": emphasize a group structure (if it exists)
layout = "sparse_stress": large networks
not covered here
multilevel layouts
dynamic layouts
constrained stress layout algorithm
Do not recompute layout continuously
But need to recompute the layout if attributes change!
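One possible way to follow this advice is to compute the layout once, store the coordinates as node attributes, and reuse them in every plot. This is a sketch under the assumption that graphlayouts is installed; it is not necessarily the workflow the slides had in mind.

```r
library(igraph)
library(ggraph)

g <- graph_from_literal(Joey-Chandler:Monica-Ross, Joey-Ross-Rachel)

# compute the stress layout once and keep the coordinates with the graph
xy <- graphlayouts::layout_with_stress(g)
V(g)$x <- xy[, 1]
V(g)$y <- xy[, 2]

# later plots reuse the stored coordinates instead of recomputing them
p <- ggraph(g, layout = "manual", x = V(g)$x, y = V(g)$y) +
  geom_edge_link0(edge_colour = "grey25") +
  geom_node_point(shape = 21, size = 5, fill = "grey66") +
  theme_graph()
```

Because the coordinates depend on the network structure, they should be recomputed whenever nodes or edges are added or removed.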
Using the package edgebundle
Analyze “standard” networks with igraph
Analyze two-mode network projections with backbone
Analyze signed networks with
Missing something? try netUtils
Need toy data? use networkdata
Need a GUI to analyze networks in RStudio? try snahelper
“I like my networks tidy” tidygraph
“Is there a ggplot2 for networks?” ggraph
I cannot do x because it is not implemented in R netUtils
Tutorial for Network Visualization: https://www.mr.schochastics.net/material/netVizR/
Tutorial for Network Analysis: https://www.mr.schochastics.net/material/netAnaR/
Tutorial for Tidy Network Analysis: https://www.mr.schochastics.net/material/tidynetAnaR/
Workshop series for Ukraine