Network Data

This chapter introduces the fundamental concepts and terminology used in network analysis and explains how network data can be represented. It begins with an overview of what networks are, how they are structured, and the different ways in which relational data can be stored and organized.

The main focus of the chapter is practical: you will learn how to work with network data in R using the igraph package. We cover how to construct network objects from raw data, how to import existing network files, and how to represent nodes, edges, and their attributes within R.

By the end of this chapter, you will understand the basic principles of network data and be able to create and manipulate network objects in R using igraph.

Packages Needed for this Chapter

library(igraph)

What is a Network?

In the context of social network analysis, a network is a conceptual and analytical construct used to understand, visualize, and examine the relationships and structures that emerge from interactions among individuals, groups, organizations, or even entire societies. Rather than focusing solely on the attributes of actors, a network perspective centers on the ties that connect them and the structures that arise from these connections.

At its core, a network consists of nodes and edges. Nodes represent the actors in the network, such as individuals, organizations, or other social entities. Edges represent the relationships or connections between these actors. These relationships can take many forms and capture different aspects of social life.

Connections may be based on similarities, such as shared location, participation in the same event or organization, or common attributes. They may reflect relational roles, such as kinship ties or other socially defined roles. Networks can also capture relational cognition, including affective or perceptual relations (e.g., liking, trust, perceived influence).

Many network ties are based on interactions, such as who talked to whom, who helped whom, or who sold goods to whom. Others represent flows, such as the transmission of information, beliefs, or money. By mapping these different types of relationships, network analysis provides a systematic way to study how social structures are formed and maintained.

Importantly, ties can differ in their properties. Relations may be symmetric, as in mutual friendship, or asymmetric, as in advice-seeking or authority relations (though asymmetric ties can still be reciprocated and thus become bi-directional). Ties may vary in strength, frequency of contact, or intensity, and they can be positive (e.g., friendship, cooperation) or negative (e.g., conflict, dislike).

By conceptualizing social life in terms of nodes and edges and by distinguishing between different types and properties of ties, social network analysis provides a flexible and powerful framework for examining the relational foundations of social structure.

Why Study Networks?

Conventional research methods are often individual-based, and our models typically focus on relationships between variables rather than relationships between people. However, much of the world around us is fundamentally structured as networks. This is true not only for society, but also for brains (neural networks), organizations (who reports to whom), economies (who sells to whom), and ecologies (who eats whom). In each of these domains, outcomes are shaped not just by the attributes of individual units, but by the pattern of connections among them.

Classical social theorists emphasized this relational foundation of social life. Georg Simmel argued that society exists where individuals enter into interaction. Émile Durkheim described society as a system formed by associated individuals and their channels of communication. Karl Marx similarly stressed that society does not consist of isolated individuals, but expresses the sum of interrelations within which individuals stand. These perspectives share a central insight: social structure is relational.

Despite this, much empirical research continues to privilege individual-level explanations. For example, if David predominantly eats vegetarian food, we might explain this in terms of his ethical beliefs, economic situation, health concerns, or taste preferences. A network perspective, however, would also consider whether David has a vegetarian partner or close friends who influence his dietary choices. Instead of focusing solely on internal attributes, we examine the relational context in which decisions are embedded.

Network effects also shape emotions and well-being. If someone close to you is unhappy, are you likely to remain unaffected? Feelings, behaviors, and norms often spread through social ties. What happens to one person can influence others in their immediate network and beyond.

Similarly, access to opportunities may depend not only on individual merit but also on personal networks. Jobs, information, and resources are frequently obtained through social connections. Life chances are therefore shaped not just by who we are, but by how we are connected and where we are positioned within a network.

Studying networks allows us to move beyond isolated individuals and to analyze the patterns of relationships that structure behavior, outcomes, and inequality. A network perspective makes these relational structures visible and provides tools to systematically investigate their consequences.

From “Ordinary” to Network Data

Traditional research often treats individuals or entities as independent units of analysis. Even when the focus is on pairs of individuals, such as couples, these dyads are frequently treated as if they were independent from one another. However, in many social settings, relationships are interdependent and overlapping. A person can be part of multiple dyads simultaneously, and these connections influence one another.

As a result, the usual statistical assumption of independence does not hold in networked settings. Observations are not isolated: ties between actors create dependencies. What happens in one relationship may affect others, and individuals are embedded in broader relational structures that shape their behavior and outcomes. Recognizing this interdependence is one of the key motivations for adopting a network perspective.

In the social sciences, networks are tools for mapping and quantifying patterns of social connections. They help reveal the underlying dynamics of social cohesion, influence, and information flow within communities and societies. Rather than focusing solely on individual characteristics, network analysis examines how ties between actors (whether individuals, organizations, or other entities) form structures that shape opportunities and constraints.

Social network analysis is guided by a structural intuition: social life is organized through relationships. It is grounded in systematic empirical data, relying on observed and measured connections between actors. At the same time, it draws heavily on graphical representations, using network diagrams to visualize nodes (actors) and edges (ties) and to make structural patterns such as clusters, central positions, or bridging roles visible.

Beyond visualization, network analysis employs mathematical and computational models to quantify structural properties, test hypotheses about relational processes, and model how networks form and evolve. Through the lens of network theory, researchers can explore how social structures influence behaviors, access to resources, diffusion of information, and life chances. For this reason, social network analysis has become an invaluable approach in sociology, anthropology, political science, and many other disciplines concerned with social systems and interactions.

Network Representations

There are several possible ways to express network data. All come with a set of advantages and disadvantages.

Adjacency Matrix

An adjacency matrix is a square matrix whose elements indicate whether pairs of vertices in the graph are adjacent, that is, whether they are directly connected by an edge. If the graph has \(n\) vertices, the matrix \(A\) is an \(n \times n\) matrix where the entry \(A_{ij}\) is \(1\) if there is an edge from vertex \(i\) to vertex \(j\), and \(0\) if there is no edge. In weighted graphs, \(A_{ij}\) holds the weight of the edge instead of \(1\). For undirected graphs the matrix is symmetric (\(A_{ij} = A_{ji}\)), because every edge connects its two endpoints in both directions.

Pros:

  • Simple Representation: It provides a straightforward and compact way to represent graphs, especially useful for dense graphs where many or most pairs of vertices are connected.

  • Efficient for Edge Lookups: Checking whether an edge exists between two vertices can be done in constant time, making it efficient for operations that require frequent edge lookups.

  • Easy Implementation of Algorithms: Many graph algorithms can be easily implemented using adjacency matrices, making it a preferred choice for certain computational tasks.

Cons:

  • Space Inefficiency: For sparse graphs, where the number of edges is much less than the square of the number of vertices, an adjacency matrix uses a lot of memory to represent a relatively small number of edges.

  • Poor Scalability: As the number of vertices grows, the size of the matrix grows quadratically, which can quickly become impractical for large graphs.
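To make the adjacency matrix representation concrete, the following minimal sketch builds one in base R. The four node labels and the three edges are made up for illustration and do not come from any dataset used in this chapter.

```r
# Hypothetical 4-node directed network; the node labels are invented.
nodes <- c("a", "b", "c", "d")
A_demo <- matrix(0, nrow = 4, ncol = 4, dimnames = list(nodes, nodes))

# Record the directed edges a -> b, b -> c, and c -> a.
A_demo["a", "b"] <- 1
A_demo["b", "c"] <- 1
A_demo["c", "a"] <- 1

# An edge lookup is a single indexing operation.
A_demo["a", "b"] == 1   # TRUE: there is an edge from a to b
A_demo["b", "a"] == 1   # FALSE: the edge is not reciprocated

# A directed network is generally not symmetric.
isSymmetric(A_demo)     # FALSE

# Taking the element-wise maximum with the transpose gives the
# undirected version of the same network, which is symmetric.
A_undirected <- pmax(A_demo, t(A_demo))
isSymmetric(A_undirected)   # TRUE

# The matrix always has n * n cells, no matter how few edges exist.
length(A_demo)          # 16 cells for 4 nodes and only 3 edges
```

Node d has no ties at all, yet it still occupies a full row and column. This is the space cost for sparse graphs described in the list above.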

Edge List

An edge list is a matrix where each row indicates an edge. In an undirected graph, an edge is represented by a pair \((i,j)\), indicating a connection between vertices \(i\) and \(j\). For directed graphs, the order of the vertices in each pair denotes the direction of the edge, from the first vertex to the second. In weighted graphs, a third column can be added to each pair to represent the weight of the edge.

Pros:

  • Space Efficiency for Sparse Graphs: Edge lists are particularly space-efficient for representing sparse graphs where the number of edges is much lower than the square of the number of vertices, as they only store the existing edges.

  • Simplicity: The structure is straightforward and easy to understand, making it suitable for simple graph operations and for initial graph representation before processing.

Cons:

  • Inefficient for Edge Lookups: Checking whether an edge exists between two specific vertices can be time-consuming, as it may require scanning through the entire list, leading to an operation that is linear in the number of edges.

  • Inefficiency in Graph Operations: Operations like finding all vertices adjacent to a given vertex or checking for connectivity between vertices can be inefficient compared to other representations like adjacency matrices or adjacency lists, especially for dense graphs.

  • Less Suitable for Dense Graphs: As the number of edges grows, the edge list can become large and less efficient in terms of both space and operation time compared to an adjacency matrix for dense graphs, where the number of edges is close to the maximum possible number of edges.
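The trade-offs listed above can be seen in a small base R sketch. The edge list below is stored as a data frame with a third column for weights; the column names from, to, and weight, as well as all values, are assumptions chosen for this illustration.

```r
# Hypothetical weighted, directed edge list; names and weights are invented.
el_w <- data.frame(
  from   = c("a", "b", "c"),
  to     = c("b", "c", "a"),
  weight = c(2, 1, 5),
  stringsAsFactors = FALSE
)

# Only existing edges are stored: one row per edge.
nrow(el_w)

# An edge lookup has to scan the rows.
any(el_w$from == "a" & el_w$to == "b")   # TRUE
any(el_w$from == "b" & el_w$to == "a")   # FALSE: direction matters

# Finding the nodes that "c" points to also scans the whole list.
el_w$to[el_w$from == "c"]
```

Each comparison touches every row, which is why lookups become slower as the number of edges grows.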

Adjacency List

An adjacency list is a collection of lists, with each list corresponding to the set of adjacent vertices of a given vertex. This means that for every vertex \(i\) in the graph, there is an associated list that contains all the vertices \(j\) to which \(i\) is directly connected.

Pros:

  • Space Efficiency: Adjacency lists are more space-efficient than adjacency matrices in sparse graphs, as they only store information about the actual connections.

  • Scalability: Storage grows with the number of vertices plus the number of edges rather than with the square of the number of vertices, so this representation scales well to large graphs with comparatively few edges.

  • Efficiency in Graph Traversal: For operations like graph traversal or finding all neighbors of a vertex, adjacency lists provide more efficient operations compared to adjacency matrices, particularly in sparse graphs.

Cons:

  • Edge Lookups: Checking whether an edge exists between two specific vertices can be less efficient than with an adjacency matrix, as it may require traversing a list of neighbors.

  • Variable Edge Access Time: The time to access a specific edge or to check for its existence can vary depending on the degree of the vertices involved, leading to potentially inefficient operations in certain scenarios.

  • Higher Complexity for Dense Graphs: In very dense graphs, where the number of edges approaches the number of vertex pairs, adjacency lists can become less efficient in terms of space and time compared to adjacency matrices, due to the overhead of storing a list for each vertex.
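One simple way to sketch an adjacency list in base R is a named list with one character vector of neighbors per node. The network below (an undirected triangle plus an isolated node d) is invented for illustration.

```r
# Hypothetical undirected network stored as a named list of neighbors.
adj <- list(
  a = c("b", "c"),
  b = c("a", "c"),
  c = c("a", "b"),
  d = character(0)   # an isolated node has an empty neighbor list
)

# All neighbors of a node are available directly.
adj[["a"]]

# An edge lookup searches only the neighbor list of one node.
"c" %in% adj[["a"]]   # TRUE
"d" %in% adj[["a"]]   # FALSE

# The length of each list is the degree of the node.
lengths(adj)
```

Because every undirected edge appears in the lists of both endpoints, the degrees sum to twice the number of edges.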

Importing Network Data

Foreign Formats

igraph can read many external network file formats with the function read_graph(). (The rgexf package can be used to import Gephi files.)

read_graph(
  file,
  format = c(
    "edgelist",
    "pajek",
    "ncol",
    "lgl",
    "graphml",
    "dimacs",
    "graphdb",
    "gml",
    "dl"
  ),
  ...
)

If your network data is stored in one of the above formats, a single call to read_graph() with the matching format argument is usually enough to import it.
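As a minimal sketch of such an import, the example below first writes a tiny, made-up network to a temporary file in the "ncol" format (one edge per line, given as two whitespace-separated node names) and then reads it back. The file contents and the object name g_file are assumptions for illustration; with real data you would pass the path of your own file.

```r
library(igraph)

# Write a tiny, invented network to a temporary file in "ncol" format.
path <- tempfile(fileext = ".ncol")
writeLines(c("Bob Ann", "Bob Steve", "Ann Steve"), path)

# Read the file; directed = FALSE treats each line as an undirected tie.
g_file <- read_graph(path, format = "ncol", directed = FALSE)

vcount(g_file)    # number of nodes
ecount(g_file)    # number of edges
V(g_file)$name    # node names taken from the file
```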

Nodes, Edges, and Attributes

If your data is not in a network file format, you will need one of the following functions to turn raw network data into an igraph object: graph_from_edgelist(), graph_from_adjacency_matrix(), graph_from_adj_list(), or graph_from_data_frame().
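Of these functions, graph_from_data_frame() is convenient when edges and nodes come with attributes, because additional columns are stored as edge and node attributes. The sketch below uses invented values; the column names weight and age are assumptions for illustration only.

```r
library(igraph)

# Hypothetical edge table: the first two columns name the endpoints,
# any further column becomes an edge attribute.
edges <- data.frame(
  from   = c("Bob", "Bob", "Ann"),
  to     = c("Ann", "Steve", "Steve"),
  weight = c(3, 1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical node table: the first column holds the node names,
# any further column becomes a node attribute.
nodes <- data.frame(
  name = c("Bob", "Ann", "Steve"),
  age  = c(34, 29, 41),
  stringsAsFactors = FALSE
)

g_df <- graph_from_data_frame(d = edges, directed = FALSE, vertices = nodes)

V(g_df)$age       # node attribute
E(g_df)$weight    # edge attribute
```

Supplying the node table also lets you include nodes without any ties, which an edge list alone cannot represent.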

Before using these functions, however, you still need to get the raw data into R. The concrete procedure depends on the file format. If your data is stored as an Excel spreadsheet, you need an additional package. If you are familiar with the tidyverse, you can use the readxl package; another option is the xlsx package.

Most network data you’ll find is in a plain text format (csv or tsv), either as an edgelist or adjacency matrix. To read in such data, you can use base R’s read.table().

Make sure you check the following before trying to load a file: Does it contain a header (e.g. row/column names of an adjacency matrix)? How are values delimited (comma, whitespace, or tab)? The answers determine how to set the arguments header and sep so that the data is read properly.
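The following sketch shows how header and sep are typically set for the two common cases. Both files are written to temporary locations first so that the example is self-contained; their contents are invented for illustration.

```r
# Case 1: a comma-separated edgelist with a header line.
path_el <- tempfile(fileext = ".csv")
writeLines(c("from,to", "Bob,Ann", "Bob,Steve", "Ann,Steve"), path_el)

edges_csv <- read.table(
  path_el,
  header = TRUE,             # first line holds the column names
  sep = ",",                 # values are separated by commas
  stringsAsFactors = FALSE
)
edges_csv

# Case 2: a comma-separated adjacency matrix with row and column names.
path_adj <- tempfile(fileext = ".csv")
writeLines(
  c(",Bob,Ann,Steve", "Bob,0,1,1", "Ann,1,0,1", "Steve,1,1,0"),
  path_adj
)

A_csv <- as.matrix(
  read.table(
    path_adj,
    header = TRUE,           # first line holds the column names
    sep = ",",
    row.names = 1            # first column holds the row names
  )
)
A_csv
```

The edgelist can then be converted to a matrix with as.matrix() and passed to graph_from_edgelist(), and the adjacency matrix can be passed to graph_from_adjacency_matrix().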

Networks in igraph

Below, we represent friendship relations between Bob, Ann, and Steve as a matrix and an edgelist.

# adjacency matrix
A <- matrix(
  c(0, 1, 1, 1, 0, 1, 1, 1, 0),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)

rownames(A) <- colnames(A) <- c("Bob", "Ann", "Steve")
A

      Bob Ann Steve
Bob     0   1     1
Ann     1   0     1
Steve   1   1     0

# edgelist
el <- matrix(
  c("Bob", "Ann", "Bob", "Steve", "Ann", "Steve"),
  nrow = 3,
  ncol = 2,
  byrow = TRUE
)
el

     [,1]  [,2]   
[1,] "Bob" "Ann"  
[2,] "Bob" "Steve"
[3,] "Ann" "Steve"

Once we have defined an edgelist or an adjacency matrix, we can turn them into igraph objects as follows.

g1 <- graph_from_adjacency_matrix(A, mode = "undirected", diag = FALSE)

g2 <- graph_from_edgelist(el, directed = FALSE)
# g1 and g2 are the same graph so only printing g1
g1

IGRAPH 79ac1eb UN-- 3 3 -- 
+ attr: name (v/c)
+ edges from 79ac1eb (vertex names):
[1] Bob--Ann   Bob--Steve Ann--Steve

The printed summary shows some general descriptives of the graph. The string “UN--” in the first line indicates that the network is Undirected (D for directed graphs) and has a Name attribute (we named the nodes Bob, Ann, and Steve). The third and fourth characters are W if there is an edge weight attribute and B if the network is bipartite (that is, a node attribute “type” exists); a dash means the property is absent. The two numbers that follow give the number of nodes and edges. The second line lists all graph, node, and edge attributes. Here, we only have a node attribute “name”; the code (v/c) marks it as a vertex attribute of type character.
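The same information can also be queried programmatically, which is useful for checking that an import worked as intended. The sketch below rebuilds the three-person friendship network under the assumed name g_check and asks igraph for each piece of the summary line; the edge weights added at the end are invented.

```r
library(igraph)

# Rebuild the friendship network from its edgelist.
el_check <- matrix(
  c("Bob", "Ann", "Bob", "Steve", "Ann", "Steve"),
  ncol = 2,
  byrow = TRUE
)
g_check <- graph_from_edgelist(el_check, directed = FALSE)

is_directed(g_check)    # FALSE, shown as "U" in the summary
is_named(g_check)       # TRUE, shown as "N"
is_weighted(g_check)    # FALSE, shown as "-"
is_bipartite(g_check)   # FALSE, shown as "-"

vcount(g_check)         # number of nodes
ecount(g_check)         # number of edges

vertex_attr_names(g_check)   # node attributes, here only "name"
edge_attr_names(g_check)     # edge attributes, none so far

# Adding an edge attribute called "weight" (values invented here)
# makes igraph treat the graph as weighted.
E(g_check)$weight <- c(1, 2, 3)
is_weighted(g_check)    # TRUE
```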

The conversion from edgelist/adjacency matrix into an igraph object is quite straightforward. The only difficulty is setting the parameters correctly (Is the network directed or not?), especially for edgelists where it may not immediately be obvious if the network is directed or not.
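A short sketch illustrates why the directed argument matters. The edgelist below is invented and lists the tie between Bob and Ann once in each direction. Read as directed, these are two distinct edges; read as undirected, the same pair appears twice and produces a multi-edge.

```r
library(igraph)

# Hypothetical edgelist with a reciprocated tie between Bob and Ann.
el_rec <- matrix(
  c("Bob", "Ann",
    "Ann", "Bob",
    "Bob", "Steve"),
  ncol = 2,
  byrow = TRUE
)

g_dir <- graph_from_edgelist(el_rec, directed = TRUE)
g_und <- graph_from_edgelist(el_rec, directed = FALSE)

ecount(g_dir)         # 3 edges: Bob->Ann and Ann->Bob are different
any_multiple(g_dir)   # FALSE

ecount(g_und)         # still 3 edges, but Bob--Ann is stored twice
any_multiple(g_und)   # TRUE

# simplify() collapses multi-edges (and removes loops).
g_simple <- simplify(g_und)
ecount(g_simple)      # 2 edges
```

Checking any_multiple() right after the import is one quick way to spot an edgelist that was meant to be directed but was read as undirected.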

Import via snahelper

The R package snahelper implements several Addins for RStudio that facilitate working with network data by providing a GUI for various tasks. One of these is the Netreader, which allows you to import network data.

The first two tabs allow you to import raw data (edges and attributes). Make sure to specify file delimiters, etc. according to the preview shown.

Using the Netreader should (hopefully) come with a learning effect: the last tab shows the R code that produces the network from the chosen data without using the Addin.

The network will be saved in your global environment once you click “Done”.

Scientific reading

Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge University Press.

Scott, J. (2012). What is Social Network Analysis? Bloomsbury Academic.