Many recorded network data span over a long period of time. Researchers often wish to detect “structural changes,” “turning points,” or “critical junctures” from these network data to answer substantively important questions. For example, the evolution of military alliance networks in international politics has been known to experience several breaks due to world wars and the end of Cold War. However, to this date, statistical methods to uncover network changes have been few and far between.
In this vignette, we explain how to use NetworkChange
for network changepoint analysis. NetworkChange
is an
R package that detects structural changes in longitudinal
network data using the latent space approach. Based on the Bayesian
multi-array representation of longitudinal networks (Hoff 2011, 2015), NetworkChange
performs Bayesian hidden Markov analysis (Chib
1998) to discover changes in structural network features across
temporal layers. NetworkChange
can handle various forms of
network changes such as block-splitting, block-merging, and
core-periphery changes. NetworkChange
also provides
functions for model diagnostics using WAIC, average loss, and log
marginal likelihoods as well as visualization tools for dynamic analysis
results of longitudinal networks.
Summary of selected features and functions of the package.
Input data for NetworkChange
takes an array form. Hence,
the dimension of input data is N × N × T.
Currently, NetworkChange
allows longitudinal data of
symmetric (i.e.
undirected) networks.
One quick way to generate a synthetic longitudinal network data set
with a changepoint is to call MakeBlockNetworkChange().
It
has three clusters by default and users can choose the size of data by
selecting the number of nodes in each cluster (n) and the length of time (T). For example, if one chooses
n = 10 and T = 20, an array of 30 × 30 × 20 is generated.
base.prob
is the inter-cluster link probability, and
block.prob
+base.prob
is the intra-cluster link
probability. When one sets block.prob>0
, a clustered
network data set is generated. If block.prob=0
, we have a
random network data set.
set.seed(11173)
n <- 10 ## number of nodes in each cluster
Y <- MakeBlockNetworkChange(n=n, break.point = .5,
base.prob=.05, block.prob=.7,
T=20, type ="split")
dim(Y)
#> [1] 30 30 20
The above code generates a longitudinal network data set with a
changepoint in the middle (break.point=0.5
). We specify the
type of network change as a block-splitting change (`type =“split”})
where the initial two cluster network splits into three clusters.
Currently, MakeBlockNetworkChange()
provides five
different options of network changes (type
): “constant”,
“merge”, “split”, “merge-split”, and “split-merge.” If “constant” is
chosen, the number of breaks is zero. If “merge” or “split” is chosen,
the number of breaks is one. If either “merge-split” or “split-merge” is
chosen, the number of breaks is two.
Users can use plotnetarray()
to visualize the
longitudinal network data. The resulting plot shows a selected number of
equi-distant time-specific networks. By default, n.graph
is
4. Users can change plot settings by changing options in
ggnet
.
A block-merged data set can be generated by settting
type ="merge"
of MakeBlockNetworkChange()
. We
draw 4 snapshots using plotnetarray()
.
set.seed(11173)
Ymerge <- MakeBlockNetworkChange(n=n, break.point = .5,
base.prob=.05, block.prob=.7,
T=20, type ="merge")
plotnetarray(Ymerge)
In this section, We estimate the chagepoints of the block-splitting
data set Y
generated above. We fit a HNC using the
principal eigen-matrix for degree correction.
G <- 100
Yout <- NetworkChange(Y, R=2, m=1, mcmc=G, burnin=G, verbose=0)
#> Initializing using the Alternating Least Squares Method: # of iteration = 10
We then use drawPostAnalysis()
to draw latent node
positions obtained from HNC. We also highlight the latent cluster
structure using the k-means
clustering method over the estimated latent node positions of each
regime. drawPostAnalysis()
provides an option to include
k-means clustering results of the latent node positions for each regime.
n.cluster
sets the number of clusters in each regime. The
clustering results are depicted by assigning unique colors over the
clusters. These two functions, plotting the latent node coordinates and
k-means clustering, can also be applied separately by using
plotU
and `kmeansU} respectively.
As shown by the difference between the distributions of the latent node
traits, the clusters in regime 1 can be clearly distinguished by their
coordinates on the first dimension whereas both dimensional coordinates
are required to distinguish the three clusters in regime 2.
Such transition of the group structure can be easily depicted by
plotting the change of the network generation rule parameters (vt) over
time. plotV
allows one to draw such patterns. While the
first dimensional generation rule parameter v1 exhibits constaly high
values over the entire period, the second dimensional generation rule
parameter rises dramatically near t = 10, the ground truth changepoint
.
Although the above analysis shows suggestive evidence in favor of a single network changepoint, it does not provide a statistical judgement criterion for determining the break number parameter of HNC. Model diagnostics is required to coose an appropriate number of breaks and derive the final modeling result.
BreakDiagnostic()
allows one to check a statistically
supported number of breaks by fitting multiple HNC models with a varying
number of breaks. break.upper
setting allows one to choose
the upper limit of break numbers to be examined.
set.seed(1223)
G <- 100
detect <- BreakDiagnostic(Y, R=2, mcmc=G, burnin=G, verbose=0, break.upper=3)
Users can draw plots of diagnostic results by calling the first element of the test object.
Otherwise, users can print out the numerical results which is stored in the second element of the test object.
print(detect[[2]])
#> $LogMarginal
#> [1] -3075.330 -2651.176 -2666.858 -2674.518
#>
#> $Loglike
#> [1] -3049.776 -2594.429 -2596.480 -2600.354
#>
#> $WAIC
#> [1] 6223.589 5402.413 5458.780 5516.617
#>
#> $`Average Loss`
#> [1] 0.0000000 0.0000000 0.5233333
All diagnostic measures clearly indicate that the number of breaks in the synthetic data is one, agreeing with that of the synthetic data generating process.
In this section, we analyze changes in the international military
alliance network among major powers. The data set is originally from
(Gibler 2008) and users can call this data
set by data(MajorAlly)
.
Our goal in this section is to detect structural changes in the longitudinal alliance network among major powers using HNC. We follow the COW dataset’s coding of “major powers” (the United Kingdom, Germany, Austria-Hungary, France, Italy, Russia, the United States, Japan, and China) in the analysis. We aggregated every 2 year network from the original annual binary networks to increase the density of each layer.
data(MajorAlly)
Y <- MajorAlly
time <- dim(Y)[3]
drop.state <- c(which(colnames(Y) == "USA"), which(colnames(Y) == "CHN"))
newY <- Y[-drop.state, -drop.state, 1:62]
First, we fit a pilot model to elicit reasonable inverse gamma prior values for vt (v0 and v1).
G <- 100
set.seed(1990)
test.run <- NetworkStatic(newY, R=2, mcmc=G, burnin=G, verbose=0,
v0=10, v1=time*2)
V <- attr(test.run, "V")
sigma.mu = abs(mean(apply(V, 2, mean)))
sigma.var = 10*mean(apply(V, 2, var))
v0 <- 4 + 2 * (sigma.mu^2/sigma.var)
v1 <- 2 * sigma.mu * (v0/2 - 1)
Then, we diagnose the break number by comparing model-fits of several models with a varying number of breaks.
set.seed(11223);
detect2 <- BreakDiagnostic(newY, R=2, break.upper=2,
mcmc=G, burnin=G, verbose=0,
v0=v0, v1=v1)
#>
#> At break = 2 one state is dominated by other states and a break point is not defined for this state.
detect2[[1]]
The test results from WAIC, log marginal likelihood, and average loss indicate that HNC with two breaks is most reasonable.
Based on the test result, we fit the HNC with two breaks to the major
power alliance network and save the result in R object
fit
.
G <- 100
K <- dim(newY)
m <- 2
initial.s <- sort(rep(1:(m+1), length=K[[3]]))
set.seed(11223);
fit <- NetworkChange(newY, R=2, m=m, mcmc=G, initial.s = initial.s,
burnin=G, verbose=0, v0=v0, v1=v1)
First, we can examine transitions of hidden regimes by looking at
posterior state probabilities (p(S|𝒴, Θ))
over time. plotState()
in MCMCpack
pacakge
provides a function to draw the posterior state probabilities from
changepoint analysis results. Since our input data is an array, we need
to change the input data as a vector.
Next, we draw regime-specific latent node positions of major powers
using drawPostAnalysis
. Users can choose the number of
clusters in each regime by `n.cluster}.
Then, using drawRegimeRaw()
, we can visualize original
network connections for each regime by collapsing network data within
each regime.
The results are highly interesting to scholars of international relations and history. We discuss the results in details in (Park and Sohn 2020). One of the most interesting findings is the centrality of Austria-Hungary (AUH) during the first two regimes. Austria-Hungary played a role of broker that connected other major powers between 1816 and 1890. This period roughly overlaps with “the age of Metternich” (1815-1848) (Rothenberg 1968). Throughout the first two regimes, the network position of Austria-Hungary remained highly critical in the sense that the removal of Austria-Hungary would have made the major power alliance network almost completely disconnected. In the language of social network analysis, Austria-Hungary filled a “structural hole” in the major power alliance network at the time, playing the role of broker (Burt 2009).
Identifying hidden regimes of the military alliance network makes it clear the central role of Austria-Hungary during the first two regimes in the military alliance network among major powers.
We believe that NetworkChange
will increase the capacity
for longitudinal analysis of network data in various fields, not limited
to social sciences.