The objectives of this introductory class in Network Science are threefold. Our principal aims are summarized below:
Network: A theoretical, mathematical structure that schematically describes the relations or associations between similar elements or groups thereof.
From a formal point of view, networks are mathematical representations (graphs), in which specific elements are depicted as nodes (or vertices). Associations between nodes are depicted as links (or edges).
Depending on the type of association, networks may be: a) directed or undirected, b) weighted or unweighted, c) simple or bi-/multi-partite.
Different types of networks may describe the same system, depending on the level of available information. If only the existence of associations is known, we describe the network as undirected. If the associations have known types that may be positive or negative (e.g. in a regulatory network), we describe the network as directed; links now have a direction and edges are described as “incoming” or “outgoing”. If, besides the simple association, quantitative aspects are also recorded (e.g. how strongly a link corresponds to a certain function), then the links become weighted and each edge is assigned a numerical value or score.
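As a minimal illustration, with hypothetical toy edges, the same associations could be stored in each of the three forms:

```python
# Hypothetical toy edges: the same associations stored three ways.
undirected = [("A", "B"), ("B", "C")]              # association only
directed = [("A", "B"), ("C", "B")]                # (source, target)
weighted = [("A", "B", 0.8), ("C", "B", -0.3)]     # (source, target, score)
```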
Bipartite networks are networks formed by nodes of two different types, with edges not allowed between elements of the same type. Bipartite networks may be “collapsed” to simple networks by joining nodes that were originally connected via nodes of the other type, as in the sketch below. Examples of bipartite networks in biology are metabolic networks, where substrates and enzymes form two distinct, non-interacting categories of components.
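A minimal sketch of such a collapse, with hypothetical enzyme/substrate pairs: two enzymes become linked in the projected network whenever they share a substrate:

```python
from itertools import combinations

# Hypothetical bipartite edges: (enzyme, substrate) pairs.
bipartite_edges = [
    ("E1", "S1"), ("E1", "S2"),
    ("E2", "S2"), ("E2", "S3"),
    ("E3", "S3"),
]

# Group the enzymes acting on each substrate.
by_substrate = {}
for enzyme, substrate in bipartite_edges:
    by_substrate.setdefault(substrate, set()).add(enzyme)

# Collapse: link two enzymes whenever they share a substrate.
projected = set()
for enzymes in by_substrate.values():
    for a, b in combinations(sorted(enzymes), 2):
        projected.add((a, b))

print(sorted(projected))  # [('E1', 'E2'), ('E2', 'E3')]
```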
As in the general case, network motifs are modules of the network that occur more often than expected by chance. In this sense they correspond to types of interactions/associations that may reflect common mechanistic relationships between components of the network. In the Figure above we see different motifs of regulation: a) auto-activation, b) auto-repression, c) self-activating loop, d) feed-forward stimulation, e) central input and f) cascade with negative feedback. All of them are common regulatory modules in the studied organism and may help us understand some basic mechanisms of gene regulation.
We denote by \(k_i\) the degree of the i-th node in the network, i.e. the number of links attached to it. In an undirected network the total number of links, L, can be expressed as the sum of the node degrees:
\[L=\frac{1}{2} \sum_{i=1}^{N}{k_i}\]
Here the 1/2 factor corrects for the fact that in the sum each link is counted twice.
In a directed network the 1/2 factor is omitted, as the degree of each node is counted separately for incoming and outgoing links (in-degree and out-degree).
An important summary measure for a given network is the mean degree, which corresponds to the average number of links over all of the network’s nodes. In an undirected network this is equal to:
\[<k>=\frac{1}{N}\sum_{i=1}^{N}{k_i}=\frac{2L}{N}\]
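A minimal sketch of these formulas on a hypothetical toy edge list:

```python
# Toy undirected edge list (hypothetical network).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

degree = {}
for i, j in edges:
    degree[i] = degree.get(i, 0) + 1
    degree[j] = degree.get(j, 0) + 1

N = len(degree)
L = sum(degree.values()) / 2       # the 1/2 corrects double counting
mean_k = 2 * L / N                 # <k> = 2L / N

print(degree)        # {0: 2, 1: 2, 2: 3, 3: 1}
print(L, mean_k)     # 4.0 2.0
```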
Exercise: Calculate the maximum number of edges for a directed and an undirected network with N nodes.
The average path length \(d\) of a network is defined as the mean of the shortest path lengths \(d_{i,j}\) over all pairs of nodes:
\[d=\frac{1}{N(N-1)}\sum_{i,j=1}^{N}d_{i,j}\]
To calculate it one needs all shortest paths in the network, which number \(N(N-1)\). This is not a trivial task, but it can be solved with a Breadth First Search (BFS) approach. BFS is a graph algorithm that visits all nodes reachable from a starting node in order of increasing distance. One can use it to calculate the shortest paths from a given node by traversing all edges and marking each visited node with the length of the path that first reached it, as shown in the figure below.
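A minimal BFS sketch (the adjacency dictionary is a toy example); running it once from every node yields all \(N(N-1)\) shortest paths:

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Shortest path lengths from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:        # first visit = shortest path
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

# Toy undirected network as an adjacency dictionary.
adjacency = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(bfs_distances(adjacency, 0))  # {0: 0, 1: 1, 2: 1, 3: 2}
```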
The clustering coefficient of a node \(i\) is defined as: \[C_i=\frac{2L_i}{k_i(k_i-1)}\] where \(L_i\) is the number of links between the \(k_i\) neighbours of node \(i\). In geometric terms, the clustering coefficient may be seen as the number of “triangular” relationships formed by the links between the first neighbours of a given node. In social terms, it measures how likely it is that a person’s “friends” are friends with each other.
The clustering coefficient is assigned to particular nodes, but an overall average may easily be calculated as the arithmetic mean over all nodes:
\[<C>=\frac{1}{N}\sum_{i=1}^{N}C_i\]
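A minimal sketch of both the per-node and the averaged coefficient, assuming the network is stored as an adjacency dictionary:

```python
def clustering(adjacency, i):
    """Local clustering coefficient C_i of node i."""
    neighbours = adjacency[i]
    k = len(neighbours)
    if k < 2:
        return 0.0
    # L_i: links among the neighbours of i (each pair counted once)
    L_i = sum(1 for a in neighbours for b in neighbours
              if a < b and b in adjacency[a])
    return 2 * L_i / (k * (k - 1))

# Toy network: a triangle (0, 1, 2) with a pendant node 3.
adjacency = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
C = [clustering(adjacency, i) for i in adjacency]
print(C, sum(C) / len(C))  # [1.0, 1.0, 0.333..., 0.0] and <C> = 0.583...
```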
Exercise: Describe an algorithm that will scan a network for connected components
Exercise: Propose a way to calculate degree assortativity from an adjacency matrix (or an edge list). What can we say of networks that are a) highly assortative in terms of degree or b) of low degree assortativity?
Random networks are formed under the assumption that each node is associated at random with any other node.
Random networks, also called Erdős–Rényi networks after the duo of Hungarian mathematicians who studied them in detail, can be modeled on the basis of two parameters (N, p): N is the number of nodes in the network and p is the probability that any two nodes are connected.
Starting from this simple assumption we can deduce the expected number of edges in a random network as:
\[L=p\frac{N(N-1)}{2}\]
from which it stems that the average degree is:
\[<k>=\frac{2L}{N}=p(N-1)\] The clustering coefficient in a random network is:
\[C_i=\frac{2<L_i>}{k_i(k_i-1)}\] where \(k_i\) is the number of neighbours of node \(i\) and \(<L_i>\) is the expected number of links between them. Since each of the \(k_i(k_i-1)/2\) possible pairs of neighbours is connected with probability \(p\), we have \(<L_i>=p\frac{k_i(k_i-1)}{2}\), and therefore: \[<C>=p=\frac{<k>}{N-1}\approx\frac{<k>}{N}\] Thus the mean clustering coefficient of a random network is equal to the network’s scaled mean degree, which, in turn, is equal to the probability of two nodes being linked.
We thus prefer this formulation, as it describes random networks with one parameter only, namely \(p\).
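A minimal simulation sketch of the \((N, p)\) model; for the (arbitrary) values \(N=1000\), \(p=0.01\), the measured mean degree should come out close to \(p(N-1)\approx 10\):

```python
import random

def erdos_renyi(N, p, seed=0):
    """Sample a G(N, p) random network as an adjacency dictionary."""
    rng = random.Random(seed)
    adjacency = {i: set() for i in range(N)}
    for i in range(N):
        for j in range(i + 1, N):
            if rng.random() < p:       # each pair linked with probability p
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency

N, p = 1000, 0.01
adj = erdos_renyi(N, p)
mean_k = sum(len(v) for v in adj.values()) / N
print(mean_k, p * (N - 1))  # both close to 10; likewise <C> comes out ~ p
```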
How does the mean degree \(<k>\) affect the overall structure of a random network? Depending on the value of \(<k>\), a random network may consist of isolated components, may form one giant component, or may even be totally connected. A critical value of \(<k>\) is 1: below it the network breaks into small isolated components, while above it the emergence of a giant component is observed. In brief, for \(<k> > 1\) random networks form a component that links most of the nodes.
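A minimal sketch of this transition, reusing the `erdos_renyi` sampler from the sketch above; the fraction of nodes in the largest component jumps as \(<k>\) crosses 1:

```python
# Assumes the erdos_renyi sampler defined in the previous sketch.
def largest_component(adjacency):
    """Size of the largest connected component (iterative DFS)."""
    seen, best = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node])
        seen |= component
        best = max(best, len(component))
    return best

N = 1000
for mean_k in (0.5, 1.0, 2.0, 5.0):
    adj = erdos_renyi(N, mean_k / (N - 1))
    print(mean_k, largest_component(adj) / N)  # fraction in largest component
```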
Most naturally occurring networks are not random. Their degree distributions are not Poisson-like, as a considerable number of nodes (called hubs) have higher degrees than expected under randomness.
Naturally occurring networks show a particularly consistent node degree distribution according to which:
\[p_k \sim k^{-\gamma}\] by a log-transformation this becomes:
\[\log(p_k) \sim -\gamma \log(k)\] This means that in naturally occurring networks the probability \(p_k\) of a node having \(k\) links scales as a power of \(k\): in double-logarithmic scale the relationship is linear, with slope \(-\gamma\). γ is usually between 2 and 3. We say that the degree distribution follows a power-law distribution, as may be seen in the Figure below for three networks from very different fields.
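As a minimal numerical sketch (with synthetic, not empirical, values) for \(\gamma = 2\): the slope between any two points of the distribution in double-logarithmic scale recovers \(-\gamma\):

```python
import math

# Synthetic degree distribution with p_k ~ k^(-2) (hypothetical values).
degrees = [1, 2, 4, 8, 16]
p_k = [0.512, 0.128, 0.032, 0.008, 0.002]

# In double-logarithmic scale the points lie on a line of slope -gamma.
for (k1, p1), (k2, p2) in zip(zip(degrees, p_k),
                              zip(degrees[1:], p_k[1:])):
    slope = (math.log(p2) - math.log(p1)) / (math.log(k2) - math.log(k1))
    print(slope)  # -2.0 for every pair, i.e. gamma = 2
```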
But what is so interesting about this power-law property? In a number of ways it reflects the dynamics underlying the formation, evolution and stability of these networks. First of all, it describes networks where the majority of the links are distributed among a minority of nodes. This is reminiscent of a Pareto distribution, or the 80-20 rule, according to which 20% of the nodes collect 80% of the links and vice versa. This disparity means that the range of degree values is much greater than in a random network, and so there is a significant probability of finding nodes with both very high and very low degrees.
It seems that the same structure may be repeated at different scales of magnitude, from the very small to the very large, in contrast to random networks, where most of the nodes are contained in a tight range around the mean degree. This property is what gives these networks their name: “scale-free networks”. The scale-free property emerges in very different systems, from linguistics to population dynamics to biology.
It also confers some additional properties on the networks that carry it. Compared to random networks, scale-free networks are:
1. More connected. Average shortest paths are shorter than in random networks. These are even shorter when they involve hubs and, due to the 80-20 principle, most of them do.
2. More robust. Since most of the links are attached to a few nodes, it is more difficult to destroy the network’s structure by random attacks on nodes; doing so would require targeting the hubs.
3. Stably evolving. Attachment of new nodes further reinforces the scale-free property, since it happens “preferentially”, meaning that a new node is more likely to link to a hub than to a poorly connected node (sketched below).
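A minimal sketch of such preferential, degree-proportional attachment, in the spirit of the Barabási-Albert model (all parameter values are arbitrary):

```python
import random

def preferential_attachment(n_nodes, m=2, seed=0):
    """Grow a network where each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]          # minimal seed network
    stubs = [0, 1]            # node i appears here degree(i) times
    for new in range(2, n_nodes):
        targets = set()
        while len(targets) < min(m, new):
            targets.add(rng.choice(stubs))   # degree-proportional choice
        for t in targets:
            edges.append((new, t))
            stubs += [new, t]
    return edges

edges = preferential_attachment(10000)
# Tallying the degrees of this network reveals a heavy tail:
# a few early hubs collect a disproportionate share of the links.
```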