$G = (V,E) $ is a graph. Let $ A $ be an adjency matrix of a graph G.
There are 3-Levels of learning tasks.
- Node level
- Link level
- Graph level
Node Features
Intro : Given the classes of some nodes, we want to know the class of other nodes.
-
Eigenvalue Centrality
Node is important if its neighborhood is important.
$u_{s} = \frac{1}{\lambda} \sum_{v \in N(s)} u_{v} $ ($\lambda$ is the normalizing constant.)
$\lambda u = Au$ where A is adjencey matrix. Thus, $\lambda$ is the eigenvalue of A. By Perron-Frobenius Theorem, since all the elements of A are non-negative the largest eigenvalue is positive. We usually choose eigenvector u corresponding to the largest eigenvalue as node centrality. -
Betweenness Centrality
Node is important if it lies on many shortest paths.
$u_{s} = \sum_{v \neq s \neq w} \frac{\text{the number of shortest path paths between v and w that pass through s}}{\text{the number of shortest paths between v and w}} $ -
Closeness Centrality
Node is important if it has small shortest path lengths to all other nodes.
$u_{s} = \frac{1}{\sum_{v \neq s}\text{shortest path length between v and s}}$ -
Clustering Coefficients
Node is important if connected to many neighboring nodes.
$e_{v} = \frac{\text{the number of edges among neighboring nodes}}{k_{v} \text{choose} 2}$ It counts the number of triangles that a node touches. -
Graphlets
Count the number of pre-specified subgraphs with the given node.
Link Features
Given the graph, we want to predict the link or relationship between two different nodes. For example, a link between node is deleted then the connection is predicted for research on chemical reactions. Connection between people can be predicted for social network studies.
-
Distance based link features The link feature between two nodes is measured by the shortest path between two nodes. Yet, this does not capture the degree of nodes.
-
Local Neighborhood Overlap Captures the number of common neighbors of two nodes.
(i) Common Neighbors : $ |N(v_{1})\cap N(v_{2})| $
(ii) Jaccard’s coefficient : $\frac{ |N(v_{1})\cap N(v_{2})|}{|N(v_{1})\bigcup N(v_{2})|}$
(iii) Adamic-Adar index : $\sum_{u \in N(v_{1})\cap N(v_{2})} \frac{1}{\log k_{u}} $ ($k_{u}$ is the degree of node u) So as node u is popular and connected to many other nodes, it is less likely that neighbors of node u are linked together. (Think about celebrity.)
Local Neighborhood Overlap fails to connect two nodes without common neighbors. -
Global Neighborhood Overlap Karz index counts the number of walks of all lengths between two nodes. It is easily countable by using adjency matrix. $S_{v_{1},v_{2}} = \sum_{i=1}^{\infty}\beta^{i}A_{v_1,v_2}^{i} $ is the karz index between node v1 and v2. ($\beta$ is the discount factor for each walk, $A_{v_1,v_2}^{i} $ is the walks of length i between two nodes.)
By the geometric series, $ S = (I-\beta A)^{-1} - I $
Graph-level Features
Use Graph Kernels for similarity of graphs. i.e. $K(G,G’) = \phi(G)^{T} \phi(G’)$ is the functional.
-
Graphlet Kernel Idea : Bag of words for graph. But to compute this, subgraph isomorphism must be found. It is NP-hard.
-
Weisfeiler-Lehman Kernel
-
Randon-Walk Kernel
-
Shortest-path graph kernel
-
Quantum-Random-Walk Kernel
-
and many others…
For more details on graph kernels, refer Intro to Graph Kernels ppt(ETHZ) and A survey on graph kernels, Kriege et al(2020)
Main Reference Stanford cs224w