$G = (V,E) $ is a graph. Let $ A $ be an adjency matrix of a graph G.
There are 3-Levels of learning tasks.

  1. Node level
  2. Link level
  3. Graph level
Node Features

Intro : Given the classes of some nodes, we want to know the class of other nodes.

  • Eigenvalue Centrality
    Node is important if its neighborhood is important.
    $u_{s} = \frac{1}{\lambda} \sum_{v \in N(s)} u_{v} $ ($\lambda$ is the normalizing constant.)
    $\lambda u = Au$ where A is adjencey matrix. Thus, $\lambda$ is the eigenvalue of A. By Perron-Frobenius Theorem, since all the elements of A are non-negative the largest eigenvalue is positive. We usually choose eigenvector u corresponding to the largest eigenvalue as node centrality.

  • Betweenness Centrality
    Node is important if it lies on many shortest paths.
    $u_{s} = \sum_{v \neq s \neq w} \frac{\text{the number of shortest path paths between v and w that pass through s}}{\text{the number of shortest paths between v and w}} $

  • Closeness Centrality
    Node is important if it has small shortest path lengths to all other nodes.
    $u_{s} = \frac{1}{\sum_{v \neq s}\text{shortest path length between v and s}}$

  • Clustering Coefficients
    Node is important if connected to many neighboring nodes.
    $e_{v} = \frac{\text{the number of edges among neighboring nodes}}{k_{v} \text{choose} 2}$ It counts the number of triangles that a node touches.

  • Graphlets
    Count the number of pre-specified subgraphs with the given node.

Link Features

Given the graph, we want to predict the link or relationship between two different nodes. For example, a link between node is deleted then the connection is predicted for research on chemical reactions. Connection between people can be predicted for social network studies.

  • Distance based link features The link feature between two nodes is measured by the shortest path between two nodes. Yet, this does not capture the degree of nodes.

  • Local Neighborhood Overlap Captures the number of common neighbors of two nodes.
    (i) Common Neighbors : $ |N(v_{1})\cap N(v_{2})| $
    (ii) Jaccard’s coefficient : $\frac{ |N(v_{1})\cap N(v_{2})|}{|N(v_{1})\bigcup N(v_{2})|}$
    (iii) Adamic-Adar index : $\sum_{u \in N(v_{1})\cap N(v_{2})} \frac{1}{\log k_{u}} $ ($k_{u}$ is the degree of node u) So as node u is popular and connected to many other nodes, it is less likely that neighbors of node u are linked together. (Think about celebrity.)
    Local Neighborhood Overlap fails to connect two nodes without common neighbors.

  • Global Neighborhood Overlap Karz index counts the number of walks of all lengths between two nodes. It is easily countable by using adjency matrix. $S_{v_{1},v_{2}} = \sum_{i=1}^{\infty}\beta^{i}A_{v_1,v_2}^{i} $ is the karz index between node v1 and v2. ($\beta$ is the discount factor for each walk, $A_{v_1,v_2}^{i} $ is the walks of length i between two nodes.)
    By the geometric series, $ S = (I-\beta A)^{-1} - I $

Graph-level Features

Use Graph Kernels for similarity of graphs. i.e. $K(G,G’) = \phi(G)^{T} \phi(G’)$ is the functional.

Main Reference Stanford cs224w