DBLab School of Computer and Electrical Engineering KDBSL NTUA
Thursday, July 02, 2020
Τίτλος Fast shortest path distance estimation in large graphs
Έγγραφο Προβολή εγγράφου
Συγγραφέας Michalis Potamias

Michalis Potamias [22/3/2010]

Fast shortest path distance estimation in large graphs

This talk will focus on approximate landmark-based methods for
point-to-point distance estimation in very large networks. These
methods involve selecting a subset of nodes as landmarks and computing
offline the distances from each node in the graph to those landmarks.
At runtime, when the distance between a pair of nodes is needed, it
can be estimated quickly by combining the precomputed distances. We
prove that selecting the optimal set of landmarks is an NP-hard
problem, and thus heuristic solutions need to be employed. We
therefore explore theoretical insights to devise a variety of simple
methods that scale well in very large networks. The efficiency of the
suggested techniques is tested experimentally using five real-world
graphs having millions of edges.

While theoretical bounds support the claim that random landmarks work
well in practice, our extensive experimentation shows that smart
landmark selection can yield dramatically more accurate results: for a
given target accuracy, our methods require as much as 250 times less
space than selecting landmarks at random. In addition, we demonstrate
that at a very small accuracy loss our techniques are several orders
of magnitude faster than the state-of-the-art exact methods. We also
apply our methods to the task of social search in large graphs.

Finally, I will briefly talk about the notion of distance in
probabilistic graphs. Complex networks, such as biological, social,
and communication networks, often entail uncertainty, and thus, can be
modeled as probabilistic graphs. Similar to the problem of similarity
search in standard graphs, a fundamental problem for probabilistic
graphs is to answer efficiently k-nearest neighbor queries (k-NN),
which is the problem of computing the k closest nodes to some specific
node. We propose novel distance functions that extend well-known graph
concepts, such as shortest paths. In order to compute them in
probabilistic graphs, we design algorithms based on sampling and novel
graph-transformation ideas. During k-NN query processing we
efficiently prune the search space using new techniques. In our
experiments we demonstrate that our distance functions outperform
alternatives in identifying true neighbors in real-world data. We also
demonstrate that our algorithms scale for graphs with tens of millions
of edges.

[ Back ]