Network clusterization algorithm

Community-detection Algorithms

On this homepage the authors' C++-Implementation of the two algorithms introduced in  P. Schütz and A. Caflisch, Efficient Modularity Optimization by Multi-Step Greedy Algorithm and Vertex Mover Refinement are available.

Multi-Step Greedy algorithm

The implementations used for benchmarking can be found here. For the users convenience a wrapped-up version is presented here.

Source

This C++-source is an implementation of the MSG algorithm.

Compilation

This programs has been tested with the GCC compiler suite.
g++ -m32 -O3 MSG_complete.cc -o MSG_complete
The usage of the 32-bit version is recommended to lower numerical instabilities.

Usage

The edge information is expected in the format
Start_Vertex End_Vertex Weight
Please be aware that the program expects a directed edge file. This implies that for an undirected network the links
a b weight
and
b a weight
have to be present. The MSG algorithm is performed by
./MSG_complete edge_file level_parameter_value
The program outputs the file "partition-MSG-"edge_file"-l-eq-"level_parameter_value"-with-Q-"final_modularity. This output can directly be used by the VM procedure below.

Vertex-Mover Procedure

Source

The following implementation is used.

Compilation

This programs has been tested with the GCC compiler suite.
g++ -O3 VM.cc -o VM

Usage

The output of the MSG algorithm has to be fed-in as follows
./VM input_partition edge_file_de_symmetrized
This procedure automatically generates a file named "NO-boosted-"input_partition"-boosted-to-"final_modularity.
It is important to notice that the current implementation assumes unweighted networks. Therefore the redundant information of the edge file used for the MSG has to be removed. This means in the MSG file
a b weight
b a weight

in the VM edge file only
a b weight

For users of the program gawk this transformation can readily be done by
awk '($1<$2)' edge_file > edge_file_de_symmetrized

Network of Words in Titles of Publications (co)authored by Martin Karplus

The edges are listed in the file. Each line corresponds to one edge. The first and second column indicate the indices of the vertices spanning the edge. The weight of the edge is listed in the third column. The words corresponding to a vertex index are given here.

Awk-scripts to create Random Networks to benchmark community-detection algorithms

The following bash-scripts (calling awk) create computer-generated network which can be used to benchmark community-detection algorithms. The following three flavours exist:

  • SED: Small networks with an  exponential distribution imposed on the  degree distribution
  • SLD: Small networks with a linear distribution imposed on the degree distribution
  • LLD: Same as SLD but with at least 300 vertices