Community Detection Algorithms

On this page, the authors' C++ implementation of the two algorithms introduced in  P. Schütz and A. Caflisch, Efficient Modularity Optimization by Multi-Step Greedy Algorithm and Vertex Mover Refinement are available. All files are contained in the following zip archive available for download:

Multi-Step Greedy algorithm

The implementations used for benchmarking can be found below. For the users' convenience a wrapped-up version is presented first.

Source

The C++-source is called "MSG_complete.cc" and implements the multistep greedy algorithm.

Compilation

This programs has been tested with the GCC compiler suite.
g++ -m32 -O3 MSG_complete.cc -o MSG_complete
The usage of the 32-bit version is recommended to lower numerical instabilities.

Usage

The edge information is expected in the format
Start_Vertex End_Vertex Weight
Please be aware that the program expects a directed edge file. This implies that for an undirected network the links
a b weight
and
b a weight
have to be present. The MSG algorithm is performed by
./MSG_complete edge_file level_parameter_value
The program outputs the file "partition-MSG-"edge_file"-l-eq-"level_parameter_value"-with-Q-"final_modularity. This output can directly be used by the VM procedure below.

Vertex-Mover Procedure

Source

The file "VM.cc" in the same zip archive provides the implementation of the vertex mover procedure.

Compilation

This program has been tested with the GCC compiler suite.
g++ -O3 VM.cc -o VM

Usage

The output of the MSG algorithm has to be fed-in as follows
./VM input_partition edge_file_de_symmetrized
This procedure automatically generates a file named "NO-boosted-"input_partition"-boosted-to-"final_modularity.
It is important to notice that the current implementation assumes unweighted networks. Therefore the redundant information of the edge file used for the MSG has to be removed. This means in the MSG file
a b weight
b a weight

in the VM edge file only
a b weight
For users of the program gawk this transformation can readily be done by
awk '($1<$2)' edge_file > edge_file_de_symmetrized

Network of Words in Titles of Publications (co)authored by Martin Karplus

The edges are listed in the file "karplus-edges" (see zip archive). Each line corresponds to one edge. The first and second column indicate the indices of the vertices spanning the edge. The weight of the edge is listed in the third column. The words corresponding to a vertex index are given in the file called "karplus-nodes".

Awk-scripts to create Random Networks to benchmark community-detection algorithms

The following bash-scripts (calling awk) create computer-generated network which can be used to benchmark community-detection algorithms. The following three flavours exist:

  • "create-SED-network": Small networks with an  exponential distribution imposed on the  degree distribution
  • "create-SLD-network": Small networks with a linear distribution imposed on the degree distribution
  • "create-LLD-network": Same as SLD but with at least 300 vertices

Implementations for Benchmarking

Below, we provide the original C++-Implementation of the two algorithms introduced in the reference paper.

Multi-Step Greedy algorithm

Source

This C++-source file is called "MSG.cc," and it is also found in the aforementioned zip archive.

.

Compilation

The program has been tested with the GCC compiler suite.
g++ -m32 -O3 MSG.cc -o MSG To lower the effect of numerical instabilities it is recommended to compile the program with 32-bit.

Usage

The edge information is expected in the format
Start_Vertex End_Vertex Weight
The program assumes a directed network. Therefore, for undirected networks both links
a b weight
and
b a weight have to be present. The MSG algorithm is performed by
./MSG edge_file level_parameter_value
The program outputs per default no partition.

Extraction of Partition

The source file "transform.cc" contains a tiny C++-program that can be used to transform the MSG output into a partition. The compilation is straightforward
g++ -O3 transform.cc
It can be used as follows

  1. ./MSG edge_file level_parameter_value | grep merging | awk '{print $4 "\t" $5}' > temp_file
  2. ./transform temp_file partition-MSG

The output-format of partition-MSG is
Vertex_Number Community_Index 1

Vertex-Mover Procedure

Source

The vertex mover procedure is unchanged and found in "VM.cc" in the zip archive.

Compilation

This programs has been tested with the GCC compiler suite.
g++ -O3 VM.cc -o VM

Usage

The output of the MSG algorithm has to be fed-in as follows
./VM input_partition edge_file_de_symmetrised
This procedure automatically generates a file named "NO-boosted-"input_partition"-boosted-to-"final_modularity.
It is important to notice that the current implementation assumes unweighted networks. Therefore the redundant information of the edge file used for the MSG has to be removed. This means in the MSG file
a b weight
b a weight

in the VM edge file only
a b weight