Network clusterization algorithm
Community-detection Algorithms
On this homepage the authors' C++-Implementation of the two algorithms introduced in P. Schütz and A. Caflisch, Efficient Modularity Optimization by Multi-Step Greedy Algorithm and Vertex Mover Refinement are available.
Multi-Step Greedy algorithm
The implementations used for benchmarking can be found here. For the users convenience a wrapped-up version is presented here.
Source
This C++-source is an implementation of the MSG algorithm.
Compilation
This programs has been tested with the GCC compiler suite.
g++ -m32 -O3 MSG_complete.cc -o MSG_complete
The usage of the 32-bit version is recommended to lower numerical instabilities.
Usage
The edge information is expected in the format
Start_Vertex End_Vertex Weight
Please be aware that the program expects a directed edge file. This implies that for an undirected network the links
a b weight
and
b a weight
have to be present. The MSG algorithm is performed by
./MSG_complete edge_file level_parameter_value
The program outputs the file "partition-MSG-"edge_file"-l-eq-"level_parameter_value"-with-Q-"final_modularity. This output can directly be used by the VM procedure below.
Vertex-Mover Procedure
Source
The following implementation is used.
Compilation
This programs has been tested with the GCC compiler suite.
g++ -O3 VM.cc -o VM
Usage
The output of the MSG algorithm has to be fed-in as follows
./VM input_partition edge_file_de_symmetrized
This procedure automatically generates a file named "NO-boosted-"input_partition"-boosted-to-"final_modularity.
It is important to notice that the current implementation assumes unweighted networks. Therefore the redundant information of the edge file used for the MSG has to be removed. This means in the MSG file
a b weight
b a weight
in the VM edge file only
a b weight
For users of the program gawk this transformation can readily be done by
awk '($1<$2)' edge_file > edge_file_de_symmetrized
Network of Words in Titles of Publications (co)authored by Martin Karplus
The edges are listed in the file. Each line corresponds to one edge. The first and second column indicate the indices of the vertices spanning the edge. The weight of the edge is listed in the third column. The words corresponding to a vertex index are given here.
Awk-scripts to create Random Networks to benchmark community-detection algorithms
The following bash-scripts (calling awk) create computer-generated network which can be used to benchmark community-detection algorithms. The following three flavours exist:
- SED: Small networks with an exponential distribution imposed on the degree distribution
- SLD: Small networks with a linear distribution imposed on the degree distribution
- LLD: Same as SLD but with at least 300 vertices