# Netzschleuder network catalogue, repository and centrifuge

The networks in this dataset can be loaded directly from graph-tool with:

    import graph_tool.all as gt
    g = gt.collection.ns["bag_of_words/enron"]

(and likewise for the other networks available.)

# bag_of_words — Bag of words (2008)

Description

Five text collections in the form of bags-of-words, each represented as a bipartite document–word network. Left nodes are documents and right nodes are words. Edge weights are multiplicities, i.e. the number of times a word occurs in a document.
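As a rough illustration of this representation, the sketch below turns a few tokenized documents into weighted bipartite edges. The function name `bag_of_words_edges` and the toy documents are hypothetical and are not part of the dataset or of graph-tool; the sketch only assumes that a multiplicity is the count of a word within one document.

```python
from collections import Counter


def bag_of_words_edges(docs):
    """Return weighted bipartite edges as (doc_index, word, multiplicity).

    `docs` is a list of token lists. Document indices play the role of
    left nodes and distinct words play the role of right nodes.
    """
    edges = []
    for doc_id, tokens in enumerate(docs):
        counts = Counter(tokens)  # word -> occurrences in this document
        for word, multiplicity in sorted(counts.items()):
            edges.append((doc_id, word, multiplicity))
    return edges


# Toy input (hypothetical, not taken from the dataset).
docs = [
    ["network", "graph", "network"],
    ["graph", "word"],
]
edges = bag_of_words_edges(docs)
for edge in edges:
    print(edge)
```

Each tuple corresponds to one edge of the bipartite network, with the last entry playing the role of the edge weight.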

After tokenization and removal of stopwords, the vocabulary of unique words was truncated by keeping only words that occurred more than ten times. Individual document names (i.e. an identifier for each docID) are not provided for copyright reasons.
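The preprocessing described above could look roughly like the following sketch. It is not the procedure actually used to build the dataset: the whitespace tokenizer, the tiny `STOPWORDS` set, and the choice to count occurrences across the whole collection are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical stopword list; the one used for the dataset is not specified.
STOPWORDS = {"the", "a", "of"}
# Words are kept only if they occur more than this many times.
MIN_COUNT = 10


def tokenize(text):
    """Naive tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def build_vocabulary(texts, min_count=MIN_COUNT, stopwords=STOPWORDS):
    """Return the set of non-stopwords occurring more than `min_count` times.

    Occurrences are summed over all texts, which is an assumption about
    how "occurred more than ten times" is meant.
    """
    totals = Counter()
    for text in texts:
        totals.update(t for t in tokenize(text) if t not in stopwords)
    return {word for word, n in totals.items() if n > min_count}


# Toy input (hypothetical): "graph" occurs 12 times, "word" once.
texts = ["the graph " * 11, "a word of graph"]
vocab = build_vocabulary(texts)
print(vocab)
```

Note the strict inequality: a word occurring exactly ten times is dropped under this reading of the description.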

Name Nodes Edges $\left<k\right>$ $\sigma_k$ $\lambda_h$ $\tau$ $r$ $c$ $\oslash$ $S$ Kind Mode NPs EPs gt GraphML GML csv