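Five text collections in the form of bags-of-words, i.e. bipartite document–word networks. Left nodes are documents and right nodes are words. An edge connects a document to each word it contains, and its weight is the multiplicity of that word, i.e. the number of times the word occurs in the document.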
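As an illustration of this structure, the sketch below turns tokenized documents into weighted bipartite edges of the form (document, word, multiplicity). The input format (a dict mapping a document id to its token list) and the function name `bag_of_words_edges` are hypothetical and are not part of the distributed data.

```python
from collections import Counter


def bag_of_words_edges(documents):
    """Return (doc_id, word, multiplicity) triples for a bipartite document-word network.

    `documents` is a hypothetical input: a dict mapping a document id to its list of tokens.
    """
    edges = []
    for doc_id, tokens in documents.items():
        # Each distinct word in a document yields one edge; its weight is how often it occurs.
        for word, count in Counter(tokens).items():
            edges.append((doc_id, word, count))
    return edges


docs = {1: ["graph", "node", "graph"], 2: ["node", "edge"]}
print(bag_of_words_edges(docs))
# [(1, 'graph', 2), (1, 'node', 1), (2, 'node', 1), (2, 'edge', 1)]
```

Repeated words collapse into a single weighted edge, which is why the networks carry multiplicities rather than parallel edges.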
After tokenization and removal of stopwords, the vocabulary of unique words was truncated by keeping only words that occurred more than ten times. Individual document names (i.e. an identifier for each docID) are not provided for copyright reasons.
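The preprocessing steps can be sketched as follows. This is a minimal illustration, not the pipeline used to build the collections: the regex tokenizer and the small stopword list are assumptions, and "occurred more than ten times" is read here as the total number of occurrences across the whole collection (it could instead refer to a different count, such as document frequency).

```python
import re
from collections import Counter

# Illustrative stopword list only (assumption); the original collections used their own list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def tokenize(text):
    # Simple lowercase alphabetic tokenizer (assumption, not the original tokenizer).
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(texts, min_count=10):
    """Keep non-stopword words whose total count across all texts exceeds min_count."""
    totals = Counter()
    for text in texts:
        totals.update(t for t in tokenize(text) if t not in STOPWORDS)
    return {word for word, count in totals.items() if count > min_count}


def truncated_bags(texts, min_count=10):
    """Return the truncated vocabulary and one bag-of-words Counter per text."""
    vocab = build_vocabulary(texts, min_count)
    # Stopwords are never in vocab, so filtering on vocab also removes them.
    bags = [Counter(t for t in tokenize(text) if t in vocab) for text in texts]
    return vocab, bags


vocab, bags = truncated_bags(["the graph " * 6, "graph node " * 5 + "rare"])
print(vocab)  # {'graph'}
```

Here "graph" occurs 11 times in total and survives, while "node" (5) and "rare" (1) are dropped and "the" is removed as a stopword.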