Netzschleuder: network catalogue, repository and centrifuge

The network in this dataset can be loaded directly from graph-tool with:
import graph_tool.all as gt
g = gt.collection.ns["trec"]

trec — TREC collection (2010)


A bipartite network of documents and the words they contain, extracted from NIST's Text Retrieval Conference (TREC) disks 4 and 5, from 2010. These archives contain material drawn from the Financial Times Ltd., the Congressional Record of the 103rd Congress, the Federal Register, the Foreign Broadcast Information Service, and the Los Angeles Times newspaper. [1]

  [1] Description obtained from the ICON project.
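The document-word structure described above can be illustrated with a minimal sketch. The toy documents, the whitespace tokenization, and the helper name `build_bipartite` are all assumptions made for illustration; the source does not say how words were extracted from the TREC archives or what any stored edge value means.

```python
from collections import Counter

# Hypothetical toy documents; the real TREC corpus is far larger.
docs = {
    "doc0": "the federal register",
    "doc1": "the congressional record",
}

def build_bipartite(docs):
    """Return (doc_nodes, word_nodes, edges) for a document-word network.

    `edges` maps each (document, word) pair to the number of times the
    word occurs in that document. Lowercasing and whitespace splitting
    are assumptions, not the dataset's actual preprocessing.
    """
    edges = Counter()
    for doc_id, text in docs.items():
        for word in text.lower().split():
            edges[(doc_id, word)] += 1
    doc_nodes = sorted(docs)
    word_nodes = sorted({word for _, word in edges})
    return doc_nodes, word_nodes, edges

doc_nodes, word_nodes, edges = build_bipartite(docs)
print(len(doc_nodes), len(word_nodes), len(edges))
```

Every edge joins one document node to one word node and never two nodes of the same type, which is what makes the network bipartite.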

Tags: Informational, Language, Unweighted
Upstream URL: OK
Name: trec
Nodes: 1,729,302
Edges: 83,629,405
$\left<k\right>$: 96.72
$\sigma_k$: 1358.95
$\lambda_h$: 2935.32
$\tau$: 13.41
$r$: -0.21
$c$: 0.00
$\oslash$: 7
$S$: 1.00
Kind: Undirected
Mode: Bipartite
Properties (NPs, EPs): weight
File sizes: gt 152.7 MiB, GraphML 426.6 MiB, GML 405.9 MiB, csv 349.4 MiB
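As a quick consistency check on the figures above, and assuming $\left<k\right>$ denotes the mean degree of an undirected network, it should equal twice the edge count divided by the node count. A minimal sketch using only the numbers listed for this dataset:

```python
# Node and edge counts as listed for the trec network.
nodes = 1_729_302
edges = 83_629_405

# In an undirected network each edge contributes to the degree of two nodes,
# so the mean degree is 2E / N.
mean_degree = 2 * edges / nodes
print(f"{mean_degree:.2f}")  # 96.72
```

The result agrees with the listed value of 96.72.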