Netzschleuder network catalogue, repository and centrifuge

The network in this dataset can be loaded directly from graph-tool with:
import graph_tool.all as gt
g = gt.collection.ns["trec"]

trec — TREC collection (2010)

Description

A bipartite network of documents and the words they contain, extracted from NIST's Text Retrieval Conference (TREC) disks 4 and 5, from 2010. These archives contain material drawn from the Financial Times Ltd., the Congressional Record of the 103rd Congress, the Federal Register, the Foreign Broadcast Information Service, and the Los Angeles Times newspaper.¹

¹ Description obtained from the ICON project.
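To make the "documents and the words they contain" structure concrete, here is a minimal sketch of how such a bipartite edge list could be built from raw text. It is not the procedure used to produce this dataset: the function name build_bipartite_edges, the naive whitespace tokenization, the two toy documents, and the use of the per-document word count as an edge weight are all assumptions made for illustration.

```python
from collections import Counter


def build_bipartite_edges(docs):
    """Build a toy document-word bipartite edge list.

    docs maps a document id to its text. Returns (doc_ids, words, edges),
    where edges holds one (doc_id, word, count) triple per distinct
    (document, word) pair. Hypothetical helper, for illustration only.
    """
    edges = []
    vocabulary = set()
    for doc_id, text in docs.items():
        # Assumed tokenization: split on whitespace, strip punctuation, lowercase.
        tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
        counts = Counter(t for t in tokens if t)
        for word, count in sorted(counts.items()):
            edges.append((doc_id, word, count))
            vocabulary.add(word)
    return sorted(docs), sorted(vocabulary), edges


# Two made-up documents (not taken from TREC).
docs = {
    "d1": "The Senate passed the bill.",
    "d2": "The bill failed.",
}
doc_ids, words, edges = build_bipartite_edges(docs)

for doc_id, word, count in edges:
    print(doc_id, word, count)
```

Every edge joins one document node to one word node, and never two documents or two words, which is what makes the network bipartite.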

Tags

Informational, Language, Unweighted

Citation

J. Kunegis, "TREC (disks 4–5)." KONECT, the Koblenz Network Collection (2016), https://doi.org/10.1145/2487788.2488173

Upstream URL

http://konect.cc/networks/gottron-trec

Networks

Name	Nodes	Edges	$\left<k\right>$	$\sigma_k$	$\lambda_h$	$\tau$	$r$	$c$	$\oslash$	$S$	Kind	Mode	NPs	EPs	gt	GraphML	GML	csv
trec	1,729,302	83,629,405	96.72	1358.95	2935.32	13.41	-0.21	0.00	7	1.00	Undirected	Bipartite		weight	152.7 MiB	426.6 MiB	405.9 MiB	349.4 MiB
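
As a quick sanity check on the row above, the mean degree of an undirected graph is 2E/N, since each edge contributes to the degree of two nodes. The sketch below recomputes it from the tabulated node and edge counts. It assumes that $\left<k\right>$ is the mean degree and $\sigma_k$ the standard deviation of the degree, which is the usual reading of these symbols; the variable names are illustrative.

```python
# Values copied from the table row for "trec".
nodes = 1_729_302
edges = 83_629_405
sigma_k = 1358.95  # assumed to be the standard deviation of the degree

# Each undirected edge adds one to the degree of both of its endpoints.
mean_degree = 2 * edges / nodes

# Ratio of degree spread to mean degree (coefficient of variation).
cv = sigma_k / mean_degree

print(f"mean degree: {mean_degree:.2f}")
print(f"sigma_k / <k>: {cv:.2f}")
```

The recomputed mean degree agrees with the tabulated 96.72, and a spread roughly fourteen times the mean suggests a very broad degree distribution, as one would expect when a few common words appear in a large share of the documents.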

Ridiculograms^*

^* These are automatically generated force-directed visualizations, and can be quite meaningless for networks both big and small. They should not be taken seriously as sources of scientific insight. See here for a discussion.