The network in this dataset can be loaded directly from graph-tool with:

```python
import graph_tool.all as gt

# Fetch the "trackers" network from the Netzschleuder collection
g = gt.collection.ns["trackers"]
```
A large bipartite network of internet domains and the trackers they contain (trackers are also identified by their domain), as collected from the CommonCrawl corpus.[1]
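To make the bipartite structure concrete, here is a minimal sketch of how such a network could be assembled from (website domain, tracker domain) observations, using only the Python standard library. The input format, the helper names `build_bipartite` and `is_bipartite_by_tag`, and the choices to tag nodes by side and to merge repeated sightings into a single edge are hypothetical assumptions for illustration. They do not describe how this dataset was actually built from CommonCrawl.

```python
from collections import defaultdict


def build_bipartite(observations):
    """Build adjacency sets for a hypothetical domain-tracker bipartite graph.

    `observations` is an assumed input format: an iterable of
    (site_domain, tracker_domain) pairs.
    """
    adj = defaultdict(set)
    for site, tracker in observations:
        # Assumption: tag each node with its side, so a domain that appears
        # both as a website and as a tracker becomes two distinct nodes.
        s, t = ("site", site), ("tracker", tracker)
        # Assumption: sets merge repeated sightings of the same pair
        # into a single edge.
        adj[s].add(t)
        adj[t].add(s)
    return adj


def is_bipartite_by_tag(adj):
    """Check that every edge joins a 'site' node to a 'tracker' node."""
    return all(u[0] != v[0] for u, nbrs in adj.items() for v in nbrs)


obs = [
    ("example.org", "tracker-a.net"),
    ("example.org", "tracker-b.net"),
    ("example.com", "tracker-a.net"),
    ("example.org", "tracker-a.net"),  # repeated sighting, merged
]
adj = build_bipartite(obs)
num_nodes = len(adj)
# Each undirected edge is stored twice (once per endpoint).
num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
print(num_nodes, num_edges, is_bipartite_by_tag(adj))  # 4 3 True
```

Tagging nodes by side makes the two-mode structure explicit: no edge can join two websites or two trackers, which is what "Bipartite" in the Mode column refers to.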
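Name | Nodes | Edges | $\left<k\right>$ | $\sigma_k$ | $\lambda_h$ | $\tau$ | $r$ | $c$ | $\oslash$ | $S$ | Kind | Mode | NPs | EPs | gt | GraphML | GML | csv |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
trackers | 40,421,974 | 140,613,762 | 6.96 | 2294.88 | 2673.57 | 13778.17 | -0.10 | 0.00 | 21 | 0.99 | Undirected | Bipartite | | | 272.4 MiB | 629.6 MiB | 599.1 MiB | 620.8 MiB |

Column key: $\left<k\right>$ average degree; $\sigma_k$ degree standard deviation; $\lambda_h$ largest eigenvalue of the non-backtracking (Hashimoto) matrix; $\tau$ random-walk relaxation time; $r$ degree assortativity; $c$ global clustering coefficient; $\oslash$ (pseudo-)diameter; $S$ relative size of the largest connected component; NPs and EPs node and edge properties; gt, GraphML, GML and csv download sizes for each file format.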
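As a quick consistency check on the table: in an undirected graph every edge adds 2 to the degree sum, so the average degree is $\left<k\right> = 2E/N$. With the listed counts, $2 \times 140{,}613{,}762 / 40{,}421{,}974 \approx 6.96$. The plain-Python sketch below reproduces that figure and computes the mean and standard deviation of degrees for a toy edge list. The helper names `mean_degree` and `degree_stats` are hypothetical. Using the population standard deviation is an assumption about how $\sigma_k$ is defined. For a graph with about 40 million nodes the population and sample versions are practically identical.

```python
from statistics import pstdev


def mean_degree(num_nodes: int, num_edges: int) -> float:
    # Each undirected edge contributes 2 to the degree sum, so <k> = 2E / N.
    return 2 * num_edges / num_nodes


def degree_stats(edges):
    """Return (mean, population std) of node degrees for an undirected edge list.

    Nodes with no edges do not appear in an edge list and are therefore not counted.
    """
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    ks = list(deg.values())
    return sum(ks) / len(ks), pstdev(ks)


# Node and edge counts taken from the table above.
print(round(mean_degree(40_421_974, 140_613_762), 2))  # 6.96

# Hypothetical toy bipartite edge list: two sites, two trackers.
toy_edges = [("a.com", "t1"), ("a.com", "t2"), ("b.com", "t1")]
print(degree_stats(toy_edges))  # (1.5, 0.5)
```

The large gap between $\left<k\right> \approx 6.96$ and $\sigma_k \approx 2294.88$ in the table indicates a very broad degree distribution, with a few nodes of extremely high degree.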