Analysing the Tor Web with High Performance Graph Algorithms

Abstract
The exploration and analysis of Web graphs have flourished in the recent past, producing a large number of relevant and interesting research results. However, the unique characteristics of the Tor network demand specific algorithms to explore and analyze it. Tor is an anonymity network that allows offering and accessing various Internet resources while guaranteeing a high degree of provider and user anonymity. So far, the attention of the research community has focused on assessing the security of the Tor infrastructure. Most research on the Tor network has aimed at discovering protocol vulnerabilities to de-anonymize users and services, while little or no information is available about the topology of the Tor Web graph or the relationship between pages' content and topological structure. With our work we aim to address this lack of information. We describe the topology of the Tor Web graph, measuring both global and local properties by means of well-known metrics that, due to the size of the network, require high performance algorithms. We consider three different snapshots obtained by extensively crawling Tor three times over a five-month time frame. Finally, we present a correlation analysis of pages' semantics and topology, discussing novel insights about the Tor Web organization and its content. Our findings show that the Tor graph presents some of the characteristics of social and surface web graphs, along with a few unique peculiarities.
Year
2019
Publication type
Other authors
Bernaschi, Massimo and Celestini, Alessandro and Guarino, Stefano and Lombardi, Flavio and Mastrostefano, Enrico