Mapping Autonomous Systems (ASes) to their owner organizations is critical for connecting AS-level and organization-level research. Unfortunately, constructing an accurate dataset of AS-to-organization mappings is difficult due to a lack of ground-truth information. CAIDA AS-to-organization (CA2O), the current state-of-the-art dataset, relies heavily on Whois databases maintained by Regional Internet Registries (RIRs) to infer AS-to-organization mappings. However, inaccuracies in Whois data can dramatically affect the accuracy of CA2O, particularly its inferences of ASes owned by the same organization (sibling ASes).
In this work, we leverage PeeringDB (PDB) as an additional data source to detect potential errors in CA2O's sibling relations. Through a meticulous semi-manual investigation, we find that the inaccuracies in CA2O stem from two pitfalls of Whois data, and we systematically analyze how these pitfalls jointly affect CA2O. We also build an improved dataset of sibling relations, which corrects the mappings of 12.5% of CA2O organizations with sibling ASes (1,028 CA2O organizations, associated with 3,772 ASNs). To make the process more scalable, we design an automatic approach that reproduces our manually built dataset with high fidelity. The approach can automatically improve sibling-AS inferences for each new version of CA2O.
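The abstract does not describe how PDB is used to flag errors. As a rough illustration only, and not the authors' method, the sketch below shows one simple way to surface candidate sibling disagreements between two sources. It assumes both datasets have already been loaded into plain ASN-to-organization dictionaries; the function names and the sample data are hypothetical (the sample ASNs come from the documentation range 64496-64511).

```python
from collections import defaultdict
from itertools import combinations


def group_by_org(as2org):
    """Invert an ASN -> org mapping into org -> set of ASNs."""
    groups = defaultdict(set)
    for asn, org in as2org.items():
        groups[org].add(asn)
    return groups


def sibling_pairs(as2org, asns):
    """Return unordered (low, high) ASN pairs sharing an org, restricted to `asns`."""
    restricted = {a: o for a, o in as2org.items() if a in asns}
    pairs = set()
    for members in group_by_org(restricted).values():
        # Quadratic in group size; acceptable for a sketch.
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def flag_disagreements(ca2o, pdb):
    """Compare sibling relations implied by two mappings on their shared ASNs."""
    common = set(ca2o) & set(pdb)
    ca2o_pairs = sibling_pairs(ca2o, common)
    pdb_pairs = sibling_pairs(pdb, common)
    return {
        # Siblings according to PDB, but split across CA2O organizations.
        "pdb_only": pdb_pairs - ca2o_pairs,
        # Siblings according to CA2O, but in different PDB organizations.
        "ca2o_only": ca2o_pairs - pdb_pairs,
    }


# Hypothetical example data (not taken from either real dataset).
ca2o = {64496: "ORG-A", 64497: "ORG-B", 64498: "ORG-C", 64499: "ORG-C"}
pdb = {64496: 1, 64497: 1, 64498: 2, 64499: 3}
result = flag_disagreements(ca2o, pdb)
# result["pdb_only"] holds (64496, 64497); result["ca2o_only"] holds (64498, 64499).
```

A disagreement is only a candidate error, since either source may be wrong. The talk describes resolving such cases through a semi-manual investigation rather than trusting one source outright.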
Speaker: Zhiyi Chen