Surveying the .NZ Top Level Domain: Business Sector Categorisation
Keywords: Software Engineering, .NZ Top Level Domain
A vast amount of information is present across “.NZ” domains, but no publicly accessible, viable tools or resources exist to categorise them. All existing solutions are either heavily rate-limited or require expensive monthly subscriptions, leaving no affordable option for categorising our dataset of 210,000 “.NZ” domains. Combining this new business-category dimension with other datasets, such as the Transport Layer Security (TLS) information or cookie usage of “.NZ” domains, would give us greater insight into the landscape of New Zealand’s digital presence. With such a rich dataset, we can draw conclusions as to which New Zealand business sectors are the most or least secure and which sectors have the greatest online presence, among other questions. This provides a solid foundation for further research into how Aotearoa New Zealand’s online presence impacts our economy and cyber-security. This project surveyed publicly available options for domain business categorisation and developed a system capable of rapidly extracting website information and classifying it into one of 25 business categories. The developed classifier uses the ‘transformers’ Python library and categorises a test dataset with 70% accuracy, using only a website’s title, description, and keyword meta-tags. A command-line interface was also developed using the ‘click’ Python library to allow information extraction and classification via a scripting interface, enabling automation and integration with other systems. Finally, an SQLite database was designed and populated with both our dataset and the information extracted from domains. Using this system, we found that the top “.NZ” domain category in the dataset is ‘Business’ with 79,815 instances, followed by ‘Arts & Entertainment’ with 25,983 instances and ‘Shopping’ with 12,647 instances.
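As an illustration of the three inputs the classifier relies on (title, description, and keyword meta-tags), the following is a minimal sketch of how they might be pulled from a page's HTML using only the Python standard library. It is not the project's actual extraction code, which is not described here; the names `MetaExtractor`, `extract_fields`, and `classifier_input` are hypothetical, and fetching the page and running the classifier are left out.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects the <title> text and the description/keywords meta-tags.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.fields = {"title": "", "description": "", "keywords": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            # Attribute values can be None (e.g. <meta name>), so guard them.
            name = (attr.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.fields[name] = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.fields["title"] += data


def extract_fields(html):
    """Return a dict with 'title', 'description', and 'keywords' strings."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    fields = dict(parser.fields)
    fields["title"] = fields["title"].strip()
    return fields


def classifier_input(fields):
    """Join the non-empty fields into one string (an assumed input format)."""
    parts = (fields["title"], fields["description"], fields["keywords"])
    return " ".join(p for p in parts if p)
```

Calling `extract_fields` on a page's HTML yields a small dictionary, and `classifier_input` joins its non-empty values into a single string. Joining the fields this way is an assumption about how text might be handed to a text classifier, not a description of the system developed in this project.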