The World Wide Web (the Web) is the main driving force behind the rapid diffusion of Internet technology. As a result, we are beginning to live a significant part of our lives in Cyberspace. Measuring and monitoring our surroundings is an essential human activity that helps us both to understand and to shape the world we live in. In recent years, substantial effort has been invested in understanding the Internet in general and the Web in particular through, for example, surveys of user attitudes and behaviour, maps of Internet traffic, and indexing of content. Very little research has, however, investigated how to measure and monitor the contents of Web sites using a combination of linguistic and data visualisation measures. Most efforts have demonstrated the use of techniques from a single discipline, such as information retrieval, data mining, or autonomous agents. This paper instead explores issues related to monitoring the contents of, and changes to, the Web based on a range of measures. It aims to demonstrate the principles behind applying semi-automatic measurement instruments to advance our understanding of the Web as a body of textual traces of human activity. The paper suggests five basic types of measures for studying the Web: volume, density, vocabulary, structure, and relative measures. A survey of 82 Swedish Web sites was conducted using semi-autonomous Web robots for information retrieval and filtering, based on techniques from linguistics and information visualisation. Examples demonstrate how such data can be applied to summarise site contents, identify site topic, map site structure, and compare Web sites. The results are discussed and related to emergent issues such as Web navigation, electronic commerce, and the management of knowledge.
Scandinavian Journal of Information Systems
Published - 1999