Thursday, June 26, 2008

SurfLogger: A Logging Browser and Data Processing Method in Web-based Studies



SurfLogger: A Logging Browser and Data Processing Method in Web-based Studies

Jibo He

(Department of Psychology, University of Illinois at Urbana-Champaign, IL 61801, USA)

Jibohe2@uiuc.edu

ABSTRACT

Despite the increasing interest in web-based studies, researchers lack a convenient tool for data collection. Existing tools are limited in the data they can collect or in their availability across study environments. SurfLogger, described in this paper, is an automated data logging tool that is free, open-source, cross-platform, and easy to modify. SurfLogger is expected to meet the increasing needs of web-based studies.

Keywords

SurfLogger, browser, instrumentation, Web, WWW, Python

INTRODUCTION

In this information age, the World Wide Web (WWW) is the fastest-growing information resource (Eighmey & McCord, 1998). The booming opportunities accompanying the WWW attract interest from a wide range of communities, including web site designers, user interface researchers, cognitive psychologists, and e-commerce businesses, as well as many others who are interested in characterizing how users interact with web browsers and gain information from varied designs of web pages (Eighmey & McCord, 1998; Wang, Jing, He, & Yang, 2007; Reeder, Pirolli, & Card, 2000).

Despite the wide interest in web-based research, there are still no full-fledged and easily accessible tools for collecting user logs and browser interaction data. Current data collection methods are far from convenient. Some studies collect data from servers or proxies, which are bothersome and expensive to configure and, moreover, cannot capture users' interaction with the browser or the user experience (Pitkow, 1998). An alternative is videotaped data, which usually provide more comprehensive information about users (Byrne, John, Wehrle, & Crow, 1999). However, coding videotaped data is costly in time and labor, and coding accuracy cannot be guaranteed. Reeder, Pirolli, and Card (2000) created WebLogger for data collection in web-based studies. Unfortunately, WebLogger was written in Visual Basic and depends on Microsoft's Internet Explorer 6.0 (IE) (Reeder, Pirolli, & Card, 2000, 2001). WebLogger cannot be used after IE is updated to the latest version, IE 7.0, and it does not run on Linux or Mac operating systems.

To meet the need for such a tool in web-based research, I have developed SurfLogger, which collects data on users' interactions with both the web and the browser. SurfLogger is an automated data logging tool that is free, open-source, cross-platform (it can be used on Windows, Linux, Mac, and many other operating systems), and easy to modify. SurfLogger does not depend on other software, such as Internet Explorer, and does not need installation.

SURFLOGGER

Description

SurfLogger is written in Python, a scripting language, and its GUI (Graphical User Interface) is created with wxPython, a Python binding of wxWidgets. SurfLogger can record a variety of user actions on web pages and in the browser. SurfLogger produces two files, logfile.txt and urlfile.txt. Logfile.txt stores action IDs (natural numbers assigned to each action, used to match records to the corresponding actions), the time of each action, interactions with the browser (such as clicking the Back, Forward, or Home buttons), and the mouse coordinates of each click. The time records can be used to compute the completion time for each task. The number of button presses in the browser can be used as a measure of the effort spent carrying out the task. SurfLogger also captures an image of the screen each time the web page refreshes. Marking the mouse coordinates on the screen captures shows which links the users clicked. Urlfile.txt stores action IDs and URLs (Uniform Resource Locators). Action IDs are used to synchronize the records in logfile.txt and urlfile.txt. The URL records are stored in a separate file because of the abundant information they provide. I give an example of how to extract information from urlfile.txt in the Case Study section of this paper.
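As a rough illustration of this logging scheme, the minimal sketch below (not the actual SurfLogger source; the function name and arguments are hypothetical) shows how one action record might be appended to logfile.txt and the matching URL to urlfile.txt, using the field names that appear in Figures 1 and 2 below:

# Illustrative sketch only; not taken from the SurfLogger distribution.
import time

def log_action(action_id, x, y, browser_action, url,
               logfile="logfile.txt", urlfile="urlfile.txt"):
    """Append one action record set to logfile.txt and the matching URL to urlfile.txt."""
    timestamp = time.strftime("%d %b %Y %H:%M:%S")
    with open(logfile, "a") as f:
        f.write("ID: %d\n" % action_id)
        f.write("TIME: %s\n" % timestamp)
        f.write("Mouse Coordination: %d %d\n" % (x, y))   # field names follow Figure 1
        f.write("Browser Action: %s\n" % browser_action)
        f.write("\n")                                     # blank line separates record sets
    with open(urlfile, "a") as f:
        f.write("ID: %d\n" % action_id)
        f.write("URL: %s\n" % url)
        f.write("\n")

# Example: the user clicked the Back button at screen position (125, 52)
log_action(3, 125, 52, "Back",
           "http://www.citeulike.org/user/testMaterial/article/2624476")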

SurfLogger also calls external software to record the whole process of user actions. Currently, SurfLogger calls Michael Urman's screen recorder, Cankiri, because it is also written in Python and shares the same open-source spirit. With a video record, researchers can learn more about users' actions. If recording quality is a priority, SurfLogger can easily switch to other recording software; only one line of code has to change, to point to the path of the external software.
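Keeping the recorder's path in a single place is what makes switching recorders a one-line change. The sketch below illustrates the idea with a hypothetical RECORDER_PATH constant and helper functions (assumed names, not the actual SurfLogger code):

# Illustrative sketch only.
import subprocess

RECORDER_PATH = "cankiri.py"   # hypothetical: the one line to change when switching recorders

def start_recorder():
    # Launch the external screen recorder as a separate process so logging is not blocked.
    return subprocess.Popen(["python", RECORDER_PATH])

def stop_recorder(process):
    # Stop the recorder when the session ends.
    process.terminate()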

Log File Format

The records are stored in two files, logfile.txt and urlfile.txt. Each variable takes up one line, which begins with the variable name followed by the variable value. The variable names are self-explanatory. In logfile.txt (see Figure 1), the record set for one action includes the mouse coordinates, the browser action (clicking the Back, Forward, Home, or other buttons in the browser), the time of the action, and the action ID. In urlfile.txt (see Figure 2), each record set contains the action ID and the URL. Adjacent record sets are separated by a blank line. The log file format is designed to be human-readable and easily parsed by analysis software.

ID: 3

TIME: 04 Apr 2008 11:50:04

Mouse Coordination: 125 52

Browser Action: Back

Figure 1. Records in logfile.txt

ID: 3

URL: http://www.citeulike.org/user/testMaterial/article/2624476

Figure 2. Records in urlfile.txt
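Because both files use the same blank-line-separated, name-value layout, they are straightforward to parse and join on the action ID. The following is a minimal parser sketch written for illustration; it is not part of the SurfLogger distribution:

def read_records(path):
    # Parse a log file into a list of dictionaries, one per record set.
    records, current = [], {}
    for line in open(path):
        line = line.strip()
        if not line:                     # a blank line ends a record set
            if current:
                records.append(current)
                current = {}
        else:
            name, _, value = line.partition(":")
            current[name.strip()] = value.strip()
    if current:
        records.append(current)
    return records

# Join the two files on the shared action ID.
log_records = dict((r["ID"], r) for r in read_records("logfile.txt"))
for url_record in read_records("urlfile.txt"):
    action = log_records.get(url_record["ID"], {})
    print("%s %s %s" % (url_record["ID"], action.get("Browser Action"), url_record["URL"]))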

Case study

To demonstrate how SurfLogger can benefit web-based research, I will briefly describe the usability analysis of IGroup as a case study (Wang, Jing, He, & Yang, 2007). IGroup is an image search engine that presents its results in semantic clusters. To test whether IGroup increases search efficiency compared to MSN, we developed the predecessor of SurfLogger, which functioned similarly to SurfLogger but was less flexible. We developed a measure of Search Effort to compare IGroup and MSN objectively. Search Effort was defined as the number of query inputs plus the number of links and cluster names clicked by the users. The query inputs, links, and cluster names clicked were extracted from the URLs recorded by our automated logging tool. A sample URL recorded in this study is listed as follows:

Wednesday, August 30, 2006 3:06:54 PM

http://msra-vss50-b/igroup2/search.aspx?q=Disney#g,14,1,-1

The values "Disney", "14", and "1" are the input query, the ID of the cluster name, and the result page. This information can be extracted from the URL by simple text processing. For the data reduction and URL extraction code, as well as the source code of SurfLogger, please refer to the SurfLogger project page.
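As an illustration of this kind of text processing, the sketch below pulls the query, cluster ID, and result page out of a URL with the layout shown above. The field positions are inferred from that single sample, so the pattern is an assumption, not the actual study code:

# Illustrative sketch only; field layout inferred from the sample URL above.
import re

def parse_igroup_url(url):
    # Query string parameter: ...search.aspx?q=Disney
    query = re.search(r"[?&]q=([^#&]+)", url)
    # Fragment: #g,<cluster id>,<result page>,<unknown field>
    frag = re.search(r"#g,(-?\d+),(-?\d+),(-?\d+)", url)
    return {
        "query": query.group(1) if query else None,
        "cluster_id": frag.group(1) if frag else None,
        "result_page": frag.group(2) if frag else None,
    }

print(parse_igroup_url(
    "http://msra-vss50-b/igroup2/search.aspx?q=Disney#g,14,1,-1"))
# {'query': 'Disney', 'cluster_id': '14', 'result_page': '1'}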

RELATED WORK

Although a large number of researchers are interested in web-based studies, there are not many well-developed tools. Reeder, Pirolli, and Card (2000) developed a powerful tool called WebLogger, which can collect extensive data, including user input from the keyboard and mouse, user actions on the interface elements of IE, and URLs. Choo, Detlor, and Turnbull (1999) developed a similar tool named WebTracker. However, both WebLogger and WebTracker can be used only on the Windows platform and rely on IE or Netscape's Navigator. When these browsers are upgraded, the code of WebLogger and WebTracker must also be updated in order to keep functioning.

LogSquare, sold by Mangold, can record keyboard entries, web page actions, mouse clicks, user comments, coding, and so on. However, despite its price, LogSquare does not offer researchers flexibility in data collection and analysis. IT companies have also written tools for their own usability tests, but these tools are usually not full-fledged and are not available to most researchers (Wang, Jing, He, & Yang, 2007).

Besides the automated logging tools mentioned above, researchers have also used some complementary recording methods. Catledge and Pitkow (1995) studied the user interface by capturing client-side browsing events with NCSA's XMosaic. Byrne and his colleagues (1999) used videotape recordings to study web-browsing behaviors. However, these methods are not only time-consuming but also provide limited data about users' behaviors.

CONCLUSION

SurfLogger is a useful tool for collecting data in web-based research. As a free, open-source, cross-platform, automated data logging tool with no dependence on other browsers, SurfLogger can free many researchers from the financial and time costs of data collection. SurfLogger is expected to contribute to the increasing interest in web-based research.

ACKNOWLEDGEMENTS

I thank Dr. Wai-Tat Fu of the University of Illinois at Urbana-Champaign for suggestions and revisions. I would also like to thank Jeff Grimmett and Michael Urman for sharing their work, and Robin Dunn and the other members of the Python/wxPython communities for information and help.

REFERENCES

Byrne, M.D., John, B.E., Wehrle, N.S., and Crow, D.C. (1999). The Tangled Web We Wove: A Taskonomy of WWW Use. In Proceedings of CHI '99 (Pittsburgh PA, May, 1999), ACM Press, 544-551.

Cankiri. http://www.tortall.net/mu/wiki/Cankiri

Catledge, L.D. and Pitkow, J.E. (1995). Characterizing browsing strategies in the World Wide Web. In Computer Networks and ISDN Systems 27: 1065-1073.

Choo, C.W., Detlor, B., and Turnbull, D. (1999). Working the Web: An Empirical Model of Web Use. HICSS'33 (Hawaii International Conference on Systems Science). Available at http://choo.fis.utoronto.ca/FIS/ResPub/HICSS/

Eighmey, J., & McCord, L. (1998). Adding value in the information age: Uses and gratifications of sites on the World Wide Web, Journal of Business Research
41(3):187-194.

Igroup: http://igroup.msra.cn/

Jing, F., Wang, C., Yao, Y., Deng, K., Zhang, L., & Ma, W.Y. (2006). IGroup: A web image search engine with semantic clustering of search results. In Proceedings of the 14th Annual ACM International Conference on Multimedia, Santa Barbara, CA, USA.

LogSquare. http://www.mangold.de/LogSquare.16.0.html.

Pitkow, J.E. (1998). Summary of WWW Characterizations. In Proceedings of the Seventh International WWW Conference, Brisbane, Australia. Also available at http://www7.scu.edu.au/programme/fullpapers/1877/com1877.htm.

Reeder, R., Pirolli, P., and Card, S. (2001). WebEyeMapper and WebLogger: Tools for analyzing eye tracking data collected in web-use studies.

Reeder, R., Pirolli, P., and Card, S. (2000). WebLogger: A data collection tool for web-use studies. Technical report number UIR-R-2000-06 online at http://www.parc.xerox.com/istl/projects/uir/pubs/default.html.

Wang, S., Jing, F., He, J., & Yang, J. (2007). IGroup: Presenting Web Image Search Results in Semantic Clusters. In Proceedings of CHI 2007, ACM Press.

Python. http://www.python.org/

wxPython. http://www.wxpython.org/



1 comment:

David More said...

What about Techsmith's Morae? I'm surprised you don't mention it or aren't aware of it.