Session (web analytics)
A session, or visit, is a unit of measurement in web analytics that captures either a user's actions within a particular time period or a user's actions in completing a particular task. As well as being directly useful as a metric within web analytics, sessions are used in operational analytics and to provide personalised features, such as user-specific recommendations for other pages or items to view. These uses depend on session reconstruction, which takes a series of user events and splits it into a set of sessions. Reconstruction tends to use one of two classes of methodology: time-oriented approaches, which use user inactivity as a signal to end a session and begin a new one, and navigation-oriented approaches, which divide requests into sessions based on an unbroken chain of hyperlinks between the requested pages.
Definition
The definition of "session" varies, particularly when applied to search engines.[1] Generally, a session is understood to consist of "a sequence of requests made by a single end-user during a visit to a particular site".[2] In the specific context of search engines, "sessions" and "query sessions" have multiple, contradictory and interchangeable definitions.[1] Some researchers consider a session or query session to be all queries made by a user in a particular time period.[3] Others argue that sessions can be divided thematically: a "session" is a series of queries with a consistent underlying user need, and it terminates when that need does, even if the user continues searching for other purposes.[4][5]
Uses
Sessions can be used directly in web analytics, with sessions-per-user serving as a metric of website usage.[6][7] Other metrics used within research and applied web analytics include session length,[8] and user actions per session;[9] session length, particularly, is seen as a more accurate alternative to measuring page views.[10] With all of these metrics, and with sessions as a concept, the goal is to improve the website's usability, due to the substantial impact that usability has on website usage and operator profits.[11] Sessions are also used to provide personalised features such as user-specific recommendations and search term suggestions.[12]
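The metrics named above can be computed directly once sessions have been reconstructed. The following is a minimal sketch rather than a standard implementation: the input layout (`sessions_by_user`), the function name `session_metrics`, and the choice to measure session length as the time between the first and last event are all assumptions made for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical input: sessions already reconstructed and grouped by user.
# Each session is a chronologically ordered list of event timestamps.
sessions_by_user = {
    "alice": [
        [datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 5), datetime(2024, 1, 1, 9, 12)],
        [datetime(2024, 1, 1, 14, 0)],
    ],
    "bob": [
        [datetime(2024, 1, 2, 8, 0), datetime(2024, 1, 2, 8, 30)],
    ],
}


def session_metrics(sessions_by_user):
    """Return sessions per user, mean session length (seconds) and mean actions per session."""
    all_sessions = [s for sessions in sessions_by_user.values() for s in sessions]
    return {
        "sessions_per_user": len(all_sessions) / len(sessions_by_user),
        # Assumption: length is last event minus first event, so a
        # single-event session has length zero.
        "mean_session_length": mean((s[-1] - s[0]).total_seconds() for s in all_sessions),
        "mean_actions_per_session": mean(len(s) for s in all_sessions),
    }


metrics = session_metrics(sessions_by_user)
```

Measuring length from first to last event undercounts time spent on the final page of each session, since the logs contain no event marking when the user stopped reading.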
Reconstructed sessions have also been used to measure total user input, for example the number of labour hours taken to construct Wikipedia.[13] Sessions are also used for operational analytics, including developing data anonymisation methodologies, identifying anomalies in networking,[14] and generating synthetic workloads for testing servers with artificial traffic.[15] Some writers have argued that sessions are not appropriate as a workload characterisation metric for e-commerce platforms, because different classes of user interact with that type of site in substantially different ways; they suggest a state transition network instead.[16]
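One way to picture a state transition network is as a graph whose nodes are states (such as page types) and whose edges carry the observed probability of moving from one state to the next. The sketch below estimates those probabilities from example data. It is an illustrative reading only: the state names, the function `transition_probabilities`, and the estimation method are assumptions and are not taken from the cited work.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate P(next state | current state) from sequences of visited states."""
    counts = defaultdict(Counter)
    for states in sessions:
        # Count each consecutive pair of states within a sequence.
        for current, nxt in zip(states, states[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical e-commerce state sequences, one per visit.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "search", "search", "product"],
    ["home", "product"],
]
network = transition_probabilities(sessions)
```

Building a separate network for each class of user would make differences in behaviour between classes visible as differences in edge weights.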
Session reconstruction
Using sessions in web analytics depends on being able to identify them, a process known as "session reconstruction". Approaches to session reconstruction can be divided into two main categories: time-oriented and navigation-oriented.[17]
Time-oriented approaches
Time-oriented approaches to session reconstruction look for a gap between a user's requests that reaches a set length, the "inactivity threshold". Once this period of inactivity is reached, the user is assumed to have left the site or stopped using the browser entirely, and the session is ended: further requests from the same user are considered a second session. A common value for the inactivity threshold is 30 minutes,[18][19] a well-established value sometimes described as the industry standard.[18] The utility of this value has been questioned: some researchers have argued that it produces artefacts around naturally long sessions,[20] and have experimented with other thresholds, including 10 and 60 minutes.[21] Jones & Klinkner, however, argue in a paper at the 2008 Conference on Information and Knowledge Management that, at least in relation to search data, "no time threshold is effective at identifying [sessions]".[22]
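The global-threshold approach described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `split_by_inactivity` and the example timestamps are assumptions, and the code treats a gap exactly equal to the threshold as ending the session.

```python
from datetime import datetime, timedelta


def split_by_inactivity(timestamps, threshold=timedelta(minutes=30)):
    """Split one user's event timestamps into sessions at gaps that reach the threshold."""
    sessions = []
    for ts in sorted(timestamps):
        # Continue the current session only if the gap since the previous
        # event is shorter than the inactivity threshold.
        if sessions and ts - sessions[-1][-1] < threshold:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


# Hypothetical events for a single user; the gaps are 10, 35, 15 and 60 minutes.
events = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 9, 10),
    datetime(2024, 1, 1, 9, 45),
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 11, 0),
]
sessions = split_by_inactivity(events)
```

With the 30-minute default these events fall into three sessions. Passing `threshold=timedelta(minutes=60)` merges the first four events into one session, which shows how sensitive the result is to the chosen value.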
One alternative that has been proposed is using user-specific thresholds rather than a single, global threshold for the entire dataset.[23][24] This has the problem of assuming that the thresholds follow a bimodal distribution, and is not suitable for datasets that cover a long period of time.[20]
Navigation-oriented approaches
Navigation-oriented approaches exploit the structure of websites - specifically, the presence of hyperlinks and the tendency of users to navigate between pages on the same website by clicking on them, rather than typing the full URL into their browser.[17] One way of identifying sessions by looking at this data is to build a map of the website: if the user's first page can be identified, the "session" of actions lasts until they land on a page which cannot be accessed from any of the previously-accessed pages. This takes into account backtracking, where a user will retrace their steps before opening a new page.[25] A simpler approach, which does not take backtracking into account, is to simply require that the HTTP referer of each request be a page that is already in the session. If it is not, a new session is created.[26] This class of heuristics "exhibits very poor performance" on websites that contain framesets.[27]
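Both heuristics described above can be sketched briefly. The code is illustrative only: the function names, the `(url, referrer)` tuple layout, and the `links` dictionary standing in for a site map are assumptions, and real logs would need URL normalisation that is omitted here.

```python
def split_by_site_map(pages, links):
    """Start a new session when a page is not linked from any page already in the session.

    pages: URLs requested by one user, in order.
    links: dict mapping each page to the set of pages it links to.
    """
    sessions = []
    for page in pages:
        if sessions and any(page in links.get(prev, set()) for prev in sessions[-1]):
            sessions[-1].append(page)
        else:
            sessions.append([page])
    return sessions


def split_by_referrer(requests):
    """Start a new session when a request's referrer is not a page already in the session.

    requests: (url, referrer) tuples for one user, in order; referrer may be None.
    """
    sessions = []
    for url, referrer in requests:
        if sessions and referrer in sessions[-1]:
            sessions[-1].append(url)
        else:
            sessions.append([url])
    return sessions


# Hypothetical site: the home page links to /a and /b, and /a links to /a1.
links = {"/": {"/a", "/b"}, "/a": {"/a1"}, "/b": set()}
by_map = split_by_site_map(["/", "/a", "/a1", "/b", "/x"], links)

by_referrer = split_by_referrer(
    [("/", None), ("/a", "/"), ("/a1", "/a"), ("/x", None), ("/y", "/x")]
)
```

In the site-map example, `/b` stays in the first session because it is linked from the earlier page `/`, even though the page requested immediately before it (`/a1`) does not link to it.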
References
- Gayo-Avello 2009, p. 1824.
- Arlitt 2000, p. 2.
- Donato 2010, p. 324.
- Gayo-Avello 2009, p. 1825.
- Lam 2007, p. 147.
- Weischdel 2006, p. 464.
- Catledge 1995, p. 5.
- Jansen 2006, p. 10.
- Jansen 2000, p. 12.
- Khoo 2008, p. 377.
- Heer 2002, p. 243.
- Huang 2003, p. 638.
- Geiger 2014, p. 1.
- Meiss 2009, p. 177.
- Arlitt 2000, p. 8.
- Menascé 1999, p. 119.
- Spiliopoulou 2003, p. 176.
- Eickhoff 2014, p. 3.
- Ortega 2010, p. 332.
- Mehrzadi 2012, p. 3.
- He 2002, p. 733.
- Jones 2008, p. 2.
- Murray 2006, p. 3.
- Mehrzadi 2012, p. 1.
- Cooley 1999, p. 19.
- Cooley 1999, p. 23.
- Berendt 2003, p. 179.
Bibliography
- Arlitt, Martin (2000). "Characterizing Web User Sessions" (PDF). SIGMETRICS Performance Evaluation Review. 28: 50–63. doi:10.1145/362883.362920.
- Berendt, Bettina (2003). "The Impact of Site Structure and User Environment on Session Reconstruction in Web Usage Analysis". WEBKDD 2002 - Mining Web Data for Discovering Usage Patterns and Profiles (PDF). WEBKDD. Springer. doi:10.1007/978-3-540-39663-5_10. ISBN 978-3-540-39663-5.
- Catledge, L. (1995). "Characterizing browsing strategies in the world-wide web" (PDF). Proceedings of the Third International World-Wide Web Conference on Technology, tools and applications. 27: 1065–1073. doi:10.1016/0169-7552(95)00043-7.
- Cooley, Robert (1999). "Data Preparation for Mining World Wide Web Browsing Patterns" (PDF). Knowledge and Information Systems. 1 (1). Springer: 5–32. doi:10.1007/BF03325089. ISSN 0219-3116.
- Donato, Debora (2010). "Do you want to take notes?: identifying research missions in Yahoo! search pad" (PDF). Proceedings of the 19th International Conference on World Wide Web. ACM.
- Eickhoff, Carsten (2014). "Lessons from the Journey: A Query Log Analysis of Within-Session Learning" (PDF). Proceedings of the Seventh International Conference on Web Search and Web Data Mining. ACM. doi:10.1145/2556195.2556217.
- Gayo-Avello, Daniel (2009). "A survey on session detection methods in query logs and a proposal for future evaluation" (PDF). Information Sciences. 179: 1822–1843. doi:10.1016/j.ins.2009.01.026. ISSN 0020-0255.
- Geiger, R.S. (2014). "Using Edit Sessions to Measure Participation in Wikipedia" (PDF). Proceedings of the 2013 ACM Conference on Computer Supported Cooperative Work. ACM. doi:10.1145/2441776.2441873.
- He, Daqing (2002). "Combining evidence for automatic Web session identification". Information Processing and Management. 38. Elsevier: 727–742. doi:10.1016/S0306-4573(01)00060-7. ISSN 0306-4573.
- Heer, Jeffrey (2002). "Separating the swarm: categorization methods for user sessions on the web". Proceedings of the SIGCHI Conference on Human factors in Computing Systems. 4 (1). ACM.
- Huang, Chien‐Kang (2003). "Relevant term suggestion in interactive web search based on contextual information in query session logs". Journal of the American Society for Information Science and Technology. 54 (7). American Society for Information Science and Technology: 638–649. doi:10.1002/asi.10256.
- Jansen, Bernard J. (2000). "Real life, real users, and real needs: a study and analysis of user queries on the web" (PDF). Information Processing and Management. 36: 207–227. doi:10.1016/S0306-4573(99)00056-4. ISSN 0306-4573.
- Jansen, Bernard J. (2006). "How are we searching the world wide web? A comparison of nine search engine transaction logs" (PDF). Information Processing and Management. 42 (1): 248–263. doi:10.1016/j.ipm.2004.10.007. ISSN 0306-4573.
- Jones, Rosie (2008). "Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs" (PDF). CIKM 08. ACM. doi:10.1145/1458082.1458176.
- Khoo, Michael (2008). "Using Web Metrics to Analyze Digital Libraries" (PDF). Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries. ACM.
- Lam, Heidi (2007). "Session viewer: Visual exploratory analysis of web session logs". IEEE Symposium on Visual Analytics Science and Technology. IEEE.
- Mehrzadi, David (2012). "On Extracting Session Data from Activity Logs". Proceedings of the 5th Annual International Systems and Storage Conference (PDF). SYSTOR '12. ACM. doi:10.1145/2367589.2367592. ISBN 978-1-4503-1448-0.
- Meiss, Mark (2009). "What's in a Session: Tracking Individual Behavior on the Web" (PDF). Proceedings of the 20th ACM conference on Hypertext and hypermedia. ACM.
- Menascé, Daniel A. (1999). "A Methodology for Workload Characterization of E-commerce Sites" (PDF). Proceedings of ACM Conference on Electronic Commerce. ACM.
- Murray, G. Craig (2006). "Identification of User Sessions with Hierarchical Agglomerative Clustering" (PDF). Proceedings of the American Society for Information Science and Technology. 43 (1). American Society for Information Science and Technology: 1–9. doi:10.1002/meet.14504301312.
- Ortega, J.L. (2010). "Differences Between Web Sessions According to the Origin of their Visits" (PDF). Journal of Informetrics. 4 (3). Elsevier: 331–337. doi:10.1016/j.joi.2010.02.001. ISSN 1751-1577.
- Spiliopoulou, Myra (2003). "A framework for the evaluation of session reconstruction heuristics in web-usage analysis" (PDF). Informs Journal on Computing. 15: 171–190. doi:10.1287/ijoc.15.2.171.14445. ISSN 1526-5528.
- Weischdel, Birgit (2006). "Website optimization with web metrics: a case study" (PDF). Proceedings of the 8th International Conference on Electronic Commerce. doi:10.1145/1151454.1151525.