WikiProject Computing (Rated C-class)
- 1 No company names
- 2 Page tagging "more accurate"
- 3 Web server integration
- 4 What else do we want to include?
- 5 Specific company's blogs
- 6 Too many blogs
- 7 Tag-based ≠ centralized, correct?
- 8 Integration with other systems
- 9 Change this definition
- 10 Visitors vs Unique Users
- 11 Advantages
- 12 Books section as Advertisement
- 13 Limitations of Web Analytics Methods and Internet Architecture
- 14 Second opinion please?
- 15 dmoz
- 16 Server-side script logging is missing
- 17 Problems
- 18 Mouse-Tracking
- 19 Off-site
- 20 Adding a link
No company names
Can I propose that we don't include any company names in this article, even in a list at the bottom? Although I work for a web analytics vendor, I think naming specific companies will only turn into a battle as a hundred companies want to be included. Stephen Turner 10:59, 18 August 2005 (UTC)
- Being in the same situation, I agree completely. Wikipedia is the free encyclopedia, not the free advertising center. I urge contributors to also limit mention of products; I'm personally inclined to mention only products that are no longer available and are also not easily linked to an active vendor. Mr. Bene 16:22, 18 August 2005 (UTC)
- Nice trick, you two. Probably most people fall for it. It's pretty clear that the real reason you don't want a list of web analytics companies is that some people will use the list as something to put in their firewalls to block being spied on. I'm well aware that web analytics companies go way out of their way to hide themselves in a page. I catch them with my outgoing firewall. Bostoner (talk) 16:12, 9 February 2012 (UTC)
- One problem is that people (like me) look towards Wikipedia to find information, including which vendors are available. Excluding a list of vendors is not in the spirit of Wikipedia in that listing vendors is providing information to the public. I vote to have vendors listed. —Preceding unsigned comment added by 184.108.40.206 (talk) 21:57, 7 February 2008 (UTC)
Page tagging "more accurate"
I'm trying to avoid the claim that page tagging is "more accurate". I anticipate some disagreement over this, but I think that most of it is a myth, fed by companies that only do page tagging (sorry, Mr Bene — I suspect you're in one of these companies — nothing personal intended).
In particular, the claim that spiders cause problems for log analysis is bogus. Spiders identify themselves, so it's easy to exclude them from the human visitors. And in fact, log analysis has an advantage over page tagging in this respect, because it can report spider activity.
Instead let's try to present the arguments for and against each method. There are valid arguments on both sides. And as I tried to say, the economic arguments are often the biggest ones, and they can come down on either side of the argument.
Stephen Turner 17:09, 18 August 2005 (UTC)
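To make the spider argument above concrete, here is a minimal Python sketch (not taken from any analytics product) of how a log analyser might set aside self-identifying spiders while still reporting their activity. It assumes Apache "combined" log format; the BOT_MARKERS list and the function names are hypothetical illustrations.

```python
import re
from collections import Counter

# Apache/NCSA "combined" log format; the user agent is the last quoted field.
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, deliberately short list of substrings that self-identifying
# crawlers put in their user-agent strings; real tools maintain far longer lists.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_spider(agent: str) -> bool:
    """Return True if the user-agent string identifies itself as a crawler."""
    lowered = agent.lower()
    return any(marker in lowered for marker in BOT_MARKERS)


def split_traffic(lines):
    """Count human requests and tally spider requests per user agent.

    Returns (human_count, Counter of spider user agent -> request count),
    so spiders are excluded from the human figures but can still be reported.
    """
    humans = 0
    spiders = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        agent = match.group("agent")
        if is_spider(agent):
            spiders[agent] += 1
        else:
            humans += 1
    return humans, spiders
```

This only catches spiders that announce themselves in the user agent, which is exactly the assumption the argument above rests on.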
- It's a bit odd keeping the company details out of the discussion. However, your suspicions are incorrect — my company continues to provide log analysis tools as well as page embedding tools. My personal opinion is that straight log analysis provides the most accurate information possible about the activity the web server has seen, while page tagging solutions provide significantly more accurate information about human usage of the web pages - the way people are using the web site. That's why the text was "more accurate in presenting human activity" and not simply "more accurate".
- One solution type that I haven't touched on is the hybrid solutions, that include page tagging and log analysis information together. Want to touch on this?
- Mr. Bene 17:15, 18 August 2005 (UTC)
- Maybe I spoke too strongly yesterday. I just feel that there's an often-repeated view that "of course we all know page tagging is better" and in fact I'm not convinced it's quite so clear cut. I think logfile analysis is superior in many ways: I think the accuracy argument is somewhat overstated, and the advantages of software purchase vs outsourcing, and no vendor lock-in, are big advantages for many companies.
- I realise other people strongly believe page tagging is better for perfectly valid reasons, and I'm not trying to criticise people who hold that opinion. But for writing a Wikipedia article, it's good practice just to state the advantages and disadvantages of each method.
- As for hybrid solutions: I think we should acknowledge their existence and the reasons for them, although I don't feel qualified to discuss them in detail.
- Stephen Turner 09:27, 19 August 2005 (UTC)
- I added the hybrid solutions. In order to discuss them after the discussion of the advantages of each approach, I ended up combining the history and the advantages/disadvantages sections into one new section on technologies. Stephen Turner 11:04, 19 August 2005 (UTC)
Web server integration
I removed the reference to IIS Assistant, because I think it's not historically accurate. The first log analysis programs date from about 1993, but IIS Assistant dates from 1996.
The method of web server integration is an interesting one, and we should mention it somewhere. Maybe there could be a section of "other methods". Aren't there solutions which sniff the traffic going down the wires, for example?
Stephen Turner 09:54, 19 August 2005 (UTC)
- Done. See Web Analytics Demystified page 18 for a discussion of network sniffers. Stephen Turner 11:30, 19 August 2005 (UTC)
What else do we want to include?
There should be a large section on what is and isn't within the law: privacy laws, protection of personal data, that kind of thing.
I feel the section on the technologies is pretty much getting there. The section on definitions is just starting. But what else should be included in this article? To put it another way, what would a person looking for an article on web analytics want to know? Stephen Turner 14:20, 22 August 2005 (UTC)
Stephen -- I am a novice to the field of web analytics and I, of course, turned to Wikipedia for some insight. In addition to focusing on the methodologies (page tagging, etc.), can the authors provide some generic information on the tools used to augment such analysis -- WebSide Story, Omniture or Coremetrics are some that come to my mind. I understand that the objective is to keep this entry vendor-agnostic, but a lot of firms that do not specialize in a specific approach usually turn to an ASP-based solution. Your thoughts? -- M. Nazir
- 1st party vs 3rd party cookies - maybe additional information is needed about page tagging that circumvents this by including the Visitor ID somewhere else, like CGI parameters? Mr. Bene 22:00, 25 August 2005 (UTC)
- I agree with all this. But I'm beginning to get a feeling that we're concentrating purely on the technology of web analytics, and not enough on the practice of web analytics. What I'm less sure about is how to fix it. An encyclopedia article shouldn't become a textbook, but maybe a section on KPIs would help? Stephen Turner 08:28, 26 August 2005 (UTC)
Greetings, gentlemen. A very nice entry you've created here! I applaud your decision to keep the entry vendor-agnostic. As far as what else to include, may I suggest a section on failings of web analytics? For example, while WA attempts to measure the behavior of individuals, Avinash often points out that 1) multiple people may use a single machine, such as a public terminal, or the family PC, 2) one person may use multiple machines, such as home and work, and 3) a single person on a single machine may have 3a) multiple browser tabs or windows open, all viewing different pages on the same site, and 3b) multiple types of browsers active on the same machine, such as IE and Firefox, for reasons of compatibility with different sites, and thus have multiple cookies. -- WDave Rhee (WAF Co-Moderator)
Specific company's blogs
I've just reverted the addition of a blog to the external links, because it concentrated on a specific product.
There are lots of new blogs at the moment from people affiliated with one company or another. If we add one or two it's unfair, but if we add all of them it would get completely out of hand. So I suggest we avoid them altogether.
Stephen Turner 09:32, 19 November 2005 (UTC)
Too many blogs
I've now removed all the blogs, and the mailing list, from the References section. There were just too many of them already, and I could anticipate many more being added.
WP:EL is quite clear:
- "Certain kinds of pages should not be linked from Wikipedia articles... 7. Links to blogs, social networking sites (such as MySpace), or discussion forums unless mandated by the article itself."
And WP:NOT says:
- "Wikipedia is not a directory"
which that section was in danger of becoming.
So even though I enjoy reading some of those resources, I think Wikipedia policies are quite clear that it's not suitable to maintain a list of them here.
- And I've removed them all again (with the encouragement of one of the blog authors too). This list is growing out of hand again, and still only covers a small proportion of the web analytics blogs. Stephen Turner (Talk) 16:58, 6 August 2007 (UTC)
- I note that WP:EL allows blogs of a recognised authority. Is there scope to have a highly select subset of those who are "recognised"? Certainly there are recognised authorities on WA who also blog on WA. The intent would be that the page is not overwhelmed with such a list, but that it would act as a feeder into many of the other WA blogs themselves. I for one find much valuable information about WA in many of those removed blogs. --Steve McInerney 11:36, 9 August 2007 (UTC)
- Ah yes! Of course. I withdraw the suggestion. --Steve McInerney 07:23, 10 August 2007 (UTC)
My opinion is that you are severely limiting the usefulness of this article by not including lists or links at all. You are sacrificing the needs of the users researching this topic for some purist attitude to how Wikipedia should be. The Internet is about hyperlinking and connecting information sources, not about creating a self-contained information hoard. Cheers! Arvek (talk) 22:01, 29 June 2008 (UTC)
Tag-based ≠ centralized, correct?
Reading through this article, there seems to be some equating of tag-based solutions with centralization. Many of the downsides to the tag-based approach seem to imply it MUST be centralized (i.e., "Page tagging solutions involve vendor lock-in."). And yet the article itself states, "some vendors offer installable page tagging solutions with no additional page view costs." This, to me, as a shopper of analytic solutions, introduces confusion. It would seem, upon reading this article, that a non-centralized, tag-based system could be quite useful, and nothing in the article indicates it is impossible, and yet we don't really cover it. This is compounded by the odd "no company names" thing going on here (something not shared, by the way, in the Web log analysis software article). I'm not qualified enough to do it, but it would seem to me that clarification is in order, or at least an attempt to not exclusively equate the two. Even if such a product (tag-based, non-centralized) does not exist (and quick research indicates it does: http://ask.slashdot.org/comments.pl?sid=206862&cid=16874210), it ought to be addressed, no? —The preceding unsigned comment was added by Lizstless (talk • contribs) 00:42, 8 March 2007 (UTC).
- You're right. Page tagging solutions are most commonly run by the vendor, but there are page tagging solutions that are installed in-house (the company I work for, ClickTracks, does a lot of business this way, for example), and maybe we should cover them a bit more. However, note that they still involve vendor lock-in, because the data is in a proprietary form which cannot be read by any other program. You still can't read historical data in another program if you're considering switching vendors like you can with logfiles.
- Is there any other specific sentence you think is misleading?
- Stephen Turner (Talk) 10:02, 8 March 2007 (UTC)
- Hi, thank you so much. Even then, though, there's no technical reason why locally hosted tag solutions would HAVE to store data in a proprietary format, is there? It just seems so strange to me, but I guess it makes sense in that logfiles are yours, they're all in open formats, and no one can ever take them from you. But yeah, upon re-re-reading, I think I would just want to modify or delete the "page tagging solutions involve vendor lock-in" line, since it seems either inaccurate or overly broad. Thanks! Lizstless 04:09, 9 March 2007 (UTC)
- Late to this. However, in the page tagging solutions that I've implemented, the call made to the web server serving the clear GIF has differed, at least in terms of parameter names if not in terms of structure. In addition, some have required modification to the web server itself so that specialized logs are generated. Finally, processing the logs and generating data for reporting puts the data into a proprietary database format. In my book, you've got vendor lock-in at the instrumentation level, at the collection level, and at the database level. Mr. Bene 17:36, 9 August 2007 (UTC)
Integration with other systems
I would suggest expanding the article by talking about how Web analytics fits into Email, SEO, etc. I can write this section if you guys agree with it. —Preceding unsigned comment added by Xavier casanova (talk • contribs)
- Hi Xavier, Good to see you round here! I think this is a good idea; go for it. Stephen Turner (Talk) 20:51, 8 March 2007 (UTC)
- PS You can sign your contributions on talk pages by using four tildes, like this: ~~~~. Stephen Turner (Talk) 20:52, 8 March 2007 (UTC)
Change this definition
Respectfully, I think we need to change this definition. "Web Analytics" is the work you do to interpret web traffic data. It's generally performed by a person with the help of a computer. What you're describing when you talk about logfile analysis or page tagging is web traffic reporting, in spite of the names that folks assign to products like 'Google Analytics'.
Can we move the content that discusses traffic data collection methodologies to 'web traffic reporting' and focus this page more on the practice of web analytics? Ian Lurie
- I agree that web analytics is really about the analysis not the data collection, and that there is too much focus on the technical side at the moment. That's just because that's the easiest bit to write! I still think the technical stuff belongs here (at least until the page gets too long) but the real analytics needs expanding. Stephen Turner (Talk) 09:39, 8 May 2007 (UTC)
Visitors vs Unique Users
The difference between the "page impression" metric from a simple log analysis tool and the "page view" metric from a page tagging solution is that with the page tagging solution you've got a higher likelihood of actually dealing with the number of times the page was rendered in a web browser, vs the number of times it was requested from the web server. The same distinction may be possible between the "Visitor" and "Unique User" metrics, potentially independently of the analytics tool, but dependent on the method of identifying uniqueness. A "Visitor" is generally seen as a uniquely identified web client that may make multiple visits before deleting cookies and becoming a "New Visitor" again. A "Unique User" is more often associated with a re-creatable identity, a "user", who provides a set of credentials, and who will receive the same identity even after they delete their cookies. Thoughts? Mr. Bene 22:54, 13 June 2007 (UTC)
- There's definitely something worth saying about the technical issues, but I'm not sure whether it's possible to make such a clear statement about the terminology. In my experience, different analytics programs use different terms for the same statistic, and even the same term for different statistics. Stephen Turner (Talk) 00:14, 14 June 2007 (UTC)
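A minimal Python sketch of the Visitor / Unique User distinction described above, assuming (hypothetically) that each tagged page view records a first-party cookie ID and, if the person signed in, a login ID. The Hit record and the two counting functions are illustrative names, not any vendor's API; as noted in the reply, products use these terms inconsistently.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Hit:
    """One tagged page view (hypothetical record layout for illustration)."""
    cookie_id: str          # visitor cookie; a fresh one is issued after the cookie is deleted
    user_id: Optional[str]  # login identity, present only when the person signed in


def count_visitors(hits: Iterable[Hit]) -> int:
    """'Visitors': distinct cookie IDs, so deleting cookies creates a new visitor."""
    return len({hit.cookie_id for hit in hits})


def count_unique_users(hits: Iterable[Hit]) -> int:
    """'Unique Users': distinct login identities, which survive cookie deletion."""
    return len({hit.user_id for hit in hits if hit.user_id is not None})


# One person who signs in, deletes cookies, signs in again, then browses
# anonymously after another deletion: three visitors, but one unique user.
hits = [Hit("c1", "alice"), Hit("c2", "alice"), Hit("c3", None)]
print(count_visitors(hits), count_unique_users(hits))
```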
Advantages
Open for discussion, here are the inaccuracies I'd like to correct:
"With logfile analysis, information not normally collected by the web server can only be recorded by modifying the URL." Or by joining on a back-end database. Or possibly other methods. Inaccurate by being overly specific.
Mr. Bene 19:00, 29 August 2007 (UTC)
- Apologies for just reverting yesterday without opening a discussion here — I shouldn't have done that. You have some good points and I'll try and get round to responding to them later today. Stephen Turner (Talk) 10:10, 30 August 2007 (UTC)
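For readers unfamiliar with the URL method quoted above, here is a minimal Python sketch: extra data (here a visitor ID, echoing the CGI-parameter idea raised earlier on this page) is appended to the query string, so it typically ends up in the ordinary access log, where a log analyser can read it back. The parameter name visitor_id and both helper functions are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_tracking_params(url: str, **extra: str) -> str:
    """Append extra key/value pairs to a URL's query string.

    When the browser requests the resulting URL, a server using the usual
    common/combined log formats records the request line, query string included.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(extra.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


def read_tracking_params(logged_path: str) -> dict:
    """Recover the parameters from a request path as recorded in the log."""
    return dict(parse_qsl(urlsplit(logged_path).query, keep_blank_values=True))


link = add_tracking_params("/products/view?id=7", visitor_id="abc123")
print(link)                        # /products/view?id=7&visitor_id=abc123
print(read_tracking_params(link))  # {'id': '7', 'visitor_id': 'abc123'}
```

Whether this is the only way is exactly what is disputed above; a back-end database join is the alternative Mr. Bene mentions.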
Books section as Advertisement
I have tagged the Books section as Advertisement since it links to specific vendors pages. I am personally uncomfortable with having to make the choice between allowing some vendors and not others - I would rather leave this up to the ISBN auto-link. However, the ISBN auto-link doesn't find all the books. Mr. Bene 20:43, 2 November 2007 (UTC)
Limitations of Web Analytics Methods and Internet Architecture
I would like to see more discussion of the limitations of the available web analytics methods in the context of Internet architecture and technology. I am not knowledgeable about the subject but could offer my opinion. For example:
“The original ARPANET grew into the Internet”. In its infancy, the Internet (originally termed the ARPANET) was developed by DARPA with the intention of providing a robust, fault tolerant, communication system for geographically dispersed Department of Defense researchers and, by policy, commercial use of the network was explicitly forbidden.
The Internet was not designed to capture identifying information about its users. Internet traffic consists of anonymous packets of information identified only by sequence number plus source and destination IP addresses. Also, the architecture of the Internet does not utilize or require “global control at the operations level” and “no information (is) retained by the gateways (or routers) about the individual flows of packets passing through them”. 
Principally for these reasons, the methods presently available to collect behavior information about website visitors are fundamentally flawed and unreliable. For example, web surfers concerned about privacy have multiple options available to them for circumventing any of the presently available data collection techniques.
Internet Society, “A Brief History of the Internet”, Barry M. Leiner et al., December 11, 2007, http://www.isoc.org/internet/history/brief.shtml —Preceding unsigned comment added by 220.127.116.11 (talk) 22:16, 11 December 2007 (UTC)
Second opinion please?
Could someone please review the recent additions by User:18.104.22.168 which I've reverted twice already? I don't want to get into an edit war. Thanks. Stephen Turner (Talk) 19:27, 29 December 2007 (UTC)
- Could someone else please comment on this. Thank you. Stephen Turner (Talk) 17:01, 30 January 2008 (UTC)
dmoz
What about giving a link to the appropriate dmoz listing?
- To quote:
WP:NOTDIR says: "Wikipedia is not a directory"
Server-side script logging is missing
It's rather odd that only log file analysis and remote-server calling is mentioned, when things like FireStats are very common. If someone's up to the job... :) 22.214.171.124 (talk) 22:43, 16 April 2008 (UTC)
Problems
"Web analytics is the study of online behaviour in order to improve it."
This is quite a meaningless statement. Remember this is an encyclopedia not a marketing textbook, and not everyone can or wants to read this article with the implicit assumptions of marketers in mind.
"drivers and conversions"
This is (presumably marketing) terminology that leaves a non-specialist reader confused and risks reinforcing a subjective view of the topic. —Preceding unsigned comment added by 126.96.36.199 (talk) 12:55, 28 June 2008 (UTC)
Mouse-Tracking
- Good question. I think it belongs. I see the boundary as being whether the stats are made up of thousands of visitors (web analytics) or just ten (usability testing). So a heat map of everyone's mouse movements would be web analytics. Stephen Turner (Talk) 18:57, 28 November 2008 (UTC)
- Well, you can interpret this as a long-run live test with a large number of people under most realistic circumstances, or not? Maybe the boundary is on how you use the results: improve or invest into the content and service in the most visited areas or improve the UI based on the results. Arthur Schiwon (talk) 14:35, 4 December 2008 (UTC) (yay, i got an account ;) )
- When people know they are being watched, or that they are testing something, they behave differently than they normally would. That is the problem with laboratory conditions. Though you cannot do this at an early stage, it is possible to use the results to improve usability after the launch. However, for usability testing I think you would more likely use replays of the recordings rather than a heat map. Arthur Schiwon (talk) 13:48, 5 December 2008 (UTC)
- To go back to the question, I agree with Stephen that it belongs on this list as long as it is analytical information (i.e. thousands of visitors) as opposed to usability testing (i.e. tens of visitors). As far as I know, however, the only companies that do this are ClickTale and Userfly, so it's very specific to certain companies. I think it's difficult to add this without looking like advertising or self-promotion; does anyone want to give it a try? -Shmuls (talk) 12:29, 2 November 2009 (UTC)
Adding a link
I wish to add a link to http://www.kaushik.net/ at the end of the article.
I am a reader of this blog, and find it (to my knowledge) the best in the field. However, it is a blog, do you think it should be added or not?