Talk:Tag soup

WikiProject Computing  
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
 ???  This article has not yet received a rating on the project's quality scale.
 ???  This article has not yet received a rating on the project's importance scale.
WikiProject Internet  
This article is within the scope of WikiProject Internet, a collaborative effort to improve the coverage of the Internet on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
 ???  This article has not yet received a rating on the project's quality scale.
 ???  This article has not yet received a rating on the project's importance scale.


This whole article is smug nonsense that is all a matter of opinion; it really needs a rework from a new perspective or a good ol' deletion... (talk) 04:51, 14 August 2010 (UTC)

Was Macromedia Dreamweaver MX 2003 really the first WYSIWYG editor to produce well-formed code? I find this difficult to believe... -- The preceding unsigned comment was added by Thalter (talk • contribs).

I just tested Dreamweaver 8 and it still happily produces invalid (though well-formed) code. I removed the mention of Dreamweaver 8 because it comes off looking like a product endorsement. I would imagine there are many products that produce code just as invalid as Dreamweaver's. And as Thalter mentions above, well-formed code was probably being produced by other products long before 2003. --Cplot 18:05, 15 October 2006 (UTC)

Origin of the term

Perhaps one should add something about the origin of the term. It was apparently first used by Dan Connolly of the W3C when talking about parsers that accept anything thrown at them, according to Dave Winer here: [1]

--Bourn 07:51, 15 June 2006 (UTC)

Widespread usage

Visual HTML editors may have contributed a lot to invalid markup, but the other side of the coin is HTML parsers that have been too forgiving of poor markup. By accepting poor markup and managing to make sense of it, browsers have lulled web authors into a lack of care about proper markup. Had browsers stuck to the standards and rejected invalid markup, there might have been little tag soup around.

--Bourn 07:56, 15 June 2006 (UTC)

Citations and Neologism

Someone recently pointed out that the article has very little in the way of citations. They placed a single cite-needed template on the article, but I removed it, thinking it would be better to discuss a systemic problem here (rather than just randomly placing the cite-needed on one out of many sentences). Clearly this article pushes the boundaries of Wikipedia's neologism policy. I think if the article were written more clearly and included all the various ways this term is used, the article might not qualify for deletion.

In particular, I can think of at least 3 or 4 ways this term gets used that should be more clearly delineated in the article. One is using semantic elements for presentational reasons (like the blockquote example in the article). Another is ill-formed markup (e.g., improperly nested tags). A third is invalid content models (like placing an unordered list directly inside another unordered list). A fourth use of the term refers to the use of proprietary elements instead of standards-based elements. A fifth and increasingly popular use of the term is in comparing HTML (where all of these errors must be dealt with and corrected upon rendering) with XML (where the specification requires failing on ill-formed XML, saving the parsing process a tremendous burden). In general it serves as a term to denigrate one or another undesirable practice in the WWW community (depending on whom you ask). This is another reason I think an article in this state must be as clear and as comprehensive as it can be: so as not to add an NPOV violation to the rest of the list of violations.--Cplot 23:15, 14 October 2006 (UTC)
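
To make the HTML versus XML contrast in the fifth sense concrete, here is a minimal sketch in Python. It assumes the standard library's html.parser as a stand-in for a forgiving HTML tokenizer (it is not a browser's error-recovery algorithm) and xml.etree.ElementTree as the strict XML parser. The names SOUP, TagLogger, parse_leniently and is_well_formed_xml are made up for this illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Improperly nested tags: </b> arrives while <i> is still open.
SOUP = "<p><b>bold <i>both</b> italic</i></p>"


class TagLogger(HTMLParser):
    """Record every start and end tag the lenient tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def parse_leniently(markup):
    """Tokenize markup without ever raising on bad nesting."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


def is_well_formed_xml(markup):
    """Return False as soon as the XML parser hits a well-formedness error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(parse_leniently(SOUP))
    print(is_well_formed_xml(SOUP))
```

The lenient tokenizer reports all six tags in source order and leaves any repair to its caller, while the XML parser stops at the mismatched end tag. That is the burden shift the fifth definition refers to.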

After laying out these various definitions of tag soup, I went back and looked at the article again. I notice that in several places these various definitions are confused. For example, at one point it discusses how GUI editors often produce ill-formed or invalid markup and then goes on to discuss quirks mode and other presentational issues. This may also suggest another definition of tag soup, but quirks mode is not related to most of what's discussed in this article.
In particular, the article seems to move back and forth between the malformed/invalid definition and the semantic v. presentational definition. I think the lead section needs to clearly delineate these various definitions, and then the remainder of the article can elaborate on the reasons each type exists, proliferates and continues, and what each implies for web authors and users. --Cplot 04:25, 16 October 2006 (UTC)
Hm. I think that strictly speaking the term tag soup, alluding to the random mixing up of elements and tags, means malformed or invalid HTML. The use of non-semantic HTML is something else (although clearly strongly associated with tag soup). "Tag soup" is one bit of jargon used in the whole subject of standards-based HTML authoring, and a better subject for a Wiktionary definition than a Wikipedia article. It would be some work, but I'd rather see this converted into an article on the subject of standards-based web authoring (and merged with Web design (Tableless)), which could cover validation, well-formedness, semantic coding, and touch on web accessibility. If it grows big enough, then sections like semantic HTML could be spun off into separate articles. Michael Z. 2006-10-16 05:56 Z


OK, I did a complete reorganization of the article. I tried to retain much of the content and even the existing prose. I left off two sections that were redundant with what was already there and with each other. There may be pieces of those, or ideas reflected in those sections, that could be reincorporated, but I'm going to stop here for now. Here are the two sections that I did not reincorporate yet. I tried to take the 4 or 5 definitions of tag soup and talk about the causes, uses and implications of each in turn. So these pieces might still provide material for those separate sections, or perhaps they could be reincorporated into their own sections. However, I believe the ideas are adequately elaborated in the article as it stands.

As you can see, the first section starts out by talking about invalid and malformed markup, and then the next section is talking about quirks mode (???). The quirks mode definition may even be a sixth definition of tag soup, but I couldn't really come up with a rational way to incorporate it. It may be that the editor who added that was just confusing concepts. Quirks mode refers to a browser using non-standard box models in applying CSS. It has little to do with the quality of the markup (except for including or leaving off the DocType declaration). If anyone can make the quirks mode definition work, I'm fine with adding that. --Cplot 06:48, 16 October 2006 (UTC)

Widespread usage of tag soup

Today, the majority of Web pages consist of invalid or malformed HTML and thus may be considered tag soup.

One possible cause of the proliferation of tag soup may be that many graphical HTML editors produce invalid code and also frequently conflate presentational and structural markup.

Another factor in the popularity of tag soup is that most mainstream Web browsers currently in use tolerate code that is invalid or not well-formed without raising any errors. Thus, testing Web pages using current mainstream browsers will not enforce valid or well-formed pages.


Early browsers were very forgiving of malformed HTML and went to great lengths to render a Web page in the manner they thought the author 'intended' it to look.

Because of this, most current mainstream Web browsers can render Web pages in more than one mode, including a "Quirks mode". The Web browser switches into Quirks Mode when it encounters a Web page that appears to be using tag soup. Quirks Mode allows the browser to render the Web page in the same way as older browsers may have rendered it. The problem of tag soup is carried forth as each new browser that is released needs to be able to render the existing Web.

While most mainstream Web browsers can render tag soup in more or less the way the author 'intended' it, many other user agents cannot. For example, Web browsers for people with disabilities may have problems rendering the page. Other examples of user agents which may have problems with malformed code or code which is not used for its intended purpose include tools such as search engine spiders and Web browsers in hand-held devices.

What are the above paragraphs doing here? Were they supposed to be posted to the article? mmj (talk) 05:24, 8 January 2009 (UTC)

'Tag soup' referring to 'misuse of semantic elements'

I don't believe that this should come under the name 'tag soup' or that it should be in this article. The assumption here seems to be that 'tag soup' is a colloquial term for any markup that is 'frowned upon' as being inelegant or against the spirit of the specifications, which is untrue. Instances of code that are syntactically and structurally valid when interpreted as strictly as possible (for example, by a validating SGML parser) are not tag soup and do not contribute in any way to the requirement on current browsers, and the formation of new HTML versions, to handle invalid syntax and structure. ...

  • A <b> tag with no corresponding </b> tag is tag soup, because it is invalid syntax. It cannot be processed as HTML unless the parser has special handling for invalid syntax.
  • A <p> element inside an <em> element is tag soup, because it is invalid structure. It cannot be processed as HTML unless the parser has special handling for invalid structure.
  • An <img> or <table> element used for spacing or positioning is wrong, but it is '''not tag soup'''. It can be processed as HTML without any need for the parser to handle invalid syntax or structure.
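
The distinction drawn in the three bullets above can be sketched roughly as follows. This is only an illustration under loud assumptions: the sets INLINE, BLOCK and END_TAG_REQUIRED are tiny hand-picked subsets rather than the rules of the HTML DTD, and SoupChecker and check are hypothetical names. It uses Python's standard html.parser to tokenize the markup.

```python
from html.parser import HTMLParser

# Toy subsets for illustration only; a real validator would take these
# rules from the HTML DTD instead of hard-coding them.
INLINE = {"a", "b", "em", "i", "span", "strong"}
BLOCK = {"div", "ol", "p", "table", "ul"}
END_TAG_REQUIRED = INLINE | {"div", "table"}
TRACKED = INLINE | BLOCK


class SoupChecker(HTMLParser):
    """Flag missing end tags (syntax) and block-in-inline nesting (structure)."""

    def __init__(self):
        super().__init__()
        self.open_elements = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in TRACKED:
            return
        enclosing = [t for t in self.open_elements if t in INLINE]
        if tag in BLOCK and enclosing:
            self.problems.append(f"<{tag}> inside <{enclosing[-1]}>")
        self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_elements:
            return
        # Pop back to the matching element, reporting anything left open.
        while self.open_elements:
            popped = self.open_elements.pop()
            if popped == tag:
                break
            self._report_unclosed(popped)

    def finish(self):
        while self.open_elements:
            self._report_unclosed(self.open_elements.pop())

    def _report_unclosed(self, tag):
        if tag in END_TAG_REQUIRED:
            self.problems.append(f"<{tag}> has no matching </{tag}>")


def check(markup):
    checker = SoupChecker()
    checker.feed(markup)
    checker.close()
    checker.finish()
    return checker.problems


if __name__ == "__main__":
    print(check("<p>some <b>bold text</p>"))
    print(check("<em><p>text</p></em>"))
    print(check('<table><tr><td><img src="spacer.gif" width="10"></td></tr></table>'))
```

Under these toy rules the first two inputs are flagged and the spacer-image table is not: the table is poor practice but parses without any special handling, which is the point being argued.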

Therefore, I propose removing or reorganising references within this article to the semantics of HTML elements. While they represent bad coding today, they are not tag soup because tag soup has a specific meaning. mmj (talk) 05:24, 8 January 2009 (UTC)

SGML comments, and XHTML markup

I don't think either of these is an example of 'tag soup'. Why would anyone insist on using that obscure SGML comment syntax in HTML, and what would happen if they did?

<!-- -- -->In SGML this text should be interpreted as part of the comment<!-- -- -->

will be interpreted as two isolated comments, both containing '--', and the text between will be seen as character data in the document. If this appears in a place where bare text is not allowed, then it is a syntax error; if it is within a legal element, then the text will be displayed quite happily. In neither case is this a legitimate example of tag soup.
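
One way to see the HTML-style reading described above is to run the sample through a non-SGML tokenizer. The sketch below assumes Python's standard html.parser as that tokenizer (browsers may differ in detail); CommentSplitter and split_comments are names invented for this example.

```python
from html.parser import HTMLParser

SAMPLE = (
    "<!-- -- -->In SGML this text should be interpreted "
    "as part of the comment<!-- -- -->"
)


class CommentSplitter(HTMLParser):
    """Collect comment bodies and character data separately."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.text = []

    def handle_comment(self, data):
        self.comments.append(data)

    def handle_data(self, data):
        self.text.append(data)


def split_comments(markup):
    parser = CommentSplitter()
    parser.feed(markup)
    parser.close()
    return parser.comments, "".join(parser.text)


if __name__ == "__main__":
    comments, text = split_comments(SAMPLE)
    print(comments)
    print(text)
```

With this tokenizer the sample yields two separate comments, each with the body ' -- ', and the sentence between them comes out as ordinary character data.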

Claiming that valid XHTML per the W3C XHTML 1.0 specification is tag soup when served with an internet media type that is valid under Appendix C (as revised 1 August 2002) (our only reference in this article!) is nonsense. It may be that IE6 can't cope and goes into quirks mode, but long gone are the days when IE6's bugs defined the World Wide Web. I shall remove these two items until someone comes up with good, referenced, current reasons why they should be prominently displayed in the article. --Nigelj (talk) 16:45, 18 September 2010 (UTC)