Talk:Website Parse Template
- Website Parse Template helps web crawlers generate RDF triples. The format is unrelated to a page's HTML code. It is a separate file located in the same directory as the sitemap. Kiranoush (talk) 08:18, 22 May 2008 (UTC)
Needs examples and definitions
This article, to be readable, needs corresponding example fragments of HTML input and certainly some output from WPT, as produced by the example templates.
It should also link to the DTD or Schema that defines ICDL.
I'm also confused as to the distinction between ICDL (the screen-scraper parser definition) and ICDL (the ontological description language). I'm not even seeing any namespaces here, which worries me. I hope these have a clear formal definition somewhere.
Totally subjective POV comment, with no place near a Wikipedia article page
I can't say I'm impressed by this protocol! It seems to be taking the wrong approach to a SemWeb solution. Rather than using one of the fairly well-described techniques for embedding accessible metadata into a resource, such as RDFa, it's gaffer-taping on an external hack. This is what GRDDL already does, except that GRDDL uses a rather simpler approach built out of existing tools (albeit the old "With XSLT we can transform anything to anything" canard). It's generally accepted that embedding metadata into a resource is preferable to building extractors (extractors are complex to build and brittle in service), and this WPT approach seems to combine complexity, limited function (XPath is far from sufficient), a requirement to be the site's operator, and a whole new complicated language.
I don't understand the use of ontology here either. The input to WPT is entirely non-ontological, dumb-scraping through XPath (as it has to be). The output is neither a common-denominator format such as Dublin Core, nor (unless it uses OWL) is it described ontologically in a communicable format. Proprietary ontology descriptions are unworkable, almost by definition.
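To make the brittleness point concrete, here is a minimal sketch of XPath-driven extraction in general. It is not WPT or ICDL syntax, which I haven't seen defined. The rule table, the `ex:` predicate names, the `extract_triples` helper, and both sample pages are hypothetical, and it uses only the limited XPath subset in Python's standard `xml.etree.ElementTree`, so the input must be well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical extraction rules: predicate name -> XPath expression.
# These are illustrative only and do not follow any WPT/ICDL schema.
RULES = {
    "ex:title": ".//h1[@class='title']",
    "ex:author": ".//span[@class='author']",
}

def extract_triples(subject, markup, rules=RULES):
    """Return (subject, predicate, literal) tuples scraped from well-formed markup."""
    root = ET.fromstring(markup)
    triples = []
    for predicate, path in rules.items():
        for node in root.findall(path):
            text = "".join(node.itertext()).strip()
            if text:
                triples.append((subject, predicate, text))
    return triples

# Two versions of the same hypothetical page: V2 is a cosmetic redesign
# that swaps the <h1 class='title'> heading for <h2 class='headline'>.
PAGE_V1 = (
    "<html><body><h1 class='title'>Widgets</h1>"
    "<span class='author'>A. Smith</span></body></html>"
)
PAGE_V2 = (
    "<html><body><h2 class='headline'>Widgets</h2>"
    "<span class='author'>A. Smith</span></body></html>"
)
```

With these inputs, `extract_triples("http://example.org/page", PAGE_V1)` yields both a title and an author triple, while the same call on `PAGE_V2` silently loses the title because the heading markup changed. Nothing in the rules knows what a title is; they only know where one used to be.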