Talk:Text Encoding Initiative

From Wikipedia, the free encyclopedia
WikiProject Sociology
This article is within the scope of WikiProject Sociology, a collaborative effort to improve the coverage of Sociology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the project's quality scale.
This article has not yet received a rating on the project's importance scale.

WikiProject Linguistics
This article is within the scope of WikiProject Linguistics, a collaborative effort to improve the coverage of Linguistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the project's quality scale.
This article has not yet received a rating on the project's importance scale.

WikiProject Digital Preservation
This article is within the scope of WikiProject Digital Preservation.
This article has not yet received a rating on the quality scale.


TEI project list

I was wondering: shouldn't the page keep a list of

  • projects that have articles on the wiki
  • projects that are truly TEI-based?

Firstly, the TEI has two or three (if you count their wiki) places where TEI-based projects can be listed, whether or not they are 'encyclopedic' or 'notable' from the point of view of Wikipedia. There is no need to use this article as an additional link farm.

Secondly, I'm thinking e.g. of the recent addition of the "SWORD" project, which says "The software is also capable of utilizing certain resources encoded in using the Text Encoding Initiative (TEI) format" - is that enough to qualify it as a TEI project? It seems to be an OSIS project.

Thanks, XPtr (talk) 21:12, 27 November 2009 (UTC)

The situation should probably be re-evaluated in the light of the new TEI category too. Maybe restrict the list to content-oriented projects which have TEI as the authoritative version of the text? Stuartyeates (talk) 05:56, 29 November 2009 (UTC)

The SWORD Project produces and publishes texts encoded using TEI, including texts converted from other sources/formats (XML or otherwise) as well as newly authored/encoded texts. Some of those documents are only publicly released in a privately defined format (a compressed & indexed database), but a number of them are publicly released as XML documents, à la Perseus. That's certainly more than I can say for some of the listed projects, which use TEI in only an organization-internal manner and do not offer any actual TEI documents. The same also holds for The SWORD Project with respect to OSIS. So, to whatever degree it is an OSIS project, it is likewise a TEI project. Oskilla (talk) 10:25, 3 December 2009 (UTC)

Jargon

(REFS:) There are now enough references, so I deleted the call for references. (JARGON:) However, the article is too compact and uses jargon-laden language that somewhat resembles marketroid language, e.g. "text-centric" and "community of practice". Now, words such as these are really needed and sadly missing from the standard languages of Europe (and probably most others as well), but I feel that the text in general should be fleshed out to match the normal, slow human speed of "conceptualization" (pardon my jargon!) relative to the number of words produced. Also, a certain increased verbosity (pardon!) allows for somewhat better precision on how the items described in the text interconnect.

The description of the context of TEI has to be fleshed out with its purpose and examples: the "initiative" is an organization created to deal with the wild-grown flora of computer formats for storing various texts from the "humanities, social science and linguistics", primary sources from great authors, dictionaries, etc. Rursus dixit. (mbork3!) 08:25, 2 April 2011 (UTC)

Request for an improvement of the TEI article

Hi, TEI has a very active community. Could somebody improve the article with respect to the maintained standard, TEI-XML? I'm familiar with several data formats and XML, but I still found no place that provides a good entry point for understanding TEI. Normally, Wikipedia is the place where people (or is it just me?) look first. Please break it down for people not familiar with the TEI technical standard and explain how the format works. Also: could somebody provide an example of what TEI-XML looks like? A comparison: the RDFa article is really helpful. RDFa is also not so easy to understand, but the article provides good examples. SebastianHellmann (talk) 21:14, 5 February 2012 (UTC)

Suggested merge

Merge. Both articles (TEI_Lite and ODD_(Text_Encoding_Initiative)) are not very long. I think it would improve the Text Encoding Initiative article (which has some issues, see my comment above) to have the content right here. I am no insider, but it seems that the Text Encoding Initiative's main purpose is to generate, standardize and maintain these formats. SebastianHellmann (talk) 21:14, 5 February 2012 (UTC)

  • Oppose. To me they look like their subjects are at very different levels of detail — TEI being a big community of people working on a broad array of text encoding issues, and ODD being a very specific technical solution to a low-level problem for this community (how to formalize the metadata that describes an encoding). I think it would cause big WP:UNDUE problems to try to merge them. I don't see a big need for the ODD article to grow; I agree that expansion of the TEI article would be reasonable but I think the place to start is in the sections on projects and customizations, which are currently very terse (just a list each), rather than trying to bring in four paragraphs of text on one very specific file format. —David Eppstein (talk) 21:52, 5 February 2012 (UTC)
Fair point. Then TEI_Lite and ODD should be extended (with an example) and the TEI article should have some sort of summary explaining the relation between the three articles (like size or popularity). Currently the article seems to stress the importance of ODD as the main aspect of the guidelines, so WP:UNDUE is countered by the article itself ... Having one article per customised format, on the other hand, seems to result in Wikipedia becoming a WP:DICTIONARY. We should then discuss merging ODD and TEI Lite as formats built on the TEI guidelines. I will write an email to the TEI List. SebastianHellmann (talk) 22:28, 5 February 2012 (UTC)

Just wondering: wouldn't it be so much nicer to first get more than a faint idea about something, and only then mess up (oh, pardon, 'improve') the article about that thing? The TEI is many things, TEI Lite is a popular encoding format, and ODD is basically a meta-schema language with a zing. And it takes just one hopefully well-meaning but sadly ignorant zealot to mess stuff up; what a pity he didn't insert those templates asking for confirmation of every second statement. Oops, I shouldn't have mentioned that, should I? XPtr (talk) 23:11, 5 February 2012 (UTC)

Should I be sorry for trying to improve the article? Just use the rollback function; I would not be offended if you did so. By the way, if the article were better, I would already have more than a faint idea. SebastianHellmann (talk) 09:23, 6 February 2012 (UTC)
  • I suggest that what we need is a completely new structure based on encyclopedia lines rather than happenstance. I suspect that it'll need to be done by someone with perspective rather than a long-time TEI person. Stuartyeates (talk) 02:27, 6 February 2012 (UTC)
  • Oppose. Different enough for 2 pages. However, Text Encoding Initiative is somewhat chaotic and I would suggest a 90% rewrite - but I am not going to volunteer for it. ODD could do with a rename to suggest it is mostly about the format, not the effort to support the format. Both articles need attention and various fixes, not a merge. History2007 (talk) 10:13, 7 February 2012 (UTC)

Restructure

I've had a crack at a restructure, based on SebastianHellmann's suggestions. There are still some things to do, but the structure now seems to make sense (or at least it does to me). Stuartyeates (talk) 06:22, 7 February 2012 (UTC)

TEI presuppositions

TEI presupposes views in semantics which are in no way obvious and have been widely debated.

At local UNB, a TEI centre, critical essays by Harvard philosopher Hilary Putnam (1991-92 Gifford Lectures) are consigned to storage as unread (no, an e-text version is not available at UNB).

TEI is effectively the Shibboleth of the Digital Humanities as a "discipline" (as silly a notion as "Digital History" or "Digital Botany").

A history section might revisit the "General Semantics" movement or other social science fads to put this current enthusiasm in perspective.

Programming for the humanities has a history in the SNOBOL and ICON programming languages, for which we have articles. There should be a link to PEG parsing, as most XSLT amateurs are unaware of recent developments. Then there is the modernization of PROLOG through LOGTALK and constraints: this is a "digital" area that has also been ignored by the XSL enthusiasts (XSLT is a Turing-complete programming language, but the movement has an enthusiasm for procedural Java which is baffling except as a fad).

The neglect of critics is worrisome in any heavily-funded movement in a university context, as recently noted in Canadian universities clamouring for their share of Northern and Arctic "development" funds.

The revision of botanical and zoological taxonomy through genome studies should give pause, but it has not. Some in the humanities truly do believe that there is nothing new under the sun (the collected texts of the new Bloom will suffice).

I remember having my feet x-rayed for shoe size in a department store: more recently I stood on a harmless infra-red pad to assess sole inserts: those are the technical vagaries in social institutions as compared to, say, history as a discipline.

The "markup" of Russian texts from 1917 through 1989 under an ideological regime should already have given us all pause in this hyperbole surrounding an otherwise useful practice of tagging text for research purposes. And there was a Vatican Index of importance at one time ...

Perhaps most worrisome is the filtering of texts based on tags by pedagogues in parochialized school systems: previously, text which was inked black was evidently so. XSLT and generated PDFs are another matter altogether.

Those who are not skeptical might revisit the tagging of diachronic variants from "Selected Poems" to "Collected Poems" with regard to significance, interest and context (semantics).

Should a "folio" be discovered in which "mortal coil" is "mortal boil", which will be the hapax legomenon, and which the print error?

XSL Transforms, a tempting "scapegoat", but from XML there may be no escape through these walls, though it get your goat. G. Robert Shiplett 13:23, 19 March 2012 (UTC)

CES Eagles broken links

Missing as See Also: CES

So many CES page links are broken that this ref may be helpful:

http://xml.coverpages.org/xces.html

Oxford and Tufts: the facts

The factual question is whether non-professors are using these services.

I am under-whelmed by Tufts Perseus: am I alone?

Do students use TEI-encoded text? Are Cliff Notes and their ilk using TEI?

Do more engineering students read "notes" or the Shakespeare? Where is TEI in that social phenomenon of the non-text reader in the age of social networking? Does TEI mean more or fewer students actually read the Darwin texts? The Freud texts?

Of course the objective of TEI is to aid corpus-based research ... but does that include student term papers? And the sociological evidence is?

G. Robert Shiplett 14:40, 19 March 2012 (UTC)