Template talk:SWL


Discussion copied from Wikipedia:Village_pump_(proposals)#silently_encoding_semantic_links

Many others in the past have noted the potential power of a Semantic Wikipedia. By providing semantic context to wikilinks, we have the beginnings of making Wikipedia queryable by computers (without resorting to natural language processing). The Semantic MediaWiki (SMW) extension seems to be the leading technical solution, and the possibility of incorporating it here at WP is continually discussed, but with no real action plan.

My interim proposal is to silently encode semantic wikilinks (SWL), which would allow users to optionally encode a semantic context in wikilinks. The links would show up as usual as wikilinks/hyperlinks. The semantic context would be stored but not queryable. But perhaps the existence of potential SWLs will motivate us to move on the technical solution. And once the specific technical solution is decided, it should be simple to write a bot to revise these silent SWLs to their proper syntax.

As a proof of concept, I've mocked up an SWL template at User:AndrewGNF/SWL and modified two test pages, User:AndrewGNF/UGCG and User:AndrewGNF/ITK (gene). (I'm a biologist, and the SWLs I'm playing with here represent common biological relationships.)

Comments/thoughts? Cheers, AndrewGNF (talk) 00:00, 21 July 2009 (UTC)

I kind of like the idea of having hidden semantic information, so that someone with a copy of the Wikipedia database could hypothetically do semantic queries on it. But it would basically have to stay hidden. Running a Semantic MediaWiki myself, I can tell you that the most likely reason Wikipedia does not plan to use SMW is because SMW is really, really slow. On a site as popular as Wikipedia, it would bring all the servers to their knees. rspεεr (talk) 02:46, 21 July 2009 (UTC)
Yes, I've heard this too. But I've also heard that in principle, the WMF is in favor of getting some semantic solution in place. The advantage of this system of silent wikilinks is that it's not tied to any specific technical implementation. AndrewGNF (talk) 04:40, 21 July 2009 (UTC)
Seems to put too much junk into the middle of the source. Data like that is easiest to manage when it's in one place. The infoboxes are probably the best bet for this sort of 'hidden' information. Why would they not be sufficient?   M   03:28, 21 July 2009 (UTC)
I agree that the added syntax makes the code a bit more difficult to read. That would be the primary trade-off in my mind. Regarding the infoboxes, it's not clear to me how that would be a solution. Clearly lots of links in infoboxes have an implied semantic meaning (so this may be an argument for why we could just mine the infobox semantic links later). But if you believe that there are links that could be semantically annotated in the free text, then I don't think the infoboxes are a solution. In my mind, infoboxes are primarily a method of organizing things visually, whereas semantic links are good for organizing data semantically. I think these are sometimes, but not always, overlapping goals... AndrewGNF (talk) 04:40, 21 July 2009 (UTC)

Not hearing any strong opposition, I went ahead and created this template in the main namespace at {{SWL}} and made one change to use it. More feedback and discussion is welcome. Otherwise, I'm going to start using it in my edits and we'll see if it catches on... Cheers, AndrewGNF (talk) 17:09, 22 July 2009 (UTC)

OK... So, you changed "[[RTN1]]" to "{{SWL|target=RTN1|type=PPI}}". And that is supposed to mean... what? In other words - what does "PPI" mean? Documentation doesn't seem to cover that...
And I am not sure that there is lack of "any strong opposition" - I would count "Seems to put too much junk into the middle of the source" as such... In short - such change makes the code harder to understand for anyone (man or machine) trying to read it now and there is no real guarantee that it will be helpful to anyone in the future... I doubt that can be a good idea... --Martynas Patasius (talk) 18:22, 22 July 2009 (UTC)
Thanks for your input. You're right, I chose a bad first example. I mentioned I was a biologist, and to many biologists, "PPI" clearly means "Protein-protein interactions". So bad choice by me. But here is another example, which essentially can be interpreted as "UGCG produces glycosphingolipids". But one can easily imagine simpler examples too. (The standard one on the Semantic Mediawiki site is "Berlin Has_capital Germany".) You're right, the "type" field is free text and open to bad choices. Note that the template automatically creates categories (e.g., Category:SWL/PPI and Category:SWL/produces) that can be used to exactly define the relationship. Does this help address your concern at all? Cheers, AndrewGNF (talk) 19:03, 22 July 2009 (UTC)
No, we need to make articles less terrible to edit before we can start adding all sorts of new confusing crap into them IMO. If we want to start putting semantic data into a database, we should start with the things we already have templates and metadata for - Geographic coordinates and biographical data. The framework is already there, but the problem with those is that there's no way to get the data efficiently. Mr.Z-man 18:38, 22 July 2009 (UTC)
Interesting, thanks, I wasn't aware of these templates before. So these are essentially silent templates for storing classes of structured information, rather than a silent template for storing a single piece of semantic data (as {{SWL}} does). This is interesting -- I'll need to think about this more. Off the top of my head, I think SWL will be more likely to be noticed since it's inline (and hence more likely to encourage semantic contributions from editors), but it's also a bit more disruptive being inline. Hmmm... Cheers, AndrewGNF (talk) 19:17, 22 July 2009 (UTC)
Incidentally, in that sample edit, it's trivially easy for a natural language processor to pick out that UGCG interacts with RTN1, and that the type of interaction is PPI even without that template. One thing that might help is developing a style guide for describing interactions in natural language, so that they are consistent and easy to parse.   M   18:51, 22 July 2009 (UTC)
While I agree that this is one of the easier cases for NLP to pick up, I tend to think in general that nothing in NLP is "simple". But regardless, I think we can easily envision cases that would be much more difficult for NLP. However, I don't like the idea of creating a style guide to make things NLP-friendly, because it constrains what editors can do with the text. I think the text should be written to maximize readability, and semantic content should be added in a way that maximizes accuracy and precision. My two cents on that idea... Cheers, AndrewGNF (talk) 19:21, 22 July 2009 (UTC)

RfC: Should we use this template to enable contributions of semantic wikilinks?

The possibility of adding semantic context to Wikipedia has been discussed many times but with no action plan. The use of {{SWL}} would enable users to (optionally) create wikilinks with a semantic tag describing the relationship. The primary stated advantages are: 1) building a set of semantic wikilinks (SWL) will hopefully motivate WP to move on a technical solution, 2) this template will allow easy migration of SWLs once a technical solution is decided, and 3) the set of SWLs can immediately be used by external tools and programs, making WP content more useful. The primary stated disadvantages are: 1) usage of {{SWL}} will make the wikicode less readable, 2) there are possibly other existing solutions that accomplish similar goals, and 3) there is some question of the utility of these semantic links. More discussion and comments are appreciated. AndrewGNF (talk) 19:14, 24 July 2009 (UTC)
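To make advantage 3 concrete, here is a minimal sketch of how an external script might pull (page, type, target) triples out of raw wikitext. It assumes the `target=`/`type=` parameter names used in the examples on this page and that SWL calls contain no nested templates; `extract_triples` and `parse_params` are hypothetical helper names, not part of any existing tool.

```python
import re

# Assumption: SWL calls contain no nested templates, so the body has no braces.
SWL_CALL = re.compile(r"\{\{\s*SWL\s*\|([^{}]*)\}\}")


def parse_params(body):
    """Split 'a=b|c=d' into a dict of named template parameters."""
    params = {}
    for part in body.split("|"):
        if "=" in part:
            name, value = part.split("=", 1)
            params[name.strip()] = value.strip()
    return params


def extract_triples(page_title, wikitext):
    """Return (page, type, target) for every SWL call that has both fields."""
    triples = []
    for match in SWL_CALL.finditer(wikitext):
        params = parse_params(match.group(1))
        if "target" in params and "type" in params:
            triples.append((page_title, params["type"], params["target"]))
    return triples


sample = (
    "UGCG interacts with {{SWL|target=RTN1|type=PPI}} and makes "
    "{{SWL|target=Glycosphingolipids|type=produces}}."
)
print(extract_triples("UGCG", sample))
```

A script like this would still have to download the wikitext for each page first, so it illustrates the data format rather than an efficient query mechanism.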

  • Oppose - IMO, having articles that are easy to edit is more important than creating such semantic data. We need to improve that before we can start cluttering up wikitext even more. Argument number 2 in favor is rather misleading. Yes, conversion would be easier, but the overall work would be greater, as we would still have to manually add the template to pages, then convert it later. The template also isn't significantly easier to extract data from. By not doing anything with the "type" parameter, it still requires scripts to download and parse the wikitext, so it's only a marginal improvement in efficiency/reliability with a potentially huge decrease in the usability of wikitext. Such a design also makes it impossible to include the template in other templates like infoboxes. Mr.Z-man 21:29, 24 July 2009 (UTC)
Comment: Apologies, I wasn't meaning to be misleading. My rationale for stating that migration would be "easy" is twofold. First, the change might be able to be made directly within the template itself. For example, if Semantic Mediawiki is the solution chosen, then this template could be modified from [[{{{target}}}]] to [[{{{type}}}:{{{target}}}]], updating all existing links in one step. Second, a bot could easily scan through and reformat the template to whatever syntax was necessary. Anyway, my overarching point is that this template allows the community to start collecting/contributing semantic links now, rather than waiting until the technical solution is in place and starting from zero then. (This reasoning, of course, presumes that one believes that WP should enable semantic links in the future.) Cheers, AndrewGNF (talk) 22:01, 24 July 2009 (UTC)
I still think you're missing my point. Migration would likely be easy, but the overall work would be harder. People would have to learn the template syntax now, then another syntax once we get a real software solution (I certainly hope we wouldn't keep an ugly template system around once it's part of the software). Also, this assumes that any semantic data system will be either Semantic MediaWiki (which, as noted above, has some potential performance issues) or would work the exact same way, by adding the data with wikilinks. By starting such a system now, we're basically locking ourselves in to one specific method (otherwise the learning curve would be steeper and conversion would be more difficult). Mr.Z-man 14:53, 26 July 2009 (UTC)
Thanks for clarifying. Yes, users having to learn a new syntax when a specific technical solution is in place is another disadvantage. I tend to think this is outweighed by the opportunity to start assembling semantic links now, but I agree that reasonable people can disagree on this point. Cheers, AndrewGNF (talk) 21:53, 28 July 2009 (UTC)
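For reference, the bot-style migration discussed in this exchange could look roughly like the sketch below. It assumes Semantic MediaWiki were the chosen target, whose inline annotation form is `[[property::value]]` (optionally `[[property::value|label]]`), and that SWL calls contain no nested templates. The function names `to_smw` and `migrate` are hypothetical; a real bot would also need to fetch and save pages.

```python
import re

# Assumption: SWL calls contain no nested templates, so the body has no braces.
SWL_CALL = re.compile(r"\{\{\s*SWL\s*\|([^{}]*)\}\}")


def to_smw(match):
    """Rewrite one {{SWL|...}} call as a Semantic MediaWiki annotation."""
    params = {}
    for part in match.group(1).split("|"):
        if "=" in part:
            name, value = part.split("=", 1)
            params[name.strip()] = value.strip()
    target = params.get("target")
    relation = params.get("type")
    if not target or not relation:
        return match.group(0)  # leave incomplete calls untouched
    label = params.get("label")
    if label:
        return f"[[{relation}::{target}|{label}]]"
    return f"[[{relation}::{target}]]"


def migrate(wikitext):
    """Convert every complete SWL call in the text; other text is unchanged."""
    return SWL_CALL.sub(to_smw, wikitext)


print(migrate("UGCG binds {{SWL|target=RTN1|type=PPI}}."))
```

This only covers the case where the eventual system annotates links inline; a differently designed system would need a different converter, which is the lock-in concern raised above.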
  • Support - I share the vision that WP should at some point in the future enable semantic links. {{SWL}} is a simple way to get this started now, and I see no problem with creating a bot that would convert this template to anything deemed more suitable once the time for semantic links at WP has come. ---- Daniel Mietchen 00:10, 26 July 2009 (UTC)
    • This is ignoring the serious flaws in this template. Particularly that it cannot be embedded into other templates (like infoboxes) that aren't subst'd and still work properly. Mr.Z-man 14:53, 26 July 2009 (UTC)
  • Oppose. I suspect that all three disadvantages are real. On the other hand, only the second advantage is probably real. For example, the third advantage is: "the set of SWLs can immediately be used by external tools and programs, making WP content more useful". But we know that this template is not going to stay for a long time (either it will be deleted as unnecessary, or it will be replaced by, let's say, Semantic MediaWiki links). Will someone write a serious (not just "proof of concept") program to use this template? I suspect that it is more likely that the development will be postponed until the template is replaced by something more permanent. Thus the data is not going to be used in any serious way any time soon. And that destroys the first advantage ("building a set of semantic wikilinks (SWL) will hopefully motivate WP to move on a technical solution"): why should one learn to enter the data that is not going to be used? Even worse, the way to enter that data is going to be changed in the near future... Thus there won't be much data entered. And that makes even the second advantage right but useless... So, if there are several disadvantages, but no relevant advantages, then the idea should not be implemented. --Martynas Patasius (talk) 17:55, 27 July 2009 (UTC)
I think your comments are all valid. I think I weight the issues differently for two reasons. First, the fact that a semantic wikipedia has been extensively discussed in the past without any action plan suggests to me that no real solution is on the horizon. And second, having some data at least enables the possibility of a tool being developed, whereas having no semantic data (the status quo) guarantees that there will be no tool. And as you may guess from above, I have a tool in mind that relates to the Gene Wiki. Appreciate the comments... Cheers, AndrewGNF (talk) 21:53, 28 July 2009 (UTC)
Actually, I am not sure that it has been discussed that extensively ([1] - adding quotation marks leaves just 8 pages with rather short discussions in most cases)... And I guess there might be a bigger problem than having no data: having no applications that would justify all the work... For example, one possible application (forming the list of people who died at a certain age - Wikipedia:Village_pump_(proposals)/Archive_27#New section for Number articles: Age at time of death) doesn't seem to be worth the effort... Thus it would be interesting to find out more about your application - although I am already afraid of the possibility that some day vandals will be able to damage not just an encyclopedia that is known to be "useful but unreliable" anyway, but also some important research project and (indirectly) our health... Wouldn't it be better to have a different wiki for that? Something somewhat similar to Wikispecies perhaps? --Martynas Patasius (talk) 00:27, 29 July 2009 (UTC)
You're right, perhaps it is more accurate to say that it has been brought up on many occasions, but nothing tangible has happened. I think part of that inertia is the fact that we don't have real world applications because we don't have real world data. What's the use case I'm thinking of? Well, right now, the Gene Wiki part of Wikipedia draws its data from many "gene annotation" databases (e.g., Uniprot, Entrez Gene, Ensembl), but that's a one-way street. If we can get some of our amazing Wikipedia content here into more structured content, then I want to set up a pipeline where the WP article can be one more source that an official curator can review. So contributions from the community have a chance of actually making it back into an official database that scientists use to make discoveries. (Don't worry, the "curation" part ensures that a professional reviews it before it is accepted.) There are other applications that I can easily imagine relating to gene set enrichment analysis (alluded to here). But the bottom line is that by starting to create the data warehouse, you enable the entire field of bioinformatics to dream up new data mining tools and analyses. I hope this context is useful... Cheers, AndrewGNF (talk) 05:30, 29 July 2009 (UTC)
  • Support: this template can help to automate the building of Linked Data. It will greatly help tools such as Freebase or DBpedia to extract structured information from Wikipedia and to make this information available on the Web. In the case of bioinformatics, it could be of outstanding interest, as the relations between genes, diseases, etc. would be semantically defined and would help scientists to link their research and knowledge. But it is not only about bioinformatics: this template could be used, for example, to semantically define the relationships between individuals (parent/children), etc. I would even suggest using this template in a more efficient way: rather than using a simple word for defining the type, I would use a URI from a defined ontology to better describe the relation between the components. For example, instead of using
 {{SWL|target=Glycosphingolipids|type=produces}} 

I would use

 {{SWL|target=Glycosphingolipids|type=http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Biosynthesis}} 

I tested this idea here, with NSP3.--Plindenbaum (talk) 08:01, 30 July 2009 (UTC)

I agree that we should link semantic types to ontologies. But I personally think we should do that behind the scenes (and resolve ambiguities and synonyms in the background as well), since asking the average contributor to figure out the right term in an ontology is a bit much IMHO. I think we ask domain experts to contribute their knowledge however they want, and then let ontologists and information scientists format things properly. This follows the existing WP model where we have groups of users with dedicated functions -- copy editors, governance people, image/figure generation, etc.
You probably also noticed that the category link at the bottom of NSP3 is broken. The template auto-creates categories to track usage of the template, and that apparently doesn't like the URI. Anyway, if we decide to go with the URIs, I'm pretty sure we could figure out a way to fix this... Cheers, AndrewGNF (talk) 16:01, 30 July 2009 (UTC)
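As a rough illustration of resolving types to ontology terms "behind the scenes", the sketch below keeps the plain-word type in the wikitext and maps it to a URI in a separate lookup. The only URI used is the one from the example above; the synonym entry, the table layout, and the name `resolve_type` are hypothetical and shown only to make the idea concrete.

```python
# Lookup maintained by ontologists, separate from article wikitext.
# The single URI below is the one quoted in this thread.
TYPE_TO_URI = {
    "produces": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Biosynthesis",
}

# Hypothetical synonym table: maps alternative wordings to a canonical type.
SYNONYMS = {
    "synthesizes": "produces",
}


def resolve_type(type_value):
    """Return the ontology URI for a free-text type, or None if unmapped."""
    key = type_value.strip().lower().replace(" ", "_")
    key = SYNONYMS.get(key, key)
    return TYPE_TO_URI.get(key)


for raw in ("produces", "Synthesizes", "PPI"):
    print(raw, "->", resolve_type(raw))
```

Because contributors keep writing short words, category names such as Category:SWL/produces would stay valid, and unmapped types (returning None here) would form a to-do list for whoever curates the mapping.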
  • Oppose for the reasons I stated above.   M   03:04, 4 August 2009 (UTC)
  • Support: I support this proposal for reasons cited above. Since it's been more than 2 years now since this discussion took place and the WMF has not moved one bit to implement a semantic solution, I think the arguments for bootstrapping one with the SWL template are even more valid now than they were then. As for cluttering up the markup, I don't think anything could be worse than what editors have to deal with now for inline references, yet that seems to have worked out alright. With the promotion of the gadgets (userscripts) framework, I can imagine many more opportunities for using this data even directly in the context of Wikipedia, as well as scripts that will simplify SWL-authoring in the same way the Cite scripts simplify reference insertion (and hide its horrendous influence on markup). Benjamin Good (talk) 18:03, 26 September 2011 (UTC)

Relation to Microformats

Have you considered the use of microformats to do this? They seem to be at least partially accepted as a standard for doing this kind of work. Though the use of CSS classes for semantic encoding seems strange to me, it also seems that about the same thing could be achieved with this pattern as with the current semantic wiki link syntax, and there is already a user community and technology for processing the data. Thoughts? Benjamin Good (talk) 19:57, 13 May 2010 (UTC)

Type parameter conversion

One thing we've noticed while working on this template is that, currently, the template renders {{SWL | type=substrate_for | target=Protein kinase A | label=PKA}} roughly as

<span class="swl">
    <span class="substrate_for">
        <a href="/wiki/Protein_kinase_A" title="Protein kinase A">PKA</a>
    </span>
</span>

As you can see, the type parameter gets converted into a span class. Microformats ordinarily carry this kind of relationship information in rel attributes, but due to limitations on user HTML in MediaWiki, the rel attribute is unavailable. This may lead to issues, since the template allows users to specify the type arbitrarily, and if they use spaces instead of underscores, the type becomes a collection of separate classes. For example, without the underscore, a DOM parser would understand the second span to have the classes "substrate" and "for"; if the type were something silly like "used as NavHeader", it may inadvertently inherit CSS and JS behaviors from the preexisting NavHeader class used in MediaWiki.
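The class-splitting effect can be checked with Python's standard `html.parser` module. This is only a sketch: the HTML strings are hand-written approximations of the rendered output above, and `ClassCollector` and `classes_in` are hypothetical names.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect the list of class tokens for every start tag that has a class."""

    def __init__(self):
        super().__init__()
        self.class_lists = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # The class attribute is a whitespace-separated token list.
                self.class_lists.append(value.split())


def classes_in(html):
    collector = ClassCollector()
    collector.feed(html)
    return collector.class_lists


with_underscore = '<span class="swl"><span class="substrate_for">PKA</span></span>'
with_space = '<span class="swl"><span class="substrate for">PKA</span></span>'

print(classes_in(with_underscore))  # [['swl'], ['substrate_for']]
print(classes_in(with_space))       # [['swl'], ['substrate', 'for']]
```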

I personally don't think this will be too much of an issue, as MediaWiki CSS tends to be well-scoped (div.NavHeader instead of just .NavHeader), but regardless, we should have an automated system for replacing spaces in the {{SWL}} type parameter with underscores. I think a very simple bot would be appropriate, that would simply monitor transclusions of the template and make edits as needed. Thoughts? --Pleiotrope (talk) 17:55, 12 October 2011 (UTC)
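The text-rewriting core of such a bot might look like the sketch below. It covers only the replacement step, not monitoring transclusions or saving edits, and it assumes SWL calls contain no nested templates. `normalize_types` is a hypothetical name.

```python
import re

# Captures everything up to and including "type=" inside an SWL call (group 1)
# and the raw type value up to the next "|" or closing braces (group 2).
# Assumption: no nested templates inside the SWL call.
TYPE_PARAM = re.compile(r"(\{\{\s*SWL\b[^{}]*?\|\s*type\s*=)([^|{}]*)")


def normalize_types(wikitext):
    """Replace internal whitespace in each SWL type value with underscores."""

    def fix(match):
        prefix, value = match.group(1), match.group(2)
        core = value.strip()
        if not core:
            return match.group(0)  # empty type: nothing to normalize
        start = value.index(core)
        # Preserve the padding around the value so the diff stays minimal.
        return (
            prefix
            + value[:start]
            + re.sub(r"\s+", "_", core)
            + value[start + len(core):]
        )

    return TYPE_PARAM.sub(fix, wikitext)


before = "{{SWL | type=substrate for | target=Protein kinase A | label=PKA}}"
print(normalize_types(before))
```

Only the type value is touched, so targets and labels containing spaces (such as "Protein kinase A") are left alone.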