User:Bazzargh/citemark

This script is a bookmarklet that extracts useful metadata from most web pages. Unlike more sophisticated tools such as Zotero, it doesn't try to make the data fit the citation template; it blindly dumps out everything it finds in citation format, leaving you to edit the results. I find it easier to remove data that was extracted by mistake than to type in data that other tools failed to extract.

I was surprised not to find something like this already in existence; if there's a better one already out there, let me know!

Firefox, Safari, Chrome, and Opera

Usage: Create a bookmark with the value below. Go to the web page you want to cite and click the bookmark. An alert will appear, from which you can copy and paste the text of the citation (press Cmd-A then Cmd-C on Mac, or Ctrl-A then Ctrl-C on Windows; on Mac, Firefox won't highlight the text you're copying). If you select text before using the bookmark, it will appear as the quote parameter in the citation.

The bookmark URL, with comments and whitespace stripped, as a URL-encoded string for convenience:

javascript:function%20h(t){var%20z=document.getElementsByTagName(t)[0];return%20z&&z.childNodes[0]?e(t,z.childNodes[0].nodeValue):'';}function%20e(a,b){var%20d=Date.parse(b);return%20a&&b?('|'+a+'='+(d?f(d):b)+'\n'):''}function%20z(x){return%20x<10?'0'+x:x}function%20f(x){var%20d=new%20Date(x);return%20d.getFullYear()+'-'+z(d.getMonth()+1)+'-'+z(d.getDate())}var%20s=e('url',window.location)+e('accessdate',Date())+e('title',document.title)+e('date',document.lastModified)+e('quote',window.getSelection()+'')+h('h1')+h('h2');var%20m=document.getElementsByTagName('meta');for(var%20i=0;i<m.length;i++){s+=e(m[i].getAttribute('name'),m[i].getAttribute('content'))}alert('{{cite%20web'+s+'}}')

Source

Here's the source code, with additional whitespace and comments. I'd encourage you to use this version and edit it down into a bookmark yourself, rather than use the condensed version above; don't just trust JavaScript you find on the net!

javascript:
function h(t){ // grab the text of the first element with a tag name
 var z=document.getElementsByTagName(t)[0];
 return z&&z.childNodes[0]?e(t,z.childNodes[0].nodeValue):'';
}
function e(a,b){ // format one template parameter; date-like values become YYYY-MM-DD
 var d=Date.parse(b);
 return a&&b?('|'+a+'='+(d?f(d):b)+'\n'):''
}
function z(x){return x<10?'0'+x:x} // zero-pad numbers in dates
function f(x){ // format a timestamp as YYYY-MM-DD
 var d=new Date(x);
 return d.getFullYear()+'-'+z(d.getMonth()+1)+'-'+z(d.getDate())
}
var s=e('url',window.location) // grab the url... etc
 +e('accessdate',Date())
 +e('title',document.title)
 +e('date',document.lastModified)
 +e('quote',window.getSelection()+'') // grab the selection
 +h('h1')+h('h2'); // h1, h2 may potentially contain the 'real' title
var m=document.getElementsByTagName('meta'); // grab all metadata
for(var i=0;i<m.length;i++){s+=e(m[i].getAttribute('name'),m[i].getAttribute('content'))} 
alert('{{cite web'+s+'}}') //dump everything for copy and paste

Notes: this version hasn't been tested in Safari yet. I don't know whether window.getSelection works there, but everything else should be OK. There are better ways to grab the h1 and h2 tags, but unfortunately they interfere with the NoScript extension.
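
Editing the commented source down into a bookmark can be done by hand or with a few lines of script. The following is a minimal sketch, not the tool that produced the condensed version above. toBookmarklet is a hypothetical helper; it assumes that '//' never appears inside a string literal and that every statement spanning a line break ends in a semicolon or a brace, which holds for the citemark source but not for JavaScript in general.

```javascript
// Hypothetical helper (not part of citemark): condense commented source
// into a one-line bookmark URL.
function toBookmarklet(source) {
  return source
    .split('\n')
    // Naive comment stripping: assumes '//' never occurs inside a string.
    .map(function (line) { return line.replace(/\/\/.*$/, '').trim(); })
    // Drop lines that are now empty.
    .filter(function (line) { return line.length > 0; })
    // Lines are joined with nothing between them, so the source must not
    // rely on automatic semicolon insertion at line ends.
    .join('')
    // Encode the spaces that remain (after 'var', 'return', 'cite', ...).
    .replace(/ /g, '%20');
}

var sample = "javascript:\n" +
  "function z(x){return x<10?'0'+x:x} // zero-pad numbers in dates\n" +
  "alert('{{cite web'+z(5)+'}}') //dump everything\n";
console.log(toBookmarklet(sample));
```

Only spaces are encoded here, matching the condensed strings on this page; other characters are left as they are.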

Internet Explorer

Internet Explorer won't support bookmarklets longer than 512 characters, so there's a limit to what can be achieved. The URL below will capture the selection and create a citation with the url, title, date, accessdate, and quote parameters set. When you change a favourite to this value in IE, it will complain that "The protocol 'javascript' does not have a registered program". Click Yes to use the bookmarklet anyway; javascript: URLs do work in IE. Tested in IE7 on Windows only, as it's all I have handy.

Usage: Create a favourite with the value below. Go to the web page you want to cite, and click the favourite. An alert will appear, from which you can copy and paste the text of the citation (Ctrl-A Ctrl-C). If you select text before using the favourite, that will appear as the quote parameter in the citation.

The favourite URL, with comments and whitespace stripped, as a URL-encoded string for convenience:

javascript:function%20e(a,b){var%20d=Date.parse(b);return%20a&&b?('|'+a+'='+(d?f(d):b)+'\n'):''}function%20z(x){return%20x<10?'0'+x:x}function%20f(x){var%20d=new%20Date(x);return%20d.getFullYear()+'-'+z(d.getMonth()+1)+'-'+z(d.getDate())}var%20s=e('url',window.location)+e('accessdate',Date())+e('title',document.title)+e('date',document.lastModified)+e('quote',document.selection.createRange().text);alert('{{cite%20web'+s+'}}')

Source

Mostly this is cut down from the Firefox version, but the selection is captured differently: through document.selection.createRange().text rather than window.getSelection().

javascript:
function e(a,b){
 var d=Date.parse(b);
 return a&&b?('|'+a+'='+(d?f(d):b)+'\n'):''
}
function z(x){return x<10?'0'+x:x}
function f(x){
 var d=new Date(x);
 return d.getFullYear()+'-'+z(d.getMonth()+1)+'-'+z(d.getDate())
}
var s=e('url',window.location)+e('accessdate',Date())
 +e('title',document.title)+e('date',document.lastModified)
 +e('quote',document.selection.createRange().text);
alert('{{cite web'+s+'}}')
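
The two versions differ mainly in how they read the selection, so a single helper could in principle cover both. This is only a sketch: selectedText is a hypothetical name, the window and document objects are passed in as parameters so the function can be tried outside a browser, and the extra code would count against the IE length limit mentioned above.

```javascript
// Hypothetical helper (not part of citemark): return the selected text
// using whichever selection API the browser provides.
function selectedText(win, doc) {
  // Path used by the Firefox, Safari, Chrome, and Opera version.
  if (win && win.getSelection) {
    return win.getSelection() + '';
  }
  // Path used by the Internet Explorer version.
  if (doc && doc.selection && doc.selection.createRange) {
    return doc.selection.createRange().text;
  }
  // Neither API is available.
  return '';
}
```

In a bookmarklet it would be called as selectedText(window, document).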

Examples

An example of the output (in IE you will see much less):

{{cite web|url=http://news.bbc.co.uk/1/hi/health/4014597.stm
|accessdate=2008-02-07
|title=BBC NEWS | Health | Smoking ban proposed for England
|date=2008-02-07
|h2=bbc.co.uk Navigation
|keywords=BBC, News, BBC News, news online, world, uk, international, foreign, british, online, service
|OriginalPublicationDate=2004-11-16
|UKFS_URL=/1/hi/health/4014597.stm
|IFS_URL=/2/hi/health/4014597.stm
|Headline=Smoking ban proposed for England
|Section=Health
|Description=Campaigners say government plans to ban smoking in restaurants, clubs and most pubs do not go far enough.
}}

Obviously you need to edit this down. The guessed 'date' is wrong, but the correct one is in there under a different name (OriginalPublicationDate); the same goes for the title (Headline). Also, most of the fields use names that are not valid for {{cite web}} and need to be removed. After editing, we have:

{{cite web|url=http://news.bbc.co.uk/1/hi/health/4014597.stm
|accessdate=2008-02-07
|publisher=bbc.co.uk
|date=2004-11-16
|title=Smoking ban proposed for England
|quote=Campaigners say government plans to ban smoking in restaurants, clubs and most pubs do not go far enough.
}}

Which formats the reference like so:

"Smoking ban proposed for England". bbc.co.uk. 2004-11-16. Retrieved 2008-02-07. Campaigners say government plans to ban smoking in restaurants, clubs and most pubs do not go far enough. 
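
Part of the editing step above, dropping fields whose names {{cite web}} does not accept, could be scripted. The sketch below is an assumption-laden illustration rather than part of citemark: filterCitation is a hypothetical helper, the allowed list contains only the parameter names used on this page (not the full set the template supports), and it assumes each value sits on a single line. Choosing the right date and title still has to be done by hand.

```javascript
// Hypothetical helper (not part of citemark): keep only the lines of a
// citemark dump whose parameter name appears in the allowed list.
function filterCitation(text, allowed) {
  // Strip the template wrapper that the bookmarklet adds.
  var body = text.replace(/^\{\{cite web/, '').replace(/\}\}\s*$/, '');
  var kept = body.split('\n').filter(function (line) {
    // Each field is '|name=value'; the value may itself contain '|' or '='.
    var m = /^\|([^=|]+)=/.exec(line);
    return m !== null && allowed.indexOf(m[1]) !== -1;
  });
  return '{{cite web' + kept.join('\n') + '\n}}';
}

// Illustrative list only: the parameter names that appear on this page.
var allowed = ['url', 'accessdate', 'title', 'date', 'quote', 'publisher'];
var dump = '{{cite web|url=http://news.bbc.co.uk/1/hi/health/4014597.stm\n' +
  '|accessdate=2008-02-07\n' +
  '|title=BBC NEWS | Health | Smoking ban proposed for England\n' +
  '|h2=bbc.co.uk Navigation\n' +
  '|Section=Health\n' +
  '}}';
console.log(filterCitation(dump, allowed));
```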

Tips

If you select text on the page before using the bookmarklet, it will be added to the citation as a quote. This is also handy for quickly copying additional information along with the citation. As an example, this article:

"Linux Today - Dave Whitinger and Miguel de Icaza at the ZD Open Source Forum". 1999-07-08. Retrieved 2008-02-08. 

does not, at the time of writing, serve up metadata giving the document's creation date. To create the citation, I highlighted the text "Jul 8, 1999" on the page before using citemark. It was recognized as a date and automatically reformatted to the date format used by {{cite web}}, appearing as quote=1999-07-08.
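
The reformatting comes from the e helper in the source, which runs every value through Date.parse. The snippet below copies e, z, and f from the source listing (with spacing added) to show that behaviour in isolation. Date.parse handles strings such as "Jul 8, 1999" in an implementation-dependent way, so treat this as a sketch of what common browsers do rather than a guarantee.

```javascript
// e, z and f are copied from the citemark source, with spacing added.
function e(a, b) {
  var d = Date.parse(b);
  return a && b ? ('|' + a + '=' + (d ? f(d) : b) + '\n') : '';
}
function z(x) { return x < 10 ? '0' + x : x; }
function f(x) {
  var d = new Date(x);
  return d.getFullYear() + '-' + z(d.getMonth() + 1) + '-' + z(d.getDate());
}

// A selection that parses as a date is rewritten as YYYY-MM-DD.
var dated = e('quote', 'Jul 8, 1999');
// A selection that does not parse as a date is passed through unchanged.
var plain = e('quote', 'Campaigners say government plans do not go far enough.');
console.log(dated + plain);
```

Because the same check is applied to every field, any value that happens to parse as a date is rewritten in the same way.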

More usually, you can use this to copy and paste a larger chunk of text from the page at the same time as grabbing the url, title, and access date.