User:Dr pda/prosesize

This script adds a Page size link to the toolbox, the box (in the left-hand column by default) that also contains What links here, among other things. Clicking this link displays some statistics about the page and prose size (see below) and highlights the 'readable prose'. Clicking the link again turns both off. Sizes are displayed in kilobytes (kB), or in bytes if the value is less than 10 kB.

For the alternative version which always displays sizes in bytes, not kilobytes, see User:Dr pda/prosesizebytes.js.
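
The display rule above (bytes below 10 kB, kilobytes otherwise) can be sketched as follows. This is a minimal illustration, not the script's own code: the function name `formatSize`, the 1024-byte kilobyte, and the whole-number rounding are all assumptions.

```javascript
// Assumptions (not confirmed by the script's documentation):
// 1 kB = 1024 bytes, and kB values are rounded to whole numbers.
const BYTES_PER_KB = 1024;
const KB_THRESHOLD = 10 * BYTES_PER_KB;

// Hypothetical helper: format a byte count the way the page describes.
function formatSize(bytes) {
  if (bytes < KB_THRESHOLD) {
    // Small values stay in bytes so that detail is not lost to rounding.
    return bytes + ' B';
  }
  return Math.round(bytes / BYTES_PER_KB) + ' kB';
}

console.log(formatSize(4096));  // "4096 B"
console.log(formatSize(91136)); // "89 kB"
```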

How to get it working

Installing the script

Add {{subst:js|User:Dr_pda/prosesize.js}} to your skin script file (e.g. User:YourUserName/vector.js when using the Vector skin) and save it.

After saving, you have to bypass your browser's cache to see the changes. Mozilla/Safari: hold down Shift while clicking Reload (or press Ctrl-Shift-R); Internet Explorer: press Ctrl-F5; Opera/Konqueror/Mozilla: press F5.

To try without installing

An alternative way to run the script without installing it is to go to the page you are interested in, then paste

javascript:importScript('User:Dr pda/prosesize.js'); getDocumentSize();

into the address bar of your browser instead of the URL. It's also possible to make this a bookmark, to save having to type it out each time.

Please note that running the script without installing it does not work in all browsers. It will not work in Google Chrome at all, and may not work in relatively new versions of Mozilla and Internet Explorer.

Sample output

Document statistics:

  • File size: 89 kB
  • Prose size (including all HTML code): 28 kB
  • References (including all HTML code): 10 kB
  • Wiki text: 31.8 kB
  • Prose size (text only): 18 kB (3310 words) "readable prose size"
  • References (text only): 4 kB
  • Images: 443 kB

Quick summary

  • File size: size of HTML document
  • Prose size (including all HTML code): size of HTML within <p></p> tags
  • References (including all HTML code): size of HTML for cite.php references
  • Wiki text: size of text+markup within the edit box
  • Prose size (text only): size of text within <p></p> tags. This is the so-called "readable prose size"
  • References (text only): size of text for cite.php references
  • Images: size of image thumbnails (Internet Explorer only)

File size

This is the total size of the HTML document. If you went to View->Page Source (or the equivalent) in your browser, and saved the resulting output to your computer, the file size would be the size of this file. This number does not include any images. The file size (plus the image size) is what you need to look at when considering how long a page will take to load.

For Internet Explorer this number is obtained from the document.fileSize property. For other browsers it is obtained by loading the page again with an XMLHttpRequest, so this number may take a few seconds to appear.
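
As a rough sketch of the non-Internet Explorer path, the size can be taken from the text of a second request for the same page. The names `utf8ByteLength` and `measureFileSize` are hypothetical, the fetching function is injected so the example does not depend on a browser, and measuring UTF-8 bytes (rather than characters) is an assumption about how the size is counted.

```javascript
// Size in bytes of a string once encoded as UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Hypothetical helper. `fetchText` is an injected function that returns a
// Promise for the page's HTML, e.g. a thin wrapper around XMLHttpRequest.
// Because it waits on a network request, the result arrives asynchronously,
// which is why the number may take a few seconds to appear.
async function measureFileSize(url, fetchText) {
  const html = await fetchText(url);
  return utf8ByteLength(html);
}

console.log(utf8ByteLength('abc')); // 3
console.log(utf8ByteLength('é'));   // 2 (one character, two bytes)
```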

Prose size

Wikipedia:Article size says

there [are] stylistic reasons why the main body of an article should not be unreasonably long, including readability issues ... For stylistic purposes, only the main body of prose (excluding links, see also, reference and footnote sections, and lists/tables) should be counted toward an article's total size, since the point is to limit the size of the main body of prose.

One of the main motivations for this script was to provide a convenient way of calculating the prose size. The technique used is simply to count the text within <p></p> tags in the HTML source of the document, which corresponds almost exactly to the definition of 'readable prose'. (Feb 2011: The script has been updated to count text in <blockquote> tags as well.) This method is not perfect, however, and may include text which isn't prose (e.g. in navboxes), or exclude text which is (e.g. in {{cquote}}, or prose written in bullet-point form, e.g. Anarchism#Recent developments within Anarchism). The text counted as prose is highlighted in yellow, so it is easy to see whether the prose size is over- or underestimated.
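
The counting technique described above can be illustrated on an HTML string. This is a simplified sketch rather than the script's implementation: it uses a regular expression instead of the browser's DOM, does not decode HTML entities, and does not handle one <blockquote> nested inside another. The names `PROSE_BLOCK` and `extractProseText` are hypothetical.

```javascript
// Matches a <p>…</p> or <blockquote>…</blockquote> block and captures its
// inner HTML. A <p> inside a <blockquote> is consumed with the outer block,
// so its text is only counted once.
const PROSE_BLOCK = /<(p|blockquote)\b[^>]*>([\s\S]*?)<\/\1>/gi;

// Hypothetical helper: return the text-only content of each prose block.
function extractProseText(html) {
  const pieces = [];
  for (const match of html.matchAll(PROSE_BLOCK)) {
    // Drop any remaining tags (bold, links, ...) to leave just the words.
    pieces.push(match[2].replace(/<[^>]+>/g, ''));
  }
  return pieces;
}

const sample =
  '<h2>History</h2>' +
  '<p>The <b>quick</b> brown fox.</p>' +
  '<ul><li>Not counted</li></ul>' +
  '<blockquote><p>A quoted line.</p></blockquote>';

const prose = extractProseText(sample);
console.log(prose);                 // [ 'The quick brown fox.', 'A quoted line.' ]
console.log(prose.join('').length); // 34
```

The heading and the list item are skipped, which mirrors the limitation noted above: prose written as bullet points is not counted.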

Two numbers are given for the prose size: HTML and text only. The HTML size is the size of the HTML code contained within <p></p> tags. This number can be compared to the file size to see how much of the document consists of readable prose. The text-only size is the size of just the words, without any formatting. (This is what you would get if you copied and pasted the prose from the article into something like Notepad, which strips out all the formatting.) The word count is calculated from the number of spaces in the text-only prose. Note that Internet Explorer highlights the section headings, but does not count them as prose. (This is because there is an 'invisible' <p></p> before them containing a link so that you jump to the right place when you click the appropriate section in the table of contents.)
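
A space-based word count of the kind described above might look like the sketch below. The function name `countWordsBySpaces` is hypothetical, and the exact handling of empty text or repeated spaces in the script is not documented here, so those details are assumptions.

```javascript
// Hypothetical helper: estimate the word count as (number of spaces + 1).
// Consecutive spaces would inflate the result, so this is only an estimate.
function countWordsBySpaces(text) {
  const trimmed = text.trim();
  if (trimmed === '') {
    return 0;
  }
  return trimmed.split(' ').length;
}

console.log(countWordsBySpaces('The quick brown fox.')); // 4
// A citation marker attached to a word adds no space, so no extra word.
console.log(countWordsBySpaces('Hello world.[1] More')); // 3
```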

References size

Now that cite.php inline citations are becoming very common, it is often useful to know how much of the article size comes from these references. The HTML references size is the size of what is produced by the <references/> tag, plus the size of the HTML to produce the markers (i.e. [1]). The text-only size is again just the text of the references, plus the text of the markers. Note that the contribution of the markers is explicitly subtracted from both prose size numbers. The markers also should not affect the word count, since there should be no spaces between them and the preceding word/punctuation.
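
The subtraction of markers from the prose numbers can be sketched on a single paragraph of HTML. This assumes that cite.php markers are rendered as <sup> elements carrying the class "reference"; the regular expression and the names `MARKER`, `stripTags` and `splitProseAndMarkers` are illustrative and not taken from the script.

```javascript
// Assumption: a citation marker looks like
// <sup id="cite_ref-1" class="reference"><a href="#cite_note-1">[1]</a></sup>
const MARKER =
  /<sup\b[^>]*class="[^"]*\breference\b[^"]*"[^>]*>[\s\S]*?<\/sup>/gi;

function stripTags(html) {
  return html.replace(/<[^>]+>/g, '');
}

// Hypothetical helper: separate a paragraph into its prose part and the part
// contributed by citation markers, for both the HTML and text-only counts.
function splitProseAndMarkers(paragraphHtml) {
  const markerHtml = (paragraphHtml.match(MARKER) || []).join('');
  const proseHtml = paragraphHtml.replace(MARKER, '');
  return {
    proseHtmlSize: proseHtml.length,
    markerHtmlSize: markerHtml.length,
    proseText: stripTags(proseHtml),
    markerText: stripTags(markerHtml),
  };
}

const paragraph =
  '<p>Water is wet.<sup id="cite_ref-1" class="reference">' +
  '<a href="#cite_note-1">[1]</a></sup> It flows.</p>';

const parts = splitProseAndMarkers(paragraph);
console.log(parts.proseText);  // "Water is wet. It flows."
console.log(parts.markerText); // "[1]"
```

In this sketch the marker sizes would be added to the references totals rather than discarded.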

Wiki text size

In addition to the above numbers, which are calculated from the HTML source of the page, there is also the size of the text plus wiki markup which appears in the edit box when you edit a page. This number is shown next to each revision on the History tab, and is the same number which appears in warnings about page length (e.g. Note: This page is 37 kilobytes long.). The prose size script queries the API automatically to retrieve this value for the current article. This involves another XMLHttpRequest, so it may take a few seconds for the number to appear; if there is a problem with the query, the script will display an error message.
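
One way such an API lookup could be made is sketched below, using the MediaWiki query module with prop=revisions and rvprop=size. Whether the script uses exactly these parameters is an assumption, as are the helper names `buildSizeQueryUrl` and `extractWikiTextSize` and the default endpoint URL.

```javascript
// Assumption: the wiki exposes its API at /w/api.php.
function buildSizeQueryUrl(title, apiBase = 'https://en.wikipedia.org/w/api.php') {
  const params = new URLSearchParams({
    action: 'query',
    prop: 'revisions',
    rvprop: 'size',   // ask only for the revision's size in bytes
    titles: title,
    format: 'json',
  });
  return apiBase + '?' + params.toString();
}

// Pull the size out of a parsed JSON response. In the default JSON format,
// pages are keyed by page ID, so the first (only) entry is taken.
function extractWikiTextSize(response) {
  const pages = response && response.query && response.query.pages;
  if (!pages) {
    throw new Error('Unexpected API response');
  }
  const page = Object.values(pages)[0];
  if (!page || !page.revisions || page.revisions.length === 0) {
    throw new Error('No revision data returned');
  }
  return page.revisions[0].size;
}

// Example with a hand-written response of the expected shape.
const exampleResponse = {
  query: { pages: { '123': { title: 'Example', revisions: [{ size: 32563 }] } } },
};
console.log(extractWikiTextSize(exampleResponse)); // 32563
```

Throwing on a malformed response gives the caller a single place to show the error message mentioned above.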

Images size

N.B. This only works in Internet Explorer (or browsers supporting the element.fileSize property).

This number is the total size of the image thumbnails, i.e. the size of the images which actually appear on the page, not the full-size versions they link to. The total number and size of images affects how long the page takes to load, although the text of the page is loaded first and hence readable while the images are still loading. It is also possible to turn off images to speed up loading of the page. Note that the script only counts images within the article (i.e. not the Wikipedia logo, skin background, etc.). It also currently counts every occurrence of a repeated image, whereas the browser only needs to download it once (this would have an effect on pages with many flags denoting nationality, for example).
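
The effect of counting repeated images can be shown with a small sketch. The data shape (`src`, `bytes`) and the function names are hypothetical; the first function mirrors the behaviour described above, while the second shows what counting each image once would give.

```javascript
// Sum every occurrence, as the script is described as doing.
function totalImageSize(images) {
  return images.reduce((sum, img) => sum + img.bytes, 0);
}

// Alternative for comparison: count each distinct image URL only once,
// which is closer to what the browser actually downloads.
function uniqueImageSize(images) {
  const sizeBySrc = new Map();
  for (const img of images) {
    sizeBySrc.set(img.src, img.bytes);
  }
  let total = 0;
  for (const bytes of sizeBySrc.values()) {
    total += bytes;
  }
  return total;
}

// Made-up example: one map and the same small flag icon used three times.
const images = [
  { src: 'map.png', bytes: 20000 },
  { src: 'flag.png', bytes: 500 },
  { src: 'flag.png', bytes: 500 },
  { src: 'flag.png', bytes: 500 },
];

console.log(totalImageSize(images));  // 21500
console.log(uniqueImageSize(images)); // 20500
```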