Lightweight markup language

A lightweight markup language (LML), also termed a simple or humane markup language, is a markup language with simple, unobtrusive syntax. It is designed to be easy to write using any generic text editor and easy to read in its raw form. Lightweight markup languages are used in applications where it may be necessary to read the raw document as well as the final rendered output.

For instance, a person downloading a software library might prefer to read the documentation in a text editor rather than a web browser. Another application for such languages is to provide for data entry in web-based publishing, such as weblogs and wikis, where the input interface is a simple text box. The server software then converts the input into a common document markup language like HTML.

History

Lightweight markup languages were originally used on text-only displays that could not render characters in italics or bold, so informal conventions for conveying this information had to be developed. These conventions carried over naturally to plain-text email. Console browsers may also resort to similar display conventions.

In 1986, the international standard SGML provided facilities to define and parse lightweight markup languages using grammars and tag implication. XML, published by the W3C in 1998, is a profile of SGML that omits these facilities. However, no SGML DTD for any of the languages listed below is known.

Types

Lightweight markup languages can be categorized by their tag types. Some languages, like HTML (<b>bold</b>), use named elements that share a common format for start and end tags (e.g. BBCode [b]bold[/b]), whereas proper lightweight markup languages restrict their tags to ASCII punctuation marks and other non-letter symbols. Some mix both styles (e.g. Textile bq. ) or allow embedded HTML (e.g. Markdown), possibly extended with custom elements (e.g. MediaWiki <ref>source</ref>).

Most languages distinguish between markup for lines or blocks and markup for shorter spans of text, but some only support inline markup.

Some markup languages are tailored for a specific purpose, such as documenting computer code (e.g. POD, RD) or being converted to a certain output format (usually HTML) and nothing else; others are more general in application. Languages also differ in whether they are oriented toward textual presentation or toward data serialization.

Presentation-oriented languages include AsciiDoc, atx, BBCode, Creole, Crossmark, Epytext, Haml, JsonML, MakeDoc, Markdown, Org-mode, POD, reST, RD, Setext, SiSU, SPIP, Xupl, Texy!, Textile, txt2tags, UDO and Wikitext.

Data-serialization-oriented languages include Curl (homoiconic, but also reads JSON; every object serializes), JSON, OGDL, and YAML.

Comparison of language features

Comparing language features
Language | HTML export tool | HTML import tool | Tables | Link titles | class attribute | id attribute | Release date
AsciiDoc | Yes | Yes | Yes | Yes | Yes | Yes | November 25, 2002[1]
BBCode | No | No | Yes | No | No | No | 1998
Creole | No | No | Yes | No | No | No | July 4, 2007[2]
GitHub Flavored Markdown | Yes | No | Yes | Yes | No | No | ?
Markdown | Yes | Yes | Yes | Yes | Yes/No | Yes/No | March 19, 2004[3][4]
Markdown Extra | Yes | Yes | Yes[5] | Yes | Yes | Yes | ?
MediaWiki | Yes | Yes | Yes | Yes | Yes | Yes | 2002[6]
MultiMarkdown | Yes | No | Yes | Yes | No | No | ?
Org-mode | Yes | Yes[7] | Yes | Yes | Yes | Yes | 2003[8]
PmWiki | Yes[9] | Yes | Yes | Yes | Yes | Yes | January 2002
POD | Yes | ? | No | Yes | ? | ? | 1994
reStructuredText | Yes | Yes[7] | Yes | Yes | Yes | auto | April 2, 2002[10]
Textile | Yes | No | Yes | Yes | Yes | Yes | December 26, 2002[11]
Texy! | Yes | Yes | Yes | Yes | Yes | Yes | 2004[12]
txt2tags | Yes | Yes[13] | Yes[14] | Yes | Yes/No | Yes/No | July 26, 2001[15]
Slack | No | No | No | Yes | No | No | [16][17]
WhatsApp | No | No | No | No | No | No | March 16, 2016[18]

Markdown's own syntax does not support class attributes or id attributes; however, since Markdown supports the inclusion of native HTML code, these features can be implemented using direct HTML. (Some extensions may support these features.)

txt2tags' own syntax does not support class attributes or id attributes; however, since txt2tags supports inclusion of native HTML code in tagged areas, these features can be implemented using direct HTML when saving to an HTML target.[19]

Comparison of implementation features

Comparing implementations, especially output formats
Language Implementations XHTML Con/LaTeX PDF DocBook ODF EPUB DOC(X) LMLs Other License
AsciiDoc Python, Ruby, JavaScript XHTML LaTeX PDF DocBook ODF EPUB No  — Man page etc. GNU GPL, MIT
BBCode Perl, PHP, C#, Python, Ruby (X)HTML No No No No No No Public Domain
Creole PHP, Python, Ruby, JavaScript[20] Depends on implementation CC_BY-SA 1.0
GitHub Flavored Markdown Haskell (Pandoc) HTML LaTeX, ConTeXt PDF DocBook ODF EPUB DOC AsciiDoc, reST OPML GPL
Java,[21] JavaScript,[22][23][24] PHP,[25][26] Python,[27] Ruby[28] HTML[22][23][24][26][27] No No No No No No Proprietary
Markdown Perl (originally), C,[29][30] Python,[31] JavaScript, Haskell,[7] Ruby,[32] C#, Java, PHP HTML LaTeX, ConTeXt PDF DocBook ODF EPUB RTF MediaWiki, reST Man page, S5 etc. BSD-style & GPL (both)
Markdown Extra PHP (originally), Python, Ruby XHTML No No No No No No BSD-style & GPL (both)
MediaWiki Perl, PHP, Haskell, Python XHTML No No No No No No GNU GPL
MultiMarkdown C, Perl (X)HTML LaTeX PDF No ODF No DOC, RTF OPML GPL, MIT
Org-mode Emacs Lisp, Ruby (parser only), Perl, OCaml XHTML LaTeX PDF DocBook ODF EPUB[33] DOCX[33] Markdown TXT, XOXO, iCalendar, Texinfo, man, contrib: groff, s5, deck.js, Confluence Wiki Markup[34], TaskJuggler, RSS, FreeMind GPL
PmWiki PHP XHTML 1.0 Transitional, HTML5 No PDF export addons No No EPUB export addon No GNU GPL
POD Perl (X)HTML, XML LaTeX No DocBook No No RTF Man page, plain text Artistic License, Perl's license
reStructuredText Python,[35][36] Haskell (Pandoc), Java, HTML, XML LaTeX PDF DocBook ODF EPUB DOC man, S5, Devhelp, QT Help, CHM, JSON Public Domain
Textile PHP, JavaScript, Java, Perl, Python, Ruby, ASP, C#, Haskell XHTML No No No No No No Textile License
Texy! PHP, C# (X)HTML No No No No No No GNU GPL v2 License
txt2tags Python,[37] PHP[38] (X)HTML, SGML LaTeX PDF DocBook ODF EPUB DOC Creole, AsciiDoc, MediaWiki, MoinMoin, PmWiki, DokuWiki, Google Code Wiki roff, man, MagicPoint, Lout, PageMaker, ASCII Art, TXT GPL

Comparison of lightweight markup language syntax

Although usually documented as yielding italic and bold text, most lightweight markup processors output the semantic HTML elements em and strong instead. Monospaced text may result in either semantic code or presentational tt elements. Few languages distinguish between the two (e.g. Textile) or allow the user to configure the output easily (e.g. Texy!).

LMLs sometimes differ in how they handle multi-word markup: some require the markup characters to replace the inter-word spaces (infix). Some languages require a single character as prefix and suffix; others need doubled or even tripled characters, or support both with slightly different meanings, e.g. different levels of emphasis.
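
As a rough illustration of the two points above (semantic output elements, and single versus doubled markers carrying different levels of emphasis), the following Python sketch converts a small Markdown-like subset of inline markup to HTML. The function name render_inline and the choice of subset are assumptions made for this example; it is not the algorithm of any particular processor, and it does not protect the contents of code spans from further substitution.

```python
import html
import re

# Doubled markers are substituted before single ones so that "**x**"
# is not misread as an empty "*" span around "x".
_CODE = re.compile(r"`(.+?)`")
_STRONG = re.compile(r"\*\*(.+?)\*\*")
_EM = re.compile(r"\*(.+?)\*")


def render_inline(text: str) -> str:
    """Convert a Markdown-like subset of inline markup to semantic HTML."""
    # Escape HTML special characters first so that raw "<" or "&" in the
    # input cannot be mistaken for markup in the output.
    out = html.escape(text, quote=False)
    out = _CODE.sub(r"<code>\1</code>", out)
    out = _STRONG.sub(r"<strong>\1</strong>", out)
    out = _EM.sub(r"<em>\1</em>", out)
    return out


if __name__ == "__main__":
    print(render_inline("**bold text**, *italic text* and `monospace text`"))
```

A processor that preferred presentational output would emit b, i and tt in place of strong, em and code; the pattern matching itself would stay the same.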

Comparing text formatting syntax
HTML output <strong>strongly emphasized</strong> <em>emphasized text</em> <code>code</code> semantic
<b>bold text</b> <i>italic text</i> <tt>monospace text</tt> presentational
AsciiDoc *bold text* 'italic text' +monospace text+ Can double operators to apply formatting where there is no word boundary (for example **b**old t**ex**t yields bold text).
_italic text_ `monospace text`
ATX *bold text* _italic text_ |monospace text| email style
Creole **bold text** //italic text// {{{monospace text}}} Triple curly braces are for nowiki which is optionally monospace.
Markdown[39] **bold text** *italic text* `monospace text` semantic HTML tags
__bold text__ _italic text_
MediaWiki '''bold text''' ''italic text'' <code>monospace text</code> mostly resorts to inline HTML
Org-mode *bold text* /italic text/ =code=
~verbatim~
PmWiki '''bold text''' ''italic text'' @@monospace text@@
reST **bold text** *italic text* ``monospace text``
Setext **bold text** ~italic text~ N/A
Textile[40] *strong* _emphasis_ @monospace text@ semantic HTML tags
**bold text** __italic text__ presentational HTML tags
Texy! **bold text** *italic text* `monospace text` semantic HTML tags by default, optional support for presentational tags
//italic text//
txt2tags **bold text** //italic text// ``monospace text``
POD B<bold text> I<italic text> C<monospace text> Indented text is also shown as monospaced code.
BBCode [b]bold text[/b] [i]italic text[/i] [code]monospace text[/code] Formatting works across line breaks.
Slack *bold text* _italic text_ `monospace text` ```block of monospaced text```
WhatsApp *bold text* _italic text_ ```monospace text```
Bold face or strong emphasis
Code AsciiDoc ATX Creole Markdown MediaWiki Org-mode PmWiki reST Setext Slack Textile Texy! txt2tags WhatsApp
*bold* Yes Yes No No No Yes No No No Yes Yes No No Yes
**bold** Yes No Yes Yes No No No Yes Yes No Yes Yes Yes No
__bold__ No No No Yes No No No No No No No No No No
'''bold''' No No No No Yes No Yes No No No No No No No
Italic type or normal emphasis
Code AsciiDoc ATX Creole Markdown MediaWiki Org-mode PmWiki reST Setext Slack Textile Texy! txt2tags WhatsApp
*italic* No No No Yes No No No Yes No No No Yes No No
**italic** No No No No No No No No No No No No No No
_italic_ Yes Yes No Yes No No No No No Yes Yes No No Yes
__italic__ Yes No No No No No No No No No Yes No No No
'italic' Yes No No No No No No No No No No No No No
''italic'' Yes No No No Yes No Yes No No No No No No No
/italic/ No No No No No Yes No No No No No No No No
//italic// No No Yes No No No No No No No No Yes Yes No
~italic~ No No No No No No No No Yes No No No No No
Underlined text
Code AsciiDoc ATX Creole Markdown MediaWiki Org-mode PmWiki reST Setext Slack Textile Texy! txt2tags WhatsApp
_underline_ No No No No No Yes No No Yes No No No No No
__underline__ No No No No No No No No No No No No Yes No
Strike-through text
Code AsciiDoc ATX Creole Markdown MediaWiki Org-mode PmWiki reST Setext Slack Textile Texy! txt2tags WhatsApp
~stricken~ No No No No No No No No No Yes No No No Yes
~~stricken~~ No No No GFM No No No No No No No No No No
+stricken+ No No No No No Yes No No No No No No No No
--stricken-- No No No No No No No No No No No No Yes No
Monospaced font, teletype text or code
Code AsciiDoc ATX Creole Markdown MediaWiki Org-mode PmWiki reST Setext Slack Textile Texy! txt2tags WhatsApp
@code@ No No No No No No No No No No Yes No No No
@@code@@ No No No No No No Yes No No No No No No No
`code` Yes No No Yes No No No No No Yes No Yes No No
``code`` Yes No No Yes No No No Yes No No No No Yes No
```code``` No No No Yes No No No No No Yes/No No No No Yes
=code= No No No No No Yes No No No No No No No No
~code~ No No No No No Yes No No No No No No No No
+code+ Yes No No No No No No No No No No No No No
++code++ Yes No No No No No No No No No No No No No
{{{code}}} No No Yes No No No No No No No No No No No
|code| No Yes No No No No No No No No No No No No

Heading syntax

Headings are usually available in up to six levels, but the top level is often reserved for the document title, which may be set externally. Some documentation may associate levels with divisional types, e.g. part, chapter, section, article or paragraph.

Most LMLs follow one of two styles for headings, either Setext-like underlines or atx-like[41] line markers, or they support both.

Underlined headings

Level 1 Heading
===============

Level 2 Heading
---------------

Level 3 Heading
~~~~~~~~~~~~~~~

The first style uses underlines, i.e. repeated characters (e.g. equals =, hyphen - or tilde ~, usually at least two or four times) in the line below the heading text.

Underlined heading levels
Chars: = - ~ * # + ^ _ : ` < > min
Markdown 1 2 No No No No No No No No No No No No 1
Setext 1 2 No No No No No No No No No No No No ?
AsciiDoc 1 2 3 No No No No No No No No No No No 2
Texy! Yes Yes No Yes Yes No No No No No No No No No ?
reStructuredText Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes heading width

reStructuredText and Texy! determine heading levels dynamically, which gives authors more freedom in choosing underline characters but complicates merging content from external sources.
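
The effect of dynamic level assignment can be sketched in Python. The example below is a simplified model in the style of reStructuredText, assuming that a heading's level is given by the order in which its underline character first appears in the document and that the underline must be at least as wide as the heading. The function name underlined_headings is hypothetical, and the accepted underline characters are simply those listed in the header row of the table above.

```python
import re

# A line consisting of one underline character repeated at least twice.
_UNDERLINE = re.compile(r"^([=\-~*#+^_:`<>])\1+$")


def underlined_headings(lines):
    """Return (level, title) pairs; levels follow first use of each character."""
    seen = []  # underline characters in the order they were first used
    result = []
    for title, underline in zip(lines, lines[1:]):
        match = _UNDERLINE.match(underline)
        # Skip pairs where the second line is no underline, the first line
        # is blank, or the first line is itself an underline.
        if match is None or not title.strip() or _UNDERLINE.match(title):
            continue
        # The underline must be at least as wide as the heading text.
        if len(underline) < len(title.rstrip()):
            continue
        char = match.group(1)
        if char not in seen:
            seen.append(char)
        result.append((seen.index(char) + 1, title.strip()))
    return result


if __name__ == "__main__":
    document = ["Intro", "-----", "", "Details", "=======", "", "More", "----"]
    print(underlined_headings(document))
```

In this document the hyphen happens to be used first and therefore marks level 1, whereas a fragment pasted in from a document that used the equals sign first would need its underlines changed. This is the merging difficulty mentioned above.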

Prefixed headings

# Level 1 Heading
## Level 2 Heading ##
### Level 3 Heading ###

The second style is based on repeated markers (e.g. hash #, equals = or asterisk *) at the start of the heading line, where the number of repetitions indicates the (sometimes inverse) heading level. Most languages also allow the markers to be repeated at the end of the line; some make this mandatory, while others do not even require the numbers of leading and trailing markers to match.
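
A minimal Python sketch of this style, assuming Markdown-like rules (one to six leading hash characters, at least one space before the text, and optional closing hashes whose number need not match), might look as follows. The function name parse_prefixed_heading is hypothetical, and other languages differ in marker character, level range and suffix handling.

```python
import re

# One to six leading "#" markers, the heading text, and an optional run of
# closing "#" markers that is ignored regardless of its length.
_PREFIXED = re.compile(r"^(#{1,6})[ \t]+(.*?)(?:[ \t]+#+)?[ \t]*$")


def parse_prefixed_heading(line: str):
    """Return (level, text) for a "#"-prefixed heading line, else None."""
    match = _PREFIXED.match(line)
    if match is None:
        return None
    return len(match.group(1)), match.group(2)


if __name__ == "__main__":
    for sample in ("# Level 1 Heading", "## Level 2 Heading ##", "no heading"):
        print(parse_prefixed_heading(sample))
```

A language with mandatory, matching suffixes would instead compare the lengths of the leading and trailing marker runs and reject the line when they differ.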

Line prefix (and suffix) headings
Language | = | # | * | ! | + | Suffix | Levels | Indentation
AsciiDoc | Yes | No | No | No | No | Optional | 1–6 | No
ATX | No | Yes | No | No | No | No | unlimited | ?
Creole | Yes | No | No | No | No | Optional | 1–6 | No
MediaWiki | Yes | No | No | No | No | Yes | 1–6 | No
txt2tags | Yes | No | No | No | Yes | Yes | 1–6 | No
Markdown | No | Yes | No | No | No | Optional | 1–6 | No
Texy! | No | Yes | No | No | No | Optional | 6–1 or 1–6, dynamic | No
Org-mode | No | No | Yes | No | No | No | 1–+∞ | alternative[42][43]
PmWiki | No | No | No | Yes | No | Optional | 1–6 | No
POD and Textile choose the HTML convention of numbered heading levels instead. Org-mode supports indentation as a means of indicating the level. BBCode does not support section headings at all.

Other heading formats
POD:
=head1 Level 1 Heading
=head2 Level 2 Heading
Textile[40], Jira:
h1. Level 1 Heading
h2. Level 2 Heading
h3. Level 3 Heading
h4. Level 4 Heading
h5. Level 5 Heading
h6. Level 6 Heading

Link syntax

Hyperlinks can be added either inline, which may clutter the source because of long URLs, or by reference, using a named alias or numbered id that points to a line containing nothing but the address and related attributes; such lines can often be placed anywhere in the document. Most languages allow the author to specify link text (Text) to be displayed instead of the plain address (http://example.com), and some also provide a way to set a separate link title (Title), which may contain more information about the destination.

LMLs that are tailored for special setups, e.g. wikis or code documentation, may automatically generate named anchors (for headings, functions etc.) inside the document, link to related pages (possibly in a different namespace) or provide a textual search for linked keywords.

Most languages employ (double) square or angle brackets to surround links, but hardly any two languages are completely compatible. Many can automatically recognize and parse absolute URLs inside the text without further markup.
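
Automatic recognition of absolute URLs can be approximated with a regular expression. The Python sketch below is one possible approach, not the rule set of any specific language: the function name autolink, the restriction to http and https, and the decision to exclude trailing punctuation from the link are all assumptions made for illustration.

```python
import html
import re

# An absolute http(s) URL: everything up to whitespace, an angle bracket
# or a double quote.
_URL = re.compile(r"https?://[^\s<>\"]+")


def autolink(text: str) -> str:
    """Escape text and wrap bare absolute URLs in HTML anchor elements."""
    parts = []
    last = 0
    for match in _URL.finditer(text):
        # Trailing punctuation usually belongs to the sentence, not the URL.
        url = match.group(0).rstrip(".,;:!?)")
        end = match.start() + len(url)
        parts.append(html.escape(text[last:match.start()]))
        escaped = html.escape(url)
        parts.append(f'<a href="{escaped}">{escaped}</a>')
        last = end
    parts.append(html.escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(autolink("See http://example.com."))
```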

Inline hyperlink syntax
Languages Basic syntax Text syntax Title syntax
BBCode, Creole, MediaWiki, PmWiki http://example.com
Textile "Text":http://example.com "Text (Title)":http://example.com
Texy! "Text .(Title)":http://example.com
AsciiDoc http://example.com[Text]
Slack <http://example.com|Text>
txt2tags [http://example.com] [Text http://example.com]
MediaWiki [http://example.com Text]
Creole, MediaWiki, PmWiki [[Name]] [[Name|Text]]
Org-mode [[Name][Text]]
Creole [[Namespace:Name]] [[Namespace:Name|Text]]
Org-mode [[Namespace:Name][Text]]
Creole, PmWiki [[http://example.com]] [[http://example.com|Text]]
BBCode [url]http://example.com[/url] [url=http://example.com]Text[/url]
Markdown <http://example.com> [Text](http://example.com) [Text](http://example.com "Title")
reStructuredText `Text <http://example.com/>`_
POD L</Name>
POD L<http://example.com/>
Reference syntax
Languages Text syntax Title syntax
reStructuredText
... Name_ ...
.. _Name: http://example.com
ATX
... [Text] ...
[Text] http://example.com
AsciiDoc
... [[id]] ...
<<id>>
... [[id]] ...
<<id,Text>>
... anchor:id ...
xref:id
... anchor:id ...
xref:id[Text]
Markdown
... [Text][id] ...
[id]: http://example.com
... [Text][id] ...
[id]: http://example.com "Title"
... [Text][] ...
[Text]: http://example.com
... [Text][] ...
[Text]: http://example.com "Title"
... [Text] ...
[Text]: http://example.com
... [Text] ...
[Text]: http://example.com "Title"
Org-mode Org-mode's normal link syntax does a text search of the file. You can also put in dedicated targets with <<id>>.
Textile
... "Text":alias ...
[alias]http://example.com
... "Text":alias ...
[alias (Title)]http://example.com
Texy!
... "Text":alias ...
[alias]: http://example.com
... "Text":alias ...
[alias]: http://example.com .(Title)
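
To make the reference style more concrete, the following Python sketch resolves Markdown-style references of the forms [Text][id] and [Text][] against definition lines such as [id]: http://example.com "Title". It is a simplified illustration under stated assumptions: the function name resolve_references is hypothetical, labels are compared case-insensitively, the bare [Text] shortcut form is not handled, and unresolved references are left unchanged.

```python
import html
import re

# A definition line such as:  [id]: http://example.com "Title"
_DEFINITION = re.compile(r'^\s{0,3}\[([^\]]+)\]:\s*(\S+)(?:\s+"([^"]*)")?\s*$')
# A reference such as [Text][id] or [Text][]; an empty id reuses the text.
_REFERENCE = re.compile(r"\[([^\]]+)\]\[([^\]]*)\]")


def resolve_references(source: str) -> str:
    """Replace reference-style links with HTML anchors; drop definition lines."""
    definitions = {}
    body = []
    for line in source.splitlines():
        match = _DEFINITION.match(line)
        if match:
            label, url, title = match.groups()
            definitions[label.lower()] = (url, title)
        else:
            body.append(line)

    def replace(match):
        text = match.group(1)
        label = match.group(2) or text
        target = definitions.get(label.lower())
        if target is None:
            return match.group(0)  # leave unresolved references untouched
        url, title = target
        title_attr = f' title="{html.escape(title)}"' if title else ""
        link_text = html.escape(text, quote=False)
        return f'<a href="{html.escape(url)}"{title_attr}>{link_text}</a>'

    return _REFERENCE.sub(replace, "\n".join(body))


if __name__ == "__main__":
    sample = 'See [Text][id].\n\n[id]: http://example.com "Title"'
    print(resolve_references(sample))
```

Because definitions are collected in a first pass over all lines, they may appear before or after the references that use them, which matches the observation that such lines can often be located anywhere in the document.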


List syntax

HTML requires an explicit element for the list, specifying its type, and one for each list item, but most lightweight markup languages need only different line prefixes for the bullet points or enumerated items. Some languages rely on indentation for nested lists, others use repeated parent list markers.

Unordered, bullet list items
Characters: * - + # . · _ : indent skip nest
Markdown Yes Yes Yes No No No No No No No No 0–3 1–3 indent
MediaWiki Yes No No No No No No No No No No 0 1+ repeat
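
Nesting by repeated parent markers, as in the MediaWiki row above, can be sketched in Python as follows. The function name render_bullets is hypothetical; the sketch assumes that the number of leading asterisks gives the nesting depth, ignores lines that are not list items, and is not MediaWiki's actual parser.

```python
import html


def render_bullets(lines):
    """Render "*"-prefixed lines as nested HTML <ul> elements."""
    out = []
    depth = 0  # number of currently open <ul> elements
    for line in lines:
        text = line.lstrip("*")
        level = len(line) - len(text)  # count of leading "*" markers
        if level == 0:
            continue  # not a list item; ignored in this sketch
        if level > depth:
            # The first new list sits inside the still-open parent <li>;
            # skipped levels get an empty wrapper item each.
            out.append("<ul>")
            depth += 1
            while depth < level:
                out.append("<li><ul>")
                depth += 1
        else:
            out.append("</li>")
            while depth > level:
                out.append("</ul></li>")
                depth -= 1
        out.append("<li>" + html.escape(text.strip()))
    while depth > 0:
        out.append("</li></ul>")
        depth -= 1
    return "".join(out)


if __name__ == "__main__":
    print(render_bullets(["* a", "** b", "* c"]))
```

An indentation-based language such as Markdown would compute the level from leading whitespace instead of counting markers; the opening and closing of list elements would work the same way.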

Languages differ on whether digits in numbered list items are optional or mandatory, which kinds of enumerators they understand (e.g. decimal digit 1, roman numerals i or I, alphabetic letters a or A) and whether they preserve explicit values in the output. Some Markdown dialects, for instance, will honor a start value other than 1, but ignore any other explicit value.

Ordered, enumerated list items
Chars: #1 1. 1) 1] 1} (1) [1] {1} indent skip nest
Markdown No 1 1 No No No No No 0–3 1–3 indent
MediaWiki # No No No No No No No 0 1+ repeat
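
The behaviour attributed above to some Markdown dialects, where only the first number of an ordered list is honored, can be sketched in Python. The function name render_ordered is hypothetical, and the sketch is simplified: it accepts both the 1. and 1) forms shown in the table, treats all matching lines as one list, and ignores nesting.

```python
import html
import re

# Up to three spaces of indentation, digits, "." or ")", then the item text.
_ITEM = re.compile(r"^ {0,3}(\d+)[.)][ \t]+(.*)$")


def render_ordered(lines):
    """Render numbered lines as one <ol>, keeping only the first number."""
    items = []
    start = None
    for line in lines:
        match = _ITEM.match(line)
        if match is None:
            continue  # not a list item; ignored in this sketch
        if start is None:
            start = int(match.group(1))  # only the first value is honored
        items.append(f"<li>{html.escape(match.group(2))}</li>")
    if not items:
        return ""
    start_attr = "" if start == 1 else f' start="{start}"'
    return f"<ol{start_attr}>" + "".join(items) + "</ol>"


if __name__ == "__main__":
    print(render_ordered(["3. first", "7. second", "1. third"]))
```

The numbers 7 and 1 in the usage example have no effect on the output; the list is simply numbered from 3 by the HTML renderer.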

Slack assists the user in entering enumerated and bullet lists, but does not actually format them as such: it merely inserts a leading digit followed by a period and a space, or a bullet character, at the start of the line.


References

  1. "AsciiDoc ChangeLog". Retrieved 2017-02-24.
  2. "WikiCreole Versions". Retrieved 2017-02-24.
  3. "Markdown". Aaron Swartz: The Weblog. 2004-03-19.
  4. "Daring Fireball: Markdown". Archived from the original on 2004-04-02. Retrieved 2014-04-25.
  5. "PHP Markdown Extra". Michelf.com. Retrieved 2013-10-08.
  6. "MediaWiki history". Retrieved 2017-02-24.
  7. Pandoc, which is written in Haskell, parses Markdown (in two forms) and ReStructuredText, as well as HTML and LaTeX; it writes from any of these formats to HTML, RTF, LaTeX, ConTeXt, OpenDocument, EPUB and several other formats, including (via LaTeX) PDF.
  8. "Org mode for Emacs – Your Life in Plain Text". orgmode.org. OrgMode team. Retrieved 2016-12-09.
  9. "PmWiki Cookbook - Export addons". Retrieved 2018-01-07.
  10. "An Introduction to reStructuredText". Retrieved 2017-02-24.
  11. "Textism › Tools › Textile". textism.com. Archived from the original on 2002-12-26.
  12. "What is Texy". Retrieved 2017-02-24.
  13. "Html2wiki txt2tags module". cpan.org. Retrieved 2014-01-30.
  14. "Txt2tags User Guide". Txt2tags.org. Retrieved 2017-02-24.
  15. "txt2tags changelog". Retrieved 2017-02-24.
  16. "Slack Help Center > Using Slack > Send messages > Format your messages". Retrieved 2018-08-07.
  17. "Slack API documentation: Basic message formatting". Retrieved 2018-08-07.
  18. "WhatsApp FAQ: Formatting your messages". Retrieved 2017-11-21.
  19. "Txt2tags User Guide". Txt2tags.org. Retrieved 2017-02-24.
  20. "Converters". WikiCreole. Retrieved 2013-10-08.
  21. pegdown: A Java library for Markdown processing.
  22. gfms: Github Flavored Markdown Server.
  23. marked: A full-featured markdown parser and compiler, written in JavaScript. Built for speed.
  24. node-gfm: GitHub flavored markdown to html converter.
  25. Parsedown: Markdown parser written in PHP.
  26. Ciconia: Markdown parser written in PHP.
  27. Grip: GitHub Readme Instant Preview.
  28. github-markdown: Self-contained Markdown parser for GitHub.
  29. peg-markdown is an implementation of markdown in C.
  30. Discount is also an implementation of markdown in C.
  31. "Python-Markdown". Github.com. Retrieved 2013-10-08.
  32. Bruce Williams <http://codefluency.com>, for Ruby Central <http://rubycentral.org>. "kramdown: Project Info". RubyForge. Archived from the original on 2013-08-07. Retrieved 2013-10-08.
  33. "Via ox-pandoc and pandoc itself".
  34. Atlassian. "Confluence 4.0 Editor - What's Changed for Wiki Markup Users (Confluence Wiki Markup is dead)". Retrieved 2018-03-28.
  35. Docutils is an implementation of ReStructuredText in Python.
  36. Sphinx is an implementation of ReStructuredText in Python and Docutils with a number of output format Builders.
  37. Aurelio Jargas www.aurelio.net (2012-01-11). "txt2tags". txt2tags. Retrieved 2013-10-08.
  38. "txt2tags.class.php - online convertor [sic]". Txt2tags.org. Retrieved 2013-10-08.
  39. "Markdown Syntax". Daringfireball.net. Retrieved 2013-10-08.
  40. Textile Syntax. Archived 2010-08-12 at the Wayback Machine.
  41. "atx, the true structured text format" by Aaron Swartz (2002).
  42. "using org-adapt-indentation".
  43. "using org-indent-mode or org-indent".
