Choice of Document Formats
About this Document
This is an overview of popular documentation file formats, which is derived from a reply I wrote to a thread in the Boston Perl Mongers mailing list.
Document Information
Licence
This document is Copyright by Shlomi Fish, 2006, and is available under the terms of the Creative Commons Attribution License (CC-by) 3.0 Unported (or at your option any later version of that licence).
For securing additional rights, please contact Shlomi Fish and see the explicit requirements that are spelt out for abiding by that licence.
TL;DR Summary
My preferred documentation formats are (in order):
Custom XML/etc. formats which I translate to DocBook 5 or XHTML5 using XSLT or similar.
Perl POD
AsciiDoc
Sometimes I use Template Toolkit or similar to generate them.
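As a rough illustration of the first item, here is a minimal sketch of translating a small custom XML format into XHTML5. It uses Python's standard xml.etree.ElementTree instead of XSLT, purely because that keeps the example self-contained. The <doc>, <sect> and <p> vocabulary and the to_xhtml5 function are hypothetical, made up for this example, and are not the formats or tools I actually use.

```python
import xml.etree.ElementTree as ET

# Hypothetical custom vocabulary, invented for this sketch:
# <doc title="..."> contains <sect title="..."> elements with <p> children.
SOURCE = """<doc title="Choice of Formats">
  <sect title="POD">
    <p>Simple, but limited.</p>
  </sect>
  <sect title="DocBook">
    <p>Richer, and more verbose.</p>
  </sect>
</doc>
"""

XHTML_NS = "http://www.w3.org/1999/xhtml"


def to_xhtml5(source: str) -> str:
    """Translate the custom format above into an XHTML5 document string."""
    doc = ET.fromstring(source)
    title = doc.get("title", "")

    # The namespace is written as a plain xmlns attribute, which is
    # enough to get <html xmlns="..."> in the serialised output.
    html = ET.Element("html", {"xmlns": XHTML_NS})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = title

    for sect in doc.findall("sect"):
        section = ET.SubElement(body, "section")
        ET.SubElement(section, "h2").text = sect.get("title", "")
        for para in sect.findall("p"):
            # Only the plain text of each paragraph is copied;
            # inline markup would need its own mapping.
            ET.SubElement(section, "p").text = para.text

    return "<!DOCTYPE html>\n" + ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    print(to_xhtml5(SOURCE))
```

The point of the custom format is that the source only says what each part is (a section, a paragraph), and the mapping to output elements lives in one place. An XSLT stylesheet would express the same mapping declaratively.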
The Article Itself
A previous correspondent wrote:
Write it in POD?
I’m not aware of any POD based Wikis, but it doesn’t seem like it would be hard to merge the two approaches, with a “traditional” web-facing wiki front-end that stores things as a POD-like syntax on the back.
This way, you get the collaborative editing and there are already tools out there to convert the POD source to PDF etc.
I think Kwiki has a plugin for POD (Perl’s so-called “Plain Old Documentation”).
Just a note about POD: POD is incredibly limited. Some things that you may want to do with it are not possible. It is not the only generic format available, however. One option is naturally DocBook/XML, which can be translated into HTML (both XHTML 1.x as well as XHTML 5 and HTML 5) as well as PDF, Microsoft Word, LaTeX, EPUB and other formats. It can also be translated to plain text, either using docbookrx, or through an intermediate format. POD can be translated into DocBook/XML using Pod::PseudoPod::DocBook.
Don’t use the original module by Alligator Descartes, which is still the first result on CPAN even though it is a Dead Camel. It is old and broken and has been unmaintained for a long time.
Note that the DocBook generated from the POD may not be perfectly semantically correct, because DocBook is richer than POD.
Other lightweight markups, which are essentially text with brief style specifiers, can be found in this Linux-elitists thread called “mini-markup language?”. Prominent examples include AsciiDoc, which can be converted to HTML, XHTML and DocBook/XML using Asciidoctor, and the many Markdown dialects, which can only be converted to XHTML (and suffer from fragmentation and incompatibilities).
They can all be converted to HTML and some of them to DocBook too. One Wiki or another is also an option, but note that they tend to have incompatible formats, and some may not have an ability to export as DocBook. I used to like the MediaWiki format, which is an extension of that of UseModWiki (and of its Oddmuse Wiki fork, which should be better), but I think that DokuWiki’s format is also quite good. Note however that the MediaWiki format is very hard to parse reliably, and it used to have only one ad hoc and hacky parser, written in PHP, in the MediaWiki source. I really dislike the default Kwiki format, and despite the flood of Kwiki plugins, no-one has written a UseModWiki/Oddmuse/MediaWiki-subset format for it yet. I keep intending to do that, but I have not found the time for it yet.
One more recent note is that I have grown to like the concept of ikiwiki and similar wikis, which keep pages along with their history inside a standard version control system, for better reliability than MediaWiki's SQL-based storage or the ad-hoc versioning systems of older wikis.
You can also try to use XHTML 1.1 or XHTML 5 with semantic markup of elements for use as a good generic markup.
Why the Markdown Dialects Should be Avoided as much as Possible
There are too many Markdown dialects (e.g., GitHub's, reddit's, Stack Exchange's), each one with its own army and navy (= fragmentation and incompatibilities). Moreover, they can only be converted to XHTML.
LaTeX and TeX
All that put aside, I should note that if you are thinking about using TeX or LaTeX, please reconsider. TeX/LaTeX are very convenient for generating PostScript or PDF but:
The only thing that can understand TeX is tex. I believe it was said much earlier than when Tom Christiansen ported that idiom to the Perl world. It is in fact much more true for TeX than it is for Perl.
Conversion of LaTeX to DocBook or HTML often doesn’t work quite well. Often, the tools are outdated and generate old or invalid HTML, and often they break on more complex LaTeX. TeX and LaTeX are Turing-complete, and the syntax is incredibly problematic.
LaTeX has poor support for hypertext, and other PDF niceties.
PDF and PostScript, which are the default and least error-prone TeX output formats, have relatively poor accessibility and internationalisation. For example, to my understanding, bi-directional text (mixed Arabic-English text, etc.) is rendered visually.
It is easier to convert semantic XHTML or DocBook/XML to LaTeX than the other way around.
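To illustrate the last point, here is a minimal sketch of why the XHTML-to-LaTeX direction is the easy one: semantic XHTML is a regular tree, so a small recursive function covers a useful subset. The to_latex function and the handful of elements it maps are assumptions chosen for this example, and it is nowhere near a complete converter. The reverse direction would need a parser for TeX's macro language.

```python
import xml.etree.ElementTree as ET

# Characters that have a special meaning in LaTeX running text.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

# (text before, text after) for the few elements this sketch handles.
# Any other element is passed through as its content only.
WRAPPERS = {
    "h1": (r"\section{", "}\n\n"),
    "h2": (r"\subsection{", "}\n\n"),
    "p": ("", "\n\n"),
    "em": (r"\emph{", "}"),
    "strong": (r"\textbf{", "}"),
    "code": (r"\texttt{", "}"),
}


def escape(text):
    """Escape LaTeX special characters one at a time (None becomes '')."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in (text or ""))


def to_latex(elem):
    """Recursively convert an XHTML element tree into a LaTeX fragment."""
    tag = elem.tag.rsplit("}", 1)[-1]  # drop an XML namespace, if any
    before, after = WRAPPERS.get(tag, ("", ""))
    inner = escape(elem.text)
    for child in elem:
        inner += to_latex(child) + escape(child.tail)
    return before + inner + after


if __name__ == "__main__":
    xhtml = (
        "<body><h1>Formats &amp; tools</h1>"
        "<p>Use <em>semantic</em> markup, 100% of the time.</p></body>"
    )
    print(to_latex(ET.fromstring(xhtml)))
```

Everything the converter needs to know is in the element names, which is what semantic markup buys you.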
LaTeX is much less verbose than DocBook/XML, but I think you can find a better format. It is still usable for writing texts with lots of mathematical formulae, but it is still a very problematic format. When working with LaTeX, I often get obscure TeX errors from which I can’t immediately tell what exactly went wrong. With DocBook/XML, the tools just report that one tag is missing, or that the order of the tags is incorrect, which takes me much less time to solve.
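As a small illustration of the kind of error report one gets from XML tooling, here is a sketch using Python's standard xml.etree.ElementTree parser on a DocBook-like fragment with a missing closing tag. The first_xml_error helper and the sample document are made up for this example. Note that it only checks well-formedness. Detecting a wrong order of tags requires validating against the DocBook schema with a separate validator, which is not shown here.

```python
import xml.etree.ElementTree as ET

# A DocBook 5-style fragment; the closing </para> is deliberately missing.
BROKEN_DOCBOOK = """<article xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Demo</title>
  <section>
    <title>Intro</title>
    <para>Some text.
  </section>
</article>
"""


def first_xml_error(source: str):
    """Return (line, message) for the first well-formedness error, or None."""
    try:
        ET.fromstring(source)
    except ET.ParseError as err:
        line, _column = err.position  # where the parser gave up
        return line, str(err)
    return None


if __name__ == "__main__":
    problem = first_xml_error(BROKEN_DOCBOOK)
    if problem is None:
        print("well-formed")
    else:
        line, message = problem
        print(f"problem at line {line}: {message}")
```

The parser stops at the </section> line and reports a mismatched tag there, which points almost directly at the unclosed <para> above it.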
One option to convert simple LaTeX to DocBook 5/XML or XHTML is TeX4ht which works reasonably well, and there is also MathJax.
typst
typst is a new markup-based typesetting system that is powerful and easy to learn.
It is also FOSS (Apache-2.0), and has received over 30,000 GitHub stars. However, I have no first-hand experience with it.
Note about Word Processors
One may be tempted to use word processors such as Microsoft Word or LibreOffice Writer, and they provide the XML-based standards of ODF and .docx. However, part of the problem with them is that their conversion to output formats tends to be non-semantic, if not completely invalid, and, at least in the case of MS Word, they have a huge problem with maintaining compatibility of documents created with newer or older versions. As I noted in a Reddit comment, word processors are the kind of programs that are created by smart people for people who are less smart than them, because smart programmers can use text editors and markup languages. I have written a mini-essay about “Selling for people stupider than you”.
Note that in the past, I preferred using WordPerfect, as a word processor, over Microsoft Word (and later LibreOffice Writer) due to its "reveal codes" feature (but WordPerfect was not ported to Microsoft Windows in time for it to be usable). That put aside, I found Excel, Visio, and to a lesser extent PowerPoint, to be useful and usable applications. Nevertheless, I avoid using their desktop versions due to lack of portability, a non-FOSS licence, and the availability of some decent FOSS alternatives.
More about POD
Going full circle now: POD is a good option if it does what you need. The Camel Book and some other Perl books were written in POD. I wrote some documentation for Perl and non-Perl projects in POD. I also write some of my man pages in POD because nroff's syntax scares me.
But if you feel that you want something better, you have many options.
One final note is that, last time I checked, DocBook/XML was problematic to use for bi-directional texts because of implementation or standard problems. Otherwise, its Unicode support should be very good.
See Also
Pandoc - converts between many formats although not always flawlessly.
docmake - a command line tool to translate DocBook/XML to its end formats.
weasyprint - a solution for converting HTML pages to PDFs, which does not require a headless browser engine and implements a subset of CSS.
docbookrx - (an early version of) a DocBook to AsciiDoc converter.
HTML::WikiConverter - convert from HTML to wikis/etc. markups.
Unixdoc - a vision for a unified documentation system.
Coverage and Comments
TODO: Fill in.