Thursday, August 23, 2007

XML is a Markup Language

Duh John... Why waste your time blogging the obvious?
XML is a wonderful tool, but tools are designed for specific purposes. You may be able to use them for other tasks (I've been known to use a cordless drill to pound in a nail), but tools are most useful when applied to the tasks for which they were designed (don't get me started on Swiss Army Knives).



What Markup Languages are good at:

Markup languages were originally created to add formatting information to raw text.

Markup languages let you take a raw text string such as:

The quick brown fox jumped over the lazy dog.

and render it as something much more pleasing to the eye such as:

The quick brown fox jumped over the lazy dog.

Markup languages are really quite simple to understand. They let you delineate sections of text and specify the attributes that should be applied to each section of text. That's pretty much all there is to it.

HTML and XML are two of the best-known markup languages. In both, sections of text are delineated by start tags <> and end tags </>. In HTML the set of tags is pre-defined (although new tags do get added on occasion). In XML you define your own tags. In both HTML and XML most tags can have multiple attributes... for example a <FONT> tag can specify the font style, weight, color, etc.
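Here's a minimal sketch of that idea in Python, using the standard library's xml.etree.ElementTree. The <SENTENCE> tag and the color and weight attributes are made up for illustration; only <FONT> comes from the paragraph above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one sentence with one marked-up section of text.
fragment = (
    '<SENTENCE>The quick '
    '<FONT color="brown" weight="bold">brown fox</FONT>'
    ' jumped over the lazy dog.</SENTENCE>'
)

root = ET.fromstring(fragment)
font = root.find("FONT")

marked_text = font.text          # the section of text the tags delineate
attributes = dict(font.attrib)   # the attributes applied to that section

print(marked_text, attributes)
```

The start tag and end tag mark off "brown fox", and the attributes say what should be applied to it. That's the whole trick.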

Programs that render text that is marked up with HTML, such as your favorite Web Browser, generally ignore markup tags and attributes that they don't "know". This behavior goes back to the original purpose for markup languages: Use markup to change the way a string of text is displayed (or printed).

The raw text in an HTML document generally makes sense by itself. Markup changes the way that text is presented... but it does not change the words. Better to display the words without formatting them properly than to alter the words.
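A toy version of that "ignore what you don't know" behavior might look like the sketch below. It uses Python's standard html.parser module. The KNOWN_TAGS set, the ToyRenderer class, and the <sparkle> tag are all invented for this example, and the bracketed output is just a stand-in for real formatting.

```python
from html.parser import HTMLParser

# Hypothetical: the only tags this toy renderer "knows".
KNOWN_TAGS = {"b", "i"}


class ToyRenderer(HTMLParser):
    """Keeps every word; only known tags affect the output."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN_TAGS:
            self.out.append(f"[{tag}]")
        # Unknown start tags are silently dropped.

    def handle_endtag(self, tag):
        if tag in KNOWN_TAGS:
            self.out.append(f"[/{tag}]")
        # Unknown end tags are silently dropped.

    def handle_data(self, data):
        # The words themselves always pass through unchanged.
        self.out.append(data)


def render(html):
    renderer = ToyRenderer()
    renderer.feed(html)
    renderer.close()
    return "".join(renderer.out)


rendered = render("The <b>quick</b> <sparkle>brown</sparkle> fox")
print(rendered)
```

The unknown <sparkle> tag vanishes, but the word "brown" survives. The formatting is lost; the words are not.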

Programs that deal with XML do not have a hard and fast rule on how to deal with markup tags and attributes that they don't "know". The raw text in an XML document does not necessarily make sense by itself. XML is often used to add meaning to a section of text... For example, a tag called <ADDRESS> probably has nothing to do with formatting text... it probably defines a block of text that should be "used" as an address by the reader of the document.

For this reason, if the meaning of an XML markup tag or attribute is not "known", then the entire section of text should often be ignored.
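Contrast that with the HTML case. A reader that follows this advice drops the unknown element and its text together. The sketch below is one way to do it with xml.etree.ElementTree; the <RECORD> wrapper, the <MOOD> tag, and the KNOWN set are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the tags this reader knows how to "use".
KNOWN = {"NAME", "ADDRESS"}

doc = (
    "<RECORD>"
    "<NAME>Bob</NAME>"
    "<MOOD>Late</MOOD>"
    "<ADDRESS>4001 First Street</ADDRESS>"
    "</RECORD>"
)


def read_known(xml_text):
    """Return the text of known child elements; skip unknown ones entirely."""
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        if child.tag not in KNOWN:
            continue  # unknown meaning: ignore the tag AND its text
        result[child.tag] = child.text
    return result


record = read_known(doc)
print(record)
```

Here "Late" never reaches the application, because nobody told the reader what a <MOOD> means.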

XML has a "standard" mechanism for defining the rules that a specific XML document should follow (XSD: XML Schema Definition). For example, some tags should only be used in sections of text that are delineated by another tag. This feature is very helpful if the format of the XML document is very important to your application.

That's pretty much all there is to say about markup languages in general, and XML in particular.

If you need to add formatting information or meaning to raw text, then a markup language is a great tool. If the markup tags are likely to change over time, then XML is a great tool.

What Markup Languages are not good at:

Markup languages are not particularly good programming languages. Markup languages are great when you need a lot of adjectives. Unfortunately, when writing a program you need a lot of verbs:

If Bob Is Late Then Fire Bob.

<IF> <PERSON> Bob <ARRIVAL> Late </ARRIVAL> </PERSON> <THEN> <PERSON> Bob <ACTION> Fire </ACTION> </PERSON> </THEN> </IF>

Yuck!

People do insist on designing XML-ish programming languages (like BPEL), but it's really not a good fit.

Markup languages are also not a particularly good choice if you need to efficiently transmit or parse documents that are highly structured and the structure never (almost never) changes. In many cases, the tags in a marked-up document can take up more space than the raw text in the document. If the structure of the text never (almost never) changes, then it's much more efficient (from a programming standpoint) to have a "map" or "index" for the document rather than to embed markup tags in the document.
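To put a rough number on the space overhead, here's a small sketch that counts tag characters against text characters. It uses the tagged Bob record that appears later in this post, and the regular expression is a simplification that only works for plain tags like these (no attributes containing ">", no comments, no CDATA).

```python
import re

marked_up = (
    "<NAME>Bob</NAME>"
    "<ADDRESS>4001 First Street Culver City California</ADDRESS>"
    "<PHONE>5124655567</PHONE>"
)

# Strip anything that looks like <...> to recover the raw text.
raw_text = re.sub(r"<[^>]+>", "", marked_up)

tag_chars = len(marked_up) - len(raw_text)
text_chars = len(raw_text)

print(f"tags: {tag_chars} characters, text: {text_chars} characters")
```

For this record the tags account for 47 of the 100 characters, so almost half of what gets transmitted is markup. Shorter field values or longer tag names tip the balance past half.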

It's very efficient for a program to parse a document where you "know" that an "address" is always 40 characters long and always starts at character number 5:

Name starts at character 1

Address starts at character 5

Phone starts at character 46

Bob,4001 First Street Culver City California,5124655567
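In Python the "map" boils down to three slices. This sketch converts the 1-based positions above into 0-based slice bounds, and it assumes the name is exactly 3 characters and the phone exactly 10, which is true of this sample but isn't stated as a rule anywhere above.

```python
record = "Bob,4001 First Street Culver City California,5124655567"

# The "map": (start, end) as 0-based, end-exclusive slice bounds.
# Character 1 -> index 0, character 5 -> index 4, character 46 -> index 45.
NAME = (0, 3)       # assumed width: 3 characters
ADDRESS = (4, 44)   # 40 characters, as stated above
PHONE = (45, 55)    # assumed width: 10 characters

name = record[NAME[0]:NAME[1]]
address = record[ADDRESS[0]:ADDRESS[1]]
phone = record[PHONE[0]:PHONE[1]]

print(name, address, phone, sep=" | ")
```

No searching, no comparing characters along the way. The program jumps straight to each field.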

It's less efficient for a program to scan through a document looking for an <ADDRESS> tag:


<NAME>Bob</NAME><ADDRESS>4001 First Street Culver City California</ADDRESS><PHONE>5124655567</PHONE>
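A deliberately naive sketch of that scan is below. The extract function is made up for illustration and is not a real XML parser (it ignores attributes, nesting, and escaping), but it shows the extra work: the program has to search the document for the start tag and then search again for the end tag before it can touch the field.

```python
doc = (
    "<NAME>Bob</NAME>"
    "<ADDRESS>4001 First Street Culver City California</ADDRESS>"
    "<PHONE>5124655567</PHONE>"
)


def extract(document, tag):
    """Return the text between <tag> and </tag>, or None if not found."""
    open_tag = f"<{tag}>"
    close_tag = f"</{tag}>"

    start = document.find(open_tag)       # first search: the start tag
    if start == -1:
        return None
    start += len(open_tag)

    end = document.find(close_tag, start)  # second search: the end tag
    if end == -1:
        return None
    return document[start:end]


address = extract(doc, "ADDRESS")
print(address)
```

Both searches walk the document character by character, and the cost grows with the size of the document and the position of the field.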

If the format of your data never (almost never) changes, and you really, really need to be able to efficiently transmit and parse the data, then a markup language is not your friend.

XML is a Markup Language:

If a Markup Language makes sense in your application, then XML is a great tool.

'Nuff said.
