What does <![CDATA[]]> in XML mean?

CDATA stands for Character Data. It means that the data between these tags may contain text that could be interpreted as XML markup, but should not be: the parser passes everything up to the closing ]]> through as plain text.
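A quick way to check this is to parse the same text twice with Python's standard-library xml.etree.ElementTree: once inside a CDATA section and once with the markup characters escaped as entity references. The <code> element and its contents are made up for the example; the point is only that both spellings yield the same character data.

```python
import xml.etree.ElementTree as ET

# The same character data, written two ways.
with_cdata = "<code><![CDATA[if (a < b && b > c) { swap(a, c); }]]></code>"
escaped = "<code>if (a &lt; b &amp;&amp; b &gt; c) { swap(a, c); }</code>"

# Inside the CDATA section, <, > and & are taken literally;
# outside it they have to be written as &lt;, &gt; and &amp;.
text_cdata = ET.fromstring(with_cdata).text
text_escaped = ET.fromstring(escaped).text

print(text_cdata)                  # if (a < b && b > c) { swap(a, c); }
print(text_cdata == text_escaped)  # True
```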

The key differences between CDATA and comments are:

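  • CDATA content is still part of the document's character data, while a comment is not (a parser may drop comments entirely).
  • Inside CDATA you cannot include the string ]]> (CDEnd), while inside a comment the string -- is not allowed.
  • Parameter entity references are not recognized inside comments. (They are not recognized inside CDATA sections either: a CDATA section is not scanned for entity references of any kind.)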
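The first two points can be checked with Python's xml.etree.ElementTree, which is built on the Expat parser. By default it discards comments, so they only come back if you ask the tree builder for them (TreeBuilder's insert_comments flag, available since Python 3.8), and even then they arrive as separate nodes rather than as text. The helper name parse and the sample document are hypothetical, chosen for this sketch.

```python
import xml.etree.ElementTree as ET

DOC = "<r><!-- a comment: ]]> is fine here --><![CDATA[CDATA: -- is fine here]]></r>"

def parse(text: str, keep_comments: bool = False) -> ET.Element:
    """Parse text; if keep_comments is True, comments become ET.Comment nodes."""
    builder = ET.TreeBuilder(insert_comments=keep_comments)
    return ET.fromstring(text, parser=ET.XMLParser(target=builder))

# Default behaviour: the comment is dropped, the CDATA content is ordinary text.
plain_root = parse(DOC)
print(len(plain_root), repr(plain_root.text))  # 0 'CDATA: -- is fine here'

# Comments can be kept on request, but only as separate nodes, never as text.
commented_root = parse(DOC, keep_comments=True)
comment = commented_root[0]
print(comment.tag is ET.Comment, repr(comment.text))  # True ' a comment: ]]> is fine here '

# Two adjacent dashes inside a comment are not well-formed, so parsing fails.
try:
    ET.fromstring("<r><!-- a -- b --></r>")
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)  # True
```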

To see what this means in practice, consider these three snippets of XML from one well-formed document (the entity declaration belongs in the document's DOCTYPE):

<!ENTITY % MyParamEntity "Has been expanded">

<!--
Within this comment I can use ]]>
and other reserved characters like <,
&, ', and ", but %MyParamEntity; will not be expanded
(if I retrieve the text of this node it will contain
%MyParamEntity; and not "Has been expanded")
and I can't place two dashes next to each other.
-->

<![CDATA[
Within this Character Data block I can
use double dashes as much as I want (along with <, &, ', and "),
but %MyParamEntity; will not be expanded here either: the parser
passes it through as plain text. What I can't use is
the CDEnd sequence; if I need it, I must close this section
between the two brackets and open a new CDATA section for the >.
]]>
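Because nothing inside a CDATA section can be escaped, the usual workaround for a literal ]]> is to end the section between the two brackets and start a new one, so the text is stored as ]] in one section and > in the next. The helper below is a hypothetical sketch of that trick (to_cdata is an illustrative name, not a library function); parsing the result with xml.etree.ElementTree shows the original string comes back unchanged, with %MyParamEntity; still unexpanded.

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every ']]>' across two adjacent sections."""
    # A literal ']]>' would end the section early, so close the section
    # after ']]' and reopen a new one that starts with '>'.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

original = 'Use ]]> and -- and < & freely; %MyParamEntity; stays as typed.'
wrapped = to_cdata(original)
print(wrapped)
# <![CDATA[Use ]]]]><![CDATA[> and -- and < & freely; %MyParamEntity; stays as typed.]]>

# The parser concatenates the adjacent sections back into one string.
parsed = ET.fromstring("<data>" + wrapped + "</data>").text
print(parsed == original)  # True
```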

Why does it look so weird?

The CDATA section is a marked section. In SGML there is both an abstract syntax and a concrete syntax. The abstract syntax of a marked section declaration begins with a markup declaration open (mdo) delimiter followed by a declaration subset open (dso) delimiter. A status keyword comes next, followed by a second declaration subset open (dso) delimiter and then the content of the section. A marked section ends with a marked section close (msc) delimiter followed by a markup declaration close (mdc) delimiter. Therefore the abstract syntax of a marked section declaration is:

mdo dso status-keyword dso my-data msc mdc

The concrete syntax is defined per document, in the SGML declaration associated with that document, and it specifies the delimiters that document uses. The reference concrete syntax defined in ISO 8879:1986 uses these delimiters:

  • Markup declaration open: <!
  • Declaration subset open: [
  • Marked section close: ]]
  • Markup declaration close: >

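But in SGML you are free to define your own concrete syntax, and so to change the characters used as delimiters. XML has no SGML declaration and fixes these delimiters, so in XML they are always <!, [, ]] and >.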

Therefore the default concrete syntax of a marked section declaration is:

<![ status-keyword [my-data]]>

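Possible status keywords are CDATA, RCDATA, IGNORE, INCLUDE and TEMP. XML keeps only CDATA for marked sections in document content; INCLUDE and IGNORE survive as conditional sections in the DTD.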

Which brings us to the XML form, where no whitespace is allowed between the delimiters and the keyword:

<![CDATA[my-data]]>
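The abstract template and the delimiter table can be sketched as a tiny renderer in Python: each delimiter role is a named field, and a marked section is just the roles concatenated in the order mdo dso status-keyword dso my-data msc mdc, with no whitespace between them. The defaults are the reference delimiters listed above, which are also the only ones XML accepts. The Delimiters class, the marked_section function and the alternative delimiter set are invented for illustration and are not part of any SGML tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Delimiters:
    """The four delimiter roles used by a marked section declaration."""
    mdo: str = "<!"   # markup declaration open
    dso: str = "["    # declaration subset open
    msc: str = "]]"   # marked section close
    mdc: str = ">"    # markup declaration close

def marked_section(status: str, data: str, d: Delimiters = Delimiters()) -> str:
    """Render the abstract syntax: mdo dso status-keyword dso my-data msc mdc."""
    return d.mdo + d.dso + status + d.dso + data + d.msc + d.mdc

print(marked_section("CDATA", "my-data"))
# <![CDATA[my-data]]>

# A made-up alternative concrete syntax, to show the delimiters are just parameters.
custom = Delimiters(mdo="{!", dso="(", msc="))", mdc="}")
print(marked_section("CDATA", "my-data", custom))
# {!(CDATA(my-data))}
```

Swapping the field values changes every marked section at once, which mirrors what an SGML declaration does: the abstract syntax fixes the structure, and only the characters vary.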
