8.2. Vocabulary
When working with XML documents, you will encounter several terms that might be unfamiliar. The following example is an XHTML document, which is also an XML document:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>XML Example</title>
</head>
<body background="bg.png">
<p>
Moved to <a href="http://example.org/">example.org</a>.
<br />
foo &amp; bar
</p>
</body>
</html>
The first line is the XML declaration; it specifies the XML version and the encoding of the file. Notice that the line starts with <?. This character combination can cause a problem if you use this file as a PHP script. If the PHP setting short_open_tag is enabled (the default), PHP treats <? as the opening tag of a PHP section. If you work with XML in combination with PHP, set short_open_tag to Off in the php.ini file.
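To see what the encoding part of the declaration does, here is a minimal sketch in Python (used purely for illustration). The parser is the standard library's xml.etree.ElementTree, and the sample bytes are an assumption, not part of the example file. The byte 0xEB means ë in ISO-8859-1, and the parser relies on the declaration to decode it.

```python
import xml.etree.ElementTree as ET

# Illustrative input: the byte 0xEB is "ë" in ISO-8859-1.
raw = b'<?xml version="1.0" encoding="ISO-8859-1" ?><p>No\xebl</p>'

# The parser reads the declaration and decodes the bytes accordingly.
decoded_text = ET.fromstring(raw).text
print(decoded_text)
```

If the declared encoding did not match the bytes actually stored in the file, the parser would either reject the input or produce the wrong characters.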
After the XML declaration, you'll find the DOCTYPE declaration on three lines, enclosed by < and >. In this case, the DOCTYPE declaration specifies that the root element of the XML document is html, that the public identifier of the document type is "-//W3C//DTD XHTML 1.0 Transitional//EN", and that a DTD (Document Type Definition) for this type of document can be found at http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd. A DTD file describes the structure of a document type. Validating parsers can use the DTD to check whether the XML file being parsed is valid with respect to that DTD. Not all parsers are validating parsers; some only check that the document is well-formed. A well-formed document conforms to the syntax rules of the XML standard (for example, every element is closed and elements are properly nested). A valid XML document is well-formed and also conforms to the DTD associated with its document type. To check whether an XHTML (or HTML) document is valid according to its declared document type, you can use the validator available online at http://validator.w3.org.
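The well-formedness half of that distinction can be sketched with a non-validating parser. The example below assumes Python's standard xml.etree.ElementTree module, which checks syntax only and does not validate against a DTD; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

good = "<p>Moved to <a href='http://example.org/'>example.org</a>.</p>"
# Here the <a> element is never closed, so the tags do not nest properly.
bad = "<p>Moved to <a href='http://example.org/'>example.org.</p>"

print(is_well_formed(good), is_well_formed(bad))
```

Checking validity as well would require a validating parser that loads the DTD, which this sketch does not attempt.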
The rest of the document consists of the content itself, starting with the root element (also called the root node):
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
According to the XHTML 1.0 Transitional DTD, the root element (html) must contain an xmlns declaration for the XHTML namespace. A namespace provides a means of mixing two separate document types into one XML document, such as embedding MathML into XHTML.
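As a rough illustration of how a namespace-aware parser sees this declaration, the following sketch uses Python's standard xml.etree.ElementTree module (an assumption made for illustration). The document is a shortened copy of the example, and the prefix x is an arbitrary local choice.

```python
import xml.etree.ElementTree as ET

doc = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>XML Example</title></head>
</html>"""

root = ET.fromstring(doc)
XHTML = "http://www.w3.org/1999/xhtml"

# ElementTree stores element names as "{namespace-URI}local-name".
print(root.tag)

# The xml: prefix is bound to a fixed, reserved namespace.
lang = root.get("{http://www.w3.org/XML/1998/namespace}lang")

# Searches must name the namespace too; "x" is only a local shorthand.
title = root.find("x:head/x:title", {"x": XHTML})
print(lang, title.text)
```

Because every element name carries its namespace URI, elements from a second vocabulary (MathML, for example) can sit in the same tree without their names clashing.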
The child elements of the root node follow:
<head>
<title>XML Example</title>
</head>
<body background="bg.png">
<p>
Moved to <a href="http://example.org/">example.org</a>.
<br />
foo &amp; bar
</p>
</body>
The head tags (<head> and </head>) enclose the nested title element, which specifies the title XML Example.
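The parent and child relationships described here can be made visible by walking the parsed tree. This is a small sketch assuming Python's standard xml.etree.ElementTree module. The document is a shortened copy of the example with the namespace declaration left out to keep the names short, and the outline function is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

doc = """<html>
<head><title>XML Example</title></head>
<body background="bg.png"><p>Moved.</p></body>
</html>"""

def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(doc))
print("\n".join(tree_lines))
```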
The body tag includes the background attribute. Attributes contain extra information about a specific tag. The XML standard requires every attribute to have a value, and the value must be enclosed in single or double quotes. Using one quoting style throughout your document is recommended but not required. In this case, background specifies that the background picture is in the file bg.png. Another well-formed example is <option selected="true"></option>. Writing <option selected></option> is incorrect by XML standards because the selected attribute has no value. (To be valid XHTML and not just well-formed XML, the attribute must be written selected="selected", because the XHTML DTD allows only that value.)
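These attribute rules can be checked with any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree module. The id attribute is added only to show the second quoting style, and parses is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Single and double quotes are both accepted around attribute values.
body = ET.fromstring("<body background='bg.png' id=\"main\"></body>")
print(body.attrib)

def parses(xml_text):
    """Return True if the parser accepts xml_text (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

with_value = parses('<option selected="true"></option>')
without_value = parses("<option selected></option>")  # attribute has no value
print(with_value, without_value)
```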
All opening tags, such as <p>, need a matching closing tag, such as </p>. For elements that have no content, you can merge the opening and closing tags into one. Instead of using <br></br> in your document, you can use <br/>. Because some browsers may have problems parsing <br/>, add a space before the /, so that the resulting tag is <br />.
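To an XML parser the three spellings are equivalent, which the following sketch illustrates. It assumes Python's standard xml.etree.ElementTree module; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

forms = ["<br></br>", "<br/>", "<br />"]
elements = [ET.fromstring(form) for form in forms]

# Each form yields an element named br with no text and no children.
summaries = [(e.tag, e.text, len(e)) for e in elements]
print(summaries)
```

The extra space therefore matters only to older HTML browsers, not to XML parsers.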
Some special characters cause problems in XML documents. For example, < and > delimit tags, so a literal < in your text is treated as the start of a tag, and > is normally escaped as well. Entities let you include such characters without confusing the parser. An entity is a character combination that begins with an ampersand (&) and ends with a semicolon (;), and that you use in your document in place of the special character. For instance, you can use &lt; to represent < and &gt; to represent >. When you use the entities, the characters are included in your document correctly and not treated as tags. Entities are also used to put non-ASCII characters into your XML file, for example, ë or €. The entities for these two symbols are &euml; and &euro;. XML itself predefines only &lt;, &gt;, &amp;, &apos;, and &quot;; named entities such as &euml; are available here because the XHTML DTD defines them. For a fairly complete list of entities, see http://www.w3.org/TR/REC-html40/sgml/entities.html. If you want to use the & character itself, you need to use the entity &amp;, as shown in the example XML file.
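Escaping is usually left to a library function instead of being done by hand. The sketch below assumes Python's standard library (xml.sax.saxutils.escape and xml.etree.ElementTree). The sample string is illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "foo & bar <baz>"

# escape() replaces &, < and > with their predefined entities.
escaped = escape(raw_text)
print(escaped)

# Parsing the escaped text gives back the original characters.
parsed_text = ET.fromstring("<p>" + escaped + "</p>").text

# A bare ampersand, by contrast, is rejected by the parser.
try:
    ET.fromstring("<p>foo & bar</p>")
    bare_ampersand_ok = True
except ET.ParseError:
    bare_ampersand_ok = False
print(parsed_text, bare_ampersand_ok)
```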