8.5. PEAR
In some cases, none of the previous techniques may be appropriate. For example, the DOM XML extension might not be available, or you might want to parse something very specific without building a parser yourself. PEAR contains classes that deal with XML, which might be useful. We'll cover two of them: XML_Tree and XML_RSS. XML_Tree is useful for building XML documents through a tree when the DOM XML extension is not available or when you want to build a document quickly and don't need many features. XML_RSS can parse RSS files, which are XML documents describing the last few items of (for example) a news site.
8.5.1. XML_Tree
Building an XML document with XML_Tree is quite easy, and can be done when the DOM XML extension is not available. You can install this PEAR class by typing pear install XML_Tree at your command prompt. To show you the difference between XML_Tree and the "normal" DOM XML method, we're going to build the same X(HT)ML document again.
<?php
require_once 'XML/Tree.php';
/* Create the document and the root node */
$dom = new XML_Tree;
$html =& $dom->addRoot('html', '',
array (
'xmlns' => 'http://www.w3.org/1999/xhtml',
'xml:lang' => 'en',
'lang' => 'en'
)
);
/* Create head and title elements */
$head =& $html->addChild('head');
$title =& $head->addChild('title', 'XML Example');
/* Create the body and p elements */
$body =& $html->addChild('body', '', array ('background' => 'bg.png'));
$p =& $body->addChild('p');
/* Add the "Moved to" */
$p->addChild(NULL, "Moved to ");
/* Add the a */
$p->addChild('a', 'example.org', array ('href' => 'http://example.org'));
/* Add the ".", br and "foo & bar" */
$p->addChild(NULL, ".");
$p->addChild('br');
$p->addChild(NULL, "foo & bar");
/* Dump the representation */
$dom->dump();
?>
As you can see, it's much easier to add an element with attributes and (simple) content with XML_Tree. For example, look at the following line that adds the a element to the p element:
$p->addChild('a', 'example.org', array ('href' => 'http://example.org'));
Instead of four method calls, you can add it with a one-liner. Of course, the DOM XML extension has many more features than XML_Tree, but for simple tasks, we recommend this excellent PEAR class.
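For comparison, here is roughly what adding the same a element costs with a DOM-style API. This is a minimal sketch that assumes PHP 5's bundled DOM extension (the DOMDocument class); it is written for illustration and is not a copy of any earlier listing.

```php
<?php
/* Sketch only: assumes the bundled DOM extension (DOMDocument) is available */
$dom = new DOMDocument('1.0', 'UTF-8');
$p = $dom->createElement('p');
$dom->appendChild($p);

/* The four calls that the XML_Tree one-liner replaces */
$a = $dom->createElement('a');                         /* 1: create the element */
$a->setAttribute('href', 'http://example.org');        /* 2: set the attribute  */
$a->appendChild($dom->createTextNode('example.org'));  /* 3: add the text       */
$p->appendChild($a);                                   /* 4: attach it to p     */

/* Serialize only the p element, without the XML declaration */
$xml = $dom->saveXML($p);
echo $xml, "\n";
```

The DOM version is more verbose, but each step is explicit, which matters once you need namespaces or want to move nodes around after creating them.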
8.5.2. XML_RSS
RSS (RDF Site Summary, Really Simple Syndication) feeds are a common use of XML. RSS is an XML vocabulary to describe news items, which can then be integrated (also called content syndication) into your own web site. PHP.net has an RSS feed with the latest news items at http://www.php.net/news.rss. You can find the RSS 1.0 specification at http://web.resource.org/rss/1.0/spec, but it is rather dry, and it's much easier to learn from an example. Here is part of the RSS file we're going to parse:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://purl.org/rss/1.0/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
>
<channel rdf:about="http://www.php.net/">
<title>PHP: Hypertext Preprocessor</title>
<link>http://www.php.net/</link>
<description>The PHP scripting language web site</description>
<items>
<rdf:Seq>
<rdf:li rdf:resource="http://qa.php.net/" />
<rdf:li rdf:resource="http://php.net/downloads.php" />
</rdf:Seq>
</items>
</channel>
<!-- RSS-Items -->
<item rdf:about="http://qa.php.net/">
<title>PHP 4.3.5RC1 released!</title>
<link>http://qa.php.net/</link>
<description>PHP 4.3.5RC1 has been released for testing. This is
the first release candidate and should have a very low number
of problems and/or bugs. Nevertheless, please download and test
it as much as possible on real-life applications to uncover any
remaining issues. List of changes can be found in the NEWS
file.</description>
<dc:date>2004-01-12</dc:date>
</item>
<item rdf:about="http://www.php.net/downloads.php">
<title>PHP 5.0 Beta 3 released!</title>
<link>http://www.php.net/downloads.php</link>
<description>PHP 5.0 Beta 3 has been released. The third beta of
PHP is also scheduled to be the last one (barring unexpected
surprises). This beta incorporates dozens of bug fixes since
Beta 2, better XML support and many other improvements, some
of which are documented in the ChangeLog. Some of the key
features of PHP 5 include: PHP 5 features the Zend Engine 2.
XML support has been completely redone in PHP 5, all
extensions are now focused around the excellent libxml2
library (http://www.xmlsoft.org/). SQLite has been bundled
with PHP. For more information on SQLite, please visit their
website. A new SimpleXML extension for easily accessing and
manipulating XML as PHP objects. It can also interface with
the DOM extension and vice-versa. Streams have been greatly
improved, including the ability to access low-level socket
operations on streams.</description>
<dc:date>2003-12-21</dc:date>
</item>
<!-- / RSS-Items PHP/RSS -->
</rdf:RDF>
This RSS file consists of two parts. The first part is the header (the channel element), which describes the site from which the content is syndicated and lists the available items. The second part consists of the news items themselves. We don't want to refetch the RSS file from http://php.net every time a user visits a page that displays this information, so we're going to add some caching. Downloading the file once a day should be sufficient because the news on php.net isn't updated more often than daily. (Other sites might have different policies.)
We're going to use the PEAR::XML_RSS class that we installed with pear install XML_RSS. Here is the script:
<?php
require_once "XML/RSS.php";
$cache_file = "/tmp/php.net.rss";
First, we include the PEAR class and define the location of our cache file.
if (!file_exists($cache_file) ||
(filemtime($cache_file) < time() - 86400))
{
copy("http://www.php.net/news.rss", $cache_file);
}
Next, we check whether the file has been cached before and whether the cache file is too old (86,400 seconds is one day). If it doesn't exist or is too old, we download a new copy from php.net and store it in the cache file.
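As an aside, the freshness test can be wrapped in a small function so that it can be reused and tested separately from the download. The following is only a sketch: the names cache_is_stale() and refresh_cache() are ours, and keeping the old copy when a download fails is our assumption about sensible behavior, not something the script in this section does.

```php
<?php
/* Hypothetical helper: TRUE when $file is missing or older than $max_age seconds */
function cache_is_stale($file, $max_age = 86400)
{
    clearstatcache();  /* make sure filemtime() does not return a cached value */
    if (!file_exists($file)) {
        return true;
    }
    return filemtime($file) < time() - $max_age;
}

/* Hypothetical helper: refresh $file from $source only when the cache is stale */
function refresh_cache($source, $file, $max_age = 86400)
{
    if (!cache_is_stale($file, $max_age)) {
        return true;
    }
    /* Copy to a temporary name so a failed download never clobbers the cache */
    $tmp = $file . '.tmp';
    if (!@copy($source, $tmp)) {
        return file_exists($file);  /* fall back to the old copy, if there is one */
    }
    return rename($tmp, $file);
}
```

With these helpers, the check in the script would shrink to a single call such as refresh_cache("http://www.php.net/news.rss", $cache_file), and the rest of the script would stay the same.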
/* Objects are handles in PHP 5, so "=& new" is not needed (and no longer parses in PHP 7) */
$r = new XML_RSS($cache_file);
$r->parse();
We instantiate the XML_RSS class, passing our RSS file, and call the parse() method. This method parses the RSS file into a structure that other methods can fetch, such as getChannelInfo(), which returns an array containing the title, link, and description of the web site, as shown here:
array(3) {
["title"]=>
string(27) "PHP: Hypertext Preprocessor"
["link"]=>
string(19) "http://www.php.net/"
["description"]=>
string(35) "The PHP scripting language web site"
}
getItems() returns an array with one entry per news item; each entry holds the title, description, and link of that item. In the following code, we use the getItems() method to loop over all items and display them:
foreach ($r->getItems() as $value) {
echo strtoupper($value['title']). "\n";
echo wordwrap($value['description']). "\n";
echo "\t{$value['link']}\n\n";
}
?>
When you run the script, you will see that it outputs the news items from the RSS file:
PHP 4.3.5RC1 RELEASED!
PHP 4.3.5RC1 has been released for testing. This is the first release
candidate and should have a very low number of problems and/or bugs.
Nevertheless, please download and test it as much as possible on real-life
applications to uncover any remaining issues. List of changes can be found
in the NEWS file.
http://qa.php.net/
PHP 5.0 BETA 3 RELEASED!
PHP 5.0 Beta 3 has been released. The third beta of PHP is also
scheduled to be the last one (barring unexpected surprises). This
beta incorporates dozens of bug fixes since Beta 2, better XML
support and many other improvements, some of which are documented in
the ChangeLog. Some of the key features of PHP 5 include: PHP 5
features the Zend Engine 2. XML support has been completely redone in
PHP 5, all extensions are now focused around the excellent libxml2
library (http://www.xmlsoft.org/). SQLite has been bundled with PHP.
For more information on SQLite, please visit their website. A new
SimpleXML extension for easily accessing and manipulating XML as PHP
objects. It can also interface with the DOM extension and vice-versa.
Streams have been greatly improved, including the ability to access
low-level socket operations on streams.
http://www.php.net/downloads.php
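The script prints plain text. When the items are shown on a web page instead, as in the content-syndication scenario described at the start of this section, everything that comes from the feed should be escaped first. The following sketch assumes only that each item is an array with title, description, and link keys, as in the loop shown earlier. The function name render_items_html() is ours and is not part of XML_RSS, and the sample array merely stands in for the return value of getItems().

```php
<?php
/*
 * Hypothetical helper (not part of XML_RSS): turn an array of items,
 * shaped like the return value of getItems(), into an HTML list.
 */
function render_items_html(array $items)
{
    $html = "<ul>\n";
    foreach ($items as $item) {
        /* Escape everything that came from the feed */
        $title = htmlspecialchars($item['title'], ENT_QUOTES, 'UTF-8');
        $link  = htmlspecialchars($item['link'], ENT_QUOTES, 'UTF-8');
        $desc  = htmlspecialchars($item['description'], ENT_QUOTES, 'UTF-8');
        $html .= "<li><a href=\"$link\">$title</a><br />$desc</li>\n";
    }
    return $html . "</ul>\n";
}

/* Sample data standing in for $r->getItems() */
$items = array(
    array(
        'title'       => 'PHP 5.0 Beta 3 released!',
        'link'        => 'http://www.php.net/downloads.php',
        'description' => 'Bug fixes & better XML support',
    ),
);
echo render_items_html($items);
```

In the real script, you would pass $r->getItems() to the function instead of the sample array.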