============================================================
Xerces Perl: The Perl API to the Apache Xerces XML parser
============================================================

$Id: README,v 1.22 2002/12/09 21:11:23 jasons Exp $

LEGAL HOOP JUMPING:
===================
This code is distributed under the terms of the Apache Software
License, Version 1.1. See the file LICENSE for details.

1) Current Release: XML::Xerces 2.3.0-0
========================
XML::Xerces is the Perl API to the Apache project's Xerces XML
parser. It is implemented using the Xerces C++ API, and it provides
access to *most* of the C++ API from Perl.

Because it is based on Xerces-C, XML::Xerces provides a validating
XML parser that makes it easy to give your application the ability
to read and write XML data. A shared library is provided for
parsing, generating, manipulating, and validating XML documents.
XML::Xerces is faithful to the XML 1.0 recommendation and associated
standards (DOM 1.0, DOM 2.0, SAX 1.0, SAX 2.0, Namespaces, and
Schema). The parser provides high performance, modularity, and
scalability. It also provides full support for Unicode.

XML::Xerces implements the vast majority of the Xerces-C API (if you
notice any discrepancies please mail the list
<URL: mailto:xerces-p-dev@xml.apache.org>). The exceptions are
functions in the C++ API that are overloaded to accept different
arguments; some of these currently have only a single version in the
Perl API. This is a simple fix and most of the overloaded functions
are finished, but it will take time to catch them all. Also, there
are some functions in the C++ API which either have better Perl
counterparts (such as file I/O) or which manipulate internal C++
information that has no role in the Perl module.

The majority of the API is created automatically using Simplified
Wrapper Interface Generator (SWIG) <URL: http://www.swig.org/>.
However, care has been taken to make most method invocations natural
to Perl programmers, so a number of rough C++ edges have been
smoothed over (see the Special Perl API Features section).

2) Available Platforms
========================
The code has been tested on the following Unix platforms:

* Linux
* BSD
* Solaris

An early version of XML::Xerces (1.3.3) was ported to Windows. A new
port is underway, and once completed, Windows will become a fully
supported platform.

3) Build Requirements
========================

3.1) ANSI C++ compiler
------------------------
Builds are known to work with the GNU compiler. Ports to other
compilers such as MSVC++ (the Microsoft Visual C++ compiler and
development environment) are in the works. Contributions in this
area are always welcome :-).

3.2) Perl5
------------------------
#### NOTE: ####
Required version: 5.6.0
###############

XML::Xerces now supports Unicode. Since Unicode support wasn't added
to Perl until 5.6.0, you will need to upgrade in order to use this
and future versions of XML::Xerces. Upgrading to at least 5.6.1, the
latest stable release of the 5.6 series, is recommended, but if you
already have 5.6.0 installed it will work fine.

If you plan on using Unicode, I *strongly* recommend upgrading to
Perl-5.8.0, the latest stable version. There have been significant
improvements to Perl's Unicode support.

3.3) The Apache Xerces C++ XML Parser
------------------------
#### NOTE: ####
Required version: 2.3.0
###############

Xerces-C can be downloaded from the Apache archive:

    http://xml.apache.org/dist/xerces-c/stable/

You'll need both the library and header files, and you'll need to
set the environment variables that direct the XML::Xerces build to
the directories where these reside.

4) Development Tools
========================
#### NOTE: ####
These are only for internal XML::Xerces development. If your
intention is solely to use XML::Xerces to write XML applications in
Perl, you will *NOT* need these tools.
###############

4.1) SWIG
------------------------
Simplified Wrapper Interface Generator (SWIG)
<URL: http://www.swig.org/> is an open source tool by David Beazley
of the University of Chicago for automatically generating Perl
wrappers for C and C++ libraries (i.e. *.a or *.so for UNIX, *.dll
for Windows). You can get the source from the SWIG home page
<URL: http://www.swig.org/> and then build it for your platform.

You will only need this if the included Xerces.C and XML::Xerces
files do not work for your Perl distribution. The pre-generated
files have been created by SWIG 1.3 and work under perl-5.005 and
perl-5.6.

This port will only work with SWIG 1.3.17+.

If you're planning to use SWIG, you can set the environment variable
SWIG to the full path to the SWIG executable before running
perl Makefile.PL. For example:

    export SWIG=/usr/bin/swig

This is only necessary if it isn't in your path or you have more
than one version installed.

5) Prepare for the build
========================

5.1) Download XML::Xerces
------------------------
Download the release and its digital signature from the Apache
Xerces-Perl archive <URL: http://xml.apache.org/dist/xerces-p/stable>.

5.2) Verify the archive
------------------------
Optionally verify the release using the supplied digital signature
(see the Apache Xerces-Perl archive
<URL: http://xml.apache.org/xerces-p/download.html> for details).

5.3) Unpack the archive
------------------------
Unpack the archive in a directory of your choice. Example (for
UNIX):

* tar zxvf XML-Xerces-2.3.0-0.tar.gz
* cd XML-Xerces-2.3.0-0

5.4) Examine Makefile.PL
------------------------
Examine the Perl script "Makefile.PL". You shouldn't need to change
any of the information unless you are attempting to build on a
platform other than UNIX, in which case you will probably have to.

5.5) Getting Xerces-C
------------------------
If the Xerces-C library and header files are installed on your
system directly, e.g.
via an rpm or deb package, proceed to the build. Otherwise, you must
download Xerces-C from xml.apache.org and build it. To build
XML::Xerces in this case, make sure the value of your XERCESCROOT
environment variable is the top-level directory of your Xerces-C
distribution (i.e. the same value it needs to have to build
Xerces-C).

If you have built Xerces-C yourself and want to work directly from
the build directory, then you should only need to set the
XERCESCROOT environment variable.

If you have installed Xerces-C on your system, you should only need
to set the XERCES_INCLUDE, XERCES_LIB, and XERCES_CONFIG environment
variables. For example:

    export XERCES_INCLUDE=/usr/include/xerces
    export XERCES_LIB=/usr/lib
    export XERCES_CONFIG=/home/jasons/build/xerces-c-2.3.0/config.status

6) Build XML::Xerces
========================
A) Go to the XML-Xerces-2.3.0-0 directory.

B) Build XML::Xerces as you would any Perl package that you might
   get from CPAN:

* perl Makefile.PL
* make
* make test
* make install

7) Using XML::Xerces
========================
XML::Xerces implements the vast majority of the Xerces-C API (if you
notice any discrepancies please mail the list). Documentation for
this API is sadly not available in POD format, but the Xerces-C HTML
documentation is available online
<URL: http://xml.apache.org/xerces-c/apiDocs/index.html>.

For more information, see the sample scripts DOMCount.pl,
DOMCreate.pl, and DOMPrint.pl in the samples/ directory, or the test
scripts located in the t/ directory (especially the TestUtils.pm
module).
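
As a pre-build sanity check for the environment variables described
in section 5.5, something like the following could be sourced before
running the steps in section 6. This is only a sketch: the function
name xerces_env_mode and the labels it prints are hypothetical, and
checking XERCESCROOT first is an assumption made here, not documented
Makefile.PL behavior.

```shell
# Hypothetical helper (not part of XML::Xerces): report which of the
# two configurations from section 5.5 the current environment matches.
xerces_env_mode() {
    if [ -n "${XERCESCROOT:-}" ]; then
        # Working directly from a Xerces-C build directory
        echo "build-tree"
    elif [ -n "${XERCES_INCLUDE:-}" ] \
        && [ -n "${XERCES_LIB:-}" ] \
        && [ -n "${XERCES_CONFIG:-}" ]; then
        # Xerces-C installed on the system, located via three variables
        echo "installed"
    else
        # Neither set of variables is complete
        echo "unset"
    fi
}
```

One possible use is to call xerces_env_mode before perl Makefile.PL
and stop if it prints "unset", unless Xerces-C was installed from a
package in a standard location.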

8) Special Perl API Features
========================
Even though XML::Xerces is based on the C++ API, it has been
modified in a few ways to make it more accessible to typical Perl
usage, primarily in the handling of:

* String I/O (Perl strings versus XMLCh arrays)
* List I/O (Perl lists versus DOM_NodeList's)
* Hash I/O (Perl hashes versus DOM_NamedNodeMap's)
* DOM Serialization API
* Implementing Perl handlers for C++ event callbacks
* Handling C++ exceptions

8.1) String I/O
------------------------
Any function in the C++ API that returns an XMLCh array will return
a plain vanilla Perl string in XML::Xerces. This obviates calls to
transcode (in fact, it makes them entirely invalid).

8.2) List I/O
------------------------
Any function that returns a DOM_NodeList in the C++ API (e.g.
getChildNodes() and getElementsByTagName()) will return a different
type depending on whether it is called in a list context or a scalar
context. In a scalar context, these functions return a reference to
an XML::Xerces::DOM_NodeList, just like in the C++ API. However, in
a list context they will return a Perl list of XML::Xerces::DOM_Node
references. For example:

    # returns a reference to a XML::Xerces::DOM_NodeList
    my $node_list_ref = $doc->getElementsByTagName('foo');

    # returns a list of XML::Xerces::DOM_Node's
    my @node_list = $doc->getElementsByTagName('foo');

8.3) Hash I/O
------------------------
Any function that returns a DOM_NamedNodeMap in the C++ API (e.g.
getEntities() and getAttributes()) will return a different type
depending on whether it is called in a list context or a scalar
context. In a scalar context, these functions return a reference to
an XML::Xerces::DOM_NamedNodeMap, just like in the C++ API. However,
in a list context they will return a Perl hash.
For example:

    # returns a reference to a XML::Xerces::DOM_NamedNodeMap
    my $attr_map_ref = $element_node->getAttributes();

    # returns a hash of the attributes
    my %attrs = $element_node->getAttributes();

8.4) Serialize API
------------------------
The XML::Xerces::DOMParse module implements a generic serializer API
for DOM trees. See the script DOMPrint.pl for an example of how to
use the API. For less complex usage, just use the serialize() method
defined for all DOM_Node subclasses.

8.5) Implementing {Document,Content,Error}Handlers from Perl
------------------------
Thanks to suggestions from Duncan Cameron, XML::Xerces now has a
handler API that matches the currently used semantics of other Perl
XML APIs. There are three classes available for application writers:

* PerlErrorHandler (SAX 1/2 and DOM 1)
* PerlDocumentHandler (SAX 1)
* PerlContentHandler (SAX 2)

Using these classes is as simple as creating a Perl subclass of the
needed class and redefining any needed methods. For example, to
override the default fatal_error() method of the PerlErrorHandler
class we can include this piece of code within our application:

    use XML::Xerces;

    # Subclass the stock error handler and override fatal_error()
    package MyErrorHandler;
    @ISA = qw(XML::Xerces::PerlErrorHandler);
    sub fatal_error {die "Oops, I got an error\n";}

    # Install an instance of the subclass on the parser
    package main;
    my $dom = new XML::Xerces::DOMParser;
    $dom->setErrorHandler(MyErrorHandler->new());

8.6) Handling exceptions ({XML,DOM,SAX}Exception's)
------------------------
Some errors occur outside parsing and are not caught by the parser's
ErrorHandler. XML::Xerces provides a way of catching these errors
using the PerlExceptionHandler class. There is a default method that
prints out an error message and calls die(), but if more is needed,
see the files t/XMLException.t, t/SAXException.t, and
t/DOMException.t for details on how to roll your own handler.

9) Sample Code
========================
XML::Xerces comes with a number of sample applications:

* SAXCount.pl: Uses the SAX interface to output a count of the
  number of elements in an XML document

* SAX2Count.pl: Uses the SAX2 interface to output a count of the
  number of elements in an XML document

* DOMCount.pl: Uses the DOM interface to output a count of the
  number of elements in an XML document

* DOMPrint.pl: Uses the DOM interface to output a pretty-printed
  version of an XML file to STDOUT

* DOMCreate.pl: Creates a simple XML document using the DOM
  interface and writes it to STDOUT