1: If you don't need it, throw it away

One thing that many architects don't initially get about XML is that it's just a way to represent information. There's nothing magical about an XML document: It just shows how various pieces of information relate to one another. When you receive a document from an external source that has information you know you'll never use (such as internal reference numbers from that source that have no bearing on your system), toss them! Use an XSLT style sheet or some other mechanism to filter out the information you want to keep and drop the information you don't want. Remember, it's always going to be more efficient to filter the data once (as it comes into your system) than every time you need to access it. Similarly, if you receive information about seven million customers in one monster document, but the information would be more useful to you in separate documents, break it into one document per customer. After all, if you received a fixed-width file from a mainframe system, you almost certainly wouldn't keep it around in that form because it wouldn't be particularly useful. Don't be afraid to dissect, reorganize, or otherwise modify XML documents to suit your needs.

2: Don't use XML for searching

XML documents (by themselves) are not well suited to being searched. Because they're just flat text, any of XML's native searching mechanisms (such as XPath) must parse the entire document (or documents) to locate the piece (or pieces) you're interested in. If you're trying to work with that single document with information about seven million customers, searching would be extremely inefficient. If you break the document up into smaller documents -- say, one per customer -- the problem still occurs: To find the particular customer you're looking for, you need to parse each document until you find the appropriate one. The only good solution for searching XML documents is to introduce some sort of indexing mechanism -- either a relational database index or some sort of native XML indexing tool -- that significantly reduces the amount of information that has to be processed to locate the document (or document fragment) you're interested in. When you have data-oriented information (as opposed to text-oriented information such as a book manuscript), a relational database is well suited for this task, and it provides other benefits, as you'll see in the next tip.

3: Don't use XML for summarization

Summarizing information stored in XML documents is also very inefficient. The native language provided by XPath contains only the bare minimum of aggregation functionality, and even this is not easily usable if the information you want to summarize is found in more than one document. Also, summarization presents the same problem as searching: Each document must be parsed to discover and extract the information being summarized. Again, I recommend indexing the information, thus reducing the amount of information to sift to discover the pieces that are being operated on. Alternatively, you could generate an additional document that contains summary information as detail XML documents are introduced into the system. However, that would not allow you to do ad hoc summarization, and it can be a bit of a management chore. For the best flexibility for summarization tasks, a relational database is really the only good choice; most off-the-shelf XML indexers do not expose the indexes themselves for direct programmatic manipulation.

4: Use XML to drive rendering

One real power of XML lies in its ability (via XSLT) to render its contents to various other forms. This is especially crucial if your system needs to support various means of data consumption -- through an HTML interface such as a desktop Web browser, through a portable device using WML, or to a data-transfer standard agreed upon by your industry. Relational data can drive rendering, too, but it's not as good at the job. Each possible rendering requires significant coding time. Also, if a request is received to render a piece of information that you have stored as an atomic XML document (such as a single customer), you can do so without touching the indexing system, which frees up cycles on that system to support the searching and summarization of the data as necessary.



Keywords:Tips of XML for Data,XML , XML document,XSLT style sheet , mechanism , filter,customers , monster document,XML indexing tool, benefits,XML indexers,desktop Web browser, WML, data