July 11, Intro to XML.

XML evolved from SGML (Standard Generalized Markup Language), a language for defining and representing structured documents. XML was designed to describe data; HTML was designed to display data.
XML stands for Extensible Markup Language. It is a text-based markup language derived from Standard Generalized Markup Language (SGML) and is used for the description of data.
XML has come into common use for the interchange of data over the Internet. This is not an exhaustive list of all the constructs that appear in XML; it provides an introduction to the key constructs most often encountered in day-to-day use.
Character: An XML document is a string of characters. Almost every legal Unicode character may appear in an XML document.

Processor and application: The processor analyzes the markup and passes structured information to an application. The specification places requirements on what an XML processor must and must not do, but the application is outside its scope. The processor (as the specification calls it) is often referred to colloquially as an XML parser.

Markup and content: The characters making up an XML document are divided into markup and content, which may be distinguished by the application of simple syntactic rules. Strings of characters that are not markup are content. In addition, whitespace before and after the outermost element is classified as markup.

The only thing that's important about a namespace string (the value given in an xmlns: declaration) is that it's unique; that's why most namespace definitions look like URLs.
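To illustrate that a namespace string is only an identifier, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The element names and both namespace strings are made up for this example; one looks like a URL and one does not, and the parser treats them the same way without contacting any server.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two <title> elements in two different namespaces.
# The namespace strings are illustrative only; neither is ever fetched.
DOC = (
    '<root xmlns:a="http://example.com/addr" xmlns:b="not-a-url-at-all">'
    "<a:title>Main Street</a:title>"
    "<b:title>Dr.</b:title>"
    "</root>"
)

root = ET.fromstring(DOC)

# ElementTree reports each element as "{namespace-string}local-name".
tags = [child.tag for child in root]
print(tags)
# ['{http://example.com/addr}title', '{not-a-url-at-all}title']
```

The two title elements stay distinct only because their namespace strings differ, which is why uniqueness is the one property that matters.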
The XML parser does not go to the http: address given in a namespace definition. It's confusing, but that's how namespaces work.

So far in this tutorial you've learned about the basic rules of XML documents; that's all well and good, but you need to define the elements you're going to use to represent data.
You'll learn about two ways of doing that in this section. The next couple of sections look at fragments of DTDs. The DTD defines all of the elements used in the sample document. Although the DTD is pretty simple, it makes it clear what combinations of elements are legal.

A handful of symbols carry the meaning of an element declaration. The comma (,) indicates a list of items; all of the elements in the list are required. The question mark (?) indicates that an item is optional; it can appear once or not at all. The plus sign (+) indicates that an item must appear at least once, but can appear any number of times. The asterisk (*) indicates that an item can appear any number of times, including zero. Vertical bars (|) indicate a list of choices; you can choose only one item from the list. Parentheses group elements, and an indicator such as the question mark can be applied to a whole group.

Before going on, a quick note about designing XML document types for flexibility.
Consider the sample name and address document type; I clearly wrote it with U.S. addresses in mind. If you want a DTD or schema that defines rules for other types of addresses, you would have to add a lot more complexity to it. Finally, be aware that in many parts of the world, concepts like title, first name, and last name don't make sense.
The bottom line: If you're going to define the structure of an XML document, you should put as much forethought into your DTD or schema as you would if you were designing a database schema or a data structure in an application.
The more future requirements you can foresee, the easier and cheaper it will be for you to implement them later.

This introductory tutorial doesn't go into great detail about how DTDs work, but there's one more basic topic to cover here: attributes. Using a DTD, you can define the attributes for the elements that will appear in your XML document. DTDs also allow you to define default values for attributes and to enumerate all of the valid values for an attribute.
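As a rough illustration of these DTD features, the sketch below uses Python's standard library. The DTD is a made-up fragment in the spirit of the name and address sample (the element names, the kind attribute, and its values are assumptions, not the tutorial's actual DTD). Python's built-in parser is non-validating, so the code only reads the declarations and applies the attribute default; it does not check the document against the DTD.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat
from xml.parsers.expat import model

# Hypothetical DTD fragment embedded in the document's internal subset.
DOC = """<?xml version="1.0"?>
<!DOCTYPE address [
  <!ELEMENT address (name, street+, city, state?, note*)>
  <!ELEMENT name (title?, first-name, last-name)>
  <!ATTLIST city kind (town|city) "city">
]>
<address><city>Springfield</city></address>"""

# Map expat's quantifier codes back to the DTD occurrence indicators.
QUANT_SYMBOL = {
    model.XML_CQUANT_NONE: "",   # exactly once
    model.XML_CQUANT_OPT: "?",   # once or not at all
    model.XML_CQUANT_REP: "*",   # any number of times, including zero
    model.XML_CQUANT_PLUS: "+",  # at least once
}

def content_models(doc):
    """Return {element name: [child name + occurrence indicator, ...]}."""
    found = {}

    def on_element_decl(name, content):
        # expat passes the content model as (type, quantifier, name, children).
        _type, _quant, _name, children = content
        found[name] = [child[2] + QUANT_SYMBOL[child[1]] for child in children]

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(doc, True)
    return found

models = content_models(DOC)
print(models["address"])  # ['name', 'street+', 'city', 'state?', 'note*']
print(models["name"])     # ['title?', 'first-name', 'last-name']

# The document never writes kind="...", so the parser supplies the default.
city = ET.fromstring(DOC).find("city")
print(city.get("kind"))
```

Checking that a kind value is really one of the enumerated choices requires a validating parser, which the standard library does not include.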
Thus, you can do a very limited form of data validation.

XML schemas have several advantages over DTDs. The sample schema adds two constraints to what the sample DTD defines. Although the schema is much longer than the DTD, it expresses more clearly what a valid document looks like.

Most of the elements contain text; defining them is simple. You merely declare the new element and give it an xsd: datatype. The sample schema also defines constraints for the content of two elements. This summary only scratches the surface of what XML schemas can do; there are entire books written on the subject. For the purpose of this introduction, suffice it to say that XML schemas are a very powerful and flexible way to describe what a valid XML document looks like.

This section takes a look at a variety of programming interfaces for XML.
These interfaces give developers a consistent way to work with XML documents. There are many APIs available; this section looks at four of the most popular and generally useful ones, among them the DOM, SAX, and JDOM. With the DOM, the parser reads in the entire document and builds an in-memory tree, so your code can then use the DOM interfaces to manipulate the tree.
You can move through the tree to see what the original document contained, you can delete sections of the tree, you can rearrange the tree, add new branches, and so on. The DOM provides a rich set of functions that you can use to interpret and manipulate an XML document, but those functions come at a price.
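The tree operations just described can be sketched with Python's xml.dom.minidom, a small DOM implementation in the standard library. The document content and the country element are invented for this example.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document.
DOC = "<address><name>Mary</name><city>Raleigh</city><state>NC</state></address>"

def rework(doc_text):
    dom = parseString(doc_text)          # the whole document becomes a tree
    root = dom.documentElement

    # Move through the tree to see what the original document contained.
    original = [node.tagName for node in root.childNodes
                if node.nodeType == node.ELEMENT_NODE]

    # Delete a section of the tree.
    root.removeChild(root.getElementsByTagName("state")[0])

    # Add a new branch.
    country = dom.createElement("country")
    country.appendChild(dom.createTextNode("US"))
    root.appendChild(country)

    # Rearrange the tree: move <city> in front of <name>.
    city = root.getElementsByTagName("city")[0]
    root.insertBefore(city, root.firstChild)

    return original, root.toxml()

original, result = rework(DOC)
print(original)  # ['name', 'city', 'state']
print(result)
# <address><city>Raleigh</city><name>Mary</name><country>US</country></address>
```

Every step works on the in-memory tree, which is the price mentioned above: the entire document has to be parsed and held in memory before any of these calls can run.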
The remainder of this section discusses why you might want to use one interface or the other. The main feature of JDOM is that it greatly reduces the amount of code you have to write. Although this introductory tutorial doesn't discuss programming topics in depth, JDOM applications are typically one-third as long as DOM applications, and about half as long as SAX applications. DOM purists, of course, suggest that learning and using the DOM is good discipline that will pay off in the long run.
JDOM doesn't do everything, but for most of the parsing you want to do, it's probably just the thing. There are also methods that allow you to control whether the underlying parser is namespace-aware and whether it uses a DTD or schema to validate the XML document.
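As a loose Python analogue of switching namespace awareness on and off, the standard xml.sax module exposes the setting as a parser feature. This is only a sketch of the idea, not the Java interface the tutorial has in mind; the document and its namespace string are hypothetical. Validation is left out because the standard library's parser is non-validating.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

class NameCollector(ContentHandler):
    """Records the name of every element as the parser reports it."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # Called when namespace processing is off: name is the raw tag name.
        self.names.append(name)

    def startElementNS(self, name, qname, attrs):
        # Called when namespace processing is on: name is (uri, local name).
        self.names.append(name)

def element_names(xml_bytes, namespace_aware):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, namespace_aware)
    handler = NameCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.names

# Hypothetical document with a made-up namespace string.
DOC = (b'<addr:address xmlns:addr="http://example.com/addr">'
       b"<addr:city/></addr:address>")

print(element_names(DOC, False))
# ['addr:address', 'addr:city']
print(element_names(DOC, True))
# [('http://example.com/addr', 'address'), ('http://example.com/addr', 'city')]
```

The same document yields different element names depending on the setting, so the choice has to match what the application code expects.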
To determine which programming interface is right for you, you need to understand the design points of all of the interfaces, and you need to understand what your application needs to do with the XML documents you're going to process.
Consider these questions to help you find the right approach.

A variety of standards exist in the XML universe. In addition to the base XML standard, other standards define schemas, style sheets, links, Web services, security, and other important items. This section covers the most popular standards for XML, and points you to references to find other standards.
This spec, located at w3.
All of the XML document rules discussed earlier in this tutorial are defined here. You can find the namespaces standard at the W3C as well. This tutorial discussed schemas briefly in Defining document content; if you want the complete details on all the things you can do with XML schemas, the primer is the best place to start.

The Extensible Stylesheet Language, XSL, defines a set of elements called formatting objects that describe how data should be formatted.
Although it's primarily designed for generating high-quality printable documents, you can also use formatting objects to generate audio files from XML. The standard is at w3. XPath is defined at w3. You can find the complete SAX specification at www.
At the JDOM site, you can find code, sample programs, and other tools to help you get started. Their wide acceptance is a tribute to the active participation of XML developers worldwide.

By transferring only the necessary leaf-documents instead of an entire original document, and so reducing the amount of data to be transferred, it will be possible to eliminate the annoying latency of loading WWW content over a narrow-bandwidth connection.
It becomes possible to browse a huge document on a device with relatively small memory by processing only the data of some of the leaf-documents at a time. It is possible to use the same Navigation for devices with different architectures. In the case of a client device with a CompactHTML browser, each leaf-document may correspond to a file. In the case of a client device with a WML browser, each leaf-document may correspond to a card, and a group of several cards corresponds to a deck, or file.
XSLT provides a function to manipulate the document tree structure, as well as some useful extensions. However, the objective of Navigation is different from that of Style, as the following analogy of a play explains. Navigation is a scenario to explore a document. The scene, or the leaf-document to be displayed, will shift upon the user's interaction. The story, or the order, may change according to the user's interaction.

Style is an arrangement and decoration of the scene. For example, a document may be displayed in a table format so that users can look through the content at a glance on PCs. The same document, on small devices, may be divided into fragments that are linked to each other so that users can browse the document by following the links (navigation), and may be formatted as a list instead of a table so that it will be legible (style).
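The idea of dividing a document into linked fragments can be sketched in a few lines of Python. This is not the Navigation language the note proposes; it is only an illustration, and the leaf and link element names, the id scheme, and the sample document are all invented for the example.

```python
import xml.etree.ElementTree as ET

def split_into_leaves(doc_xml, item_tag="section"):
    """Split a document into small leaf-documents, each linked to the next."""
    root = ET.fromstring(doc_xml)
    items = root.findall(item_tag)
    leaves = []
    for index, item in enumerate(items):
        # Hypothetical wrapper element for one leaf-document.
        leaf = ET.Element("leaf", {"id": f"leaf{index}"})
        leaf.append(item)
        # Every leaf except the last one links to its successor.
        if index + 1 < len(items):
            link = ET.SubElement(leaf, "link", {"href": f"#leaf{index + 1}"})
            link.text = "next"
        leaves.append(ET.tostring(leaf, encoding="unicode"))
    return leaves

# Hypothetical source document.
DOC = "<doc><section>A</section><section>B</section></doc>"

leaves = split_into_leaves(DOC)
for leaf in leaves:
    print(leaf)
# <leaf id="leaf0"><section>A</section><link href="#leaf1">next</link></leaf>
# <leaf id="leaf1"><section>B</section></leaf>
```

Only the splitting and linking are handled here; how each leaf is presented (table, list, colors) is left to a separate styling step, in line with the separation argued for in the text.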
Separating Navigation from Style will improve the legibility of both; otherwise, it will be hard to tell the flow or view of a document from the complicated and tangled description in a stylesheet.
It will also improve the re-usability of Style and Navigation. For example, when there are two types of client devices with the same screen size but different color depth, it will be possible to use the same Navigation to process a document for two types of devices, while it may be necessary to use two different kinds of Style. This note strongly suggests developing an independent language for Navigation. However, it may be possible to merge the Navigation function into Style.
Even in that case, it should make it possible to specify the Navigation of a document in a non-programmatic way. It consists of multiple blocks, each of which is a set of instructions to create leaf-documents of a specific type.