This tutorial will teach you the basics of XML. It is divided into sections such as XML Basics, Advanced XML, and XML tools.
Before you continue, you should have a basic understanding of HTML. Keep in mind that HTML is about displaying information, while XML is about carrying information.
You should see something similar to Figure 2. What happens if you need to transform your own XML document into an XML document that meets the needs of another organization or person?
Not to worry — XSLT will save the day! You see, Web browsers only supply collapsible tree formatting for XML documents without style sheets. XML documents that result from a style sheet transformation are displayed without any styling at all, or at best are treated as HTML — not at all the desired result.
There are several things that need to be added to your style sheet, though, to signal to the browser that the document is more than a plain XML file. First, we declare a default namespace for tags without prefixes in the style sheet. Next, we flesh out the output element to describe the output document type more fully. In addition to the method and indent attributes, we specify a number of new attributes. Note in particular that the output omits the XML declaration: Internet Explorer for Windows displays XHTML documents in Quirks Mode when this declaration is present, so by omitting it we ensure that this browser displays the document in the more desirable Standards Compliance Mode.
The rest of the style sheet is as it was for the HTML output example we saw above.

Now we need to identify exactly what information to track for our news items, binary files, and Web copy.
We must also use XML to manage and track site administrators. Compared to our article content type, news will be fairly straightforward. We will need to track these pieces of information:

The easiest way to keep track of copy is to treat each piece a little like an article. An XML document that tracks a piece of Web copy will look like this:

We will need to keep track of each administrator on the site, as these are the people who can log in and make changes to advertisement copy, articles, news pieces, and binary files.
After that, you should have enough working knowledge of XML and its wacky family to really start development.

In many contexts, consistency can be a beautiful thing. Remember that XML allows you to create any kind of language you want.
In many cases, as long as you follow the rules of well-formedness, just about anything goes in XML. However, there will come a time when you need your XML document to follow some rules — to pass a validity test — and those times will require that your XML data be consistently formatted.
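To make the difference between well-formed and valid concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module (our choice for illustration, not something the tutorial itself uses). ElementTree is a non-validating parser: it enforces well-formedness but never checks a document against a DTD, so an invented element is accepted while mis-nested tags are rejected. The helper name is_well_formed and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def is_well_formed(text: str) -> Tuple[bool, Optional[Tuple[int, int]]]:
    """Return (True, None) if text parses, else (False, (line, column))."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        # ParseError.position is the (line, column) where parsing failed.
        return False, err.position


# Invented vocabulary is fine: nothing says what <shoesize> must contain.
made_up = "<letter><to>Mother</to><shoesize>9</shoesize></letter>"
# Tags closed in the wrong order break well-formedness.
mis_nested = "<letter><to>Mother</letter></to>"

print(is_well_formed(made_up))        # (True, None)
print(is_well_formed(mis_nested)[0])  # False
```

Passing this test says nothing about validity; enforcing rules such as "a letter must contain a to element" needs one of the mechanisms described next.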
What we need is a way to enforce that kind of rule. In XML, there are two ways to set up consistency rules: DTDs and XML Schemas. A DTD (document type definition) is a tried-and-true, if old-fashioned, way of achieving consistency. Each of these technologies contains lots of hidden nooks and crannies crammed with rules, exceptions, notations, and side stories.
Speaking of side stories, did you know that DTD actually stands for two things? It stands not just for document type definition, but also document type declaration.
The declaration consists of the lines of code that make up the definition.

As for why you would want consistency in the first place, many possible answers spring to mind. Most importantly, using a system to ensure consistency allows your XML documents to interact with all kinds of applications, contexts, and business systems, not just your own.

The way DTDs work is relatively simple. A DTD might look something like this:
Those of you who are paying attention should have noticed some remarkable similarities between this DTD and the Letter to Mother example that we worked on in Chapter 2, XML in Practice.
In fact, if you look closely, each line of the DTD provides a clue as to how our letter should be structured. The first line, which declares the letter element, is called an element declaration. You can declare elements in any order you want, but every element used in the document must be declared in the DTD.
A DTD element declaration consists of a tag name and a definition in parentheses. The parentheses hold the rules for what the element may contain. In this case, we want the letter element to contain, in order, the elements to, from, and message. As you can see, the sequence of child elements is comma-delimited. To be more precise, the sequence specifies not only the order in which the elements should appear, but also how many of each element should appear.
In this case, the element declaration specifies that one of each element must appear in the sequence.
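A validating parser performs this check for you, but a rough Python sketch shows what the sequence model (to, from, message) actually demands: the list of child tag names must equal the declared list exactly, with no extras, omissions, or reordering. The function matches_sequence and the sample letters are hypothetical, and the sketch ignores everything a real DTD validator also checks (text content, attributes, and so on).

```python
import xml.etree.ElementTree as ET
from typing import List

# Mirrors the sequence model <!ELEMENT letter (to, from, message)>:
# exactly one of each child, in exactly this order.
LETTER_SEQUENCE = ["to", "from", "message"]


def matches_sequence(elem: ET.Element, expected: List[str]) -> bool:
    """True if elem's child tags equal `expected`, in order, one of each."""
    return [child.tag for child in elem] == expected


good = ET.fromstring(
    "<letter><to>Mom</to><from>Tom</from><message>Hi!</message></letter>")
two_froms = ET.fromstring(
    "<letter><to>Mom</to><from>Tom</from><from>Tom</from>"
    "<message>Hi!</message></letter>")
out_of_order = ET.fromstring(
    "<letter><message>Hi!</message><to>Mom</to><from>Tom</from></letter>")

for name, doc in [("good", good), ("two_froms", two_froms),
                  ("out_of_order", out_of_order)]:
    print(name, matches_sequence(doc, LETTER_SEQUENCE))
```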
If our file contained two from elements, for example, it would be as invalid as if it listed the message element before to. But what if you need an element to be optional, or to appear more than once? You can specify that with a neat little system of notation, defined in Table 3. After the letter declaration, we see three more declarations, one each for to, from, and message, and all three use the (#PCDATA) notation. Whenever you see this notation in a DTD, you know that the element must contain only text.
This notation allows the paragraph element to contain any combination of plain text and b, i, u, and highpriority elements. Note that with mixed content like this, you have no control over the number or order of the elements that are used. What about elements such as hr and br, which in HTML contain no content at all?
These are called empty elements, and are declared in a DTD with the EMPTY keyword (for example, <!ELEMENT br EMPTY>). Remember attributes? An attribute declaration is structured differently from an element declaration. For one thing, we define it with <!ATTLIST instead of <!ELEMENT.
Also, we must include in the declaration the name of the element that contains the attribute(s), followed by a list of the attributes and their possible values. An attribute of type CDATA, for example, can contain any string of characters or numbers.
Marking such an attribute #IMPLIED means, in DTD-speak, that the attribute is optional. The gender attribute is stricter: instead of allowing any arbitrary text, the DTD limits its values to either male or female, and makes the attribute #REQUIRED. If, in our document, an actor element fails to contain a gender attribute, or contains a gender attribute with a value other than male or female, then our document would be deemed invalid.
The actorid attribute has been designated an ID. In DTD-speak, an ID attribute must contain a unique value, which is handy for product codes, database keys, and other identifying factors. In our example, we want the actorid attribute to uniquely identify each actor in the list. The ID type set for the actorid attribute ensures that our XML document is valid if and only if a unique actorid is assigned to each actor.
Incidentally, if you want to declare an attribute that must contain a reference to a unique ID that is assigned to an element somewhere in the document, you can declare it with the IDREF attribute type.
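The attribute rules above can be imitated in a short Python sketch, which may help show what a validator is actually checking. The declarations in the comment, the role element, and its actorref attribute are hypothetical stand-ins for the tutorial's example. The sketch also encodes one detail that is easy to miss: an ID value must be a legal XML name, so it cannot start with a digit. It simplifies in one respect: a real DTD requires ID values to be unique across every ID attribute in the whole document, not just among actor elements.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical declarations these checks imitate:
#   <!ATTLIST actor actorid ID #REQUIRED
#                   gender (male | female) #REQUIRED>
#   <!ATTLIST role  actorref IDREF #REQUIRED>

# ASCII-only approximation of XML's Name production: ID values may not
# start with a digit, so "123" is not a legal ID while "a123" is.
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def check_actors(root: ET.Element) -> List[str]:
    """Report enumeration and ID problems on every actor element."""
    errors = []
    seen = set()
    for actor in root.iter("actor"):
        gender = actor.get("gender")
        if gender not in ("male", "female"):
            errors.append(f"bad or missing gender: {gender!r}")
        actorid = actor.get("actorid")
        if actorid is None or not NAME_RE.fullmatch(actorid):
            errors.append(f"bad or missing actorid: {actorid!r}")
        elif actorid in seen:
            errors.append(f"duplicate actorid: {actorid}")
        else:
            seen.add(actorid)
    return errors


def dangling_idrefs(root: ET.Element) -> List[str]:
    """Return actorref values that point at no actorid in the document."""
    ids = {a.get("actorid") for a in root.iter("actor")}
    return [r.get("actorref") for r in root.iter("role")
            if r.get("actorref") not in ids]


cast = ET.fromstring(
    '<film>'
    '<actor actorid="a1" gender="female">Ann</actor>'
    '<actor actorid="a1" gender="male">Bob</actor>'
    '<actor actorid="a2">Cy</actor>'
    '<role actorref="a2"/><role actorref="a9"/>'
    '</film>')

print(check_actors(cast))     # duplicate a1, missing gender on the third actor
print(dangling_idrefs(cast))  # ['a9']
```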
An entity is a piece of XML code that can be used and reused in a document with an entity reference. There are different types of entities, including general, parameter, and external. General entities are basically used as substitutes for commonly used segments of XML code. For example, an entity declaration could hold a company's copyright notice, which you then insert wherever you need it with an entity reference. Parameter entities are both defined and referenced within DTDs.
What this says is that each of the elements paragraph, intro, sidebar, and note can contain regular text as well as b, i, u, citation, and dialog elements. Not only does the use of a parameter entity reduce typing, it also simplifies maintenance of the DTD.
External entities point to external information that can be copied into your XML document at runtime. For example, you could include a stock ticker, inventory list, or other file, using an external entity.
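To see a general entity at work, the following Python sketch declares one in an internal DTD subset and lets the standard library parser expand the reference. The entity name copyright and its text are hypothetical placeholders. Python's ElementTree (built on expat) replaces references to internally declared entities with their text while parsing; it does not, however, validate the rest of the DTD.

```python
import xml.etree.ElementTree as ET

# A document with an internal DTD subset that declares one general entity.
# The entity name and its replacement text are hypothetical placeholders.
DOC = """<?xml version="1.0"?>
<!DOCTYPE page [
  <!ENTITY copyright "Copyright Example Corp. All rights reserved.">
]>
<page>
  <body>Welcome!</body>
  <footer>&copyright;</footer>
</page>"""

root = ET.fromstring(DOC)
# The parser has replaced the &copyright; reference with the declared text.
print(root.findtext("footer"))
```

Changing the notice now means editing a single declaration instead of every footer.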
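An external DTD is usually a file with a .dtd extension. First, edit the XML declaration to include the standalone="no" attribute, which signals that the document may rely on external markup declarations. Then point the DOCTYPE declaration at the file, for example <!DOCTYPE letter SYSTEM "letter.dtd">; this will search for letter.dtd relative to the location of the XML document. If the DTD lives on a Web server, you might point to its URL instead.

XML Schema, the other way to enforce consistency, has its own strengths; most notably, it provides very fine control over the kinds of data contained in an element or attribute. Now for the major drawbacks: most of the criticism aimed at XML Schema focuses on its complexity and length.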
Okay, now you know a lot more about DTDs than you did before. Suppose you have been asked to create a DTD for your company's internal memos. The first thing you do is take a look at the dozens of corporate memos you and your colleagues have received in the past few months. After a day or two of close examination, a pattern emerges. Although your first impulse might be to run out and create a sample XML memo document, please resist that urge for now.
Because these memos are internal to the company, and there may be a need for a separate external memo DOCTYPE, you decide to use internalmemo as your root element name.
The first element, the root element, is internalmemo. This element will contain all the other elements, which hold the date, sender, recipients, subject line, and all other information. Because there are a lot of these elements, it is useful to split your document into two logical partitions, a header and a body. The header will contain the recipients, subject line, date, and other information.
The body will contain the actual text of the memo. In DTD syntax, the above declaration states that our internalmemo element must contain one header element and one body element. Next, we will indicate which elements these will contain. In DTD syntax, the above declaration states that the header element must contain one each of the date, sender, and recipients elements, an optional blind-recipients element, and then a subject element.
In DTD syntax, the above declaration states that the body element must contain one or more para elements, followed by a single sig element. Most of the other elements will contain plain text, except the para elements, in which we will allow bold and italic text formatting. That was simple enough.

Next, consider administrative details such as a memo's priority. Those pieces of information are hardly ever displayed on a document; they are used only for administrative purposes.
In any case, we want to be able to control the data that document creators put in for values such as priority. The best way to store these pieces of information is to add them as attributes to the root element. To do that, we need to add an attribute declaration to our DTD.
The result should look a lot like Figure 3. Do you see how, under Results, it reads "No errors or warnings found."? In Dreamweaver MX, the results list for a valid document is simply empty, and the status bar beneath the list reads "Complete." What happens if some things are out of place, for example, if the sender element is moved out of its prescribed position? Notice that Dreamweaver MX tells you where the problem lies with a specific line number and provides a description of the problem. The validator catches that too, as you can see in Figure 3.
Figure 3. Error resulting from a misplaced element.

Again, the validator gives you a line number and a description that can lead you to resolve the problem. All you need to do is put the sender element back in the prescribed order, and the document will validate once more. Up to this point, we embedded the DTD right into the file; once it lives in an external file, you have a reusable DTD that you can apply to other internal memos.
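The memo DTD's content models are essentially regular expressions over child element names, which is why a validator can pinpoint a misplaced sender element. The Python sketch below rewrites the three declarations described above, (header, body), (date, sender, recipients, blind-recipients?, subject), and (para+, sig), as regular expressions and reports any element whose children break its model. It is an illustration only: the MODELS table and helper names are hypothetical, and text content and attributes are ignored.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Content models from the memo DTD, rewritten as regular expressions over
# child tag names (each followed by one space). The DTD's ? and + indicators
# map directly onto the regex ? and + quantifiers.
MODELS = {
    "internalmemo": re.compile(r"header body "),
    "header": re.compile(r"date sender recipients (?:blind-recipients )?subject "),
    "body": re.compile(r"(?:para )+sig "),
}


def child_string(elem: ET.Element) -> str:
    return "".join(child.tag + " " for child in elem)


def invalid_elements(root: ET.Element) -> List[str]:
    """Tags of elements whose children break their content model."""
    return [elem.tag for elem in root.iter()
            if elem.tag in MODELS
            and not MODELS[elem.tag].fullmatch(child_string(elem))]


HEADER = ("<date>Monday</date><sender>Ann</sender>"
          "<recipients>All staff</recipients><subject>Parking</subject>")
BODY = ("<body><para>Lot B closes Friday.</para><para>Use Lot C.</para>"
        "<sig>Ann</sig></body>")

good = ET.fromstring(f"<internalmemo><header>{HEADER}</header>{BODY}</internalmemo>")
# Same memo, but with sender moved after recipients.
misplaced = ET.fromstring(
    "<internalmemo><header><date>Monday</date><recipients>All staff</recipients>"
    "<sender>Ann</sender><subject>Parking</subject></header>"
    f"{BODY}</internalmemo>")

print(invalid_elements(good))       # []
print(invalid_elements(misplaced))  # ['header']
```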
We now understand articles, news stories, binary files, and Web copy, and are well on our way to completing the requirements-gathering phase of the project — we can start coding soon! If you recall, we are tracking author, status, keyword, and other vital information in separate files. That is, each individual article, news story, binary file, and Web copy file keeps track of its own keywords, status, author, and dates.
If we wanted to display all documents for a certain author, we would have to dig through all of our files to find all the matches. Never fear — I have a proposal that will solve this problem. In fact, the rest of this chapter will be devoted to tackling this issue. With any luck, it will also give you some insights into the ways in which you can analyze requirements and come up with more architecturally sound XML designs.
The other problem is a little less obvious. Suppose the same author's name has been entered in three slightly different ways. To our application, these three names are different, and articles will thus be listed under three different authors. To solve this problem, we should create a separate listing of authors. Once we have this figured out, we can get rid of the author element in all the other content types, and replace it with an authorid element. Handling our authors this way also allows us to track other information about authors, such as their email addresses, their bylines (in case they want to publish under pseudonyms), and so on.
Instead of a separate author element, we would add an authorid element to our articles. All we need to do is use this author ID in our articles, news stories, and all other content we add to our CMS; the ID is used to look up the author and retrieve the information we need.
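Here is a minimal Python sketch of the lookup this design enables. It loads a central author listing into a dictionary keyed by ID, then groups articles by the authorid they reference, which is exactly the "all documents for a certain author" query that was painful before. The layout of the author listing (an id attribute plus name, byline, and email elements) and all sample data are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

# Hypothetical central author listing; element and attribute names are assumptions.
AUTHORS_XML = """<authors>
  <author id="auth1"><name>Jane Doe</name><byline>J. D. Writer</byline>
    <email>jane@example.com</email></author>
  <author id="auth2"><name>John Roe</name><byline>John Roe</byline>
    <email>john@example.com</email></author>
</authors>"""

# Hypothetical articles that refer to authors only through authorid.
ARTICLES = [
    "<article id='art1'><headline>XML Basics</headline><authorid>auth1</authorid></article>",
    "<article id='art2'><headline>DTD Tips</headline><authorid>auth2</authorid></article>",
    "<article id='art3'><headline>XPath Tricks</headline><authorid>auth1</authorid></article>",
]


def load_authors(xml_text: str) -> Dict[str, Dict[str, str]]:
    """Map each author id to that author's details."""
    root = ET.fromstring(xml_text)
    return {a.get("id"): {"name": a.findtext("name"),
                          "byline": a.findtext("byline"),
                          "email": a.findtext("email")}
            for a in root.findall("author")}


def headlines_by_author(article_texts: List[str]) -> Dict[str, List[str]]:
    """Group article headlines under the authorid they reference."""
    index: Dict[str, List[str]] = defaultdict(list)
    for text in article_texts:
        article = ET.fromstring(text)
        index[article.findtext("authorid")].append(article.findtext("headline"))
    return dict(index)


authors = load_authors(AUTHORS_XML)
for author_id, headlines in headlines_by_author(ARTICLES).items():
    print(authors[author_id]["byline"], "->", headlines)
```

Because each author's details live in one place, a corrected email address or a new pen name takes effect everywhere at once.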
The big question remains: do we need DTDs for our content types? To be completely honest, most articles, news stories, and such will be submitted to the site through our administrative tool. This tool will have the necessary forms that will restrict data entry to certain fields.
In other words, our administrative tool will do most of the work of validating our content. However, I think it would be good practice to develop a DTD for our article content type — after all, this is one of the most important document types we have in our system, and it has to be done right.
Although we have declared our body element to contain character data, our article bodies will indeed be formatted using HTML tags.
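The text does not spell out how HTML-formatted bodies fit into an element declared as character data. One common approach, assumed here rather than taken from the tutorial, is to wrap the HTML in a CDATA section or to escape its markup characters; either way the parser treats the tags as plain text, so the body element has no child elements. The Python sketch below (using the standard ElementTree module, with a hypothetical helper body_text) shows that both forms yield the same text.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# HTML wrapped in a CDATA section: the parser sees it as character data.
cdata_doc = "<article><body><![CDATA[<p>Hello, <b>world</b>!</p>]]></body></article>"
# The same HTML with its markup characters escaped as entity references.
escaped_doc = ("<article><body>&lt;p&gt;Hello, &lt;b&gt;world&lt;/b&gt;!"
               "&lt;/p&gt;</body></article>")


def body_text(doc: str) -> Tuple[int, Optional[str]]:
    """Return (number of child elements, text) for the article's body."""
    body = ET.fromstring(doc).find("body")
    return len(body), body.text


for doc in (cdata_doc, escaped_doc):
    # Zero children in both cases: the HTML tags are just text inside body.
    print(body_text(doc))
```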
Try writing DTDs for the other content types as well.

Earlier, we used XSLT to transform an XML letter to mother into something that could be displayed in a browser window. XPath is used in a variety of applications and technologies; however, XSLT is where its power and versatility really shine.
For all intents and purposes, XPath is a query language. It uses a simple notation that is very similar to directory paths (hence the name XPath). When we put together a template, we normally use XPath to establish a match. For example, a template whose match attribute is set to / will always handle the root of an XML document.
With XPath, you can select all elements that have a particular tag name. Or, you could match certain elements depending on their location within an XML file. As you can see, the basic XPath syntax looks a lot like a file path on your computer. But you can go a step further and set conditions on which elements are matched within your specified path.
These conditions are called predicates, and appear within square brackets following the element name you wish to set conditions for. The @ symbol identifies priority in this example as an attribute name, not a tag name. XPath also has a number of useful functions built in. For example, if you need to grab the first or last element of a series, you can use XPath to do so (the last() function returns the position of the last node in the current context). Although most practical applications are relatively simple, XPath can get quite twisty when it needs to be. The XPath Recommendation is a useful reference to these areas of complexity.
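Outside XSLT, you can experiment with these ideas in Python. The standard ElementTree module supports only a limited subset of XPath, but that subset covers the path steps, attribute predicates, and positional predicates just described. The newslist and item element names below are hypothetical; the priority attribute echoes the example in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical news listing; element names are assumptions for illustration.
NEWS = """<newslist>
  <item priority="high"><title>Server upgrade tonight</title></item>
  <item priority="low"><title>New logo unveiled</title></item>
  <item priority="high"><title>Security patch released</title></item>
</newslist>"""

root = ET.fromstring(NEWS)

# A path, much like a directory path: title elements inside item elements.
titles = [t.text for t in root.findall("item/title")]
# A predicate in square brackets; @ marks priority as an attribute name.
urgent = [i.findtext("title") for i in root.findall("item[@priority='high']")]
# Positional predicates pick the first and last item in the series.
first = root.find("item[1]").findtext("title")
last = root.find("item[last()]").findtext("title")

print(titles, urgent, first, last, sep="\n")
```

Full XPath 1.0, as used in XSLT, goes much further (axes, arithmetic, string functions), so treat this as a way to get a feel for the syntax rather than a complete implementation.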
Book chapters provide an excellent opportunity to understand the arbitrary complexity of most XML documents. To a reader, a chapter seems simple enough; from the perspective of an XML document designer, however, a book chapter can be intimidatingly complex. Chapters can have titles and sections, and those sections can have titles. There are paragraphs throughout: some belong to the chapter (for example, introductory paragraphs), but others belong to sections.
Sections can contain subsections. Paragraphs can contain text in italics, bold text, and other inline markup. In fact, one could even have different types of paragraphs, like notes, warnings, and tips.
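Because sections can nest to any depth, code that processes a chapter usually has to be recursive. The Python sketch below uses a hypothetical chapter vocabulary (chapter, section, title, para, note, b, i) to print an indented outline and to distinguish paragraphs that belong to the chapter from those that belong to sections. The markup is an assumption for illustration, not a recommended design.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical chapter markup: titles, intro paragraphs, nested sections, a note.
CHAPTER = """<chapter>
  <title>Keeping It Consistent</title>
  <para>Introductory paragraph that belongs to the chapter.</para>
  <section>
    <title>Elements</title>
    <para>A paragraph that belongs to a section.</para>
    <section>
      <title>Empty Elements</title>
      <note>Notes are a special kind of paragraph.</note>
    </section>
  </section>
  <section>
    <title>Attributes</title>
    <para>Text with <b>bold</b> and <i>italic</i> inline markup.</para>
  </section>
</chapter>"""


def outline(elem: ET.Element, depth: int = 0) -> List[str]:
    """Indented list of titles, recursing into sections of any depth."""
    lines = []
    title = elem.findtext("title")
    if title is not None:
        lines.append("  " * depth + title)
    for section in elem.findall("section"):
        lines.extend(outline(section, depth + 1))
    return lines


root = ET.fromstring(CHAPTER)
print("\n".join(outline(root)))
# Paragraphs that belong directly to the chapter versus to its sections.
print(len(root.findall("para")), len(root.findall(".//section/para")))
```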
There are lots of possibilities for displaying these kinds of information.

An XML document is a string of characters. Almost every legal Unicode character may appear in an XML document.

Processor and application: The processor analyzes the markup and passes structured information to an application. The specification places requirements on what an XML processor must do and not do, but the application is outside its scope. The processor (as the specification calls it) is often referred to colloquially as an XML parser.

Markup and content: The characters making up an XML document are divided into markup and content, which may be distinguished by the application of simple syntactic rules. Generally, strings that constitute markup either begin with the character < and end with >, or begin with the character & and end with ;. Strings of characters that are not markup are content. In addition, whitespace before and after the outermost element is classified as markup.

Element: An element is a logical document component that either begins with a start-tag and ends with a matching end-tag, or consists only of an empty-element tag. The characters between the start-tag and end-tag, if any, are the element's content, and may contain markup, including other elements, which are called child elements.

Attribute: An attribute is a markup construct consisting of a name-value pair that exists within a start-tag or empty-element tag.
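The split between processor and application is easy to observe with Python's built-in expat binding. In the sketch below, expat plays the processor: it analyzes the markup and calls back into our code (the application) with element names, attribute name-value pairs, and content. The helper collect_events and the sample document are hypothetical; note how the empty-element tag produces both a start and an end event.

```python
import xml.parsers.expat
from typing import Any, List, Tuple


def collect_events(xml_text: str) -> List[Tuple[Any, ...]]:
    """Run expat (the processor) and record what it hands the application."""
    events: List[Tuple[Any, ...]] = []

    def start(name, attrs):  # start-tag or empty-element tag
        events.append(("start", name, attrs))

    def end(name):  # end-tag, or the end of an empty-element tag
        events.append(("end", name))

    def text(data):  # content; skip whitespace-only runs between tags
        if data.strip():
            events.append(("text", data))

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver each contiguous text run in one call
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text
    parser.Parse(xml_text, True)
    return events


DOC = '<note><greeting lang="en">Hello, world!</greeting><line-break/></note>'
for event in collect_events(DOC):
    print(event)
```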
What kind of requirements do we need to gather? Essentially, requirements fall into three major categories: What kind of content will the CMS handle? How is each type of content broken down?
Who will be visiting the site, and what behaviors do these users expect to find? For example, will they want to browse a hierarchical list of articles, search for articles by keyword, see links to related articles, or all three?
What do the site administrators need to do? For example, they may need to log in securely, create content, edit content, publish content, and delete content.
If your CMS will provide different roles for administrative users (such as site administrators, editors, and writers), your system will become more complex.

In the world of XML, each different type of content, such as an article or a news story, is naturally enough called a document type. You also have to know how each of these content types breaks out into its separate components, or metadata.
Each article, for instance, will have various pieces of metadata, such as a headline, author name, and keywords, each of which the CMS needs to track. The final challenge — to define various types of metadata — can be a blessing in disguise. In my experience, once people grasp the importance of metadata, they race off in every direction and collect every single piece of metadata they can find about a given content type.
For example, the client might start to track the date on which an article is first drafted. When was it first published? When should it automatically be removed from the site, or archived?
How is this document uniquely identified in the system? Who holds the copyright to it? What other content is it related to? Which keywords describe the content for indexing or search purposes (in other words, how do we find the content)? Who should have access to the content: the entire public, only site subscribers, or company staff?
Does the CMS view an article body as being separate from headings and paragraphs, or are all these items seen as one big lump of XML? Gathering metadata can be very tricky. At first glance, we could say that all of our articles should contain elements for author name and email address, and leave it at that. However, we may later decide that we want site visitors to search or browse articles by author.
In this case, it would make more sense to have a centralized list of authors, each with his or her own unique ID. Having a separate author listing would also allow us to easily set bylines for each author, in case someone decided they wanted to publish pieces under a pen name. It would also allow us to track author information across content types. Of course, agreeing on this approach means that we need to do other work later on, such as building administrative interfaces for author listings.
The other two are site functionality and site design. Every piece of metadata could potentially drive some kind of site behavior, but each piece of metadata also must be managed by the administration tools you set up.
Site Behavior

Site behavior should always be based on and driven by metadata. Typical site behavior for a CMS-powered Website includes browsing by content categories, browsing by author, searching on titles and keywords, dynamic news sidebars, and more. Additionally, many XML- and database-powered sites feature homepages that boast dynamically updated content, such as Top Ten Downloads, latest news headlines, and so on.
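As a small illustration of metadata driving site behavior, the Python sketch below implements "searching on titles and keywords" over a handful of article documents. The article markup (an id attribute, a headline, and a keywords list) and the function search are hypothetical. A real site would build an index rather than reparse every document on each query, but the dependency on metadata is the same: if keywords are not captured, this feature cannot exist.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical article metadata; element names are assumptions.
ARTICLES = [
    """<article id="art1"><headline>Getting Started with XML</headline>
         <keywords><keyword>xml</keyword><keyword>basics</keyword></keywords></article>""",
    """<article id="art2"><headline>Writing Your First DTD</headline>
         <keywords><keyword>dtd</keyword><keyword>validation</keyword></keywords></article>""",
    """<article id="art3"><headline>XPath for Beginners</headline>
         <keywords><keyword>xpath</keyword><keyword>basics</keyword></keywords></article>""",
]


def search(article_texts: List[str], term: str) -> List[str]:
    """Return ids of articles whose headline or keywords match term."""
    term = term.lower()
    hits = []
    for text in article_texts:
        article = ET.fromstring(text)
        headline = (article.findtext("headline") or "").lower()
        keywords = {k.text.lower() for k in article.iter("keyword") if k.text}
        # Substring match on the headline, exact match on keywords.
        if term in headline or term in keywords:
            hits.append(article.get("id"))
    return hits


print(search(ARTICLES, "basics"))
print(search(ARTICLES, "DTD"))
```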
The administration tool will also have to manage pieces of information that have nothing to do with content types, such as which users are authorized to log in to the CMS, and the privileges each of them has.
It goes without saying that your administrative interface has to be secure; otherwise, anyone could click to your CMS and start deleting content, making unauthorized changes to existing content, or adding new content that you may not want to have on your site.
A workflow is simply a set of rules that allow you to define who does what, when, and how. For example, your workflow might stipulate that a user with writer privileges may create an article, but that only a production editor can approve that content for publication on the site.
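A workflow rule like this boils down to a table of which roles may perform which actions, checked before every state change. The Python sketch below encodes only the two rules from the example (writers create, production editors approve); the table, the action names, and the status field are hypothetical, and a real CMS would also record who made each change and when.

```python
from typing import Dict, Set

# Hypothetical rule table: the actions each role may perform on an article.
WORKFLOW: Dict[str, Set[str]] = {
    "writer": {"create"},
    "production editor": {"approve"},
}


def can(role: str, action: str) -> bool:
    """True if the workflow allows `role` to perform `action`."""
    return action in WORKFLOW.get(role, set())


def approve(article: Dict[str, str], role: str) -> Dict[str, str]:
    """Return a copy of the article marked approved, if the role allows it."""
    if not can(role, "approve"):
        raise PermissionError(f"{role} may not approve content")
    return {**article, "status": "approved"}


draft = {"id": "art1", "status": "draft"}
print(approve(draft, "production editor")["status"])
try:
    approve(draft, "writer")
except PermissionError as err:
    print(err)
```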
In many cases, CMS workflows emulate actual workflows that exist in publication and marketing departments.

Defining your Content Types

We want to publish articles and news stories on our site. We definitely want to keep track of authors and site administrators, and we also want to build a search engine. Whenever I build an XML-powered application, I try to define the content types first, because I find that all the other elements cascade from there.

Articles

The articles in our CMS will be the mainstay of our site.
In addition to the article text, each of our articles will be endowed with the following pieces of metadata: A unique identifier.