Album 1 - Close to Home:

A Hypothetical Course Catalog Voyage

This album contains several snapshots related to a hypothetical college course catalog marked up using XML. The pictures are oversimplified in order to convey the spirit of XML rather than cover all of its features. The album includes the following snapshots.

Snapshot 1: The document in black and white

In the following XML document, we show a partial catalog markup. The basic idea is that the catalog consists of a list of departments, and within a department, we have a list of courses.


<?xml version="1.0"?>
<catalog>
  <department>
     <dept-name>Computer Science</dept-name>
     <course>
	     <course-num>111</course-num>
	     <title>Fundamentals of Computer Science I</title>
	     <credits>4</credits>
	     <description>An examination of some of the major areas of computer science 
                     such as computer organization, algorithms and data structures, 
                     programming, and the theory of computation. Weekly meetings will 
                     include lectures and a laboratory session.
	     </description>
	  </course>
      <course>
	     <course-num>112</course-num>
	     <title>Fundamentals of Computer Science II</title>
	     <credits>4</credits>
	     <prereq>Computer Science 111</prereq>
	     <description>This course includes a disciplined approach to programming in a 
                     high level, object-oriented programming language building on the 
                     introduction in Computer Science 111. Emphasis is on problem 
                     solving methods and algorithm development. Additional topics will 
                     be chosen from linear data structures, machine organization, 
                     computer graphics, numerical methods and applications of computer 
                     science.
	     </description>
	   </course>
       <course>
	      <course-num>201</course-num>
	      <title>Fundamentals of Computer Science III</title>
	      <credits>3</credits>
	      <prereq>Computer Science 112</prereq>
	     <description>This course is a continuation of Computer Science 112. Emphasis 
                     is on the use and implementation of common data structures, 
                     introductory algorithm analysis and object-oriented design. 
                     Additional topics will be chosen from networking, artificial 
                     intelligence and parallel processing.
	      </description>
	  </course>
      <course>
	      <course-num>120</course-num>
	      <title>Procedural Programming</title>
	      <credits>4</credits>
	      <prereq>Permission of instructor.</prereq>
	      <description>This course includes the fundamentals of traditional procedural 
                     programming with a popular language such as C++. Topics include 
                     the common control structures, arrays, and classes.
	      </description>
	  </course>
  </department>
  <department>
      <dept-name>Mathematics</dept-name>
      <course>
	      <course-num>101</course-num>
	      <title>Calculus I</title>
	      <credits>3</credits>
	      <description>An introduction to the calculus of functions of one variable 
                     including a study of limits, derivatives, extrema, integrals 
                     and the fundamental theorem.
	      </description>
	  </course>
      <course>
	      <course-num>102</course-num>
	      <title>Calculus II</title>
	      <credits>3</credits>
	      <prereq>The equivalent of Mathematics 101.</prereq>
	      <description>A continuation of Mathematics 101, including techniques of 
                     integration, transcendental functions, and infinite series. 
                    (A special section of Mathematics 102 is offered in the fall 
                    term for well-prepared freshmen. See department head for details.)
	      </description>
	  </course>
      <course>
	      <course-num>118</course-num>
	      <title>Introduction to Statistics</title>
	      <credits>3</credits>
	      <prereq>Mathematics 101</prereq>
	      <description>Elementary probability and counting. Mean and variance of 
                     discrete and continuous random variables. Central Limit Theorem. 
                     Confidence intervals and hypothesis tests concerning parameters 
                     of one or two normal populations.
	      </description>
	  </course>
  </department>
  <department>
      <dept-name>Chemistry</dept-name>
      <course>
	      <course-num>100</course-num>
	      <title>Modern Descriptive Chemistry</title>
	      <credits>4</credits>
	      <prereq>Permission of the department.</prereq>
	      <description>An elementary study of the structure and reactions of molecules. 
                     Laboratory work illustrates some fundamental procedures in 
                     chemistry. Designed for non-science students fulfilling general 
                     education requirements or desiring a science elective. No credit 
                     given for this course if a 200-level chemistry course has been 
                     successfully completed. Laboratory course.
	      </description>
	  </course>
      <course>
	      <course-num>104</course-num>
	      <title>The Conceptual Foundations of Quantum Theory</title>
	      <credits>3</credits>
	      <description>An introduction to what is currently the fundamental theory of 
                     nature. Quantum behavior is considered in the context of 
                     classical (Newtonian) notions of waves and particles and is 
                     applied to atomic, molecular, and nuclear systems. The practical 
                     and philosophical implications of quantum theory are considered 
                     in detail. No mathematics beyond high school algebra is assumed.
	      </description>
	  </course>
      <course>
	      <course-num>105</course-num>
	      <title>Foundations of Chemistry</title>
	      <credits>3</credits>
	      <description>An historical review of the development of chemistry, with 
                     emphasis on the applications of chemistry during its development. 
                     Designed particularly for non-science students fulfilling general 
                     education requirements or desiring a science elective. (May not 
                     be used for credit in the interdepartmental major in the natural 
                     sciences and mathematics.)
	      </description>
	  </course>
  </department>
</catalog>


Things you should have noticed

Snapshot 2: Zooming out - the bigger picture

Snapshot 3: Zooming in - some XML syntax rules

Snapshot 4: Looking from another angle - the document structure

The nesting of elements in an XML document provides a parent-child relationship in which an element is the child of the element in which it is immediately nested. Every element except the outermost has a parent. Elements that are nested immediately within the same element are considered siblings. Thus, the entire document can always be viewed structurally as a tree. This is important for revealing the structure of the document and for processing the document in a recursive fashion. Such tree structures can be readily handled by standard computing techniques. The following picture shows part of the tree for the example document.
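The recursive processing mentioned above can be sketched in a few lines of Python using the standard xml.etree.ElementTree module. This is only an illustration: the trimmed-down catalog string and the outline function are assumptions made for the example, not part of the original album.

```python
import xml.etree.ElementTree as ET

# A trimmed-down catalog, assumed here only for illustration.
CATALOG = """<catalog>
  <department>
    <dept-name>Mathematics</dept-name>
    <course>
      <course-num>101</course-num>
      <title>Calculus I</title>
    </course>
  </department>
</catalog>"""


def outline(element, depth=0):
    """Return one indented line per element, visiting children recursively."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating over an element yields its children
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(CATALOG))
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one parent-child step in the tree, and the two lines at the same depth under department are siblings.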

Snapshot 5: Viewing the document with Internet Explorer

Unfortunately, not all popular browsers will display XML documents, although that is due to change with the next versions. At this point, Internet Explorer does provide some XML capabilities. The elements are shown with indentation indicating the nesting. Each element can be expanded or collapsed using the +/- symbols to its left. The following picture shows the document with all of the elements collapsed:

Next, we show the document expanded to show all elements of the second mathematics course.

Snapshot 6: A crime detected

One problem with HTML documents is that syntax rules are not enforced by the browsers. Often end tags are omitted or tag pairs are not nested properly. This is bad because the author does not get feedback as to why the document is not displayed as intended. Furthermore, different browsers may make different guesses as to what was intended.

With XML, all parsers, even those in the browsers, are required to report syntax errors. For example, suppose we forget the end tag </credits> in the first computer science course. Here is the message we get when we try to open the document with Internet Explorer.

This indicates that the parser found the end tag for the course element before finding the end tag for the credits element nested inside the course.
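The same kind of error report can be obtained outside a browser. The following is a minimal sketch using Python's standard xml.etree.ElementTree parser; the shortened course fragment and the check_well_formed helper are assumptions made for the example, and the exact wording of the message depends on the parser.

```python
import xml.etree.ElementTree as ET

# A course fragment with the </credits> end tag deliberately left out.
BROKEN = """<course>
  <course-num>111</course-num>
  <credits>4
</course>"""


def check_well_formed(text):
    """Report whether the text parses as XML, including the parser's message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return f"not well-formed: {err}"
    return "well-formed"


result = check_well_formed(BROKEN)
print(result)
```

The parser stops at the first error rather than guessing what the author meant, which is the behavior contrasted with HTML browsers above.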

Snapshot 7: A glance at an XML editor

As with all developing technologies, there is a multitude of tools for working with XML. As an example, here is a glance at an XML editor called XMLMate:

Snapshot 8: A Cascade (Cascading Style Sheet, that is)

An important idea with XML is to separate content from display. That is, our documents should hold content with meaning. This content should be amenable to display in many forms, only one of which is a visual web page. To achieve this, we can make use of various kinds of style sheets that specify how various content items should be displayed. Thus, one document could be displayed in many ways by using different style sheets. Conversely, one style sheet could be used with many different documents. For example, all of the pages in this project use the same style sheet. By changing the background color, say, in the style sheet, all of the pages will display with the new background.

Since XML allows us to make up our own tags, there is no way for a browser to know how we want the elements to be displayed. It is one thing to build into the browser knowledge of how to display a fixed set of tags like those of HTML, but quite another to allow authors complete freedom in tags. How should we display a <whichwhatwho>? Therefore, it is essential to use some style system unless we want just the default structural view, such as the one shown earlier with Internet Explorer.

One popular type of style sheet supported by the popular browsers is Cascading Style Sheets. These can be used with HTML files or, using Internet Explorer, with XML files. The following is a style sheet called catalog.css that specifies how we want various elements from our catalog tags to be displayed. Note that we are specifying that the prerequisites should not be shown.

    department {display: block;}

    dept-name {font-weight: bold; color: blue; font-size: 24pt;}

    course {display: block; font-size: 14pt; margin: 20pt;}

    title {font-style: italic; color: green;}

    prereq {display: none;}

    description {display: block; margin-left: 30pt;}

By adding the following line

<?xml-stylesheet type="text/css" href="catalog.css"?>

as the second line of the catalog.xml file, we view the document as shown below:

	

Snapshot 9: Transforming the document to HTML with XSL

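XSL (Extensible Stylesheet Language), as the name suggests, is a style language developed specifically for XML documents. There are actually two major components of XSL: a transformation language (XSLT) and a formatting vocabulary (XSL Formatting Objects).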

Here, we focus on XSLT. First, we use XSLT to transform our XML catalog document into an HTML document. To hint at some of the capabilities of XSLT, we have XSLT build a table of contents at the beginning of the document, listing each department as a link to the section where the department's courses are listed. This example is fairly complicated, so we look first at part of the page created from the XML document. Then we will look at pieces of the XSL program used to create the HTML file. As usual, our goal is to give the spirit of the process, not the details. Here is the top of the HTML page created from the transformation. Notice the table of contents at the top, and the various formatting for the items within a course.

In effect, an XSL transformation program consists of templates telling what actions should be carried out when certain kinds of XML nodes are encountered in the tree structure. These actions include putting text into the output file, carrying out XSL looping instructions, and recursive actions to be taken in the subtree under the current node. The following picture shows the main template for the HTML conversion shown above. The match of "/" indicates that we are matching the root node of the tree. This section says that when we encounter this node, we should produce HTML code for the HEAD section, code to set the background color, etc., then produce code for the table of contents, and finally apply the other templates recursively to the rest of the tree. To produce the table of contents, the instructions say to produce HTML code for a list and produce a list item for each dept-name element found in the document.

Below we show the templates for the department elements and the dept-name elements. For a department element, we produce a new paragraph, process the dept-name element for this department and then create the list for the courses of this department. For the dept-name element, we build an anchor for the corresponding item in the table of contents.

Finally, we show the other templates in the file. Each course element generates a list item; the course number, credits and title are in bold, and the prerequisites are in italics.
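The template idea can be imitated in plain Python to give a feel for the process. The sketch below is not XSLT and is not the stylesheet used for this album; it is a rough stand-in in which one function per element type plays the role of a template. The function names, the shortened catalog and the exact HTML produced are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# A shortened catalog, assumed here only for illustration.
CATALOG = """<catalog>
  <department>
    <dept-name>Computer Science</dept-name>
    <course>
      <course-num>111</course-num>
      <title>Fundamentals of Computer Science I</title>
      <credits>4</credits>
    </course>
  </department>
  <department>
    <dept-name>Mathematics</dept-name>
    <course>
      <course-num>102</course-num>
      <title>Calculus II</title>
      <credits>3</credits>
      <prereq>The equivalent of Mathematics 101.</prereq>
    </course>
  </department>
</catalog>"""


def anchor(name):
    """Turn a department name into a link target (hypothetical scheme)."""
    return name.replace(" ", "-")


def render_course(course):
    # Number, title and credits in bold; prerequisites, if any, in italics.
    parts = [f"<b>{escape(course.findtext('course-num'))} "
             f"{escape(course.findtext('title'))} "
             f"({escape(course.findtext('credits'))} credits)</b>"]
    prereq = course.findtext("prereq")
    if prereq is not None:
        parts.append(f"<i>Prerequisite: {escape(prereq)}</i>")
    return "<li>" + " ".join(parts) + "</li>"


def render_department(dept):
    # A paragraph holding the named anchor, then the list of courses.
    name = dept.findtext("dept-name")
    courses = "".join(render_course(c) for c in dept.findall("course"))
    return f'<p><a name="{anchor(name)}">{escape(name)}</a></p><ul>{courses}</ul>'


def render_catalog(root):
    # Table of contents first: one linked item per dept-name in the document.
    toc = "".join(f'<li><a href="#{anchor(n.text)}">{escape(n.text)}</a></li>'
                  for n in root.iter("dept-name"))
    body = "".join(render_department(d) for d in root.findall("department"))
    return f"<html><body><ul>{toc}</ul>{body}</body></html>"


html_page = render_catalog(ET.fromstring(CATALOG))
print(html_page)
```

As in the XSLT version, the top-level function emits the table of contents and then hands each department to the function responsible for it, which in turn hands each course to its own function.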

Snapshot 10: Transforming the document to text with XSL

To illustrate how a single XML document can be used in different formats, we now show the document after it has been transformed into plain text using XSLT:

Below we show part of the XSL file used to do the transformation. Here we use a lot of text nodes. In particular, note how these nodes are used to achieve line breaks and indentations.

Snapshot 11: Look at this! My own language! (DTDs)

Two important features about XML that we have observed so far are

However, without additional information, the parser has no way to know application-specific rules about the structure of elements that we create. For example, the parser would not know that

XML provides the Document Type Definition (DTD) facility to allow us to specify structural rules for our application. An XML document that satisfies the general XML syntax is referred to as well-formed, while a document that also satisfies the rules given in a DTD is said to be valid. A parser that also checks the validity of a document against a specified DTD is said to be a validating parser. The power of this concept is illustrated by noting that within a corporation or within a discipline, a standard set of DTDs can be established and enforced using a validating parser. This ensures consistency of markup and allows the development of applications that assume certain standards.

Below we show a simple DTD for the catalog example:

Here, an * indicates zero or more occurrences of the element. Thus, the first rule states that a catalog element contains a list of department elements. The second rule specifies that a department element consists of a dept-name element followed by a list of course elements. The third rule simply states that a dept-name element consists of character data. The ? symbol indicates that the specified element occurs zero or one time. Thus, the fourth rule requires that a course element consist of a course-num element followed by a title element, followed by a credits element, possibly followed by a prereq element, and then a description element.

Assuming the above file is named "catalog.dtd", we can add the line

<!DOCTYPE catalog SYSTEM "catalog.dtd">

as the second line of our XML document, and a validating parser will check that the document satisfies the rules of the DTD.
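To give a feel for what a validating parser checks, the structural rules described above can be written as regular expressions over the sequence of child element names. The Python sketch below is a hand-rolled stand-in, not a real DTD validator (the standard xml.etree.ElementTree parser does not validate against a DTD); the CONTENT_MODELS table and the validity_errors function are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# The rules described above, as regular expressions over a sequence of child
# tag names, each name followed by one space. Elements not listed here
# (such as dept-name) are not checked by this sketch.
CONTENT_MODELS = {
    "catalog": r"(department )*",
    "department": r"dept-name (course )*",
    "course": r"course-num title credits (prereq )?description ",
}


def validity_errors(element):
    """Collect a message for every element whose children break its rule."""
    errors = []
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, children) is None:
            errors.append(
                f"<{element.tag}> has unexpected children: {children.strip()}")
    for child in element:  # check the subtree recursively
        errors.extend(validity_errors(child))
    return errors


GOOD = (
    "<catalog><department><dept-name>Mathematics</dept-name>"
    "<course><course-num>101</course-num><title>Calculus I</title>"
    "<credits>3</credits><description>Limits.</description></course>"
    "</department></catalog>"
)
# Well-formed, but the course is missing its required credits element.
BAD = GOOD.replace("<credits>3</credits>", "")

good_errors = validity_errors(ET.fromstring(GOOD))
bad_errors = validity_errors(ET.fromstring(BAD))
print(good_errors)
print(bad_errors)
```

Both documents are well-formed, so an ordinary parser accepts them; only the structural check distinguishes the valid one from the invalid one.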

Snapshot 12: Glancing at a DTD editor

There are many tools available for working with XML. Here is a quick glance at a tool that can be used for creating and managing DTDs. This is a demonstration version of XML Authority by Extensibility.

Snapshot 13: Hinting at the database connection

One of the most promising applications of XML is as a standard format for transferring data among different types of applications. For example, data from a database query could be output in XML format. This XML data could then be combined with similar data from other sources and presented as input to another application which produces further XML output, etc. To illustrate this idea, we have an Access database for our catalog data. As shown below, there are two tables linked via the department number, one for the department information and the second for the course data:

There are many tools for producing XML data from any ODBC data source, such as a database or spreadsheet. Here we use a program called ODBC2XML. After setting up an ODBC DSN for the database, we can create an XML file with SQL processing commands to query the database and produce the output in XML format. Our first example is simply to produce an XML file with a list of department names. Here is the XML file specifying the query:

Here, the fourth line issues an SQL query to produce a result called q1 containing all of the rows of the department table. The fifth line produces a line of output for each row in q1. The command inserts the DeptName field from the current row of q1. The output XML document produced is shown below:

Now we move on to a slightly more complicated example. This time we have a nested query.

The outer query at line 3 again selects all of the rows of the department table. At line 4 we again insert the department name. But now at line 5 we issue an inner query (for each row of the outer query) that produces all of the rows from the courses table matching the DeptNum field for the current department (from the outer query). We then use the rows produced by the inner query to pull the information for the various courses. What does this produce as output? You guessed it! The original catalog XML document!
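The nested-query pattern itself does not depend on any particular tool. The following is a minimal sketch in Python using the standard sqlite3 and xml.etree.ElementTree modules in place of Access and ODBC2XML. The in-memory database, the sample rows, and the CourseNum, Title and Credits column names are assumptions made for the example; only the department and courses tables and the DeptNum and DeptName fields come from the description above.

```python
import sqlite3
import xml.etree.ElementTree as ET

# An in-memory stand-in for the catalog database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (DeptNum INTEGER PRIMARY KEY, DeptName TEXT);
    CREATE TABLE courses (DeptNum INTEGER, CourseNum TEXT,
                          Title TEXT, Credits INTEGER);
    INSERT INTO department VALUES (1, 'Computer Science'), (2, 'Mathematics');
    INSERT INTO courses VALUES
        (1, '111', 'Fundamentals of Computer Science I', 4),
        (2, '101', 'Calculus I', 3),
        (2, '102', 'Calculus II', 3);
""")

catalog = ET.Element("catalog")
# Outer query: one <department> element per row of the department table.
for dept_num, dept_name in conn.execute(
        "SELECT DeptNum, DeptName FROM department ORDER BY DeptNum"):
    dept = ET.SubElement(catalog, "department")
    ET.SubElement(dept, "dept-name").text = dept_name
    # Inner query: the courses whose DeptNum matches the current department.
    for num, title, course_credits in conn.execute(
            "SELECT CourseNum, Title, Credits FROM courses "
            "WHERE DeptNum = ? ORDER BY CourseNum", (dept_num,)):
        course = ET.SubElement(dept, "course")
        ET.SubElement(course, "course-num").text = num
        ET.SubElement(course, "title").text = title
        ET.SubElement(course, "credits").text = str(course_credits)

xml_text = ET.tostring(catalog, encoding="unicode")
print(xml_text)
```

The nesting of the two loops mirrors the nesting of the course elements inside each department element in the output.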

The simple picture is that we can store and manage our catalog data in a relational database and then easily produce XML documents from this for various purposes. The bigger picture is that we could use XML as a glue for holding together the various data sources for a large enterprise.