DEVELOPMENT AND TESTING OF A STANDARDIZED FORMAT FOR DISTRIBUTED LEARNING ASSESSMENT AND EVALUATION USING XML
By
TERESA FERRANDEZ
B.S.C.E., University of Dalarna, 1997
A thesis submitted in partial fulfillment of the requirements for the degree of
Master of Science in Computer Engineering
in the Department of Electrical and Computer Engineering
in the College of Engineering
at the University of Central Florida
Orlando, Florida
Fall Term
1998
ABSTRACT
This document presents development of a markup language created to enable instructors to share instructional material. The language is developed as an Extensible Markup Language (XML), and it is intended to be used as a common file format for Internet distance learning tools. The markup language can model the information of an entire course but the research has been focused towards creating a markup language for requirements of questions and quizzes.
The Instructional Management Systems (IMS) project group, a research group focused on distance learning over the Internet, has defined a set of requirements for distance learning tools. The markup language developed for this thesis complies with these requirements and thus, any distance learning tool using this markup language will also comply with these standards.
Apart from the markup language, a set of tools has also been developed to save the information stored in the XML files to a content server. The overall objective of the content server is to allow users to share materials amongst different classes and sets of users.
ACKNOWLEDGEMENTS
I would like to express my sincere appreciation to those people whose guidance assisted me in the completion of this thesis. Foremost, I would like to thank Dr. Eaglin for his support and encouragement, and Dr. Bauer for serving as my advisor. I would also like to express my gratitude to Dr. Linton and Dr. Klee for serving on my committee.
I would also like to thank my fiancé Lars Norlander for being patient with my neglected domestic chores, and most of all, for always being there to offer support when I needed it.
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 Problem statement
1.2 Thesis purpose and organization
1.3 General approach to problem
1.4 Distance learning tools on the Internet
1.4.1 WebCT
1.4.2 CyberProf
1.5 Related work
1.5.1 Distance Learning Database
1.5.2 WebCT filter
CHAPTER 2 IMS - Instructional metadata systems
2.1 IMS Specifications for Distance Learning
2.2 Usage of the IMS Specification in this implementation
CHAPTER 3 XML
3.1.1 Background
3.1.2 Intention of XML
3.1.3 Future
3.1.4 XML Details
3.2 Other Applications that use XML
3.2.1 Math ML
3.2.2 Music ML
3.2.3 Microsoft Office
3.3 General XML design considerations
3.3.1 Attributes vs. Elements
3.4 DTD vs. Schema
3.4.1 DTD Overview
3.4.2 Schema Overview
3.4.3 Conclusion
3.5 Rendering XML files using XSL
3.5.1 Overview of XSL
3.5.2 Possibilities and limitations
3.5.3 Possible uses for XSL's with SATML
3.6 XML and security issues
CHAPTER 4 Development of SATML
4.1 Review of similar XML initiatives
4.1.1 QML (Question Markup Language)
4.1.2 QML (Quiz Markup Language)
4.1.3 QuizzIT
4.1.4 TML (Tutorial Markup Language)
4.1.5 Instructional Management Systems (IMS)
4.2 Overview of Stored Information
4.3 SATML design
4.3.1 Course Information
4.3.2 Assignments
4.3.3 Questions
4.3.4 Assessments
4.3.5 Course Schedules
4.4 SATML Versions
4.4.1 Version 1
4.4.2 Final version
CHAPTER 5 Development of database converter
5.1 Overview of methods
5.1.1 Java
5.1.2 Parsing API
5.1.3 JDBC - ODBC
5.2 Overview of the development process
5.2.1 Specification
5.2.2 Design
5.2.3 Conclusion
CHAPTER 6 method of evaluation
6.1 Test cases
6.2 Evaluation criteria
6.3 Test results
6.4 Conclusions
CHAPTER 7 Summary
APPENDIX A SATML DTD FILES
APPENDIX B CODE FOR SATML-DATABASE CONVERTERS
APPENDIX C SATML FILES DESCRIBING COURSE EEL5937
APPENDIX D DATABASE DIAGRAM
LIST OF REFERENCES
LIST OF FIGURES
Figure 1 Common file format for Distance learning material
Figure 2 Example of entry form for multiple choice questions in WebCT
Figure 3 Example of a WebCT Multiple Choice file
Figure 4 Example of questions stored in CyberProf
Figure 5 The IMS System Model
Figure 6 XML fragment
Figure 7 Example of MML
Figure 8 Default rendering of the nth root
Figure 9 MusicML code
Figure 10 Example of Music ML rendering
Figure 11 Example of embedded elements
Figure 12 Example of attributes
Figure 13 Books - Example 1
Figure 14 Books - Example 2
Figure 15 Books - Example 3
Figure 16 Books - Example 4
Figure 17 Example of minimalist approach for XML rectangle
Figure 18 Example of extravagant approach for XML rectangle
Figure 19 Employee DTD
Figure 20 Employee XML
Figure 21 Syntax of Attlist
Figure 22 Example of the use of notations in XML
Figure 23 Employee list XML file
Figure 24 Schema for an employee list XML
Figure 25 DCD for bank loans
Figure 26 ElementDef Example
Figure 27 AttributeDef Example
Figure 28 Attribute Example
Figure 29 Group Example
Figure 31 Example of XSL
Figure 32 The root rule
Figure 33 A more specific pattern
Figure 34 Specifying attribute values
Figure 35 Retrieving attributes
Figure 36 Examples of wildcards
Figure 37 first-of-type
Figure 38 QML Structure
Figure 39 QML Example
Figure 40 QML Structure
Figure 41 Quizzit structure
Figure 42 Example of Quizzit XML
Figure 43 TML structure
Figure 44 TML Example
Figure 45 IMS MC-Test Structure
Figure 46 IMS MC-Test Answers structure
Figure 47 IMS Performance-Data structure
Figure 48 IMS Notification Structure
Figure 49 IMS Navigation Structure
Figure 50 IMS MC-Test Example
Figure 51 IMS MC-Test Answers Example
Figure 52 IMS Performance example
Figure 53 IMS Notification example
Figure 54 IMS Navigation example
Figure 55 The structure of a general course
LIST OF TABLES
Table 1 IMS Meta Data fields
Table 2 Examples of Element Declaration
Table 3 Attlist attributes
Table 4 ElementDef Attributes
Table 5 AttributeDef attributes
Table 6 Group attributes
Table 7 Course Information
Table 8 Assignment elements
Table 9 Question elements
Table 10 Assessment Elements
Table 11 Schedule Elements
CHAPTER 1 INTRODUCTION
1.1 Problem statement
There are many distance learning software packages available for the Internet today. Two examples are WebCT, which was developed by the Department of Computer Science at the University of British Columbia, and CyberProf, developed by the Center of Complex System Research at the University of Illinois.
Most tools are similar in the sense that they let instructors create quizzes and perform assessments of the students.
The problem is that there are as many ways of storing the assessments and quizzes as there are tools on the web. Unlike graphics programs, text editors, etc., distance learning tools have no standard file formats, and thus there is no way to import or export data from one tool to another. Most tools do not even provide a means of transferring quizzes, questions, or assessments between different classes created with the same tool.
1.2 Thesis purpose and organization
The objective of this thesis is to develop a standard for storage and transfer of the quizzes, assessments and course data created in current and future distance learning tools.
The work for this thesis includes the development of a file format for the course data and an application for storing the content of the files in databases, as shown in Figure 1.
Figure 1 Common file format for Distance learning material
The file format will be called Standardized Assessment and Testing Markup Language (or SATML), and the draft for the file format will be submitted to the IMS (Instructional Management Systems) Question Interoperability group as a suggestion for the IMS distance learning file format for this type of data.
This thesis addresses the following issues related to the development of the file format and the converter:
Related work.
The parts of the IMS Specification that are related to this thesis.
Background of XML.
Other applications that have used XML as basis for their files.
XML design considerations.
Rendering XML files using XSL.
Information which needs to be stored.
Security issues when transferring XML files over the Internet, such as methods to prevent students from reading the files that contain answers to quizzes and assessments.
Methods for parsing XML files.
1.3 General approach to problem
The plan is to implement the file format as an XML (eXtensible Markup Language), which is described in further detail in Chapter 3.
For the implementation of the filter between the XML layer and the database layer, Java and the Simple API for XML (SAX), developed by the Microstar Corporation, will be used due to their close relationship with XML. All Open Database Connectivity (ODBC) compliant relational databases will be supported, but the focus will be on Oracle and Access because both are available to the researcher and commonly used in industry.
The file format and the filter will be tested by exporting quizzes from WebCT to SATML using a filter developed by Alexander Aguilar, a member of the AIM research team. The SATML files will then be saved to a database and restored to SATML. The different test cases are specified in Chapter 6.
1.4 Distance learning tools on the Internet
Two common distance learning tools on the Internet are WebCT and CyberProf. Both tools are used by professors at the University of Central Florida to complement their traditional classes. Typically, instructors use them to create homework, quizzes, and exams that can be automatically graded. The structure of the files stored by these tools has been used in the research conducted for the development of the SATML file format.
This section contains a review of these tools and a discussion of their advantages and disadvantages.
1.4.1 WebCT
WebCT is a tool in which you can create and distribute distance learning material. WebCT not only provides a web-based study environment but also a web-based course management environment. WebCT contains various tools for communication between the instructor and the students and also among the students, such as chat, bulletin boards, and a conference system. In addition to the communication tools, WebCT provides tools for creating quizzes and assignments, and it also provides student-tracking tools where the student and the instructor can view, among other things, the student's progress.
The user interface used to create the different types of questions in WebCT is very intuitive, as shown in Figure 2.
Figure 2 Example of entry form for multiple choice questions in WebCT
WebCT stores every question in a separate file and the files are only visible inside the course where they were created.
1.4.1.1 Developers
The Department of Computer Science at the University of British Columbia developed WebCT. An introduction to WebCT can be found on the WebCT web site (http://www.webct.com/webct).
1.4.1.2 Influence on SATML
WebCT heavily influenced the design of the SATML's question files for two reasons: a) the quiz structure of WebCT is complete and tested b) many classes at the University of Central Florida have been using WebCT successfully.
WebCT allows the instructor to create four different types of questions: multiple choice, short answer, calculated, and matching. These four types of questions have been adopted in the SATML file format since they seem to cover most of the needs that an instructor might have while creating quizzes.
:::MC:::1:::0
:::TITLE:::Question 24
:::QUESTION:::H
In general, the annual temperature ranges are greater in the Southern Hemisphere than in the Northern Hemisphere.
:::IMAGE:::
:::LAYOUT:::vertical
:::ANSWER1:::0:::H
True
:::ANSWER2:::100:::H
False
Figure 3 Example of a WebCT Multiple Choice file
Figure 3 shows an example of how a multiple choice question is stored in WebCT's flat file database format.
":::MC:::1:::0" states that the question is a multiple choice question where the student is only allowed to choose one of the answers and a wrong answer can never result in a negative score.
The second line ":::TITLE:::Question 24", defines the title of the question.
":::QUESTION:::H"
"In general, the annual temperature ranges are greater in the Southern Hemisphere than in the Northern Hemisphere."
These lines state the text of the question. The H on the first line declares that the question text should be treated as HTML as opposed to regular text.
After the question text optional image links can be inserted to illustrate the question but in this case ":::IMAGE:::" has been left empty which means that the creator of the question has chosen not to provide any images.
The next line, ":::LAYOUT:::vertical", specifies that the answers of the question should be lined up vertically. This line is followed by lines stating the possible answers of the question, where the numbers 0 and 100 indicate how correct the answer is (correctness), and the H indicates that the answer text should be treated as HTML instead of regular ASCII text.
:::ANSWER1:::0:::H
True
:::ANSWER2:::100:::H
False
Optionally, lines providing feedback to a student who selects a particular answer can follow these lines.
Most distance learning tools store similar information about their quizzes and questions and since WebCT contained most of the data used in normal distance learning tools, the WebCT structure was used as the backbone of the SATML structure.
1.4.2 CyberProf
CyberProf is a tool that helps the instructor of a course develop assignments for his students.
Unlike WebCT, CyberProf does not store the questions individually. A file in CyberProf contains an entire assignment. The information is created and stored in an HTML-like manner.
CyberProf has three different question types: symbolic, singleword and multiple_choice, as shown in Figure 4.
What is the sum of 40 and 25?
<question name = q1 type = symbolic solution = 65>
<hint name = q1>
<B>Hint:</B> If you can't add it in your head, use a
calculator!
</hint>
What is the name of the first U.S. President?
<question name = q2 type = singleword solution =
washington;george washington>
Which of these does not belong?
<question name = q3 type = multiple_choice solution = 3>
<option name = q3 value = 1>bird</option>
<option name = q3 value = 2>dog</option>
<option name = q3 value = 3>rock</option>
<option name = q3 value = 4>cat</option>
<option name = q3 value = 5>fish</option>
Figure 4 Example of questions stored in CyberProf
The example above was taken from the web site .
The structure of the CyberProf assignment files is very straightforward and easy to understand, but the questions that can be created in the CyberProf tools are very restricted. CyberProf has one nice feature that WebCT lacks: the possibility of giving the students hints when they are not able to answer the questions.
Since the ability to give the students hints is a major advantage, this feature has been adopted in the creation of the SATML files.
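Because the CyberProf tags use unquoted attribute values, an ordinary XML parser cannot read them directly. The sketch below shows one possible way to extract the question attributes with regular expressions. It is based only on the example shown above; the function name and the assumption that a semicolon separates alternative solutions are illustrative guesses, not documented CyberProf behavior.

```python
import re

# Matches a <question ...> tag and captures everything up to the closing '>'.
QUESTION_TAG = re.compile(r"<question\s+([^>]*)>", re.IGNORECASE)
# Matches 'name = value', where the value runs until the next 'name =' or the end.
ATTRIBUTE = re.compile(r"(\w+)\s*=\s*(.*?)(?=\s+\w+\s*=|\s*$)", re.DOTALL)


def parse_questions(text):
    """Return one attribute dictionary per <question> tag found in text."""
    questions = []
    for match in QUESTION_TAG.finditer(text):
        attrs = {
            name: " ".join(value.split())  # collapse line breaks inside values
            for name, value in ATTRIBUTE.findall(match.group(1))
        }
        questions.append(attrs)
    return questions


SAMPLE = """\
What is the sum of 40 and 25?
<question name = q1 type = symbolic solution = 65>
What is the name of the first U.S. President?
<question name = q2 type = singleword solution =
washington;george washington>
"""

questions = parse_questions(SAMPLE)
# Assumption: ';' separates the accepted answers of a singleword question.
accepted = questions[1]["solution"].split(";")
print(questions[0])
print(accepted)
```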
1.4.2.1 Developers
CyberProf was developed at the Center of Complex System Research at the University of Illinois.
1.5 Related work
The development of the SATML file format is part of a larger research project. Aside from the SATML file format, a filter to save WebCT data in SATML format and a content database to store all the instructional data created at the University have been developed.
1.5.1 Distance Learning Database
The distance learning database has a close relationship to the SATML file format, and thus the structure of the database follows the structure of the SATML files.
The distance learning database is a distributed database and replication is used to keep the parts of the database current.
1.5.2 WebCT filter
The WebCT filter is intended to work like the file filters used in text editors or graphics programs. The concept is to enable instructors who create course content in WebCT to save their work in the common SATML format. The prototype for the filter was created for WebCT, but developers of other distance learning tools who want to make use of the SATML file format can develop similar filters.
The WebCT filter is an add-on and thus it does not affect the internal structure of WebCT.
CHAPTER 2 IMS - Instructional metadata systems
The IMS Project group is a research group formed by academic, commercial, and government organizations to improve the situation of distance learning. It is based on an initiative from Educom. In the article, Robert C. Herrick Jr. states that Educom is an organization founded thirty-four years ago by a group of medical school deans and vice presidents from Duke, Harvard, SUNY, and the universities of California, Illinois, Michigan, Pittsburgh and Virginia, dedicated to the idea that digital computers offered an incredible opportunity for sharing among institutions of higher education.
The IMS project was formed to develop an infrastructure for managing access to learning materials and environments, to speed the development of instructional software, and to facilitate collaborative and authentic learning activities.
The states that there are three major obstacles being addressed by the IMS Project group:
Lack of standards for locating and operating interactive platform-independent materials
Lack of support for the collaborative and dynamic nature of learning
Lack of incentives and structure for developing and sharing content
The IMS working group has developed a prototype for a virtual university environment containing tools for conducting web based classes.
The tools range from student management tools to tools for displaying function graphs on a group of machines simultaneously.
The diagram in Figure 5 and its description show the architecture of an IMS tool as presented in the IMS Specification, version 0.5.
Figure 5 The IMS System Model
The connections between various parts of the Model represent the various types of data exchange and communication in the IMS system.
1. A Search Engine may use meta-data information to query a Content Server for specific kinds of learning material.
2. A Search Engine may query a Profile Server to find persons with desired skill certifications.
3. An authoring tool may query a Profile Server for preference and performance data to customize its presentation.
4. A Search Engine can obtain course meta-data from a Content Server at an organization hosting an IMS server.
5. Authoring tools may exchange content with an IMS Management System.
6. Authoring tools may interact with content servers to find or provide content and content meta-data.
7. A content server may provide content material to an IMS Management System.
8. There will be many content servers reachable by search engines.
Each course or instructional unit created in the IMS prototype can have its own distinct resources, and the resources can be anything from web sites containing instructional material to specific teaching tools.
The idea is that all the resources provided for a class or instructional unit should contain meta-data tags to enable searches, management, and reuse of the material.
2.1 IMS Specifications for Distance Learning
The IMS specification defines a number of meta-data fields that the IMS prototype uses for searches and management of instructional data, as shown in Table 1.
Table 1 IMS Meta Data fields
| Name | Type | Definition |
| Author or Creator | Opt | The person responsible for the creation of the work |
| Coverage | Opt | The coverage of the instructional unit (use of this field is very experimental) |
| Date | Mand | A date associated with the creation or the availability of the resource. May contain several sub elements such as Publication date, Availability date and Expiration date. |
| Description | Mand | A textual description of the contents of the resource. |
| Format | Mand | The format of the resource, ex. Book, html etc. |
| Language | Mand | The language the resource is presented in, ex. US-en |
| Other Contributors | Opt | A person or organization besides the creator of the resource that has helped develop the resource |
| Publisher | Mand | The entity responsible for making the resource available in its present form |
| Relation | Opt | An identifier of a second resource and its relationship with this resource |
| Resource Identifier | Mand | A string or number that uniquely identifies this resource |
| Resource Type | Opt | The type of the resource, ex. Tutorial |
| Rights Management | Opt | A list of what the student or instructor can do with the resource. It currently has two sub fields: agent and use rights. |
| Source | Opt | Information that this resource is based on |
| Subject | Mand | The topic or the keywords of this resource |
| Title | Mand | The title of the resource |
| Agent | Opt | A person responsible for managing a part of a resource. An instructor is an example of an agent. |
| Availability Date | Opt | The date the resource is available for use. |
| Concepts | Opt | Ideas related to the resource |
| Container Type | Mand | The container type is a very broad description of the resource type |
| Expiration Date | Opt | The date that the contents of the resource are no longer valid. |
| Granularity | Opt | The size of the resource: Curriculum, Course, Unit, Topic, Lesson, Fragment, NA (Not applicable) |
| Interactivity Level | Opt | The level of interaction between the user and the container: Low, Medium, High |
| Keywords | Opt | One or more words exemplifying the contents of the course |
| Last Modified Date | Opt | The day the resource was last modified |
| Learning Level | Opt | The difficulty of the material |
| Location | Opt | The URL showing where the resource can be retrieved |
| Meta-Meta-Data | Mand | Information about Meta-data |
| Objectives | Opt | Learning objectives met by the container |
| Pedagogy | Opt | The method used to teach the contents of the resource: Discovery, Expository |
| Platform | Opt | Software and hardware required to use the contents of the resource |
| Prerequisites | Opt | Courses and or capabilities needed to use the material |
| Presentation | Opt | Describes how the materials are presented to the user: Images, Verbal, Sound, Multi-User |
| Price Code | Cond | The price of using a particular offering |
| Publication Date | Opt | The date the resource was first published |
| Role | Opt | The role of the entity serving as the learning resource: Curriculum, Course, Unit, Topic, Lesson, Fragment, NA (Not applicable) |
| Scheme | Mand | A description of the information structure of the resource |
| Size Of | Opt | Size of the container in bytes |
| Structure | Opt | The organization of the material: Linear, Hyperdimensional, Branched, Parceled, Null |
| Use Rights | Opt | What a user can do with the offering: Restricted, Use, Aggregatable, Disaggregatable, Distributable, Editable |
| Use Time | Opt | The average time a normal student would spend on the container (in minutes). |
| User Support | Opt | Indicates whether user support is available or not |
| Version | Mand | The version of the resource |
Not all of the fields defined by the IMS project group need to be present in a resource for IMS compliance.
The center column in Table 1 defines whether a certain field is optional or mandatory. Mandatory does not mean that the field has to be present in the resource; it only means that if the field is present, a value must be given for it. For further information on the IMS meta-data fields consult the IMS web site.
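The rule for mandatory fields can be made concrete with a short sketch. The Python fragment below checks a meta-data record against the requirement levels of a few of the fields listed in the table. The function name and the sample record (including the course title) are hypothetical and serve only to illustrate the rule; they are not taken from the IMS specification.

```python
# Requirement level per field, copied from a few rows of the IMS field table.
FIELD_TYPES = {
    "Title": "Mand",
    "Description": "Mand",
    "Language": "Mand",
    "Keywords": "Opt",
    "Price Code": "Cond",
}


def empty_mandatory_fields(metadata):
    """Return the mandatory fields that are present but have no value.

    A mandatory field that is absent altogether is not reported, because
    the rule only requires a value when the field is present.
    """
    return [
        name
        for name, value in metadata.items()
        if FIELD_TYPES.get(name) == "Mand" and not str(value).strip()
    ]


# Hypothetical record: Language is absent, Description and Keywords are empty.
record = {"Title": "Signals and Systems", "Description": "", "Keywords": ""}
print(empty_mandatory_fields(record))
```

In this sketch only Description is reported: Language is absent, which is allowed, and Keywords is optional, so an empty value is accepted.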
2.2 Usage of the IMS Specification in this implementation
The SATML format consists of several different file types defining different parts of a resource. One of these file types contains the meta-data about the current course or instructional unit. Most of the information about the resource is gathered in this meta-data file.
The SATML Meta data file contains 30 of the 42 fields recommended by the IMS project group, along with about 10 new fields.
CHAPTER 3 XML
XML is a computer template language for describing information. Extensible Markup Languages (XMLs) are very similar to HTML; in fact, HTML 4.0 can be expressed as an XML. An XML file is built from a set of tags (meta-data) that describe the objects in the file, as shown in Figure 6.
<PERSON SEX="Male">
<NAME>Arnold Burns</NAME>
<AGE>23</AGE>
</PERSON>
Figure 6 XML fragment
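To show how a program can use the tags to interpret the data, the following minimal sketch reads the fragment above with the XML parser from the Python standard library. Python is used here only for brevity and is not the language used for the converter in this thesis.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """
<PERSON SEX="Male">
  <NAME>Arnold Burns</NAME>
  <AGE>23</AGE>
</PERSON>
"""

person = ET.fromstring(FRAGMENT)

# The attribute is read from the element itself.
print(person.tag, person.get("SEX"))

# The child elements carry both the data and a label describing the data.
for child in person:
    print(child.tag, "=", child.text)
```

Because every value is labeled, the program does not need to know the order in which the values appear in the file.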
There are many advantages of XML:
XML lets you store information about the data in the files.
XML files are plain text files, which keeps bandwidth requirements modest when they are transferred over the Internet.
Support for rendering and searching XML files exists or is planned in the most common web browsers.
XML is recognized by many large corporations in the computer industry (Microsoft, Netscape...) to be the next major standard for the Internet.
This chapter is based on information from technical reference books and .
3.0.1 Background
People have always wanted to share information, through speech, books, music or arts. But in order to convert regular data into information we have to provide the receiver with information about our data.
Although the meaning of "4, Eddie, A, blue", might be very clear, few people can understand the meaning of that data without any explanation. It might be the age of your daughter, the name of your uncle, your grade in history and the color of your car. The data loses meaning without information about the data to back it up.
This is where the idea of meta-data (data about data) comes in. When HTML was created, the intention was to mark up all the information in the documents according to its meaning, and not according to how it would be rendered in a browser. Document titles should be marked up with the <title> tag, addresses with the <address> tag, etc. Since the browsers often know more about the readers' preferences than the author of the web page, the browsers should take care of the rendering of the page.
However, the browser vendors have decided to put more of the rendering information in the HTML pages, such as <font>, <i>, <b>, etc. instead of leaving the rendering to the browser or special style sheets.
Even if the desirability of separating the information from its rendering is disregarded, two things are still missing in HTML.
First, HTML has tags for marking up titles, headings, lists, addresses, tables, etc., but even though it may be a very rich language for marking up plain text, it is very limited when it comes to marking up the meaning of the contents of a document.
Second, HTML has no enforced internal structure. The way HTML is set up, you can have a level 2 heading <H2> without preceding it with a level 1 heading <H1>.
The W3C put together a working group (XWG) and started developing a standard for markup languages called XML (Extensible Markup Language).
The ideas behind XML have been developing since the 1960s, culminating in the approval of XML's parent, SGML, as an international standard in 1986.
SGML (Standard Generalized Markup Language) is an ISO standard for electronic document exchange, archival and processing using meta-data tags to mark up the information.
In many ways XML is a "purification" of SGML, removing portions of it with limited applications that complicated its very powerful and basically simple central ideas.
3.0.2 Intention of XML
The web site hosted by W3C states that when the XML Working Group (XWG) at W3C started with XML they had 10 goals in mind.
1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.
3.0.3 Future
As more and more companies realize the potential of XML, it is gaining ground not only as the new language for the Internet but also as a new way to store data in traditional applications to make the applications smarter. Users of software applications are getting used to applications keeping track of their preferences, and storing the content of files in an XML format enables applications to display the file contents differently for different users, depending on their preferences.
3.0.4 XML Details
There are three types of XML documents: well-formed, valid, and incorrect.
For an XML document to be well-formed it has to follow a set of syntax rules. A valid XML document, apart from following these rules, also has to conform to a DTD (Document Type Definition), which is explained in more detail in the DTD Overview section.
Documents that do not follow the XML rules are not well-formed and are therefore incorrect.
To verify and/or validate your XML files you can use one of the many tools available on the Internet. An example of a tool that checks whether an XML file is well-formed is XML Notepad by Microsoft, which can be downloaded free of charge from the Microsoft XML Notepad web site.
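A well-formedness check can also be performed programmatically. The sketch below uses the parser from the Python standard library, which reports an error for any document that breaks the XML syntax rules. This parser is non-validating, so the sketch checks well-formedness only; checking validity against a DTD would require a validating parser, which is not shown here. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Matching start and end tags: well-formed.
print(is_well_formed("<AGE>23</AGE>"))

# XML is case-sensitive, so <Note> closed by </note> is not well-formed.
print(is_well_formed("<Note></note>"))
```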
3.1 Other Applications that use XML
Extensible Markup Languages have been created for various areas. This section reviews a few of them in order to show the potential of XML and how XML has been used to mark up a wide range of information.
The objective of this review is to get a feel for how XML can be used, and to get some experience of the process of designing an XML.
3.1.1 MathML
The Mathematical Markup Language was developed because mathematical equations have always been hard to display and model.
Usually images are used to display equations but there are a few problems with images:
You can't search for parts of equations if they are represented by images.
If the font size or type is changed the equations still stay the same.
Images slow the downloading of the document.
Using images makes the text harder to read, since you need more line spacing between the lines that use images than between the rest of the lines.
Images are harder to cut and paste than text.
Images can't be compressed like text.
The W3C formed a working group for developing a standard markup language for mathematics in 1997 and the first draft of the Mathematical Markup Language was submitted in May 1997.
A lot of software has been developed to display and edit the Mathematical Markup Language. The best known of these software products is the MathML browser Amaya.
Figure 7 shows an example of an nth root modeled in MathML.
The nth root of a is given by:
<apply> <root/>
<ci> a </ci>
<ci> n </ci>
</apply>
Figure 7 Example of MML
The default rendering of the nth root in Amaya is shown in Figure 8.
n√a
Figure 8 Default rendering of the nth root
For more information on the MathML draft consult the .
3.1.2 Music ML
The Music Markup Language describes music in the same way songbooks describe music, with chords, notes, etc.
The example in Figure 9 is provided to illustrate how a markup language can be used.
The example below describes a segment of a melody with all its chords, notes and beams.
<Segment>
  <SubSegment position="one">
    <Chord>
      <Note beat="quarter" name="f" level="zero"/>
      <Note beat="quarter" name="b" level="zero"/>
      <Note beat="quarter" name="d" level="plus1"/>
      <Note beat="quarter" name="f" level="plus1"/>
    </Chord>
    <Chord info="Em9-5">
      <Note beat="half" name="c" level="zero"></Note>
      <Note beat="half" name="c" level="plus1"></Note>
    </Chord>
    <Beam size="triple">
      <Note beat="quarter" name="e" level="zero"></Note>
      <Note beat="quarter" name="a" level="zero"></Note>
      <Note beat="quarter" name="b" level="zero"></Note>
    </Beam>
  </SubSegment>
  ...
</Segment>
Figure MusicML code
The XML document is used as a parameter for a Java applet, which renders the music in the fashion shown below.
Figure Example of Music ML rendering
The example code and the rendered image shown above are taken from the MusicML web site.
One of the true advantages of markup languages is the ability to render the same data in different ways. The rendering shown above is the default for the Music Markup Language, but other renderings are possible, such as the creation of MIDI files from MusicML files.
3.1.3 Microsoft Office
It has been reported that the development of the new Microsoft Office suite, code-named Office 9, is focused on making Office documents easier to publish. The default format for Microsoft Office files will be HTML with embedded XML instead of the binary format used in the current versions of Microsoft Office. Embedded XML code will be used to save the metadata about the object that the author of the document has saved in the preference dialog boxes. Office 9 will also use embedded CSS (Cascading Style Sheets) to save the appearance of the documents.
3.2 General XML design considerations
In the newsgroups about XML, the design of XML languages is the topic that generates the most discussion.
In this chapter, the issues of attributes versus elements and links versus embedded data are discussed to explain why certain design decisions were made.
3.2.1 Attributes vs. Elements
The choice between attributes and elements is for the most part only an aesthetic design question, but there are a few rules of thumb that are worth considering when designing a markup language. The following rules of thumb were gathered from Robin Cover's web pages and from postings in the comp.text.sgml newsgroup.
1. Use an embedded element when the information you are encoding is part of the parent element.
<author>
<first-name>Betty</first-name>
<last-name>Boop</last-name>
</author>
Figure Example of embedded elements
2. Use an attribute when the information is inherent to the parent but not a constituent part.
<person height="123">
<head/>
<body/>
</person>
Figure Example of attributes
3. Use attributes for simple data type validation
4. Use elements for complex structure validation
5. Use attributes for things that will not produce ink on the paper.
6. Use elements when the information has an internal structure of its own.
7. Use elements when the information can be contained in more than one element.
8. Use attributes when modeling a static object that will not change over time.
9. Use attributes to represent a link.
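The first two rules can be tried out with a few lines of Python. The sketch below is only an illustration and is not one of the SATML tools; it uses the standard-library xml.etree.ElementTree module to read the two example fragments above, and it shows that child elements are retrieved as parts of the parent while an attribute is read as a property of it.

```python
import xml.etree.ElementTree as ET

# Rule 1: the name parts are constituents of the author, so they are child elements.
author = ET.fromstring(
    "<author><first-name>Betty</first-name><last-name>Boop</last-name></author>"
)

# Rule 2: the height is a property of the person, not a part of it, so it is an attribute.
person = ET.fromstring('<person height="123"><head/><body/></person>')

first = author.findtext("first-name")     # text of a child element
last = author.findtext("last-name")
height = int(person.get("height"))        # attribute values are always strings
parts = [child.tag for child in person]   # the constituent parts of the person

print(first, last, height, parts)
```

Note that the attribute value has to be converted to a number by the application, since the parser only delivers it as a string.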
3.2.1.1 Practical considerations
Some tools used to edit XML files don't show the attributes while the file is being edited; instead, a dialog box must be opened to edit the attributes.
This can be very tedious while editing large files.
3.2.1.2 Advantages and Disadvantages
3.2.1.2.1 Advantages of attributes
Attributes can have default values
Attributes can have data types
Attributes take less space since they have no start and end tags
3.2.1.2.2 Disadvantages of attributes
Attributes are not convenient for large values
Values containing quotes can be difficult to handle
Whitespace can't be ignored in attributes
Attribute values are harder to search for in search engines
Attribute values often don't appear on the screen in editing tools
Attributes can be slightly more awkward to access in the processing APIs
Attributes are unordered
3.2.1.3 Context
There is no golden rule about when to use attributes and when to use elements. In fact sometimes attributes should be used for one thing and in another context elements should be used for the same thing.
Here are some examples of how the encoding of an object can change just because the object is in a different context.
BOOKS
1. For an insurance company's list of property to be replaced, or a customs list of objects declared at the border, the only important thing is that the object is a book.
<BOOK TITLE="Mary's chance" AUTHOR="Sheldon, Sidney"/>
or
<OBJECT TYPE="Book" PRICE="$8"/>
Figure Books - Example 1
2. Encoding of a title inline, where the title will be printed as part of the paragraph. The author's name can be extracted for use in index lists etc.
<PARA>
I enjoyed the book <BOOK AUTHOR="Sheldon,
Sidney">Mary's chance</BOOK>.
</PARA>
Figure Books - Example 2
3. For an online bookstore, a library catalogue, the citation line of a quotation, etc.
<BOOK>
<TITLE>Mary's chance</TITLE>
<AUTHOR>Sheldon, Sidney</AUTHOR>
</BOOK>
Figure Books - Example 3
4. A totally different approach, using links to the author
so that the author information can be reused for other books.
<BOOK ID="The Call of the Wild">
<AUTHOR UNIT="Jack London"/>
</BOOK>
<PERSON ID="Jack London">
<FIRST>Jack</FIRST>
<LAST>London</LAST>
<PHONE>(206) 555-3423</PHONE>
<WORK UNIT="The Call of the Wild"/>
<WORK UNIT="Love those Wolves"/>
</PERSON>
<STORE ID="Walmart">
<CUSTOMER UNIT="Jack London"/>
...
</STORE>
Figure Books - Example 4
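How an application might follow such links can be sketched in Python. This is a minimal illustration under a few assumptions: the fragment is wrapped in a hypothetical CATALOG root element (an XML document needs a single root), the STORE element is left out, and the links are resolved by a plain dictionary lookup rather than by a validating parser.

```python
import xml.etree.ElementTree as ET

# The fragment from the figure, wrapped in a hypothetical <CATALOG> root.
doc = ET.fromstring("""
<CATALOG>
  <BOOK ID="The Call of the Wild">
    <AUTHOR UNIT="Jack London"/>
  </BOOK>
  <PERSON ID="Jack London">
    <FIRST>Jack</FIRST>
    <LAST>London</LAST>
    <WORK UNIT="The Call of the Wild"/>
    <WORK UNIT="Love those Wolves"/>
  </PERSON>
</CATALOG>
""")

# Index every element that carries an ID so that links can be followed.
by_id = {el.get("ID"): el for el in doc.iter() if el.get("ID") is not None}

def author_name(book):
    """Follow the AUTHOR link of a BOOK and return 'First Last'."""
    person = by_id[book.find("AUTHOR").get("UNIT")]
    return f"{person.findtext('FIRST')} {person.findtext('LAST')}"

name = author_name(by_id["The Call of the Wild"])

# Links can dangle: "Love those Wolves" is referenced but not defined in this fragment.
dangling = [w.get("UNIT") for w in doc.iter("WORK") if w.get("UNIT") not in by_id]

print(name, dangling)
```

The author information is stored once and can be reached from any book that links to it, at the price of an extra lookup step and the possibility of dangling links.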
RECTANGLES
These are two examples of how differently a rectangle can be modeled depending on your needs.
1. "Minimalist" approach, using only attributes.
<RECTANGLE X="0" Y="0" WIDTH="0" HEIGHT="0"/>
Figure Example of minimalist approach for XML rectangle
2. Extravagant approach, using nested elements.
<RECTANGLE>
<ORIGIN><X>0</X><Y>0</Y></ORIGIN>
<SIZE><DX>7in</DX><DY>9in</DY></SIZE>
<LABEL>My Pretty Rectangle</LABEL>
<IMAGE>floral.jpg </IMAGE>
<BACKGROUND>gold</BACKGROUND>
<FOREGROUND>blue</FOREGROUND>
...
</RECTANGLE>
Figure Example of extravagant approach for XML rectangle
3.3 DTD vs. Schema
Both a DTD and a schema serve as a template for, or a specification of, the files written in a specific extensible markup language. The DTD or schema is used to validate the XML files: it specifies which elements are allowed in different contexts, how many times every element can or must occur, and which data types are allowed for specific elements.
XML is case sensitive, so any names and strings that appear in the DTDs or schemas need to be in the same case in the XML document for the XML document to be valid.
3.3.1 DTD Overview
Defining a markup language is not difficult. The first figure below shows a simple DTD for an employee list, and the second shows an example of an XML file written using this DTD. The example is very limited but it shows the basic elements of a DTD (Document Type Definition).
<!ELEMENT EMPLOYEELIST (EMPLOYEE+)>
<!ELEMENT EMPLOYEE (NAME, EMPLOYEENUMBER,
OFFICEPHONE?)>
<!ELEMENT NAME (FNAME, LNAME)>
<!ELEMENT FNAME (#PCDATA)>
<!ELEMENT LNAME (#PCDATA)>
<!ELEMENT EMPLOYEENUMBER (#PCDATA)>
<!ELEMENT OFFICEPHONE EMPTY>
<!ATTLIST OFFICEPHONE AREACODE CDATA #REQUIRED
NUMBER CDATA #REQUIRED>
Figure Employee DTD
<EMPLOYEELIST>
<EMPLOYEE>
<NAME>
<FNAME>Burt</FNAME>
<LNAME>Reynolds</LNAME>
</NAME>
<EMPLOYEENUMBER>123</EMPLOYEENUMBER>
<OFFICEPHONE AREACODE="407" NUMBER="555-1212" />
</EMPLOYEE>
<EMPLOYEE>
<NAME>
<FNAME>Ronald</FNAME>
<LNAME>Reagan</LNAME>
</NAME>
<EMPLOYEENUMBER>456</EMPLOYEENUMBER>
<OFFICEPHONE AREACODE="407" NUMBER="555-2323" />
</EMPLOYEE>
<EMPLOYEE>
<NAME>
<FNAME>Pete</FNAME>
<LNAME>Sampras</LNAME>
</NAME>
<EMPLOYEENUMBER>789</EMPLOYEENUMBER>
<OFFICEPHONE AREACODE="407" NUMBER="555-3434" />
</EMPLOYEE>
</EMPLOYEELIST>
Figure Employee XML
This chapter will describe the most common elements of a DTD and the possible values for these elements as stated in W3C's draft.
3.3.1.1 <?XML>
The <?XML> element is used by the XML processor to determine how the file should be processed.
The <?XML> element has 7 different attributes and all of them are optional.
<?XML encoding="UTF-8"?> indicates the character encoding used for this document. The possible values for the character encoding are UTF-8, UTF-16, UCS-2, UCS-4, ISO8859, Shift-JIS, EUC-JIS and New-JIS. Not all of these have to be supported, but the XML processor must support at least the UTF-8 or the UCS-2 encoding.
<?XML RMD="NONE"?> tells the XML processor that it does not need to validate the XML file against its DTD. The possible values for this attribute are NONE, INTERNAL and ALL, where ALL means that the entire DTD must be checked to validate the XML file.
The rest of the attributes (empty, notext, text, idinfo and default) are used to give a summary of the DTD for non-validating processors.
The <?XML> element is used in the XML document and not in the DTD.
3.3.1.2 <!DOCTYPE>
The <!DOCTYPE> element is used in the XML files to include markup definitions or pointers to DTDs.
The <!DOCTYPE> element contains the name of the element it refers to, usually the top element of the document.
<!DOCTYPE questions SYSTEM "Satml.dtd"> is an example of how to include the markup definitions stored in a DTD in the XML file, and <!DOCTYPE question [<!ELEMENT question (#PCDATA)>]> is an example of how to include inline markup definitions in the XML file.
3.3.1.3 <!-- COMMENT -->
Comments are used to make the XML document or the DTD more readable and to provide additional data. Comments are created in XML files or DTDs by surrounding the comments with <!-- and -->.
3.3.1.4 <!ELEMENT>
The <!ELEMENT> tag is used to define the possible elements in the document.
The syntax of the <!ELEMENT> tag is <!ELEMENT NAME (CONTENTS)>. The name has to be unique and the contents describe the child elements that are allowed within this element.
For example <!ELEMENT DATE (DAY, MONTH, YEAR)> means that the element DATE must contain a DAY element followed by a MONTH element which is followed by a YEAR element and no other elements are allowed inside the DATE element.
There are four types of restrictions on how many times an element can or must occur inside another element. By default an element is required, i.e. it must occur once and only once inside the parent element. A "?" after an element means that the element is optional, "+" means that the element can occur one or more times, and "*" means that the element can occur zero or more times.
If an element doesn't contain any child elements its content is described as #PCDATA or EMPTY. If an element can contain any mix of child elements and text the content is described as ANY.
#PCDATA means that ordinary text without markup can be included between the start and end tags of the element, while EMPTY means that the element has no contents and therefore only the start tag is required. (Note that the start tag of an empty element ends with "/>" instead of ">" to indicate that the end tag will be omitted.) The table below shows a few variations of element declarations.
Table Examples of Element Declaration
| Example | Definition |
|---|---|
| <!ELEMENT EMPLOYEE (NAME, OFFICEHOURS, PHONE+)> | An employee element consists of a name element followed by an officehours element followed by one or more phone elements |
| <!ELEMENT EMPLOYEE (NAME, OFFICEHOURS?, PHONE+)> | An employee element consists of a name element, optionally followed by an officehours element, followed by one or more phone elements |
| <!ELEMENT EMPLOYEE (NAME, OFFICEHOURS*, PHONE+)> | An employee element consists of a name element followed by zero or more officehours elements followed by one or more phone elements |
| <!ELEMENT EMPLOYEE (NAME, OFFICEHOURS, (PHONE+\|EMAIL*))> | An employee element consists of a name element followed by an officehours element followed by either one or more phone elements or zero or more email elements |
| <!ELEMENT EMPLOYEE ANY> | An employee element consists of any combination of elements and character data, in any order |
| <!ELEMENT EMPLOYEE (((NAME, OFFICEHOURS) \| (OFFICEHOURS, NAME)), PHONE?)> | An employee element consists of the elements name and officehours in any order followed by an optional phone element |
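The occurrence indicators of a content model behave much like the operators of a regular expression. The following Python sketch illustrates this by translating a content model into a regular expression and matching it against the sequence of child elements. It is only an illustration of the idea, not a validating parser: the function names are made up for this example, and #PCDATA, EMPTY and ANY are not handled.

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(model):
    """Translate a DTD content model such as "(NAME, PHONE+)" into a regex."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a token "NAME " (the name plus one space); the
    # DTD operators ?, +, *, | and the parentheses keep their regex meaning.
    tokens = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # A comma means "followed by", which is plain concatenation in a regex.
    return re.compile(tokens.replace(",", ""))

def children_match(element, model):
    """Check the sequence of child elements of `element` against a content model."""
    sequence = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(sequence) is not None

MODEL = "(NAME, OFFICEHOURS?, PHONE+)"
ok = ET.fromstring("<EMPLOYEE><NAME/><PHONE/><PHONE/></EMPLOYEE>")
bad = ET.fromstring("<EMPLOYEE><NAME/><OFFICEHOURS/></EMPLOYEE>")

print(children_match(ok, MODEL), children_match(bad, MODEL))
```

The first employee is accepted because OFFICEHOURS is optional, while the second is rejected because at least one PHONE element is required.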
3.3.1.5 <!ATTLIST>
<!ATTLIST> defines the attributes of the element. The syntax of the "attlist" tag is
<!ATTLIST NAME
ATTRIBUTENAME TYPE RESTRICTION
ATTRIBUTENAME TYPE RESTRICTION
...
>
Figure Syntax of Attlist
NAME refers to the element to which the attributes will be attached.
The table below explains the different types of attributes that can be defined in a DTD.
Table Attlist attributes
| Type | Example | Definition |
|---|---|---|
| String | <!ATTLIST PHONENUMBER AREACODE CDATA ...> | The value of the areacode attribute can be any character string. Ex. AREACODE="(407)" |
| Enumerated | <!ATTLIST OFFICEHOURS DAY (m\|tu\|w\|th\|f) ...> | The value of the day attribute has to be one of the five indicated strings. Ex. DAY="m" |
| ID | <!ATTLIST EMPLOYEE EMPLOYEENR ID ...> | The value of the employeenr attribute has to be unique in the document for the XML document to be valid. An ID string can begin with a letter, a "_" or a ":". Ex. EMPLOYEENR="tf389" |
| IDREF | <!ATTLIST WORKORDER EMPID IDREF ...> | The value of empid has to match a value assigned to an ID attribute somewhere in the document. Ex. EMPID="tf389" |
| IDREFS | <!ATTLIST WORKORDER EMPIDS IDREFS ...> | The same as IDREF but IDREFS can contain more than one ID reference. Ex. EMPIDS="tf1 tf2" |
| NMTOKEN | <!ATTLIST EMPLOYEE PHONENR NMTOKEN ...> | An NMTOKEN is similar to a string, except that it can only contain letters, digits and the characters ".-_:"; no white space is allowed. Ex. PHONENR="555-1212" |
| NMTOKENS | <!ATTLIST EMPLOYEE PHONENRS NMTOKENS ...> | The same as NMTOKEN but NMTOKENS can contain more than one NMTOKEN |
The <!ATTLIST> element also contains information about whether or not a value must be supplied for the attribute.
The three options are #REQUIRED, #IMPLIED and #FIXED.
<!ATTLIST EMPLOYEE NUMBER CDATA #REQUIRED> means that the attribute number has to be specified in the XML file.
Ex. <EMPLOYEE NUMBER="tf00389">Tess Ferrandez</EMPLOYEE> is valid but not <EMPLOYEE>Tess Ferrandez</EMPLOYEE>.
<!ATTLIST EMPLOYEE NUMBER CDATA #IMPLIED> means that the employee number is optional.
A fixed attribute means that the value of the attribute is specified and there can be no other values.
Ex. <!ATTLIST FRUIT EDIBLE CDATA #FIXED "YES">
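The effect of the keywords #REQUIRED, #IMPLIED and #FIXED can be illustrated with a small check written in Python. This is a sketch under stated assumptions, not the behavior of any particular XML processor: the function check_attributes and the dictionary layout for the declarations are hypothetical and only mirror the rules described above.

```python
import xml.etree.ElementTree as ET

def check_attributes(element, declarations):
    """Return a list of problems.

    `declarations` maps an attribute name to a (keyword, value) pair, where
    keyword is "#REQUIRED", "#IMPLIED" or "#FIXED" and value is only used
    for "#FIXED".
    """
    problems = []
    for name, (keyword, value) in declarations.items():
        actual = element.get(name)
        if keyword == "#REQUIRED" and actual is None:
            problems.append(f"{name} is required")
        elif keyword == "#FIXED" and actual is not None and actual != value:
            problems.append(f"{name} must be {value!r}")
        # "#IMPLIED" attributes are optional, so there is nothing to check.
    return problems

employee_decl = {"NUMBER": ("#REQUIRED", None)}
fruit_decl = {"EDIBLE": ("#FIXED", "YES")}

with_number = ET.fromstring('<EMPLOYEE NUMBER="tf00389">Tess Ferrandez</EMPLOYEE>')
without_number = ET.fromstring("<EMPLOYEE>Tess Ferrandez</EMPLOYEE>")
wrong_fruit = ET.fromstring('<FRUIT EDIBLE="NO"/>')

print(check_attributes(with_number, employee_decl))
print(check_attributes(without_number, employee_decl))
print(check_attributes(wrong_fruit, fruit_decl))
```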
3.3.1.6 <!ENTITY>
Entities are used as aliases for other text or external files. If you declare <!ENTITY myGreeting "Sincerely - Teresa Ferrandez"> and insert the reference &myGreeting; in the XML document, the reference is replaced by "Sincerely - Teresa Ferrandez". You can also use external entities such as <!ENTITY copyRightText SYSTEM "http://www.a.com/copyright.xml">; the reference &copyRightText; in the XML file is then replaced by the contents of the file http://www.a.com/copyright.xml.
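The replacement of an internal entity can be observed with a short Python sketch. It assumes the standard-library xml.etree.ElementTree parser and a hypothetical letter element invented for this example; only the internal entity is shown, since external entities depend on how the processor is configured to fetch files.

```python
import xml.etree.ElementTree as ET

# A hypothetical one-element document that declares the entity in its
# internal DTD subset and then refers to it in the element content.
letter = """<!DOCTYPE letter [
  <!ELEMENT letter (#PCDATA)>
  <!ENTITY myGreeting "Sincerely - Teresa Ferrandez">
]>
<letter>Thank you for your time. &myGreeting;</letter>"""

root = ET.fromstring(letter)
text = root.text  # the parser has already replaced the entity reference

print(text)
```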
3.3.1.7 <!NOTATION>
The <!NOTATION> element allows XML documents to refer, in a consistent way, to external data that is not described in an XML format.
The <!NOTATION> element specifies an application that can be used to process the information.
<!DOCTYPE person [
<!NOTATION DSIG SYSTEM
"http://www.acmepc.com/dsig.exe">
<!ENTITY sig SYSTEM "mysig.dsg" NDATA DSIG>
<!ELEMENT person (name)>
<!ATTLIST person signature ENTITY #REQUIRED>
<!ELEMENT name (#PCDATA)>
]>
<person signature = "sig">
<name>Sean McGarth</name>
</person>
Figure Example of the use of notations in XML
The figure above is a fragment of an XML file, taken from the book XML by Example, that shows how the notation element can be used.
3.3.2 Schema Overview
In January of 1998 Microsoft, ArborText, the University of Edinburgh, DataChannel and the Inso Corporation submitted a draft for a variation of the XML DTD called XML-Data or XML-Schemas to the World Wide Web Consortium (W3C).
The W3C website states that the W3C was founded in October 1994 to lead the World Wide Web to its full potential by developing common protocols that promote its evolution and ensure its interoperability. Submittals are sent to the W3C, which works with the submitter to either recommend or reject the proposal. Proposals accepted by the W3C typically become Internet standards.
The XML-Schemas have a few major advantages over XML DTDs but the XML-Data proposal was never approved by the W3C. Instead a new working group was formed for a fusion between XML-Data and another similar and very popular format called Resource Description Framework (RDF).
RDF is a framework for metadata; it provides interoperability between applications that exchange machine-understandable information on the Web.
The new proposal is named DCD (Document Content Description), and it was submitted to the W3C by Textuality, Microsoft and IBM in July of 1998.
The new DCD proposal is a superset of XML-Data and it is designed to be conformant with RDF.
To define an Extensible Markup Language for an employee list of the format shown in the first figure below, an XML-Schema like the one shown in the second figure below must be created.
<employeeList>
<employee title="research assistant">
<firstName>Teresa</firstName>
<lastName>Ferrandez</lastName>
<employeeNumber>828-00-0398</employeeNumber>
</employee>
<employee title="doctor">
<firstName>Ron</firstName>
<lastName>Eaglin</lastName>
<employeeNumber>555-00-1212</employeeNumber>
</employee>
</employeeList>
Figure Employee list XML file
<?XML version='1.0'?>
<s:schema id="employeeList">
<elementType id="firstName">
<string/>
</elementType>
<elementType id="lastName">
<string/>
</elementType>
<elementType id="employeeNumber">
<string/>
</elementType>
<elementType id="employee">
<element id="fn" type="#firstName"/>
<element id="ln" type="#lastName"/>
<element id="en" type="#employeeNumber"/>
<key id="k1">
<keyPart href="fn"/>
<keyPart href="ln"/>
<keyPart href="en"/>
</key>
<attribute name="title" default="manager"/>
</elementType>
</s:schema>
Figure Schema for an employee list XML
3.3.2.1 General Syntax
All DCDs are surrounded by a DCD element.
<DCD>
...
</DCD>
The DCD element defines the beginning and the end of the DCD. Inside the DCD element there can optionally be a <?DCD> element containing compiler directives, and there must be at least one <ElementDef> element.
3.3.2.2 <ElementDef>
The figure below shows a fragment of a DCD for bank loans.
<ElementDef Type="Loan">
<Description>A Bank Loan</Description>
<Group RDF:Order="Seq">
<Element>InterestRate</Element>
<Element>Amount</Element>
<Element>Maturity</Element>
</Group>
</ElementDef>
<ElementDef Type="InterestRate" Datatype="float"/>
<ElementDef Type="Amount" Datatype="int"/>
<ElementDef Type="Maturity" Model="Data"
Datatype="dateTime"/>
Figure DCD for bank loans
The DCD defines the element <Loan>, which contains an InterestRate, an Amount and a Maturity element, in that order.
The attributes of the ElementDef element and their possible values are shown in the table below.
Table ElementDef Attributes
| Attribute | Possible values | Description |
|---|---|---|
| Type | A string that starts with a letter, _ or : continued by any series of letters, digits, ., _, or - | The type is the name of the element, ex. <title></title> has the type="title" |
| Model | Empty: no content. Any: text and elements of any type. Data: only text. Elements: only specific elements. Mixed: text and specific elements | Indicates the possible content of this element |
| Contents | Open: may contain non-specified elements. Closed: may not contain unspecified elements | Indicates if this element can have child elements not stated in the group statement |
| Datatype | Can be string, int, number, dateTime etc. For full list see [1] the DCD submission to the W3C | Defines what data the element can contain |
| Min, Max, MinExclusive, MaxExclusive | Numbers or strings depending on the data type of the contents | Upper or lower bounds on the contents; these attributes are only meaningful if the content model is Data |
| Default | Any value that is of the same datatype as the contents of the element | The default value of the contents if none is provided |
| Fixed | True or False | True means that no value other than the default is allowed |
| Root | True or False | True means that this element may serve as the root element of this kind of document |
The only required attribute is the type attribute, which has to be unique in the DCD.
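To make the role of the Datatype attribute concrete, the following Python sketch checks the contents of a Loan element against the datatypes declared in the bank loan DCD above. It is only a rough illustration: the two dictionaries are a hand-written, hypothetical stand-in for what a DCD processor would read from the ElementDef elements, and only the float and int datatypes are covered.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from DCD Datatype names to Python converters.
CONVERTERS = {"float": float, "int": int}

# Datatypes copied by hand from the ElementDef elements of the loan DCD.
# Maturity (dateTime) is left out because this sketch does not handle dates.
DATATYPES = {"InterestRate": "float", "Amount": "int"}

def datatype_errors(loan):
    """Return the names of the child elements whose text does not fit the datatype."""
    errors = []
    for child in loan:
        convert = CONVERTERS.get(DATATYPES.get(child.tag))
        if convert is None:
            continue  # no datatype declared, or not handled in this sketch
        try:
            convert((child.text or "").strip())
        except ValueError:
            errors.append(child.tag)
    return errors

good_loan = ET.fromstring(
    "<Loan><InterestRate>7.5</InterestRate><Amount>10000</Amount>"
    "<Maturity>1999-01-01T00:00:00</Maturity></Loan>"
)
bad_loan = ET.fromstring(
    "<Loan><InterestRate>7.5</InterestRate><Amount>lots</Amount></Loan>"
)

print(datatype_errors(good_loan), datatype_errors(bad_loan))
```

With a DTD, a check of this kind has to be written in the application itself, because the DTD can only state that the content is character data.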
3.3.2.3 <Element>
The <Element> element refers to a previously defined element, and it is used to define the structure of the XML document.
<ElementDef Type="Employee" Model="Elements">
<Element>FirstName</Element>
<Element>LastName</Element>
</ElementDef>
Figure ElementDef Example
In the example shown above, the element Employee has to contain a FirstName and a LastName element, in that order.
3.3.2.4 <AttributeDef>
An AttributeDef declares an attribute, which may be provided for one or more elements in the DCD.
An example of how an attribute definition can be used to define a first name attribute is shown in the figure below.
<DCD>
<?DCD syntax="explicit"?>
...
<AttributeDef Name="FirstName" Datatype="string"/>
...
</DCD>
Figure AttributeDef Example
The table below shows the attributes of the <AttributeDef> element and their possible values.
Table AttributeDef attributes
| Attribute | Possible values | Description |
|---|---|---|
| Name | A string that starts with a letter, _ or : continued by any series of letters, digits, ., _, or - | The name of the attribute |
| Global | True, False | Indicates whether the Name property of this attribute must be unique in the DCD |
| Default | See Default for ElementDefs | |
| Description | See Description for ElementDefs | |
| Max, Min, MaxExclusive, MinExclusive | See Max, Min etc. for ElementDefs | |
| Fixed | See Fixed for ElementDefs | |
| ID-Role | ID, IDREF, IDREFS | Signals that the attribute has unique ID pointer semantics |
| Id | Ex. "sizeAtt" | Attributes defined with Global="False" can be referred to in other element definitions in the DCD by this identifier |
| Resource | A string referring to an attribute id in the document. Ex. "#sizeAtt" | Refers to the id of an attribute in the document |
| Occurs | Required, Optional | Indicates whether the presence of the attribute is required |
| UniqueIn | Name of an element present in the DCD or null | Defines in what context the attribute is unique; an element name indicates that the attribute name is unique in that element, null indicates uniqueness in the whole document |
| Datatype | See Datatype for ElementDefs | |
3.3.2.5 <Attribute>
The <Attribute> element refers to an already defined Attribute. This element is used to provide elements with attributes.
<ElementDef Type="IMG">
<Attribute>BORDER</Attribute>
<Attribute>SRC</Attribute>
</ElementDef>
Figure Attribute Example
The figure above shows how to create an element IMG with the attributes BORDER and SRC. The attributes BORDER and SRC have to be defined with AttributeDefs.
3.3.2.6 <InternalEntityDef> and <ExternalEntityDef>
The internal and external entity definitions provide aliases for common sentences or texts like copyright notices.
3.3.2.7 <Description>
The <Description> element is an optional element provided to give extra information about elements and attributes for better readability.
3.3.2.8 <Group>
The order of the child elements and whether or not they are required is specified in the <Group> element.
The example in the figure below shows the usage of the group element.
<ElementDef Type="person" Model="Elements">
<Group Order="Seq">
<Element>FirstName</Element>
<Group Occurs="Optional">
<Element>MI</Element>
</Group>
<Element>LastName</Element>
</Group>
</ElementDef>
Figure Group Example
In this example the person element contains one FirstName element, optionally followed by an MI element, which is followed by a LastName element, in that order.
The attributes for the group element are Occurs and Order and their values are shown in the table below.
Table Group attributes
| Attribute | Values | Description |
|---|---|---|
| Occurs | Required, Optional, OneOrMore, ZeroOrMore | Defines if the elements in the group are required or not and how many times they can occur within the parent element |
| Order | Seq, Alt | Defines if the order of the child elements is important or not |
3.3.3 Conclusion
A DTD is a very short and concise definition of an XML language, while a DCD is very verbose and not as restricted as the DTD.
The DCD not only defines the types and the format of the attributes but also the data types for the element information. The question is if this type definition really is necessary since most XML parsing applications need to check the format of the data anyway.
For SATML a DTD was used mainly because the normal DTD is a standard by the W3C while the Microsoft schema is still subject to change.
3.4 Rendering XML files using XSL
One of the design principles of XML is that the content and the formatting shall remain separate.
The XSL (Extensible Stylesheet Language) document contains the formatting options for the chosen XML language. One XML language can have several XSL style sheets for different purposes.
After the XSL document and the XML document are written, an XSL processor combines the XML content with the XSL formatting to generate an output file that can be read by the browser or application for which the data is intended.
3.4.1 Overview of XSL
The Extensible Stylesheet Language is designed for formatting XML data, in much the same way that CSS (Cascading Style Sheets) is designed for formatting HTML documents. However, XSL provides functionality beyond CSS.
XSL is based on the Document Style Semantics and Specification Language (DSSSL) standard, which is an International Standard for specifying document transformation and formatting.
Microsoft's web site states that XSL provides the following capabilities:
Formatting of source elements based on ancestry/descendency, position and uniqueness
The creation of formatting constructs, including generated text and graphics
The definition of reusable formatting macros
Writing-direction independent style sheets
Extensible set of formatting objects
The basic element of an XSL document is a construction rule, which contains a pattern and an action. The pattern describes an element in the XML file and the action describes what should happen to the output file when this pattern is found.
The figure below contains three types of XSL rules, each with its pattern and action.
<xsl>
<rule>
<root/>
<HTML>
<BODY>
<children/>
</BODY>
</HTML>
</rule>
<rule>
<target-element type="orders"/>
<DIV font-size="14pt" font-family="serif">
<children/>
</DIV>
</rule>
<style-rule>
<target-element type="customer"/>
<apply font-weight="bold"/>
</style-rule>
</xsl>
Figure Example of XSL
The second rule in the example above indicates that whenever the XSL-processor finds an <orders> element its contents should be given the font serif 14pt. The resulting output file will contain the following row:
<DIV font-size="14pt" font-family="serif"> ... </DIV>.
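The way a processor combines patterns and actions can be sketched in a few lines of Python. This is a deliberately simplified illustration and not an XSL processor: patterns are reduced to element names, actions are plain text templates in which {children} stands for the <children/> element, style rules are left out, and text is not escaped for HTML.

```python
import xml.etree.ElementTree as ET

# Each rule pairs a pattern (here just an element type) with an action
# (an output template in which {children} marks where the child output goes).
RULES = {
    "orders": '<DIV font-size="14pt" font-family="serif">{children}</DIV>',
}
ROOT_RULE = "<HTML><BODY>{children}</BODY></HTML>"

def process(element):
    """Apply the matching rule to an element and recurse into its children."""
    children = (element.text or "") + "".join(
        process(child) + (child.tail or "") for child in element
    )
    # Elements without a rule simply pass their content through.
    template = RULES.get(element.tag, "{children}")
    return template.format(children=children)

def transform(xml_text):
    """Wrap the processed document in the output of the root rule."""
    return ROOT_RULE.format(children=process(ET.fromstring(xml_text)))

html = transform("<orders><customer>Ann</customer></orders>")
print(html)
```

For the small input document used here, the orders element is wrapped in the DIV element from the second rule, and the result is enclosed in the HTML and BODY tags from the root rule.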
3.4.1.1 Rule types
Every XSL document has to have one and only one root rule, which defines the basic structure of the output document.
<rule>
<root/>
<HTML>
<BODY>
<children/>
</BODY>
</HTML>
</rule>
Figure The root rule
The root rule in the figure above encloses the content of the XML file in the basic start and end tags of an HTML document.
If the user does not specify a root rule, a built-in root rule is used, but the built-in root rule does not contain any formatting information.
Construction rules have one or more patterns followed by one or more actions that should occur when all the patterns are matched.
The third kind of rule is the style rule, which contains one or more patterns followed by an apply element such as <apply font-style="italic" color="red"/>. The style is then merged with the other formatting provided by the construction rules.
3.4.1.2 Rule hierarchy
If an element causes more than one rule to fire, the rule with the most specific pattern will in most cases be the one that fires. The following nine criteria are checked to determine which rule is more specific.
1. The pattern with the highest importance value (set by the importance attribute).
2. The pattern with the greater number of id attributes.
3. The pattern with the greater number of class attributes.
4. The pattern with the greater number of <element> or <target-element> elements having a type attribute.
5. The pattern with fewer wildcards
6. The pattern with the higher specified priority
7. The pattern with the higher number of only qualifiers
8. The pattern with the higher number of position qualifiers
9. The pattern with the higher number of attributes specified
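One way to picture the comparison is as a tuple of nine numbers per pattern, compared from left to right. The Python sketch below rests on that assumption, namely that the criteria are applied in the order listed and that a later criterion only matters when all earlier ones are tied. The class Pattern and its field names are hypothetical and are not part of XSL.

```python
from dataclasses import dataclass

@dataclass
class Pattern:
    """Hypothetical summary of a pattern; one counter per criterion in the list."""
    name: str
    importance: int = 0
    ids: int = 0
    classes: int = 0
    typed_elements: int = 0
    wildcards: int = 0
    priority: int = 0
    only_qualifiers: int = 0
    position_qualifiers: int = 0
    attributes: int = 0

    def specificity(self):
        # Tuples compare left to right, so an earlier criterion outranks later ones.
        # Wildcards are negated because fewer wildcards means more specific.
        return (self.importance, self.ids, self.classes, self.typed_elements,
                -self.wildcards, self.priority, self.only_qualifiers,
                self.position_qualifiers, self.attributes)

def winning_rule(patterns):
    """Return the pattern whose rule fires when several patterns match."""
    return max(patterns, key=Pattern.specificity)

generic = Pattern("any child of A", typed_elements=1, wildcards=1)
specific = Pattern("A inside B", typed_elements=2)
important = Pattern("flagged rule", importance=1)

print(winning_rule([generic, specific]).name)
print(winning_rule([generic, specific, important]).name)
```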
3.4.1.3 The pattern
Every pattern contains one <target-element> element with the syntax <target-element type="name"/>, where name is the name of the element in the XML file that the rule should apply to.
The pattern <target-element type="A"/> indicates that the rule applies to all <A> elements, but a pattern can also be more specific, like the pattern in the figure below.
<element type="B">
<element type="C">
<target-element type="A"/>
</element>
</element>
Figure A more specific pattern
A pattern can also include attributes, wildcards and qualifiers.
3.4.1.3.1 Attributes
Attributes can be included in patterns in two ways. The first is to specify the value the attribute must have for the rule to fire, as in the figure below.
<!-- pattern matches <IMG width="300"> elements -->
<target-element type="IMG">
<attribute name="width" value="300"/>
</target-element>
<!-- pattern matches immediate children to <TABLE
border="0"> elements -->
<element type="TABLE">
<attribute name="border" value="0"/>
<target-element/>
</element>
Figure Specifying attribute values
The second way to use attributes in a pattern is to make the rule fire based on whether the attribute has a value: <attribute name="A" has-value="yes"/>.
The value of an attribute can also be used in the action part of the rule, as shown in the figure below.
<album width="300">
<photograph image-url="barbie.gif"/>
</album>
<rule>
<target-element type="photograph"/>
<IMG SRC='=getAttribute("image-url")'
width='=parent.getAttribute("width")'/>
</rule>
Figure Retrieving attributes
In the example above the resulting line in the output document will be <IMG SRC="barbie.gif" width="300"/>.
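The same lookup can be reproduced outside XSL. The Python sketch below is an illustration only: it reads the album fragment with the standard-library parser, which keeps no pointer from a child to its parent, so a small parent table is built first to play the role of parent.getAttribute.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

album = ET.fromstring(
    '<album width="300"><photograph image-url="barbie.gif"/></album>'
)

# ElementTree has no parent pointer, so map every child to its parent once.
parent_of = {child: parent for parent in album.iter() for child in parent}

photo = album.find("photograph")
src = photo.get("image-url")             # corresponds to getAttribute("image-url")
width = parent_of[photo].get("width")    # corresponds to parent.getAttribute("width")

# quoteattr adds the surrounding quotes and escapes special characters.
img = "<IMG SRC={} width={}/>".format(quoteattr(src), quoteattr(width))
print(img)
```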
3.4.1.3.2 Wildcards
Wildcards can be used in patterns to create more general rules, as in the figure below.
Figure Examples of wildcards
In example A, the rule applies to any element that is an immediate child of element A. In example B the rule applies to any child of element A. In example C the rule applies to any immediate children of the immediate children of element A.
3.4.1.3.3 Qualifiers
Qualifiers are used to apply special formatting to the first or last element in a group through the position attribute.
The pattern in the figure below causes its rule to fire only when it finds the first occurrence of element A inside an element B.
<element type="B">
<target-element type="A" position="first-of-type"/>
</element>
Figure first-of-type
The other possible values for position are last-of-type, first-of-any and last-of-any. The values first-of-any and last-of-any cause the rule to fire only if element A is the first or the last child element of element B, respectively.
The <target-element> element can also have an only attribute with the following values, of-any or of-type. The argument only="of-any" will cause the rule to fire only if the target element is the only child element of its parent.
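The four position values can be summarized in a short Python sketch. It follows the description above and is not taken from an XSL processor; the function name matches_position is made up for this example.

```python
import xml.etree.ElementTree as ET

def matches_position(parent, child, position):
    """Tell whether `child` satisfies the given position qualifier inside `parent`."""
    children = list(parent)
    same_type = [c for c in children if c.tag == child.tag]
    candidates = {
        "first-of-type": same_type[0],   # first child with the same element name
        "last-of-type": same_type[-1],   # last child with the same element name
        "first-of-any": children[0],     # first child of any kind
        "last-of-any": children[-1],     # last child of any kind
    }
    return candidates[position] is child

b = ET.fromstring("<B><C/><A/><A/></B>")
first_a, last_a = b.findall("A")

print(matches_position(b, first_a, "first-of-type"))
print(matches_position(b, first_a, "first-of-any"))
print(matches_position(b, last_a, "last-of-any"))
```

In this document the first A element is the first of its type but not the first child of B, because a C element precedes it.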
3.4.1.4 The action
The action of a rule defines the formatting elements that will be applied to the element in the output file. Any HTML code or CSS constructs can be used in the action part of the rule.
The action part can also be used to reorder the elements, insert scripts and dynamic HTML. In the action part you can also evaluate expressions such as childNumber(this) which returns the order of this element inside the parent element.
3.4.2 Possibilities and limitations
Separating the content from the formatting enables the creator of XML files to present the information based on the user's preferences, which opens up many possibilities. In the case of SATML question files, the questions can be displayed differently for students and for instructors without changing the content. The questions can be shown with or without answers, and with or without hints. They can be displayed as a list of titles or with their full content, and the user can choose to see only multiple choice questions or only questions from a specific category, simply by using different XSL style sheets.
The main limitations of using XML with XSL are that very few Internet browsers are able to read XML and XSL files, and that XSL lacks some capability when it comes to scripting. However, XSL is still being developed, and these capabilities will probably be added.
3.4.3 Possible uses for XSL's with SATML
An XML viewer is in the process of being developed to show SATML assignments using XSL and Java. XSL can also be used to display the other SATML files and it can give the student a choice of how he/she wants the information to be displayed.
3.5 XML and security issues
XML cannot currently be used to deliver security-critical information to the user, because XML files are plain text files without encryption; anyone can therefore view the full content of an XML file at any time. Restricting the user from viewing the source of an XML file leaves the user unable to view any part of that file.
This is a problem when creating assignments in XML, since students can view the answers to the questions before even taking the quiz or solving the problems provided in the assignment.
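To illustrate how little effort this takes, the following Python sketch reads the correct answers directly out of a plain XML question file. It assumes QML-style markup as described in section 4.1.1 (ALT elements with a CORRECT attribute); the sample question and the function name correct_answers are invented for illustration and are not part of SATML.

```python
import xml.etree.ElementTree as ET

# Hypothetical question file content, as a student could download it.
QUESTION_XML = """
<QUESTION>
  <STEM>What is 2 + 2?</STEM>
  <ALT CORRECT="no"><ANSWER>3</ANSWER></ALT>
  <ALT CORRECT="yes"><ANSWER>4</ANSWER></ALT>
  <ALT CORRECT="no"><ANSWER>5</ANSWER></ALT>
</QUESTION>
"""


def correct_answers(xml_text):
    """Return the text of every alternative marked as correct."""
    root = ET.fromstring(xml_text)
    return [
        alt.findtext("ANSWER")
        for alt in root.iter("ALT")
        if alt.get("CORRECT") == "yes"
    ]


print(correct_answers(QUESTION_XML))  # ['4']
```

Nothing in the file format itself prevents this; any protection would have to come from the way the file is delivered.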
There are XML working groups that are working on ways to restrict the user from seeing particular parts of an XML file. In the case of the question files, this would apply to the answers. As of October 1998, however, nothing has been decided in this matter.
CHAPTER 4 DEVELOPMENT OF SATML
The development of the SATML file format was done in four steps. The first step was to review similar initiatives done in SGML and other formats. This step was followed by a closer look at the IMS specification and the information it suggests should be saved. After determining what information was needed to make a resource IMS compliant, the next step was to review a few widely used distance learning tools to find out what kind of information they stored. The first draft of the SATML language was then developed. After tools had been created to import and export the information to databases, a few modifications were made to the file format because some fields were not practical. In the last step, the final version of the SATML file format was developed and submitted to the IMS Question Interoperability group for review.
This chapter explains the steps taken towards the development of the SATML file format in greater detail.
4.1 Review of similar XML initiatives
The desire to share instructional data is not new, and there have been several attempts to solve this problem in the past. This chapter describes a few of these initiatives and their advantages and disadvantages.
4.1.1 QML (Question Markup Language)
4.1.1.1 Developers of QML
QML is copyrighted by the Tekamah Corporation, and the information for this chapter was taken from their web site.
4.1.1.2 Description
QML is a question markup language for multiple choice questions.
4.1.1.3 Structure
The figure below shows the structure of a QML file.
Figure QML Structure
4.1.1.3.1 Elements
QUESTION
The question element is the outermost element of the file.
Atts: SHUFFLE (Opt.) can have the values yes or no. Yes means that the answers should come in random order when displayed.
VISALTS (Opt.) defines the number of visible alternatives. It is possible to provide several distractors (i.e. incorrect answers) to a question, and when the question is rendered the display application chooses one correct answer and VISALTS-1 incorrect answers to display. If the instructor gives 6 answers (one correct answer and 5 distractors) and chooses to show only 4, the question can be rendered in C(5,3) * 4! = 10 * 24 = 240 different ways, where C(n,k) is the number of ways to choose k items out of n and each ordering of the displayed alternatives is counted separately.
NUMDIS (Opt.) defines the number of distractors that the display application should show. This attribute is useful if the instructor wants to randomize the questions even more by adding one or more additional correct answers. If the user provides 2 correct answers and 5 incorrect answers, setting VISALTS to 4 and NUMDIS to 3 means that one of the 2 correct answers and 3 of the 5 distractors are shown, so the question can be rendered in 2 * C(5,3) * 4! = 2 * 10 * 24 = 480 different ways.
SubElements: The Question element contains zero or more keywords, followed by the stem (statement) of the question, one or more answers and an optional general feedback.
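The rendering counts quoted for VISALTS and NUMDIS can be checked with a short Python sketch. It assumes that every ordering of the displayed alternatives counts as a separate rendering (as with SHUFFLE set to yes) and that, when NUMDIS is not given, exactly one correct answer is shown. The function name rendering_count is invented for illustration, and math.comb requires Python 3.8 or later.

```python
from math import comb, factorial


def rendering_count(num_correct, num_incorrect, visalts, numdis=None):
    """Count the distinct ways a question can be rendered.

    Assumption: each ordering of the displayed alternatives is a separate
    rendering. If numdis is omitted, one correct answer is shown together
    with visalts - 1 distractors.
    """
    if numdis is None:
        numdis = visalts - 1
    shown_correct = visalts - numdis
    ways_correct = comb(num_correct, shown_correct)   # choice of correct answers
    ways_incorrect = comb(num_incorrect, numdis)      # choice of distractors
    orderings = factorial(visalts)                    # order on screen
    return ways_correct * ways_incorrect * orderings


print(rendering_count(1, 5, visalts=4))            # 240
print(rendering_count(2, 5, visalts=4, numdis=3))  # 480
```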
KEYWORD
An optional number of keywords can be provided for a question to enable instructors to find questions that relate to a specific subject.
Atts: None
SubElements: None
STEM
The stem of the question is the actual question text.
Atts: None
SubElements: None
ALT
An alternative can be either a correct answer or an incorrect answer.
Atts: The possible values of the CORRECT (Req.) attribute are yes and no, where yes means that this is the correct alternative. There can be one or more correct alternatives and one or more incorrect alternatives but the user is only able to choose one alternative when answering the question.
SubElements: The alternative is composed of an ANSWER element containing the answer text and an optional FEEDBACK element giving specific feedback to the user if this alternative is chosen.
ANSWER
The answer element contains the answer text for one of the alternatives.
Atts: None
SubElements: None
FEEDBACK
Feedback is an optional text that should be displayed to the user when a specific answer is chosen or when the question is answered.
Atts: None
SubElements: None
4.1.1.4 Examples
The figure below shows an example of a multiple-choice question created in QML. The question will be displayed with four of the six alternatives, and three of the visible alternatives will be incorrect answers.
<QUESTION SHUFFLE="yes" VISALTS="4" NUMDIS="3">
<KEYWORD>Government</KEYWORD>
<KEYWORD>History</KEYWORD>
<STEM>
What is the name of the president of the United
States?
</STEM>
<ALT CORRECT="no">
<ANSWER>George Washington</ANSWER>
<FEEDBACK>Incorrect!!!</FEEDBACK>
</ALT>
<ALT CORRECT="no">
<ANSWER>George Bush</ANSWER>
<FEEDBACK>Incorrect!!!</FEEDBACK>
</ALT>
<ALT CORRECT="no">
<ANSWER>Ronald Reagan</ANSWER>
<FEEDBACK>Incorrect!!!</FEEDBACK>
</ALT>
<ALT CORRECT="yes">
<ANSWER>Bill Clinton</ANSWER>
<FEEDBACK>Correct!!!</FEEDBACK>
</ALT>
<ALT CORRECT="yes">
<ANSWER>William Clinton</ANSWER>
<FEEDBACK>Correct!!!</FEEDBACK>
</ALT>
<ALT CORRECT="no">
<ANSWER>Abe Lincoln</ANSWER>
<FEEDBACK>Incorrect!!!</FEEDBACK>
</ALT>
<FEEDBACK>
The most recent American presidents were Bill
Clinton, George Bush and Ronald Reagan.
</FEEDBACK>
</QUESTION>
Figure QML Example
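How a display application might act on SHUFFLE, VISALTS and NUMDIS can be sketched in Python as follows. This is only an illustration of the attribute semantics described above, not the behavior of any actual QML tool: the function name select_alternatives is invented, the example is shortened (keywords and feedback omitted), and the sketch assumes that VISALTS and NUMDIS are both present.

```python
import random
import xml.etree.ElementTree as ET

# Shortened version of the QML example (keywords and feedback omitted).
QML = """
<QUESTION SHUFFLE="yes" VISALTS="4" NUMDIS="3">
  <STEM>What is the name of the president of the United States?</STEM>
  <ALT CORRECT="no"><ANSWER>George Washington</ANSWER></ALT>
  <ALT CORRECT="no"><ANSWER>George Bush</ANSWER></ALT>
  <ALT CORRECT="no"><ANSWER>Ronald Reagan</ANSWER></ALT>
  <ALT CORRECT="yes"><ANSWER>Bill Clinton</ANSWER></ALT>
  <ALT CORRECT="yes"><ANSWER>William Clinton</ANSWER></ALT>
  <ALT CORRECT="no"><ANSWER>Abe Lincoln</ANSWER></ALT>
</QUESTION>
"""


def select_alternatives(xml_text, rng=None):
    """Return (answer_text, is_correct) pairs chosen for display."""
    rng = rng or random.Random()
    question = ET.fromstring(xml_text)
    alts = [
        (alt.findtext("ANSWER"), alt.get("CORRECT") == "yes")
        for alt in question.findall("ALT")
    ]
    correct = [a for a in alts if a[1]]
    incorrect = [a for a in alts if not a[1]]

    # Assumption: both attributes are present in the file.
    visalts = int(question.attrib["VISALTS"])
    numdis = int(question.attrib["NUMDIS"])

    # Pick NUMDIS distractors and fill the remaining slots with correct answers.
    shown = rng.sample(incorrect, numdis) + rng.sample(correct, visalts - numdis)
    if question.get("SHUFFLE") == "yes":
        rng.shuffle(shown)
    return shown


for text, is_correct in select_alternatives(QML):
    print(text, "(correct)" if is_correct else "")
```

Each run prints four alternatives, exactly one of which is marked as correct; which ones are chosen and in what order varies from run to run.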
4.1.1.5 Advantages and Disadvantages
A big disadvantage of QML is that the only questions that can be modeled are multiple-choice questions. The advantages of QML are that it allows instructors to provide feedback and to control whether and how the answers are randomized.
4.1.2 QML (Quiz Markup Language)
4.1.2.1 Developers of QML
Dr. Robert Bamberger, Christopher Shorey and Richard Simpkinsson developed QML at the Washington State University.
4.1.2.2 Description
QML is a markup language designed for quickly and easily creating quizzes in Asymetrix Toolbook, for use as applications or as Internet-based documents.
QML is SGML-like, and some of its elements do not comply with the XML standard.
4.1.2.3 Structure
The syntax of the Quiz Markup Language differs from that of the other examples. This review focuses only on the content of the markup language, and for simplicity the Quiz Markup Language has been converted to XML. The figure below shows the structure of this XML.
Figure QML Structure
The structure for the Quiz Markup Language is based on information found on the web site.
4.1.2.3.1 Elements
QUIZ
Quiz is the outermost tag allowed.
Atts: None
SubElements: The QUIZ element contains the optional elements FEEDBACKCORRECT, FEEDBACKINCORRECT and FEEDBACKPARTCORRECT, followed by one or more question PAGEs, and a number of optional elements including TITLE, SUBTITLE, RANDOM, TRIES, MULTICORRECT, MAXSCORE, AUTORESET, AUTOLOCKANSWERS, DELAYEDFEEDBACK, and SCORED.
FEEDBACKCORRECT
Feedbackcorrect gives feedback if the student answers a question correctly. This element can reside inside the QUIZ element to give a global feedback on correct answers or inside a QUESTION tag to give feedback if the specific question is answered correctly.
Atts: None
SubElements: None
FEEDBACKINCORRECT
Feedbackincorrect gives feedback if the student answers a question incorrectly. This element can reside inside the QUIZ element to give a global feedback on incorrect answers or inside a QUESTION tag to give feedback if the specific question is answered incorrectly.
Atts: None
SubElements: None
FEEDBACKPARTCORRECT
Feedbackpartcorrect gives feedback if the student answers a question partly correctly. This element can reside inside the QUIZ element to give a global feedback on partly correct answers or inside a QUESTION tag to give feedback if the specific question is answered partly correctly.
Atts: None
SubElements: None
PAGE
The page element defines what type of page to use and it allows for customization of individual pages.
Atts: STYLE (opt.) defines what type of page to use. The possible values are INTRO (introduction page with title, subtitle and intro textfield), MENU (menu page), SUM (summary page) and QUEST (question pages).
The attributes SCOREBUTTON (opt.), SCOREFIELD (opt.), FEEDBACKFIELD (opt.) and NAVBUTTONS (opt.) define whether the corresponding objects should be defined for the questions on this page.
NAME (req.) is the name of the question.
The attributes TEXTFIELD and INTROTEXTFIELD define whether the corresponding objects should be included on the page.
SubElements: A page can contain one or more QUESTIONs.
TITLE
The title element gives a global name for all the questions in the quiz. If the value of this element contains #N#, for example Question #N#, the #N# will be replaced by numbers running from 1 to n where n is the number of questions in the quiz.
Atts: None
SubElements: None
SUBTITLE
The subtitle element gives a global subtitle for all the question pages in the quiz. If the value of this element contains #N#, for example Section #N#, the #N# will be replaced by numbers running from 1 to n where n is the number of question pages in the quiz.
Atts: None
SubElements: None
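The #N# substitution described for TITLE and SUBTITLE amounts to a simple string replacement. A minimal Python sketch, with the function name expand_numbering invented for illustration:

```python
def expand_numbering(template, count):
    """Replace #N# in the template with 1..count, giving one string per item."""
    return [template.replace("#N#", str(n)) for n in range(1, count + 1)]


print(expand_numbering("Question #N#", 3))
# ['Question 1', 'Question 2', 'Question 3']
```

A template without #N# is returned unchanged for every item.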
RANDOM
Random globally defines whether the answers to the questions should be randomized or not. This element can be overridden by a question's random attribute.
Atts: None
SubElements: None
TRIES
The TRIES element globally sets the number of times the student can change his answer while on that page. This value will be reset once the student leaves the page.
This element can be overridden by the question tries attribute.
Atts: None
SubElements: None
MULTICORRECT
The MULTICORRECT element globally defines whether the multiple-choice questions in the quiz can have more than one correct answer.
This element can be overridden by the question multicorrect attribute.
Atts: None
SubElements: None
MAXSCORE
The MAXSCORE element globally defines the maximum score for the questions in the quiz. This element can be overridden by the question maxscore attribute.
Atts: None
SubElements: None
AUTORESET
The AUTORESET element globally defines whether the question widget on a page should be automatically reset. The possible values for this element are: True (the question will be reset both on the entry and exit of the page), False (the question will never be reset), enterPage (the question will be reset on the entry of the page), leavePage (the question will be reset when the page is left) and ONCE (the question will be reset once). This element can be overridden by the question autoreset attribute.
Atts: None
SubElements: None
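Several of the quiz-level elements above (RANDOM, TRIES, MULTICORRECT, MAXSCORE, AUTORESET) follow the same pattern: a global value that an individual question can override with an attribute of the same name. The following Python sketch shows one way a reader of the converted XML could resolve the effective value. The sample markup, the upper-case attribute spellings and the function name effective_setting are assumptions made for illustration, since the converted structure is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of a quiz; element and attribute spellings
# are assumptions, not the actual converted format.
QUIZ_XML = """
<QUIZ>
  <PAGE NAME="p1" STYLE="QUEST">
    <QUESTION TRIES="1">First question</QUESTION>
    <QUESTION>Second question</QUESTION>
  </PAGE>
  <TRIES>3</TRIES>
  <MAXSCORE>10</MAXSCORE>
</QUIZ>
"""


def effective_setting(quiz, question, name):
    """Return the question's own attribute if set, else the quiz-level value."""
    value = question.get(name)
    if value is not None:
        return value
    # Falls back to the text of the global element (None if it is absent too).
    return quiz.findtext(name)


quiz = ET.fromstring(QUIZ_XML)
first, second = quiz.iter("QUESTION")
print(effective_setting(quiz, first, "TRIES"))      # 1  (overridden)
print(effective_setting(quiz, second, "TRIES"))     # 3  (global value)
print(effective_setting(quiz, second, "MAXSCORE"))  # 10 (global value)
```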
AUTO