An Accessible Introduction to XML
By Sue Smith
XML is one of the most commonly used technologies in web and software development. In this article, I will introduce XML in principle and in practice, giving you a sound grasp of the basics. We will explore what XML is, how it is used and what it is used for. XML is an accessible topic even for people with little or no experience in programming or coding in general. If you have any knowledge of HTML, you will have some familiarity with what we cover here. (More on how HTML and XML are related in a moment.) Along the way, we will illuminate some of the terminology you will come across when dealing with XML.
XML is a markup language for modeling structured data. You'll see what this means as we work through some examples. For the moment, the key principle to understand is that XML is for storing data. Although it is a markup language like HTML, the content of an XML document is not a webpage; it is data, just like the content of a database. XML is therefore typically combined with other technologies before being built into something that a user would see and interact with, such as a webpage or a screen in an application user interface.
XML documents hold items of data in markup structures, as in the following example element:
<book>Pride and Prejudice</book>
The data item in this element is the string of text "Pride and Prejudice". The element marks this data item between an opening tag and a closing tag. The closing tag is the same as the opening tag except for the forward slash before the element name. An element name must start with a letter or an underscore and can then contain letters, digits, hyphens, underscores and periods; it cannot contain spaces. For an XML data store to be used successfully, it must adhere to a set of basic syntactic rules, which we will touch on later.
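To see how a program views this element, here is a minimal sketch using Python's standard-library xml.etree.ElementTree module. The choice of Python and the variable names are assumptions made for illustration; any language with an XML parser would work in a similar way.

```python
import xml.etree.ElementTree as ET

# Parse the single-element document shown above.
book = ET.fromstring("<book>Pride and Prejudice</book>")

element_name = book.tag  # the name shared by the opening and closing tags
data_item = book.text    # the text held between the two tags

print(element_name, "->", data_item)
```

The parser separates the element name from the data item, which is the same distinction between markup and content described in the text.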
If you have worked with HTML before, the element structure should look familiar. In fact, XHTML is a version of HTML that is written as XML:
<div>Here is the content of a webpage element.</div>
In this case, the data is the content of the div element, which a web browser will interpret as a section of the visible page. In HTML, the data items in the elements are the various parts of the webpage, such as text, images etc. In XML, the data items may be used in many different ways, depending on the application context.
XML is called "extensible" markup language because application developers are free to define their own tags, elements, structures and even new markup languages. If you consider HTML, it uses a finite number of predetermined tags, which developers cannot deviate from. With XML, you can design the tags to suit the needs of your project. You can optionally create a set of rules regarding the structures you want the XML data in your project to use – we will look at this later.
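As a rough illustration of defining your own tags, the sketch below builds a small document with Python's standard-library xml.etree.ElementTree module. The tag names reading_list and book, and the second book title, are invented for this example; XML itself does not predefine them.

```python
import xml.etree.ElementTree as ET

# These tag names are our own invention; nothing in XML reserves or requires them.
reading_list = ET.Element("reading_list")
for title in ("Pride and Prejudice", "Emma"):
    book = ET.SubElement(reading_list, "book")  # nest a new element inside reading_list
    book.text = title

# Serialize the structure back to XML markup as a string.
markup = ET.tostring(reading_list, encoding="unicode")
print(markup)
```

Swapping in different names is just a matter of changing the strings, which is the sense in which the markup is extensible.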
The primary structure in an XML document is an element. As we have seen, an element contains data. If, as in this example, the element only contains a single item of data such as a text string, it is described as simple:
<employee>Mary Smith</employee>
This could appear in the payroll records for a company. We will use several examples of the possible ways to model such records in XML throughout this article.
An element in XML can also contain other elements:
<department> <employee>Mary Smith</employee> <employee>John Mitchell</employee> </department>
The department element contains two employee elements. In this case department is the parent element, while each employee is a child element. Both employee elements are also considered to be descendants of the department element.
The child elements could also contain additional child elements of their own:
<department>
  <employee>
    <first_name>Mary</first_name>
    <last_name>Smith</last_name>
  </employee>
  <employee>
    <first_name>John</first_name>
    <last_name>Mitchell</last_name>
  </employee>
</department>
The important point to grasp here is that an element in XML can have one or more children, but only one immediate parent. This is why XML is said to be tree–structured. As with database development, it is up to the developer to decide how best to reflect the data using the structures available. However, with XML it can be easier to extend the structures over time without having to rewrite the whole system.
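The tree structure can be explored in code. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree module; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

department = ET.fromstring("""
<department>
  <employee>
    <first_name>Mary</first_name>
    <last_name>Smith</last_name>
  </employee>
  <employee>
    <first_name>John</first_name>
    <last_name>Mitchell</last_name>
  </employee>
</department>
""")

# Iterating over an element yields its direct children only.
children = [child.tag for child in department]

# iter() walks every element at any depth; it yields the starting element first,
# so we filter that one out to keep descendants only.
descendants = [el.tag for el in department.iter() if el is not department]

# ElementTree elements do not store a link to their parent,
# so build a child -> parent lookup by walking the tree once.
parent_of = {child: parent for parent in department.iter() for child in parent}

first_name = department.find("employee/first_name")
print(children)
print(len(descendants))
print(parent_of[first_name].tag)
```

Each element appears exactly once as a key in parent_of, which reflects the rule that an element has only one immediate parent.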
There is one more key feature in XML markup – the attribute. Any element in XML can have one or more attributes added to its opening tag:
<employee type="permanent"> <first_name>Mary</first_name> <last_name>Smith</last_name> </employee>
Attributes are used to provide additional data for an element. It is again up to the developer to decide whether a particular data item should be stored as an element or an attribute. For example, the above code could alternatively be represented as follows:
<employee> <type>permanent</type> <first_name>Mary</first_name> <last_name>Smith</last_name> </employee>
When an element contains child elements or attributes (or both) it is described as a complex element.
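The two ways of storing the employee type shown above lead to slightly different code when the data is read back. This is a small sketch, assuming Python's standard-library xml.etree.ElementTree module, with variable names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Version 1: the type is stored as an attribute on the opening tag.
with_attribute = ET.fromstring(
    '<employee type="permanent"><first_name>Mary</first_name>'
    "<last_name>Smith</last_name></employee>"
)

# Version 2: the type is stored as a child element.
with_child = ET.fromstring(
    "<employee><type>permanent</type><first_name>Mary</first_name>"
    "<last_name>Smith</last_name></employee>"
)

type_from_attribute = with_attribute.get("type")  # read an attribute value
type_from_child = with_child.findtext("type")     # read a child element's text

print(type_from_attribute, type_from_child)
```

Both approaches recover the same value, so the choice mostly comes down to how you expect the data to grow: a child element can later gain children or attributes of its own, while an attribute can only ever hold a single text value.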
This freedom to use whatever structures you decide are best for your data is a vital aspect of XML usage. As long as an XML document abides by the general syntax rules, the elements and attributes can be chosen freely and the results will be usable with any technology that can handle XML (almost every programming language). In fact, the element and attribute names are themselves part of the data - specifically the metadata. Metadata describes the data being stored, i.e. the content of the elements and attributes.
The basic syntax rules for XML are pretty straightforward:
- Elements must be closed and nested properly
- Element and attribute names are case sensitive
- Attribute values must appear between quotes.
Here is an example of some XML that is syntactically incorrect as it breaks all of these rules – see if you can spot four errors:
<employee type=permanent> <first_name> Mary <first_name> <last_name> Smith </Employee> </last_name>
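A parser will refuse markup that breaks these rules. The sketch below, assuming Python's standard-library xml.etree.ElementTree module, checks the broken example above against one possible corrected version. The helper name is_well_formed is hypothetical, and the corrected markup is only one way of repairing the errors.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if the parser accepts the markup, False if it raises a syntax error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

# The incorrect example from the text, copied as written.
broken = (
    "<employee type=permanent> <first_name> Mary <first_name> "
    "<last_name> Smith </Employee> </last_name>"
)

# One possible repair: quoted attribute value, closed first_name,
# matching case on the closing tag, and properly nested closing tags.
fixed = (
    '<employee type="permanent"> <first_name>Mary</first_name> '
    "<last_name>Smith</last_name> </employee>"
)

print(is_well_formed(broken))
print(is_well_formed(fixed))
```

Note that the parser stops at the first error it meets, so a check like this tells you whether a document is acceptable but not how many problems it contains.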
Before we move on, a final rule to note is that an XML element can be self–closing as in the following alternative representation of the "employee" element:
<employee type="permanent" first_name="Mary" last_name="Smith" />
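The /> at the end of the opening tag acts as the closing tag, which means that no separate closing tag is needed. A self-closing element has no content between tags, so any data it carries must be stored in its attributes.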
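To a parser, a self-closing element is the same as an element with an opening tag, no content, and a closing tag. A minimal sketch, assuming Python's standard-library xml.etree.ElementTree module:

```python
import xml.etree.ElementTree as ET

self_closing = ET.fromstring(
    '<employee type="permanent" first_name="Mary" last_name="Smith" />'
)
explicit = ET.fromstring(
    '<employee type="permanent" first_name="Mary" last_name="Smith"></employee>'
)

# Both forms carry the same attributes...
same_attributes = self_closing.attrib == explicit.attrib

# ...and the self-closing form has no child elements and no text content.
has_no_children = len(self_closing) == 0
has_no_text = self_closing.text is None

print(same_attributes, has_no_children, has_no_text)
print(self_closing.get("first_name"), self_closing.get("last_name"))
```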
The basic building blocks in an XML document are not complicated (as you have seen), but there are a few final points to note. First, the document must have a root element, which is company in the following example:
<company>
  <department name="Human Resources">
    <employee type="permanent">
      <first_name>Mary</first_name>
      <last_name>Smith</last_name>
    </employee>
    <employee type="permanent">
      <first_name>John</first_name>
      <last_name>Mitchell</last_name>
    </employee>
    <employee>
      <first_name type="contract">John</first_name>
      <last_name>Mitchell</last_name>
    </employee>
  </department>
  <department name="Sales">
    <employee type="contract">
      <first_name>Mary</first_name>
      <last_name>Smith</last_name>
    </employee>
    <employee type="permanent">
      <first_name>John</first_name>
      <last_name>Mitchell</last_name>
    </employee>
  </department>
</company>
The root element must appear only once in the document, containing all of the other elements inside it. As we've mentioned, every element other than the root has a single parent element. Elements with the same parent are said to be siblings. The two department elements above are siblings.
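The root and sibling relationships can be checked in code. This sketch assumes Python's standard-library xml.etree.ElementTree module and uses a shortened version of the company document to keep the example small; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# A shortened version of the company document, for illustration.
company = ET.fromstring("""
<company>
  <department name="Human Resources">
    <employee type="permanent"><first_name>Mary</first_name><last_name>Smith</last_name></employee>
  </department>
  <department name="Sales">
    <employee type="contract"><first_name>Mary</first_name><last_name>Smith</last_name></employee>
    <employee type="permanent"><first_name>John</first_name><last_name>Mitchell</last_name></employee>
  </department>
</company>
""")

root_name = company.tag

# The department elements share the same parent, so they are siblings.
sibling_departments = [d.get("name") for d in company.findall("department")]

# Search every level of the tree for employees with a given attribute value.
permanent_staff = company.findall(".//employee[@type='permanent']")

# A document with two top-level elements has no single root, so parsing fails.
try:
    ET.fromstring("<company></company><company></company>")
    two_roots_accepted = True
except ET.ParseError:
    two_roots_accepted = False

print(root_name, sibling_departments, len(permanent_staff), two_roots_accepted)
```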
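XML documents can optionally begin with an XML declaration, which is not required but can be helpful in certain contexts, for example to state the character encoding:
<?xml version="1.0" encoding="UTF-8"?>
This special construct, known as the XML declaration, is not an element, so it does not need to be closed. If it is present, it must be the very first thing in the document.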
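Many XML libraries can write the declaration for you. The following sketch assumes Python 3.8 or later, where the standard-library function xml.etree.ElementTree.tostring accepts an xml_declaration argument; the book element is reused from earlier in the article.

```python
import xml.etree.ElementTree as ET

book = ET.Element("book")
book.text = "Pride and Prejudice"

# xml_declaration=True asks the serializer to put the declaration before the root element.
# With a byte encoding such as "utf-8", the result is a bytes object.
serialized = ET.tostring(book, encoding="utf-8", xml_declaration=True)
print(serialized.decode("utf-8"))

# A parser reads the declaration and still sees a single root element.
parsed = ET.fromstring(serialized)
print(parsed.tag, parsed.text)
```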
You may have come across the term "semantic" with reference to web technologies such as XML and HTML5. This is a topic many people find confusing, partly because it is interpreted in so many different ways. However, in essence the semantic idea is not a complex one – we have in fact already touched on the principle in this article. Remember that the element names in the payroll XML described the data that each element held. This is because the names we picked are meaningful: "employee" and "department". XML provides the ability to build a level of meaning into your code structures – this is why it is described as semantic.
While the basic items in an XML document are relatively simple, the code can become pretty challenging to read and work with when multiple complex elements are combined into large structures. For this reason, it's worth considering validation before attempting to use any XML documents you have created. There are many free online and software-based validators available, including the W3C XML Schema Validator.
With validation you can check that your markup is well formed, meaning that it follows the general XML syntax rules, or alternatively check that it is valid, meaning that it also meets more specific requirements for your project. Such requirements are known as a schema and are laid out using an XML Schema Definition (XSD) or a Document Type Definition (DTD).
(If you've used HTML you may have come across DTDs before, in the form of the DocType definition. If you aren't familiar with what this critical part of your HTML document is all about, you should check out our Introduction to DocTypes.)
Using an XML Schema, you can define which elements, attributes and structures are permissible within compliant XML documents. This can be useful when XML is used in conjunction with other applications or within a single application with lots of different components. XSDs are themselves written in XML, so you can create them using the same syntax and tools as the documents they describe.
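To give a feel for the kind of rules a schema expresses, here is a hand-written stand-in in Python. It is not XSD or DTD validation (Python's standard library does not include an XSD validator), and the rules, the names ALLOWED_TYPES and REQUIRED_CHILDREN, and the function check_employee are all assumptions invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for the payroll data; a real project would state these in a schema.
ALLOWED_TYPES = {"permanent", "contract"}
REQUIRED_CHILDREN = ("first_name", "last_name")

def check_employee(employee):
    """Return a list of rule violations for one employee element (empty if none)."""
    problems = []
    if employee.get("type") not in ALLOWED_TYPES:
        problems.append("missing or unknown type attribute")
    for name in REQUIRED_CHILDREN:
        if employee.find(name) is None:
            problems.append("missing <" + name + "> child")
    return problems

good = ET.fromstring(
    '<employee type="permanent"><first_name>Mary</first_name>'
    "<last_name>Smith</last_name></employee>"
)
bad = ET.fromstring("<employee><first_name>John</first_name></employee>")

print(check_employee(good))
print(check_employee(bad))
```

Both documents here are well formed, but only the first meets the project-specific rules. A schema lets you state such rules declaratively, so that a standard validator can enforce them instead of custom code like this.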
One of the most common web applications for XML is in RSS (Really Simple Syndication) feeds. These feeds, commonly provided by blogs and news sites, provide an XML representation of recently-posted content. RSS, like XHTML, is just a special type of XML with a pre-defined set of tags. In a future article, we'll show how you can use jQuery's simple XML tools to display content from an RSS feed.
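Because an RSS feed is ordinary XML, any XML parser can read it. The sketch below uses Python's standard-library xml.etree.ElementTree module rather than jQuery, and the feed itself is made up for illustration: the titles and example.com links are placeholders, not a real feed.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed; the titles and URLs are placeholders.
feed = ET.fromstring("""
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <description>Recent posts</description>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>
""")

# Each recently posted entry is an item element inside the channel.
titles = [item.findtext("title") for item in feed.iter("item")]
links = [item.findtext("link") for item in feed.iter("item")]

for title, link in zip(titles, links):
    print(title, "->", link)
```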
If you plan on learning either web development or programming for desktop environments, XML familiarity is a good tool to have in your box. XML data is quickly and easily built, does not in itself require programming skills and is readily transferred between applications – which is possible because of the shared set of structural rules that are understood in each different context. All of this combined with the ability to define your own structures in XML makes a data storage and transfer model that is useful for a wide range of projects.