ICS 125: XML
Overview
- Why was XML invented?
- What is XML?
- XML Concepts
- XML Example
- XML DTDs
- XML Parsing
- Related Standards
- XML Resources
Why was XML Invented?
- Before XML, every program had its own file format that was
totally unlike any other file format
- Since everything was totally different, there was no good way to
briefly describe the file formats
- Since everything was totally different, there was no good way to
build widely reusable utilities that work on them
- The format of network messages sent between programs was also
totally different for every program
What is XML?
- XML stands for eXtensible Markup Language
- XML is a standard way to represent any kind of data:
- The low-level syntax is always the same
- The higher-level syntactic structures are described in another file (a DTD)
- XML looks like HTML, but you can define your own tags in a DTD
- XML is about representing data, not at all about how it looks on the screen
- Since XML is standard and widely usable, people have built widely
usable tools and libraries that process XML: XML parsers are
available for most programming languages
- A DTD is like a grammar specification file
XML Concepts
- XML File: a data file, uses ASCII or other human readable characters
- DTD declaration: each XML file can name the DTD that defines its structure:
- <?xml version="1.0" encoding="ISO-8859-1"?>
- <!DOCTYPE note SYSTEM "note.dtd">
- Element:
- Elements have a name
- Start-tag and end-tag with something in between: <elementname>something...</elementname>
- Elements can be nested: <elementname><nestedelement>something...</nestedelement></elementname>
- Every element has an end-tag. An element with no content can
be written as <elementname></elementname> or as
<elementname />
- Attribute: a key-value pair within an opening tag. E.g.,
- <elementname attr="value" attr2="value" />
- CDATA: character data, i.e., text that is not markup. Can only occur between the start-tag and end-tag of elements that allow character data
XML Example
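As a minimal sketch of the concepts above, the snippet below embeds a small XML document in a Python string and reads it with the standard library's xml.etree.ElementTree. The "note" document, its element names, and the describe helper are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical "note" document; element and attribute names are illustrative.
NOTE_XML = """<?xml version="1.0"?>
<note priority="high">
  <to>Alice</to>
  <from>Bob</from>
  <body>Meeting at <time>10:00</time> tomorrow</body>
  <attachment />
</note>"""


def describe(xml_text):
    """Parse the document and summarize its elements and attributes."""
    root = ET.fromstring(xml_text)
    attachment = root.find("attachment")
    return {
        # Name of the outermost element
        "root": root.tag,
        # Attributes are key-value pairs from the start-tag
        "attributes": dict(root.attrib),
        # Direct child elements, in document order
        "children": [child.tag for child in root],
        # Character data of <body>, including text inside the nested <time>
        "body_text": "".join(root.find("body").itertext()),
        # <attachment /> has neither text nor child elements
        "attachment_empty": attachment.text is None and len(attachment) == 0,
    }


if __name__ == "__main__":
    for key, value in describe(NOTE_XML).items():
        print(key, "=", value)
```

Here `<body>` mixes character data with a nested `<time>` element, and `<attachment />` is the short form of an element with no content.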
XML DTDs (Document Type Definitions)
- A DTD is like a grammar that defines the legal format of an XML
file. The DTD tells a parser how to handle the XML file, and how to
determine if it is valid.
- Each XML file can start with a DOCTYPE line that refers to the
DTD file, or the DTD can be directly defined inside the XML file, or
an XML file may not specify a DTD.
- <!ELEMENT element-name category>
- <!ELEMENT element-name EMPTY>
- <!ELEMENT element-name ANY>
- <!ELEMENT element-name (ele1,ele2+,ele3*,ele4?,(ele5|ele6))>
- <!ELEMENT element-name (#PCDATA|ele1)*>
- PCDATA: Parsed character data; the parser scans it for markup and entity references
- CDATA: Non-parsed character data; any markup is treated as
literal characters
- <!ATTLIST element-name attribute-name attribute-type default-value>
- <!ATTLIST element-name attribute-name CDATA "default">
- <!ATTLIST element-name attribute-name (enum1|enum2|...) "enum1">
- <!ATTLIST element-name attribute-name ID #REQUIRED>
- <!ATTLIST element-name attribute-name IDREF #REQUIRED>
- <!ATTLIST element-name attribute-name CDATA #IMPLIED>
- <!ATTLIST element-name attribute-name CDATA #FIXED "default">
- A "well-formed" XML document is one that has matching tags,
quoted attributes, and fits general XML syntax rules.
- A "valid" XML document is one that follows the rules defined in
its DTD
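The difference between "well-formed" and "valid" can be sketched with Python's standard library, assuming a hypothetical note document that carries its DTD inline. The parser behind xml.etree.ElementTree is non-validating: it reads the DTD but only checks well-formedness, so a document that breaks its DTD is still accepted. Checking validity would need a validating parser, which this sketch does not cover.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with its DTD defined directly inside the XML file.
NOTE_WITH_DTD = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, from, body?)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST note priority (low|high) "low">
]>
<note priority="high">
  <to>Alice</to>
  <from>Bob</from>
</note>"""

# Well-formed but NOT valid: the DTD requires a <to> element, which is removed here.
MISSING_TO = NOTE_WITH_DTD.replace("<to>Alice</to>", "")


def is_well_formed(xml_text):
    """Return True if the text follows general XML syntax rules.

    This does not check the document against its DTD.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed(NOTE_WITH_DTD))                  # matching tags, quoted attributes
    print(is_well_formed("<note><to>Alice</note>"))       # mismatched tags
    print(is_well_formed("<note priority=high></note>"))  # unquoted attribute value
    print(is_well_formed(MISSING_TO))                     # invalid, yet still well-formed
```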
XML Parsing
- There are two main types of XML parsers
- SAX, SAX2: Parse an XML document and trigger events for your
application to process on-the-fly
- DOM: Parse an XML document and build a tree data structure in
memory that your application can process in any order
- XML Parsers can have validation turned on or off
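The two parsing styles can be compared with a short sketch using Python's standard xml.sax and xml.dom.minidom modules. The catalog document and the helper names (BookTitleHandler, titles_via_sax, titles_via_dom) are hypothetical. The SAX version reacts to events as they arrive, while the DOM version builds the whole tree first and is then free to visit nodes in any order (here, reversed).

```python
import xml.sax
from xml.dom import minidom

# Hypothetical catalog document used by both parsers.
CATALOG_XML = (
    "<catalog>"
    "<book id='1'>XML Basics</book>"
    "<book id='2'>DTD Guide</book>"
    "</catalog>"
)


class BookTitleHandler(xml.sax.ContentHandler):
    """SAX: collect book titles on the fly, without building a tree."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_book = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "book":
            self._in_book = True
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._in_book:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "book":
            self.titles.append("".join(self._chunks))
            self._in_book = False


def titles_via_sax(xml_text):
    handler = BookTitleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


def titles_via_dom(xml_text):
    # DOM: the full tree is in memory, so nodes can be visited in any order.
    doc = minidom.parseString(xml_text)
    books = doc.getElementsByTagName("book")
    return [book.firstChild.data for book in reversed(books)]


if __name__ == "__main__":
    print(titles_via_sax(CATALOG_XML))
    print(titles_via_dom(CATALOG_XML))
```

SAX tends to suit large documents that are processed once from start to finish. DOM costs more memory but allows random access to the tree.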
Related Standards
- XHTML: a more well-defined version of HTML
- XSL: Extensible Stylesheet Language; its XSLT part can transform one XML document into another
- XML Schema: an alternative to DTDs
- XPath: allows reference to specific tags in a document via a path
- XForms: replacement for HTML forms
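To give a feel for XPath, the sketch below uses the limited XPath subset supported by Python's xml.etree.ElementTree. It is not a full XPath implementation. The library document and the select_titles helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are illustrative.
LIBRARY_XML = """<library>
  <shelf name="A"><book id="1"><title>XML Basics</title></book></shelf>
  <shelf name="B"><book id="2"><title>DTD Guide</title></book></shelf>
</library>"""


def select_titles(xml_text, path):
    """Return the text of every element matched by a path relative to the root."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(path)]


if __name__ == "__main__":
    # Step down through child elements: shelf, then book, then title.
    print(select_titles(LIBRARY_XML, "shelf/book/title"))
    # Search at any depth and filter on an attribute value.
    print(select_titles(LIBRARY_XML, ".//book[@id='2']/title"))
```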
XML Resources