TinyXML is a simple, small, C++ XML parser that can be easily
integrated into other programs.

<h2> What it does. </h2>

In brief, TinyXML parses an XML document, and builds from that a
Document Object Model (DOM) that can be read, modified, and saved.

XML stands for "eXtensible Markup Language." It allows you to create
your own document markups. Where HTML does a very good job of marking
documents for browsers, XML allows you to define any kind of document
markup, for example a document that describes a "to do" list for an
organizer application. XML is a very structured and convenient format.
All those random file formats created to store application data can
all be replaced with XML. One parser for everything.

The best place for the complete, correct, and quite frankly hard to
read spec is at <a href="http://www.w3.org/TR/2004/REC-xml-20040204/">
http://www.w3.org/TR/2004/REC-xml-20040204/</a>. An intro to XML
(that I really like) can be found at
<a href="http://skew.org/xml/tutorial/">http://skew.org/xml/tutorial</a>.

There are different ways to access and interact with XML data.
TinyXML uses a Document Object Model (DOM), meaning the XML data is parsed
into C++ objects that can be browsed and manipulated, and then
written to disk or another output stream. You can also construct an XML document
from scratch with C++ objects and write this to disk or another output
stream.

TinyXML is designed to be easy and fast to learn. It is two headers
and four cpp files. Simply add these to your project and off you go.
There is an example file - xmltest.cpp - to get you started.

TinyXML is released under the zlib license,
so you can use it in open source or commercial code. The details
of the license are at the top of every source file.

TinyXML attempts to be a flexible parser, but with truly correct and
compliant XML output. TinyXML should compile on any reasonably
compliant C++ system. It does not rely on exceptions or RTTI. It can be
compiled with or without STL support. TinyXML fully supports
the UTF-8 encoding, and the first 64k character entities.
<h2> What it doesn't do. </h2>

TinyXML doesn't parse or use DTDs (Document Type Definitions) or XSL
(eXtensible Stylesheet Language). There are other parsers out there
(check out www.sourceforge.net, search for XML) that are much more fully
featured. But they are also much bigger, take longer to set up in
your project, have a higher learning curve, and often have a more
restrictive license. If you are working with browsers or have more
complete XML needs, TinyXML is not the parser for you.

The following DTD syntax, an !ELEMENT declaration inside a !DOCTYPE, will
not parse at this time in TinyXML:

	<!ELEMENT Comment (#PCDATA)>

because TinyXML sees this as a !DOCTYPE node with an illegally
embedded !ELEMENT node. This may be addressed in the future.

<h2> Tutorials. </h2>

For the impatient, here is a tutorial to get you going. It is a great way to
get started, but it is worth your time to read this (very short) manual completely.
<h2> Code Status. </h2>

TinyXML is mature, tested code. It is very stable. If you find
bugs, please file a bug report on the SourceForge web site
(www.sourceforge.net/projects/tinyxml). We'll get them straightened
out as soon as possible.

There are some areas for improvement; please check SourceForge if you are
interested in working on TinyXML.

<h2> Related Projects </h2>

TinyXML projects you may find useful! (Descriptions provided by the projects.)

<ul>
<li> <b>TinyXPath</b> (http://tinyxpath.sourceforge.net). TinyXPath is a small footprint
     XPath syntax decoder, written in C++.</li>
<li> <b>TinyXML++</b> (http://code.google.com/p/ticpp/). TinyXML++ is a completely new
     interface to TinyXML that uses MANY of the C++ strengths. Templates,
     exceptions, and much better error handling.</li>
</ul>
<h2> Features </h2>

<h3> Using STL </h3>

TinyXML can be compiled to use or not use STL. When using STL, TinyXML
uses the std::string class, and fully supports std::istream, std::ostream,
operator<<, and operator>>. Many API methods have both 'const char*' and
'const std::string&' forms.

When STL support is compiled out, no STL files are included whatsoever. All
the string classes are implemented by TinyXML itself. API methods
all use the 'const char*' form for input.

Use the compile-time #define:

	TIXML_USE_STL

to compile one version or the other. This can be passed to the compiler,
or set as the first line of "tinyxml.h".

Note: If compiling the test code in Linux, setting the environment
variable TINYXML_USE_STL=YES/NO will control STL compilation. In the
Windows project file, STL and non-STL targets are provided. In your project,
it's probably easiest to add the line "#define TIXML_USE_STL" as the first
line of tinyxml.h.
<h3> UTF-8 </h3>

TinyXML supports UTF-8, allowing you to manipulate XML files in any language. TinyXML
also supports "legacy mode" - the encoding used before UTF-8 support and
probably best described as "extended ascii".

Normally, TinyXML will try to detect the correct encoding and use it. However,
by setting the value of TIXML_DEFAULT_ENCODING in the header file, TinyXML
can be forced to always use one encoding.

TinyXML will assume Legacy Mode until one of the following occurs:
<ol>
	<li> If the non-standard but common "UTF-8 lead bytes" (0xef 0xbb 0xbf)
		 begin the file or data stream, TinyXML will read it as UTF-8. </li>
	<li> If the declaration tag is read, and it has an encoding="UTF-8", then
		 TinyXML will read it as UTF-8. </li>
	<li> If the declaration tag is read, and it has no encoding specified, then TinyXML will
		 read it as UTF-8. </li>
	<li> If the declaration tag is read, and it has an encoding="something else", then TinyXML
		 will read it as Legacy Mode. In legacy mode, TinyXML will work as it did before. It's
		 not clear what that mode does exactly, but old content should keep working.</li>
	<li> Until one of the above criteria is met, TinyXML runs in Legacy Mode.</li>
</ol>
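The detection rules above can be sketched as a small function. This is an illustration under stated assumptions, not TinyXML's parser code. `Encoding` and `detectEncoding` are hypothetical names. Only a declaration at the very start of the data is considered. The `encoding` value is assumed to be quoted, written without spaces around `=`, and compared case-insensitively.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical result type for this sketch (TinyXML's own values are
// TIXML_ENCODING_UTF8 and TIXML_ENCODING_LEGACY).
enum class Encoding { Legacy, Utf8 };

// Lower-case ASCII letters so "UTF-8" and "utf-8" compare equal.
std::string toLowerAscii(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Apply the rules in order: lead bytes, then the declaration, else legacy.
Encoding detectEncoding(const std::string& data) {
    // Rule 1: the UTF-8 lead bytes 0xEF 0xBB 0xBF begin the data.
    if (data.size() >= 3 &&
        static_cast<unsigned char>(data[0]) == 0xEF &&
        static_cast<unsigned char>(data[1]) == 0xBB &&
        static_cast<unsigned char>(data[2]) == 0xBF)
        return Encoding::Utf8;

    // No declaration at the start: no criterion is met (last rule).
    if (data.compare(0, 5, "<?xml") != 0)
        return Encoding::Legacy;

    // Look only inside the declaration, up to "?>" (or the end of the data).
    const std::string decl = data.substr(0, data.find("?>"));
    std::string::size_type pos = decl.find("encoding=");
    if (pos == std::string::npos)
        return Encoding::Utf8;              // Rule 3: no encoding specified.

    pos += 9;                               // skip past "encoding="
    if (pos >= decl.size() || (decl[pos] != '"' && decl[pos] != '\''))
        return Encoding::Legacy;            // unquoted value: treated as legacy here
    const std::string::size_type close = decl.find(decl[pos], pos + 1);
    if (close == std::string::npos)
        return Encoding::Legacy;            // unterminated value
    const std::string value = decl.substr(pos + 1, close - pos - 1);

    // Rule 2 (UTF-8) versus rule 4 (anything else).
    return toLowerAscii(value) == "utf-8" ? Encoding::Utf8 : Encoding::Legacy;
}
```

The order of the checks matters. A leading BOM wins before any declaration is examined, and data with neither a BOM nor a declaration stays in legacy mode.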
What happens if the encoding is incorrectly set or detected? TinyXML will try
to read and pass through text seen as improperly encoded. You may get some strange results or
mangled characters. You may want to force TinyXML to the correct mode.

You may force TinyXML to Legacy Mode by using LoadFile( TIXML_ENCODING_LEGACY ) or
LoadFile( filename, TIXML_ENCODING_LEGACY ). You may force it to use legacy mode all
the time by setting TIXML_DEFAULT_ENCODING = TIXML_ENCODING_LEGACY. Likewise, you may
force it to TIXML_ENCODING_UTF8 with the same technique.

For English users, using English XML, UTF-8 is the same as low-ASCII. You
don't need to be aware of UTF-8 or change your code in any way. You can think
of UTF-8 as a "superset" of ASCII.

UTF-8 is not a double byte format - but it is a standard encoding of Unicode!
TinyXML does not use or directly support wchar_t, TCHAR, or Microsoft's _UNICODE at this time.
It is common to see the term "Unicode" improperly refer to UTF-16, a wide byte encoding
of Unicode. This is a source of confusion.

For "high-ascii" languages - everything not English, pretty much - TinyXML can
handle all languages, at the same time, as long as the XML is encoded
in UTF-8. That can be a little tricky: older programs and operating systems
tend to use the "default" or "traditional" code page. Many apps (and almost all
modern ones) can output UTF-8, but older or stubborn (or just broken) ones
still output text in the default code page.

For example, Japanese systems traditionally use SHIFT-JIS encoding.
Text encoded as SHIFT-JIS cannot be read by TinyXML.
A good text editor can import SHIFT-JIS and then save as UTF-8.

The <a href="http://skew.org/xml/tutorial/">Skew.org link</a> does a great
job covering the encoding issue.

The test file "utf8test.xml" is an XML file containing English, Spanish, Russian,
and Simplified Chinese. (Hopefully they are translated correctly.) The file
"utf8test.gif" is a screen capture of the XML file, rendered in IE. Note that
if you don't have the correct fonts (Simplified Chinese or Russian) on your
system, you won't see output that matches the GIF file even if you can parse
it correctly. Also note that (at least on my Windows machine) console output
is in a Western code page, so that Print() or printf() cannot correctly display
the file. This is not a bug in TinyXML - just an OS issue. No data is lost or
destroyed by TinyXML. The console just doesn't render UTF-8.
<h3> Entities </h3>
TinyXML recognizes the pre-defined "character entities", meaning special
characters. Namely:

	&amp;	&
	&lt;	<
	&gt;	>
	&quot;	"
	&apos;	'

These are recognized when the XML document is read, and translated to their
UTF-8 equivalents. For instance, text with the XML of:

	Far &amp; Away

will have the Value() of "Far & Away" when queried from the TiXmlText object,
and will be written back to the XML stream/file with the ampersand escaped again
as &amp;. Older versions of TinyXML "preserved" character entities, but the newer
versions will translate them into characters.
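The read and write directions for the five predefined entities can be sketched with plain std::string code. The function names are hypothetical. Unknown entities are copied through untouched. Escaping all five characters on output is an assumption of this sketch, not a description of TinyXML's writer.

```cpp
#include <cassert>
#include <string>

// Read direction: replace the five predefined entities with their characters.
// Anything else starting with '&' is copied through unchanged.
std::string decodePredefinedEntities(const std::string& in) {
    struct Entity { const char* name; char ch; };
    static const Entity kEntities[] = {
        {"&amp;", '&'}, {"&lt;", '<'}, {"&gt;", '>'}, {"&quot;", '"'}, {"&apos;", '\''}};

    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size();) {
        bool matched = false;
        if (in[i] == '&') {
            for (const Entity& e : kEntities) {
                const std::string name(e.name);
                if (in.compare(i, name.size(), name) == 0) {
                    out += e.ch;           // emit the real character
                    i += name.size();      // skip the whole entity
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) out += in[i++];
    }
    return out;
}

// Write direction: escape characters that would otherwise be read as markup.
std::string encodePredefinedEntities(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;
        }
    }
    return out;
}
```

The round trip is the point. `Far &amp; Away` is held in memory as `Far & Away`, and it turns back into `Far &amp; Away` when it is written.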
Additionally, any character can be specified by its Unicode code point:
for example, "&#xA0;" and "&#160;" both refer to the non-breaking space character.
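A numeric character reference names a code point, and that code point then has to be stored as UTF-8 bytes. The sketch below shows both steps. `toUtf8` and `decodeCharRef` are hypothetical names. The sketch assumes the reference is passed in on its own, and it does not reject surrogates or values above U+10FFFF.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8 bytes.
// Code points below 0x80 come out as a single, unchanged ASCII byte.
std::string toUtf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Decode a reference such as "&#xA0;" (hex) or "&#160;" (decimal) to UTF-8.
// Returns an empty string if the text is not of that form.
std::string decodeCharRef(const std::string& ref) {
    if (ref.size() < 4 || ref.compare(0, 2, "&#") != 0 || ref.back() != ';')
        return "";
    const bool hex = (ref[2] == 'x' || ref[2] == 'X');
    const std::string digits = ref.substr(hex ? 3 : 2, ref.size() - (hex ? 4 : 3));
    if (digits.empty()) return "";
    std::uint32_t cp = 0;
    for (char c : digits) {
        int d;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (hex && c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (hex && c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return "";                    // not a valid digit for this base
        cp = cp * (hex ? 16 : 10) + d;
    }
    return toUtf8(cp);
}
```

Both references for the non-breaking space decode to the same two bytes, 0xC2 0xA0. Any code point below 0x80 encodes to one unchanged byte, which is why plain ASCII text is already valid UTF-8.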
<h3> Printing </h3>
TinyXML can print output in several different ways that all have strengths and limitations.

- Print( FILE* ). Output to a standard C stream, which includes all C files as well as stdout.
	- "Pretty prints", but you don't have control over printing options.
	- The output is streamed directly to the FILE object, so there is no memory overhead.
	- Used by Print() and SaveFile().

- operator<<. Output to a C++ stream.
	- Integrates with standard C++ iostreams.
	- Outputs in "network printing" mode without line breaks. Good for network transmission
	  and moving XML between C++ objects, but hard for a human to read.

- TiXmlPrinter. Output to a std::string or memory buffer.
	- API is less concise.
	- Future printing options will be put here.
	- Printing may change slightly in future versions as it is refined and expanded.
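The difference between "pretty" and "network" printing comes down to whether line breaks and indentation are emitted. The sketch below illustrates this on a hypothetical `Node` type with two hypothetical functions, `printPretty` and `printCondensed`. It is not TinyXML's printer. It ignores attributes and escaping, and the four-space indent is an arbitrary choice.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal element: a name, optional text, and child elements.
struct Node {
    std::string name;
    std::string text;               // printed only when there are no children
    std::vector<Node> children;
};

// Pretty output: one element per line, indented by depth.
void printPretty(std::ostream& os, const Node& n, int depth = 0) {
    const std::string indent(depth * 4, ' ');
    if (n.children.empty()) {
        os << indent << '<' << n.name << '>' << n.text << "</" << n.name << ">\n";
        return;
    }
    os << indent << '<' << n.name << ">\n";
    for (const Node& c : n.children) printPretty(os, c, depth + 1);
    os << indent << "</" << n.name << ">\n";
}

// Condensed ("network") output: no line breaks or indentation at all.
void printCondensed(std::ostream& os, const Node& n) {
    os << '<' << n.name << '>';
    if (n.children.empty()) os << n.text;
    for (const Node& c : n.children) printCondensed(os, c);
    os << "</" << n.name << '>';
}
```

The condensed form is smaller and easy to embed in other data. The pretty form costs extra white space but is much easier for a person to read.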
<h3> Streams </h3>
With TIXML_USE_STL on, TinyXML supports C++ streams (operator<< and operator>>) as well
as C (FILE*) streams. There are some differences that you may need to be aware of.

C style output:
	- based on FILE*
	- the Print() and SaveFile() methods

	Generates formatted output, with plenty of white space, intended to be as
	human-readable as possible. They are very fast, and tolerant of ill-formed
	XML documents. For example, an XML document that contains 2 root elements
	and 2 declarations will still print.

C style input:
	- the Parse() and LoadFile() methods

	A fast, tolerant read. Use whenever you don't need the C++ streams.

C++ style output:
	- based on std::ostream
	- operator<<

	Generates condensed output, intended for network transmission rather than
	readability. Depending on your system's implementation of the ostream class,
	these may be somewhat slower. (Or may not.) Not tolerant of ill-formed XML:
	a document should contain exactly one root element. Additional root level
	elements will not be streamed out.

C++ style input:
	- based on std::istream
	- operator>>

	Reads XML from a stream, making it useful for network transmission. The tricky
	part is knowing when the XML document is complete, since there will almost
	certainly be other data in the stream. TinyXML will assume the XML data is
	complete after it reads the root element. Put another way, documents that
	are ill-constructed with more than one root element will not read correctly.
	Also note that operator>> is somewhat slower than Parse, due to both
	implementation of the STL and limitations of TinyXML.
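Treating a streamed document as complete once its root element closes can be sketched as tag-depth counting. `readOneDocument` is a hypothetical helper, not TinyXML's operator>>. It assumes there are no comments, no CDATA sections, and no '>' characters inside attribute values. It also treats every `<?...?>` and `<!...>` tag as non-nesting.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Read characters from `in` until the first root element has been closed.
// Returns the consumed text; anything after the root stays in the stream.
std::string readOneDocument(std::istream& in) {
    std::string doc, tag;
    int depth = 0;
    bool inTag = false;
    char c;
    while (in.get(c)) {
        doc += c;
        if (!inTag) {
            if (c == '<') { inTag = true; tag = "<"; }
            continue;
        }
        tag += c;
        if (c != '>') continue;             // still inside the current tag
        inTag = false;
        if (tag[1] == '?' || tag[1] == '!') continue;   // declaration, DOCTYPE, ...
        if (tag[1] == '/') --depth;                     // end tag
        else if (tag[tag.size() - 2] != '/') ++depth;   // start tag (not self-closing)
        if (depth == 0) break;                          // root element is complete
    }
    return doc;
}
```

Reading stops at the root element's end tag and leaves everything after it in the stream for the next reader. As the paragraph above notes, a second root element would therefore not be read.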
<h3> White space </h3>
The world simply does not agree on whether white space should be kept, or condensed.
For example, pretend the '_' is a space, and look at "Hello____world". HTML, and
at least some XML parsers, will interpret this as "Hello_world". They condense white
space. Some XML parsers do not, and will leave it as "Hello____world". (Remember
to keep pretending the _ is a space.) Others suggest that __Hello___world__ should become
Hello___world.

It's an issue that hasn't been resolved to my satisfaction. TinyXML supports the
first two approaches. Call TiXmlBase::SetCondenseWhiteSpace( bool ) to set the desired behavior.
The default is to condense white space.

If you change the default, you should call TiXmlBase::SetCondenseWhiteSpace( bool )
before making any calls to parse XML data, and I don't recommend changing it after
it has been set.
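The "condense" approach can be sketched as collapsing each run of white space to a single space. `condenseWhiteSpace` is a hypothetical helper. The sketch does not trim leading or trailing white space, and whether TinyXML does so is not covered here.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Collapse every run of white space (spaces, tabs, newlines) into one space.
std::string condenseWhiteSpace(const std::string& in) {
    std::string out;
    bool inRun = false;
    for (char c : in) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            if (!inRun) out += ' ';         // first white space char of a run
            inRun = true;
        } else {
            out += c;
            inRun = false;
        }
    }
    return out;
}
```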
<h3> Handles </h3>

When browsing an XML document in a robust way, it is important to check
for null returns from method calls. An error-safe implementation can
generate a lot of code like:

	TiXmlElement* root = document.FirstChildElement( "Document" );
	if ( root ) {
		TiXmlElement* element = root->FirstChildElement( "Element" );
		if ( element ) {
			TiXmlElement* child = element->FirstChildElement( "Child" );
			if ( child ) {
				TiXmlElement* child2 = child->NextSiblingElement( "Child" );
				if ( child2 ) {
					// Finally do something useful.
				} } } }

Handles have been introduced to clean this up. Using the TiXmlHandle class,
the previous code reduces to:

	TiXmlHandle docHandle( &document );
	TiXmlElement* child2 = docHandle.FirstChild( "Document" ).FirstChild( "Element" ).Child( "Child", 1 ).ToElement();
	if ( child2 ) {
		// do something useful
	}

Which is much easier to deal with. See TiXmlHandle for more information.
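The idea behind TiXmlHandle can be sketched independently of TinyXML: it is a wrapper whose every navigation step tolerates a missing node. `Element` and `Handle` below are hypothetical types whose method names mirror the handle calls above. In this sketch `Child( name, index )` counts from 0, so index 1 selects the second matching child.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM element.
struct Element {
    std::string name;
    std::vector<Element> children;
};

// Null-safe wrapper: each step returns another Handle, and a missing node
// yields a Handle holding nullptr instead of dereferencing a null pointer.
class Handle {
public:
    explicit Handle(const Element* e) : e_(e) {}

    // The index-th child (0-based) whose name matches, or an empty Handle.
    Handle Child(const std::string& name, int index) const {
        if (!e_) return Handle(nullptr);
        for (const Element& c : e_->children)
            if (c.name == name && index-- == 0) return Handle(&c);
        return Handle(nullptr);
    }
    Handle FirstChild(const std::string& name) const { return Child(name, 0); }

    // Only this final result needs a null check.
    const Element* ToElement() const { return e_; }

private:
    const Element* e_;
};
```

Each step checks for null internally, so the caller writes one check at the end of the chain instead of one check per level.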
<h3> Row and Column tracking </h3>
Being able to track nodes and attributes back to their origin location
in source files can be very important for some applications. Additionally,
knowing where parsing errors occurred in the original source can be very
time-saving.

TinyXML can track the row and column origin of all nodes and attributes
in a text file. The TiXmlBase::Row() and TiXmlBase::Column() methods return
the origin of the node in the source text. The tab size used when counting
columns can be configured with TiXmlDocument::SetTabSize().
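Row and column tracking can be sketched as a scan from the start of the text up to a node's offset. `rowColumn` is a hypothetical helper. Returning 1-based values is an assumption, and so is the tab rule used here: a tab advances the column to the next multiple of the tab size. The helper counts bytes rather than UTF-8 characters and ignores '\r'.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Return the 1-based (row, column) of byte `offset` within `text`.
std::pair<int, int> rowColumn(const std::string& text, std::size_t offset, int tabSize = 4) {
    int row = 0, col = 0;                           // 0-based while scanning
    for (std::size_t i = 0; i < offset && i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\n') { ++row; col = 0; }          // new line: reset the column
        else if (c == '\t' && tabSize > 0)
            col = (col / tabSize + 1) * tabSize;    // jump to the next tab stop
        else ++col;
    }
    return {row + 1, col + 1};
}
```

With this rule, the same tab moves a node to column 5 with a tab size of 4 and to column 9 with a tab size of 8. That is why the tab size needs to match the editor used to view the file.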
<h2> Using and Installing </h2>

To Compile and Run xmltest:

A Linux Makefile and a Windows Visual C++ .dsw file are provided.
Simply compile and run. It will write the file demotest.xml to your
disk and generate output on the screen. It also tests walking the
DOM by printing out the number of nodes found using different
techniques.

The Linux makefile is very generic and runs on many systems - it
is currently tested on MinGW and Mac OS X. You do not need to run
'make depend'. The dependencies have been hard-coded.

<h3>Windows project file for VC6</h3>
<ul>
<li>tinyxml: tinyxml library, non-STL </li>
<li>tinyxmlSTL: tinyxml library, STL </li>
<li>tinyXmlTest: test app, non-STL </li>
<li>tinyXmlTestSTL: test app, STL </li>
</ul>

<h3>Makefile</h3>
At the top of the makefile you can set:

PROFILE, DEBUG, and TINYXML_USE_STL. Details (such as they are) are in
the makefile.

In the tinyxml directory, type "make clean" then "make". The executable
file 'xmltest' will be created.

<h3>To Use in an Application:</h3>

Add tinyxml.cpp, tinyxml.h, tinyxmlerror.cpp, tinyxmlparser.cpp, tinystr.cpp, and tinystr.h to your
project or make file. That's it! It should compile on any reasonably
compliant C++ system. You do not need to enable exceptions or
RTTI for TinyXML.
<h2> How TinyXML works. </h2>

An example is probably the best way to go. Take:

	<?xml version="1.0" standalone="no" ?>
	<!-- Our to do list data -->
	<ToDo>
		<Item priority="1"> Go to the <bold>Toy store!</bold></Item>
		<Item priority="2"> Do bills</Item>
	</ToDo>

It's not much of a To Do list, but it will do. To read this file
(say "demo.xml") you would create a document, and parse it in:

	TiXmlDocument doc( "demo.xml" );
	doc.LoadFile();

And it's ready to go. Now let's look at some lines and how they
relate to the DOM.

	<?xml version="1.0" standalone="no" ?>

The first line is a declaration, and gets turned into the
TiXmlDeclaration class. It will be the first child of the
document node.

This is the only directive/special tag parsed by TinyXML.
Generally directive tags are stored in TiXmlUnknown so the
commands won't be lost when it is saved back to disk.

	<!-- Our to do list data -->

A comment. It will become a TiXmlComment object.

	<ToDo>

The "ToDo" tag defines a TiXmlElement object. This one does not have
any attributes, but does contain 2 other elements.

	<Item priority="1">

Creates another TiXmlElement which is a child of the "ToDo" element.
This element has 1 attribute, with the name "priority" and the value
"1".

	Go to the

A TiXmlText. This is a leaf node and cannot contain other nodes.
It is a child of the "Item" TiXmlElement.

	<bold>

Another TiXmlElement, this one a child of the "Item" element.

Looking at the entire object tree, you end up with:

	TiXmlDocument                  "demo.xml"
	    TiXmlDeclaration           "version='1.0'" "standalone=no"
	    TiXmlComment               " Our to do list data"
	    TiXmlElement               "ToDo"
	        TiXmlElement           "Item" Attributes: priority = 1
	            TiXmlText          "Go to the "
	            TiXmlElement       "bold"
	                TiXmlText      "Toy store!"
	        TiXmlElement           "Item" Attributes: priority = 2
	            TiXmlText          "Do bills"
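The object tree above can be modeled with a single recursive node type to show how parent/child nesting works. `DomNode`, `dump`, and `countNodes` are hypothetical. TinyXML itself uses a separate class per node kind (TiXmlElement, TiXmlText, and so on). Counting nodes with a depth-first walk is one way to check a DOM, in the spirit of xmltest's node counts, but this is not xmltest's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the DOM node classes listed above.
struct DomNode {
    std::string kind;                 // e.g. "Element", "Text", "Comment"
    std::string value;                // element name, text content, comment text...
    std::vector<DomNode> children;
};

// Depth-first walk that renders one line per node, indented two spaces per level.
void dump(const DomNode& n, std::string& out, int depth = 0) {
    out += std::string(depth * 2, ' ') + n.kind + " \"" + n.value + "\"\n";
    for (const DomNode& c : n.children) dump(c, out, depth + 1);
}

// Count every node in the tree, including the root.
int countNodes(const DomNode& n) {
    int total = 1;
    for (const DomNode& c : n.children) total += countNodes(c);
    return total;
}
```

Built from the demo document, the tree holds 10 nodes, including the document node itself.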
<h2> Documentation </h2>

The documentation is built with Doxygen, using the 'dox'
configuration file.

<h2> License </h2>

TinyXML is released under the zlib license:

This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any
damages arising from the use of this software.

Permission is granted to anyone to use this software for any
purpose, including commercial applications, and to alter it and
redistribute it freely, subject to the following restrictions:

1. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product documentation
would be appreciated but is not required.

2. Altered source versions must be plainly marked as such, and
must not be misrepresented as being the original software.

3. This notice may not be removed or altered from any source
distribution.
<h2> References </h2>

The World Wide Web Consortium is the definitive standards body for
XML, and their web pages contain huge amounts of information.

The definitive spec: <a href="http://www.w3.org/TR/2004/REC-xml-20040204/">
http://www.w3.org/TR/2004/REC-xml-20040204/</a>

I also recommend "XML Pocket Reference" by Robert Eckstein and published by
O'Reilly, the book that got the whole thing started.

<h2> Contributors, Contacts, and a Brief History </h2>

Thanks very much to everyone who sends suggestions, bugs, ideas, and
encouragement. It all helps, and makes this project fun. A special thanks
to the contributors on the web pages that keep it lively.

So many people have sent in bugs and ideas that, rather than list them here,
we try to give credit where due in the "changes.txt" file.

TinyXML was originally written by Lee Thomason. (He is often the "I" still
in the documentation.) Lee reviews changes and releases new versions,
with the help of Yves Berquin, Andrew Ellerton, and the TinyXML community.

We appreciate your suggestions, and would love to know if you
use TinyXML. Hopefully you will enjoy it and find it useful.
Please post questions and comments, file bugs, or contact us at:

www.sourceforge.net/projects/tinyxml

Lee Thomason, Yves Berquin, Andrew Ellerton