Saturday, May 21, 2011

XML: Recipe Document (part 2)

With the diagram now established, it is a lot easier to begin working on the schema document, the XSD file. A schema file describes what a document is expected to contain: for example, the names of particular tags and the values allowed in them. I'm a bit shaky on this part of the project, so I relied heavily on XML Schema Part 0: Primer, a guide to the basic syntax of an XSD document. So, check out the diagram from part 1, and then go on to the next paragraph.

The first thing to do is take a look at the tree diagram. A tree diagram starts at the root and spreads out into branches and leaves, just like a real tree. My tree has the root on the left side and the leaves on the right side. The root is the "Recipe" box. Now let's set up the XSD document.

First open a text editor and add the following to our document:
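A minimal sketch of that starting skeleton might look like the following. It assumes the `xsd:` prefix used in the Primer, and the documentation text is only a placeholder; the element and type definitions get added inside it afterwards.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- the xsd: prefix is bound to the W3C XML Schema namespace -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- human-readable notes about the schema, much like a comment -->
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Recipe schema (placeholder text): describes a recipe's metadata,
      ingredients, equipment, and procedure.
    </xsd:documentation>
  </xsd:annotation>

  <!-- element and type definitions go here -->

</xsd:schema>
```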


Now, what does this all say? The first tag declares that this is an XML Schema (an XSD file), and the URL it references identifies the W3C XML Schema namespace that all of the schema tags belong to. The next tag is an annotation containing a documentation tag. The documentation tag tells us the notes are written in English and gives us an idea of what to expect from the rest of the document; it works much like a comment. After that come the matching closing tags that close out the XSD. Since an XSD file is itself XML, every opening tag needs a matching closing tag, or the tag must be marked as empty by typing a '/' before the final angle bracket (as in <tag/>). OK, so that was easy. What next?

We see that the root is a recipe, and that a recipe has (or may contain) a metadata section, a list of ingredients, a list of equipment, and a procedure. Because the Recipe type and some of its children are built from several smaller parts, the recipe should be defined as a complexType. In XSD, any element that contains child elements or attributes needs a complex type. For people familiar with object-oriented programming, this is similar to defining a class, as opposed to using a simple built-in type. So, let's define the Recipe type and what it contains now.
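Here is one way the Recipe type could be written, as a sketch rather than a definitive listing. The element names (`recipe`, `metadata`, `ingredients`, `equipment`, `procedure`, and the hypothetical `item`) and the type names ending in `Type` are my assumptions. `MetadataType`, `IngredientType`, and `StepType` still need definitions of their own before the schema is complete.

```xml
<!-- the document's root element (all names here are assumed) -->
<xsd:element name="recipe" type="RecipeType"/>

<xsd:complexType name="RecipeType">
  <xsd:sequence>
    <!-- required: exactly one of each -->
    <xsd:element name="metadata" type="MetadataType" minOccurs="1" maxOccurs="1"/>
    <xsd:element name="ingredients" type="IngredientsType" minOccurs="1" maxOccurs="1"/>
    <!-- optional: may be left out entirely -->
    <xsd:element name="equipment" type="EquipmentType" minOccurs="0"/>
    <xsd:element name="procedure" type="ProcedureType" minOccurs="0"/>
  </xsd:sequence>
</xsd:complexType>

<!-- a list of one or more individual ingredient nodes -->
<xsd:complexType name="IngredientsType">
  <xsd:sequence>
    <xsd:element name="ingredient" type="IngredientType" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>

<!-- hypothetical: equipment as a list of plain-text items -->
<xsd:complexType name="EquipmentType">
  <xsd:sequence>
    <xsd:element name="item" type="xsd:string" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>

<!-- a procedure is one or more steps -->
<xsd:complexType name="ProcedureType">
  <xsd:sequence>
    <xsd:element name="step" type="StepType" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>
```

Since `minOccurs` and `maxOccurs` both default to 1, the explicit values on `metadata` and `ingredients` are redundant; they are spelled out only to make the "exactly one" rule visible.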


All right, this says that our document is going to have recipe tags, and that each recipe must contain exactly one metadata section and exactly one ingredients section, because both have a minimum of 1 and a maximum of 1. The metadata is a custom type, while Ingredients is a list of custom types: a sequence of individual Ingredient nodes. The equipment and procedure sections are optional, because some recipes don't require anything but the ingredients, and some recipes don't even have steps (raw foods, for example). Now let's define the Metadata type.
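A sketch of the Metadata type, assuming lowercase element names. The mealtime values listed are illustrative guesses, since the exact list is a design choice; the comments mark each assumption.

```xml
<xsd:complexType name="MetadataType">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <!-- zero or more authors -->
    <xsd:element name="author" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
    <!-- whole numbers from 1 upward -->
    <xsd:element name="servings" type="xsd:positiveInteger"/>
    <!-- only the listed values are allowed (this value list is assumed) -->
    <xsd:element name="mealtime">
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="breakfast"/>
          <xsd:enumeration value="lunch"/>
          <xsd:enumeration value="dinner"/>
          <xsd:enumeration value="dessert"/>
          <xsd:enumeration value="snack"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:element>
    <!-- ISO 8601 style duration, e.g. PT45M for 45 minutes -->
    <xsd:element name="preparationTime" type="xsd:duration"/>
    <!-- web address of a picture of the finished dish -->
    <xsd:element name="image" type="xsd:anyURI"/>
  </xsd:sequence>
</xsd:complexType>
```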


A few new things appear here. Name is defined as a string. In most languages, a string means a sequence of letters, numbers, and other characters, and it means the same in XML, so we can type a name like "Cherry Pie" as the name of the recipe. Author is also a string, but it can occur zero or more times (its maximum is unbounded), so you can add as many author tags as you like to a single recipe. Servings is defined as a positive integer, that is, any counting number starting at one. There are no decimal places, which some people might see as a weakness, because some recipes don't make a whole number of servings. Let's pretend they all do.

The mealtime value is a construct called an enumeration, meaning it can only be one of the specifically listed values. You might notice that this design does not let us have recipes for brunch items or for tea time. That is certainly a flaw, but there is a benefit in terms of searching and indexing the data. When someone creates a recipe, they might call it a "dinner" recipe. Someone else might call theirs a "supper" recipe. Semantically, they are the same thing to you and me, but it is very difficult for a computer to understand that they might be the same. If you later find the enumeration too limiting, you can always add an extra value for party-time recipes, if you so desire.

Preparation time is a duration type, which does exactly what it says: it records how long something takes. The last one, image, is a URI. We can use a URI to store the web address of an image of what the completed recipe might look like. Although it is possible to embed an image directly in the XML, that is not the best choice for my purposes.
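To make those types concrete, here is a hypothetical metadata section that a schema built this way should accept, assuming lowercase element names and an enumeration that includes `dinner`. The recipe, authors, and image URL are made up.

```xml
<metadata>
  <name>Chicken Pot Pie</name>
  <!-- author may repeat, or be left out entirely -->
  <author>Jane Doe</author>
  <author>John Doe</author>
  <servings>6</servings>
  <mealtime>dinner</mealtime>
  <!-- 1 hour and 30 minutes -->
  <preparationTime>PT1H30M</preparationTime>
  <image>http://example.com/images/chicken-pot-pie.jpg</image>
</metadata>
```

The `preparationTime` value uses the ISO 8601 duration syntax that `xsd:duration` expects, so `PT1H30M` reads as "a period of 1 hour and 30 minutes." A value like `90 minutes`, or a `servings` value of `2.5`, would be rejected by a validating parser.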

Next, we will define the ingredient node. An ingredient has a number (in case the order it is used in matters), a name, a quantity, a unit of measure, and a value indicating whether or not the ingredient is optional. The ingredient node definition looks like the following:
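A sketch of that Ingredient type, with the element names assumed. I used `xsd:all` rather than `xsd:sequence` so the fields can appear in whatever order reads naturally; that is my own choice, not something the design requires. Making `number`, `quantity`, `unit`, and `optional` skippable is also an assumption, to cover ingredients like "salt to taste."

```xml
<!-- mixed="true": free text may appear around the child elements -->
<xsd:complexType name="IngredientType" mixed="true">
  <!-- xsd:all: each field at most once, in any order (assumed design choice) -->
  <xsd:all>
    <!-- position in the recipe, if order matters -->
    <xsd:element name="number" type="xsd:positiveInteger" minOccurs="0"/>
    <xsd:element name="name" type="xsd:string"/>
    <!-- double: allows fractional amounts such as 0.5 -->
    <xsd:element name="quantity" type="xsd:double" minOccurs="0"/>
    <xsd:element name="unit" type="xsd:string" minOccurs="0"/>
    <!-- boolean: true/false or 1/0 -->
    <xsd:element name="optional" type="xsd:boolean" minOccurs="0"/>
  </xsd:all>
</xsd:complexType>
```

In XML Schema 1.0, each element inside `xsd:all` may appear at most once, which suits an ingredient, since it has only one name and one quantity.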


Interesting things to point out about the definition of an ingredient are the quantity and the optional fields. Quantity is a double, that is, a double-precision floating-point number, so unlike an integer it can hold fractional amounts such as 0.5 or 2.25. Optional is a boolean value, which means it can be either true or false, or, if you like, 1 or 0. Also, note that the Ingredient type has the attribute mixed="true", which means one can type free-flowing text around the fields.
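For example, assuming the field names `quantity`, `unit`, `name`, and `optional`, and a schema that lets those fields appear in any order, a mixed-content ingredient list could look like the following. The free text ("of", "sifted", "to taste") sits between the tagged fields, and the second boolean uses the `1` form. The ingredients themselves are made up.

```xml
<ingredients>
  <!-- text such as "of" and "sifted" is allowed between the fields -->
  <ingredient><quantity>2.5</quantity> <unit>cups</unit> of sifted <name>flour</name></ingredient>
  <!-- a boolean can be written as true/false or 1/0 -->
  <ingredient><name>salt</name> to taste <optional>1</optional></ingredient>
</ingredients>
```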

Finally, we define the Step type, which also needs the mixed attribute so that ordinary text can flow around the important parts of a step: its ingredients and temperatures. This gives us the following:
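A sketch of the Step type, assuming an ingredient type named `IngredientType` and a `temperature` element kept as a plain string (a simplification of mine; a stricter design might use a number plus a unit). The `xsd:choice` with `maxOccurs="unbounded"` lets ingredients and temperatures appear any number of times, in any order, between the text.

```xml
<!-- mixed="true": a step is mostly free text -->
<xsd:complexType name="StepType" mixed="true">
  <!-- any number of ingredients and temperatures, in any order -->
  <xsd:choice minOccurs="0" maxOccurs="unbounded">
    <!-- reuses the ingredient type (assumed to be named IngredientType) -->
    <xsd:element name="ingredient" type="IngredientType"/>
    <!-- plain text such as "350 F" (a simplifying assumption) -->
    <xsd:element name="temperature" type="xsd:string"/>
  </xsd:choice>
</xsd:complexType>
```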


No big surprises here, except that, interestingly, we can reuse the ingredient type defined earlier inside the free-form text. This demonstrates reusability, an important software engineering principle: design things so that they can be reused as often as possible. Reuse saves time and makes fixing problems easier, so it is nice to see that XSD files allow it.
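A hypothetical procedure shows the reuse in action: the same `ingredient` markup used in an ingredient list appears inline in the instructions. The element names here are assumptions, and the steps are made up.

```xml
<procedure>
  <step>Preheat the oven to <temperature>350 F</temperature>.</step>
  <!-- the ingredient element reuses the ingredient type inside free text -->
  <step>Whisk <ingredient><quantity>2</quantity> <name>eggs</name></ingredient>
    into the batter, then bake for 30 minutes.</step>
</procedure>
```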

Surprisingly, I never knew about the mixed attribute until reading the documentation on how to make an XSD file today. Cool stuff. Here is the complete, head-to-tail version of the XSD file now that it is completely written. Note that I fixed a few mistakes from the previous sections as I noticed things that were no longer needed, such as the ingredient number:
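Put together, a complete schema along these lines might look like the following. It is a sketch, not a definitive listing: it assumes lowercase element names, type names ending in `Type`, illustrative mealtime values, a plain-string `temperature`, a hypothetical `item` element for equipment, and `xsd:all` for the ingredient fields. In keeping with the cleanup described above, the ingredient number is gone.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Recipe schema: metadata, ingredients, equipment, and procedure.
    </xsd:documentation>
  </xsd:annotation>

  <!-- the document root -->
  <xsd:element name="recipe" type="RecipeType"/>

  <xsd:complexType name="RecipeType">
    <xsd:sequence>
      <xsd:element name="metadata" type="MetadataType"/>
      <xsd:element name="ingredients" type="IngredientsType"/>
      <xsd:element name="equipment" type="EquipmentType" minOccurs="0"/>
      <xsd:element name="procedure" type="ProcedureType" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="MetadataType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="author" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="servings" type="xsd:positiveInteger"/>
      <!-- assumed value list -->
      <xsd:element name="mealtime">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:enumeration value="breakfast"/>
            <xsd:enumeration value="lunch"/>
            <xsd:enumeration value="dinner"/>
            <xsd:enumeration value="dessert"/>
            <xsd:enumeration value="snack"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:element>
      <xsd:element name="preparationTime" type="xsd:duration"/>
      <xsd:element name="image" type="xsd:anyURI"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="IngredientsType">
    <xsd:sequence>
      <xsd:element name="ingredient" type="IngredientType" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- mixed: free text around the fields; xsd:all: fields in any order -->
  <xsd:complexType name="IngredientType" mixed="true">
    <xsd:all>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="quantity" type="xsd:double" minOccurs="0"/>
      <xsd:element name="unit" type="xsd:string" minOccurs="0"/>
      <xsd:element name="optional" type="xsd:boolean" minOccurs="0"/>
    </xsd:all>
  </xsd:complexType>

  <xsd:complexType name="EquipmentType">
    <xsd:sequence>
      <xsd:element name="item" type="xsd:string" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="ProcedureType">
    <xsd:sequence>
      <xsd:element name="step" type="StepType" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- free text with ingredients and temperatures marked up inline -->
  <xsd:complexType name="StepType" mixed="true">
    <xsd:choice minOccurs="0" maxOccurs="unbounded">
      <xsd:element name="ingredient" type="IngredientType"/>
      <xsd:element name="temperature" type="xsd:string"/>
    </xsd:choice>
  </xsd:complexType>

</xsd:schema>
```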


Whew, fun stuff.
