Sunday, May 22, 2011

XML: Recipe Document (part 3)

Now that we know what kind of information belongs to a recipe, and we have an XSD schema, we can test whether the schema can actually represent a recipe. I saved the XSD from the previous post as recipe.xsd and put it in the same folder as the file we will work on today, first_recipe.xml. For this example, I decided to use toast as my recipe. Let's start with the basics: open up a text editor and add the following:


This tells our XML file that the recipe is going to follow the definition of a recipe given in the XSD schema file. Now that we have the root element and the definition, let's build up the metadata element of the toast recipe.


Nothing here is really surprising. I made sure to use a public domain image, just so everyone knows. All right, on to the next element. You will notice that my ingredients element looks different from the definition in the previous post. That is because I discovered some shortcomings in its design yesterday, so let's take a look at the new schema for an ingredient, and then the ingredients for toast:


Ok, notice the attribute element in the new definition. An attribute is a value written inside the opening tag itself instead of between the opening and closing tags. The attribute plays the same role as the optional element defined previously; it is the same idea, just expressed as an attribute. Also, notice that this design is no longer an xsd:sequence; it is now an xsd:all. A sequence forces the child elements to appear in a fixed order; all does not, but with all, an element may not repeat inside the complexType. That works for my purposes anyway, since every value has a maximum of 1 (and a minimum that was 0 and is now 1). So now let's look at the list of ingredients for toast:
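As an aside, here is a minimal Python sketch of what the attribute-versus-element distinction and the switch to xsd:all mean for a program reading an ingredient. The markup inside the strings is hypothetical: the element names (name, quantity, unitofmeasure) and the optional attribute are assumptions based on the description above, not the exact listing. The attribute is read from the opening tag, the children are looked up by name, and two different child orders produce the same result.

```python
import xml.etree.ElementTree as ET

# Hypothetical ingredient markup; element and attribute names are assumed.
BUTTER_A = (
    '<ingredient optional="true">'
    "<name>butter</name><quantity>1</quantity><unitofmeasure>pat</unitofmeasure>"
    "</ingredient>"
)
# Same ingredient with the children in a different order, which an
# xsd:all content model would also accept.
BUTTER_B = (
    '<ingredient optional="true">'
    "<unitofmeasure>pat</unitofmeasure><name>butter</name><quantity>1</quantity>"
    "</ingredient>"
)


def read_ingredient(xml_text):
    """Return the ingredient's fields as a dict, independent of child order."""
    node = ET.fromstring(xml_text)
    return {
        "optional": node.get("optional"),  # attribute: lives on the opening tag
        "name": node.findtext("name"),     # child elements: found by tag name
        "quantity": node.findtext("quantity"),
        "unit": node.findtext("unitofmeasure"),
    }


same_result = read_ingredient(BUTTER_A) == read_ingredient(BUTTER_B)
```

Because the lookup is by name, a reader written this way does not care which order the schema allows; the order only matters to the validator.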


Again, this is pretty straightforward: the recipe requires one slice of bread, and optionally, you may have some butter. The optional tag is itself optional, so we could omit it from the bread, for example, and assume that its absence means the bread is not optional. Alternatively, we might omit it from the butter and assume that its absence implies the butter is optional. In my opinion, since most ingredients are not optional in most recipes, the assumption should be that an ingredient without the tag is required. Next is the tiny list of equipment this recipe requires:
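As an aside, the "required unless marked otherwise" convention could be sketched in Python like this. The helper name is_optional is made up for illustration, and the sketch assumes the optional attribute carries an XSD boolean value (true, false, 1, or 0), which the text above does not confirm.

```python
import xml.etree.ElementTree as ET


def is_optional(ingredient):
    """Treat a missing 'optional' attribute as 'required' (assumed convention)."""
    # Assumes the attribute holds an xsd:boolean lexical value.
    value = ingredient.get("optional", "false").strip()
    return value in ("true", "1")


# Hypothetical ingredient elements, trimmed to what the helper needs.
bread = ET.fromstring("<ingredient><name>bread</name></ingredient>")
butter = ET.fromstring('<ingredient optional="true"><name>butter</name></ingredient>')
```

With this default, only the unusual case (an optional ingredient) has to be written out in the document.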


Yep, a toaster. Nothing extraordinary to point out about that. Next, we need to look at the procedure node, which itself remains unchanged. The definition of the step node has changed, though, so let's look at the schema for the step node:


Notice that I turned number and image into attributes. Also notice that I changed the type from sequence to choice, and that the choice has a minimum and maximum occurrence. This means I can have as many occurrences as I want: first ingredients, then temperatures, followed again by more of either if I choose. Going back to the discussion on context free grammars and languages from before, we could represent this situation by saying that we want our language to have the following tokens and the following rules:

Token         Can Become
STEP          CHOICE
CHOICE        CHOICE CHOICE | INGREDIENT | TEMPERATURE | T
INGREDIENT    NAME QTY UNIT | NAME UNIT QTY | QTY NAME UNIT | QTY UNIT NAME | UNIT NAME QTY | UNIT QTY NAME
NAME          T name T
QTY           T quantity T
UNIT          T unitofmeasure T
TEMPERATURE   unsigned integer
T             any string, including the empty string

In this example, all terminating tokens are written in lowercase, and any expandable tokens are written in uppercase, and it can be shown now that any reasonable ordering of elements is possible, such as saying "Set oven to <temperature>300</temperature> degrees and prepare <ingredient>[...]</ingredient>," or "Put <ingredient>[...]</ingredient> into oven when it pre-heats to <temperature>300</temperature> degrees," or indeed any arbitrary ordering and quantity of either set of elements. Now, let's get back to looking at the procedure element:
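As an aside on the grammar, here is a small Python sketch of how a program might read a step whose text is interleaved with temperature and ingredient elements. The step markup is hypothetical, built from the token names in the table rather than copied from the actual recipe, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical step with free text (T) around the tagged chunks.
STEP = (
    '<step number="1">Set oven to <temperature>300</temperature> degrees and prepare '
    "<ingredient><quantity>1</quantity> <unitofmeasure>slice</unitofmeasure> of "
    "<name>bread</name></ingredient>.</step>"
)


def step_to_text(step):
    """Flatten the step to the sentence a person would read."""
    # itertext() yields every piece of text in document order,
    # including the text between and after child elements.
    return "".join(step.itertext())


def step_choices(step):
    """List the direct INGREDIENT/TEMPERATURE children in document order."""
    return [child.tag for child in step]


step = ET.fromstring(STEP)
sentence = step_to_text(step)
choices = step_choices(step)
```

The same step therefore serves two readers: a person gets the flattened sentence, and a program gets the tagged chunks in whatever order the author wrote them.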


Notice that the optional tag, which is again optional, was only used on step number 5; the other steps are implied to be mandatory for this recipe, so it isn't necessary to include the optional tag on each of them. Also, most of these steps seem easy to read and write, save for the first step, which appears off-puttingly verbose. Realize this: for almost all practical purposes, XML is machine generated. Right now, we are testing the flexibility of the design, so we are more hands-on than is usually necessary. It is not easy right now to type recipe steps that include several ingredients (consider what it would look like in a recipe that said "mix flour, eggs, milk, and sugar in a bowl"). But this inconvenience is not a concern because, as said earlier, a program generally produces the XML document for people, and these fields are important for demarcating important chunks of information. Say, for example, you would like to write a program that converts a recipe with an arbitrary number of servings into a recipe for specifically 1 or 2 servings. Because the computer immediately knows where to look for quantities in the procedure, writing this software will be a snap. Or you could use the same fields to convert between metric and imperial measurements, and that is just the tip of the iceberg.
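To make the servings idea concrete, here is a rough Python sketch. It assumes every quantity element holds a plain number or fraction that Python's Fraction type can parse, and it uses a cut-down, hypothetical procedure fragment rather than the real recipe; scale_quantities is a name invented for this example.

```python
import xml.etree.ElementTree as ET
from fractions import Fraction

# Hypothetical fragment of a procedure written for 4 servings.
PROCEDURE = (
    "<procedure><step>Toast <ingredient><quantity>4</quantity> "
    "<unitofmeasure>slices</unitofmeasure> of <name>bread</name>"
    "</ingredient>.</step></procedure>"
)


def scale_quantities(root, factor):
    """Multiply every <quantity> under root by factor, in place."""
    for qty in root.iter("quantity"):
        # Fraction keeps results such as 1/2 exact instead of 0.5.
        qty.text = str(Fraction(qty.text.strip()) * factor)
    return root


# Going from 4 servings down to 1 means scaling by 1/4.
scaled = scale_quantities(ET.fromstring(PROCEDURE), Fraction(1, 4))
scaled_text = "".join(scaled.itertext())
```

Nothing in the sketch has to guess which numbers in the sentence are quantities; the tags already say so. A unit converter could be built the same way by also reading unitofmeasure.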

Aside from that, now that we have completely written the recipe for toast, let's take a look at the complete xml document for a toast recipe:


If you are using a text editor with special features, you can probably validate the XML document inside your editor. Try using Notepad++ and downloading XML Tools from the built-in plugin manager. Then you can choose Plugins | XML Tools | Validate Now, and it will tell you if the recipe has any format errors. There are other tools that can work as well. For reference, the XSD after today's modifications is as follows:


Incremental improvements are a normal part of the development process, much in the same way that every now and again I discover a better way to organize the schema. Now to take a break.
