Example of Reading XML Files into R

I’m learning how to get and clean data, and this is my first time playing with XML files in R. In the following code, I succeeded in downloading an XML file, parsing it, accessing its variables and nodes, and converting it to a list and a data frame.

At first, I hoped to pass the URL of the sample file directly to the xmlTreeParse() function, but, unfortunately, I encountered some sort of network error. Instead, I had to download the XML file to my computer and then parse the local file.
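Another way to keep the network step separate from the parsing step is to get the XML content into a character string and hand that string to the parser. The snippet below is only a sketch under assumptions, not something from my session: the inline string is a cut-down, two-item copy of the menu, and the names `xml_text`, `doc_from_text`, and `root_from_text` are made up for illustration. It uses `xmlParse()` with `asText = TRUE` from the XML package.

```r
library(XML)

# A cut-down copy of the breakfast menu kept in a character string.
# (Illustrative data only; the real file has five <food> entries.)
xml_text <- paste0(
  "<breakfast_menu>",
  "<food><name>Belgian Waffles</name><price>$5.95</price><calories>650</calories></food>",
  "<food><name>French Toast</name><price>$4.50</price><calories>600</calories></food>",
  "</breakfast_menu>"
)

# asText = TRUE tells the parser that the argument is XML content,
# not the name of a file or a URL.
doc_from_text <- xmlParse(xml_text, asText = TRUE)
root_from_text <- xmlRoot(doc_from_text)

print(xmlName(root_from_text))
print(xpathSApply(doc_from_text, "//name", xmlValue))
```

Once the document is parsed this way, the rest of the functions used in this post should apply to it just as they do to a parsed local file.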

Some of the commands are experimental, and the resulting output is useless. Just imagine you’re me, learning a new language, and with every new function, you ask yourself, “What would happen if I typed a command like this?”

 

> library(XML)
> fileURL <- "http://www.w3schools.com/xml/simple.xml"
> fileURL
[1] "http://www.w3schools.com/xml/simple.xml"
> dir.create("sampledata")
> download.file(fileURL, "sampledata/breakfastmenu.xml")
trying URL ‘http://www.w3schools.com/xml/simple.xml’
Content type ‘text/xml’ length 1152 bytes
==================================================
downloaded 1152 bytes
> doc <- xmlInternalTreeParse("sampledata/breakfastmenu.xml")
> rootNode <- xmlRoot(doc)
> xmlName(rootNode)
[1] "breakfast_menu"
> names(rootNode)
  food   food   food   food   food
"food" "food" "food" "food" "food"
> names(doc)
NULL
> xmlName(doc)
Error in UseMethod("xmlName", node) :
  no applicable method for 'xmlName' applied to an object of class "c('XMLInternalDocument', 'XMLAbstractDocument')"
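The error above comes from calling a node function on the whole document object rather than on a node. As a small sketch of one way around it, the hypothetical helper `root_name()` below (my own name, not part of the XML package) checks whether it was given a document and, if so, steps down to the root node with `xmlRoot()` before calling `xmlName()`. The inline XML is a shortened stand-in for the downloaded file.

```r
library(XML)

# Shortened stand-in for the downloaded menu (illustrative data only).
menu_xml <- paste0(
  "<breakfast_menu>",
  "<food><name>Belgian Waffles</name><price>$5.95</price></food>",
  "</breakfast_menu>"
)
doc <- xmlParse(menu_xml, asText = TRUE)

# Hypothetical helper: accepts either a parsed document or a node.
root_name <- function(x) {
  # Documents carry the class "XMLAbstractDocument"; nodes do not.
  if (inherits(x, "XMLAbstractDocument")) {
    x <- xmlRoot(x)
  }
  xmlName(x)
}

print(root_name(doc))           # called on the document
print(root_name(xmlRoot(doc)))  # called on the root node
```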
> rootNode
<breakfast_menu>
<food>
<name>Belgian Waffles</name>
<price>$5.95</price>
<description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
<calories>650</calories>
</food>
<food>
<name>Strawberry Belgian Waffles</name>
<price>$7.95</price>
<description>Light Belgian waffles covered with strawberries and whipped cream</description>
<calories>900</calories>
</food>
<food>
<name>Berry-Berry Belgian Waffles</name>
<price>$8.95</price>
<description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
<calories>900</calories>
</food>
<food>
<name>French Toast</name>
<price>$4.50</price>
<description>Thick slices made from our homemade sourdough bread</description>
<calories>600</calories>
</food>
<food>
<name>Homestyle Breakfast</name>
<price>$6.95</price>
<description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
<calories>950</calories>
</food>
</breakfast_menu>
> rootNode[[1]]
<food>
<name>Belgian Waffles</name>
<price>$5.95</price>
<description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
<calories>650</calories>
</food>
> rootNode[[2]]
<food>
<name>Strawberry Belgian Waffles</name>
<price>$7.95</price>
<description>Light Belgian waffles covered with strawberries and whipped cream</description>
<calories>900</calories>
</food>
> rootNode[[1]][[1]]
<name>Belgian Waffles</name>
> rootNode[[1]][[2]]
<price>$5.95</price>
> rootNode[[2]][[1]]
<name>Strawberry Belgian Waffles</name>
> class(rootNode)
[1] "XMLInternalElementNode" "XMLInternalNode" "XMLAbstractNode"
> class(rootNode[[1]][[2]])
[1] "XMLInternalElementNode" "XMLInternalNode" "XMLAbstractNode"
> str(rootNode)
Classes 'XMLInternalElementNode', 'XMLInternalNode', 'XMLAbstractNode' <externalptr>
> head(rootNode)
$food
<food>
<name>Belgian Waffles</name>
<price>$5.95</price>
<description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
<calories>650</calories>
</food>

attr(,"class")
[1] "XMLInternalNodeList" "XMLNodeList"
> rootNode[1]
$food
<food>
<name>Belgian Waffles</name>
<price>$5.95</price>
<description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
<calories>650</calories>
</food>

attr(,"class")
[1] "XMLInternalNodeList" "XMLNodeList"
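The difference between double and single brackets shows up in the output above: `rootNode[[1]]` gave me one node, while `rootNode[1]` gave me a list holding that node. Here is a small sketch of the same idea on a shortened, made-up copy of the menu. It also assumes that child nodes can be picked by tag name with `[["name"]]`, which I did not try in my session.

```r
library(XML)

# Shortened stand-in for the downloaded menu (illustrative data only).
menu_xml <- paste0(
  "<breakfast_menu>",
  "<food><name>Belgian Waffles</name><price>$5.95</price></food>",
  "<food><name>French Toast</name><price>$4.50</price></food>",
  "</breakfast_menu>"
)
root <- xmlRoot(xmlParse(menu_xml, asText = TRUE))

first_food  <- root[[1]]  # double brackets: a single <food> node
first_slice <- root[1]    # single brackets: a list containing that node

print(class(first_food))
print(class(first_slice))

# Children can be picked by tag name instead of by position.
print(xmlValue(first_food[["name"]]))
print(xmlValue(root[[2]][["price"]]))
```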
> xmlSApply(rootNode, xmlValue)
food
"Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650"
food
"Strawberry Belgian Waffles$7.95Light Belgian waffles covered with strawberries and whipped cream900"
food
"Berry-Berry Belgian Waffles$8.95Light Belgian waffles covered with an assortment of fresh berries and whipped cream900"
food
"French Toast$4.50Thick slices made from our homemade sourdough bread600"
food
"Homestyle Breakfast$6.95Two eggs, bacon or sausage, toast, and our ever-popular hash browns950"
> xpathSApply(rootNode, "//name", xmlValue)
[1] "Belgian Waffles" "Strawberry Belgian Waffles" "Berry-Berry Belgian Waffles"
[4] "French Toast" "Homestyle Breakfast"
> xpathSApply(rootNode, "/name", xmlValue)
list()
> xpathSApply(rootNode, "/calories", xmlValue)
list()
> xpathSApply(rootNode, "//calories", xmlValue)
[1] "650" "900" "900" "600" "950"
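The empty `list()` results above make sense once the two path styles are compared side by side. In XPath, a path starting with a single slash starts at the top of the document, so `/name` only matches a top-level element called `name`, while `//name` matches `name` elements at any depth. The sketch below runs both on a shortened, made-up copy of the menu, and also tries the full path and a path relative to the root node. The variable names are my own.

```r
library(XML)

# Shortened stand-in for the downloaded menu (illustrative data only).
menu_xml <- paste0(
  "<breakfast_menu>",
  "<food><name>Belgian Waffles</name><calories>650</calories></food>",
  "<food><name>French Toast</name><calories>600</calories></food>",
  "</breakfast_menu>"
)
doc  <- xmlParse(menu_xml, asText = TRUE)
root <- xmlRoot(doc)

# "/name" asks for a top-level element called <name>; the top-level
# element here is <breakfast_menu>, so nothing matches.
top_level <- xpathSApply(doc, "/name", xmlValue)

# "//name" matches <name> elements at any depth.
anywhere <- xpathSApply(doc, "//name", xmlValue)

# Spelling out the full path from the top reaches the same nodes.
full_path <- xpathSApply(doc, "/breakfast_menu/food/name", xmlValue)

# A path starting with "./" is evaluated relative to the node passed in.
relative <- xpathSApply(root, "./food/name", xmlValue)

print(length(top_level))
print(anywhere)
print(full_path)
print(relative)
```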
> menusample <- xmlToList(rootNode)
> menusample
$food
$food$name
[1] "Belgian Waffles"

$food$price
[1] "$5.95"

$food$description
[1] "Two of our famous Belgian Waffles with plenty of real maple syrup"

$food$calories
[1] "650"

$food
$food$name
[1] "Strawberry Belgian Waffles"

$food$price
[1] "$7.95"

$food$description
[1] "Light Belgian waffles covered with strawberries and whipped cream"

$food$calories
[1] "900"

$food
$food$name
[1] "Berry-Berry Belgian Waffles"

$food$price
[1] "$8.95"

$food$description
[1] "Light Belgian waffles covered with an assortment of fresh berries and whipped cream"

$food$calories
[1] "900"

$food
$food$name
[1] "French Toast"

$food$price
[1] "$4.50"

$food$description
[1] "Thick slices made from our homemade sourdough bread"

$food$calories
[1] "600"

$food
$food$name
[1] "Homestyle Breakfast"

$food$price
[1] "$6.95"

$food$description
[1] "Two eggs, bacon or sausage, toast, and our ever-popular hash browns"

$food$calories
[1] "950"
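Because every top-level element in the list is named `food`, asking for `$food` can only ever return the first one. The sketch below shows one way to reach the other items by position and to pull a single field out of every item. It uses a shortened, made-up copy of the menu, and `menu_list`, `second`, and `prices` are names I chose for illustration.

```r
library(XML)

# Shortened stand-in for the downloaded menu (illustrative data only).
menu_xml <- paste0(
  "<breakfast_menu>",
  "<food><name>Belgian Waffles</name><price>$5.95</price></food>",
  "<food><name>French Toast</name><price>$4.50</price></food>",
  "</breakfast_menu>"
)
menu_list <- xmlToList(xmlParse(menu_xml, asText = TRUE))

# $food matches only the first element with that name.
print(menu_list$food$name)

# Index by position to reach the other items.
second <- menu_list[[2]]
print(second$name)

# Pull the same field out of every item in one pass.
prices <- vapply(menu_list, function(item) item$price, character(1))
print(unname(prices))
```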

> menusample <- xmlToList(doc)
> menusample
$food
$food$name
[1] "Belgian Waffles"

$food$price
[1] "$5.95"

$food$description
[1] "Two of our famous Belgian Waffles with plenty of real maple syrup"

$food$calories
[1] "650"

$food
$food$name
[1] "Strawberry Belgian Waffles"

$food$price
[1] "$7.95"

$food$description
[1] "Light Belgian waffles covered with strawberries and whipped cream"

$food$calories
[1] "900"

$food
$food$name
[1] "Berry-Berry Belgian Waffles"

$food$price
[1] "$8.95"

$food$description
[1] "Light Belgian waffles covered with an assortment of fresh berries and whipped cream"

$food$calories
[1] "900"

$food
$food$name
[1] "French Toast"

$food$price
[1] "$4.50"

$food$description
[1] "Thick slices made from our homemade sourdough bread"

$food$calories
[1] "600"

$food
$food$name
[1] "Homestyle Breakfast"

$food$price
[1] "$6.95"

$food$description
[1] "Two eggs, bacon or sausage, toast, and our ever-popular hash browns"

$food$calories
[1] "950"

> menusample <- xmlToDataFrame(doc)
> menusample
  name                         price  description                                                                          calories
1 Belgian Waffles              $5.95  Two of our famous Belgian Waffles with plenty of real maple syrup                    650
2 Strawberry Belgian Waffles   $7.95  Light Belgian waffles covered with strawberries and whipped cream                    900
3 Berry-Berry Belgian Waffles  $8.95  Light Belgian waffles covered with an assortment of fresh berries and whipped cream  900
4 French Toast                 $4.50  Thick slices made from our homemade sourdough bread                                  600
5 Homestyle Breakfast          $6.95  Two eggs, bacon or sausage, toast, and our ever-popular hash browns                  950
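The values in the data frame come straight from the text of the XML file, so the price and calories columns are not numbers yet. As a possible next step, here is a sketch of converting them so they can be used in calculations. It runs on a shortened, made-up copy of the menu rather than the downloaded file, passes `stringsAsFactors = FALSE` so the columns arrive as plain text, and uses `menu` as a name of my own choosing.

```r
library(XML)

# Shortened stand-in for the downloaded menu (illustrative data only).
menu_xml <- paste0(
  "<breakfast_menu>",
  "<food><name>Belgian Waffles</name><price>$5.95</price><calories>650</calories></food>",
  "<food><name>French Toast</name><price>$4.50</price><calories>600</calories></food>",
  "</breakfast_menu>"
)
menu <- xmlToDataFrame(xmlParse(menu_xml, asText = TRUE),
                       stringsAsFactors = FALSE)

# Calories are digits only, so a direct conversion is enough.
menu$calories <- as.numeric(menu$calories)

# Prices carry a leading dollar sign; strip it before converting.
# fixed = TRUE treats "$" as a literal character, not a regex anchor.
menu$price <- as.numeric(sub("$", "", menu$price, fixed = TRUE))

str(menu)
print(mean(menu$price))
print(sum(menu$calories))
```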

Wolfie

Wolfie lives moment to moment seeking to make life more wonderful for all. She is passionate about people, animals, nature, and health, and she helps others express their creativity and live in harmony.