Howto Parse Excel-XML-Files
Since I wanted to parse an Excel-document and did not count with the namespaces, I ran into trouble and after several hours of wasting time I wanted to share the quite simple solution.
Well, the main goal/sense of namespaces is, to group certain classes or objects under a defined name. So for example we have in Excel the namespace “ss”:
Just by the way, namespaces are used to be splitted with a colon from the class-description, for example “ss:Data”
<ss:Data&rt;Hello World>/ss:Data>
So how could we work now with some more complex ways of such structures?
<Row ss:Height=”15.75″>
<Cell ss:StyleID=”s79″>
<NamedCell ss:Name=”Print_Area”/>
</Cell>
<Cell ss:StyleID=”s102″>
<ss:Data ss:Type=”String” xmlns=”http://www.w3.org/TR/REC-html40″>
<Font html:Size=”11″>
Cell-Title
</Font></ss:Data>
<NamedCell ss:Name=”Print_Area”/>
</Cell>
<Row>
As we can see now, we have got the namespace “ss” – but how can we access now its sub-nodes?
Since it is a namespace it has got a namespace URL, which can be found somewhere in the document defined ![]()
Used to be on the first lines, but I would not rely on it…
Well, now we do resolve the namespace URL, as we can see it in our Workbook-Node:
<Workbook xmlns=”urn:schemas-microsoft-com:office:spreadsheet”
xmlns:o=”urn:schemas-microsoft-com:office:office”
xmlns:x=”urn:schemas-microsoft-com:office:excel”
xmlns:ss=”urn:schemas-microsoft-com:office:spreadsheet”
xmlns:html=”http://www.w3.org/TR/REC-html40″>
So we can see: the namespace “ss” has got the namespace URL “urn:schemas-microsoft-com:office:spreadsheet”
What we simply have to do now: tell our program (for example written in a PHP file) to use this URL:
$xmlobj = simplexml_load_file(.’xmlFile.xml’);
$xmlobj->registerXPathNamespace(’ss’, ‘urn:schemas-microsoft-com:office:spreadsheet’);
If we want to access ALL namespaces with a class "Data" - it could look like:
$ss = $cell->xpath('//ss:Data');
print_r($ss);
We could also access each cell on its own - there are better ways, but for this example its all fine:
foreach ($xmlobj->Worksheet->Table->Row as $row)
{foreach ($row->Cell as $cell)
{foreach ($cell->children('ss',true)->Data as $data)
{// Accessing the children of Data and the very first object
$data = $data->children()->Font[0];}
}
}



