EDITING BOARD
RO
EN
×
▼ BROWSE ISSUES ▼
Issue 22

Getting started with OpenXML

Florentina Suciu
Software engineer
@Fortech



PROGRAMMING

In this article, we are trying to draw a basic map to programmatically manipulate xlsx files using Office Xml library. Many applications require working with excel files, either for reading and importing data from it, or for exporting data into reports, so it is important to know how to programmatically manipulate excel files.

Since 2007, Excel files have completely changed their internal structure. Xls was a proprietary binary file format, whereas xlsx is an Xml Based-format, called Office Open Xml (OOXML).

Excel as zip file

An xlsx file is a zip package containing an xml file for each major part of an Excel file (sheets, styles, charts, pivot tables). If you want to check the contents of an xlsx, all you have to do is to change the extension of the file from xlsx to zip and then unarchive it.

Excel files components

A spreadsheet document contains a central WorkbookPart and separate parts for each worksheet. To create a valid document, you must put together 5 elements, Workbook, WorksheetPart, Worksheet, Sheet, SheetData.

The primary task of a WorkbookPart is to keep track of the worksheets, global settings and the shared components of the Workbook. The document needs to contain at least one Worksheet that is defined inside a WorksheetPart. A worksheet has three main sections:

  • The Sheet, declared in the Workbook, contains the properties such as name, an id used for sorting the sheets and a relationship id that connects it to the WorksheetPart;
  • The SheetData containing the actual data;
  • A part for supporting features such as protection and filtering.
Figure 2 - Components of a Spreadsheet Document

OpenXml library

All the classes needed to manipulate an xlsx file can be found in Open Xml SDK. Here is a simple example of applying a sum on a data column.

using (SpreadsheetDocument document = 
  SpreadsheetDocument.Create(path, 
  SpreadsheetDocumentType.Workbook))
{
    var workbookPart = document. 
      AddNewPart();
 
 workbookPart.Workbook = new Workbook();
 var worksheetPart = document. 
   AddNewPart();
   
 // create sheet data
 var sheetData = worksheetPart.Worksheet.
   AppendChild(new SheetData());

 // create a row and add a data to it
  sheetData.AppendChild(new Row(new Cell() { 
    CellValue = new CellValue("5"), 
    DataType = CellValues.Number }));
 
  sheetData.AppendChild(new Row(new Cell() { 
   CellValue = new CellValue("3"),
   DataType = CellValues.Number }));
    
  sheetData.AppendChild(new Row(new Cell() { 
   CellValue =  new CellValue("65"), 
   DataType = CellValues.Number }));
   
  sheetData.AppendChild(new Row(new Cell() { 
   CellFormula = new CellFormula("=SUM(A1:A3)"),
   DataType = CellValues.Number }));

   // save the worksheet
   worksheetPart.Worksheet.Save();

   // create the sheet properties
   var sheetsCount = document.WorkbookPart.Workbook.
     Sheets.Count() + 100;
   
   document.WorkbookPart.Workbook.Sheets.
     AppendChild(new Sheet()
 {
       
 Id = document.WorkbookPart.
   GetIdOfPart(worksheetPart),
   
 SheetId = (uint)document.WorkbookPart.Workbook.
   Sheets.Count() + 1,
   Name = "MyFirstSheet"
 });
 // save the workbook
 document.WorkbookPart.Workbook.Save();
}

Creating a Pivot Table

A pivot table is a table used for data summarization, that can automatically sort, count or apply average on the data stored in a data table.

A pivot table needs a source data table. We will assume that we already have the data table, in a sheet called "DataSheet".

A pivot table has 4 main parts: WorksheetPart, PivotTablePart, PivotTableCacheDefinitionPart and PivotCacheRecordsPart. Also, we need to instantiate a list of PivotCaches, with one PivotCache child. In the following images, you can see the "map" of a pivot table.

Figure 4 - Components needed for creating a Pivot Table

var pivotWorksheetPart = document.WorkbookPart.
  AddNewPart();

pivotWorksheetPart.Worksheet = new Worksheet();
var pivotTablePart = pivotWorksheetPart. 
  AddNewPart();
var pivotTableCacheDefinitionPart = pivotTablePart.
  AddNewPart();

document.WorkbookPart.AddPart(
  pivotTableCacheDefinitionPart);

var pivotTableCacheRecordsPart = 
  pivotTableCacheDefinitionPart.
  AddNewPart();

var pivotCaches = new PivotCaches();
pivotCaches.AppendChild(new PivotCache()
  {
    CacheId = pivotCacheId,
    Id = document.WorkbookPart.
      GetIdOfPart(pivotTableCacheDefinitionPart)
  });

document.WorkbookPart.Workbook.AppendChild(pivotCaches);
Figure 4 - Components needed for creating a Pivot Table

The PivotTablePart describes the layout. Its child, the PivotTableDefinition stores the location of the table and the PivotFields. There are two kinds of PivotFields: RowFields and DataFields.

  • RowFields are static data and their corresponding PivotField has the Axis property set to "AxisRow";
  • DataFields are calculated data (like totals) and their corresponding PivotField has the property DataField set to true.

The pivot table definition needs also to know the id of the PivotCache we defined above.

In the pivot table definition you can also specify the format in which you want to display the table. These can be: Compact (set compact flag to true), Outline (set the Outline flag to true), or the Tabular format (set the GridDropZones flag to true).

The PivotTableCacheDefinitionPart with its child PivotCacheDefinition, defines the cache fields. We need to declare a cache field for each column in the table. It also contains the cache source type (as SourceValues.Worksheet) and the worksheet source.

The PivotCacheRecordsPart needs only to be defined and appended, this part being automatically populated with the cached values of the table.

Applying Conditional Formatting

Now, let"s see how to apply some conditional formatting on the data, that is to format and highlight some cells based on their values.

In order to do that, you need to define two things.

First, you need to define the styles that you want to apply to the highlighted cells, mainly the fonts and colors. The styles are declared in the Stylesheet of the workbook part.

Next, you need to define the rules with the help of the ConditionalFormatting object that has as a child a ConditionalFormattingRule object. Below you can see an example, where we apply a conditional formatting for the cells having a value less than 3.

var pivotWorksheetPart =  
  document.WorkbookPart.AddNewPart();

pivotWorksheetPart.Worksheet = new Worksheet();
var pivotTablePart = pivotWorksheetPart.AddNewPart();
var pivotTableCacheDefinitionPart = 
  pivotTablePart.
  AddNewPart();

document.WorkbookPart.
  AddPart(pivotTableCacheDefinitionPart);

var pivotTableCacheRecordsPart = 
  pivotTableCacheDefinitionPart.
  AddNewPart();

var pivotCaches = new PivotCaches();
pivotCaches.AppendChild(new PivotCache()
  {
    CacheId = pivotCacheId,
    Id = document.WorkbookPart.
      GetIdOfPart(pivotTableCacheDefinitionPart)
  });

document.WorkbookPart.Workbook.AppendChild(pivotCaches);

Conclusion

In this article we draw a basic "map" of how to navigate through OpenXML in generating xlsx files. Even when trying to present it as easy as possible, you can see that the code for even the most simple operations can and will get complex.

VIDEO: ISSUE 109 LAUNCH EVENT

Sponsors

  • Accenture
  • BT Code Crafters
  • Accesa
  • Bosch
  • Betfair
  • MHP
  • BoatyardX
  • .msg systems
  • P3 group
  • Ing Hubs
  • Cognizant Softvision
  • Colors in projects

VIDEO: ISSUE 109 LAUNCH EVENT