Saturday, June 7, 2014

Generating Word/Excel Documents without Office and any 3rd party components

Recently we’ve came across one requirement, which demands the generation Excel/Word documents with complex formatting from an C# website. Due to constraints on software cost, using Microsoft Office installation (using COM-Interops) or any 3rd party components (like ASPOSE) was not an option.
We could have been gone for, simply generating the content in HTML and rendering it to the browser as a Word or Excel document, by specifying the content type as ‘application/’ or ‘application/vnd.msword’. But this was not a viable option to us as we had to support complex formatting. For eg. For an excel worksheet, individual cell text can be in multiple color. i.e Some part of the cell text will be in blue and rest are in red. See the below screen shot to get an understanding.
After some research, we’ve found that, we can use Microsoft’s Office XML format to generate Office (Excel/Word) documents. This means we can define the office document content as XML, which will be correctly rendered and open with Office. Specifically below are the format we’ve used.
    • EXCEL  – XML Spreadsheet 2003 format (.xml)
    • WORD – Word 2003 XML format (.xml) 
 The methodology is simple as we can devise the entire process in the below steps 
a. Create Template in Microsoft Office XML format:
      Create an office document with some sample data, required styles and formatting in Microsoft Office XML Format.
     Note: If you already have the template in another format (say XLS, DOC), a ‘Save As’ will do
b. Define the input data in plain XML format:
      Define and Create an input XML file containing input data, that need to be fed to the template during the document generation.
c. Define an XSLT transformation to convert Input Data to the resultant Office document:
      Open your template (Created in Step#1) in ‘notepad.exe’, Copy out it contents to a new text file. Replace the sample data with required XSL transformation which insert your input data (Created in Step#2) to the resultant office document.
d. Perform XSL Transformation to generate Office Document:
       Perform XSL Transformation on the input XML file to generate the output Office file.
Why don’t we look at a live example to understand it much easier. In this example, We are considering Excel generation using ‘Spread Sheet 2003 XML’ format. This approach will be the same for other kind of office documents as well (like MS Word, Power Point).
Step1: Create an XML file containing all input data
Step2:  Create a sample EXCEL file with all required styles and some sample data in Office Excel 2007.
Step3:  Save the document in ‘XML Spreadsheet 2003’ format
           Step4:  Open the EXCEL in notepad and copy its contents
           Step5:  Create an XSLT file and paste the copied contents to it
           Step6:  Add XSL Transformation specific tags, wherever applicable.
          Download: BookInventoryTransform.xsl
          Step7:  NB: Remove sample data and use input XML data using XSL Transformation


       Step8:  Perform XSL Transformation using the input XML file to generate the output EXCEL file (Using Visual Studio) 
      Step9:  You’re done with the sample!
      Download: BookInventoryOutput.xml
      Step10:  To incorporate this into website using C# code, Use XslCompledTransform class
            XslCompiledTransform xsl = new XslCompiledTransform(false);
            XsltSettings xslSettings = new XsltSettings();
            xslSettings.EnableScript = true;
            XsltArgumentList xsltArgList = new XsltArgumentList();
            xsl.Load(xsltPath, xslSettings, null);
            using (XmlReader inStream = XmlReader.Create(new StringReader(inputXml)))
                using (MemoryStream xslOutStream = new MemoryStream())
                    XmlWriterSettings set = new XmlWriterSettings();
                    set.Encoding = Encoding.UTF8;
                    using (XmlWriter writer = XmlWriter.Create(xslOutStream, set))
                        xsl.Transform(inStream, xsltArgList, writer);
                        inStream.Seek(0, SeekOrigin.Begin);
                        BinaryWriter outWriter = new BinaryWriter(outSteam);

Here xsltPath, is the path to your  xsl file (XSL Transformation file). inputXML is a string variable contains the input XML content. outSteam is the HTTP output stream of your HTTP Response object (outResponse.OutputStream).
Don’t forget to set the content type for your HTTP Response, before rendering the document as below.
// Prepare the response
            HttpResponse httpResponse = outResponse;
            httpResponse.Buffer = true;
            httpResponse.Charset = string.Empty;
            httpResponse.ContentType = "application/";
       httpResponse.AddHeader("content-disposition", "attachment;filename=\"" + workBookName + "\"");
            return true;
Limitations of Microsoft XML 2003 Format:
There are some known limitations of XML 2003 formats, like they cannot contain VBA projects, charts and graphs.
1.       Though above steps says the output is an excel file, file extension shows ‘.xml’. Why?
Microsoft Spread Sheet XML format is indeed an XML format. So don’t worry, it will open in EXCEL.
If still extension is important for you, change it to .xls
Similarly for word, change .xml to .doc
2.       We don’t have any input XML files defined. We only use entities and model (C# classes). So how should we can adopt the above method without an input XML.
You can serialize your model or entities (C# classes) to XML using XMLSerializer (.NET class). Then stream this serialized XML to the XSLT transformation.
3.       Does this mean, I’ve to learn the complete Microsoft XML 2003 specification for adopting this methodology?
Certainly not, as you are simply copy/paste the XML fragments from the template and you’re only working on a part of it.
Most probably you will end up changing the data rendering part only. But understanding the specification can be beneficial and will give more comfort while working with a complex transformation involving creation of new styles and formats dynamically.
Complete Spread sheet XML specification can be found here:
4.       I am new to XSLT and XPATH. What’s the starting point?
XSLT is an easy to learn templating language.
XSLT uses XPATH to search xml elements from the input XML. XPATH also widely used in XSLT test and select expressions.
XPATH also has a significant use in the .NET XML manipulation classes like XPathNavigator, XElement and XDocument.
5.       Ok Sounds good, But I feel XSLT is very limiting in terms of programming constructs. What should I do if I need to do a complex processing?
Well the nice feature of XSLT is you can add extension objects to it.
i.e In short you can write C# functions inside your XSLT file and can call anywhere in your XSLT.
XSLT also support scripting using java script.
The above given example shows the usage C# ‘string.Trim(()’ function as an example.  Write as many functions as you want, declare variables as you do with C# classes.  All are permissible inside XSLT as far as you stick to .NET 2.0 versions.
If you’re more ambitious and want to use .NET 3.5+ features, it’s still possible. You can pass your custom class object to XSLT as an extension object and can call functions inside that class. Learn it and try it on your own.

1 comment:

  1. This is indeed great...thanks a lot Abraham!!!!