Playing with OpenXML: Let’s Convert docx to Simple html5
I decided to play a little with the application from the Introduction to Open XML SDK 2.0 post and read the text in more detail: iterate through all of the document's paragraphs and, for each paragraph, through each of its Runs. (A Run is a region of text with a common set of properties; within a paragraph, text is grouped into one or more Runs.)
Here's a simple example of how, after loading a docx file, we can iterate through all of its paragraphs and, for each paragraph, through all of its Runs, reading some of each Run's attributes so we can convert them to html:
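    // needs: using DocumentFormat.OpenXml.Packaging;
    //        using DocumentFormat.OpenXml.Wordprocessing;

    // load the .docx document (false = open read-only)
    using (WordprocessingDocument doc = WordprocessingDocument.Open(_filePathTB.Text, false))
    {
        // iterate through paragraphs
        var paragraphs = doc.MainDocumentPart.Document.Body.Elements<Paragraph>();
        foreach (Paragraph para in paragraphs)
        {
            foreach (Run r in para.Elements<Run>())
            {
                // r is our current Run
            }
        }
    }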
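Once r is in hand, its formatting is available through r.RunProperties. The following is a minimal sketch, not code from the app, that builds a paragraph in memory (so no .docx file is needed) and reports whether each Run is bold. The class name RunFormatSketch and the helper IsOn are hypothetical names introduced for illustration. The helper also assumes that a toggle element such as bold can be present but switched off through its val attribute, so it checks Val instead of only testing for null.

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class RunFormatSketch
{
    // Hypothetical helper: a toggle element (Bold, Italic, ...) counts as "on"
    // when it is present and its Val is either missing or true.
    public static bool IsOn(OnOffType toggle)
    {
        return toggle != null && (toggle.Val == null || toggle.Val.Value);
    }

    public static void Main()
    {
        // Three Runs: no properties, bold, and bold explicitly switched off.
        Paragraph para = new Paragraph(
            new Run(new Text("plain ")),
            new Run(new RunProperties(new Bold()), new Text("bold ")),
            new Run(new RunProperties(new Bold { Val = new OnOffValue(false) }),
                    new Text("explicitly not bold")));

        foreach (Run r in para.Elements<Run>())
        {
            RunProperties props = r.RunProperties;   // null when the Run has no properties
            bool bold = props != null && IsOn(props.Bold);
            Console.WriteLine("\"{0}\" bold={1}", r.InnerText, bold);
        }
    }
}
```

A plain null check on Bold would treat the third Run as bold, which is why the sketch looks at Val as well.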
Using the code above, we can write a simple docx to html5 converter:
    // needs: using System.Text;
    //        using DocumentFormat.OpenXml.Packaging;
    //        using DocumentFormat.OpenXml.Wordprocessing;

    // load the .docx document (false = open read-only)
    using (WordprocessingDocument doc = WordprocessingDocument.Open(_filePathTB.Text, false))
    {
        StringBuilder text = new StringBuilder();
        text.AppendLine("<!DOCTYPE html>");
        text.AppendLine("<html>");
        text.AppendLine(" <head>");
        text.AppendLine(" <title>" + _filePathTB.Text + " HTML5 Export</title>");
        text.AppendLine(" </head>");
        text.AppendLine(" <body>");

        // iterate through paragraphs
        var paragraphs = doc.MainDocumentPart.Document.Body.Elements<Paragraph>();
        foreach (Paragraph para in paragraphs)
        {
            // we export paragraphs as <p> tags
            text.Append("<p>");
            foreach (Run r in para.Elements<Run>())
            {
                if (r.RunProperties != null)
                {
                    // ADD OPENING TAGS HERE, if any
                    if (r.RunProperties.Bold != null)
                        text.Append("<b>");
                    if (r.RunProperties.Italic != null)
                        text.Append("<i>");
                    if (r.RunProperties.Underline != null)
                        text.Append("<u>");
                    if (r.RunProperties.Color != null && r.RunProperties.Color.Val != null)
                        text.Append("<span style=\"color:#" + r.RunProperties.Color.Val + "\">");

                    // ADD TEXT HERE
                    text.Append(r.InnerText);

                    // ADD CLOSING TAGS HERE,
                    // IN REVERSE ORDER OF THE OPENING TAGS
                    if (r.RunProperties.Color != null && r.RunProperties.Color.Val != null)
                        text.Append("</span>");
                    if (r.RunProperties.Underline != null)
                        text.Append("</u>");
                    if (r.RunProperties.Italic != null)
                        text.Append("</i>");
                    if (r.RunProperties.Bold != null)
                        text.Append("</b>");
                }
                else
                {
                    // no formatting: export the text as-is
                    text.Append(r.InnerText);
                }
            }
            text.Append("</p>");
        }

        text.AppendLine(" </body>");
        text.AppendLine("</html>");
        _showTextTB.Text = text.ToString();
    }
The StringBuilder variable text now contains all of the docx text converted to html5. The converter supports only the bold, italic, underline and color properties. I said it's simple :)
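One thing the converter does not do is escape the text: r.InnerText is appended unchanged, so a document containing characters such as < or & would produce broken markup. Below is a minimal sketch of one way to handle that, assuming .NET 4.0 or later for System.Net.WebUtility. The class RunHtmlSketch and the method ToHtml are hypothetical names, not part of the app. The method takes plain values instead of a Run, so it can be tried without the Open XML SDK.

```csharp
using System;
using System.Net;
using System.Text;

public static class RunHtmlSketch
{
    // Hypothetical helper: wraps already-extracted Run text in the same tags
    // the converter emits, HTML-encoding the text before appending it.
    // colorHex is the hex value without '#', or null/empty for no color.
    public static string ToHtml(string runText, bool bold, bool italic,
                                bool underline, string colorHex)
    {
        StringBuilder html = new StringBuilder();
        bool hasColor = !string.IsNullOrEmpty(colorHex);

        // opening tags
        if (bold) html.Append("<b>");
        if (italic) html.Append("<i>");
        if (underline) html.Append("<u>");
        if (hasColor)
            html.Append("<span style=\"color:#" + WebUtility.HtmlEncode(colorHex) + "\">");

        // escaped text
        html.Append(WebUtility.HtmlEncode(runText));

        // closing tags, in reverse order of the opening tags
        if (hasColor) html.Append("</span>");
        if (underline) html.Append("</u>");
        if (italic) html.Append("</i>");
        if (bold) html.Append("</b>");
        return html.ToString();
    }

    public static void Main()
    {
        // prints: <b><span style="color:#FF0000">a &lt; b &amp; c</span></b>
        Console.WriteLine(ToHtml("a < b & c", true, false, false, "FF0000"));
    }
}
```

In the converter, the flags would come from r.RunProperties and the text from r.InnerText. A color value of auto, which Word can store instead of a hex value, would need separate handling, because "#auto" is not a valid CSS color.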
Here are screenshots of the app in action:
We have a docx with simple text

Now we copy-paste the result into an html file, and the result is:

You can download the simple app HERE.


One Comment to “Playing with OpenXML: Let’s Convert docx to Simple html5”
The doesn’t run. It would be preferable if you posted the VS solution or project.