Luzern 4/4/2013 11:27:05 AM

How to read data and formatting of Word documents in .NET

Elerium Software has released version 2.0 of its Elerium Word .NET Reader component.

MS Word documents are one of the most popular formats for reporting. The format allows information to be presented with different styles and formatting, exactly as it should look on paper. MS Word is often not installed on the server or computer, yet a developer may still want to process these reports inside a C#, VB.NET, or ASP.NET project. A practical way to do this is to use a professional .NET library that provides the necessary Word API functions. Elerium Software offers one such library.

Elerium Word .NET Reader provides an easy way to read the data and formatting of Word documents. The basic steps for getting the text of a document are described below.

First of all, a developer must add Elerium Word .NET Reader to the project:

  1. Download the latest version of the component from this link:
    http://www.eleriumsoft.com/Word_NET/WordReader/Default.aspx
  2. Extract the downloaded archive and put the Word.dll component into the /bin folder of the project.
  3. Add the component's namespace to the “using” section:
    using Docs.Word;

After that, the developer can easily read data from a Word document.

C# Example:

using System;  
using System.Collections.Generic;  
using System.Linq;  
using System.Text;  
using Docs.Word;  
namespace OpenDocument  
{  
    class Program  
    {  
        static void Main(string[] args)  
        {  
            // Creates an instance of Document class  
            Document Doc = new Document();  
            // Reads a .doc file into internal document structure  
            Doc.ReadDoc(@"..\..\Data\DocFile.doc");  
            // Gets text of 1st paragraph of 1st section of the document  
            string Text = ((Paragraph)Doc.Sections[0].Nodes[0]).Text;  
            // Writes gotten text to console  
            Console.WriteLine(Text);  
            Console.ReadKey();  
        }
    }
}

VB.NET Example:

Imports Docs.Word  
Module Module1  
    Sub Main()  
        ' Creates an instance of Document class  
        Dim Doc As New Document()  
        ' Reads a .doc file into internal document structure  
        Doc.ReadDoc("..\..\Data\DocFile.doc")  
        ' Gets text of 1st paragraph of 1st section of the document  
        Dim Text As String = DirectCast(Doc.Sections(0).Nodes(0), Paragraph).Text  
        ' Writes gotten text to console  
        Console.WriteLine(Text)  
        Console.ReadKey()  
    End Sub  
End Module
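
The direct cast used in the samples above throws an InvalidCastException if the first node of the section is not a paragraph. A more defensive lookup might be sketched as follows. The Node, Paragraph, Table, and Section classes here are hypothetical stand-ins written only for illustration; they are not the Docs.Word types, and whether the component's Nodes collection can actually hold non-paragraph nodes is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for a document model; NOT the Docs.Word types.
abstract class Node { }

class Paragraph : Node
{
    public string Text { get; set; }
}

class Table : Node { }

class Section
{
    public readonly List<Node> Nodes = new List<Node>();
}

static class ParagraphFinder
{
    // Returns the text of the first Paragraph in the section,
    // or null when the section contains no paragraph at all.
    public static string FirstParagraphText(Section section)
    {
        foreach (Node node in section.Nodes)
        {
            // "as" yields null instead of throwing when the node is another type
            Paragraph paragraph = node as Paragraph;
            if (paragraph != null)
            {
                return paragraph.Text;
            }
        }
        return null;
    }
}

class Program
{
    static void Main()
    {
        Section section = new Section();
        section.Nodes.Add(new Table());                       // a non-paragraph node comes first
        section.Nodes.Add(new Paragraph { Text = "Hello" });

        Console.WriteLine(ParagraphFinder.FirstParagraphText(section) ?? "(no paragraph)");
    }
}
```

With the real component, the same pattern would apply to Doc.Sections[0].Nodes, provided its node types behave like the stand-ins above.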

The next sample demonstrates reading different kinds of text formatting, such as font name, size, color, background color, footnotes, etc.

C# Example:

using System;  
using System.Collections.Generic;  
using System.Text;  
using System.Windows.Forms;  
using Docs.Word;  
namespace TextRun_Styles  
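{
    // Form1 is assumed to be the standard WinForms form class of the project;
    // textBox1 is expected to be declared in its designer file
    public partial class Form1 : Form
    {
        private void Form1_Load(object sender, EventArgs e)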
        {  
            // Creates a new instance of Document class and reads a .doc file into this structure  
            Document Doc = new Document();  
            Doc.ReadDoc(@"..\..\Data\WordTextFormatting.doc");  
            // Gets two first text runs, in this example - two sentences  
            for (int i = 0; i < 2; i++)  
            {  
                // Gets text run  
                TextRun tTextRun = ((Paragraph)Doc.Sections[0].Nodes[0]).TextRuns[i];  
                // Writes its properties  
                textBox1.Text += "=== Text run " + (i+1) + " ===" + "\r\n";  
                textBox1.Text += "Text          : " + tTextRun.Text + "\r\n";  
                textBox1.Text += "Font name     : " + tTextRun.Style.FontName + "\r\n";  
                textBox1.Text += "Font size (in half-point) : " + tTextRun.Style.FontSize + "\r\n";  
                textBox1.Text += "Text color            : " + tTextRun.Style.TextColor + "\r\n";  
                textBox1.Text += "Bold          : " + tTextRun.Style.FontStyle.Bold + "\r\n";  
                textBox1.Text += "Italic            : " + tTextRun.Style.FontStyle.Italic + "\r\n";  
                textBox1.Text += "Underlined        : " + tTextRun.Style.FontStyle.Underlined + "\r\n";  
                textBox1.Text += "Strike-out            : " + tTextRun.Style.FontStyle.StrikeOut + "\r\n\r\n";  
            }
        }
    }
}

VB.NET Example:

Imports Docs.Word  
Public Class Form1  
    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load  
        ' Creates a new instance of Document class and reads a .doc file into this structure  
        Dim Doc As New Document()  
        Doc.ReadDoc("..\..\Data\WordTextFormatting.doc")  
        ' Gets two first text runs, in this example - two sentences  
        For i As Integer = 0 To 1  
            ' Gets text run  
            Dim tTextRun As TextRun = DirectCast(Doc.Sections(0).Nodes(0), Paragraph).TextRuns(i)  
            ' Writes its properties  
            textBox1.Text &= "=== Text run " & (i + 1).ToString() & " ===" & vbCrLf
            textBox1.Text &= "Text" & vbTab & vbTab & vbTab & ": " & tTextRun.Text & vbCrLf
            textBox1.Text &= "Font name" & vbTab & vbTab & ": " & tTextRun.Style.FontName & vbCrLf
            textBox1.Text &= "Font size" & vbTab & "(in half-point)" & vbTab & ": " & tTextRun.Style.FontSize.ToString() & vbCrLf
            textBox1.Text &= "Text color" & vbTab & vbTab & vbTab & ": " & tTextRun.Style.TextColor.ToString() & vbCrLf
            textBox1.Text &= "Bold" & vbTab & vbTab & vbTab & ": " & tTextRun.Style.FontStyle.Bold.ToString() & vbCrLf
            textBox1.Text &= "Italic" & vbTab & vbTab & vbTab & ": " & tTextRun.Style.FontStyle.Italic.ToString() & vbCrLf
            textBox1.Text &= "Underlined" & vbTab & vbTab & ": " & tTextRun.Style.FontStyle.Underlined.ToString() & vbCrLf
            textBox1.Text &= "Strike-out" & vbTab & vbTab & vbTab & ": " & tTextRun.Style.FontStyle.StrikeOut.ToString() & vbCrLf & vbCrLf
        Next  
    End Sub  
End Class
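
Both formatting samples print the font size in half-points, so a value of 24 corresponds to a 12 pt font. The conversion can be sketched as below. FontSizeConverter and HalfPointsToPoints are hypothetical helper names, not part of the component, and the input is assumed to be an integer half-point count like the one printed by the samples.

```csharp
using System;
using System.Globalization;

static class FontSizeConverter
{
    // Converts a size in half-points to typographic points (1 pt = 2 half-points).
    public static double HalfPointsToPoints(int halfPoints)
    {
        return halfPoints / 2.0;
    }
}

class Program
{
    static void Main()
    {
        int[] samples = { 21, 24, 28 };   // example half-point values, not read from a document
        foreach (int halfPoints in samples)
        {
            double points = FontSizeConverter.HalfPointsToPoints(halfPoints);
            Console.WriteLine(halfPoints + " half-points = "
                + points.ToString(CultureInfo.InvariantCulture) + " pt");
        }
    }
}
```

Dividing by 2.0 rather than 2 keeps odd half-point values, such as 21, from being truncated to a whole number of points.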

About Elerium Software

Elerium Software develops professional solutions for use in .NET projects (C#, VB.NET, ASP.NET) that read, write, and convert various office and web documents and formats. Elerium Software components are built on an original design and fast algorithms, which allows them to work independently of third-party applications and libraries.