Optical Character Recognition in C# using Tesseract

Mishel Shaji

In this post, I’ll demonstrate how to use Tesseract to build an Optical Character Recognition (OCR) application in C#.

In my recent post, Optical Character Recognition (OCR) in C#, I used Puma.NET to create the OCR application. OCR is the process of converting printed or handwritten text to machine-encoded text.

The main drawbacks of using Puma.NET were:

  • Lower recognition accuracy.
  • Puma.NET must be installed on the machine.
  • It requires older versions of .NET.

Creating an OCR application in C# using Tesseract

  • Open Visual Studio and create a new C# Console application.
  • Open the Package Manager Console and install the Tesseract NuGet package.

Install-Package Tesseract

If you prefer not to type commands, right-click the project in Solution Explorer and select Manage NuGet Packages… -> open the Online tab (Browse in newer versions of Visual Studio) -> search for Tesseract -> click Install.

This will add Tesseract and other binaries to the project.

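  • Next, add the language files. You can get the English language file (eng.traineddata) from here. Create a folder named tessdata in your project's bin\Debug folder (where the compiled executable is placed) and copy the language files into it. Use language files that match the Tesseract version bundled with the NuGet package; files for a different major version may fail to load.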
  • Finally, add the C# code and run the project.
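using System;
using Tesseract;

namespace TesseractOCR
{
    class Program
    {
        static void Main(string[] args)
        {
            // Use the image path passed on the command line, or fall back to the sample file.
            string imagePath = args.Length > 0 ? args[0] : @"E:\Capture.png";

            // The engine, the image and the processed page are all IDisposable.
            // The engine can process only one page at a time, so the page must be
            // disposed before the engine is used again.
            // .\tessdata is resolved against the working directory (bin\Debug in Visual Studio).
            using (var ocrEngine = new TesseractEngine(@".\tessdata", "eng", EngineMode.Default))
            using (var img = Pix.LoadFromFile(imagePath))
            using (var page = ocrEngine.Process(img))
            {
                Console.WriteLine(page.GetText());
            }

            Console.ReadKey();
        }
    }
}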
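If the TesseractEngine constructor fails, a common cause is that the tessdata folder is not where the program looks for it: the relative path .\tessdata is resolved against the current working directory, not the folder containing the executable. The sketch below is a hypothetical helper (TessdataCheck and FindTessdata are names made up for this example, not part of the Tesseract package) that builds an absolute path from AppDomain.CurrentDomain.BaseDirectory and checks that the <language>.traineddata file is present before the engine is created.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the Tesseract NuGet package.
// Checks that a tessdata folder exists under a base directory and
// contains the <language>.traineddata file the engine will load.
static class TessdataCheck
{
    // Returns the absolute tessdata path, or null if the language file is missing.
    public static string FindTessdata(string baseDirectory, string language)
    {
        string tessdataPath = Path.Combine(baseDirectory, "tessdata");
        string trainedDataFile = Path.Combine(tessdataPath, language + ".traineddata");
        return File.Exists(trainedDataFile) ? tessdataPath : null;
    }

    static void Main()
    {
        // BaseDirectory is the folder containing the running executable
        // (bin\Debug when started from Visual Studio), so the result does
        // not depend on the current working directory.
        string baseDir = AppDomain.CurrentDomain.BaseDirectory;
        string tessdata = FindTessdata(baseDir, "eng");

        if (tessdata == null)
            Console.WriteLine("eng.traineddata not found in " + Path.Combine(baseDir, "tessdata"));
        else
            Console.WriteLine("Using tessdata folder: " + tessdata);
    }
}
```

When the check succeeds, the returned path could be passed to new TesseractEngine(tessdata, "eng", EngineMode.Default) in place of the relative .\tessdata, so the program also works when launched from another directory.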

Possible errors

You may get the following compile error when building the project:

The type ‘System.Drawing.Bitmap’ is defined in an assembly that is not referenced. You must add a reference to assembly ‘System.Drawing, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a’.

To fix this, go to Solution Explorer -> right-click References -> Add Reference… -> search for Drawing -> select System.Drawing from the results (a checkmark appears on its left when it is selected) and click OK.