In this post, I’ll demonstrate how to use Tesseract to build an Optical Character Recognition (OCR) application in C#.
In my recent post about OCR in C#, I used Puma.NET to create the OCR application.
The main drawbacks of using Puma.NET were:
- Its recognition is less accurate.
- Puma.NET must be installed on the machine.
- It requires an older version of .NET.
Creating an OCR application in C# using Tesseract
- Open Visual Studio and create a new C# Console application.
- Open the Package Manager Console and install the Tesseract NuGet package.
Install-Package Tesseract
If you hate typing commands, right-click the project in Solution Explorer and select Manage NuGet Packages… -> click the Online tab and search for Tesseract -> click Install.
This will add Tesseract and other binaries to the project.
- Next, add the language files. You can get the English language files from here. Create a folder named tessdata in the Debug output folder of your project (typically bin\Debug) and copy the language files into it.
- Finally, add the C# code and run the project.
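using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Tesseract;

namespace TesserectOCR
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create the engine with the English language data from the tessdata folder.
            using (var ocrengine = new TesseractEngine(@".\tessdata", "eng", EngineMode.Default))
            // Load the image to recognize; change the path to point to your own image.
            using (var img = Pix.LoadFromFile(@"E:\Capture.png"))
            // Run recognition; the engine, image, and page are disposed when the block ends.
            using (var res = ocrengine.Process(img))
            {
                Console.WriteLine(res.GetText());
            }

            // Keep the console window open until a key is pressed.
            Console.ReadKey();
        }
    }
}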
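If the tessdata folder is missing or misplaced, the engine fails at startup, so it can help to check for the language file first. The variant below is only a sketch under a few assumptions that are not part of the original example: the language file is named eng.traineddata, the image path may be passed as the first command-line argument, and the mean confidence is printed as an extra diagnostic.

```csharp
using System;
using System.IO;
using Tesseract;

namespace TesserectOCR
{
    class Program
    {
        static int Main(string[] args)
        {
            // Assumption: the image path is the first argument, falling back
            // to the path used in the example above.
            string imagePath = args.Length > 0 ? args[0] : @"E:\Capture.png";
            string dataPath = @".\tessdata";

            // Assumption: the English data file is named eng.traineddata.
            // Fail early with a readable message if it is missing.
            if (!File.Exists(Path.Combine(dataPath, "eng.traineddata")))
            {
                Console.Error.WriteLine("eng.traineddata not found in " + Path.GetFullPath(dataPath));
                return 1;
            }

            using (var engine = new TesseractEngine(dataPath, "eng", EngineMode.Default))
            using (var img = Pix.LoadFromFile(imagePath))
            using (var page = engine.Process(img))
            {
                Console.WriteLine(page.GetText());
                // GetMeanConfidence returns a value between 0 and 1.
                Console.WriteLine("Mean confidence: {0:P0}", page.GetMeanConfidence());
            }
            return 0;
        }
    }
}
```

A low mean confidence usually suggests that the image needs cleaning up (higher resolution, better contrast) before the recognized text can be trusted.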
Possible errors
You may get the following error when running the project.
The type ‘System.Drawing.Bitmap’ is defined in an assembly that is not referenced. You must add a reference to assembly ‘System.Drawing, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a’.
To fix this, go to Solution Explorer -> right-click References -> Add Reference -> search for Drawing -> select System.Drawing from the results (a checkmark appears on the left when it is selected) and click OK.