OCR (Optical Character Recognition) with MODI (Microsoft Office Document Imaging)
Microsoft has developed an OCR tool that we can use as a COM object.
First install MODI on your computer and add a reference to its COM library in your project. Then you can use it like this:
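using System;
using MODI;

// Create a MODI document from the image file the user selected
Document md = new Document();
md.Create(openFileDialog1.FileName);

// Run English OCR without auto-rotating or straightening the image
const bool ocrOrientImage = false;
const bool ocrStraightenImage = false;
md.OCR(MiLANGUAGES.miLANG_ENGLISH, ocrOrientImage, ocrStraightenImage);

// Collect the recognized words of the first page, separated by spaces
var image = (MODI.Image)md.Images[0];
var layout = image.Layout;
string ocrText = String.Empty;
foreach (Word word in layout.Words)
{
    if (ocrText.Length > 0)
    {
        ocrText += " ";
    }
    ocrText += word.Text;
}
textBox1.Text = ocrText;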
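The loop above builds the result with `+=`, which copies the whole string for every word. On pages with many words, a StringBuilder avoids that. The sketch below is a hypothetical helper (`OcrTextJoiner.JoinWords` is not part of MODI) that keeps the same rule as the loop: a space is added only once the result is non-empty.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the MODI library:
// joins recognized word texts with single spaces using a StringBuilder.
public static class OcrTextJoiner
{
    public static string JoinWords(IEnumerable<string> words)
    {
        var sb = new StringBuilder();
        foreach (var word in words)
        {
            // Same rule as the original loop: separate only after something was appended
            if (sb.Length > 0)
            {
                sb.Append(' ');
            }
            sb.Append(word);
        }
        return sb.ToString();
    }
}
```

With `using System.Linq;`, it could be fed from the MODI layout as `textBox1.Text = OcrTextJoiner.JoinWords(layout.Words.Cast<Word>().Select(w => w.Text));`.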
The only problem: if you want to really release the object md, you have to call its SaveAs method, something like this:
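using System.IO;

// Save a copy of the document next to the source image so that md is released
string path = Path.GetDirectoryName(openFileDialog1.FileName);
md.SaveAs(Path.Combine(path, "deleteMe.tif"),
    MODI.MiFILE_FORMAT.miFILE_FORMAT_DEFAULTVALUE,
    MODI.MiCOMP_LEVEL.miCOMP_LEVEL_MEDIUM);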
The code is taken from here and from this source code.
You can download my source code from here, and the exe file here.