Tuesday, October 18, 2016

YAGF - Scan and extract text from images in Arch Linux

YAGF is a graphical OCR front-end for the cuneiform and tesseract tools. With YAGF you can open scanned image files or obtain new images via XSane. Once you have a scanned image, you can prepare it for text recognition, select particular image areas for recognition, set the recognition language, and so on. Recognized text is displayed in an editor window where it can be corrected, saved to disk, or copied to the clipboard.

Installation in Arch Linux:
$ sudo pacman -S yagf cuneiform tesseract tesseract-data-eng
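
To confirm that the tools ended up on your PATH after installation, a small check like the following may help. This is only a sketch: the helper name missing_tools is made up for illustration, and it assumes the packages install binaries named yagf, cuneiform and tesseract.

```shell
#!/bin/sh
# Hypothetical helper (not part of YAGF): print every tool name
# passed as an argument that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed binary names; adjust if your packages use different ones.
missing=$(missing_tools yagf cuneiform tesseract)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All OCR tools found"
fi
```

If anything is reported as missing, rerun the pacman command above for that package.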

Here's the default YAGF user interface:

To start extracting text, open an image file, scan a page directly, or paste an image from the clipboard. If YAGF crashes with a "segmentation fault" when you load an image or paste from the clipboard, disable the "Crop Images when loaded" option under Edit -> Setting -> Image processing.
Click OK, then try loading an image again:
The image should now load properly. To recognize the text, press the red circle button:

Once the text has been recognized, you can edit it or save it to disk.
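
Since YAGF is a front-end for tesseract, the same recognition can also be tried from a terminal, which is handy when the GUI misbehaves. The sketch below is an assumption-laden example, not something YAGF itself runs: the function names ocr_output_base and ocr_image are hypothetical, and it relies on the standard "tesseract IMAGE OUTPUTBASE -l LANG" invocation, which writes OUTPUTBASE.txt.

```shell
#!/bin/sh
# Hypothetical helper: derive the output base name from an image path,
# e.g. scans/page1.png -> page1 (tesseract appends .txt itself).
ocr_output_base() {
    name=$(basename "$1")
    printf '%s\n' "${name%.*}"
}

# Hypothetical wrapper: run tesseract on one image.
# $1 = image file, $2 = language code (defaults to eng,
# which needs the tesseract-data-eng package).
ocr_image() {
    img=$1
    lang=${2:-eng}
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract not found on PATH" >&2
        return 127
    fi
    if [ ! -f "$img" ]; then
        echo "no such image: $img" >&2
        return 1
    fi
    # Writes <base>.txt into the current directory.
    tesseract "$img" "$(ocr_output_base "$img")" -l "$lang"
}
```

For example, after sourcing the script, running ocr_image scan.png eng should leave the recognized text in scan.txt in the current directory.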

That's it. Enjoy YAGF.