OCR using cuneiform does not work

Bug #654771 reported by FriedChicken
This bug affects 2 people
Affects: gscan2pdf (Ubuntu)
Status: Fix Released
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

Binary package hint: gscan2pdf

The cuneiform version in Ubuntu is built without libmagick++ support (Bug #654767). Therefore it can only process uncompressed BMP v3 images.

Trying anything else leads to OCR being cancelled. On the console this is printed:
> /tmp/lD1hIbHasU/ThgzjU3Pqw.pnm is not a BMP file.
> *** unhandled exception in callback:
> *** Error: cannot open /tmp/lD1hIbHasU/DWr2n3tweG.txt
> *** ignoring at /usr/bin/gscan2pdf line 12513.

As a workaround gscan2pdf should convert the images before passing them over to cuneiform.
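
To illustrate what such a conversion involves, here is a minimal sketch in Python (not gscan2pdf's own Perl code and not the attached patch). It assumes the scan is already available as raw 24-bit RGB data stored top-down; the function name rgb_to_bmp3 is hypothetical. It writes the 14-byte file header and the 40-byte BITMAPINFOHEADER that "BMP v3" normally refers to, with the compression field set to 0 (uncompressed).

```python
import struct


def rgb_to_bmp3(width: int, height: int, rgb: bytes) -> bytes:
    """Encode top-down 24-bit RGB pixels as an uncompressed BMP v3 file.

    Hypothetical helper for illustration only.
    """
    if len(rgb) != width * height * 3:
        raise ValueError("pixel data does not match the given dimensions")

    stride = width * 3
    row_size = (stride + 3) & ~3               # each row is padded to 4 bytes
    padding = b"\x00" * (row_size - stride)

    rows = []
    for y in range(height - 1, -1, -1):        # BMP stores rows bottom-up
        row = rgb[y * stride:(y + 1) * stride]
        bgr = bytearray(row)
        bgr[0::3], bgr[2::3] = row[2::3], row[0::3]   # RGB -> BGR
        rows.append(bytes(bgr) + padding)
    pixels = b"".join(rows)

    offset = 14 + 40                           # file header + info header
    file_header = struct.pack("<2sIHHI", b"BM", offset + len(pixels), 0, 0, offset)
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40,            # header size (BITMAPINFOHEADER)
        width,
        height,        # positive height means bottom-up rows
        1,             # colour planes
        24,            # bits per pixel
        0,             # compression: 0 = uncompressed (BI_RGB)
        len(pixels),
        2835, 2835,    # about 72 dpi, expressed in pixels per metre
        0, 0,          # no palette
    )
    return file_header + info_header + pixels
```

The returned bytes can be written to a temporary .bmp file, which is then passed to cuneiform in place of the .pnm file. If ImageMagick is installed, `convert scan.pnm BMP3:scan.bmp` should produce the same kind of file.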

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: gscan2pdf 0.9.31-2
ProcVersionSignature: Ubuntu 2.6.35-22.33-generic 2.6.35.4
Uname: Linux 2.6.35-22-generic x86_64
NonfreeKernelModules: fglrx
Architecture: amd64
Date: Mon Oct 4 20:51:24 2010
InstallationMedia: Kubuntu 10.04 LTS "Lucid Lynx" - Release amd64 (20100427)
PackageArchitecture: all
ProcEnviron:
 LANGUAGE=
 LANG=de_DE.utf8
 SHELL=/bin/bash
SourcePackage: gscan2pdf

Revision history for this message
FriedChicken (domlyons) wrote :
Jeffrey Ratcliffe (jeffreyratcliffe) wrote : Re: [Bug 654771] Re: Pass images as uncompressed BMP v3 to cuneiform

This patch fixes things for me

FriedChicken (domlyons) wrote : Re: Pass images as uncompressed BMP v3 to cuneiform

Thank you! Yes, this should work.

FriedChicken (domlyons) wrote :

I'm not sure if I should file a new bug or append it to this one...

Now cuneiform is built with libmagick++ support (Bug #654767 is fixed for Maverick; the fix for Lucid is in the proposed repository), so cuneiform can process nearly any image format. But OCR with cuneiform still doesn't work: the OCR tab simply stays empty.

Manually running cuneiform on a scanned image in /tmp/CrazyFolderName/RandomImageName.pnm works and gives a fairly accurate result. So the problem is neither cuneiform itself nor an unusable scanned document.

FriedChicken (domlyons) wrote :

Replacing the "-f hocr" option for cuneiform with "-f smarttext" solved this. ("-f text" should work too, but smarttext is probably better in most cases.)

Did "-f hocr" work for anyone at all?
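
For anyone who wants to try the different formats outside gscan2pdf, here is a small Python sketch of the call. The "-f" values are the ones mentioned above; the "-o" output option and the helper names cuneiform_command and run_ocr are my assumptions, so check `cuneiform --help` on your version first.

```python
import subprocess

# Output formats named in this report; cuneiform may support others.
KNOWN_FORMATS = ("smarttext", "text", "hocr")


def cuneiform_command(image, output, fmt="smarttext"):
    """Build the cuneiform argument list (hypothetical helper)."""
    if fmt not in KNOWN_FORMATS:
        raise ValueError("unknown output format: %s" % fmt)
    # Assumption: "-o FILE" selects the result file.
    return ["cuneiform", "-f", fmt, "-o", output, image]


def run_ocr(image, output, fmt="smarttext"):
    """Run cuneiform and return the recognised text."""
    subprocess.run(cuneiform_command(image, output, fmt), check=True)
    with open(output, encoding="utf-8") as handle:
        return handle.read()
```

Calling run_ocr("page.bmp", "page.txt") and then run_ocr("page.bmp", "page.html", fmt="hocr") makes it easy to compare the two outputs for the same page.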

tags: added: patch
Jeffrey Ratcliffe (jeffreyratcliffe) wrote : Re: [Bug 654771] Re: Pass images as uncompressed BMP v3 to cuneiform

I need to do some more work on that patch.

It seems that the hocr output from this version of cuneiform is a box
per letter, which gets the letters in the right place, but is useless
for searching.

Can anyone check cuneiform 1.0.0 to see if it is the same there?
Otherwise, I'll probably switch to plain text.

tags: added: patch-needswork
Daniel T Chen (crimsun)
Changed in gscan2pdf (Ubuntu):
status: New → Incomplete
importance: Undecided → Low
summary: - Pass images as uncompressed BMP v3 to cuneiform
+ OCR using cuneiform does not work
Jeffrey Ratcliffe (jeffreyratcliffe) wrote :

It works fine with cuneiform 1.0.0

Launchpad Janitor (janitor) wrote :

[Expired for gscan2pdf (Ubuntu) because there has been no activity for 60 days.]

Changed in gscan2pdf (Ubuntu):
status: Incomplete → Expired
Marja Erwin (marja-e) wrote :

I am running into the same bug.

Marja Erwin (marja-e) wrote :

I marked this as new, because it wasn't showing up when I searched for gscan2pdf.

Changed in gscan2pdf (Ubuntu):
status: Expired → New
Jeffrey Ratcliffe (jeffreyratcliffe) wrote :

This is fixed in gscan2pdf 1.0.0

Changed in gscan2pdf (Ubuntu):
status: New → Fix Released