Use a few gray levels for text scans

Bug #521323 reported by Stuart Langridge
This bug affects 3 people
Affects               Status        Importance  Assigned to  Milestone
Simple Scan           Fix Released  Low         Unassigned
simple-scan (Ubuntu)  Fix Released  Low         Unassigned

Bug Description

Binary package hint: simple-scan

Simple-scan scans from my Packard Bell Diamond 1200 in pure two-colour black-and-white, not greyscale, meaning that scans are very hard to read. There doesn't seem to be a way of setting greyscale scanning (ideally, the scan would default to either greyscale or colour scanning, since threshold black-and-white scans look horrible).

(This might be a dup of bug #498029, but I don't really know what a "colour profile" is :))

ProblemType: Bug
Architecture: i386
Date: Sat Feb 13 10:08:41 2010
DistroRelease: Ubuntu 10.04
Package: simple-scan 0.8.2-0ubuntu1
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-13.18-generic
SourcePackage: simple-scan
Uname: Linux 2.6.32-13-generic i686

Revision history for this message
Stuart Langridge (sil) wrote :

Ah. I have just discovered that this is what the "Photo"/"Text" dropdown does. So, I suggest that "Text" should scan in greyscale, not threshold black-and-white.

summary: - Scan is in black-and-white, not greyscale
+ "Text" scan is in black-and-white, not greyscale
Revision history for this message
Robert Ancell (robert-ancell) wrote : Re: "Text" scan is in black-and-white, not greyscale

Yes, the scan is in black and white to compress better and make OCR easier. There's definitely got to be a case for greyscale scanning (though this could be easily done in post-processing).

Question is:
- Should there be three options?
- Should there be a preference to scan text in greyscale?

The assumption is you only scan text when you want maximum compression.
The document type defaults to text; I will update that to photo.

Changed in simple-scan (Ubuntu):
importance: Undecided → Low
status: New → Triaged
Changed in simple-scan:
status: New → Triaged
importance: Undecided → Low
Revision history for this message
Stuart Langridge (sil) wrote :

I don't think it's post-processable - you can't go from threshold b/w to greyscale because you've already thrown away the data you need. You can post-process from greyscale to threshold, though. I would be inclined to scan text as greyscale, and then post-process to threshold b/w if the user asks for OCR. Threshold text is almost unreadable, too, whereas greyscale looks perfectly fine, so if the user scans text but does not intend to OCR, they'll be happy with the result.
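The one-way nature of thresholding can be shown with a minimal sketch. The function name `threshold`, the cutoff of 128, and the sample pixel rows are all illustrative assumptions, not anything simple-scan or a scanner driver is known to use:

```python
def threshold(pixels, cutoff=128):
    """Map 8-bit grey values (0-255) to 1-bit values: 0 = black, 1 = white."""
    return [1 if p >= cutoff else 0 for p in pixels]


# Two different grey rows: a soft, anti-aliased edge and a hard edge.
soft_edge = [250, 200, 140, 90, 40, 10]
hard_edge = [255, 255, 255, 0, 0, 0]

# Both collapse to the same 1-bit row, so the original grey values
# cannot be recovered from the thresholded result.
print(threshold(soft_edge))
print(threshold(hard_edge))
```

Because many grey inputs map to the same 1-bit output, greyscale to threshold is always possible after the scan, while the reverse is not.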

Revision history for this message
Robert Ancell (robert-ancell) wrote :

I meant post-processable from full-colour :)

In BZR trunk I've changed the default to photo, I'd expect most users will just keep to this because with JPEG compression and modern bandwidth they're not going to care much about image size.

The idea of text mode is that it is used for archiving where most people I've talked to indicated compression was the most important factor. My guess is that scanners actually do some clever image processing for "Lineart" mode that may not be easily done post-scan (without resorting to very high resolution scans).

Question: Did you intend the document you scanned for archiving? Would you have been happy if the document was full colour? Was it readable enough for archiving purposes?

Also, please attach an example if you have one?

Revision history for this message
Robert Ancell (robert-ancell) wrote :

Subscribing Martin Pitt as he is the person who has given me the most advice about simple-scan+archiving.

Revision history for this message
staedtler-przyborski (staedtler-przyborski-deactivatedaccount) wrote :

Here you can read some good reasons why lineart is the better choice compared to grayscale for OCR:
http://www.scantips.com/basics4f.html

But (there's always a but ;-) in real life things may be different. Depending on the scanner and driver, in some cases grayscale could be the better choice. One of the biggest problems of 'one size fits all' solutions is that there is no 'bed of Procrustes', meaning: real scanners (and their drivers) are too different.

Revision history for this message
Martin Pitt (pitti) wrote :

In my previous "scandoc" script I used pngquant to reduce the number of grey levels to 4 (i. e. black, white, dark grey, light grey). That worked well enough for me, while still keeping small file sizes. I do agree that documents with 4 or 8 gray levels are much easier to read.
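As a rough illustration of what reducing to 4 grey levels means, here is a sketch that snaps 8-bit values to evenly spaced levels. This is plain uniform quantization under assumed names (`quantize`, `levels`); it is not pngquant's algorithm, which chooses its palette adaptively from the image:

```python
def quantize(pixels, levels=4):
    """Snap 8-bit grey values to `levels` evenly spaced grey values."""
    step = 255 / (levels - 1)
    return [round(round(p / step) * step) for p in pixels]


row = [0, 30, 100, 128, 200, 255]
print(quantize(row))            # 4 levels: black, dark grey, light grey, white
print(quantize(row, levels=8))  # 8 levels keep finer gradations
```

With 4 levels each pixel needs only 2 bits, which is why the file size stays close to a 1-bit scan while anti-aliased character edges survive.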

Revision history for this message
Martin Pitt (pitti) wrote :

In that script I also used convert -modulate 120 -level 30,60% to clean up noise on the white background (such as text on the backside shining through) as well as blackening washed out text. With that and a following pngquant 4 I have gotten quite good results.
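The effect of the level adjustment can be sketched as a linear stretch with clamping. This assumes `-level 30,60%` is read as a black point of 30% and a white point of 60%; the function name `level` is hypothetical, and the `-modulate 120` brightness step is left out:

```python
def level(pixels, black_pct=30, white_pct=60):
    """Stretch 8-bit greys so black_pct% maps to 0 and white_pct% maps to 255."""
    lo = 255 * black_pct / 100
    hi = 255 * white_pct / 100
    out = []
    for p in pixels:
        scaled = (p - lo) / (hi - lo) * 255
        # Clamp: everything darker than lo becomes black, lighter than hi white.
        out.append(max(0, min(255, round(scaled))))
    return out


# Dark washed-out text (76) goes to black; faint show-through (230) goes to white.
print(level([0, 76, 115, 153, 230]))
```

Anything lighter than the white point is pushed to pure white, which removes text shining through from the back of the page, and anything darker than the black point is pushed to pure black, which restores washed-out text.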

Revision history for this message
cameleon (el-cameleon-1) wrote :

I think resolution and color management should be treated as two different settings, because there are cases where you want to scan a photo in greyscale, or a photo in color but at low resolution, ...

Revision history for this message
Robert Ancell (robert-ancell) wrote :

cameleon, what is the case when you want to scan a photo in black and white?

Why not just scan it in colour and then edit it in a photo-editing tool like any other photo? (because black and white is only one type of post-processing you might want to do).

Revision history for this message
staedtler-przyborski (staedtler-przyborski-deactivatedaccount) wrote :

Scanner interfaces (looking across all available scanning applications) currently follow two paradigms:

a) Give access to all possibilities the scanner offers and do lots of postprocessing when necessary (Silverfast is the best application to mention here); ideally no external application is needed to give the desired results in the best available quality.

b) Show only a reduced subset of the possibilities to simplify scanning as much as possible (e.g. HP Twain is a horrible example for that). Ideally no external application is needed, as long as you use the scans only in the simplified use-scenarios. (Saving as PDF, Print, Mail ...).

As you can see, in both cases the possibilities are not limited by the scanner itself. It's a decision of the developers and (in some cases) of user demand.

Currently no (I mean this seriously) scanning application available on Linux can fulfill a) or b).
Simple-Scan is on a good road to b), but lacks some features.

Revision history for this message
cameleon (el-cameleon-1) wrote :

I have tried to scan a grayscale document in "Text" mode (black and white), and the result looks horrible (look at the attachment). Only a "Photo" scan is great for this kind of document (look at the next comment).
So my conclusion is that a third option is needed to scan grayscale documents.

Revision history for this message
cameleon (el-cameleon-1) wrote :

Now the same document scanned in "Photo" mode:

Revision history for this message
Martin Pitt (pitti) wrote :

cameleon,

what you have there actually is much more suitable for the photo profile. "Text" is for real (black on white) text, such as books, letters, recipes, and the like.

That said, even those look pretty bad with the current monochrome mode. I think it would be much better to use at least 4 or perhaps 8 levels of gray, and perhaps reducing the resolution a bit to not make the files much bigger.
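The trade-off between grey levels and resolution can be checked with some rough arithmetic. This sketch compares raw, uncompressed sizes only; the helper `raw_bytes` and the A4 page size in inches are illustrative assumptions, and real files will differ once compression is applied:

```python
import math


def raw_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a scan, ignoring headers and row padding."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


A4 = (8.27, 11.69)  # page size in inches

mono_300 = raw_bytes(*A4, dpi=300, bits_per_pixel=1)   # monochrome
grey4_300 = raw_bytes(*A4, dpi=300, bits_per_pixel=2)  # 4 grey levels
# Doubling the bit depth is offset by halving the pixel count,
# i.e. scaling the resolution by 1/sqrt(2) (about 212 dpi).
grey4_212 = raw_bytes(*A4, dpi=round(300 / math.sqrt(2)), bits_per_pixel=2)

print(mono_300, grey4_300, grey4_212)
```

Before compression, a 4-level page at roughly 212 dpi takes about the same space as a monochrome page at 300 dpi, and 8 levels (3 bits per pixel) would need a correspondingly larger reduction.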

summary: - "Text" scan is in black-and-white, not greyscale
+ Use a few gray levels for text scans
Revision history for this message
MarkieB (ubunt-u-markbenjamin) wrote :

then there are people such as me who had to replace my scanner's glass - I guess the glass from the hardware store is more green / uncoated than the original - so it needs some adjustment in all cases; in the color case usually brightness/contrast adjustments are sufficient to make it look reasonably normal - although realistic-looking photos are a different matter!

in the monochrome case, similarly a greyscale scan then brightness/contrast adjustments are necessary, it would be nice to have preset adjustments available so that the scan as directly saved from simple scan was already reasonably adjusted;

so I'd suggest, for the monochrome case, a brightness/contrast[/gamma?] setting to 'prefilter' the 1bpp type result [currently 'text' setting], so that for all possible scanners it could produce a legible document, possibly in addition to the 2bpp 4-value type result you are mooting?

It would seem that this links to #498029 although I'd reiterate what has already been said, that it would be nicer to filter the image before saving [even before preview really], rather than adding a color profile to the saved image - particularly as a quick look at the patch there seems to suggest that for instance pdf images can't have embedded color profiles

as there is the possibility of calling imagemagick [when installed] from the code, it would seem reasonably straightforward to call it to prefilter images; then the monochrome scan could simply be an 8bpp greyscale scan, that is subsequently filtered into a 1bpp / 2bpp 'preview' image for saving/sending/etc

could possibly add imagemagick/graphicsmagick libs to the deps then call the lib functions directly from the code
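To make the 2bpp idea concrete, here is a sketch of packing 4-level pixels four to a byte. The function name `pack_2bpp` and the bit order (first pixel in the high bits) are assumptions for illustration; it does not reflect any particular image format or what simple-scan does:

```python
def pack_2bpp(levels):
    """Pack values 0-3 into bytes, four pixels per byte, first pixel in the high bits."""
    out = bytearray()
    for i in range(0, len(levels), 4):
        chunk = list(levels[i:i + 4])
        chunk += [0] * (4 - len(chunk))  # pad the final byte with zeros
        byte = 0
        for v in chunk:
            byte = (byte << 2) | (v & 0b11)
        out.append(byte)
    return bytes(out)


# Four pixels (white, light grey, dark grey, black) fit in a single byte.
print(pack_2bpp([3, 2, 1, 0]).hex())
```

An 8bpp greyscale scan reduced to four levels this way occupies a quarter of its original raw size, or twice that of the 1bpp result.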

Revision history for this message
MarkieB (ubunt-u-markbenjamin) wrote :

whoops should have edited that :-) when I looked more carefully at the patch for #498029 I noticed that pdfs could have color profiles added, providing imagemagick is installed

Revision history for this message
aexl (aexl) wrote :

I need black-and-white (most of the time) as well as grayscale (seldom),
so I would be happy to have 3 options.

Changed in simple-scan (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Matti Viljanen (direc85) wrote :

Any chance of cherry-picking this to Lucid? The Maverick .deb does not install in Lucid...

Revision history for this message
Robert Ancell (robert-ancell) wrote :

Matti, I've updated the simple-scan PPA (https://launchpad.net/~robert-ancell/+archive/simple-scan) with the latest version.

Revision history for this message
David Rahrer (david-rahrer) wrote :

I just upgraded from 1.03 to 2.31.91 on Lucid and text scanning is too gray for OCR without manual post-scan processing. I realize there was a readability issue for some using lineart but it would be nice to have that as a choice. Is there a way to possibly change this behavior back to 1.03 by changing a config file? I know this is supposed to be a simple scanning app, but adding the option for B&W, Grayscale, and Photo doesn't seem like it would complicate things too much.

Revision history for this message
Robert Ancell (robert-ancell) wrote :

David, thanks, I've opened a new request in bug #658792

Changed in simple-scan:
status: Triaged → Fix Released