Bug #75384 “xsane PDF file sizes could be optimized” : Bugs : xsane package : Ubuntu

Hew (hew) wrote on 2008-05-06:

#1

Thank you for taking the time to report this bug and helping to make Ubuntu better. I have reproduced this issue by using xsane to produce both a PDF (1.4 MB) and a PS (1.8 MB) scan, and then converting the PS to a PDF (103.8 KB). On visual inspection, it is apparent that this large difference in file size comes from lossy image compression (there are clear visual artifacts in the smaller, lower-quality version), not from poor optimization. I am therefore marking this bug as invalid. Please don't hesitate to submit bug reports in the future. Thanks again!

Changed in xsane:
status:	New → Invalid

Antonio J. de Oliveira (ajoliveira) wrote on 2009-09-04:

#2

Hi! :-)

I am returning to this because I think the bug does exist. My secretary has been complaining about it, and I have lost time checking why she hit the wall; if there is a fix, it is not obvious to me.
I scanned a color document at 150 dpi, full color, with xsane 0.994 on Jaunty, both 32-bit and 64-bit (tried on both versions).

JPEG: 518 KB
PDF: 4.9 MB (!)
JPEG converted to PDF with gm convert (GraphicsMagick, default options): 414 KB

Note that ImageMagick's convert with default options produces a 6.5 MB file, which is even worse.

The PDF created directly by xsane is about 10 times larger than the one converted with gm, and it looks the same, even on close inspection. If this is an invalid bug, I am puzzled. OK, maybe it is not a bug but a nagging annoyance: color scans need intermediate passes to come out properly, which makes multi-page color PDF scanning impractical. Everything must be scanned to JPEG and then batch-converted.

If you need it, the command

gm convert *.jpg -adjoin output.pdf

is clean and efficient; just save your scans as 001.jpg, 002.jpg, ... nnn.jpg so that the shell expands *.jpg in page order.
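
A minimal sketch of that workflow as a reusable shell function, assuming GraphicsMagick's gm is installed and each page was saved from xsane as a zero-padded, numbered .jpg in one directory. jpgs_to_pdf is a hypothetical helper name, not part of xsane or GraphicsMagick:

```shell
#!/usr/bin/env bash
# Hypothetical helper: jpgs_to_pdf DIR OUT.pdf
# Combines DIR/001.jpg, DIR/002.jpg, ... into one multi-page PDF with
# GraphicsMagick, using the same "gm convert ... -adjoin" command as above.
jpgs_to_pdf() {
  local dir=$1 out=$2 f
  set --  # reuse the positional parameters as the page list
  # Glob results are sorted, so zero-padded names come out in page order.
  for f in "$dir"/[0-9]*.jpg; do
    # If nothing matches, the pattern itself is returned; -e filters it out.
    [ -e "$f" ] && set -- "$@" "$f"
  done
  if [ "$#" -eq 0 ]; then
    echo "jpgs_to_pdf: no numbered .jpg files in $dir" >&2
    return 1
  fi
  gm convert "$@" -adjoin "$out"
}

# Example: jpgs_to_pdf ~/scans/letter letter.pdf
```

The zero padding matters: with unpadded names, 10.jpg would sort before 2.jpg and end up out of order in the PDF.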

Please advise

Greetings from sunny Portugal

Antonio J. de Oliveira (ajoliveira) wrote on 2009-09-04:

#3

Reopened: I think it is even worse in Jaunty than originally described.

Changed in xsane (Ubuntu):
status:	Invalid → New

Tommy Trussell (tommy-trussell) wrote on 2009-09-04:

#4

Hi, thanks for your support of my ancient bug. Now that there are two of us who want this, maybe we can at least petition for it as a wishlist item.

I completely understand why they closed it: what you and I are doing with ImageMagick is using a "worse" (lossier) compression scheme to make the file smaller than xsane can with any of its options. The person who closed the bug saw that it produced artifacts in the resulting image, which is to be expected.

Since I almost exclusively scan black-and-white documents, I have been satisfied with scanning them as black-and-white, 300 dpi, multi-page .ps files and then using ps2pdf to convert the single multi-page file to PDF. This produces an acceptably small PDF with very good quality for my particular circumstances.

If, however, you are scanning color documents, you probably want much more control over the settings. As the person who closed the bug implied, compression artifacts might be unacceptable in some circumstances. Also, if your documents use solid colors, such as blue or red ink on black-and-white forms, reducing the scan to a limited palette before compressing would produce a small file without compression artifacts. So it's not necessarily a simple issue.
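
As a rough sketch of the limited-palette idea, assuming the page was saved losslessly from xsane (for example as PNG or TIFF) and that your GraphicsMagick build supports Zip (Flate) compression in its PDF writer: quantize to a handful of colors, then write the PDF with lossless compression. palette_scan_to_pdf and the default of 8 colors are hypothetical, illustrative choices, not tested settings:

```shell
#!/usr/bin/env bash
# Hypothetical helper: palette_scan_to_pdf IN OUT.pdf [COLORS]
# Reduces a losslessly saved scan to a small palette, then writes it to PDF
# with lossless Zip (Flate) compression instead of JPEG.
palette_scan_to_pdf() {
  local in=$1 out=$2 colors=${3:-8}  # 8 colors is an arbitrary default
  # -colors: quantize to at most $colors colors (e.g. paper, black, blue ink)
  # -compress Zip: lossless, so no JPEG-style artifacts are introduced
  gm convert "$in" -colors "$colors" -compress Zip "$out"
}

# Example: palette_scan_to_pdf form.png form.pdf 4
```

Fewer distinct colors means longer runs of identical pixels, which lossless compressors handle far better than the noisy full-color original.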

I was hoping for more options, perhaps a way for xsane to call ImageMagick's convert command during the save process, so that additional tweaking would be possible while preserving the ease of use xsane offers.

It would be really nice to be able to configure and save more scanning options, such as compression algorithm and level, PDF display name, and perhaps the PDF page size and resolution, if those need separate control.

NOTE: xsane can save multi-page .ps files, and in GNOME you can add ps2pdf as a right-click command in Nautilus (the GNOME file manager) via right-click, Open With, "Custom command...". So I have not missed this requested feature, except when I am dealing with a file that doesn't work with ps2pdf's defaults. (For example, a legal or tabloid size document gets truncated by ps2pdf unless I pass some extra arguments. In those situations I usually just scan each page to an image file, place the images in an OpenOffice document, and save the PDF from there.)
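
ps2pdf is a wrapper script around Ghostscript and forwards -s/-d switches to it, so a non-default page size can be requested with -sPAPERSIZE. A minimal sketch, assuming Ghostscript's paper-size names (for example legal, or 11x17 for tabloid); ps_to_pdf is a hypothetical helper, and whether the paper size alone avoids truncation for a particular xsane .ps file is untested:

```shell
#!/usr/bin/env bash
# Hypothetical helper: ps_to_pdf IN.ps [OUT.pdf] [PAPERSIZE]
# Converts an xsane multi-page .ps file with ps2pdf. OUT defaults to IN with
# .ps replaced by .pdf; PAPERSIZE is a Ghostscript paper-size name.
ps_to_pdf() {
  local in=$1 out=${2:-${1%.ps}.pdf} paper=${3:-}
  if [ -n "$paper" ]; then
    # ps2pdf passes -s/-d switches through to Ghostscript.
    ps2pdf -sPAPERSIZE="$paper" "$in" "$out"
  else
    # Ghostscript's default page size, as with the plain right-click setup.
    ps2pdf "$in" "$out"
  fi
}

# Examples:
#   ps_to_pdf letter.ps                        # writes letter.pdf
#   ps_to_pdf contract.ps contract.pdf legal
```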

Antonio J. de Oliveira (ajoliveira) wrote on 2009-09-05:

#5

Hi

The worst part was that I could not see any difference between a scanned JPEG image, the same image converted to PDF with GraphicsMagick (not ImageMagick, note), and a PDF scanned natively by xsane. No artifacts, nothing. That is why I reopened this bug. I can send some examples, but the developers can easily reproduce this. It seems that the engine behind the PDF conversion is causing the trouble. I suspect that engine is not part of the basic package, so if we were given a choice of engine, the problem might be cleanly bypassed.

Cheers

Antonio

Antonio J. de Oliveira (ajoliveira) wrote on 2009-09-07:

#6

Hi Tommy

Thanks for the multi-page ps tip. Today we used it together with gm convert (GraphicsMagick) to create a splendid PDF. I installed 'context' to try pstopdf, but the final size is still almost double that of the file created with gm convert, at clearly similar resolution.
So we do have some shortcuts; the highway is up to the developers.

Cheers

Antonio

Tommy Trussell (tommy-trussell) wrote on 2009-09-07:

#7

Thank you for the clarification -- I had forgotten about GraphicsMagick (a "fork" of ImageMagick). GraphicsMagick should be superior for a stable solution, and I should redo my tests using it.

I have not dug into the code of xsane. I suspect a command-line or script option could be "slapped in" relatively easily, but a useful graphical user interface would likely be much harder. I haven't looked for a list or forum for xsane issues, but that might be a good place to start.

Oh, and of course it always helps to be completely familiar with the existing software -- there are details and helpful hints at http://www.xsane.org/

P.S.: There are other front-ends to SANE, but xsane has always been very stable and predictable. I haven't looked at other options lately, but one of them is called QuiteInsane, and I see it's still available in Ubuntu. I'm mentioning it because another project may well have the features we are looking for.

Antonio J. de Oliveira (ajoliveira) wrote on 2009-09-07:

#8

Got it, and I have installed QuiteInsane. I was aware of its existence, but like you, when something performs its daily duties properly, why change? Well, I'm going to give it a try and dig a little more.

Micah Gersten (micahg) wrote on 2009-10-12:

#9

Seems like a reasonable request.

Changed in xsane (Ubuntu):
importance:	Undecided → Wishlist

xteejx (xteejx-deactivatedaccount) on 2009-10-12

Changed in xsane (Ubuntu):
status:	New → Confirmed

Antonio J. de Oliveira (ajoliveira) wrote on 2009-10-12:

#10

Hi

Good. When something can probably be handled by a minor script change, why go through major changes? Let me know if I can be of help.

Cheers

Antonio

Timmo Henseler (timmo-henseler) wrote on 2009-11-02:

#11

Hi guys,

Following your discussion, I think I am in a different (lower) league, but as an end user trying to get results similar to those from the HP scanning software under Windows, I very much recognise the issues you describe; they have troubled me since I started using Ubuntu last year. Today I think I made some progress:

- I have had gscan2pdf installed for a long time, but it never quite satisfied my needs. Today, however, I got quite an acceptable color result (knowing a little more about JPEG compression than before). Perhaps it's worth a try; let me know your experience.

- As far as I can see, QuiteInsane is a plugin for GIMP, but it has no PDF capability (I only scan to PDF).

xsane could be better, but gscan2pdf is simpler and does what I need.

cheers,
Timmo

Boris Burtin (boris-burtin) wrote on 2010-02-11:

#12

I'd also like to see improved support for image compression when generating PDFs. When you're scanning multi-page documents as opposed to artwork, small file size is more important than optimal image quality.

Dylan Justice (dsjstc) wrote on 2010-07-13:

#13

Confirmed in Lucid.

Tom Louwrier (tom-louwrier) wrote on 2010-07-21:

#14

Same here.
I use an HP C7280, and almost all my scans are (1) multi-page and (2) output to PDF.

A standard A4 business letter, scanned in grayscale to PDF at 200 dpi, comes out at about 1.2 MB per page. That's quite ridiculous, and it makes mailing those documents awkward; not all my contacts have broadband internet and gigabyte-size mailboxes.
I did not find any options in the setup to control image size or compression when producing PDFs, just the basic scan resolution setting.
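
To put the 1.2 MB figure in perspective, it helps to compute the raw (uncompressed) size of a page. A back-of-the-envelope sketch, assuming A4 is 210 x 297 mm, 1 byte per pixel for 8-bit grayscale and 3 for 24-bit color; raw_page_bytes is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: raw_page_bytes WIDTH_MM HEIGHT_MM DPI BYTES_PER_PIXEL
# Prints the uncompressed image size of one scanned page, in bytes.
raw_page_bytes() {
  local w_mm=$1 h_mm=$2 dpi=$3 bytes_per_px=$4
  local w_px h_px
  # pixels = mm / 25.4 * dpi, in integer arithmetic (truncates).
  w_px=$(( w_mm * dpi * 10 / 254 ))
  h_px=$(( h_mm * dpi * 10 / 254 ))
  echo $(( w_px * h_px * bytes_per_px ))
}
```

raw_page_bytes 210 297 200 1 prints 3864714, about 3.9 MB for a 200 dpi grayscale A4 page, so 1.2 MB per page is only roughly a 3:1 reduction. That is the kind of ratio one would expect from lossless rather than JPEG compression (an inference, not checked against xsane's source). For a 150 dpi, 24-bit color A4 page, raw_page_bytes 210 297 150 3 prints 6521160, about 6.5 MB.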

Also it seems a bit strange that I can't scan at a lower resolution than 192 dpi, which was possible earlier.

Running 10.04 amd64 (and definitely not going back to XP or Vista to use the HP original software!)

cheers
Tom

Tessa Lau (tlau) wrote on 2011-02-11:

#15

I also suffer from this problem. A ten-page scan of a printed document results in a 43MB PDF.

What works well for me is to apt-get install libtiff-tools, use xsane to save the scan as TIFF, and then use tiff2pdf to convert it to PDF. The result is only 5 MB and appears to have similar quality to the original file.
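
A minimal sketch of that TIFF route for a multi-page scan, assuming libtiff-tools puts tiffcp and tiff2pdf on the PATH and that xsane saved one TIFF per page; tiffs_to_pdf and the JPEG_QUALITY variable are hypothetical. Setting JPEG_QUALITY trades quality for size through tiff2pdf's -j and -q options; JPEG is not suitable for 1-bit black-and-white scans, so leave it unset for those:

```shell
#!/usr/bin/env bash
# Hypothetical helper: tiffs_to_pdf OUT.pdf PAGE.tif...
# Optional: JPEG_QUALITY=N to ask tiff2pdf for JPEG compression.
tiffs_to_pdf() {
  if [ "$#" -lt 2 ]; then
    echo "usage: tiffs_to_pdf OUT.pdf PAGE.tif..." >&2
    return 1
  fi
  local out=$1 src tmp="" rc
  shift
  if [ "$#" -gt 1 ]; then
    # tiffcp concatenates the pages into one multi-page TIFF.
    tmp=$(mktemp) || return 1
    if ! tiffcp "$@" "$tmp"; then
      rm -f "$tmp"
      return 1
    fi
    src=$tmp
  else
    src=$1
  fi
  if [ -n "${JPEG_QUALITY:-}" ]; then
    # -j: JPEG-compress the image data; -q: JPEG quality.
    tiff2pdf -j -q "$JPEG_QUALITY" -o "$out" "$src"
  else
    # No compression option given; tiff2pdf chooses the encoding itself.
    tiff2pdf -o "$out" "$src"
  fi
  rc=$?
  if [ -n "$tmp" ]; then rm -f "$tmp"; fi
  return "$rc"
}

# Example: JPEG_QUALITY=80 tiffs_to_pdf report.pdf page1.tif page2.tif
```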

Moreover, the PDF generated via the original process results in thousands of errors like these when viewed in GNOME Document Viewer:

Error (588813): Illegal character <56> in hex string
Error (588814): Illegal character <a6> in hex string
Error (588815): Illegal character <af> in hex string
Error (588816): Illegal character <5c> in hex string

boblinux (robert-grasso) wrote on 2011-05-30:

#16

May I add my own contribution. I am running Maverick, x86_64. I just scanned a single sheet with a few typed numbers, some of them in color: a pretty raw, almost empty document. I scanned it at 150 dpi, full color, and the result was a 2.8 MB PDF file. Then I applied a trick I discovered by chance a few days ago when I needed to merge scanned PDFs; today I used it to shrink my single-page PDF:

gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=out.pdf in.pdf

Result: out.pdf is 187 KB! And note that this command is not meant as a conversion: it is what I use to MERGE PDFs.
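
A variant of the gs command above as a small function, sketched under the assumption that a Ghostscript version with the standard -dPDFSETTINGS presets is installed; shrink_pdf is a hypothetical name. Passing several input files merges them in order, and the preset controls how aggressively images are downsampled and recompressed (/ebook, for example, targets roughly 150 dpi images):

```shell
#!/usr/bin/env bash
# Hypothetical helper: shrink_pdf OUT.pdf PRESET IN.pdf...
# PRESET is a Ghostscript -dPDFSETTINGS value such as /screen, /ebook,
# /printer, /prepress or /default.
shrink_pdf() {
  if [ "$#" -lt 3 ]; then
    echo "usage: shrink_pdf OUT.pdf PRESET IN.pdf..." >&2
    return 1
  fi
  local out=$1 preset=$2
  shift 2
  # Re-render all inputs through the pdfwrite device into one output file.
  gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
     -dPDFSETTINGS="$preset" -sOutputFile="$out" "$@"
}

# Example: shrink_pdf small.pdf /ebook scan1.pdf scan2.pdf
```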

Additionally, when opening the pdf produced by xsane with evince from a terminal, I get 2399 lines such as :

Error (358497): Illegal character <d9> in hex string
Error (358498): Illegal character <16> in hex string
Error (358499): Illegal character <d1> in hex string
Error (358500): Illegal character <9f> in hex string
Error (358501): Illegal character <5b> in hex string
Error (358502): Illegal character <fe> in hex string

When I open the one generated by gs, no errors are reported. By the way, xsane PDFs have produced such errors in evince for years, but I guess very few people open them from a terminal...

A S (zephyr707) wrote on 2020-05-19:

#17

Was this ever resolved?

I'm evaluating scanning programs for Linux, and xsane comes highly recommended, but a typical color scan of my test document produced a huge 17,440 KB PDF versus 824 KB from simple-scan, and visually I cannot see any difference. The grayscale scan was 5,596 KB and seems to suffer from some kind of aliasing around letters that are on a diagonal. That doesn't happen with simple-scan, which produces a much smaller 280 KB document, albeit with its own issues since it is black-and-white rather than gray. All scans were done at 300 dpi.

I've been trying to figure out why xsane generates such massive pdfs and ended up here, is the bug still relevant? Seems very old.

A S (zephyr707) wrote on 2020-05-19:

#18

OK, I take it back: there are definitely some JPEG artifacts in the simple-scan color scan when I zoom in beyond 500%, but the file size difference still seems quite dramatic.

I still cannot produce a grayscale image without this weird stepping/aliasing artifact. Now that I've zoomed in very closely, it is most noticeable on the diagonal letters, but it is actually present in all letters and images.
