UNPAPER(1) | Reference | UNPAPER(1) |
NAME
unpaper - Post-processing tool for scanned sheets of paper.
SYNOPSIS
unpaper [options] {input-pattern output-pattern | input-file(s) output-file(s)}
OVERVIEW
unpaper is a post-processing tool for scanned sheets of paper, especially for book pages that have been scanned from previously created photocopies. The main purpose is to make scanned book pages more readable on screen after conversion to PDF. Additionally, unpaper might be useful to enhance the quality of scanned pages before performing optical character recognition (OCR).
unpaper tries to clean scanned images by removing dark edges that appeared through scanning or copying on areas outside the actual page content (e.g. dark areas between the left-hand side and the right-hand side of a double-sided book-page scan). The program also tries to detect misaligned centering and rotation of pages and will automatically straighten each page by rotating it to the correct angle. This process is called "deskewing".
Note that the automatic processing will sometimes fail. It is always a good idea to manually check the results of unpaper and adjust the parameter settings according to the requirements of the input. Each processing step can also be disabled individually for each sheet.
Input and output files can be in either .pbm, .pgm or .ppm format, thus generally in .pnm format, as also used by the Linux scanning tools scanimage and scanadf. Conversion to PDF can e.g. be achieved with the Linux tools pgm2tiff, tiffcp and tiff2pdf.
INPUT AND OUTPUT FILES
Input and output files need to be specified either by using patterns or by giving an ordered list of input and output files. If patterns such as %04d are used, they are replaced with the input or output sheet number before the file is opened for input or output. If you're not using patterns, the program expects one or two input files, depending on what is passed as --input-pages, and one or two output files, depending on what is passed as --output-pages, in order. Missing output file names are fatal and will stop processing; missing initial input file names are fatal, and so is any missing input file if a range of sheets is defined through --sheet or --end-sheet.
OPTIONS
-l { single | double | none }, --layout { single | double | none }
Set default layout options for a sheet:
single
One page per sheet.
double
Two pages per sheet, landscape orientation
(one page on the left half, one page on the right half).
none
No auto-layout, mask-scan-points may
individually be specified.
Using single or double automatically sets corresponding
--mask-scan-points. The default is single.
-start sheet, --start-sheet start-sheet
Number of first sheet to process in
multi-sheet mode. (default: 1)
-end sheet, --end-sheet sheet
Number of last sheet to process in multi-sheet
mode. -1 indicates processing until no input file with the corresponding
page number is available. (default: -1)
-# sheet-range, --sheet sheet-range
Optionally specifies which sheets to process
in the range between start-sheet and end-sheet.
-x sheet-range, --exclude sheet-range
Excludes sheets from processing in the range
between start-sheet and end-sheet.
--pre-rotate { -90 | 90 }
Rotates the whole image clockwise (90)
or anti-clockwise (-90) before any other processing.
--post-rotate { -90 | 90 }
Rotates the whole image clockwise (90)
or anti-clockwise (-90) after any other processing.
-M { v | h | v,h }, --pre-mirror { v |
h | v,h }
Mirror the image, after possible pre-rotation.
Either v (for vertical mirroring), h (for horizontal mirroring)
or v,h (for both) can be specified.
--post-mirror { v | h | v,h }
Mirror the image, after any other processing
except possible post-rotation. Either v (for vertical mirroring),
h (for horizontal mirroring) or v,h (for both) can be
specified.
--pre-shift h,v
Shift the image before further processing.
Values for h (horizontal shift) and v (vertical shift) can
either be positive or negative.
--post-shift h,v
Shift the image after other processing. Values
for h (horizontal shift) and v (vertical shift) can either be
positive or negative.
--pre-wipe
left,top,right,bottom
Manually wipe out an area before further
processing. Any pixel in a wiped area will be set to white. Multiple areas to
be wiped may be specified by multiple occurrences of this option.
--post-wipe
left,top,right,bottom
Manually wipe out an area after processing.
Any pixel in a wiped area will be set to white. Multiple areas to be wiped may
be specified by multiple occurrences of this option.
--pre-border
left,top,right,bottom
Clear the border-area of the sheet before
further processing. Any pixel in the border area will be set to white.
--post-border
left,top,right,bottom
Clear the border-area of the sheet after other
processing. Any pixel in the border area will be set to white.
--pre-mask x1,y1,x2,y2
Specify masks to apply before any other
processing. Any pixel outside a mask will be set to white, unless another mask
includes this pixel.
Only pixels inside a mask will remain. Multiple masks may be specified. No
deskewing will be applied to the masks specified by --pre-mask.
-s { width,height | size-name },
--size { width,height | size-name }
Change the sheet size before other processing
is applied. Content on the sheet gets zoomed to fit the new size,
and its aspect ratio is preserved. If the sheet's aspect ratio
changes, the zoomed content gets centered on the sheet.
Possible values for size-name are: a5, a4, a3,
letter, legal. All size names can also be applied in rotated
landscape orientation, use a4-landscape, letter-landscape
etc.
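
The fit-and-center behavior described for --size can be sketched as follows. This is only an illustration in Python, not unpaper's actual code: the function name, the rounding and the pixel dimensions in the usage line are assumptions.

```python
def fit_and_center(content_w, content_h, sheet_w, sheet_h):
    """Scale content to fit the sheet, keep its aspect ratio, and center it.

    Returns (new_width, new_height, offset_x, offset_y) in pixels.
    Rounding to whole pixels is an assumption of this sketch.
    """
    # The smaller of the two ratios guarantees the content fits in both directions.
    scale = min(sheet_w / content_w, sheet_h / content_h)
    new_w = round(content_w * scale)
    new_h = round(content_h * scale)
    # Leftover space is split evenly between the two sides.
    offset_x = (sheet_w - new_w) // 2
    offset_y = (sheet_h - new_h) // 2
    return new_w, new_h, offset_x, offset_y


# Hypothetical sizes: a 1000x2000 pixel scan placed on a 2480x3508 pixel sheet.
print(fit_and_center(1000, 2000, 2480, 3508))
```

In this example the content is limited by the sheet height, so it fills the sheet vertically and receives equal white margins on the left and right.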
--post-size { width,height | size-name }
Change the sheet size preserving the content's
aspect ratio after other processing steps are applied.
--stretch { width,height | size-name }
Change the sheet size before other processing
is applied. Content on the sheet gets stretched to the specified size,
possibly changing the aspect ratio.
--post-stretch { width,height | size-name }
Change the sheet size after other processing
is applied. Content on the sheet gets stretched to the specified size,
possibly changing the aspect ratio.
-z factor, --zoom factor
Change the sheet size according to the given
factor before other processing is done.
--post-zoom factor
Change the sheet size according to the given
factor after processing is done.
-bn { v | h | v,h },
--blackfilter-scan-direction { v | h | v,h }
Directions in which to search for solidly
black areas. Either v (for vertical scanning), h (for
horizontal scanning) or v,h (for both) can be specified.
-bs { size | h-size,v-size },
--blackfilter-scan-size { size |
h-size,v-size }
Width of virtual bar used for black area detection.
Two values may be specified to individually set horizontal and vertical size.
(default: 20,20)
-bd { depth | h-depth,v-depth },
--blackfilter-scan-depth { depth |
h-depth,v-depth }
Size of virtual bar used for black area
detection. (default: 500,500)
-bp { step | h-step,v-step },
--blackfilter-scan-step { step |
h-step,v-step }
Steps to move virtual bar for black area
detection. (default: 5,5)
-bt threshold, --blackfilter-scan-threshold
threshold
Ratio of dark pixels above which a black area
gets detected. (default: 0.95).
-bx left,top,right,bottom,
--blackfilter-scan-exclude left,top,right,bottom
Area on which the blackfilter should not
operate. This can be useful to prevent the blackfilter from working on inner
page content. May be specified multiple times to set more than one area.
-bi intensity, --blackfilter-intensity intensity
Intensity with which to delete black areas.
Larger values will leave less noise-pixels around former black areas, but may
delete page content. (default: 20)
-ni intensity, --noisefilter-intensity intensity
Intensity with which to delete individual
pixels or tiny clusters of pixels. Any cluster which contains at most
intensity dark pixels will be deleted. (default:
4)
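
To illustrate what deleting tiny clusters means, here is a minimal Python sketch. It is not unpaper's implementation: the image is a plain list of 0/1 rows, and treating diagonally touching pixels as part of the same cluster is an assumption.

```python
def remove_small_clusters(image, intensity=4):
    """Return a copy of a 0/1 image with small dark clusters cleared.

    1 means dark, 0 means white. A cluster is a group of dark pixels that
    touch horizontally, vertically or diagonally (an assumption).
    Clusters with at most `intensity` pixels are set to white.
    """
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Collect the whole cluster with an iterative flood fill.
            cluster, stack = [], [(y, x)]
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                cluster.append((cy, cx))
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            # Small clusters are treated as noise and wiped.
            if len(cluster) <= intensity:
                for cy, cx in cluster:
                    result[cy][cx] = 0
    return result
```

With the default intensity of 4, an isolated speck of two pixels would be removed, while a six-pixel blob (for example part of a letter) would be kept.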
-ls { size | h-size,v-size },
--blurfilter-size { size | h-size,v-size }
Size of blurfilter area to search for
"lonely" clusters of pixels. (default: 100,100)
-lp { step | h-step,v-step },
--blurfilter-step { step | h-step,v-step }
Size of "blurring" steps in each
direction. (default: 50,50)
-li ratio, --blurfilter-intensity ratio
Relative intensity with which to delete tiny
clusters of pixels. Any blurred area which contains at most the ratio
of dark pixels will be cleared. (default: 0.01)
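
The blurfilter decision for a single area can be sketched as below. This Python fragment is illustrative only: the function name is hypothetical, and the area is modeled as a list of 0/1 rows where 1 is a dark pixel.

```python
def should_clear(block, max_ratio=0.01):
    """True if the share of dark pixels (1s) in the block is at most max_ratio."""
    total = sum(len(row) for row in block)
    dark = sum(sum(row) for row in block)
    return dark / total <= max_ratio


# A 10x10 area with a single dark pixel has a dark ratio of 0.01.
area = [[0] * 10 for _ in range(10)]
area[4][7] = 1
print(should_clear(area))
```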
-gs { size | h-size,v-size },
--grayfilter-size { size | h-size,v-size }
Size of grayfilter mask to search for
"gray-only" areas of pixels. (default: 50,50)
-gp { step | h-step,v-step },
--grayfilter-step { step | h-step,v-step }
Size of steps moving the grayfilter mask in
each direction. (default: 20,20)
-gt ratio, --grayfilter-threshold ratio
Relative intensity of grayness which is
accepted before clearing the grayfilter mask in cases where no black pixel is
found in the mask. (default: 0.5)
-p x,y, --mask-scan-point
x,y
Manually set starting point for
mask-detection. Multiple --mask-scan-point options may be specified to
detect multiple masks.
-m x1,y1,x2,y2,
--mask x1,y1,x2,y2
Manually add a mask, in addition to masks
automatically detected around the --mask-scan-point coordinates (unless
--no-mask-scan is specified).
Any pixel outside a mask will be set to white, unless another mask covers this
pixel.
-mn { v | h | v,h }, --mask-scan-direction {
v | h | v,h }
Directions in which to search for mask
borders, starting from --mask-scan-point coordinates. Either v (for
vertical scanning), h (for horizontal scanning) or v,h (for
both) can be specified. (default: h, as v may cut text
paragraphs on single-page sheets)
-ms { size | h-size,v-size },
--mask-scan-size { size | h-size,v-size }
Width of the virtual bar used for mask
detection. Two values may be specified to individually set horizontal and
vertical size. (default: 50,50)
-md { depth | h-depth,v-depth },
--mask-scan-depth { depth | h-depth,v-depth
}
Height of the virtual bar used for mask
detection. (default: -1,-1, using the total width or height of the
sheet)
-mp { step | h-step,v-step },
--mask-scan-step { step | h-step,v-step }
Steps to move the virtual bar for mask
detection. (default: 5,5)
-mt { threshold | h-threshold,v-threshold },
--mask-scan-threshold { threshold |
h-threshold,v-threshold }
Ratio of dark pixels below which an edge gets
detected, relative to maximum blackness when counting from the start
coordinate heading towards one edge. (default: 0.1)
-mm w,h, --mask-scan-minimum
w,h
Minimum allowed size of an auto-detected mask.
Masks detected below this size will be ignored and set to the size specified
by mask-scan-maximum. (default: 100,100)
-mM w,h, --mask-scan-maximum
w,h
Maximum allowed size of an auto-detected mask.
Masks detected above this size will be shrunk to the maximum value, each
direction individually. (default: sheet size, or page size derived from
--layout option)
-mc color, --mask-color color
Color value with which to wipe out pixels not
covered by any mask. May be useful for testing in order to visualize the effect
of masking. (Note that an RGB-value is expected: R*65536 + G*256 + B.)
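
The RGB value expected by --mask-color can be computed from its components with the formula given above. The helper name in this Python sketch is hypothetical.

```python
def rgb_to_value(r, g, b):
    """Combine 8-bit red, green and blue components as R*65536 + G*256 + B."""
    for component in (r, g, b):
        if not 0 <= component <= 255:
            raise ValueError("each component must be in the range 0..255")
    return r * 65536 + g * 256 + b


print(rgb_to_value(255, 0, 0))      # pure red: 16711680
print(rgb_to_value(255, 255, 255))  # white: 16777215
```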
-dn { left | top | right | bottom
},..., --deskew-scan-direction { left | top |
right | bottom },...
Edges from which to scan for rotation. Each
edge of a mask can be used to detect the mask's rotation. If multiple edges
are specified, the average value will be used, unless the statistical
deviation exceeds --deskew-scan-deviation. Use left for scanning
from the left edge, top for scanning from the top edge, right
for scanning from the right edge, bottom for scanning from the bottom.
Multiple directions can be separated by commas. (default:
left,right)
-ds pixels, --deskew-scan-size pixels
Size of virtual line for rotation detection.
(default: 1500)
-dd ratio, --deskew-scan-depth ratio
Amount of dark pixels to accumulate until
scanning is stopped, relative to scan-bar size. (default: 0.5)
-dr degrees, --deskew-scan-range degrees
Range in which to search for rotation, from
-degrees to +degrees rotation. (default: 5.0)
-dp degrees, --deskew-scan-step degrees
Steps between single rotation-angle
detections. Lower numbers lead to better results but slow down processing.
(default: 0.1)
-dv deviation, --deskew-scan-deviation deviation
Maximum statistical deviation allowed among
the results from detected edges. No rotation if exceeded. (default:
1.0)
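
How the per-edge results might be combined is sketched below in Python. The text does not define "statistical deviation" precisely, so this sketch assumes the population standard deviation; the function name is hypothetical.

```python
from statistics import mean, pstdev


def combined_rotation(angles, max_deviation=1.0):
    """Average the rotation angles detected from several edges.

    Returns 0.0 (no rotation) if the angles disagree too much.
    Using the population standard deviation as the measure of
    disagreement is an assumption of this sketch.
    """
    if pstdev(angles) > max_deviation:
        return 0.0
    return mean(angles)


print(combined_rotation([1.0, 2.0]))   # edges roughly agree: the average is used
print(combined_rotation([-3.0, 3.0]))  # edges contradict each other: no rotation
```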
-W
left,top,right,bottom,
--wipe
left,top,right,bottom
Manually wipe out an area. Any pixel in a
wiped area will be set to white. Multiple --wipe areas may be
specified. This is applied after deskewing and before automatic
border-scan.
-mw { size | left,right },
--middle-wipe { size | left,right }
If --layout is set to double,
this may specify the size of a middle area to wipe out between the two pages
on the sheet. This may be useful if the blackfilter fails to remove some black
areas (e.g. resulting from photo-copying in the middle between two
pages).
-B
left,top,right,bottom,
--border
left,top,right,bottom
Manually add a border. Any pixel in the border
area will be set to white. This is applied after deskewing and before
automatic border-scan.
-Bn { v | h | v,h }, --border-scan-direction
{ v | h | v,h }
Directions in which to search for outer
border. Either v (for vertical scanning), h (for horizontal
scanning) or v,h (for both) can be specified. (default:
v)
-Bs { size | h-size,v-size },
--border-scan-size { size | h-size,v-size }
Width of virtual bar used for border
detection. Two values may be specified to individually set horizontal and
vertical size. (default: 5,5)
-Bp { step | h-step,v-step },
--border-scan-step { step | h-step,v-step }
Steps to move virtual bar for border
detection. (default: 5,5)
-Bt threshold, --border-scan-threshold threshold
Absolute number of dark pixels covered by the
border-scan mask above which a border is detected. (default: 5)
-Ba { left | top | right | bottom },
--border-align { left | top | right |
bottom }
Direction where to shift the detected
border-area. Use --border-margin to specify horizontal and vertical
distances to be kept from the sheet-edge. (default: none)
-Bm vertical,horizontal, --border-margin vertical,horizontal
Distance to keep from the sheet edge when
aligning a border area. May use measurement suffixes such as cm, in.
-w threshold, --white-threshold threshold
Brightness ratio above which a pixel is
considered white. (default: 0.9)
-b threshold, --black-threshold threshold
Brightness ratio below which a pixel is
considered black (non-gray). This is used by the gray-filter. This value is
also used when converting a grayscale image to black-and-white mode (default:
0.33)
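
The interplay of --white-threshold and --black-threshold can be pictured with a small Python sketch. It assumes brightness is expressed as a ratio between 0.0 (black) and 1.0 (white); the function name and the handling of values exactly on a threshold are assumptions.

```python
def classify(brightness, white_threshold=0.9, black_threshold=0.33):
    """Classify a pixel by its brightness ratio (0.0 = black, 1.0 = white)."""
    if brightness > white_threshold:
        return "white"
    if brightness < black_threshold:
        return "black"
    # Everything in between counts as gray.
    return "gray"


for value in (0.95, 0.5, 0.2):
    print(value, classify(value))
```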
-ip { 1 | 2 }, --input-pages { 1 | 2 }
If 2 is specified, read two input
images instead of one and internally combine them to a double-layout sheet
before further processing. Before internally combining, --pre-rotate
is optionally applied individually to both input images as the very first
processing step.
-op { 1 | 2 }, --output-pages { 1 | 2
}
If 2 is specified, write two output
images instead of one, as a result of splitting a double-layout sheet after
processing. After splitting the sheet, --post-rotate is optionally
applied individually to both output images as the very last processing
step.
-S { width,height | size-name },
--sheet-size { width,height | size-name }
Force a fixed sheet size. Usually, the sheet
size is determined by the input image size (if input-pages=1), or by
the double size of the first page in a two-page input set (if
input-pages=2). If the input image is smaller than the size specified
here, it will appear centered and surrounded with a white border on the sheet.
If the input image is bigger, it will be centered and the edges will be
cropped. This option may also be helpful to get regular sized output images if
the input image sizes differ. Standard size-names like a4-landscape,
letter, etc. may be used (see --size). (default: as in input
file)
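
The centering rule for --sheet-size can be sketched as an offset computation. This Python fragment is an illustration only; the function name and the example pixel sizes are hypothetical.

```python
def place_on_sheet(image_w, image_h, sheet_w, sheet_h):
    """Return (offset_x, offset_y) of the image's top-left corner on the sheet.

    A positive offset is the width of the white border on that side.
    A negative offset means the image is larger than the sheet and that
    many pixels are cropped on that side.
    """
    return (sheet_w - image_w) // 2, (sheet_h - image_h) // 2


print(place_on_sheet(2000, 3000, 2480, 3508))  # smaller image: white border
print(place_on_sheet(3000, 3508, 2480, 3508))  # wider image: cropped left and right
```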
--sheet-background { black | white }
Sets a color with which the sheet is filled
before any image is loaded and placed onto it. This can be useful when the
sheet size and the image size differ.
--no-blackfilter sheet-range
Disables black area scan. Individual sheet
indices can be specified.
--no-noisefilter sheet-range
Disables the noisefilter. Individual sheet
indices can be specified.
--no-blurfilter sheet-range
Disables the blurfilter. Individual sheet
indices can be specified.
--no-grayfilter sheet-range
Disables the grayfilter. Individual sheet
indices can be specified.
--no-mask-scan sheet-range
Disables mask-detection. Masks explicitly set
by --mask will still have effect. Individual sheet indices can be
specified.
--no-mask-center sheet-range
Disables auto-centering of each mask.
Auto-centering is performed by default if the --layout option has been
set. Individual sheet indices can be specified.
--no-deskew sheet-range
Disables deskewing. Individual sheet indices
can be specified.
--no-wipe sheet-range
Disables explicit wipe-areas. This means the
effect of parameter --wipe can be disabled individually per
sheet.
--no-border sheet-range
Disables explicitly set borders. This means
the effect of parameter --border can be disabled individually per
sheet.
--no-border-scan sheet-range
Disables border-scanning from the edges of the
sheet. Individual sheet indices can be specified.
--no-border-align sheet-range
Disables aligning of the area detected by
border-scanning (see --border-align). Individual sheet indices can be
specified.
-n sheet-range, --no-processing sheet-range
Do not perform any processing on a sheet
except pre/post rotating and mirroring, and file-depth conversions on saving.
This option has the same effect as setting all --no-xxx options
together. Individual sheet indices can be specified.
--no-qpixels
Disable qpixel-mode for deskewing (do not
internally use a 4x bigger image when rotating).
--no-multi-pages
Disable multi-page processing even if the
input filename contains a % (usually indicating the start of a placeholder for
the page counter).
--dpi dpi
Dots per inch used for conversion of measured
size values, e.g. 21cm,27.9cm. Note that this parameter should
occur before specifying any size value with a measurement suffix. (default:
300)
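
The conversion that --dpi controls amounts to simple arithmetic, sketched here in Python. The helper name is hypothetical, and rounding to the nearest pixel is an assumption; unpaper may round differently.

```python
def to_pixels(value, dpi=300):
    """Convert a size such as '21cm' or '8.5in' to pixels.

    Values without a suffix are taken as pixels already.
    Rounding to the nearest whole pixel is an assumption of this sketch.
    """
    if value.endswith("cm"):
        # One inch is 2.54 cm.
        return round(float(value[:-2]) / 2.54 * dpi)
    if value.endswith("in"):
        return round(float(value[:-2]) * dpi)
    return int(value)


print(to_pixels("21cm"), to_pixels("27.9cm"))  # 2480 3295 at 300 dpi
```

Because the result depends on the dpi value in effect, a size given with a suffix is only meaningful once --dpi is known, which is why the option should come first on the command line.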
-t { pbm | pgm }, --type { pbm | pgm }
Output file type. (default: as input)
-d bits, --depth bits
Output pixel depth. (default: as input)
-T, --test-only
Do not write any output. May be useful in
combination with --verbose to get information about the input.
-si nr, --start-input nr
Set the first page number to substitute for
'%d' in input filenames. Every time the input file sequence is repeated, this
number gets increased by 1. (default: (startsheet-1)*inputpages+1)
-so nr, --start-output nr
Set the first page number to substitute for
'%d' in output filenames. Every time the output file sequence is repeated,
this number gets increased by 1. (default: (startsheet-1)*outputpages+1)
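
The default numbering for --start-input and --start-output, together with pattern substitution, can be sketched as follows. This Python illustration uses hypothetical helper names and file names, and it ignores the effect of --exclude, --insert-blank and --replace-blank.

```python
def default_start(start_sheet, pages_per_sheet):
    """Default first page number: (startsheet-1)*pages+1."""
    return (start_sheet - 1) * pages_per_sheet + 1


def files_for_sheet(pattern, sheet, start_sheet=1, pages_per_sheet=1):
    """List the file names used for one sheet, given a printf-style pattern."""
    first = (default_start(start_sheet, pages_per_sheet)
             + (sheet - start_sheet) * pages_per_sheet)
    # The page counter advances by one for every file in the sequence.
    return [pattern % (first + i) for i in range(pages_per_sheet)]


# Starting at sheet 3 with two input pages per sheet:
print(files_for_sheet("scan%04d.pgm", 3, start_sheet=3, pages_per_sheet=2))
print(files_for_sheet("scan%04d.pgm", 4, start_sheet=3, pages_per_sheet=2))
```

The first call yields scan0005.pgm and scan0006.pgm, the second scan0007.pgm and scan0008.pgm, which matches the page numbers that sheets 3 and 4 would have if processing had started at sheet 1.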
--insert-blank nr [,nr...]
Use blank input instead of an input file from
the input file sequence at the specified index-positions. The input file
sequence will be interrupted temporarily and will continue with the next input
file afterwards. This can be useful to insert blank content into a sequence of
input images.
--replace-blank nr [,nr...]
Like --insert-blank, but the input
images at the specified index positions get replaced with blank content and
thus will be ignored.
--overwrite
Allow overwriting existing files. Otherwise
the program terminates with an error if an output file to be written already
exists.
-q, --quiet
Quiet mode, no output at all.
-v, --verbose
Verbose output, more info messages.
-vv
Even more verbose output, show parameter
settings before processing.
--time
Output processing time consumed.
-V, --version
Output version and build information.
COPYRIGHT
September 2011 | unpaper |