NAME
scramble - Converts between the SAM, BAM and CRAM file formats.
SYNOPSIS
scramble [options] [input_file [output_file]]
DESCRIPTION
scramble converts between various next-gen sequencing alignment file
formats, including SAM, BAM and CRAM. It can either act as a pipe reading
stdin and writing to stdout, or on named files.
When operating as a pipe the input type defaults to SAM or BAM, so the
-I cram option is required to indicate that the input is in CRAM format. The
output defaults to BAM, but can be changed with the -O format option. When
given filenames the file type is chosen automatically based on the filename
suffix.
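The two modes described above can be sketched roughly as follows. The file names (in.bam, out.cram, roundtrip.bam, ref.fa) and the guess_format helper are hypothetical. The helper only mimics the suffix rule for illustration and is not scramble's own detection code.

```shell
#!/bin/sh
# Hypothetical helper: mimic the suffix-based format choice described above.
# Illustration only; this is not how scramble itself is implemented.
guess_format() {
    case "$1" in
        *.sam)  echo sam ;;
        *.bam)  echo bam ;;
        *.cram) echo cram ;;
        *)      echo unknown ;;
    esac
}

guess_format in.bam     # prints "bam"
guess_format out.cram   # prints "cram"

# Only attempt the conversions if scramble is installed and the
# (hypothetical) input files exist.
if command -v scramble >/dev/null 2>&1 && [ -f in.bam ] && [ -f ref.fa ]; then
    # Named files: input and output formats follow the suffixes.
    scramble -r ref.fa in.bam out.cram

    # Pipe: CRAM input must be declared with -I; output defaults to BAM.
    # Decoding assumes the reference can still be located.
    scramble -I cram < out.cram > roundtrip.bam
fi
```

The guard keeps the sketch harmless on a machine where scramble or the input files are absent.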
OPTIONS
- -I format
- Selects the input format, where format is one of
sam, bam or cram. Use this when reading via a pipe to avoid input bytes
being consumed when attempting to detect if the input is in SAM or BAM
format.
- -O format
- Selects the output format, where format is one of
sam, bam or cram.
- -1 to -9
- Sets the compression level from 1 (low compression, fast)
to 9 (high compression, slow) when writing in BAM or CRAM format. This is
only used during writing.
- -0 or -u
- Writes uncompressed data. In BAM this still uses BGZF
containers, but with no internal compression. In CRAM it stores blocks in
RAW format instead. The option has no effect on SAM output.
- -R range
- Currently for CRAM input only, but SAM/BAM support is
pending. This indicates a reference sequence name and optionally a start
and end location within that reference, using the syntax ref_name
or ref_name:start-end. For efficient operation the
CRAM file needs a .crai format index (built using the cram_index
program).
- -r ref.fa
- CRAM encoding only. Use this to specify the reference fasta
file. Note that if the input SAM or BAM file has a file: or local file
system based URI specified in the @SQ headers then this option may not be
necessary.
- -s number
- CRAM encoding only. Specifies the number of sequences per
slice. Defaults to 10000.
- -S number
- CRAM encoding only. Specifies the number of slices per
container. Defaults to 1.
- -V version_string
- CRAM encoding only. Sets the CRAM file format version.
Supported values are "1.0" and "2.0".
- -X
- CRAM encoding only. Embed snippets of the reference
sequence in every slice. This means the files can be decoded without
needing to specify the reference fasta file.
- -x
- CRAM only. When encoding, omit reference based compression
and instead store details of every base verbatim. During decoding -x is
still required to avoid checking that the reference can be loaded.
- -m
- CRAM decoding only. Generate MD:Z: and NM:I: auxiliary
fields based on the reference-based compression.
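As an illustration of the -R range syntax, the sketch below builds a range argument in the ref_name or ref_name:start-end form and passes it to scramble. The region_arg helper, the reference name chr1, the coordinates, and the file names are all hypothetical. The sketch also assumes a .crai index already exists alongside in.cram for efficient access.

```shell
#!/bin/sh
# Hypothetical helper: format a -R argument as ref_name or ref_name:start-end.
region_arg() {
    if [ -n "$2" ] && [ -n "$3" ]; then
        printf '%s:%s-%s\n' "$1" "$2" "$3"
    else
        printf '%s\n' "$1"
    fi
}

range=$(region_arg chr1 10000 20000)

# Show the command that would be run.
echo "scramble -R $range in.cram out.sam"

# Only run it if scramble is installed and the (hypothetical) input exists.
if command -v scramble >/dev/null 2>&1 && [ -f in.cram ]; then
    # -R currently applies to CRAM input; the output format follows the suffix.
    scramble -R "$range" in.cram out.sam
fi
```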
EXAMPLES
To convert a BAM file from stdin to CRAM on stdout, using reference MT.fa:
some_command | scramble -I bam -O cram -r MT.fa | some_command
To convert from CRAM version 1.0 to CRAM version 2.0:
scramble -V 2.0 in.cram out.cram
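A further sketch, combining the compression-level and -X options from the OPTIONS section to compare output sizes at the fastest and slowest settings. The level_flag helper and the file names (in.bam, ref.fa, level1.cram, level9.cram) are hypothetical, and the resulting sizes depend entirely on the input data.

```shell
#!/bin/sh
# Hypothetical helper: map a numeric level to scramble's -0 .. -9 flag.
level_flag() {
    case "$1" in
        [0-9]) printf '%s\n' "-$1" ;;
        *)     echo "level must be 0-9" >&2; return 1 ;;
    esac
}

# Only run if scramble is installed and the (hypothetical) inputs exist.
if command -v scramble >/dev/null 2>&1 && [ -f in.bam ] && [ -f ref.fa ]; then
    for level in 1 9; do
        # -X embeds reference snippets so the CRAM can be decoded
        # without specifying the reference fasta file.
        scramble "$(level_flag "$level")" -X -r ref.fa in.bam "level$level.cram"
        wc -c "level$level.cram"
    done
fi
```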
AUTHOR
James Bonfield, Wellcome Trust Sanger Institute