READSEQ(1) | General Commands Manual | READSEQ(1) |
NAME
readseq - Reads and writes nucleic/protein sequences in various formats
SYNOPSIS
readseq [-options] in.seq > out.seq
DESCRIPTION
This manual page briefly documents the readseq command. It was written for the Debian GNU/Linux distribution because the original program does not have a manual page; its documentation is provided in text form instead (see SEE ALSO below).
readseq reads and writes biosequences (nucleic/protein) in various formats. Data files may have multiple sequences. readseq is particularly useful as it automatically detects many sequence formats, and interconverts among them.
FORMATS
Formats which readseq currently understands:
* IG/Stanford, used by Intelligenetics and others
* GenBank/GB, genbank flatfile format
* NBRF format
* EMBL, EMBL flatfile format
* GCG, single sequence format of GCG software
* DNAStrider, for common Mac program
* Fitch format, limited use
* Pearson/Fasta, a common format used by Fasta programs and others
* Zuker format, limited use. Input only.
* Olsen, format printed by Olsen VMS sequence editor. Input only.
* Phylip3.2, sequential format for Phylip programs
* Phylip, interleaved format for Phylip programs (v3.3, v3.4)
* Plain/Raw, sequence data only (no name, document, numbering)
+ MSF multi sequence format used by GCG software
+ PAUP's multiple sequence (NEXUS) format
+ PIR/CODATA format used by PIR
+ ASN.1 format used by NCBI
+ Pretty print with various options for nice looking output. Output only.
+ LinAll format, limited use (LinAll and ConStruct programs)
+ Vienna format used by ViennaRNA programs
See the included "Formats" file for detail on file formats.
OPTIONS
- -help
- Show summary of options.
- -a[ll]
- Select All sequences
- -c[aselower]
- Change to lower case
- -C[ASEUPPER]
- Change to UPPER CASE
- -degap[=-]
- Remove gap symbols
- -i[tem=2,3,4]
- Select Item number(s) from several
- -l[ist]
- List sequences only
- -o[utput=]out.seq
- Redirect Output
- -p[ipe]
- Pipe (command line, <stdin, >stdout)
- -r[everse]
- Change to Reverse-complement
- -v[erbose]
- Verbose progress
- -f[ormat=]#
- Format number for output, or
- -f[ormat=]Name
- Format name for output:
     1. IG/Stanford          11. Phylip3.2
     2. GenBank/GB           12. Phylip
     3. NBRF                 13. Plain/Raw
     4. EMBL                 14. PIR/CODATA
     5. GCG                  15. MSF
     6. DNAStrider           16. ASN.1
     7. Fitch                17. PAUP/NEXUS
     8. Pearson/Fasta        18. Pretty (out-only)
     9. Zuker (in-only)      19. LinAll
    10. Olsen (in-only)      20. Vienna
Pretty format options:
- -wid[th]=#
- Sequence line width
- -tab=#
- Left indent
- -col[space]=#
- Column space within sequence line on output
- -gap[count]
- Count gap chars in sequence numbers
- -nameleft, -nameright[=#]
- Name on left/right side [=max width]
- -nametop
- Name at top/bottom
- -numleft, -numright
- Seq index on left/right side
- -numtop, -numbot
- Index on top/bottom
- -match[=.]
- Use match base for 2..n species
- -inter[line=#]
- Blank line(s) between sequence blocks
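The numbered format table above can be captured as a small lookup when readseq is driven from a script. The following is a minimal Python sketch, not part of readseq itself: the names FORMATS, INPUT_ONLY and format_option are hypothetical helpers, and the numbers and names are copied from the table in this page.

```python
# Format numbers and names copied from the -format table in this manual page.
FORMATS = {
    1: "IG/Stanford", 2: "GenBank/GB", 3: "NBRF", 4: "EMBL", 5: "GCG",
    6: "DNAStrider", 7: "Fitch", 8: "Pearson/Fasta", 9: "Zuker",
    10: "Olsen", 11: "Phylip3.2", 12: "Phylip", 13: "Plain/Raw",
    14: "PIR/CODATA", 15: "MSF", 16: "ASN.1", 17: "PAUP/NEXUS",
    18: "Pretty", 19: "LinAll", 20: "Vienna",
}

# Zuker and Olsen are marked "in-only" in the table, so they cannot be written.
INPUT_ONLY = {9, 10}


def format_option(number):
    """Return the -format=# argument for an output format number.

    Hypothetical helper: it only builds the option string and rejects
    numbers that the table does not list or marks as input-only.
    """
    if number not in FORMATS:
        raise ValueError(f"unknown format number: {number}")
    if number in INPUT_ONLY:
        raise ValueError(f"{FORMATS[number]} is input-only")
    return f"-format={number}"


print(format_option(8))
```

Rejecting the input-only formats before calling readseq is a design choice of this sketch; readseq's own behaviour for such a request is not described in this page.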
EXAMPLES
readseq
    -- for interactive use
readseq my.1st.seq my.2nd.seq -all -format=genbank -output=my.gb
    -- convert all of two input files to one genbank format output file
readseq my.seq -all -form=pretty -nameleft=3 -numleft -numright -numtop -match
    -- output to standard output a file in a pretty format
readseq my.seq -item=9,8,3,2 -degap -CASE -rev -f=msf -out=my.rev
    -- select 4 items from input, degap, reverse, and uppercase them
cat *.seq | readseq -pipe -all -format=asn > bunch-of.asn
    -- pipe a bunch of data through readseq, converting all to asn
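The two-file GenBank conversion shown in the examples can also be driven from a script. The sketch below assumes Python 3 and that readseq is on the PATH; build_convert_command and convert are hypothetical helper names, and only options documented in this page (-all, -format=, -output=) are used.

```python
import shutil
import subprocess


def build_convert_command(inputs, out_format, output):
    """Build the argument list that converts all sequences in `inputs`
    into a single output file (hypothetical helper)."""
    return ["readseq", *inputs, "-all", f"-format={out_format}", f"-output={output}"]


def convert(inputs, out_format, output):
    """Run readseq if it is installed; return True when the command was run."""
    if shutil.which("readseq") is None:
        return False  # readseq not found on PATH, nothing was executed
    subprocess.run(build_convert_command(inputs, out_format, output), check=True)
    return True


# Reproduce the GenBank conversion example as an argument list.
cmd = build_convert_command(["my.1st.seq", "my.2nd.seq"], "genbank", "my.gb")
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.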
SEE ALSO
The program is documented fully in text form. See the files in /usr/share/doc/readseq.
AUTHOR
This manual page was written by Stephane Bortzmeyer <bortzmeyer@debian.org>, for the Debian GNU/Linux system (but may be used by others).