main ()
%(filename)s
Classifies sequences in an alignment according to one or two "rules" specifying
amino acid composition at defined positions. Rules are written as nested,
parenthesized pairs of expressions; an expression consists of a position
followed by a character (e.g., 3R). Build the rules by joining pairs of
expressions with parentheses.
The input file should contain aligned sequences in fasta format; fasta format
sequences can also be supplied via stdin (e.g., cat file.fasta | classify.py ...).
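As a rough illustration of the two input routes, the sketch below reads fasta-format text either from a named file or from stdin. The function names read_fasta and read_alignment are hypothetical and are not taken from classify.py; falling back to stdin when no path is given is an assumed convention.

```python
import sys


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of fasta-format lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def read_alignment(path=None):
    """Read from path if given, otherwise from stdin (assumed convention)."""
    if path is None:
        return list(read_fasta(sys.stdin))
    with open(path) as handle:
        return list(read_fasta(handle))
```

Gap characters are kept as they appear, so positions in the returned strings still correspond to alignment columns.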
If only one rule is given, rule 2 is scored as the opposite of rule 1:
sequences satisfying rule 1 are scored as false for rule 2, and sequences
failing rule 1 are scored as true for rule 2. If two rules are given, sequences
are scored for both rules independently; in this case a sequence may satisfy
neither, one, or both rules.
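The scoring convention can be sketched as follows. The function name score is hypothetical. The one-rule behavior is inferred from the description of the "ab" output format under the r option, where B is printed when r1 is false and no r2 is supplied.

```python
def score(r1_true, r2_true=None):
    """Return the (rule 1, rule 2) truth values for one sequence."""
    if r2_true is None:
        # Only rule 1 was supplied: rule 2 is scored as its complement.
        return r1_true, not r1_true
    # Both rules were supplied: each result stands on its own.
    return r1_true, r2_true


print(score(True))         # (True, False)
print(score(False))        # (False, True)
print(score(True, True))   # (True, True)
```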
Use extract.py to recover lists of sequences from input alignments. For example:
classify.py protease.fasta -r1="(63L)" -pick=a | extract.py protease.fasta
-l=- -out=pro_63L.fasta
1A 2B 3C 4D 5E 6F 7G
(1A 2B) (3C 4D) (5E 6F) 7G
((1A 2B) (3C 4D)) ((5E 6F) 7G)
(((1A 2B) (3C 4D)) ((5E 6F) 7G))
Use & and | (logical AND, OR) to join pairs.
(((1A & 2B) | (3C & 4D)) | ((5E | 6F) & 7G))
Negate expressions with a leading ^
(((1A & ^2B) | (3C & 4D)) | ((5E | ^6F) & 7G))
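To make the rule syntax concrete, here is a minimal sketch of an evaluator for rules of this shape. It is not the parser used by classify.py; the names TOKEN, tokenize, and evaluate are hypothetical. It assumes that positions are 1-based alignment columns, that each parenthesized group holds one operand or one pair joined by & or |, that ^ applies only to a single position-character expression, and that matching is case-insensitive.

```python
import re

# One token: a position+character expression such as 63L, or one of ( ) & | ^
TOKEN = re.compile(r"\s*(\d+[A-Za-z*-]|[()&|^])")


def tokenize(rule):
    """Split a rule string into expressions, parentheses, and operators."""
    rule = rule.strip()
    tokens, pos = [], 0
    while pos < len(rule):
        found = TOKEN.match(rule, pos)
        if found is None:
            raise ValueError("unexpected text at offset " + str(pos))
        tokens.append(found.group(1))
        pos = found.end()
    return tokens


def evaluate(rule, seq):
    """Return True if the aligned sequence seq satisfies rule."""
    tokens = tokenize(rule)
    index = 0

    def take():
        nonlocal index
        if index >= len(tokens):
            raise ValueError("rule ended unexpectedly")
        index += 1
        return tokens[index - 1]

    def operand():
        token = take()
        if token == "(":
            # A group: one operand, optionally joined to a second by & or |
            value = operand()
            token = take()
            if token in ("&", "|"):
                right = operand()
                value = (value and right) if token == "&" else (value or right)
                token = take()
            if token != ")":
                raise ValueError("expected ) but found " + token)
            return value
        negate = token == "^"
        if negate:
            token = take()
        if not token[0].isdigit():
            raise ValueError("expected an expression such as 63L, found " + token)
        position, residue = int(token[:-1]), token[-1]
        in_range = 1 <= position <= len(seq)
        hit = in_range and seq[position - 1].upper() == residue.upper()
        return not hit if negate else hit

    result = operand()
    if index != len(tokens):
        raise ValueError("unexpected text after the rule")
    return result


print(evaluate("(((1A & ^2B) | (3C & 4D)) | ((5E | ^6F) & 7G))", "ABCDEFG"))  # True
```

In this sketch a position beyond the end of the sequence simply fails to match; the real script may treat that case differently.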
Examples (enclose each rule on the command line in single or double quotes):
1) classify.py protease.fasta -out=pro_63p.classified -r1="(63P)"
2) ... -r1="(63P)" -r2="(63L)"
3) ... -r1="(63P | 63S)" -r2="(63L | 63S)"
In this case, both rules may be true (for a sequence with S at position 63).
4) ... -r1="(((63L | 63P) | (93R & ^97K)) & 72L)"
5) ... -r1="((63L | 63P) | ((93R & ^97K) & 72L))"
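Note that examples 4 and 5 above give different results because the parentheses
group the terms differently: in 4, 72L is required for the whole rule; in 5, it
is required only together with (93R & ^97K).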
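The difference between examples 4 and 5 can be checked by writing each rule as a plain boolean expression. This is only an illustration of the grouping: the function names and the sample sequence are hypothetical, and each argument stands for whether that position-character expression holds.

```python
def example4(l63, p63, r93, k97, l72):
    # (((63L | 63P) | (93R & ^97K)) & 72L)
    return ((l63 or p63) or (r93 and not k97)) and l72


def example5(l63, p63, r93, k97, l72):
    # ((63L | 63P) | ((93R & ^97K) & 72L))
    return (l63 or p63) or ((r93 and not k97) and l72)


# Hypothetical sequence: L at position 63, but no L at position 72.
site = dict(l63=True, p63=False, r93=False, k97=False, l72=False)
print(example4(**site))  # False: 72L is required by the outer &
print(example5(**site))  # True: 63L alone satisfies the outer |
```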
{{r1 Rule 1}}
{{r2 Rule 2 (optional)}}
{{n Output format for name.
s print the name of the sequence
i print the index of the sequence
0 print nothing}}
{{r Output format of result.
num if r2 is not supplied, prints a 1 or 0 after the name according to the truth of r1; if r2 is supplied, prints a 1 or 0 for the result of each of the rules.
ab prints A in the first column after the name if r1 is true and B in the next column if 1) r1 is false and no r2 is supplied, or 2) r2 is true.
0 print nothing}}
{{pick Selects which lines (each line corresponding to a sequence) to print.
ab Print all lines regardless of result.
a Print line if rule 1 is satisfied.
b Print line if rule 2 is satisfied.}}
{{neg Character used to negate an expression.
^
! Requires protection with a backslash when used on the command line.}}
{{out Name of output file. Prints to stdout if unspecified.}}
{{h Prints documentation.}}
{{v Verbosity; use for debugging.
1}}
{{version Prints version information and exits.}}
%(version)s