
Table of Contents

Module: classify noahprogs/classify.py
Imported modules   
import SeqIO
import cp
import os
import re
import string
import sys
Functions   
computeTruth
main
parseLogicalExpression
reconstructExpression
  computeTruth 
computeTruth (
        seqString,
        le,
        neg='^',
        v=0,
        )

Determine whether seqString (a string) fulfills the requirements of le (a list).

  main 
main ()

%(filename)s

Classifies sequences in an alignment according to one or two "rules" specifying amino acid composition at defined positions. Rules are written as nested, parenthesized pairs of expressions; an expression consists of a position followed by a character (e.g., 3R). Build the rules by joining pairs of expressions with parentheses.

The input file should contain aligned sequences in FASTA format; FASTA-format sequences can also be supplied via stdin (e.g., cat file.fasta | classify.py ...). If only one rule is given, sequences satisfying rule 1 are scored as false for rule 2. If two rules are given, sequences are scored for both rules independently; in this case a sequence may satisfy neither, one, or both rules.

Use extract.py to recover lists of sequences from input alignments. For example:

classify.py protease.fasta -r1="(63L)" -pick=a | extract.py protease.fasta -l=- -out=pro_63L.fasta

1A 2B 3C 4D 5E 6F 7G
(1A 2B) (3C 4D) (5E 6F) 7G
((1A 2B) (3C 4D)) ((5E 6F) 7G)
(((1A 2B) (3C 4D)) ((5E 6F) 7G))

Use & and | (logical AND, OR) to join pairs.

(((1A & 2B) | (3C & 4D)) | ((5E | 6F) & 7G))

Negate expressions with a leading ^

(((1A & ^2B) | (3C & 4D)) | ((5E | ^6F) & 7G))
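The rule grammar described above (terms such as 3R, pairs joined by & or |, negation with a leading ^) lends itself to a small recursive evaluator. The following is a minimal sketch in modern Python, not the code in classify.py: the names tokenize and evaluate are hypothetical, positions are assumed to be 1-based, and the negation character is fixed to ^.

```python
import re

# Hypothetical token pattern: a parenthesis, an operator, or a term such as
# "63L" or "^97K" (optional ^, a position, then one residue letter or gap "-").
TOKEN = re.compile(r"\s*(\(|\)|&|\||\^?\d+[A-Za-z-])")


def tokenize(rule):
    """Split a rule string into tokens; raise ValueError on unknown text."""
    rule = rule.strip()
    tokens, pos = [], 0
    while pos < len(rule):
        match = TOKEN.match(rule, pos)
        if match is None:
            raise ValueError("cannot parse rule at: %r" % rule[pos:])
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(rule, seq):
    """Return True if the aligned sequence seq satisfies rule."""
    tokens = tokenize(rule)

    def peek(i):
        if i >= len(tokens):
            raise ValueError("rule ended unexpectedly")
        return tokens[i]

    def term(tok):
        if tok in ("(", ")", "&", "|"):
            raise ValueError("expected a term, got %r" % tok)
        negated = tok.startswith("^")
        body = tok[1:] if negated else tok
        position, residue = int(body[:-1]), body[-1]
        # Assumption: positions count from 1, as in "3R" = R at column 3.
        hit = 1 <= position <= len(seq) and seq[position - 1] == residue
        return hit != negated

    def expr(i):
        # Returns (truth value, index of the next unread token).
        if peek(i) != "(":
            return term(tokens[i]), i + 1
        left, i = expr(i + 1)
        if peek(i) == ")":  # a single parenthesized term, e.g. (63L)
            return left, i + 1
        op = tokens[i]
        if op not in ("&", "|"):
            raise ValueError("expected & or |, got %r" % op)
        right, i = expr(i + 1)
        if peek(i) != ")":
            raise ValueError("expected ')' to close the pair")
        return ((left and right) if op == "&" else (left or right)), i + 1

    value, end = expr(0)
    if end != len(tokens):
        raise ValueError("unexpected tokens after the rule")
    return value
```

For instance, evaluate("((1A & ^2B) | 3C)", "ARLK") checks column 1 for A, column 2 for anything other than B, and column 3 for C. Requiring exactly one operator per parenthesized pair mirrors the "pairs of expressions" wording above.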

Examples (enclose all rules on the command line with single or double quotes):

1) classify.py protease.fasta -out=pro_63p.classified -r1="(63P)"
2) ... -r1="(63P)" -r2="(63L)"
3) ... -r1="(63P | 63S)" -r2="(63L | 63S)"
   In this case, both expressions may be true.
4) ... -r1="(((63L | 63P) | (93R & ^97K)) & 72L)"
5) ... -r1="((63L | 63P) | ((93R & ^97K) & 72L))"

Note that rules 4 and 5 above give different results.
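To see why the grouping in examples 4 and 5 matters, the two rules can be reduced to plain boolean functions and compared over every input. This is only an illustrative sketch: a, b and c are hypothetical stand-ins for the sub-expressions (63L | 63P), (93R & ^97K) and 72L.

```python
from itertools import product


def rule4(a, b, c):
    # (((63L | 63P) | (93R & ^97K)) & 72L)  ->  (a | b) & c
    return (a or b) and c


def rule5(a, b, c):
    # ((63L | 63P) | ((93R & ^97K) & 72L))  ->  a | (b & c)
    return a or (b and c)


# Collect every truth assignment on which the two groupings disagree.
differing = [
    (a, b, c)
    for a, b, c in product([False, True], repeat=3)
    if rule4(a, b, c) != rule5(a, b, c)
]
```

The two rules disagree exactly when a is true and c is false, that is, for sequences with L or P at position 63 but without L at position 72: rule 5 accepts them and rule 4 rejects them.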

{{r1 Rule 1}}
{{r2 Rule 2 (optional)}}
{{n Output format for name.
    s  print the name of the sequence
    i  print the index of the sequence
    0  print nothing}}
{{r Output format of result.
    num  if r2 is not supplied, prints a 1 or 0 after the name according to the truth of r1; if r2 is supplied, prints a 1 or 0 for the result of each of the rules.
    ab   prints A in the first column after the name if r1 is true, and B in the next column if 1) r1 is false and no r2 is supplied, or 2) r2 is true.
    0    print nothing}}
{{pick Selects which lines (each line corresponding to a sequence) to print.
    ab  Print all lines regardless of result.
    a   Print line if rule 1 is satisfied.
    b   Print line if rule 2 is satisfied.}}
{{neg Character used to negate an expression.
    ^
    !  Requires protection with a backslash when used on the command line.}}
{{out Name of output file. Prints to stdout if unspecified.}}

{{h Prints documentation.}}
{{v Verbosity; use for debugging. 1}}
{{version print version info and exit}}
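The interaction between the ab result format and the pick option can be sketched as follows. This is a hedged reading of the option descriptions, not the module's code: the function names are hypothetical, a blank column is represented by an empty string, and it is assumed that with no r2 supplied a sequence failing rule 1 counts as satisfying rule 2 for the purposes of pick=b.

```python
def ab_columns(rule1_true, rule2_true=None):
    """Return the (A, B) columns; rule2_true=None means no r2 was supplied."""
    col_a = "A" if rule1_true else ""
    if rule2_true is None:
        # Single-rule case: B marks the sequences that fail rule 1.
        col_b = "" if rule1_true else "B"
    else:
        col_b = "B" if rule2_true else ""
    return col_a, col_b


def keep_line(pick, col_a, col_b):
    """Decide whether a sequence's line is printed for a given pick value."""
    if pick == "ab":
        return True
    if pick == "a":
        return col_a == "A"
    if pick == "b":
        return col_b == "B"
    raise ValueError("pick must be one of: ab, a, b")
```

With a single rule every sequence receives exactly one of A or B, whereas with two rules a sequence may receive both columns, one, or neither.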

%(version)s

  parseLogicalExpression 
parseLogicalExpression ( input,  neg='^' )

  reconstructExpression 
reconstructExpression ( tokenHolder )

Reconstruct the parenthesized expression from a string array. The result can then be compared with the original string: if the two match, the user entered the expression correctly.
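The round-trip check described here can be illustrated with a short sketch. The structure of tokenHolder is not documented on this page, so the nested-list representation and the names parse, rebuild and is_well_formed below are assumptions made purely for illustration.

```python
import re


def parse(rule):
    """Parse a rule into nested lists, e.g. [['63L', '|', '63P']]."""
    # Unrecognized text is silently skipped here; the round-trip
    # comparison in is_well_formed is what exposes it.
    tokens = re.findall(r"\(|\)|&|\||\^?\d+[A-Za-z]", rule)
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            group = stack.pop()
            stack[-1].append(group)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def rebuild(node):
    """Turn a nested list back into a parenthesized string."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(rebuild(child) for child in node) + ")"


def is_well_formed(rule):
    """True if the rebuilt rule matches the input, ignoring whitespace."""
    try:
        tree = parse(rule)
    except ValueError:
        return False
    rebuilt = "".join(rebuild(child) for child in tree)
    return "".join(rebuilt.split()) == "".join(rule.split())
```

A malformed term such as the bare position in "(63L & 72)" is dropped during parsing, so the rebuilt string no longer matches the input and the check fails.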

Classes   

BadRuleFormatError

Classifier

NoRuleSuppliedError



This document was automatically generated on Thu Feb 27 16:52:00 2003 by HappyDoc version 2.1