Input format

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.

  • The description line starts with a greater-than symbol (">").

  • The word immediately following the greater-than symbol (">") is the "ID" (name) of the sequence; the rest of the line is the description.

  • The "ID" and the description are optional.

  • All lines of text should be shorter than 80 characters.

  • The sequence ends when another line begins with a greater-than symbol (">"), which starts the next sequence.

The following is an example:

>Example1 envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLN
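
The rules above can be sketched as a small parser. This is a minimal illustration in Python, not the code used by the server; the function name parse_fasta and the choice to return empty strings for a missing ID or description are assumptions made for the example.

```python
def parse_fasta(text):
    """Return a list of (seq_id, description, sequence) tuples."""
    records = []
    seq_id, description, chunks = "", "", []
    in_record = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines carry no sequence data
        if line.startswith(">"):
            # A new ">" line closes the previous record, if any.
            if in_record:
                records.append((seq_id, description, "".join(chunks)))
            header = line[1:]
            # The ID is the word immediately after ">"; if ">" is followed
            # by a space (or nothing), the record has no ID.
            if header and not header[0].isspace():
                parts = header.split(None, 1)
                seq_id = parts[0]
                description = parts[1] if len(parts) > 1 else ""
            else:
                seq_id, description = "", header.strip()
            chunks = []
            in_record = True
        elif in_record:
            # Sequence data may span several lines; join them later.
            chunks.append(line)
    if in_record:
        records.append((seq_id, description, "".join(chunks)))
    return records
```

Calling parse_fasta on the example above would give one record with the ID "Example1" and the description "envelope protein".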

Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters (see below). Before submitting a request, any numerical digits in the query sequence should either be removed or replaced by appropriate letter codes (e.g., N for unknown nucleic acid residue or X for unknown amino acid residue).

The nucleic acid codes supported are:

        A --> adenosine           M --> A C (amino)
        C --> cytidine            S --> G C (strong)
        G --> guanosine           W --> A T (weak)
        T --> thymidine           B --> G T C
        U --> uridine             D --> G A T
        R --> G A (purine)        H --> A C T
        Y --> T C (pyrimidine)    V --> G C A
        K --> G T (keto)          N --> A G C T (any)
                                  -  gap of indeterminate length
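
To show how the ambiguity codes in this table can be read, here is a short sketch in Python that stores each code as the set of bases it stands for and checks whether two codes can represent a common base. The names AMBIGUITY and codes_compatible are hypothetical, and the table is followed literally, so U is kept distinct from T and N does not include U; that is an assumption of the example, not a statement about how any search program compares residues.

```python
# Each code maps to the set of bases listed for it in the table above.
AMBIGUITY = {
    "A": set("A"), "C": set("C"), "G": set("G"), "T": set("T"), "U": set("U"),
    "M": set("AC"), "S": set("GC"), "W": set("AT"),
    "R": set("GA"), "Y": set("TC"), "K": set("GT"),
    "B": set("GTC"), "D": set("GAT"), "H": set("ACT"), "V": set("GCA"),
    "N": set("AGCT"),
}


def codes_compatible(a, b):
    """Return True if codes a and b can stand for at least one common base."""
    return bool(AMBIGUITY[a.upper()] & AMBIGUITY[b.upper()])
```

For instance, R (G or A) is compatible with A but not with Y (T or C).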

For those programs that use amino acid query sequences (BLASTP and TBLASTN), the accepted amino acid codes are:

    A  alanine                         P  proline
    B  aspartate or asparagine         Q  glutamine
    C  cysteine                        R  arginine
    D  aspartate                       S  serine
    E  glutamate                       T  threonine
    F  phenylalanine                   U  selenocysteine
    G  glycine                         V  valine
    H  histidine                       W  tryptophan
    I  isoleucine                      Y  tyrosine
    K  lysine                          Z  glutamate or glutamine
    L  leucine                         X  any
    M  methionine                      *  translation stop
    N  asparagine                      -  gap of indeterminate length
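
The clean-up rules and the two code tables can be combined into a simple pre-submission check. The following Python sketch is only an illustration under stated assumptions: the function name normalize_sequence and its protein flag are hypothetical, digits are removed rather than replaced by N or X (both options are allowed by the text above), and whitespace is dropped as well.

```python
# Letters accepted by the two tables above; "-" (gap) is handled separately.
NUCLEOTIDE_LETTERS = set("ACGTURYKMSWBDHVN")
AMINO_ACID_LETTERS = set("ABCDEFGHIKLMNPQRSTUVWYZX*")


def normalize_sequence(seq, protein=False):
    """Upper-case seq, drop digits and whitespace, reject unsupported symbols."""
    allowed = AMINO_ACID_LETTERS if protein else NUCLEOTIDE_LETTERS
    cleaned = []
    for ch in seq.upper():  # lower-case letters are mapped to upper-case
        if ch.isspace() or ch.isdigit():
            continue  # assumption: digits are removed, not replaced
        if ch != "-" and ch not in allowed:
            raise ValueError(f"unsupported symbol {ch!r}")
        cleaned.append(ch)
    return "".join(cleaned)
```

A sequence copied with position numbers, such as "1 acgtnn 7", would be reduced to its letters, while a nucleotide query containing a letter outside the table would raise an error.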

* To submit a plain (raw) sequence, convert it to FASTA by adding a first line that consists of a single ">":
>
VHDDLEEEAADLLLVSSR
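
The same conversion can be scripted. This Python sketch is one possible way to do it, not a required tool: the function name to_fasta, its optional seq_id and description arguments, and the default line width of 70 characters (chosen to stay under the 80-character limit mentioned above) are assumptions for the example.

```python
def to_fasta(raw, seq_id="", description="", width=70):
    """Wrap a raw sequence in FASTA format with lines of at most width characters."""
    header = ">" + seq_id
    if description:
        header += " " + description
    # Remove any whitespace or line breaks inside the raw sequence.
    sequence = "".join(raw.split())
    # Split the sequence into fixed-width lines.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"
```

With no ID or description, to_fasta("VHDDLEEEAADLLLVSSR") produces the two-line record shown above.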

last updated November 27, 2017; contacts: E-mail: kangueane@bioinformation.net; Phone: +91 9486267369

(©) Biomedical Informatics