psa - biological sequence alignment file format


     psa  is  an  output  format  used  by the pftools package to

     describe alignments between  biological  sequences  (DNA  or

     protein) and PROSITE profiles.

     psa is related to the widely used biological sequence file
     format fasta. In addition to the biological sequence itself,
     it carries information about the alignment between a motif
     descriptor, such as a PROSITE profile, and a given sequence.
     This information is included in the header and reflected in
     the structure of the sequence following the header line.


     Each sequence in a psa alignment file or output must be pre-

     ceded by a fasta header line.

     The general syntax of such a fasta header line is as follows:


          >seq_id [ free_text ]

     The header must start with a '>' character, which is directly
     followed by the seq_id field. Most programs interpret this
     field as the sequence's identifier and/or accession number. It
     ends at the first whitespace character.


     The pftools programs use the free_text field to add information
     about the match score, position and description of the sequence
     or motif. Please refer to the man pages of the corresponding
     programs for further information about the output.


     The  header  can  only  extend  over one line. The following

     lines up to a new line starting with a '>' character or  the

     end of the file are interpreted as sequence data.
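
     The record structure described above can be sketched in Python
     as follows. This is a minimal illustration and not part of
     pftools: the function name read_psa and the sample records in
     the usage note are hypothetical, and joining the data lines
     into one string is an assumption made for convenience.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple

# seq_id is everything between '>' and the first whitespace character;
# the remainder of the header line is the free_text field.
_HEADER_FIELDS = re.compile(r"(\S*)\s*(.*)")


def read_psa(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (seq_id, free_text, data) for each record in a psa stream."""
    seq_id: Optional[str] = None
    free_text = ""
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, free_text, "".join(chunks)
            match = _HEADER_FIELDS.match(line[1:])
            seq_id = match.group(1)
            free_text = match.group(2).strip()
            chunks = []
        elif seq_id is not None:
            # Data lines may have different lengths; join them in order.
            chunks.append(line.strip())
    if seq_id is not None:
        yield seq_id, free_text, "".join(chunks)
```

     For example, list(read_psa(open(path))) on a file with two
     records would return two (seq_id, free_text, data) tuples, where
     path is whatever psa file is at hand. Lines that appear before
     the first header are ignored in this sketch.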

     The line following the header starts the alignment data between
     a sequence and a PROSITE profile. This data can span several
     lines of different lengths.

     The  data is formed by upper or lower-case characters of the

     corresponding sequence alphabet (DNA or protein).   The  gap

     characters '.' and '-' are also supported.

     The alignment always has at least the length of the matching
     profile. Insertions or deletions detected during the
     motif/sequence alignment step change the length of the data
     reported, and can be identified using the following characters:


          upper-case character

               Any  upper-case character of the sequence alphabet

               identifies a match position between  the  sequence

               and the motif descriptor.

          lower-case character

               A lower-case character of the sequence alphabet is

               used to symbolize an  insertion  in  the  sequence

               compared to the motif descriptor.

          '-' (dash) character

               A '-' character in the output identifies the pres-

               ence of a deletion in the sequence compared to the

               motif descriptor.
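
     The character classes listed above can be counted with a short
     Python sketch. The function name classify_columns and the sample
     string in the usage note are hypothetical. Treating '.' like '-'
     follows the statement that both gap characters are supported;
     rejecting any other non-alphabetic character is an assumption of
     this sketch.

```python
from collections import Counter


def classify_columns(data: str) -> Counter:
    """Count match, insertion and deletion positions in psa alignment data."""
    counts: Counter = Counter()
    for ch in data:
        if ch in "-.":
            # Gap character: the motif position is absent from the sequence.
            counts["deletion"] += 1
        elif ch.isupper():
            # Upper-case residue: sequence position matched to the motif.
            counts["match"] += 1
        elif ch.islower():
            # Lower-case residue: inserted relative to the motif.
            counts["insertion"] += 1
        else:
            raise ValueError(f"unexpected character in alignment data: {ch!r}")
    return counts
```

     On a made-up string such as "ACDEFlnsGH--" this counts 7
     matches, 3 insertions and 2 deletions. Under this reading, the
     number of motif positions covered is the sum of matches and
     deletions, since insertions have no counterpart in the motif.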


     (1)  >YD28_SCHPO 556 pos. 291 - 332 sp|Q10256|YD28_SCHPO


          This   is   an   example  of  the  output  produced  by

          pfsearch(1) using the '-x' (i.e.  psa  output)  option.

          The  first  line starting with the '>' character is the

          fasta header. It also contains  information  about  the

          raw  score  of the alignment as well as its position in

          the input sequence.

          On the next line you find the alignment proper.  Start-

          ing  at  position  6,  we  can find an insertion of the

          'lns' residues in the sequence compared to  the  motif.

          The  last two positions of the motif are not present in

          the sequence (i.e. they are deleted).   This  is  indi-

          cated  by  the presence of two '-' (dash) characters at

          the end of the alignment.
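
          The fields of the example header can be extracted with a
          regular expression, sketched here in Python. The field
          layout (seq_id, raw score, "pos.", start, "-", end,
          description) is inferred from this single example only and
          is an assumption; pfsearch(1) remains the authority on the
          header contents. The names PsaMatch and
          parse_pfsearch_header are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class PsaMatch(NamedTuple):
    seq_id: str
    raw_score: int
    start: int
    end: int
    description: str


# Layout assumed from the example header shown above:
#   >seq_id score pos. start - end description
_HEADER_RE = re.compile(
    r"^>(\S+)\s+(-?\d+)\s+pos\.\s+(\d+)\s*-\s*(\d+)\s*(.*)$"
)


def parse_pfsearch_header(line: str) -> Optional[PsaMatch]:
    """Return the parsed fields, or None if the line has another layout."""
    m = _HEADER_RE.match(line.strip())
    if m is None:
        return None
    seq_id, score, start, end, description = m.groups()
    return PsaMatch(seq_id, int(score), int(start), int(end), description)
```

          Applied to the header line of the example, this yields the
          seq_id YD28_SCHPO, the raw score 556 and the match position
          291 to 332. Headers that do not follow the assumed layout
          return None rather than raising an error.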


     (1)  The xpsa(5) format defines a stricter syntax for the
          header line, allowing the exchange of information
          between different sequence analysis tools. It uses
          keyword=value pairs to annotate the current match between
          a sequence and a motif descriptor. This syntax can be
          easily parsed and extended according to the needs of
          bioinformatics tools.


     The current implementation of the pftools package  does  not

     use the '.'  (dot) character in the psa output. Nevertheless

     psa2msa(1) will read it and interpret it in the same  manner

     as the '-' (dash) character.


     xpsa(5),    pfsearch(1),   pfscan(1),   pfw(1),   pfmake(1),



     This manual page was originally written by Volker Flegel.

     The pftools package was developed by Philipp Bucher.

     Any  comments  or  suggestions  should   be   addressed   to