Nomad (Neighborhood Optimization for Multiple Alignment Discovery) is a
program dedicated to the ungapped local multiple alignment (ULMA)
problem, also known as "blocks". By using an entropy-based objective
function that takes the nature of the amino acids into account, Nomad
is well suited to protein sequences. This objective function, the
shared entropy, has been shown to be significantly more reliable than
the relative entropy when the protein sequences to be aligned are
distantly related.

Hernandez D., Gras R., Appel R. (2006) Neighborhood Function and Hill-Climbing Strategies dedicated to the Generalized Ungapped Local Multiple Alignment. Eur J Oper Res, in press (doi:10.1016/j.ejor.2005.10.076).

Hernandez D. (2005) Stratégies d'optimisation combinatoire pour le problème de l'alignement local multiple sans indels, et application aux séquences protéiques. PhD thesis, Université de Genève, SWITZERLAND.

A ULMA is essentially a collection of n occurrences of size w, chosen
so as to be maximally conserved. Both n and w are fixed by the user.
Nomad is an optimization program that uses a hill-climber to search
for the n occurrences that maximize an objective function.
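As an illustration of the hill-climbing idea, here is a minimal Python sketch restricted to one occurrence per sequence. It is not Nomad's algorithm: the neighborhood (moving a single occurrence), the random restarts, the toy objective `column_identity_score` and every function name are assumptions made for this example.

```python
import random
from collections import Counter


def column_identity_score(occurrences):
    """Toy objective (an assumption, not Nomad's): for every column, count the
    most frequent symbol, then sum these counts over the columns."""
    return sum(Counter(col).most_common(1)[0][1] for col in zip(*occurrences))


def extract(sequences, starts, w):
    """Cut the width-w occurrence out of each sequence at its start position."""
    return [s[p:p + w] for s, p in zip(sequences, starts)]


def hill_climb_oops(sequences, w, objective, restarts=20, seed=0):
    """Random-restart hill climbing with exactly one occurrence per sequence."""
    rng = random.Random(seed)
    best_starts, best_score = None, float("-inf")
    for _ in range(restarts):
        # Random initial solution: one start position per sequence.
        starts = [rng.randrange(len(s) - w + 1) for s in sequences]
        score = objective(extract(sequences, starts, w))
        improved = True
        while improved:
            improved = False
            # Neighborhood: move one occurrence to another position.
            for i, s in enumerate(sequences):
                for p in range(len(s) - w + 1):
                    trial = starts[:i] + [p] + starts[i + 1:]
                    trial_score = objective(extract(sequences, trial, w))
                    if trial_score > score:  # accept strict improvements only
                        starts, score, improved = trial, trial_score, True
        if score > best_score:
            best_starts, best_score = starts, score
    return best_starts, best_score


# Hypothetical toy dataset with the planted word "HELIX".
toy = ["AAHELIXGG", "HELIXTTTT", "CCCCHELIX"]
starts, score = hill_climb_oops(toy, 5, column_identity_score)
```

A climber of this kind stops at the first solution that no single move can improve, which may be a local optimum only. The restarts are a simple way of reducing that risk.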

The widely used objective function for the ULMA problem is the relative
entropy, which is the information-theoretic counterpart of a
log-likelihood ratio statistic. The main drawback of the relative
entropy when aligning protein sequences is that all amino acids are
considered to be independent: the fact that some substitutions occur
more often than others is not taken into account. Nomad implements the
"shared entropy", an objective function that takes into account an
"equivalence" measure between amino acids. The shared entropy has been
shown to be significantly more efficient than the relative entropy,
both at distinguishing signal from noise and in the optimization
process.

The occurrence distribution in the sequence set can be constrained in
four ways:

(a) OOPS (One Occurrence Per Sequence)

This is the simplest and most constrained mode. It is assumed that every sequence contributes exactly once to the ULMA. In this mode, n is implicitly fixed by the number of sequences in the dataset.

(b) ALOOPS (At Least One Occurrence Per Sequence)

All sequences must contribute to the ULMA but some may contribute more than once. n has to be specified as greater than or equal to the number of sequences.

(c) AMOOPS (At Most One Occurrence Per Sequence)

Some sequences can be discarded from the ULMA. n has to be specified as lower than or equal to the number of sequences.

(d) AOPS (Any number of Occurrences Per Sequence)

This mode is the least constrained one. It allows occurrences to be distributed anywhere in the sequence set, as long as they do not overlap with each other. n has to be specified between 2 and a reasonable value.
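The four modes can be read as rules on how many occurrences each sequence contributes. The Python sketch below checks a candidate set of occurrences against a mode. The function name, the `(sequence_index, start)` representation and the choice to apply the non-overlap rule in every mode are assumptions made for illustration, not part of Nomad's interface.

```python
from collections import Counter


def check_occurrences(mode, n_sequences, occurrences, w):
    """Return a list of constraint violations (empty if the set is valid).

    occurrences: list of (sequence_index, start) pairs, each of width w.
    """
    counts = Counter(i for i, _ in occurrences)
    per_seq = [counts.get(i, 0) for i in range(n_sequences)]
    problems = []

    if mode == "OOPS":      # exactly one occurrence per sequence
        if any(c != 1 for c in per_seq):
            problems.append("every sequence must contribute exactly once")
    elif mode == "ALOOPS":  # at least one occurrence per sequence
        if any(c < 1 for c in per_seq):
            problems.append("every sequence must contribute at least once")
    elif mode == "AMOOPS":  # at most one occurrence per sequence
        if any(c > 1 for c in per_seq):
            problems.append("a sequence may contribute at most once")
    elif mode == "AOPS":    # any distribution, but at least 2 occurrences
        if len(occurrences) < 2:
            problems.append("n must be at least 2")
    else:
        raise ValueError(f"unknown mode: {mode}")

    # Assumption: occurrences in the same sequence must never overlap.
    previous_start = {}
    for i, start in sorted(occurrences):
        if i in previous_start and start < previous_start[i] + w:
            problems.append(f"overlapping occurrences in sequence {i}")
        previous_start[i] = start
    return problems


# Two occurrences in sequence 1 that overlap for w = 5: reported as invalid.
print(check_occurrences("AOPS", 3, [(1, 0), (1, 3)], 5))
```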

Dataset:

Paste your sequences in the FASTA format.

Example:

>sequence label

MKALTARQQEVFD...

>sequence label

MEQNPQSQLKLLV...

>sequence label

MGMKISELAKACD...
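To check a dataset before pasting it, a small FASTA reader can help. The sketch below is a generic Python parser, not the one used by Nomad. The function name `parse_fasta` is hypothetical, and the upper-casing of residues is an assumption.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (label, sequence) pairs."""
    records = []
    label, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines between records are ignored
        if line.startswith(">"):
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:].strip(), []
        elif label is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())  # a sequence may span several lines
    if label is not None:
        records.append((label, "".join(chunks)))
    return records
```

The number of records returned is the number of sequences in the dataset, which is the value that the OOPS, ALOOPS and AMOOPS modes compare n against.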

Width:

Set the width of the ULMA to be searched.

Protein, shared entropy:

This is the default option. The ULMA is optimized with the shared entropy.

Protein, relative entropy:

Optimize the ULMA with the "classical" relative entropy objective function. The relative entropy is the widely used function for the ULMA problem.

DNA, relative entropy:

Choose this option if you align DNA sequences.
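For reference, the relative entropy of an ungapped block can be computed column by column from the symbol frequencies. The Python sketch below uses the textbook form, sum over columns and symbols of f * log2(f / p), where f is the frequency of a symbol in a column and p its background frequency. The absence of pseudocounts, the base-2 logarithm and the uniform background in the usage lines are assumptions. Nomad's exact scaling may differ, and the shared entropy is not shown here.

```python
import math
from collections import Counter


def relative_entropy(occurrences, background):
    """Relative entropy of an ungapped block, summed over its columns.

    occurrences: equal-length strings (the rows of the block).
    background:  dict mapping each symbol to its background frequency.
    """
    n = len(occurrences)
    total = 0.0
    for column in zip(*occurrences):
        for symbol, count in Counter(column).items():
            f = count / n  # observed frequency of the symbol in this column
            total += f * math.log2(f / background[symbol])
    return total


# Assumed uniform DNA background, for illustration only.
uniform_dna = {base: 0.25 for base in "ACGT"}
fully_conserved = relative_entropy(["ACGT", "ACGT"], uniform_dna)
half_conserved = relative_entropy(["AC", "AG"], uniform_dna)
```

Every symbol is scored against the background only, independently of the others. This is the independence assumption that the relative entropy makes for amino acids.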

Occurrence repartition constraints:

Choose one of the OOPS, ALOOPS, AMOOPS or AOPS constraint models and set the number of occurrences in the ULMA.

Sort occurrences:

Check this option to sort occurrences by their own score. The score of an occurrence is a log-likelihood ratio, which reflects how well the occurrence fits the rest of the ULMA.

E-mail address:

Type your e-mail address to receive the result in your mailbox. This option is recommended: it is more reliable when the CPU time is substantial.

This example shows a ULMA under the OOPS mode, computed on 15
helix-turn-helix domain-containing proteins. The first column shows the
label of the sequence, the second column gives the position of the
occurrence in the corresponding sequence, the third column shows the
occurrence itself, and the fourth column shows the score of the
occurrence. This score reflects how well the occurrence fits the rest
of the alignment. The alignment score is the objective value that has
been optimized, and corresponds to the average occurrence score. Note
that these scores cannot be interpreted as confidence values. They are
only relative to the ULMA that has been optimized and thus cannot be
compared between different ULMAs. Symbols are shaded in blue according
to their contribution to the objective score: the darker the symbol,
the stronger its contribution.

Since Nomad performs stochastic optimizations, two independent runs with the same parameters could produce a different result. If this occurs, simply consider the best scoring alignment.

`>LEXA_ECOLI_P03033; 26 PTRAEIAQRLGFRSPNAAEEHL 15.691
>RPSD_ECOLI_P00579; 571 YTLEEVGKQFDVTRERIRQIEA 19.645
>MERR_STAAU_P22874; 3 MKISELAKACDVNKETVRYYER 19.422
>ASNC_ECOLI_P03809; 23 TAYAELAKQFGVSPGTIHVRVE 22.185
>ICLR_ECOLI_P16528; 44 VALTELAQQAGLPNSTTHRLLT 18.815
>LACR_STAAW_P16644; 20 IRTNEIVEGLNVSDMTVRRDLI 16.389
>CRP_ECOLI_P03020; 168 ITRQEIGQIVGCSRETVGRILK 20.827
>GNTR_BACLI_P46833; 42 LSENKLAAEFSVSRSPIREALK 17.506
>PMX1_MOUSE_P43271; 122 FVREDLARRVNLTEARVQVWFQ 18.347
>LYSR_ECOLI_P03030; 19 GSLTEAAHLLHTSQPTVSRELA 18.060
>ARSR_STAAU_P30338; 30 LCACDLLEHFQFSQPTLSHHMK 20.778
>ARAC_ECOLI_P03021; 195 FDIASVAQHVCLSPSRLSHLFR 19.659
>NER_BPMU_P06020; 23 LSLSALSRQFGYAPTTLANALE 19.644
>RCRO_BPP22_P09964; 11 GTQRAVAKALGISDAAVSQWKE 18.936
>FIXJ_BRAJA_P23221; 158 LSNKLIAREYDISPRTIEVYRA 17.702
Objective score 18.907
`
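A result in the layout shown above can be read back with a few lines of Python, for example to compare two runs or to check that the objective score is the mean of the occurrence scores. This is only a sketch: the function name `parse_nomad_result` is hypothetical and the layout is inferred from this single example, not from a format specification.

```python
def parse_nomad_result(text):
    """Parse result lines of the form '>LABEL; POSITION OCCURRENCE SCORE'.

    Returns (occurrences, objective): occurrences is a list of
    (label, position, occurrence, score) tuples, and objective is the value
    on the 'Objective score' line, or None if that line is missing.
    """
    occurrences, objective = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            label, rest = line[1:].split(";", 1)
            position, segment, score = rest.split()
            occurrences.append((label, int(position), segment, float(score)))
        elif line.startswith("Objective score"):
            objective = float(line.split()[-1])
    return occurrences, objective


def mean_occurrence_score(occurrences):
    """Average of the per-occurrence scores (fourth field of each tuple)."""
    return sum(score for _, _, _, score in occurrences) / len(occurrences)
```

For the example above, the mean of the 15 occurrence scores is 18.907 after rounding, which matches the reported objective score.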

For questions, suggestions or comments, please contact us.