1. Introduction

CLASTR (Cell Line Authentication using STR) lets you search for similarities between your cell line samples and the large collection of human cell line STR profiles stored in Cellosaurus.

2. Parameters
2.1 Markers

Allele data is entered as values separated by commas. The values are restricted to X and Y (case insensitive) for Amelogenin and to numbers for STR markers. The period character (.) is also allowed in order to describe variant alleles (for example 9.3). Note that allele data is whitespace insensitive.

2.1.1 Number of alleles

There is no limit to the number of alleles that can be submitted for a given marker.

2.1.2 Order of alleles

The order in which the alleles are entered does not matter. For example, submitting 14,19.3 is equivalent to submitting 19.3,14.

2.1.3 Homozygous loci

The number of times a homozygous allele is entered does not matter. For example, submitting 12 is equivalent to submitting 12,12. Homozygous alleles are counted once in the score computation.
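The input rules above (comma-separated values, whitespace and case insensitivity, order independence, homozygous alleles counted once) can be sketched as a small normalisation step. This is only an illustration, not the CLASTR implementation; the function name `parse_alleles` and the use of a set are assumptions made for the example.

```python
import re

# Numeric alleles, optionally with a variant part such as 9.3
ALLELE_PATTERN = re.compile(r"\d+(\.\d+)?")


def parse_alleles(raw, marker=""):
    """Turn a comma-separated allele string into a set of normalised alleles."""
    alleles = set()
    for token in raw.split(","):
        token = "".join(token.split()).upper()  # drop whitespace, ignore case
        if not token:
            continue
        if marker.lower() == "amelogenin":
            if token not in {"X", "Y"}:
                raise ValueError(f"Amelogenin accepts only X or Y, got {token!r}")
        elif not ALLELE_PATTERN.fullmatch(token):
            raise ValueError(f"invalid allele {token!r} for marker {marker!r}")
        alleles.add(token)  # a set ignores order and repeated alleles
    return frozenset(alleles)
```

With this representation, `parse_alleles("14, 19.3")` and `parse_alleles("19.3,14")` compare equal, as do `parse_alleles("12")` and `parse_alleles("12,12")`.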

2.2 Scoring
2.2.1 Algorithms

Three algorithms are provided for the score computation:

Tanabe

Also known as the Sørensen-Dice coefficient

Article: DOI=10.11418/jtca1981.18.4_329
Review: PubMed=23136038

Masters (vs. query)

Default version of the Masters algorithm

Article: PubMed=11416159
Review: PubMed=23136038

Masters (vs. reference)

Modified version of the Masters algorithm

Article: PubMed=11416159
Review: PubMed=23136038

While the Tanabe algorithm is now the recommended algorithm for STR profiling, the Masters algorithms remain useful when trying to identify a contaminating cell line.
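As a rough sketch, the three scores are commonly written in terms of allele counts: Tanabe is 2 × shared / (query alleles + reference alleles), Masters (vs. query) is shared / query alleles, and Masters (vs. reference) is shared / reference alleles, each expressed as a percentage. The code below follows that common formulation and only counts markers present in both profiles; the data layout (a dict mapping marker names to sets of alleles) and the function names are assumptions for illustration, and the actual CLASTR computation may differ in details.

```python
def count_alleles(query, reference):
    """Return (shared, query_total, reference_total) over the common markers."""
    shared = n_query = n_reference = 0
    for marker in query.keys() & reference.keys():
        q, r = set(query[marker]), set(reference[marker])
        shared += len(q & r)
        n_query += len(q)
        n_reference += len(r)
    return shared, n_query, n_reference


def tanabe(query, reference):
    shared, n_query, n_reference = count_alleles(query, reference)
    total = n_query + n_reference
    return 100 * 2 * shared / total if total else 0.0


def masters_vs_query(query, reference):
    shared, n_query, _ = count_alleles(query, reference)
    return 100 * shared / n_query if n_query else 0.0


def masters_vs_reference(query, reference):
    shared, _, n_reference = count_alleles(query, reference)
    return 100 * shared / n_reference if n_reference else 0.0


# Hypothetical profiles: every query allele is found in the reference,
# but the reference carries one extra vWA allele.
query = {"TH01": {"6", "9"}, "TPOX": {"8", "11"}, "vWA": {"17"}}
reference = {"TH01": {"6", "9"}, "TPOX": {"8", "11"}, "vWA": {"17", "19"}}
```

In this hypothetical case the Masters (vs. query) score is 100 because the whole query is contained in the reference, while the other two scores are lower because of the extra reference allele.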

2.2.2 Modes

Three modes are provided for the score computation to handle cases in which allele data is missing for the query or the reference:

Non-empty markers

The score is computed on the markers that have allele data for both the query and the reference.

Query markers

The score is computed on the markers for which the user entered allele data, even if the reference is missing allele data.

Reference markers

The score is computed on the markers for which reference allele data is available, even if the query is missing allele data.
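The three modes differ only in which markers enter the computation. A minimal sketch of that selection, assuming profiles are dicts that map marker names to (possibly empty) sets of alleles and using the mode numbers 1, 2 and 3 from the REST API `scoringMode` parameter; the function name is hypothetical:

```python
def markers_for_mode(query, reference, mode):
    """Return the set of marker names used for scoring in the given mode."""
    query_markers = {m for m, alleles in query.items() if alleles}
    reference_markers = {m for m, alleles in reference.items() if alleles}
    if mode == 1:  # Non-empty markers: data on both sides
        return query_markers & reference_markers
    if mode == 2:  # Query markers: everything the user entered
        return query_markers
    if mode == 3:  # Reference markers: everything the reference has
        return reference_markers
    raise ValueError(f"unknown scoring mode: {mode}")


# Hypothetical profiles with partially missing data
query = {"TH01": {"8"}, "TPOX": {"11"}, "FGA": set()}
reference = {"TH01": {"8"}, "vWA": {"16"}}
```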

2.2.3 Amelogenin

By default, Amelogenin is used only for gender confirmation and is consequently not included in the score computation. You can, however, choose to include it in the scoring.

2.3 Filters
2.3.1 Score filter

Filter defining the minimum score for matches to be reported. Note that in the case of conflicted cell lines, the Best and Worst versions are processed as pairs and only the best score is affected by the threshold. Consequently, some Worst cases with a score below the threshold can still be present in the results.

2.3.2 Min markers

Filter defining the minimum number of markers for matches to be reported.

2.3.3 Max results

Filter defining the maximum number of results to be returned.
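The combined effect of the three filters can be sketched as follows. The default values (60, 8 and 200) are the ones listed for the REST API parameters; the result layout, the function name and the sorting by descending score are assumptions for illustration, and the special handling of Best and Worst pairs for conflicted cell lines is not modelled here.

```python
def apply_filters(results, score_filter=60, min_markers=8, max_results=200):
    """Keep matches that pass the score and marker thresholds, best first."""
    kept = [
        r for r in results
        if r["score"] >= score_filter and r["markers"] >= min_markers
    ]
    kept.sort(key=lambda r: r["score"], reverse=True)  # assumed ordering
    return kept[:max_results]


# Hypothetical matches
results = [
    {"name": "A", "score": 95.0, "markers": 8},
    {"name": "B", "score": 55.0, "markers": 9},   # score too low
    {"name": "C", "score": 80.0, "markers": 5},   # too few markers
    {"name": "D", "score": 70.0, "markers": 8},
]
```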

3. Input File

Using the Load File button from the user interface, it is possible to directly import STR profile data from a table file. Both single-sample and multi-sample files are supported. The functionality can be used to perform a similarity search on several samples at a time or to load the marker data of a sample into the user interface quickly and reliably.

3.1 Formats

The table file can be formatted either as an Excel file (.xls or .xlsx extension) or as a plain text file (.csv, .tsv or .txt extension). By default, the tool will assume that each row (except the header) is a distinct sample. Note that a column named "Sample Name", "Name" or "Sample" is required and each submitted sample needs to have a corresponding value. The ordering of the marker columns is not important. The names of the markers need to be indicated correctly. For Amelogenin, the program also recognizes "Amel" and "AM" as valid names.

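| Sample Name | Amelogenin | CSF1PO | D5S818 | D7S820 | D13S317 | D16S539 | TH01 | TPOX | vWA |
|---|---|---|---|---|---|---|---|---|---|
| BICR 16 | X | 12 | 13 | 10 | 11 | 12 | 6,9 | 8,11 | 17,19 |
| ND31618 | X | 12,13 | 10 | 8,10 | 11,12 | 11 | 7,9 | 8,9 | 16 |
| Lu-138 | X | 11,12 | 10,12 | 10,11 | 8,11 | 11 | 7,9 | 8,11 | 17,18 |
| NCI-H125 | X | 7 | 10 | 10 | 11 | 5,12 | 7 | 8,9 | 17 |
| GK-5 | X,Y | 10,11 | 10,12 | 10 | 12 | 11 | 9,9.3 | 8 | 16,19 |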
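A file in this layout can be produced with any spreadsheet tool or script. The sketch below uses Python's standard `csv` module to write a tab-separated table with one sample per row. The marker subset, the sample values and the function name are hypothetical, and tab separation is chosen here simply because allele values themselves contain commas.

```python
import csv
import io

# Hypothetical column selection; "Sample Name" is the required name column
COLUMNS = ["Sample Name", "Amelogenin", "CSF1PO", "D5S818", "TH01", "TPOX", "vWA"]


def profiles_to_tsv(samples):
    """Serialise a list of sample dicts as tab-separated text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for sample in samples:
        if not sample.get("Sample Name"):
            raise ValueError("every sample needs a Sample Name value")
        writer.writerow(sample)  # markers without data are left empty
    return buffer.getvalue()


text = profiles_to_tsv(
    [{"Sample Name": "Sample A", "Amelogenin": "X", "TH01": "9,9.3"}]
)
```

The returned text can then be saved with a .tsv or .txt extension and loaded through the Load File button.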
3.2 Compatibility

In addition to the format standard described previously, the tool was also made compatible with the following software:

  • GeneMapper ID-X
4. Results
4.1 Problematic cell lines

Problematic cell lines, which can be the result of either a contamination or a misidentification, have their accession number displayed in red in the result table. When hovering over them, a tooltip displays the relevant Cellosaurus information concerning their problematic status.

4.2 Conflicted markers

In some cases, sources disagree on the allele values of a given marker. These cases are underlined in the result table and the corresponding sources are indicated in a tooltip when hovering over them. When a cell line possesses conflicted markers, the different combinations of alleles are computed and the Best and Worst reported as distinct rows with an additional label after their accession number.

For instance, the Loucy cell line (CVCL_1380) has a conflict for the D5S818 STR marker: DSMZ reports 11,12 while ATCC and Cosmic-CLP report 12.

As a result, the score is computed on two different STR profiles: one with 11,12 and one with 12. Based on the scoring results, the profile with 11,12 as allele value is defined as the Best profile since it is a better match for the query. The second profile is consequently defined as the Worst profile.

In more complex cases, an STR marker can have more than two conflicting sources, or several STR markers can be conflicted. While the number of generated profile combinations is larger, the basic principle remains the same. The score is computed for all generated profiles and only the Best and Worst are reported in the results.
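The enumeration of profile combinations can be sketched with `itertools.product`. In the example, the D5S818 alternatives are the ones given for Loucy above; the query, the remaining marker and the function names are hypothetical, and a simple Tanabe-style score over common markers stands in for whichever algorithm is selected.

```python
from itertools import product


def tanabe(query, reference):
    """Tanabe-style score over the markers present in both profiles."""
    shared = total = 0
    for marker in query.keys() & reference.keys():
        shared += len(query[marker] & reference[marker])
        total += len(query[marker]) + len(reference[marker])
    return 100 * 2 * shared / total if total else 0.0


def best_and_worst(query, fixed, conflicts):
    """Score every combination of conflicting allele sets.

    fixed     -- markers on which all sources agree
    conflicts -- marker name mapped to the list of alternative allele sets
    """
    markers = list(conflicts)
    scored = []
    for combination in product(*(conflicts[m] for m in markers)):
        profile = dict(fixed)
        profile.update(zip(markers, combination))
        scored.append((tanabe(query, profile), profile))
    best = max(scored, key=lambda item: item[0])
    worst = min(scored, key=lambda item: item[0])
    return best, worst


query = {"D5S818": {"11", "12"}, "TH01": {"8"}}       # hypothetical query
fixed = {"TH01": {"8"}}                               # hypothetical marker
conflicts = {"D5S818": [{"11", "12"}, {"12"}]}        # DSMZ vs. ATCC/Cosmic-CLP
best, worst = best_and_worst(query, fixed, conflicts)
```

With two conflicted markers that each have two alternatives, the same loop would score four profiles and still report only the extremes.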

4.3 Color indicator

The color indicator displayed at the left side of each row in the result table makes it easy to spot the cell lines that are related. Related cell lines have a green indicator while unrelated cell lines have a red indicator. The orange indicator describes mixed results, where related cell lines that have high mutation rates can be found.

The color thresholds based on the score are defined as follows:

Tanabe algorithm
Green: score >= 90
Orange: 90 > score >= 80
Red: score < 80

Masters algorithms
Green: score >= 80
Orange: 80 > score >= 60
Red: score < 60
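The thresholds translate directly into a small lookup. This sketch assumes that the three ranges of each algorithm correspond, from highest to lowest, to the green, orange and red indicators described above, and it reuses the algorithm numbers of the REST API (1 for Tanabe, 2 and 3 for the Masters variants); the function name is hypothetical.

```python
def color_indicator(score, algorithm=1):
    """Map a score to the indicator color for the chosen algorithm."""
    if algorithm == 1:          # Tanabe
        high, low = 90, 80
    elif algorithm in (2, 3):   # Masters (vs. query / vs. reference)
        high, low = 80, 60
    else:
        raise ValueError(f"unknown algorithm: {algorithm}")
    if score >= high:
        return "green"
    if score >= low:
        return "orange"
    return "red"
```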
4.4 Nº markers

The Nº Markers column reports the number of STR markers that were used in the score computation. It is influenced by the scoring mode selected and the inclusion of Amelogenin in the score computation. Note that a minimum of eight STR markers is recommended for accurate results.

5. REST API

The CLASTR REST API makes it possible to search STR profiles without going through the user interface. Two main modes are available: Single entry mode query to search a single sample and Batch mode query to search more than one sample at a time.

5.1 Single entry mode query
URL

https://www.cellosaurus.org/str-search/api/query

HTTP methods

GET, POST

POST consumes

application/json

GET & POST produces

application/json, text/csv, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

GET & POST parameters

For the GET method, the parameters are appended to the URL as query string variables after a ? character and separated by & characters. For the POST method, the parameters are inserted in a JSON object as properties.

A more in-depth description of the parameters can be found in the Parameters section. Note that all parameters, both keys and values, are case insensitive.

"marker name"
Description: marker alleles separated by commas
Type: string

species
Description: species
Type: string
Choices: human, mouse or dog
Default: human

algorithm
Description: scoring algorithm
Type: integer
Choices: 1 for Tanabe, 2 for Masters (vs. query) and 3 for Masters (vs. reference)
Default: 1

scoringMode
Description: scoring mode
Type: integer
Choices: 1 for Non-empty markers, 2 for Query markers and 3 for Reference markers
Default: 1

includeAmelogenin
Description: inclusion of Amelogenin in the score computation
Type: boolean
Default: false

scoreFilter
Description: minimum score
Type: integer
Default: 60

minMarkers
Description: minimum number of common markers
Type: integer
Default: 8

maxResults
Description: maximum number of results
Type: integer
Default: 200

description
Description: optional tag describing the query
Type: string

outputFormat
Description: format of the API output
Type: string
Choices: json, csv or xlsx
Default: json

GET & POST response status code

200

GET example
https://www.cellosaurus.org/str-search/api/query?Amelogenin=X&CSF1PO=13,14&D5S818=13&D7S820=8,9&D13S317=12&FGA=24&TH01=8&TPOX=11&vWA=16&algorithm=1&scoringMode=1&scoreFilter=70&includeAmelogenin=false&outputFormat=xlsx
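A GET URL of this shape can be assembled programmatically instead of by hand. The sketch below uses only the Python standard library; the endpoint and parameter names come from this section, while the helper name `build_query_url` and the lower-case rendering of booleans are choices made for the example.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.cellosaurus.org/str-search/api/query"


def _encode(value):
    """Render booleans as true/false and everything else as plain text."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_query_url(markers, **options):
    """Combine marker alleles and search options into a single GET URL."""
    params = {key: _encode(value) for key, value in {**markers, **options}.items()}
    # safe="," keeps the commas between alleles readable in the URL
    return BASE_URL + "?" + urlencode(params, safe=",")


url = build_query_url(
    {"Amelogenin": "X", "CSF1PO": "13,14"},
    algorithm=1,
    includeAmelogenin=False,
    outputFormat="json",
)
```

The resulting URL can be opened in a browser or fetched with `urllib.request.urlopen(url)`.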
POST example

{
  "Amelogenin": "X",
  "CSF1PO": "13,14",
  "D5S818": "13",
  "D7S820": "8",
  "D13S317": "12",
  "FGA": "24",
  "TH01": "8",
  "TPOX": "11",
  "vWA": "16",
  "algorithm": 1,
  "scoringMode": 1,
  "scoreFilter": 70,
  "includeAmelogenin": false,
  "outputFormat": "xlsx"
}
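For the POST method, the same parameters are sent as a JSON body. A minimal sketch with the Python standard library follows; the helper names are hypothetical, the payload is a shortened version of the example above with json output, and `send` is shown only to indicate how the request could be submitted.

```python
import json
from urllib.request import Request, urlopen

QUERY_URL = "https://www.cellosaurus.org/str-search/api/query"


def build_post_request(payload):
    """Wrap the search parameters in a JSON POST request."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Submit the request and decode a JSON response (performs a network call)."""
    with urlopen(request) as response:
        return json.load(response)


request = build_post_request(
    {"Amelogenin": "X", "CSF1PO": "13,14", "TH01": "8", "algorithm": 1,
     "includeAmelogenin": False, "outputFormat": "json"}
)
```

When `outputFormat` is csv or xlsx, the response body is not JSON and should be read as raw bytes or text instead of being passed to `json.load`.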

5.2 Batch mode query
URL

https://www.cellosaurus.org/str-search/api/batch

HTTP method

POST

POST consumes

application/json

POST produces

application/json, application/zip, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

POST parameters

The parameters of each sample to be searched are inserted in a JSON array as distinct elements, as shown in the POST example. When the csv output format is selected, the produced csv files are compressed together in a zip file.

A more in-depth description of the parameters can be found in the Parameters section. Note that all parameters, both keys and values, are case insensitive.

"marker name"
Description: marker alleles separated by commas
Type: string

species
Description: species
Type: string
Choices: human, mouse or dog
Default: human

algorithm
Description: scoring algorithm
Type: integer
Choices: 1 for Tanabe, 2 for Masters (vs. query) and 3 for Masters (vs. reference)
Default: 1

scoringMode
Description: scoring mode
Type: integer
Choices: 1 for Non-empty markers, 2 for Query markers and 3 for Reference markers
Default: 1

includeAmelogenin
Description: inclusion of Amelogenin in the score computation
Type: boolean
Default: false

scoreFilter
Description: minimum score
Type: integer
Default: 60

minMarkers
Description: minimum number of common markers
Type: integer
Default: 8

maxResults
Description: maximum number of results
Type: integer
Default: 200

description
Description: optional tag describing the query
Type: string

outputFormat
Description: format of the API output
Type: string
Choices: json, csv or xlsx
Default: json

POST response status code

200

POST example

[
  {
    "description": "Example 1",
    "Amelogenin": "X",
    "CSF1PO": "13,14",
    "D5S818": "13",
    "D7S820": "8",
    "D13S317": "12",
    "FGA": "24",
    "TH01": "8",
    "TPOX": "11",
    "vWA": "16",
    "algorithm": 2,
    "scoringMode": 1,
    "scoreFilter": 70,
    "includeAmelogenin": true
  },
  {
    "description": "Example 2",
    "Amelogenin": "X, Y",
    "CSF1PO": "13",
    "D5S818": "13, 14",
    "D7S820": "8, 19",
    "D13S317": "11, 12",
    "FGA": "24",
    "TH01": "8",
    "TPOX": "11",
    "vWA": "15",
    "algorithm": 2,
    "scoringMode": 1,
    "scoreFilter": 70,
    "includeAmelogenin": true
  }
]
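A batch request differs from a single query mainly in that the body is a JSON array and that csv output arrives as a zip archive. The sketch below builds such a request and unpacks an archive using only the Python standard library. The helper names and the shortened sample data are hypothetical, and nothing is assumed about the names of the files inside the archive.

```python
import io
import json
import zipfile
from urllib.request import Request

BATCH_URL = "https://www.cellosaurus.org/str-search/api/batch"


def build_batch_request(samples):
    """Wrap a list of per-sample parameter dicts in a JSON POST request."""
    body = json.dumps(samples).encode("utf-8")
    return Request(
        BATCH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def unpack_csv_archive(zip_bytes):
    """Return the text of every file in a zip archive, keyed by file name."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            name: archive.read(name).decode("utf-8")
            for name in archive.namelist()
        }


request = build_batch_request([
    {"description": "Example 1", "Amelogenin": "X", "TH01": "8", "algorithm": 2},
    {"description": "Example 2", "Amelogenin": "X, Y", "TH01": "8", "algorithm": 2},
])
```

The bytes of an application/zip response can be passed directly to `unpack_csv_archive` to obtain the csv content of the returned files.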