DNA BarCode

the detection of short diagnostic fragment(s) of DNA. TrichOKEY uses a combination of several oligonucleotides located at specific positions in ITS1 and ITS2. In some cases the term DNA microcoding may be considered a synonym.

ITS1 and 2

The genes encoding ribosomal RNA (rRNA) are widely used for identifying fungi and for constructing phylogenetic trees. Ribosomal RNA is needed in such large amounts to produce the cellular ribosomes that there are multiple copies of the rRNA genes, often arranged in tandem but separated from one another by untranscribed spacers. Each rRNA gene unit carries the coding information for the three types of rRNA found in eukaryotic ribosomes (18S*, 5.8S and 28S), but it also contains other valuable information, especially in the internal transcribed spacers: ITS1 (between the 18S and 5.8S genes) and ITS2 (between the 5.8S and 28S genes). The rRNA genes are first transcribed into a pre-rRNA, which then undergoes processing, including the excision of the spacer regions, to produce the three mature rRNAs. The 18S rRNA has changed sufficiently over evolutionary time to be used in phylogenetic reconstructions, but the ITS regions are more variable and can often be used to distinguish different fungal species.

* S indicates Svedberg units, which reflect the sedimentation rate of an rRNA during centrifugation (e.g. in a sucrose gradient) and hence its size. 28S rRNA is the largest and 5.8S rRNA the smallest.


TrichoBLAST

the multilocus sequence similarity search tool specific for Hypocrea/Trichoderma (Kopchinskiy et al., 2005). The tool is powered by pre-BLAST sequence diagnosis (TrichoMARK), which submits the precise diagnostic area to the similarity search and thus increases the accuracy of the result.

Biodiversity profile

the dynamic Hypocrea/Trichoderma biodiversity profile recognized by TrichOKEY is located on. The table contains information on known teleomorph-anamorph connections and reflects the latest phylogeny of the genus.

Special features

Special features of the second version of the DNA oligonucleotide BarCode for species identification of Hypocrea/Trichoderma:

recognizes multiple sequences submitted in FASTA format

linked to the Hypocrea/Trichoderma sequence similarity search (TrichoBLAST)

displays results in one of three modes, each with a printer-friendly version

has advanced sequence quality control, which distinguishes new ITS1 and 2 alleles from other unidentified entries such as (i) non-Trichoderma sequences, (ii) Trichoderma sequences of insufficient quality and (iii) incomplete Trichoderma sequences

linked to the dynamic Hypocrea/Trichoderma biodiversity table

contains HELP

Name of the query set of sequences

A custom name for the query set is convenient when several sets of sequences (e.g. from different isolation sites) need to be compared after identification.

Sequence formats recognized by TrichOKEY

TrichOKEY 2 accepts single or multiple sequences in FASTA* format, as well as individual sequences in plain text format.

* FASTA files are traditional input files for alignment and sequence analysis programs and are usually saved as plain text. In such files, each sequence is preceded by an identification line (title) that starts with the symbol ">". Everything on that line is treated as identification and not as part of the sequence. The line is terminated by a hard line break (return character).

The sequence, without any numbering, then follows. It may consist of one long line without any breaks, or it may include line breaks. Everything is treated as part of the sequence until the next ">" character is encountered.
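The FASTA rules above can be sketched as a small parser. This is a minimal illustration under our own assumptions (the function name `parse_fasta` is hypothetical, and any text before the first ">" is simply ignored); it is not the TrichOKEY submission code.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (title, sequence) pairs.

    Hypothetical helper: the title is everything after ">" on the header
    line; all following lines up to the next ">" are joined into one
    sequence, so line breaks inside a sequence do not matter.
    """
    records: list[tuple[str, str]] = []
    title = None            # title of the record being read (None = none yet)
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if title is not None:   # close the previous record
                records.append((title, "".join(chunks)))
            title = line[1:].strip()
            chunks = []
        elif title is not None:     # text before the first ">" is ignored
            chunks.append(line)
    if title is not None:           # close the last record
        records.append((title, "".join(chunks)))
    return records


# Toy input (placeholder sequences, not real ITS data).
example = ">seq1 strain G.J.S. 99-90\nACGTACGT\nACGT\n>seq2 strain G.J.S. 99-155\nTTGACC\n"
print(parse_fasta(example))
```

Joining the sequence lines is what makes a one-line sequence and a sequence with line breaks equivalent, as described above.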

A correct FASTA file that can be submitted to TrichOKEY 2 may look like this:

>seq1 strain G.J.S. 99-90

>seq2 strain G.J.S. 99-155

>seq3 strain G.J.S. 99-159

In this case, the SHORT display mode of TrichOKEY 2 truncates each sequence title at the first space character (e.g. "seq1 strain G.J.S. 99-90" becomes "seq1"). All other modes display the full title of the sequence.

Display modes of TrichOKEY Results

TrichOKEY 2 results may be displayed in three formats:

FULL mode shows the most detailed report on sequence identification
SHORT mode is designed for work with large sets of sequences
FASTA mode is convenient for the subsequent alignment and phylogenetic analysis

All display modes have printer-friendly versions.

Full mode

Full mode:

displays the complete identification profile of each sequence from the query set.

It includes the locations of all genus-specific hallmarks (GSH, anchors); the lengths of all regions extracted for the species-specific BarCode search; identification at the generic, clade (if available) and species levels; a link to the Hypocrea/Trichoderma biodiversity profile; the reliability code; and a graphic representation of the species hallmarks (SHM) detected in the query sequence and in the type sequence of the identified species. Full mode also links to the Hypocrea/Trichoderma sequence similarity search (TrichoBLAST).

Short mode

Short mode:

displays the identification profile in the most compact view. It is designed as a complement to Full mode, giving an overview of the species identification profile of the whole submitted set of query sequences.

In this mode the user's name of the sequence is truncated at the first blank and followed by the species name identified by TrichOKEY. Each entry in Short mode is linked to the Full mode result for that sequence. Short mode also contains the reliability code and links to TrichoBLAST and the Hypocrea/Trichoderma biodiversity profile.
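The Short mode labelling described above might look like the following sketch. The function `short_mode_label` and the single space used to join the truncated title and the species name are illustrative assumptions, not the exact TrichOKEY output format.

```python
def short_mode_label(title: str, species: str) -> str:
    """Build a Short-mode style label (hypothetical helper).

    The user's title is cut at the first blank (space or tab) and the
    identified species name is appended after one space.
    """
    parts = title.split(maxsplit=1)        # split once, on any whitespace
    short_name = parts[0] if parts else ""
    return f"{short_name} {species}".strip()


# "SPECIES_NAME" is a placeholder, not a real identification result.
print(short_mode_label("seq1 strain G.J.S. 99-90", "SPECIES_NAME"))
```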

FASTA mode

FASTA mode:

displays the query set of sequences in its original FASTA format, with the abbreviated TrichOKEY identification result (spec_key) inserted between the ">" symbol and the user's title of each sequence.
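A minimal sketch of the FASTA mode output. It assumes that the spec_key is separated from the user's title by one space and that each sequence is written on a single line; both are assumptions, and `fasta_mode` as well as the spec_key value "KEY1" are hypothetical.

```python
def fasta_mode(records: list[tuple[str, str]], spec_keys: list[str]) -> str:
    """Rebuild FASTA text with a spec_key in front of each title.

    Hypothetical helper: records are (title, sequence) pairs in the
    original order; spec_keys holds one identification key per record.
    """
    if len(records) != len(spec_keys):
        raise ValueError("one spec_key is needed per sequence")
    lines = []
    for (title, sequence), key in zip(records, spec_keys):
        lines.append(f">{key} {title}")   # key goes between ">" and the title
        lines.append(sequence)            # sequence written on one line
    return "\n".join(lines)


print(fasta_mode([("seq1 strain G.J.S. 99-90", "ACGT")], ["KEY1"]))
```

Because the output is still valid FASTA, it can be passed directly to alignment software.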

Sequence quality control

Sequence quality control:

prior to identification, every sequence is checked for the presence of (i) formatting symbols (line breaks, paragraph marks, spaces, etc.); (ii) symbols that do not code nucleotides (dashes, question marks and letters that are not used by the sequencing software); and (iii) letters that code uncertain positions in the sequence:

R,r -> {AG}; Y,y -> {CT}; M,m -> {AC}; K,k -> {GT}; S,s -> {CG}; W,w -> {AT}; H,h -> {ACT}; B,b -> {CGT}; V,v -> {ACG}; D,d -> {AGT}; N,n -> {ACGT}

The TrichOKEY sequence quality control script automatically removes symbols of the first and second categories. This makes it possible to identify sequences obtained from any sequencing service, retrieved from public databases such as GenBank, or exported from alignment software.

If symbols coding uncertain nucleotide positions are detected, a warning message is displayed.
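The three quality control categories can be illustrated with the following sketch. It assumes that the input is upper-cased, that only A, C, G, T and the ambiguity codes listed above are kept, and that everything else (spaces, line breaks, dashes, digits, other letters) is removed; the actual TrichOKEY script may differ in detail, and `quality_control` is a hypothetical name.

```python
# IUPAC ambiguity codes as listed above, with the nucleotides they stand for.
AMBIGUITY = {
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}
ALLOWED = set("ACGT") | set(AMBIGUITY)


def quality_control(raw: str) -> tuple[str, list[str]]:
    """Clean a raw sequence and collect warnings (illustrative sketch)."""
    upper = raw.upper()  # "r" and "R" are treated alike
    # Categories (i) and (ii): drop formatting symbols and any character that
    # is neither a nucleotide nor an ambiguity code (dashes, "?", digits, ...).
    cleaned = "".join(ch for ch in upper if ch in ALLOWED)
    warnings: list[str] = []
    # Category (iii): ambiguity codes are kept but reported.
    found = sorted({ch for ch in cleaned if ch in AMBIGUITY})
    if found:
        warnings.append("uncertain positions coded by: " + ", ".join(found))
    return cleaned, warnings


print(quality_control("acg-t\n12 ryN?"))
```

Stripping digits and whitespace is what lets, for example, numbered sequence listings copied from a database record pass through unchanged in content.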

Genus specific hallmarks (GSH or anchors)

five oligonucleotide sequences that are present in all known Hypocrea/Trichoderma ITS1 - 5.8S RNA - ITS2 sequences. The combination of these five anchors is specific for the genus and serves for identification at the generic level (Fig. 2 from Druzhinina et al., 2005). The anchors are used to split the query sequence into four regions, so that the search for species hallmarks (SHM) is restricted to precise areas of the sequence.


Regions are fragments of the ITS1 - 5.8S RNA - ITS2 sequence located between the anchors (see the master alignment):

5' - Anchor 1 - region 1 - Anchor 2 - region 2 - Anchor 3 - 5.8S RNA gene - Anchor 4 - region 3 - Anchor 5 - 3'
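Splitting a query into regions by the five anchors can be sketched as follows. The anchor strings in the example are toy placeholders (the real TrichOKEY anchor sequences are not given here), `split_by_anchors` is a hypothetical name, and the search uses exact matching, which is a simplifying assumption.

```python
from typing import Optional

REGION_NAMES = ["region 1", "region 2", "5.8S", "region 3"]


def split_by_anchors(seq: str, anchors: list[str]) -> Optional[dict[str, str]]:
    """Return the four stretches between five anchors, or None.

    Anchors are searched in 5' -> 3' order with exact matching (a
    simplification); None means at least one anchor was not found.
    """
    if len(anchors) != 5:
        raise ValueError("expected five anchors")
    spans = []  # (start, end) of each anchor hit
    pos = 0
    for anchor in anchors:
        start = seq.find(anchor, pos)
        if start == -1:
            return None
        spans.append((start, start + len(anchor)))
        pos = start + len(anchor)
    # Each region runs from the end of one anchor to the start of the next.
    return {
        name: seq[spans[i][1]:spans[i + 1][0]]
        for i, name in enumerate(REGION_NAMES)
    }


# Toy anchors and sequence (placeholders, not the real TrichOKEY anchors).
toy_anchors = ["AAAA", "CCCC", "GGGG", "TTTT", "AAAA"]
print(split_by_anchors("AAAAGTCCCCATGGGGCATTTTGCAAAA", toy_anchors))
```

Returning None when an anchor is missing mirrors the idea that all five anchors together are required for identification at the generic level.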

Section identification

Section identification:

in TrichOKEY v.2, the section and/or clade is displayed on the Full mode result page only if section-specific hallmarks have been designed for it. The current version of the program contains hallmarks for 5 of the 14 known clades of Hypocrea/Trichoderma (counting a group of species from phylogenetically lone lineages as one nominative clade). The attribution of each species to a clade or section, with the corresponding references, can be found in the Hypocrea/Trichoderma biodiversity profile.

Species identification

Species identification:

once the query sequence has been identified as belonging to Hypocrea/Trichoderma, a combination of species hallmarks (SHM) is used to identify the species. In some cases one set of SHM is sufficient for a species represented by one or several ITS1 and 2 alleles. In other cases, when variability was detected in the diagnostic area(s) of ITS1 or ITS2, species identification is based on a set of allele-specific hallmarks.

Species hallmarks (SHM)

Species hallmarks (SHM):


a species-specific set of two to five short oligonucleotide sequences located at specific positions in regions of the ITS1 and ITS2 sequences.


SHM1 - region 1; SHM2 and 3 - region 2; SHM4 and 5 - region 3  
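Species identification from SHM can be sketched as a lookup: each species has one or more hallmark sets (several sets standing for allele-specific hallmarks), and a species matches if every hallmark of at least one set occurs in its assigned region. The hallmark table and species names below are hypothetical toy data, `match_species` is a hypothetical name, and exact substring matching is an assumption.

```python
# Which region each hallmark is searched in (SHM1 -> region 1, and so on).
SHM_REGION = {1: "region 1", 2: "region 2", 3: "region 2",
              4: "region 3", 5: "region 3"}


def match_species(regions: dict[str, str],
                  barcodes: dict[str, list[dict[int, str]]]) -> list[str]:
    """Return every species with at least one fully matching hallmark set."""
    hits = []
    for species, hallmark_sets in barcodes.items():
        for hallmarks in hallmark_sets:  # one set per allele group
            if all(oligo in regions[SHM_REGION[n]]
                   for n, oligo in hallmarks.items()):
                hits.append(species)
                break
    return hits


# Hypothetical toy data; real SHM sequences are not reproduced here.
toy_regions = {"region 1": "AACCGG", "region 2": "TTGACA",
               "5.8S": "", "region 3": "GGATCC"}
toy_barcodes = {
    "species A": [{1: "ACC", 2: "TGA", 4: "GAT"}],
    "species B": [{1: "TTT"}, {1: "CCG", 5: "ATC"}],
    "species C": [{2: "GGG"}],
}
print(match_species(toy_regions, toy_barcodes))
```

Restricting each hallmark to its own region, rather than searching the whole sequence, is the point of locating the anchors first.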

Reliability code

Reliability code:

the level of TrichOKEY reliability is specific for each species. It is based entirely on the number of isolates and of ITS1 and 2 alleles known for each species.

if the BarCode was designed on the basis of 6 - 15 sequences of specimens from independent isolations and the number of alleles does not exceed 3, the reliability is STANDARD.

if the BarCode was designed on the basis of more than 15 sequences (up to 300 and more for H. lixii/T. harzianum) of specimens from independent isolations, the reliability is HIGH.

This level is assigned to all of the most frequent Trichoderma species isolated from native and agricultural soils.

for species (i) described on the basis of one single isolate, (ii) whose species borders are not clearly resolved phylogenetically, or (iii) for which the number of known isolates equals the number of ITS1 and 2 alleles (the only two known strains published as H. tawa/T. tawa differ substantially in their ITS1 and 2 and also in their tef1 intron sequences), the BarCode identification reliability level is LOW.
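The reliability rules above can be expressed as a small function. This is a sketch under stated assumptions: the LOW conditions are checked first, the hypothetical flag `borders_resolved` stands in for the phylogenetic judgement in case (ii), and the rules as written do not cover some combinations (for example 2 - 5 isolates with fewer alleles than isolates), which the sketch reports as UNDEFINED.

```python
def reliability_code(n_isolates: int, n_alleles: int,
                     borders_resolved: bool = True) -> str:
    """Assign a reliability level from the rules above (illustrative sketch).

    n_isolates: sequences from independent isolations used for the BarCode.
    n_alleles: known ITS1 and 2 alleles of the species.
    borders_resolved: hypothetical flag for case (ii), phylogenetic borders.
    """
    # LOW: (i) single isolate, (ii) unresolved borders, (iii) isolates == alleles.
    if n_isolates <= 1 or not borders_resolved or n_isolates == n_alleles:
        return "LOW"
    if n_isolates > 15:
        return "HIGH"
    if 6 <= n_isolates <= 15 and n_alleles <= 3:
        return "STANDARD"
    # Combinations the rules above do not describe (e.g. 2 - 5 isolates).
    return "UNDEFINED"


print(reliability_code(10, 2))
```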

Type sequence

Type sequence:

ideally, this is the sequence of the type strain of the species. In some cases, however, when the species was not typified or the sequence was not available, the BarCode was developed on the basis of another vouchered strain. We will try to reach the ideal state with subsequent releases of TrichOKEY.

Copyright: Irina Druzhinina & Alexey Kopchinskiy 2004 - 2008