
Calculation of uniqueness values for each peak

We introduced the notion of 'uniqueness' values for each peak given a reference library. A unique peak is defined as a relatively isolated peak around which no peak of other compounds is observed. For any given peak, its uniqueness value is calculated as the total number of surrounding peaks of other compounds within a given distance.

For example, the red peak represents the peak of interest, and three peaks are shown in its immediate vicinity. The calculations are performed at five distance levels: 0.01, 0.02, 0.03, 0.04, and 0.05 ppm in the 1H dimension, and 0.05, 0.10, 0.15, 0.20, and 0.25 ppm in the 13C dimension. No peak is observed within the first three distance levels, so the maximum unique scope is (0.03, 0.15). Peak A lies within 0.03~0.04 ppm (1H dimension) and 0.15~0.20 ppm (13C dimension) of the peak; Peak B lies within 0.04~0.05 ppm (1H dimension) and 0.20~0.25 ppm (13C dimension) of the peak; Peak C is not considered because it is more than 0.05 ppm away in the 1H dimension. The counts are cumulative across levels, so the uniqueness values are 0-0-0-1-2.
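
The worked example above can be reproduced with a short Python sketch. It assumes that a neighbouring peak counts at a given level only when it falls inside the tolerance in both dimensions, and that the counts are cumulative across levels (which is what the values 0-0-0-1-2 imply). The `Peak` type, the function names, and the peak coordinates are illustrative and are not taken from MetaboMiner.

```python
from typing import NamedTuple

# Distance levels quoted in the text (ppm).
H_LEVELS = [0.01, 0.02, 0.03, 0.04, 0.05]   # 1H dimension
C_LEVELS = [0.05, 0.10, 0.15, 0.20, 0.25]   # 13C dimension


class Peak(NamedTuple):
    compound: str
    h: float   # 1H chemical shift (ppm)
    c: float   # 13C chemical shift (ppm)


def uniqueness_values(target, library):
    """Count peaks of other compounds around `target` at each distance level."""
    values = []
    for h_tol, c_tol in zip(H_LEVELS, C_LEVELS):
        # Assumption: a neighbour must be within tolerance in BOTH dimensions.
        count = sum(
            1
            for p in library
            if p.compound != target.compound
            and abs(p.h - target.h) <= h_tol
            and abs(p.c - target.c) <= c_tol
        )
        values.append(count)
    return values


def max_unique_scope(values):
    """Return the widest (1H, 13C) level with no neighbours, or None."""
    scope = None
    for value, h_tol, c_tol in zip(values, H_LEVELS, C_LEVELS):
        if value > 0:
            break
        scope = (h_tol, c_tol)
    return scope


# Made-up coordinates arranged like the example: A at level 4, B at level 5,
# C too far away in the 1H dimension.
target = Peak("X", 3.000, 50.00)
library = [
    target,
    Peak("A", 3.035, 50.18),
    Peak("B", 2.955, 49.78),
    Peak("C", 3.070, 50.10),
]

values = uniqueness_values(target, library)
scope = max_unique_scope(values)
print("-".join(str(v) for v in values), scope)
```

With these coordinates the sketch yields the uniqueness values 0-0-0-1-2 and a maximum unique scope of (0.03, 0.15), matching the example.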

The uniqueness values are calculated automatically after a library is created, and updated after a compound is edited, via MetaboMiner's user interface.


Unique peak


Peak matching is performed in three rounds of searching

  1. Reverse search using an adaptive threshold method (reference peaks against query peaks)

    During peak matching, each peak in the spectral reference library first adjusts the current threshold to its maximum unique scope and is then searched against the query peaks. Since our reference library covers essentially all NMR-detectable common metabolites in the given type of biofluid, this expansion of the search scope is usually "safe": it aims to match peaks without increasing the false positive rate. Our tests using both simulated and experimental data showed that this method significantly ameliorates the problems caused by chemical shift drift when query spectra are collected under very different conditions. In our experiments, over 90% of compounds were identified in this stage.

  2. First round forward search (unassigned query peaks against reference peaks of identified compounds)

    This step aims to assign unassigned query peaks to compounds that have already been identified. No new compound is identified in this stage. Expanded search thresholds are used: 0.08 ppm for TOCSY; 0.12 ppm (1H) and 0.4 ppm (13C) for HSQC.

  3. Second round forward search (unassigned query peaks against reference peaks of compounds not yet identified)

    Occasionally, some new compounds will be identified at this stage. However, the confidence in these identifications is low. The same expanded search thresholds are applied during the search process.
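
The first two rounds can be sketched in Python as follows, for HSQC peaks only. This is a simplified illustration and not MetaboMiner's actual code: the base threshold `DEFAULT_TOL` is a hypothetical value (the text does not state it), the threshold is widened to the maximum unique scope per dimension, the first query peak inside the tolerance is taken as the match, and a compound is treated as identified as soon as one of its peaks is matched. The compound name and peak coordinates are made up.

```python
from typing import NamedTuple, Optional


class RefPeak(NamedTuple):
    compound: str
    h: float                 # 1H chemical shift (ppm)
    c: float                 # 13C chemical shift (ppm)
    scope: Optional[tuple]   # maximum unique scope (1H, 13C), or None


DEFAULT_TOL = (0.02, 0.10)    # hypothetical base threshold, NOT from the text
EXPANDED_TOL = (0.12, 0.40)   # expanded HSQC thresholds quoted in the text


def within(query, ref, tol):
    """True if the query peak lies inside the tolerance box around ref."""
    return abs(query[0] - ref.h) <= tol[0] and abs(query[1] - ref.c) <= tol[1]


def reverse_search(ref_peaks, query_peaks):
    """Round 1: each reference peak searches the query with its own threshold."""
    assigned = {}  # query peak index -> compound name
    for r in ref_peaks:
        tol = DEFAULT_TOL
        if r.scope is not None:
            # Assumption: widen the threshold to the peak's unique scope.
            tol = (max(tol[0], r.scope[0]), max(tol[1], r.scope[1]))
        for i, q in enumerate(query_peaks):
            if i not in assigned and within(q, r, tol):
                assigned[i] = r.compound   # simplification: first hit wins
                break
    return assigned


def forward_search(ref_peaks, query_peaks, assigned, compounds):
    """Round 2: unassigned query peaks against peaks of the given compounds."""
    result = dict(assigned)
    for i, q in enumerate(query_peaks):
        if i in result:
            continue
        for r in ref_peaks:
            if r.compound in compounds and within(q, r, EXPANDED_TOL):
                result[i] = r.compound
                break
    return result


ref_peaks = [
    RefPeak("cmpdA", 1.47, 19.0, (0.05, 0.25)),   # isolated reference peak
    RefPeak("cmpdA", 3.77, 53.3, None),           # crowded reference peak
]
query_peaks = [(1.51, 19.2), (3.85, 53.5), (7.00, 120.0)]

round1 = reverse_search(ref_peaks, query_peaks)
identified = set(round1.values())   # placeholder for the real identification step
round2 = forward_search(ref_peaks, query_peaks, round1, identified)
print(round1, round2)
```

In this toy data set the first query peak is matched in round 1 only because the reference peak's unique scope is wider than the base threshold, and the second query peak is picked up in round 2 under the expanded thresholds. The third round is not shown; according to the text it applies the same expanded thresholds.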


Compound identification based on 'minimal signatures'

For a variety of reasons, it is unrealistic to expect that the 'complete peak set' of each compound, as indicated by its reference spectrum, will be detected. Instead, we implemented a method called 'minimal signature' for compound identification. A minimal signature is defined as the minimum peak set that can uniquely identify a compound from all others in a given library. Based on the complete peak set of any compound in a given reference library, many minimal signatures can be derived through different combinations of its unique peaks. A single peak match may be considered a minimal signature if the peak is highly unique; more peaks are required when they are less unique. This is essentially a weighted scoring scheme based on the uniqueness values of each matched peak.
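
One way to picture such a weighted scoring scheme is sketched below. The weighting function and the threshold are hypothetical, chosen only so that a fully isolated peak is sufficient on its own while less unique peaks must be combined; the actual weights used by MetaboMiner are not given in this document.

```python
def peak_weight(uniqueness_values):
    """Hypothetical weight: 1.0 for an isolated peak, less for crowded ones.

    Uses the count at the widest distance level (the last value).
    """
    return 1.0 / (1 + uniqueness_values[-1])


def is_identified(matched_peaks, threshold=1.0):
    """matched_peaks: list of uniqueness-value lists, one per matched peak.

    Assumption: a compound is called when the summed weights reach threshold.
    """
    return sum(peak_weight(u) for u in matched_peaks) >= threshold


isolated = [0, 0, 0, 0, 0]   # no neighbours at any level
crowded = [0, 0, 1, 1, 1]    # one neighbour from the third level onwards

single_isolated = is_identified([isolated])
single_crowded = is_identified([crowded])
two_crowded = is_identified([crowded, crowded])
print(single_isolated, single_crowded, two_crowded)
```

Under these assumed weights, one isolated peak forms a minimal signature by itself, whereas two of the less unique peaks are needed to reach the same score.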


Empirical rules and authenticity checks:

  • The fraction of matched peaks (cut-off: 1/2 for TOCSY and 1/6 for HSQC), or
  • The total number of matched peaks (cut-off: 3 for TOCSY and 1 for HSQC)
  • Expected compound concentrations in a given biofluid type
  • The peak intensity check (local maximum, if peak intensity values are available)
  • Expanded peak search scope for pH sensitive compounds
    i.e. compounds with an amine group, such as
    Histidine, Uracil, Imidazole, Adenosine, etc.
  • The presence of characteristic peaks for certain compounds,
    e.g. a peak close to (4.63, 4.63) must exist in the TOCSY spectrum for Glucose to be considered present.
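
The first two rules in the list above can be expressed as a small check, sketched here in Python. The function name and the table layout are illustrative, and the sketch assumes that both cut-offs are inclusive (a value equal to the cut-off passes), which the text does not specify.

```python
from fractions import Fraction

# Cut-offs quoted in the list above:
# (minimum matched fraction, minimum number of matched peaks).
CUTOFFS = {
    "TOCSY": (Fraction(1, 2), 3),
    "HSQC": (Fraction(1, 6), 1),
}


def passes_cutoff(spectrum_type, matched, total):
    """Return True if the fraction rule OR the count rule is satisfied."""
    min_fraction, min_count = CUTOFFS[spectrum_type]
    if total <= 0:
        return False
    # Assumption: both cut-offs are inclusive (>=).
    return Fraction(matched, total) >= min_fraction or matched >= min_count


checks = [
    passes_cutoff("TOCSY", 2, 5),    # 2/5 < 1/2 and 2 < 3
    passes_cutoff("TOCSY", 2, 4),    # fraction rule met
    passes_cutoff("TOCSY", 3, 10),   # count rule met
    passes_cutoff("HSQC", 0, 6),     # nothing matched
]
print(checks)
```

Under the inclusive reading, a single matched peak is already enough for HSQC, so the remaining checks in the list do most of the filtering for that spectrum type.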

Last updated:   May 25, 2008
Contact:   Jianguo Xia   780-4925786