Help Contents

Search Engine Overview

Text-Based Search

Structure-Based Search

NMR-Based Search

MASS-Based Search

Miscellanea Search

Result Output

Search Engine Overview

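Data in the MMCD database can be accessed through one or more of the five search engines located on the home page: Text-Based Search, Structure-Based Search, NMR-Based Search, MASS-Based Search, and Miscellanea Search.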

Searches are activated by clicking on the corresponding gold bar. Clicking on the search title will hide the search again, returning the user to the main search menu. The "reset" command will clear all input and return all values to the default setting.

The five search modules work together in an additive manner: any search component that is activated is included in the current search. For example, one can search on a particular molecular formula (“Structure-based Search”) with specified chemical shift values and tolerances (“NMR-based Search”) and limit the search to metabolites believed to be associated with Arabidopsis (“Miscellanea”). After a query is finished, pushing the “reset” button clears all previous input so that the system is ready for the next search.

 

The query engine can also be used for high-throughput searching by performing a batch mode search. Batch mode searching is available for the Text-based, NMR-based, and MASS-based search sections. Clicking on the "Batch Mode" bar switches the search engine to batch mode and enables data input from a file.

 

 


 

 

Text-Based Search

A drop-down menu lets users choose from the following search options:

Name/CAS/Kegg/CQ_ID/Exp_NMR

The default search option is Name/CAS/Kegg/CQ_ID/Exp_NMR. This option allows the user to search the database by any of the following:

As an example, a search for D-glucose could be entered as any of the following:

The Text-based Search section is equipped with a flexible ambiguous search engine. When the “Ambiguous search” checkbox is checked, the name search counts as a hit any synonym that includes the input string, and it accepts a very flexible input format:

Pubchem_SID

Selection of "Pubchem_SID" enables searching by a PubChem Substance ID. For example, a PubChem SID search for "148583" would bring up the entry for glucose.

ChemIDplus_ID

The ChemIDplus ID search identifies compounds based on their ChemIDplus identification number. Using the ChemIDplus_ID search, a user could find glucose by entering "000050997".

CHEBI_ID

The text-based search menu includes an option to search by Chemical Entities of Biological Interest (ChEBI) identification number. A CHEBI_ID search for "17634" would bring up the MMCD entry for glucose.

PDB_Component_ID

Finally, the user can search the database by PDB Component ID. For example, a search for "GLO" would bring up the MMCD entry for glucose.

Batch Mode

The Text-Based Search field contains a "Batch Mode" bar. Clicking on this bar switches the system to batch mode and enables the user to input data from a file. The user needs to set the separator and a numerical index (beginning with 1) that identifies the column containing the query item. The search engine returns a list of compounds corresponding to each of the search terms and includes an option to download the results as a text file. Clicking on the “batch mode” title switches back to the normal search mode.

For example, a text (.txt) file containing the following could be used to perform a text-based search for glucose, lactose, and caffeine:

glucose

lactose

caffeine

To use this file, the user would click "Text-Based Search," then "Batch Mode." From the Batch Mode window, the user should select "Browse" and locate the file on their computer. The index should correspond to the number of the column containing the search terms (1 in this example), and the separator should be set to whatever character separates the columns (not needed here, because each line holds only one search item).
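The role of the index and separator settings can be illustrated with a minimal Python sketch. It is not MMCD's own code: the function name read_batch_queries and the comma-separated second example are hypothetical, and the sketch only assumes that the index is a 1-based column number, as described above.

```python
def read_batch_queries(lines, index=1, separator=None):
    """Pick the query term from each non-empty line.

    index     -- 1-based number of the column holding the search term
    separator -- column separator; None means one term per line
    (Both parameter names are illustrative, not MMCD identifiers.)
    """
    queries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between entries
        columns = line.split(separator) if separator else [line]
        if index < 1 or index > len(columns):
            raise ValueError(f"line {line!r} has no column {index}")
        queries.append(columns[index - 1].strip())
    return queries


# One search term per line: index 1, no separator needed.
print(read_batch_queries(["glucose", "", "lactose", "", "caffeine"]))

# Hypothetical two-column file: the terms sit in column 2, separated by commas.
print(read_batch_queries(["s1,glucose", "s2,lactose"], index=2, separator=","))
```

With a single-column file such as the one above, the index stays at 1 and the separator is irrelevant.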

Structure-Based Search

In the Structure-Based Search section, one can search by structure, formula, or molecular weight (MW):

STRUCTURE

The structure can be entered in four different ways:

(1). Enter it as a SMILES (Simplified Molecular-Input Line-Entry System) string, e.g., CC(=O)NC1C(CC(OC1C(C(CO)O)(C=O)O)O)O

(2). Enter it as an InChI (IUPAC International Chemical Identifier) string by selecting the InChI option, e.g., InChI=1/C6H12O6/c7-1-2-3(8)4(9)5(10)6(11)12-2/h2-11H,1H2/t2-,3-,4+,5-,6+/m1/s1

(3). Input the structure from a file through the "From File" option. The currently supported file formats can be found here.

(4). Draw the structure with the structure editor borrowed from PubChem: click the "Draw Structure" button to activate it and, after finishing the sketch, copy and paste either the SMILES or the InChI string from the sketch panel into the search panel.

 

The structure search can be exact matching, substructure matching, or similarity matching. When “Stereochemistry Match” is checked, the stereochemistry of the target structure must match that of the query structure in order to be considered as a hit. Similarly, when “Ignore Charge” is checked, the search engine will ignore the charge information during matching.

FORMULA

Formula searches can be inclusive, exact, or in a range. A search for "C6H6" will return compounds with six carbon atoms and six hydrogen atoms, plus any number of other atoms. A search for "=C6H6" will return compounds with six carbon atoms, six hydrogen atoms, and no other atoms. Searching for "C6N0" will return compounds with six carbon atoms and no nitrogen atoms. Finally, a search for "C6 N1-3" will return compounds with six carbon atoms and one to three nitrogen atoms.

Parentheses may be used to group elements. For example, (CH2)3 is interpreted as C3H6. A more complicated example: =C4-9H6-12(CH2)2-3N0-2

Note: Atom symbols must be entered in standard IUPAC format, e.g., "C" for carbon (not "c") and "Ca" for calcium (not "ca" or "CA").
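The inclusive, exact, and range rules above can be sketched in Python as follows. This is only an illustration of the query semantics as described, not the MMCD implementation: parse_query and matches are hypothetical names, compositions are given as plain element-count dictionaries, and parenthesized groups such as (CH2)3 are deliberately not handled.

```python
import re

# One element symbol, an optional count, and an optional "-upper" bound.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+)?(?:-(\d+))?")


def parse_query(query):
    """Return (exact, {element: (low, high)}) for a flat query like 'C6 N1-3'."""
    exact = query.startswith("=")
    body = query.lstrip("=").replace(" ", "")
    ranges = {}
    pos = 0
    while pos < len(body):
        m = TOKEN.match(body, pos)
        if m is None:
            raise ValueError(f"cannot parse {body[pos:]!r} (groups are unsupported)")
        element, low, high = m.groups()
        low = int(low) if low is not None else 1
        high = int(high) if high is not None else low
        ranges[element] = (low, high)
        pos = m.end()
    return exact, ranges


def matches(query, composition):
    """True if an element-count dict satisfies the formula query."""
    exact, ranges = parse_query(query)
    for element, (low, high) in ranges.items():
        if not low <= composition.get(element, 0) <= high:
            return False
    if exact:
        # A leading "=" forbids any element not named in the query.
        return all(el in ranges for el, n in composition.items() if n > 0)
    return True


benzene = {"C": 6, "H": 6}
phenol = {"C": 6, "H": 6, "O": 1}
aniline = {"C": 6, "H": 7, "N": 1}

print(matches("C6H6", phenol))      # inclusive: extra atoms allowed
print(matches("=C6H6", phenol))     # exact: oxygen is not allowed
print(matches("=C6H6", benzene))
print(matches("C6 N1-3", aniline))  # range on nitrogen
```

Symbols are matched case-sensitively, which mirrors the note above about entering "C" and "Ca" rather than "c", "ca", or "CA".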

MW

The MW option searches by average molecular weight. The following examples demonstrate the specificity of the results:

Tips for sketching structure:

[Figure: sketch structure]

More Criteria

Clicking on the "More Criteria" button activates additional structure-based search conditions, which are combined in order from top to bottom, i.e., (((((Criterion_1 AND/OR/NOT Criterion_2) AND/OR/NOT Criterion_3) AND/OR/NOT Criterion_4) AND/OR/NOT Criterion_5) AND/OR/NOT Criterion_6). Clicking on the “Criterion_x” title closes the corresponding structure-based search condition module.
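The top-to-bottom grouping can be pictured as a left fold over the result sets of the individual criteria. The Python sketch below is an illustration under assumptions: the hit sets are made-up identifiers, evaluate and combine are hypothetical names, and "NOT" is taken to mean "hits of the left side that are not hits of the right side".

```python
from functools import reduce


def combine(left, step):
    """Apply one (operator, hits) step to the running result set."""
    operator, right = step
    if operator == "AND":
        return left & right
    if operator == "OR":
        return left | right
    if operator == "NOT":
        return left - right  # assumption: "A NOT B" means A without B
    raise ValueError(f"unknown operator {operator!r}")


def evaluate(first, steps):
    """Fold the criteria strictly from top to bottom (left to right)."""
    return reduce(combine, steps, first)


# Made-up hit sets for three criteria.
c1 = {"a", "b", "c"}
c2 = {"b", "c", "d"}
c3 = {"c"}

# (c1 OR c2) AND c3 gives {"c"} ...
print(sorted(evaluate(c1, [("OR", c2), ("AND", c3)])))
# ... whereas c1 OR (c2 AND c3) would give {"a", "b", "c"}.
print(sorted(c1 | (c2 & c3)))
```

Because the grouping is fixed, the order in which criteria are entered can change the result when AND, OR, and NOT are mixed.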

NMR-Based Search

The input can be any combination of chemical shift, carbon multiplicity, and connectivity. The search uses average chemical shifts, which means that peaks split by J-coupling should be entered as the unsplit value. (Looking for a 1D 1H or 13C raw data query? Visit http://www.bmrb.wisc.edu/metabolomics/query_metab.php instead.)

Data can be input as follows:

 

There are a number of additional options that can be modified by the user:

The Data Set menu allows the user to choose between Experimental and Empirical data. The Experimental Data set currently only contains approximately 300 compounds, but a more extensive search is available by selecting the Empirical Data option. When searching the Experimental Data set, users can also choose to specify the sample condition under which the data were collected.

If the "Include Labile H" box is checked, exchangeable (labile) hydrogen atoms, such as those bonded to nitrogen or oxygen, are included in the search. If the box is unchecked, these peaks are ignored.

The "Threshold (%)" box sets the minimum match percentage required for a compound to appear in the results. The percentage compares the number of peaks shared by the query and the target (their intersection) with the total number of peaks in the target. This ratio is always between 0 and 100%, and only compounds that reach the chosen threshold are shown in the search results.

The chemical shift tolerance, or how far results can vary from the search value, can be adjusted for hydrogen and carbon in the H_tol and C_tol boxes, respectively. The database cannot currently be searched by nitrogen or phosphate peaks; this function will be activated after the theoretical chemical shift calculations have been finished.
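As a rough illustration of how the threshold and tolerance interact, the following Python sketch computes the share of target peaks that have a query peak within the tolerance. It is a simplified assumption about the scoring, not MMCD's actual algorithm: match_percentage is a hypothetical name, the peak values are invented, and the sketch does not enforce one-to-one pairing of peaks.

```python
def match_percentage(query, target, tolerance):
    """Percentage of target peaks with at least one query peak within tolerance (ppm)."""
    if not target:
        return 0.0
    matched = sum(
        1 for t in target if any(abs(t - q) <= tolerance for q in query)
    )
    return 100.0 * matched / len(target)


# Invented 1H shifts: two of the four target peaks are matched by the query.
target_peaks = [3.11, 2.08, 1.25, 0.90]
query_peaks = [3.12, 2.07]

score = match_percentage(query_peaks, target_peaks, tolerance=0.02)
print(score)          # 50.0
print(score >= 40.0)  # would pass a 40% threshold
print(score >= 60.0)  # would fail a 60% threshold
```

Because the denominator is the number of target peaks, a query with only a few peaks can score low against a target with many peaks even when every query peak is matched.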

Batch Mode

The "Batch Mode" component allows for automatic, high-throughput analysis of 1D-1H, 1D-13C, and 2D-[1H,13C]-HSQC/HMQC/HMBC data. For the best results, the NMR sample should be prepared under the same conditions as the standard samples used to build the database. For example, for a sample prepared in D2O, the pD should be adjusted to 7.4. HSQC spectra should be collected with high resolution in both dimensions (e.g., 4K x 1K). Data should be in Bruker, Varian, or NMRPipe peak list format, with chemical shifts referenced to DSS or TMS (see the examples below). The search uses chemical shift, line shape, peak intensity/width, and J-coupling patterns to search the experimental NMR data of standard compounds in the MMCD. The output is a downloadable table of compound identifications and relative concentrations.

NMR peak list format

1D data: the peak list can be flat text, a Bruker peak list, or a Varian peak list, as in the examples below:

(1). Flat text. Format: shift (ppm) intensity (arbitrary unit). Note: the intensity is optional, but no relative concentration is output if it is not provided. We suggest that users use 500 µM DSS as both the chemical shift reference and the concentration reference. The examples below are all acceptable:

with index (must be an integer), chemical shift, and intensity (note that 500 µM DSS was used as both the chemical shift and concentration reference):

1 3.1299 8.64
2 3.1106 12.50
3 3.0910 9.41
4 2.1185 1.24
5 2.0986 3.14
6 2.0795 4.67
7 2.0600 2.93
8 2.0408 1.13
9 0.0000 0.20

without index:

3.1299 8.64
3.1106 12.50
3.0910 9.41
2.1185 1.24
2.0986 3.14
2.0795 4.67
2.0600 2.93
2.0408 1.13
0.0000 0.20

without intensity and DSS concentration reference:

3.1298
3.1104
3.0908
2.1186
2.0988
2.0794
2.0598
2.0405

(2). Bruker peak list; users can use the raw peak list output of the Bruker XWIN-NMR software:

DU=C:/Bruker/XWIN-NMR, USER=cui, NAME=expnmr_00001, EXPNO=1, PROCNO=1
F1=3.295ppm, F2=-0.405ppm, MI=0.12cm, MAXI=36.00cm, PC=3.000
# ADDRESS FREQUENCY INTENSITY
[Hz] [PPM]
1 20924.4 1252.336 3.1298 10.54
2 20977.3 1244.571 3.1104 12.50
3 21030.9 1236.708 3.0908 11.46
4 23682.3 847.696 2.1186 1.33
5 23736.3 839.773 2.0988 2.92
6 23789.0 832.046 2.0794 4.95
7 23842.6 824.175 2.0598 2.66
8 23895.3 816.456 2.0405 1.17
9 29460.0 0.007 0.0000 0.20

(3). Varian peak list:

index freq(ppm) intensity
1 4.06946 99.3118
2 3.79149 28.1407
3 3.78844 33.7988
4 3.77826 55.0473
5 3.76807 52.5571
6 3.76095 24.9154
7 3.75789 27.7089
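The three flat-text 1D layouts shown under (1) differ only in the number of columns per line, so a reader can be sketched in a few lines of Python. The function parse_flat_1d is a hypothetical helper, not part of the MMCD; it assumes that a two-column line is "shift intensity" and that header lines are absent (the Bruker and Varian lists above would need their own handling).

```python
def parse_flat_1d(text):
    """Return a list of (shift_ppm, intensity_or_None) tuples from flat text."""
    peaks = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) == 3:    # index, shift, intensity
            shift, intensity = float(fields[1]), float(fields[2])
        elif len(fields) == 2:  # shift, intensity (assumed, no index)
            shift, intensity = float(fields[0]), float(fields[1])
        elif len(fields) == 1:  # shift only: no relative concentration possible
            shift, intensity = float(fields[0]), None
        else:
            raise ValueError(f"unexpected number of columns: {line!r}")
        peaks.append((shift, intensity))
    return peaks


print(parse_flat_1d("1 3.1299 8.64\n2 3.1106 12.50"))
print(parse_flat_1d("3.1299 8.64\n3.1106 12.50"))
print(parse_flat_1d("3.1298\n3.1104"))
```

Returning None for a missing intensity keeps the "no intensity, no relative concentration" case explicit for whatever code consumes the peak list.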

2D data: the peak list can be flat text or an NMRPipe peak table, as in the examples below:

(1). Flat text. Format: shift1 shift2 intensity (the intensity is optional; shift1 and shift2 are the proton and carbon chemical shifts in either order, and the program automatically determines which one is the carbon and which is the proton chemical shift).

142.610 8.527 2.234671e+08
155.508 8.210 2.113134e+08
89.362 6.125 2.463572e+08
76.947 4.808 1.692275e+08
72.941 4.630 1.699664e+08
86.700 4.407 2.331415e+08
67.766 4.295 7.329752e+07
67.778 4.236 8.865544e+07
67.772 4.209 4.514265e+07

(2). NMRPipe peak table:

REMARK ROI 2D Peak Detection System, File: ./test.ft2
REMARK Detection Levels: -1.99209e+17 and 1.99209e+07
REMARK Detection: X(+/-1) Y(+/-1)
REMARK Interpolation: X(+/-1) Y(+/-1)
REMARK Noise: 1.20996e+06, Chi2-Threshold: 1.000000e-04, Local Adjustment: None
REMARK Position Tolerances: X(2.0) Y(2.0)
REMARK Sinc Detect ON, Height Adjustments: X(1.800) Y(1.200)
REMARK Sinc Detect Linewidths: X(5.0Hz) Y(15.0Hz)
REMARK Total Peaks: 9, Good Peaks: 9, Questionable Peaks: 0
REMARK Clusters: 8, Max Cluster Size: 2
REMARK ROI Spectral Axis Limits:

DATA X_AXIS 13C 1 893 158.910ppm 65.013ppm
DATA Y_AXIS 1H 1 813 8.800ppm 4.036ppm

VARS INDEX X_AXIS Y_AXIS DX DY X_PPM Y_PPM X_HZ Y_HZ XW YW XW_HZ YW_HZ X1 X3 Y1 Y3 HEIGHT DHEIGHT VOL PCHI2 TYPE ASS CLUSTID MEMCNT
FORMAT %5d %9.3f %9.3f %6.3f %6.3f %8.3f %8.3f %9.3f %9.3f %7.3f %7.3f %8.3f %8.3f %4d %4d %4d %4d %+e %+e %+e %.5f %d %s %4d %4d

1 155.845 47.587 0.007 0.024 142.610 8.527 14349.982 3411.911 2.191 3.297 23.210 7.739 155 157 46 50 2.234671e+08 9.945491e+05 1.362484e+09 0.00000 1 None 1 1
2 33.315 101.692 0.010 0.024 155.508 8.210 15647.853 3284.899 2.323 3.392 24.606 7.962 32 35 100 104 2.113134e+08 9.987751e+05 1.431732e+09 0.00000 1 None 2 1
3 661.693 457.043 0.008 0.045 89.362 6.125 8991.906 2450.711 2.330 5.998 24.681 14.081 660 663 454 460 2.463572e+08 9.850588e+05 2.655729e+09 0.00000 1 None 3 1
4 779.625 681.453 0.017 0.058 76.947 4.808 7742.730 1923.906 2.545 4.163 26.953 9.774 778 781 678 684 1.692275e+08 1.008489e+06 1.927181e+09 0.00000 1 None 4 1
5 817.683 711.877 0.012 0.045 72.941 4.630 7339.610 1852.485 2.346 4.904 24.853 11.513 816 819 709 716 1.699664e+08 9.531154e+05 1.854321e+09 0.00000 1 None 5 1
6 686.980 749.819 0.006 0.024 86.700 4.407 8724.051 1763.416 2.327 4.200 24.652 9.859 685 689 747 753 2.331415e+08 9.084896e+05 2.411753e+09 0.00000 1 None 6 1
7 866.847 769.008 0.018 0.341 67.766 4.295 6818.852 1718.371 2.167 9.530 22.951 22.371 866 868 763 772 7.329752e+07 9.218510e+05 1.118694e+09 0.00000 1 None 7 1
8 866.734 779.058 0.021 0.069 67.778 4.236 6820.048 1694.779 2.328 4.571 24.662 10.729 866 868 776 782 8.865544e+07 9.518984e+05 8.768116e+08 0.00000 1 None 8 2
9 866.785 783.617 0.038 0.247 67.772 4.209 6819.507 1684.076 2.306 4.874 24.421 11.441 866 867 781 785 4.514265e+07 9.522889e+05 3.837641e+08 0.00000 1 None 8 2
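For readers who want to check an NMRPipe peak table before uploading it, the sketch below extracts the two chemical shifts and the peak height from each row. It relies only on what is visible in the example above: the VARS line names the columns, and the DATA lines name the nucleus of each axis. The function name parse_nmrpipe_tab is hypothetical, and this is not the parser used by the MMCD.

```python
def parse_nmrpipe_tab(text):
    """Return ({axis: nucleus}, [(x_ppm, y_ppm, height), ...]) from a peak table."""
    names = None
    axis_nuclei = {}
    peaks = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("REMARK", "FORMAT"):
            continue  # comments and printf-style format line are not needed
        if fields[0] == "DATA":
            # e.g. "DATA X_AXIS 13C 1 893 158.910ppm 65.013ppm"
            if len(fields) >= 3:
                axis_nuclei[fields[1]] = fields[2]
            continue
        if fields[0] == "VARS":
            names = fields[1:]  # column names for the peak rows
            continue
        if names is None:
            raise ValueError("peak row found before the VARS line")
        row = dict(zip(names, fields))
        peaks.append(
            (float(row["X_PPM"]), float(row["Y_PPM"]), float(row["HEIGHT"]))
        )
    return axis_nuclei, peaks


sample = """REMARK Total Peaks: 1
DATA X_AXIS 13C 1 893 158.910ppm 65.013ppm
DATA Y_AXIS 1H 1 813 8.800ppm 4.036ppm

VARS INDEX X_AXIS Y_AXIS X_PPM Y_PPM HEIGHT
FORMAT %5d %9.3f %9.3f %8.3f %8.3f %+e

1 155.845 47.587 142.610 8.527 2.234671e+08
"""
axes, peaks = parse_nmrpipe_tab(sample)
print(axes)
print(peaks)
```

Looking columns up by name rather than by position means the same code works for the full 25-column table above and for the shortened sample used here.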

MASS-Based Search

(Note: The default search is for average mass. This is appropriate for users who want to match an average formula weight to the database. Users who are interested in average mass searches should also try the MW search option in the Structure-Based Search module, which has several options that make it far more powerful.)

The MASS-Based Search module allows users to search by mass spectrometry data. Mass spec data are entered into the "Input Mass" box. The user can adjust the tolerance for mass comparison in the "Tol(ppm)" box. Four isotopomer compositions are available: C12N14, C13N14, C12N15, and C13N15.
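A tolerance given in ppm scales with the mass, which the following Python sketch makes concrete. The helper names within_ppm and ppm_window are hypothetical, and taking the deviation relative to the reference (database) mass is an assumption; the MMCD may define it relative to the input mass instead.

```python
def within_ppm(observed, reference, tol_ppm):
    """True if observed lies within tol_ppm parts per million of reference."""
    return abs(observed - reference) / reference * 1e6 <= tol_ppm


def ppm_window(mass, tol_ppm):
    """Return the (low, high) mass window for a given ppm tolerance."""
    delta = mass * tol_ppm / 1e6
    return mass - delta, mass + delta


# With a 10 ppm tolerance around 290.175, a 0.001 offset is accepted
# (about 3.4 ppm) and a 0.005 offset is rejected (about 17 ppm).
print(within_ppm(290.176, 290.175, 10))
print(within_ppm(290.180, 290.175, 10))
print(ppm_window(290.175, 10))
```

The same ppm value therefore corresponds to a wider absolute window at higher masses.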

The "Batch Mode" option allows users to input data from a file. The file can be in any of the following formats:

For LC-MS / DI-MS:

(1) simple format1:

In this format, each line contains only the mass number (header line can be omitted). For example:

mass (header line)
290.175
410.0992

(2) simple format2:

In this format, each line contains the retention time, m/z value, and intensity (header line can be omitted). For example:

RT mass intensity (header line)
23.794 290.175 25.53
25.896 410.0992 21.17

(3) full format: the file is laid out as in the following example, using a tab (\t) as the separator (header line can be omitted):

Query_ID	RT	mass	abundance	N (out of 5)
1	23.794	290.175	25.53	3
2	25.896	410.0992	21.17	3
3	43.473	190.1358	24.9	3
4	56.359	186.0798	113.52	2
5	31.15	478.2656	3.41	3
6	34.102	154.0539	28.46	3
7	31.865	260.0798	100.79	2
8	10.992	252.1119	27.98	3
9	48.712	690.3477	21.11	3
10	25.385	380.0882	23.66	3

For MS/MS:

(1) flat text format:

In this format, each line contains only the m/z number and intensity (header line can be omitted). For example:

m/z RI(%) (header line)
43.039 11.732
43.163 21.947
43.288 12.490
43.662 20.877
43.786 15.942
44.035 31.576
44.160 11.076
44.284 13.181
44.409 34.026
44.658 13.873
44.782 17.807
60.227 15.838
60.476 15.184

(2). JCAMP-DX format; for example:

##TITLE= L-Alanine
##JCAMP-DX= 5 $$home made
##DATA TYPE= MASS SPECTRUM
##DATA CLASS= PEAK TABLE
##ORIGIN= UWI, Mona, JAMAICA
##OWNER= Dr Robert Lancashire
##$URL= http://wwwchem.uwimona.edu.jm:1104/spectra
##SPECTROMETER/DATA SYSTEM= Finnigan
##INSTRUMENTAL PARAMETERS= LOW RESOLUTION
##.SPECTROMETER TYPE= TRAP
##.INLET= GC
##.IONIZATION MODE= EI+
##XUNITS= m/z
##YUNITS= relative abundance
##XFACTOR= 1
##YFACTOR= 1
##FIRSTX= 0
##LASTX= 89
##NPOINTS= 17
##FIRSTY= 0
##PEAK TABLE= (XY..XY)
0,0 38,3 39,9 40,24 41,38 42,106 43,39 44,1000 45,47 46,17 55,8 56,3
70,7 71,11 73,2 74,20 89,2
##END=
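The peak table of a JCAMP-DX file like the one above can be pulled out with a short Python sketch. It assumes the layout shown in the example, i.e. a "##PEAK TABLE= (XY..XY)" record followed by lines of comma-joined "m/z,abundance" pairs separated by spaces; other delimiters permitted by JCAMP-DX are not handled, and parse_jcamp_peak_table is a hypothetical name rather than an MMCD function.

```python
import re

# One "x,y" pair with optional sign and decimal point.
PAIR = re.compile(r"([-+]?\d*\.?\d+)\s*,\s*([-+]?\d*\.?\d+)")


def parse_jcamp_peak_table(text):
    """Return [(mz, abundance), ...] from the ##PEAK TABLE= record."""
    peaks = []
    in_table = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("##"):
            # Any new labelled record ends the table; PEAK TABLE starts it.
            label = stripped[2:].split("=", 1)[0].strip().upper()
            in_table = label == "PEAK TABLE"
            continue
        if in_table:
            for x, y in PAIR.findall(stripped):
                peaks.append((float(x), float(y)))
    return peaks


sample = """##TITLE= L-Alanine
##NPOINTS= 17
##PEAK TABLE= (XY..XY)
0,0 38,3 39,9 40,24 41,38 42,106 43,39 44,1000 45,47 46,17 55,8 56,3
70,7 71,11 73,2 74,20 89,2
##END=
"""
peaks = parse_jcamp_peak_table(sample)
print(len(peaks))                       # 17, matching ##NPOINTS
print(max(peaks, key=lambda p: p[1]))   # (44.0, 1000.0), the base peak
```

Comparing the number of parsed pairs with ##NPOINTS is a cheap sanity check before submitting a file.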

Miscellanea Search

The Miscellanea Search can be used to narrow down the results from another search. Users can filter their results by order status (MMCD collection of metabolites), category, input species, or inclusion/exclusion of specific criteria.

Order Status

This is relevant to the collection of standard compounds in the MMCD collection. If "DONE!" is selected, only compounds that have been received will be searched. Similarly, "Ordering!" will search compounds for which an order has been placed, but which have not yet been received, and "Not Ordered" will search compounds that have not been ordered.

Category

When "Whole Database" is selected, the search will include the entire MMCD database.

Selecting "Arabidopsis" limits the search to the Arabidopsis database. This database covers the mustard family, including the species Arabidopsis thaliana and Brassica napus (ath/eath/ebna). The "Dicotyledons" option will search the following species: ath/eath/ebna/ecsi/egar/egra/egma/elco/emtr/eptp/epba/ehan/elsa/eles/estu/evvi. For more details, visit the KEGG website at http://www.genome.ad.jp/kegg/catalog/org_list.html.

When "All Plants" is selected, all plant species will be searched. When "All Species" is selected, the search will include all species in the database.

Users are also able to enter a specific species in the "Input Species" box. Multiple species can be included, e.g., ath,hsa.

Filter

The "Filter" option allows users to specify data types to be included or excluded in their search. Selecting "NMR Data" will limit the search to those compounds for which experimental NMR data are available. The "KEGG ID," "PubChem SID," and "ChemIDplus ID" options will limit the search to compounds with a KEGG, PubChem, or ChemIDplus ID, respectively. "CAS" will only return compounds with a CAS registry number. The "Organism" option will limit the search to compounds with organism information. The "PDB Component" option will limit the search to compounds with a PDB Component ID.

 

Result Output

Normal output results include the formula, molecular weight, CAS, synonyms, and structure for each compound returned. Links are also included to the relevant pages on KEGG, PubChem, ChemIDplus, CHEBI, Exp NMR, HMDB, and NMRShiftDB. The primary entry ID (CQ ID) is listed at the top of each entry.

In batch mode, the results include a table of the compounds matching the input file. The complete results for a specific compound can be accessed by clicking on the compound's primary ID, located in the CQ_ID column.

Clicking on the primary ID or the structure figure of a compound will open the record detail page.
