Further, the choice of imputation method can substantially affect the results and interpretation of analyses of metabolomics data (Hrydziuszko and Viant 2012). An alternative strategy is to use methods that explicitly account for missing values. … biased estimates, with the bias increasing as the proportion of observations in the point mass increased, while estimates from the mixture model were unbiased unless all missing observations arose from censoring. These findings suggest using the AFT model for hypothesis testing and the mixture model for estimation. We demonstrate this approach through an application to glycomics data from serum samples of women with ovarian cancer and matched controls.

Keywords: point-mass mixture, accelerated failure time model, missing values, metabolomics, glycomics, mass spectrometry

== 1. Introduction ==

Mass spectrometry has become an important analytical technique for profiling a wide array of compounds, such as proteins, metabolites, lipids, and glycans, in biological samples. This technology allows investigators to identify and quantify the suite of compounds present in biological samples, and it is now widely used to identify compounds as potential diagnostic and prognostic markers, to understand the biological pathways of disease, and to identify potential therapeutic targets. Raw data from a mass spectrometry analysis consist of the observed mass-to-charge ratios, the retention times (if liquid or gas chromatography is used for separation prior to injection into the mass spectrometer), and a measure of ion intensity (Enot et al. 2011). The mass-to-charge ratios and retention times serve to identify unique compounds, while the ion intensity yields a measure of each compound's relative abundance. Extensive pre-processing of the raw data, including baseline correction, noise reduction, smoothing, peak detection and alignment, and peak integration, is necessary before analysis (Want and Masson 2011).
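The abstract contrasts an accelerated failure time (AFT) model with a point-mass mixture model. The following is a minimal sketch of a mixture log-likelihood under assumptions of our own, not necessarily the specification used in this work: log-intensities are normal with mean `mu` and standard deviation `sigma` when a compound is present, a single known detection limit applies on the log scale, and a point mass with probability `p_absent` represents true absence. A missing value contributes the probability of being absent or of being present but below the limit; an observed value contributes the normal density scaled by `1 - p_absent`. All function and parameter names are hypothetical.

```python
import math
from typing import Optional, Sequence


def normal_logpdf(x: float, mu: float, sigma: float) -> float:
    """Log density of Normal(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of Normal(mu, sigma^2) at x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def point_mass_mixture_loglik(
    values: Sequence[Optional[float]],
    p_absent: float,
    mu: float,
    sigma: float,
    detection_limit: float,
) -> float:
    """Log-likelihood under a hypothetical point-mass / censored-normal mixture.

    Assumptions (illustrative only): log-intensity ~ Normal(mu, sigma) when the
    compound is present; present values below `detection_limit` (log scale) go
    unobserved; the compound is truly absent with probability `p_absent`.
    `None` marks a missing value.
    """
    if not (0.0 <= p_absent < 1.0) or sigma <= 0.0:
        raise ValueError("require 0 <= p_absent < 1 and sigma > 0")
    p_below = normal_cdf(detection_limit, mu, sigma)
    total = 0.0
    for y in values:
        if y is None:
            # Missing: truly absent, or present but censored below the limit.
            total += math.log(p_absent + (1.0 - p_absent) * p_below)
        else:
            # Observed: present, with normal density at the observed value.
            total += math.log(1.0 - p_absent) + normal_logpdf(y, mu, sigma)
    return total
```

With `p_absent = 0` the expression reduces to a left-censored normal likelihood, the form a log-normal AFT model takes when every missing value is treated as censored. In practice the parameters would be estimated by numerical maximization, for example by passing the negative log-likelihood to `scipy.optimize.minimize`.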
The final output of this processing is a data matrix consisting of the unique compounds identified and the intensity of each compound in each sample. Depending on the specific technology used, the focus of the study (e.g., glycomics, proteomics, metabolomics), and the sample characteristics, hundreds to thousands of compounds can be identified. However, a common characteristic of the data from these studies is a large number of missing values. Hrydziuszko and Viant (2012) reported that 51.67% to 78.73% of the peaks in four metabolomics data sets contained missing values, with the overall percentage of missing values ranging from 14.63% to 28.53%. Wang et al. (2012) note that 20 to 40% of potential values are commonly missing in quantitative mass spectrometry-based proteomics. In our glycomics studies, we have observed that 55% to over 90% of the glycans have missing observations for at least one sample and that the overall percentage of missing values ranges from 20% to 80%. For individual glycans, the percentage of missing values can range from none to more than 90%, for example when a glycan is detected in only one sample.

Missing values can result from several mechanisms, which may depend on the technology in use. A compound can be present in a sample but at a concentration below the detection limit of the mass spectrometer. Alternatively, a compound could be truly absent from a sample for biological reasons. For example, a biological pathway could be suppressed in certain genetic variants such that compounds of this pathway are absent in these variants (Kliebenstein et al. 2001; Kliebenstein et al. 2001a; Burow et al. 2010). Finally, a compound can be present in a sample at a level above the detection limit but fail to be detected because of technical issues related to sample preparation or processing (Bell 2009; Eidehammer et al. 2013; Michalski et al. 2011; Wang 2009).
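The figures quoted above rely on two different summaries: the percentage of compounds (peaks) with at least one missing value, and the percentage of all cells in the data matrix that are missing. As a minimal sketch, assuming a samples-by-compounds matrix stored as nested lists with `None` marking missing values, both can be computed as follows; the function name and toy values are hypothetical and not taken from any cited study.

```python
from typing import List, Optional, Tuple

# Rows are samples, columns are compounds; None marks a missing value.
Matrix = List[List[Optional[float]]]


def missingness_summary(matrix: Matrix) -> Tuple[float, float]:
    """Return (% of compounds with >= 1 missing value, overall % of cells missing)."""
    if not matrix or not matrix[0]:
        raise ValueError("matrix must have at least one sample and one compound")
    n_samples = len(matrix)
    n_compounds = len(matrix[0])
    missing_cells = 0
    compounds_with_missing = 0
    for j in range(n_compounds):
        # Count missing entries in this compound's column.
        column_missing = sum(1 for i in range(n_samples) if matrix[i][j] is None)
        missing_cells += column_missing
        if column_missing > 0:
            compounds_with_missing += 1
    pct_compounds = 100.0 * compounds_with_missing / n_compounds
    pct_overall = 100.0 * missing_cells / (n_samples * n_compounds)
    return pct_compounds, pct_overall


# Hypothetical toy data: 3 samples x 4 compounds.
toy = [
    [12.1, None, 8.4, 10.0],
    [11.8, 9.2, None, 10.3],
    [12.5, None, None, 9.9],
]
print(missingness_summary(toy))
```

In the toy matrix, half of the compounds have at least one missing value, but only a third of all cells are missing. Because the overall percentage is the average of the per-compound missing fractions, each of which is at most 1, the per-peak percentage can never be smaller than the overall percentage.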
Regardless of the mechanism, in all three cases the compound would be reported as a missing value in the resulting data set. Missing data can be classified according to the properties of the processes causing the missingness (Little and Rubin 2002). Data are missing completely at random (MCAR) if the probability that an observation is missing is unrelated to its value and to the values of other variables. Missing values resulting from technical measurement errors reflect an MCAR process because whether an observation is missing is independent of the value it would have had if observed and of the values of other peaks. In contrast, censored compounds are missing because they occur at low abundances, below the detection limit of the instrumentation. These values are missing not at random (MNAR) because the probability that an observation is missing depends on its unobserved value.
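To make the MCAR/MNAR distinction concrete, the following small simulation sketch compares the mean of the observed values with the mean of all values. It assumes normally distributed log-intensities, a 30% random dropout rate for the MCAR case, and a hypothetical detection limit of 9.5 for the censored case; all of these are illustrative values, not values from the studies above. Both mechanisms remove roughly 30% of observations, but only censoring shifts the observed mean upward, because it preferentially removes low values.

```python
import random
import statistics
from typing import Tuple


def simulate_missingness(n: int = 10_000, seed: int = 1) -> Tuple[float, float, float]:
    """Return means of (all values, MCAR-observed values, censoring-observed values)."""
    rng = random.Random(seed)
    # Hypothetical log-intensities for one compound present in every sample.
    true_values = [rng.gauss(10.0, 1.0) for _ in range(n)]
    # MCAR: each value is lost with probability 0.3, independent of its value.
    mcar_observed = [y for y in true_values if rng.random() >= 0.3]
    # MNAR (left-censoring): values below an assumed detection limit are never observed.
    detection_limit = 9.5
    mnar_observed = [y for y in true_values if y >= detection_limit]
    return (
        statistics.mean(true_values),
        statistics.mean(mcar_observed),
        statistics.mean(mnar_observed),
    )


full_mean, mcar_mean, mnar_mean = simulate_missingness()
print(f"all: {full_mean:.2f}  MCAR observed: {mcar_mean:.2f}  censored observed: {mnar_mean:.2f}")
```

Under MCAR, analyzing only the observed values loses precision but not accuracy; under censoring, the observed values overstate typical abundance, which is why methods that model the censoring explicitly are needed.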