# Created by Octave 3.6.1, Mon Apr 23 21:08:02 2012 UTC # name: cache # type: cell # rows: 3 # columns: 81 # name: # type: sq_string # elements: 1 # length: 12 bland_altman # name: # type: sq_string # elements: 1 # length: 865 BLAND_ALTMAN shows the Bland-Altman plot of two columns of measurements and computes several summary results. bland_altman(m1, m2 [,group]) bland_altman(data [, group]) R = bland_altman(...) m1,m2 are two columns with the same number of elements containing the measurements. m1,m2 can also be combined in a single two-column data matrix. group [optional] indicates which measurements belong to the same group; this is useful to account for repeated measurements. References: [1] JM Bland and DG Altman, Measuring agreement in method comparison studies. Statistical Methods in Medical Research, 1999; 8; 135. doi:10.1177/09622802990080204 [2] P.S. Myles, Using the Bland–Altman method to measure agreement with repeated measures. British Journal of Anaesthesia 99(3):309–11 (2007) doi:10.1093/bja/aem214 # name: # type: sq_string # elements: 1 # length: 80 BLAND_ALTMAN shows the Bland-Altman plot of two columns of measurements and # name: # type: sq_string # elements: 1 # length: 7 cat2bin # name: # type: sq_string # elements: 1 # length: 755 CAT2BIN converts categorical data into binary data: each category of each column in C is converted into a logical column B = cat2bin(C); [B,BinLabel] = cat2bin(C,Label); [B,BinLabel] = cat2bin(C,Label,MODE) C categorical data B binary data Label description of each column in C BinLabel description of each column in B MODE default [], ignores NaN 'notIgnoreNAN' includes binary column for NaN 'IgnoreZeros' zeros do not get a separate category 'IgnoreZeros+NaN' zeros and NaN are ignored example: cat2bin([1;2;5;1;5]) results in
 1 0 0
 0 1 0
 0 0 1
 1 0 0
 0 0 1
# name: # type: sq_string # elements: 1 # length: 80 CAT2BIN converts categorical data into binary data: each category of each column # name: # type: sq_string # elements: 1 # length: 7 cdfplot # name: # type: sq_string # elements: 1 # length: 565 CDFPLOT plots the empirical cumulative distribution function cdfplot(X) cdfplot(X, FMT) cdfplot(X, PROPERTY, VALUE,...) h = cdfplot(...) [h,stats] = cdfplot(X) X contains the data vector (matrix data is currently changed to a vector, this might change in the future) FMT,PROPERTY,VALUE are used for formatting; see HELP PLOT for more details h graphics handle to the cdf curve stats a struct containing various summary statistics including mean, std, median, min, max.
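As an illustration of the cdfplot call forms above, a minimal sketch (it assumes the NaN toolbox is installed so that this cdfplot is the one on the path; the data and format string are illustrative only):
    x = [randn(100,1); NaN; NaN];   % sample with two missing values, which are ignored
    h = cdfplot(x, 'r-');           % plot the empirical CDF as a red line
    [h, stats] = cdfplot(x);        % also return the summary-statistics struct
    stats.median                    % field names as documented above (mean, std, median, min, max)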
see also: ecdf, median, statistics, hist2res, plot References: # name: # type: sq_string # elements: 1 # length: 59 CDFPLOT plots empirical commulative distribution function # name: # type: sq_string # elements: 1 # length: 6 center # name: # type: sq_string # elements: 1 # length: 505 CENTER removes the mean [z,mu] = center(x,DIM,W) removes mean x along dimension DIM x input data DIM dimension 1: column 2: row default or []: first DIMENSION, with more than 1 element W weights to computed weighted mean (default: [], all weights = 1) numel(W) must be equal to size(x,DIM) features: - can deal with NaN's (missing values) - weighting of data - dimension argument - compatible to Matlab and Octave see also: SUMSKIPNAN, MEAN, STD, DETREND, ZSCORE REFERENCE(S): # name: # type: sq_string # elements: 1 # length: 26 CENTER removes the mean # name: # type: sq_string # elements: 1 # length: 8 classify # name: # type: sq_string # elements: 1 # length: 792 CLASSIFY classifies sample data into categories defined by the training data and its group information CLASS = classify(sample, training, group) CLASS = classify(sample, training, group, TYPE) [CLASS,ERR,POSTERIOR,LOGP,COEF] = CLASSIFY(...) CLASS contains the assigned group. ERR is the classification error on the training set weighted by the prior propability of each group. The same classifier as in TRAIN_SC are supported. ATTENTION: no cross-validation is applied, therefore the classification error is too optimistic (overfitting). Use XVAL instead to obtain cross-validated performance. see also: TRAIN_SC, TEST_SC, XVAL References: [1] R. Duda, P. Hart, and D. Stork, Pattern Classification, second ed. John Wiley & Sons, 2001. # name: # type: sq_string # elements: 1 # length: 80 CLASSIFY classifies sample data into categories defined by the training data # name: # type: sq_string # elements: 1 # length: 24 coefficient_of_variation # name: # type: sq_string # elements: 1 # length: 221 COEFFICIENT_OF_VARIATION returns STD(X)/MEAN(X) cv=coefficient_of_variation(x [,DIM]) cv=std(x)/mean(x) see also: SUMSKIPNAN, MEAN, STD REFERENCE(S): http://mathworld.wolfram.com/VariationCoefficient.html # name: # type: sq_string # elements: 1 # length: 80 COEFFICIENT_OF_VARIATION returns STD(X)/MEAN(X) cv=coefficient_of_variation( # name: # type: sq_string # elements: 1 # length: 3 cor # name: # type: sq_string # elements: 1 # length: 576 COR calculates the correlation matrix X and Y can contain missing values encoded with NaN. NaN's are skipped, NaN do not result in a NaN output. (Its assumed that the occurence of NaN's is uncorrelated) The output gives NaN only if there are insufficient input data COR(X); calculates the (auto-)correlation matrix of X COR(X,Y); calculates the crosscorrelation between X and Y c = COR(...); c is the correlation matrix W weights to compute weighted mean (default: []) if W=[], all weights are 1. number of elements in W must match size(x,DIM) # name: # type: sq_string # elements: 1 # length: 80 COR calculates the correlation matrix X and Y can contain missing values encod # name: # type: sq_string # elements: 1 # length: 8 corrcoef # name: # type: sq_string # elements: 1 # length: 4692 CORRCOEF calculates the correlation matrix from pairwise correlations. The input data can contain missing values encoded with NaN. Missing data (NaN's) are handled by pairwise deletion [15]. In order to avoid possible pitfalls, use case-wise deletion or or check the correlation of NaN's with your data (see below). 
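A brief sketch of pairwise versus case-wise deletion as discussed above (the 'rows' option is documented below; the data are illustrative only):
    X = [1 2; 2 NaN; 3 4; 4 5; 5 7];                         % one missing value in the second column
    [R, p]   = corrcoef(X(:,1), X(:,2));                     % pairwise deletion (default)
    [Rc, pc] = corrcoef(X(:,1), X(:,2), 'rows', 'complete'); % case-wise deletion via the 'rows' option
    ix = ~any(isnan(X), 2);                                  % explicit case-wise deletion
    Rx = corrcoef(X(ix,1), X(ix,2));                         % same result as the 'complete' variant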
A significance test for testing the Hypothesis 'correlation coefficient R is significantly different to zero' is included. [...] = CORRCOEF(X); calculates the (auto-)correlation matrix of X [...] = CORRCOEF(X,Y); calculates the crosscorrelation between X and Y [...] = CORRCOEF(..., Mode); Mode='Pearson' or 'parametric' [default] gives the correlation coefficient also known as the 'product-moment coefficient of correlation' or 'Pearson''s correlation' [1] Mode='Spearman' gives 'Spearman''s Rank Correlation Coefficient' This replaces SPEARMAN.M Mode='Rank' gives a nonparametric Rank Correlation Coefficient This is the "Spearman rank correlation with proper handling of ties" This replaces RANKCORR.M [...] = CORRCOEF(..., param1, value1, param2, value2, ... ); param value 'Mode' type of correlation 'Pearson','parametric' 'Spearman' 'rank' 'rows' how do deal with missing values encoded as NaN's. 'complete': remove all rows with at least one NaN 'pairwise': [default] 'alpha' 0.01 : significance level to compute confidence interval [R,p,ci1,ci2,nansig] = CORRCOEF(...); R is the correlation matrix R(i,j) is the correlation coefficient r between X(:,i) and Y(:,j) p gives the significance of R It tests the null hypothesis that the product moment correlation coefficient is zero using Student's t-test on the statistic t = r*sqrt(N-2)/sqrt(1-r^2) where N is the number of samples (Statistics, M. Spiegel, Schaum series). p > alpha: do not reject the Null hypothesis: 'R is zero'. p < alpha: The alternative hypothesis 'R is larger than zero' is true with probability (1-alpha). ci1 lower (1-alpha) confidence interval ci2 upper (1-alpha) confidence interval If no alpha is provided, the default alpha is 0.01. This can be changed with function flag_implicit_significance. nan_sig p-value whether H0: 'NaN''s are not correlated' could be correct if nan_sig < alpha, H1 ('NaNs are correlated') is very likely. The result is only valid if the occurence of NaN's is uncorrelated. In order to avoid this pitfall, the correlation of NaN's should be checked or case-wise deletion should be applied. Case-Wise deletion can be implemented ix = ~any(isnan([X,Y]),2); [...] = CORRCOEF(X(ix,:),Y(ix,:),...); Correlation (non-random distribution) of NaN's can be checked with [nan_R,nan_sig]=corrcoef(X,isnan(X)) or [nan_R,nan_sig]=corrcoef([X,Y],isnan([X,Y])) or [R,p,ci1,ci2] = CORRCOEF(...); Further recommandation related to the correlation coefficient: + LOOK AT THE SCATTERPLOTS to make sure that the relationship is linear + Correlation is not causation because it is not clear which parameter is 'cause' and which is 'effect' and the observed correlation between two variables might be due to the action of other, unobserved variables. see also: SUMSKIPNAN, COVM, COV, COR, SPEARMAN, RANKCORR, RANKS, PARTCORRCOEF, flag_implicit_significance REFERENCES: on the correlation coefficient [ 1] http://mathworld.wolfram.com/CorrelationCoefficient.html [ 2] http://www.geography.btinternet.co.uk/spearman.htm [ 3] Hogg, R. V. and Craig, A. T. Introduction to Mathematical Statistics, 5th ed. New York: Macmillan, pp. 338 and 400, 1995. [ 4] Lehmann, E. L. and D'Abrera, H. J. M. Nonparametrics: Statistical Methods Based on Ranks, rev. ed. Englewood Cliffs, NJ: Prentice-Hall, pp. 292, 300, and 323, 1998. [ 5] Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; and Vetterling, W. T. Numerical Recipes in FORTRAN: The Art of Scientific Computing, 2nd ed. Cambridge, England: Cambridge University Press, pp. 
634-637, 1992 [ 6] http://mathworld.wolfram.com/SpearmanRankCorrelationCoefficient.html on the significance test of the correlation coefficient [11] http://www.met.rdg.ac.uk/cag/STATS/corr.html [12] http://www.janda.org/c10/Lectures/topic06/L24-significanceR.htm [13] http://faculty.vassar.edu/lowry/ch4apx.html [14] http://davidmlane.com/hyperstat/B134689.html [15] http://www.statsoft.com/textbook/stbasic.html%Correlations others [20] http://www.tufts.edu/~gdallal/corr.htm [21] Fisher transformation http://en.wikipedia.org/wiki/Fisher_transformation # name: # type: sq_string # elements: 1 # length: 71 CORRCOEF calculates the correlation matrix from pairwise correlations. # name: # type: sq_string # elements: 1 # length: 3 cov # name: # type: sq_string # elements: 1 # length: 1606 COV covariance matrix X and Y can contain missing values encoded with NaN. NaN's are skipped, NaN do not result in a NaN output. The output gives NaN only if there are insufficient input data The mean is removed from the data. Remark: for data contains missing values, the resulting matrix might not be positiv definite, and its elements have magnitudes larger than one. This ill-behavior is more likely for small sample sizes, but there is no garantee that the result "behaves well" for larger sample sizes. If you want the a "well behaved" result (i.e. positive definiteness and magnitude of elements not larger than 1), use CORRCOEF. However, COV is faster than CORRCOEF and might be good enough in some cases. C = COV(X [,Mode]); calculates the (auto-)correlation matrix of X C = COV(X,Y [,Mode]); calculates the crosscorrelation between X and Y. C(i,j) is the correlation between the i-th and jth column of X and Y, respectively. NOTE: Octave and Matlab have (in some special cases) incompatible implemenations. This implementation follows Octave. If the result could be ambigous or incompatible, a warning will be presented in Matlab. To avoid this warning use: a) use COV([X(:),Y(:)]) if you want the traditional Matlab result. b) use C = COV([X,Y]), C = C(1:size(X,2),size(X,2)+1:size(C,2)); if you want to be compatible with this software. Mode = 0 [default] scales C by (N-1) Mode = 1 scales C by N. see also: COVM, COR, CORRCOEF, SUMSKIPNAN REFERENCES: http://mathworld.wolfram.com/Covariance.html # name: # type: sq_string # elements: 1 # length: 76 COV covariance matrix X and Y can contain missing values encoded with NaN. # name: # type: sq_string # elements: 1 # length: 4 covm # name: # type: sq_string # elements: 1 # length: 1182 COVM generates covariance matrix X and Y can contain missing values encoded with NaN. NaN's are skipped, NaN do not result in a NaN output. The output gives NaN only if there are insufficient input data COVM(X,Mode); calculates the (auto-)correlation matrix of X COVM(X,Y,Mode); calculates the crosscorrelation between X and Y COVM(...,W); weighted crosscorrelation Mode = 'M' minimum or standard mode [default] C = X'*X; or X'*Y correlation matrix Mode = 'E' extended mode C = [1 X]'*[1 X]; % l is a matching column of 1's C is additive, i.e. it can be applied to subsequent blocks and summed up afterwards the mean (or sum) is stored on the 1st row and column of C Mode = 'D' or 'D0' detrended mode the mean of X (and Y) is removed. If combined with extended mode (Mode='DE'), the mean (or sum) is stored in the 1st row and column of C. The default scaling is factor (N-1). Mode = 'D1' is the same as 'D' but uses N for scaling. C = covm(...); C is the scaled by N in Mode M and by (N-1) in mode D. 
[C,N] = covm(...); C is not scaled, provides the scaling factor N C./N gives the scaled version. see also: DECOVM, XCOVF # name: # type: sq_string # elements: 1 # length: 80 COVM generates covariance matrix X and Y can contain missing values encoded wi # name: # type: sq_string # elements: 1 # length: 13 cumsumskipnan # name: # type: sq_string # elements: 1 # length: 249 CUMSUMSKIPNAN Cumulative sum while skiping NaN's. If DIM is omitted, it defaults to the first non-singleton dimension. Y = cumsumskipnan(x [,DIM]) x input data DIM dimension (default: []) y resulting sum see also: CUMSUM, SUMSKIPNAN # name: # type: sq_string # elements: 1 # length: 51 CUMSUMSKIPNAN Cumulative sum while skiping NaN's. # name: # type: sq_string # elements: 1 # length: 6 decovm # name: # type: sq_string # elements: 1 # length: 384 decompose extended covariance matrix into mean (mu), standard deviation, the (pure) Covariance (COV), correlation (xc) matrix and the correlation coefficients R2. NaN's are condsidered as missing values. [mu,sd,COV,xc,N,R2]=decovm(ECM[,NN]) ECM is the extended covariance matrix NN is the number of elements, each estimate (in ECM) is based on see also: MDBC, COVM, R2 # name: # type: sq_string # elements: 1 # length: 80 decompose extended covariance matrix into mean (mu), standard deviation, the # name: # type: sq_string # elements: 1 # length: 7 detrend # name: # type: sq_string # elements: 1 # length: 837 DETREND removes the trend from data, NaN's are considered as missing values DETREND is fully compatible to previous Matlab and Octave DETREND with the following features added: - handles NaN's by assuming that these are missing values - handles unequally spaced data - second output parameter gives the trend of the data - compatible to Matlab and Octave [...]=detrend([t,] X [,p]) removes trend for unequally spaced data t represents the time points X(i) is the value at time t(i) p must be a scalar [...]=detrend(X,0) [...]=detrend(X,'constant') removes the mean [...]=detrend(X,p) removes polynomial of order p (default p=1) [...]=detrend(X,1) - default [...]=detrend(X,'linear') removes linear trend [X,T]=detrend(...) X is the detrended data T is the removed trend see also: SUMSKIPNAN, ZSCORE # name: # type: sq_string # elements: 1 # length: 80 DETREND removes the trend from data, NaN's are considered as missing values # name: # type: sq_string # elements: 1 # length: 4 ecdf # name: # type: sq_string # elements: 1 # length: 443 ECDF empirical cumulative function NaN's are considered Missing values and are ignored. [F,X] = ecdf(Y) calculates empirical cumulative distribution functions (i.e Kaplan-Meier estimate) ecdf(Y) ecdf(gca,Y) without output arguments plots the empirical cdf, in axis gca. Y input data must be a vector or matrix, in case Y is a matrix, the ecdf for every column is computed. see also: HISTO2, HISTO3, PERCENTILE, QUANTILE # name: # type: sq_string # elements: 1 # length: 80 ECDF empirical cumulative function NaN's are considered Missing values and # name: # type: sq_string # elements: 1 # length: 19 flag_accuracy_level # name: # type: sq_string # elements: 1 # length: 1033 FLAG_ACCURACY_LEVEL sets and gets accuracy level used in SUMSKIPNAN_MEX and COVM_MEX The error margin of the naive summation is N*eps (N is the number of samples), the error margin is only 2*eps if Kahan's summation is used [1]. 
0: maximum speed [default] accuracy of double (64bit) with naive summation (error = N*2^-52) 1: accuracy of extended (80bit) with naive summation (error = N*2^-64) 2: accuracy of double (64bit) with Kahan summation (error = 2^-52) 3: accuracy of extended (80bit) with Kahan summation (error = 2^-64) Please note, level 3 might be equally accurate but slower than 1 or 2 on some platforms. In order to determine what is good for you, you might want to run ACCTEST. FLAG = flag_accuracy_level() gets current level flag_accuracy_level(FLAG) sets accuracy level see also: ACCTEST Reference: [1] David Goldberg, What Every Computer Scientist Should Know About Floating-Point Arithmetic ACM Computing Surveys, Vol 23, No 1, March 1991. # name: # type: sq_string # elements: 1 # length: 80 FLAG_ACCURACY_LEVEL sets and gets accuracy level used in SUMSKIPNAN_MEX and # name: # type: sq_string # elements: 1 # length: 26 flag_implicit_significance # name: # type: sq_string # elements: 1 # length: 928 The use of FLAG_IMPLICIT_SIGNIFICANCE is in experimental state. flag_implicit_significance might even become obsolete. FLAG_IMPLICIT_SIGNIFICANCE sets and gets default alpha (level) of any significance test The default alpha-level is stored in the global variable FLAG_implicit_significance The idea is that the significance must not be assigned explicitely. This might yield more readable code. Choose alpha low enough, because in alpha*100% of the cases, you will reject the Null hypothesis just by change. For this reason, the default alpha is 0.01. flag_implicit_significance(0.01) sets the alpha-level for the significance test alpha = flag_implicit_significance() gets default alpha flag_implicit_significance(alpha) sets default alpha-level alpha = flag_implicit_significance(alpha) gets and sets alpha features: - compatible to Matlab and Octave see also: CORRCOEF, PARTCORRCOEF # name: # type: sq_string # elements: 1 # length: 64 The use of FLAG_IMPLICIT_SIGNIFICANCE is in experimental state. # name: # type: sq_string # elements: 1 # length: 22 flag_implicit_skip_nan # name: # type: sq_string # elements: 1 # length: 934 FLAG_IMPLICIT_SKIP_NAN sets and gets default mode for handling NaNs 1 skips NaN's (the default mode if no mode is set) 0 NaNs are propagated; input NaN's give NaN's at the output FLAG = flag_implicit_skip_nan() gets current mode flag_implicit_skip_nan(FLAG) sets mode prevFLAG = flag_implicit_skip_nan(nextFLAG) gets previous set FLAG and sets FLAG for the future flag_implicit_skip_nan(prevFLAG) resets FLAG to previous mode It is used in: SUMSKIPNAN, MEDIAN, QUANTILES, TRIMEAN and affects many other functions like: CENTER, KURTOSIS, MAD, MEAN, MOMENT, RMS, SEM, SKEWNESS, STATISTIC, STD, VAR, ZSCORE etc. The mode is stored in the global variable FLAG_implicit_skip_nan It is recommended to use flag_implicit_skip_nan(1) as default and flag_implicit_skip_nan(0) should be used for exceptional cases only. This feature might disappear without further notice, so you should really not rely on it. # name: # type: sq_string # elements: 1 # length: 80 FLAG_IMPLICIT_SKIP_NAN sets and gets default mode for handling NaNs 1 skips Na # name: # type: sq_string # elements: 1 # length: 17 flag_nans_occured # name: # type: sq_string # elements: 1 # length: 430 FLAG_NANS_OCCURED checks whether the last call(s) to sumskipnan or covm contained any not-a-numbers in the input argument. Because many other functions like mean, std, etc. are also using sumskipnan, also these functions can be checked for NaN's in the input data. 
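A small sketch of how this flag can be queried (it assumes the toolbox versions of sumskipnan and mean are in use; the exact return type is not specified here and is treated as a logical flag):
    sumskipnan([1 NaN 3]);       % input contains a NaN
    flag_nans_occured()          % expected to be true/nonzero
    mean([1 2 3]);               % no NaN here; mean uses sumskipnan internally
    flag_nans_occured()          % expected to be false/zero, since the previous query reset the flag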
A call to FLAG_NANS_OCCURED() resets also the flag whether NaN's occured. Only sumskipnan or covm can set the flag again. see also: SUMSKIPNAN, COVM # name: # type: sq_string # elements: 1 # length: 80 FLAG_NANS_OCCURED checks whether the last call(s) to sumskipnan or covm conta # name: # type: sq_string # elements: 1 # length: 3 fss # name: # type: sq_string # elements: 1 # length: 1739 FSS - feature subset selection and feature ranking the method is motivated by the max-relevance-min-redundancy (mRMR) approach [1]. However, the default method uses partial correlation, which has been developed from scratch. PCCM [3] describes a similar idea, but is more complicated. An alternative method based on FSDD is implemented, too. [idx,score] = fss(D,cl) [idx,score] = fss(D,cl,MODE) [idx,score] = fss(D,cl,MODE) D data - each column represents a feature cl classlabel Mode 'Pearson' [default] correlation 'rank' correlation 'FSDD' feature selection algorithm based on a distance discriminant [2] %%% 'MRMR','MID','MIQ' max-relevance, min redundancy [1] - not supported yet. score score of the feature idx ranking of the feature [tmp,idx]=sort(-score) see also: TRAIN_SC, XVAL, ROW_COL_DELETION REFERENCES: [1] Peng, H.C., Long, F., and Ding, C., Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 8, pp.1226-1238, 2005. [2] Jianning Liang, Su Yang, Adam Winstanley, Invariant optimal feature selection: A distance discriminant and feature ranking based solution, Pattern Recognition, Volume 41, Issue 5, May 2008, Pages 1429-1439. ISSN 0031-3203, DOI: 10.1016/j.patcog.2007.10.018. [3] K. Raghuraj Rao and S. Lakshminarayanan Partial correlation based variable selection approach for multivariate data classification methods Chemometrics and Intelligent Laboratory Systems Volume 86, Issue 1, 15 March 2007, Pages 68-81 http://dx.doi.org/10.1016/j.chemolab.2006.08.007 # name: # type: sq_string # elements: 1 # length: 80 FSS - feature subset selection and feature ranking the method is motivated # name: # type: sq_string # elements: 1 # length: 7 geomean # name: # type: sq_string # elements: 1 # length: 1207 GEOMEAN calculates the geomentric mean of data elements. y = geomean(x [,DIM [,W]]) is the same as y = mean(x,'G' [,DIM]) DIM dimension 1 STD of columns 2 STD of rows default or []: first DIMENSION, with more than 1 element W weights to compute weighted mean (default: []) if W=[], all weights are 1. number of elements in W must match size(x,DIM) features: - can deal with NaN's (missing values) - weighting of data - dimension argument also in Octave - compatible to Matlab and Octave see also: SUMSKIPNAN, MEAN, HARMMEAN This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; If not, see . # name: # type: sq_string # elements: 1 # length: 57 GEOMEAN calculates the geomentric mean of data elements. 
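To make the relation between the mean variants concrete, a short sketch (values are illustrative; harmmean is documented below):
    x  = [1 2 4 8 NaN];        % one missing value
    g1 = geomean(x)            % NaN is skipped: (1*2*4*8)^(1/4) = 2.8284
    g2 = mean(x, 'G')          % the same result via mean with option 'G'
    h  = harmmean([1 2 4])     % harmonic mean: 3/(1 + 1/2 + 1/4) = 1.7143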
# name: # type: sq_string # elements: 1 # length: 8 gscatter # name: # type: sq_string # elements: 1 # length: 471 GSCATTER scatter plot of groups gscatter(x,y,group) gscatter(x,y,group,clr,sym,siz) gscatter(x,y,group,clr,sym,siz,doleg) gscatter(x,y,group,clr,sym,siz,doleg,xname,yname) h = gscatter(...) x,y, group: vectors with equal length clf: color vector, default 'bgrcmyk' sym: symbol, default '.' siz: size of Marker doleg: 'on' (default) shows legend, 'off' turns of legend xname, yname: name of axis see also: ecdf, cdfplot References: # name: # type: sq_string # elements: 1 # length: 34 GSCATTER scatter plot of groups # name: # type: sq_string # elements: 1 # length: 8 harmmean # name: # type: sq_string # elements: 1 # length: 629 HARMMEAN calculates the harmonic mean of data elements. The harmonic mean is the inverse of the mean of the inverse elements. y = harmmean(x [,DIM [,W]]) is the same as y = mean(x,'H' [,DIM [,W]]) DIM dimension 1 STD of columns 2 STD of rows default or []: first DIMENSION, with more than 1 element W weights to compute weighted mean (default: []) if W=[], all weights are 1. number of elements in W must match size(x,DIM) features: - can deal with NaN's (missing values) - weighting of data - dimension argument also in Octave - compatible to Matlab and Octave see also: SUMSKIPNAN, MEAN, GEOMEAN # name: # type: sq_string # elements: 1 # length: 56 HARMMEAN calculates the harmonic mean of data elements. # name: # type: sq_string # elements: 1 # length: 8 hist2res # name: # type: sq_string # elements: 1 # length: 700 Evaluates Histogram data [R]=hist2res(H) [y]=hist2res(H,fun) estimates fun-statistic fun 'mean' mean 'std' standard deviation 'var' variance 'sem' standard error of the mean 'rms' root mean square 'meansq' mean of squares 'sum' sum 'sumsq' sum of squares 'CM#' central moment of order # 'skewness' skewness 'kurtosis' excess coefficient (Fisher kurtosis) see also: NaN/statistic REFERENCES: [1] C.L. Nikias and A.P. Petropulu "Higher-Order Spectra Analysis" Prentice Hall, 1993. [2] C.E. Shannon and W. Weaver "The mathematical theory of communication" University of Illinois Press, Urbana 1949 (reprint 1963). [3] http://www.itl.nist.gov/ [4] http://mathworld.wolfram.com/ # name: # type: sq_string # elements: 1 # length: 43 Evaluates Histogram data [R]=hist2res(H) # name: # type: sq_string # elements: 1 # length: 3 iqr # name: # type: sq_string # elements: 1 # length: 372 IQR calculates the interquartile range Missing values (encoded as NaN) are ignored. Q = iqr(Y) Q = iqr(Y,DIM) returns the IQR along dimension DIM of sample array Y. Q = iqr(HIS) returns the IQR from the histogram HIS. HIS must be a HISTOGRAM struct as defined in HISTO2 or HISTO3. see also: MAD, RANGE, HISTO2, HISTO3, PERCENTILE, QUANTILE # name: # type: sq_string # elements: 1 # length: 80 IQR calculates the interquartile range Missing values (encoded as NaN) are # name: # type: sq_string # elements: 1 # length: 5 kappa # name: # type: sq_string # elements: 1 # length: 1760 KAPPA estimates Cohen's kappa coefficient and related statistics [...] = kappa(d1,d2); NaN's are handled as missing values and are ignored [...] = kappa(d1,d2,'notIgnoreNAN'); NaN's are handled as just another Label. [kap,sd,H,z,ACC,sACC,MI] = kappa(...); X = kappa(...); d1 data of scorer 1 d2 data of scorer 2 kap Cohen's kappa coefficient point se standard error of the kappa estimate H Concordance matrix, i.e. 
confusion matrix z z-score ACC overall agreement (accuracy) sACC specific accuracy MI Mutual information or transfer information (in [bits]) X is a struct containing all the fields above For two classes, a number of additional summary statistics including TPR, FPR, FDR, PPV, NPF, F1, dprime, Matthews Correlation coefficient (MCC) or Phi coefficient (PHI=MCC), Specificity and Sensitivity are provided. Note, the positive category must the larger label (in d and c), otherwise the confusion matrix becomes transposed and the summary statistics are messed up. Reference(s): [1] Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46. [2] J Bortz, GA Lienert (1998) Kurzgefasste Statistik f|r die klassische Forschung, Springer Berlin - Heidelberg. Kapitel 6: Uebereinstimmungsmasze fuer subjektive Merkmalsurteile. p. 265-270. [3] http://www.cmis.csiro.au/Fiona.Evans/personal/msc/html/chapter3.html [4] Kraemer, H. C. (1982). Kappa coefficient. In S. Kotz and N. L. Johnson (Eds.), Encyclopedia of Statistical Sciences. New York: John Wiley & Sons. [5] http://ourworld.compuserve.com/homepages/jsuebersax/kappa.htm [6] http://en.wikipedia.org/wiki/Receiver_operating_characteristic # name: # type: sq_string # elements: 1 # length: 70 KAPPA estimates Cohen's kappa coefficient and related statistics # name: # type: sq_string # elements: 1 # length: 8 kurtosis # name: # type: sq_string # elements: 1 # length: 461 KURTOSIS estimates the kurtosis y = kurtosis(x,DIM) calculates kurtosis of x in dimension DIM DIM dimension 1: STATS of columns 2: STATS of rows default or []: first DIMENSION, with more than 1 element features: - can deal with NaN's (missing values) - dimension argument - compatible to Matlab and Octave see also: SUMSKIPNAN, VAR, STD, VAR, SKEWNESS, MOMENT, STATISTIC, IMPLICIT_SKIP_NAN REFERENCE(S): http://mathworld.wolfram.com/ # name: # type: sq_string # elements: 1 # length: 33 KURTOSIS estimates the kurtosis # name: # type: sq_string # elements: 1 # length: 15 load_fisheriris # name: # type: sq_string # elements: 1 # length: 446 LOAD_FISHERIRIS loads famous iris data set from Fisher, 1936 [1]. References: [1] Fisher,R.A. "The use of multiple measurements in taxonomic problems" Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to Mathematical Statistics" (John Wiley, NY, 1950). [2] Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis. (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218. # name: # type: sq_string # elements: 1 # length: 75 LOAD_FISHERIRIS loads famous iris data set from Fisher, 1936 [1]. # name: # type: sq_string # elements: 1 # length: 3 mad # name: # type: sq_string # elements: 1 # length: 855 MAD estimates the Mean Absolute deviation (note that according to [1,2] this is the mean deviation; not the mean absolute deviation) y = mad(x,DIM) calculates the mean deviation of x in dimension DIM DIM dimension 1: STATS of columns 2: STATS of rows default or []: first DIMENSION, with more than 1 element features: - can deal with NaN's (missing values) - dimension argument - compatible to Matlab and Octave see also: SUMSKIPNAN, VAR, STD, REFERENCE(S): [1] http://mathworld.wolfram.com/MeanDeviation.html [2] L. Sachs, "Applied Statistics: A Handbook of Techniques", Springer-Verlag, 1984, page 253. [3] http://mathworld.wolfram.com/MeanAbsoluteDeviation.html [4] Kenney, J. F. and Keeping, E. S. "Mean Absolute Deviation." §6.4 in Mathematics of Statistics, Pt. 1, 3rd ed. 
Princeton, NJ: Van Nostrand, pp. 76-77 1962. # name: # type: sq_string # elements: 1 # length: 80 MAD estimates the Mean Absolute deviation (note that according to [1,2] this i # name: # type: sq_string # elements: 1 # length: 5 mahal # name: # type: sq_string # elements: 1 # length: 395 MAHAL return the Mahalanobis' D-square distance between the multivariate samples x and y, which must have the same number of components (columns), but may have a different number of observations (rows). d = mahal(X,Y) d(k) = (X(k,:)-MU)*inv(SIGMA)*(X(k,:)-MU)' where MU and SIGMA are the mean and the covariance matrix of Y see also: TRAIN_SC, TEST_SC, COVM References: # name: # type: sq_string # elements: 1 # length: 80 MAHAL return the Mahalanobis' D-square distance between the multivariate samp # name: # type: sq_string # elements: 1 # length: 4 make # name: # type: sq_string # elements: 1 # length: 46 This make.m is used for Matlab under Windows # name: # type: sq_string # elements: 1 # length: 11 This make. # name: # type: sq_string # elements: 1 # length: 4 mean # name: # type: sq_string # elements: 1 # length: 735 MEAN calculates the mean of data elements. y = mean(x [,DIM] [,opt] [, W]) DIM dimension 1 MEAN of columns 2 MEAN of rows N MEAN of N-th dimension default or []: first DIMENSION, with more than 1 element opt options 'A' arithmetic mean 'G' geometric mean 'H' harmonic mean W weights to compute weighted mean (default: []) if W=[], all weights are 1. number of elements in W must match size(x,DIM) usage: mean(x) mean(x,DIM) mean(x,opt) mean(x,opt,DIM) mean(x,DIM,opt) mean(x,DIM,W) mean(x,DIM,opt,W); ' features: - can deal with NaN's (missing values) - weighting of data - dimension argument also in Octave - compatible to Matlab and Octave see also: SUMSKIPNAN, MEAN, GEOMEAN, HARMMEAN # name: # type: sq_string # elements: 1 # length: 43 MEAN calculates the mean of data elements. # name: # type: sq_string # elements: 1 # length: 7 meandev # name: # type: sq_string # elements: 1 # length: 856 MEANDEV estimates the Mean deviation (note that according to [1,2] this is the mean deviation; not the mean absolute deviation) y = meandev(x,DIM) calculates the mean deviation of x in dimension DIM DIM dimension 1: STATS of columns 2: STATS of rows default or []: first DIMENSION, with more than 1 element features: - can deal with NaN's (missing values) - dimension argument - compatible to Matlab and Octave see also: SUMSKIPNAN, VAR, STD, MAD REFERENCE(S): [1] http://mathworld.wolfram.com/MeanDeviation.html [2] L. Sachs, "Applied Statistics: A Handbook of Techniques", Springer-Verlag, 1984, page 253. [3] http://mathworld.wolfram.com/MeanAbsoluteDeviation.html [4] Kenney, J. F. and Keeping, E. S. "Mean Absolute Deviation." §6.4 in Mathematics of Statistics, Pt. 1, 3rd ed. Princeton, NJ: Van Nostrand, pp. 76-77 1962. # name: # type: sq_string # elements: 1 # length: 80 MEANDEV estimates the Mean deviation (note that according to [1,2] this is the # name: # type: sq_string # elements: 1 # length: 6 meansq # name: # type: sq_string # elements: 1 # length: 527 MEANSQ calculates the mean of the squares y = meansq(x,DIM,W) DIM dimension 1 STD of columns 2 STD of rows N STD of N-th dimension default or []: first DIMENSION, with more than 1 element W weights to compute weighted mean (default: []) if W=[], all weights are 1. 
number of elements in W must match size(x,DIM) features: - can deal with NaN's (missing values) - weighting of data - dimension argument also in Octave - compatible to Matlab and Octave see also: SUMSQ, SUMSKIPNAN, MEAN, VAR, STD, RMS # name: # type: sq_string # elements: 1 # length: 43 MEANSQ calculates the mean of the squares # name: # type: sq_string # elements: 1 # length: 9 medAbsDev # name: # type: sq_string # elements: 1 # length: 373 medAbsDev calculates the median absolute deviation Usage: D = medAbsDev(X, DIM) or: [D, M] = medAbsDev(X, DIM) Input: X : data DIM: dimension along which mad should be calculated (1=columns, 2=rows) (optional, default=first dimension with more than 1 element Output: D : median absolute deviations M : medians (optional) # name: # type: sq_string # elements: 1 # length: 53 medAbsDev calculates the median absolute deviation # name: # type: sq_string # elements: 1 # length: 6 median # name: # type: sq_string # elements: 1 # length: 366 MEDIAN data elements, [y]=median(x [,DIM]) DIM dimension 1: median of columns 2: median of rows N: median of N-th dimension default or []: first DIMENSION, with more than 1 element features: - can deal with NaN's (missing values) - accepts dimension argument like in Matlab in Octave, too. - compatible to Matlab and Octave see also: SUMSKIPNAN # name: # type: sq_string # elements: 1 # length: 46 MEDIAN data elements, [y]=median(x [,DIM]) # name: # type: sq_string # elements: 1 # length: 6 moment # name: # type: sq_string # elements: 1 # length: 627 MOMENT estimates the p-th moment M = moment(x, p [,opt] [,DIM]) M = moment(H, p [,opt]) calculates p-th central moment from data x in dimension DIM of from Histogram H p moment of order p opt 'ac': absolute 'a' and/or central ('c') moment DEFAULT: '' raw moments are estimated DIM dimension 1: STATS of columns 2: STATS of rows default or []: first DIMENSION, with more than 1 element features: - can deal with NaN's (missing values) - dimension argument - compatible to Matlab and Octave see also: STD, VAR, SKEWNESS, KURTOSIS, STATISTIC, REFERENCE(S): http://mathworld.wolfram.com/Moment.html # name: # type: sq_string # elements: 1 # length: 80 MOMENT estimates the p-th moment M = moment(x, p [,opt] [,DIM]) M = moment # name: # type: sq_string # elements: 1 # length: 7 nanconv # name: # type: sq_string # elements: 1 # length: 616 NANCONV computes the convolution for data with missing values. X and Y can contain missing values encoded with NaN. NaN's are skipped, NaN do not result in a NaN output. The output gives NaN only if there are insufficient input data [...] = NANCONV(X,Y); calculates 2-dim convolution between X and Y [C] = NANCONV(X,Y); WARNING: missing values can introduce aliasing - causing unintended results. Moreover, the behavior of bandpass and highpass filters in case of missing values is not fully understood, and might contain some pitfalls. see also: CONV, NANCONV2, NANFFT, NANFILTER # name: # type: sq_string # elements: 1 # length: 63 NANCONV computes the convolution for data with missing values. # name: # type: sq_string # elements: 1 # length: 6 nanfft # name: # type: sq_string # elements: 1 # length: 618 NANFFT calculates the Fourier-Transform of X for data with missing values. NANFFT is the same as FFT but X can contain missing values encoded with NaN. NaN's are skipped, NaN do not result in a NaN output. Y = NANFFT(X) Y = NANFFT(X,N) Y = NANFFT(X,[],DIM) [Y,N] = NANFFT(...) 
returns the number of valid samples N WARNING: missing values can introduce aliasing - causing unintended results. Moreover, the behavior of bandpass and highpass filters in case of missing values is not fully understood, and might contain some pitfalls. see also: FFT, XCORR, NANCONV, NANFILTER # name: # type: sq_string # elements: 1 # length: 75 NANFFT calculates the Fourier-Transform of X for data with missing values. # name: # type: sq_string # elements: 1 # length: 9 nanfilter # name: # type: sq_string # elements: 1 # length: 519 NANFILTER is able to filter data with missing values encoded as NaN. [Y,Z] = nanfilter(B,A,X [, Z]); If X contains no missing data, NANFILTER should behave like FILTER. NaN-values are handled gracefully. WARNING: missing values can introduce aliasing - causing unintended results. Moreover, the behavior of bandpass and highpass filters in case of missing values is not fully understood, and might contain some pitfalls. see also: FILTER, SUMSKIPNAN, NANFFT, NANCONV, NANFILTER1UC # name: # type: sq_string # elements: 1 # length: 69 NANFILTER is able to filter data with missing values encoded as NaN. # name: # type: sq_string # elements: 1 # length: 12 nanfilter1uc # name: # type: sq_string # elements: 1 # length: 257 NANFILTER1UC is an adaptive filter for data with missing values encoded as NaN. [Y,Z] = nanfilter1uc(uc,X [, Z]); if X contains no missing data, NANFILTER behaves like FILTER(uc,[1,uc-1],X[,Z]). see also: FILTER, NANFILTER, SUMSKIPNAN # name: # type: sq_string # elements: 1 # length: 80 NANFILTER1UC is an adaptive filter for data with missing values encoded as NaN. # name: # type: sq_string # elements: 1 # length: 11 naninsttest # name: # type: sq_string # elements: 1 # length: 112 NANINSTTEST checks whether the functions from NaN-toolbox have been correctly installed. see also: NANTEST # name: # type: sq_string # elements: 1 # length: 80 NANINSTTEST checks whether the functions from NaN-toolbox have been correctly # name: # type: sq_string # elements: 1 # length: 7 nanmean # name: # type: sq_string # elements: 1 # length: 330 NANMEAN same as SUM but ignores NaN's. NANMEAN is OBSOLETE; use MEAN instead. NANMEAN is included to provide backward compatibility Y = nanmean(x [,DIM]) DIM dimension 1 sum of columns 2 sum of rows default or []: first DIMENSION with more than 1 element Y resulting mean see also: MEAN, SUMSKIPNAN, NANSUM # name: # type: sq_string # elements: 1 # length: 39 NANMEAN same as SUM but ignores NaN's. # name: # type: sq_string # elements: 1 # length: 6 nanstd # name: # type: sq_string # elements: 1 # length: 518 NANSTD same as STD but ignores NaN's. NANSTD is OBSOLETE; use NaN/STD instead. NANSTD is included to fix a bug in alternative implementations and to provide some compatibility. Y = nanstd(x, FLAG, [,DIM]) x data FLAG 0: [default] normalizes with (N-1), N = sample size FLAG 1: normalizes with N, N = sample size DIM dimension 1 sum of columns 2 sum of rows default or []: first DIMENSION with more than 1 element Y resulting standard deviation see also: SUM, SUMSKIPNAN, NANSUM, STD # name: # type: sq_string # elements: 1 # length: 38 NANSTD same as STD but ignores NaN's. # name: # type: sq_string # elements: 1 # length: 6 nansum # name: # type: sq_string # elements: 1 # length: 333 NANSUM same as SUM but ignores NaN's. NANSUM is OBSOLETE; use SUMSKIPNAN instead. NANSUM is included to fix a bug in some other versions. 
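A short sketch contrasting the obsolete wrappers above with the recommended replacements (the nansum call forms follow below; the data are illustrative):
    x = [1 NaN 3];
    nanmean(x)                 % 2, same as mean(x) with implicit NaN skipping
    nanstd(x)                  % standard deviation over the two valid values
    nansum(x)                  % 4
    [y, n] = sumskipnan(x)     % y = 4 and n = 2 valid elements, so y/n reproduces nanmean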
Y = nansum(x [,DIM]) DIM dimension 1 sum of columns 2 sum of rows default or []: first DIMENSION with more than 1 element Y resulting sum see also: SUM, SUMSKIPNAN, NANSUM # name: # type: sq_string # elements: 1 # length: 38 NANSUM same as SUM but ignores NaN's. # name: # type: sq_string # elements: 1 # length: 7 nantest # name: # type: sq_string # elements: 1 # length: 366 NANTEST checks several mathematical operations and a few statistical functions for their correctness related to NaN's. E.g., it checks norminv, normcdf, normpdf, sort, matrix division and multiplication. see also: NANINSTTEST REFERENCE(S): [1] W. Kahan (1996) Lecture notes on the status of "IEEE Standard 754 for Binary Floating-Point Arithmetic". # name: # type: sq_string # elements: 1 # length: 80 NANTEST checks several mathematical operations and a few statistical function # name: # type: sq_string # elements: 1 # length: 7 normcdf # name: # type: sq_string # elements: 1 # length: 290 NORMCDF returns the normal cumulative distribution function cdf = normcdf(x,m,s); Computes the CDF of the normal distribution with mean m and standard deviation s default: m=0; s=1; x,m,s must be matrices of same size, or any one can be a scalar. see also: NORMPDF, NORMINV # name: # type: sq_string # elements: 1 # length: 56 NORMCDF returns the normal cumulative distribution function # name: # type: sq_string # elements: 1 # length: 7 norminv # name: # type: sq_string # elements: 1 # length: 341 NORMINV returns the inverse cumulative function of the normal distribution x = norminv(p,m,s); Computes the quantile (inverse of the CDF) of the normal cumulative distribution with mean m and standard deviation s default: m=0; s=1; p,m,s must be matrices of same size, or any one can be a scalar. see also: NORMPDF, NORMCDF # name: # type: sq_string # elements: 1 # length: 72 NORMINV returns the inverse cumulative function of the normal distribution # name: # type: sq_string # elements: 1 # length: 7 normpdf # name: # type: sq_string # elements: 1 # length: 279 NORMPDF returns the normal probability density pdf = normpdf(x,m,s); Computes the PDF of the normal distribution with mean m and standard deviation s default: m=0; s=1; x,m,s must be matrices of same size, or any one can be a scalar. see also: NORMCDF, NORMINV # name: # type: sq_string # elements: 1 # length: 45 NORMPDF returns the normal probability density # name: # type: sq_string # elements: 1 # length: 12 partcorrcoef # name: # type: sq_string # elements: 1 # length: 2015 PARTCORRCOEF calculates the partial correlation between X and Y after removing the influence of Z. X, Y and Z can contain missing values encoded with NaN. NaN's are skipped; NaN's do not result in a NaN output. (It is assumed that the occurrence of NaN's is uncorrelated.) The output gives NaN only if there are insufficient input data. The partial correlation is defined as pcc(xy|z) = (cc(x,y) - cc(x,z)*cc(y,z)) / sqrt((1-cc(x,z)^2)*(1-cc(y,z)^2)) PARTCORRCOEF(X [,Mode]); calculates the (auto-)correlation matrix of X PARTCORRCOEF(X,Y,Z); PARTCORRCOEF(X,Y,Z,[]); PARTCORRCOEF(X,Y,Z,'Pearson'); PARTCORRCOEF(X,Y,Z,'Rank'); PARTCORRCOEF(X,Y,Z,'Spearman'); Mode=[] [default] removes from X and Y the part that can be explained by Z and computes the correlation of the remaining part. Ideally, this is equivalent to Mode='Pearson'; in practice, however, it is more accurate. Mode='Pearson' or 'parametric' Mode='Spearman' Mode='Rank' computes the partial correlation based on cc(x,y), cc(x,z) and cc(y,z) with the respective mode.
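A minimal usage sketch of the call forms above (the output arguments are described next; the data are illustrative only):
    z = randn(100,1);                           % common confounder
    x = z + 0.5*randn(100,1);
    y = z + 0.5*randn(100,1);
    r_raw      = corrcoef(x, y);                % raw correlation, inflated by the shared influence of z
    [r_xy_z,p] = partcorrcoef(x, y, z);         % partial correlation after removing the influence of z
    r_rank     = partcorrcoef(x, y, z, 'Rank'); % rank-based variant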
[R,p,ci1,ci2] = PARTCORRCOEF(...); r is the partialcorrelation matrix r(i,j) is the partial correlation coefficient r between X(:,i) and Y(:,j) when influence of Z is removed. p gives the significance of PCC It tests the null hypothesis that the product moment correlation coefficient is zero using Student's t-test on the statistic t = r sqrt(N-Nz-2)/sqrt(1-r^2) where N is the number of samples (Statistics, M. Spiegel, Schaum series). p > alpha: do not reject the Null hypothesis: "R is zero". p < alpha: The alternative hypothesis "R2 is larger than zero" is true with probability (1-alpha). ci1 lower 0.95 confidence interval ci2 upper 0.95 confidence interval see also: SUMSKIPNAN, COVM, COV, COR, SPEARMAN, RANKCORR, RANKS, CORRCOEF REFERENCES: on the partial correlation coefficient [1] http://www.tufts.edu/~gdallal/partial.htm [2] http://www.nag.co.uk/numeric/fl/manual/pdf/G02/g02byf.pdf # name: # type: sq_string # elements: 1 # length: 80 PARTCORRCOEF calculates the partial correlation between X and Y after removing # name: # type: sq_string # elements: 1 # length: 10 percentile # name: # type: sq_string # elements: 1 # length: 554 PERCENTILE calculates the percentiles of histograms and sample arrays. Q = percentile(Y,q) Q = percentile(Y,q,DIM) returns the q-th percentile along dimension DIM of sample array Y. size(Q) is equal size(Y) except for dimension DIM which is size(Q,DIM)=length(Q) Q = percentile(HIS,q) returns the q-th percentile from the histogram HIS. HIS must be a HISTOGRAM struct as defined in HISTO2 or HISTO3. If q is a vector, the each row of Q returns the q(i)-th percentile see also: HISTO2, HISTO3, QUANTILE # name: # type: sq_string # elements: 1 # length: 71 PERCENTILE calculates the percentiles of histograms and sample arrays. # name: # type: sq_string # elements: 1 # length: 7 prctile # name: # type: sq_string # elements: 1 # length: 576 PRCTILE calculates the percentiles of histograms and sample arrays. (its the same than PERCENTILE.M) Q = prctile(Y,q) Q = prctile(Y,q,DIM) returns the q-th percentile along dimension DIM of sample array Y. size(Q) is equal size(Y) except for dimension DIM which is size(Q,DIM)=length(Q) Q = prctile(HIS,q) returns the q-th percentile from the histogram HIS. HIS must be a HISTOGRAM struct as defined in HISTO2 or HISTO3. If q is a vector, the each row of Q returns the q(i)-th percentile see also: HISTO2, HISTO3, QUANTILE # name: # type: sq_string # elements: 1 # length: 68 PRCTILE calculates the percentiles of histograms and sample arrays. # name: # type: sq_string # elements: 1 # length: 8 quantile # name: # type: sq_string # elements: 1 # length: 528 QUANTILE calculates the quantiles of histograms and sample arrays. Q = quantile(Y,q) Q = quantile(Y,q,DIM) returns the q-th quantile along dimension DIM of sample array Y. size(Q) is equal size(Y) except for dimension DIM which is size(Q,DIM)=length(Q) Q = quantile(HIS,q) returns the q-th quantile from the histogram HIS. HIS must be a HISTOGRAM struct as defined in HISTO2 or HISTO3. If q is a vector, the each row of Q returns the q(i)-th quantile see also: HISTO2, HISTO3, PERCENTILE # name: # type: sq_string # elements: 1 # length: 67 QUANTILE calculates the quantiles of histograms and sample arrays. # name: # type: sq_string # elements: 1 # length: 5 range # name: # type: sq_string # elements: 1 # length: 371 RANGE calculates the range of Y Missing values (encoded as NaN) are ignored. Q = range(Y) Q = range(Y,DIM) returns the range along dimension DIM of sample array Y. 
Q = range(HIS) returns the RANGE from the histogram HIS. HIS must be a HISTOGRAM struct as defined in HISTO2 or HISTO3. see also: IQR, MAD, HISTO2, HISTO3, PERCENTILE, QUANTILE # name: # type: sq_string # elements: 1 # length: 80 RANGE calculates the range of Y Missing values (encoded as NaN) are ignored. # name: # type: sq_string # elements: 1 # length: 8 rankcorr # name: # type: sq_string # elements: 1 # length: 668 RANKCORR calculated the rank correlation coefficient. This function is replaced by CORRCOEF. Significance test and confidence intervals can be obtained from CORRCOEF, too. R = CORRCOEF(X, [Y, ] 'Rank'); The rank correlation r = corrcoef(ranks(x)). is often confused with Spearman's rank correlation. Spearman's correlation is defined as r(x,y) = 1-6*sum((ranks(x)-ranks(y)).^2)/(N*(N*N-1)) The results are different. Here, the former version is implemented. see also: CORRCOEF, SPEARMAN, RANKS REFERENCES: [1] http://mathworld.wolfram.com/SpearmanRankCorrelationCoefficient.html [2] http://mathworld.wolfram.com/CorrelationCoefficient.html # name: # type: sq_string # elements: 1 # length: 54 RANKCORR calculated the rank correlation coefficient. # name: # type: sq_string # elements: 1 # length: 5 ranks # name: # type: sq_string # elements: 1 # length: 1062 RANKS gives the rank of each element in a vector. This program uses an advanced algorithm with averge effort O(m.n.log(n)) NaN in the input yields NaN in the output. r = ranks(X[,DIM]) if X is a vector, return the vector of ranks of X adjusted for ties. if X is matrix, the rank is calculated along dimension DIM. if DIM is zero or empty, the lowest dimension with more then 1 element is used. r = ranks(X,DIM,'traditional') implements the traditional algorithm with O(n^2) computational and O(n^2) memory effort r = ranks(X,DIM,'mtraditional') implements the traditional algorithm with O(n^2) computational and O(n) memory effort r = ranks(X,DIM,'advanced ') implements an advanced algorithm with O(n*log(n)) computational and O(n.log(n)) memory effort r = ranks(X,DIM,'advanced-ties') implements an advanced algorithm with O(n*log(n)) computational and O(n.log(n)) memory effort but without correction for ties This is the fastest algorithm see also: CORRCOEF, SPEARMAN, RANKCORR REFERENCES: -- # name: # type: sq_string # elements: 1 # length: 50 RANKS gives the rank of each element in a vector. # name: # type: sq_string # elements: 1 # length: 3 rms # name: # type: sq_string # elements: 1 # length: 560 RMS calculates the root mean square can deal with complex data. y = rms(x,DIM,W) DIM dimension 1 STD of columns 2 STD of rows N STD of N-th dimension default or []: first DIMENSION, with more than 1 element W weights to compute weighted s.d. (default: []) if W=[], all weights are 1. number of elements in W must match size(x,DIM) y estimated standard deviation features: - can deal with NaN's (missing values) - weighting of data - dimension argument also in Octave - compatible to Matlab and Octave see also: SUMSKIPNAN, MEAN # name: # type: sq_string # elements: 1 # length: 67 RMS calculates the root mean square can deal with complex data. # name: # type: sq_string # elements: 1 # length: 16 row_col_deletion # name: # type: sq_string # elements: 1 # length: 739 ROW_COL_DELETION selects the rows and columns for removing any missing values. A heuristic based on maximizing the number of remaining sample values is used. In other words, if there are more rows than columns, it is more likely that a row-wise deletion will be applied and vice versa. 
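A small sketch of how this heuristic is typically used — the exact call forms and arguments are listed right after this example (the data are illustrative):
    d = [1 2 NaN; 4 5 6; 7 NaN 9; 10 11 12];   % data with scattered missing values
    [rix, cix] = row_col_deletion(d);          % rows/columns selected so that no NaN remains
    dc = d(rix, cix);                          % d(rix,cix) contains no NaN's
    any(isnan(dc(:)))                          % expected to be 0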
[rix,cix] = row_col_deletion(d) [rix,cix] = row_col_deletion(d,c,w) Input: d data (each row is a sample, each column a feature) c classlabels (not really used) [OPTIONAL] w weight for each sample vector [OPTIONAL] Output: rix selected samples cix selected columns d(rix,cix) does not contain any NaN's i.e. missing values see also: TRAIN_SC, TEST_SC # name: # type: sq_string # elements: 1 # length: 79 ROW_COL_DELETION selects the rows and columns for removing any missing values. # name: # type: sq_string # elements: 1 # length: 3 sem # name: # type: sq_string # elements: 1 # length: 695 SEM calculates the standard error of the mean [SE,M] = SEM(x [, DIM [,W]]) calculates the standard error (SE) in dimension DIM the default DIM is the first non-single dimension M returns the mean. Can deal with complex data, too. DIM dimension 1: SEM of columns 2: SEM of rows N: SEM of N-th dimension default or []: first DIMENSION, with more than 1 element W weights to compute weighted mean and s.d. (default: []) if W=[], all weights are 1. number of elements in W must match size(x,DIM) features: - can deal with NaN's (missing values) - weighting of data - dimension argument - compatible to Matlab and Octave see also: SUMSKIPNAN, MEAN, VAR, STD # name: # type: sq_string # elements: 1 # length: 80 SEM calculates the standard error of the mean [SE,M] = SEM(x [, DIM [,W]]) # name: # type: sq_string # elements: 1 # length: 8 skewness # name: # type: sq_string # elements: 1 # length: 405 SKEWNESS estimates the skewness y = skewness(x,DIM) calculates skewness of x in dimension DIM DIM dimension 1: STATS of columns 2: STATS of rows default or []: first DIMENSION, with more than 1 element features: - can deal with NaN's (missing values) - dimension argument - compatible to Matlab and Octave see also: SUMSKIPNAN, STATISTIC REFERENCE(S): http://mathworld.wolfram.com/ # name: # type: sq_string # elements: 1 # length: 34 SKEWNESS estimates the skewness # name: # type: sq_string # elements: 1 # length: 8 spearman # name: # type: sq_string # elements: 1 # length: 683 SPEARMAN Spearman's rank correlation coefficient. This function is replaced by CORRCOEF. Significance test and confidence intervals can be obtained from CORRCOEF. [R,p,ci1,ci2] = CORRCOEF(x, [y, ] 'Rank'); For some (unknown) reason, in previous versions Spearman's rank correlation r = corrcoef(ranks(x)). But according to [1], Spearman's correlation is defined as r = 1-6*sum((ranks(x)-ranks(y)).^2)/(N*(N*N-1)) The results are different. Here, the later version is implemented. see also: CORRCOEF, RANKCORR REFERENCES: [1] http://mathworld.wolfram.com/SpearmanRankCorrelationCoefficient.html [2] http://mathworld.wolfram.com/CorrelationCoefficient.html # name: # type: sq_string # elements: 1 # length: 50 SPEARMAN Spearman's rank correlation coefficient. # name: # type: sq_string # elements: 1 # length: 9 statistic # name: # type: sq_string # elements: 1 # length: 938 STATISTIC estimates various statistics at once. 
R = STATISTIC(x,DIM) calculates all statistic (see list of fun) in dimension DIM R is a struct with all statistics y = STATISTIC(x,fun) estimate of fun on dimension DIM y gives the statistic of fun DIM dimension 1: STATS of columns 2: STATS of rows N: STATS of N-th dimension default or []: first DIMENSION, with more than 1 element fun 'mean' mean 'std' standard deviation 'var' variance 'sem' standard error of the mean 'rms' root mean square 'meansq' mean of squares 'sum' sum 'sumsq' sum of squares 'CM#' central moment of order # 'skewness' skewness 'kurtosis' excess coefficient (Fisher kurtosis) 'mad' mean absolute deviation features: - can deal with NaN's (missing values) - dimension argument - compatible to Matlab and Octave see also: SUMSKIPNAN REFERENCE(S): [1] http://www.itl.nist.gov/ [2] http://mathworld.wolfram.com/ # name: # type: sq_string # elements: 1 # length: 48 STATISTIC estimates various statistics at once. # name: # type: sq_string # elements: 1 # length: 3 std # name: # type: sq_string # elements: 1 # length: 983 STD calculates the standard deviation. [y,v] = std(x [, opt[, DIM [, W]]]) opt option 0: normalizes with N-1 [default] provides the square root of best unbiased estimator of the variance 1: normalizes with N, this provides the square root of the second moment around the mean otherwise: best unbiased estimator of the standard deviation (see [1]) DIM dimension N STD of N-th dimension default or []: first DIMENSION, with more than 1 element W weights to compute weighted s.d. (default: []) if W=[], all weights are 1. number of elements in W must match size(x,DIM) y estimated standard deviation features: - provides an unbiased estimation of the S.D. - can deal with NaN's (missing values) - weighting of data - dimension argument also in Octave - compatible to Matlab and Octave see also: RMS, SUMSKIPNAN, MEAN, VAR, MEANSQ, References(s): [1] http://mathworld.wolfram.com/StandardDeviationDistribution.html # name: # type: sq_string # elements: 1 # length: 39 STD calculates the standard deviation. # name: # type: sq_string # elements: 1 # length: 10 sumskipnan # name: # type: sq_string # elements: 1 # length: 1234 SUMSKIPNAN adds all non-NaN values. All NaN's are skipped; NaN's are considered as missing values. SUMSKIPNAN of NaN's only gives O; and the number of valid elements is return. SUMSKIPNAN is also the elementary function for calculating various statistics (e.g. MEAN, STD, VAR, RMS, MEANSQ, SKEWNESS, KURTOSIS, MOMENT, STATISTIC etc.) from data with missing values. SUMSKIPNAN implements the DIMENSION-argument for data with missing values. Also the second output argument return the number of valid elements (not NaNs) Y = sumskipnan(x [,DIM]) [Y,N,SSQ] = sumskipnan(x [,DIM]) [...] = sumskipnan(x, DIM, W) x input data DIM dimension (default: []) empty DIM sets DIM to first non singleton dimension W weight vector for weighted sum, numel(W) must fit size(x,DIM) Y resulting sum N number of valid (not missing) elements SSQ sum of squares the function FLAG_NANS_OCCURED() returns whether any value in x is a not-a-number (NaN) features: - can deal with NaN's (missing values) - implements dimension argument. - computes weighted sum - compatible with Matlab and Octave see also: FLAG_NANS_OCCURED, SUM, NANSUM, MEAN, STD, VAR, RMS, MEANSQ, SSQ, MOMENT, SKEWNESS, KURTOSIS, SEM # name: # type: sq_string # elements: 1 # length: 36 SUMSKIPNAN adds all non-NaN values. 
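A small sketch showing how the outputs of sumskipnan yield mean and variance in the presence of missing values (the data are illustrative; the numbers in the comments follow directly from the definition):
    x = [2 4 NaN 6];
    [Y, N, SSQ] = sumskipnan(x)        % Y = 12, N = 3 valid elements, SSQ = 56
    m = Y ./ N                         % mean over the valid elements = 4
    v = (SSQ - Y.^2 ./ N) ./ (N - 1)   % unbiased variance over the valid elements = 4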
# name: # type: sq_string # elements: 1 # length: 5 sumsq
# name: # type: sq_string # elements: 1 # length: 391 SUMSQ calculates the sum of squares. [y] = sumsq(x [, DIM]) DIM dimension N: SUMSQ of N-th dimension default or []: first DIMENSION, with more than 1 element y sum of squares features: - can deal with NaN's (missing values) - dimension argument also in Octave - compatible with Matlab and Octave see also: RMS, SUMSKIPNAN, MEAN, VAR, MEANSQ Reference(s):
# name: # type: sq_string # elements: 1 # length: 37 SUMSQ calculates the sum of squares.
# name: # type: sq_string # elements: 1 # length: 4 tcdf
# name: # type: sq_string # elements: 1 # length: 254 TCDF returns the Student cumulative distribution function cdf = tcdf(x,DF); Computes the CDF of the Student distribution with DF degrees of freedom. x,DF must be matrices of the same size, or either one can be a scalar. see also: NORMCDF, TPDF, TINV
# name: # type: sq_string # elements: 1 # length: 54 TCDF returns the Student cumulative distribution function
# name: # type: sq_string # elements: 1 # length: 7 test_sc
# name: # type: sq_string # elements: 1 # length: 1441 TEST_SC: apply statistical and SVM classifiers to test data R = test_sc(CC,D,TYPE [,target_Classlabel]) R.output output: "signed" distance for each class. This represents the distances between sample D and the separating hyperplane. The "signed distance" is positive if it matches the target class, and negative if it lies on the opposite side of the separating hyperplane. R.classlabel class for output data The target class is optional. If it is provided, the following values are returned. R.kappa Cohen's kappa coefficient R.ACC Classification accuracy R.H Confusion matrix The classifier CC is typically obtained by TRAIN_SC. If a statistical classifier is used, TYPE can be used to modify the classifier. TYPE = 'MDA' mahalanobis distance based classifier TYPE = 'MD2' mahalanobis distance based classifier TYPE = 'MD3' mahalanobis distance based classifier TYPE = 'GRB' Gaussian radial basis function TYPE = 'QDA' quadratic discriminant analysis TYPE = 'LD2' linear discriminant analysis TYPE = 'LD3', 'LDA', 'FDA', 'FLDA' (Fisher's) linear discriminant analysis TYPE = 'LD4' linear discriminant analysis TYPE = 'GDBC' general distance based classifier see also: TRAIN_SC References: [1] R. Duda, P. Hart, and D. Stork, Pattern Classification, second ed. John Wiley & Sons, 2001.
# name: # type: sq_string # elements: 1 # length: 61 TEST_SC: apply statistical and SVM classifiers to test data
# name: # type: sq_string # elements: 1 # length: 8 tiedrank
# name: # type: sq_string # elements: 1 # length: 272 TIEDRANK computes the rank of samples; the mean value is used in case of ties. This function is just a wrapper for RANKS, provided for compatibility with the statistics toolbox of matlab(tm). R = tiedrank(X) computes the rank R of vector X see also: RANKS
# name: # type: sq_string # elements: 1 # length: 80 TIEDRANK computes the rank of samples; the mean value is used in case of ties.
# name: # type: sq_string # elements: 1 # length: 4 tinv
# name: # type: sq_string # elements: 1 # length: 330 TINV returns the inverse cumulative distribution function of the Student distribution x = tinv(p,v); Computes the quantile (inverse of the CDF) of the Student distribution with v degrees of freedom. p,v must be matrices of the same size, or either one can be a scalar.
see also: TPDF, TCDF, NORMPDF, NORMCDF, NORMINV
# name: # type: sq_string # elements: 1 # length: 70 TINV returns the inverse cumulative distribution function of the Student distribution
# name: # type: sq_string # elements: 1 # length: 4 tpdf
# name: # type: sq_string # elements: 1 # length: 261 TPDF returns the Student probability density pdf = tpdf(x,DF); Computes the PDF of the Student distribution with DF degrees of freedom. x,DF must be matrices of the same size, or either one can be a scalar. see also: TINV, TCDF, NORMPDF, NORMCDF, NORMINV
# name: # type: sq_string # elements: 1 # length: 43 TPDF returns the Student probability density
# name: # type: sq_string # elements: 1 # length: 16 train_lda_sparse
# name: # type: sq_string # elements: 1 # length: 1689 Linear Discriminant Analysis for the Small Sample Size Problem as described in Algorithm 1 of J. Duintjer Tebbens, P. Schlesinger: 'Improving Implementation of Linear Discriminant Analysis for the High Dimension/Small Sample Size Problem', Computational Statistics and Data Analysis, vol. 52, no. 1, pp. 423-437, 2007. Input: X ...... (sparse) training data matrix G ...... group coding matrix of the training data test ...... (sparse) test data matrix Gtest ...... group coding matrix of the test data par ...... if par = 0, then classification exploits sparsity too tol ...... tolerance to distinguish zero eigenvalues Output: err ...... wrong classification rate (in %) trafo ...... LDA transformation vectors Reference(s): J. Duintjer Tebbens, P. Schlesinger: 'Improving Implementation of Linear Discriminant Analysis for the High Dimension/Small Sample Size Problem', Computational Statistics and Data Analysis, vol. 52, no. 1, pp. 423-437, 2007. Copyright (C) by J. Duintjer Tebbens, Institute of Computer Science of the Academy of Sciences of the Czech Republic, Pod Vodarenskou vezi 2, 182 07 Praha 8 Liben, 18.July.2006. This work was supported by the Program Information Society under project 1ET400300415. Modified for use with Matlab 6.5 by A. Schloegl, 22.Aug.2006 $Id$ This function is part of the NaN-toolbox http://pub.ist.ac.at/~schloegl/matlab/NaN/
# name: # type: sq_string # elements: 1 # length: 80 Linear Discriminant Analysis for the Small Sample Size Problem as described in
# name: # type: sq_string # elements: 1 # length: 8 train_sc
# name: # type: sq_string # elements: 1 # length: 7655 Train a (statistical) classifier CC = train_sc(D,classlabel) CC = train_sc(D,classlabel,MODE) CC = train_sc(D,classlabel,MODE, W) weighting D(k,:) with weight W(k) (not all classifiers support weighting) CC contains the model parameters of a classifier which can be applied to test data using test_sc. R = test_sc(CC,D,...) D training samples (each row is a sample, each column is a feature) classlabel labels of each sample, must have the same number of rows as D. Two different encodings are supported: {-1,1}-encoding (multiple classes with separate columns for each class) or 1..M encoding. So [1;2;3;1;4] is equivalent to
  [+1,-1,-1,-1;
   -1,+1,-1,-1;
   -1,-1,+1,-1;
   +1,-1,-1,-1;
   -1,-1,-1,+1]
Note: samples with classlabel=0 are ignored.
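For illustration, a minimal sketch (using the label vector from the example above; the helper variables are assumptions, not part of train_sc) of how a 1..M classlabel vector expands into the equivalent {-1,+1} target matrix:

% sketch only: expand the 1..M labels from the example above into a {-1,+1} matrix
classlabel = [1;2;3;1;4];
M = max(classlabel);
target = -ones(length(classlabel), M);
target(sub2ind(size(target), (1:length(classlabel))', classlabel)) = +1;
% target now equals the +/-1 matrix shown above; train_sc accepts either encoding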
The following classifier types are supported for MODE.TYPE:
'MDA' mahalanobis distance based classifier [1]
'MD2' mahalanobis distance based classifier [1]
'MD3' mahalanobis distance based classifier [1]
'GRB' Gaussian radial basis function [1]
'QDA' quadratic discriminant analysis [1]
'LD2' linear discriminant analysis (see LDBC2) [1] MODE.hyperparameter.gamma: regularization parameter [default 0]
'LD3', 'FDA', 'LDA', 'FLDA' linear discriminant analysis (see LDBC3) [1] MODE.hyperparameter.gamma: regularization parameter [default 0]
'LD4' linear discriminant analysis (see LDBC4) [1] MODE.hyperparameter.gamma: regularization parameter [default 0]
'LD5' another LDA (motivated by CSP) MODE.hyperparameter.gamma: regularization parameter [default 0]
'RDA' regularized discriminant analysis [7] MODE.hyperparameter.gamma: regularization parameter MODE.hyperparameter.lambda: gamma = 0, lambda = 0 : MDA; gamma = 0, lambda = 1 : LDA [default]
Hint: the hyperparameters are used only in test_sc.m; testing different hyperparameters does not require repeated calls to train_sc, it is sufficient to modify CC.hyperparameter before calling test_sc.
'GDBC' general distance based classifier [1]
'' statistical classifier, requires Mode argument in TEST_SC
'###/DELETION' if the data contains missing values (encoded as NaNs), a row-wise or column-wise deletion (depending on which method removes fewer data values) is applied
'###/GSVD' GSVD and statistical classifier [2,3]
'###/sparse' sparse [5]
'###' must be 'LDA' or any other classifier
'PLS' (linear) partial least squares regression
'REG' regression analysis
'WienerHopf' Wiener-Hopf equation
'NBC' Naive Bayesian Classifier [6]
'aNBC' Augmented Naive Bayesian Classifier [6]
'NBPW' Naive Bayesian Parzen Window [9]
'PLA' Perceptron Learning Algorithm [11] MODE.hyperparameter.alpha = alpha [default: 1] w = w + alpha * e'*x
'LMS', 'AdaLine' Least mean squares, adaptive line element, Widrow-Hoff, delta rule MODE.hyperparameter.alpha = alpha [default: 1]
'Winnow2' Winnow2 algorithm [12]
'PSVM' Proximal SVM [8] MODE.hyperparameter.nu (default: 1.0)
'LPM' Linear Programming Machine, uses and requires train_LPM of the iLog CPLEX optimizer MODE.hyperparameter.c_value =
'CSP' CommonSpatialPattern is very experimental and just a hack; it uses a smoothing window of 50 samples.
'SVM','SVM1r' support vector machines, one-vs-rest MODE.hyperparameter.c_value =
'SVM11' support vector machines, one-vs-one + voting MODE.hyperparameter.c_value =
'RBF' Support Vector Machines with RBF Kernel MODE.hyperparameter.c_value = MODE.hyperparameter.gamma =
'SVM:LIB' libSVM [default SVM algorithm]
'SVM:bioinfo' uses and requires svmtrain from the bioinfo toolbox
'SVM:OSU' uses and requires mexSVMTrain from the OSU-SVM toolbox
'SVM:LOO' uses and requires svcm_train from the LOO-SVM toolbox
'SVM:Gunn' uses and requires the svc functions from the Gunn-SVM toolbox
'SVM:KM' uses and requires the svmclass function from the KM-SVM toolbox
'SVM:LINz' LibLinear [10] (requires train.mex from LibLinear somewhere in the path); z=0 (default) L2-regularized logistic regression; z=1 L2-loss support vector machines (dual); z=2 L2-loss support vector machines (primal); z=3 L1-loss support vector machines (dual)
'SVM:LIN4' LibLinear with multi-class support vector machines by Crammer and Singer
'DT' decision tree - not implemented yet.
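A minimal, non-authoritative sketch of the train/test workflow with one of the types listed above ('LDA' here; the random data, class sizes, and the empty TYPE argument passed to test_sc are assumptions made for illustration only):

% sketch, assuming illustrative random data; 'LDA' is one of the MODE.TYPE values above
D = [randn(50,3); randn(50,3)+1];          % 100 samples, 3 features
classlabel = [ones(50,1); 2*ones(50,1)];   % 1..M encoding
MODE.TYPE = 'LDA';
CC = train_sc(D, classlabel, MODE);        % train the classifier
R  = test_sc(CC, D, [], classlabel);       % apply it; R.ACC, R.kappa, R.H become available
% note: this evaluates on the training data; use XVAL for cross-validated performance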
{'REG','MDA','MD2','QDA','QDA2','LD2','LD3','LD4','LD5','LD6','NBC','aNBC','WienerHopf','LDA/GSVD','MDA/GSVD', 'LDA/sparse','MDA/sparse', 'PLA', 'LMS','LDA/DELETION','MDA/DELETION','NBC/DELETION','RDA/DELETION','REG/DELETION','RDA','GDBC','SVM','RBF','PSVM','SVM11','SVM:LIN4','SVM:LIN0','SVM:LIN1','SVM:LIN2','SVM:LIN3','WINNOW', 'DT'};
CC contains the model parameters of a classifier. Some time ago, CC was a statistical classifier containing the mean and the covariance of the data of each class (encoded in the so-called "extended covariance matrices"). Nowadays, other classifiers are supported as well.
see also: TEST_SC, COVM, ROW_COL_DELETION
References:
[1] R. Duda, P. Hart, and D. Stork, Pattern Classification, second ed. John Wiley & Sons, 2001.
[2] Peg Howland and Haesun Park, Generalizing Discriminant Analysis Using the Generalized Singular Value Decomposition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(8), 2004. dx.doi.org/10.1109/TPAMI.2004.46
[3] http://www-static.cc.gatech.edu/~kihwan23/face_recog_gsvd.htm
[4] Jieping Ye, Ravi Janardan, Cheong Hee Park, Haesun Park, A new optimization criterion for generalized discriminant analysis on undersampled problems. The Third IEEE International Conference on Data Mining, Melbourne, Florida, USA, November 19-22, 2003.
[5] J.D. Tebbens and P. Schlesinger (2006), Improving Implementation of Linear Discriminant Analysis for the Small Sample Size Problem, Computational Statistics & Data Analysis, vol. 52(1): 423-437, 2007. http://www.cs.cas.cz/mweb/download/publi/JdtSchl2006.pdf
[6] H. Zhang, The optimality of Naive Bayes, http://www.cs.unb.ca/profs/hzhang/publications/FLAIRS04ZhangH.pdf
[7] J.H. Friedman, Regularized discriminant analysis. Journal of the American Statistical Association, 84:165-175, 1989.
[8] G. Fung and O.L. Mangasarian, Proximal Support Vector Machine Classifiers, KDD 2001. Eds. F. Provost and R. Srikant, Proc. KDD-2001: Knowledge Discovery and Data Mining, August 26-29, 2001, San Francisco, CA, p. 77-86.
[9] Kai Keng Ang, Zhang Yang Chin, Haihong Zhang, Cuntai Guan, Filter Bank Common Spatial Pattern (FBCSP) in Brain-Computer Interface. IEEE International Joint Conference on Neural Networks, 2008 (IJCNN 2008, IEEE World Congress on Computational Intelligence), 1-8 June 2008, pp. 2390-2397.
[10] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, LIBLINEAR: A Library for Large Linear Classification, Journal of Machine Learning Research 9 (2008), 1871-1874. Software available at http://www.csie.ntu.edu.tw/~cjlin/liblinear
[11] http://en.wikipedia.org/wiki/Perceptron#Learning_algorithm
[12] Littlestone, N. (1988), "Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm", Machine Learning 2:285-318. http://en.wikipedia.org/wiki/Winnow_(algorithm)
# name: # type: sq_string # elements: 1 # length: 80 Train a (statistical) classifier CC = train_sc(D,classlabel) CC = train_s
# name: # type: sq_string # elements: 1 # length: 7 trimean
# name: # type: sq_string # elements: 1 # length: 266 TRIMEAN yields the weighted mean of the median and the quartiles m = TRIMEAN(y).
The trimean is m = (Q1+2*MED+Q3)/4, with quartiles Q1 and Q3 and median MED. N-dimensional data is supported. REFERENCES: [1] http://mathworld.wolfram.com/Trimean.html
# name: # type: sq_string # elements: 1 # length: 80 TRIMEAN yields the weighted mean of the median and the quartiles m = TRIMEA
# name: # type: sq_string # elements: 1 # length: 8 trimmean
# name: # type: sq_string # elements: 1 # length: 664 TRIMMEAN calculates the trimmed mean by removing the fraction of p/2 upper and p/2 lower samples. Missing values (encoded as NaN) are ignored and not taken into account. The same number of upper and lower values is removed; this is compatible with various spreadsheet programs including GNumeric [1], LibreOffice, OpenOffice and MS Excel. Q = trimmean(Y,p) Q = trimmean(Y,p,DIM) returns the TRIMMEAN along dimension DIM of sample array Y. If p is a vector, the TRIMMEAN for each p is computed. see also: MAD, RANGE, HISTO2, HISTO3, PERCENTILE, QUANTILE References: [1] http://www.fifi.org/doc/gnumeric-doc/html/C/gnumeric-trimmean.html
# name: # type: sq_string # elements: 1 # length: 80 TRIMMEAN calculates the trimmed mean by removing the fraction of p/2 upper and
# name: # type: sq_string # elements: 1 # length: 5 ttest
# name: # type: sq_string # elements: 1 # length: 1474 TTEST (paired) t-test For a sample X from a normal distribution with unknown mean and variance, perform a t-test of the null hypothesis `mean (X) == M'. Under the null, the test statistic T follows a Student distribution with `DF = length (X) - 1' degrees of freedom. TTEST treats NaNs as missing values and ignores them. H = ttest(x,m) tests the null hypothesis that the mean of x is m. H = ttest(x,y) the size of x and the size of y must match; it is tested whether the difference x-y is significantly different from m=0. H = ttest(x,y,alpha) H = ttest(x,y,alpha,tail) H = ttest(x,y,alpha,tail,DIM) [H,PVAL] = ttest(...) H=1 indicates a rejection of the null hypothesis at a significance level of alpha (default alpha = 0.05). With the optional argument string TAIL, the alternative of interest can be selected. If TAIL is '!=' or '<>' or 'both', the null is tested against the two-sided alternative `mean (X) ~= mean (Y)'. If TAIL is '>' or 'right', the one-sided alternative `mean (X) > mean (Y)' is used. Similarly for '<' or 'left', the one-sided alternative `mean (X) < mean (Y)' is used. The default is the two-sided case. H returns whether the null hypothesis must be rejected. The p-value of the test is returned in PVAL. TTEST works on the first non-singleton dimension or on DIM. If no output argument is given, the p-value of the test is displayed.
# name: # type: sq_string # elements: 1 # length: 80 TTEST (paired) t-test For a sample X from a normal distribution with unkno
# name: # type: sq_string # elements: 1 # length: 6 ttest2
# name: # type: sq_string # elements: 1 # length: 1514 TTEST2 (unpaired) t-test For two samples x and y from normal distributions with unknown means and unknown equal variances, perform a two-sample t-test of the null hypothesis of equal means. Under the null, the test statistic T follows a Student distribution with DF degrees of freedom. TTEST2 treats NaNs as missing values and ignores them. H = ttest2(x,y) H = ttest2([x;y],C,W) H = ttest2(x,y,alpha) H = ttest2(x,y,alpha,tail) H = ttest2(x,y,alpha,tail,vartype) H = ttest2(x,y,alpha,tail,vartype,DIM) [H,PVAL] = ttest2(...) [h,p,ci,stats] = ttest2(...) H=1 indicates a rejection of the null hypothesis at a significance level of alpha (default alpha = 0.05).
With the optional argument string TAIL, the alternative of interest can be selected. If TAIL is '!=' or '<>' or 'both', the null is tested against the two-sided alternative `mean (X) ~= mean (Y)'. If TAIL is '>' or 'right', the one-sided alternative `mean (X) > mean (Y)' is used. Similarly for '<' or 'left', the one-sided alternative `mean (X) < mean (Y)' is used. The default is the two-sided case. vartype: only 'equal' (the default value) is supported; the value 'unequal' is not supported. H returns whether the null hypothesis must be rejected. The p-value of the test is returned in PVAL. TTEST2 works on the first non-singleton dimension or on DIM. If no output argument is given, the p-value of the test is displayed.
# name: # type: sq_string # elements: 1 # length: 80 TTEST2 (unpaired) t-test For two samples x and y from normal distributions
# name: # type: sq_string # elements: 1 # length: 3 var
# name: # type: sq_string # elements: 1 # length: 772 VAR calculates the variance. y = var(x [, opt[, DIM]]) calculates the variance in dimension DIM; the default DIM is the first non-singleton dimension. opt 0: normalizes with N-1 [default] 1: normalizes with N DIM dimension 1: VAR of columns 2: VAR of rows N: VAR of N-th dimension default or []: first DIMENSION, with more than 1 element W weights to compute weighted variance (default: []) if W=[], all weights are 1. number of elements in W must match size(x,DIM) usage: var(x) var(x, opt, DIM) var(x, [], DIM) var(x, W, DIM) var(x, opt, DIM, W) features: - can deal with NaN's (missing values) - weighting of data - dimension argument - compatible with Matlab and Octave see also: MEANSQ, SUMSQ, SUMSKIPNAN, MEAN, RMS, STD
# name: # type: sq_string # elements: 1 # length: 29 VAR calculates the variance.
# name: # type: sq_string # elements: 1 # length: 5 xcovf
# name: # type: sq_string # elements: 1 # length: 1059 XCOVF generates the cross-covariance function. XCOVF is the same as XCORR except that X and Y can contain missing values encoded with NaN. NaN's are skipped and do not result in a NaN output. The output gives NaN only if there are insufficient input data. [C,N,LAGS] = xcovf(X,MAXLAG,SCALEOPT); calculates the (auto-)correlation function of X [C,N,LAGS] = xcovf(X,Y,MAXLAG,SCALEOPT); calculates the crosscorrelation function between X and Y SCALEOPT [character string] specifies the type of scaling applied to the correlation vector (or matrix). It is one of: 'none' return the unscaled correlation, R, 'biased' return the biased average, R/N, 'unbiased' return the unbiased average, R(k)/(N-|k|), 'coeff' return the correlation coefficient, R/(rms(x)*rms(y)), where "k" is the lag, and "N" is the length of X. If omitted, the default value is "none". If Y is supplied but does not have the same length as X, scale must be "none". see also: COVM, XCORR
# name: # type: sq_string # elements: 1 # length: 43 XCOVF generates the cross-covariance function.
# name: # type: sq_string # elements: 1 # length: 7 xptopen
# name: # type: sq_string # elements: 1 # length: 723 XPTOPEN reads several file formats and writes the SAS Transport Format (*.xpt). Supported are ARFF, SAS-XPT and STATA files. XPTOPEN is a mex-file and must be compiled before use. More detailed help can be obtained by calling xptopen without an additional argument. X = xptopen(filename) X = xptopen(filename,'r') read the file with filename and return the variables in struct X X = xptopen(filename,'w',X) save the fields of struct X in filename. The fields of X must be column vectors of equal length.
Each vector is either a numeric vector or a cell array of strings. The SAS-XPT format stores Date/Time as a numeric value counting the number of days since 1960-01-01.
# name: # type: sq_string # elements: 1 # length: 80 XPTOPEN reads several file formats and writes the SAS Transport Format (*
# name: # type: sq_string # elements: 1 # length: 4 xval
# name: # type: sq_string # elements: 1 # length: 2980 XVAL is used for cross-validation [R,CC] = xval(D,classlabel) .. = xval(D,classlabel,CLASSIFIER) .. = xval(D,classlabel,CLASSIFIER,type) .. = xval(D,{classlabel,W},CLASSIFIER) .. = xval(D,{classlabel,W,NG},CLASSIFIER) example: load_fisheriris; %builtin iris dataset C = species; K = 5; NG = [1:length(C)]'*K/length(C); [R,CC] = xval(meas,{C,[],NG},'NBC'); Input: D: data features (one feature per column, one sample per row) classlabel: labels of each sample, must have the same number of rows as D. Two different encodings are supported: {-1,1}-encoding (multiple classes with separate columns for each class) or 1..M encoding. So [1;2;3;1;4] is equivalent to
  [+1,-1,-1,-1;
   -1,+1,-1,-1;
   -1,-1,+1,-1;
   +1,-1,-1,-1;
   -1,-1,-1,+1]
Note: samples with classlabel=0 are ignored. CLASSIFIER can be any classifier supported by train_sc (default='LDA') {'REG','MDA','MD2','QDA','QDA2','LD2','LD3','LD4','LD5','LD6','NBC','aNBC','WienerHopf', 'RDA','GDBC', 'SVM','RBF','PSVM','SVM11','SVM:LIN4','SVM:LIN0','SVM:LIN1','SVM:LIN2','SVM:LIN3','WINNOW'} these can be modified by ###/GSVD, ###/sparse and ###/DELETION. /DELETION removes, in case of NaN's, either the rows or the columns (whichever removes fewer data values) with any NaN. /sparse and /GSVD preprocess the data and reduce it to some lower-dimensional space. Hyperparameters (like alpha for PLA, gamma/lambda for RDA, c_value for SVM, etc.) can be defined as CLASSIFIER.hyperparameter.alpha, etc. and CLASSIFIER.TYPE = 'PLA' (as listed above). See train_sc for details. W: weights for each sample (row) in D. default: [] (i.e. all weights are 1) number of elements in W must match the number of rows of D NG: used to define the type of cross-validation Leave-One-Out-Method (LOOM): NG = [1:length(classlabel)]' (default) Leave-K-Out-Method: NG = ceil([1:length(classlabel)]'/K) K-fold XV: NG = ceil([1:length(classlabel)]'*K/length(classlabel)) group-wise XV (if samples are not independent) can also be defined here: samples from the same group (dependent samples) get the same identifier, samples from different groups get different identifiers. TYPE: defines the type of cross-validation procedure if NG is not specified 'LOOM' leave-one-out-method k k-fold cross-validation OUTPUT: R contains the resulting performance metric, CC contains the classifier. plota(R) shows the confusion matrix of the results. see also: TRAIN_SC, TEST_SC, CLASSIFY, PLOTA References: [1] R. Duda, P. Hart, and D. Stork, Pattern Classification, second ed. John Wiley & Sons, 2001. [2] A. Schlögl, J. Kronegg, J.E. Huggins, S. G. Mason; Evaluation criteria in BCI research. (Eds.) G. Dornhege, J.R. Millan, T. Hinterberger, D.J.
McFarland, K.-R. Müller; Towards Brain-Computer Interfacing, MIT Press, 2007, p. 327-342.
# name: # type: sq_string # elements: 1 # length: 35 XVAL is used for cross-validation
# name: # type: sq_string # elements: 1 # length: 12 zScoreMedian
# name: # type: sq_string # elements: 1 # length: 326 zScoreMedian removes the median and standardizes by 1.483 times the median absolute deviation. Usage: Z = zScoreMedian(X, DIM) Input: X : data DIM: dimension along which the z-score should be calculated (1=columns, 2=rows) (optional, default=first dimension with more than 1 element) Output: Z : z-scores
# name: # type: sq_string # elements: 1 # length: 59 zScoreMedian removes the median and standardizes by the 1.
# name: # type: sq_string # elements: 1 # length: 6 zscore
# name: # type: sq_string # elements: 1 # length: 622 ZSCORE removes the mean and normalizes the data to a variance of 1. Can also be used for pre-whitening of the data. [z,r,m] = zscore(x,DIM) z z-score of x along dimension DIM r is the inverse of the standard deviation m is the mean of x The data x can be reconstructed with x = z*diag(1./r) + repmat(m,size(z)./size(m)); z = x*diag(r) - repmat(m.*r,size(z)./size(m)) DIM dimension 1: STATS of columns 2: STATS of rows default or []: first DIMENSION, with more than 1 element see also: SUMSKIPNAN, MEAN, STD, DETREND REFERENCE(S): [1] http://mathworld.wolfram.com/z-Score.html
# name: # type: sq_string # elements: 1 # length: 70 ZSCORE removes the mean and normalizes the data to a variance of 1.
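A minimal sketch of the ZSCORE reconstruction formula above; the small input matrix and DIM=1 are assumptions used only for illustration.

% sketch only, with an assumed 3x2 data matrix containing a missing value
x = [1 2; 3 8; 5 NaN];
[z, r, m] = zscore(x, 1);                          % column-wise z-scores; r = 1./std, m = mean
xr = z*diag(1./r) + repmat(m, size(z)./size(m));   % reconstructs x (NaN entries stay NaN)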