Matlab Data Analysis: Cheat Codes

Here are commands you are likely to use when doing data analysis and statistical analysis in MATLAB. They are grouped by the stages of analysis you would typically go through:
Data Input / Output
Import Wizard for data import:
Menu: File->Import Data (Home tab > Import Data in newer releases), or at the command line:
>> uiimport
File input with load:
>> B = load('inputfile.txt')   % inputfile.txt is the file you want to load as the data set
File output with save:
>> save('dataout', 'A', '-ascii')   % writes variable A to the file dataout as ASCII text
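A minimal round-trip sketch, assuming a small made-up matrix A and an arbitrary temporary file name: save with '-ascii' writes plain text, and load reads it back as a numeric matrix. The file gets an explicit .txt extension here because load looks for a .mat file when the name has no extension.

```matlab
% Made-up data for illustration
A = [1 2 3; 4 5 6];
fname = fullfile(tempdir, 'dataout.txt');  % temporary file; the name is arbitrary
save(fname, 'A', '-ascii');                % plain-text output (8 significant digits by default)
B = load(fname);                           % numeric text file -> numeric matrix
delete(fname);                             % remove the temporary file
```

For full double precision in the text file, '-ascii' can be combined with '-double'.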
Missing Data
Removing missing data:
Removing NaN elements from vectors:
>> x = x(~isnan(x))
Removing rows with NaN from matrices:
>> X(any(isnan(X),2),:) = []
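A quick check of both idioms on small made-up arrays: logical indexing with ~isnan keeps only the numeric entries of a vector, and any(isnan(X), 2) flags every matrix row containing at least one NaN so that row can be deleted.

```matlab
% Vector: keep only the entries that are not NaN
x = [1 NaN 3 NaN 5];
x = x(~isnan(x));             % x becomes [1 3 5]

% Matrix: delete every row that contains a NaN in any column
X = [1 2; NaN 4; 5 6];
X(any(isnan(X), 2), :) = [];  % X becomes [1 2; 5 6]
```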
Interpolating missing data:
>> YI = interp1(X, Y, XI, 'method')   % methods include 'linear' (default), 'nearest', 'pchip', 'spline'
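One way to fill gaps, sketched with made-up data y sampled at positions t: interpolate from the points that are not NaN and evaluate only at the missing positions. With 'linear', each gap gets the value on the straight line between its known neighbors.

```matlab
t = 1:6;
y = [2 4 NaN 8 NaN 12];          % made-up series with two gaps
ok = ~isnan(y);                  % positions with known values
yfilled = y;
% Estimate the missing values from the known points only
yfilled(~ok) = interp1(t(ok), y(ok), t(~ok), 'linear');
% yfilled is [2 4 6 8 10 12]
```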
Correlation
Definition: tendency of two variables to increase or decrease together.
Measure: Pearson product-moment correlation coefficient
Import data: example.dat (a .dat file, loaded with the same procedure described above)
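Correlation coefficients, their p-values, and 95% confidence bounds are calculated as:
>> [R, P, RL, RU] = corrcoef(X);   % X: one variable per column; RL, RU: lower/upper 95% bounds for R
>> [i, j] = find(P < 0.05);        % index pairs of variables whose correlation is significant (p < 0.05)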
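A small sketch with made-up data in place of example.dat: the second column is almost exactly twice the first, so their correlation should be close to 1 with a tiny p-value, while the third column is unrelated values that are not expected to reach significance.

```matlab
x = (1:10)';
y = 2*x + [0.5 -0.3 0.2 -0.1 0.4 -0.6 0.1 0.3 -0.2 0.0]';  % nearly linear in x
z = [3 1 4 1 5 9 2 6 5 3]';                              % unrelated made-up values
X = [x y z];                    % one variable per column
[R, P] = corrcoef(X);           % R(1,2) near 1, P(1,2) far below 0.05
[i, j] = find(P < 0.05);        % should include the pairs (1,2) and (2,1)
```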
Basic Fitting
Interactive fitting is done from a figure window: plot the data, then open the Basic Fitting tool.
Figure window menu: Tools->Basic Fitting
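The Basic Fitting tool itself is interactive; as a swapped-in command-line counterpart for a polynomial fit, polyfit and polyval can be used. This sketch assumes made-up points lying exactly on a line, so the degree-1 fit should recover the slope and intercept.

```matlab
x = 0:5;
y = 3*x + 2;               % made-up data on an exact line
p = polyfit(x, y, 1);      % degree-1 coefficients: [slope intercept]
yfit = polyval(p, x);      % evaluate the fitted line at x
```

plot(x, y, 'o', x, yfit, '-') would then show the data and the fit together.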
Descriptive Statistics
Measures of central tendency (mean, geometric mean, median, mode):
>> m = mean(X)
>> gm = geomean(X)
>> med = median(X)
>> mo = mode(X)   % named mo so the built-in function mod is not shadowed
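A worked check on a made-up sample whose results are easy to verify by hand: the values sum to 40 over 8 entries, the two middle sorted values are 4 and 5, and 4 occurs most often. geomean comes from the Statistics and Machine Learning Toolbox.

```matlab
X = [2 4 4 4 5 5 7 9];   % made-up sample
m   = mean(X);           % 40/8 = 5
gm  = geomean(X);        % exp(mean(log(X))), Statistics and Machine Learning Toolbox
med = median(X);         % average of 4 and 5 = 4.5
mo  = mode(X);           % most frequent value = 4
```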
Measure of Dispersion:
>> s = std(X)
>> v = var(X)
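Using the same made-up sample, the squared deviations from the mean sum to 32. By default std and var normalize by n-1; passing 1 as the second argument to std normalizes by n instead.

```matlab
X = [2 4 4 4 5 5 7 9];   % made-up sample with mean 5
s  = std(X);             % sample standard deviation, normalized by n-1: sqrt(32/7)
v  = var(X);             % sample variance, normalized by n-1: 32/7
s1 = std(X, 1);          % normalized by n: sqrt(32/8) = 2
```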
Probability Distributions
Probability density functions:
>> Y = exppdf(X, mu)
>> Y = normpdf(X, mu, sigma)
Cumulative distribution functions:
>> Y = expcdf(X, mu)
>> Y = normcdf(X, mu, sigma)
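A few values that can be checked by hand, assuming an exponential distribution with mean mu = 2 and a standard normal: in exppdf and expcdf, mu is the mean, so the density at 0 is 1/mu and the CDF at the mean is 1 - exp(-1).

```matlab
mu = 2;                   % mean of the exponential distribution (made-up value)
y1 = exppdf(0, mu);       % 1/mu = 0.5
c1 = expcdf(mu, mu);      % 1 - exp(-1), about 0.632
y2 = normpdf(0, 0, 1);    % 1/sqrt(2*pi), about 0.399
c2 = normcdf(0, 0, 1);    % 0.5 by symmetry
```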
Parameter estimation:
>> m = expfit(data)
>> [m, s] = normfit(data)
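A sketch on made-up data whose mean is 1.5: the maximum-likelihood estimate returned by expfit is the sample mean, and normfit returns the sample mean together with the standard deviation normalized by n-1, so both can be checked against mean and std.

```matlab
data = [1.2 0.4 2.5 0.9 1.7 3.1 0.6 1.6];   % made-up positive sample, mean 1.5
m = expfit(data);          % estimated exponential mean (equals mean(data))
[mu, s] = normfit(data);   % estimated normal mean and standard deviation
```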
Statistical Plots
>> bp = boxplot(X, group)   % box plots of X, one box per value of group
>> polytool(X, Y)           % interactive Polynomial Fitting Tool
>> dfittool                 % Distribution Fitting Tool
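A minimal grouped box plot, assuming made-up data and a numeric grouping vector of the same length as X; the figure is created invisible here only so the snippet can run without a display.

```matlab
X = [1 2 3 4 6 2 3 4 5 7]';       % made-up measurements
group = [1 1 1 1 1 2 2 2 2 2]';   % group label for each measurement
figure('Visible', 'off');         % hidden figure, for running without a display
bp = boxplot(X, group);           % one box per distinct value in group
```

Dropping the figure('Visible', 'off') line gives the usual on-screen plot.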
Linear Models
Definition: y = Xa + b
y: n x 1 vector of observations
X: n x p matrix of predictors
a: p x 1 vector of parameters
b: n x 1 vector of random disturbances
Multiple linear regression:
>> [B, Bint, R, Rint, stats] = regress(y, X)
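where B: vector of regression coefficients; Bint: matrix of 95% confidence intervals for B; R: vector of residuals; Rint: intervals for diagnosing outliers; stats: vector containing the R² statistic, the F statistic and its p-value, and an estimate of the error variance. Include a column of ones in X if the model should have an intercept.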
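A sketch with made-up data built as y = 1 + 2x plus small disturbances that sum to zero and are uncorrelated with x, so least squares should return an intercept of 1 and a slope of 2. The column of ones in X is what produces the intercept term.

```matlab
x = (1:8)';
e = [0.1 -0.2 0.05 0.15 -0.1 0.0 -0.05 0.05]';  % made-up disturbances
y = 1 + 2*x + e;
X = [ones(size(x)) x];                   % intercept column plus predictor
[B, Bint, R, Rint, stats] = regress(y, X);
% B(1): intercept, B(2): slope, stats(1): R^2
```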
Residuals plot:
>> rcoplot(R, Rint)
Hypothesis Testing
Definition: use of statistics to decide whether sample data give enough evidence to reject a given hypothesis.
A test involves a null hypothesis (the observations are the result of pure chance) and an alternative hypothesis. A test statistic is computed from the data to assess the null hypothesis.
The p-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. The p-value is compared with a chosen significance level α (commonly 0.05), and the null hypothesis is rejected when p < α.
Analysis of Variance (ANOVA):
One-way ANOVA
>> anova1(X,group)
Multiple Comparisons
>> [p, tbl, stats] = anova1(X,group)
>> [c, m] = multcompare(stats)
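A sketch with made-up measurements for three groups, where group B is clearly shifted upward. The 'off' option of anova1 and the 'Display','off' name-value pair of multcompare suppress the figures and table. Each row of c describes one pair of groups, and the first column of m holds the estimated group means.

```matlab
y = [5.1 4.9 5.3 5.0, 6.2 6.0 6.4 6.1, 5.0 5.2 4.8 5.1]';   % made-up data
group = [repmat({'A'}, 4, 1); repmat({'B'}, 4, 1); repmat({'C'}, 4, 1)];
[p, tbl, stats] = anova1(y, group, 'off');      % one-way ANOVA without figures
[c, m] = multcompare(stats, 'Display', 'off');  % pairwise comparisons of group means
```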
Two-way ANOVA
>> [P, tbl, stats] = anova2(X, reps)
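In anova2, columns of X are the levels of one factor and rows are the levels of the other, with reps consecutive rows per cell. This made-up layout has 2 column levels, 2 row levels, and 2 replicates per cell; with reps > 1, P holds the p-values for the columns, the rows, and their interaction.

```matlab
X = [10 12;    % row factor level 1, replicate 1
     11 13;    % row factor level 1, replicate 2
     14 18;    % row factor level 2, replicate 1
     15 19];   % row factor level 2, replicate 2
reps = 2;
[P, tbl, stats] = anova2(X, reps, 'off');   % 'off' suppresses the ANOVA table figure
```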
Other hypothesis tests
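>> H = ttest(X)        % one-sample t-test; H = 1 rejects the null hypothesis that X has mean 0 (5% level)
>> H = lillietest(X)   % Lilliefors test; H = 1 rejects the null hypothesis that X is normally distributed (5% level)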
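Tying the p-value and α description to these functions, a sketch on a made-up sample with mean 2.1 and standard deviation 0.2: the test against mean 0 rejects, while the test against mean 2 does not (t is about 1.41 with 7 degrees of freedom). The second output of each test is the p-value.

```matlab
x = [2.1 1.9 2.3 2.0 2.2 1.8 2.4 2.1];   % made-up sample, mean 2.1
[h, p]   = ttest(x);       % H0: mean is 0; expect h = 1 and a tiny p
[h2, p2] = ttest(x, 2);    % H0: mean is 2; expect h2 = 0 since p2 > 0.05
[hn, pn] = lillietest(x);  % H0: x comes from a normal distribution
```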