# Matlab Data Analysis: Cheat Codes

Here are commands you are likely to use for data analysis and statistical analysis in MATLAB. They are grouped by the stage of analysis in which they are used:

**Data Input / Output**

Import Wizard for data import:

*File -> Import Data* (a menu item rather than a command; `uiimport` opens the same wizard from the command line)

File input with load:

*>> B = load('inputfile.txt')* % inputfile.txt is the name of the file to load as the data set

File output with save:

*>> save('dataout', 'A', '-ascii')*
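A minimal round-trip sketch of the two commands above. The matrix `A` and the temporary file name are made up for illustration; only `save`, `load`, `fullfile`, `tempdir` and `delete` from base MATLAB are used.

```matlab
% Hypothetical data and file name, used only for this illustration.
A = [1 2 3; 4 5 6];
fname = fullfile(tempdir, 'dataout_demo.txt');

% Write A as plain text.
save(fname, 'A', '-ascii');

% Read it back; for an ASCII file, load returns the numeric matrix.
B = load(fname);

% Small integers survive the text round trip unchanged.
disp(isequal(A, B));

% Clean up the temporary file.
delete(fname);
```

Note that `-ascii` stores only the numbers, not the variable name, so the result has to be assigned to a variable when it is loaded back.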

**Missing Data**

Removing missing data:

Removing NaN elements from vectors:

*>> x = x(~isnan(x))*

Removing rows with NaN from matrices:

>> X(any(isnan(X),2),:) = []

Interpolating missing data:

*>> YI = interp1(X, Y, XI, 'method')* % methods: 'spline', 'nearest', 'linear', etc.
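The three missing-data commands above can be tried on a small example. The vectors `t` and `y` and the matrix `X` below are hypothetical; this is a sketch of one way to combine `isnan` and `interp1`, not the only one.

```matlab
% Hypothetical measurements with gaps, for illustration only.
t = 1:6;
y = [2 4 NaN 8 NaN 12];

% Option 1: drop the missing samples.
yClean = y(~isnan(y));

% Option 2: fill the gaps by interpolating over the known samples.
known = ~isnan(y);
yFilled = y;
yFilled(~known) = interp1(t(known), y(known), t(~known), 'linear');

% Matrix case: remove every row that contains at least one NaN.
X = [1 2; NaN 3; 4 5];
X(any(isnan(X), 2), :) = [];
```

Dropping samples changes the length of the vector, while interpolation keeps it aligned with `t`. Which one is appropriate depends on the analysis that follows.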

**Correlation**

Definition: tendency of two variables to increase or decrease together.

Measure: Pearson product-moment correlation coefficient

Import Data: example.dat % a .dat file, loaded with the same procedure as explained above

The correlation coefficients and their p-values are calculated, and the significant pairs (p < 0.05) are located, as:

*>> [R, P] = corrcoef(X);*

*>> [i, j] = find(P < 0.05);*
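A small sketch of the two commands above on made-up data. The matrix `X` is hypothetical: its first two columns are nearly proportional and the third is unrelated. Restricting the search to the upper triangle with `triu` is an addition here, so that each pair is reported once and the diagonal is skipped.

```matlab
% Hypothetical data: columns 1 and 2 are linearly related, column 3 is not.
X = [1   2   3   4   5   6;
     2.1 3.9 6.2 8.1 9.8 12.2;
     5   1   4   2   6   3]';

% R holds the pairwise correlation coefficients, P the matching p-values.
[R, P] = corrcoef(X);

% Keep each pair once (upper triangle) and skip the trivial diagonal.
[i, j] = find(triu(P < 0.05, 1));
significantPairs = [i j];
```

Each row of `significantPairs` lists the column indices of one significantly correlated pair of variables.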

**Basic Fitting**

Fitting is done interactively from the figure window, using the Basic Fitting tool.

Figure window: Tools -> Basic Fitting
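For a scripted counterpart to the interactive tool, a polynomial fit can be sketched with the base functions `polyfit` and `polyval`. This is an assumed equivalent for illustration, not what the Basic Fitting dialog runs internally, and the data are made up.

```matlab
% Hypothetical noise-free data lying on y = 2x + 1.
x = 0:5;
y = 2*x + 1;

% Degree-1 (linear) least-squares fit; p = [slope intercept].
p = polyfit(x, y, 1);

% Evaluate the fitted line and compute the residuals.
yFit = polyval(p, x);
res = y - yFit;
```

Raising the third argument of `polyfit` gives quadratic, cubic and higher-degree fits.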

**Descriptive Statistics**

Measures of central tendency (mean, geometric mean, median, mode):

*>> m = mean(X)*

*>> gm = geomean(X)*

*>> med = median(X)*

*>> mo = mode(X)* % named mo rather than mod, which would shadow the built-in mod function

Measures of dispersion:

*>> s = std(X)*

*>> v = var(X)*
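The measures above can be checked by hand on a tiny sample. The vector `X` below is hypothetical. `geomean` is assumed to be available from the Statistics Toolbox; the other functions are base MATLAB.

```matlab
% Hypothetical sample, for illustration only.
X = [1 2 2 4 8];

m   = mean(X);      % arithmetic mean: 17/5 = 3.4
gm  = geomean(X);   % geometric mean: (1*2*2*4*8)^(1/5) = 128^(1/5)
med = median(X);    % middle value of the sorted sample: 2
mo  = mode(X);      % most frequent value: 2

s = std(X);         % sample standard deviation (normalized by n-1)
v = var(X);         % sample variance, equal to s^2: 31.2/4 = 7.8
```

For a matrix input, these functions operate column by column by default.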

**Probability Distributions**

Probability density functions:

*>> Y = exppdf(X, mu)*

*>> Y = normpdf(X, mu, sigma)*

Cumulative distribution functions:

*>> Y = expcdf(X, mu)*

*>> Y = normcdf(X, mu, sigma)*

Parameter estimation:

*>> m = expfit(data)*

*>> [m, s] = normfit(data)*
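A short sketch that exercises the density, distribution and fitting functions listed above, assuming the Statistics Toolbox is installed. The evaluation points and the `data` sample are hypothetical.

```matlab
% Standard normal density and CDF at a few points.
x = [-1 0 1];
pdfVals = normpdf(x, 0, 1);   % peak at x = 0 is 1/sqrt(2*pi)
cdfVals = normcdf(x, 0, 1);   % value at x = 0 is 0.5

% Exponential distribution, parameterized by its mean mu.
mu = 2;
expAtZero   = exppdf(0, mu);            % density at 0 is 1/mu
expAtMedian = expcdf(mu*log(2), mu);    % CDF at the median is 0.5

% Parameter estimation from a hypothetical sample.
data = [4.1 5.0 5.9 4.7 5.3];
[mHat, sHat] = normfit(data);   % sample mean and sample standard deviation
muHat = expfit(data);           % estimate of the exponential mean
```

Note that `exppdf`, `expcdf` and `expfit` take the mean `mu` as the parameter, not the rate `1/mu`.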

**Statistical Plots**

*>> bp = boxplot(X, group)* % create box plots

*>> polytool(X, Y)* % Polynomial Fitting Tool

*>> dfittool* % Distribution Fitting Tool

**Linear Models**

Definition: *y = Xa + b*

y: *n x 1* vector of observations

X: *n x p* matrix of predictors

a: *p x 1* vector of parameters

b: *n x 1* vector of random disturbances

Multiple linear regression:

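*>> [B, Bint, R, Rint, stats] = regress(y, X)*

where B is the vector of regression coefficients, Bint the matrix of 95% confidence intervals for B, R the vector of residuals, Rint the matrix of intervals for diagnosing outliers, and stats a vector containing the R² statistic, the F statistic and its p-value, and an estimate of the error variance. Include a column of ones in X if the model needs an intercept term.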

Residuals plot:

*>> rcoplot(R, Rint)*
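As an illustration of the regression call above, here is a minimal sketch on made-up data, assuming the Statistics Toolbox provides `regress`. The data, the disturbance values and the cross-check variable `Bcheck` are all hypothetical.

```matlab
% Hypothetical data generated from y = 3 + 2*x plus small disturbances.
x = (1:8)';
y = 3 + 2*x + [0.1 -0.2 0.05 0.15 -0.1 0.0 -0.05 0.05]';

% regress does not add an intercept; include a column of ones explicitly.
X = [ones(size(x)) x];

[B, Bint, R, Rint, stats] = regress(y, X);

% B(1) is the intercept, B(2) the slope; stats(1) is the R^2 statistic.
fprintf('intercept = %.3f, slope = %.3f, R^2 = %.4f\n', B(1), B(2), stats(1));

% Cross-check against the base-MATLAB least-squares solution.
Bcheck = X \ y;
```

Passing the resulting `R` and `Rint` to `rcoplot` then shows which residual intervals exclude zero, which is the usual sign of a possible outlier.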

**Hypothesis Testing**

Definition: use of statistics to assess whether observed data are consistent with a given hypothesis.

A test involves a null hypothesis (the observations are the result of pure chance) and an alternative hypothesis. A test statistic is computed to assess the evidence against the null hypothesis.

The P-value is the probability of obtaining a test statistic at least as extreme as the one observed if the null hypothesis were true. It is compared with an acceptable significance level α.

Analysis of Variance (ANOVA):

One-way ANOVA

*>> anova1(X,group)*

Multiple Comparisons

*>> [p, tbl, stats] = anova1(X,group)*

*>> [c, m] = multcompare(stats)*
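The one-way ANOVA and multiple-comparison calls above can be sketched on a small made-up data set, assuming the Statistics Toolbox. The measurements and group labels are hypothetical. The `'off'` arguments are used here only to suppress the figures.

```matlab
% Hypothetical measurements: three groups with four observations each.
X = [10 11  9 10  20 21 19 20  10 12  9 11]';
group = [1 1 1 1  2 2 2 2  3 3 3 3]';

% One-way ANOVA; 'off' suppresses the ANOVA table and box plot figures.
[p, tbl, stats] = anova1(X, group, 'off');

% Pairwise comparisons of group means; each row of c describes one pair.
[c, m] = multcompare(stats, 'Display', 'off');
```

A small `p` says only that at least one group mean differs. The rows of `c` indicate which pairs differ, and `m` holds the estimated group means with their standard errors.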

Two-way ANOVA

*>> [P, tbl, stats] = anova2(X, reps)*

Other hypothesis tests

*>> H = ttest(X)*

*>> H = lillietest(X)*
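A brief sketch of how the two tests above are typically read, assuming the Statistics Toolbox. The sample `X` is hypothetical, and the second output `p` and the hypothesized mean of 5 are additions for illustration.

```matlab
% Hypothetical sample clearly centered away from zero.
X = [5.1 4.9 5.3 5.0 4.8 5.2 5.1 4.9];

% One-sample t-test of H0: mean(X) = 0 at the default alpha = 0.05.
% h = 1 means H0 is rejected; p is the p-value.
[h, p] = ttest(X);

% Same test against a hypothesized mean of 5.
[h5, p5] = ttest(X, 5);

% Lilliefors test of H0: X comes from a normal distribution.
hNorm = lillietest(X);
```

In both functions the returned flag is 1 when the null hypothesis is rejected at the chosen significance level and 0 otherwise.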