Posted April 14, 2012 by Team AnalyticpediA in Analytics
 
 

Matlab Data Analysis: Cheat Codes


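Here are commands you will find useful for data analysis and statistical analysis in MATLAB, grouped by the stage of the analysis they belong to. Many of them (for example geomean, normpdf, regress, anova1 and ttest) require the Statistics Toolbox (now the Statistics and Machine Learning Toolbox).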

Data Input / Output


Import Wizard for data import:

File->Import Data (menu), or from the command line:

>> uiimport

File input with load:

>> B = load('inputfile.txt')   % inputfile.txt: the ASCII data file you want to load as the data set

File output with save:

>> save('dataout', 'A', '-ascii')   % write variable A to the text file dataout
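A minimal save/load round trip, assuming a hypothetical matrix A and a file name dataout.txt in the current folder. Because -ascii writes text with 8 significant digits, values with more digits are rounded on the way out; the values below are short enough to come back exactly.

```matlab
% Hypothetical data: a small numeric matrix
A = [1.5 2.25; 3 4];

% Write A as plain ASCII text (8 significant digits)
save('dataout.txt', 'A', '-ascii');

% Read it back; for ASCII files load returns a numeric matrix
B = load('dataout.txt');

% Largest difference between the original and the reloaded values
err = max(abs(A(:) - B(:)));
```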

Missing Data


Removing missing data:

Removing NaN elements from vectors:

>> x = x(~isnan(x))

Removing rows with NaN from matrices:

>> X(any(isnan(X),2),:) = []
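Both removal patterns applied to small hypothetical inputs, so the effect of each line is visible:

```matlab
% Hypothetical vector and matrix containing missing values (NaN)
x = [1 NaN 3 4 NaN];
X = [1 2; NaN 4; 5 6; 7 NaN];

% Keep only the non-NaN elements of the vector
x = x(~isnan(x));              % -> [1 3 4]

% any(..., 2) flags every row that contains at least one NaN;
% assigning [] to those rows deletes them
X(any(isnan(X), 2), :) = [];   % -> [1 2; 5 6]
```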

Interpolating missing data:

>> YI = interp1(X, Y, XI, 'method')   % methods: 'linear', 'nearest', 'spline', 'pchip', etc.
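A sketch of using interp1 to fill gaps rather than to resample: the known points serve as interpolation nodes and only the NaN positions are estimated. The series t, y is hypothetical.

```matlab
% Hypothetical evenly sampled series with two missing readings
t = 1:6;
y = [2 4 NaN 8 NaN 12];

% Use only the known points as interpolation nodes
known = ~isnan(y);

% Estimate the missing values linearly from their neighbours
yFilled = y;
yFilled(~known) = interp1(t(known), y(known), t(~known), 'linear');
% yFilled -> 2 4 6 8 10 12
```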

Correlation


Definition: the tendency of two variables to increase or decrease together.

Measure: Pearson product-moment correlation coefficient

Correlation example:

Import data: example.dat (a plain-text .dat file, loaded the same way as described above, e.g. X = load('example.dat'))

Correlation coefficients and their p-values are calculated as follows; find then returns the row and column indices (i, j) of the variable pairs whose correlation is significant at the 5% level:

>> [R, P] = corrcoef(X);

>> [i, j] = find(P < 0.05);
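To make the Pearson coefficient concrete, this sketch computes r for one pair of columns from its definition and compares it with corrcoef. The data matrix is hypothetical. Because R and P are symmetric, the sketch also restricts find to the upper triangle so each significant pair is reported once instead of twice.

```matlab
% Hypothetical data: 10 observations (rows) of 3 variables (columns)
t = (1:10)';
X = [t, 2*t + sin(t), cos(t)];

% Pearson r from its definition for columns 1 and 2:
% sum of products of deviations over the product of their norms
a = X(:,1) - mean(X(:,1));
b = X(:,2) - mean(X(:,2));
rManual = sum(a .* b) / sqrt(sum(a.^2) * sum(b.^2));

% corrcoef returns the full matrix of coefficients and p-values
[R, P] = corrcoef(X);
fprintf('manual r = %.6f, corrcoef r = %.6f\n', rManual, R(1,2));

% Look only above the diagonal so each significant pair appears once
[i, j] = find(triu(P < 0.05, 1));
```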

Basic Fitting


Interactive fitting is available from the figure window that displays your plotted data:

Figure window menu: Tools->Basic Fitting
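Basic Fitting is interactive; in a script, polyfit and polyval (base MATLAB) perform a comparable polynomial least-squares fit. A minimal sketch with hypothetical data that lie exactly on a quadratic, so the recovered coefficients are easy to check:

```matlab
% Hypothetical data that lie exactly on y = 2x^2 - 3x + 1
x = 0:5;
y = 2*x.^2 - 3*x + 1;

% Least-squares fit of a degree-2 polynomial;
% coefficients are returned highest power first
p = polyfit(x, y, 2);      % approximately [2 -3 1]

% Evaluate the fitted polynomial on a finer grid
xf = linspace(0, 5, 50);
yf = polyval(p, xf);
```

The pair (xf, yf) can then be passed to plot to draw the fitted curve over the original points.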

Descriptive Statistics


Measures of central tendency (mean, geometric mean, median, mode):

>> m = mean(X)

>> gm = geomean(X)

>> med = median(X)

>> mo = mode(X)   % avoid naming a variable mod, which shadows the built-in mod function
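The central-tendency commands on a small hypothetical sample. geomean needs the Statistics Toolbox; for positive data its value equals exp(mean(log(X))), which base MATLAB computes directly, so that form is used here. The variable name mo avoids shadowing the built-in mod function.

```matlab
% Hypothetical sample (positive values, so the geometric mean is defined)
X = [1 2 2 4 8];

m   = mean(X);             % arithmetic mean: 17/5 = 3.4
gm  = exp(mean(log(X)));   % geometric mean: (1*2*2*4*8)^(1/5) = 2^1.4
med = median(X);           % middle value of the sorted data: 2
mo  = mode(X);             % most frequent value: 2
```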

Measures of dispersion (standard deviation, variance):

>> s = std(X)

>> v = var(X)
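By default std and var normalize by n-1 (the sample estimates); passing 1 as the second argument normalizes by n instead. A sketch on a hypothetical sample whose deviations from the mean (5) square to 32:

```matlab
% Hypothetical sample with mean 5
X = [2 4 4 4 5 5 7 9];

s  = std(X);      % sample standard deviation (normalized by n-1)
v  = var(X);      % sample variance, equal to s^2: 32/7, about 4.571
s1 = std(X, 1);   % population standard deviation: sqrt(32/8) = 2
```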

Probability Distributions


Probability density functions:

>> Y = exppdf(X, mu)   % mu is the mean of the exponential distribution, not the rate

>> Y = normpdf(X, mu, sigma)
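exppdf and normpdf are Statistics Toolbox functions; the densities they evaluate can also be written out in base MATLAB, which shows what the parameters mean. A sketch with hypothetical evaluation points and parameters, using the same mean parameterization of the exponential that exppdf uses:

```matlab
% Hypothetical evaluation points and parameters
x     = 0:0.5:2;
mu    = 1;      % mean of the exponential / normal distribution
sigma = 2;      % standard deviation of the normal distribution

% Exponential pdf with mean mu (valid for x >= 0)
yExp = (1/mu) * exp(-x / mu);

% Normal pdf with mean mu and standard deviation sigma
yNorm = exp(-0.5 * ((x - mu) / sigma).^2) / (sigma * sqrt(2*pi));
```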

Cumulative distribution functions:

>> Y = expcdf(X, mu)

>> Y = normcdf(X, mu, sigma)
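The corresponding cumulative distribution functions in base MATLAB, as a sketch with hypothetical inputs: the exponential CDF has a closed form, and the normal CDF can be expressed with the complementary error function erfc.

```matlab
% Hypothetical evaluation points and parameters
x     = [0 1 2];
mu    = 1;
sigma = 2;

% Exponential cdf with mean mu (for x >= 0)
pExp = 1 - exp(-x / mu);

% Normal cdf written with the complementary error function
pNorm = 0.5 * erfc(-(x - mu) / (sigma * sqrt(2)));
```

At x = mu the normal CDF is exactly 0.5, a quick sanity check for either form.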

Parameter estimation:

>> m = expfit(data)

>> [m, s] = normfit(data)
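For these two distributions the point estimates have simple closed forms: the maximum-likelihood estimate of the exponential mean is the sample mean (what expfit returns), and normfit returns the sample mean together with the n-1 normalized standard deviation. A sketch on a hypothetical sample:

```matlab
% Hypothetical sample
data = [1.2 0.4 2.5 0.9 1.0];

% Exponential: the maximum-likelihood estimate of the mean
% is the sample mean (1.2 here)
muExp = mean(data);

% Normal: sample mean and (n-1)-normalized standard deviation,
% the point estimates normfit reports
muNorm    = mean(data);
sigmaNorm = std(data);   % squared deviations sum to 2.46, so variance is 2.46/4
```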

Statistical Plots


>> bp = boxplot(X, group)   % box plots of X, one box per group


>> polytool(X, Y)   % Polynomial Fitting Tool

>> dfittool   % Distribution Fitting Tool

Linear Models


Definition: y = Xa + b

y: n x 1 vector of observations

X: n x p matrix of predictors

a: p x 1 vector of parameters

b: n x 1 vector of random disturbances

Multiple linear regression:

>> [B, Bint, R, Rint, stats] = regress(y, X)    

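where B is the vector of regression coefficients, Bint is a matrix of 95% confidence intervals for B, R is the vector of residuals, Rint is a matrix of intervals for diagnosing outliers, and stats is a vector containing the R² statistic, the F statistic and its p-value, and an estimate of the error variance. regress does not add an intercept term, so include a column of ones in X if the model needs one.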
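The coefficient estimates are ordinary least squares, which base MATLAB's backslash operator also computes. A sketch on hypothetical data lying exactly on a line, so the expected coefficients and the R² of 1 are known in advance; B = regress(y, X) should give the same point estimates.

```matlab
% Hypothetical data: y depends on one predictor plus an intercept
x = (1:6)';
y = 3 + 2*x;                 % exact line, so the fit is exact

% Include a column of ones so the model has an intercept term
X = [ones(size(x)) x];

% Least-squares coefficients via the backslash operator
B = X \ y;                   % approximately [3; 2]

R   = y - X*B;                                  % residuals
Rsq = 1 - sum(R.^2) / sum((y - mean(y)).^2);    % R^2, first entry of regress's stats
```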

Residuals plot:

>> rcoplot(R, Rint)

Hypothesis Testing


Definition: the use of sample data to decide whether the evidence is strong enough to reject a stated hypothesis about a population.

A test compares a null hypothesis (typically that the observations are the result of pure chance, i.e. there is no effect) with an alternative hypothesis. A test statistic is computed from the data to assess the null hypothesis.

The p-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. The null hypothesis is rejected when the p-value is smaller than the chosen significance level α (commonly 0.05).

Analysis of Variance (ANOVA): 

One-way ANOVA

>> anova1(X,group)
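To show what anova1 computes, here is the one-way F statistic built by hand on hypothetical data (two groups labeled 1 and 2; labels are assumed to run from 1 to k). The p-value uses the F distribution's upper tail written with base MATLAB's regularized incomplete beta function betainc, so no toolbox is needed; anova1(X, group) should report the same F and p.

```matlab
% Hypothetical data: two groups of three observations each
X     = [1 2 3 4 5 6]';
group = [1 1 1 2 2 2]';

grand = mean(X);
k = max(group);      % number of groups (labels assumed to be 1..k)
n = numel(X);

% Between-group and within-group sums of squares
ssb = 0; ssw = 0;
for g = 1:k
    xg  = X(group == g);
    ssb = ssb + numel(xg) * (mean(xg) - grand)^2;
    ssw = ssw + sum((xg - mean(xg)).^2);
end

% F statistic: ratio of between- to within-group mean squares
dfb = k - 1;  dfw = n - k;
F = (ssb / dfb) / (ssw / dfw);           % 13.5 for this data

% Upper-tail probability of F(dfb, dfw) via betainc
p = betainc(dfw / (dfw + dfb * F), dfw / 2, dfb / 2);
```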

Multiple Comparisons

>> [p, tbl, stats] = anova1(X,group)

>> [c, m] = multcompare(stats)

Two-way ANOVA

>> [P, tbl, stats] = anova2(X, reps)

Other hypothesis tests

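>> H = ttest(X)   % one-sample t-test: H = 1 rejects the hypothesis that X has mean 0

>> H = lillietest(X)   % Lilliefors test: H = 1 rejects the hypothesis that X is normally distributed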
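A sketch of the one-sample t-test decision by hand, on a hypothetical sample, to connect the test statistic, p-value and α described above. The two-sided p-value of Student's t is written with betainc, so this runs without the toolbox; ttest(X) uses the same null hypothesis (mean 0) and α = 0.05 by default.

```matlab
% Hypothetical sample; H0: the population mean is 0
X = [0.8 -0.2 1.1 0.4 0.9 0.3];

n  = numel(X);
t  = mean(X) / (std(X) / sqrt(n));   % one-sample t statistic
nu = n - 1;                          % degrees of freedom

% Two-sided p-value: P(|T| >= |t|) for Student's t with nu df
p = betainc(nu / (nu + t^2), nu / 2, 0.5);

% Reject H0 when p is below alpha = 0.05
H = p < 0.05;
```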


Team AnalyticpediA

 
Team AnalyticpediA pledges to grow even faster than analytics itself and to bring you the latest knowledge, news, happenings and reviews from around the globe in analytics and technology.