Worked in a group of seven to analyze pipe data from California Steel Industries, Inc. We applied statistical and data-analysis tools and algorithms to find relationships between the variables, and ran computer simulations in several programming languages and applications to look for patterns among them.
We used both the Statistics Toolbox in MATLAB and the Probabilistic Modeling Toolkit (PMTK) package to simulate and observe k-means clusters. We also used the mean squared error (MSE), the expected value of the squared-error (quadratic) loss, as the error measure for the data points. The next step was to apply the conditional probability method to the data provided in the Excel spreadsheet.
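The k-means and MSE steps can be sketched in a few lines. This is an illustrative Python version on made-up two-dimensional points, not the MATLAB or PMTK code we ran and not the pipe data. The function names, the sample points, and the choice of initial centroids are all assumptions made for the example.

```python
# Illustrative sketch only: synthetic 2-D points, not the pipe data.

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means (Lloyd's algorithm) with caller-supplied starting centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(mean_point(members) if members else old)
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels


def mean_squared_error(points, centroids, labels):
    """Average squared distance from each point to its assigned centroid."""
    total = sum(squared_distance(p, centroids[lab])
                for p, lab in zip(points, labels))
    return total / len(points)


# Hypothetical sample: two visibly separate groups of three points each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
mse = mean_squared_error(points, centroids, labels)
print(labels, centroids, mse)
```

A lower MSE means the points sit closer to their cluster centers, so comparing it across different numbers of clusters is one way to judge how well a clustering fits.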
Our MATLAB simulations using factor analysis showed that power is related to the thickness and diameter of each pipe. Where this relationship did not hold in the data set, the data points clustered around the failures.
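One way to put a number on "clusters around the failures" is the conditional probability of failure given cluster membership. The Python sketch below assumes each record has already been given a cluster label and a pass/fail flag. The records, field layout, and function name are hypothetical and do not come from the project data.

```python
from collections import Counter

# Hypothetical records: (cluster_label, failed). Not the pipe data.
records = [
    (0, False), (0, False), (0, True), (0, False),
    (1, True), (1, True), (1, False), (1, True),
]


def failure_rate_given_cluster(records):
    """Estimate P(failure | cluster) as failures in a cluster / size of the cluster."""
    totals = Counter(cluster for cluster, _ in records)
    failures = Counter(cluster for cluster, failed in records if failed)
    return {cluster: failures[cluster] / totals[cluster] for cluster in totals}


rates = failure_rate_given_cluster(records)

# Unconditional failure rate, for comparison with the per-cluster rates.
overall = sum(failed for _, failed in records) / len(records)

for cluster in sorted(rates):
    print(f"P(failure | cluster {cluster}) = {rates[cluster]:.2f}")
print(f"P(failure) = {overall:.2f}")
```

A cluster whose conditional failure rate is well above the overall rate is one where failures concentrate.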