
Since PCA is geared towards reducing dimensions, a good use case would be data with many features (aka dimensions). Data on 'errors in a manufacturing line' is a good example because you could be capturing a large number of variables that may be contributing to a defective product: ambient temperature, speed of the line, which employees were present, etc. You would (virtually) be throwing in the kitchen sink for features (variables) in the hope of finding what could be causing defective Teslas, for instance.

What PCA does (to reduce this large number of dimensions) is hang the data on a new set of axes, letting the data itself indicate what those axes should be. PCA chooses its first axis along the direction of highest variance. The second axis is then chosen perpendicular (orthogonal) to the first, again in the direction of highest remaining variance, and so on. You stop once you've captured the majority of the variance, which should happen in far fewer dimensions than you started with. Mathematically, these axes are the eigenvectors of the data's covariance matrix, and the corresponding eigenvalues tell you how much variance each axis captures.
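
Here's a rough sketch of those steps in NumPy, with made-up toy data just to show the mechanics (in practice you'd reach for something like sklearn's PCA):

  import numpy as np

  # Toy data: 200 products x 5 features, where the features are
  # mostly driven by two hidden factors, so PCA has room to compress.
  rng = np.random.default_rng(0)
  latent = rng.normal(size=(200, 2))
  mixing = rng.normal(size=(2, 5))
  X = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

  # 1. Center each feature -- PCA assumes zero-mean data.
  Xc = X - X.mean(axis=0)

  # 2. Covariance matrix of the features.
  cov = np.cov(Xc, rowvar=False)

  # 3. Eigenvectors of the covariance matrix are the new axes;
  #    eigenvalues are the variance captured along each one.
  eigvals, eigvecs = np.linalg.eigh(cov)
  order = np.argsort(eigvals)[::-1]  # sort by descending variance
  eigvals, eigvecs = eigvals[order], eigvecs[:, order]

  # 4. Keep enough axes to explain, say, 90% of the variance.
  explained = np.cumsum(eigvals) / eigvals.sum()
  k = int(np.searchsorted(explained, 0.90)) + 1

  # 5. Project the data onto the top-k axes.
  X_reduced = Xc @ eigvecs[:, :k]
  print(f"kept {k} of {X.shape[1]} dimensions "
        f"({explained[k - 1]:.0%} of variance)")

On this toy data it keeps 2 of the 5 dimensions, since two hidden factors generated nearly all the variance.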




