We can prevent overfitting by adding a regularization term to the expression we are minimizing, such as the sum of the squares of all the coefficients scaled by a factor (this is L2, or ridge, regularization).
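Here is a minimal sketch of that idea in NumPy: a polynomial fit where the objective is the squared error plus a penalty on the squared coefficients. The toy data, the degree, and the penalty weight lam are all illustrative choices, not anything prescribed.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)  # noisy samples

degree = 9
X = np.vander(x, degree + 1)   # polynomial design matrix
lam = 1e-3                     # regularization strength (assumed value)

# Minimize ||Xw - y||^2 + lam * ||w||^2, whose closed-form solution is
# w = (X^T X + lam * I)^{-1} X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)
print("ridge coefficients:", w)
```

Larger values of lam shrink the coefficients harder, which tames the wild oscillations a high-degree polynomial would otherwise use to chase the noise.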
Also, if you do testing (using a separate test dataset to check how well your model works on unseen inputs), you can tell whether you are overfitting (learning even the noise present in the data, which is detrimental) or underfitting (not learning enough from the data, which is detrimental too). In the end you're looking for a sweet spot, and often the features number in the hundreds or thousands, so you can't analyze them by hand.
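A rough sketch of that diagnostic, again with made-up data: fit models of different complexity on a training split only, then compare training error and test error. A low training error paired with a much higher test error points to overfitting; high error on both points to underfitting.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

# simple train/test split (illustrative 40/20 split)
idx = rng.permutation(x.size)
train, test = idx[:40], idx[40:]

def fit_poly(deg):
    """Least-squares polynomial fit on the training set only."""
    return np.poly1d(np.polyfit(x[train], y[train], deg))

for deg in (1, 4, 12):
    model = fit_poly(deg)
    train_mse = np.mean((model(x[train]) - y[train]) ** 2)
    test_mse = np.mean((model(x[test]) - y[test]) ** 2)
    print(f"degree {deg:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Typically the degree-1 model underfits (both errors high), the degree-12 model overfits (training error tiny, test error large), and something in between hits the sweet spot.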
Automatic feature selection and disentangling is an amazing advancement that arrived about 7 years ago with the deep learning papers. Watch the lectures on Restricted Boltzmann Machines by Geoffrey Hinton and Andrew Ng for this. It's what allowed Google to achieve the best speech recognition and image recognition results ever recorded.