I took a course in undergrad, and was exposed to it in grad school again, and for the life of me I still don't understand the derivations either Galerkin or variational.
I learned from the structural engineering perspective. What are you struggling with? In my mind I have this logic flow: 1. strong form pde; 2. weak form; 3. discretized weak form; 4. compute integrals (numerically) over each element; 5. assemble the linear system; 6. solve the linear system.
Luckily the integrals of step 4 are already worked out in text books and research papers for all the problems people commonly use FEA for so you can almost always skip 1. 2. and 3.