It sounds like you do know how to optimize. Your important metric is just different. You're optimizing for your time rather than the computer's because that's by far the more valuable resource in your set of constraints.
Mostly unrelated: When I write heavily optimized code I prefer to write the stupidest, simplest thing that could possibly work first, even if I know it's too slow for the intended purpose. I then leave the unoptimized version in the code base, for two reasons:
- It serves as a form of documentation for what the optimized stuff is supposed to do. I find this most beneficial when the primitives being used in the optimized code don't map well to the overarching flow of ideas (like how _mm256_maddubs_epi16, nominally an unsigned-by-signed byte multiply with a saturating pairwise add, is just a vectorized 8-bit unsigned multiply if some preconditions on the inputs hold). The unoptimized code follows the broader brush strokes of the fast implementation.
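- More importantly, you can drop it into your test suite as an oracle to check that the optimized code actually behaves like it's supposed to on a wide variety of test cases. Being able to check any (small enough) input against a known-good answer means you can run randomized or exhaustive tests instead of relying on a handful of hand-written cases, which does a lot for robustness.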
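
To make the oracle idea concrete, here's a minimal sketch in Python. Everything in it is made up for illustration: popcount is just a stand-in for whatever you're actually optimizing, and the names (popcount_reference, popcount_fast, find_mismatches, oracle_inputs) are hypothetical. The slow version is the "stupidest thing that works"; the fast one is the usual SWAR bit trick, which is hard to read without the slow one sitting next to it.

```python
import random


def popcount_reference(x: int) -> int:
    """Slow, obvious version: look at each of the 32 bits in turn."""
    count = 0
    for bit in range(32):
        if (x >> bit) & 1:
            count += 1
    return count


def popcount_fast(x: int) -> int:
    """SWAR popcount for values in [0, 2**32). Illustrative stand-in for 'the optimized code'."""
    x = x - ((x >> 1) & 0x55555555)                  # 2-bit partial sums
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)   # 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0F                  # 8-bit partial sums
    # Python ints are unbounded, so mask to 32 bits before taking the top byte.
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24


def find_mismatches(fast, reference, inputs):
    """Return every input where the fast version disagrees with the oracle."""
    return [x for x in inputs if fast(x) != reference(x)]


def oracle_inputs(seed: int = 0, samples: int = 10_000):
    """Every 16-bit value (exhaustive), then seeded random 32-bit values."""
    rng = random.Random(seed)
    yield from range(1 << 16)
    for _ in range(samples):
        yield rng.getrandbits(32)


if __name__ == "__main__":
    bad = find_mismatches(popcount_fast, popcount_reference, oracle_inputs())
    print("mismatches:", len(bad))
```

The fixed seed is a deliberate choice: if the fast path ever regresses, the failing inputs are reproducible, and the returned list tells you exactly which values to paste into a debugger.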