It depends, if the optimization is too hardware-dependent it might hurt/regress performance on other platforms. One would have to find ways to generalize and auto-tune it based on known features of the local hardware architecture.
Yes, easiest is to separate it into a set of options. Then have a bunch of Json/yaml files, one for each hw configuration. From there, the community can fiddle with the settings and share new settings if new hardware is released.