Does mixture-of-experts work well the other way around, as a way to minimize power consumption and hardware cost for ordinary-sized problems?
And would it work in low-precision networks, like BinaryConnect?