Unless this is some non-standard variant, ZRTP only negotiates the keys used to encrypt the audio packets (the 'S' in 'SRTP'). Neither of those protocols has anything to do with codec selection, which is done via SDP sent over SIP or some other signaling protocol.
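To make the SDP part concrete, here is a rough sketch of pulling the codec list out of a session description. The SDP fragment is made up for illustration and `parse_rtpmap` is a hypothetical helper, not part of any SIP library; a real stack would also handle static payload types that have no `a=rtpmap` line, multiple media sections, and so on.

```python
# Hypothetical SDP audio section, invented for illustration only.
SDP_OFFER = """\
m=audio 49170 RTP/AVP 0 8 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
"""


def parse_rtpmap(sdp: str) -> dict[int, str]:
    """Map each payload type number to its 'encoding/clock-rate' string."""
    mapping = {}
    for line in sdp.splitlines():
        if line.startswith("a=rtpmap:"):
            # Line looks like: a=rtpmap:<payload type> <encoding>/<clock rate>
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            mapping[int(pt)] = encoding.strip()
    return mapping


negotiated = parse_rtpmap(SDP_OFFER)
print(negotiated)
```

Nothing here touches ZRTP or SRTP: the codec table is settled in signaling before any media (encrypted or not) flows.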
Sorry. I should just shut up about things I don't know much about. I thought the RTP part did the negotiation, since it specifies a "payload type" field, and I remembered the ZRTP config in Jitsi where you can specify codecs, so I jumped to conclusions.
The payload type field mostly just lets you do things like send events (such as DTMF tones) over the same RTP session: you mark those packets with a different payload type, and the other end interprets them differently instead of treating them as part of your main audio stream. Either way, though, all the payload types you should expect to see on the channel should be negotiated beforehand, using another protocol.
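A minimal sketch of what the receiving side does with that field, assuming a plain (unencrypted) RTP packet with the standard 12-byte fixed header. The `NEGOTIATED` table and the sample packets are hypothetical; in practice the table would come from whatever was agreed in signaling.

```python
import struct

# Hypothetical mapping; a real one is built from the negotiated SDP.
NEGOTIATED = {0: "PCMU/8000", 101: "telephone-event/8000"}


def payload_type(packet: bytes) -> int:
    """Return the 7-bit payload type from an RTP fixed header."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP header")
    first, second, _seq, _ts, _ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    # The top bit of the second byte is the marker bit; the low 7 bits are the PT.
    return second & 0x7F


def classify(packet: bytes) -> str:
    """Look the payload type up in the table agreed beforehand."""
    return NEGOTIATED.get(payload_type(packet), "unknown (not negotiated)")


# Two made-up packets: one audio frame, one DTMF event with the marker bit set.
audio = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234) + b"\x00" * 160
dtmf = struct.pack("!BBHII", 0x80, 0x80 | 101, 2, 160, 0x1234) + b"\x01\x0a\x00\xa0"

print(classify(audio))
print(classify(dtmf))
```

The header only carries a number; what that number means is entirely down to the table both ends agreed on earlier.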
But no worries... there are a ton of moving parts in these protocols, and even though I've been working with them for a while, I still tend to forget details here and there, too.