It’s an interesting idea for sure. Some drawbacks I can think of:
- bigger resource usage. You will need to maintain a dump of the TLS session AND an easily extractable version
- difficulty of verification. OpenSSL / BoringSSL / etc. will all evolve and, say, completely remove support for TLS versions, ciphers, TLS extensions… This might make many dumps unreadable in the future, or require the exact same version of a given piece of software to read them. Perhaps adding the decoding binary to the dump would help, but then you’d run into backward-compatibility issues on Linux.
- compression issues: new compression algorithms will be discovered and could reduce storage usage. You won’t be able to benefit from them, since encrypted TLS streams will look random to the compression software.
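To illustrate that last bullet: a general-purpose compressor shrinks structured plaintext easily, but gains nothing on ciphertext-like bytes. This is just a sketch, with `os.urandom` standing in for a real TLS record stream:

```python
import os
import zlib

# Structured plaintext (e.g. HTTP traffic before encryption) compresses well.
plaintext = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n" * 100

# Ciphertext from a modern cipher is indistinguishable from random bytes,
# so a compressor can't find any redundancy to exploit.
# (os.urandom is a stand-in for an actual encrypted TLS stream.)
ciphertext_like = os.urandom(len(plaintext))

print(len(plaintext), len(zlib.compress(plaintext)))        # much smaller
print(len(ciphertext_like), len(zlib.compress(ciphertext_like)))  # no smaller
```

So keeping only the encrypted dump means giving up most future storage savings.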
I don’t know. I feel like it’s a bit overkill — what are the incentives for tampering with this kind of data?
Maybe a simpler way of going about it would be to build a separate system that does the “certification” after the data is dumped; combined with multiple orgs actually dumping the data (reproducibility), this should be enough to prove that a dataset is really what it claims to be.
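A minimal sketch of what that post-hoc certification could look like: each org publishes a digest of its finished dump, and matching digests from independent crawls serve as the certificate. The `dump_digest` helper and the sample byte strings are hypothetical, just to show the shape of the idea:

```python
import hashlib

def dump_digest(dump_bytes: bytes) -> str:
    """Content digest of a finished dump; independent orgs that
    archived the same data can compare these values."""
    return hashlib.sha256(dump_bytes).hexdigest()

# Stand-ins for the same dataset as archived by two different orgs.
dump_a = b"example dataset bytes"
dump_b = b"example dataset bytes"

# Agreement between independent dumps is the certification.
assert dump_digest(dump_a) == dump_digest(dump_b)
```

In practice each org would sign and publish its digest, but the core check is just this comparison.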