Semantic Validation in Rust

Dowwie · on Sept 4, 2019

I've been really happy with Keats's validator library [1]. semval doesn't seem as user-friendly as validator, but maybe it's trying to solve a different problem?

[1] https://github.com/Keats/validator

maxdeviant · on Sept 5, 2019

The problem I have with this style of validation is that it doesn't "make illegal states unrepresentable". If the validation is not run then it is still possible for instances of `ContactData` that violate the business rules to enter the system.

  > Our reservation business requires that contact data entities are only accepted if all of the following conditions are satisfied:
  >
  > - The e-mail address is valid
  > - The phone number is valid
  > - Either e-mail address, or phone number, or both are present

In this case, all of the above rules can be encoded into the type system itself. Here's how I would approach this particular problem:

  struct PhoneNumber(String);
  
  impl PhoneNumber {
      fn new(value: String) -> Result<Self, ValidationError> {
          // Borrow for the check so `value` can still be moved into `Self` below.
          if is_valid_phone_number(&value) {
              Ok(Self(value))
          } else {
              Err(ValidationError)
          }
      }
  }
  
  struct EmailAddress(String);
  
  impl EmailAddress {
      fn new(value: String) -> Result<Self, ValidationError> {
          // Borrow for the check so `value` can still be moved into `Self` below.
          if is_valid_email_address(&value) {
              Ok(Self(value))
          } else {
              Err(ValidationError)
          }
      }
  }
  
  enum ContactData {
      PhoneOnly(PhoneNumber),
      EmailOnly(EmailAddress),
      PhoneAndEmail {
          phone: PhoneNumber,
          email: EmailAddress,
      }
  }

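This approach, when applied alongside Rust's modules and visibility modifiers (keeping the wrapped `String` fields private so that `new` is the only way to construct a value from outside the module), would make it impossible for any of these types to be in an invalid state.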
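To make the visibility part concrete, here is a minimal sketch of how that could look. The module name `contact`, the unit struct `ValidationError`, the `as_str` accessor, and both validity checks are placeholders I made up for illustration; real phone and e-mail validation would be much stricter.

```rust
mod contact {
    /// Hypothetical error type; the snippet above leaves `ValidationError` undefined.
    #[derive(Debug, PartialEq)]
    pub struct ValidationError;

    /// The inner `String` is private, so code outside this module cannot
    /// write `PhoneNumber(raw)` and skip the check in `new`.
    #[derive(Debug, PartialEq)]
    pub struct PhoneNumber(String);

    impl PhoneNumber {
        pub fn new(value: String) -> Result<Self, ValidationError> {
            if is_valid_phone_number(&value) {
                Ok(Self(value))
            } else {
                Err(ValidationError)
            }
        }

        /// Read-only access to the validated value.
        pub fn as_str(&self) -> &str {
            &self.0
        }
    }

    #[derive(Debug, PartialEq)]
    pub struct EmailAddress(String);

    impl EmailAddress {
        pub fn new(value: String) -> Result<Self, ValidationError> {
            if is_valid_email_address(&value) {
                Ok(Self(value))
            } else {
                Err(ValidationError)
            }
        }

        pub fn as_str(&self) -> &str {
            &self.0
        }
    }

    /// Every variant holds at least one validated value, so "neither phone
    /// nor e-mail" has no representation at all.
    #[derive(Debug, PartialEq)]
    pub enum ContactData {
        PhoneOnly(PhoneNumber),
        EmailOnly(EmailAddress),
        PhoneAndEmail { phone: PhoneNumber, email: EmailAddress },
    }

    // Placeholder rule (assumption): optional leading '+', then 7 to 15 ASCII digits.
    fn is_valid_phone_number(value: &str) -> bool {
        let digits = value.strip_prefix('+').unwrap_or(value);
        (7..=15).contains(&digits.len()) && digits.chars().all(|c| c.is_ascii_digit())
    }

    // Placeholder rule (assumption): exactly one '@' with non-empty text on both sides.
    fn is_valid_email_address(value: &str) -> bool {
        match value.split_once('@') {
            Some((local, domain)) => {
                !local.is_empty() && !domain.is_empty() && !domain.contains('@')
            }
            None => false,
        }
    }
}
```

Outside of `contact`, the only way to get a `ContactData` is to go through `PhoneNumber::new` or `EmailAddress::new` first, so a caller has to handle the `Err` case before a value can enter the system.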

nerdponx · on Sept 4, 2019

Marshmallow in Python is a bit of an odd tool.

It does a great job of validating and deserializing inputs to Python objects, which is what it's meant to do.

But this means that now you have to define your "schema" twice -- once when defining the data structure where the data will eventually reside, and once to define the validator. There are a few ways around this, but there's nothing well-established yet. The default experience is still implementing everything twice.

ORMs handle the problem by automatically generating a database schema from your class definitions in code. Maybe we should be generating input validation schema from our class definitions in code as well. But then composability becomes a problem; the ORMs in Python tend to "take over" any class defining a model, and it's not obvious how you'd write a class that's both a Django/SQLAlchemy model and a Marshmallow schema, without some really verbose and messy code.

Maybe this is a Python-specific problem, and in Rust people just don't write classes very often. But I think it goes beyond the specific problem of writing class definitions, and more generally into the problem of how to ergonomically define data structures at compile time while defining validation for those data structures to be applied at run time. Edit: and perhaps even more generally, defining a single interface for the same logical data structure that can be applied across multiple systems.
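For what it's worth, here is a rough sketch of what "define the structure once, attach run-time validation to it" might look like in Rust. The `Validate` trait, `ContactForm`, and `ContactInvalidity` are all hypothetical names for this example and are not the API of semval or validator; the checks themselves are deliberately simplistic placeholders.

```rust
/// Hypothetical trait for this sketch; not the API of semval or validator.
pub trait Validate {
    type Error;

    /// Collects every rule violation instead of stopping at the first one.
    fn validate(&self) -> Result<(), Vec<Self::Error>>;
}

#[derive(Debug, PartialEq)]
pub enum ContactInvalidity {
    Phone,
    Email,
    NothingProvided,
}

/// The data structure is defined once, as an ordinary struct.
pub struct ContactForm {
    pub phone: Option<String>,
    pub email: Option<String>,
}

/// The run-time rules are attached to that same type rather than to a
/// second, parallel schema definition.
impl Validate for ContactForm {
    type Error = ContactInvalidity;

    fn validate(&self) -> Result<(), Vec<ContactInvalidity>> {
        let mut errors = Vec::new();

        // Placeholder rule (assumption): non-empty, only ASCII digits or '+'.
        if let Some(phone) = &self.phone {
            let ok = !phone.is_empty()
                && phone.chars().all(|c| c.is_ascii_digit() || c == '+');
            if !ok {
                errors.push(ContactInvalidity::Phone);
            }
        }

        // Placeholder rule (assumption): must contain an '@'.
        if let Some(email) = &self.email {
            if !email.contains('@') {
                errors.push(ContactInvalidity::Email);
            }
        }

        if self.phone.is_none() && self.email.is_none() {
            errors.push(ContactInvalidity::NothingProvided);
        }

        if errors.is_empty() {
            Ok(())
        } else {
            Err(errors)
        }
    }
}
```

This avoids a second schema, but nothing forces a caller to invoke `validate`, so an unchecked `ContactForm` can still travel through the program.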

I'd be interested to see how semval and validator (mentioned in another top-level comment) handle this situation, if at all.

algorithmsRcool · on Sept 4, 2019

The only real features that I see on display here are Option and sum types, which are great but far from unique to Rust.

F# and its ML ancestors are great for this also.