The thing I like the most about the "as types" approach is that once the thing is wrapped, passing it around has zero boilerplate.
In the program below, the only places that suffer from having created a newtype for the strings are where they're created (parseArgs) and where they're consumed (copyFile). None of the other functions (main, helper1, helper2) ever need to peek inside the type, so there's no boilerplate but still all the type safety.
import System.Environment (getArgs)

newtype Source = Source FilePath
newtype Dest = Dest FilePath

-- Boilerplate at usage
copyFile :: Source -> Dest -> IO ()
copyFile (Source from) (Dest to) = ...

-- Boilerplate at creation
parseArgs :: IO (Source, Dest)
parseArgs = do
  [arg1, arg2] <- getArgs
  return (Source arg1, Dest arg2)
-- Helpers with NO boilerplate
helper1 :: Source -> Dest -> IO ()
helper1 from to = copyFile from to
helper2 :: Source -> Dest -> IO ()
helper2 from to = helper1 from to
main :: IO ()
main = do
  (from, to) <- parseArgs
  helper2 from to
And by "all the type safety," I mean it's a type error to accidentally swap the args incorrectly to the function call:
import sys

class Source(str):
    pass

class Dest(str):
    pass

def helper1(a, b):
    assert isinstance(a, Source)
    assert isinstance(b, Dest)
    copyFile(a, b)

if __name__ == "__main__":
    # obvs use argparse instead and store
    # directly to Source & Dest objects
    # but here to be quick:
    a, b = sys.argv[1:3]
    a, b = Source(a), Dest(b)
    helper1(a, b)
You’re assuming I would endorse this as a good approach in Python in the real world.
I was merely trying to show how to quickly emulate the Haskell example with less code in Python. Both the Haskell approach and asserting on types in Python seem like wastes of time to me personally, and I’d rather just have copyFile by itself and consenting adults can pass the args they want to pass. Use unit tests to make sure you’re not passing paths the wrong way.
Also, what’s with the attitude? “You know how to complicate something very simple” (from ~15 lines of Python?) — geez, you must be a joy to work with.
It addresses exactly the cases you care about, whereas trying to bake it into the design or the type system usually adds heavy restrictions that don't necessarily have anything to do with your exact use cases.
Basically from a You Aren’t Gonna Need It point of view, doing it in tests lets you minimally address the real use case with much less risk of premature or incorrect abstraction.
In my experience, premature abstraction is one of the worst problems in business software. To contrast the two far ends of the spectrum: minimalistic patchworks of gross hacks are asymmetrically better than the overhead of premature or incorrect abstractions.
At some point you will have to make a human decision about which file to copy. The only thing helping you out in that situation is that the call site clearly states which argument is the source and which is the destination, instead of you having to remember whether it's the left or the right one.
Sure, I mean ultimately you could shoot cosmic rays at your computer to rearrange bits and bypass any type safety you want. I don’t get why you think this bears relevance?
I'm not sure how @Too's example is any worse than named parameters - surely the point is that it's just as likely to transpose source/destination either way.
I think the point is that it’s irrelevant, because type checking or type assertions would take place at the parameter ingestion step specifically to render transposing the arguments impossible.
Any solution could be screwed up on purpose if you’re imagining someone at an interpreter prompt getting the arguments wrong or omitting unit tests that verify a certain file is used as source and a different file as dest... which is why this kind of objection just totally doesn’t matter.
But it's the "wrong" solution for python. Python is heavily focused around the idea of duck-typing, and forcing things like this completely blocks duck-typing. What if I want to pass down a complex object that can be used as a string? Python functions should let me pass any string-like thing I want, which is why named arguments are the preferred way to deal with this. You can even force named arguments using kwargs, I believe this is what the boto interface to AWS does.
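The keyword-only approach mentioned above can be sketched in a few lines: a bare `*` in the parameter list forces callers to name every argument, while still accepting any string-like object. The function name and body here are hypothetical stand-ins, not from the original example.

```python
def copy_file(*, source, dest):
    # Everything after the bare * must be passed by keyword, so a
    # positional call like copy_file("a", "b") raises TypeError.
    # Any string-like value is still accepted (duck typing intact).
    return f"copying {source} -> {dest}"  # stand-in for real I/O

result = copy_file(source="a.txt", dest="b.txt")
# copy_file("a.txt", "b.txt")  # TypeError: takes 0 positional arguments
```

This doesn't stop `copy_file(source=b, dest=a)` with swapped values, but it does make every call site self-documenting.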
Your argument doesn’t work. Python affords you tools like metaclasses, __instancecheck__, __subclasshook__, properties and descriptors exactly so you can use the type system to determine how duck typing is handled in whatever cases you want.
Using inheritance and metaprogramming to enforce how and when a given instance or class object satisfies the contract of some interface is a core, fundamental, first-class aspect of Python.
Just because you don’t need to use it very often doesn’t mean “it’s the wrong choice for Python” or anything like that at all.
It is a thing Python goes way out of its way to enthusiastically support, so therefore it is absolutely a Pythonic way to solve problems.
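As a minimal sketch of the `__instancecheck__` point: a metaclass can decide what counts as an instance, so `isinstance` assertions need not block duck typing. The acceptance rule below (plain strings or anything with `__fspath__`) is illustrative, not a recommendation.

```python
import pathlib

class StringLikeMeta(type):
    # The class itself decides what passes an isinstance() check:
    # here, any str or any path-like object (has __fspath__).
    def __instancecheck__(cls, obj):
        return isinstance(obj, str) or hasattr(obj, "__fspath__")

class Source(str, metaclass=StringLikeMeta):
    pass

assert isinstance("plain string", Source)          # a str passes
assert isinstance(pathlib.Path("x.txt"), Source)   # a Path passes too
assert not isinstance(42, Source)                  # an int does not
```

So the duck-typing behavior is itself programmable; the type author chooses how strict or loose the contract is.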
In static typing paradigms, the idea most certainly is to make it foolproof and not merely easy to spot. Literally to convert the problem into something the compiler can prove is unbroken.
Not saying this makes static typing better or anything, just pointing out that “easy to spot in code review” is massively different from “absolute mathematical proof this problem isn’t affecting me.”
I’d argue you rarely care about such compiler proofs in real software development, but that’s beside the point if you assume a static typing paradigm has already been chosen.
Not necessarily, you can use coerce or Newtype instances, so you'd import one thing to deal with all the boilerplate wrapper types in all the modules they come from. This is specific to Haskell, but you could mimic it in other languages.
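A rough Python analogue of this zero-cost wrapping is `typing.NewType`: static checkers treat Source and Dest as distinct, but at runtime the values are plain strings, so there is never anything to unwrap, much as coerce erases a newtype.

```python
from typing import NewType

Source = NewType("Source", str)
Dest = NewType("Dest", str)

src = Source("a.txt")
# At runtime a NewType value IS the underlying str -- no wrapper
# object exists, so no unwrapping boilerplate is ever needed.
assert type(src) is str
assert src == "a.txt"
```

A type checker like mypy will flag a call that passes a Dest where a Source is expected, even though the program pays no runtime cost.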
> Types are good but having 10 types of string doesn't make sense.
It really does make sense. Types like string, int, and float are poorly defined for most of the things they represent. They carry no context and are just one step away from being dynamic, like values in a dynamic language.
By enforcing a contextual contract (perhaps with additional constructors that constrain the value) you leverage the type-system to do the hard work of checking for stupid mistakes. On large code-bases this reduction in cognitive complexity makes a whole class of bugs go away.
So, 100s of types of string could make sense; it really does depend on how many kinds of thing you have which are distinct and can be represented as a string.
But once they're defined, they're not strings any more, they have an internal state which is a string, yes, but their type is not a string.
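The "contextual contract with a constraining constructor" idea can be sketched as a smart constructor: validation lives in `__new__`, so every value of the type that exists in the program satisfies the contract. The type name and the `@` check are hypothetical simplifications.

```python
class EmailAddress(str):
    # A "smart constructor": reject values that violate the type's
    # contract, so any EmailAddress in the program is known-valid.
    def __new__(cls, value):
        if "@" not in value:
            raise ValueError(f"not an email address: {value!r}")
        return super().__new__(cls, value)

addr = EmailAddress("user@example.com")  # ok
# EmailAddress("no-at-sign")             # raises ValueError
```

Its internal state is a string, and it can still be used where a string is expected, but the type carries a guarantee a bare str never could.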
The hard part - at least in languages I've worked with - is how hard it is to create a new string type that acts like a string should act but isn't a string. In many (e.g. with C's typedef) you can create two new types, but nothing stops you from mixing them up, as they freely convert to each other without even a warning. In some object-oriented languages you can create a new string type - but only by writing every single member function by hand.
Of course none of the above are unsolvable problems. I've never seen anything that actually makes it easy, but I won't claim to have seen everything.
I want to agree, but the problem is that you now need 10 versions of many operations that are in a string library. Hidden assumptions -- like 'replacing a character in a string can never fail' -- may fail (e.g. in an email address type which allows only a limited char set). Some constraints can't be expressed without dependent types (say you want to define a URI type that limits the total length to 2000 chars but is composed of variable-length components, like the domain name, query strings, etc.).
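The "replace can fail" point can be made concrete: if each string operation re-wraps its result through the validating constructor, an operation that is total on str becomes partial on the constrained type. The Email type and its allowed character set below are hypothetical.

```python
class Email(str):
    ALLOWED = set("abcdefghijklmnopqrstuvwxyz@.")

    def __new__(cls, value):
        if not set(value) <= cls.ALLOWED:
            raise ValueError(f"illegal characters in {value!r}")
        return super().__new__(cls, value)

    def replace(self, old, new):
        # str.replace itself can't fail, but re-wrapping the result
        # can: the replacement may introduce a forbidden character.
        return Email(super().replace(old, new))

ok = Email("a@b.com").replace("a", "x")   # still a valid Email
try:
    Email("a@b.com").replace("a", "!")    # '!' breaks the contract
except ValueError:
    pass
```

So the constrained type forces you to confront failure modes that the bare string API silently papers over.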
> but the problem is that you now need 10 versions for many operations that are in a string library
Do you? Or do you need just the operations for the things that are relevant to the type at hand? Limiting a broad API to the needed interface for the type you're modeling is a GOOD thing.
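One way to get that narrowed interface is composition instead of inheritance: the wrapper holds the string privately and exposes only the operations that make sense for the type, rather than inheriting all of str's methods. The type and its one domain operation here are hypothetical.

```python
class CustomerId:
    # Composition, not inheritance: callers see only the operations
    # that make sense for a customer id, not every str method.
    def __init__(self, value: str):
        self._value = value

    def __eq__(self, other):
        return isinstance(other, CustomerId) and self._value == other._value

    def __hash__(self):
        return hash(("CustomerId", self._value))

    def masked(self) -> str:
        # The one domain operation we actually need.
        return "***" + self._value[-4:]

cid = CustomerId("CUST-12345")
assert cid.masked() == "***2345"
```

Concatenating, upper-casing, or slicing a CustomerId is now a type error rather than a silently meaningless operation.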