Weak Data Typing is Weak

When I’m teaching PHP or Perl, a conversation about loose data typing invariably springs up early in class. The format is almost always the same: a student with a moderate amount of programming experience in a strongly typed language (generally C or C++) is elated to find they can just stuff anything into a variable. When I opine that this is a terrible and dangerous flaw in these languages, the argument proceeds along the lines of “it’s so much easier” or how “archaic” my point of view is.

Listen, weak data typing is not easier; it’s sloppier, which is not the same thing. In fact, that sloppiness often makes things harder! This weekend, for instance, I whipped together a quick little script for a friend that would parse an automatically generated XML file and make a series of corrections for him. Midstream, however, I was stymied by a troubling little bug: one of the fields would occasionally cause a crash.

As it turns out, XML::Simple was reading a blank tag (which the program generating the XML erroneously output as <tag></tag> rather than <tag />, but I digress) and, rather than storing it as an empty string, was creating a new, empty hash.

Of course, it was stupid of the original app to output the XML in that format, and yes, XML::Simple was being idiotic for not handling that edge case. But these are exactly the sorts of real-world situations that strong data typing would have prevented (or, at the very least, exposed earlier).
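To make the failure concrete, here is a minimal Python sketch (not the Perl from my script) that reproduces the same trap. Both helpers are hypothetical: `simple_to_dict` is a much-simplified stand-in for XML::Simple's default of turning an empty element into an empty hash (an empty dict here), and `as_text` is the kind of edge-case guard the rest of the code ends up needing. In XML::Simple itself, the `SuppressEmpty` option (set to `''` or `undef`) changes how empty elements are represented.

```python
import xml.etree.ElementTree as ET


def simple_to_dict(elem):
    """Hypothetical, much-simplified XML::Simple-style converter.

    Ignores attributes and repeated elements. A childless element with
    text becomes a string; a childless element with no text becomes an
    empty dict, mimicking XML::Simple's default for empty elements.
    """
    children = list(elem)
    if not children:
        # <note></note> and <note/> both parse with elem.text set to None
        return elem.text if elem.text else {}
    return {child.tag: simple_to_dict(child) for child in children}


def as_text(value):
    """Coerce a field that is supposed to be a string back into one."""
    if isinstance(value, dict):
        if value:
            # A non-empty dict is real nested structure, not a blank field
            raise TypeError(f"expected text, got nested element: {value!r}")
        return ""  # treat the empty-element dict as an empty string
    return value


record = simple_to_dict(
    ET.fromstring("<record><name>Widget</name><note></note></record>")
)

# record["note"] is a dict, so calling a string method on it fails at runtime
try:
    record["note"].upper()
except AttributeError as err:
    print("crash:", err)

# Normalizing the field at the boundary restores the expected type
print(repr(as_text(record["note"])))
print(as_text(record["name"]).upper())
```

The guard only helps if it is applied to every field that might come back blank, which is precisely the kind of edge-case hunting that loose typing pushes onto the programmer.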

I spend an enormous amount of time looking at other people’s code, whether for my ‘day job’, for a consulting contract, on open source projects, or in examining the work of my students, and one theme runs through all of it: loose data typing creates gotchas that are often terribly difficult to track down.

So instead of taking on the burden of explicitly casting your data from type to type, you take on the burden of hunting down edge cases before they bite. To my mind, that is not easier.