Wednesday, April 29, 2015

Refactoring By Any Other Name

I like philosophy.  Specifically, I like semantics and ontology.  I like thinking about naming things, and naming them distinctively.

"There are only two hard things in computer science: cache invalidation, and naming things." - Phil Karlton[1]

Normally, "naming things" in this quote refers to naming your variables and namespaces, but we can imagine it applying equally to programming concepts themselves.

Refactoring, as defined by Martin Fowler et al.[2], is the process of improving a code base through incremental changes to the source code.

I don't like terms whose definitions require subjective judgment (who decides what counts as an "improvement"?), but I'll be a nice guy and let Fowler have the word 'refactoring'.

Today I'm going to talk about the more objective definition of refactoring: changing code without changing its behavior.  I'm even going to give it a new name, so that we have some common (and hopefully sane) vocabulary for discussing changes to code and the effects of those changes on the code's behavior.

So what are we talking about anyway?  We're talking about changing code.  We're talking about a transformation of the source code of a file from one state to a different state.

Definition time:

    Def.
      Transformation:
        Some change or alteration to the source code.

As you can see, this is not limited to changes which result in working/compilable code.  Adding a comment is a transformation.  Fixing a bug (usually) requires one or more transformations.
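Here is a minimal sketch of the definition. It assumes we model a transformation as a Ruby lambda that takes source text and returns new source text; that framing is mine, not a standard API. `add_comment` and `drop_last_line` are hypothetical names. The second one deliberately produces code that no longer parses, and it still counts as a transformation.

```ruby
# Hypothetical model: a transformation is any function from source text
# to source text. Nothing requires the result to parse, let alone run.
add_comment    = ->(src) { "# Returns x squared.\n" + src }
drop_last_line = ->(src) { src.lines[0...-1].join }

original = "def square(x)\n  x * x\nend\n"

with_comment = add_comment.call(original)    # still valid Ruby
truncated    = drop_last_line.call(original) # missing its `end`: no longer parses
```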

Okay, so we know what 'change' means.  But there's another word in the objective definition of refactoring: "behavior".  Sure, yeah, we all *know* what 'behavior' is.  It's what the code does!

Here is a more formal definition:

    Def.
      Behavior:
        The mapping from the state of the program at one point of computation to its state at another point of computation.

A "point of computation" can usually be thought of as a particular line of code, if you're programming in a structured fashion. In fact, most of the time we will take the two points to be the first and last lines of a function/method definition.  As a matter of convenience, I argue that this should be the default.  So in a colloquial sense, behavior is the mapping of your program's state from before a function/method runs to after that function/method has run.
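To make 'behavior' concrete, here is a hypothetical example in which the only program state is an account balance. `Account` and `deposit_behavior` are illustrative names. For a few starting balances, the helper records the state before `deposit` runs and the state after it finishes. That is the mapping the definition describes, although here it is sampled rather than exhaustive.

```ruby
# Hypothetical example: the only program state is an account balance.
class Account
  attr_reader :balance

  def initialize(balance)
    @balance = balance
  end

  def deposit(amount)
    @balance += amount
  end
end

# Records the behavior of #deposit between its default points of
# computation (first and last line) as a Hash: balance before => balance after.
def deposit_behavior(starting_balances, amount)
  starting_balances.map do |before|
    account = Account.new(before)
    account.deposit(amount)
    [before, account.balance]
  end.to_h
end

deposit_behavior([0, 10, 25], 5) # => { 0 => 5, 10 => 15, 25 => 30 }
```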

But do note that "behavior" must be discussed with respect to two points of computation.  So the definition "what the program does" is not enough information to convey 'behavior'.
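This hypothetical sketch shows why the choice of points matters. Swapping the order of a helper's return value changes the helper's own behavior. However, the behavior measured between the start and end of a caller that sorts the result does not change. `pair_v1`, `pair_v2`, and the `sorted_pair_*` methods are illustrative names only.

```ruby
# Version 1 of a helper returns the pair in argument order.
def pair_v1(a, b)
  [a, b]
end

# Version 2 returns it reversed: a different mapping between the first
# and last lines of the helper itself.
def pair_v2(a, b)
  [b, a]
end

# Callers that only use the sorted pair. Between their own first and
# last lines, both produce the same mapping from arguments to result.
def sorted_pair_v1(a, b)
  pair_v1(a, b).sort
end

def sorted_pair_v2(a, b)
  pair_v2(a, b).sort
end
```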

So now that we have common ground for transformations and behaviors, we can start to classify transformations by their effects on behavior.

The first will be the Behavior-Maintaining Transformation, or BMT.

    Def.
      Behavior-Maintaining Transformation (BMT):
        A change in the source code such that the behavior (mapping of program state between two points) is maintained for all execution paths between those two points.

An example of a BMT is adding a comment (unless there's something gravely wrong with your compiler/interpreter).
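Here is a hedged sketch of checking a candidate BMT. `area_v2` is `area_v1` plus a comment, and the hypothetical helper `same_behavior?` compares the two on sample inputs. Note the asymmetry. Agreement on samples can only suggest a BMT, because the definition quantifies over all execution paths. A single mismatch, on the other hand, is enough to show a BCT.

```ruby
def area_v1(w, h)
  w * h
end

# Identical to area_v1 except for the added comment.
def area_v2(w, h)
  # area of a w-by-h rectangle
  w * h
end

# Hypothetical helper: true if a and b return equal results for every
# argument list in inputs. This samples behavior; it cannot prove a BMT.
def same_behavior?(a, b, inputs)
  inputs.all? { |args| a.call(*args) == b.call(*args) }
end

samples = [[0, 0], [2, 3], [-4, 5], [1.5, 2]]
same_behavior?(method(:area_v1), method(:area_v2), samples) # => true
```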

When people talk about refactoring, they're usually referring to BMTs.  However, this definition also applies to minification and linting.  I doubt many proponents of refactoring would consider minification an example of it.
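To see why minification qualifies, compare a readable method with a hand-minified copy of it (both hypothetical). Only names and whitespace differ. The mapping from argument to result between the start and end of each method is therefore the same.

```ruby
# A readable method.
def average(numbers)
  total = numbers.sum
  total.fdiv(numbers.size)
end

# The same method after hand "minification": one-letter names and no
# spare whitespace. Same arithmetic, so the same behavior.
def m(n);n.sum.fdiv(n.size);end

average([1, 2, 3, 4]) # => 2.5
```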

The other class of transformations consists of those that do change the behavior of the code between two points.  I'll call these Behavior-Changing Transformations, or BCTs.

    Def.
      Behavior-Changing Transformation (BCT):
        A change in the source code such that the behavior (mapping of program state between two points) is altered for any execution path (that is, at least one) between those two points.

Note that I said *any* execution path.  Even if your change affects only one special case, it is a BCT.  Fixing a bug is a BCT: a bug fix alters the current behavior (otherwise the bug would still be there), so it must be behavior-changing.
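Here is a hypothetical bug fix that changes only a special case: a leap-year check that forgets the divisible-by-400 rule. Fixing it is a BCT, even though across the years 1900 to 2100 the two versions disagree on exactly one input.

```ruby
# Hypothetical buggy version: forgets that years divisible by 400
# are leap years.
def leap_buggy?(year)
  year % 4 == 0 && year % 100 != 0
end

# The fix. It is a BCT, although it only changes the answer for years
# divisible by 400.
def leap_fixed?(year)
  (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
end

# Years in 1900..2100 where the two versions disagree.
changed = (1900..2100).reject { |y| leap_buggy?(y) == leap_fixed?(y) }
# => [2000]
```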

Since we live in the real world, where real work gets done, the situation is hairier.  You may introduce a BCT whose changes never affect you, because you never use the code in a way that makes the change evident.  For example, suppose you change a function so that it becomes incorrect only for integers too large to be represented with all the matter in the universe.  This can safely be considered a BMT.  But of course there are closer cases: you aren't currently affected by the change, and you could write a test that exposes the new bug, but you may never actually use the code that way.  Is this a BMT or a BCT? I'll leave the answer for the reader to ponder.  But I maintain (no pun intended) that it is an important question and should be considered.
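Here is a sketch of the 'closer case', using a hypothetical threshold in place of the astronomically large values in the paragraph above. The transformed method agrees with the original on every input a program is likely to see. Even so, an ordinary test can still reach the threshold and expose the difference.

```ruby
# The original method doubles any integer.
def double_v1(n)
  n * 2
end

# Hypothetical threshold standing in for "too large to represent with all
# the matter in the universe". Ruby can of course represent 10**80, which
# is exactly what makes this the closer case: a test can reach it.
ABSURDLY_LARGE = 10**80

# A transformed version that is wrong only above the threshold.
def double_v2(n)
  n > ABSURDLY_LARGE ? 0 : n * 2
end

# Agrees on every input the program plausibly sees...
everyday = (-1_000..1_000).all? { |n| double_v1(n) == double_v2(n) }
# ...but a deliberate test still exposes the difference.
exposed = double_v1(ABSURDLY_LARGE + 1) != double_v2(ABSURDLY_LARGE + 1)
```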



So who cares?  What can we do with these newfound definitions?  I'd say they're useful in the following sense.  If you know all of the transformations in your programming language that will not change code behavior, then you can freely alter your source code into a form that is more palatable.  We already know a few of these transformations: extract method, replace literal with variable, and so on.  Almost every refactoring is a BMT (one that is not a BMT is extract class, since it creates a new class you can refer to elsewhere in your code).
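Here is a hypothetical illustration of extract method as a BMT. The tax computation below is pulled out into its own method, `tax_for`. Measured between the first and last lines of the total-price method, the mapping from arguments to result does not change. The names and the tax example are mine, not from the original.

```ruby
# Before: the tax computation is inline.
def total_before(prices, tax_rate)
  subtotal = prices.sum
  subtotal + subtotal * tax_rate
end

# After extract method: the tax computation has its own name.
def tax_for(amount, tax_rate)
  amount * tax_rate
end

def total_after(prices, tax_rate)
  subtotal = prices.sum
  subtotal + tax_for(subtotal, tax_rate)
end

total_after([100, 250], Rational(8, 100)) # => 378 (as a Rational)
```

Rationals are used in the usage line so the arithmetic is exact. Because both versions perform the same operations in the same order, they also agree on floats.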

Also, we can start to construct an Algebra of Programming (Transformations).  In trigonometry, we learned about identities, and how to show that one trigonometric expression is identical to another through a series of algebraic transformations, each of which is provably correct (like multiplying by sin(x)/sin(x), which is valid wherever sin(x) ≠ 0, or rewriting tan(x) as sin(x)/cos(x)).
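As an example of that style of proof (my own, not from the original), here is the identity 1 + tan²(x) = sec²(x), derived one justified step at a time. Every step, including the multiply-by-one trick, holds only where cos(x) ≠ 0. This is a reminder that even in math a transformation can be valid on only part of the domain.

```latex
\begin{aligned}
1 + \tan^2 x &= 1 + \frac{\sin^2 x}{\cos^2 x}
  && \text{rewrite } \tan x = \tfrac{\sin x}{\cos x} \\
&= \frac{\cos^2 x}{\cos^2 x} + \frac{\sin^2 x}{\cos^2 x}
  && \text{write } 1 \text{ as } \tfrac{\cos^2 x}{\cos^2 x} \\
&= \frac{\cos^2 x + \sin^2 x}{\cos^2 x}
  && \text{common denominator} \\
&= \frac{1}{\cos^2 x}
  && \sin^2 x + \cos^2 x = 1 \\
&= \sec^2 x
  && \text{valid wherever } \cos x \neq 0
\end{aligned}
```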

The only difference is that in math we were never allowed to make changes that resulted in incorrectness, whereas in programming we do it all the time, since things can work well enough without being "correct".

Here is an implementation of square root in Ruby, built up through a series of easily digestible steps (transformations).  I've labelled each step as a BCT or a BMT.  It's surprising how much work you can get done with only BMTs.  Of course, BCTs have to be written eventually; otherwise your code won't actually do anything!
https://gist.github.com/nicklink483/aefec4a7fec814300ba4
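The gist itself is not reproduced here. As a stand-in, this is my own minimal sketch of the same idea. It assumes Newton's method, which is not necessarily what the gist uses. It shows three versions of a square-root method, each labeled with the kind of transformation that produced it. `TOLERANCE`, `good_enough?`, and `improve` are hypothetical names.

```ruby
# Hypothetical relative stopping tolerance for Newton's method.
TOLERANCE = 1e-10

# Step 1 (BCT): a stub that puts the interface in place.
# It is correct only for an input of 1.0.
def sqrt_step1(_x)
  1.0
end

# Step 2 (BCT): Newton's (Heron's) method, for positive x only.
def sqrt_step2(x)
  raise ArgumentError, "x must be positive" unless x > 0

  guess = 1.0
  until (guess * guess - x).abs <= TOLERANCE * x
    guess = (guess + x / guess) / 2.0
  end
  guess
end

# Step 3 (BMT): extract method on the stopping test and the update rule.
# Same arithmetic in the same order, so the results are identical.
def good_enough?(guess, x)
  (guess * guess - x).abs <= TOLERANCE * x
end

def improve(guess, x)
  (guess + x / guess) / 2.0
end

def sqrt_step3(x)
  raise ArgumentError, "x must be positive" unless x > 0

  guess = 1.0
  guess = improve(guess, x) until good_enough?(guess, x)
  guess
end
```

Only the step from 2 to 3 is a BMT. Checking `sqrt_step2(x) == sqrt_step3(x)` on a handful of positive inputs is a quick (non-exhaustive) sanity test of that label.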

[1] https://twitter.com/timbray/status/506146595650699264
[2] http://martinfowler.com/books/refactoring.html