Thursday, November 7, 2013

Why Ruby’s Binary Search is Meh



As of Ruby 2.0.0, the Array#bsearch and Range#bsearch instance methods have been added to do binary searching within an array or a range.


While this awesome algorithm is much welcomed, Ruby has it implemented in a slightly broken manner, (in my probably silly opinion).


First, let’s look at how it works, and by works I mean works in the way expected and produces a meaningful and reasonable result.


some_array = [ 0, 1, 4, 7, 10, 15, 99, 100 ]


Let’s try to find the first element in the array that is bigger than or equal to 72.


some_array.bsearch { |i| i >= 72 }
#=> 99


Cool!  And if the algo is running the way it should, that took 3 iterations, instead of the 7 iterations a linear search might take.


Now, this only works if the array is sorted, which is a requirement of the binary search algorithm in general.  Let’s see why:


The binary search algorithm works as fast as it does because it makes assumptions.  It assumes that the values are in order, because it uses this assumption to “throw away” portions of the collection at a time.  In our example, it did something like this:


[ 0, 1, 4, 7, 10, 15, 99, 100 ]
hmm, some_array is 8 elements large. I’m gonna check the middle first.
okay, the middle has the number 10.  well, 10 is NOT greater than or equal to 72, and since everything is in order, everything below it is useless too, so my new array is… 
[ 15, 99, 100 ]
okay, this is 3 values large, so I’ll check the middle again
99 IS larger than or equal to 72, so everything bigger than it can suck it!
New array is:
[ 15 ] 
well, the only element in this one is LESS than 72, so that 99 was the right answer!


Okay, this works nicely.  But what if I set the condition to be the last guy less than 7?
That should be 4, the third element of the array.

Unfortunately, you can’t tell it to give you “the last element which…”, but only “the first element which…”


And even if you try to use this logic, you’ll get unexpected results.


Let’s try to find the first guy less than 7.  Should be 0, right?


[ 0, 1, 4, 7, 10, 15, 99, 100 ] 
hmm, some_array is 8 elements large. I’m gonna check the middle first.
okay, the middle has the number 10.  well, 10 is NOT less than 7, and since everything is in order, everything below it is useless too, so my new array is… 
[ 15, 99, 100 ]


Wait, wha…?  It still threw away the bottom half!  That’s where the 0 was...


[ 15, 99, 100 ]
okay, this is 3 values large, so I’ll check the middle again
99 is NOT less than 7, so everything bigger than it can suck it!
New array is:
[ 15 ] 
well, the only element in this one is BIGGER 7, so there is no right answer!
I’d better tell the user the answer is nil.


Okay, that’s not quite what I wanted.  Is there a better way to do this?  Should the search ‘save’ the other half and look through it if the right hand fails?  Should there be other bsearch-type methods that work a little differently (like bsearch_max or something)?

Feel free to comment below to tell me why I'm wrong or what should be done.

Cheers!

No comments:

Post a Comment