Wednesday, April 30, 2008

And you, sir? What have you done lately?

In Joel Spolsky's latest tirade, we learn that:

  1. Microsoft is an illegal monopoly
  2. Microsoft and Google are hiring people to pay foosball
  3. FolderShare is a tool for uploading and download files to the internet that nobody has ever heard of.

Simply put, to Joel everyone else does everything wrong. He, meanwhile, is milking a not-even-ASP bug-track application (he probably couldn't figure out how to write a ColdFusion version). Or wrapping a proprietary DynamicDNS-like wrapper around invented-elsewhere VNC client (and somehow avoiding the far more evolved UltraVNC variant). Or writing the worst little CMS monstrosity ever created. Joel is bitter about the cost of CS graduates and blames Google and Microsoft (curiously leaving Yahoo out of the mix) for driving it up.

The thing is, he's not entirely wrong... when a company can't figure out or invest in a stronger product, like @Task or Team Foundation, FogBugz isn't a bad compromise. When your dad needs IT support behind his random cable-modem IP, CoPilot isn't a bad tool and far cheaper to use than WebEx or GoToMyPc. And well, the best I can say about CityDesk is that my technophobe wife can figure it out and Joel isn't pushing it.

Seriously, though... what have you done lately that really is worth bragging about, Joel? It's been how many years since FogBugz came out and it still can't track my billable time or hold my requirements docs? It's been years since CityDesk saw a new release, and CoPilot was written by a few interns...over a summer. What has your (self) vaunted managerial skill produced lately. You can only rest on your Excel project manager laurels for so long.

Tuesday, April 15, 2008

SQL Server Data Services via cURL

I just noticed a really cool article on using Microsoft's new SQL Server Data Services which explains how to use cURL at the command line to talk to the SSDS RESTful interface.

What's cURL?

If you've never heard of cURL, it is similar to wget in that it allows you to make HTTP requests of any web service. It can handle all the standard verbs (GET,PUT,POST,DELETE) and also supports all those lovely redirections, security and all that other nonsense. It's great for crufting up any batch/command file that could do all sorts of things as well as to ping-test REST services.

What's SSDS

If you've never heard of SSDS, it is similar to the Amazon SimpleDB. It offers the ability to push a database out to the public cloud and allow access from web applications, thick clients or whatever. What differentiates it from Google's APE-based access to BigTable or Amazon's S3/SimpleDB setup is that both of those systems are tuple-based (or name-value-based) non-relational databases. The SSDS stuff, on the other hand, is a Linq-based. This makes querying MUCH simpler to do.

The killer feature to me is that SSDS doesn't make you (the developer) worry about consistency. With SimpleDB or BigTable, the provider only guarantees "eventual consistency". This means that the changes you make will eventually be propogated through the Amazon/Google cloud. During the time the change was post, but not yet propogate your clients may see stale data which makes these services useable mostly for rarely-changed data.

SSDS doesn't have this restriction. Once your call is complete any access will result in the commited data being returned. This is a much simpler model to program against and it puts the replication issues squarely on the database server/service where it belongs. What remains to be seen is if Microsoft will really be able to scale this out reasonably.

Thursday, April 10, 2008

You can't hold onto nothing

A while back, we were looking for an easy way to count "hits" against content in a CMS-like system. For the sake of discussion, pretend we have a table called ContentEntry that represents the content. We decided we wanted to track the hits by-hour against a particular content entry, so that's the ContentEntryPlusPlus table on the right. The foreign-key is from ContentEntryPlusPlus.ContentEntryID to ContentEntry.ID.

ContentEntry

Now the trick is to insert the row if needed for a particular entry and time-slot then increment the Hits column. The simplest thing to do is to is check to see if the row exists, insert it if not, then do the update.  Something like this to find the row's ID:

SELECT TOP 1 ID FROM dbo.ContentEntryPlusPlus WHERE ContentEntryID = @ContentEntryID AND TimeSlot = DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0))

Then we have to insert the row if missing:

INSERT INTO dbo.ContentEntryPlusPlus(ContentEntryID, TimeSlot) VALUES (@ContentEntryID, DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0)) SELECT Scope_Identity() AS ID

Then we do the update like this:

UPDATE dbo.ContentEntryPlusPlus SET Hits = Hits + 1 WHERE ID = @ID -- from above SELECT or INSERT's Scope_Identity()

Obviously we have to do this inside a transaction or we could have issues and I hate multiple round-trips, so we crafted this cute statement pair to insert the row if needed and then update. Note the use of INSERT FROM coupled with a fake table whose row count is controlled by an EXISTS clause checking for the desired row. This gets executed as a single SQL command.

INSERT INTO dbo.ContentEntryPlusPlus(ContentEntryID, TimeSlot) SELECT TOP 1 @ContentEntryID AS ContentEntryID ,DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0) AS TimeSlot FROM (SELECT 1 AS FakeColumn) AS FakeTable WHERE NOT EXISTS (SELECT * FROM dbo.ContentEntryPlusPlus WHERE ContentEntryID = @ContentEntryID AND TimeSlot = DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0)) UPDATE dbo.ContentEntryPlusPlus SET Hits = Hits + 1 WHERE ContentEntryID = @ContentEntryID AND TimeSlot = DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0)

This got tested and deployed, working as expected. The only problem is that every once in a while, for some particularly popular content, we would get a violation of the clustered-key's uniqueness check on the ContentEntryPlusPlus table. This was quite surprising, honestly as the code obviously worked when we tested it.

The only thing that could cause this is if the two calls executed the inner existence-check simultaneously and both decided an INSERT was warranted. I had assumed that locks would be acquired, and they are, for the inner SELECT, but since there are no rows to when this is executed, there are no rows locked, so both statements will plow on through. So, I just had to add a quick WITH (HOLDLOCK) hint to the inner SELECT and poof it works.

So, the moral of the story? You can't hold onto nothing...

The final version is:

INSERT INTO dbo.ContentEntryPlusPlus(ContentEntryID, TimeSlot) SELECT TOP 1 @ContentEntryID AS ContentEntryID ,DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0) AS TimeSlot FROM (SELECT 1 AS FakeColumn) AS FakeTable WHERE NOT EXISTS (SELECT * FROM dbo.ContentEntryPlusPlus WITH (HOLDLOCK) WHERE ContentEntryID = @ContentEntryID AND TimeSlot = DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0)) UPDATE dbo.ContentEntryPlusPlus SET Hits = Hits + 1 WHERE ContentEntryID = @ContentEntryID AND TimeSlot = DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0)

Sunday, March 09, 2008

Sometimes you make a hash of things.

So, hash has been on my mind lately.  No, not that kind of hash, or that kind either.  First, there was last week, when I installed Internet Explorer 8 beta 1.  I was reading the release notes and was amazed to find that # (you know, octothorpe, pound sign) was not considered part of the URL by this version.  Thus you can't link directly to a named element on a page. Eeew!

Then today, Hugh Brown dropped a comment on my diatribe post about value-types, reference-types, Equals and GetHashCode. The post has been live for many months now, and has quite a bit of Google juice. Until now, nobody has ever quibbled with the stuff I wrote, but Hugh had some interesting observations.

First the little stuff

In a minor part of his comment, he was surprised by the many overloads of GetHashCode that I suggest, wondering why I didn't just always expect callers to use the params int[] version. Quite simply, this is because by providing several overloads for a small number of arguments (5 in my example), I avoid paying the cost of allocating the array of integers and copying the values for each call to the CombineHashCodes. While this may seem like a trivial savings, remember that GetHashCode is called many times when dealing with HashTable collections and thus it is worth it to provide expedited code paths for the more common usages. Additional savings inside the CombineHashCodes method are garnered by avoiding the loop setup/iteration overhead. Finally, in optimized builds, these simpler method calls will be inlined by the compiler and/or JIT, where methods having loops in the body are never inlined (in CLR releases thus far). It is worth noting that the .Net runtime implementation does the same thing for System.Web.Util.HashCodeCombiner and System.String.Format.

To the meat of the comment

The main body of his comment was that my code actually didn't return useful values. That concerned me a lot. Given his use of Python and inlined implementation, I had to write my own test jig. Unfortunately it confirmed his complaint. On the one hand, the values he was using to test were not normal values you would expect from GetHashCode. Normally GetHashCode values are well-distributed across the entire 32-bit range of an Int32. He was using sequential, smallish, numbers which was skewing the result oddly. That said, the values SHOULD have been different for very-similar inputs. I delved a little into the code I originally wrote and found that what's on the web page does NOT match what is now in use in the BCL's internal code to combine hash codes (which is where I got the idea of left-shifting by 5 bits before XORing). I think that my code was originally based on the 1.1 BCL but I'm not really sure.

In the .Net 2.0 version, there's a class called System.Web.Util.HashCodeCombiner that actually reflects essentially the same technique as my code, with one huge and very significant difference. Where I simply left-shift the running hash code by 5 bits and then XOR in the next value, they are doing the left-shift and also adding in the running hash, then doing the XOR.

Why so shifty, anyway?

You might be wondering why do the left shift in the first place. The simple answer is that by doing a left-shift by some number of bits, we preserve the low order bits of the running hash somewhat. This prevents the incoming value from XORing away all the significance of the bits thus far, and also insures that low-byte-only intermediate hash codes don't simply cancel each other out. By shifting left 5 digits, we're simply multiplying by 32 (and thus preserving the lowest 5 digits). Then the original running hash value is added in on more time, making the effective multiplier 33. This isn't far off from Hugh's suggestion of multiplying by 37, while being significantly faster in the binary world of computers. Once the shift and add (e.g. multiplication by 33) is completed, the XOR of the new values results in much better distribution of the final value.

I've updated my code in the Utilities library, and I'm going back to the original post to point to this post and the new code. So, I owe you one, Hugh...and maybe Microsoft does too because while I was reviewing their code in the newly released BCL source code, I found a very unexpected implementation. This is the snippet in question:

    internal static int CombineHashCodes(int h1, int h2) {
        return ((h1 << 5) + h1) ^ h2; 
    }
 
    internal static int CombineHashCodes(int h1, int h2, int h3) { 
        return CombineHashCodes(CombineHashCodes(h1, h2), h3);
    } 

    internal static int CombineHashCodes(int h1, int h2, int h3, int h4) {
        return CombineHashCodes(CombineHashCodes(h1, h2), CombineHashCodes(h3, h4));
    } 

    internal static int CombineHashCodes(int h1, int h2, int h3, int h4, int h5) { 
        return CombineHashCodes(CombineHashCodes(h1, h2, h3, h4), h5); 
    }
Did you see the oddity? That implementation taking 4 values does its work by calling the two-value one three times. Once to combine the first pair (h1 and h2) of arguments, once to combine the second pair (h3 and h4), then finally to combine the two intermediate values. That's a bit different than doing what the 3-value and 5-value overloads use. I personally think it should have called the 2-value against output of the 3-value to combine the 4th value (h4). That would be more like what the 3-value and 5-value overload do. In other words, the method should be:
    internal static int CombineHashCodes(int h1, int h2, int h3, int h4) {
        return CombineHashCodes(CombineHashCodes(h1, h2, h3), h4);
    }

Perhaps they don't care that the values are inconsistent, especially since they don't provide a combiner that takes a params int[] overload, but imagine if I had blindly copied that code and you got two different values from this:

   Console.WriteLine("Testing gotcha:");
   Console.WriteLine(String.Format("1,2: {0:x}", Utilities.CombineHashCodes(1, 2)));
   Console.WriteLine(String.Format("1,2,3: {0:x}", Utilities.CombineHashCodes(1, 2, 3)));
   Console.WriteLine(String.Format("1,2,3,4: {0:x}", Utilities.CombineHashCodes(1, 2, 3, 4)));
   Console.WriteLine(String.Format("1,2,3,4,5: {0:x}", Utilities.CombineHashCodes(1, 2, 3, 4, 5)));
   Console.WriteLine(String.Format("[1,2]: {0:x}", Utilities.CombineHashCodes(new int[] { 1, 2 })));
   Console.WriteLine(String.Format("[1,2,3]: {0:x}", Utilities.CombineHashCodes(new int[] { 1, 2, 3 })));
   Console.WriteLine(String.Format("[1,2,3,4]: {0:x}", Utilities.CombineHashCodes(new int[] { 1, 2, 3, 4 })));
   Console.WriteLine(String.Format("[1,2,3,4,5]: {0:x}", Utilities.CombineHashCodes(new int[] { 1, 2, 3, 4, 5 })));

Where we are at now

Here is the revised version of the CombineHashCodes methods from my Utilities library

    public static partial class Utilities
    {
        public static int CombineHashCodes(params int[] hashes)
        {
            int hash = 0;

            for (int index = 0; index < hashes.Length; index++)
            {
                hash = (hash << 5) + hash;
                hash ^= hashes[index];
            }

            return hash;
        }

        private static int GetEntryHash(object entry)
        {
            int entryHash = 0x61E04917; // slurped from .Net runtime internals...

            if (entry != null)
            {
                object[] subObjects = entry as object[];

                if (subObjects != null)
                {
                    entryHash = Utilities.CombineHashCodes(subObjects);
                }
                else
                {
                    entryHash = entry.GetHashCode();
                }
            }

            return entryHash;
        }

        public static int CombineHashCodes(params object[] objects)
        {
            int hash = 0;

            for (int index = 0; index < objects.Length; index++)
            {
                hash = (hash << 5) + hash;
                hash ^= GetEntryHash(objects[index]);
            }

            return hash;
        }

        public static int CombineHashCodes(int hash1, int hash2)
        {
            return ((hash1 << 5) + hash1)
                   ^ hash2;
        }

        public static int CombineHashCodes(int hash1, int hash2, int hash3)
        {
            int hash = CombineHashCodes(hash1, hash2);
            return ((hash << 5) + hash)
                   ^ hash3;
        }

        public static int CombineHashCodes(int hash1, int hash2, int hash3, int hash4)
        {
            int hash = CombineHashCodes(hash1, hash2, hash3);
            return ((hash << 5) + hash)
                   ^ hash4;
        }

        public static int CombineHashCodes(int hash1, int hash2, int hash3, int hash4, int hash5)
        {
            int hash = CombineHashCodes(hash1, hash2, hash3, hash4);
            return ((hash << 5) + hash)
                   ^ hash5;
        }

        public static int CombineHashCodes(object obj1, object obj2)
        {
            return CombineHashCodes(obj1.GetHashCode()
                , obj2.GetHashCode());
        }

        public static int CombineHashCodes(object obj1, object obj2, object obj3)
        {
            return CombineHashCodes(obj1.GetHashCode()
                , obj2.GetHashCode()
                , obj3.GetHashCode());
        }

        public static int CombineHashCodes(object obj1, object obj2, object obj3, object obj4)
        {
            return CombineHashCodes(obj1.GetHashCode()
                , obj2.GetHashCode()
                , obj3.GetHashCode()
                , obj4.GetHashCode());
        }

        public static int CombineHashCodes(object obj1, object obj2, object obj3, object obj4, object obj5)
        {
            return CombineHashCodes(obj1.GetHashCode()
                , obj2.GetHashCode()
                , obj3.GetHashCode()
                , obj4.GetHashCode()
                , obj5.GetHashCode());
        }
     }

Wednesday, January 02, 2008

Random thoughts to start the year.

Little bunny FoFo, hopping through the forest, scooping up the field mice and bopping them on the heads.

Down came the good fairy, "Little bunny FoFo, I don't want to see you scooping up the field mice and bopping them on the heads. I'll give you three chances and then I'll turn you into a goon".

  1. The Blues deserve a good bounce. I just finished watching the Blues v. Stars game and there's no question that they outplayed the Stars all but the first couple minutes. That crappy bounce off Tkachuk's skate beat us. That's all.
  2. Xen's bout with RSV teaches a couple lessons. First, no matter how hard the nights are, you will miss your boy's loud cry when it is replaced by a weak wail of despair. My soul can't handle that kind of trial very often and I thank God that he knows what I can bear. It's simple, when you can't do anything to help, you feel useless... when everything you do (sucking his nose clear, pounding the phlegm out of his lungs) actually make your child cry more, it's HARD. Second, there are things that make you remember where you are in life... I'm a senior developer, the old-wise-guy at church, and a BABY as a dad. I don't know squat.
  3. When you need something that isn't .Net all the way, expect it to be hard. In this case, Subversion to Team Foundation tools SUCK. And the Subversion paradigm expects you to Alt.Net it and NOT use TFS, even though it is several dozen times better a tool.  Expect me to be announcing something based on the TFS Migration Toolkit soon, cause damn sure Microsoft isn't going to bother.
  4. Be VERY careful what skills you teach your children. No matter if they are good, or bad, they will be used against you.
    The other day, at a Blues home game; my 4 year old daughter, Arianna, was squirming a bit in the seat. I told her that if she didn't stop I was going to turn her into a goon. The next whistle (she's a GOOD hockey fan) she asks, "Dad, are fairies real?".
    My spidey sense being dull, I answered, "No, they're just like Santa Claus, just a character."
    She replies with no delay, "Then you can't turn me into a goon."