Wednesday, April 30, 2008

And you, sir? What have you done lately?

In Joel Spolsky's latest tirade, we learn that:

  1. Microsoft is an illegal monopoly
  2. Microsoft and Google are hiring people to pay foosball
  3. FolderShare is a tool for uploading and download files to the internet that nobody has ever heard of.

Simply put, to Joel everyone else does everything wrong. He, meanwhile, is milking a not-even-ASP bug-track application (he probably couldn't figure out how to write a ColdFusion version). Or wrapping a proprietary DynamicDNS-like wrapper around invented-elsewhere VNC client (and somehow avoiding the far more evolved UltraVNC variant). Or writing the worst little CMS monstrosity ever created. Joel is bitter about the cost of CS graduates and blames Google and Microsoft (curiously leaving Yahoo out of the mix) for driving it up.

The thing is, he's not entirely wrong... when a company can't figure out or invest in a stronger product, like @Task or Team Foundation, FogBugz isn't a bad compromise. When your dad needs IT support behind his random cable-modem IP, CoPilot isn't a bad tool and far cheaper to use than WebEx or GoToMyPc. And well, the best I can say about CityDesk is that my technophobe wife can figure it out and Joel isn't pushing it.

Seriously, though... what have you done lately that really is worth bragging about, Joel? It's been how many years since FogBugz came out and it still can't track my billable time or hold my requirements docs? It's been years since CityDesk saw a new release, and CoPilot was written by a few interns...over a summer. What has your (self) vaunted managerial skill produced lately. You can only rest on your Excel project manager laurels for so long.

Tuesday, April 15, 2008

SQL Server Data Services via cURL

I just noticed a really cool article on using Microsoft's new SQL Server Data Services which explains how to use cURL at the command line to talk to the SSDS RESTful interface.

What's cURL?

If you've never heard of cURL, it is similar to wget in that it allows you to make HTTP requests of any web service. It can handle all the standard verbs (GET,PUT,POST,DELETE) and also supports all those lovely redirections, security and all that other nonsense. It's great for crufting up any batch/command file that could do all sorts of things as well as to ping-test REST services.

What's SSDS

If you've never heard of SSDS, it is similar to the Amazon SimpleDB. It offers the ability to push a database out to the public cloud and allow access from web applications, thick clients or whatever. What differentiates it from Google's APE-based access to BigTable or Amazon's S3/SimpleDB setup is that both of those systems are tuple-based (or name-value-based) non-relational databases. The SSDS stuff, on the other hand, is a Linq-based. This makes querying MUCH simpler to do.

The killer feature to me is that SSDS doesn't make you (the developer) worry about consistency. With SimpleDB or BigTable, the provider only guarantees "eventual consistency". This means that the changes you make will eventually be propogated through the Amazon/Google cloud. During the time the change was post, but not yet propogate your clients may see stale data which makes these services useable mostly for rarely-changed data.

SSDS doesn't have this restriction. Once your call is complete any access will result in the commited data being returned. This is a much simpler model to program against and it puts the replication issues squarely on the database server/service where it belongs. What remains to be seen is if Microsoft will really be able to scale this out reasonably.

Thursday, April 10, 2008

You can't hold onto nothing

A while back, we were looking for an easy way to count "hits" against content in a CMS-like system. For the sake of discussion, pretend we have a table called ContentEntry that represents the content. We decided we wanted to track the hits by-hour against a particular content entry, so that's the ContentEntryPlusPlus table on the right. The foreign-key is from ContentEntryPlusPlus.ContentEntryID to ContentEntry.ID.

ContentEntry

Now the trick is to insert the row if needed for a particular entry and time-slot then increment the Hits column. The simplest thing to do is to is check to see if the row exists, insert it if not, then do the update.  Something like this to find the row's ID:

SELECT TOP 1 ID FROM dbo.ContentEntryPlusPlus WHERE ContentEntryID = @ContentEntryID AND TimeSlot = DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0))

Then we have to insert the row if missing:

INSERT INTO dbo.ContentEntryPlusPlus(ContentEntryID, TimeSlot) VALUES (@ContentEntryID, DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0)) SELECT Scope_Identity() AS ID

Then we do the update like this:

UPDATE dbo.ContentEntryPlusPlus SET Hits = Hits + 1 WHERE ID = @ID -- from above SELECT or INSERT's Scope_Identity()

Obviously we have to do this inside a transaction or we could have issues and I hate multiple round-trips, so we crafted this cute statement pair to insert the row if needed and then update. Note the use of INSERT FROM coupled with a fake table whose row count is controlled by an EXISTS clause checking for the desired row. This gets executed as a single SQL command.

INSERT INTO dbo.ContentEntryPlusPlus(ContentEntryID, TimeSlot) SELECT TOP 1 @ContentEntryID AS ContentEntryID ,DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0) AS TimeSlot FROM (SELECT 1 AS FakeColumn) AS FakeTable WHERE NOT EXISTS (SELECT * FROM dbo.ContentEntryPlusPlus WHERE ContentEntryID = @ContentEntryID AND TimeSlot = DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0)) UPDATE dbo.ContentEntryPlusPlus SET Hits = Hits + 1 WHERE ContentEntryID = @ContentEntryID AND TimeSlot = DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0)

This got tested and deployed, working as expected. The only problem is that every once in a while, for some particularly popular content, we would get a violation of the clustered-key's uniqueness check on the ContentEntryPlusPlus table. This was quite surprising, honestly as the code obviously worked when we tested it.

The only thing that could cause this is if the two calls executed the inner existence-check simultaneously and both decided an INSERT was warranted. I had assumed that locks would be acquired, and they are, for the inner SELECT, but since there are no rows to when this is executed, there are no rows locked, so both statements will plow on through. So, I just had to add a quick WITH (HOLDLOCK) hint to the inner SELECT and poof it works.

So, the moral of the story? You can't hold onto nothing...

The final version is:

INSERT INTO dbo.ContentEntryPlusPlus(ContentEntryID, TimeSlot) SELECT TOP 1 @ContentEntryID AS ContentEntryID ,DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0) AS TimeSlot FROM (SELECT 1 AS FakeColumn) AS FakeTable WHERE NOT EXISTS (SELECT * FROM dbo.ContentEntryPlusPlus WITH (HOLDLOCK) WHERE ContentEntryID = @ContentEntryID AND TimeSlot = DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0)) UPDATE dbo.ContentEntryPlusPlus SET Hits = Hits + 1 WHERE ContentEntryID = @ContentEntryID AND TimeSlot = DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0)