Thursday, April 10, 2008

You can't hold onto nothing

A while back, we were looking for an easy way to count "hits" against content in a CMS-like system. For the sake of discussion, pretend we have a table called ContentEntry that represents the content. We decided we wanted to track the hits by-hour against a particular content entry, so that's the ContentEntryPlusPlus table on the right. The foreign-key is from ContentEntryPlusPlus.ContentEntryID to ContentEntry.ID.

ContentEntry

Now the trick is to insert the row if needed for a particular entry and time-slot then increment the Hits column. The simplest thing to do is to is check to see if the row exists, insert it if not, then do the update.  Something like this to find the row's ID:

SELECT TOP 1 ID FROM dbo.ContentEntryPlusPlus WHERE ContentEntryID = @ContentEntryID AND TimeSlot = DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0))

Then we have to insert the row if missing:

INSERT INTO dbo.ContentEntryPlusPlus(ContentEntryID, TimeSlot) VALUES (@ContentEntryID, DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0)) SELECT Scope_Identity() AS ID

Then we do the update like this:

UPDATE dbo.ContentEntryPlusPlus SET Hits = Hits + 1 WHERE ID = @ID -- from above SELECT or INSERT's Scope_Identity()

Obviously we have to do this inside a transaction or we could have issues and I hate multiple round-trips, so we crafted this cute statement pair to insert the row if needed and then update. Note the use of INSERT FROM coupled with a fake table whose row count is controlled by an EXISTS clause checking for the desired row. This gets executed as a single SQL command.

INSERT INTO dbo.ContentEntryPlusPlus(ContentEntryID, TimeSlot) SELECT TOP 1 @ContentEntryID AS ContentEntryID ,DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0) AS TimeSlot FROM (SELECT 1 AS FakeColumn) AS FakeTable WHERE NOT EXISTS (SELECT * FROM dbo.ContentEntryPlusPlus WHERE ContentEntryID = @ContentEntryID AND TimeSlot = DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0)) UPDATE dbo.ContentEntryPlusPlus SET Hits = Hits + 1 WHERE ContentEntryID = @ContentEntryID AND TimeSlot = DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0)

This got tested and deployed, working as expected. The only problem is that every once in a while, for some particularly popular content, we would get a violation of the clustered-key's uniqueness check on the ContentEntryPlusPlus table. This was quite surprising, honestly as the code obviously worked when we tested it.

The only thing that could cause this is if the two calls executed the inner existence-check simultaneously and both decided an INSERT was warranted. I had assumed that locks would be acquired, and they are, for the inner SELECT, but since there are no rows to when this is executed, there are no rows locked, so both statements will plow on through. So, I just had to add a quick WITH (HOLDLOCK) hint to the inner SELECT and poof it works.

So, the moral of the story? You can't hold onto nothing...

The final version is:

INSERT INTO dbo.ContentEntryPlusPlus(ContentEntryID, TimeSlot) SELECT TOP 1 @ContentEntryID AS ContentEntryID ,DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0) AS TimeSlot FROM (SELECT 1 AS FakeColumn) AS FakeTable WHERE NOT EXISTS (SELECT * FROM dbo.ContentEntryPlusPlus WITH (HOLDLOCK) WHERE ContentEntryID = @ContentEntryID AND TimeSlot = DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0)) UPDATE dbo.ContentEntryPlusPlus SET Hits = Hits + 1 WHERE ContentEntryID = @ContentEntryID AND TimeSlot = DateAdd(hh, DateDiff(hh, 0, GetUtcDate()), 0)

2 comments:

digiguru said...

So - rather than having a dual update / insert statement, why not have a single table that stores each view as a row.

ContentEntryView
ID
ContentID
IPAddress (for tracking purposes)
DateTime

Then you can make ContentEntryPlusPlus a view on the previous table.

Select
Count(ID)
ContentID,
CAST(CONVERT(varchar, GetDate(), 101) AS DateTime) as DateTime,
FRoM ContentEntryView
Group By ContentID, CAST(CONVERT(varchar, GetDate(), 101) AS DateTime)


Plus you have the ability to graph views over time. I am assuming that you want fast insert, slow select, because this is an adminstrators view?

Marc said...

Your change would be good if I

A) could store the hits [too many]

B) wasn't storing them elsewhere [have a track server]

C) didn't care about performance [I display this hit count EVERY time I display the entry, so it has to be very fast on query]

So, given the schema is what I have to have, this is a very safe way to do UPSERT-like behavior for SQL Server 2000+. I wonder if there is a better technique, but this is very fast and known to work.