GUID Grief, Charlie Brown!
Posted May 28th, 2004 @ 06:59pm by Erik J. Barzeski
A blogger, who may identify himself in the comments if he wishes, wrote to me about behavior he'd observed in PulpFiction. He plans to add a "summary" feed alongside his "full article" feed and was concerned about how PulpFiction would handle the two. He subscribed to his (currently secret) summary feed, checked for new articles in PulpFiction, and was surprised when none appeared.
PulpFiction did not offer up new articles because there were no new articles: the GUIDs (globally unique identifiers) in the summary feed matched articles that already existed in PulpFiction's database. The question then became whether this was "a good thing™." Mark Pilgrim says of this (regarding Atom feeds):
If the same entry appears in two different feeds, it must have the same ID in both places. This is not an exception to the "globally unique" rule; it's an integral part of it. An entry's ID is the key for that entry across all time and space. If the same entry appears in two places, it must have the same ID in both places - otherwise it's not really the same entry.
This gentleman is worried that readers who see the new summary feed will switch their subscriptions to it, hit "Get Articles," and be disappointed, as he was, when nothing appears. Although Mr. Pilgrim advises against it, I suggested an alternative: prepend "sum_" (or something like it) to all of the summary feed's GUIDs, which would result in, for example:
Full Article GUID: 1234@http://domain.com/
Summary GUID: s1234@http://domain.com/
If this guy wishes to go this route, with separate GUIDs across feeds, "new" articles would appear in each feed. Readers subscribed to both feeds (not that this is common, though it may become so given PulpFiction's storage, searching, and filtering capabilities) would get both copies as well. The only people who lose are those who switch feeds and have to delete 15 old articles, assuming they want to delete them at all.
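To make the trade-off concrete, here is a minimal sketch of a reader that keys its article database on GUID. This is a hypothetical model, not PulpFiction's actual code; the class `ArticleStore`, the helper `make_feed`, and the "s" prefix parameter are illustrative assumptions. It shows why a summary feed reusing the full feed's GUIDs yields nothing new, while prefixed GUIDs are treated as brand-new articles.

```python
# Hypothetical model of a feed reader that deduplicates items by GUID.
# NOT PulpFiction's implementation; it only illustrates the behavior described.

class ArticleStore:
    def __init__(self):
        # Maps GUID -> stored item; the GUID is the only identity check.
        self.by_guid = {}

    def ingest(self, items):
        """Store items whose GUID is unseen and return only those new items."""
        new_items = []
        for item in items:
            guid = item["guid"]
            if guid not in self.by_guid:
                self.by_guid[guid] = item
                new_items.append(item)
        return new_items


def make_feed(ids, kind, guid_prefix=""):
    """Build a toy feed; guid_prefix models the hypothetical 's' prefix for summaries."""
    return [
        {"guid": f"{guid_prefix}{n}@http://domain.com/", "kind": kind}
        for n in ids
    ]


if __name__ == "__main__":
    ids = range(1234, 1249)  # 15 articles, matching the example above
    store = ArticleStore()
    print(len(store.ingest(make_feed(ids, "full"))))                        # 15
    print(len(store.ingest(make_feed(ids, "summary"))))                     # 0: same GUIDs
    print(len(store.ingest(make_feed(ids, "summary", guid_prefix="s"))))    # 15: prefixed GUIDs
```

Under this model, a reader subscribed to both feeds stores 30 items for 15 underlying articles, which is exactly the duplication Mr. Pilgrim's rule is meant to prevent.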
Brent has a little to say as well. What's your take?
Posted 28 May 2004 at 7:16pm
My thoughts:
Two documents with similar contents aren't the same document. They're two separate, individual documents. Different versions of the same book usually have different ISBNs; if they didn't, it would be very difficult to find the book you're looking for.
Internet resources need globally unique identifiers. Without them, programmatic algorithms become needlessly complex. Although a human can look at two documents and realize they are "the same," a computer needs more definitive logic in most cases.
Keep things simple: give every resource a unique ID. If we want to relate two documents to each other, we need a new attribute.
Posted 28 May 2004 at 7:19pm
It strikes me now that the camp could be divided in two ways: does the GUID identify the item in the feed, or the article to which it links?
Posted 28 May 2004 at 7:23pm
Isn't Atom the solution here? I was under the impression there was "room" for both summaries and full content. Of course, PF would have to give the user a choice of which to display.
Posted 31 May 2004 at 12:20pm
Anonymous -- yes, Atom explicitly supports summaries and full content for a given entry. Best of both worlds.
Posted 16 Jun 2004 at 7:15pm
John is doing something daring indeed with Daring Fireball: asking for you to support his efforts. Even though I was given a key awhile ago (I purposely changed some details and flipped them around so as not to give any...