backenduserlandcom:
How to hook into UserLand.Com through XML, XML-RPC and SOAP.

 

   

Algorithm for ttl element

Author:   Dave Winer  
Posted: 5/27/2002; 10:27:59 AM

I'm working on a sample for the new features of RSS that I'm playing with.

For the sake of discussion, I'm assuming these will be part of a format called RSS 0.94.

Here's Scripting News, rendered in this format. This is the feed people should subscribe to. It updates every time I update the HTML version of the weblog.

I'm also maintaining, for now (not necessarily in perpetuity), a folder of archives for past issues of Scripting News in this format. (The archive starts on 5/26/02.)

On this page I'm documenting my work on an algorithm for setting the ttl element. I'll use this algorithm to set the ttl element for Scripting News. I'm sharing my ideas on this to help other developers, and perhaps to get some new ideas.

Scenario (authoring) 

Between approximately 7AM and 2PM, I update Scripting News frequently, perhaps as many as 100 times. I add links, comments, mini-essays, images, etc. I edit, embellish, filter. Then I generally take a break, come back around 7PM and update a few times, and often do a final pass before the emails go out at 10PM. I rarely update Scripting News again before the next morning.

Scenario (reading) 

Imagine an aggregator that was ttl-aware. It could get the content by reading the RSS over HTTP, or it could get the content from a Gnutella-aware app. In either case, it could rely on the ttl element to determine how often it should refresh from the source. Mark Nottingham suggests that we use HTTP caching, a good point, and I would do it this way if HTTP were the only transport to use for RSS. I want to use Gnutella, for a variety of reasons, and there's no ttl concept in Gnutella, according to Darrell Smith at Morpheus.
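
To make the reading side concrete, here is a minimal sketch of the check a ttl-aware aggregator might run before refetching a feed. It assumes ttl is expressed in minutes (as the values later on this page suggest) and that the aggregator remembers when it last fetched the feed. The function name needs_refresh and its parameters are hypothetical, not part of any existing aggregator.

```python
from datetime import datetime, timedelta

def needs_refresh(last_fetched, ttl_minutes, now):
    """Return True once the cached copy is at least ttl_minutes old.

    Hypothetical helper: the transport (HTTP, Gnutella, ...) does not matter,
    only the time of the last fetch and the ttl read from the feed.
    """
    return now - last_fetched >= timedelta(minutes=ttl_minutes)

fetched = datetime(2002, 5, 27, 10, 0)
print(needs_refresh(fetched, 15, datetime(2002, 5, 27, 10, 10)))  # False
print(needs_refresh(fetched, 15, datetime(2002, 5, 27, 10, 20)))  # True
```

Because the check only depends on the ttl value carried in the feed itself, it works the same way whether the copy came from the source or from a peer.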

Goals 

If the algorithm works well, it would set the value to 15 between 7AM and 2PM, suggesting that the content be refreshed every fifteen minutes. Then at 2PM it would switch to 60 (to allow for the possibility that I might do some updates after the normal break time). Then at 7PM it would return to 15, and at 10PM it would go to 480 so the caches could fill up and not time out until the next morning at 7AM.
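
The target behavior can be written down as a simple lookup by hour of day. This is only a sketch of the goal, not the algorithm itself (which is supposed to arrive at these values from observed update patterns). The function name target_ttl is hypothetical, and treating the hours between midnight and 7AM as 480 is an assumption carried over from the 10PM setting.

```python
def target_ttl(hour):
    """ttl in minutes that the goals describe, for an hour of day (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in the range 0-23")
    if 7 <= hour < 14:    # 7AM to 2PM: frequent updates
        return 15
    if 14 <= hour < 19:   # 2PM to 7PM: break, occasional updates possible
        return 60
    if 19 <= hour < 22:   # 7PM to 10PM: evening updates
        return 15
    return 480            # 10PM onward (assumed to hold until 7AM)

for hour in (9, 15, 20, 23):
    print(hour, target_ttl(hour))
```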

Implementation 

1. First, I already have a callback script that runs every time I update Scripting News. In that script, I added code to maintain a persistent array of 24 elements, one for each hour. When I update, I add one to the counter for the current hour. So, over time, the software will have a good idea of how frequently I update in any period.

2. Previously, I was only building the RSS when I update. I added code that rebuilds the RSS at least once an hour, so that the ttl element can change, even when I'm not updating.

3. Now it gets tricky. In step 1, I started tracking updates on an hourly basis. But at least for a few days, until the array fills up, that won't be a good way to generate the ttl value. So in the interim, I've implemented an algorithm that depends on how long it's been since the last update. How it works: If it's been less than one hour since the last update, ttl is 15. If it's been less than three hours, it's 60. Otherwise it's 480.

4. In a few days, when the array has real data in it, I'll write code that predicts time to live, by looking forward through the persistent array, to predict how often I'll update in the coming hours.
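
Steps 1 and 3 above can be sketched roughly as follows. This is an illustration in Python, not the actual callback script: the class name UpdateLog and its methods are hypothetical, the counters live in memory rather than in persistent storage, and returning 480 when no update has been recorded yet is an assumption. The predictive code from step 4 is not sketched, since it hasn't been written yet.

```python
from datetime import datetime

class UpdateLog:
    """Tracks updates per hour (step 1) and derives the interim ttl (step 3)."""

    def __init__(self):
        self.hourly_counts = [0] * 24   # one counter per hour of the day
        self.last_update = None         # time of the most recent update

    def record_update(self, when):
        """Call from the update callback: bump the counter for that hour."""
        self.hourly_counts[when.hour] += 1
        self.last_update = when

    def interim_ttl(self, now):
        """ttl in minutes, based only on the time since the last update."""
        if self.last_update is None:
            return 480                  # assumption: nothing recorded yet
        hours = (now - self.last_update).total_seconds() / 3600
        if hours < 1:
            return 15
        if hours < 3:
            return 60
        return 480

log = UpdateLog()
log.record_update(datetime(2002, 5, 27, 9, 30))
print(log.hourly_counts[9])                             # 1
print(log.interim_ttl(datetime(2002, 5, 27, 10, 0)))    # 15
print(log.interim_ttl(datetime(2002, 5, 27, 11, 30)))   # 60
print(log.interim_ttl(datetime(2002, 5, 27, 13, 0)))    # 480
```

Because the RSS is rebuilt at least once an hour (step 2), interim_ttl would be re-evaluated often enough for the value to step from 15 to 60 to 480 even when no updates are happening.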




Last update: Monday, May 27, 2002 at 11:40:12 AM Pacific.
