Friday, April 17, 2009

The Quest for Raw Feeds in the Live Framework

Ever since I started playing with Live Framework, I’ve wanted to import Atom feeds from external sources such as blogs, Twitter, and various Google Data APIs.  Surprisingly, as we will see later, this is not easy.

I am most interested in annotating Twitter feeds and synchronizing those annotations between multiple devices, apps, and users.  Why would I want to do this?  Because this can address a number of limitations with Twitter and existing Twitter clients:

  • Read/unread status isn’t shared between apps and devices
  • Groups and saved searches aren’t synchronized
  • No third option between public and protected accounts
  • Favorites are public (no private or semi-private favorites)
  • Availability issues due to fail whales, being offline, etc.
  • You must be online to tweet, favorite, and follow
  • You don’t own your data, Twitter does
  • No good path for migrating from centralized tweets (twitter.com) to a decentralized, federated model
  • Twitter’s crossdomain.xml doesn’t support direct access by Silverlight and Flash apps

That last point on crossdomain.xml is particularly frustrating: you would think that between Silverlight 3 Out-of-Browser and Live Framework it would be straightforward to build a reasonable competitor to TweetDeck, Twhirl, or Seesmic Desktop (Adobe AIR’s killer apps), but that is not the case.

Atom in the Live Framework

Before discussing the issues with importing Atom feeds, it is useful to consider how the Live Framework uses Atom.  As you may know, Atom is the Live Framework’s native infoset.  These Atom feeds can be accessed using other representations such as RSS, JSON, and POX, but the abstract infoset is fundamentally Atom, supplemented by AtomPub for CRUD and FeedSync for sync.

Live Framework then layers a resource model on top of Atom.  This resource model adds schemas for data such as News, Contacts, MeshDevices, MeshObjects, and more.  These schemas can be discovered by using the OPTIONS HTTP verb or by reading the Resource Model documentation.  As you can see from the documentation, all schemas are based on the abstract Resource schema which includes a number of general-purpose Atom properties as well as Mesh-specific properties such as Triggers.
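
For example, here’s a minimal sketch of schema discovery using a raw OPTIONS request.  The target URI is illustrative, and I’m assuming you’ve already obtained a suitable Authorization header value by some other means:

using System;
using System.IO;
using System.Net;

class SchemaDiscovery
{
    // Issues an OPTIONS request against a Mesh resource URI and prints the
    // schema document the LOE returns. authToken is assumed to be a valid
    // Authorization header value obtained elsewhere.
    static void DescribeResource(string uri, string authToken)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "OPTIONS";
        request.Headers[HttpRequestHeader.Authorization] = authToken;

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            Console.WriteLine(reader.ReadToEnd());
    }
}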

What’s wrong with a little schema?

The resource type closest to a raw Atom entry is the DataEntryResource.  Conveniently, it supports arbitrary element and attribute extensions, so it looks like you ought to be able to shove any arbitrary Atom entry into a DataEntry.

It turns out that while DataEntry’s schema provides a home for every possible Atom data element, in practice some of those elements are reserved or have special behavior.  Worse, this behavior is inconsistent between the local LOE and the cloud LOE, although the inconsistency can be exploited to work around some of the limitations, which highlights the power of the back door endpoints used by the Live Framework Client.

Reserved elements

The Live Framework doesn’t let you store your own arbitrary data in the following elements of an Atom entry:

  • id
  • published
  • updated
  • content

If you attempt to provide your own values for these elements, they will be overwritten with auto-generated values by the cloud Live Operating Environment.

The <id> element will always be set to a random GUID such as:

<id>urn:uuid:05950d5f-4815-3269-c6a6-e4620256033e</id>

The <published> and <updated> elements will always be set to the cloud LOE’s DateTime.UtcNow.  Actually, this isn’t true for <published> on the local LOE but we’ll get to that later.

Forcing <published> and <updated> to DateTime.UtcNow causes problems when bulk importing entries.  All of the entries will share the same time, making sorting impossible.  This is particularly problematic because currently you can’t sort on custom elements, so even if you store the original <published> and <updated> elements under different names, you’re out of luck when it comes to sorting and filtering.
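
If you still want to keep the original values around (even though you can’t sort on them), you could copy them into extension elements under your own namespace.  The element names and namespace below are made up:

<entry xmlns="http://www.w3.org/2005/Atom">
  <!-- the LOE will overwrite these... -->
  <published>2009-04-17T00:00:00Z</published>
  <updated>2009-04-17T00:00:00Z</updated>
  <!-- ...so stash copies under your own namespace (names are made up) -->
  <originalPublished xmlns="http://example.com/schemas/import">2009-03-01T12:34:56Z</originalPublished>
  <originalUpdated xmlns="http://example.com/schemas/import">2009-03-01T12:34:56Z</originalUpdated>
</entry>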

The <content> element is used as a grab bag for all sorts of LiveFX-specific content:

<content type="application/xml">
  <DataEntryContent xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://user.windows.net" />
</content>

<content> is also where you will see Triggers if any have been attached to the resource.

Clobbering the <content> element is a big problem for importing blog feeds because this is where the blog post body content lives.

On a side note, it appears that the Live Framework only supports <title> and <subtitle> elements where type="text", but I could be wrong.

Additional elements

Live Framework preserves the original <author> element, but adds a second <author> element with the <name>, <uri>, and <email> of the LiveID user account that imported the entry.  The Live Framework author element appears before any external author elements.  It is perfectly valid for Atom entries to have more than one author, but this is something to watch out for if your app assumes only one author per entry or if author information has app-level significance.
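
For illustration, an imported tweet might end up with a pair of <author> elements like this (the names, URI, and email are hypothetical):

<!-- added by the Live Framework for the importing LiveID account -->
<author>
  <name>Some LiveID User</name>
  <uri>urn:uuid:...</uri>
  <email>someone@example.com</email>
</author>
<!-- the original author from the external feed, preserved as-is -->
<author>
  <name>twitteruser</name>
  <uri>http://twitter.com/twitteruser</uri>
</author>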

I haven’t tried importing entries with pre-existing <category> or <link> elements, but I assume they would successfully import and be supplemented by additional LiveFX-specific <category> and <link> elements.

Local LOE differences

After learning how the cloud LOE behaves, I tried importing Atom feeds from Twitter into the local LOE.  Unlike the cloud LOE, the local LOE preserved the original <published> element.  However, the <updated> element was again set to DateTime.UtcNow, just like on the cloud LOE.  At least I can now use LiveFX’s support for sorting, filtering, and paging imported Twitter feeds by the <published> date.

After successfully preserving the <published> element by using the local LOE, I checked the same feed on the cloud LOE after it synchronized.  Amazingly, the cloud LOE now showed the correct original <published> date!  “Aha,” I thought, “I should be able to make the cloud LOE accept the original <published> date by talking to it using FeedSync instead of AtomPub.”

Unfortunately, accessing the cloud LOE DataFeed’s Sync feed using FeedSync did not preserve the original <published> date.  Very interesting!  Then how did the local LOE successfully sync the original <published> dates to the cloud LOE?

The parallel universe of Windows Live Core

“This is your last chance. After this, there is no turning back. You take the blue pill - the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill - you stay in Wonderland and I show you how deep the rabbit-hole goes.” – Morpheus, The Matrix

Ok, perhaps I’m being overly dramatic. :-)  But seriously, what you know as MeshObjects, ApplicationInstances, and so on are actually CoreObjects living in the grittier, undocumented world of the Windows Live Core (WLC).  The local and cloud LOEs are like the Matrix, hiding the WLC from you and letting you believe that the shiny world of Live Framework is all there is.

There are three main endpoints in WLC:

  • accounts.developer.mesh-ctp.com
  • storage.developer.mesh-ctp.com
  • enclosure.developer.mesh-ctp.com

Accounts is responsible for managing the 3 types of identity in the mesh: users, devices, and apps.  Accounts is also responsible for ApplicationClaims (mapping apps to users) and DeviceClaims (mapping devices to users).  Storage is where CoreObjects live.  Enclosure is where CoreObject media resources live.

The Live Framework Client synchronizes with the cloud using storage.developer.mesh-ctp.com, not user-ctp.windows.net, and instead of hitting the DataFeed’s /Sync URI, it hits the /Sse URI.  SSE is Simple Sharing Extensions, the old name for FeedSync.

storage.developer.mesh-ctp.com’s FeedSync implementation is more tolerant of arbitrary data than user-ctp.windows.net.  By using this back door, the Live Framework Client is able to preserve our original <published> date.

If you want to learn more about Windows Live Core, you can use Fiddler to inspect the Live Framework Client’s communication.  You can also use Reflector to check out the Microsoft.Live.Core.Resources namespace in Microsoft.MeshOperatingEnvironment.Runtime.Client.WlcProxies.dll located in C:\Users\[username]\AppData\Local\Microsoft\Live Framework Client\Bin\Moe2\.

DataFeed vs. DataEntries feed

So far I have focused on the issues when importing individual Atom entries within a feed.  There are also issues when importing metadata for the feed itself.  But first it is useful to discuss the differences between a DataFeed and its DataEntries feed.

[Image: DataFeed vs. DataEntries feed]

In the picture above, green=entry and orange=feed.  The most obvious thing to note here is that DataFeed is an entry that links to an Entries feed containing multiple DataEntries.

The Entries feed’s <title> is read-only and is always “DataEntries”.  <id> and the rest of the elements listed above are auto-generated by the Live Operating Environment.  The DataEntries feed isn’t extensible, so you can’t add links to things such as the original feed’s self link or alternate link.

This means that if you want to preserve any feed metadata from your original feed, you must store it somewhere else.  The DataFeed entry is a likely candidate.

You can store the original feed’s title in the DataFeed’s title.  You can also add any links from the original feed that don’t conflict with LOE-managed links, such as the self link.  If you need to preserve the original feed’s self link, you can do so by renaming its rel and title attributes.
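
For example, the original self link could be stashed under a made-up rel value like this (the rel, title, and URIs here are purely illustrative):

<!-- LOE-managed self link, left alone -->
<link rel="self" title="self" href="Mesh/MeshObjects/.../DataFeeds/..." />
<!-- the original feed's self link, renamed so the LOE doesn't clobber it -->
<link rel="http://example.com/rels/original-self" title="original-self"
      href="http://example.com/feed.atom" />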

It should now be quite clear that importing external feeds is like putting a square peg in a round hole.  Even if you find a way to do it, you won’t be able to take existing Twitter clients and easily tweak them to use Live Framework’s imported Twitter feeds.

Other considerations for apps

I noticed a few other interesting things as I was exploring.  The local LOE correctly lists imported tweets in reverse chronological order (the order in which they were imported).  When these tweets are synchronized with the cloud LOE, they are listed in chronological order (the exact opposite of the local LOE).

You can use Resource Scripts to import external feeds, as I demonstrated in the code sample for this post.  With the new Loop statement, this becomes even easier.  However, Resource Scripts can only read external feeds when the script is executed locally, not in the cloud.

Whether you’re importing one item at a time or using Resource Scripts for bulk imports, it is clearly preferable to import using the local LOE.  But what if you want to use delegated auth to import data using a 3rd-party website?  Unless you figure out how to hack WLC (which I doubt supports delegated auth), you’re out of luck when it comes to the <published> date because you don’t have a local LOE.
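
If you do go the one-item-at-a-time route against the local LOE, the plumbing is plain AtomPub plus System.ServiceModel.Syndication.  Here’s a rough sketch; the URIs and credentials are placeholders you must supply, and the reserved-element caveats above still apply:

using System;
using System.Net;
using System.ServiceModel.Syndication;
using System.Xml;

class FeedImporter
{
    // Reads an external Atom feed and POSTs each entry, one at a time, to a
    // DataFeed's Entries URI (ideally on the local LOE, which preserves the
    // original <published> element).
    static void Import(string sourceFeedUri, string entriesUri, ICredentials credentials)
    {
        SyndicationFeed source;
        using (var reader = XmlReader.Create(sourceFeedUri))
            source = SyndicationFeed.Load(reader);

        foreach (var item in source.Items)
        {
            var request = (HttpWebRequest)WebRequest.Create(entriesUri);
            request.Method = "POST";
            request.ContentType = "application/atom+xml";
            request.Credentials = credentials;

            using (var writer = XmlWriter.Create(request.GetRequestStream()))
                item.SaveAsAtom10(writer);  // serialize the entry as Atom 1.0

            // The LOE will still regenerate id and updated (and, on the cloud
            // LOE, published and content) as described under Reserved elements.
            using (var response = (HttpWebResponse)request.GetResponse())
                Console.WriteLine("Imported: {0}", response.StatusCode);
        }
    }
}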

Hope for the future: Federated Storage

I don’t know much about Federated Storage Services since I didn’t have access to the pre-CTP release that supposedly exposed them, but I believe the idea is to allow third parties to create proxies to their services, similar to how Contacts and Profiles from Hotmail’s Address Book Clearing House (ABCH) are currently exposed.  You can see hints of Federated Services if you paste the following link into the Live Framework Resource Browser:

https://user-ctp.windows.net/V0.1/Mesh/SomeRandomUri

I would imagine that this would provide a better option for “importing” 3rd-party data into the mesh.  You would probably have much greater control over the URIs, titles, and more.  Furthermore, this data wouldn’t need to be stored in Microsoft’s datacenters.  This is especially important if you want to “import” feeds with sizeable enclosures such as pictures and video.

One possible downside of Federated Storage is that I imagine you won’t be able to arbitrarily annotate data in external feeds, reducing their potential in data mashup scenarios.  This limitation already manifests itself in the inability to persist Delete triggers for Contacts and Profiles.

Another request for “Yahoo Pipes for AtomPub”

At the end of this blog post, I proposed using Live Framework to enable something like Yahoo Pipes for AtomPub.  This would enable you to import and mash up arbitrary feeds, perhaps supplemented by a visual Resource Script designer.  I believe support for raw Atom feeds is crucial for this scenario.  Support for raw RSS feeds may also be desirable.

Feature request

If you want to easily build Live Framework apps that import feeds from external sources, please vote on this feature request on Connect.  Thanks!

Wednesday, April 08, 2009

Live Mesh and Live Framework Presentation

Yesterday I gave a presentation on Live Mesh and the Live Framework for Software Professionals of Alaska.


High-quality WMV download (78.6 MB)

Slides (5.6MB pptx)

What did I talk about?

[Image: Yertle the Meshified Turtle]

As you can see by Yertle the Meshified Turtle, I had some fun with the presentation.

Besides putting my own spin on the typical introductory material, I gave some guidance on things that tend to trip up developers coming from a background of relational databases, file systems, and app-owned data (36:42 onwards).

The last part of my presentation is the most interesting part (to me anyway).  This is where I speculate wildly about what the future might hold.  There are some good scoops in there that most people probably don’t know about.  I might get in trouble for talking about them, but I assure you that every scoop or bit of speculation has a publicly available source to back it up.  I’ve got them right here in my notes.  Check it out starting at 42:12.

More info

As I mentioned in the presentation, the Live Framework Forum is the best one-stop shop for more resources to get you started.  There are several sticky threads at the top that contain everything you need.

There were two things I intended to demo but forgot.  First is mobile web access to Live Mesh at http://m.mesh.com.  It works in desktop browsers too, so check it out.  Make your browser window tiny and pretend it’s a phone. ;-)

The second thing I forgot to demo was the Live Framework Resource Browser.  Fortunately, there is already a video showing it in action.  Real clicky-clicky browsing action starts around the 7-minute mark.

Afterward several people said they wanted to see what it’s like to develop a Mesh-Enabled Web App.  Check out this nice 11-minute video showing how to build a Silverlight MEWA.

Since we’re on a roll with videos, I’d also recommend you check out the very well done Live Mesh ad I mentioned in the “futures” part of my presentation.  If you have the bandwidth, go to the source and click the HD button.  It’s beautiful.


A note on presenting

I thought I would be embarrassed to listen to a recording of myself, but it turned out better than I expected.  One thing I learned is that Dan was right when he held up the “Talk louder” sign.  Thanks, Dan.  Sometimes I fade away at the end of sentences.  I blame it on my time in Finland (their sentences all run downhill, even questions).  Something to work on for next time.

Listening to myself in the third person without visual cues, I noticed several times when my dry humor was so deadpan, you might be left wondering whether it was intentional.  So if you have “was that intentional?” moments while listening, the answer is probably “yes.” There was probably an equally subtle smile on my face as I said it. ;-)

Thursday, March 12, 2009

RESTful UDP: a Live Framework Feature Request

Yes, I admit, “RESTful UDP” sounds unnatural, maybe even unethical.  I also admit that I’m getting ahead of myself, talking about design before requirements.  So what motivates me to ask for such a feature?

I want real-time P2P messaging between users, apps, and devices.  Notifications get us most of the way there, but they aren’t enough.  Before I discuss the issues with notifications, let’s talk about scenarios that motivate this.

Scenarios

Imagine walking up to a large-screen Mesh-enabled device such as an Xbox, Microsoft Surface, or public kiosk.  You pair your Mesh-enabled smartphone and project its apps and data onto the big screen, with your smartphone acting as the data entry device.

Expanding on this scenario, imagine a game that takes advantage of the smartphone’s accelerometer and camera, turning your phone into a high-powered Wiimote with the entire touch screen used as a control surface.  You might want to attach a wrist strap…

Imagine the cool apps you could write if a group of people shares real-time GPS data from smartphones and carputers.

This feature would be useful for more than just extending the capabilities of smartphones.  You could remotely control media playback, chat with people, push real-time financial data, and build a variety of interesting distributed apps that are designed to run in real-time across a mesh of devices, aggregating specialized device capabilities into a single composite experience.

Cross-platform support exponentially increases the possibilities and relevance of the mesh.  You can imagine special-purpose devices whose entire reason for existence is to be plugged into the mesh to supplement apps and user experiences.  This is true even without real-time messaging, but this capability is crucial for enabling the most seamless composite device experiences.

Who else wants this?

Back in December, Kevin Hoffman kicked off an extended discussion on this feature request.  Here’s a brief excerpt:

Low Latency: Other actions that people take within the application need to happen quickly. I need very low latency between when the action takes place and when the other client(s) are notified about the action. Think of these as instant messages, though with a domain-specific purpose. Some can be directed at an individual MEWAs, others can be broadcasts. I do not currently have a solution for the low latency.

I know that Silverlight applications cannot receive push messages because of their highly restricted network Sandbox. However, I'm wondering if they would be able to create an HTTP WCF service via .NET Services and host the proxy in the cloud that would allow near-real-time HTTP message posting between MEWAs... Is this possible?

Strages has a big wish list in the Mesh forum that includes:

as mobile phones and laptops will get close and closer together.. there will get a time that you just plug your phone to a local monitor and keyboard.. or even better bring your own flexible seized monitor etc etc.. you get it

John Macintyre’s PDC session had a demo of remotely controlling a Media Center.  Sync delays in the demo made for a not-so-seamless experience.

Warning: detailed discussion ahead

I could probably end the feature request here.  Please take everything that follows as optional food for thought.  If you like what you’ve heard so far and just want to vote for this, please visit this feature request on the Live Framework forum and vote it up.  If you have comments, please post them on the forum rather than here.

Why UDP?

First, I don’t intend UDP to be taken literally (well maybe I do, but that’s an implementation detail).  Specifically, I’m interested in the following UDP characteristics:

  • One-way messaging
  • Low latency
  • Stateless (lossy, no sessions, unordered, etc.)
  • Multicasting

It shouldn’t be necessary to expose the concept of P2P session initiation, even if that is ultimately an implementation detail.

Why REST?

I do intend REST to be taken literally.  I’m interested in the following REST characteristics:

  • URI-addressable resources
  • Hyperlinks to other resources
  • Arbitrary user content
  • Transport-agnostic (yes, I think that’s RESTful)
  • POST and GET (PUT and DELETE don’t make sense here)

Links to arbitrary resources are an important tool for keeping message size small.

Being transport-agnostic is important, for performance and for constrained network environments.  Of course HTTP should be supported, but it should also be possible to use TCP, UDP, the Messenger Relay, or even use SMS like Mesh4x does.  Just like with Notifications, there should be no need for senders and receivers to use the same transport or representation.

Making this feature available to plain old DHTML Mesh apps would be pretty amazing.

Meshisms

I would also like support for some Mesh-specific features:

  • AtomPub feed-based model
  • Multiple representations (Atom, POX, JSON, Binary XML)
  • Expansions
  • “LINQ to REST”
  • Send messages from triggers (a fine-grained alternative to subscriptions)
  • Local LOE support

Why aren’t Notifications enough?

In my Notifications post I covered some of the issues that make Notifications less than ideal for real-time messaging, even when paired with Activities.

  • Three round-trips per notification cycle (can be reduced to two using expansions)
  • Watermarks are destructive
  • You can’t directly publish your own notifications
  • Notification queues are per-user and designed for single-threaded use (one per client)
  • Each app does its own polling rather than sharing one connection, unlike iPhone’s push notifications
  • Notifications are one-way from cloud to client (no client-to-cloud or P2P)
  • Notification polls don’t chain from local LOE to cloud LOE
  • While Notifications are usually instantaneous, they can become backlogged and delayed by several minutes
  • The local LOE doesn’t currently implement Activities
  • There is no way for multiple clients to efficiently receive only unseen Activities in a single round trip

A proposed solution

There may be better solutions, but I visualize this problem being solved by a REST front-end to a “transport-agnostic UDP” messaging system.  Under the hood, the system prefers direct UDP communication but can use Messenger Relay or HTTP if necessary.  The front-end seen by developers is an AtomPub interface very much like the Notifications interface.  When using HTTP, push messaging is enabled by parking requests for up to 30 seconds if no messages are waiting.

It is preferable to program against the local LOE.  Polling the local LOE for messages causes the local LOE to poll the cloud LOE, and if any participating devices are reachable, they are also polled if another push mechanism can’t be established.  It is important to be able to establish local P2P connections even if the cloud LOE isn’t reachable.  Care is needed to avoid round-robin message loops.

Messages would auto-generate a short MaxAge upon receipt by each LOE, similar to Activities.  It is probably most efficient to simply let Messages expire instead of explicitly deleting them.

In order to support multiple recipients of the same message, clients should be able to poll the queue with a nondestructive query string watermark.  An alternative might be to let each client create its own destructive NotificationQueue that receives full copies from the main message queue, but this assumes the 3-round-trips issue is solved.

This can be a soft state service with no need for additional infrastructure recovery features.

If it is necessary to impose the UDP equivalent of Twitter’s 140-character limit, that’s fine.  65,507 bytes seems like plenty.  Users can always just send links to large resources.  If expansions are supported, the receiver can optionally inline the large data on demand without necessarily transmitting it across the network if it already exists in the local LOE.
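
To make the polling side concrete, here’s a sketch of how a client loop might look.  Everything here is invented for illustration: the /MessageQueue URI, the watermark query parameter, and the X-Watermark header do not exist in the current CTP.

using System;
using System.IO;
using System.Net;

class MessagePoller
{
    // Hypothetical long-poll loop against a proposed MessageQueue feed.
    static void PollLoop(string queueUri, ICredentials credentials)
    {
        var watermark = "0";
        while (true)
        {
            // Nondestructive watermark: ask only for messages newer than the
            // last one seen; the request parks until a message arrives or a
            // ~30-second timeout expires.
            var request = (HttpWebRequest)WebRequest.Create(
                queueUri + "?watermark=" + watermark);
            request.Credentials = credentials;

            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                Console.WriteLine(reader.ReadToEnd());
                // Advance the watermark; in a real implementation you would
                // parse it from the returned feed instead of a made-up header.
                watermark = response.Headers["X-Watermark"] ?? watermark;
            }
        }
    }
}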

Addressability

Messages should be able to target one or more users and devices, and scope a broadcast by MeshObject/AppInstance.  This could be solved in a variety of ways.

You could scope all messages by MeshObject (MeshObjects/{id}/MessageQueue), with the option to subscope them further with links to Members and Mappings.  This is the option I prefer.  It doesn’t require directly addressing users and devices, it doesn’t require any changes to enable the “cloud device” to be addressed, and it should provide decent partitioning.  A downside is that it requires separate requests for separate MeshObjects.  However, if you program against a local LOE which multiplexes everything under the hood, this shouldn’t be a problem.

Or you could have one queue to rule them all at Mesh/MessageQueue.  Messages would have one or more links to users, devices, and/or MeshObjects.  Subscribers would poll the queue with the option of filtering on these links by query string.

Or you could add /MessageQueue to various scoping contexts such as Devices/{id} and {id}/Profiles.  This would require exposing the cloud LOE as a device entry.

I’m not sure if devices or mappings from other users are currently discoverable.  This would need to be addressed.

Device connectivity

I suspect there may already be a way to access existing device connectivity information, but if not, it would be very helpful to see which devices (including the cloud device) are reachable for real-time messaging.

Issues with just exposing Mesh’s P2P support

At the PDC I heard talk of exposing Mesh’s peer-to-peer channel to enable developers to establish connections between devices for streaming data.  Although this is a great solution for some scenarios (and I would like to have such a feature), it’s not so great for other scenarios.

Streaming in this context is a reliable, sessionful feature.  In order to get the lowest latency possible, I want unreliable, sessionless communication.

You can optimize for latency or for throughput, but not both.  Streaming is optimized for high throughput.  I want low latency (UDP, or disable TCP’s Nagle algorithm).

Bonus points: ad-hoc device discovery

This is a tangentially related issue that can be addressed independently.  Although apps can be written to select from a known list for pairing with other users and devices, I would love to have ad-hoc user/device pairing.

There are many, many ways to accomplish this.  I would like to use Microsoft Tag for pairing devices from unknown users and meshes.  One device displays a tag containing its mesh address, and the other device scans the tag.  The tag could be a printed tag, or displayed on a screen.  The tag exchange could be two-way for greater security.  For public kiosks, the tag could be changed every minute or generated on demand.

There are a whole host of additional issues raised by ad-hoc device pairing, but it sure sounds like the future to me.  Someone’s going to solve this problem in a generic way, please don’t let Apple get there first. ;-)

Conclusion

Hopefully the additional detail is a helpful starting point for discussion.  I am quite open to alternate solutions that can provide real-time P2P messaging.  As I noted earlier, if you like this idea, please visit this feature request on the Live Framework forum and vote it up.  If you have any comments, please post them on the forum thread.  Thanks!

Sunday, March 08, 2009

Exploring Live Framework Notifications

In order to tune in to the data that matters to you, Live Framework lets you receive near-real-time notifications of changes by subscribing to objects and feeds.  Although the SDK documentation is fairly sparse at the moment, Viraj Mody wrote an excellent blog post describing how the notification system works, and John Macintyre also gave a great PDC session on notifications.  Being the curious sort, I still had many questions, so I dug in further and this blog post and ResourceClient library are the result.

Overview

Notifications are exposed in the high-level programming model through the ChangeNotificationReceived event.  Under the hood, this is made possible by queues, subscriptions, and notifications working together.  These three building blocks can be accessed through the resource-oriented programming model, enabling even more interesting scenarios.  Even if you have no desire to program at this level, understanding how notifications work can help you take fuller advantage of the high-level model.

Slide 9 from John Macintyre’s presentation shows how the pieces tie together.

[Image: slide 9 from John Macintyre’s presentation]

The typical usage pattern looks like this:

  • Client creates a queue
  • Client subscribes its queue to one or more resources or feeds
  • Client polls the queue’s Notifications feed
  • A resource or feed changes
  • Subscription Service posts a notification to each of the queues that are subscribed to the resource or feed
  • Client’s poll returns with one or more notifications
  • Client takes the watermark of the most recent notification and posts it to the queue
  • Client takes some action based on the notifications such as requesting the newest version of the changed resource or feed
  • Client polls the queue’s Notifications feed…

After the initial queue creation, a minimum of three round-trips is necessary to complete each cycle.  Later we will see a trick for doing this in two round-trips.

Although the client has to poll the queue, this is actually more of a push than a pull.  The HTTP request stays “parked” at the server until a notification arrives or a timeout expires (just like relay binding’s HTTP “parked requests” in BizTalk Services .NET Service Bus).  This means that when a notification arrives, the server can immediately push it to the client on the HTTP response of the parked request, saving half a round-trip of latency.  Jeremy Mazner has more to say on the subject in this Channel9 thread.

So far I have described the behavior of the cloud LOE.  We will cover the client LOE’s slightly different behavior later.

Recovery from failures

In John’s diagram above, the Queue Service and Subscription Service are both in-memory soft state services.  This means it is possible for a Queue Service instance to go down resulting in queue loss (including all its notifications).  A Subscription Service instance can also go down, resulting in loss of subscriptions.  In both cases, the client will receive a specific error notification the next time it tries to poll the queue.

In the case of queue loss, it is the responsibility of the client to create a new queue and resubscribe to each resource.  In the case of subscription loss, the client is told “resources from these addresses lost subscriptions” and it just needs to resubscribe to each one using its existing queue.

As Viraj notes in his blog post,

A short summary of the solution is that in cases where one or several Queue and/or PubSub Servers go down, the system is able to detect exactly what happened and take remedial action to restore state in the cloud in cooperation with clients (because clients were the original source for all the transient data that was resident on those servers before they lost state).

What can you subscribe to?

From the high-level object model, only the following types are subscribable via ChangeNotificationReceived:

  • MeshObject
  • MeshDevice
  • LiveItemCollection (feeds)
    • Mesh.Devices
    • Mesh.MeshObjects
    • Mesh.News
    • MeshObject.DataFeeds
    • MeshObject.Mappings
    • MeshObject.Members
    • MeshObject.News
    • These may not work yet:
      • Contact.Profiles
      • LOE.Contacts
      • LOE.Profiles
      • Member.Profiles

If you use the resource-oriented programming model, you can subscribe to all of the resources behind the high-level objects as well as:

  • ApplicationResource
  • ApplicationInstanceResource

In addition to the high-level feeds, you can also subscribe to feeds for:

  • MeshObject.Activities
  • Applications
  • InstalledApplications

One quirk is that MeshObject isn’t subscribable from the local LOE, although you can still subscribe to the local MeshObjects feed.

It is useful to consider the MeshObject->DataFeed->DataEntry hierarchy in terms of what is and isn’t subscribable.

  • Mesh/MeshObjects/Subscriptions
  • Mesh/MeshObjects/{id}/Subscriptions
  • Mesh/MeshObjects/{id}/DataFeeds/Subscriptions
    • this is a “feed of feeds”
  • Mesh/MeshObjects/{id}/DataFeeds/{id}/Subscriptions
    • this doesn’t exist
  • Mesh/MeshObjects/{id}/DataFeeds/{id}/Entries/Subscriptions
  • Mesh/MeshObjects/{id}/DataFeeds/{id}/Entries/{id}/Subscriptions
    • this doesn’t exist

What notifications do you receive?

You would expect all resources and feeds to notify you when resources are created, updated, or deleted, but that isn’t the case.  Some resources and feeds don’t notify you when entries are updated, some aren’t subscribable, and some are read-only.  This varies between the cloud LOE and the local LOE and between various objects.  Here’s an incomplete listing of which notification triggers work based on my experimentation:

  • MeshObject: not subscribable locally; Update and Delete in the cloud
  • MeshDevice: not subscribable locally; nothing observed in the cloud (at least not Update)
  • MeshObjects feed: Create, Update, and Delete locally; Create and Delete in the cloud (no Update)
  • Devices feed: can’t be updated locally; Update in the cloud, maybe others
  • DataFeeds: Create, Update, and Delete on both
  • DataEntries: Create, Update, and Delete on both, with double notifications for each change on the local LOE

I’m not sure why, but a subscription to DataEntries creates two notifications for each change on the local LOE.

How subscriptions work

As you have seen, you can subscribe to any resource or feed that has a Subscriptions URL (ex: Mesh/MeshObjects/{id}/Subscriptions).  This Subscriptions feed only supports HTTP POST (no GET).  Individual subscriptions only support PUT (no DELETE).

Subscriptions have the following interesting properties:

  • NotificationQueueLink
  • ExpirationDuration
  • ResourceEntityTag

NotificationQueueLink is the only thing you need to include when you create your subscription.  This link should point to the SelfLink of the NotificationQueue you have created, not its NotificationsLink.
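
I haven’t captured the exact wire format, but judging from how other LiveFX links are serialized (see the LiveFX/Member link in the Activities post below), the subscription POST body is presumably an Atom entry along these lines.  The rel and title values are my guesses:

<entry xmlns="http://www.w3.org/2005/Atom">
  <!-- rel value is a guess at how NotificationQueueLink serializes -->
  <link rel="LiveFX/NotificationQueue" title="LiveFX/NotificationQueue"
        href="https://user-ctp.windows.net/V0.1/Mesh/NotificationQueues/{id}" />
</entry>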

ExpirationDuration is expressed in seconds and is assigned a random number between 2700 and 4500 (45 min to 75 min) from the cloud LOE and a fixed value of 3600 (60 min) from the local LOE.  The internal class NotificationManager that is used to implement ChangeNotificationReceived has a hard-coded subscriptionRenewalInterval of 60 minutes, so it seems there’s a chance of cloud subscriptions being in an unrenewed state for up to 15 minutes, but I could be wrong.

ResourceEntityTag is the ETag of the resource when the subscription was created.  I believe a notification is sent each time the resource’s ETag changes.

Note that the subscription doesn’t have a link to the resource or feed you subscribed to.  It is important for you to maintain your own copy of the URI to the resource you subscribed to.  First, if you want to imitate ChangeNotificationReceived and execute different event handlers for different subscriptions, you will need this URI to demux from notifications to event handlers.  Second, you will need this URI to create new subscriptions when you receive an AllSubscriptionsLost notification.

Clients should track their subscriptions for a variety of reasons:

  • You can’t GET the subscription feed
  • You can’t GET a subscription via its SelfLink
  • For subscription renewal
  • To recover from queue or subscription loss

How notification queues work

Notification queues are created by posting to Mesh/NotificationQueues.  If using AtomPub, you can simply post an empty entry:

<entry xmlns="http://www.w3.org/2005/Atom"/>

You can’t GET Mesh/NotificationQueues to see which queues exist, so you will need to maintain a reference to the queue you get back.
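
In raw HTTP terms, queue creation is a single POST.  Here’s a minimal sketch using HttpWebRequest; the base URI is the CTP cloud LOE, and the credentials are passed in:

using System;
using System.IO;
using System.Net;

class QueueCreator
{
    // POSTs an empty Atom entry to create a notification queue and returns
    // the resulting queue entry (which contains SelfLink, NotificationsLink,
    // ExpirationDuration, and so on).
    static string CreateQueue(ICredentials creds)
    {
        var uri = "https://user-ctp.windows.net/V0.1/Mesh/NotificationQueues";
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "POST";
        request.ContentType = "application/atom+xml";
        request.Credentials = creds;

        // An empty Atom entry is all it takes; the LOE fills in the rest.
        using (var writer = new StreamWriter(request.GetRequestStream()))
            writer.Write("<entry xmlns=\"http://www.w3.org/2005/Atom\"/>");

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }
}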

Queues are intended to only have a single consumer (one queue per client), and that consumer’s queue usage should effectively be single-threaded.

There are a few important properties on queues:

  • ExpirationDuration
  • Watermark
  • SelfLink
  • NotificationsLink

ExpirationDuration is expressed in seconds and is 300 (5 min) from the cloud LOE and 600 (10 min) from the local LOE.  ExpirationDuration is the duration after which the notification queue expires if its Notifications feed isn’t polled.

Watermark is used together with PUT to remove all notifications in the queue with watermarks less than or equal to the watermark you sent.  Updating a watermark is destructive.  You can’t PUT an earlier watermark to roll back the queue, and that’s fine with me.

SelfLink is the URI you use for a new subscription’s NotificationQueueLink.  You also PUT to the SelfLink URI when updating the queue’s watermark.  SelfLink is PUT only.  You can’t GET or DELETE a queue.

NotificationsLink is the feed where new notifications appear.  This feed is not your typical feed.  As I mentioned earlier, polling the NotificationsLink “parks” the HTTP request on the server until a notification appears or a timeout expires.  This timeout varies between 25 and 30 seconds for the cloud LOE and is hopefully low enough so that intermediary proxies don’t prematurely close the connection.  Requests will return immediately if the queue already contains notifications.  Unfortunately, requests to the NotificationsLink on the local LOE always return immediately whether or not the queue is empty.  This is a significant problem not only because the programming model is inconsistent, but because it means that in order to get the same low latency as the cloud LOE you have to hammer the snot out of the local LOE, pegging the CPU in the process.  Hopefully this gets fixed.  The SDK sidesteps this issue by only polling every 5 seconds, but that loses much of the benefit of push notifications.

The notifications feed behaves strangely when it contains more than one notification.  Sometimes it displays all of the notifications (up to 10), and sometimes it only displays the first one.  For example, you can poll the queue and see 4 notifications.  You can poll it again and perhaps only see one.  Polling a third time might show all 4 again.  At least it always displays notifications in order (oldest first).  This means you shouldn’t count on seeing more than one notification at once, but be aware that it is possible.  Simply act on all of the notifications you receive, PUT the watermark of the last one, and poll the queue again.  Using this technique you will eventually see all of the notifications in the queue, even if it appears there is only one or if the queue appears to be clipping the results at 10 notifications.
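
Putting that together with raw HTTP, one poll iteration might look roughly like the sketch below.  I’m deliberately loose about parsing, since I haven’t pinned down exactly where Watermark lives in the entry XML; the watermark PUT itself is left to the caller:

using System;
using System.Linq;
using System.Net;
using System.Xml.Linq;

class NotificationPoller
{
    // One iteration of the cycle: poll the notifications feed (the request
    // parks server-side for up to ~30 seconds if the queue is empty), act on
    // each notification returned, and hand back the last watermark so the
    // caller can PUT it to the queue's SelfLink before polling again.
    static string Poll(string notificationsLink, ICredentials creds)
    {
        var request = (HttpWebRequest)WebRequest.Create(notificationsLink);
        request.Credentials = creds;

        using (var response = request.GetResponse())
        using (var stream = response.GetResponseStream())
        {
            var feed = XDocument.Load(stream);
            // Grab every Watermark element regardless of namespace, since the
            // exact element placement is an assumption here.
            var watermarks = feed.Descendants()
                .Where(e => e.Name.LocalName == "Watermark")
                .Select(e => e.Value)
                .ToList();

            // React to each notification here (e.g. re-fetch the resource
            // behind its ResourceLink), then return the last watermark seen.
            return watermarks.LastOrDefault();
        }
    }
}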

Queues are semi-private.  They can’t be seen by other user accounts and can’t be shared.  Since you can’t enumerate queues with GET and their URLs are randomly generated, the only way another app or client within the same user account can see your queue is if you choose to share the queue’s URI, although I can’t think of any good reason to do that other than for testing.

Queue loss

Queues can be lost either by failing to poll them within their ExpirationDuration or by the unexpected death of a queue manager in the LOE.  You will find out the queue is gone when you poll the queue’s notifications feed and receive an AllSubscriptionsLost notification.

You might be wondering, if the queue is gone, then how can I poll its notifications feed?  It turns out that if a queue doesn’t exist at a particular URI, a new “dead” queue will be automatically created at that URI.  You can see this by connecting to the cloud LOE with the LiveFX Resource Browser and visiting the following URI:

Mesh/NotificationQueues/1234-1234-1234/Notifications

Note that this only works on the cloud LOE.  The local LOE will return an error.  Apparently you shouldn’t expect to lose a queue on the local LOE.

A weird bit of trivia is that you can create two different queues with identical URIs in two different user accounts.  I’m guessing this is because queue managers are allocated per-user.

Another weird bit of trivia is that the AllSubscriptionsLost notification has a watermark that increments each time a subscription attempts (and fails) to post a notification to the queue.  This is one reason why the name AllSubscriptionsLost is misleading and ought to be renamed to something like QueueLost.  The subscriptions are most certainly still alive.

How notifications work

You can’t create notifications directly.  In other words, you can’t POST a new notification to a queue’s Notifications feed.  Notifications are only created by subscriptions (and queue or subscription loss).

Notifications have the following interesting properties:

  • NotificationType
  • ResourceLink
  • Watermark

NotificationType can be one of the following:

  • ResourceChanged
  • SubscriptionLost
  • AllSubscriptionsLost
  • System

ResourceChanged is what you will see as the result of subscribing to a resource.  SubscriptionLost is received when a Subscription Service unexpectedly dies on the cloud LOE.  SubscriptionLost is not received when a subscription expires due to its ExpirationDuration reaching zero.  As mentioned earlier, AllSubscriptionsLost would probably be better named QueueLost.  I’m not sure if System is used anywhere in the public API.  Perhaps it’s used for device connectivity.

ResourceLink points to the feed or object that changed, or in the case of SubscriptionLost, I believe it points to the resource whose subscription was lost.

Watermark is a counter string that increases with each new entry.  On the cloud LOE these look like “1.248.0”, “2.248.0”, “3.248.0”.  On the client LOE they are simply “1”, “2”, “3”.

Notification SelfLinks are incrementing integers on the cloud LOE and GUID-like strings on the local LOE.  You can’t GET a notification using its SelfLink.  You can only see notifications by polling the notifications feed.  You also can’t do queries on notification feeds.  If you try to use something like $skip or $top, you will get an AllSubscriptionsLost notification and every notification in the queue gets discarded without you ever seeing them.  However, the subscriptions still work and the queue continues to function normally afterward (other than the data loss).

Notifications and expansions

A cool trick for saving a round-trip is to use expansions to return changed resources and feeds inline in the notification results.  A not-so-cool potential side-effect is that if the notifications feed returns more than one notification for the same resource or feed, the expansion will result in duplicate expanded data, wasting bandwidth.

If you could combine watermark updates with the next poll request, you could get the poll-watermark-update cycle from 3 round-trips down to just 1 round-trip.

Notifications and activities

When you combine notifications, expansions, activities, and the cloud LOE, this enables near-real-time messaging between clients.  By subscribing to an Activities feed, clients can poll the Notifications feed using $expand and receive complete activity entries as soon as someone posts a new activity.  The latency in this scenario is half a round-trip to post the activity plus half a round-trip to receive the activity through your parked HTTP request to the notifications feed.  Since notifications, subscriptions, and activities all use in-memory stores, this should have quite good performance.  There are a number of issues with this technique that I’ll cover in a future blog post, but it is promising for near-real-time communications.

Sync bypasses update notifications

Sync appears to bypass notification of updated feed entries.  Specifically, if I subscribe to the MeshObjects feed on the local LOE, I am notified when sync from the cloud causes a MeshObject to be added or removed from the feed, but I am not notified if sync causes a MeshObject to be updated.  I haven’t experimented with other feed types, but the same issue might exist with feeds such as DataEntries.

Trigger support

You can use Create and Update triggers on subscriptions and queues.  Delete triggers aren’t persisted and therefore won’t work.  You might use this to POST a queue and create subscriptions for it in its PostCreateTrigger, saving a round trip.  Of course these subscriptions wouldn’t be tracked and managed by the SDK.  You could also create a MeshObject and add a subscription to it in the MeshObject’s PostCreateTrigger.

Updating resources and feeds through Resource Scripts or triggers doesn’t bypass notifications.

Client vs. Cloud

The client LOE waits for data to sync to it from the cloud before notifying you of any changes.  This can take a while, so if you want quick notifications, be sure to subscribe to the cloud LOE, not the local LOE.  It would sure be nice if local subscriptions caused the local LOE to subscribe to the same resource in the cloud (if connected), taking advantage of push notifications to achieve the same latency for local subscriptions as if you were subscribed directly to the cloud LOE.

It is probably stating the obvious at this point, but I think it’s worth noting that queues are one-way from cloud to client.

Notification delays

Although notifications are normally posted to queues immediately, it is possible for notifications to be delayed if there is a large backlog generated by lots of rapid updates to a subscribed resource or feed.  You might see a few updates trickle in, then 15 seconds later another batch of updates appears, and so on, for several minutes.

Device connectivity

Supposedly device connectivity uses the subscription and notification services under the hood as a signal channel for P2P session establishment.  It may be possible to see this in action and even participate in the process, but I haven’t explored this.  There must be a reason you can subscribe directly to individual MeshDevices, but I haven’t seen anything interesting pop up yet.  Check out George Moore’s description of P2P notifications and file sync for a fascinating scenario that I’m not sure is possible with the current CTP.

Other transports

Supposedly there is a TCP transport for receiving push notifications, but it doesn’t appear to be used or available at the moment.  This could be useful for chaining subscriptions from the local LOE to the cloud LOE (or other clients) while preserving an HTTP programming experience for developers.

The high-level programming model

If you have managed to read this far, you should now have a healthy appreciation for the services provided by the high-level programming model.  At this level, all we see is the ChangeNotificationReceived event on feeds, MeshObject, and MeshDevice.  You simply subscribe to this event on the appropriate object and your event handler will be called when entries are created (on feeds only of course), updated, and deleted.

Here are some of the gory details it takes care of for you:

  • Queue creation
  • Queue polling
  • Updating the watermark
  • Subscription creation
  • Subscription renewal
  • Recovering from queue loss
  • Recovering from subscription loss
  • Demuxing from notifications to event handlers

Let’s dig into that last one a bit deeper.  If you use Reflector to examine how ChangeNotificationReceived works, you will see that each object subscribes to receive any and all notifications that arrive on the queue.  When these notifications are received, each object iterates through all notification entries, checking to see if the notification’s ResourceLink matches its own SelfLink.  If there is a match, the object raises the ChangeNotificationReceived event and stops iterating the notification list, effectively filtering out multiple notifications for itself that might have been received in a single response.  It appears that if the client’s LiveOperatingEnvironment is configured with AutoLoadRelationships, the object will be reloaded after the event is raised, meaning that if you examine the object in your event handler, it may not contain the latest changes.  This is reported in the forums here and here and is supposed to be fixed in the next release.  Those threads also mention a performance issue with notifications when using the Silverlight library that will be fixed in the next release.
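
As a simplified re-imagining of that demuxing step (not the SDK’s actual code), the bookkeeping might look like this:

using System;
using System.Collections.Generic;

class NotificationDispatcher
{
    // Map from a subscribed resource's SelfLink to its handler. You must keep
    // this map yourself, since subscriptions don't link back to the resource.
    readonly Dictionary<Uri, Action> handlers = new Dictionary<Uri, Action>();

    public void Subscribe(Uri resourceSelfLink, Action onChanged)
    {
        handlers[resourceSelfLink] = onChanged;
    }

    // Called with the ResourceLinks parsed from one batch of notifications.
    public void Dispatch(IEnumerable<Uri> resourceLinks)
    {
        var seen = new HashSet<Uri>();
        foreach (var link in resourceLinks)
        {
            // Collapse duplicate notifications for the same resource within a
            // single batch, like the SDK's NotificationManager effectively does.
            Action handler;
            if (seen.Add(link) && handlers.TryGetValue(link, out handler))
                handler();
        }
    }
}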

Programming at the Resource level

As part of my explorations, I wrote a library called ResourceClient that makes it easier to work directly with Resources such as queues, subscriptions, and notifications.  Here’s a brief example of the syntax it enables.

// Ambient context supplying credentials for all resource operations
using (new ResourceClientContext(username, password))
{
    // Create a new notification queue on the cloud LOE
    var queue = Uris.Cloud.NotificationQueues.Post(new NotificationQueueResource());
    // Poll the queue automatically, printing any notifications received
    queue.StartAutoPoll((notifications, context) =>
    {
        if (notifications.Entries.Count() > 0)
            Console.WriteLine(notifications.ToAtomString());
    });
    // Subscribe the queue to the MeshObjects feed...
    Uris.Cloud.MeshObjects.Subscribe(queue);
    // ...then create a MeshObject, which produces a notification
    var mo = Uris.Cloud.MeshObjects.Post(new MeshObjectResource("My Object"));
    // Subscribe the queue to the new object's DataFeeds feed...
    mo.DataFeedsLink.Subscribe(queue);
    // ...and create a DataFeed, which produces another notification
    var feed = mo.DataFeedsLink.Post(new DataFeedResource("My Feed"));
}

This code writes to the console Atom-formatted notifications for the new MeshObject and the new DataFeed.  I will cover the details of this library in another blog post.  For now I will just mention that in addition to being a generic resource-oriented API, it contains notification-specific helpers for automatic queue polling, manual polling, sending watermarks, subscribing, and dispatching to different event handlers for each subscription.

MeshNotificationPlayground sample app

I have written a little WPF app that demonstrates queues, subscriptions, notifications, and activities in action.

[Image: MeshNotificationPlayground sample app]

You can create a queue, poll it, copy the queue’s URL to the clipboard for pasting into Resource Browser, select a notification and send its watermark, select a MeshObject and subscribe to its Activities feed, and create new Activities for the selected object.

Here are some things to try:

  • Poll an empty queue and see that the request returns after 25 to 30 seconds
  • Poll the queue with the WPF app and Resource Browser at the same time, notice that they both wait, then cause a new notification and see that both requests return immediately
  • Subscribe to multiple Activities feeds, create activities in each of them, and notice the resulting notifications have different ResourceLinks
  • Send a watermark that isn’t the last watermark and see that the queue only empties up to the watermark you sent
  • Wait 5 minutes and poll the queue to see an AllSubscriptionsLost notification
  • After receiving AllSubscriptionsLost, create more activities for subscribed feeds and see that AllSubscriptionsLost’s watermark increases each time
  • Click Refresh Activities List and watch MaxAge count down for each activity.  Notice the random MaxAges.  When a MaxAge reaches zero, the activity disappears.  See that this causes a new notification.
  • Create more than 10 notifications and see that no more than 10 are returned.  Send the latest watermark, poll again, and see that the remaining notifications appear.
  • Poll the queue repeatedly when it has multiple notifications and see that sometimes only 1 notification appears.
  • Click Create Activity many times in a row and see that notifications for that Activities feed continue to trickle in several minutes later.

You can download the code here.  It includes and uses the ResourceClient library.

Comparison to iPhone’s Push Notifications

If for some reason your eyes haven’t glazed over yet, check out Viraj’s analysis of the iPhone Push Notification Service.  If you read between the lines, this is a fascinating compare-and-contrast to Live Mesh’s notification solution, with Live Mesh being better in many ways.  It hints that Microsoft might imitate Apple and use notifications to mine usage metrics for apps.

Apple’s solution is slightly more efficient in that it creates a single channel from the cloud to each device, whereas Live Framework currently requires each app to establish its own channel.  This could be solved by having apps subscribe to the local LOE using local queues which would then transparently chain the subscriptions and queues through a single channel to the cloud LOE.

Conclusion

Hopefully this helps clarify how you should expect notifications to behave in your apps, as well as providing ideas for more creative uses of the building block features of queues, notifications, and subscriptions.  I plan to follow up with more details on my ResourceClient library, and write up a feature request that combines the best parts of notifications and activities to enable better real-time communication between clients.

Saturday, February 21, 2009

Exploring Live Framework Activities

Live Framework has an interesting undocumented feature called Activities.  You can think of Activities as a more general-purpose, transient alternative to News in the Mesh.  Currently the only resource for learning more about Activities is John Macintyre’s PDC session Live Services: Notifications, Awareness, and Communications.  Some of the information in this post is from John’s presentation, but much of it comes from my own exploration.

What are Activities good for?

Unlike News, Activities aren’t necessarily meant to be displayed or used in a predefined way.  Live Mesh uses Activities to track transient state such as which users are online, which users are currently in a folder, and which users are currently using a particular app (more details on this later).  You can use Activities in your own apps to build features such as chat, remote control of apps, and near-real-time transmission of small messages (when used with notifications).

Where can you use Activities?

Only cloud LOE MeshObjects expose Activities feeds.  There is no Activities link from client LOE MeshObjects.  Technically ApplicationInstance also has an Activities feed, but in practice the ApplicationInstance Activities feed points to the Activities feed of the app’s MeshObject.

An interesting side note is that ApplicationInstance is essentially the same resource as its MeshObject.  It shares the same entry id and contains a copy of all the MeshObject’s elements, plus a few more app-specific elements.  I assume these entries map to a single CoreObject under the hood.

What do Activities look like?

An Activity has the following interesting data elements:

  • MemberLink (Uri)
  • ActivityTime (DateTimeOffset)
  • MaxAge (short int)
  • Type (string)
  • UserData (serialized object)

MemberLink is created for you automatically and points to the parent MeshObject’s Member entry corresponding to the user who created the Activity.  You can’t create a MemberLink that points to a different user’s Member entry.

ActivityTime is optional and is separate from the entry’s published and updated times.  You can specify any ActivityTime you want in the past or in the future, although you would typically set it to something like DateTimeOffset.UtcNow.

MaxAge is the number of seconds until the Activity expires.  MaxAge is automatically generated for you.

Type can contain anything you want.  Later we’ll see an example of how Live Mesh encodes user presence in Type.

UserData is the same GetUserData/SetUserData extensibility point you find on MeshObject and DataEntry.

The Title of an Activity is always blank, even if you try to supply one.
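
For reference, a POST body for a new Activity presumably looks something like the sketch below, modeled on the entry XML shown at the end of this post.  The element placement, the ActivityContent namespace, and the Type value are my guesses:

<entry xmlns="http://www.w3.org/2005/Atom">
  <content type="application/xml">
    <ActivityContent xmlns="http://user.windows.net">
      <!-- optional; separate from the entry's published/updated times -->
      <ActivityTime>2009-02-21T08:27:13Z</ActivityTime>
      <!-- Type is free-form; this value is made up -->
      <Type>SomeApp/SomeActivityType</Type>
    </ActivityContent>
  </content>
</entry>

MemberLink and MaxAge are omitted because, as described above, the LOE generates them for you.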

How do Activities behave?

The most interesting Activity behavior is the automatic expiration based on MaxAge.  When an Activity is created, a random MaxAge between 600 and 900 (between 10 and 15 minutes) is assigned.  This randomness is intended to smooth out the server load in the datacenter.  The same technique is used for Subscriptions.  See slide 11 of John’s presentation for an illustration of what happened before they started using random TTLs:

[Image: slide 11, Patch Tuesday TTL expiration spikes]

Once an Activity has been created, its MaxAge value counts down to zero.  You can see the countdown happen as you poll the Activity or its feed.  When MaxAge reaches zero, the Activity entry automatically disappears from the feed.  The Activity actually hangs around for a few seconds while MaxAge is zero, and during this time the MaxAge element is missing from the entry.

You can update an Activity entry which will reset its MaxAge to a new random value between 10 and 15 minutes.  Later we will see how Live Mesh user presence uses this technique.

You can also delete an Activity immediately without waiting for MaxAge to expire if you so choose.

Activity feeds are subscribable, so you can register to receive notifications as entries are added or removed.  Entries that are removed due to MaxAge reaching zero also trigger a notification.  I’m not sure if you will receive a notification if Activity state is lost due to unexpected server state loss (remember, Activities are in-memory only).

Activity feeds are not extensible, meaning you can’t use ElementExtensions and AttributeExtensions.  As mentioned earlier they do support UserData which is hopefully sufficient for most scenarios.

Activities support Create and Update triggers but not Delete triggers.  This is the same partial trigger support exhibited by Contacts.

Earlier I mentioned you can’t create a MemberLink that points to a different user’s Member entry.  It turns out that you also can’t delete an Activity created by another user, even though you can delete your own Activities in the same feed.  That’s right, in this case it is possible to have different permissions on each item within a single feed!  So much for MeshObject being the most granular unit of permissioning… ;-)

The OPTIONS verb and $metadata do not work for Activities.

What is the programming model?

Today, Activities are only exposed through the resource model as MeshObjectActivityResource.  There is only an ActivitiesLink property on MeshObject, not an Activities collection, and there is no LiveItem-based Activity object.

Actually, that’s only true for the .NET and Silverlight SDKs.  The JavaScript SDK has a MeshObjectActivity class and MeshObject has an Activities property, but I haven’t used the JavaScript SDK.

For now, using Activities from .NET or Silverlight requires using AtomPubClient, Resource Scripts, or raw HTTP.
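
Here’s a rough sketch of the raw HTTP route.  Everything specific is a placeholder or an assumption: activitiesUrl stands in for your MeshObject’s ActivitiesLink href, authToken for a valid token, “MyApp:Ping” is an invented Type, and the entry envelope is inferred from the Activity XML shown later in this post.

using System;
using System.IO;
using System.Net;
using System.Text;

// Placeholders: your MeshObject's Activities feed URL and a valid auth token.
string activitiesUrl = "https://user-ctp.windows.net/.../Activities";
string authToken = "...";

// Minimal Atom entry wrapping an ActivityContent payload (shape inferred, not official).
string entry =
    "<entry xmlns=\"http://www.w3.org/2005/Atom\">" +
      "<content type=\"application/xml\">" +
        "<ActivityContent>" +
          "<ActivityTime>" + DateTimeOffset.UtcNow.ToString("o") + "</ActivityTime>" +
          "<Type>MyApp:Ping</Type>" +
        "</ActivityContent>" +
      "</content>" +
    "</entry>";

var request = (HttpWebRequest)WebRequest.Create(activitiesUrl);
request.Method = "POST";
request.ContentType = "application/atom+xml";
request.Headers[HttpRequestHeader.Authorization] = authToken;

byte[] body = Encoding.UTF8.GetBytes(entry);
using (Stream requestStream = request.GetRequestStream())
    requestStream.Write(body, 0, body.Length);

using (var response = (HttpWebResponse)request.GetResponse())
    Console.WriteLine(response.StatusCode);  // expect 201 Created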

I’m guessing Activities aren’t yet exposed through the high-level programming model because of the additional complexity of MaxAge expiration, automatically handling MaxAge resets, and the possibility of wrapping Activities in a scenario-specific programming model for user presence.

Behind the scenes

The Mesh is composed of a variety of services spread across many servers in the data center.  Some of these services use reliable state stores and others use soft state stores.

Reliable state includes the Accounts store (accounts.developer.mesh-ctp.com), user-data structured storage (storage.developer.mesh-ctp.com), and user-data blob storage (enclosure.developer.mesh-ctp.com).

Soft state includes storage for device presence, notification queues, subscriptions, activities, and dictionary state for the Live Desktop.  You can see most of these on slide 21 from Abolade’s PDC session:

[Slide 21: Mesh services deployment]

The design of the Activity Service is similar to that of the notification and subscription services: it uses only in-memory tables for high performance, so state loss is a possibility.  Therefore you shouldn’t use Activities for any state that you can’t afford to lose.  On the bright side, the high-performance in-memory design of Activities fits together quite nicely with notifications and subscriptions.

The client LOE doesn’t appear to implement any of the soft state stores at this time.  This is probably why client LOE MeshObjects don’t have an Activities link. Update: The client LOE implements Notifications and Subscriptions, although they behave slightly differently from the cloud LOE. However, the client LOE still doesn't implement Activities.

How does Live Mesh use Activities?

One way Live Mesh uses Activities is to track which users are currently in a Live Folder.  Here’s an example of such an activity:

<entry>
  <id>...</id>
  <title type="text"></title>
  <published>2009-02-21T08:27:15Z</published>
  <updated>2009-02-21T08:27:15Z</updated>
  <link rel="LiveFX/Member" title="LiveFX/Member" href="Mesh/MeshObjects/.../Members/..." />
  <link rel="self" title="self" href="Mesh/MeshObjects/.../Activities/...-..." />
  <link rel="edit" title="edit" href="Mesh/MeshObjects/.../Activities/...-..." />
  <category term="Activity" label="Activity" scheme="http://user.windows.net/Resource" />
  <content type="application/xml">
    <ActivityContent>
      <UserDataBuffer></UserDataBuffer>
      <ActivityTime>2009-02-21T08:27:13Z</ActivityTime>
      <MaxAge>539</MaxAge>
      <Type>UserActivity:Type[Presence];LiveFolderId[::{4ba12b8a-865f-4f52-99a4-12901be64d54}];</Type>
    </ActivityContent>
  </content>
</entry>

The important pieces of information are the link to the user who is in the folder, the ActivityTime when they opened the folder, and UserActivity encoded in the Type field.  LiveFolderId is the entry id of the folder’s MeshObject.  Type[Presence] implies that other user activity types are probably tracked using the same format.
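
If you want to pick the Type string apart yourself, a quick-and-dirty parse might look like this.  The Name[Value]; format is inferred from the single example above, so treat it as a guess:

using System;
using System.Text.RegularExpressions;

string type = "UserActivity:Type[Presence];LiveFolderId[::{4ba12b8a-865f-4f52-99a4-12901be64d54}];";

// Inferred format: a "UserActivity:" prefix followed by Name[Value]; pairs.
foreach (Match m in Regex.Matches(type, @"(\w+)\[([^\]]*)\]"))
    Console.WriteLine("{0} = {1}", m.Groups[1].Value, m.Groups[2].Value);

// Output:
//   Type = Presence
//   LiveFolderId = ::{4ba12b8a-865f-4f52-99a4-12901be64d54}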

If you open a folder on the Live Desktop and watch the corresponding Activity entry, you will see that the Live Desktop does a PUT to reset the MaxAge when 15 seconds are left.  All of the other entry details such as ActivityTime remain unchanged.
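
You could mimic that keep-alive behavior yourself with a periodic PUT of the same entry.  A minimal sketch follows; activityUrl, authToken, entryXml, and stillPresent are placeholders you’d wire up when creating the Activity, and the usings match the earlier raw-HTTP sketch (plus System.Threading).

// Re-PUT the unchanged Atom entry on a timer so MaxAge keeps getting reset.
while (stillPresent)  // placeholder flag: true while the user is "in" the folder/app
{
    var put = (HttpWebRequest)WebRequest.Create(activityUrl);
    put.Method = "PUT";
    put.ContentType = "application/atom+xml";
    put.Headers[HttpRequestHeader.Authorization] = authToken;

    byte[] body = Encoding.UTF8.GetBytes(entryXml);
    using (Stream s = put.GetRequestStream())
        s.Write(body, 0, body.Length);
    put.GetResponse().Close();

    // MaxAge resets to a random 10-15 minutes, so refreshing every
    // 5 minutes stays safely inside the expiration window.
    Thread.Sleep(TimeSpan.FromMinutes(5));
}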

If you close the folder, the Activity entry is deleted immediately without waiting for MaxAge to expire.

It turns out that anything that has a Mesh companion bar tracks user presence activity.  This means that Mesh-Enabled Web Applications also get this same behavior for free.  In this case the UserActivity’s LiveFolderId is the entry id of the app’s MeshObject.

The Activity’s Member link is used by the Mesh companion bar to display an orange box next to users who are currently in the folder or app.

[Screenshot: Mesh companion bar showing presence]

In this case you can see that I am playing Collaborative Crossword all by myself.  Apparently Ray doesn’t have time for me anymore. ;-)

There is nothing stopping you from creating an Activity that makes another user believe you are in a folder or app when you’re not, although you would have to periodically update the Activity to continue to appear present.  So if you must be present to spoof your presence, how much are you really spoofing?  Hmm…  It may also be possible to delete Activities to hide the fact that you’re in a folder or app, although I assume the Activity will eventually be recreated.

You would think that user presence should be determined by the combination of the Member link and the specially formatted Type field’s Presence and LiveFolderId, but it turns out that all you need to do to appear present is create any Activity whatsoever in a MeshObject’s Activities feed.

Source Code

I have source code that demonstrates much of what I’ve described, but it is part of a larger project that also demonstrates notifications and subscriptions, so I will publish the source with my notifications and subscriptions blog post.  If you were surprised by how much there is to know about activities, just wait until you see notifications and subscriptions!

Monday, February 16, 2009

Mesh4Linux

Last week my eyes lit up when I saw the following tweet from Miguel de Icaza, leader of the Mono project.


Miguel de Icaza

RT: @bradyanderson:is trying to authenticate my Linux Live Operating Environment using Windows Live Delegated authentication (Mesh4Linux)

4:21 PM Feb 11th from web

I immediately tried to find out more, but this is the only mention of “Mesh4Linux” I could find on the web.  Brady is a distinguished engineer at Novell who works on the Mono project.  I checked out his tweets, and he’s been talking about building a Live Mesh / Live Framework implementation for Linux ever since PDC.


is attending the "What I learned building My first live mesh app" session at #pdc2008

11:35 AM Oct 28th, 2008 from twitterrific

is excited to start hacking on a Live Operating Environment for the Linux Desktop. Opens the door for some amazing applications

2:04 PM Oct 29th, 2008 from twitterrific

created his first MeshObject on Linux! Now DataFeed, DataEntry, Membership.... bows his head and goes away quietly

1:02 PM Dec 10th, 2008 from twitterrific

just implemented DataFeed creation and Mesh/MeshObject enumeration

4:06 PM Dec 11th, 2008 from twitterrific

"is hoping this is the first twitter message pushed from the sync framework (2)"

2:18 PM Jan 27th from web

is researching Differential Synchronization algorithms

10:04 AM Jan 28th from twitterrific

is catching up on the latest Live Mesh developments. I can't seem to find the January tools update :-(

8:11 AM Feb 11th from web

is trying to authenticate my Linux Live Operating Environment using Windows Live Delegated authentication - getting closer.

2:55 PM Feb 11th from web

the CTP versions of Live Mesh and Azure Services are so slow they're barely usable. *screams profanities*

3:10 PM Feb 11th from web

I’m guessing Brady is the guy Ori Amiga is referring to in the following quote from this PDC session (15:50 onwards):

The whole point I wanted to make, it's just plain good old HTTP, and if you can talk that, every device, programming language, stack is welcome to the party.

Some guy walked up to me after the stage yesterday, if you're here I'd love to keep chatting with you, said man I want to write Live Operating Environment for Linux. Can I do that?  I was like, hell yeah, we'll hire you, come write it even in-house if you want to.

But really the idea is the Mesh will never be, I can't imagine we'll be successful in making people's lives better if we only stick to a Microsoft stack.  That makes no sense.  let's say, I'd admit, most of my devices at home and my receivers, my TVs, all the media stuff we have, the car, they don't run Windows, and that's ok, there's nothing wrong with that. It's great that my Windows devices are gonna behave really well in the Mesh, but I'd love for everything else that's sort of net connected to behave that way as well.

It is worth noting that Ori Amiga, a Principal Group Program Manager on the Live Mesh team, has built multiple carputers, one using Linux and another using Live Mesh.

I pinged Brady and Miguel for details on Mesh4Linux but haven’t heard back yet.  There appears to be no connection to Mesh4x, another open source project with many similarities to Live Mesh, including FeedSync support.

Mesh4Linux is in the early stages at this point, but I’m already dreaming of the possibilities it will enable not just on Linux desktops but on the iPhone, Google Android phones, and embedded devices such as carputers.  Go Brady!

2/28/09 Update:

Scott Hanselman used kyte to live stream Miguel’s Mono on iPhone session at the Alt.NET conference in Seattle.  Via the online comments, I asked about Mesh4Linux (as well as Miguel’s communist C# Turkish flag t-shirt).  Scott got a chance to ask Miguel about Mesh4Linux at the end (50:08) right after Miguel mentioned the benefits of sync for disconnected scenarios:

SH: What about Mesh4Linux?

MdI: There is no Mesh4Linux as far as I know.  I know there is an engineer at Novell who wants Mesh4Linux.

SH: So it’s a dream, not a project.

MdI: Yes.

Thursday, January 15, 2009

Exploring Live Framework Triggers

The Live Framework has the ability to add triggers to resources.  There is some documentation on triggers here and here (pgs. 14-15), but after reading it I was left with more questions than answers.  So I took a deep dive exploring the nooks and crannies of triggers and this blog post is the result.

Overview of triggers

Triggers are scripts that can be executed before and after resources are created, updated, and deleted.  The scripts are written using Resource Scripts (AKA MeshScripts), a tiny DSL for working with AtomPub and FeedSync in Live Mesh.  Think of it as the T-SQL of Live Mesh.  MeshScripts can be used as sprocs as well as triggers, but I’ll be focusing on triggers in this post.  See my previous posts for examples of sproc-style usage.

There are six triggers that can be attached to each resource:

  • PreCreateTrigger
  • PostCreateTrigger
  • PreUpdateTrigger
  • PostUpdateTrigger
  • PreDeleteTrigger
  • PostDeleteTrigger

The Create triggers run before and after each HTTP POST of a resource, the Update triggers run before and after each HTTP PUT of a resource, and the Delete triggers run before and after each HTTP DELETE of a resource.  This enables you to pack quite a bit of custom business logic inside a single call to the server.

Trigger parameters

The resource that you’re creating, updating, or deleting is accessible from inside each trigger as a script parameter.  For Create and Update triggers, the parameter is the actual resource sent from the client to the server in the POST or PUT request.  For Delete triggers, the parameter is the server’s version of the resource being deleted since a resource isn’t sent from the client to the server for delete requests (the client simply specifies the URL of the resource to delete).

Three steps are necessary to use a script parameter:

  1. Define the parameter
  2. Bind to the parameter from one or more statements
  3. Add the parameter to the script’s root statement

Here’s what this looks like using the syntax I created in my helper library.  Comments in the snippet mark the three steps.

// Step 1: Define the parameter
var param = 
    S.ResourceParameter<MeshObjectResource>();

mo.Resource.Triggers.PostCreateTrigger = 
    S.Sequence(
        S.CreateResource(news)
            // Step 2: Bind to the parameter from one or more statements
            .Bind(s => s.CollectionUrl, 
                param, p => p.NewsFeedLink)
            .Bind(s => s.Request.Title, 
                param, p => p.Title)
    )
    // Step 3: Add the parameter to the script's root statement
    .AddParameters(param)
    .Compile();


The script snippet above adds a news entry to the news feed of the MeshObject you are creating (after it has been created, of course).  You can see this code in the context of a working sample in the download at the end of this post.  The sample also shows the equivalent “classic” syntax for the same trigger script.

Parameters are optional.  If you don’t need to access the original resource from your trigger script then you can safely omit all three steps and simply create a trigger script without any parameters.
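
For example, here’s a parameterless PostCreateTrigger that simply creates a fixed MeshObject at a known URL.  This reuses the fluent helper syntax from above; ScriptHelper.MeshObjectsUrl is the same helper that appears in a later snippet in this post.

// No ResourceParameter, no Bind, no AddParameters -- just a statement to run.
mo.Resource.Triggers.PostCreateTrigger =
    S.Sequence(
        S.CreateResource(ScriptHelper.MeshObjectsUrl,
            new MeshObjectResource("Created by a parameterless trigger"))
    )
    .Compile();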

There is only one actual resource parameter per script.  If you add more than one to the script, they are all treated as the same parameter.  This makes sense since all resource parameters are named “$Resource” under the hood.

There is another kind of script parameter called the ConstantParameter that lets you specify a name for the parameter, thus letting you have more than one per script, but I have been unable to get ConstantParameters to work, so we’ll ignore them for now.  I’m guessing they are used for looping statements, which aren’t available in the current CTP.

Create/Update triggers

Create and Update triggers share many similarities, so I will cover them together.

Create and Update triggers are a one-shot deal.  You must attach new Create or Update triggers each time you Add() or Update() the resource.  Only the triggers appropriate for the HTTP verb are used.  So for POST, the Create triggers are executed but the Update triggers are silently tossed, and for PUT, the Update triggers are executed and the Create triggers are tossed.  By “tossed” I mean they aren’t executed, and the trigger is set to null in the response you get back.

In case it’s not clear, Create and Update triggers are not persisted on the server.  They only exist for the duration of the HTTP request/response.

Unlike sproc-style MeshScripts, the trigger script’s Source property becomes null after the script has executed.  At first I thought this was a bug, but then I realized that this was necessary so that if you then proceeded to call Update() on the item it wouldn’t re-run the same trigger again.

Just like sproc-style scripts, Create and Update triggers return the results of script execution in the Result property of the trigger script, which you can inspect for details.  Use them immediately or lose them; they won’t stick around for subsequent requests.
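
Concretely, that means checking the trigger right after the call that ran it.  A sketch follows; I’m assuming the usual pattern of adding a MeshObject via a MeshObjects collection, and since the exact shape of Result isn’t documented, the code just dumps it.

meshObjects.Add(ref mo);  // the POST that runs the PostCreateTrigger

// Inspect the execution results now; they are gone on the next request.
var result = mo.Resource.Triggers.PostCreateTrigger.Result;
if (result != null)
    Console.WriteLine("PostCreateTrigger result: {0}", result);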

Original vs. updated values

The script parameter for Update triggers contains the updated resource being PUT by the client.  If you need access to the original value that will be replaced by the PUT, you can access it in the PreUpdateTrigger using the following code, replacing MeshObjectResource with the appropriate resource type:

// Read the server's current (pre-PUT) copy of the resource:
originalValue = S.ReadResource<MeshObjectResource>()
    .Bind(s => s.EntryUrl, param, p => p.SelfLink)


You can then bind to originalValue in subsequent statements.  Note that “param” in the sample is the trigger script’s resource parameter.

Delete triggers

Only Delete triggers have a non-null Source property after a POST or a PUT.  This is because only Delete triggers are persisted along with the resource on the server.  Delete triggers can be added to a resource using either POST or PUT.  Since Delete triggers are round-tripped (the Source doesn’t become null in the response), you don’t need to remember to re-add them on subsequent updates, unlike Update triggers.  However, they are re-persisted each time you do an update.  This means that you can remove a Delete trigger by setting it to null and calling Update().
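
So stripping a Delete trigger is a null-out-and-update, along these lines (a sketch; “mo” is a MeshObject you’ve already loaded):

// Delete triggers round-trip with the resource, so clearing them is just an update.
mo.Resource.Triggers.PreDeleteTrigger = null;
mo.Resource.Triggers.PostDeleteTrigger = null;
mo.Update();  // re-persists the resource without its Delete triggers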

Delete triggers are executed when you perform an HTTP DELETE on the URL of a resource that already has a Delete trigger added to it by a previous operation.  Since no actual resource is posted or returned by the DELETE operation, there is no way to examine the script results or learn about errors.

How triggers deal with errors

They don’t. :-)  To be more precise, errors are simply ignored.  They don’t cancel the POST/PUT/DELETE operation.  Similar to sproc-style scripts, no script Result is returned to the client if an error occurs.  Unlike sproc-style scripts, the error is not returned to the client.

Transactions

While we’re on the subject of sproc-style scripts, it should be noted that sproc-style scripts are not transactional, and trigger-style scripts aren’t transactional either.  Sure, they may execute within the scope of a single HTTP request/response “transaction” but there is no rollback on failure.  Future releases are expected to include compensation/undo support.

Comparison to SQL triggers

Various databases support statement-level triggers and row-level triggers.  Statement-level triggers are executed once for a batch of rows resulting from a single statement, while row-level triggers are executed once for each row.  Both kinds are attached to tables in the database.

While Live Framework triggers can inspect data “per row,” the triggers are actually attached to each “row,” not to each “table.”  And as you already know, only Delete triggers actually remain attached to the “row.”

This means that it isn’t possible to put triggers on feeds (the equivalent of tables) that fire when entries are added, updated, or removed from the feed.

And as I explain in the next section, you can’t currently modify the incoming data before it is added or updated, unlike with SQL triggers.

Parameters are read-only (I think…)

At first I was under the impression that the incoming POST/PUT data exposed in the parameter to the PreCreate and PreUpdate triggers could be modified and the modified values would be passed along to the actual POST or PUT operation.  I made this assumption based on the following quote from page 15 of this document:

"The output of the PreCreateTrigger can be data-bound to the actual POST request entity and the data is propagated dynamically in the request pipeline. Similarly, the response entity of the POST operation can be data bound to the PostCreateTrigger. A similar binding can be done using the PreUpdateTrigger to the request entity of the PUT operation and the response of the PUT operation and the PostUpdateTrigger. Note that such a model to flow the data dynamically between the PostDeleteTrigger script and the response entity is not applicable to the DELETE operation since we do not return response entity in the DELETE operation."

This sounds promising, but unfortunately I have been unable to find a way to update the script parameter.

The problem is that I can’t find a way to bind to the resource parameter.  The resource parameter is exposed as a StatementParameter, not as a Statement.  All of the Bind() methods that take a StatementParameter have the parameter on the right-hand side.  This means that you can assign from a resource parameter, but you can’t assign to it.

So I tried binding to “Parameters[0].Value” on the root statement of the script, but that didn’t work.  Then I tried binding to the parameter using its secret “$Resource” name, but that didn’t work either.

Perhaps someone forgot to add the appropriate Bind() overload, or perhaps there’s another way to get at the parameter that I’m not thinking of.  But until this is sorted out, parameters are read-only, at least on my box.

Once parameters can be modified, it will be interesting to see if you can completely replace the parameter (even set it to null?), or only update properties on it.  It will also be interesting to see if you can delete the resource in the PostCreate trigger and return a completely different resource to the client.  This could be a useful technique for creating singleton Mesh objects.

Triggers and the local LOE

Triggers don’t work at all if you’re connecting to the local client LOE.  If you add triggers to a resource and then Add() or Update() it, the resource comes back with all its triggers set to null.  This makes sense because the ability to execute scripts inside the client LOE is expected to be added in a later release.

But not even the Delete triggers are persisted and propagated up to the server.  It turns out that Delete triggers also don’t propagate from the server down to the client.  This made me nervous, wondering what will happen if I update a client-side resource that has a server-side Delete trigger.  Will the absence of a client-side trigger clobber the server-side trigger?  Thankfully the server properly merges the client-side update with the server-side resource’s Delete triggers.  Must be some FeedSync magic.

Then I tried deleting a resource on the client that had server-side Delete triggers.  The resource was successfully removed on the server, but the server-side triggers failed to execute!  So synchronization bypasses triggers.

Speculation regarding client script execution

Once client script execution is added in a future release, how is this likely to change the situation?

Create/Update triggers will run on the client if you connect via ConnectLocal().

Assuming synchronization of Delete triggers is fixed, you will be able to add Delete triggers on either the client or the server.  If you delete the resource via Connect(), the trigger will run on the server.  If you delete via ConnectLocal(), the trigger will run on the client.

But what if you want a trigger to always run on the server?  Perhaps the trigger accesses external resources that you are unable to access while the client is offline.  Or perhaps the trigger accesses resources that aren’t synced to the client such as Contacts, Profiles, or MeshObjects that aren’t mapped to that particular device.  Perhaps there could be a client-side queue of pending triggers that are synchronized up to the server?

Creating triggers inside of scripts

Officially, you can’t add triggers to resources from inside of scripts.  If you try, you will get the following error message: “Trigger can not be associated with a resource which is being modified using meshscripts.”  Hey, look!  They said MeshScripts!  Personally, I think that’s a far better name than Live Framework Resource Scripts, as you can tell from the titles of my previous blog posts. :-)

Anyway, it is possible to add Delete triggers to resources from inside of a script.  The trick is that you must copy them from a pre-existing resource, like so:

S.Sequence(
    originalCollection = S.ReadResourceCollection<MeshObjectResource>(ScriptHelper.MeshObjectsUrl)
    .WithQuery<MeshObjectResource, MeshObject>(
        q => q.Where(o => o.Resource.Title.StartsWith("Original"))),
    S.CreateResource(ScriptHelper.MeshObjectsUrl, 
        new MeshObjectResource("I have delete triggers"))
    .Bind(s => s.Request.Triggers.PreDeleteTrigger, 
        originalCollection, c => c.Response.Entries[0].Triggers.PreDeleteTrigger)
    .Bind(s => s.Request.Triggers.PostDeleteTrigger, 
        originalCollection, c => c.Response.Entries[0].Triggers.PostDeleteTrigger)
).Compile().RunAtServer();


Technically, you can use this technique to add Create and Update triggers too.  This can be verified by inspecting the script result and seeing that the resource was returned with Create and Update triggers containing the Source script that you specified.  However, these triggers don’t run.  Why not?

Scripts bypass trigger execution

Just as synchronization bypasses trigger execution, scripts also bypass trigger execution.  This is why our Create and Update triggers were added but didn’t run.

What happens if we use a script to delete a resource with Delete triggers on the server?  The script deletes the resource without running its triggers.

Consequences of bypassing triggers

If you choose to use Delete triggers, you must be careful to do all of your Delete operations through direct HTTP DELETE calls to the server.  Don’t use ConnectLocal(), and don’t use MeshScripts to delete resources.
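
In other words, when you do want Delete triggers to fire, delete over raw HTTP.  A sketch, using System.Net; resourceUrl and authToken are placeholders:

// A direct HTTP DELETE against the cloud LOE is the only path that runs Delete triggers.
var del = (HttpWebRequest)WebRequest.Create(resourceUrl);  // the resource's edit/self link
del.Method = "DELETE";
del.Headers[HttpRequestHeader.Authorization] = authToken;
del.GetResponse().Close();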

This loophole (deleting via the client LOE or via MeshScripts) could be useful in “oops” situations where you don’t want the triggers to run.

The bigger issue here is that you can’t reliably enforce server-side business logic.  I spoke with Abolade about this after his PDC session and he mentioned that perhaps the content screening hook points (used to block enclosures containing viruses and other inappropriate content) could be exposed to users for running custom business logic that is capable of rejecting content.  This could also be used to implement table-style triggers that are guaranteed to always run.  At first I thought this would be cool to have, but now I’m starting to think that such a server-centric feature isn’t an appropriate fit with the design philosophy of Mesh.  I may elaborate why in a future post.

Triggers on non-Mesh objects

Currently the root ServiceDocument at https://user-ctp.windows.net/ exposes Profiles and Contacts in addition to Mesh.  I think these are known as Federated Storage Services, but I’m not sure.  Contacts map out of the Mesh to your actual Hotmail contacts.  Anyway, you access /Profiles and /Contacts using the same resource-based programming model as the rest of /Mesh.  Anything that is a Resource can have triggers, so what happens if we add triggers to a Contact?

I added a new Contact containing Create and Delete triggers.  The Create triggers worked, but the Delete triggers weren’t persisted and therefore didn’t run when I deleted the Contact.

I’m guessing there’s a service integration layer that translates back and forth between Mesh’s resource-based programming model and external services.  The Contacts service probably doesn’t have a place to store arbitrary data such as triggers, so they get lost in translation.  But the Create and Update triggers can still run because they don’t need to be persisted anywhere, so they can live entirely in the world of Mesh’s resource-oriented request/response pipeline that wraps the calls to the Contacts service.  Hmm, maybe there are benefits to not having to persist triggers…  But it would also be nice to have a consistent programming model for Create, Update, and Delete.

Summary of limitations

There are a number of limitations scattered throughout this blog post, so here’s a more concise list:

  • Create and Update triggers aren’t persisted
  • No row-level/statement-level triggers on feeds
  • Trigger parameters are read-only (I think)
  • Can’t add triggers from scripts
  • Synchronization bypasses triggers
  • Scripts bypass triggers
  • Delete triggers don’t work on non-Mesh objects
  • Local LOE doesn’t support triggers
  • Triggers can’t reliably enforce business logic

Download

You can download the sample code here.  The samples use my Fluent MeshScripts library with a few minor updates.

While writing this I discovered and fixed a bug in my library’s expression-to-string code when it encounters expressions such as “c => c.Response.Entries[0].Triggers.PreDeleteTrigger”.   I also created an AddParameters overload that takes an SResourceParameter<TResource>.

The code includes examples of using all the trigger types, creating a resource with triggers from a script, bypassing delete triggers with a script, triggers on Contacts, and the “can’t add triggers from meshscripts” error.

Wrap-up

Besides providing some detailed documentation and code samples for Live Framework triggers, I hope this post has helped you think about scenarios where you might want to use triggers, as well as provided some pointers on when to avoid them or use them with care.  I also hope it can be used to improve the usability and functionality of this powerful feature of the Live Framework.


Update: it appears that triggers don't work on DataFeeds and DataEntries. See Raviraj's post in this forum thread for details.