OK, so browsers are not supposed to send named anchors (or anything after them) to the server. However, I noticed today that SWFAddress links like this:
http://example.com/#/portfolio/myClient/myProject
...when sent as plain text via email to an iPhone, get their # URL-encoded when displayed in the Mail client, so Safari receives the URL from Mail like this:
http://example.com/%23/portfolio/myClient/myProject
...and happily sends the whole URI to the server, which looks for something to do with a path containing %23, and comes up with a 404.
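To see why the server ends up with a 404, here's a small Python model of the split a browser makes between what it keeps locally and what it puts on the request line. Browsers obviously don't run this code. `request_target` is a hypothetical helper, and the standard library's `urlsplit` just stands in for the RFC 3986 parsing rules: an unencoded # starts the fragment, while %23 is ordinary path data.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Path (plus query) that ends up on the HTTP request line.

    Everything after an unencoded '#' is the fragment, which the browser
    keeps to itself; a '%23' is just data in the path and goes to the server.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


original = "http://example.com/#/portfolio/myClient/myProject"
mangled = "http://example.com/%23/portfolio/myClient/myProject"

for url in (original, mangled):
    print(url)
    print("  sent to server:", request_target(url))
    print("  kept by browser:", repr(urlsplit(url).fragment))
```

For the original link the server only ever sees "/", and SWFAddress gets "/portfolio/myClient/myProject" from the fragment. For the mangled one, the fragment is empty and the whole deep link goes to the server as a path.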
This is a clear violation of RFC 3986, the URI specification, which states:
2.2. Reserved Characters
URIs include components and subcomponents that are delimited by characters in the "reserved" set. These characters are called "reserved" because they may (or may not) be defined as delimiters by the generic syntax, by each scheme-specific syntax, or by the implementation-specific syntax of a URI's dereferencing algorithm. If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.
   reserved    = gen-delims / sub-delims
   gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
               / "*" / "+" / "," / ";" / "="
The purpose of reserved characters is to provide a set of delimiting characters that are distinguishable from other data within a URI. URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent. Percent-encoding a reserved character, or decoding a percent-encoded octet that corresponds to a reserved character, will change how the URI is interpreted by most applications. Thus, characters in the reserved set are protected from normalization and are therefore safe to be used by scheme-specific and producer-specific algorithms for delimiting data subcomponents within a URI.
Furthermore:
2.4. When to Encode or Decode
Under normal circumstances, the only time when octets within a URI are percent-encoded is during the process of producing the URI from its component parts. This is when an implementation determines which of the reserved characters are to be used as subcomponent delimiters and which can be safely used as data. Once produced, a URI is always in its percent-encoded form.
In other words, keep your dirty mitts off of my URIs!
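Section 2.4 boils down to this: encode the data in each component first, then join the components with delimiters, and never re-encode the finished URI. The Python sketch below contrasts the two. `build_deep_link` and `reencode` are hypothetical names. I can't see what Mail actually does internally, so `reencode` is only a guess at the kind of blanket encoding that would produce the %23 we're seeing.

```python
from typing import List
from urllib.parse import quote


def build_deep_link(base: str, segments: List[str]) -> str:
    """Produce a SWFAddress-style URL the RFC 3986 way (section 2.4).

    Each segment is percent-encoded as *data* first; the '/' and '#'
    delimiters are added afterwards and never encoded.
    """
    fragment = "/" + "/".join(quote(seg, safe="") for seg in segments)
    return base + "#" + fragment


def reencode(uri_tail: str) -> str:
    """Hypothetical blanket re-encoding of an already-formed URI tail.

    Treating the whole string as data turns the '#' delimiter into '%23'.
    """
    return quote(uri_tail, safe="/")


print(build_deep_link("http://example.com/", ["portfolio", "myClient", "myProject"]))
# http://example.com/#/portfolio/myClient/myProject
print(build_deep_link("http://example.com/", ["news", "We are #1"]))
# http://example.com/#/news/We%20are%20%231
print(reencode("/#/portfolio/myClient/myProject"))
# /%23/portfolio/myClient/myProject
```

A # that is *data* (as in "We are #1") correctly becomes %23, while the # that is a *delimiter* stays put. Re-encoding after the fact can't tell the two apart.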
I spent some time poking around in Apple Mail (full OS X version, not on the iPhone), and noticed that the Edit/Link/Add dialog does not allow named anchors in URLs! As soon as you enter a # in the dialog, the "OK" button is disabled.
So clearly Apple is aware of the problem, but has yet to give us a good solution. This got me even more curious: I wanted to see how the iWork apps handle anchors. It turns out that Pages, Numbers, and Keynote '08 all percent-encode the # in URLs added via the hyperlink dialog! This means we have a bigger problem than just the iPhone edge case: any links in documents produced with the iWork suite can potentially be malformed. For sites that make heavy use of SWFAddress, this is a huge problem.
I fired up MS Word 2004 for Mac and was pleasantly surprised: it has a fairly robust interface for working with links that contain named anchors, and it produces properly formed URLs.
So what can we do about this? On the server side, we can use mod_rewrite to trap incoming URIs that contain %23 (or an actual # if one comes in, even though it shouldn't) and redirect them right back to the client unencoded, so the browser can request the proper URI and handle the named anchor appropriately. Because mod_rewrite matches against the percent-decoded path, a %23 in the request shows up to the rules as a plain #.
RewriteCond %{REQUEST_URI} ^(.*)?#(.*)
RewriteRule .+ %1#%2 [NE,R=301,L]
The problem with that shotgun approach is that it will trigger the redirect for URIs that rightfully contain URL-encoded hash marks. Consider this: your app displays news posts and pulls the post data from the server using nice semantic URLs, like http://example.com/news/My+Post+Title. The first time you have a post with a title like "We are #1", you will be unable to access the data. The server will receive a request for http://example.com/news/We+are+%231 and send a redirect back to the browser to http://example.com/news/We+are+#1, at which point your browser will fire off a new request for http://example.com/news/We+are+, which will result in a 404. This will not do.
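Here's a quick Python model of both sides of the shotgun rule. Python's `re` stands in for Apache's regex engine, which behaves the same way for a pattern this simple. The helper names `shotgun_redirect` and `next_request` are made up for illustration. The sketch assumes the condition is tested against the percent-decoded path and that the NE flag sends the # back out unescaped.

```python
import re
from typing import Optional
from urllib.parse import unquote, urlsplit

# Same regex as the RewriteCond in the shotgun rule. Assumption: the path is
# matched after percent-decoding, so %23 arrives here as a literal '#'.
SHOTGUN = re.compile(r"^(.*)?#(.*)")


def shotgun_redirect(raw_path: str) -> Optional[str]:
    """Return the Location the shotgun rule would send, or None if it doesn't fire."""
    m = SHOTGUN.match(unquote(raw_path))
    if m is None:
        return None
    # %1#%2 with [NE]: the '#' goes back out unescaped.
    return f"{m.group(1) or ''}#{m.group(2)}"


def next_request(location: str) -> str:
    """Path the browser asks for after following the redirect (fragment stays local)."""
    return urlsplit("http://example.com" + location).path


# A mangled SWFAddress link: the redirect restores the deep link.
print(shotgun_redirect("/%23/portfolio/myClient/myProject"))
# /#/portfolio/myClient/myProject

# A legitimately encoded '#' in a news title: the redirect breaks it.
location = shotgun_redirect("/news/We+are+%231")
print(location, "->", next_request(location))
# /news/We+are+#1 -> /news/We+are+
```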
For browser-based client apps that implement SWFAddress, we need a more surgical approach to detecting and redirecting URLs with bogus encoded anchor delimiters.
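Here are some mod_rewrite rules for making this happen (see the Apache mod_rewrite documentation for the details). The patterns have no leading slash, so they assume per-directory context, i.e. an .htaccess file at the document root.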
RewriteRule ^#(.*) /#$1 [NE,R=301,L]
If your client app loads at the site root, and is only accessible from /, here is a simple solution. This traps and redirects URLs like http://mydomain.com/%23/some/stuff to http://mydomain.com/#/some/stuff
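To check that the root rule only catches what it should, here's a Python model of it (`root_redirect` is a hypothetical helper). It assumes the rule sits in the document root's .htaccess, where mod_rewrite matches against the decoded path with the leading slash stripped.

```python
import re
from typing import Optional
from urllib.parse import unquote

# The root-level rule: RewriteRule ^#(.*) /#$1 [NE,R=301,L]
ROOT_RULE = re.compile(r"^#(.*)")


def root_redirect(raw_path: str) -> Optional[str]:
    """Location the root rule would send, or None if it doesn't match.

    Assumes a document-root .htaccess: the path is percent-decoded and the
    leading '/' is stripped before matching.
    """
    per_dir_path = unquote(raw_path).removeprefix("/")
    m = ROOT_RULE.match(per_dir_path)
    return None if m is None else "/#" + m.group(1)


print(root_redirect("/%23/some/stuff"))    # /#/some/stuff
print(root_redirect("/news/We+are+%231"))  # None: news posts are left alone
```

Because the pattern is anchored to a # at the very start of the path, the "We are #1" post from earlier passes through untouched.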
RewriteRule ^path/to/my/app/loadpage\.html#(.*) \
    /path/to/my/app/loadpage.html#$1 [NE,R=301,L]
If your app is further down in the site structure, you can include the path to it, including the HTML page that loads it if appropriate. This redirects http://mydomain.com/path/to/my/app/loadpage.html%23/some/stuff to http://mydomain.com/path/to/my/app/loadpage.html#/some/stuff
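The same kind of Python model works for the deeper rule. `loadpage_redirect` is hypothetical, "path/to/my/app/loadpage.html" is just the placeholder path from the rule, and the dot is escaped so it only matches a literal period. As before, the sketch assumes a document-root .htaccess.

```python
import re
from typing import Optional
from urllib.parse import unquote

# Deeper app: the pattern includes the path to the loader page (dot escaped).
LOADPAGE_RULE = re.compile(r"^path/to/my/app/loadpage\.html#(.*)")


def loadpage_redirect(raw_path: str) -> Optional[str]:
    """Location the rule would send, or None, assuming a document-root .htaccess."""
    per_dir_path = unquote(raw_path).removeprefix("/")
    m = LOADPAGE_RULE.match(per_dir_path)
    return None if m is None else "/path/to/my/app/loadpage.html#" + m.group(1)


print(loadpage_redirect("/path/to/my/app/loadpage.html%23/some/stuff"))
# /path/to/my/app/loadpage.html#/some/stuff
print(loadpage_redirect("/path/to/my/app/loadpage.html/%23/some/stuff"))
# None: the pattern expects the '#' right after the file name
```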
RewriteRule ^path/to/my/app/(index\.html)?#(.*) \
    /path/to/my/app/index.html#$2 [NE,R=301,L]
If you're using an index page to load your client app, it may be accessed either by the path to the directory, or the full path including the file name. Putting an optional check for the file name cracks that nut.
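Here's a Python model showing that the optional file name group lets both URL forms land on the same place. `index_redirect` is a hypothetical helper, and it makes the same document-root .htaccess assumption as the others. The deep link is capture group 2 because group 1 is the optional "index.html".

```python
import re
from typing import Optional
from urllib.parse import unquote

# Optional file name: matches both ".../app/#..." and ".../app/index.html#..."
INDEX_RULE = re.compile(r"^path/to/my/app/(index\.html)?#(.*)")


def index_redirect(raw_path: str) -> Optional[str]:
    """Location the rule would send, or None, assuming a document-root .htaccess."""
    per_dir_path = unquote(raw_path).removeprefix("/")
    m = INDEX_RULE.match(per_dir_path)
    # $2 is the deep link; $1 (the optional file name) is deliberately unused.
    return None if m is None else "/path/to/my/app/index.html#" + m.group(2)


print(index_redirect("/path/to/my/app/%23/some/stuff"))
print(index_redirect("/path/to/my/app/index.html%23/some/stuff"))
# both: /path/to/my/app/index.html#/some/stuff
```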
Needless to say, this does nothing to help standard named anchors in HTML pages; it's just a band-aid for client apps that use SWFAddress. Apple really needs to address this issue, and I think it's safe to assume that there are other apps and services out there with the same problem.
UPDATE:
One of my partners just informed me that Microsoft's Windows Mail that currently ships with Vista suffers from these same URI encoding issues!
"I have just discovered that the MS Mail client on Vista has the same problem as Apple Mail, when it comes to handling urls that include a "hash" component. The hash-sign gets url encoded before it is sent out to the browser, and so the browser thinks it's part of the url and sends it on to the server, rather than treating it as a hash.
I sent someone a link to the *** stuff I did, and it got busted by their mail. When I finally figured out what was happening, I had to pause briefly and confirm that they weren't using a Mac.
It's so simple it kills me... and MS and Apple are both supposed to have the best minds in the world working on this stuff !?
If you ask me, this is pretty good proof that Vista is heavily based on OSX (conceptually, that is). I mean... they've even copied the bugs!"