
Adventures in Cross-Site Scripting


Don't cross the streams!

One big component of making our search relevancy tool Quepid simple to use is the ability to just paste in a Solr search URL and go. Thankfully, Solr executes searches entirely with HTTP GET requests. Unfortunately, taking an arbitrary pasted URL and making HTTP requests from JavaScript to that third-party server violates the browser's same-origin policy. The browser wants to keep you sandboxed to the domain you started at (i.e., quepid.com).

The most seamless way around this is to talk to Solr using a technique known as JSONP. JSONP is certainly somewhat of a hack, but it's a fairly well-accepted one. It leverages the fact that your browser can load resources such as scripts from any domain. A script tag loads such a resource, so we can dynamically insert one like so:

var jsonpReq = document.createElement('script');
jsonpReq.src = 'http://mysolrsearch.com' +
               '/solr/collection1/select?' +
               'wt=json&json.wrf=loadResults&q=searchquery';
jsonpReq.type = "text/javascript";
document.body.appendChild(jsonpReq);

(see a full example in this jsfiddle)

Solr takes a json.wrf argument (here “loadResults”) that names a JavaScript function to call with the search results. Instead of plain JSON, Solr returns executable JavaScript that looks like:

loadResults(/*search results as JavaScript object*/);

So when this dynamic script tag is loaded, a global callback “loadResults” will be executed with our search results as an argument.
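To make the mechanics concrete, here's a small sketch that runs in Node, outside a browser. `buildJsonpUrl` and `simulateJsonpLoad` are hypothetical helpers, not part of Quepid or Solr: the first builds the same kind of URL as the snippet above but URL-encodes the query with `URLSearchParams`, and the second stands in for the browser by evaluating a canned, Solr-shaped response body with the `Function` constructor, so you can watch it call the named callback.

```javascript
// Hypothetical helper (not from Quepid): builds a Solr JSONP URL like the one
// above, URL-encoding the query instead of concatenating it raw.
function buildJsonpUrl(solrBase, callbackName, query) {
  const params = new URLSearchParams({
    wt: 'json',
    'json.wrf': callbackName, // Solr wraps its JSON in a call to this function
    q: query,
  });
  return solrBase + '/solr/collection1/select?' + params.toString();
}

// Stand-in for the browser loading the injected script tag: the response body
// is evaluated as JavaScript, with `callbackName` bound to `callback`.
// Illustration only: like JSONP itself, this runs whatever the server sends.
function simulateJsonpLoad(responseBody, callbackName, callback) {
  const run = new Function(callbackName, responseBody);
  run(callback);
}

const url = buildJsonpUrl('http://mysolrsearch.com', 'loadResults', 'search query');
// url: http://mysolrsearch.com/solr/collection1/select?wt=json&json.wrf=loadResults&q=search+query

// A canned body shaped like Solr's JSONP output (the data is made up)
let received = null;
simulateJsonpLoad(
  'loadResults({"response":{"numFound":2,"docs":[]}})',
  'loadResults',
  function (results) { received = results; }
);
```

Note that the callback has to be reachable by name from the response, which is why the real-browser version relies on a global function.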

We’ve used JSONP quite a bit in pure JavaScript search applications. It lets us get rid of a middle layer of server-side glue code and focus on a rich, beautiful, client-side application. We love it!

The problem with JSONP

Unfortunately, JSONP has a pretty glaring hole. If your request fails, you get only the information you'd get from any script tag that failed to load, which is extremely basic. You don't get the HTTP status code, and you don't get the data sent back with the error. That's a real problem for us, because Solr reports errors as an HTTP error status whose response body describes the exact error.

For example, screwing up the echoParams parameter in this search request returns HTTP 400 with the following error message:

Solr Query:
http://mysolrsearch.com/solr/collection1/select?echoParams=foo&wt=xml

Solr Response (error code 400):
Invalid value 'foo' for echoParams parameter, use 'EXPLICIT' or 'ALL'

This isn't a big deal for a user-facing application, but it stinks for a search developer workbench like Quepid. An advanced developer using Quepid needs more than “an error has occurred” when a search fails. Perhaps, while experimenting with Solr's relevancy parameters, they mistyped a parameter and Solr simply couldn't parse what was sent. For a search developer, seeing these errors is a pretty big deal. Missing them is tantamount to a compiler that gives you no error messages. Not very helpful :).
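With JSONP, the status code and body never reach our code. For contrast, here's a sketch of what Quepid could show a developer if it did have both. `describeSolrError` is a hypothetical helper, and it assumes the request used `wt=json`, for which Solr reports errors as a JSON body containing an `error` object with `msg` and `code` fields.

```javascript
// Hypothetical helper: turn the HTTP status and raw body of a failed Solr
// request into a message a search developer can act on. Assumes wt=json, so
// the body looks like {"error": {"msg": "...", "code": 400}}.
function describeSolrError(httpStatus, bodyText) {
  let parsed;
  try {
    parsed = JSON.parse(bodyText);
  } catch (e) {
    // Not JSON, e.g. an HTML error page from the servlet container
    return 'HTTP ' + httpStatus + ': (response body was not JSON)';
  }
  const err = parsed && parsed.error;
  if (!err || typeof err.msg !== 'string') {
    return 'HTTP ' + httpStatus + ': (no error message in response)';
  }
  return 'HTTP ' + httpStatus + ': ' + err.msg;
}

// Example body mirroring the echoParams error above
const body = JSON.stringify({
  responseHeader: { status: 400, QTime: 0 },
  error: {
    msg: "Invalid value 'foo' for echoParams parameter, use 'EXPLICIT' or 'ALL'",
    code: 400,
  },
});
const message = describeSolrError(400, body);
// message: "HTTP 400: Invalid value 'foo' for echoParams parameter, use 'EXPLICIT' or 'ALL'"
```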

So knowing that JSONP stinks in this regard, how can we extract the Solr errors even if they come back with an HTTP error?

CORS

CORS (Cross-Origin Resource Sharing) is a more standardized way of making cross-site requests over HTTP. Perhaps it would be a better way to do cross-domain requests and extract both data and errors?

I briefly looked into this for Quepid, but it quickly became apparent that it wasn’t nearly as seamless as JSONP. CORS requires the server to white-list the domains it will accept cross-domain requests from, so our users would need to white-list Quepid’s domain in their Solr web server’s config. I can just imagine our users’ calls to IT: “Hey, there’s this Quepid thing we’d like to try, and it requires this other CORS thing. Could you please reconfigure Solr’s web server and add Quepid’s domain to this list?” How long would that take to get approved? Some users may even be using a hosted Solr solution where changing the web server config is impossible. Even scarier, I’ve seen in a number of places that enabling CORS in Solr may require inserting a bit of Java code, making it even less seamless.
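To see where that burden lands, here's a sketch of the decision CORS asks the Solr web server (not the browser) to make on each request. `corsHeadersFor` and the allow-list contents are hypothetical; in a real deployment this logic lives in the web server or servlet configuration that our users would have to get changed.

```javascript
// Hypothetical sketch of the server-side CORS check. The browser only hands a
// cross-origin response to the calling page if this header comes back.
function corsHeadersFor(requestOrigin, allowedOrigins) {
  if (!allowedOrigins.includes(requestOrigin)) {
    // No Access-Control-Allow-Origin: the browser withholds the response
    // (status and body) from the page that made the request.
    return {};
  }
  return {
    'Access-Control-Allow-Origin': requestOrigin,
    // The answer depends on the Origin header, so caches must key on it
    Vary: 'Origin',
  };
}

// Assumed allow-list an admin would have to configure for Quepid
const allowedOrigins = ['http://quepid.com'];
const forQuepid = corsHeadersFor('http://quepid.com', allowedOrigins);
const forOther = corsHeadersFor('http://other-site.example', allowedOrigins);
```

Echoing the specific origin back, rather than `*`, keeps the server in control of exactly which sites may read its responses, which is precisely the control that requires a config change on the Solr side.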

A big business goal behind Quepid is for it to be easy to try, so I don’t want to put an undue burden on users. The less friction in trying Quepid, the better. Still, perhaps I should keep CORS in mind: initially trying out the product could simply use JSONP, and more advanced customers could white-list Quepid’s domain to enable CORS.

Iframes Can Do Cross-Site Requests Too!

One realization I had when playing around with cross-site scripting was that I can also insert iframes into a page using the same method (yes, I know I’m crazy). Simply by dynamically inserting an iframe whose src is my Solr search URL, I’ll get search results displayed in the page. Furthermore, iframes display whatever data comes back, even when the response has an HTTP error status. So could this be a way to expose Solr errors to devs?

Initially I had hoped that I could simply add an iframe to the page dynamically, directly access the iframe’s document body from Quepid’s JavaScript code, parse out the error, and show something pretty to the user. Hopefully I could grab the iframe’s contents with something like this:

iFrameBody = iFrame.contentWindow.document.getElementsByTagName('body');

Browser implementers are one step ahead of me here. This only works if the iframe’s content comes from the same domain as the requesting JavaScript. Unfortunately for my case, they’ve made sure that my domain’s JavaScript can’t access another domain’s content, so the browser complains with a Security Error that I’m violating the cross-domain security policy by trying to pull out my Solr data.
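The rule being enforced here is the same-origin policy: two documents share an origin only when scheme, host, and port all match. Here's a sketch of that comparison using the standard `URL` class. `sameOrigin` is a hypothetical helper name; browsers perform this check internally.

```javascript
// Hypothetical helper: two URLs are same-origin only if scheme, host and port
// all match. URL#origin normalizes default ports (":80" for http).
function sameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

sameOrigin('http://quepid.com/app', 'http://quepid.com/cases');   // true
sameOrigin('http://quepid.com/', 'http://mysolrsearch.com/solr'); // false: different host
sameOrigin('http://quepid.com/', 'https://quepid.com/');          // false: different scheme
sameOrigin('http://quepid.com:80/', 'http://quepid.com/');        // true: 80 is http's default
```

An iframe pointed at the Solr server fails the second check, which is why reaching into its document throws.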

Talking to Your Iframes

The next realization I had was that I can talk to JavaScript code within an iframe using HTML5’s new cross-document messaging (postMessage). This feature allows two window objects (the main window and an iframe’s window, for example) to post messages to each other. So if I could construct a Solr response that would respond to a message, it could post back the error data and I could do something meaningful with it. That is, if I could do something like:

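// ****************************************
// main Quepid code
var receiver = function(e) {
  console.log(e.data.error_text);
};
window.addEventListener('message', receiver, false);

// ****************************************
// iframe with Solr error (a separate page, so this `receiver` doesn't clash)
// QUEPID_ORIGIN stands in for the quepid domain; solrErrorInfo for the
// error data this page extracted from Solr's response.
var receiver = function(e) {
  // Only answer messages that come from Quepid
  if (e.origin === QUEPID_ORIGIN) {
    e.source.postMessage(solrErrorInfo, e.origin);
  }
};
window.addEventListener('message', receiver, false);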
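The iframe-side handler is the part that has to be careful, since any window can post it a message. Here's a sketch that pulls it out into a hypothetical factory, `makeErrorReplier`, so the origin check can be exercised without a browser. The fake `source` object stands in for the asking window, and the origin and payload values are assumptions.

```javascript
// Hypothetical factory for the iframe's 'message' handler. It replies only to
// `trustedOrigin` (an assumed value, e.g. Quepid's origin) and returns whether
// it replied, which makes it easy to test.
function makeErrorReplier(trustedOrigin, payload) {
  return function (e) {
    if (e.origin !== trustedOrigin) {
      return false; // ignore messages from any other window
    }
    // Reply to the asking window, delivered only if it is still at that origin
    e.source.postMessage(payload, e.origin);
    return true;
  };
}

// Exercise it with fake events; in a browser, `e.source` is the parent window.
const sent = [];
const fakeSource = {
  postMessage: function (data, targetOrigin) { sent.push({ data, targetOrigin }); },
};
const reply = makeErrorReplier('http://quepid.com', { error_text: 'example error' });
const ignored = reply({ origin: 'http://other-site.example', source: fakeSource });
const answered = reply({ origin: 'http://quepid.com', source: fakeSource });
```

Passing `e.origin` as the target origin, rather than `'*'`, means the error data can't leak to a window that has navigated elsewhere in the meantime.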

Great! Now the next problem is: how can I make Solr generate a response that has the error along with the JavaScript to go with it?

At this point I got rather stuck. I tried various kinds of hackery. One technique that browsers dutifully didn’t let me use was any kind of script injection from my Quepid code into the Solr response. Neither hacking the json.wrf argument I referred to earlier nor trying out equivalent functionality in another Solr response format (the Velocity template writer) got me what I wanted. When I was finally able to insert some JavaScript into the page, the browser detected the insertion as too similar to what I put in the URL and refused to execute the script (good for you, Chrome!).

With these kinds of techniques, I clearly was going to hit walls. Using iframes to do this is already weird. Adding the extra level of script injection was even yuckier. Even if I got it to work, it certainly had little chance of continuing to work as browser security advanced.

Enter XSLT (yes I said XSLT)

However, there was one legal, though slightly inconvenient, way I could get Solr to return the cross-document communication JavaScript: Solr’s XSLT response writer. This feature of Solr lets you take Solr’s XML output and transform it into, say, HTML, with JavaScript and all the works. My XSLT can gin up some JavaScript code that implements the cross-document messaging safely. A preliminary version of this XSLT looks something like:

      

If I save this XSLT as “errors_to_quepid.xsl”, I can refer to it in an iframe:
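The iframe’s src is just a Solr search URL with the response writer switched to XSLT. Here's a sketch of building it, assuming Solr’s stock XSLT response writer, which is selected with `wt=xslt` and takes the stylesheet name in the `tr` parameter (the file living in the core’s `conf/xslt` directory). `buildXsltErrorUrl` is a hypothetical helper.

```javascript
// Hypothetical helper: builds the src for the error iframe by taking the
// user's search parameters and forcing Solr's XSLT response writer.
function buildXsltErrorUrl(solrSelectUrl, searchParams) {
  const url = new URL(solrSelectUrl);
  for (const [key, value] of Object.entries(searchParams)) {
    url.searchParams.set(key, value);
  }
  url.searchParams.set('wt', 'xslt');                 // use the XSLT response writer
  url.searchParams.set('tr', 'errors_to_quepid.xsl'); // stylesheet to apply
  return url.toString();
}

const src = buildXsltErrorUrl('http://mysolrsearch.com/solr/collection1/select', {
  q: 'searchquery',
  echoParams: 'foo', // the deliberate mistake from earlier
});
// In a browser, this string would be set as the error iframe's src.
```

Setting `wt` and `tr` last means they override anything the user pasted, so the iframe always gets the messaging page rather than raw XML or JSON.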

Quepid’s JavaScript can communicate with this iframe as follows:

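var solrErrorFrame = document.getElementById("solr_errors");

// Listen for the iframe's reply, ignoring messages from any other origin
var receiver = function(e) {
  if (e.origin !== "http://mysolrsearch.com") {
    return;
  }
  console.log(e.data.error_text);
};
window.addEventListener('message', receiver, false);

// Ask only once Solr's page has loaded; before that the frame's origin
// won't match the target origin and the message is silently dropped
solrErrorFrame.addEventListener('load', function() {
  solrErrorFrame.contentWindow.postMessage("", "http://mysolrsearch.com");
});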

And guess what, it works! Check out this jsfiddle for a demo.

The downside to this is that it requires users to add an XSLT file to the right place in their Solr config directory. That’s a one-time inconvenience users can opt in to for better error reporting, and one I imagine is easier to deploy than changes to the web server’s configuration.

The upside to this approach is that it plays by the rules. I’m not doing hacky things trying to insert JavaScript into the Solr response. I’m not circumventing any browser protections. And it works rather well.

A Problem That’ll Drive You Insane


Brendan Eich’s favorite graphic to describe the Web. Evolution makes something that’s often not very pretty, but it works!

Reflecting on this problem leaves me wondering. While I’m aware of how easy it is to inject cross-site scripts with malicious intent, I’m left wishing the web did a better job here. This is a rough problem to solve from an implementation point of view. It feels like if a service returns just data, we ought to feel a little safer about how an application outside the domain consumes the content. For example, the browser could make decisions based on the MIME type coming back from the other end and relax the restrictions a bit. My naive understanding is that it’s cross-domain text/html we fear, not XML or JSON. Sure, a malicious user who can examine my code can inject just the right XML or JSON into a response to exploit my lack of sanity checking. But it’s not in the same ballpark as cross-site requests where HTML and executable JavaScript are involved.

Complaining aside, it was kind of a fun problem to work on. Getting cross-domain requests right certainly blurs the line between hacking and, well, “hacking”. I’d be curious to hear if you have any thoughts on this problem or a better solution. If so, let me know. And of course, a shameless plug: check out Quepid! It’s a neat tool if you care about managing your search quality.