Andy Chou

New Blog Site

Posted by Andy Chou Sep 5, 2013

We've decided to move our blogging platform onto a new site that we've been working on for a while: http://security.coverity.com.  Thanks for reading, and enjoy the new site!

 

Andy

Advisory

 

Overview

(Note: this write-up uses the Maven sample application provided by Struts2. Refer to the Appendix section at the bottom to install the application. References to the blank-archetype application refer to this sample application.)

From the Struts 2 website:

Apache Struts 2 is an elegant, extensible framework for creating enterprise-ready Java web applications. The framework is designed to streamline the full development cycle, from building, to deploying, to maintaining applications over time.

Struts 2 heavily utilizes OGNL as a templating / expression language. OGNL, similar to other expression languages, is vulnerable to a class of issues informally termed "double evaluation". That is, the value of an OGNL expression is mistakenly evaluated again as an OGNL expression. For a background on previous OGNL double evaluation issues, I recommend @meder's "Milking a horse or executing remote code in modern Java frameworks" presentation. (The exploit used below is based on @meder's exploit, just condensed.) For other examples of double evaluation in different expression languages, check out Aspect Security's "Remote Code with Expression Language".

 

Struts 2 calls its controllers Actions. Actions are mapped to URLs and views within an XML configuration file or via Java annotations. For a good background on Struts 2 and Actions, refer to their "Getting Started" page.

 

Struts 2 allows a developer to configure wildcard mappings in its XML configuration files. The blank-archetype application has the following wildcard example in its XML configuration:

 

<action name="*" class="tutorial2.example.ExampleSupport">
  <result>/example/{1}.jsp</result>
</action>

 

This allows one to specify an arbitrary Action name. If the name doesn't match any of the other, more specific mappings in the XML configuration (or possibly others annotated in the Java code), then this mapping acts as a catch-all. The Action name provided is substituted as a component of the file name, and Struts then dispatches to the resulting JSP defined in the result element.

 

Vulnerability and Exploit

There exists a vulnerability in this Action name to replacement mapping. If the Action name provided is in the form of ${STUFF_HERE} or %{STUFF_HERE}, and the contents of the expression are OGNL, then Struts2 unsafely double evaluates the contents.

To view this exploit, start up the blank-archetype application using jetty:run. The following URL exploits a vulnerability within the replacement support in Struts 2. If the exploit is successful, something similar to the following should be displayed:

HTTP ERROR 404

Problem accessing /struts2-blank/example/0.jsp. Reason:

Not Found

Note the "0.jsp" part in the 404 page. When successfully executed, Process.waitFor returns a value of "0". This is then used as the JSP file name, "0.jsp". This implies the touch aaa executed successfully. A patched version doesn't have a return value since the process never executed.

 

Root Cause Analysis

Using JavaSnoop, instrumenting the blank-archetype application, and setting canaries for strings to match against the payload URL showed numerous potential traces. Scoping the trace to org.apache.struts2 packages shows an interesting call to StrutsResultSupport.conditionalParse:

 

/**
* Parses the parameter for OGNL expressions against the valuestack
*
* @param param The parameter value
* @param invocation The action invocation instance
* @return The resulting string
*/
protected String conditionalParse(String param, ActionInvocation invocation) {
    if (parse && param != null && invocation != null) {
        return TextParseUtil.translateVariables(param, invocation.getStack(),
                new TextParseUtil.ParsedValueEvaluator() {
                    public Object evaluate(String parsedValue) {
                        if (encode) {
                            if (parsedValue != null) {
                                try {
                                    // use UTF-8 as this is the recommended encoding by W3C to
                                    // avoid incompatibilities.
                                    return URLEncoder.encode(parsedValue, "UTF-8");
                                }
                                catch(UnsupportedEncodingException e) {
                                    if (LOG.isWarnEnabled()) {
                                        LOG.warn("error while trying to encode ["+parsedValue+"]", e);
                                    }
                                }
                            }
                        }
                        return parsedValue;
                    }
        });
    } else {
        return param;
    }
}

 

The method above is called from StrutsResultSupport.execute(ActionInvocation), and it in turn calls TextParseUtil.translateVariables:

 

/**
 * Function similarly as {@link #translateVariables(char, String, ValueStack)}
 * except for the introduction of an additional <code>evaluator</code> that allows
 * the parsed value to be evaluated by the <code>evaluator</code>. The <code>evaluator</code>
 * could be null, if it is it will just be skipped as if it is just calling
 * {@link #translateVariables(char, String, ValueStack)}.
 *
 * <p/>
 *
 * A typical use-case would be when we need to URL Encode the parsed value. To do so
 * we could just supply a URLEncodingEvaluator for example.
 *
 * @param expression
 * @param stack
 * @param evaluator The parsed Value evaluator (could be null).
 * @return the parsed (and possibly evaluated) variable String.
 */
public static String translateVariables(String expression, ValueStack stack, ParsedValueEvaluator evaluator) {
  return translateVariables(new char[]{'$', '%'}, expression, stack, String.class, evaluator).toString();
}

 

This method evaluates expressions surrounded by ${} or %{}. The subsequent call to the translateVariables overload below evaluates the expression via the parser.evaluate call:

 

/**
 * Converted object from variable translation.
 *
 * @param open
 * @param expression
 * @param stack
 * @param asType
 * @param evaluator
 * @return Converted object from variable translation.
 */
public static Object translateVariables(char[] openChars, String expression, final ValueStack stack, final Class asType, final ParsedValueEvaluator evaluator, int maxLoopCount) {
    ParsedValueEvaluator ognlEval = new ParsedValueEvaluator() {
        public Object evaluate(String parsedValue) {
            Object o = stack.findValue(parsedValue, asType);
            if (evaluator != null && o != null) {
                o = evaluator.evaluate(o.toString());
            }
            return o;
        }
    };
    TextParser parser = ((Container)stack.getContext().get(ActionContext.CONTAINER)).getInstance(TextParser.class);
    XWorkConverter conv = ((Container)stack.getContext().get(ActionContext.CONTAINER)).getInstance(XWorkConverter.class);
    Object result = parser.evaluate(openChars, expression, ognlEval, maxLoopCount);
    return conv.convertValue(stack.getContext(), result, asType);
}

 

That passes the expression to an instance of OgnlTextParser.evaluate. And then it's game over.

 

Other Vectors

Suspicious calls to TextParseUtil.translateVariables were also examined for exploitability.

 

org.apache.struts2.dispatcher.HttpHeaderResult.execute (Tested)

HttpHeaderResult.execute has the following call to TextParseUtil.translateVariables:

 

if (headers != null) {
    for (Map.Entry<String, String> entry : headers.entrySet()) {
        String value = entry.getValue();
        String finalValue = parse ? TextParseUtil.translateVariables(value, stack) : value;
        response.addHeader(entry.getKey(), finalValue);
    }
}

 

The blank-archetype application's HelloWorld XML example was modified below to test out the call. This is probably a very unlikely scenario and can also be mitigated by the <param name="parse">false</param> setting. (By default, this value is true.) In this case, the ${message} value is user-controllable within the HelloWorld class. This tainted value is then supplied as a header. While it's an obvious header injection, it's also an RCE vector.

 

<action name="HelloWorld" class="com.coverity.internal.examples.example.HelloWorld">
  <result name="success">/example/HelloWorld.jsp</result>
  <result name="foobar" type="httpheader">
    <param name="headers.foobar">${message}</param>
  </result>
</action>

 

org.apache.struts2.views.util.DefaultUrlHelper.* (Tested)

Pretty much every method in the DefaultUrlHelper class allows for RCE if one of the parameters is tainted. This is because the DefaultUrlHelper.translateVariable method is called by most methods in the class. This class is also heavily utilized throughout Struts2 as the default UrlHelper class via struts-default.xml.

 

Here is an instance of the defect, mocked up from the blank-archetype application HelloWorld.jsp:

 

<s:url id="url" action="HelloWorld">
    <s:param name="request_locale"><s:property value="message"/></s:param>
</s:url>

 

Assume the s:property 'message' is tainted via ?message=${OGNL_HERE}. Since the s:url / URL component's URL rendering goes through DefaultUrlHelper (via ServletUrlRenderer), the parameter is double evaluated as OGNL.

 

org.apache.struts2.util.URLBean.getURL (Untested)

URLBean seems to mainly be used in Velocity via a macro. If URLBean is used without a setPage() call and either an addParameter() call contains tainted data or no addParameter() calls occur, then URLBean seems susceptible to RCE via the DefaultUrlHelper issue above.

 

Non-Vectors

Some potential vectors that seemed to double evaluate OGNL were tested but found not to be exploitable using this technique.

When the action name is used as a replacement value within the method attribute of the Action, the replacement value is not double evaluated. Rather, the unevaluated value is passed as a method name via reflection. The blank-archetype application has this example, which is not exploitable:

 

<action name="Login_*" method="{1}" class="tutorial.example.Login">
  <result name="input">/example/Login.jsp</result>
  <result type="redirectAction">Menu</result>
</action>

 

Another non-vector tested in the blank-archetype application is to enable Dynamic Method Invocation and then modify the HelloWorld Action mapping as follows:

 

<action name="HelloWorld" class="tutorial.example.HelloWorld">
  <result>/example/${message}.jsp</result>
</action>

 

Finally, call the getMessage function on the HelloWorld Action (HelloWorld!getMessage?message=${PAYLOAD}). Test stack traces didn't show the OGNL expression being evaluated twice.

 

Untested

Struts2 annotations may be susceptible but have not been tested.

 

Testing

Outside of testing for vulnerable versions of Struts 2, testers can use a blind-ish dynamic technique:

  • Identify Actions (usually via a .action suffix) and fingerprint responses to the Actions. For this example URL http://www.example.com/app/Bar.action, the Action name is Bar.
  • For each Action, replace ACTION_NAME with the Action name in the following expression (the decoded form is shown just after this list): $%7B%23foo='ACTION_NAME',%23foo%7D. For example: $%7B%23foo='Bar',%23foo%7D.
  • Replace the Action name in the URL with the substituted expression. For example: http://www.example.com/app/$%7B%23foo='Bar',%23foo%7D.action.
  • If the Action is susceptible to this double evaluation vector, the application ought to return the same page as before. If it's not vulnerable, a 404 or other page will probably be returned.
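
URL-decoded, the probe expression is simply:

${#foo='ACTION_NAME',#foo}

It assigns the Action name to an OGNL variable and then yields it back, so on a vulnerable server the double evaluation reproduces the original Action name and the response should match the page you fingerprinted; on a patched server the literal expression is used as the name and, as noted above, you will typically get a 404 or some other page.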

To test the Welcome.action of the blank-archetype application above via jetty:run, use this URL.

 

Remedy

The Struts developers recommend upgrading to 2.3.14.2; refer to S2-013 and S2-014 for details. That release mitigates the effects of double evaluation: while double evaluation still occurs within the sample application, remote code execution is not possible using @meder's vector.

 

Maven Appendix

First, get Maven. Then create an application based on the blank-archetype.

 

mvn archetype:generate -B -DgroupId=tutorial -DartifactId=tutorial -DarchetypeGroupId=org.apache.struts -DarchetypeArtifactId=struts2-archetype-blank
cd tutorial  # ensure the struts2 entry in the pom.xml points to 2.3.14
mvn package jetty:run

 

To test out the application, try accessing the Welcome Action by navigating to the following URL.

As part of our work at Coverity, the Security Research Laboratory (SRL) performs security reviews of our own software. Last week I had a look at some of the new code in our reporting webapp. It turned out that we were using some technologies and vulnerable code patterns I had not seen before, so I thought I would share what I saw with the world in case anyone sees the same pattern I did.

 

As I began to review this code for the first time, I identified a series of Closure Templates that were being used to generate HTML output, with the usual {$modelKey.property} syntax being used to insert model values into the output. After a little bit of looking around, I saw that they hadn't changed Closure's default escaping away from HTML escaping, and they were also not putting these properties into any contexts where HTML escaping was incorrect.

However, I also noticed that there was a lot of syntax similar to {literal}{{modelVar.property}}{/literal} and many HTML attributes such as ng-repeat that I hadn't seen before.

 

Somewhat confused by this, I dug a bit further and discovered that rather than being directly displayed, the output of these views was being passed to a JavaScript framework called Angular.

 

If you are like me and have never heard of Angular before, it is a complete MVC framework for JavaScript, and it turned out that our HTML output was actually being used as the views for Angular.

 

Like most server-side MVC frameworks, most of the view in Angular is plain old HTML. Angular augments this with a series of directives that are specified as attributes (e.g., the ng-repeat attribute I had seen) and the ability to bind data into attributes or right into HTML as Angular expressions. So the {literal}{{modelVar.property}}{/literal} I was seeing in our Closure Templates was being output as {{modelVar.property}} and consumed by Angular, which was binding values from its own model in there.

 

So, in my eternal quest for more XSS, I went to investigate what would happen if a user was able to inject {{}} expressions into an Angular template. I was convinced that this would let me XSS our application somehow.

 

After digging a bit, it turns out that Angular doesn't simply eval() these expressions, but rather has its own expression tokenizer/evaluator written in JavaScript.

 

The things that this evaluator supports are:

  • Model access: {{modelVar}}
  • Field access: {{modelVar.field}}
  • Function calls: {{modelVar.function(1)}}
  • Built-in filter functions: {{ modelVar | json }}
  • Arithmetic and logical operators: {{1+2}}, {{true&&false}}, etc.
  • Object and Array constructors: {{ {name: 'test'} }}

 

The first thing I tried was to find a filter function that would allow me to somehow execute JavaScript, but after reviewing the built-in filter functions as well as the ones our application injected, I determined this wasn't going to work.

 

One of the filters our application created was called autoescape, which did some HTML escaping and was being applied to some data like {{tts.get('id')|autoescape}}. This made me wonder if I could have my XSS by creating an expression that would return <script>alert(1)</script> without using any characters that would be encoded on the server side. Using this JsFiddle to test what would happen if I could get an expression that returned an XSS string, I realized that this too was a dead end for my XSS dreams.

 

So I decided to review the expression parser to see what was going on in there, and if anything looked sketchy. I discovered that the underlying objects that Angular was using in its expressions were native JavaScript objects (rather than providing their own custom object model, which some view technologies do); so field access was implemented by simply accessing the native fields of an object. This was interesting because besides properties you assign to an object, native objects will have several other fields attached.

 

After staring at the MDN for a bit, I discovered the same thing that many before me had discovered:

{}.toString.constructor('alert(1)') creates a new function from a string by invoking the Function constructor.

 

So to wrap this all up, if you can inject {{}} into an Angular template, the following will execute JavaScript when data values are bound into the template:

{{{}.toString.constructor('alert(1)')()}} 

And now our developers have one more XSS to fix before the code ships to our customers. If you'd like to play around with it yourself, I've set up another jsFiddle with the payload already working.

Update March 1st: Since the PDF on the RSA website seems broken, I have attached a version to this blog post.

 

---

I'll be at RSA this week. My session is Friday morning (10:20am, Room 132) and is called:

Why haven't we stamped out XSS and SQL yet?

 

RSA talk content

Since all the slides are apparently available for everyone on the RSA website, I can give some more insight about what I will be talking about. We ran an experiment at Coverity in which we analyzed many Java web applications and looked for where developers add dynamic data. The goal was to try to understand what contexts (both HTML contexts and SQL contexts) are frequently used.

 

The tone of the talk is fairly straightforward: security pros have been giving advice to developers for a long time, yet we still see these issues on a frequent basis, so we map the common advice against what we see in the data.

 

What you can expect from this talk:

  • Some information about observed HTML contexts: we saw about 26 different context stacks, 45% of them had 2 elements in the stack (e.g., HTML attribute -> CSS code), and the longest ones had 3 elements.
  • A list of SQL contexts and good notes about what developers usually do.
  • Advice for security pros on how to communicate with developers (things that led to the creation of our Fixing XSS: A practical guide for developers document).

 

Anyhow, this blog post is not only to announce this talk, but also to give some insight on how we extracted the data from these applications.

 

Analysis technique

We created and modified different checkers from Coverity Security Advisor in order to extract all injection sites that are related to dynamic data, regardless of its taintedness. For each injection site, we computed the context in which it belonged and its sub-language (one of HTML, JavaScript, CSS, SQL, HQL, and JPQL). This represents our working dataset.

 

Here's an example of an injection site (using JSP and EL):

<script type="text/javascript">
var content = '${dynamic_data}';
// context ::= {HTML SCRIPT TAG -> JS STRING}
</script>






 

We tracked the construction of this snippet of the HTML page and recorded the injection site, such as ${dynamic_data}, and its associated context (JS STRING inside HTML SCRIPT TAG). Since we do not care about the taintedness of dynamic_data, we didn't need to track all paths that could lead to a defect (XSS here), and that's where this work differs from our XSS checker.

Note that we still needed to properly track the parts of the HTML page being constructed in order to compute the context correctly. This is, however, part of our context-aware global data flow analysis...
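
As a rough illustration, each record in the working dataset pairs an injected expression with its stack of contexts; a toy sketch (purely illustrative, not Coverity's internal representation) could look like this:

import java.util.List;

// Toy illustration of the per-injection-site records described above;
// not the representation used inside the Coverity analysis.
final class InjectionSite {
    final String expression;         // e.g. "${dynamic_data}"
    final List<String> contextStack; // e.g. ["HTML SCRIPT TAG", "JS STRING"]

    InjectionSite(String expression, List<String> contextStack) {
        this.expression = expression;
        this.contextStack = contextStack;
    }
}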

 

For SQL related queries, we essentially needed to do the same thing, but we also needed to track the parameters being inserted in a query using a parameterized notation: remember, we needed to find all dynamic data that could eventually go into a query.

That's why the following code:

String sql = "select foo, bar from table where 1=1";
if (cond1)
  sql += " and user='" + user_name + "'"; // context ::= {SQL_STRING}
if (cond2)
  sql += " and password=?"; // context ::= {SQL_DATA_VALUE}






has 2 interesting injection sites for the experiment, and we didn't need to understand the full abstract string (eventually a set of 4 possible strings) from this piece of code.

 

Note that if there is this fairly common construct:

String sql1 = "select foo, bar from table where ";
String and_beg = " and (";
String and_end = " ) ";
sql1 += and_beg + "user = '" + user_name + "'" + and_end;
sql1 += sql2; // `sql2` is another part of the query coming
              // from a different procedure or so

we will still properly track the contexts even if all the parts (sql1, and_beg, etc.) are created interprocedurally.

 

Limitations

I will quickly explain this during the talk, but essentially tracking HTML contexts on a global data flow analysis is not a trivial thing. Moreover, considering the impact of some JavaScript code on the resulting web page (and therefore where the HTML contexts could potentially be transformed at runtime) is an even more complex problem. We did not analyze JavaScript.

I'm happy to announce a new document we just made available: Fixing XSS: a practical guide for developers. If you're currently at the RSA conference, you should come to Coverity's booth (#1759) and either get a hardcopy or a USB stick with this document on it.

 

As the title suggests, this document is a guide for developers on how to handle dynamic data in various locations and common constructs in HTML. We leveraged the data we got from our research for our talk at RSA to come up with some of the most common HTML contexts and nested contexts, and improved the Coverity Security Library to have a solution for all of these cases.

 

Looking at the documentation available for XSS, several things strike us:

  1. It often talks about how to exploit an XSS and not how to fix this issue.
  2. The HTML contexts information is always lacking precision and often makes the documentation complex to read (we're also guilty of this in some previous blog posts).
  3. The fixes are limited or too restrictive (i.e., not applicable for developers).

That's mostly why we decided to create our own document for developers.

 

The first release of this document contains 13 common HTML constructs, and we plan on adding more to it. We also describe what HTML contexts are and why it's important to think about them when outputting dynamic data in a web page. However, we also want to create collateral that gives more complete information about HTML contexts and why it matters for XSS.

 

In this document, you can expect to learn what happens when you want to add dynamic data in an HTML context such as an HTML snippet inside a JavaScript string:

<div id="forMyContent"></div>
<script>
  var foo = "<h1>${cov:jsStringEscape(cov:htmlEscape(content))}</h1>";
  $("#forMyContent")
  .html(foo);
</script>

and why you need to first use an HTML escaper, then a JavaScript string escaper.

 

You'll also see the usage of a newly introduced function asURL from CSL that helps write fully dynamic URLs inside an HTML attribute such as:

<a href="${cov:htmlEscape(cov:asURL(content))}">
  Click me
</a>

 

The current document uses the Java Expression Language (EL) notation to show the dynamic data (here ${content}), but all functions are also available directly from Java when using CSL.

 

Whether you develop web applications, manage developers, or do security reviews, you should read and share this document. We're also happy to receive any feedback to keep improving it.

We've released a new version of our Coverity Security Library on Github and Maven Central, and I'd like to talk a bit about a new class I added called Filter. I would also like to note that you should not use v1.1 due to an issue described later. Some of the implementation decisions I made are arguable, so I would really like to hear the community's thoughts on them. If you're at RSA, feel free to swing by Booth #1759 on Tuesday to grill me or the rest of SRL about it.

 

The implementation of this Filter class is located in coverity-escapers/src/main/java/com/coverity/security/Filter.java

 

The Filter class contains a few methods which are not technically escapers, but serve a very similar purpose. The main methods we have added are:

  • Filter.asURL
  • Filter.asNumber
  • Filter.asCssColor

 

They have been added since these are the most common contexts we have seen that do not have a defined escaper. To give a clearer picture of what I mean, have a look at this code snippet which shows some potential usage (via Java EL):

 

<%@ taglib uri="http://coverity.com/security" prefix="cov" %>
<iframe src="${cov:htmlEscape(cov:asURL(param.taintedURL))}"> </iframe>
<script type="text/javascript">
    var x = ${cov:asNumber(param.taintedNumber)};
</script>
<style>
.user {
    background-color: ${cov:asCssColor(param.taintedColor)};
}
</style>

 

These three contexts present different problems, but essentially, there is obviously no way to turn an arbitrary string into a number or valid CSS color, and URLs face their own problems that we'll get into a little bit later.

 

asNumber / asCssColor

These two functions validate that the string passed is a valid number or CSS color. If the string does not pass that validation, then instead of throwing an exception we return a default value.

 

For asCssColor, our choice for a default is slightly hacky, since we chose to return the string invalid. I specifically chose this string, since it is an invalid CSS color, but one that we know is safe, and seems somewhat descriptive in the HTML output. The reason for choosing an invalid CSS color as the default string is that the CSS parser will simply ignore this single directive, and continue parsing the remaining CSS as if it had never encountered the invalid directive.

 

The main contender for an alternative was inherit, which is the default value when specifying the text color in CSS. However, the default value for the background-color directive is transparent rather than inherit, so choosing one or the other would lead to slightly (more) broken pages.

 

The default for asNumber is 0. The other alternatives considered were null and NaN, but since we couldn't be sure what context asNumber would be called for (e.g. JavaScript, CSS, HTML attribute), we decided it would be more correct to simply choose a number that should not have a huge effect.

 

Since these values are potentially contentious, we have also provided a version of these functions where you can specify your own default value (check out the Filter and FilterEL classes).

 

The other potentially contentious thing I have implemented in asNumber is removing support for octal numbers. In JavaScript, the following will alert the value 511:

<script>
  alert(0777);
</script>

 

This makes some sense for integer constants that a programmer has written, since they may want the octal notation. However, since this is unexpected behavior for most users and not consistent across HTML/CSS, I decided to strip leading 0s from octal numbers. I think this is not only correct, but an improvement over just doing validation.
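
To illustrate the idea, here is a simplified sketch of what stripping the leading zeros means (my own sketch, not the actual Filter.asNumber implementation): the input is validated as a plain decimal integer, and the leading zeros are dropped so that no downstream parser can read the value as octal.

// Simplified sketch of the leading-zero stripping described above;
// not the actual Filter.asNumber implementation.
static String asDecimalNumber(String input, String defaultValue) {
    if (input == null || !input.matches("-?[0-9]+")) {
        return defaultValue;            // invalid input falls back to the default
    }
    String sign = input.startsWith("-") ? "-" : "";
    String digits = input.replaceFirst("^-", "")
                         .replaceFirst("^0+(?=.)", ""); // keep at least one digit
    return sign + digits;               // "0777" becomes "777", "0" stays "0"
}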

 

If you have examples of where these decisions would break applications or make these filters unusable, I would love to hear about it, so we can make improvements for a later release.

 

asURL / asFlexibleURL

The main use for these functions is when you want to generate URLs that are going to be used for links and iframes, or other scenarios where javascript:, data:, and vbscript: are dangerous. The main reason we have written these functions is that trying to create a validation routine is quite difficult due to all the decoding that the browser does, e.g. this link will show a JavaScript alert:

 

<a href="
jaVa
&#115; cript&#58; alert(1)">encoding</a>

 

On top of this, HTML is not the only context URLs will be written into; JavaScript is a big contender:

 

<script>
document.location = "jaVascr ipt\
\x3aalert(1)";
</script>

 

Note that in both those cases the whitespaces are tabs, not spaces, otherwise those examples will not work.

 

Anyway, the point is that trying to create a blacklist that will catch all these variants, across different contexts, without false positives, is really tricky.

 

We can say that any URLs starting with the following strings are safe:

  • / (a path that is anchored at the domain root or is scheme-relative, i.e. //google.com/)
  • \\ (a path to a UNC share)
  • http:
  • https:
  • ftp:
  • mailto:

 

The problem with specifying such a restrictive whitelist is that you need to ask yourself what to do with any rejected URLs.

 

One approach that we have seen in other projects, and that we have taken some inspiration from, is to assume that if the URL does not start with a clearly defined and well known protocol, you can rewrite it with the current web directory as a base path. For example, on this page javascript:alert(1) would become https://communities.coverity.com/blogs/security/javascript:alert(1), essentially neutralising the string, but without changing the semantics of URLs such as java/test.html (which would be rewritten as https://communities.coverity.com/blogs/security/java/test.html).

 

During some brainstorming about how to gain access to the current directory to do this rewriting, I realized that we do not actually need to know what the current URL is, since we can let the browser work it out by simply using ./ as our prefix instead of the real URL. The browser will interpret this as a URL relative to the current page directory, which is exactly what it would have done before.

 

So, that is the crux of our implementation of asURL: if we can determine that a URL is safe and not relative to the current directory, we let it through as is; otherwise we prepend ./, effectively forcing it to be directory-relative.
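
A minimal sketch of that logic, using the whitelist of safe prefixes listed earlier, might look like the following (illustrative only; the real Filter.asURL in the CSL repository is the authoritative version):

// Illustrative sketch of the asURL approach described above; not the CSL code.
private static final String[] SAFE_PREFIXES = {
    "/", "\\\\", "http:", "https:", "ftp:", "mailto:"
};

static String asURLSketch(String url) {
    if (url == null) {
        return null;
    }
    String lower = url.toLowerCase();   // allow mixed-case schemes like hTtP:
    for (String prefix : SAFE_PREFIXES) {
        if (lower.startsWith(prefix)) {
            return url;                 // clearly safe: pass it through unchanged
        }
    }
    // Anything else is forced to be relative to the current page directory,
    // which neutralises javascript:, data:, vbscript: and friends.
    return "./" + url;
}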

 

However, while we think this will be OK for almost all web applications, we feel like this might be an issue for mobile or other application scenarios where custom URL protocols are relatively common. When you actually want users to provide links that use them, you can use the horribly named asFlexibleURL.

 

asFlexibleURL utilises the same approach of prepending ./ to make URLs safe, but it uses a blacklist of scheme names to decide whether it should do so. Now, as we showed above, a blacklist is a very risky thing to try to construct for every possible context, so we have taken a fairly careful approach:

 

  • First, we have some special case handling for URLs that start with / or \\ since they do not have a scheme, and should not be forced to be directory-relative
  • Next, we find the first non-scheme character (i.e. not a-zA-Z0-9.+-), then if this character is a colon (:), we convert the preceding string to lower case and check it against our blacklist (javascript, data, vbscript, about), and if it does not match the blacklist, we allow it through
  • In all other cases, we prepend the URL with ./

 

The assumption this relies on is that you cannot do any encoding or parsing tricks with only the scheme characters, so if they are immediately followed by a colon, then we are parsing the scheme the same way a browser would. So far this seems quite safe to us, but we would appreciate feedback.
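
Expressed as code, the heuristic from the list above reads roughly as follows (an illustrative sketch under those assumptions, not the shipping CSL implementation):

// Illustrative sketch of the asFlexibleURL heuristic described above.
private static final String[] SCHEME_BLACKLIST = { "javascript", "data", "vbscript", "about" };

static String asFlexibleURLSketch(String url) {
    if (url == null) {
        return null;
    }
    // Special cases: absolute paths and UNC shares have no scheme.
    if (url.startsWith("/") || url.startsWith("\\\\")) {
        return url;
    }
    // Find the first character that cannot be part of a scheme (not a-zA-Z0-9.+-).
    int i = 0;
    while (i < url.length()) {
        char c = url.charAt(i);
        boolean schemeChar = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '.' || c == '+' || c == '-';
        if (!schemeChar) {
            break;
        }
        i++;
    }
    if (i < url.length() && url.charAt(i) == ':') {
        String scheme = url.substring(0, i).toLowerCase();
        for (String bad : SCHEME_BLACKLIST) {
            if (scheme.equals(bad)) {
                return "./" + url;      // dangerous scheme: force it to be directory-relative
            }
        }
        return url;                     // any other well-formed scheme is allowed through
    }
    return "./" + url;                  // no scheme before the delimiter: force directory-relative
}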

 

Is this even the right approach?

I'm pretty confident in this approach for asURL; however, there is definitely an argument to be made that number and CSS validation should be done in the general business logic validation. I believe in trying to fix security bugs as close to the 'sink' as possible (where the sink for XSS is document creation), since this makes it much easier to be certain that you do not have any vulnerabilities, and adding these functions in addition to business logic validation works just as well.

 

There is also the question of whether default values are the right thing; another option is to raise an exception. However, this seems like an ugly solution, since small amounts of malformed input would simply break pages in the application. There is an argument to be made that these exceptions give the developer a way to notice when malformed input is being supplied and not validated, but this could just as easily be achieved by logging validation failures.

 

Upgrade to version 1.1.1 not 1.1

We pushed out a version of CSL on Friday night that contained a vulnerability in asFlexibleURL; if an attacker were to specify a blacklisted URL that wasn't completely in lower case, the function would let it through, e.g. jaVascript:alert(1).

 

I have changed the implementation so that the scheme validation function only ever sees lower case URLs and does not have to worry about case issues in the future.

 

I have also modified asURL to allow mixed case URLs such as hTtP://www.coverity.com/ so that users hopefully encounter less unexpected behavior.

Interesting Links has been on a bit of a hiatus, but the interesting links have just kept coming, so we're bringing this back for the moment.

 

1) The last few weeks have been a pretty terrible time to be a Ruby on Rails admin with the vulnerabilities just pouring down, but this vulnerability found by joernchen of Phenoelit is potentially the most interesting. It has what could be the makings of a new bug class for dynamically typed languages if MySQL doesn't change its behaviour. My current conspiracy theory on where this is going to crop up next is apps in dynamically typed languages which explicitly parse JSON (or similar) and put the results into parameterised queries.

 

2) Rich Lundeen continued to beat up ASP.NET MVC's CSRF protection and tease us with content for his BlackHat EU talk that I'm definitely looking forward to.

 

3) Matthew Green made a post about why he hates CBC-MAC, which taught me some new crypto tricks for breaking systems that use CBC-MAC. It's amazing how much real cryptographers actually know about cryptography.

 

5) The Azimuth Security blog made a comeback with two posts dissecting phone jailbreaks, with Tarjei Mandt on the evasi0n jailbreak for iPhones and Dan Rosenberg on the Framaroot jailbreak for some Android handsets.

 

6) While Java was the new hotness a few weeks ago, a few people published a lot of interesting research on attacking the JVM from an Applet context, but one particular report from Security Explorations caught my eye for section 3.4, Remote, Server-Side Code Execution, which is a pretty short read and worthwhile for anyone hacking Java code.

 

7) While rooting around Mozilla's wiki I found that they're currently prototyping a client-side XSS Filter for Firefox. This is obviously a tricky and dangerous path, but hopefully they will learn from the mistakes of other browsers and have an easier time implementing it.

 

8) On the topic of browser XSS filters, Gareth Heyes has a post about some bypasses he and Mario Heiderich found in Chrome's XSS Auditor.

 

9) Julien Tinnes sent an email to oss-sec containing an exploit for a Linux kernel race condition that seems pretty neat.

I'm proud to announce Coverity has been issued a new patent, Methods for Selectively Pruning False Paths in Graphs That Use High-Precision State Information.  The patent covers techniques that apply modern solvers such as SAT and SMT to the problem of eliminating false paths in programs.  False paths are one of the main causes of false positives in static analysis results - in our measurements of open source software, the techniques in this patent eliminate 1/3 of all false positives.

 

The naive way of leveraging solvers in false path pruning analyzes only one path at a time, or converts a whole function at a time into a constraint problem.  Both of these approaches fail on larger code bases. Real-world programs have an exponentially large number of paths, even within single functions.  And converting entire functions doesn't scale when interprocedural analysis is needed.  Worse, these methods don't play well with existing analysis infrastructure that is closer to dataflow analysis. The techniques in this patent address these concerns by generalizing from proofs provided by SAT/SMT solvers that certain paths are infeasible.  The proofs isolate the core contradictions that occur between path conditions, which can then be quickly tested for on other paths.  This can rule out an exponentially large number of infeasible paths without invoking the solver on large numbers of redundant constraint sets.
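
To make the notion of a false path concrete, here is a tiny Java example of my own (not taken from the patent). The only path on which buf is still null at the array access requires both !(n > 0) and (n > 0) to hold, so that path is infeasible; proving that contradiction once lets the analysis discard every path containing it instead of querying the solver about each one.

// Illustrative only: a classic false path that a path-insensitive analysis
// would report as a possible null dereference.
static int sumFirst(int n) {
    int[] buf = null;
    if (n > 0) {
        buf = new int[n];     // buf is allocated exactly when n > 0
    }
    int result = 0;
    if (n > 0) {
        result = buf[0];      // reachable only when the branch above was taken,
                              // so buf cannot be null here on any feasible path
    }
    return result;
}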

 

This technology has been in Coverity's product for a few years now.  All modern versions of Coverity Quality Advisor include this technology under the --enable-constraint-fpp option.  It's all part of our ongoing efforts to continually improve our analysis results using the latest technologies.

Alright, this is a bit of a different post. Summer approaches, and we are looking for an intern.

 

If you're interested in security and program analysis, and have a good background in one of the two fields, then please reach out to us (I believe you can send a DM on this blog, but otherwise you can contact me at rgaucher@coverity.com).

 

We have several ideas that could be developed based on the skills of the individual:

  • Pushing the limits of our dynamic whitebox sanitization fuzzer
  • Improvement of the generation and placement of remediation advice
  • Identifying string constraints on tainted input for some set of sinks
  • Etc.

 

Also, if you have a cool idea (related to program analysis & security obviously) that you'd want to work on for a couple of months, we're happy to hear it.

We have been doing research to ensure our current XSS checker can accurately find and give remediation guidance for XSS defects. For this research we have been examining the parsers involved in rendering an HTML page in more detail so that our analysis knows what is and isn't safe in a given context. In addition, this knowledge has become the backbone of the Coverity Security Library that we have open sourced.

 

I gave a lightning talk last month at Bluehat v12 to help shed some light on the quirks we've been finding, and received a significant amount of feedback indicating that most of this knowledge wasn't really well known beforehand, so we've decided to publish that information here so that the community can be better informed about some unexpected results.

 

JavaScript String Escapers

 

Jon Passki already blogged about one of the issues here, and you should read that post if you haven't already, but I feel like he left out one particularly interesting case for potential XSS defects. If you don't have time to read that, try to imagine what would happen if a browser were to parse this HTML:

 

<script>
var foo = "<!--<script>";
</script>
<a href="</script><script>alert('hax')</script>">link</a>

 

If you save this as an HTML file and browse to it, you'll see that an alert saying hax fires, as per Jon's blog post. And if you open your browser's developer tools (press F12 in Chrome or IE) and look at the DOM, you will see that everything from the first script tag to the first closing script tag inside the href attribute is one single script block:

 

[Screenshot: browser developer tools showing the markup parsed as a single script block]

 

This is due to the "Script data double escaped state" that Jon mentions in his blog post, which we can force the browser into by opening an HTML comment and a new script tag.

 

So how is this relevant from a security perspective? If we look back at another one of Jon's blog posts here, we'll see that Spring, Apache, and most definitely other JavaScript string escapers prevent someone from specifying the typical </script><script>alert(1)</script> payload by replacing / with \/. They do nothing to escape the characters < or >.

 

So, going back to the code example at the top, if you imagine the two injected parts (the contents of the JavaScript string and of the href attribute) as attacker controlled, then we can see that these JavaScript string escapers will allow us to insert the first part of our attack, <!--<script>, and now we just need to insert the second part.

 

So, in our example above, injecting </script><script> into an attribute is actually "safe" by itself, since it does not break out of the href attribute. However, it's not very likely for this to exist, since the most common way to secure an injection into an attribute is to HTML encode it. My pick for this to be actually exploitable is to find a textarea tag that follows, since one of the common ways to fix that case is to filter the closing </textarea> tag.

 

While we are currently working with Spring through SPR-9983 to address this issue, this is something you should check your own escapers for, since this is not an uncommon model for escaping JavaScript.

 

CSS String Escapers

 

In some cases developers want to let users control the contents of a CSS string, maybe to control a URL or a CSS selector. To get an example of what this would look like, consider this style tag:

 

<style>
span[id="TAINTED_DATA_HERE"] {
    background-color: #ff00ff;
}
</style>

 

As part of our investigation of what a CSS string escaper needs to do, we took note of the most obvious examples of needing to escape ' and " as string delimiters and \ as the escaping character, but we also noticed a series of characters that needed to be escaped because they would otherwise cause parse errors.

 

In particular, according to the CSS spec, any of the following characters can potentially cause parse errors:

 

  • Line Feed (U+000A)
  • Form Feed (U+000C)
  • Carriage Return (U+000D)

 

In practice, this turned out to be true for Chrome, but false for Firefox and IE.

 

In any case, parse errors, particularly in CSS, are not obviously security issues by themselves. However, if you have a look at the CSS spec, you will find that CSS is somewhat unique in that parse errors are not actually fatal: a compliant CSS parser will attempt to recover, re-sync with the CSS stream, and begin parsing CSS again. This essentially allows us to escape the quoted string context, jump into the CSS data context, and specify arbitrary CSS, like this Chrome-only example which sets the background of the entire page to red:

 

<style>
  span[id="
{} body { background-color: red; }"] {
  background-color: #ff00ff;
  }
</style>

 

Having demonstrated that we can use these characters to escape the string context, the obvious question becomes: what can we do with this in Google Chrome? We haven't delved too deeply into this topic at Coverity, and Chrome doesn't seem to have an obvious way to directly execute JavaScript from CSS. However, there has been quite a bit of work done by other researchers on conducting JavaScript-less attacks that can still steal your data; this recent paper in particular has a good survey of what can be done with CSS.

 

One additional thing to note is that the parse error recovery behaviour has implications for CSS validators such as HTMLPurifier or AntiSamy, since they need to be aware of this behaviour to ensure their parsing of CSS is the same as a browser's.

Over the past year or so, the HTML 5 specification has been a non-friendly but necessary reference to me (/us). Indeed, this is the only place that really explains how an HTML 5 document gets tokenized (a necessary step before parsing).

 

If you're doing research related to XSS or HTML contexts, and you never had a look at this document, I suggest you go ahead and dive into it. That's mostly the key to finding something like the script data double escaped state as described by Jon.

 

However, if you had a serious look into it, I'm sure you had one of these reactions:

Where am I now?

How did I end up here?

So, just for the sake of making our life easier, I created a small visualization page for the spec. I mostly scraped the tokenizer spec, and generated a graph for it.

 

The result is a self-contained HTML document that helps you navigate through the tokenization specification, and lets you click on states and remembers where you're coming from. It's really just to make our life easier:

[Screenshot: the HTML5 tokenization visualization page]

 

This document is available here: HTML5 tokenization visualization.

 

If you're interested in how to get the data, we published the script that generates a JSON or DOT file on Github: security/html5-tokenizer-extraction at master · coverity/security · GitHub

When writing the last blog post, I swear the HTML 5 tokenizer spec had a Steve Urkel moment with me. Well, not with me, but with a lot of existing JavaScript escapers out there. Why is that?

 

The HTML 5 tokenizer introduces a handful of new HTML states that only exist within the Script data state (e.g. in a <script> tag). Some of these are really interesting.

 

Here's a mocked up <script> block with the states:

<script type="text/javascript">

   <!-- That second dash to the left is the Script data escaped dash dash state

       This is script data escaped state

       <script> This is script data double escaped state

       </script>

       This is script data escaped state

    -->

</script>

 

Here's ABNF formatting of the above in case state machines aren't your thing.

 

Boring you say.

 

Not at all, I say. What's interesting about these states is that the current JavaScript state is also in play. "What?" I hear from you.

<script type="text/javascript>

var foo = "<!-- <script> guess what state I'm in!";

</script>

... rest of document

 

Normally, the JavaScript state would be in a double quoted string, within the Script data state. (Check out the previous blog post for a rundown of ECMAScript and String literals.) Now, the JavaScript state is in a double quoted string within the Script data double escaped state. And the wheels start turning.

 

Well, what does that mean? Given the above, the state transitions back to the Script data escaped state when the original closing script tag (i.e. </script>) is consumed. Let's say there isn't a "-->" for the remainder of the document. Then the tokenizer ought to get to the end of file (EOF). If so, this is a parse error: the state ought to be switched to the data state, and then the document is rendered, albeit not much of it.

 

<script type="text/javascript>

var foo = "<!-- <script> guess what state I'm in!";

</script>

... some of the document

<!-- Remember to remove this code for the next version -->

... remainder of the document

 

Again, following the tokenizer spec, the tokenizer ought to be in the Script data escaped state when it encounters the less-than sign "<". It then transitions to the Script data escaped less-than sign state. But it doesn't care about the exclamation point "!" here and switches back to Script data escaped. I'm hand waving the next couple of state transitions until it gets to that "-->". At that point, it steps through the dash states and, on the ">", lands back in the plain Script data state.

 

OK... what does that mean?

 

Remember the previous JavaScript context the page was in? Double quoted string literal. We're now back to that point in the script nesting. So, our string is this document text, except string literals cannot contain new line characters. A syntax error ought to be thrown by the browser's JavaScript parser.

 

Now, what did HTML 5 do? It made a lot of Java-based JavaScript escapers ineffective. Many escapers don't care about "<", "!", "-" or ">". The Coverity Security Library (CSL) does care about "<" and ">", so it's not affected.

 

What are the risks?

 

Well, unfortunately it depends.

 

If you're not using HTML 5 (i.e. <!DOCTYPE html> isn't at the top of your template), then you're OK. If you are using HTML 5 and your escaper doesn't escape those characters, I politely recommend giving CSL a spin. If you can't change escapers, then it depends upon where the injection occurs and here's where there be dragons.

 

foo.jsp

<script type="text/javascript>
var foo = "${some:JsEscaper(param.foo)}";
<%-- no other injection points --%>
</script>
<%-- ... no more script blocks --%>




 

If the above is your code, then the worst that can occur is that the remainder of the document is not rendered by the browser. There's no direct risk of XSS. If that's acceptable, then huzzah. Although you still have a defect.

 

mightbebad.jsp

<script type="text/javascript>
// totally contrived
var isBarSafe = true;
</script>
<script type="text/javascript>
var foo = "${some:JsEscaper(param.foo)}";
var bar = "${some:JsEscaper(param.bar)}";
isBarSafe = testBar(bar);
</script>
</head>
<body>
<script type="text/javascript>
if (isBarSafe) {
eval(${some:JsEscaper(param.bar)});
//...




 

Yes, the above is contrived. It's just illustrating where the wheels can come off the JavaScripting bus. An attacker could inject <!--<script> via the foo parameter and a --> via the bar parameter. This ought to cause the aforementioned syntax error, nullifying anything in that second script block. The fail open scenario leaves the site at risk of XSS.

 

seriouslybad.jsp

<script type="text/javascript>
<!-- <script> some example ${some:JsEscaper(param.example)} </script> -->
</script>


 

The above is a nasty location to inject tainted data. Injecting --> ends the Script data escaped state, which leaves us at... Script data state. We're now in global JavaScript land with full XSS potential.

 

Where else could this be an issue?

  • Possibly in frame busting code that injects tainted data.
  • Possibly in JSON directly inserted into a script context.
  • ???

 

It's probably not worth the cycles to see if the defect is exploitable when there are reasonable remedies out there. If CSL isn't your cup of tea, bother your current vendor about patching their escaper. But if you haven't tried out CSL, I recommend it! And if you're using Maven, it's just one dependency away.

 

Update

 

@0x6D6172696F (.mario) had a conversation on Twitter back in August with @mathias regarding HTML 5 comments in JS. The WHATWG JavaScript section on Comments goes into better detail on what should or shouldn't be honored. This makes more sense than the trial and error testing I did above. These comments ought to be treated as single line comments; therefore, breaking them up across lines changes the meaning in JS. Thanks Mario!

While researching and developing the remediation advice for the cross-site scripting checker in Security Advisor, we noticed some idiosyncrasies in some popular escapers for HTML, XML, and JavaScript contexts. Romain and Andy have already written about the Coverity Security Library (CSL), which encapsulates some of this research.

 

JavaScript escapers, like Spring's JavaScriptUtils.javaScriptEscape(), escape the forward slash (/) character. Spring mentions a reference to the Mozilla Core JavaScript guide, which has an interesting section on String literals. However, that section makes no mention of the forward slash. JavaScript is a dialect of ECMAScript, currently defined in ECMA-262. Looking at the current (as of this blog post) revision, 5.1, ECMAScript defines String Literals in section 7.8.4. The Grammar Section in Annex A at the end of the document is also a good spot to read up on the syntax rules. To summarize, a String Literal can be anything between a single-quote (') or double-quote (") character except that quote character and Line Terminators. Drilling into the spec more, there's no obvious reason why the forward slash is escaped.

 

Now you could be saying to yourself, it's to prevent someone from inserting a comment. But since the escaper escapes the quote characters, and a String Literal can contain a forward slash, there's no obvious way this would end the quoted context and start a new single-line comment context. So that's probably not it. And looking into the escapers, they're not trying to do anything with regular expressions, so that doesn't seem to be it either.

 

Spring isn't alone here. The Apache Commons StringEscapeUtils.escapeEcmaScript() escaper also escapes the forward slash. Huh. So something is going on here. Now, I don't know exactly why they do the things they do, but I can speculate.

 

A reasonable explanation starts with assuming these escapers are used in JSPs or templates. (Spring's escaper is used via their JSP tag library, e.g. spring:message.) If the developer inserts attacker-controlled tainted data into a JavaScript string context within this template, one could assume it's probably occurring between <script> tags. While it could occur within an HTML attribute context, let's go with the <script> tags. HTML 5 ought not to change the state from "script data state" until a less-than sign (<) occurs. Then if the next character is a forward slash, the state ought to change to the "script data end tag open state". And here's the aha moment. Since these escapers backslash-escape the forward slash character, injected data containing </script> becomes <\/script>. An HTML 5 tokenizer like your browser's ought to change back to the script data state. This prevents an attacker from injecting a </script> tag, closing the JavaScript script context, and starting whatever next context the attacker chooses.

 

Let's say the JSP wasn't using an escaper. Here's a snippet from somefile.jsp:

<script type="text/javascript">
  var foo = "${param.foo}";
</script>

 

Then if the following vector was passed to the application: ?foo=</script><script>var foo=1;//

 

The JSP ought to return the following HTML:

<script type="text/javascript">

  var foo = "</script><script>var foo=1;//";

</script>

The browser's JavaScript parser ought to register a syntax error within the first block and ignore it. The second block's foo ought to be respected, resulting in a value of 1. User-controlled data has changed the intent of the JavaScript. And this could result in an XSS attack.

 

Now, let's say it was using an escaper. Here's somefile.jsp updated:

<%@ taglib prefix="spring" uri="http://www.springframework.org/tags"%>
...
    <script type='text/javascript'>
        var foo = "<spring:message javaScriptEscape="true" text="${param.foo}" />";
    </script>

 

Using the same vector, the JSP ought to return the following HTML:

<script type="text/javascript">

  var foo = "<\/script><script>var foo=1;\/\/";

</script>

This ought to be parsed as a normal double-quoted string by the browser. No XSS here.

 

In case you're wondering, the CSL conservatively escapes characters that could cause context transitions like these. If any are not obvious, please let us know on the GitHub page.

Getting the right HTML context is not always easy, to say the least. It's often easy to forget to apply some escaping, such as for one of the parent HTML contexts. That's where Coverity Security Advisor comes into play. Our static analysis for cross-site scripting is HTML context aware, and it provides very accurate remediation advice on which escaper(s) to use or which sanitization to leverage.

 

I want to emphasize that there is no dependency between the Coverity Security Library (CSL) escapers we provide and the Coverity products. We developed these escapers because there is no simple and good solution available for most Java web developers. Also, we found it easier to mention these escapers in the remediation advice from Security Advisor when appropriate.

 

To illustrate the relation between CSL and Security Advisor, here is an example of what you can get out of it, and how the remediation advice really helps fix the defects. The application is very simple and uses Spring MVC 3, JPA, JSP and EL; a very common web application stack.

 

In this case, I told the analysis not to trust the database, which is not our default trust model.

 

In the application, I first have a Spring MVC entry point that makes a query based on the title of a blog entry and returns a list of candidates (approximate search based on the supplied title). I select the first one (line 53), extract the title, and add it to the model (line 56) under the name found_title. If you're not familiar with Spring MVC, the model variable will essentially be added to the request attributes and be available from expression language (EL).

[Screenshot: the Spring MVC entry point adding found_title to the model]

 

 

Then the entry point dispatches to a JSP file (entry.jsp):

[Screenshot: entry.jsp with the ${found_title} value used inside nested HTML contexts]

The ${found_title} EL variable is the title that we retrieved from the database (and it is considered tainted, as this is what I instructed the analysis to do).  Some tests are done to see whether or not we should display a banner. We take the true branch in the test and consider the body of the <c:if> tag (line 7) to be displayed.

 

This is where we have a stored cross-site scripting defect that requires a non-trivial fix due to nested HTML contexts. In this case, the tainted value ${found_title} is appended to a jQuery selector (approximated to a JavaScript string by our current HTML context parser), which is inside the onclick DOM event.

 

The analysis reports the stack of HTML contexts that are associated with this tainted data:

  1. HTML_ATTR_VAL_DQ: HTML double quoted attribute value
  2. JS_STRING_SQ: JavaScript single quoted string

 

Also, the analysis reports the appropriate remediation, pointing in this case to the Coverity Security Library and the JSTL: the fix is to use the EL construct ${fn:escapeXml(cov:jsStringEscape(found_title))} instead of using the ${found_title} variable directly.

 

The recommendation is to:

  1. Use an escaper for the JavaScript string so that the value of found_title cannot escape out of the JavaScript string: the EL Coverity escaper cov:jsStringEscape
  2. Use an HTML escaper to ensure that the HTML attribute (outer context) cannot be escaped out: the JSTL HTML escaper fn:escapeXml

 

Et voila! That's what I call enabling good code.

 

Note: While writing this post, I realized that in the events we report "Cross-site script (XSS) injection (XSS)", which is quite incorrect; this has since been corrected.

Today we're launching the Coverity Security Library.  We built it because when we tried to develop remediation guidance for security defects (especially XSS), we couldn't do it in a concise way.  For me, this really highlighted why developers end up doing crazy things in their code to try to "fix" XSS defects.  The rules are just too arcane.  There are no convenient, easy to use, and freely available libraries that take care of the problem.  For a busy developer under time pressure, writing up a custom regex or hacked-up escaper as a quick fix is highly tempting.  If we want developers to do the right thing, we need to make the path of least resistance the right thing to do.

 

That's the design philosophy behind the Coverity Security Library.  Some might find it embarrassingly small to be called a "library".  To some extent that's because we're naming it looking forward to what it will become.  But its simplicity and small size is also by design.  It has no external dependencies so it's easy to incorporate and keep up to date.  The functions perform escaping in a straightforward way, making the code easy to understand and review.  Need HTML escaping?  Just do this:

 

     Escape.html(data)

 

JavaScript string escaping?  Try this:

 

     Escape.jsString(data)

 

The class is named Escape, so the method names specify what kind of escaping.  Less typing, no stuttering! The functions are static, taking a string and returning a string.  No complexity to instantiate or use.
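
For example, a minimal sketch of calling them straight from Java might look like this (assuming the Escape class lives in the same com.coverity.security package shown earlier for Filter, and that out is your response writer):

import com.coverity.security.Escape;

// Minimal usage sketch: emit tainted data into an HTML body and into a
// JavaScript string, using the static escapers directly from Java.
void render(java.io.PrintWriter out, String data) {
    out.println("<p>" + Escape.html(data) + "</p>");
    out.println("<script>var msg = \"" + Escape.jsString(data) + "\";</script>");
}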

 

We take it a little further by providing EL hooks, so the escapers can be used in JSPs in a natural way.  For example:

 

     ${cov:htmlEscape(data)}

     ${cov:jsStringEscape(data)}

 

That's convenience.

 

Why yet another security library?  Many existing libraries were incomplete.  The complete ones were complex and inefficient.  There wasn't a freely available library we felt comfortable recommending to users who got remediation advice from our products.  So we created one.

 

We'll be expanding this library going forward, and we also welcome contributions from the community.  We'll be carefully vetting changes, running them through a battery of tests and also static analysis, fuzzing, and manual code review.  We hope to earn the trust of our users, and we believe that making this library available under a liberal BSD-like open source license helps increase the transparency that results in trust.  We hope to earn that trust over time as we continue to improve this library.

 

Get it today from GitHub.