The Importance of Escaping All The Things

Nick Daugherty is a WordPress VIP Senior Software Engineer. Here he shares some important information about escaping in code and how that can increase security in WordPress sites anywhere in the world. 

If there’s one issue we flag more often than all others in code reviews…it’s escaping.

For starters, we should all agree that escaping (fundamentally, sanitizing input and escaping output) is a critical aspect of web application security. What may be less universally agreed upon is where to escape. On that point, we strongly recommend late escaping–escaping as close as possible to the point of output.

You may now be thinking:

“Do I really need to “late escape” everything? Always? Even core WordPress functions?”

We hear you. And, here’s why this is important to us:

At WordPress VIP, we combine human review with our automated code scanning technology. And, while the original author of a particular piece of code may know exactly where they’ve already escaped their output and/or it’s convenient to trust a WordPress core function’s escaping, it’s much, much faster and more reliable to check for “late escaping”. This way a reviewer can be 100% positive that output has been escaped properly by simply looking at the point of output.

We acknowledge this standard requires a bit more effort from developers. But, we see the benefit as threefold:

  1. “Late escaping” makes review more efficient, which means your code is reviewed and deployed faster,
  2. A consistent practice of “late escaping” makes missed escaping obvious, thereby reducing the chances that unescaped output makes it into production,
  3. A consistently applied escaping standard–and we’ve chosen “late escaping” as ours–allows automated tools to better augment our human reviewers…further improving on #1 and #2 above.

To illustrate the importance of escaping everything, let’s look at a pattern where escaping is commonly omitted: Widget form elements.

A Widget form may look like this:

<label for="<?php echo $this->get_field_id( 'title' ); ?>"><?php _e( 'Title:' ); ?></label>
<input type="text" id="<?php echo $this->get_field_id( 'title' ); ?>" title="<?php echo $this->get_field_id( 'title' ); ?>" name="<?php echo $this->get_field_name( 'title' ); ?>" value="<?php echo esc_attr( $title ); ?>"/>

Those get_field_id( 'title' ); ?> and get_field_name( 'title' ); ?> calls should be safe right, since they are core WordPress functions?

Let’s see what happens when we drop this bit of code anywhere in our codebase:

add_action( 'widget_form_callback', function( $instance, $widget ){
    $widget->id_base = '"><img src="" data-wp-preserve="%3Cscript%3Ealert(%22Greetings!%20You%20have%20been%20hacked.%22)%3B%3C%2Fscript%3E" data-mce-resize="false" data-mce-placeholder="1" class="mce-object" width="20" height="20" alt="<script>" title="<script>" />"<';

    return $instance;
}, -999, 2);

Oh no! Javascript has been injected where it shouldn’t be.

Here is a more real world case illustrating how easy it is to get to a point where we’re outputting values of indeterminate origin:

add_action( 'widget_form_callback', function( $instance, $widget ){
    My_Widget_Controller::setup_widget_form( $instance, $widget );

    return $instance;
}, 10, 2);

// ...

class My_Widget_Controller {
    static function setup_widget_form( $instance, $widget ) {
        $widget->id_base    .= self::get_widget_id_base( $instance, $widget );
        $widget->name       .= self::get_widget_name( $instance, $widget );
    }

    static function get_widget_id_base( $instance, $widget ) {
        global $my_config_object;

        return get_option( 'my_widget_id_base_prefix' ) . '_' . $my_config_object['current_site']['widgets']['id_base'];
    }

    static function get_widget_name( $instance, $widget ) {
        $name = '';

        // ... arbitrary processing to arrive at a $name

        return $name;
    }
}

Now we’re down a rabbit hole, and it’s not so clear that get_field_id( 'title' ) will give us safe values.

Even values that are ‘100% safe and there is no way this could ever be abused’ need to be escaped, because future refactorings can introduce hard-to-detect vulnerabilities if there is unescaped code hanging around:

$class = ( 'featured' == $category ) ? 'home-featured' : 'standard';

?>

<div class="<?php echo $class; ?>">...

Seems harmless enough – $class can ever only have two values. Great, we’re safe!

Until 6 months from now, when a new business need refactors this to:

function my_get_post_class( $post ) {
   // ... arbitrary processing to determine a post class. Maybe we pull it from meta now?
   return get_post_meta( $post->ID, 'custom_post_class' );
}

// ...

$class = my_get_post_class( $post );

?>

<div class="<?php echo $class; ?>">...

Hmmm, now we’re outputting meta values directly, and there is no way to know that without following a potentially complex program flow – a recipe for an exploitable site.

What about constants? Those are the foolproof, never changing pillars of security, right? Consider the following:

// let's say this is for setting a class name, depending on the site we're on
$my_setting = get_option( 'safe_data' );

// ... elsewhere

define( 'MY_SAFE_CONSTANT', $my_setting );

// ...

<div class="<?php echo MY_SAFE_CONSTANT; ?>">...</div>

later down the line, our option gets updated (somehow):

update_option( 'safe_data', '"><img src="" data-wp-preserve="%3Cscript%3Ealert(%22hax0rd%22)%3B%3C%2Fscript%3E" data-mce-resize="false" data-mce-placeholder="1" class="mce-object" width="20" height="20" alt="<script>" title="<script>" />' );

Another example of how constants can be exploited is conditional constants:

if ( ! defined( 'MY_SAFE_CONSTANT' ) ) {
    define( 'MY_SAFE_CONSTANT', 'safe-value' );
}

// ... elsewhere

<div class="<?php echo MY_SAFE_CONSTANT; ?>">...</div>

As a hacker, all I need to do to inject anything I like into the page is to add this somewhere before the previous code:

define( 'MY_SAFE_CONSTANT', 'unsafe value' );

What About Core Functions?

This concept applies to nearly all code in a theme, including many core functions that return a value. Some core functions that output, such as bloginfo(), have output escaping applied automatically – we recommend using the equivalent ‘return’ function and manually escaping

Example: bloginfo( 'name' ); could be rewritten as esc_html( get_bloginfo( 'name' ) );. This approach ensures everything is properly escaped and removes ambiguity.

The Bottom Line

If it’s not escaped on output, it’s potentially exploitable. Never underestimate the abilities of an attacker–they’re experts at finding the way to make the ‘this should never, ever, be possible‘ things happen :). For maximum security, we must escape all the things.

Get the latest content updates

Want to be notified about new content? Leave your email address below and we’ll make sure you stay updated.

Author

Nick Daugherty, Senior Software Engineer, WordPress VIP

Get the latest content updates

Want to be notified about new content?

Leave your email address and we’ll make sure you stay updated.