Single Sign On

Single Sign On (SSO, not to be confused with Jetpack SSO) is possible for clients using any identity provider (IdP) that supports SAML (or Security Assertion Markup Language). We do not support other SSO technologies at this time. We also cannot install any middleware required in some Shibboleth configurations. Most IdPs can support SAML.

Setting up the IdP

SAML IdPs require you to register the VIP Go application as a service provider. They approach this in different ways, but the purpose is to:

  • Set up the application as a legitimate service provider.
  • Tell the IdP where and how to communicate with your VIP Go application.
  • Generate the certificate and URLs the IdP will use to send and encrypt communication with the VIP Go application.

Most IdPs have an application creation process; see your IdP’s documentation on creating custom applications.

You will need:

  • The ACS location, usually example.com/wp-login.php/?saml_acs (where example.com is your domain)
  • The entity-id: php-saml

Once you create your SAML application, the IdP will provide the following:

  • Entity ID (a unique URL)
  • Single Sign-on URL
  • X.509 Certificate to set up WordPress.

Setting up WordPress

In order for our team to continue to provide support for your application, we have the following requirements:

  • You must configure your SSO plugin to create local user accounts
  • If you force SSO for all users, you must provide a way for support users to circumvent the SSO flow on login
  • If you force SSO on all pages of the site, you must expose the XML-RPC endpoints to Jetpack requests.

To allow users to circumvent the SSO flow, the easiest way is to provide a URL parameter like wp-login.php?normal that directs users to the wp-login form. A more secure way is to detect whether a user is accessing the site through VIP’s proxy servers.

Use one of these plugins:

  • OneLogin’s WordPress SAML
  • Human Made’s WordPress Simple SAML

OneLogin’s WordPress SAML plugin

OneLogin’s WordPress SAML plugin is managed through a settings page where you can fully configure your system. If you’re using this plugin, make sure you also have our helper plugin installed in your client-mu-plugins directory. It takes care of some of the requirements above and ensures that cookies and other SSO settings pass through our cache layers.

Options and Settings

You can mostly choose how to configure your own SSO. Some settings may be dictated by your IdP. If you’re doing a lot of custom configuration, we highly recommend you thoroughly test your SSO setup on your VIP Go application before launch.

Here are our recommended settings (these are under the “Options” heading of the OneLogin plugin):

  • Create user if not exists: This causes WordPress to create local accounts for users that sign in over SSO. Required
  • Update user data: This causes user attributes like first name, last name, and email address to change on WordPress when they change in your IdP. Recommended
  • Single Log Out: only useful if the client’s IdP supports it. Not recommended
  • Alternative ACS Endpoint: Not supported
  • Match WordPress account by: You can choose how to match your users to their IdP accounts.

There are many additional options and settings. For the most part, you shouldn’t need to change these unless your IdP requires it.

WordPress Simple SAML plugin

Human Made’s WordPress Simple SAML plugin stores the SAML configuration in code and facilitates SAML without extra settings screens. Because of how Human Made approached this and how our platform works, we require some extra code in your theme’s functions.php file. If you need help generating this code, reach out, and we’ll provide the code for use with this plugin. The helper code handles configuring the IdP and mapping your roles. Your developers will want to take a close look at this before launch. Loading the SAML configuration from an XML file provided by your IdP is currently not supported on VIP Go.

Notes on role mapping

Sometimes the role sent by an IdP doesn’t match a role in the WordPress install. If this is the case, you have three options for resolving the mismatch. Any users without a matching role will be assigned the default, usually “Subscriber.”

  • Create roles in your WordPress application that match your IdP.
  • Create roles in your IdP that match roles in WordPress.
  • Map your IdP’s roles to existing roles in WordPress. You do not need to map every role, and more than one role can be mapped to a given WordPress role.
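
The third option can be sketched in plain PHP. This is only an illustration of the mapping idea: the helper name myprefix_map_idp_role() and the IdP group names are made up, and how the resulting role is handed to your SSO plugin depends on the plugin you use.

```php
<?php
/**
 * Hypothetical helper: translate an IdP role name into a WordPress role slug.
 *
 * @param string $idp_role     Role name as sent by the IdP.
 * @param array  $role_map     IdP role name (lowercase) => WordPress role slug.
 * @param string $default_role Role used when no mapping matches.
 */
function myprefix_map_idp_role( $idp_role, array $role_map, $default_role = 'subscriber' ) {
	// Normalize so "Copy-Desk" and " copy-desk " resolve to the same key.
	$key = strtolower( trim( (string) $idp_role ) );

	return isset( $role_map[ $key ] ) ? $role_map[ $key ] : $default_role;
}

// Example map: the IdP group names on the left are invented for illustration.
// More than one IdP role may point at the same WordPress role.
$role_map = array(
	'newsroom-editors' => 'editor',
	'copy-desk'        => 'editor',
	'it-admins'        => 'administrator',
);

echo myprefix_map_idp_role( 'Copy-Desk', $role_map ), "\n"; // editor
echo myprefix_map_idp_role( 'interns', $role_map ), "\n";   // subscriber
```

Unmapped roles fall through to the default, which mirrors the “Subscriber” behavior described above.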

Preventing unauthenticated site access with SSO

The only way to make a VIP Go site “private” is by requiring SSO authentication to access any page on your entire site. To do this, use the OneLogin plugin and enable the “Force SAML Login” option. You must still provide a method for VIP Support to circumvent SSO to access the site.

Requiring SSO to login

We require the creation of local accounts on the WordPress install so that we can more easily troubleshoot when users are having problems. This doesn’t prevent the client from requiring SSO to log in. If the client requires SSO for all logins from their users, enable the following options in the OneLogin plugin’s settings:

  • Prevent reset password: This will prevent users from resetting their WordPress account passwords.
  • Prevent change password: This will prevent users from changing their WordPress password.
  • Prevent change mail: This will prevent users from changing the email address in their WordPress account profile.

QA Recommendations

We have a few recommendations for clients to test their SSO configuration before launch.

Check users

  • Create test users within the IdP, one for each role that is mapped to WordPress, to make sure users have the right role when they sign in.
  • Test any known role conflicts to make sure they are resolved as you expect.
  • Test whether users can successfully log in and out without affecting other SSO sessions in their organization.

Test content protections

  • If the entire site requires authentication, make sure clients verify this by accessing the site anonymously.
  • Make sure all login requests go through the single sign-on process.

Validating, sanitizing, and escaping

Your code works, but is it safe? When writing your theme and plugin code, you’ll need to be extra cautious of how you handle data coming into WordPress and how it’s presented to the end user. This commonly comes up when building a settings page for your theme, creating and manipulating shortcodes, or saving and rendering extra data associated with a post. There is a distinction between how input and output are managed, and this document will walk you through that.

(If you’re interested in more thoughts on why WordPress.com VIP takes these practices so seriously, read The Importance of Escaping All The Things from June 2014.)

Guiding Principles

  1. Never trust user input.
  2. Escape as late as possible.
  3. Escape everything from untrusted sources (like databases and users), third-parties (like Twitter), etc.
  4. Never assume anything.
  5. Never trust user input.
  6. Sanitization is okay, but validation/rejection is better.
  7. Never trust user input.

“Escaping isn’t only about protecting from bad guys. It’s just making our software durable. Against random bad input, against malicious input, or against bad weather.”
–nb

Validating: Checking User Input

To validate is to ensure the data you’ve requested of the user matches what they’ve submitted. There are several core methods you can use for input validation; usage obviously depends on the type of fields you’d like to validate. Let’s take a look at an example.

Say we have an input area in our form like this:

<input id="my-zipcode" type="text" maxlength="5" name="my-zipcode" />

Just like that, we’ve limited our user to five characters of input, but there’s no limitation on what they can input. They could enter “11221” or “eval(”. If we’re saving to the database, there’s no way we want to give the user unrestricted write access.

This is where validation plays a role. When processing the form, we’ll write code to check each field for its proper data type. If it’s not of the proper data type, we’ll discard it. For instance, to check “my-zipcode” field, we might do something like this:

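$safe_zipcode = isset( $_POST['my-zipcode'] ) ? intval( $_POST['my-zipcode'] ) : 0;
// intval() returns 0 for non-numeric input, so store an empty value instead.
if ( ! $safe_zipcode ) {
	$safe_zipcode = '';
}
update_post_meta( $post->ID, 'my_zipcode', $safe_zipcode );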

The intval() function casts user input as an integer, and defaults to zero if the input was a non-numeric value. We then check to see if the value ended up as zero. If it did, we’ll save an empty value to the database. Otherwise, we’ll save the properly validated zipcode.

Note that we could go even further and make sure the zip code is actually a valid one based on ranges and lengths we expect (e.g. 111111111 is not a valid zip code but would be saved fine with the function above).
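
As a rough sketch of that stricter check, assuming only five-digit US zip codes are acceptable (no ZIP+4): the helper name myprefix_validate_zipcode() is hypothetical. It keeps the value as a string, which also preserves leading zeros that intval() would drop (for example “02134”).

```php
<?php
/**
 * Hypothetical helper: return a five-digit zip code string, or '' if invalid.
 * Assumes US five-digit zip codes only.
 */
function myprefix_validate_zipcode( $raw ) {
	// Reject arrays and other non-scalar input outright.
	if ( ! is_scalar( $raw ) ) {
		return '';
	}

	$zipcode = trim( (string) $raw );

	// Accept exactly five ASCII digits; anything else is discarded.
	if ( 1 !== preg_match( '/\A[0-9]{5}\z/', $zipcode ) ) {
		return '';
	}

	return $zipcode;
}
```

In the form handler this could stand in for the intval() call, for example myprefix_validate_zipcode( $_POST['my-zipcode'] ) after an isset() check.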

This style of validation most closely follows WordPress’ whitelist philosophy: only allow the user to input what you’re expecting. Luckily, there’s a number of handy helper functions you can use for most data types.

Sanitizing: Cleaning User Input

Sanitization is a bit more liberal of an approach to accepting user data. We can fall back to using these methods when there’s a range of acceptable input.

For instance, if we had a form field like this:

<input id="title" type="text" name="title" />

We could sanitize the data with the sanitize_text_field() function:

$title = sanitize_text_field( $_POST['title'] );
update_post_meta( $post->ID, 'title', $title );

Behind the scenes, the function does the following:

  • Checks for invalid UTF-8
  • Converts single < characters to entities
  • Strips all tags
  • Removes line breaks, tabs, and extra whitespace
  • Strips octets

The sanitize_*() class of helper functions are super nice for us, as they ensure we’re ending up with safe data and require minimal effort on our part.

In some instances, using wp_kses() and its related functions might be a good idea, as you can easily clean HTML while keeping anything relevant to your needs.

Escaping: Securing Output

For security on the other end of the spectrum, we have escaping. To escape is to take the data you may already have and help secure it prior to rendering it for the end user. WordPress thankfully has a few helper functions we can use for most of what we’ll commonly need to do:

esc_html() should be used anytime an HTML element encloses a section of data we’re outputting.


<h4><?php echo esc_html( $title ); ?></h4>

esc_url() should be used on all URLs, including those in the ‘src’ and ‘href’ attributes of an HTML element.

<img alt="" src="<?php echo esc_url( $great_user_picture_url ); ?>" />

esc_js() is intended for inline Javascript.

<div onclick='<?php echo esc_js( $value ); ?>' />

esc_attr() can be used on everything else that’s printed into an HTML element’s attribute.

<ul class="<?php echo esc_attr( $stored_class ); ?>">

wp_kses() can be used on everything that is expected to contain HTML.  There are several variants of the main function, each featuring a different list of built-in defaults.  A popular example is wp_kses_post(), which allows all markup normally permitted in posts. You can of course roll your own filter by using wp_kses() directly.

<?php
echo wp_kses_post( $partial_html );
echo wp_kses(
	$another_partial_html,
	array(
		'a'      => array(
			'href'  => array(),
			'title' => array(),
		),
		'br'     => array(),
		'em'     => array(),
		'strong' => array(),
	)
);
?>

As an example, passing an array to wp_kses() containing the member

'a' => array( 'href' => array(), 'title' => array() )

means that only those two HTML attributes will be allowed for <a> tags; all other attributes will be stripped. Passing an empty array for an element means that no attributes are allowed for that element, and all of them will be stripped.

There has historically been a perception that wp_kses() is slow. While it is a bit slower than the other escaping functions, the difference is minimal and does not have as much of an impact as most slow queries or uncached functions would.

It’s important to note that most WordPress functions properly prepare the data for output, and you don’t need to escape again.

<h4><?php the_title(); ?></h4>

rawurlencode() should be used over urlencode() to ensure URLs are correctly encoded. Only legacy systems should use urlencode().

<?php echo esc_url( 'http://example.com/a/safe/url?parameter=' . rawurlencode( $stored_class ) ); ?>
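
The difference between the two functions is easiest to see with a value that contains a space. This is a small standalone illustration (the sample string is arbitrary), not VIP-specific code:

```php
<?php
$value = 'red shoes+socks';

// urlencode() follows the older form-encoding rules: spaces become "+".
echo urlencode( $value ), "\n";    // red+shoes%2Bsocks

// rawurlencode() follows RFC 3986: spaces become "%20".
echo rawurlencode( $value ), "\n"; // red%20shoes%2Bsocks
```

Both functions encode the literal “+” as %2B; they differ only in how the space is written.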

Always Escape Late

It’s best to do the output escaping as late as possible, ideally as data is being outputted.

// Okay, but not that great
$url = esc_url( $url );
$text = esc_html( $text );
echo '<a href="'. $url . '">' . $text . '</a>';

// Much better!
echo '<a href="'. esc_url( $url ) . '">' . esc_html( $text ) . '</a>';

This is for a few reasons:

  • It makes our code reviews and deploys happen faster because rather than hunting through many lines of code, we can glance at it and know it’s safe for output.
  • Something could inadvertently change the variable between when it was first cast and when it’s outputted, introducing a potential vulnerability.
  • Future changes could refactor the code significantly. We review code under the assumption that everything is being output escaped/cast – if it’s not and some changes go through that make it no longer safe to output, we may incorrectly allow the code through, since we’re assuming it’s being properly handled on output.
  • Late escaping makes it easier for us to do automatic code scanning (saving us time and cutting down on review/deploy times) – something we’ll be doing more of in the future.
  • Escaping/casting on output simply removes any ambiguity and adds clarity (always develop for the maintainer).

Escape on String Creation

It is sometimes not practical to escape late. In a few rare circumstances you cannot pass the output to wp_kses since by definition it would strip the scripts that are being generated.

In situations like this, always escape while creating the string and store the value in a variable that is suffixed with _escaped, _safe, or _clean. So instead of $variable, use $variable_escaped or $variable_safe.

If a function cannot output internally and escape late, then it must always return “safe” HTML that does not rely on being escaped later. This allows you to do echo my_custom_script_code(); without needing the script tag to be passed through a version of wp_kses that would allow such tags.
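
A minimal sketch of what such a function might look like, assuming the script only needs to expose an array of settings to JavaScript. The function body, the myPrefixSettings variable name, and the sample data are all hypothetical; the example uses PHP’s json_encode() so it runs on its own, and in WordPress wp_json_encode() accepts the same flags.

```php
<?php
/**
 * Hypothetical example of escaping at string creation.
 * Returns a complete <script> tag that is safe to echo as-is.
 */
function my_custom_script_code( array $settings ) {
	// The JSON_HEX_* flags encode <, >, &, ' and " inside values as \uXXXX
	// sequences, so the data cannot close the script tag early.
	$settings_escaped = json_encode(
		$settings,
		JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
	);

	// json_encode() returns false on failure; fall back to an empty object.
	if ( false === $settings_escaped ) {
		$settings_escaped = '{}';
	}

	// Everything concatenated here is either a literal or already escaped.
	$script_escaped = '<script>var myPrefixSettings = ' . $settings_escaped . ';</script>';

	return $script_escaped;
}

echo my_custom_script_code( array( 'title' => '</script><b>hi</b>' ) );
```

The _escaped suffix on both variables signals to a reviewer that the value was made safe where it was built, which is the point of the naming convention above.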

Case Studies and FAQs

We know that validating, sanitizing and escaping can be a complex topic; we’ll add some specific case studies and frequently asked questions here as we think they might be helpful.

Q: Doesn’t a function like WP_Query handle sanitizing user input before running a query for me? Why do I need to also sanitize what I send to it?

A: For maximum security, we don’t want to rely on WP_Query to sanitize our data and hope that there are no bugs or unexpected interactions there now or in the future. It’s a good practice to sanitize anything coming from user-land as soon as you begin to interact with it, treating it as potentially malicious right away.

Q: Isn’t WP_KSES_* slow?
A: Even on large strings WP_KSES_* will not add a significant overhead to your pageload. Most of your pageloads should be cached pageloads and the first thing to focus on should be to make sure as many of your end users as possible are getting cached pages. Slow SQL Queries as well as Remote requests are often next on the list. Escaping is often negligible compared to those items.

Zack Tollman wanted to know more about wp_kses functions, so he did a pretty thorough investigation of them: https://www.tollmanz.com/wp-kses-performance/. He found that wp_kses functions can be 20-40x slower than esc_* functions on PHP 5.6, but the performance hit is much smaller when using HHVM. The post was written before PHP 7 came out, but PHP 7 is likely to have similar performance to HHVM, meaning that wp_kses functions aren’t as much of a performance drain as they used to be, at least on PHP 7 servers. WordPress.com is using PHP 7.

Q: Why do I need to escape these values? It is impossible for them to be unsafe.
A: It is currently impossible for them to be unsafe. But a later code change could easily modify the variable so that it can no longer be trusted. Escaping late whenever possible makes the code much more robust and future-proof.

Conclusion

To recap: Follow the whitelist philosophy with data validation, and only allow the user to input data of your expected type. If it’s not the proper type, discard it. When you have a range of data that can be entered, make sure you sanitize it. Escape data as much and as late as possible on output to avoid XSS and malformed HTML.

Take a look through the Data Validation Plugin Handbook page to see all of the sanitization and escaping functions WordPress has to offer.

Fetching remote data

If you need to fetch data from another server, you should remember that doing so is a relatively slow process and that you can run into problems if there are any timeouts.

To help you to efficiently and robustly fetch your data, we have created two helper functions that you can use:

wpcom_vip_file_get_contents()

wpcom_vip_file_get_contents() works much like PHP’s built-in file_get_contents() function (although it no longer uses it internally). It returns either the HTML result as a string or false on failure. However, it caches and even returns previously cached data if a new remote request fails. We strongly recommend using this function for any remote request that does not require receiving fresh, up-to-the-second results, i.e. anything on the front end of your blog. Its arguments are as follows:

  1. The URL you want to fetch. This is the only required argument.
  2. The timeout limit in seconds. Can be 1 to 10 seconds and it defaults to 3 seconds. We strongly discourage using a timeout greater than 3 seconds since remote requests require that the user wait for them to complete before the rest of the page will load.
  3. The minimum cache time in seconds. It cannot be less than 60 and it defaults to 900 (15 minutes). Setting this higher will result in a faster site as remote requests are relatively slow. Results may be cached even longer if the remote server sends a cache-control header along with its response, and if that value is larger than this value. See below for details and how to disable this.
  4. An array of additional advanced arguments. See below.

The fourth parameter is an optional argument that can be used to set advanced configuration options. The current additional advanced arguments are:

  • obey_cache_control_header — By default, if the remote server sends a cache-control header with a max-age value that is larger than the cache time passed as the third parameter of this function, then this remotely provided value will be used instead. This is because it’s assumed that it’s safe to cache data for a longer period of time if the remote server says the data is not going to change. If you wish to ignore the remote server’s header response and forcibly cache for only the time specified by the third parameter, then a function call along these lines should be used:
    echo wpcom_vip_file_get_contents( 'http://example.com/file.txt', 3, 900,
    array( 'obey_cache_control_header' => false ) );
    
  • http_api_args — Allows you to pass arguments directly to the wp_remote_get() call. See the WordPress.org Code Reference for a list of available arguments. Using this argument will allow you to send things like custom headers or cookies. Example usage:
    echo wpcom_vip_file_get_contents( 'http://example.com/file.txt', 3, 900,
    array( 'http_api_args' => array( 'headers' => array( 'Accept-Encoding' => 'gzip' ) ) ) );
    

Note that like PHP’s file_get_contents() function, wpcom_vip_file_get_contents() will return the result. You will need to echo it if you want it outputted. This is different from our previous and now deprecated functions, including vip_wp_file_get_contents().

vip_safe_wp_remote_get()

vip_safe_wp_remote_get() is a sophisticated extended version of wp_remote_get(). It is designed to handle failure more gracefully than wp_safe_remote_get() does. Note that like wp_remote_get() and wp_safe_remote_get(), it does not cache. Its arguments are as follows:

  1. The URL you want to fetch. This is the only required argument.
  2. This argument is optional. Pass false if you need to set any of the next arguments.
  3. The number of fails required before subsequent requests automatically return the fallback value. This prevents continually making requests and receiving timeouts for a down or slow remote site. Defaults to 3 retries. Cannot be more than 10.
  4. The number of seconds before the request times out. Can be 1, 2, or 3 and it defaults to 1 second.
  5. This argument controls both the number of seconds before resetting the fail counter and the number of seconds to delay making new requests after the fail threshold is reached. Defaults to 20 and cannot be less than 10.

If you’re confused, here are some examples that should help clarify:

// Get a URL with a 1 second timeout and cancel remote calls for
// 20 seconds after 3 failed attempts in 20 seconds have occurred.
$response = vip_safe_wp_remote_get( $url );
if ( is_wp_error( $response ) ) {
	echo 'No data is available.';
} else {
	echo wp_remote_retrieve_body( $response );
}

// Get a URL with a 1 second timeout and cancel remote calls for 60 seconds
// after 1 failed attempt in 60 seconds has occurred. On failure, display "N/A".
$response = vip_safe_wp_remote_get( $url, false, 1, 1, 60 );
if ( is_wp_error( $response ) ) {
	echo 'N/A';
} else {
	echo wp_remote_retrieve_body( $response );
}

fetch_feed()

WordPress’s built-in fetch_feed() function should be used for fetching and parsing feeds. It has built-in caching that defaults to 43200 seconds (12 hours). To change that value, use a filter:

function someprefix_return_900() {
	return 900;
}

add_filter( 'wp_feed_cache_transient_lifetime', 'someprefix_return_900' );
$feed = fetch_feed( $feed_url );
remove_filter( 'wp_feed_cache_transient_lifetime', 'someprefix_return_900' );

wpcom_vip_wp_oembed_get()

wpcom_vip_wp_oembed_get() is a wrapper for WordPress’ own wp_oembed_get() but with added caching.

Uncached Remote Requests

If for some reason you need to make an uncached remote request, such as to ping an external service during post publish, then you should use the powerful and flexible WordPress HTTP API rather than directly using cURL or another method.

Note that uncached remote requests should never run on the front end of your site for speed and performance reasons.


Use current_time(), not date_default_timezone_set()

Use WordPress’s

current_time( 'timestamp' )

if you need to get a time that’s adjusted for the site’s timezone setting in the admin area.

If you need to work with the timezone offset:

get_option( 'gmt_offset' )

Please don’t use date_default_timezone_set(). The timezone in PHP needs to stay GMT+0 as that’s what WordPress expects it to be. Several features are dependent on this, and will break if you adjust the timezone.
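
To make the offset arithmetic concrete, here is a small sketch that shifts a UTC timestamp by an offset expressed in hours, the form in which gmt_offset is stored. The helper name myprefix_apply_gmt_offset() is hypothetical, and the offset is passed in as a parameter so the example runs without WordPress; on a real site you would pass get_option( 'gmt_offset' ).

```php
<?php
/**
 * Hypothetical helper: shift a UTC timestamp by a WordPress-style GMT offset.
 *
 * @param int   $utc_timestamp Unix timestamp (always UTC).
 * @param float $gmt_offset    Offset in hours, e.g. -5 or 5.5.
 */
function myprefix_apply_gmt_offset( $utc_timestamp, $gmt_offset ) {
	// Convert hours to seconds; round() handles half-hour offsets cleanly.
	return (int) $utc_timestamp + (int) round( (float) $gmt_offset * 3600 );
}

// 2021-01-01 00:00:00 UTC, shifted for a site set to UTC+5:30.
$local = myprefix_apply_gmt_offset( 1609459200, 5.5 );
echo gmdate( 'Y-m-d H:i', $local ), "\n"; // 2021-01-01 05:30
```

Note that the example formats the result with gmdate() rather than changing PHP’s timezone, which keeps PHP on GMT+0 as WordPress expects.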

Custom user roles

Sometimes the default roles and capabilities aren’t exactly what you need for your site. If you need to create new roles or modify existing ones, we have helper functions to assist you in doing this. Please use these functions rather than the traditional methods as this will ensure that your code works on WordPress.com and in your development environments.

As an example, here’s how you can register a “Reviewer” role:

add_action( 'init', function() {
    $ver = 42; // bump each time this code is changed
    // check if this has been run already
    if ( $ver <= get_option( 'custom_roles_version' ) ) {
        return;
    }

    // add a Reviewer role
    wpcom_vip_add_role( 'reviewer', 'Reviewer', array(
        'read' => true,
        'edit_posts' => true,
        'edit_others_posts' => true,
        'edit_private_posts' => true,
        'edit_published_posts' => true,
        'read_private_posts' => true,
        'edit_pages' => true,
        'edit_others_pages' => true,
        'edit_private_pages' => true,
        'edit_published_pages' => true,
        'read_private_pages' => true,
        )
    );

    // update the version to prevent this running again
    update_option( 'custom_roles_version', $ver );
} );

Note: you’ll want to use these helper functions on the ‘init’ hook, and ensure you only run them when the role definitions need to change. An example technique is shown.

You can find all available capabilities in the WordPress Codex.

Here are some more examples:

add_action( 'init', function() {
    $ver = 43; // bump each time this code is changed
    // check if this has been run already
    if ( $ver <= get_option( 'custom_roles_version' ) ) {
        return;
    }
    
    // Add new role
    wpcom_vip_add_role( 'super-editor', 'Super Editor', array( 'level_0' => true ) );

    // Remove publish_posts cap from authors
    wpcom_vip_merge_role_caps( 'author', array( 'publish_posts' => false ) );

    // Remove all caps from contributors
    wpcom_vip_override_role_caps( 'contributor', array( 'level_0' => false ) );

    // Duplicate an existing role and modify some caps
    wpcom_vip_duplicate_role( 'administrator', 'station-administrator', 'Station Administrator',
        array( 'manage_categories' => false ) );

    // Add custom cap to a role
    wpcom_vip_add_role_caps( 'administrator', array( 'my-custom-cap' ) );

    // Remove cap from a role
    wpcom_vip_remove_role_caps( 'author', array( 'publish_posts' ) );

    // update the version to prevent this running again
    update_option( 'custom_roles_version', $ver );
} );

Database queries

Direct database queries should be avoided wherever possible. Instead, it’s best to rely on WordPress API functions for fetching and manipulating data.

Of course this is not always possible, so if any direct queries need to be run here are some best practices to follow:

  • Use filters to adjust queries to your needs. Filters such as posts_where can help adjust the default queries done by WP_Query. This helps keep your code compatible with other plugins. There are numerous filters available to hook into inside /wp-includes/query.php.
  • Make sure that all your queries are protected against SQL injection by making use of $wpdb->prepare() and other escaping functions like esc_sql() and $wpdb->esc_like() (the older like_escape() is deprecated).
  • Try to avoid cross-table queries, especially queries which could contain huge datasets such as negating taxonomy queries like the -cat option to exclude posts of a certain category. These queries can cause a huge load on the database servers.
  • Remember that the database is not a tool box. Although you might be able to perform a lot of work on the database side, your code will scale much better by keeping database queries simple and performing necessary calculations and logic in PHP.
  • Avoid using DISTINCT, GROUP BY, or other query statements that cause the generation of temporary tables to deliver the results.
  • Be aware of the amount of data you are requesting. Make sure to include defensive limits.
  • When creating your own queries in your development environment, be sure to examine the query for performance issues using the EXPLAIN statement. Confirm indexes are being used.
  • Don’t JOIN the users table.
  • Cache the results of queries where it makes sense.
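
Two of these practices, prepared statements and defensive limits, can be sketched together. The example is purely illustrative: the helper name myprefix_get_recent_post_ids() is hypothetical, a query this simple would normally go through WP_Query, and the database object is passed in as a parameter (on a real site it would be the global $wpdb).

```php
<?php
/**
 * Hypothetical helper: fetch the IDs of the most recent published posts.
 *
 * @param object $wpdb  The WordPress database object (normally the global $wpdb).
 * @param int    $limit Requested number of rows.
 */
function myprefix_get_recent_post_ids( $wpdb, $limit ) {
	// Defensive limit: never ask for fewer than 1 or more than 100 rows.
	$limit = max( 1, min( 100, (int) $limit ) );

	// prepare() fills the placeholders safely, guarding against SQL injection.
	$sql = $wpdb->prepare(
		"SELECT ID FROM {$wpdb->posts} WHERE post_type = %s AND post_status = %s ORDER BY post_date DESC LIMIT %d",
		'post',
		'publish',
		$limit
	);

	// get_col() returns the first column of every row as strings.
	return array_map( 'intval', $wpdb->get_col( $sql ) );
}
```

Where the result does not need to be fresh on every request, the returned array is a natural candidate for caching, for example with wp_cache_get() and wp_cache_set().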

Uncached functions

WordPress core has a number of functions that, for various reasons, are uncached, which means that calling them will always result in an SQL query. Below, we outline some of these functions:

  • get_posts()
    • Unlike WP_Query, the results of get_posts() are not cached via Advanced Post Cache.
    • Use WP_Query instead, or set 'suppress_filters' => false.
      $args = array(
      	'post_type'        => 'post',
      	'posts_per_page'   => 3,
      	'suppress_filters' => false,
      );
      $query = get_posts( $args );
      
    • When using WP_Query instead of get_posts(), don’t forget to set the ignore_sticky_posts and no_found_rows params appropriately (both are hardcoded to true inside get_posts()).
  • wp_get_recent_posts()
    • See get_posts()
  • get_children()
    • Similar to get_posts(), but also performs a no-LIMIT query among other bad things by default. Alias of break_my_site_now_please(). Do not use. Instead, do a regular WP_Query and make sure that the post_parent you are looking for is not 0 or a falsey value. Also make sure to set a reasonable posts_per_page: get_children() does a -1 query by default. A maximum of 100 should be used (but a smaller value could increase performance).
  • term_exists()
    • Use wpcom_vip_term_exists() instead
  • get_page_by_title()
  • get_page_by_path()
    • Use wpcom_vip_get_page_by_path() instead
  • url_to_postid()
    • Use wpcom_vip_url_to_postid() instead
  • count_user_posts()
    • Use wpcom_vip_count_user_posts() instead.
  • wp_old_slug_redirect()
    • Use wpcom_vip_old_slug_redirect() instead.
  • get_adjacent_post(), get_previous_post(), get_next_post(), previous_post_link(), next_post_link()
    • Use wpcom_vip_get_adjacent_post() instead.
  • attachment_url_to_postid()
    • Use wpcom_vip_attachment_url_to_postid() instead.
  • wp_oembed_get()
    • Use wpcom_vip_wp_oembed_get() instead.
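
For the get_children() replacement described in the list above, one possible shape is a small helper that builds the WP_Query arguments and refuses to proceed without a real parent ID. This is a sketch under stated assumptions: the helper name myprefix_child_query_args() and the default of 20 posts are hypothetical, and the post_type and post_status values of 'any' are chosen to mirror get_children()’s defaults.

```php
<?php
/**
 * Hypothetical helper: build WP_Query arguments for fetching a post's children.
 * Returns null when the parent ID is missing, so no unbounded query is run.
 */
function myprefix_child_query_args( $parent_id, $limit = 20 ) {
	$parent_id = (int) $parent_id;

	// A post_parent of 0 would match every top-level post, so bail out.
	if ( $parent_id <= 0 ) {
		return null;
	}

	return array(
		'post_parent'         => $parent_id,
		'post_type'           => 'any',
		'post_status'         => 'any',
		// Keep the page size between 1 and 100 instead of get_children()'s -1.
		'posts_per_page'      => max( 1, min( 100, (int) $limit ) ),
		// Match the values that get_posts() hardcodes.
		'no_found_rows'       => true,
		'ignore_sticky_posts' => true,
	);
}
```

When the helper returns an array, it can be passed straight to new WP_Query( $args ); when it returns null, the calling code should skip the query altogether.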

Creating good changesets

Changesets are the heart of any version control system, and making good changesets is vitally important to the maintainability of your code. As all code on WordPress.com VIP is reviewed by a real person, it’s even more important all changesets are well crafted.

Remember: always code (and commit) for the maintainer.

A Good Changeset:

Represents one logical change

What comprises a ‘logical change’ is up for interpretation, but only directly related changes are included. Generally, the smaller the changeset, the better.

Good Example: Adding the CSS, JS, HTML, and PHP code for a new UI button.

Bad Example: Adding the new UI button, fixing whitespacing, and tweaking copy in the footer.

Bundles related changes together

It’s much easier to trace refactorings and other changes if related changes are grouped together. Rather than splitting a logical change into many separate commits, related changes should be combined.

Good Example: Refactoring code into a new plugin by moving it to a new file and including that file.

Bad Example: Refactoring code into a new plugin by putting the code removal, addition, and include into separate commits.

Is Atomic

An atomic commit means that the system is always left in a consistent state after the changeset is committed. No one commit would cause the codebase to be in an invalid state. The commit is easily rolled back to a previous valid state, including all related changes, without the need to analyze the potential interdependencies of neighboring commits.

Good Example: Adding a new feature to the homepage by committing the HTML / PHP changes alongside the required CSS / JS changes, so there is never an incomplete state (HTML elements without styling) in the codebase.

Bad Example: Committing the HTML changes and requisite CSS / JS separately. The first commit represents an inconsistent state, as the feature can exist in the DOM without being properly styled.

Is Properly Described

Accurately describing the changes is very important for others (and future you) looking at your code. A good commit message describes the what and why of a change. Please see Writing Good Commit Messages for more information.

Writing good commit messages

Commit messages are one of the most common ways developers communicate with other developers, including our VIP team, so it’s important that your commit message clearly communicates changes to everybody else.

Who are we writing commit messages for?

The audience of a commit message is:

1. People reading the commit timeline.

2. People debugging code.

What is a good commit message?

With these audiences in mind:

1. Good commit messages should have a subject line. One sentence briefly describing what the change is, and (if it makes sense) why it was necessary.

A good subject line gives the reader the power to know the gist of the commit without bothering to read the whole commit message.

Example:

Fix stats link on m.example.com

This does not need a high-level why part, because it’s obvious – the links weren’t working.

Example:

Stats Report: clear caches on each post to save memory

Here we need a why part, because if the message was only “clear caches on each post”, the obvious follow-up question is, “Why would you clear cache for each post in a loop?!”.

Whenever the commit is a part of a clearly-defined and named project, prefixing the commit with the project name is also very helpful. It’s not mandatory, because often the project space is vague and the list of committed files reveals similar information.

2. There should be an empty line between the subject line and the rest of the commit message (if any). Whitespace is like bacon for our brains.

3. A good commit message tells why a change was made.

Explaining why is helpful to both of our audiences. Those following the timeline can learn a new approach and how to make their code better. Those tracing bugs gain insight into the context of the problem you were trying to solve, which helps them decide whether the root cause is in the implementation or higher up the chain.

Explaining why is tricky, because it’s often obvious. “I’m fixing it because it’s broken”. “I’m improving this, because it can be better.”

If it’s obvious, go one level deeper. The 5 Whys technique is great, not only for finding the root causes of problems, but also for making sure you are doing what you are doing for the right reasons.

Example:

JSON API: Split class into hierarchy for easier inclusion in ExamplePlugin

Including the old code required a bunch of hacks and compatibility layers.
With the new hierarchy, we can get rid of almost all the hacks and drop the files into ExamplePlugin as is.

Here the commit message very conveniently explains what the downsides were of the old approach and why the new approach is better.

Example:

Remove filtering by ticket

It's not very useful, while it's slow to generate.

The workflow is to usually go to the ticket page and see associated
comments there.

Here the commit message shares a UX decision we made, which is the primary reason for the commit.

5. Most commits fix a problem. In this case a good commit message explains what caused the problem and what its consequences were.

Everybody needs to know what caused a problem in order to avoid causing a similar problem again. Knowing the consequences can explain erroneous behaviour that has already been noticed, and it lets somebody debugging a different problem compare its symptoms with those of the problem this commit has already fixed.

If possible, avoid the word fix. Almost always there is a more specific verb for your action.

If the problem is caused by a single changeset, a good commit message will mention it.

6. A good commit message explains how it achieves its goal, but only if it isn’t obvious.

Most of the time it’s obvious. Only sometimes some high-level algorithm is encoded in the change and it would benefit the reader to know it.

Example:

Add a first pass client stat for bandwidth

Bandwidth is extrapolated from a month sample. From
there we get the average number of bytes per pageview
for each blog. This data is cached in means.json.

All the code for generating the data in means.json is
in the static methods of the class.

Here we explain the algorithm for guessing bandwidth data. It would have been possible to extract this information from the commit, but it would’ve taken a lot of time and energy. Also, by including it in the commit message we imply that it’s important for you to know that.

7. If the subject line of a commit message contains the word “and” or otherwise lists more than one item, the commit is probably too large. Split it.

Make your commits as small as possible. If you notice a coding style problem while fixing a bug, make a note and fix it after you fix the bug. If you are fixing a bug and you notice another bug, make a note and fix the second bug in another commit.

The same is especially true for whitespace changes to existing code. Whitespace changes should be a separate commit.

8. A good commit message should not depend on the code to explain what it does or why it does it.

Two notes here:

This doesn’t mean we should tell what each line of code does. It means that we should convey all the non-trivial information in the code to the commit message.

This doesn’t mean we shouldn’t include any of this information in the code. Knowing why a function exists, what it does, or what algorithm it uses can often be a useful comment.

9. It’s perfectly OK to spend more time crafting your commit message than writing the code for your commit.

10. It’s perfectly OK for your commit message to be longer than your commit.

11. A good commit message gives props and references relevant tickets.

12. Common sense always overrules what a good commit message thinks it should be.

Other perspectives

Here’s another excellent post that explains how to approach a good commit message: http://robots.thoughtbot.com/5-useful-tips-for-a-better-commit-message

The Code: guidelines for VIP developers

At WordPress.com VIP, we feel very privileged to work with some of the best developers on some of the world’s biggest sites. It’s a small community of smart people who get to build some amazing technology.

As a developer working on WordPress.com VIP, I will:

  • Never stop learning.
  • Not be afraid to ask questions.
  • Be open to feedback, constructive criticism, and collaborative discussion.
  • Be proactive in finding solutions, and not wait for someone else to resolve it for me.
  • Test and review my code before submitting for peer review.
  • Escape, sanitize, and validate all the things.
  • Be kind, courteous, and helpful to my fellow developers.

Two helpful links to get you started:

Writing custom WP-CLI commands

Occasionally, you may find you need to access or transform data on your site. If it’s more than a dozen posts affected, it’s often more efficient to write what we call a custom WP-CLI command (sometimes called a “bin script”). In writing a custom WP-CLI command, you can easily change strings, assign categories, or add post meta across hundreds or thousands of posts. However, with great power comes great responsibility — any small mistake you make with your logic could have negative repercussions across your entire dataset.

Here are some guidelines we’d encourage you to follow when writing a custom WP-CLI command.

Writing commands

Check out the great documentation on how to write a command. When you write commands for VIP, there are a few things to keep in mind:

  • You should extend the WPCOM_VIP_CLI_Command class provided in the development helpers, which includes helper functions like stop_the_insanity(). Do this instead of extending WP_CLI_Command.
  • Make sure you require the file that contains your new command in your functions.php file
  • Make sure you only include the command if WP_CLI is defined and true

Here’s an example of what might be in your functions file:

// CLI scripts
if ( defined( 'WP_CLI' ) && WP_CLI ) {
	require_once MY_THEME_DIR . '/inc/class-mycommand1-cli.php';
	require_once MY_THEME_DIR . '/inc/class-mycommand2-cli.php';
}

Once you’ve written your command and tested it thoroughly in your local environment, you can commit it to your theme.

We’re working on the ability for you to run custom WP-CLI commands yourself, but for the moment we need to run them for you. Once you’ve committed your command, open a ticket with us with an explanation of what you’re trying to accomplish. We’ll check, test, and run it.

Best Practices

It can be easy to make a minor mistake with your command that causes a lot of pain. We encourage you to do the following:

  • Comment well and provide clear usage instructions. It’s important to be very clear about what each part is doing and why; commenting each step of your logic is a good sanity check. Comments are especially helpful when something doesn’t work as intended and we need to debug to figure out why.
  • If your command is calling wp_update_post() or importing posts, make sure to define( 'WP_IMPORTING', true ); at the top of the related code. This will ensure only the minimum of extra actions are fired.
  • Be as verbose as possible. While operating the command, we’re often asked for progress and an estimated time to finish, because running scripts in production often takes much longer than in a staging environment. The same applies to a live run, which typically takes much longer than the initial dry run. It’s important for the person running the command to know that something is happening, what is happening, and when the script will finish. Have an opening line in the script and a line for every action the command is performing. For instance:
    public function __invoke( $args, $assoc_args ) {
        
        //... dealing with args
        
        if ( true === $dry_mode ) {
            WP_CLI::line( "===Dry Run===" ); //let us know the dry mode is turned on
        } else {
            WP_CLI::line( "Doing it live!" );
        }
        
        //... defining $query_args and creating new WP_Query object
        //set variables for holding stats printed on the end of the run
        $updated = 0;
        $missed = 0;
    
        do {
    
            //let us know how many posts are about to be processed
            WP_CLI::line( sprintf( "Processing %d posts at offset of %d of %d total found posts", count( $query->posts ), $offset, $query->found_posts ) );
    
            // do stuff
    
            // let us know what's going to happen
            WP_CLI::line( sprintf( "Updating %s meta for post_id: %d", 'some_meta_key', $post_id ) );
    
            //save result of update/delete functions in its own variable so the $updated counter is not overwritten
            $result = update_post_meta( $post_id, 'some_meta_key', sanitize_text_field( $some_meta_value ) );
    
            if ( $result ) {
    
                //let us know whether the update was successful
                WP_CLI::line( sprintf( "Success: Updated post_meta '%s' for post_id %d with value %s", 'some_meta_key', $post_id, serialize( $some_meta_value ) ) );
                $updated++; //count successful updates
    
            } else {
    
                //and provide us with some helpful debug info in case it was not
                WP_CLI::line( sprintf( "Error: Failed to update post_meta '%s' for post_id %d with value %s", 'some_meta_key', $post_id, serialize( $some_meta_value ) ) ); //some values, eg.: WP_Error object should be serialized in order to print something meaningful
                $missed++; //as well as errors/skips
    
            }
    
            //free up memory
            $this->stop_the_insanity();
    
            $query_args['paged']++;
            $query = new WP_Query( $query_args );
    
        } while( $query->have_posts() );
    
        //let us know what's the result of the script
        WP_CLI::line( sprintf( "Finished the script. Updated: %d. Missed: %d", $updated, $missed ) );
    }
    
  • Use the progress bar. The command might also take advantage of a progress bar class which is available:
    public function __invoke( $args, $assoc_args ) {
        
        //... dealing with args
        
        $posts_per_page = 100; //posts per page will be used for ticks
        
        //... defining $query_args and creating new WP_Query object
        
        //create new progress bar, provide number of all posts we'll be dealing with as well as a size of a batch processed before the first/next tick will happen
        $progress = new \cli\progress\Bar( sprintf( 'Starting the command. Found %d posts', $query->found_posts ), $query->found_posts, $posts_per_page );
        
        $progress->display();
        
        do {
            
            WP_CLI::line( sprintf( "Processing %d posts at offset of %d of %d total found posts", count( $query->posts ), $offset, $query->found_posts ) );
            
            //...
            
            $progress->tick( $posts_per_page ); //tick
            
            // Free up memory
            $this->stop_the_insanity();
    
            $query_args['paged']++;
            $query = new WP_Query( $query_args );
    
        } while ( $query->have_posts() );
    
        $progress->finish(); //done
    
        WP_CLI::line( sprintf( "Finished the script. Updated: %d. Missed: %d", $updated, $missed ) );
    }
    
  • It’s a good idea to default your command to do a test run without affecting live data. Add an argument to allow a “live” run. This way, we can compare what the actual impact is versus the expected impact.
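    A good way to do this is:

    // Default to a dry run; only an explicit `--dry-run=false` performs a live run.
    $dry_mode = ! isset( $assoc_args['dry-run'] ) || 'false' !== $assoc_args['dry-run'];
    if ( ! $dry_mode ) {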
    	WP_CLI::line( " * Removing {$user->user_login} ( {$user->ID} )... " );
    	$remove_result = remove_user_from_blog( $user->ID, $blog_id );
    	if ( is_wp_error( $remove_result ) ) {
    		$failed_to_remove[] = $user;
    	}
    } else {
    	WP_CLI::line( " * Will remove {$user->user_login} ( {$user->ID} )... " );
    }

    If your code modifies existing data, we will ask for a dry-run option so that we can confirm with you that things are good.

  • Check your CLI methods have the necessary arguments. WP-CLI passes two arguments, $args and $assoc_args, to each command; you’ll need these to implement dry-run options. You can take advantage of wp_parse_args() for setting default values for optional parameters:
    $assoc_args = wp_parse_args( $assoc_args, array(
        'dry-run' => true,
        'post-meta' => 'some_default_post_meta', //etc...
    ) );
    
  • If you’re modifying lots of data on a live site, make sure to include sleep() in key places. This will help with load associated with cache invalidation and replication. We also recommend using the WPCOM_VIP_CLI_Command method stop_the_insanity() to clear memory after processing each batch of 100 posts. If you are processing a large number of posts, use the start_bulk_operation() and end_bulk_operation() class methods to disable functionality that is often problematic with large write operations.
  • Prepare the command for long runs. The vast majority of WP-CLI commands deal with lots of data on a live site. The command should be prepared to process that data without exhausting memory or overloading the database. Make sure to call the stop_the_insanity() method and include sleep() in key places (sleep() will help with load associated with cache invalidation and replication). A good start is to call $this->stop_the_insanity() after processing (updating, deleting …) 100 posts. Every command using get_posts() or WP_Query should also call $this->stop_the_insanity() after looping over 100 posts at most; this will allow the command to run without interruptions.
  • Prepare the command for restart. Even if the sleep() and stop_the_insanity() calls are in place, the command might die in the middle of its run. Commands dealing with a lot of posts, and other long-running commands, should be prepared for restart. You might either design them to be idempotent (meaning they can safely be run multiple times) or give the operator an option to start from a certain point, perhaps using an offset argument or other suitable means.
  • Define all constants which are standard in WordPress for performed actions. For instance, if you’re writing an importer, make sure to define( 'WP_IMPORTING', true ); at the top of your subcommand. This will ensure only the minimum of extra actions are fired and will make the command faster.
  • Use WP_CLI::error only if you want to interrupt the command. Calling WP_CLI::error stops the command’s run. Sometimes you just want to know about an error and have it logged for further investigation, or simply to record what did not go as expected. Some “errors” are also not errors, but are expected (for example, you don’t want to update a post that does not meet certain conditions). In those cases, use WP_CLI::line with custom debugging information, as this won’t make the command exit and stop further execution.
  • Direct database queries will probably break in unexpected ways. Use core functions as much as possible. WP-CLI loads WordPress core as well as your theme, so all standard WordPress functions and your theme’s functions are available to you in the command. Take advantage of those when possible, as direct SQL queries (specifically those that do UPDATEs or DELETEs) will leave the caches invalid. If a direct SQL query is required, prefer to do only SELECTs, and do any write operation using core WordPress functionality. You may want to remove certain hooks from wp_update_post() or take other actions to get the desired behaviour.

    In some rare contexts, a direct SQL write could be a better choice, but it must be followed by clean_post_cache(). While we don’t allow direct SQL queries in plugins and themes on our platform, in WP-CLI commands a direct SQL query is sometimes the better option for two reasons: you want to prevent certain hooks from being triggered, and/or WP_Query might be too expensive for what you need.

    When building custom direct SQL queries, remember to properly sanitize the input, as you’ll lose the benefit of core’s sanity checks. You should always use the $wpdb->prepare() method:
    global $wpdb;
    $wpdb->get_results( $wpdb->prepare( "SELECT * FROM {$wpdb->posts} WHERE post_title = %s AND ID > %d", $post_title, $min_post_id ) );
    

    When dealing with “LIKE” statement, use $wpdb->esc_like method:

    $like = '%' . $wpdb->esc_like( $args['search'] ) . '%';
    $query = $wpdb->prepare( "SELECT * FROM {$wpdb->posts} AS p WHERE ((p.post_title LIKE %s) OR (p.post_name LIKE %s))", $like, $like );
    

    When updating posts with direct SQL queries, make sure to flush associated cache so the updates will be visible on your site before the cache expires:

    $wpdb->update(
        $wpdb->posts, //table
        array(
            'post_content' => sanitize_text_field( $post_content ) //data should not be SQL escaped, but they should be sanitized
        ), //data
        array( 'ID' => intval( $post_id ) ), //where
        array( '%s' ), //data format
        array( '%d' ) //where format
    );
    clean_post_cache( $post_id ); //clean the cache, else the changes would not be reflected until the cache expires
    
  • When in doubt, ask us!
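
The restart advice above can be sketched in plain PHP. This is a minimal illustration under stated assumptions, not VIP’s implementation: process_in_batches() and its parameters are hypothetical names, and in a real command the $after_batch callback is where you would log progress with WP_CLI::line, call $this->stop_the_insanity(), and sleep().

```php
<?php
/**
 * Hypothetical helper (not part of WP-CLI or WPCOM_VIP_CLI_Command).
 *
 * Walks a list of IDs in batches, starting at $offset, and returns the
 * offset reached, so an interrupted run can be resumed from a logged value.
 */
function process_in_batches( array $ids, int $offset, int $batch_size, callable $process, ?callable $after_batch = null ): int {
	$batch_size = max( 1, $batch_size ); // Guard against a zero or negative batch size.
	$offset     = max( 0, $offset );
	$total      = count( $ids );

	while ( $offset < $total ) {
		$batch = array_slice( $ids, $offset, $batch_size );

		foreach ( $batch as $id ) {
			$process( $id );
		}

		$offset += count( $batch );

		// Report progress after every batch so the operator knows where to resume.
		if ( null !== $after_batch ) {
			$after_batch( $offset, $total );
		}
	}

	return $offset;
}
```

If a run dies after the log reports an offset of 300, the operator could pass 300 as the starting offset instead of processing the first 300 items again.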

FAQ

How do I modify all the posts?

Without a no-LIMIT query, it can be confusing how you would modify all your posts. The problem is that a no-LIMIT query just won’t work in most situations. If the query takes longer than 30 seconds, it will time out and fail. The solution is to use smaller queries and page through the results.

For example:

<?php

class Test_CLI_Command extends WPCOM_VIP_CLI_Command {

	/**
	 * CLI command that takes a metakey (required) and post category (optional)
	 * and publishes all pending posts once they have had their metakeys updated.
	 *
	 * @subcommand update-metakey
	 * @synopsis --meta-key=<metakey> [--category=<category>] [--dry-run]
	 */
	public function update_metakey( $args, $assoc_args ) {
		$this->start_bulk_operation(); // Disable term counting, Elasticsearch indexing, and PushPress.

		$posts_per_page = 100;
		$paged          = 1;
		$count          = 0;

		// Meta key value is required, otherwise an error will be returned.
		if ( isset( $assoc_args['meta-key'] ) ) {
			$meta_key = $assoc_args['meta-key'];
		} else {
			/*
			 * Caution: calling WP_CLI::error stops the execution of the command.
			 * Use WP_CLI::error only in case you want to stop the execution. Use
			 * WP_CLI::warning or WP_CLI::line for non-blocking errors.
			 */
			WP_CLI::error( 'Must have --meta-key attached.' );
		}

		// Category value is optional.
		if ( isset( $assoc_args['category'] ) ) {
			$cat = $assoc_args['category'];
		} else {
			$cat = '';
		}

		// If --dry-run is not set, then it will default to true.
		// Must set --dry-run explicitly to false to run this command.
		if ( isset( $assoc_args['dry-run'] ) ) {
			/*
			 * passing `--dry-run=false` to the command leads to the `false` value being
			 * set to string `'false'`, but casting `'false'` to bool produces `true`.
			 * Thus the special handling.
			 */
			if ( 'false' === $assoc_args['dry-run'] ) {
				$dry_run = false;
			} else {
				$dry_run = (bool) $assoc_args['dry-run'];
			}
		} else {
			$dry_run = true;
		}

		// Let the user know in what mode the command runs.
		if ( $dry_run ) {
			WP_CLI::line( 'Running in dry-run mode.' );
		} else {
			WP_CLI::line( 'We\'re doing it live!' );
		}

		do {
			$posts = get_posts( array(
				'posts_per_page'   => $posts_per_page,
				'paged'            => $paged,
				'category'         => $cat,
				'post_status'      => 'pending',
				'suppress_filters' => false,
			));

			foreach ( $posts as $post ) {
				if ( ! $dry_run ) {
					update_post_meta( $post->ID, $meta_key, 'true' );
					wp_update_post( array( 'ID' => $post->ID, 'post_status' => 'publish' ) );
				}
				$count++;
			}

			// Pause.
			WP_CLI::line( 'Pausing for a breath...' );
			sleep( 3 );

			// Free up memory.
			$this->stop_the_insanity();

			/*
			 * At this point, we have to decide whether or not to increase the value of the $paged
			 * variable. If a value used for querying the posts (like post_status in our example)
			 * is changed by the command, we should keep querying from the first page in every
			 * iteration, because updated posts drop out of the results. If no value used for
			 * querying is changed, which is also the case in dry-run mode where nothing is written,
			 * we need to advance the page in order to walk through all the posts.
			 */
			if ( $dry_run ) {
				$paged++;
			}

		} while ( count( $posts ) );

		if ( false === $dry_run ) {
			WP_CLI::success( sprintf( '%d posts have successfully been published and had their metakeys updated.', $count ) );
		} else {
			WP_CLI::success( sprintf( '%d posts will be published and have their metakeys updated.', $count ) );
		}
		$this->end_bulk_operation(); // Trigger a term count as well as trigger bulk indexing of Elasticsearch site.
	}

	/**
	 * CLI command that takes a taxonomy (required) and updates terms in that
	 * taxonomy by removing the "test-" prefix.
	 *
	 * @subcommand update-terms
	 * @synopsis --taxonomy=<taxonomy> [--dry-run]
	 */
	public function update_terms( $args, $assoc_args ) {
		$count = 0;

		$this->start_bulk_operation(); // Disable term counting, Elasticsearch indexing, and PushPress.

		// Taxonomy value is required, otherwise an error will be returned.
		if ( isset( $assoc_args['taxonomy'] ) ) {
			$taxonomy = $assoc_args['taxonomy'];
		} else {
			/*
			 * Caution: calling WP_CLI::error stops the execution of the command.
			 * Use WP_CLI::error only in case you want to stop the execution. Use
			 * WP_CLI::warning or WP_CLI::line for non-blocking errors.
			 */
			WP_CLI::error( 'Must have a --taxonomy attached.' );
		}

		// If --dry-run is not set, then it will default to true.
		// Must set --dry-run explicitly to false to run this command.
		if ( isset( $assoc_args['dry-run'] ) ) {
			/*
			 * passing `--dry-run=false` to the command leads to the `false` value being
			 * set to string `'false'`, but casting `'false'` to bool produces `true`.
			 * Thus the special handling.
			 */
			if ( 'false' === $assoc_args['dry-run'] ) {
				$dry_run = false;
			} else {
				$dry_run = (bool) $assoc_args['dry-run'];
			}
		} else {
			$dry_run = true;
		}

		// Let the user know in what mode the command runs.
		if ( $dry_run ) {
			WP_CLI::line( 'Running in dry-run mode.' );
		} else {
			WP_CLI::line( 'We\'re doing it live!' );
		}

		$terms = get_terms( array( 'taxonomy' => $taxonomy ) );

		foreach ( $terms as $term ) {
			if ( ! $dry_run ) {
				wp_update_term( $term->term_id, $term->taxonomy, array(
					'name' => str_replace( 'test ', '', $term->name ),
					'slug' => str_replace( 'test-', '', $term->slug ),
				) );
			}
			$count++;
		}

		$this->end_bulk_operation(); // Trigger a term count as well as trigger bulk indexing of Elasticsearch site.

		if ( false === $dry_run ) {
			WP_CLI::success( sprintf( '%d terms were updated.', $count ) );
		} else {
			WP_CLI::success( sprintf( '%d terms will be updated.', $count ) );
		}
	}
}

WP_CLI::add_command( 'test-command', 'Test_CLI_Command' );
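
The dry-run handling repeated in both subcommands above could be factored into a small helper. The following is a sketch only: resolve_dry_run() is a hypothetical name, not a function provided by WP-CLI or WPCOM_VIP_CLI_Command, and it assumes the flag arrives in $assoc_args under the 'dry-run' key as in the example above.

```php
<?php
/**
 * Hypothetical helper: decide whether the command should run in dry-run mode.
 *
 * Defaults to a dry run when the flag is omitted. The string 'false' needs
 * special handling because casting it to bool would produce true.
 */
function resolve_dry_run( array $assoc_args ): bool {
	if ( ! isset( $assoc_args['dry-run'] ) ) {
		return true; // Flag omitted: stay on the safe side.
	}

	$value = $assoc_args['dry-run'];

	if ( is_string( $value ) && 'false' === strtolower( $value ) ) {
		return false; // `--dry-run=false` arrives as the string 'false'.
	}

	return (bool) $value;
}
```

Each subcommand could then start with $dry_run = resolve_dry_run( $assoc_args ); so the default-to-safe behaviour lives in one place.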
