Windows Performance Toolkit with Windows Vista, 7, or 8

Easily half of the following is stolen wholesale, with mad props, from this video: 50 Performance Tricks to make your HTML5 Application Sites Faster. We’re going to highlight the Windows Performance Toolkit and how it can help us quantify and isolate performance bottlenecks with our websites. (Most of the following would apply to Windows Store apps, Azure sites, and MVC/ASP websites.) This blog was also excellent as a step-by-step walkthrough.

The presenter, Jatinder Mann, went to five common travel sites – all doing roughly the same thing, but in very different ways. Which one below was the slowest?

You would think it would be #1, at 77K lines of JavaScript. But in fact it’s site #2, even though it doesn’t have the highest number in any particular category! (Site #4, by the way, was the slowest.) So you can’t just go by the JavaScript line count or image count in determining performance.

In fact there are three factors impacting web performance – 1. Network Performance, 2. CPU, and 3. GPU. Since often #1 and #3 are beyond our control – how do we drive down the CPU time? Well, let’s look at the standard CPU processing cycle:

Using the Windows Performance Toolkit, you can see – each site has a different bottleneck:

Site #1 above is network-bound, for example, and #3 is JavaScript-bound. There are six principles to remember in optimizing web performance:

1. Quickly respond to network requests.

  • Avoid redirections. 63% of the sites on the web use redirections; they could gain a 10% improvement in performance just by not using redirects! (Redirects cost 250 msec on average.)
  • Avoid using the meta refresh tag. It is an inefficient pattern.
  • Use Content Distribution Networks. For a browser client in Georgia accessing a webserver in Portland, that could be a 250-300 msec delay. Here, cloud-based server farms like Azure or Amazon really come in handy.
  • Maximize use of connections – and max concurrent resources.
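
A quick way to audit markup for the meta refresh pattern is a small scan like the one below. This is a rough sketch, not a real HTML parser: the function name `findMetaRefresh` is made up for illustration, and a regular expression will miss unusual markup.

```javascript
// Return every <meta http-equiv="refresh" ...> tag found in an HTML string.
// Rough check only: a regex is not a full HTML parser.
function findMetaRefresh(html) {
  const pattern = /<meta\b[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*>/gi;
  return html.match(pattern) || [];
}

// Example: flag a page that redirects through a meta refresh.
const sample = '<head><meta http-equiv="refresh" content="0; url=/new"></head>';
console.log(findMetaRefresh(sample).length); // number of meta refresh tags found
```

Anything this turns up is a candidate for a server-side redirect, or better, a direct link to the final URL.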


2. Minimize Bytes Downloaded

  • Below you can see the payload for an average web request. Images make up, by far, the bulk of it. Interestingly enough, this doesn’t change much even for AJAX-heavy sites like Gmail.
  • Take a look at your resources downloaded. The avg download for a website is 777K. The vast majority of these are dead weight.
  • Gzip your contents! The server will then send a compressed version of the contents. This is usually built into the web servers themselves, but confirm it with the Windows Performance Toolkit.
  • The Windows Store can package resources directly in the AppX package itself. For the rest of us – we use HTML5 AppCache. This pulls your resources into the app cache so you’re not hitting the network.
  • Make sure an expiration date is specified on every resource you serve. (Now, what about dynamically changing content, like news feeds or logos? Use an If-Modified-Since request header.)
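
To make the expiration and If-Modified-Since advice concrete, here is a minimal sketch of the server-side decision. The helper names `conditionalStatus` and `cachingHeaders` are hypothetical, not part of any framework, and a real server would handle more cases (ETags, for one).

```javascript
// Hypothetical helper: decide how a server could answer a conditional GET.
// `ifModifiedSince` is the raw If-Modified-Since header value (or undefined);
// `lastModified` is a Date for when the resource last changed.
function conditionalStatus(ifModifiedSince, lastModified) {
  if (!ifModifiedSince) return 200; // no validator sent: send the full response
  const since = Date.parse(ifModifiedSince);
  if (Number.isNaN(since)) return 200; // unparseable header: ignore it
  // HTTP dates have one-second resolution, so compare whole seconds.
  const modifiedSec = Math.floor(lastModified.getTime() / 1000);
  const sinceSec = Math.floor(since / 1000);
  return modifiedSec <= sinceSec ? 304 : 200;
}

// Hypothetical helper: headers that let the browser cache a resource
// for `maxAgeSeconds` and revalidate it afterwards.
function cachingHeaders(lastModified, maxAgeSeconds, now = new Date()) {
  return {
    "Cache-Control": `max-age=${maxAgeSeconds}`,
    "Expires": new Date(now.getTime() + maxAgeSeconds * 1000).toUTCString(),
    "Last-Modified": lastModified.toUTCString(),
  };
}
```

A 304 response carries no body, so unchanged content costs a round trip but almost no bytes.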


3. Efficiently Structure Markup

  • Always load your web pages in the latest browser mode.
  • Always link stylesheets at the top of the page.
  • Avoid embedded and inline styles.
  • Always put JavaScript at the end of the file. Don’t put it in the header – especially with older browsers!
  • Avoid duplicate code (52% of pages have 100 or more lines of duplicate code!)
  • Standardize on a single framework. Find a library you like and go with it!
  • And, segment your styles by page.


4. Optimize Media Usage

  • Have a standardized naming convention for images and stick to it. For example, for the code below, the browser will attempt to download each – individually. Even though they’re the same file!
  • Minimize the number of images. On average a site downloads 58 images. If it’s over 20-30, alarm bells should be going off.
  • Use image sprites wherever possible; this especially improves GPU performance.
  • On image formats, everybody seems to have their favorite. Jatinder favors PNGs for everything except photographs (website elements, logos, etc.).
  • Use native resolutions. So, for example, if the image is 500×400, don’t resize it to 50×50 in the browser; take the extra time to resize it yourself in an image editor.
  • Replace images with CSS3 gradients, including border radius and CSS3 transforms (move, rotate, etc.).
  • If you’re using HTML5 video, specify an image preview… otherwise the browser has to guess and pull it off the site. He strongly recommends HTML5 video over Flash/Quicktime/Silverlight from a pure performance perspective. These compete for resources with the web app itself… so it slows down page rendering.
  • Proactively download rich content/images and put it in AppCache.
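
On the naming convention point: one way the same file ends up downloaded several times is when references differ only in capitalization, since browsers treat URL paths as case-sensitive. Assuming that is the kind of inconsistency you are hunting for, a small check like this can flag it. The function name and the sample file names are made up for illustration.

```javascript
// Group image URLs that differ only by letter case.
// Each distinct spelling is a separate cache entry, so a separate download.
function findCaseVariants(urls) {
  const groups = new Map();
  for (const url of urls) {
    const key = url.toLowerCase();
    if (!groups.has(key)) groups.set(key, new Set());
    groups.get(key).add(url);
  }
  // Keep only groups with more than one distinct spelling.
  return [...groups.values()]
    .filter((variants) => variants.size > 1)
    .map((variants) => [...variants]);
}

// Hypothetical image references collected from a page.
const refs = ["/img/icon.png", "/img/Icon.png", "/img/ICON.PNG", "/img/logo.png"];
console.log(findCaseVariants(refs));
```

Lowercasing the whole URL is a simplification; it is good enough for spotting suspects, which you then confirm by hand.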


5. Write Fast JavaScript

  • Stick to integer math. Once upon a time this was very slow; now it’s blazing fast (for example, 200 msec compared to 40 msec for a C++ app). The one exception is floating-point calcs, which can go up to 1600 msec. So use the Math.floor or Math.ceil functions; there are ways around this.
  • Make sure to minify your JavaScript.
  • Initialize your javascript on demand.
  • Minimize DOM interaction. And use the built-in DOM properties like .firstChild or .nextSibling.
  • Use innerHTML to construct your page; this is 10-15X faster.
  • Have no more than 1000 elements in your DOM.
  • JSON is always faster than XML. And use the native browser JSON methods.
  • Use regular expressions carefully. If you can, use string concats – much faster.
  • If at all possible, asynchronously load your JavaScript with the async attribute.
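
As a rough illustration of the innerHTML and "minimize DOM interaction" tips together, the sketch below builds the markup as one string so the DOM only gets touched once. The function names and the "results" element id are assumptions for the example, not anything from the talk.

```javascript
// Escape text so it is safe to place inside HTML markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the markup for a whole list as a single string.
function buildListMarkup(items) {
  const parts = [];
  for (const item of items) {
    parts.push("<li>" + escapeHtml(item) + "</li>");
  }
  return "<ul>" + parts.join("") + "</ul>";
}

// In a browser, one assignment then replaces many appendChild calls:
//   document.getElementById("results").innerHTML = buildListMarkup(names);
```

The escaping step matters: innerHTML parses whatever string you hand it, so untrusted text has to be encoded first.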

6. Know What Your Application is Doing

  • Understand JavaScript timers. A quick way is to align your timers with your display frame. For a typical 60 Hz monitor refresh rate, that’s 16.7 msec. So, do yourself a favor: look at all of your timers and set the timeouts to 16.7 msec.
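
The 16.7 msec figure is just 1000 msec divided by the refresh rate. Here is a minimal sketch of that arithmetic plus a frame-aligned scheduler. Using requestAnimationFrame where the browser offers it is my own addition rather than something covered above, and both function names are made up for the example.

```javascript
// Interval between frames, in milliseconds, for a given refresh rate in Hz.
function frameIntervalMs(refreshRateHz) {
  return 1000 / refreshRateHz;
}

// Schedule `callback` for the next frame. Uses requestAnimationFrame where the
// environment provides it, otherwise falls back to a frame-length timeout.
function scheduleNextFrame(callback, refreshRateHz = 60) {
  if (typeof requestAnimationFrame === "function") {
    return requestAnimationFrame(callback);
  }
  return setTimeout(() => callback(Date.now()), frameIntervalMs(refreshRateHz));
}

console.log(frameIntervalMs(60).toFixed(1)); // interval for a 60 Hz display
```

Timers that fire faster than the display can paint just burn CPU on frames nobody sees.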

Using the Windows Performance Toolkit on a Sample Site

First I pulled down the WPT bits. (Note that this will run on Vista/7 as well as Windows 8; ignore all the Windows 8 banners.)

Below I selected just the Windows Perf Toolkit and WAT pieces.

Then I ran the following steps:

  1. Open up IE on any site, whatever.
  2. Open a command prompt (cmd) as an admin.
  3. From that command prompt execute

    xperf -start mytrace -on PerfTrack

  4. In IE, navigate to your target site and wait for five seconds after the page appears to be visually done loading and the browser reaches a quiescent state.
  5. Stop the trace by executing

    xperf -stop mytrace -d mytrace.etl

  6. Now launch the Windows Performance Analyzer, part of the WPT toolkit, by executing

    xperfview mytrace.etl


A more sophisticated script can be used by naming the following trace_detail.cmd and running it – again using Admin privileges:

    @echo off
    rem Session name defaults to "mytrace"; pass another name as the first argument.
    set session=mytrace
    if not @%1@ == @@ set session=%1

    rem Start the kernel logger (Latency group) and a user-mode session for the IE providers.
    xperf -on Latency -f %session%kernel.etl -start %session% -on Microsoft-IE+Microsoft-IEFRAME+Microsoft-PerfTrack-IEFRAME+Microsoft-PerfTrack-MSHTML -f %session%user.etl
    if %errorlevel% neq 0 goto :eof

    echo Performance Trace started.
    echo When done with profile actions,
    pause

    rem Stop the user-mode session, then the kernel logger.
    xperf -stop %session%
    if %errorlevel% neq 0 goto :eof
    xperf -stop
    if %errorlevel% neq 0 goto :eof

    rem Merge both traces into one file and open it in the viewer.
    xperf -merge %session%user.etl %session%kernel.etl %session%combined.etl
    if %errorlevel% neq 0 goto :eof

    start xperfview %session%combined.etl

Look at all the good stuff that this exposes. See below (yes, this is a site that takes more than 1 minute to load!!!): we can see our trouble starts almost from the get-go. Within five seconds we’re pegging the CPUs at 50%.

Above, yellow/red are reads – and blue is writes. As you can see there’s a lot of write activity dominating the first 40 seconds or so, followed by reads.

A really cool thing was the ability to zoom in. At about 35 seconds in we started to see a real spike in CPU usage. Zooming in on 10 seconds or so pointed us to our likely culprit, and the source of the read delays: Amazon cloud drive storage access.

It’s extremely easy to zoom in on the start/stop indicators and overlay your own chart/graph. You can hover your cursor over any of those data points and get an idea of what process is starting/stopping. Sweet, sweet eye candy.

Browsing the Summary Table gives you the tabular data you’ll need to check start/stop times on any particular process. This is how Jatinder put together those fancy-schmancy looking pie charts for his presentation.

Recording a stack walk would follow the format mentioned in this blog article:

    xperf -on PROC_THREAD+LOADER+PROFILE -stackwalk profile
    rem Your scenario goes here…
    xperf -d mytrace.etl

Anyway, all good stuff. I’m excited about Xperf and Windows Performance Tracing – along with Fiddler this is an excellent diagnostic tool. Combined with automated test projects with timed ceilings, this is a great way to catch performance problems early – and nail down inefficient libraries/processes in our web apps.
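
For the "timed ceilings" idea, a test can be as simple as timing an operation a few times and failing when the median goes over budget. The sketch below is one way to do it under that assumption; `medianDurationMs` and `assertUnderCeiling` are hypothetical names, not part of any test framework.

```javascript
// Run `fn` several times and return the median duration in milliseconds
// (the upper middle sample when the run count is even).
function medianDurationMs(fn, runs = 5) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    fn();
    samples.push(Date.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

// Throw if the median run time of `fn` exceeds `ceilingMs`.
function assertUnderCeiling(name, fn, ceilingMs, runs = 5) {
  const median = medianDurationMs(fn, runs);
  if (median > ceilingMs) {
    throw new Error(`${name} took ${median} ms, over the ${ceilingMs} ms ceiling`);
  }
  return median;
}
```

Using the median rather than a single run keeps one noisy sample from failing the build. Date.now() only has millisecond resolution, so this suits operations that take tens of milliseconds or more.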
