MetaStockTools.com


Trading System Evaluation/Development Tools





Synopsis



The trading system backtesting, evaluation and development tools outlined in this essay were first developed with Relative Strength Comparison (RSC) strategies in mind, but they apply equally to developing and backtesting any other trading methodology.


RSC can be a powerful tool in a trader's arsenal. It combines market fundamentals with technical analysis, providing a sound basis for any trading strategy.

At its most basic, RSC is a direct comparison of price to a related composite index.
The fundamental forces that affect a whole industry sector are often the very same fundamentals that affect the price in a related security. Thus, going against prevalent industry fundamentals in any trading strategy is usually a recipe for early capital depletion.

There are many flavors of RSC strategy, such as the popular RSC "ranking" strategies. This essay will deal with one of the lesser-known varieties of RSC-based strategies, and provide a sound platform for building and testing a profitable trading strategy.

However, before any trading strategy can be built and backtested, there needs to be a uniform and objective way of measuring performance in place - i.e. profit/risk.


This RSC article appears in two parts:

The first part will deal with measuring trading system performance and risk, in a purely objective way so that a true benchmark can be used for measuring performance. This is an essential part of any system development, yet (along with the subject of risk) it very rarely gets a mention.

Once this measuring yardstick is in place, the second part deals with an unusual RSC-based profitable trading strategy, and its relative performance metrics.
This strategy can be applied to both daily and weekly time-frames.







Testing... 1,2,3



Developing and testing trading systems involves constant backtesting to determine if development is going in the right direction. With a sound and balanced testing strategy, any trading system's profitability can be determined quickly. This is an essential part of system development.

The following two suggestions are based on experience (both my own and my clients') with system development:

1) Avoid optimization wherever possible.
Optimization almost always leads to curve-fitting, a condition where system rules are crafted to fit past results, with little likelihood of the optimized results repeating in the future. Curve-fitting can be a useful tool to analyse and smooth data for statistical purposes, but it is the wrong tool for more complex system development.

2) I haven't used MetaStock's System Tester since 2001-2002, preferring to use profit indicators (such as shown below) instead. Instant equity or profit indicators are much easier to use, and they provide reliable and repeatable results without a myriad of parameters for the user to possibly get lost in.


So, for now, let's get down to the business of designing a good and balanced way of measuring system performance.







Risk



Profit without risk makes as much sense as day without night.

To illustrate this point, let's take a look at these two extreme examples:

Strategy A offers an average annual return of 5%.
Strategy B promises a return of 50,000%.

Which one would you prefer?

Obviously it is not possible to compare these two strategies directly without measuring their relative risk. Strategy A is a basic guaranteed-return, almost risk-free investment account, such as offered by any bank. Strategy B involves buying a lottery ticket, with a correspondingly much higher risk.

Knowingly or otherwise, most investors adjust their capital to perceived risk exposure.
In the extreme example above, only a crazed gambler would allocate 100% of his available capital to a lottery ticket. Most of us would probably be well aware of the consequences of the risk involved, and would most likely allocate a very small portion of our capital to it, whereas we would be relatively comfortable with allocating 100% of our capital to a guaranteed investment.

Therefore, by adjusting their exposure to risk (using position sizing and/or restricting capital allocation), most traders actually normalize risk in some way or another.

Leaving these two extreme examples aside, all trading strategies should also be normalized to risk before direct comparisons are made. Any attempt to measure performance otherwise can lead to complications and difficulties when it comes to assessing what is truly profitable and what is not.







Defining risk



Before we can normalize returns to risk, we must ask: what is risk?

Risk and drawdown basically measure the same thing: the possibility of losing one's capital. Historical drawdown based on an appropriate test period is probably the best way of determining future risk.

Risk is traditionally measured using Standard Deviation, as in the Sharpe ratio formula.
Sharpe ratio and other popular methods of measuring risk don't seem to take into account real-world risk. They are an industry standard as used by managed funds, but perhaps they under-report true risk. It is possible for a trading record to show a reasonable Sharpe ratio while still containing a large drawdown.

Unless we have access to a constant stream of trading capital, drawdowns tend to be "sticky". That is to say, a 100% drawdown is fatal and final to one's capital, regardless of the amount of smoothing/massaging done to historical risk using Nobel prize winning risk formulae. ;)


Some traders measure risk annually or even monthly. Unfortunately drawdowns tend to be either partially or fully cumulative, so even a low monthly drawdown can be carried over and add up from month to month to a capital-breaking major drawdown.
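As a rough arithmetic sketch of this carry-over effect, the few lines of Python below compound a run of modest monthly losses. The 5% monthly loss and the ten-month losing streak are purely illustrative assumptions, not figures taken from any real trading record.

```python
# Illustrative assumption: equity falls 5% in each of 10 consecutive months.
monthly_loss = 0.05
months = 10

equity = 1.0  # starting capital, normalized to 1
for _ in range(months):
    equity *= 1 - monthly_loss  # each month's loss applies to what is left

# Peak-to-trough loss over the whole losing streak
cumulative_drawdown = 1 - equity

print(f"Worst single month: {monthly_loss:.0%}")
print(f"Peak-to-trough drawdown after {months} months: {cumulative_drawdown:.1%}")
```

No single month looks alarming on its own, yet the running drawdown ends up at about 40% of capital.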

Smoothing or isolating yearly/monthly risk leads to the mistaken belief that trading is a relatively safe pastime. Funds tend to do this, perhaps deliberately in order to present a safer picture to their clients.


There are many complex ways of measuring risk, but a straight-forward measure of drawdown is the best indicator of potential risk ahead.

Maximum historical drawdowns (rather than monthly/annual) should always be used as a yardstick when comparing returns. If a past maximum running drawdown period brought losses of 40%, then this amount of possible future risk should be considered when allocating capital to that particular strategy.







Measuring profitability



Profit Long %


{ System Profit Long % - fixed trade size - v2.2
Basic code - Entry/Exit on Close of signal.

©Copyright 2005~2006 Jose Silva.
The grant of this license is for personal use
only - no resale or repackaging allowed.
All code remains the property of Jose Silva.
http://www.metastocktools.com }

{* Entry Long formula/reference *}
{ Contrarian breakout system example
- do not trade! }
entry:=C<Ref(LLV(C,21),-1);

{* Exit Long formula/reference *}
exit:=C>Ref(HHV(C,10),-1);

{ User inputs }
cost:=Input("Total Transaction costs
 (Brokerage + Slippage) %:",0,100,.2)/200;
plot:=Input("plot: [1]%Profit, [2]Signals",1,2,1);

{ Trade Binary & clean Entry/Exit signals }
init:=Cum(IsDefined(entry+exit))=1;
flag:=ValueWhen(1,entry-exit<>0 OR init,entry);
entry:=flag*(Alert(flag=0,2) OR entry*Cum(entry)=1);
exit:=(flag=0)*(Alert(flag,2) OR exit*Cum(exit)=1);

{ Profit % curve }
EntryVal:=ValueWhen(1,entry,C*(1+cost));
Profit:=C*(1-cost)/EntryVal-1;
ProfitPer:=(flag*Profit+Cum(exit*Profit))*100;

{ Plot in own window below chart }
If(plot=1,ProfitPer,entry-exit)








This basic MetaStock profit % indicator plots a profit/loss based on a simple (but usually unprofitable) strategy. It also includes (often-ignored) transaction costs, and can also show us drawdowns and profit curve smoothness at a glance.
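For readers who prefer to check the arithmetic outside MetaStock, here is a simplified Python sketch of the same fixed-trade-size profit curve. It is not the indicator itself: the function name `profit_curve_pct` and its arguments are hypothetical, the entry/exit signals are supplied as ready-made lists, and bars where both signals fire together are simply ignored.

```python
def profit_curve_pct(closes, entries, exits, cost_pct=0.2):
    """Cumulative profit % for a long-only, fixed-trade-size system.

    closes   : list of closing prices
    entries  : list of bools, True where an entry signal fires
    exits    : list of bools, True where an exit signal fires
    cost_pct : total round-trip cost in %, split evenly between
               entry and exit (assumption mirroring the /200 above)
    """
    cost = cost_pct / 200      # one-way cost as a fraction
    in_trade = False
    entry_val = 0.0
    closed_profit = 0.0        # running sum of completed-trade profits
    curve = []
    for close, enter, leave in zip(closes, entries, exits):
        if enter != leave:     # ignore bars with conflicting signals
            if enter and not in_trade:
                in_trade = True
                entry_val = close * (1 + cost)   # pay entry cost
            elif leave and in_trade:
                in_trade = False
                closed_profit += close * (1 - cost) / entry_val - 1
        # Open-trade profit is marked to the current close, net of exit cost
        open_profit = close * (1 - cost) / entry_val - 1 if in_trade else 0.0
        curve.append((closed_profit + open_profit) * 100)
    return curve


closes = [100, 110, 105, 120]
entries = [True, False, False, False]
exits = [False, False, False, True]
no_cost = profit_curve_pct(closes, entries, exits, cost_pct=0)
with_cost = profit_curve_pct(closes, entries, exits, cost_pct=0.2)
```

With costs set to zero the curve runs through roughly 0, 10, 5 and 20 percent; with the default 0.2% round-trip cost every point sits slightly lower.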







Measuring lowest profit curve point




Lowest(Fml("Profit Long %"))






The above sample profit curve was taken from a profitable trading period. It will probably continue to show an 11% drawdown for the foreseeable future, as profits would need to fall more than 38% from peak levels in order for a larger drawdown to appear.

Using this common method of measuring drawdown, it is possible to have a historical (incorrect) maximum equity drawdown of 0%. A system's large initial profit can sometimes translate into a max equity drawdown of 0%, even if it doesn't make another penny in profit for years to come and actually loses all profits since the initial trading period.
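A tiny Python sketch makes the problem concrete. The profit curve below is a made-up example (values in % of trade size): a strong start followed by a long slide that gives back almost everything.

```python
# Hypothetical profit curve in %: big early gains, then a long decline
profit_pct = [0, 30, 60, 80, 55, 30, 10, 2]

# "Lowest point" measure: the curve never dips below zero,
# so this method reports no drawdown at all
lowest_point = min(profit_pct)

# What a trader who started at the peak would actually have lost
given_back = max(profit_pct) - profit_pct[-1]

print("Lowest profit curve point:", lowest_point)
print("Profit given back since the peak:", given_back)
```

The lowest-point method reads 0, while 78 percentage points of profit have been lost since the peak.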

This does not mean that the strategy is risk-free by any means. A less-fortunate trader may decide to begin trading at the peak of the strategy's profit curve (such as the peak of the NASDAQ bubble), with nothing but losses to follow instead.

It is my strong view that we should take into account the worst possible historical drawdown when assessing probable future risk.







Measuring maximum peak-to-trough profit loss

(true historical drawdown)



profit:=Fml("Profit Long %");
Highest(Max(Highest(profit),0)-profit)






Zooming in on the first trade, it can be clearly seen that if the entry had been delayed by just two days, we would have suffered a 60% larger drawdown for that trade - i.e., the historical drawdown would now be 17% instead of the previously-measured 11%.

Drawdown (i.e., the risk of losing all trading capital and going broke) should be measured from equity peak to trough, and the largest historical peak-to-trough drawdown should then be used as the benchmark for possible future risk. After all, there is always the chance that the trader begins trading at the peak of the system's performance, and is looking ahead at a major loss period.
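The peak-to-trough measurement can be sketched in Python as follows. The function name `max_drawdown_points` is hypothetical; the logic is intended to mirror the formula above, with the running peak never allowed below zero (the starting equity) and the result expressed in percentage points of the profit curve.

```python
def max_drawdown_points(profit_pct):
    """Largest peak-to-trough fall of a profit curve, in percentage points."""
    peak = 0.0    # running peak, floored at zero profit (starting equity)
    worst = 0.0   # largest fall from the running peak seen so far
    for p in profit_pct:
        peak = max(peak, p)
        worst = max(worst, peak - p)
    return worst


# Hypothetical curves: one that gives back early gains,
# one that dips below zero before recovering
gives_back = max_drawdown_points([0, 30, 60, 80, 55, 30, 10, 2])
early_dip = max_drawdown_points([0, -5, -11, 20, 40])
```

The first curve never falls below zero yet shows a 78-point drawdown; the second shows the 11 points lost from the starting equity.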

So now that we have a valid definition of risk - what next?







Adjusting exposure to risk



Let's take these two theoretical strategy examples:

1) Trading strategy A averages 20%pa net profit with a maximum historical drawdown (capital peak to trough) of 50%.

2) Trading strategy B averages 10%pa with a max historical drawdown of 20%.

Directly comparing the profits of strategy A to those of B would result in the mistaken view that A is twice as profitable as B. In fact, a potential trader (assuming one of sound mind) would naturally scale back position size when trading the riskier system A. Decreasing position size will then also result in reduced potential profits for that strategy.


So, normalizing A and B profits to a common maximum historical risk of 20% (downsizing capital or position size in A by a factor of 20/50 to match B's risk) gives:

1) Trading strategy A now averages 8%pa with a normalized max historical drawdown (peak to trough) of 20%.

2) Trading strategy B averages 10%pa with a max historical drawdown (peak to trough) of 20%.

Therefore, after normalizing for risk, trading strategy B actually outperforms trading strategy A in the real world.
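The arithmetic behind this comparison is a simple linear scaling, sketched below in Python. The function name `normalize_to_risk` is hypothetical, and the sketch assumes that both profit and drawdown shrink in direct proportion to position size.

```python
def normalize_to_risk(annual_return_pct, max_drawdown_pct, target_drawdown_pct):
    """Scale a return to what it would be at the target drawdown.

    Assumes profit and drawdown both scale linearly with position size.
    """
    return annual_return_pct * target_drawdown_pct / max_drawdown_pct


# Strategies A and B from the example, normalized to a 20% drawdown
strategy_a = normalize_to_risk(20, 50, 20)
strategy_b = normalize_to_risk(10, 20, 20)

print(f"A: {strategy_a:.1f}%pa, B: {strategy_b:.1f}%pa")
```

Strategy A drops from 20%pa to 8%pa once its position size is cut to 20/50 of the original, while B is unchanged at 10%pa.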







Comparing apples to apples



Buy & Hold is the best benchmark against which to compare any strategy.
Traditionally, traders avoid comparisons to Buy & Hold because of that strategy's large interim drawdowns.
Again, by normalizing Buy & Hold's above-average drawdowns, other risk-normalized strategies can be compared directly to it.

When it comes to backtesting, survivorship bias is a very pervasive and thorny issue.
Survivorship bias comes about because any current list of stocks includes only the securities that have survived to the present.

It is practically impossible to remove survivorship bias from pre-screened lists, as often these lists are partly updated at irregular or random intervals between major updates.

Even when no pre-screened lists are used, as when using the current list of securities in any market, the survivorship bias can be quite strong - e.g., often company failures such as Enron or HIH (Australia) are not included in backtesting, resulting in above-average or skewed performance statistics.

Benchmarking any strategy, whether it is based on pre-screened lists or otherwise, can be faithfully accomplished by risk-normalizing profits, and comparing it to a risk-normalized Buy & Hold trade for the same period. Annualizing these returns (%pa) will then give us the real picture of the strategy's true worth.
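A minimal Python sketch of such a benchmark comparison follows. All figures are invented for illustration, the function name `risk_normalized_pa` is hypothetical, and the sketch assumes linear scaling with position size and simple (non-compounded) annualization.

```python
def risk_normalized_pa(total_profit_pct, max_drawdown_pct, years,
                       target_drawdown_pct=20.0):
    """Risk-normalized, annualized profit in %pa.

    Assumptions: profit scales linearly with position size, and
    annualization is a simple division by the number of years.
    """
    scaled = total_profit_pct * target_drawdown_pct / max_drawdown_pct
    return scaled / years


# Hypothetical five-year test: Buy & Hold made more money in total,
# but only by sitting through a much deeper drawdown
strategy = risk_normalized_pa(total_profit_pct=90, max_drawdown_pct=30, years=5)
buy_hold = risk_normalized_pa(total_profit_pct=150, max_drawdown_pct=60, years=5)

print(f"Strategy: {strategy:.1f}%pa, Buy & Hold: {buy_hold:.1f}%pa")
```

On raw profit, Buy & Hold looks better (150% vs 90%). At a common 20% drawdown, the strategy comes out ahead at 12%pa against 10%pa.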







Summary



When comparing the performance of any prospective trading strategy, one needs to take into account the total capital necessary to trade it, and/or adjust strategy returns appropriately to a stable risk benchmark. This is basically what risk-normalized returns are all about: leveling returns to a common, comparable risk.

A thorough and valid risk-normalized testing procedure, necessary for valid/fast system development, requires the following:


1) Profit formula

Ideally, this should be plotted as an indicator below the chart, such as with Roy's famous Trade Equity indicators.

Profit indicators plot an instant broad picture of not only the system's profit for that chart, but also its drawdowns and inactive periods, and the visual relationship between them.

Keeping in mind that eventual profit is related to actual risk, ideally we should be looking for the smoothest profitable outcome, rather than the largest profit.

It is also wise to check performance in less-than-ideal market situations, such as trading ranges and downturns.

Is the system profitable due to one or two lucky trades?
How does it perform in market range/downtrend periods?
What if we change one entry/exit rule - does it improve the smoothness of the profit curve?



2) Inclusion of transaction costs

Brokerage and slippage/spread are very important components of any trade, and they become increasingly important as the trading rate increases and/or position size decreases.

For example, it is not possible to directly compare short-term trading strategies with longer-term ones, without also including full entry & exit transaction costs. Very active short-term trading strategies usually accumulate a larger transaction cost total, impacting profitability to a greater degree.
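To put rough numbers on this, the Python sketch below subtracts a fixed round-trip cost per trade from two gross returns. The trade counts, gross returns and the function name `net_annual_profit_pct` are all illustrative assumptions; with a fixed trade size, each round trip is taken to cost the same percentage of that trade size.

```python
def net_annual_profit_pct(gross_pct, trades_per_year, round_trip_cost_pct):
    """Annual profit after costs, assuming a fixed trade size so that
    each round trip costs the same percentage of that trade size."""
    return gross_pct - trades_per_year * round_trip_cost_pct


# Hypothetical strategies, both paying 0.2% per round trip
short_term = net_annual_profit_pct(gross_pct=25, trades_per_year=100,
                                   round_trip_cost_pct=0.2)
long_term = net_annual_profit_pct(gross_pct=15, trades_per_year=5,
                                  round_trip_cost_pct=0.2)

print(f"Short-term: {short_term:.1f}%pa net, long-term: {long_term:.1f}%pa net")
```

Before costs the short-term strategy leads by ten points; after costs it keeps about 5%pa against about 14%pa for the long-term one.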



3) Normalization of exposed capital (or profit) to a tolerable risk level

Adjusting capital/profit levels to a historical, say, 30% drawdown, allows the direct comparison of very different trading strategies, such as Buy & Hold vs intraday trading.



4) Comparison of system profit to a stable benchmark

Is the system profitable by itself, or is its profit really the result of favourable market conditions?
Are we unknowingly using market survivorship bias, confusing it for system performance?
A test for Buy & Hold over the same period, adjusted to a common risk, will tell the true story.



5) Annualization of profits

Direct system comparisons also require profits to be measured on an annual basis (x%pa).

This is necessary because there may be inactive periods within the security's test period, which in MetaStock means that the testing period may eventually stretch back longer than anticipated. 1000 bars of security A may take us back four years, but the same number of bars in security B, with an inactive period of two years in between, effectively means that the test period is six years for that stock.

Annualizing profits (only after the first year, to avoid skewing of short-term profit) takes care of data period disparities.
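One way to sketch this in Python is shown below. The function name `annualized_profit_pct` is hypothetical, and two assumptions are made: annualization is a simple division by elapsed calendar years (in keeping with a fixed trade size), and "only after the first year" is read as never dividing by less than one year.

```python
from datetime import date


def annualized_profit_pct(total_profit_pct, first_bar, last_bar):
    """Simple annualized profit in %pa, based on calendar time elapsed.

    Periods shorter than one year are left un-annualized, so that a
    lucky few months are not extrapolated into a huge yearly figure.
    """
    years = (last_bar - first_bar).days / 365.25
    return total_profit_pct / max(years, 1.0)


# Same 40% profit over 1000 bars, but security B has a two-year data gap
security_a = annualized_profit_pct(40, date(2002, 1, 1), date(2006, 1, 1))
security_b = annualized_profit_pct(40, date(2000, 1, 1), date(2006, 1, 1))
# Three months of data: profit is reported as-is
short_test = annualized_profit_pct(12, date(2006, 1, 1), date(2006, 4, 1))

print(f"A: {security_a:.2f}%pa, B: {security_b:.2f}%pa, short: {short_test:.2f}%")
```

Measuring by calendar dates rather than bar count is what exposes the longer effective test period of security B.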



6) Backtest over a broad universe of securities

This is necessary to obtain a broad view of system performance.
Ideally, a minimum of 200 securities should be used in a system performance exploration.

The annualized, risk-adjusted profit indicator is simply run in a MetaStock exploration column, and the median (midpoint) result is selected by the following process:


a) Click on the profit Long % column header to rank results by profitability:




b) Select the middle profit % result, so that clicking again on the same column header
(and reversing the profitability order) does not change the position of the selected result:




Median profit results from explorations using 200+ securities are not as affected by extreme results as they would be by adding & averaging the same results. Extreme results (outliers) tend to skew averages, whereas the midpoint in a series of results (median) doesn't suffer from the same problem unless the data is random.
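The effect of a single outlier is easy to see in a short Python sketch. The seven exploration results below are invented for illustration (a real exploration would use 200+ securities), with one security showing a freak 310%pa result.

```python
from statistics import mean, median

# Hypothetical risk-normalized results in %pa, one per security
results_pct_pa = [-4.0, 1.5, 3.0, 5.5, 7.0, 8.5, 310.0]

average_result = mean(results_pct_pa)    # dragged upward by the outlier
median_result = median(results_pct_pa)   # middle value of the sorted list

print(f"Average: {average_result:.1f}%pa, median: {median_result:.1f}%pa")
```

The average comes out above 47%pa, a figure that no typical security in the list comes close to, while the median stays at 5.5%pa.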


Please note that this is not the same as portfolio backtesting, such as used by TradeSim.
Portfolio backtesting should be performed after the trading system is developed.



Where can I find these system development/backtesting tools?


All the above tools are available with the MACDH Divergence and URSC v3.0 kits.








Coming up next...



In the next & final part of this system-development article, I will be offering an unusual but profitable (and backtestable) RSC-based strategy, along with its risk-normalized profit comparison to a Buy & Hold benchmark.

The risk-normalized performance of these strategies will be measured (and compared to Buy & Hold) using the system development tools described here and available in the URSC & MACDH Divergence kits.

The kits' system-development tools include:

• Profit indicators and explorations.

• Risk normalization variable parameters, which allow rapid and valid comparisons between systems with markedly different risk profiles.

• Buy & Hold option for the same trading period, useful for comparing a system's performance to a risk-normalized benchmark.

• Annualized returns, necessary for comparing strategies based on different backtesting periods.



So now, let's go to part two of this essay, and develop a sound basis for a trading strategy with the aid of these useful backtesting tools.









©Copyright 2006 Jose Silva

Except for personal use, no part of this text may be reproduced in any form or by any means without the written permission of the author.






Last updated on 21st August 2006