I'm brain dead at the moment, and have only almost caught up, but here are the notes I made at the time. There were two different ways of analysing the results of the step test: one more objective, the other subjective. The big problem, of course, was that the step tests were self-paced, and most of the studies that could have served as a reference for comparison were not.
In case anyone is interested, here's how far I have got. There are graphs for each, and both are flat and around the 20% mark. In both, the SMC group is marginally best. There are two calculations based on the step test.
The self-paced step test of fitness involves timing participants while they do 20 step-ups and step-downs (of 2 steps each), as well as gathering resting and post-exercise heart rates.
A measure of fitness was calculated as [Body Mass (in kg) x 9.81 x total step height (in metres) x 20] / time (in seconds)
(That part converts body mass into the force of gravity, which the patient is working against, in Newtons, multiplied by total lifting height to give work done in Joules, divided by time in seconds, to give power used in watts.)
That power figure is then divided by %HRR = (highest measured HR - resting HR) / (predicted max HR - resting HR) x 100
In other words, it is a measure of a patient's actual power output in comparison with how hard their heart had to work to produce it. What I can't find is any set of reference figures, apart from a study by Petrella on older people, which is behind a paywall. I can't see any other way of working out what a healthy score would be.
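For anyone who wants to check the first calculation, here is a rough Python sketch of it as I understand it from the description above. The function names and the sample participant are my own inventions for illustration, not anything from the study, and the predicted maximum heart rate is simply passed in as a number.

```python
G = 9.81  # gravitational acceleration in m/s^2, as used in the formula above


def step_power_watts(body_mass_kg, total_step_height_m, time_s, repetitions=20):
    """Work done lifting the body (joules) divided by time taken (seconds)."""
    work_joules = body_mass_kg * G * total_step_height_m * repetitions
    return work_joules / time_s


def percent_hr_used(highest_hr, resting_hr, predicted_max_hr):
    """Heart rate rise as a percentage of the available range above resting."""
    return (highest_hr - resting_hr) / (predicted_max_hr - resting_hr) * 100


def fitness_score(body_mass_kg, total_step_height_m, time_s,
                  highest_hr, resting_hr, predicted_max_hr):
    """Power output (watts) divided by the heart rate percentage."""
    power = step_power_watts(body_mass_kg, total_step_height_m, time_s)
    return power / percent_hr_used(highest_hr, resting_hr, predicted_max_hr)


# Made-up participant, purely for illustration (not trial data):
# 70 kg, 0.4 m total step height, 60 seconds for the 20 repetitions.
score = fitness_score(70, 0.4, 60,
                      highest_hr=120, resting_hr=70, predicted_max_hr=180)
print(round(score, 2))
```

A faster time or a smaller rise in heart rate both push the score up, which is the sense in which it compares power output with how hard the heart had to work.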
The second is a measure of perceived exertion, which uses the Borg scale from 6 (very, very light exertion, almost resting) to 20 (utter exhaustion, where 19 on the scale is extremely strenuous, probably the most strenuous most people ever experience). They adjust it to give a score for physiological work done, by dividing it by the post-exercise heart rate as a percentage of the maximum predicted heart rate (which is 220 - age for men, and 206 - (0.88 x age) for women), then multiplying by 100 to give it as a percentage.
The Borg scale runs from 6 to 20 to give a rough idea of heartbeats per minute divided by 10 for a healthy adult at that level of work (i.e. 60 to 200). If a male patient aged 40 pushed himself hard (perception of say 16) and had a final heart rate of 144, his predicted maximum would be 220 - 40 = 180, his heart rate would be 80% of that, and he would score 16 / 80 x 100 = 20% (if my maths is right). So it measures how hard the person thinks he is going, in comparison with how hard his heart is pumping. A high score would suggest that the task felt harder to him than his heart rate suggested; a low score would mean that his heart was working much harder than his perception of the task. Because the resting heart rate is not subtracted, it is much more a measure of heart rate than of increase in heart rate, so it is affected by people having a high resting heart rate.
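Here is the same sum as a small Python sketch, in case anyone wants to try other figures. Again the function names are mine, and it assumes the two predicted maximum heart rate formulas quoted above are the ones actually used.

```python
def predicted_max_hr(age, sex):
    """Predicted maximum heart rate, using the two formulas quoted above."""
    if sex == "male":
        return 220 - age
    if sex == "female":
        return 206 - 0.88 * age
    raise ValueError("sex must be 'male' or 'female'")


def borg_adjusted_score(borg_rating, post_exercise_hr, age, sex):
    """Borg rating divided by heart rate as a % of predicted max, times 100."""
    if not 6 <= borg_rating <= 20:
        raise ValueError("Borg ratings run from 6 to 20")
    pct_of_max = post_exercise_hr / predicted_max_hr(age, sex) * 100
    return borg_rating / pct_of_max * 100


# The worked example from the text: man aged 40, Borg 16, final heart rate 144.
example = borg_adjusted_score(16, 144, age=40, sex="male")
print(round(example, 1))
```

With those inputs it gives the same 20% as the hand calculation. Keeping the Borg rating and age fixed and raising the final heart rate lowers the score, which shows how much the result depends on raw heart rate.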
I thought I had found a good reference with Anderson, but it was only for 17 patients (with bronchial problems). There was also a reference to Karloh, again only with a small group, this time of healthy and bronchial patients, but it was for the Chester step test, which is different.