Thanks everyone for helping me think more clearly about this matter. I've got a lot more thinking to do yet.
Some quick thoughts.
The "evaluation" is an overall evaluation of the clinic's approach, not of the individual patient. Prior to using any assessment, it would be tested over a period of time with ME patients, and some sort of measure of variability obtained. Minor changes like a pyjama day or not would vanish under those natural variations. After all, if a clinic is claiming success, it needs to be well above any natural variations.
The reason for asking for details of only one day is to reduce the load on patients and to aim for accuracy. There would have to be some sort of spiel about choosing a couple of "normal" days in advance on which to complete the clock chart, perhaps using the second of them as the day for answering the questions.
I did think about electronic methods, such as Fitbits etc., but decided firstly that they were too inaccurate, and secondly that they did not easily capture energy-sapping tasks involving little movement, such as driving or trying to arrange house insurance: their emphasis is on physical movement. One problem that reared its head when I was trying to work out why PACE had ordered so few trackers for their 640 patients was the sheer complexity of getting the devices returned, the data uploaded, and everything ready for the next patient. When you start to factor in weekends, holidays, delays in the post etc., it gets very messy.
It's interesting, Jon, that you were able to discuss the difficulties of assessing success in teaching maths with your daughter: I had envisaged having to explain the complexities. I also think that Peter's list is very relevant. But surely the point is not so much that we have to measure "success" in such a difficult condition, where normal ideas of success are inappropriate, but that we need to measure effectiveness, for which Peter's list is very important. I think there could be two important outcomes from such an analysis. The first is that clinics stop claiming major success in treating the condition, and the second is that unhelpful practices or attitudes are highlighted.
As head of maths, even knowing that the analysis was difficult and potentially misleading, I spent a lot of time on it each year. It helped me pinpoint areas that needed improvement. Obviously it had to be combined with other sources of information and professional knowledge, but it kept me focused on the task of making the department as good as it could be. Do the specialists running such departments face that same pressure, and do they have suitable methods for doing that kind of analysis?