|
There is some, albeit very little, substance for the problem's existence.
It is to do with the fact that real time clock chips are just simple counter devices, where seconds reaching 0 (after 59) causes minutes to count up one etc... so during the time at which the update 'ripples' through the counter stages the time and date is technically unstable. For example, the minutes may just have gone to zero, but the hour has not been updated yet. However, clock chips have two ways of preventing this instability from upsetting software. They either copy the counters into a set of latches once they have updated, so that there is never any instability (called 'buffering') or they set a binary 'flag' which the software can check to see if it is safe to read the time and date counters.
Crouch-Echlin affects only those systems that use the flag mechanism to read the counters. They argue that in the 146818 real time clock chip (used in most desktop PCs) there is a 244us (micro-second) period from the flag changing state to indicate 'unstable clock' during which the time is still stable, and that if the flag is read (a) just BEFORE the flag changes so that the software thinks the time is stable AND (b) it takes MORE than 244us to read all of the counter registers then the time and date might be wrongly read.
They argue that PCs do not take more than 244us to read, but a BIOS patch which introduces extra instructions may result in this not being the case. It is a quite plausible failure, until you do a few calculations. We figure that on a 20Mhz PC (v.slow) you would need to introduce more than 500 instructions to cause the problem. Not likely! A faster machine needs even more instructions, and what is more, the instructions need to be introduced between the counter registers being read. A typical bit of code will read all the registers, and then figure out what needs to be done, so there is never going to be a problem regardless of how many extra instruction cycles are added. On PCs the real time clock chip is read only once, when the PC starts up, so even if it were a genuine problem it would only occur extremely rarely.
The conclusion from our joint investigation is that only very old PCs could be affected (old 286 and early 386 machines), and even then, extremely rarely and only if the BIOS patch has been written abominably badly. In short, a totally unlikely scenario. Embedded systems use a variety of different real time clock chip types which lessens the possible problem, and are in almost all cases unlikely to be affected for the above reason.
The Couch-Echlin Effect only affects PC-based systems as they are being switched on and initialised. Most embedded systems are rarely, if ever, switched off.
Following the on-line publication of the above article Mike Echlin entered into the following dialogue with Patrick Bossert about the effect of Crouch-Echlin (time-dilation) Effect on embedded systems:
[Mike Echlin] I have again returned to your pages to read your information. I am impressed, this looks real good. I especially like your step by step analysis procedures.
About your Crouch Echlin page.
You say "It is a quite plausible failure, until you do a few calculations. "
In my opinion you may need to do some more calculations. Most people who do speed calculations and speed timing on PCs for RTC access do so after boot. But the time/date is read from the RTC initially during POST, which as you know has the computer in a much different state then after boot. For instance, the code being run is being run from ROM, and on a 286 this is a 20 fold difference in speed then when the same code is run from RAM. On a Pentium this speed difference, just for ram vs. ROM alone, can account for a 100 fold difference in speed of the code being run. (so 50 uS becomes 5000 uS real quick.) But Ram vs ROM is not the only difference, there is also the isa bus vs. the faster PCI bus on Pentiums, (the ROM is on the isa bus, and during post the PCI drivers are not yet loaded....) basically when post is running your Pentium computer is a running post code is a 200 MHz machine, running in real mode (8 bit, 8mhz) on a 8mhz 8 bit bus. So any timing test done in "regular" mode has no bearing on what the computer is actually doing during post.
[Patrick Bossert] You are absolutely correct in this matter - we have used 20MHz as a basis for our calculations whereas the clock rate we should have been working from is clearly 8MHz. This means only 200 instructions or thereabouts could make a very big difference to the stability of the clock data being read. It also means that the real machine speed is almost immaterial. I have to agree with you.
However, if we look at embedded systems applications, the C-E effect can only appear if the extra instructions are executed in between certain RTC registers being read. If the extra calculation is performed at the end of the register read cycle then it is perfectly safe. My area of expertise is in Embedded systems, where I have designed a number of different industrial process and security controllers, and I have never seen a case of embedded systems code where C-E effect could occur, as the registers are always read together. I have also seen code excerpts from a large number of systems tested with the Delta-T Probe and they confirm this view. I have used the following timing schematic (your email viewer needs to display this in a proportional font like Courier New for the bits to line up) to illustrate the problem:
<-----244us----------->
------------XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 'OK' flag
-----------------------------------XXXXXXXXXX Data
Reads: ^ S M H D M Y W
=> All data is read OK
The question then is: where are the extra instructions added? We have not seen code from one single embedded system which does not read RTC registers consecutively and only then do any calculations on the date values. Most RTCs store their registers in consecutive locations, and any code written to read them will usually read the smallest (i.e. seconds or tenths of seconds) first. I have seen a number of bits of code which read a register value from the RTC, and only go on to read the next register if the value has reached wrap-around point. e.g. if seconds reach 00 then the code reads minutes etc. The following scenario based on the date window being calculated in the middle of the RTC register read sequence is very unlikely in my opinion, due to the order in which the registers are read (even for a PC BIOS), but would clearly cause the C-E effect to occur.
<-----244us----------->
------------XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 'OK' flag
-----------------------------------XXXXXXXXXX Data
Reads: ^ D M Y <extra code> S M H W
=> Hour and day of Week may be wrongly read
This is not really applicable to embedded systems as they tend to be rather more simple in their peripherals, but could the section on a PC be due to an NMI or DMA being serviced? (I guess the BIOS would disable IRQs during the RTC access) as an NMI or DMA will interrupt the clock reading process and return at a later time (assuming a long bit of activity) when the clock may well be unstable:
<-----244us----------->
------------XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 'OK' flag
-----------------------------------XXXXXXXXXX Data
Reads: ^ S M <DMA from HardDisk> H D M Y W
=> Hour, day of Week, Day, Month and Year may be wrongly read
This would, of course, mean that the problem would be equally likely to occur both pre- and post-millennium. I suspect it is a combination of both effects that may be the real reason for C-E effect.
[Mike Echlin] Yes newer computers are faster then older ones, this makes the occurrence of the effect less often, but does not eliminate it in newer computers. (we had originally thought as you do, faster machines will be fast enough that the effect won't happen, and we calculated that the cut off would be 66 MHz, but we have seen, and have had numerous reports of machines of much faster speeds showing this effect.)
[Patrick Bossert] I agree that the speed is actually not a significant factor, certainly where the BIOS ROM sits on an ISA bus.
[Mike Echlin] (Besides a non-buffered RTC the other ingredient needed is the accessing software or firmware has to have some type of error that allows it to access the RTC while the data is bad. Compaq has identified and confirmed 1 type of problem that can cause this, a bios that ignores the UIP bit. This may be many problems that allow the same symptoms to happen.)
[Patrick Bossert] Ignoring the UIP bit is clearly just bad programming. You would expect a PC BIOS which does this to have the occasional glitch in reading the clock on start-up. This makes the problem as likely to occur both pre and post-millennium.
[Mike Echlin] You state "Embedded systems use a variety of different real time clock chip types which lessens the possible problem, and are in almost all cases unlikely to be affected for the above reason." If the RTC is non-buffered, (and most are not), there is a chance the machine will be affected on random access of the RTC. This is not just a PC problem, our research has mainly been on Ps for two reasons, ease of testing, and availability. But we have reports and test results from other architectures.
[Patrick Bossert] I agree that any non-buffered RTC may give rise to the same problem, but the way embedded systems are programmed to use their RTCs has, in all cases that we have seen to date, avoided the problem by implementing date windowing code after all the clock registers have been read. We have been able to see this by using the Delta-T Probe to capture the code using the RTC register access as a trigger. We have even looked at a few (very) old BIOS routines in early PCs to see how they read the RTC, and we found that they always read all the clock registers together.
Another factor is that a significant percentage of embedded systems use serial RTCs, which by definition produce stable data.
[Mike Echlin] You also say, " Most embedded systems are rarely, if ever, switched off. "
This is largely true of the manufacturing world yes. But not all embeds are in manufacturing. Medical embeds are switched on and off all the time. The embeds in automobiles (and some car companies use 286s for their cars) are switched on each time you start the car. the embeds I design at my "real" job are for data acquisition and analysis in nuclear plants and nuclear facilities. They are also switched on and off daily, even multiple times daily. These are just some examples of embeds that are not on continuously.
[Patrick Bossert] This is true enough, and when you look closely you find that most embedded systems do not just read the RTC on start-up like a PC, they poll it all the time, or get the time on the basis of an interrupt which notifies the processor that the time has changed. Assuming the code was written in such a manner as to be unstable as a result of extra date windowing code being executed, any glitches would be momentary at best, and not persist. My experience has been that date windows are never implemented mid-way through the register reading process so the problem never surfaces. There is clearly an application of the Delta-T Probe to look at the code to verify this if it is cause for concern in a particular system.
Yes we will see this effect in embeds, maybe even more then in desktop machines, because embeds are generally based on tried and true designs, and they have a longer life span then desktops do. these 2 reasons combined assure us that there will be a lot more embeds out there that are older, slower and more likely to have a non-buffered RTC, and so more likely to be affected, and more likely to show the affects more often.
We have yet to see a single instance of the C-E effect occurring in embedded systems, but I cannot rule out the fact that it is possible for the above reasons. If something is so critical that momentary clock variations may halt a process, then with the Delta-T Probe we have the tool to test it and find out whether it is an issue.
[Mike Echlin] My experience with embeds is, by definition, different then yours. I like your comments, but it will take me a few days to digest them as I am busy right now getting a presentation together on Y2K.
[Patrick Bossert] Further to my last email here's an example embedded system code sample of an RTC being read, and the RTC register values being stored in RAM for the applications code to use for comparisons. This was grabbed from a live refrigeration controller in a major retail store (name removed) using a Delta-T Probe. The code is from one of the most widely used embedded controllers in buildings infrastructure in the UK, and is typical of the type of clock-access routines we regularly find.
Code Sample Summary
Probe User: Embedded Science: Engineer
Project: XXXXX
Site Reference: XXXXX
Domain: XXXXX
Asset Number: 154-A
Equipment Category: Electrical Building Management
Equipment Make: Trend
Equipment Model: IQ131
Firmware Version: 5.4
Serial Number: 837690/26
IC List
CPU () 6809
ROM1 () NM27512
ROM2 () NM27512
RAM1 () HM6264
RAM2 () HM6264
RTC () HD146818
Seconds stored in RAM2 at address 0703h
Minutes stored in RAM2 at address 0702h
Hours stored in RAM2 at address 0701h
Day of Week stored in RAM2 at address 06FDh
Day stored in RAM2 at address 0700h
Month stored in RAM2 at address 06FFh
Year stored in RAM2 at address 06FEh
Trigger stored in RAM2 at address 0700h
Test Date: 12:24:00 07/01/1999
Clip Connection Test OK
Submitted Samples
RTC Read
RAM Write
RAM Read 1..8
Automatic Validation Results
Visual RAM: Clock with no Century
RTC Read: Not Validated
RAM Write: Not Validated
RAM Read: Not Validated
Result: Automatic code scan has not been validated
Sample: RTC Read Sample
EX: C453 5F CLRB
EX: C454 17 07 C2 LBSR Code_cc19
MW: 0307 57
MW: 0306 C4
EX: Code_cc19: CC19 F7 40 00 STB Code_4000
EX: CC1C B6 40 01 LDA Code_4001
IR: 0001 00 ; <<<RTC READ>>>
EX: CC1F 39 RTS
MR: 02F9 CC
MR: 02FA 20
EX: CC20 F7 81 3B Code_813b
EX: C459 2F 01 BLE Code_c45c
EX: Code_c45c: C45C A7 15 STA -11,X
MW: 0703 00 ; <<<SECONDS>>>
EX: C45E 26 05 BNE Code_c465
EX: C460 17 05 E6 LBSR Code_ca49
MW: 0307 63
MW: 0306 C4
EX: Code_ca49: CA49 34 30 PSHS Y,X
MW: 0305 29
MW: 0304 06
MW: 0303 0E
MW: 0302 07
EX: CA4B 8E 07 0E LDX #0070Eh
EX: CA4E AE 1A LDX -6,X
MR: 0708 07
MR: 0709 0E
EX: CA50 17 00 EE LBSR Code_cb41
MW: 0301 53
MW: 0300 CA
EX: Code_cb41: CB41 34 30 PSHS Y,X
MW: 02FF 29
MW: 02FE 06
MW: 02FD 0E
MW: 02FC 07
EX: CB43 30 15 LEAX -11,X
EX: CB45 10 8E CB 6B LDY #0CB6Bh
EX: CB49 5F CLRB
EX: CB4A 34 04 PSHS B
MR: 02FC 07
MW: 02FB 00
EX: Code_cb4c: CB4C E6 A5 LDB B,Y
EX: CB4E 17 00 C8 LBSR Code_cc19
MW: 02FA 51
MW: 02F9 CB
EX: Code_cc19: CC19 F7 40 00 STB Code_4000
EX: CC1C B6 40 01 LDA Code_4001
IR: 0002 08 ; <<<RTC READ>>>;<<<TRIGGER>>>
EX: CC1F 39 RTS
MR: 02F9 CB
MR: 02FA 51
EX: CB51 A7 82 STA ,X
MW: 0702 08 ; <<<MINUTES>>>
EX: CB53 E6 E4 LDB ,S
MR: 02FB 00
EX: CB55 5C INCB
EX: CB56 A1 A5 CMPA B,Y
EX: CB58 23 03 BLS Code_cb5d
EX: Code_cb5d: CB5D 5C INCB
EX: CB5E E7 E4 STB ,S
MW: 02FB 02
EX: CB60 C1 0A CMPB #00Ah
EX: CB62 2F E8 BLE Code_cb4c
EX: Code_cb4c: CB4C E6 A5 LDB B,Y
EX: CB4E 17 00 C8 LBSR Code_cc19
MW: 02FA 51
MW: 02F9 CB
EX: Code_cc19: CC19 F7 40 00 STB Code_4000
EX: CC1C B6 40 01 LDA Code_4001
IR: 0004 0C ; <<<RTC READ>>>
EX: CC1F 39 RTS
MR: 02F9 CB
MR: 02FA 51
EX: CB51 A7 82 STA ,X
MW: 0701 0C ; <<<HOURS>>>
EX: CB53 E6 E4 LDB ,S
MR: 02FB 02
EX: CB55 5C INCB
EX: CB56 A1 A5 CMPA B,Y
EX: CB6E 17 23 03 LBSR Code_ee74
EX: Code_cb5d: CB5D 5C INCB
EX: CB5E E7 E4 STB ,S
MW: 02FB 04
EX: CB60 C1 0A CMPB #00Ah
EX: CB62 2F E8 BLE Code_cb4c
EX: Code_cb4c: CB4C E6 A5 LDB B,Y
EX: CB4E 17 00 C8 LBSR Code_cc19
MW: 02FA 51
MW: 02F9 CB
EX: Code_cc19: CC19 F7 40 00 STB Code_4000
EX: CC1C B6 40 01 LDA Code_4001
IR: 0007 07 ; <<<RTC READ>>>
EX: CC1F 39 RTS
MR: 02F9 CB
MR: 02FA 51
EX: CB51 A7 82 STA ,X
MW: 0700 07 ; <<<DAY>>>
EX: CB53 E6 E4 LDB ,S
MR: 02FB 04
EX: CB55 5C INCB
EX: CB56 A1 A5 CMPA B,Y
EX: CB70 1F 23 TFR Y,U
EX: CB59 03 5C COM 05Ch
EX: CB5E E7 E4 STB ,S
MW: 02FB 06
EX: CB60 C1 0A CMPB #00Ah
EX: CB62 2F E8 BLE Code_cb4c
EX: Code_cb4c: CB4C E6 A5 LDB B,Y
EX: CB4E 17 00 C8 LBSR Code_cc19
MW: 02FA 51
MW: 02F9 CB
EX: Code_cc19: CC19 F7 40 00 STB Code_4000
EX: CC1C B6 40 01 LDA Code_4001
IR: 0008 01 ; <<<RTC READ>>>
EX: CC1F 39 RTS
MR: 02F9 CB
MR: 02FA 51
EX: CB51 A7 82 STA ,X
MW: 06FF 01 ; <<<MONTH>>>
EX: CB53 E6 E4 LDB ,S
MR: 02FB 06
EX: CB55 5C INCB
EX: CB56 A1 A5 CMPA B,Y
EX: CB72 0C 23 INC 023h
EX: CB59 03 5C COM 05Ch
EX: CB5E E7 E4 STB ,S
MW: 02FB 08
EX: CB60 C1 0A CMPB #00Ah
EX: CB62 2F E8 BLE Code_cb4c
EX: Code_cb4c: CB4C E6 A5 LDB B,Y
EX: CB4E 17 00 C8 LBSR Code_cc19
MW: 02FA 51
MW: 02F9 CB
EX: Code_cc19: CC19 F7 40 00 STB Code_4000
EX: CC1C B6 40 01 LDA Code_4001
IR: 0009 63 ; <<<RTC READ>>>
EX: CC1F 39 RTS
MR: 02F9 CB
MR: 02FA 51
EX: CB51 A7 82 STA ,X
MW: 06FE 63 ; <<<YEAR>>>
EX: CB53 E6 E4 LDB ,S
MR: 02FB 08
EX: CB55 5C INCB
EX: CB56 A1 A5 CMPA B,Y
EX: CB74 63 23 COM +3,Y
EX: CB59 03 5C COM 05Ch
EX: CB5E E7 E4 STB ,S
MW: 02FB 0A
EX: CB60 C1 0A CMPB #00Ah
EX: CB62 2F E8 BLE Code_cb4c
EX: Code_cb4c: CB4C E6 A5 LDB B,Y
EX: CB4E 17 00 C8 LBSR Code_cc19
MW: 02FA 51
MW: 02F9 CB
EX: Code_cc19: CC19 F7 40 00 STB Code_4000
EX: CC1C B6 40 01 LDA Code_4001
IR: 0006 04 ; <<<RTC READ>>>
EX: CC1F 39 RTS
MR: 02F9 CB
MR: 02FA 51
EX: CB51 A7 82 STA ,X
MW: 06FD 04 ; <<<DAY OF WEEK>>>
EX: CB53 E6 E4 LDB ,S
MR: 02FB 0A
EX: CB55 5C INCB
EX: CB56 A1 A5 CMPA B,Y
EX: CB76 07 23 ASR 023h
EX: CB59 03 5C COM 05Ch
EX: CB5E E7 E4 STB ,S
MW: 02FB 0C
EX: CB60 C1 0A CMPB #00Ah
EX: CB62 2F E8 BLE Code_cb4c
EX: CB64 35 04 PULS B
MR: 02FB 0C
MR: 02FC 07
EX: CB66 17 00 7E LBSR Code_cbe7
MW: 02FB 69
EX: Code_cbe7: CBE7 34 10 PSHS X
MR: 02FA CB
MW: 02F9 FD
MW: 02F8 06
EX: CBE9 C6 0A LDB #00Ah
EX: CBEB 8D 2C BSR Code_cc19
MW: 02F7 ED
MW: 02F6 CB
EX: Code_cc19: CC19 F7 40 00 STB Code_4000
EX: CC1C B6 40 01 LDA Code_4001
IR: 000A 2A ; <<<RTC READ>>>
EX: CC1F 39 RTS
MR: 02F9 CB
MR: 02FA ED
EX: CBED 84 7F ANDA #7F
EX: CBEF 81 2A CMPA #02Ah
EX: CBF1 26 12 BNE Code_cc05
EX: CBF3 C6 0B LDB #00Bh
EX: CBF5 8D 22 BSR Code_cc19
MW: 02F7 F7
MW: 02F6 CB
EX: Code_cc19: CC19 F7 40 00 STB Code_4000
EX: CC1C B6 40 01 LDA Code_4001
IR: 000B 4E ; <<<RTC READ>>>
EX: CC20 F7 81 4E STB Code_814e
EX: CBF9 26 0A BNE Code_cc05
EX: CBFB C6 0D LDB #00Dh
EX: CBFD 8D 1A BSR Code_cc19
MW: 02F7 FF
MW: 02F6 CB
EX: Code_cc19: CC19 F7 40 00 STB Code_4000
EX: CC1C B6 40 01 LDA Code_4001
IR: 000D 80 ; <<<RTC READ>>>
EX: CC20 F7 84 80 STB Code_8480
EX: CC01 27 02 BEQ Code_cc05
EX: CC03 35 90 PULS PC,X
MR: 02F8 06
MR: 02F9 FD
MR: 02FA CB
MR: 02FB 69
EX: CB69 35 B0 PULS PC,Y,X
MR: 02FC 07
... this is totally typical of what we usually find - an interrupt routine that fetches the clock registers, stores them in RAM, and then does a few extra calculations at the end. This was, incidentally, from a system with a 146818 RTC chip, the same type as those found in older PCs. There is no scope in this code capture for any extra date-windowing code to upset the stability of the clock data being read from the RTC, very much like hundreds of other code samples we have seen from embedded systems.
I can provide plenty of other examples taken from live sites, as we do the code validation as a service to most existing Delta-T Probe users... all of which so far cannot by definition be subject to the C-E effect. Interestingly, the above bit of code does not check for the UIP bit... it is interrupt-driven so assumes (rightly I am glad to say) that the data is stable as it will always read it within 0.99 seconds of the interrupt being generated, before the data goes unstable.
I hope this example supports my previous email, and gives you a better idea of what the Delta-T Probe is capable of. It clearly has an application for those people who need to confirm that the RTC usage in their embedded systems is not vulnerable to the C-E effect.
Further commentary about Crouch-Echlin Effect (Time Dilation) by Dave
Eastabrook can be found on
www.elmbronze.co.uk/tdtools/special.htm
|