If you're wondering what I tried yesterday, here it is: I turned one of the two crazy servers into a unicast client, hoping that letting it decide when to poll the stratum 1 servers would help it adjust better. Although a bit "irregular", it seemed to work better for a while. Unfortunately, when I came back to the office today and looked at the graphs, I saw that it had started to oscillate more and more during the night, until it had a reset today. The other one is still misbehaving, of course.
One more day has passed, and I'm still looking for a solution… So today I spent even more time researching whether this had happened to other people and how they solved it. I must say the results were pretty discouraging: I found 99% crappy and 1% irrelevant information.
I then looked for differences in the configuration of these machines and found none. I dug deeper into the documents on the internet that explain how to do a "manual calibration" of the clock speed (1 and 2), then into the hwclock man pages, until I found a new difference between the systems:
tail -n 1 /etc/adjtime
said UTC on the "good" servers and LOCAL in the second group. So 1) this is a difference and 2) I remember that one of the million pages I read today said the clock should be set to UTC. Hmmmmm…
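For comparing several machines, the check above can be wrapped in a small helper. This is only a sketch: the function name adjtime_mode is made up for illustration, and it assumes the usual /etc/adjtime layout, where the third line holds either UTC or LOCAL. It reads line 3 explicitly instead of taking the last line, so a stray trailing line does not confuse it.

```shell
#!/bin/sh
# adjtime_mode is a hypothetical helper, not part of hwclock or ntp.
# It prints the clock mode recorded in an adjtime file:
# UTC, LOCAL, "unknown" (unexpected content) or "missing" (no readable file).
adjtime_mode() {
    file="${1:-/etc/adjtime}"          # default path used by hwclock

    if [ ! -r "$file" ]; then
        echo "missing"
        return 1
    fi

    # Assumption: line 3 of the file is the UTC/LOCAL marker.
    mode="$(sed -n '3p' "$file")"

    case "$mode" in
        UTC|LOCAL) echo "$mode" ;;
        *)         echo "unknown"; return 1 ;;
    esac
}
```

Running adjtime_mode with no argument on each server should give the same UTC/LOCAL answer as the tail command, as long as the file has the expected three lines.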
Guess what I did now? 🙂
date ; /etc/init.d/ntp stop && rm /etc/adjtime && ntpdate timeserver && hwclock --utc --systohc && /etc/init.d/ntp start ; date
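Here is the same one-liner split into commented steps, as a rough sketch. The run wrapper, the DRY_RUN switch and the resync_clock function name are additions for illustration only; "timeserver" is the same placeholder as in the command above. By default the script only prints what it would do, since the real commands need root and touch the hardware clock.

```shell
#!/bin/sh
# Sketch of the one-liner, step by step.
# DRY_RUN, run() and resync_clock() are hypothetical helpers.
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

resync_clock() {
    server="${1:-timeserver}"    # placeholder hostname from the post

    run date                                    # time before the fix
    # Stop ntpd first so ntpdate can step the clock without competing with it.
    run /etc/init.d/ntp stop    || return 1
    # Drop the old drift data and the LOCAL marker.
    run rm /etc/adjtime         || return 1
    # One-shot sync of the system clock against the server.
    run ntpdate "$server"       || return 1
    # Copy system time to the hardware clock, recording it as UTC.
    run hwclock --utc --systohc || return 1
    run /etc/init.d/ntp start
    run date                                    # time after the fix
}
```

Calling resync_clock prints the seven commands in order; setting DRY_RUN=0 (as root) would execute them. As in the original chain, a failing step stops the sequence before ntp is restarted, so it is worth checking that the daemon is running afterwards.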
I'm not sure this will help, though. Stay tuned for part 3 😉