(tr)uSDX bootloader corruption – a smoking gun

A common topic of discussion on the (tr)uSDX (trusdx) forum is problems loading firmware (application or bootloader).

A user's post provides evidence for discussing the problem in this case: its probable cause, a solution, and a better design that would prevent it.

Analysis

The programmer type (Arduino) and connection (COM port) are the settings one would use to talk to a bootloader already installed on the MCU in order to write the application program to flash. They are not the settings one would use to install the bootloader; they are not suitable for talking to the ISP facility built into the silicon.

The flash file specified is the bootloader file. This is an attempt to overwrite the bootloader using the bootloader.

The problem here is that writing the bootloader image whilst it is being executed is likely to corrupt the in-flash image of the bootloader. Special techniques are required to update a bootloader using the bootloader, and it is unlikely that this bootloader has that capability.
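The overlap that causes the trouble can be detected on the host before anything is sent to the radio. The following is a minimal Python sketch, not part of any (tr)uSDX tool: it reads the data records of an Intel HEX file and reports whether any byte falls inside the boot section. The function names are hypothetical, and the 512-byte boot section (byte addresses 0x7E00 to 0x7FFF on the 32 KiB ATmega328P) is an assumption; the real boundary depends on the BOOTSZ fuse setting of the particular chip. Record checksums are not verified here.

```python
FLASH_BYTES = 32 * 1024  # ATmega328P flash size in bytes


def hex_data_ranges(hex_text):
    """Yield (start, end_exclusive) byte ranges for each Intel HEX data record."""
    base = 0
    for line in hex_text.splitlines():
        line = line.strip()
        if not line.startswith(":"):
            continue
        raw = bytes.fromhex(line[1:])
        count, addr, rtype = raw[0], (raw[1] << 8) | raw[2], raw[3]
        data = raw[4:4 + count]
        if rtype == 0x00 and count:      # data record
            yield base + addr, base + addr + count
        elif rtype == 0x01:              # end of file
            break
        elif rtype == 0x02:              # extended segment address
            base = int.from_bytes(data, "big") << 4
        elif rtype == 0x04:              # extended linear address
            base = int.from_bytes(data, "big") << 16


def touches_boot_section(hex_text, boot_bytes=512):
    """True if any data byte lands in the top boot_bytes of flash.

    boot_bytes=512 is an ASSUMPTION; check the BOOTSZ fuses of the target.
    """
    boot_start = FLASH_BYTES - boot_bytes
    return any(end > boot_start and start < FLASH_BYTES
               for start, end in hex_data_ranges(hex_text))


# Two tiny illustrative images (made-up content, valid record format).
APP_HEX = ":020000000C945E\n:00000001FF\n"   # two bytes at 0x0000
BOOT_HEX = ":027E000011244B\n:00000001FF\n"  # two bytes at 0x7E00

if __name__ == "__main__":
    print("application image touches boot section:", touches_boot_section(APP_HEX))
    print("bootloader image touches boot section:", touches_boot_section(BOOT_HEX))
```

A check like this in the upload workflow would flag a bootloader file selected by mistake before the resident bootloader is asked to write over itself.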

So, you might ask, why does a bootloader attempt to write to flash addresses that it knows will overwrite, and probably corrupt, itself?

I have written bootloaders that check that they are not attempting to do such a thing, but a better measure on the ATmega328P is to use the bootloader protection scheme by programming appropriate LOCK bits.

Robust designs use the LOCK bits to protect the bootloader.

The (tr)uSDX recommended procedure at https://dl2man.de/3a-trusdx-bootloader/ does not program the LOCK bits (for whatever reason), and it leaves the bootloader vulnerable to corruption by:

  • rogue code running in application flash, and
  • the bootloader itself being used to load an image that will overlay the bootloader.

Because this is proprietary code (i.e. secret), one cannot examine the source code to find whether there was a functional reason to leave the LOCK bits unprogrammed. They may have been left unprogrammed to facilitate some half-baked serial number embedding scheme, for instance. My first guess at the serial number scheme is that it uses the extended signature bytes 'burned' into the chip, but experiments show that the bootloader section gets modified after loading.

One could experiment by programming the LOCK bits to 0xCF, using the same ISP programmer, immediately after programming the bootloader. Note that this may cause problems if the normal application code or the bootloader itself tries to update part of the bootloader section of flash.
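To see what a LOCK byte such as 0xCF asks of the chip, it helps to decode it into its fields. The sketch below is a hedged illustration based on my reading of the Boot Lock Bit tables in the ATmega328P datasheet (a programmed bit reads as 0); the function and dictionary names are hypothetical, and the decoding should be checked against the datasheet before relying on it.

```python
# (BLB12, BLB11) -> (BLB1 mode, SPM writes to the boot section blocked?)
# Programmed bits read as 0. Transcribed from the datasheet as understood
# by the writer of this sketch; verify before use.
BLB1_MODES = {
    (1, 1): (1, False),  # no restrictions
    (1, 0): (2, True),   # SPM may not write to the boot section
    (0, 0): (3, True),   # as mode 2, plus LPM from application cannot read it
    (0, 1): (4, False),  # only the LPM read restriction
}


def decode_lock(lock):
    """Split an ATmega328P LOCK byte into its fields."""
    blb12 = (lock >> 5) & 1
    blb11 = (lock >> 4) & 1
    mode, spm_blocked = BLB1_MODES[(blb12, blb11)]
    return {
        "BLB1_mode": mode,
        "boot_write_protected": spm_blocked,
        "BLB0_bits": (lock >> 2) & 0b11,  # application section protection
        "LB_bits": lock & 0b11,           # external programming lock
    }


if __name__ == "__main__":
    for value in (0xFF, 0xCF):
        print(hex(value), decode_lock(value))
```

Decoded this way, 0xCF programs only BLB12 and BLB11, leaving the application section and external programming unrestricted. Once programmed, LOCK bits return to the unprogrammed state only through a chip erase, so the experiment is reversible with the ISP programmer but not from code running on the chip.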

Another example

By now, the reader can analyse this. It is exactly the same problem as the previous user's screenshot… but more revealing.

The first part of the log shows an attempt to read and write FUSES and LOCK bits. The operation appears successful, but be aware that the ATmega328P does not provide the capability for a program running in flash to write FUSES and LOCK bits. It is likely that the bootloader cannot read or write FUSE and LOCK bits, but returns SUCCESS even though it has not worked. The log shows that it was unable to write the LFUSE and it stopped on that failure.

Above is a screenshot after trying to read FUSES and LOCK bits using the Arduino type programmer via Optiboot bootloader. All results are wrong.

Above is the correct result obtained with an ISP programmer.

If you want to read or write FUSES or LOCK bits, use an ISP programmer. Take note of the clock rate issue raised at ISP programming of the (tr)uSDX.

Summary

The screenshot might well describe a method by which end users have corrupted the bootloader, an easy enough mistake to make, even for experienced users.

In my experience, a properly installed ATmega328P bootloader (one that passes read back verification AND is protected by the LOCK bits) never needs replacement (unless to upgrade it for enhancement or bug fix).

The fact that experts are advising users of the (tr)uSDX to reprogram the bootloader hints at known bootloader corruption.

Continued at (tr)uSDX bootloader corruption – a smoking gun – an experiment.