Expressions like
(uin16_t)a * (uint16_t)b * (uint16_t)c
might be promoted to (signed) int (in that example, on platforms where sizeof(int) > sizeof(uint16_t)),
and therefore lead to undefined behaviour on overflow.
The above expression can be fixed as
1u * (uint16_t)a * (uint16_t)b * (uint16_t)c
(The 1u makes sure a, b, and c would be promoted to unsigned int (instead of int) on platforms where sizeof(int) > sizeof(uint16_t))
cf. https://stackoverflow.com/questions/27001604/32-bit-unsigned-multiply-on-64-bit-causing-undefined-behavior
Still not sure whether it's better to calculate
rref[pivot_row][col] * (rref[row][pivot_col] / pivot_val) (option 1)
vs.
rref[pivot_row][col] * rref[row][pivot_col] / pivot_val (option 2)
(i.e. a * b / c vs. "a * (b/c))
in terms of floating point error.
But I think option 1 (current commit) is better, since the scale factor
(rref[row][pivot_col] / pivot_val) is always <= 1 here (I think).