Extrinsic Transform:
For a point \(p = [x, y, z]^T\), the camera coordinates are \(p_c = Rp + t\), with rotation \(R\) and translation \(t\).
Camera intrinsic parameters:
If \(p_c = Rp + t =: \left[ x_c, y_c, z_c \right]^T\) then the point in the image plane is:
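As a minimal sketch of the two steps above, the following assumes an ideal pinhole model without lens distortion. The function name `project_point` and the intrinsic parameters `fx`, `fy`, `cx`, `cy` (focal lengths and principal point, in pixels) are illustrative assumptions; the text above does not specify the form of the intrinsics.

```python
import numpy as np

def project_point(p, R, t, fx, fy, cx, cy):
    """Project a 3-D point to the image plane (assumed ideal pinhole model)."""
    # Extrinsic transform: p_c = R p + t
    p_c = R @ np.asarray(p, dtype=float) + np.asarray(t, dtype=float)
    x_c, y_c, z_c = p_c
    if z_c <= 0:
        raise ValueError("point is behind the camera (z_c <= 0)")
    # Assumed intrinsics: perspective division, then scale and shift
    u = fx * x_c / z_c + cx
    v = fy * y_c / z_c + cy
    return np.array([u, v])

# Hypothetical example values: identity rotation, zero translation
R = np.eye(3)
t = np.zeros(3)
uv = project_point([1.0, 2.0, 4.0], R, t, fx=800.0, fy=800.0, cx=320.0, cy=240.0)
print(uv)  # [520. 640.]
```

The perspective division by \(z_c\) is what makes the image-plane distance between two points depend on their depth, which the two cases below rely on.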
Suppose \(p_{tf}\) and \(p_{bf}\) are the top and bottom points of the first-row object.
The distance on the \(y\) coordinate in the image plane is given by:
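As a sketch, assume an ideal pinhole model with vertical focal length \(f_y\) (a symbol not defined above), and write \(y_{c,tf}, z_{c,tf}\) and \(y_{c,bf}, z_{c,bf}\) for the camera coordinates of the two points. Any principal-point offset cancels in the difference, so the separation would be:

\[
\Delta y_{\text{img}} = f_y \left| \frac{y_{c,tf}}{z_{c,tf}} - \frac{y_{c,bf}}{z_{c,bf}} \right|
\]

If both points lie at the same depth \(z_c\) (an additional assumption, reasonable for an upright object facing the camera), this reduces to \(\Delta y_{\text{img}} = f_y \, |y_{c,tf} - y_{c,bf}| / z_c\).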
Case 1: first row
Suppose \(p_{f}\) and \(p_{b}\) are the front and back points of the last-row object.
The distance on the \(y\) coordinate in the image plane is given by:
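As a sketch, assume an ideal pinhole model with vertical focal length \(f_y\) (a symbol not defined above), and write \(y_{c,f}, z_{c,f}\) and \(y_{c,b}, z_{c,b}\) for the camera coordinates of the two points. The separation would then be:

\[
\Delta y_{\text{img}} = f_y \left| \frac{y_{c,f}}{z_{c,f}} - \frac{y_{c,b}}{z_{c,b}} \right|
\]

If "no inclination" is read as the optical axis being parallel to the plane containing both points, so that they share the same \(y_c = h\) (with \(h\) a hypothetical camera height above that plane), this simplifies to \(\Delta y_{\text{img}} = f_y \, h \, \left| 1/z_{c,f} - 1/z_{c,b} \right|\).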
Case 2: last row (no inclination)