The low channel count + large MTU combination can cause the Ethernet HW to buffer for so long that the ring buffer in the HW wraps around. With only a few channels, each large packet holds many sample periods of audio, so it takes a long time to fill. The packets then land in the buffer right after the read head rather than right before it, and as a result you get the maximum latency of the HW packet buffer rather than the minimum. We are still working on how to detect this condition, but it's probably best not to combine small channel counts with large MTUs. The large MTU is really meant for large channel counts + high sample rates.
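To make the wrap-around effect concrete, here is a toy model. It is a sketch under stated assumptions, not the unit's actual firmware logic. The function names are hypothetical. Packet header overhead is ignored, samples are assumed to be 24-bit (3 bytes), and ring positions are measured in sample periods. The first function shows how long one packet takes to fill for a given channel count. The second shows why data that arrives a little too late waits almost a full lap of the ring instead of a short time.

```python
# Toy model (hypothetical names, not the device's real implementation)
# of packet fill time and ring-buffer wrap-around latency.

def packet_fill_time_us(payload_bytes: int, channels: int,
                        bytes_per_sample: int, sample_rate_hz: int) -> float:
    """Time to accumulate one packet's worth of audio, in microseconds.

    Assumes the whole payload is audio (header overhead ignored).
    """
    frames_per_packet = payload_bytes // (channels * bytes_per_sample)
    return frames_per_packet * 1_000_000 / sample_rate_hz


def effective_latency(ring_size: int, write_offset: int, delay: int) -> int:
    """Sample periods until newly written data is read out.

    The writer aims to place data `write_offset` samples ahead of the read
    head, but by the time the packet lands the read head has advanced by
    `delay` samples. The distance is taken modulo the ring size, so once
    `delay` exceeds `write_offset` the data lands just behind the read head
    and has to wait for nearly a full lap of the ring.
    """
    return (write_offset - delay) % ring_size


if __name__ == "__main__":
    # 2 channels in a 9000-byte payload at 48 kHz: 1500 frames per packet.
    print(packet_fill_time_us(9000, 2, 3, 48000))   # 31250.0 us
    # 64 channels in the same payload: only 46 frames per packet.
    print(packet_fill_time_us(9000, 64, 3, 48000))  # under 1 ms
    # Delay smaller than the write offset: short latency.
    print(effective_latency(1024, 64, 32))          # 32
    # Delay slightly larger than the write offset: the ring wraps.
    print(effective_latency(1024, 64, 70))          # 1018
```

The ring size, write offset and delay values are illustrative only; the point is that the latency jumps from near the minimum to near the maximum as soon as the buffering delay crosses the write offset, which is why small channel counts with a large MTU are risky.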
With respect to the audio coming in early, are you doing an analog loop or are you just looping through the mixer? Also which unit are you using?