Reproduction of InternVideo3 (arXiv:2606.12195): a transformers-native proof of concept (PoC) for text, image, and video inference with the 8B model.