I want to use the pre-trained MMVID with celebvhq-text, so I would like to know how long the text sequence should be for a given number of frames. Is it the same as the MMVID training config trained on MM-Vox (frames_num = 8, text sequence = 50)?
Also, in the paper the text descriptions contain all of the information ("action, face attributes, emotion, etc."), but you uploaded them separately. Could you let us know how to integrate them into one description, and which sentence belongs to which frames?
Thank you.