Abstract: We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large ...
Abstract: Extending large image-text pre-trained models (e.g., CLIP) for video understanding has made significant advancements. To enable the capability of CLIP to perceive dynamic information in ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results