in text and image generation. Why did others, unlike OpenAI, not think of using it for video generation? The answer traces back to another problem: the memory requirements of the full attention mechanism in the Transformer architecture grow quadratically with the length of the input sequence, so the computational cost becomes enormous when processing inputs as large as video. In plain terms, a Transformer would probably work well, but the computing resources it demands are also frightening, which is not very economical. For all the financing OpenAI has raised, it is still not that wealthy, so rather than simply throwing resources at the problem, it looked for another way to bring the compute cost down.
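To make the quadratic growth concrete, here is a rough back-of-the-envelope sketch; the token counts are illustrative assumptions rather than published figures, and real systems further multiply this by attention heads, layers, and bytes per value.

```python
# Illustrative only: how the full-attention score matrix grows with input length.

def attention_matrix_entries(num_tokens: int) -> int:
    """Full self-attention keeps an (n x n) score matrix per head."""
    return num_tokens * num_tokens

# A short text prompt might be a few hundred tokens (assumed value).
text_tokens = 512
# A raw video naively cut into patch tokens can reach hundreds of thousands
# of tokens (assumed value: many frames x many patches per frame).
video_tokens = 300_000

print(attention_matrix_entries(text_tokens))   # 262,144 entries
print(attention_matrix_entries(video_tokens))  # 90,000,000,000 entries, ~340,000x more
```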
To understand that approach, we first need the concept of a "latent" representation: a form of dimensionality reduction or compression that aims to capture the essence of the information with much less data. To give an imperfect but easy-to-understand analogy, it is as if we recorded the structure of a simple three-dimensional object with a set of flat orthographic views instead of storing the object itself. OpenAI built a video compression network for exactly this purpose: it first reduces the video to a latent space and then performs generation on that compressed video data.
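As a rough illustration of what compressing video into a latent space can look like, the sketch below uses a toy 3D-convolution encoder to map raw pixels to a much smaller latent grid. The `TinyVideoEncoder` name, its kernel sizes, and its channel counts are invented for this example; this is not OpenAI's actual compression network.

```python
import torch
import torch.nn as nn

class TinyVideoEncoder(nn.Module):
    """Toy stand-in for a video compression network: one strided 3D convolution
    that downsamples time and space, mapping raw pixels to a small latent grid."""

    def __init__(self, latent_channels: int = 4):
        super().__init__()
        # Stride (2, 8, 8): halve the frame count, shrink each frame 8x per side.
        self.encode = nn.Conv3d(3, latent_channels,
                                kernel_size=(2, 8, 8), stride=(2, 8, 8))

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video shape: (batch, channels, frames, height, width)
        return self.encode(video)

video = torch.randn(1, 3, 16, 256, 256)      # 16 RGB frames at 256x256
latents = TinyVideoEncoder()(video)
print(video.numel(), "->", latents.numel())  # 3,145,728 -> 32,768 values (~96x fewer)
```

A generator that attends over the 32,768 latent values instead of the roughly 3 million raw pixel values faces a far smaller attention cost, which is the whole point of compressing first.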
This shrinks the input and effectively relieves the computational pressure that the Transformer architecture would otherwise impose. With that, most of the problems are solved: OpenAI has successfully folded its text-to-video model into the same paradigm as its large language models, which were hugely successful before it, so it is difficult to think about