DreamTalk-DMT: A Lightweight Sparse Mechanism Model with Dynamic Thresholds

Jia Zhang; Lin Po Shang

Keywords:

Diffusion model, Dynamic threshold sparsification, Mutual information constrained optimization, Decoupled decoder, Cross-modal feature fusion module

Abstract

To address the limitations of the DreamTalk 2D digital human synthesis model in computational efficiency and in the fineness of generated expressions, this paper proposes an optimization method that combines adaptive sparsity with cross-modal feature enhancement. A dynamic threshold sparsity mechanism is introduced into the diffusion model: the sparsity ratio is adjusted dynamically according to a learnable threshold and an Exponential Moving Average (EMA), and a Mutual Information (MI) constraint is added to minimize information loss. This reduces the model's computational cost while retaining key features. The model architecture is also improved: a decoupled decoder splits the facial expression into upper and lower face regions that are processed independently, and a dynamic linear layer adapts its parameters to the style condition, improving the detail of the generated expressions. In addition, Tacotron speech features and Wav2Vec acoustic features are fused to strengthen the synchronization between speech and expression, and skip connections are used to make information flow through the network more efficient.
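The dynamic threshold mechanism summarized above can be illustrated with a minimal sketch. It assumes element-wise magnitude thresholding on a flat list of activations and replaces the learnable threshold with a simple feedback rule: an EMA of the observed sparsity is compared with a target ratio, and the threshold is nudged up or down accordingly. The class name, the `target_sparsity`, `ema_decay` and `step_size` parameters, and the update rule are hypothetical illustrations, not taken from the paper, and the MI constraint is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DynamicThresholdSparsifier:
    """Hypothetical sketch of EMA-guided magnitude-threshold sparsification.

    Assumption: the paper's learnable threshold is replaced by a simple
    feedback rule; in a trained model it would be updated by backpropagation.
    """
    target_sparsity: float = 0.5  # hypothetical target fraction of zeroed entries
    ema_decay: float = 0.9        # hypothetical EMA decay factor
    step_size: float = 0.1        # hypothetical threshold adjustment rate
    threshold: float = 0.0        # magnitude threshold (stand-in for the learnable one)
    ema_sparsity: float = 0.0     # exponential moving average of observed sparsity

    def __call__(self, x: List[float]) -> List[float]:
        if not x:
            return []
        # Keep entries whose magnitude exceeds the threshold; zero the rest.
        keep = [abs(v) > self.threshold for v in x]
        out = [v if k else 0.0 for v, k in zip(x, keep)]
        observed = 1.0 - sum(keep) / len(x)
        # Smooth the observed sparsity so a single batch cannot swing the threshold.
        self.ema_sparsity = (
            self.ema_decay * self.ema_sparsity + (1.0 - self.ema_decay) * observed
        )
        # Raise the threshold when too little is pruned, lower it when too much is;
        # clamp at zero so the threshold stays a valid magnitude.
        self.threshold = max(
            0.0,
            self.threshold + self.step_size * (self.target_sparsity - self.ema_sparsity),
        )
        return out


if __name__ == "__main__":
    sparsifier = DynamicThresholdSparsifier(threshold=0.25)
    print(sparsifier([0.1, -0.5, 0.3, 0.05, -0.2, 0.8]))  # [0.0, -0.5, 0.3, 0.0, 0.0, 0.8]
    print(round(sparsifier.threshold, 3))                 # 0.295
```

With `threshold=0.25`, the three entries whose magnitude is at most 0.25 are zeroed, giving an observed sparsity of 0.5. Because the EMA of sparsity (0.05 after one call from a zero start) is still below the 0.5 target, the threshold rises to about 0.295 for the next call. In a trained network the threshold would more likely be a parameter updated by gradient descent, with the EMA serving only to stabilize the sparsity estimate.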