Timeline | DaNing's Blog

2025

JiT: Back to Basics-Let Denoising Generative Models Denoise

Prerequisites for this post: DDPM: DDPM: Denoising Diffusion Probabilistic Model. Rectified Flow: ReFlow: Flow Straight and Fast-Learning t

2025-12-18 Deep Learning

Diffusion

TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis

Prerequisites for this post: Flow Matching: Flow Matching: Flow Matching for Generative Modeling. TCSinger: TCSinger: Zero-Shot Singing Voi

2025-08-07 Deep Learning

Audio SVS

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control. Paper: TCSinger: Zero-Shot Si

2025-07-01 Deep Learning

Audio SVS

Flow Matching: Flow Matching for Generative Modeling

Prerequisites for this post: Rectified Flow: ReFlow: Flow Straight and Fast-Learning to Generate and Transfer Data with Rectified Flow. F

2025-05-29 Deep Learning

Diffusion Flow Flow Matching

MeanFlow: Mean Flows for One-step Generative Modeling

Prerequisites for this post: Flow Matching: Flow Matching for Generative Modeling. Or: Rectified Flow: ReFlow: Flow Straight and Fast: Le

2025-05-22 Deep Learning

Diffusion Flow Flow Matching Mean Flow

ReFlow: Flow Straight and Fast-Learning to Generate and Transfer Data with Rectified Flow

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow. Paper: Flow Straight and Fast: Learning

2025-05-20 Deep Learning

Diffusion Flow ReFlow

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment. Paper: AlignSTS: Speech-to-Singing Conversion via Cross-Mo

2025-05-13 Deep Learning

STS Audio

DDIM: Denoising Diffusion Implicit Models

Prerequisites for this post: DDPM: DDPM: Denoising Diffusion Probabilistic Model. Denoising Diffusion Implicit Models. Paper: Denoising Diffu

2025-03-21 Deep Learning

Diffusion DDIM

RoPE / RoFormer: Enhanced Transformer with Rotary Position Embedding

RoPE / RoFormer: Enhanced Transformer with Rotary Position Embedding. This post covers the paper RoFormer: Enhanced Transformer with Rotary Pos

2025-02-12 Deep Learning

RoPE LLM

CLAP: Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation. This post covers the paper Large-Sca

2025-02-07 Deep Learning

Audio

EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks

Prerequisites for this post: HiFi-GAN: HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis. EVA-

2025-01-17 Deep Learning

Audio SVS Vocoder TTS

Whisper: Robust Speech Recognition via Large-Scale Weak Supervision

Robust Speech Recognition via Large-Scale Weak Supervision. This post covers the paper Robust Speech Recognition via Large-Scale Weak Supervisio

2025-01-14 DaNing

Audio ASR