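ECCV 2024 | A Collection of AIGC Papers (Image Generation, Video Generation, 3D Generation, etc.) with Paper Links and Open-Source Code [Continuously Updated]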

Awesome-ECCV2024-AIGC

A Collection of Papers and Codes for ECCV2024 AIGC

Below is a compilation of AIGC-related papers and code from ECCV 2024.

Stars, forks, and PRs are welcome~
Updates land on GitHub first: Awesome-ECCV2024-AIGC
Zhihu: https://zhuanlan.zhihu.com/p/706699484

Please credit the source when citing or reposting.

ECCV 2024 official website: https://eccv.ecva.net/

List of accepted ECCV papers:

Full ECCV paper repository:

Conference dates: September 29 to October 4, 2024

Paper acceptance announced: 2024

Contents

  • 1. Image Generation (Image Synthesis)
  • 2. Image Editing
  • 3. Video Generation (Video Synthesis)
  • 4. Video Editing
  • 5. 3D Generation (3D Synthesis)
  • 6. 3D Editing
  • 7. Multi-Modal Large Language Models
  • 8. Others

1. Image Generation (Image Synthesis)

Accelerating Diffusion Sampling with Optimized Time Steps

  • Paper: https://arxiv.org/abs/2402.17376
  • Code: https://github.com/scxue/DM-NonUniform

AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation

  • Paper: https://arxiv.org/abs/2406.18958
  • Code: https://github.com/open-mmlab/AnyControl

A Watermark-Conditioned Diffusion Model for IP Protection

  • Paper:
  • Code: https://github.com/rmin2000/WaDiff

BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion

  • Paper: https://arxiv.org/abs/2404.04544
  • Code: https://github.com/gwang-kim/BeyondScene

ComFusion: Personalized Subject Generation in Multiple Specific Scenes From Single Image

  • Paper: https://arxiv.org/abs/2402.11849
  • Code:

Data Augmentation for Saliency Prediction via Latent Diffusion

  • Paper:
  • Code: https://github.com/IVRL/AugSal

Defect Spectrum: A Granular Look of Large-Scale Defect Datasets with Rich Semantics

  • Paper: https://arxiv.org/abs/2310.17316
  • Code: https://github.com/EnVision-Research/Defect_Spectrum

DiffFAS: Face Anti-Spoofing via Generative Diffusion Models

  • Paper:
  • Code: https://github.com/murphytju/DiffFAS

DiffiT: Diffusion Vision Transformers for Image Generation

  • Paper: https://arxiv.org/abs/2312.02139
  • Code: https://github.com/NVlabs/DiffiT

Large-scale Reinforcement Learning for Diffusion Models

  • Paper: https://arxiv.org/abs/2401.12244
  • Code: https://github.com/pinterest/atg-research/tree/main/joint-rl-diffusion

MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation

  • Paper: https://arxiv.org/abs/2405.05806
  • Code: https://github.com/csyxwei/MasterWeaver

Memory-Efficient Fine-Tuning for Quantized Diffusion Model

  • Paper:
  • Code: https://github.com/ugonfor/TuneQDM

OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models

  • Paper: https://arxiv.org/abs/2403.10983
  • Code: https://github.com/kongzhecn/OMG

Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts

  • Paper: https://arxiv.org/abs/2403.09176
  • Code: https://github.com/byeongjun-park/Switch-DiT

2. Image Editing

A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting

  • Paper: https://arxiv.org/abs/2312.03594
  • Code: https://github.com/open-mmlab/PowerPaint

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion

  • Paper: https://arxiv.org/abs/2403.06976
  • Code: https://github.com/TencentARC/BrushNet

FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing

  • Paper:
  • Code: https://github.com/kookie12/FlexiEdit

StableDrag: Stable Dragging for Point-based Image Editing

  • Paper: https://arxiv.org/abs/2403.04437
  • Code:

TinyBeauty: Toward Tiny and High-quality Facial Makeup with Data Amplify Learning

  • Paper: https://arxiv.org/abs/2403.15033
  • Code: https://github.com/TinyBeauty/TinyBeauty

3. Video Generation (Video Synthesis)

Audio-Synchronized Visual Animation

  • Paper: https://arxiv.org/abs/2403.05659
  • Code: https://github.com/lzhangbj/ASVA

Dyadic Interaction Modeling for Social Behavior Generation

  • Paper: https://arxiv.org/abs/2403.09069
  • Code: https://github.com/Boese0601/Dyadic-Interaction-Modeling

EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis

  • Paper: https://arxiv.org/abs/2404.01647
  • Code: https://github.com/tanshuai0219/EDTalk

FreeInit: Bridging Initialization Gap in Video Diffusion Models

  • Paper: https://arxiv.org/abs/2312.07537
  • Code: https://github.com/TianxingWu/FreeInit

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

  • Paper: https://arxiv.org/abs/2405.20222
  • Code: https://github.com/MyNiuuu/MOFA-Video

ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video

  • Paper: https://arxiv.org/abs/2310.01324
  • Code:

4. Video Editing

Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

  • Paper: https://arxiv.org/abs/2403.13745
  • Code: https://github.com/G-U-N/Be-Your-Outpainter

DragAnything: Motion Control for Anything using Entity Representation

  • Paper: https://arxiv.org/abs/2403.07420
  • Code: https://github.com/showlab/DragAnything

5. 3D Generation (3D Synthesis)

EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion

  • Paper: https://arxiv.org/abs/2405.00915
  • Code: https://github.com/ymxlzgy/echoscene

GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes

  • Paper: https://arxiv.org/abs/2305.16037
  • Code: https://github.com/ibrahimethemhamamci/GenerateCT

GVGEN: Text-to-3D Generation with Volumetric Representation

  • Paper:
  • Code: https://github.com/SOTAMak1r/GVGEN

Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM

  • Paper: https://arxiv.org/abs/2403.07487
  • Code: https://github.com/steve-zeyu-zhang/MotionMamba

ParCo: Part-Coordinating Text-to-Motion Synthesis

  • Paper: https://arxiv.org/abs/2403.18512
  • Code: https://github.com/qrzou/ParCo

Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models

  • Paper: https://arxiv.org/abs/2311.17050
  • Code: https://github.com/Yzmblog/SurfD

6. 3D Editing

Gaussian Grouping: Segment and Edit Anything in 3D Scenes

  • Paper: https://arxiv.org/abs/2312.00732
  • Code: https://github.com/lkeab/gaussian-grouping

SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer

  • Paper: https://arxiv.org/abs/2404.03736
  • Code: https://github.com/JarrentWu1031/SC4D

Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing

  • Paper: https://arxiv.org/abs/2403.10050
  • Code: https://github.com/slothfulxtx/Texture-GS

7. Multi-Modal Large Language Models

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

  • Paper: https://arxiv.org/abs/2403.06764
  • Code: https://github.com/pkunlp-icler/FastV

ControlCap: Controllable Region-level Captioning

  • Paper: https://arxiv.org/abs/2401.17910
  • Code: https://github.com/callsys/ControlCap

DriveLM: Driving with Graph Visual Question Answering

  • Paper: https://arxiv.org/abs/2312.14150
  • Code: https://github.com/OpenDriveLab/DriveLM

Elysium: Exploring Object-level Perception in Videos via MLLM

  • Paper: https://arxiv.org/abs/2403.16558
  • Code: https://github.com/Hon-Wong/Elysium

Empowering Multimodal Large Language Model as a Powerful Data Generator

  • Paper:
  • Code: https://github.com/zhaohengyuan1/Genixer

GiT: Towards Generalist Vision Transformer through Universal Language Interface

  • Paper: https://arxiv.org/abs/2403.09394
  • Code: https://github.com/Haiyang-W/GiT

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

  • Paper: https://arxiv.org/abs/2311.16101
  • Code: https://github.com/UCSC-VLAA/vllm-safety-benchmark

Long-CLIP: Unlocking the Long-Text Capability of CLIP

  • Paper: https://arxiv.org/abs/2403.15378
  • Code: https://github.com/beichenzbc/Long-CLIP

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

  • Paper: https://arxiv.org/abs/2403.14624
  • Code: https://github.com/ZrrSkywalker/MathVerse

Merlin: Empowering Multimodal LLMs with Foresight Minds

  • Paper: https://arxiv.org/abs/2312.00589
  • Code: https://github.com/Ahnsun/merlin

Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs

  • Paper: https://arxiv.org/abs/2403.11755
  • Code: https://github.com/jmiemirza/Meta-Prompting

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models

  • Paper: https://arxiv.org/abs/2311.17600
  • Code: https://github.com/isXinLiu/MM-SafetyBench

PointLLM: Empowering Large Language Models to Understand Point Clouds

  • Paper: https://arxiv.org/abs/2308.16911
  • Code: https://github.com/OpenRobotLab/PointLLM

R2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations

  • Paper: https://arxiv.org/abs/2403.04924
  • Code: https://github.com/lxa9867/r2bench

SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation

  • Paper:
  • Code: https://github.com/AI-Application-and-Integration-Lab/SAM4MLLM

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

  • Paper: https://arxiv.org/abs/2311.12793
  • Code: https://github.com/ShareGPT4Omni/ShareGPT4V

ST-LLM: Large Language Models Are Effective Temporal Learners

  • Paper: https://arxiv.org/abs/2404.00308
  • Code: https://github.com/TencentARC/ST-LLM

TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias

  • Paper: https://arxiv.org/abs/2404.00384
  • Code: https://github.com/shjo-april/TTD

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

  • Paper: https://arxiv.org/abs/2311.17136
  • Code: https://github.com/TIGER-AI-Lab/UniIR

8. Others

To be updated~

Related Collections

  • Awesome-CVPR2024-AIGC
  • Awesome-AIGC-Research-Groups
  • Awesome-Low-Level-Vision-Research-Groups
  • Awesome-CVPR2024-CVPR2021-CVPR2020-Low-Level-Vision
  • Awesome-ECCV2020-Low-Level-Vision
