
Gulang Intelligence, Surging Toward the Future | The Second Gulang Artificial Intelligence Forum (Spring) Successfully Held

Published: March 15, 2023

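To further promote innovative research in computer vision in China, the Second Gulang Artificial Intelligence Forum (Spring) was held on Gulangyu Island, Xiamen, from March 10 to 12, 2023. The forum was jointly organized by 77779193永利官网 and the Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education. Eight outstanding young experts in computer vision from Tsinghua University, Beihang University, Shanghai Jiao Tong University, Zhejiang University, Nanjing University, Microsoft Research Asia, and 77779193永利官网 shared their teams' in-depth thinking and latest results on applying artificial intelligence in different fields. Together they discussed frontier technologies in artificial intelligence and exchanged views on the field's future directions. More than 150 faculty members and students from the Key Laboratory of Multimedia Trusted Perception and Efficient Computing took part in the forum. The laboratory's research groups reported on their research directions and results, outstanding papers from the laboratory were presented and explained in a poster session, and outstanding master of engineering projects were exhibited and judged.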

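Professor Ji Rongrong, the convener of the forum, delivered the opening address. Professor Ji sincerely thanked the attending scholars and shared his expectations for the forum. He noted that the forum aims to provide a platform for exchange at the frontier of artificial intelligence, and that the invited guests were all nominated by students as experts who have made outstanding contributions to the field. He hoped the forum would bring about a collision of ideas and new academic sparks for the development of artificial intelligence.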

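Professor Liu Si of Beihang University gave a talk titled "Autonomous Driving and Vehicle-Road Cooperative Perception," which presented the Object-Aware Distillation Pyramid (OADP) framework. Professor Liu explained that an open-vocabulary object detector takes arbitrary text queries as input and can detect the objects they describe. Previous methods use knowledge distillation to extract knowledge from pretrained vision-language models (PVLMs) and transfer it to the detector. However, because of non-adaptive proposal cropping and single-level feature mimicking, the pretrained model's knowledge suffers information loss and noise during extraction, which makes the transfer inefficient. To address these problems, the OADP framework combines an Object-Aware Knowledge Extraction (OAKE) module with a Distillation Pyramid (DP) mechanism. When extracting object knowledge from the PVLM, the former adaptively transforms object proposals and uses object-aware mask attention to keep the object knowledge precise and complete. The latter introduces global and block distillation for more comprehensive knowledge transfer, compensating for the relational information missing from object distillation. Extensive experiments show that the method significantly improves on existing approaches. In particular, on the MS-COCO dataset, OADP achieves 35.6 mAPn, surpassing the previous state of the art by 3.3 mAPn.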
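As a rough illustration of distilling at several levels at once (global, block, and object), the sketch below sums a per-level distance between student features and teacher embeddings. It is a toy example under stated assumptions, not the OADP implementation: the function names, the level names, the equal default weights, and the choice of an L1 distance are all hypothetical.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pyramid_distillation_loss(student, teacher, weights=None):
    """Weighted sum of per-level distillation terms.

    `student` and `teacher` map a level name (for example "global",
    "block", "object") to a list of embedding vectors. The level names
    and the L1 distance are illustrative assumptions.
    """
    if weights is None:
        weights = {level: 1.0 for level in teacher}
    total = 0.0
    for level, teacher_vecs in teacher.items():
        student_vecs = student[level]
        if len(student_vecs) != len(teacher_vecs):
            raise ValueError(f"level {level!r}: mismatched number of vectors")
        if not teacher_vecs:
            continue  # nothing to distill at this level
        # Average the distance over all regions at this level.
        level_loss = sum(
            l1_distance(s, t) for s, t in zip(student_vecs, teacher_vecs)
        ) / len(teacher_vecs)
        total += weights[level] * level_loss
    return total


# Toy data: one global embedding, two block embeddings, one object embedding.
teacher = {
    "global": [[1.0, 0.0]],
    "block": [[0.0, 1.0], [1.0, 1.0]],
    "object": [[0.5, 0.5]],
}
student = {
    "global": [[0.0, 0.0]],
    "block": [[0.0, 1.0], [1.0, 0.0]],
    "object": [[0.5, 0.5]],
}
print(pyramid_distillation_loss(student, teacher))  # 0.75
```

In this toy data the object-level term is already zero, so the whole loss comes from the global and block terms. That mirrors the point made in the talk: object distillation alone can leave information untransferred.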

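Professor Liu Yebin of Tsinghua University gave a talk titled "Digital Human Generation Based on Dynamic Neural Radiance Fields." Professor Liu explained that the neural radiance field (NeRF), a neural rendering technique based on implicit representation, has attracted wide attention in computer vision and graphics because it is differentiable, end-to-end, and able to generate high-quality views. A neural radiance field for a dynamic scene, however, must account for both viewpoint and temporal consistency; the data is extremely high-dimensional, and no efficient scene representation exists so far. Focusing on real-world dynamic human scenes and the key technologies of digital human reconstruction and generation, the talk covered the team's work on efficient dynamic representation for dynamic neural radiance fields, fusion of priors with multiple representations, real-time dynamic reconstruction and rendering under lightweight capture, and high-fidelity driving and real-time interactive editing of digital avatars. It spanned frontier vision and graphics topics in generating body, face, and full-body digital avatars, and discussed hot topics such as immersive holographic communication and AI digital humans.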
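For readers unfamiliar with NeRF, the standard volume rendering integral from the original static formulation is given below as background; it is not taken from the talk. The color of a camera ray $\mathbf{r}(s) = \mathbf{o} + s\mathbf{d}$ is accumulated from the density $\sigma$ and the view-dependent color $\mathbf{c}$ predicted by the network. A dynamic variant would additionally condition $\sigma$ and $\mathbf{c}$ on time, which is stated here only as an assumption about the general approach.

```latex
C(\mathbf{r}) = \int_{s_n}^{s_f} T(s)\,\sigma\big(\mathbf{r}(s)\big)\,\mathbf{c}\big(\mathbf{r}(s), \mathbf{d}\big)\,\mathrm{d}s,
\qquad
T(s) = \exp\!\left(-\int_{s_n}^{s} \sigma\big(\mathbf{r}(u)\big)\,\mathrm{d}u\right)
```

Here $s_n$ and $s_f$ are the near and far bounds along the ray, and $T(s)$ is the accumulated transmittance, that is, the probability that the ray travels from $s_n$ to $s$ without being blocked.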

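Associate Professor Chen Xie of Shanghai Jiao Tong University gave a talk titled "Speech Translation and Speech-Driven Digital Human Systems," sharing his team's recent explorations in both areas. For speech translation, the team used public Chinese data to train a reasonably good system that performs Chinese speech recognition and machine translation in real time. For speech-driven digital humans, by optimizing speech feature extraction, loss function design, and data augmentation, the team can build a well-performing speech-driven digital human prototype from about 10 carefully annotated utterances.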

Professor Xu Weiwei of Zhejiang University gave a talk titled "Scalable Neural Indoor Scene Rendering," presenting a scalable neural scene reconstruction and rendering method that supports distributed training and interactive rendering of large indoor scenes. The abstract reads: We propose a scalable neural scene reconstruction and rendering method to support distributed training and interactive rendering of large indoor scenes. Our representation is based on tiles. Tile appearances are trained in parallel through a background sampling strategy that augments each tile with distant scene information via a proxy global mesh. Each tile has two low-capacity MLPs: one for view-independent appearance (diffuse color and shading) and one for view-dependent appearance (specular highlights, reflections). We leverage the phenomenon that complex view-dependent scene reflections can be attributed to virtual lights underneath surfaces at the total ray distance to the source. This lets us handle sparse samplings of the input scene where reflection highlights do not always appear consistently in input images. We show interactive free-viewpoint rendering results from five scenes, one of which covers an area of more than 100 m². Experimental results show that our method produces higher-quality renderings than a single large-capacity MLP and five recent neural proxy-geometry and voxel-based baseline methods. Our code and data are available on the project webpage: https://xchaowu.github.io/papers/scalable-nisr.

Associate Researcher Guo Jie of Nanjing University gave a talk titled "NeRF-Based Outdoor Scene Relighting," presenting NeuLighting, a new framework for free-viewpoint outdoor scene relighting from sparse, unconstrained in-the-wild photo collections. The abstract reads: We propose NeuLighting, a new framework for free viewpoint outdoor scene relighting from a sparse set of unconstrained in-the-wild photo collections. Our framework represents all the scene components as continuous functions parameterized by MLPs that take a 3D location and the lighting condition as input and output reflectance and necessary outdoor illumination properties. The key to our method includes a neural lighting representation that compresses the per-image illumination into a disentangled latent vector, and a new free viewpoint relighting scheme that is robust to arbitrary lighting variations across images. The lighting representation is compressive to explain a wide range of illumination and can be easily fed into the query-based NeuLighting framework, enabling efficient shading effect evaluation under any kind of novel illumination. Furthermore, to produce high-quality cast shadows, we estimate the sun visibility map to indicate the shadow regions according to the scene geometry and the sun direction. Thanks to the flexible and explainable neural lighting representation, our system supports outdoor relighting with many different illumination sources, including natural images, environment maps, and time-lapse videos.

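Duan Nan, Principal Researcher at Microsoft Research Asia, gave a talk titled "Generative AI: Progress and Challenges." Duan noted that generative AI, represented by ChatGPT, has let society at large feel the disruptive power of technological innovation and glimpse the dawn of artificial general intelligence. Against this background, the talk briefly introduced the latest progress in generative AI research, including emergent abilities of large models, in-context learning, chain-of-thought, instruction tuning, and representative work applying such models to text, code, dialogue, and visual generation tasks. Existing techniques, Duan added, still face a series of problems and challenges.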

Professor Qian Yanmin of Shanghai Jiao Tong University gave a talk titled "Multi-Modal Robust Speech Processing, Analysis and Recognition in Reality," summarizing recent progress and introducing his team's work on multi-modal speech processing in complex real-world scenarios. The abstract reads: Although intelligent speech processing has been greatly advanced in research and widely used in many real-life applications, there still remains a large performance gap between controlled environments and real-life scenarios. Multi-modality research is one of the important strategies to boost the performance of speech processing systems in reality, and has been a hot topic in both academia and industry. In this talk, we will summarize recent progress and present our efforts on multi-modal speech processing in complex real scenarios, especially on the new techniques developed at SJTU for multi-modal speaker identification, speech separation and enhancement, speech recognition, scene analysis, and pretrained models with self-supervised training.

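Professor Zhao Zhou of Zhejiang University gave a talk titled "Cross-Modal Generative Models," introducing his team's recent work on cross-modal sequence generation. Professor Zhao noted that, with the rapid development of information technology, multimodal data has become the main form of data in intelligent human-computer interaction scenarios in recent years, and that cross-modal sequence generation is one of the key technologies for intelligent human-computer interaction. The talk covered the team's recent work on speech synthesis (NATSpeech), singing voice synthesis (DiffSinger), audio synthesis (Make-An-Audio), and talking-face video synthesis (GeneFace). This work has been published at NeurIPS 2019/2021/2022 and ICLR 2021/2023, and the code is open-sourced on GitHub.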