
爱可可 AI Frontier Picks (9.17)

Posted: 2023-11-15 01:53
LG - Machine Learning | CV - Computer Vision | CL - Computation and Language | AS - Audio and Speech | RO - Robotics

1、[IR] MURAL: Multimodal, Multitask Retrieval Across Languages

A Jain, M Guo, K Srinivasan, T Chen, S Kudugunta, C Jia, Y Yang, J Baldridge

[Google Research]

MURAL: multimodal, multitask retrieval across languages. Image-caption pairs and translation pairs both provide a means of learning deep representations of, and connections between, languages. MURAL (MUltimodal, MUltitask Representations Across Languages) uses both kinds of pairs in a dual encoder that solves two tasks: 1) image-text matching and 2) translation-pair matching. By incorporating billions of translation pairs, MURAL extends ALIGN, a state-of-the-art dual encoder learned from 1.8 billion noisy image-text pairs. With the same encoders, MURAL matches or exceeds ALIGN's cross-modal retrieval performance on well-resourced languages across several datasets, and substantially improves performance on under-resourced languages, showing that text-text learning can compensate for the scarcity of image-caption examples in those languages and helps improve the cultural specificity and diversity of retrieved examples. On the Wikipedia Image-Text dataset, for instance, MURAL-BASE improves zero-shot mean recall by 8.1% on average for eight under-resourced languages, and by 6.8% on average after fine-tuning. MURAL's text representations cluster not only along genealogical connections but also by areal linguistics, such as the Balkan Sprachbund.

Both image-caption pairs and translation pairs provide the means to learn deep representations of and connections between languages. We use both types of pairs in MURAL (MUltimodal, MUltitask Representations Across Languages), a dual encoder that solves two tasks: 1) image-text matching and 2) translation pair matching. By incorporating billions of translation pairs, MURAL extends ALIGN (Jia et al., 2021), a state-of-the-art dual encoder learned from 1.8 billion noisy image-text pairs. When using the same encoders, MURAL's performance matches or exceeds ALIGN's cross-modal retrieval performance on well-resourced languages across several datasets. More importantly, it considerably improves performance on under-resourced languages, showing that text-text learning can overcome a paucity of image-caption examples for these languages. On the Wikipedia Image-Text dataset, for example, MURAL-BASE improves zero-shot mean recall by 8.1% on average for eight under-resourced languages and by 6.8% on average when fine-tuning. We additionally show that MURAL's text representations cluster not only with respect to genealogical connections but also based on areal linguistics, such as the Balkan Sprachbund.
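
The dual-encoder objective over the two pair types can be sketched as an in-batch contrastive (InfoNCE) loss applied to both tasks. This is a minimal NumPy illustration, not MURAL's actual implementation; the 50/50 task weighting and the temperature value are assumptions:

```python
import numpy as np

def info_nce(a, b, temperature=0.07):
    """In-batch contrastive loss: row i of `a` should match row i of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                        # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # NLL of the matching diagonal

def mural_loss(img, cap, src, tgt, w=0.5):
    """Dual-encoder objective over both pair types:
    1) image-text matching and 2) translation-pair matching."""
    return w * info_nce(img, cap) + (1 - w) * info_nce(src, tgt)
```

Each batch thus pulls matching image-caption rows and matching translation rows together while pushing apart all non-matching in-batch combinations.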

https://weibo.com/1402400261/KywoF9Hp2

2、[CL] LM-Critic: Language Models for Unsupervised Grammatical Error Correction

M Yasunaga, J Leskovec, P Liang

[Stanford University]

LM-Critic: language models for unsupervised grammatical error correction. Training a grammatical error correction (GEC) model requires labeled ungrammatical/grammatical sentence pairs, but manually annotating such pairs is expensive. The Break-It-Fix-It (BIFI) framework recently showed strong results in learning to repair broken programs without any labeled examples, but it relies on a perfect critic (e.g., a compiler) that returns whether an example is valid, and no such critic exists for GEC. This paper shows how to use a pretrained language model (LM) to define an LM-Critic, which judges a sentence to be grammatical if the LM assigns it a higher probability than its local perturbations. Applying this LM-Critic and BIFI to a large set of unlabeled sentences bootstraps realistic ungrammatical/grammatical pairs for training a corrector. Evaluated on GEC datasets across multiple domains (CoNLL-2014, BEA-2019, GMEG-wiki and GMEG-yahoo), the method outperforms existing approaches in both the unsupervised setting (+7.7 F0.5) and the supervised setting (+0.5 F0.5).

Training a model for grammatical error correction (GEC) requires a set of labeled ungrammatical/grammatical sentence pairs, but manually annotating such pairs can be expensive. Recently, the Break-It-Fix-It (BIFI) framework has demonstrated strong results on learning to repair a broken program without any labeled examples, but this relies on a perfect critic (e.g., a compiler) that returns whether an example is valid or not, which does not exist for the GEC task. In this work, we show how to leverage a pretrained language model (LM) in defining an LM-Critic, which judges a sentence to be grammatical if the LM assigns it a higher probability than its local perturbations. We apply this LM-Critic and BIFI along with a large set of unlabeled sentences to bootstrap realistic ungrammatical/grammatical pairs for training a corrector. We evaluate our approach on GEC datasets across multiple domains (CoNLL-2014, BEA-2019, GMEG-wiki and GMEG-yahoo) and show that it outperforms existing methods in both the unsupervised setting (+7.7 F0.5) and the supervised setting (+0.5 F0.5).
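
The LM-Critic criterion (a sentence is judged grammatical if the LM scores it above every local perturbation) can be sketched as below. The adjacent-word-swap neighborhood and the `lm_log_prob` callable are simplified stand-ins for the paper's richer perturbation set and its pretrained LM:

```python
def local_perturbations(sentence):
    """Tiny edit neighborhood: swap each adjacent word pair. A stand-in for
    the paper's word- and character-level perturbation set."""
    words = sentence.split()
    perturbed = []
    for i in range(len(words) - 1):
        w = list(words)
        w[i], w[i + 1] = w[i + 1], w[i]
        perturbed.append(" ".join(w))
    return perturbed

def lm_critic(sentence, lm_log_prob):
    """Judge `sentence` grammatical iff the LM scores it above all local perturbations."""
    score = lm_log_prob(sentence)
    return all(score > lm_log_prob(p) for p in local_perturbations(sentence))
```

With a real LM, `lm_log_prob` would be the sentence log-probability under the model; here any scoring function with that shape works.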

https://weibo.com/1402400261/Kywt03BPR

3、[CV] Resolution-robust Large Mask Inpainting with Fourier Convolutions

R Suvorov, E Logacheva, A Mashikhin, A Remizova, A Ashukha, A Silvestrov, N Kong, H Goka, K Park, V Lempitsky

[Samsung AI Center Moscow & Samsung Research & EPFL]

Resolution-robust large-mask inpainting with Fourier convolutions. Despite significant progress, modern image inpainting systems often struggle with large missing areas, complex geometric structures, and high-resolution images. A key reason is the lack of an effective receptive field in both the inpainting network and the loss function. To address this, the paper proposes a new method, LaMa (large mask inpainting), built on: i) a new inpainting network architecture using fast Fourier convolutions, which have an image-wide receptive field; ii) a high-receptive-field perceptual loss; and iii) large training masks, which unlock the potential of the first two components. The inpainting network improves on the state of the art across a range of datasets and performs well even in challenging cases such as completing periodic structures. The model generalizes surprisingly well to resolutions higher than those seen at training time, at lower parameter and compute cost than competing baselines.

Modern image inpainting systems, despite the significant progress, often struggle with large missing areas, complex geometric structures, and high-resolution images. We find that one of the main reasons for that is the lack of an effective receptive field in both the inpainting network and the loss function. To alleviate this issue, we propose a new method called large mask inpainting (LaMa). LaMa is based on i) a new inpainting network architecture that uses fast Fourier convolutions, which have the image-wide receptive field; ii) a high receptive field perceptual loss; and iii) large training masks, which unlocks the potential of the first two components. Our inpainting network improves the state-of-the-art across a range of datasets and achieves excellent performance even in challenging scenarios, e.g. completion of periodic structures. Our model generalizes surprisingly well to resolutions that are higher than those seen at train time, and achieves this at lower parameter and compute costs than the competitive baselines. The code is available at this https URL.
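
The image-wide receptive field comes from the spectral transform inside a fast Fourier convolution: transform to the frequency domain, apply a per-frequency linear map, and transform back, so a single layer mixes information across the whole image. A minimal single-channel NumPy sketch (the real FFC also has a local convolutional branch and learned channel mixing):

```python
import numpy as np

def spectral_transform(x, w_real, w_imag):
    """Simplified core of a fast Fourier convolution's global branch:
    real FFT -> pointwise complex weighting -> inverse FFT."""
    freq = np.fft.rfft2(x)                    # (H, W//2 + 1) complex spectrum
    freq = freq * (w_real + 1j * w_imag)      # per-frequency "convolution"
    return np.fft.irfft2(freq, s=x.shape)     # back to the spatial domain
```

Because every frequency coefficient depends on every input pixel, each output pixel can depend on the entire image, which is exactly the property large-mask inpainting needs.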

https://weibo.com/1402400261/Kywwu3TaS

4、[RO] Learning to Navigate Sidewalks in Outdoor Environments

M Sorokin, J Tan, C. K Liu, S Ha

[Georgia Institute of Technology & Robotics at Google & Stanford University]

Learning to navigate sidewalks in outdoor environments. Outdoor navigation on urban sidewalks is the key technology behind important human-assistive applications such as last-mile delivery and neighborhood patrol. This paper aims to develop a quadruped robot that follows a route plan generated by public map services while staying on sidewalks and avoiding collisions with obstacles and pedestrians. A two-stage learning framework first trains a teacher agent in an abstract world with privileged ground-truth information, then uses Behavior Cloning to transfer the skills to a student agent that only has access to realistic sensors. The main research effort focuses on the challenges of deploying the student policy on a quadruped robot in the real world: the paper proposes sensing modalities, network architectures, and training procedures that enable zero-shot policy transfer to unstructured, dynamic real outdoor environments. The framework is evaluated on a quadruped robot navigating sidewalks in Atlanta, USA; using the learned navigation policy and its onboard sensors, the robot walks 3.2 km with only a limited number of human interventions.

Outdoor navigation on sidewalks in urban environments is the key technology behind important human assistive applications, such as last-mile delivery or neighborhood patrol. This paper aims to develop a quadruped robot that follows a route plan generated by public map services, while remaining on sidewalks and avoiding collisions with obstacles and pedestrians. We devise a two-stage learning framework, which first trains a teacher agent in an abstract world with privileged ground-truth information, and then applies Behavior Cloning to teach the skills to a student agent who only has access to realistic sensors. The main research effort of this paper focuses on overcoming challenges when deploying the student policy on a quadruped robot in the real world. We propose methodologies for designing sensing modalities, network architectures, and training procedures to enable zero-shot policy transfer to unstructured and dynamic real outdoor environments. We evaluate our learning framework on a quadrupedal robot navigating sidewalks in the city of Atlanta, USA. Using the learned navigation policy and its onboard sensors, the robot is able to walk 3.2 kilometers with a limited number of human interventions.
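
The second stage, distilling the privileged teacher into a sensor-only student, is plain behavior cloning: regress the student's actions onto the teacher's actions on the same states. A minimal sketch with a linear student policy trained by gradient descent on mean squared error (the paper's actual student is a neural network over realistic sensor inputs; this only illustrates the training loop):

```python
import numpy as np

def behavior_cloning_step(student_w, obs, teacher_actions, lr=0.1):
    """One gradient step of Behavior Cloning for a linear student policy:
    minimize MSE between student actions and the privileged teacher's actions."""
    pred = obs @ student_w                               # student's actions
    grad = obs.T @ (pred - teacher_actions) / len(obs)   # d(MSE/2)/d(w)
    return student_w - lr * grad
```

Iterating this step over logged (observation, teacher action) pairs drives the student toward imitating the teacher without ever needing the privileged information at deployment time.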

https://weibo.com/1402400261/KywAjCaFV

5、[CL] Summarize-then-Answer: Generating Concise Explanations for Multi-hop Reading Comprehension

N Inoue, H Trivedi, S Sinha, N Balasubramanian, K Inui

[Stony Brook University & Tohoku University]

Summarize-then-Answer: generating concise explanations for multi-hop reading comprehension. How can we generate concise explanations for multi-hop reading comprehension (RC)? Current strategies of identifying supporting sentences amount to question-focused extractive summarization of the input text, but such extractive explanations are not necessarily concise, i.e., not minimally sufficient for answering the question. This paper instead advocates an abstractive approach: generate a question-focused abstractive summary of the input paragraphs, then feed it to the RC system. Given only a limited amount of human-annotated abstractive explanations, the abstractive explainer is trained in a semi-supervised manner, starting from a supervised model and then training it further by trial and error to maximize a conciseness-promoting reward function. Experiments show that the proposed abstractive explainer generates more compact explanations than an extractive explainer under limited supervision (only 2k instances) while maintaining sufficiency.

How can we generate concise explanations for multi-hop Reading Comprehension (RC)? The current strategies of identifying supporting sentences can be seen as an extractive question-focused summarization of the input text. However, these extractive explanations are not necessarily concise, i.e., not minimally sufficient for answering a question. Instead, we advocate for an abstractive approach, where we propose to generate a question-focused, abstractive summary of input paragraphs and then feed it to an RC system. Given a limited amount of human-annotated abstractive explanations, we train the abstractive explainer in a semi-supervised manner, where we start from the supervised model and then train it further through trial and error maximizing a conciseness-promoted reward function. Our experiments demonstrate that the proposed abstractive explainer can generate more compact explanations than an extractive explainer with limited supervision (only 2k instances) while maintaining sufficiency.
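
The trial-and-error stage maximizes a conciseness-promoting reward. A hypothetical sketch of such a reward, trading off answer sufficiency against a per-token length penalty (the `qa_model` interface and the `alpha` weight are assumptions for illustration, not the paper's exact formulation):

```python
def explanation_reward(summary, question, answer, qa_model, alpha=0.05):
    """Reward a summary that still lets the RC model recover the answer
    (sufficiency), minus a penalty per token (conciseness)."""
    sufficient = 1.0 if qa_model(question, summary) == answer else 0.0
    return sufficient - alpha * len(summary.split())
```

Under this reward, a summary that drops the answer-bearing content is penalized outright, while among sufficient summaries the shortest scores highest, which is exactly the "minimally sufficient" behavior the paper targets.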

https://weibo.com/1402400261/KywDea0N6

A few more papers worth noting:

[CV] Contact-Aware Retargeting of Skinned Motion

Self-contact-aware retargeting of skinned motion

R Villegas, D Ceylan, A Hertzmann, J Yang, J Saito

[Adobe Research]

https://weibo.com/1402400261/KywGBmq7z

[CL] Flexible Generation of Natural Language Deductions

Flexible generation of natural language deductions

K Bostrom, X Zhao, S Chaudhuri, G Durrett

[The University of Texas at Austin]

https://weibo.com/1402400261/KywIMmB2i

[CL] EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation

EfficientBERT: progressively searching a multilayer perceptron via warm-up knowledge distillation

C Dong, G Wang, H Xu, J Peng, X Ren, X Liang

[Shenzhen Campus of Sun Yat-sen University & University of Oxford & Huawei Noah’s Ark Lab & DarkMatter AI Research]

https://weibo.com/1402400261/KywKVpij0

[AS] fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit

fairseq S^2: a scalable and integrable speech synthesis toolkit

C Wang, W Hsu, Y Adi, A Polyak, A Lee, P Chen, J Gu, J Pino

[Facebook AI]

https://weibo.com/1402400261/KywMWjnWD
