ResearchHub | Open Science Community

MSR-VTT: A Large Video Description Dataset for Bridging Video and Language

Jun Xu et al., Jun 1, 2016

While there has been increasing interest in the task of describing video with natural language, current computer vision algorithms are still severely limited in terms of the variability and complexity of the videos and associated language that they can recognize. This is in part due to the simplicity of current benchmarks, which mostly focus on specific fine-grained domains with limited videos and simple descriptions. While researchers have provided several benchmark datasets for image captioning, we are not aware of any large-scale video description dataset with comprehensive categories and diverse video content. In this paper we present MSR-VTT (standing for "MSR-Video to Text"), a new large-scale video benchmark for video understanding, especially the emerging task of translating video to text. The dataset is built by collecting 257 popular queries from a commercial video search engine, with 118 videos for each query. In its current version, MSR-VTT provides 10K web video clips totaling 41.2 hours and 200K clip-sentence pairs, covering the most comprehensive categories and diverse visual content, and representing the largest dataset in terms of sentences and vocabulary. Each clip is annotated with about 20 natural sentences by 1,327 AMT workers. We present a detailed analysis of MSR-VTT in comparison to a complete set of existing datasets, together with a summary of different state-of-the-art video-to-text approaches. We also provide an extensive evaluation of these approaches on this dataset, showing that the hybrid Recurrent Neural Network-based approach, which combines single-frame and motion representations with a soft-attention pooling strategy, yields the best generalization capability on MSR-VTT.

Philosophy

Artificial Intelligence

AdaRank

Jun Xu et al., Jul 23, 2007

In this paper we address the issue of learning to rank for document retrieval. In the task, a model is automatically created with some training data and then is utilized for ranking of documents. The goodness of a model is usually evaluated with performance measures such as MAP (Mean Average Precision) and NDCG (Normalized Discounted Cumulative Gain). Ideally a learning algorithm would train a ranking model that could directly optimize the performance measures with respect to the training data. Existing methods, however, are only able to train ranking models by minimizing loss functions loosely related to the performance measures. For example, Ranking SVM and RankBoost train ranking models by minimizing classification errors on instance pairs. To deal with the problem, we propose a novel learning algorithm within the framework of boosting, which can minimize a loss function directly defined on the performance measures. Our algorithm, referred to as AdaRank, repeatedly constructs 'weak rankers' on the basis of reweighted training data and finally linearly combines the weak rankers for making ranking predictions. We prove that the training process of AdaRank is exactly that of enhancing the performance measure used. Experimental results on four benchmark datasets show that AdaRank significantly outperforms the baseline methods of BM25, Ranking SVM, and RankBoost.
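The boosting loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that each weak ranker is a single feature column (higher value means more relevant), that the performance measure is average precision on binary labels, and that query weights are updated with an exponential of the combined model's performance. The function names `average_precision`, `score_docs`, and `adarank` are hypothetical.

```python
import math

def average_precision(scores, labels):
    """Average precision of the ranking obtained by sorting on descending score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_relevant = sum(labels)
    if total_relevant == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant

def score_docs(model, docs):
    """Linear combination of weak rankers (one coefficient per feature)."""
    return [sum(w * x for w, x in zip(model, doc)) for doc in docs]

def adarank(queries, n_rounds=10):
    """queries: list of (docs, labels); docs is a list of feature lists."""
    m = len(queries)
    n_features = len(queries[0][0][0])
    weights = [1.0 / m] * m          # one weight per training query
    model = [0.0] * n_features       # accumulated weak-ranker coefficients
    for _ in range(n_rounds):
        # Pick the single feature with the best weighted performance.
        best_k, best_perf, best_total = None, None, -1.0
        for k in range(n_features):
            perf = [average_precision([d[k] for d in docs], labels)
                    for docs, labels in queries]
            total = sum(w * e for w, e in zip(weights, perf))
            if total > best_total:
                best_k, best_perf, best_total = k, perf, total
        num = sum(w * (1 + e) for w, e in zip(weights, best_perf))
        den = sum(w * (1 - e) for w, e in zip(weights, best_perf))
        if den <= 1e-12:             # weak ranker is already perfect
            model[best_k] += 1.0
            break
        model[best_k] += 0.5 * math.log(num / den)
        # Reweight queries: those the combined model ranks poorly gain weight.
        combined = [average_precision(score_docs(model, docs), labels)
                    for docs, labels in queries]
        unnorm = [math.exp(-e) for e in combined]
        z = sum(unnorm)
        weights = [u / z for u in unnorm]
    return model
```

Because the reweighting depends only on the chosen measure, swapping `average_precision` for another bounded measure such as NDCG would leave the rest of the loop unchanged.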

Geology

Artificial Intelligence

A model of (often mixed) stereotype content: Competence and warmth respectively follow from perceived status and competition.

Susan Fiske et al., Jan 1, 2002

Ecology

Gender Studies

Adapting ranking SVM to document retrieval

Yunbo Cao et al., Aug 6, 2006

The paper is concerned with applying learning to rank to document retrieval. Ranking SVM is a typical method of learning to rank. We point out that there are two factors one must consider when applying Ranking SVM, or more generally any "learning to rank" method, to document retrieval. First, correctly ranking documents at the top of the result list is crucial for an Information Retrieval system. One must conduct training in a way that makes these top-ranked results accurate. Second, the number of relevant documents can vary from query to query. One must avoid training a model biased toward queries with a large number of relevant documents. Previously, when existing methods, including Ranking SVM, were applied to document retrieval, neither of the two factors was taken into consideration. We show that it is possible to modify conventional Ranking SVM so that it can be better used for document retrieval. Specifically, we modify the "Hinge Loss" function in Ranking SVM to deal with the problems described above. We employ two methods to optimize the loss function: gradient descent and quadratic programming. Experimental results show that our method, referred to as Ranking SVM for IR, can outperform the conventional Ranking SVM and other existing methods for document retrieval on two datasets.
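One way to read the modified hinge loss is as a pairwise hinge loss in which each document pair carries two multipliers, one for the rank levels involved and one for the query it came from. The sketch below illustrates that reading only. The weight dictionaries, the L2 regulariser, and the function names are assumptions, not the paper's exact formulation.

```python
def weighted_pairwise_hinge(w, pairs, rank_weight, query_weight, reg=0.01):
    """pairs: list of (x_preferred, x_other, pair_type, query_id)."""
    loss = reg * sum(v * v for v in w)
    for x_pos, x_neg, pair_type, qid in pairs:
        margin = sum(wi * (a - b) for wi, a, b in zip(w, x_pos, x_neg))
        # Pairs touching top ranks, or from queries with few pairs,
        # can be given larger multipliers (assumed weighting scheme).
        loss += rank_weight[pair_type] * query_weight[qid] * max(0.0, 1.0 - margin)
    return loss

def subgradient(w, pairs, rank_weight, query_weight, reg=0.01):
    """Subgradient of the loss above, usable for plain gradient descent."""
    grad = [2.0 * reg * v for v in w]
    for x_pos, x_neg, pair_type, qid in pairs:
        diff = [a - b for a, b in zip(x_pos, x_neg)]
        margin = sum(wi * d for wi, d in zip(w, diff))
        if margin < 1.0:  # only margin-violating pairs contribute
            scale = rank_weight[pair_type] * query_weight[qid]
            grad = [g - scale * d for g, d in zip(grad, diff)]
    return grad

def descent_step(w, pairs, rank_weight, query_weight, lr=0.1, reg=0.01):
    grad = subgradient(w, pairs, rank_weight, query_weight, reg)
    return [wi - lr * g for wi, g in zip(w, grad)]
```

Setting every multiplier to 1.0 recovers the ordinary pairwise hinge loss, which makes the two adjustments easy to compare against the conventional Ranking SVM objective.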

Artificial Intelligence

Information Systems

551

LETOR: A benchmark collection for research on learning to rank for information retrieval

Tao Qin et al., Dec 31, 2009

Artificial Intelligence

Information Systems

460

Text Matching as Image Recognition

Liang Pang et al., Mar 5, 2016

Matching two texts is a fundamental problem in many natural language processing tasks. An effective approach is to extract meaningful matching patterns from words, phrases, and sentences to produce the matching score. Inspired by the success of convolutional neural networks in image recognition, where neurons can capture many complicated patterns based on extracted elementary visual patterns such as oriented edges and corners, we propose to model text matching as an image recognition problem. First, a matching matrix whose entries represent the similarities between words is constructed and viewed as an image. Then a convolutional neural network is used to capture rich matching patterns layer by layer. We show that by mirroring the compositional hierarchies of patterns in image recognition, our model can successfully identify salient signals such as n-gram and n-term matchings. Experimental results demonstrate its superiority over the baselines.
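The idea of treating word-level similarities as an image can be sketched in a few lines. This is an illustration under simplifying assumptions: similarity is an exact-match indicator rather than a learned or embedding-based score, and the single hand-written diagonal kernel stands in for filters a convolutional network would learn. The function names are hypothetical.

```python
def matching_matrix(words_a, words_b):
    """Entry (i, j) is 1.0 when word i of text A equals word j of text B."""
    return [[1.0 if a == b else 0.0 for b in words_b] for a in words_a]

def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation without padding, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A diagonal kernel fires most strongly where two consecutive words match
# in order, i.e. on a shared bigram.
DIAGONAL_KERNEL = [[1.0, 0.0],
                   [0.0, 1.0]]

text_a = ["the", "cat", "sat"]
text_b = ["a", "cat", "sat", "down"]
feature_map = conv2d_valid(matching_matrix(text_a, text_b), DIAGONAL_KERNEL)
# feature_map[1][1] == 2.0: the bigram "cat sat" appears in both texts.
```

Stacking further layers on such feature maps is what would let a network compose bigram signals into longer or reordered matching patterns.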

Artificial Intelligence

Organic Chemistry

419

Learning Hierarchical Representation Model for Next Basket Recommendation

Pengfei Wang et al., Aug 4, 2015

Next basket recommendation is a crucial task in market basket analysis. Given a user's purchase history, usually a sequence of transaction data, one attempts to build a recommender that can predict the next few items that the user is most likely to want. Ideally, a good recommender should be able to exploit sequential behavior (i.e., buying one item leads to buying another next), as well as account for users' general taste (i.e., what items a user is typically interested in). Moreover, these two factors may interact with each other to influence users' next purchase. To tackle the above problems, in this paper, we introduce a novel recommendation approach, namely the hierarchical representation model (HRM). HRM captures both sequential behavior and users' general taste by involving transaction and user representations in prediction. Meanwhile, the flexibility of applying different aggregation operations, especially nonlinear operations, on representations allows us to model complicated interactions among different factors. Theoretically, we show that our model subsumes several existing methods when choosing proper aggregation operations. Empirically, we demonstrate that our model can consistently outperform the state-of-the-art baselines under different evaluation metrics on real-world transaction data.
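The two-level aggregation described here can be sketched as follows: item vectors from the last basket are pooled into a transaction representation, which is then pooled with the user vector, and the result scores candidate items. This is an assumption-laden sketch, not the paper's model. The vectors are given rather than learned, scoring is a dot product followed by a softmax, and the names `aggregate` and `next_item_probabilities` are hypothetical.

```python
import math

def aggregate(vectors, how="max"):
    """Element-wise pooling over a list of equal-length vectors."""
    columns = list(zip(*vectors))
    if how == "max":      # nonlinear aggregation
        return [max(c) for c in columns]
    if how == "avg":      # linear aggregation
        return [sum(c) / len(c) for c in columns]
    raise ValueError(f"unknown aggregation: {how}")

def next_item_probabilities(user_vec, basket_item_vecs, candidate_vecs, how="max"):
    # Layer 1: items of the last transaction -> transaction representation.
    transaction = aggregate(basket_item_vecs, how)
    # Layer 2: user representation + transaction -> hybrid representation.
    hybrid = aggregate([user_vec, transaction], how)
    scores = [sum(h * c for h, c in zip(hybrid, cand)) for cand in candidate_vecs]
    peak = max(scores)    # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

With user vector `[0.1, 0.9]`, basket items `[[1.0, 0.0], [0.5, 0.2]]` and candidates `[[1.0, 0.0], [0.0, 1.0]]`, max pooling favours the first candidate while average pooling favours the second, which shows how the choice of aggregation changes the interaction between the two factors.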

Artificial Intelligence

Law

381

A Deep Architecture for Semantic Matching with Multiple Positional Sentence Representations

Shengxian Wan et al., Mar 5, 2016

Matching natural language sentences is central to many applications such as information retrieval and question answering. Existing deep models rely on a single sentence representation or multiple granularity representations for matching. However, such methods cannot adequately capture the contextualized local information in the matching process. To tackle this problem, we present a new deep architecture to match two sentences with multiple positional sentence representations. Specifically, each positional sentence representation is a sentence representation at a given position, generated by a bidirectional long short-term memory (Bi-LSTM). The matching score is finally produced by aggregating interactions between these different positional sentence representations, through k-Max pooling and a multi-layer perceptron. Our model has several advantages: (1) By using Bi-LSTM, rich context of the whole sentence is leveraged to capture the contextualized local information in each positional sentence representation; (2) By matching with multiple positional sentence representations, it is flexible to aggregate different important contextualized local information in a sentence to support the matching; (3) Experiments on different tasks such as question answering and sentence completion demonstrate the superiority of our model.
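The aggregation step (interactions between positional representations followed by k-Max pooling) can be illustrated with a small sketch. It assumes the positional representations are already available as plain vectors (a Bi-LSTM would produce them in the described architecture), uses cosine similarity as the interaction function, and omits the final multi-layer perceptron. All function names are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def interaction_matrix(reps_a, reps_b):
    """Similarity between every position of sentence A and every position of B."""
    return [[cosine(u, v) for v in reps_b] for u in reps_a]

def k_max_pool(matrix, k):
    """Keep the k strongest interactions, in descending order."""
    flat = [value for row in matrix for value in row]
    return sorted(flat, reverse=True)[:k]

# Stand-ins for Bi-LSTM outputs at each position (assumed values).
reps_a = [[1.0, 0.0], [0.0, 1.0]]
reps_b = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
top_signals = k_max_pool(interaction_matrix(reps_a, reps_b), k=3)
# top_signals is a fixed-length vector regardless of sentence lengths,
# so it can feed a downstream scoring network.
```

The fixed output length is the main design point: sentences of different lengths yield interaction matrices of different shapes, and k-Max pooling reduces them to a comparable feature vector.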

Artificial Intelligence

Paleontology

236

User Behavior Simulation with Large Language Model-based Agents for Recommender Systems

Lei Wang et al., Dec 20, 2024

Simulating high-quality user behavior data has always been a fundamental yet challenging problem in human-centered applications such as recommendation systems and social networks, among many others. The major difficulty of user behavior simulation originates from the intricate mechanism of human cognitive and decision processes. Recently, substantial evidence has suggested that by learning huge amounts of web knowledge, large language models (LLMs) can achieve human-like intelligence and generalization capabilities. Inspired by such capabilities, in this paper, we take an initial step to study the potential of using LLMs for user behavior simulation in the recommendation domain. To make LLMs act like humans, we equip them with profile, memory, and action modules, building LLM-based agents to simulate real users. To enable interactions between different agents and observe their behavior patterns, we design a sandbox environment, where each agent can interact with the recommendation system, and different agents can converse with their friends via one-to-one chatting or one-to-many social broadcasting. In the experiments, we first demonstrate the believability of the agent-generated behaviors based on both subjective and objective evaluations. Then, to show the potential applications of our method, we simulate and study two social phenomena: (1) information cocoons and (2) user conformity behaviors. We find that controlling the personalization degree of recommendation algorithms and improving the heterogeneity of user social relations can be two effective strategies for alleviating the problem of information cocoons, and that conformity behaviors can be highly influenced by the number of user social relations. To advance this direction, we have released our project at https://github.com/RUC-GSAI/YuLan-Rec.

Artificial Intelligence

Information Systems

1

Adapting Constrained Markov Decision Process for OCPC Bidding with Delayed Conversions

Leping Zhang et al., Nov 29, 2024

Optimized cost-per-click (OCPC) is now widely adopted in online advertising. In OCPC, the advertiser sets an expected cost-per-conversion and pays per click, while the platform automatically adjusts the bid on each click to meet the advertiser's constraint. Existing bidding methods are based on feedback control, adjusting bids to keep the current cost-per-conversion close to the expected cost-per-conversion to avoid compensation. However, they overlook the conversion lag phenomenon: there is always a time interval between an ad's click time and its conversion time. This interval makes existing methods overestimate the cost-per-conversion and results in overly conservative bidding policies, which ultimately hurt revenue. To address the issue, this paper proposes a novel bidding method, Bid-DC (Bidding with Delayed Conversions), which predicts the conversion probability of the clicked ads and uses it to adjust the cost-per-conversion values. To ensure the bidding model can satisfy the advertiser's constraint, a constrained Markov decision process (CMDP) is adapted to automatically learn the optimal parameters from the log data. Both online and offline experiments demonstrate that Bid-DC outperforms the state-of-the-art baselines in terms of improving revenue. Empirical analysis also shows that Bid-DC can accurately estimate the cost-per-conversion and make more stable bids.
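The overestimation caused by conversion lag can be made concrete with a small calculation. The sketch below is only one plausible reading of "adjusting the cost-per-conversion with predicted conversion probabilities": it adds the expected conversions of clicks that have not converted yet to the observed count. The formula, the function names, and the numbers are assumptions for illustration, not the Bid-DC method itself, and the CMDP component is not shown.

```python
import math

def naive_cost_per_conversion(total_cost, observed_conversions):
    """Counts only conversions seen so far, so delayed ones inflate the estimate."""
    if observed_conversions == 0:
        return math.inf
    return total_cost / observed_conversions

def delay_adjusted_cost_per_conversion(total_cost, observed_conversions,
                                       pending_click_probs):
    """Adds the expected conversions from clicks that have not converted yet.

    pending_click_probs: predicted conversion probability for each such click
    (assumed to come from some separate prediction model).
    """
    expected = observed_conversions + sum(pending_click_probs)
    if expected <= 0:
        return math.inf
    return total_cost / expected

# Hypothetical numbers: 100 units spent, 4 conversions observed so far,
# and 3 recent clicks with predicted probabilities 0.5, 0.25, 0.25.
naive = naive_cost_per_conversion(100.0, 4)                                # 25.0
adjusted = delay_adjusted_cost_per_conversion(100.0, 4, [0.5, 0.25, 0.25])  # 20.0
```

A feedback controller comparing `naive` against the advertiser's target would lower bids more than necessary, whereas the adjusted estimate is closer to the cost that will be observed once delayed conversions arrive.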

Industrial And Manufacturing Engineering

Computer Networks And Communications
