OpenLiveQ (Open Live Test for Question Retrieval) is one of the core tasks in NTCIR, in which your question retrieval systems are evaluated in the production environment of Yahoo! Chiebukuro (a community Q&A service).
OpenLiveQ (Open Live Test for Question Retrieval) provides an open live test environment in a community Q&A service of Yahoo Japan Corporation for evaluating question retrieval systems. We offer opportunities for more realistic system evaluation and help research groups address problems specific to real search systems in a production environment (e.g. ambiguous or underspecified queries and diverse relevance criteria). The task is simply defined as follows: given a query and a set of questions with answers, return a ranked list of questions.
|May - Aug, 2018||Offline test (evaluation with relevance judgment data) *|
|Aug 1, 2018||Registration due (registration at the NTCIR-14 Web site) *|
|Aug 31, 2018||Run submission due #|
|Sep - Nov, 2018||Online test (evaluation with real users) #|
|Jan 10, 2019||Online test result release #|
|Feb 1, 2019||Task overview paper (draft) release #|
|Mar 15, 2019||Task participant paper (draft) submission due *|
|May 1, 2019||Task participant paper (camera-ready) submission due *|
|Jun 10 - 13, 2019||NTCIR-14 Conference at NII, Tokyo, Japan *|
* and # indicate schedule items to be completed by participants and organizers, respectively.
To participate in the NTCIR-14 OpenLiveQ task, please read through "What participants must do". Then take the following steps:
- Register through online registration
- Make two signed original copies of the user agreement forms
- Send the signed copies by postal mail or courier to the NTCIR Project Office
After the agreement is concluded, we will provide the information on how to download the data.
Participants can obtain the following data:
- 1,000 training and 1,000 test queries input into Yahoo! Chiebukuro search
  - The clickthrough rate of each question in the SERP for each query
  - Demographics of users who clicked on each question
    - Fraction of male and female users
    - Fraction of each age group
- At most 1,000 questions with answers for each query, including information presented in the SERP (e.g. snippets)
A set of questions \(D_q \subset D\) (where \(D\) is the set of all questions) is given for each query \(q \in Q\). The only task in OpenLiveQ is to rank the questions in \(D_q\) for each query \(q\).
The input consists of queries and questions for each query.
Queries are included in the file "OpenLiveQ-queries-test.tsv", in which each line contains a query. Each line has the form [QueryID_i]\t[Content_i], where
[QueryID_i] is a query ID and
[Content_i] is a query string.
The set of all questions is included in the file "OpenLiveQ-questions-test.tsv", in which each line contains a pair of a query ID and a question ID. Each line has the form [QueryID_i]\t[QuestionID_i_j],
where a pair of a query ID and a question ID indicates which questions correspond to a query, i.e. question \(d\) belongs to \(D_q\) for query \(q\). Line [QueryID_i]\t[QuestionID_i_j] indicates that question
[QuestionID_i_j] belongs to \(D_q\) for the query [QueryID_i].
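The two input files described above can be loaded with a few lines of Python. This is a minimal sketch, not an official loader: the function names are hypothetical, and it assumes the files are tab-separated with no header row.

```python
def parse_queries(lines):
    """Map each query ID to its query string (one tab-separated pair per line)."""
    queries = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        query_id, content = line.split("\t", 1)
        queries[query_id] = content
    return queries


def parse_questions(lines):
    """Map each query ID q to the list of question IDs in D_q, in file order."""
    candidates = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        query_id, question_id = line.split("\t")
        candidates.setdefault(query_id, []).append(question_id)
    return candidates
```

Both functions accept any iterable of lines, so an open file object can be passed directly, e.g. `parse_queries(open("OpenLiveQ-queries-test.tsv", encoding="utf-8"))` (UTF-8 encoding is an assumption here).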
Sample of Input
The output is a ranked list of questions for each query.
Ranked lists should be saved in a single file,
in which each line after the first includes a pair of a query ID and a question ID.
The first line is [Description], and every following line has the form [QueryID_i]\t[QuestionID_i_j], where
[Description] is a short description of your system,
which must not include newline characters.
The content of the output file, except for the first line,
must be exactly the same as that of the question file
"OpenLiveQ-questions-test.tsv", apart from the order of lines.
In the output file, line
[QueryID_i]\t[QuestionID_i_j] appearing before line
[QueryID_i]\t[QuestionID_i_j'] indicates that the rank of question
[QuestionID_i_j] is higher than that of question [QuestionID_i_j'].
Sample of Output
The output above represents the following ranked lists:
- OLQ-0001: q0000000001, q0000000000
- OLQ-0002: q0000000002, q0000000000
- OLQ-0003: q0000000004, q0000000003
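A run file encoding ranked lists like the ones above could be produced and sanity-checked as follows. This is a sketch under stated assumptions: the function names and the description string are hypothetical, and the check only verifies the rule stated earlier, namely that the lines after the description are a reordering of the lines in the question file.

```python
from collections import Counter


def format_run(description, ranked_lists):
    """Build run-file text: a description line, then one query/question pair per line."""
    if "\n" in description or "\r" in description:
        raise ValueError("the description must not contain newline characters")
    lines = [description]
    for query_id, question_ids in ranked_lists.items():
        for question_id in question_ids:  # earlier line = higher rank
            lines.append(f"{query_id}\t{question_id}")
    return "\n".join(lines) + "\n"


def is_reordering(run_text, question_text):
    """True if the run body contains exactly the lines of the question file."""
    run_body = run_text.splitlines()[1:]  # drop the description line
    return Counter(run_body) == Counter(question_text.splitlines())


ranked_lists = {
    "OLQ-0001": ["q0000000001", "q0000000000"],
    "OLQ-0002": ["q0000000002", "q0000000000"],
    "OLQ-0003": ["q0000000004", "q0000000003"],
}
run_text = format_run("example run (hypothetical description)", ranked_lists)
```

Running such a check before each submission is cheap, and it is useful because only one run per 24 hours can be submitted.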
To rank the questions, participants can leverage resources such as training queries, training questions, question data including titles and bodies, and clickthrough data.
Training queries are included in file "OpenLiveQ-queries-train.tsv", and the file format is the same as that of "OpenLiveQ-queries-test.tsv".
Training questions are included in file "OpenLiveQ-questions-train.tsv", and the file format is the same as that of "OpenLiveQ-questions-test.tsv".
Information about all the questions is included in "OpenLiveQ-question-data.tsv", and each line of the file contains the following values for a question (values are separated by tabs):
- Query ID (a query for the question)
- Rank of the question in a Yahoo! Chiebukuro search result for the query of Query ID
- Question ID
- Title of the question
- Snippet of the question in a search result
- Status of the question (accepting answers, accepting votes, solved)
- Last update time of the question
- Number of answers for the question
- Page view of the question
- Category of the question
- Body of the question
- Body of the best answer for the question
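The twelve tab-separated values listed above map naturally onto a small record type. The sketch below is illustrative only: the field names are hypothetical, the file is assumed to have no header row, and the status and update time are kept as raw strings because their exact encoding is not specified here.

```python
from typing import NamedTuple


class QuestionData(NamedTuple):
    query_id: str
    rank: int               # rank in the Yahoo! Chiebukuro search result
    question_id: str
    title: str
    snippet: str
    status: str             # kept as a raw string (encoding not specified)
    updated_at: str         # kept as a raw string (format not specified)
    answer_count: int
    page_views: int
    category: str
    body: str
    best_answer_body: str


def parse_question_data(line):
    """Parse one line of the question data file into a QuestionData record."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 tab-separated values, got {len(fields)}")
    (query_id, rank, question_id, title, snippet, status, updated_at,
     answer_count, page_views, category, body, best_answer_body) = fields
    return QuestionData(query_id, int(rank), question_id, title, snippet,
                        status, updated_at, int(answer_count), int(page_views),
                        category, body, best_answer_body)
```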
Clickthrough data are available for some of the questions. Based on the clickthrough data, one can estimate the click probability of the questions, and understand what kinds of users click on a certain question. The clickthrough data are included in the file "OpenLiveQ-clickthrough-data.tsv", and each line consists of the following values separated by tabs:
- Query ID (a query for the question)
- Question ID
- Most frequent rank of the question in a Yahoo! Chiebukuro search result for the query of Query ID
- Clickthrough rate
- Fraction of male users among those who clicked on the question
- Fraction of female users among those who clicked on the question
- Fraction of users under 10 years old among those who clicked on the question
- Fraction of users in their 10s among those who clicked on the question
- Fraction of users in their 20s among those who clicked on the question
- Fraction of users in their 30s among those who clicked on the question
- Fraction of users in their 40s among those who clicked on the question
- Fraction of users in their 50s among those who clicked on the question
- Fraction of users over 60 years old among those who clicked on the question
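As an illustration of how the thirteen clickthrough values above might be used, the following sketch parses one line and ranks candidate questions by clickthrough rate. The names are hypothetical, the file is assumed to have no header row, and ranking by raw clickthrough rate is only a naive baseline chosen for this example, not a method prescribed by the task.

```python
from typing import NamedTuple, Tuple


class Clickthrough(NamedTuple):
    query_id: str
    question_id: str
    rank: int                          # most frequent rank in the search result
    ctr: float                         # clickthrough rate
    male: float
    female: float
    age_fractions: Tuple[float, ...]   # under 10, 10s, 20s, 30s, 40s, 50s, over 60


def parse_clickthrough(line):
    """Parse one line of the clickthrough data file."""
    f = line.rstrip("\r\n").split("\t")
    if len(f) != 13:
        raise ValueError(f"expected 13 tab-separated values, got {len(f)}")
    return Clickthrough(f[0], f[1], int(f[2]), float(f[3]), float(f[4]),
                        float(f[5]), tuple(float(x) for x in f[6:]))


def rank_by_ctr(question_ids, ctr_by_question):
    """Sort questions by descending clickthrough rate.

    Questions without clickthrough data are treated as having rate 0.0;
    the sort is stable, so ties keep their input order.
    """
    return sorted(question_ids, key=lambda q: -ctr_by_question.get(q, 0.0))
```

Observed clickthrough rates generally depend on the position at which a question was shown, which is one reason the most frequent rank is worth keeping alongside the rate.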
Evaluation with relevance judgment data
The offline test is carried out before the online test (explained later), and its results determine which participants' systems are evaluated in the online test. Evaluation is conducted in a similar way to traditional ad-hoc retrieval tasks: results are evaluated using relevance judgments and evaluation metrics such as nDCG (normalized discounted cumulative gain), ERR (expected reciprocal rank), and Q-measure. During the offline test period, participants can submit their results once per day through this Web site and obtain evaluation results right after submission.
Evaluation Metrics
The following evaluation metrics are planned:
- nDCG (normalized discounted cumulative gain)
- ERR (expected reciprocal rank)
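For local experiments on training data, the two metrics listed above can be approximated with a short script. This is a sketch, not the organizers' evaluation code: the exponential gain (2^grade - 1), the log2 discount, and the cutoff handling are common textbook choices assumed here, and the official evaluation may use different gain values or tools. `relevance` is a hypothetical mapping from question ID to an integer relevance grade.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (log2 discount)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, relevance, k=10):
    """nDCG@k with gain 2^grade - 1; unjudged questions count as grade 0."""
    gains = [2 ** relevance.get(q, 0) - 1 for q in ranked_ids]
    ideal = sorted((2 ** g - 1 for g in relevance.values()), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def err_at_k(ranked_ids, relevance, max_grade, k=10):
    """Expected reciprocal rank under the cascade user model."""
    err, p_continue = 0.0, 1.0
    for rank, q in enumerate(ranked_ids[:k], start=1):
        # Probability that the user is satisfied at this rank
        p_stop = (2 ** relevance.get(q, 0) - 1) / 2 ** max_grade
        err += p_continue * p_stop / rank
        p_continue *= 1.0 - p_stop
    return err
```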
Submission
You can submit your run with the following command in Linux or Mac environments:
curl http://www.openliveq.net/runs -X POST -H "Authorization:[AUTH_TOKEN]" -F run_file=@[PATH_TO_YOUR_RUN_FILE]
where [AUTH_TOKEN] is distributed only to participants. For example:
curl http://www.openliveq.net/runs -X POST -H "Authorization:ORG:AABBCCDDEEFF" -F run_file=@data/your_run.tsv
Please note that
- It takes a few minutes to upload a run file,
- Each team is not allowed to submit two or more runs within 24 hours, and
- The evaluation result (nDCG@10) will be displayed at the top of this website. Details of evaluation results will be sent after the submission deadline.
Evaluation with real users
Submitted results are evaluated by multileaving [1]. The submitted results are combined into a single SERP by multileaving, presented to real users during the online test period, and evaluated on the basis of the observed clicks. Results submitted in the offline test period are used as-is in the online test if they are not significantly worse than the current ranking results. Note that some questions may be excluded from the online test if they are deleted for some reason before or during the online test.
[1] Schuth et al. "Multileaved Comparisons for Fast Online Evaluation." CIKM 2014.
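To make the idea of multileaving concrete, the sketch below implements team draft multileaving, one of the methods discussed in the cited paper. It is shown purely as an illustration: this page does not state which multileaving variant is used in the online test, and the function names and click-crediting scheme are assumptions made for this example.

```python
import random


def team_draft_multileave(rankings, length, rng=random):
    """Combine several rankings into one list of at most `length` questions.

    rankings: dict mapping a team name to its ranked list of question IDs.
    Returns (combined, credit), where credit maps each shown question to
    the team that contributed it.
    """
    combined, credit = [], {}
    teams = list(rankings)
    while len(combined) < length:
        order = teams[:]
        rng.shuffle(order)  # random team order in every round
        added = False
        for team in order:
            if len(combined) >= length:
                break
            # Highest-ranked question of this team that is not yet shown
            pick = next((q for q in rankings[team] if q not in credit), None)
            if pick is None:
                continue
            combined.append(pick)
            credit[pick] = team
            added = True
        if not added:
            break  # every ranking is exhausted
    return combined, credit


def credit_clicks(clicked, credit):
    """Count clicks per team based on which team contributed each question."""
    scores = {}
    for q in clicked:
        team = credit.get(q)
        if team is not None:
            scores[team] = scores.get(team, 0) + 1
    return scores
```

Aggregating such per-team click counts over many impressions gives a relative comparison of the submitted rankings; passing a seeded `random.Random` instance as `rng` makes a run reproducible.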
- Makoto P. Kato (Kyoto University)
- Takehiro Yamamoto (Kyoto University)
- Sumio Fujita (Yahoo Japan Corporation)
- Akiomi Nishida (Yahoo Japan Corporation)
- Tomohiro Manabe (Yahoo Japan Corporation)