Leader Board

ID Team Name Description Submission Time Q
153 OKSAT run-N7 2018-09-15 23:41:30 UTC 0.44076
152 ADAPT Final run, normalized best features 2018-09-15 22:32:15 UTC 0.49051
151 OKSAT run-S4 2018-09-14 23:36:29 UTC 0.39083
150 ADAPT MixedModel TitleAnswerSnippet 2018-09-14 20:49:40 UTC 0.46404
149 AITOK view count + answers x snippet cos word2vec double-weighted by norm query 2018-09-14 16:20:29 UTC 0.49437
148 YJRS GBDT 77 features (tuned) 2018-09-14 12:53:22 UTC 0.37429
147 ADAPT Zscore 6 features 2018-09-13 20:29:39 UTC 0.46639
146 OKSAT run-N6 2018-09-13 20:15:44 UTC 0.41897
145 AITOK view count + answers x snippet L1 word2vec double-weighted by norm query 2018-09-13 16:10:16 UTC 0.49412
144 YJRS GBDT 77 features 2018-09-13 08:57:35 UTC 0.37228
143 ADAPT Sum Normalised Features 2018-09-12 18:38:28 UTC 0.44489
142 OKSAT run-N5 2018-09-12 18:10:48 UTC 0.39342
141 AITOK view count + answers x snippet word2vec double-weighted by norm query v2 2018-09-12 15:45:21 UTC 0.49427
140 OKSAT run-U5 2018-09-11 18:09:14 UTC 0.38214
139 AITOK view count + answers x snippet word2vec double-weighted by norm query 2018-09-11 15:43:02 UTC 0.49483
138 OKSAT run9 2018-09-10 16:59:26 UTC 0.49021
137 AITOK view count + answers x snippet 2-gram tf-idf double-weighted by norm query 2018-09-10 15:33:41 UTC 0.50028
136 YJRS YJRS-86 + A -> Q translated 94 features 2018-09-10 07:06:11 UTC 0.38514
135 OKSAT run20 2018-09-09 16:57:29 UTC 0.43063
134 AITOK view count + answers x snippet 2-gram tf-idf weighted by query 2018-09-09 15:30:43 UTC 0.49838
133 ADAPT Combined Features Iteration 2 2018-09-09 08:12:48 UTC 0.44948
132 OKSAT run-S3 2018-09-08 16:52:42 UTC 0.39083
131 AITOK view count + answers x snippet 2-gram tf-idf weighted by query 2018-09-08 15:28:22 UTC 0.50152
130 ADAPT Combined mixed features 2018-09-07 23:09:26 UTC 0.46410
129 AITOK view count + answers x 2-gram tf-idf weighted by query 2018-09-07 14:44:07 UTC 0.50000
128 ADAPT Simple Features 2018-09-06 23:07:45 UTC 0.45909
127 AITOK view count worted with answers x tf-idf weighted by query 2018-09-06 14:37:47 UTC 0.49900
126 ADAPT Combination Mixed-2 2018-09-05 20:22:58 UTC 0.43851
125 AITOK view count sorted with answers, cutoff, click, updated, order and rank 2018-09-05 14:24:36 UTC 0.49393
124 AITOK view count sorted with click, updated, answers, order, rank and cutoff 2018-09-04 14:22:22 UTC 0.49347
123 ADAPT Combination Mixed Features 2018-09-03 22:42:19 UTC 0.49546
122 AITOK click through and view count 2018-09-03 14:19:01 UTC 0.49319
121 OKSAT run-S1 2018-09-03 10:48:31 UTC 0.42256
120 AITOK view count 2018-09-02 13:04:15 UTC 0.49363
119 OKSAT run-N4 2018-09-02 10:22:02 UTC 0.39556
118 ADAPT CLICK Model 2018-09-01 16:31:19 UTC 0.33951
117 AITOK cutoff and view 2018-09-01 12:49:49 UTC 0.43231
116 OKSAT run-U4 2018-09-01 07:49:35 UTC 0.38686
115 AITOK 2-gram TF-IDF+ with click and view with cutoff without rank 2018-08-31 12:15:46 UTC 0.42676
114 OKSAT run-N3 2018-08-31 07:47:46 UTC 0.42346
113 YJRS ListNet 77 features 5cv 2018-08-31 02:50:48 UTC 0.37240
112 ADAPT TopFeatures 2018-08-30 23:39:32 UTC 0.37051
111 AITOK 2-gram TF-IDF+ with click with cutoff and view without rank 2018-08-30 11:40:05 UTC 0.43910
110 ADAPT MartPipeline 2018-08-29 23:36:22 UTC 0.44412
109 AITOK Dependent 2-gram TF-IDF with click through rate with cutoff without rank 2018-08-29 11:08:38 UTC 0.41748
108 OKSAT run-S1 2018-08-29 10:39:14 UTC 0.42334
107 AITOK 2-gram TF-IDF+ with click through rate with cutoff without rank 2018-08-28 11:07:07 UTC 0.42008
106 ADAPT Pipeline system 2018-08-27 15:51:05 UTC 0.45380
105 AITOK 2-gram TF-IDF+ with click through rate with cutoff 2018-08-27 11:03:07 UTC 0.40479
104 OKSAT run-U3 2018-08-27 08:24:28 UTC 0.47441
103 AITOK 1-gram TF-IDF+ with click through rate with cutoff 2018-08-26 10:25:37 UTC 0.39852
102 AITOK 1-gram TF-IDF with click through rate with cutoff 2018-08-25 09:42:37 UTC 0.39724
101 AITOK This result is only for uploading test from AITOK. 2018-08-24 04:07:35 UTC 0.38194
100 YJRS ListNet 77 features 2018-08-19 06:06:55 UTC 0.37340
99 ADAPT Sample Run, Testing 2018-08-13 14:57:31 UTC 0.38194
98 OKSAT run-U2 2018-08-10 02:39:14 UTC 0.43121
97 OKSAT run-U1 2018-08-07 05:25:21 UTC 0.49425
96 OKSAT run-U1 2018-08-07 05:23:56 UTC 0.49425
95 YJRS baseline 77 features (retry) 2018-08-06 01:45:16 UTC 0.39559
94 OKSAT run-U0 2018-08-02 23:29:01 UTC 0.38316
93 YJRS baseline + A -> Q translated 94 features 2018-08-02 06:34:25 UTC 0.46387
92 YJRS YJRS-86 80 features 2018-07-31 02:31:20 UTC 0.45609
91 YJRS baseline 77 features 2018-07-25 04:07:03 UTC 0.39124
90 OKSAT run-S0 2018-07-25 03:21:53 UTC 0.38194
89 ORG # AS IS 2018-06-23 05:00:34 UTC 0.38194

Overview

OpenLiveQ (Open Live Test for Question Retrieval) provides an open live test environment in a community Q&A service of Yahoo Japan Corporation for evaluating question retrieval systems. We offer opportunities for more realistic system evaluation and help research groups address problems specific to real search systems in a production environment (e.g. ambiguous/underspecified queries and diverse relevance criteria). The task is simply defined as follows: given a query and a set of questions with answers, return a ranked list of questions.

NOTE: OpenLiveQ provides only Japanese data and a Japanese open test environment. However, we strongly support participants by providing a tool for feature extraction, so Japanese NLP is not required for participation.

Schedule

May - Aug, 2018 Offline test (evaluation with relevance judgment data) *
Sep 1, 2018 (originally Aug 1, 2018) Registration due (registration at the NTCIR-14 Web site) *
Sep 15, 2018 (originally Aug 31, 2018) Run submission due #
Sep - Nov, 2018 Online test (evaluation with real users) #
Jan 10, 2019 Online test result release #
Feb 1, 2019 Task overview paper (draft) release #
May 15, 2019 Task participant paper (draft) submission due *
May 1, 2019 Task participant paper (camera-ready) submission due *
Jun 10 - 13, 2019 NTCIR-14 Conference at NII, Tokyo, Japan *
* and # mark items to be completed by participants and organizers, respectively.

Participation

To participate in the NTCIR-14 OpenLiveQ task, please read through "What participants must do".

Please then take the following steps:
  1. Register through online registration
  2. Make two signed original copies of the user agreement forms
  3. Send the signed copies by postal mail or courier to the NTCIR Project Office

After the agreement is concluded, we will provide the information on how to download the data.

Data

Participants can obtain the following data:

  • 1,000 training and 1,000 test queries input into Yahoo! Chiebukuro search
    • The clickthrough rate of each question in the SERP for each query
    • Demographics of users who clicked on each question
      • Fraction of male and female users
      • Fraction of users in each age group
  • At most 1,000 questions with answers for each query, including information presented in the SERP (e.g. snippets)

Task

A set of questions \(D_q \subset D\) (where \(D\) is the set of all questions) is given for each query \(q \in Q\). The only task in OpenLiveQ is to rank the questions in \(D_q\) for each query \(q\).

Input

The input consists of queries and questions for each query.

Queries are included in file "OpenLiveQ-queries-test.tsv", in which each line contains a query. The file format is shown below:
[QueryID_1]\t[Content_1]
[QueryID_2]\t[Content_2]
...
[QueryID_n]\t[Content_n]

where [QueryID_i] is a query ID and [Content_i] is a query string.

The set of all questions is included in file "OpenLiveQ-questions-test.tsv", in which each line contains a pair of a query ID and a question ID. The file format is shown below:
[QueryID_1]\t[QuestionID_1_1]
[QueryID_1]\t[QuestionID_1_2]
...
[QueryID_n]\t[QuestionID_n_m]

where a pair of a query ID and a question ID indicates which questions correspond to a query, i.e. question \(d\) belongs to \(D_q\) for query \(q\). Line [QueryID_i]\t[QuestionID_i_j] indicates that question [QuestionID_i_j] belongs to \(D_q\) for query [QueryID_i].
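The two formats above can be read with a few lines of code. The following is a minimal sketch, not an official tool: the function names `parse_queries` and `parse_questions` are hypothetical, and UTF-8 encoding of the files is an assumption that is not stated on this page.

```python
def parse_queries(lines):
    """Map each query ID to its query string (one "[QueryID]\t[Content]" per line)."""
    queries = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        query_id, content = line.split("\t", 1)
        queries[query_id] = content
    return queries


def parse_questions(lines):
    """Map each query ID to the list of its question IDs, i.e. D_q, in file order."""
    questions = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        query_id, question_id = line.split("\t")
        questions.setdefault(query_id, []).append(question_id)
    return questions


# Hypothetical usage (assumes UTF-8 files in the working directory):
# with open("OpenLiveQ-queries-test.tsv", encoding="utf-8") as f:
#     queries = parse_queries(f)
# with open("OpenLiveQ-questions-test.tsv", encoding="utf-8") as f:
#     questions = parse_questions(f)
```

Both functions accept any iterable of lines, so they work on an open file as well as on a list of strings.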

Sample of Input

OpenLiveQ-queries.tsv
OLQ-0001 野球
OLQ-0002 広島
OLQ-0003 神社


OpenLiveQ-questions.tsv
OLQ-0001 q0000000000
OLQ-0001 q0000000001
OLQ-0002 q0000000000
OLQ-0002 q0000000002
OLQ-0003 q0000000003
OLQ-0003 q0000000004

Output

The output is a ranked list of questions for each query. Ranked lists should be saved in a single file, in which each line includes a pair of a query ID and a question ID. The file format is shown below:
[Description]
[QueryID_1]\t[QuestionID_1_1]
[QueryID_1]\t[QuestionID_1_2]
...
[QueryID_n]\t[QuestionID_n_m]

where [Description] is a simple description of your system, which should not include newline characters. The content of the output file except for the first line must be exactly the same as that of the question file "OpenLiveQ-questions.tsv" except for the order of lines. In the output file, line [QueryID_i]\t[QuestionID_i_j] shown before line [QueryID_i]\t[QuestionID_i_j'] indicates that the rank of question [QuestionID_i_j] is higher than that of question [QuestionID_i_j'] for query [QueryID_i].

Sample of Output

OLQ-0001 q0000000001
OLQ-0001 q0000000000
OLQ-0002 q0000000002
OLQ-0002 q0000000000
OLQ-0003 q0000000004
OLQ-0003 q0000000003


The output above represents the following ranked lists:

  • OLQ-0001: q0000000001, q0000000000
  • OLQ-0002: q0000000002, q0000000000
  • OLQ-0003: q0000000004, q0000000003
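The output rules above (a one-line description followed by a reordering of the question file) can be checked before submitting. The sketch below is only an illustration under those rules; `format_run` and `is_permutation_of` are hypothetical helper names, not part of any official tool.

```python
def format_run(description, ranked):
    """Build run-file text: a description line, then "[QueryID]\t[QuestionID]" lines.

    `ranked` maps each query ID to its question IDs, best first.
    """
    if "\n" in description or "\r" in description:
        raise ValueError("description must not contain newline characters")
    lines = [description]
    for query_id, question_ids in ranked.items():
        lines.extend(f"{query_id}\t{question_id}" for question_id in question_ids)
    return "\n".join(lines) + "\n"


def is_permutation_of(run_text, question_pairs):
    """Check that the run, minus its first line, reorders the question file exactly."""
    body = run_text.splitlines()[1:]
    expected = [f"{query_id}\t{question_id}" for query_id, question_id in question_pairs]
    # Comparing sorted lists also catches duplicated or missing lines.
    return sorted(body) == sorted(expected)


ranked = {
    "OLQ-0001": ["q0000000001", "q0000000000"],
    "OLQ-0002": ["q0000000002", "q0000000000"],
    "OLQ-0003": ["q0000000004", "q0000000003"],
}
run_text = format_run("sample run", ranked)
```

Here `run_text` has "sample run" as its first line, followed by the six lines of the sample output with tab separators.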

Resources

To rank the questions, participants can leverage resources such as training queries, training questions, question data including titles and bodies, and clickthrough data.

Training Queries

Training queries are included in file "OpenLiveQ-queries-train.tsv", and the file format is the same as that of "OpenLiveQ-queries-test.tsv".

Training Questions

Training questions are included in file "OpenLiveQ-questions-train.tsv", and the file format is the same as that of "OpenLiveQ-questions-test.tsv".

Question Data

Information about all the questions is included in "OpenLiveQ-question-data.tsv", and each line of the file contains the following values of a question (values are separated by tabs):

  1. Query ID (a query for the question)
  2. Rank of the question in a Yahoo! Chiebukuro search result for the query of Query ID
  3. Question ID
  4. Title of the question
  5. Snippet of the question in a search result
  6. Status of the question (accepting answers, accepting votes, solved)
  7. Last update time of the question
  8. Number of answers for the question
  9. Number of page views of the question
  10. Category of the question
  11. Body of the question
  12. Body of the best answer for the question
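The twelve tab-separated values listed above map naturally onto a record type. The following is a minimal sketch: the field names and the `parse_question_record` helper are hypothetical, and it assumes that the rank, the number of answers, and the page view count are plain integers. The last update time is kept as a string because its format is not specified here.

```python
from typing import NamedTuple


class QuestionRecord(NamedTuple):
    query_id: str
    rank: int            # rank in the Yahoo! Chiebukuro search result
    question_id: str
    title: str
    snippet: str
    status: str          # accepting answers, accepting votes, or solved
    updated_at: str      # left unparsed: the time format is not stated
    answer_count: int
    page_views: int
    category: str
    body: str
    best_answer: str


def parse_question_record(line):
    """Parse one line of the question data file into a QuestionRecord."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 tab-separated values, got {len(fields)}")
    return QuestionRecord(
        fields[0], int(fields[1]), fields[2], fields[3], fields[4], fields[5],
        fields[6], int(fields[7]), int(fields[8]), fields[9], fields[10], fields[11],
    )
```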

Clickthrough Data

Clickthrough data are available for some of the questions. Based on the clickthrough data, one can estimate the click probability of the questions, and understand what kinds of users click on a certain question. The clickthrough data are included in file "OpenLiveQ-clickthrough-data.tsv", and each line consists of the following values separated by tabs:

  1. Query ID (a query for the question)
  2. Question ID
  3. Most frequent rank of the question in a Yahoo! Chiebukuro search result for the query of Query ID
  4. Clickthrough rate
  5. Fraction of male users among those who clicked on the question
  6. Fraction of female users among those who clicked on the question
  7. Fraction of users under 10 years old among those who clicked on the question
  8. Fraction of users in their 10s among those who clicked on the question
  9. Fraction of users in their 20s among those who clicked on the question
  10. Fraction of users in their 30s among those who clicked on the question
  11. Fraction of users in their 40s among those who clicked on the question
  12. Fraction of users in their 50s among those who clicked on the question
  13. Fraction of users over 60 years old among those who clicked on the question

The clickthrough data contain click statistics of a question identified by Question ID when a query identified by Query ID was submitted. The rank of the question can change even for the same query. This is why the third value indicates the most frequent rank of the question.
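One simple use of the clickthrough file is to order the questions of a query by their clickthrough rate. The sketch below is only an illustration and not a recommended system: `parse_clickthrough` and `rank_by_ctr` are hypothetical names, and placing questions without clickthrough data at the end in their original order is an arbitrary choice made for this example.

```python
def parse_clickthrough(lines):
    """Map (query ID, question ID) to the clickthrough rate (the 4th value)."""
    ctr = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if fields == [""]:
            continue  # tolerate blank lines
        if len(fields) != 13:
            raise ValueError(f"expected 13 tab-separated values, got {len(fields)}")
        ctr[(fields[0], fields[1])] = float(fields[3])
    return ctr


def rank_by_ctr(query_id, question_ids, ctr):
    """Sort question IDs by decreasing clickthrough rate for one query.

    Questions without clickthrough data get a sentinel of -1.0 and so come
    last; the sort is stable, so ties keep their input order.
    """
    return sorted(
        question_ids,
        key=lambda question_id: ctr.get((query_id, question_id), -1.0),
        reverse=True,
    )
```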

Evaluation

Offline Test

Evaluation with relevance judgment data

The offline test is carried out before the online test (explained later), and its results determine which participants' systems are evaluated in the online test. Evaluation is conducted in a similar way to traditional ad-hoc retrieval tasks, in which results are evaluated with relevance judgments and evaluation metrics such as nDCG (normalized discounted cumulative gain), ERR (expected reciprocal rank), and Q-measure. During the offline test period, participants can submit their results once per day through this Web site, and obtain evaluation results right after the submission.

Evaluation Metrics

We plan to use the following evaluation metrics:
  • nDCG (normalized discounted cumulative gain)
  • ERR (expected reciprocal rank)
  • Q-measure
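As a rough illustration of the three metrics listed above, the sketch below computes each of them for one query from the relevance grades of a ranked list. The parameter choices (linear gain with a log2 discount for nDCG, the 2^g - 1 mapping for ERR, beta = 1 for Q-measure, no rank cutoff) are common defaults and are assumptions here; the exact settings used in the official evaluation are not stated on this page. Because a run must rank every question in \(D_q\), the ideal list is taken to be the same grades sorted in decreasing order.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG of the ranked list divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


def err(gains, max_grade):
    """Expected reciprocal rank under the cascade user model."""
    score, p_continue = 0.0, 1.0
    for rank, g in enumerate(gains, start=1):
        p_stop = (2 ** g - 1) / 2 ** max_grade  # chance the user is satisfied here
        score += p_continue * p_stop / rank
        p_continue *= 1.0 - p_stop
    return score


def q_measure(gains, beta=1.0):
    """Q-measure: average of the blended ratio over ranks holding relevant items."""
    ideal = sorted(gains, reverse=True)
    n_relevant = sum(1 for g in gains if g > 0)
    if n_relevant == 0:
        return 0.0
    total, count, cg, cg_ideal = 0.0, 0, 0.0, 0.0
    for rank, (g, g_ideal) in enumerate(zip(gains, ideal), start=1):
        cg += g
        cg_ideal += g_ideal
        if g > 0:
            count += 1
            total += (count + beta * cg) / (rank + beta * cg_ideal)
    return total / n_relevant


# Grades of the questions in ranked order for one hypothetical query.
example = [3, 0, 2]
scores = (ndcg(example), err(example, max_grade=3), q_measure(example))
```

Each function takes the grades in the order in which the system ranked the questions, so a list already sorted in decreasing order scores 1.0 on nDCG and Q-measure.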

Submission

You can submit your run with the following command in Linux or Mac environments:

curl http://www.openliveq.net/runs -X POST -H "Authorization:[AUTH_TOKEN]" -F run_file=@[PATH_TO_YOUR_RUN_FILE]

where [AUTH_TOKEN] is distributed only to participants.

For example, curl http://www.openliveq.net/runs -X POST -H "Authorization:ORG:AABBCCDDEEFF" -F run_file=@data/your_run.tsv

Please note that

  1. It takes a few minutes to upload a run file, and
  2. Each team is not allowed to submit two or more runs within 24 hours.

The evaluation result (Q-measure) will be displayed at the top of this website. Details of evaluation results will be sent after the submission deadline.

Online Test

Evaluation with real users

Submitted results are evaluated by multileaving [1]. Submitted results are combined into a single SERP by multileaving, presented to real users during the online test period, and evaluated on the basis of the clicks observed. Results submitted in the offline test period are used as-is in the online test if they are not significantly worse than the current ranking results. Note that some questions can be excluded from the online test if they are deleted for some reason before or during the online test. In NTCIR-14 OpenLiveQ-2, by means of the online evaluation, we will evaluate all the runs that outperform the baseline run (ID: 89) in terms of the offline evaluation.

[1] Schuth et al. "Multileaved Comparisons for Fast Online Evaluation." CIKM 2014.
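To give a concrete picture of how several rankings can be combined into one SERP and credited by clicks, the sketch below implements team-draft multileaving, one of the methods discussed in the paper cited above. It is shown purely as an illustration: this page does not state which multileaving method is used in the online test, and the function names are hypothetical.

```python
import random


def team_draft_multileave(rankings, length, rng=None):
    """Combine several rankings into one list of at most `length` items.

    In each round the rankers are visited in random order, and each one adds
    its highest-ranked question that is not yet in the combined list.
    Returns the combined list and a map from question ID to the index of the
    ranking ("team") that contributed it.
    """
    rng = rng or random.Random()
    n_unique = len({q for ranking in rankings for q in ranking})
    length = min(length, n_unique)
    combined, teams = [], {}
    while len(combined) < length:
        order = list(range(len(rankings)))
        rng.shuffle(order)
        for i in order:
            if len(combined) >= length:
                break
            for q in rankings[i]:
                if q not in teams:
                    combined.append(q)
                    teams[q] = i
                    break
    return combined, teams


def credit_clicks(teams, clicked, n_rankings):
    """Count, for each ranking, the clicks on questions it contributed."""
    credits = [0] * n_rankings
    for q in clicked:
        if q in teams:
            credits[teams[q]] += 1
    return credits


rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "a", "b"]]
combined, teams = team_draft_multileave(rankings, 3, random.Random(0))
credits = credit_clicks(teams, clicked=combined[:1], n_rankings=len(rankings))
```

A ranking whose contributed questions attract more clicks over many impressions is inferred to be better; the single impression above only shows the bookkeeping.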

Organizers

  • Makoto P. Kato (Kyoto University)
  • Takehiro Yamamoto (Kyoto University)
  • Sumio Fujita (Yahoo Japan Corporation)
  • Akiomi Nishida (Yahoo Japan Corporation)
  • Tomohiro Manabe (Yahoo Japan Corporation)