Gold Standard Online Debates Summaries:
Salient Sentence Selection Dataset (SSSD)

About this dataset:

XML Format

<comment id = "">

<sentence id = "">

<annotation>

side

the number of likes

CSV Format

recordid: the record identification number which uniquely identifies each sentence.
debateid: the identification number of each debate.
debatetopicname: the name of a debate topic.
commentid: the identification number of each debate comment.
sentenceid: the identification number of a sentence in each debate comment.
sentence: a sentence in each comment.
side: a side (stance) of a debate comment.
like: the number of votes that support this comment.
annotation: the annotation for a comment.
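
The fields above can be read with a standard CSV parser. The following is a minimal sketch, assuming the file is comma-delimited and has a header row using the field names listed above. The sample rows, the annotation values, and the helper name group_sentences_by_comment are made up for illustration and are not taken from the dataset. Comments are keyed by both debateid and commentid in case comment numbers repeat across debates.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows for illustration only; they are NOT taken from the dataset.
# Assumption: comma-delimited file with a header row matching the field list.
SAMPLE_CSV = """recordid,debateid,debatetopicname,commentid,sentenceid,sentence,side,like,annotation
1,1,Example topic,10,2,Second sentence of the same comment.,for,3,0
2,1,Example topic,10,1,First sentence of a comment.,for,3,1
3,1,Example topic,11,1,A sentence from another comment.,against,0,1
"""


def group_sentences_by_comment(csv_file):
    """Return {(debateid, commentid): [row, ...]}, rows ordered by sentenceid.

    Hypothetical helper: all values are kept as strings, exactly as read.
    """
    comments = defaultdict(list)
    for row in csv.DictReader(csv_file):
        comments[(row["debateid"], row["commentid"])].append(row)
    for rows in comments.values():
        rows.sort(key=lambda r: int(r["sentenceid"]))
    return dict(comments)


comments = group_sentences_by_comment(io.StringIO(SAMPLE_CSV))
for (debate_id, comment_id), rows in comments.items():
    # side and like describe the whole comment, so the first row is enough.
    text = " ".join(r["sentence"] for r in rows)
    print(debate_id, comment_id, rows[0]["side"], rows[0]["like"], text)
```

To run this against the downloaded file, replace io.StringIO(SAMPLE_CSV) with an opened file handle, for example open(path, newline="", encoding="utf-8"), where path and the encoding are assumptions to adjust for the actual download.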

Download this dataset:

[DOWNLOAD XML FORMAT]

[DOWNLOAD CSV FORMAT]

References:

Download this paper

[PDF]

Download BibTeX entry

[BibTex]

@InProceedings{sanchan20188,
  author    = {Sanchan, Nattapong  and  Aker, Ahmet  and  Bontcheva, Kalina},
  title     = {Gold Standard Online Debates Summaries and First Experiments Towards Automatic Summarization of Online Debate Data},
  booktitle = {Computational Linguistics and Intelligent Text Processing},
  editor    = {Gelbukh, Alexander},
  year      = {2018},
  address   = {Cham},
  volume    = {10762},
  publisher = {Springer International Publishing},
  pages     = {495--505},
  abstract  = {Usage of online textual media is steadily increasing. Daily, more and more news stories, blog posts and scientific articles are added to the online volumes. These are all freely accessible and have been employed extensively in multiple research areas, e.g. automatic text summarization, information retrieval, information extraction, etc. Meanwhile, online debate forums have recently become popular, but have remained largely unexplored. For this reason, there are no sufficient resources of annotated debate data available for conducting research in this genre. In this paper, we collected and annotated debate data for an automatic summarization task. Similar to extractive gold standard summary generation our data contains sentences worthy to include into a summary. Five human annotators performed this task. Inter-annotator agreement, based on semantic similarity, is 36{\%} for Cohen's kappa and 48{\%} for Krippendorff's alpha. Moreover, we also implement an extractive summarization system for online debates and discuss prominent features for the task of summarizing online debate data automatically.},
  isbn      = {978-3-319-77116-8}
}

Rich text bibliography entry (for copy & paste into a word processor)

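Sanchan, N., Aker, A., Bontcheva, K. (2018). Gold Standard Online Debates Summaries and First Experiments Towards Automatic Summarization of Online Debate Data. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing, vol. 10762, pp. 495-505. Springer International Publishing, Cham.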