Detecting the discourse structure of scientific documents is important for a number of tasks, including biocuration efforts, text summarisation, and the creation of improved formats for scientific publishing. Currently, many parallel efforts exist to detect a range of discourse elements at different levels of granularity, and for different purposes. The discourse elements detected include facts, problems, hypotheses, experimental results, and analyses of results, as well as the distinction between new and existing work and between the author’s own contribution and that of cited sources. A plethora of feature classes is used to identify these elements, including verb tense/mood/voice, semantic verb class, speculative language or negation, and various classes of stance markers, as well as text-structural components and the location of references. The linguistics behind this work involves topics such as the detection of subjectivity, opinion, entailment, and inference; the detection of author stance and author disagreement; and the inference of differences between the given text and the state of knowledge in a particular field.
Several workshops have focussed on the detection of some of these features in scientific text, such as speculation and negation in the 2010 workshop on Negation and Speculation in Natural Language Processing, and hedging in the CoNLL-2010 Shared Task, “Learning to detect hedges and their scope in natural language text”. There have also been several efforts to produce large-scale corpora, such as BioScope, in which negation and speculation information was annotated, and the GENIA Event corpus.
To perform this analysis, a wide range of annotation schemes has been produced. These schemes vary along a number of different axes, including:
• Annotation viewpoint (e.g. argumentative zones, scientific investigation structure, type of knowledge conveyed)
• Unit of annotation (e.g. zone, sentence, segment, event)
• Type of text (abstracts or full papers)
• Domain of application
• Granularity of the annotation categories (coarse or fine-grained)
• Whether other types/levels of information are also annotated (e.g. certainty level, knowledge source, manner)
The goal of the Workshop on “Models of Scientific Discourse Annotation” is to compare and contrast the motivation behind efforts in the discourse annotation of scientific text and the techniques and principles applied in the various approaches, and to discuss ways in which these efforts can complement each other and collaborate to form standards for an optimal method of annotating appropriate levels of discourse, with enhanced accuracy and usefulness.
We wish to compare, contrast and evaluate different scientific discourse annotation schemes and tools, in order to answer questions such as:
• What motivates a certain level, method, or viewpoint for annotating scientific text?
• What is the annotation level for a unit of argumentation: an event, a sentence, a segment? What are the advantages and disadvantages of each?
• How easily can different schemes be applied to texts? Are they easily trainable?
• Which schemes are the most portable? Can they be applied to both full papers and abstracts? Can they be applied to texts in different domains?
• How granular should annotation schemes be? What are the advantages/disadvantages of fine-grained and coarse-grained annotation categories?
• Can different schemes complement each other to provide different levels of information? Can different schemes be combined to give better results?
• How can we compare annotations, and how do we decide which features, approaches, and techniques work best?
• How do we exchange and evaluate each other’s annotations?
• How applicable are these efforts to improved methods of publishing or summarising science?
We are inviting two types of submissions:
- Research papers, by participants who are currently conducting scientific discourse analysis and wish to present their work, augmented by a clear motivation for the granularity, discourse elements and goal of their annotation procedure.
- Vision papers, by participants who wish to either compare and contrast existing efforts, or present a vision of annotation as it pertains to specific user goals or a particular view of scientific discourse as a textual genre of study.
In inviting both categories, we hope to stimulate a discussion between the Computational Linguistics community and linguists, genre specialists and sociologists of science, to come to a common understanding regarding the needs and possibilities of scientific discourse analysis.
See the submissions page for more details.