Author: Larry Clarfeld
Posted: November 2019
“Have you decided to read this blog?”
Does this sentence refer to the past? The present? The future? All three? There is no universally accepted methodology for assigning temporal reference to text or speech. When VCL alumnus Lindsay Ross wanted to investigate how temporal reference evolves in palliative care conversations, she was surprised to find that there were no publicly available resources for the task. So, she created one.
In this blog post, we share the methodology behind the ‘VCL Temporal Reference Tagger’ (TRT) and provide source code in Python for anyone wishing to use this tool in their own research.
Palliative care conversations are dynamic interactions between patients and clinicians, and their content can range widely across time. The temporal reference of a conversation may continuously evolve as patients reflect upon the past, take hold of the present, and plan for the future. Understanding how temporal reference changes throughout a conversation provides a unique perspective on how these conversations function. While our interpretation of ‘temporal reference’ is not authoritative, and other valid interpretations may exist, we find in our recent paper (see the reference below) that our method reveals distinct patterns in the trajectory of temporal reference across narrative time in palliative care consultations.
The TRT begins by using a part-of-speech (POS) tagger to identify the part of speech of each word in a sentence. Traditionally, when linguists and other scholars wanted POS tags for a text, someone read it manually and tagged each word according to its meaning in context. These days, machine learning algorithms are trained on massive amounts of manually tagged data to assign POS tags automatically with high accuracy. We use the Natural Language Toolkit (NLTK) Python package for POS tagging (for more information on how automatic POS tagging works, see https://explosion.ai/blog/part-of-speech-pos-tagger-in-python).
Recall our example, “Have you decided to read this blog?” The POS tagger from NLTK successfully identifies the verbs in this sentence as “have”, “decided”, and “read”. Specifically, “have” was classified as a singular present-tense verb and “decided” was tagged as a past-tense verb. For both of these verb forms, the temporal reference is apparent. However, “read” was tagged as a verb in its base form, so its temporal reference is ambiguous when considering the POS tag alone.
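The tagging step can be sketched roughly as follows. This is a minimal illustration rather than the TRT source code: it assumes NLTK is installed and that its tokenizer and tagger data have already been downloaded with nltk.download (the resource names vary between NLTK versions). The helper name verbs_with_tags and its optional tagger parameter are hypothetical, added only so the function can be tried without NLTK's data files.

```python
from typing import Callable, List, Optional, Tuple

TaggedToken = Tuple[str, str]  # (word, Penn Treebank POS tag)


def verbs_with_tags(
    sentence: str,
    tagger: Optional[Callable[[str], List[TaggedToken]]] = None,
) -> List[TaggedToken]:
    """Return the (word, tag) pairs that the tagger marks as verbs.

    Hypothetical helper for illustration; not part of the TRT itself.
    """
    if tagger is None:
        # Imported lazily so this module still loads where NLTK is absent.
        import nltk

        def tagger(text: str) -> List[TaggedToken]:
            # Tokenize the sentence, then assign a POS tag to each token.
            return nltk.pos_tag(nltk.word_tokenize(text))

    tagged = tagger(sentence)
    # Penn Treebank verb tags all begin with "VB" (VB, VBD, VBG, VBN, VBP, VBZ).
    return [(word, tag) for word, tag in tagged if tag.startswith("VB")]
```

Calling verbs_with_tags("Have you decided to read this blog?") with the default tagger should return the three verbs discussed above, although the exact tags depend on the tagger model that is installed.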
After the POS tagger identifies all verbs in a corpus, the TRT considers the context surrounding each verb in order to categorize its temporal reference correctly. Because the verb “read” is preceded by the word “to” in our example, the TRT identifies it as referring to the future. In the most extreme cases, such as the present participle (verbs ending in -ing), the same verb form can refer to the past, present, or future depending on the context, as in “I am reading”, “he was reading”, and “she will be reading”. The TRT correctly categorizes the temporal reference of these three cases as present, past, and future, respectively. For a complete breakdown of how each POS tag is assigned its temporal reference, see the reference below and check out the source code.
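To make the idea of context rules concrete, here is a simplified sketch covering only the cases mentioned in this post. It is not the TRT's actual rule set: the function names, the cue-word lists, and the three-word look-back window are all assumptions made for illustration, and the input is a list of (word, tag) pairs using Penn Treebank tags such as those NLTK produces.

```python
from typing import List, Optional, Tuple

TaggedToken = Tuple[str, str]  # (word, Penn Treebank POS tag)

# Illustrative cue words only; the real TRT rules are more complete.
FUTURE_CUES = {"will", "shall"}
PAST_CUES = {"was", "were"}
PRESENT_CUES = {"am", "is", "are"}
WINDOW = 3  # assumed look-back distance for -ing verbs


def temporal_reference(tagged: List[TaggedToken], i: int) -> Optional[str]:
    """Guess the temporal reference of the verb at position i, or None."""
    _, tag = tagged[i]
    previous = [w.lower() for w, _ in tagged[:i]]

    if tag == "VBD":  # past-tense verb
        return "past"
    if tag in {"VBP", "VBZ"}:  # present-tense verb
        return "present"
    if tag == "VB":  # base form: look at the word immediately before it
        if previous and (previous[-1] == "to" or previous[-1] in FUTURE_CUES):
            return "future"
        return None
    if tag == "VBG":  # present participle: the nearest auxiliary decides
        for cue in reversed(previous[-WINDOW:]):
            if cue in FUTURE_CUES:
                return "future"
            if cue in PAST_CUES:
                return "past"
            if cue in PRESENT_CUES:
                return "present"
    return None  # not handled by this sketch


def label_verbs(tagged: List[TaggedToken]) -> List[Tuple[str, Optional[str]]]:
    """Pair every verb in a tagged sentence with its guessed temporal reference."""
    return [
        (word, temporal_reference(tagged, i))
        for i, (word, tag) in enumerate(tagged)
        if tag.startswith("VB")
    ]
```

With the tags described above for “Have you decided to read this blog?”, label_verbs assigns present to “Have”, past to “decided”, and future to “read”. Verb forms that this sketch does not cover, such as past participles, come back as None rather than being guessed.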
The VCL Temporal Reference Tagger is available open-source in Python from our website. You are free to use or modify this code for research purposes, as long as you reference the website where you obtained the code. For any published work that utilizes the TRT, or any derivations thereof, please reference the website and the following paper:
Ross, L.M., Danforth, C.M., Eppstein, M.J., Clarfeld, L.A., Durieux, B.N., Gramling, C.J., Hirsch, L., Rizzo, D.M., Gramling, R. (2019). Story Arcs in Serious Illness: Natural Language Processing features of Palliative Care Conversations. Patient Education and Counseling. (in press)