Donna Byron  




about me


IBM Watson Group Software Projects: 2012 - present

Building on both the IBM Deep Language Processing stack and the IBM Bluemix Conversation Engine, my team creates engaging and compelling product discovery tools to revolutionize in-app product promotion. You can try out all our deployed user experiences.

Academic Research: 1996 - 2010

I am a Research Scientist in the Relational Agents group at Northeastern University. Our lab creates animated, conversational agents that build long-term relationships with their users. Systems we are currently fielding act as coaches for a variety of lifestyle changes in the area of behavioral medicine. We are currently in the pilot phase of a system to encourage walking in older adults with low health literacy.
My own research focuses on computational modeling for referring expressions in dialog. Recently, I have concentrated on the special problems that come up for embodied, situated dialog agents.

Projects at Northeastern Univ. Computer Science:

  • Establishing and Breaking Conceptual Pacts with Dialog Partners
    People use the identity of the person speaking to them when they comprehend language. Between a pair of discourse partners, their interaction history creates expectations of the way objects will be described, and when the speaker complies with those expectations, comprehension is faster (as measured by eye-gaze fixations). However, when speaking to a different dialog partner, the expectations are reset. In computational terms, this would be a reset point in the discourse context.

    The goal of this project is to understand whether language comprehension works the same when humans converse with animated dialog agents. The results will impact the future design of noun phrase generation for dialog agents, as well as behavioral models for relation-building agents.

    This project is supported by the National Science Foundation.
    Collaborators: Joy Hanna (Oberlin), Eric Fosler-Lussier (Ohio State U.), William Hartman (Ohio State U.)

  • The GIVE Challenge: Generating Instructions In Virtual Environments
    GIVE is a challenge problem for natural language generation. We have built a test platform in which systems can be matched with online users through a web interface, and test their mettle in generating intelligible instructions that direct the human partner to solve a treasure hunt task (there are three different tasks in GIVE-2009, with varying levels of difficulty). The results of over 1000 system runs will be presented in the GIVE workshop at ENLG in 2009, and we will show a demo at EACL 2009. For more details, see the Press Release, Play a game (only until 1/31/2009), or stay tuned here for updates.
    Collaborators: Justine Cassell (Northwestern), Alexander Koller (Saarbrucken), Johanna Moore (Edinburgh), Jon Oberlander (Edinburgh), Laura Stoia (Google), Kristina Striegnitz (Union College)
    Related publications:
    Byron, D., Koller, A., Oberlander, J., Stoia, L., Striegnitz, K. (2007). "Generating Instructions in Virtual Environments (GIVE): A Challenge and an Evaluation Testbed for NLG". In the Proceedings of the Workshop on Shared Tasks and Comparative Evaluation in Natural Language Generation, Arlington, Virginia, USA, April 2007.

Projects at the OSU Speech and Language Technologies Lab

  • CIVET: Collaborative Interaction in Virtual EnvironmenTs
    Software agents that are embodied and mobile within a 3D space are a core component of many exciting emerging application areas within AI. Examples include unmanned autonomous vehicles, assistant robots such as domestic helpers or hospital couriers, and characters that run within simulated worlds for training, entertainment, or social interaction. These agents need to carry on a dialog while sharing an experience of the external world with their human partner. They must also understand the spatial relationships between the dialog partners and the objects under discussion. Together, these demands require a variety of dialog skills that existing software does not model.

    Collaborators: Eric Fosler-Lussier, Craige Roberts, Timothy Weale, Tianfang Xu, Laura Stoia, Guadalupe Canahuate, Brad Mellen, Thomas Mampilly, Vinay Sharma, Aakash Dalwani, Ryan Gerritsen, Mark Keck
    Selected Publications:

    • Algorithms for Dialog components:
      • Laura Stoia, Darla Magdalene Shockley, Donna K. Byron, and Eric Fosler-Lussier. Noun phrase generation for situated dialogs. In Proceedings of the Fourth International Natural Language Generation Conference, pages 81-88, Sydney, Australia, July 2006. Association for Computational Linguistics.
              » PDF, BibTeX

      • Laura Stoia, Donna K. Byron, Darla Shockley, and Eric Fosler-Lussier. Sentence planning for realtime navigational instruction. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, pages 157-160, New York City, USA, June 2006. Association for Computational Linguistics.
              » PDF, BibTeX

      • Donna K. Byron, Thomas Mampilly, Vinay Sharma, and Tianfang Xu. Utilizing visual attention for cross-modal coreference interpretation. In Proceedings of Context-05, Springer Lecture Notes in Computer Science, volume 3554, pages 83-96, 2005.

    • Corpus linguistic analysis:
      • Donna K. Byron and Laura Stoia. An analysis of proximity markers in collaborative dialog. In Proceedings of the 41st annual meeting of the Chicago Linguistic Society. Chicago Linguistic Society, 2005.

      • Donna K. Byron, Aakash Dalwani, Ryan Gerritsen, Mark Keck, Thomas Mampilly, Vinay Sharma, Laura Stoia, Timothy Weale, and Tianfang Xu. Natural noun phrase variation for interactive characters. In Proceedings of the First Annual Artificial Intelligence and Interactive Digital Entertainment Conference, pages 15-20, Marina del Rey, California, June 2005. AAAI.

    • Corpus resources: Recordings of human pairs solving the treasure hunt task in a Quake world. Three corpus collections are currently available, free for research use. If you would like to use them, please see the Quake corpus web page.
  • Behavioral and processing models for demonstrative pronouns
    I have studied demonstrative pronouns, and especially the distinctions between personal pronouns like "it" and demonstrative pronouns like "that", both from a computational modeling perspective and in human behavioral studies.
    Collaborators: Sarah Brown-Schmidt, Mike Tanenhaus
    • Sarah Brown-Schmidt, Donna K. Byron, and Michael K. Tanenhaus. Beyond salience: Interpretation of personal and demonstrative pronouns. Journal of Memory and Language, 53(2):292-313, August 2005.

    • Sarah Brown-Schmidt, Donna K. Byron, and Michael K. Tanenhaus. That's not it and it is not that. Reference resolution and conceptual composites. In Manuel Carreiras and Chuck Clifton, editors, The online study of sentence comprehension: Eyetracking, ERP, and beyond, pages 209-228. Psychology Press, 2004.

  • Computational Models for Zero Anaphors in Korean
    In languages such as Korean, Japanese, Spanish and Portuguese that make heavy use of null anaphors, are null anaphors used in the same circumstances as overt pronouns in languages like English? If they are like English overt pronouns, they may yield to the same processing models as we use for English pronouns. Intuitively, it would seem that null anaphors would appear in highly predictable positions, and their meaning would therefore be easy to calculate. But when the language has both overt and null anaphora, as Korean does, do the two forms need different processing? Computer-readable texts and transcripts have recently become available that allow us to investigate these questions.
    Collaborators: Sun-Hee Lee, Whitney Gegg-Harrison, Seok Bae Jang
    • Donna K. Byron, Whitney Gegg-Harrison, and Sun-Hee Lee. Resolving zero anaphors and pronouns in Korean. Traitement Automatique des Langues (TAL), Special Issue on Anaphora Resolution, 46(1), 91-114.
            » BibTeX

    • Sun-Hee Lee, Donna K. Byron, and Seok Bae Jang. Why is zero marking important in Korean? Upcoming in the Proceedings of the Second International Joint Conference on Natural Language Processing (IJCNLP-05), Jeju Island, Korea, 2005.

    • Sun-Hee Lee, Donna Byron, and Whitney Gegg-Harrison. Annotations of zero pronoun resolution in Korean using the Penn Korean Treebank. In Sandra Kubler, Joakim Nivre, Erhard Hinrichs, and Holger Wunsche, editors, Proceedings of the Third Workshop on Treebanks and Linguistic Theories (TLT'04), pages 75-88, Tubingen, Germany, 2004.

    • Sun-Hee Lee and Donna K. Byron. Semantic resolution of zero and pronoun anaphors in Korean. In Proceedings of the Discourse Anaphora and Reference Resolution Conference (DAARC2004), pages 103-108, September 2004.

  • Bootstrapping Linguistic Resources
    AI system development has always been plagued by the problem of codifying human knowledge and experience in computer form. Natural language systems are no exception. For spoken dialog systems, all of the vocabulary items and concepts that a particular system needs to be able to discuss must be painstakingly defined by hand. With the availability of large knowledge collections such as the WWW, it has recently become more feasible to automatically bootstrap linguistic knowledge for a system under development. Like many other labs, we are experimenting with techniques to bootstrap vocabulary items, examples of larger constituents such as full sentences, and concepts for a particular domain from the web.
    Collaborators: Eric Fosler-Lussier, Laura Stoia, Tianfang Xu, Jeremy Morris
    • Laura Stoia, Tianfang Xu, Donna K. Byron, and Eric Fosler-Lussier. Populating semantic classes using large-scale corpora. In Workshop MEANING-2005 Developing Multilingual Web-Scale Language Technologies, pages 19-24, Trento, Italy, February 2005. The Meaning Project.

    Last modified: Thu Aug 4 17:20:18 EDT 2005 by dbyron