The Natural Language Processing Group at Northeastern University comprises faculty and students working on a wide range of research problems involving machine learning methods for NLP and their applications. Topics of interest include biomedical NLP, applications in the digital humanities, computational social science, interpretable and explainable NLP models, data extraction, text summarization, and bias and fairness, among others. We list some recent illustrative publications below; see individual faculty and student pages for more details.

Sample recent publications

(This is a non-exhaustive sampling of papers from the last 1-2 years, intended to be illustrative; see individual PI pages for full publication lists.)

Generating (Factual?) Narrative Summaries of RCTs: Experiments with Neural Multi-Document Summarization. Byron C. Wallace, Sayantan Saha, Frank Soboczenski, and Iain J. Marshall. American Medical Informatics Association (AMIA) summit, 2021.

Source attribution: Recovering the press releases behind science health news. Ansel MacLaughlin, John Wihbey, Aleszu Bajak, and David A. Smith. In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM), 2020.

ERASER: A Benchmark to Evaluate Rationalized NLP Models. Jay DeYoung, Sarthak Jain, Nazneen Fatema Rajani, Eric Lehman, Caiming Xiong, Richard Socher, and Byron C. Wallace. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2020.

Finite state machine pattern-root Arabic morphological generator, analyzer and diacritizer. Maha Alkhairy, Afshan Jafri, and David A. Smith. In Proceedings of the Language Resources and Evaluation Conference (LREC), 2020.

Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions. Xiaochuang Han, Byron C. Wallace, and Yulia Tsvetkov. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2020.

Learning to Faithfully Rationalize by Construction. Sarthak Jain, Sarah Wiegreffe, Yuval Pinter, and Byron C. Wallace. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2020.

Trialstreamer: Mapping and Browsing Medical Evidence in Real-Time. Benjamin Nye, Ani Nenkova, Iain J. Marshall, and Byron C. Wallace. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL); Demo Track, 2020.

Semi-Automating Knowledge Base Construction for Cancer Genetics. Somin Wadhwa, Kanhua Yin, Kevin S. Hughes, and Byron C. Wallace. In Proceedings of the Conference on Automated Knowledge Base Construction (AKBC), 2020.

Query-Focused EHR Summarization to Aid Imaging Diagnosis. Denis Jered McInerney, Borna Dabiri, Anne-Sophie Touret, Geoffrey Young, Jan-Willem van de Meent, and Byron C. Wallace. In Proceedings of Machine Learning for Healthcare (MLHC), 2020.

Noisy neural language modeling for typing prediction in BCI communication. Rui Dong, David A. Smith, Shiran Dudy, and Steven Bedrick. In Proceedings of the Workshop on Speech and Language Processing for Assistive Technologies (SLPAT), pages 44–51, 2019.

Practical Obstacles to Deploying Active Learning. David Lowell, Zachary Lipton, and Byron C. Wallace. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP), 2019.

Predicting Annotation Difficulty to Improve Task Routing and Model Performance for Biomedical Information Extraction. Yinfei Yang, Oshin Agarwal, Chris Tar, Byron C. Wallace, and Ani Nenkova. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), 2019.

Attention is not Explanation. Sarthak Jain and Byron C. Wallace. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), 2019.

Inferring Which Medical Treatments Work from Reports of Clinical Trials. Eric Lehman, Jay DeYoung, Regina Barzilay, and Byron C. Wallace. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), 2019.

Structured Representations for Reviews: Aspect-Based Variational Hidden Factor Models. Babak Esmaeili, Hongyi Huang, Byron C. Wallace, and Jan-Willem van de Meent. In Proceedings of AISTATS, 2019.

Relevant Courses

CS 6120 – Natural Language Processing

CS 6140 – Machine Learning

CS 6200 – Information Retrieval

CS 4100 – Artificial Intelligence

CS 7180 – Special Topics in Artificial Intelligence

DS 4440 – Practical Neural Networks
