ResearchHub | Open Science Community

Franck Dernoncourt

Author with expertise in Statistical Machine Translation and Natural Language Processing

Achievements

Cited Author

Key Stats

Upvotes received: 0

Publications: 11 (45% Open Access)

Cited by: 1,511

h-index: 34 / i10-index: 79

Reputation

Biology: < 1%

Chemistry: < 1%

Economics: < 1%

Publications

A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents

Arman Cohan et al., Jan 1, 2018

Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, Nazli Goharian. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). 2018.

Artificial Intelligence

Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks

Ji Lee et al., Jan 1, 2016

Recent approaches based on artificial neural networks (ANNs) have shown promising results for short-text classification. However, many short texts occur in sequences (e.g., sentences in a document or utterances in a dialog), and most existing ANN-based systems do not leverage the preceding short texts when classifying a subsequent one. In this work, we present a model based on recurrent neural networks and convolutional neural networks that incorporates the preceding short texts. Our model achieves state-of-the-art results on three different datasets for dialog act prediction.

Artificial Intelligence

Computer Science

De-identification of patient notes with recurrent neural networks

Franck Dernoncourt et al., Oct 11, 2016

Objective: Patient notes in electronic health records (EHRs) may contain critical information for medical investigations. However, the vast majority of medical investigators can only access de-identified notes, in order to protect the confidentiality of patients. In the United States, the Health Insurance Portability and Accountability Act (HIPAA) defines 18 types of protected health information that need to be removed to de-identify patient notes. Manual de-identification is impractical given the size of electronic health record databases, the limited number of researchers with access to non-de-identified notes, and the frequent mistakes of human annotators. A reliable automated de-identification system would consequently be of high value.

Materials and Methods: We introduce the first de-identification system based on artificial neural networks (ANNs), which requires no handcrafted features or rules, unlike existing systems. We compare the performance of the system with state-of-the-art systems on two datasets: the i2b2 2014 de-identification challenge dataset, which is the largest publicly available de-identification dataset, and the MIMIC de-identification dataset, which we assembled and is twice as large as the i2b2 2014 dataset.

Results: Our ANN model outperforms the state-of-the-art systems. It yields an F1-score of 97.85 on the i2b2 2014 dataset, with a recall of 97.38 and a precision of 98.32, and an F1-score of 99.23 on the MIMIC de-identification dataset, with a recall of 99.25 and a precision of 99.21.

Conclusion: Our findings support the use of ANNs for de-identification of patient notes, as they show better performance than previously published systems while requiring no manual feature engineering.
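
As a quick sanity check on the Results figures above, the reported F1-scores can be recomputed from the stated precision and recall. This is a minimal sketch that assumes the standard definition of F1 as the harmonic mean of precision and recall; the abstract does not spell out the formula, and the helper name `f1_score` is ours, for illustration only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition, assumed here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
i2b2_f1 = f1_score(precision=98.32, recall=97.38)
mimic_f1 = f1_score(precision=99.21, recall=99.25)

print(f"i2b2 2014 F1: {i2b2_f1:.2f}")
print(f"MIMIC F1: {mimic_f1:.2f}")
```

Under that assumption, both values round to the F1-scores quoted in the abstract (97.85 and 99.23).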

Artificial Intelligence

Health Information Management

Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives

Sebastian Gehrmann et al., Feb 15, 2018

In secondary analysis of electronic health records, a crucial task is correctly identifying the patient cohort under investigation. In many cases, the most valuable and relevant information for an accurate classification of medical conditions exists only in clinical narratives. Therefore, it is necessary to use natural language processing (NLP) techniques to extract and evaluate these narratives. The most commonly used approach to this problem relies on extracting a number of clinician-defined medical concepts from text and using machine learning techniques to identify whether a particular patient has a certain condition. However, recent advances in deep learning and NLP enable models to learn a rich representation of (medical) language. Convolutional neural networks (CNNs) for text classification can augment the existing techniques by leveraging the representation of language to learn which phrases in a text are relevant for a given medical condition. In this work, we compare concept extraction based methods with CNNs and other commonly used models in NLP in ten phenotyping tasks using 1,610 discharge summaries from the MIMIC-III database. We show that CNNs outperform concept extraction based methods in almost all of the tasks, with an improvement of up to 26 points in F1-score and up to 7 percentage points in area under the ROC curve (AUC). We additionally assess the interpretability of both approaches by presenting and evaluating methods that calculate and extract the most salient phrases for a prediction. The results indicate that CNNs are a valid alternative to existing approaches in patient phenotyping and cohort identification, and should be further investigated. Moreover, the deep learning approach presented in this paper can be used to assist clinicians during chart review or support the extraction of billing codes from text by identifying and highlighting relevant phrases for various medical conditions.

Artificial Intelligence

Bias and Fairness in Large Language Models: A Survey

Isabel Gallegos et al., Jun 11, 2024

Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this article, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing, defining distinct facets of harm and introducing several desiderata to operationalize fairness for LLMs. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, namely, metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes metrics by the different levels at which they operate in a model: embeddings, probabilities, and generated text. Our second taxonomy of datasets for bias evaluation categorizes datasets by their structure as counterfactual inputs or prompts, and identifies the targeted harms and social groups; we also release a consolidation of publicly available datasets for improved access. Our third taxonomy of techniques for bias mitigation classifies methods by their intervention during pre-processing, in-training, intra-processing, and post-processing, with granular subcategories that elucidate research trends. Finally, we identify open problems and challenges for future work. Synthesizing a wide range of recent research, we aim to provide a clear guide of the existing literature that empowers researchers and practitioners to better understand and prevent the propagation of bias in LLMs.

Artificial Intelligence

PDFTriage: Question Answering over Long, Structured Documents

Jon Saad-Falcon et al., Jan 1, 2024

Artificial Intelligence

Computer Science

MCECR: A Novel Dataset for Multilingual Cross-Document Event Coreference Resolution

Amir Veyseh et al., Jan 1, 2024

Artificial Intelligence

Computer Science

DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding

Manan Suri et al., Jan 1, 2024

Artificial Intelligence

Molecular Biology

Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs

Mihir Parmar et al., Jan 1, 2024

Artificial Intelligence

Management Science And Operations Research

ULLME: A Unified Framework for Large Language Model Embeddings with Generation-Augmented Learning

Hieu Man et al., Jan 1, 2024

Artificial Intelligence

Computer Science
