care4lang

Projects
We work on a myriad of projects, listed thematically:

Arabic Varieties (MSA & Dialects) Language Technologies

1. AIDA: Automatic Identification of Dialectal Arabic
2. Processing of Intrasentential Linguistic Code Switching
3. Optimal Diacritization for Arabic Processing and Readability
4. AMIRA/MADAMIRA: Tokenization, Diacritization, Lemmatization, POS Tagging, Base Phrase Chunking (Shallow Parsing), NER
5. DIRA: Dialect Information Retrieval Assistant
6. Spelling Error Correction
7. Automatic Idafa Identification and Classification

Computational Socio-Pragmatics & Social Media Analytics

1. Perspective Identification and Classification
2. Meme & Rumor Propagation and Detection
3. Sub-group Detection
4. Linguistic Committed Belief Tagging
5. Sentiment Analysis
6. Emotion (Intensity) Detection and Classification
7. Urgency Detection

Enabling Language Technologies

1. Domain Adaptation
2. Processing of Low Resource Languages
3. Multilingual/Cross-Lingual Word Sense Disambiguation
4. Open Domain Semantic Role Labeling for Low Resource Languages
5. Multilingual/Cross-Lingual Semantic Textual Similarity
6. Automatic Multiword Expression Detection and Classification
7. Automatic Thesaurus Augmentation and Verification
8. SPLIT: Language Independent Preprocessing Pipeline

Health Analytics

1. Automatic detection of schizophrenia and PTSD in social media
2. Decision support systems for cancer patients
3. Linguistic Speech Impairment Assistant

Text Analytics

1. Named Entity Recognition
2. Named Entity Linking
3. Event Detection and Classification
4. Named Entity Alias Recognition

Resource Building

1. Tharwa: A pan-dialectal Arabic-English-French thesaurus
2. A unified framework for the universal characterization and automatic identification of multilingual multiword expressions
3. A unified framework for multilingual code-switching annotations
4. A repository for multilingual emotion-annotated data
5. Semantic Textual Similarity Data

NLP Applications

1. Semantically Aware Machine Translation
2. Cross Lingual Textual Entailment
George Washington University
Natural Language Processing Lab
800 22nd St NW Suite 4934, Washington DC 20036