NLP 8505
Arabic Natural Language Processing
Lectures
Monday: 3:00pm - 4:30pm, Classroom 6
Wednesday: 3:00pm - 4:30pm, Lecture Hall 2
Course Description
This course offers an in-depth introduction to Arabic Natural Language Processing (NLP), focusing on the unique challenges Arabic presents as an object of computational study. Students will learn about core enabling technologies for NLP, with a strong focus on the Arabic language and its dialects. The course covers text normalization, morphological analysis, syntactic parsing, and semantic analysis. It integrates theory with hands-on experience, including applied deep learning techniques and practical applications such as machine translation and sentiment analysis. By the end of the course, students will be equipped to contribute to Arabic NLP research and development.
Topics
This course combines theoretical foundations with applied practice, organized around the core components of Arabic NLP. The main topics include:
- Arabic Script and Orthography: Principles of the writing system and orthographic variation.
- Tokenization: Fundamentals of word segmentation for Arabic.
- Arabic Morphology: Morphological structure and processes; computational analysis, generation, and disambiguation methods.
- Dialect Modeling: Representing and processing dialectal Arabic.
- Arabic Resources: Corpora and tools.
- Applications in Arabic NLP: Readability modeling, grammatical error correction, and text rewriting as case studies of end-to-end systems.
Supplemental Material
- Required: Habash, Nizar Y. (2010). Introduction to Arabic natural language processing. Morgan & Claypool Publishers [ANLP].
- Required: CAMeL Tools Documentation [CTD].
- Required: Selected papers from NLP literature, see (evolving) schedule.
Grading
| Percentage | Assessment Component |
|---|---|
| 25% | Assignment 1 |
| 25% | Assignment 2 |
| 50% | Course Project: Team Declaration (5%); Proposal Abstract (5%); Preliminary Report (10%); Presentation + Final Report (30%) |
Schedule
Week 1
March 2: Introduction to Arabic NLP, history, challenges
- Slides
- Reading List:
- ANLP: Chapter 1
- CTD: Overview
- Arabic Computational Linguistics
- Assignment #1 Assigned
March 4: Arabic Script and Orthography
- Slides
- Reading List:
- ANLP: Chapters 2 and 3
- CTD: Command Line Tools, Utils, Data
- On Arabic Transliteration
Week 2
March 9: Morphological Structure, Analysis and Generation
- Slides
- Reading List:
- ANLP: Chapter 4
- CTD: Morphology
- An Arabic Morphological Analyzer and Generator with Copious Features
- A Morphological Analyzer for Egyptian Arabic
- Team and Project Declaration Due
March 11: Morphological Disambiguation
- Slides
- Reading List:
- Assignment #1 Due
- Assignment #2 Assigned
Week 3
March 23: Arabic Dialect Modeling 1
- Slides
- Reading List:
March 25: Arabic Dialect Modeling 2
- Slides
- Reading List:
Week 4
March 30: Dialectal Arabic Evaluation – Guest Lecture (Dr. Amr Keleg)
- Slides
- Reading List:
- ALDi: Quantifying the Arabic Level of Dialectness of Text
- Arabic Dialect Identification under Scrutiny: Limitations of Single-label Classification
- Estimating the Level of Dialectness Predicts Interannotator Agreement in Multi-dialect Arabic Datasets
- Revisiting Common Assumptions about Arabic Dialects in NLP
April 1: Arabic Syntactic Analysis – Guest Lecture (Prof. Nizar Habash)
- Slides
- Reading List:
- Assignment #2 Due
Week 5
April 6: Educational Arabic NLP
- Slides
- Reading List:
- Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation
- Enhancing Text Editing for Grammatical Error Correction: Arabic as a Case Study
- A Large and Balanced Corpus for Fine-grained Arabic Readability Assessment
- The SAMER Arabic Text Simplification Corpus
- Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection
- LAILA: A Large Trait-Based Dataset for Arabic Automated Essay Scoring
April 8: Team 1 - Presentation / Reading Group
- Slides
Week 6
April 13: Team 2 - Presentation / Reading Group
- Slides
April 15: Team 3 - Presentation / Reading Group
- Slides
Week 7
April 20: Bias and Ethics
- Slides
- Reading List:
April 22: Controlled Natural Language Generation for Morphologically Rich Languages: The Case of Arabic
- Slides
Week 8
April 27:
- Final Presentations and Projects Due