NLP 8505
Arabic Natural Language Processing
Lectures
Monday: 3:00pm - 4:30pm, Classroom 6
Wednesday: 3:00pm - 4:30pm, Lecture Hall 2
Course Description
This course offers an in-depth introduction to Arabic Natural Language Processing (NLP), focusing on the unique challenges presented by Arabic as a computational object of study. Students will learn about core enabling technologies for NLP with a strong focus on the Arabic language and its dialects. The course will include text normalization, morphological analysis, syntactic parsing, and semantic analysis. The course will integrate theory and hands-on experience, including applied deep learning techniques and practical applications like machine translation, sentiment analysis, and more. By the end of the course, students will be equipped to contribute to advancements in Arabic NLP research and development.
Topics
This course combines theoretical foundations with applied practice, organized around the core components of Arabic NLP. The main topics include:
- Arabic Script and Orthography: Principles of the writing system and orthographic variation.
- Tokenization: Fundamentals of word segmentation for Arabic.
- Arabic Morphology: Morphological structure and processes; computational analysis, generation, and disambiguation methods.
- Dialect Modeling: Representing and processing dialectal Arabic.
- Arabic Resources: Corpora and tools.
- Applications in Arabic NLP: Readability modeling, grammatical error correction, and text rewriting as case studies of end-to-end systems.
Supplemental Material
- Required: Habash, Nizar Y. (2010). Introduction to Arabic natural language processing. Morgan & Claypool Publishers [ANLP].
- Required: Camel Tools Documentation [CTD].
- Required: Selected papers from NLP literature.
Grading
| Percentage | Assessment Component |
|---|---|
| 25% | Assignment 1 |
| 25% | Assignment 2 |
| 50% | Course Project: Team and Project Declaration (10%), Project Related Work and Methodology (10%), Final Report (30%) |
Schedule
Week 1
March 2: Introduction to Arabic NLP, history, challenges
- Slides
- Video
- Reading List:
- ANLP: Chapter 1
- CTD: Overview
- Arabic Computational Linguistics
- Assignment #1 Assigned
March 4: Arabic Script and Orthography
- Slides
- Video
- Reading List:
- ANLP: Chapters 2 and 3
- CTD: Command Line Tools, Utils, Data
- On Arabic Transliteration
Week 2
March 23: Morphological Structure, Analysis and Generation
- Slides
- Video
- Reading List:
- ANLP: Chapter 4
- CTD: Morphology
- An Arabic Morphological Analyzer and Generator with Copious Features
- A Morphological Analyzer for Egyptian Arabic
- Team and Project Declaration Due
March 25: Morphological Disambiguation
Week 3
April 1: Arabic Dialect Modeling 1
April 2: Arabic Dialect Modeling 2
- Slides
- Video
- Reading List:
- Unified Guidelines and Resources for Arabic Dialect Orthography
- A Spelling Correction Corpus for Multiple Arabic Dialects
- Arabic Dialect Identification under Scrutiny: Limitations of Single-label
- ALDi: Quantifying the Arabic Level of Dialectness of Text
- Estimating the Level of Dialectness Predicts Interannotator Agreement in Multi-dialect Arabic Datasets
- Exploiting Dialect Identification in Automatic Dialectal Text Normalization
- Revisiting Common Assumptions about Arabic Dialects in NLP
- The Arabic Generality Score: Another Dimension of Modeling Arabic Dialectness
Week 4
April 6: Arabic Syntactic Analysis – Guest Lecture (Prof. Nizar Habash)
April 8: Educational Arabic NLP
- Slides
- Video
- Reading List:
- Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation
- Enhancing Text Editing for Grammatical Error Correction: Arabic as a Case Study
- A Large and Balanced Corpus for Fine-grained Arabic Readability Assessment
- The SAMER Arabic Text Simplification Corpus
- Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection
- LAILA: A Large Trait-Based Dataset for Arabic Automated Essay Scoring
- Assignment #2 Due
Week 5
April 13: Project Presentations
April 15: Project Presentations
Week 6
April 20: Bias and Ethics
April 22: Current Trends and Outlook
Week 7
April 27:
- Projects Due