Natural Language Processing
An introduction to the techniques and Python tools for practitioners who want to get started with Natural Language Processing applications.
Description of the course
The course balances theoretical foundations with practical examples using the Python programming language. No prior experience with libraries such as NLTK or scikit-learn is required. Prior experience with Python is very useful but not required: users of other programming languages and tools (e.g. Java, C++, C#, JavaScript, Matlab, Excel or R) will also find this course beneficial.
Learning objectives
By attending this course, you will learn about:
- Data representations to effectively work with text
- Exploratory techniques to quickly gain insights from your text data
- Machine Learning techniques to organise your documents into categories
- How to evaluate the quality of your models
- Ideas on advanced applications using Natural Language data
Syllabus
Natural Language Processing Foundations
The first section provides the basic tools and techniques to get started with Natural Language Processing.
- Overview of NLP applications and the Python NLP ecosystem – NLTK, spaCy, Gensim, scikit-learn
- Working with text: tokenisation, text pre-processing, regular expressions
- Word frequencies and co-occurrences: stop-words and Zipf's Law, mining topics of interest with co-occurrences
- Text Representation: n-grams, bag-of-words, word embeddings and document embeddings
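As a small taste of the topics above, the following sketch tokenises a sentence, counts word frequencies while ignoring stop-words, and builds bigrams. It uses only the Python standard library; the stop-word list, the sample sentence and the function names are illustrative assumptions, not course material.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "of", "and", "on", "is"}


def tokenise(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count tokens, skipping those found in the stop-word list."""
    return Counter(t for t in tokens if t not in STOP_WORDS)


def ngrams(tokens, n=2):
    """Return the list of consecutive n-token tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


text = "The cat sat on the mat and the cat slept."
tokens = tokenise(text)
freqs = word_frequencies(tokens)
bigrams = ngrams(tokens, 2)

print(freqs.most_common(2))
print(bigrams[:3])
```

Libraries such as NLTK and spaCy provide far more robust tokenisers and stop-word lists; the regular expression here is only a rough approximation.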
Topic Modelling
This section aims to improve our understanding of a document, or a collection of documents, using techniques that go beyond simple word frequencies.
- Bird's-eye view of a document or a dataset
- Navigating topics and sub-topics in a document or a dataset
Text Classification
This section tackles the problem of classifying documents into a set of predefined categories.
- Categorising documents
- Topic Classification
- Sentiment Analysis
- Model evaluation: assessing classification quality
- Model introspection: explaining the classification results
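To make the model evaluation item concrete, here is a minimal sketch of precision, recall and F1 for a binary sentiment classifier. The labels and predictions are made-up toy data, and the function name is hypothetical; in practice a library such as scikit-learn offers equivalent metrics.

```python
def precision_recall_f1(y_true, y_pred, positive="pos"):
    """Compute precision, recall and F1 for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy gold labels and predictions (illustrative data only).
y_true = ["pos", "pos", "neg", "neg", "pos", "neg"]
y_pred = ["pos", "neg", "neg", "pos", "pos", "neg"]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision answers "how many predicted positives were correct", while recall answers "how many actual positives were found"; F1 is their harmonic mean.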
Overview of Advanced Applications
The last section offers an outlook on advanced NLP problems, so delegates are equipped with ideas and techniques to tackle more specific applications. Sample applications include:
- Named Entity Recognition: identifying named entities in text
- Text Summarisation: extracting the most useful sentences from a document or a collection of documents
- Search Engines: building a search engine to retrieve relevant documents from a custom text dataset
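As an illustration of the search engine idea, the sketch below ranks a handful of documents against a query using TF-IDF weights and cosine similarity. It is a toy under stated assumptions: the documents, the smoothed IDF formula and all function names are chosen here for illustration, and a real system would use a dedicated library or search index.

```python
import math
import re
from collections import Counter


def tokenise(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Return one TF-IDF weight dict per document, plus the IDF table."""
    counts_per_doc = [Counter(tokenise(d)) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for counts in counts_per_doc for term in counts)
    # Assumed IDF variant: log(N / df) + 1, so shared terms keep some weight.
    idf = {term: math.log(n / df_t) + 1.0 for term, df_t in df.items()}
    vectors = [{t: c * idf[t] for t, c in counts.items()}
               for counts in counts_per_doc]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs, vectors, idf):
    """Return documents with a non-zero score, best match first."""
    q_counts = Counter(tokenise(query))
    q_vec = {t: c * idf[t] for t, c in q_counts.items() if t in idf}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    return [docs[i] for score, i in sorted(scored, reverse=True) if score > 0]


docs = [
    "Python tools for text processing",
    "Cooking recipes for pasta",
    "Text classification with Python",
]
vectors, idf = build_index(docs)
results = search("python text", docs, vectors, idf)
print(results)
```

Documents that share no terms with the query score zero and are left out of the results.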