Resources
Welcome to the LingHacks Resources page! Below is a growing collection of natural language processing/computational linguistics-related resources, primarily for students (though many may be applicable to non-students as well). Have a resource you think the community would find helpful? Let us know via this form!
Note as of June 2022: we've moved to a new site! This page will no longer be updated - please see the resource page on the new site for up-to-date collections of resources.
Learn NLP
Before (or while) jumping into NLP-related activities, it might be helpful to learn some introductory material. Below are some free resources that you can use to get up-to-speed on NLP.
Programming
Before learning computational linguistics, it would be beneficial to have a baseline level of familiarity with introductory computer science/programming. The maintainers of this resource page mostly use Python, so that's what this page will be centered around, but NLP resources in other programming languages do exist as well. We also include some references to Python software packages commonly used in NLP.
- Codecademy's Introduction to Python
- LingHacks' own blog post about basic Python
- Official Python language reference
- NumPy package reference
- scikit-learn package reference
- Keras API reference
- PyTorch reference
For a precise understanding of the quantitative workings of NLP systems, it helps to understand some [multivariable] calculus, linear algebra, probability, and statistics. It is possible to understand some basic algorithms with less mathematical background, but these areas of math and statistics become more important as NLP systems--particularly deep learning systems--become more complex. We recommend doing calculus (single-variable then multivariable), followed by linear algebra, followed by probability, followed by statistics (though high school AP-level statistics is helpful and can be done before or at the same time as single-variable calculus), but this is by no means a strict ordering requirement.
- MIT OpenCourseWare's Single-Variable Calculus course
- MIT OpenCourseWare's Multivariable Calculus course
- MIT OpenCourseWare's Linear Algebra course
- MIT OpenCourseWare's Introduction to Probability
- MIT OpenCourseWare's Introduction to Probability and Statistics
- Seeing Theory, a visual introduction to probability and statistics by Brown University
NLP can be considered a subfield of machine learning, so it's beneficial to have a general understanding of the machine learning field before diving into the specifics of applying machine learning to language.
- Andrew Ng's Machine Learning course
- MIT's Introduction to Deep Learning
- Udacity's Introduction to Artificial Intelligence (you'll have to make a free account to access this link)
- fast.ai's Practical Deep Learning for Coders
- Dr. Jason Brownlee's Machine Learning Mastery site
- 3Blue1Brown's 4-Part Video Series on Neural Networks
- Microsoft's PyTorch Tutorial
Understanding linguistics--the science of language--isn't strictly necessary to do many computational linguistics-related things, but we still think it's helpful to grasp some basic terminology and concepts and ground your study of NLP in linguistic principles.
- MIT OpenCourseWare's Introduction to Linguistics
- MIT OpenCourseWare's Introduction to Phonology
- MIT OpenCourseWare's Introduction to Syntax
- MIT OpenCourseWare's Introduction to Semantics and Pragmatics
At last, some courses and reference sites specific to NLP!
- Stanford's Natural Language Processing with Deep Learning course
- Stanford's Natural Language Understanding course
- Professor Dan Jurafsky's Speech and Language Processing textbook
- Natural Language Processing with Dan Jurafsky and Chris Manning
- fast.ai's Code-First Introduction to Natural Language Processing
- Professor Jacob Eisenstein's NLP Notes
Whether you're interested in pursuing a research career or just want to be familiar with some hot topics in NLP, here are some websites where you can read the latest NLP research:
- The Association for Computational Linguistics Anthology (papers from most of the big NLP conferences and journals, including ACL, IJCNLP, TACL, and EMNLP)
- arXiv Computation and Language (preprints of NLP-related papers that may or may not have been peer-reviewed yet)
- Papers With Code NLP Tasks (NLP papers accompanied by code, organized by computational task)
- Connected Papers: a general paper search tool that helps you visually explore papers that are related to each other/to a certain topic
NLP does not exist in a vacuum. Here are some resources that you can use to learn how/why that is and what you can do about it. Some of these resources extend to AI/tech in general as well.
- Oxford Insights' Racial Bias in NLP paper
- Queer in AI's guide on how to make virtual conferences queer-friendly
- Shirin Ghaffary's article on racism in hate speech detection algorithms
- Blodgett et al.'s "Language (Technology) is Power" paper
- AllenNLP's Fairness Module
- Field et al.'s survey of race, racism, and anti-racism in NLP
- Blodgett et al.'s paper on pitfalls in fairness benchmark datasets
- Some proceedings of the ACM Conference on Fairness, Accountability, and Transparency
- Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing
- Hutchinson et al.'s paper on the harms of NLP toward disabled people
- Professor Ruha Benjamin's book, Race After Technology
- Professor Safiya Noble's book, Algorithms of Oppression
- Caliskan et al.'s paper on human-like biases in machine learning algorithms (source: Race After Technology by Professor Ruha Benjamin)
- Data & Society Research Institute's Algorithmic Accountability Primer (source: Race After Technology by Professor Ruha Benjamin)
- Allied Media Conference (source: Race After Technology and Anti-Racism Daily)
NLP Activities
You're probably here to ultimately add to your résumé, so here's how you can do that.
- North American Computational Linguistics Olympiad - open to US & Canada-based 6th-12th grade students, happens every January
- AI4ALL Summer Programs - open to high school students (exact demographics and grade levels vary by site); the organization also periodically hires staff to support student programs
- Johns Hopkins Center for Language and Speech Processing Workshops - open to undergraduate+ students (may or may not happen each summer)
- Stanford Center for the Study of Language and Information summer internship - open to undergraduate students
- National Science Foundation Research Experiences for Undergraduates, CS opportunities
- Companies that do NLP work that may be hiring (note: we don't endorse any of these companies)
- Start your own inclusive CS initiative that may or may not be related to NLP - check out NCWIT's AspireIT Toolkit for some guidance
- And of course, LingHacks (all info on this site)! We host hackathons, provide resources for you to start clubs, and host/partner to host workshops in NLP.