Glossary
Text Normalisation
The process of transforming text into a consistent, standardised form for comparison, storage, or display.
Definition
Text normalisation covers a range of operations, including case standardisation, whitespace cleanup, accent removal, punctuation stripping, and encoding unification. Applying the same operations to every input ensures that text from different sources can be compared, stored, or displayed in a predictable way.
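The operations listed above can be sketched as a short Python pipeline. This is a minimal illustration using only the standard library, not a definitive implementation: the function name normalise_text is hypothetical, the order of the steps is an assumption, and punctuation handling only covers ASCII punctuation (it is replaced with spaces so that hyphenated words do not merge).

```python
import re
import string
import unicodedata

# Translation table mapping each ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalise_text(text: str) -> str:
    """Hypothetical helper: return a comparison-friendly form of `text`."""
    # Encoding unification: NFKD decomposes accented letters and maps
    # compatibility characters (e.g. full-width letters) to plain forms.
    decomposed = unicodedata.normalize("NFKD", text)
    # Accent removal: drop the combining marks left by the decomposition.
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Case standardisation: casefold() is a more aggressive lower().
    folded = without_accents.casefold()
    # Punctuation stripping (ASCII only in this sketch).
    without_punct = folded.translate(_PUNCT_TO_SPACE)
    # Whitespace cleanup: collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", without_punct).strip()


print(normalise_text("  Café   Déjà-Vu!  "))  # cafe deja vu
```

Which steps to apply depends on the use case: accent removal and punctuation stripping lose information, so they suit comparison keys better than text that will be displayed to users.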
Examples
- Use cases: Database deduplication, search indexing, content migration
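As an illustration of the deduplication use case, the sketch below compares records by a normalised key while keeping the first original spelling. The names dedup_key and deduplicate and the sample records are hypothetical, and the key only unifies Unicode form, case, and whitespace; a real system would choose its own rules.

```python
import unicodedata


def dedup_key(value: str) -> str:
    """Hypothetical key: unify Unicode form, case, and whitespace."""
    unified = unicodedata.normalize("NFKC", value)
    # split() with no arguments splits on any whitespace run and drops
    # leading and trailing whitespace, so join() yields single spaces.
    return " ".join(unified.casefold().split())


def deduplicate(records: list[str]) -> list[str]:
    """Keep the first record seen for each normalised key."""
    seen: set[str] = set()
    unique: list[str] = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


names = ["Acme Corp", "  acme   corp ", "ACME CORP", "Widget Ltd"]
print(deduplicate(names))  # ['Acme Corp', 'Widget Ltd']
```

Storing the normalised key alongside the original value, rather than overwriting it, keeps the source text available for display.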
Common Questions
What is the difference between text formatting and text normalisation?
Text formatting usually refers to changes in how text looks, such as capitalisation. Text normalisation is broader: it also covers encoding, whitespace, and structural changes that make text consistent and machine-comparable.