Convert accented Unicode characters to non-accented in JavaScript
This guide shows how to convert accented Unicode characters in the Vietnamese alphabet to non-accented letters using JavaScript's normalize method, combined with a few regular-expression replacements.
Here’s the JavaScript code to convert all accented Vietnamese letters to non-accented letters:
function removeVietnameseAccents(str) {
  return str.normalize('NFD')          // Decompose letters and accents
    .replace(/[\u0300-\u036f]/g, '')   // Remove accents
    .replace(/đ/g, 'd')                // Convert đ to d
    .replace(/Đ/g, 'D');               // Convert Đ to D
}

// Example usage
const originalStr = "Chào bạn! Hôm nay trời rất đẹp. Đừng quên mang ô nhé.";
const resultStr = removeVietnameseAccents(originalStr);
console.log(resultStr); // Output: "Chao ban! Hom nay troi rat dep. Dung quen mang o nhe."
Detailed Explanation:
- normalize('NFD'): The normalize method decomposes each character into its base letter followed by its combining marks (NFD form). For example, á becomes a followed by the combining acute accent (U+0301).
- replace(/[\u0300-\u036f]/g, ''): This regular expression matches and removes every combining mark in the Unicode range \u0300 to \u036f, which covers the acute, grave, circumflex, and the other accents used in Vietnamese.
- replace(/đ/g, 'd'): Converts the đ character to d. This separate step is needed because đ has no canonical decomposition, so normalize('NFD') leaves it unchanged.
- replace(/Đ/g, 'D'): Converts the Đ character to D, for the same reason.
- const resultStr = removeVietnameseAccents(originalStr);: Calls the function to convert the accented Vietnamese string to its non-accented form.
- console.log(resultStr);: Outputs the converted string to the console.
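To make the decomposition step concrete, here is a small sketch that prints the code points before and after normalize('NFD'). The codePoints helper is a hypothetical name introduced only for this illustration; it is not part of the original function.

```javascript
// Hypothetical helper: list the code points of a string as "U+XXXX" labels.
function codePoints(str) {
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

// "á" as one precomposed character becomes "a" plus a combining acute accent.
console.log(codePoints('\u00e1'));                  // [ 'U+00E1' ]
console.log(codePoints('\u00e1'.normalize('NFD'))); // [ 'U+0061', 'U+0301' ]

// "ệ" carries two marks, so NFD yields the base letter plus two combining marks.
console.log(codePoints('\u1ec7'.normalize('NFD'))); // [ 'U+0065', 'U+0323', 'U+0302' ]

// "đ" has no canonical decomposition, so NFD returns it unchanged.
console.log(codePoints('\u0111'.normalize('NFD'))); // [ 'U+0111' ]
```

Every combining mark produced here falls inside \u0300 to \u036f, which is why a single character class is enough to strip them, while đ and Đ need their own replacements.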
JavaScript Version:
The code requires ES6 (ES2015) or later, the version that introduced the normalize method on strings. The method is supported in modern browsers such as Chrome, Firefox, Edge, and Safari.
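If the code might run in an older environment, one option is to check for normalize before calling it. The sketch below is only an illustration: the name removeVietnameseAccentsSafe and the choice to return the input unchanged when normalize is missing are assumptions, not part of the original code.

```javascript
// Hypothetical wrapper: guards against environments without String.prototype.normalize.
function removeVietnameseAccentsSafe(str) {
  if (typeof String.prototype.normalize !== 'function') {
    // Assumption: without normalize, return the text untouched rather than throw.
    return str;
  }
  return str.normalize('NFD')          // Decompose letters and accents
    .replace(/[\u0300-\u036f]/g, '')   // Remove combining marks
    .replace(/đ/g, 'd')                // đ does not decompose, so map it directly
    .replace(/Đ/g, 'D');               // Same for uppercase Đ
}

console.log(removeVietnameseAccentsSafe('Đừng quên')); // "Dung quen"
```

Returning the original string keeps the caller working in a degraded mode; a project that must support such environments could instead load a polyfill or use a lookup table.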