How to convert Unicode characters with diacritics to non-diacritics in C#
A detailed guide on how to convert Unicode characters with Vietnamese diacritics into non-diacritic characters in C#.
In this article, you will learn how to convert Vietnamese text with diacritics into a string without diacritics in C#. This is useful for accent-insensitive searching and comparison, and for creating URL-friendly strings.
C# code:
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public class Program
{
    public static string RemoveDiacritics(string text)
    {
        // Decompose each accented character into a base letter plus combining marks (FormD)
        string normalizedString = text.Normalize(NormalizationForm.FormD);

        // Keep every character except the combining (non-spacing) marks
        StringBuilder stringBuilder = new StringBuilder();
        foreach (char c in normalizedString)
        {
            UnicodeCategory unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);
            if (unicodeCategory != UnicodeCategory.NonSpacingMark)
            {
                stringBuilder.Append(c);
            }
        }

        // "đ" and "Đ" have no canonical decomposition, so FormD leaves them intact; map them explicitly
        stringBuilder.Replace('đ', 'd').Replace('Đ', 'D');

        // Recompose the remaining characters to FormC and return
        return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main(string[] args)
    {
        string originalText = "Chào mừng bạn đến với thế giới lập trình C#!";
        string result = RemoveDiacritics(originalText);
        Console.WriteLine("Original string: " + originalText);
        Console.WriteLine("Non-diacritic string: " + result);
    }
}
Detailed explanation:
- using System;, using System.Globalization;, using System.Text;, using System.Text.RegularExpressions;: Import the namespaces needed for string handling and Unicode character information. System.Text.RegularExpressions is not used by this example and can be omitted.
- string normalizedString = text.Normalize(NormalizationForm.FormD);: Converts the string to Unicode normalization form D (canonical decomposition), which separates the diacritics from their base characters.
- foreach (char c in normalizedString): Loops through each character in the normalized string.
- CharUnicodeInfo.GetUnicodeCategory(c): Gets the Unicode category of the character.
- if (unicodeCategory != UnicodeCategory.NonSpacingMark): Checks whether the character is a non-spacing (combining) mark, the category that decomposed diacritics fall into. Any other character is appended to stringBuilder.
- return stringBuilder.ToString().Normalize(NormalizationForm.FormC);: Normalizes the result back to FormC and returns the string without diacritics.
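To make the effect of FormD concrete, the following minimal sketch lists the code points of a string before and after decomposition. The class name NormalizationDemo and its two methods are hypothetical helpers introduced only for this illustration; they are not part of the article's program.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper class, used only to inspect what FormD produces.
public static class NormalizationDemo
{
    // Formats each UTF-16 code unit of the string as U+XXXX, separated by spaces.
    public static string CodePoints(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    // Returns the code points of the input after canonical decomposition (FormD).
    public static string Decomposed(string s)
    {
        return CodePoints(s.Normalize(NormalizationForm.FormD));
    }
}
```

For example, NormalizationDemo.Decomposed("\u1EBF") (the letter "ế") yields "U+0065 U+0302 U+0301": the base letter "e" followed by a combining circumflex and a combining acute accent, both of which are non-spacing marks. In contrast, NormalizationDemo.Decomposed("\u0111") (the letter "đ") yields "U+0111" unchanged, because that letter has no canonical decomposition. Filtering non-spacing marks alone therefore does not turn "đ" into "d".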
System Requirements:
- .NET Core 3.1 or later, or .NET Framework 4.5 or later
- Visual Studio or .NET CLI
How to install:
- Install Visual Studio or the .NET SDK from Microsoft's official website.
Tips:
- Stripping diacritics from Vietnamese text helps with accent-insensitive searching and comparison, and with building URL-friendly strings.
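The uses mentioned in the tip can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: SlugHelper, ToSlug, and ContainsIgnoringDiacritics are hypothetical names, and the class repeats a compact variant of RemoveDiacritics so that it is self-contained. That variant also maps "đ" and "Đ" by hand, since FormD does not decompose them.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class illustrating slug creation and accent-insensitive search.
public static class SlugHelper
{
    // Compact variant of the FormD approach, repeated here so the sketch stands alone.
    public static string RemoveDiacritics(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        // U+0111 (đ) and U+0110 (Đ) do not decompose under FormD, so map them explicitly.
        sb.Replace('\u0111', 'd').Replace('\u0110', 'D');
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Lowercases, strips diacritics, and collapses every run of characters
    // outside a-z and 0-9 into a single hyphen.
    public static string ToSlug(string text)
    {
        string ascii = RemoveDiacritics(text).ToLowerInvariant();
        string hyphenated = Regex.Replace(ascii, "[^a-z0-9]+", "-");
        return hyphenated.Trim('-');
    }

    // Accent-insensitive, case-insensitive substring search.
    public static bool ContainsIgnoringDiacritics(string haystack, string needle)
    {
        return RemoveDiacritics(haystack)
            .IndexOf(RemoveDiacritics(needle), StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

With this sketch, SlugHelper.ToSlug("Chào mừng bạn đến với thế giới lập trình C#!") returns "chao-mung-ban-den-voi-the-gioi-lap-trinh-c", and SlugHelper.ContainsIgnoringDiacritics("Lập trình C#", "lap trinh") returns true. Note that the slug rule also drops symbols such as "#", which may or may not be what a given site wants.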