How to convert Unicode characters with diacritics to non-diacritics in C#

A detailed guide on how to convert Unicode characters with Vietnamese diacritics into non-diacritic characters in C#.

In this article, you will learn how to convert Vietnamese text with diacritics into a string without diacritics in C#. This is useful for searching and comparing Vietnamese text and for creating URL-friendly strings.

C# code:

using System;
using System.Globalization;
using System.Text;

public class Program
{
    public static string RemoveDiacritics(string text)
    {
        // Return null or empty input unchanged
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        // Decompose each accented character into a base character plus combining marks (FormD)
        string normalizedString = text.Normalize(NormalizationForm.FormD);

        // Copy every character except the combining marks (the separated diacritics)
        StringBuilder stringBuilder = new StringBuilder(normalizedString.Length);
        foreach (char c in normalizedString)
        {
            UnicodeCategory unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);
            if (unicodeCategory != UnicodeCategory.NonSpacingMark)
            {
                stringBuilder.Append(c);
            }
        }

        // đ and Đ are standalone letters that FormD does not decompose, so map them explicitly
        stringBuilder.Replace('đ', 'd').Replace('Đ', 'D');

        // Normalize the string back to FormC and return
        return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main(string[] args)
    {
        // Use UTF-8 so the console can display Vietnamese characters
        Console.OutputEncoding = Encoding.UTF8;

        string originalText = "Chào mừng bạn đến với thế giới lập trình C#!";
        string result = RemoveDiacritics(originalText);

        Console.WriteLine("Original string: " + originalText);
        Console.WriteLine("Non-diacritic string: " + result);
    }
}

Detailed explanation:

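  1. using System;, using System.Globalization;, using System.Text;: Import the namespaces the code relies on. System.Globalization provides CharUnicodeInfo and UnicodeCategory; System.Text provides StringBuilder and NormalizationForm.
  2. string normalizedString = text.Normalize(NormalizationForm.FormD);: Converts the string to Unicode normalization form D (canonical decomposition), which splits each accented character into a base character followed by one or more combining marks.
  3. foreach (char c in normalizedString): Loops through each character in the normalized string.
  4. CharUnicodeInfo.GetUnicodeCategory(c): Gets the Unicode category of the character.
  5. if (unicodeCategory != UnicodeCategory.NonSpacingMark): Checks that the character is not a combining mark, which is the category the separated diacritics belong to. If it is not, the character is appended to stringBuilder.
  6. return stringBuilder.ToString().Normalize(NormalizationForm.FormC);: Normalizes the remaining characters back to FormC and returns the string without diacritics.

Note: đ and Đ are separate letters rather than a base letter plus a combining mark, so FormD does not split them and the NonSpacingMark filter leaves them in place. They need an explicit replacement with d and D.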
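
To make the decomposition step more concrete, the following minimal sketch prints the code points a character consists of after FormD normalization. The class name DecompositionDemo and the helper DescribeFormD are hypothetical names chosen for this illustration; they are not part of the article's code.

```csharp
using System;
using System.Linq;
using System.Text;

public static class DecompositionDemo
{
    // Hypothetical helper: formats every UTF-16 code unit of the
    // FormD version of the input as U+XXXX, separated by spaces.
    public static string DescribeFormD(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        return string.Join(" ", decomposed.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        // "ế" (U+1EBF) splits into e + combining circumflex + combining acute.
        Console.WriteLine(DescribeFormD("\u1EBF")); // U+0065 U+0302 U+0301

        // "đ" (U+0111) has no canonical decomposition and comes back unchanged.
        Console.WriteLine(DescribeFormD("\u0111")); // U+0111
    }
}
```

The two combining marks in the first line are what the NonSpacingMark check filters out, leaving only the plain e. The second line shows why đ cannot be handled by normalization alone.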

System Requirements:

  • .NET Core 3.1 or later, or .NET Framework 4.5 or later
  • Visual Studio or .NET CLI

How to install:

  • Install Visual Studio or the .NET SDK from Microsoft's official website.

Tips:

  • When working with Vietnamese text, using this method to convert strings with diacritics to non-diacritics can help with searching, comparison, and creating more URL-friendly strings.
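
As an illustration of the URL use case, here is one possible sketch of a slug helper built on the same technique. The names SlugExample and ToSlug, and the rule of replacing every run of characters outside a-z and 0-9 with a single hyphen, are assumptions made for this example; adapt them to your own URL scheme. The sketch carries its own copy of RemoveDiacritics, including an explicit mapping for đ and Đ, so that it runs on its own.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class SlugExample
{
    // Same technique as in the article: decompose, drop combining marks, recompose.
    public static string RemoveDiacritics(string text)
    {
        var builder = new StringBuilder(text.Length);
        foreach (char c in text.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                builder.Append(c);
            }
        }

        // đ and Đ do not decompose under FormD, so they are mapped explicitly.
        builder.Replace('đ', 'd').Replace('Đ', 'D');
        return builder.ToString().Normalize(NormalizationForm.FormC);
    }

    // Hypothetical helper: lowercases the text, turns every run of characters
    // other than a-z and 0-9 into one hyphen, and trims hyphens at both ends.
    public static string ToSlug(string text)
    {
        string ascii = RemoveDiacritics(text).ToLowerInvariant();
        string slug = Regex.Replace(ascii, "[^a-z0-9]+", "-");
        return slug.Trim('-');
    }

    public static void Main()
    {
        Console.WriteLine(ToSlug("Chào mừng bạn đến với thế giới lập trình C#!"));
        // chao-mung-ban-den-voi-the-gioi-lap-trinh-c
    }
}
```

Because the slug rule drops everything that is not a letter or digit, the "#" in "C#" is lost. If such symbols matter for your URLs, replace them with words before calling the helper.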
