How to Remove Accents from Unicode Characters in Java

A guide on how to convert accented Unicode characters to non-accented characters in Java using `Normalizer` and regular expressions.

In this article, we will explore how to use the `Normalizer` class in Java to remove accents from Unicode characters, particularly Vietnamese letters. This is useful when strings need to be compared or searched without regard to accents.

Java code:

import java.text.Normalizer;
import java.util.regex.Pattern;

public class RemoveDiacritics {
    public static void main(String[] args) {
        String textWithDiacritics = "Chào mừng bạn đến với Java!";
        String textWithoutDiacritics = removeDiacritics(textWithDiacritics);
        System.out.println(textWithoutDiacritics);
    }

    public static String removeDiacritics(String text) {
        // Normalize the text to NFD form
        String normalized = Normalizer.normalize(text, Normalizer.Form.NFD);
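        // Regular expression that matches combining diacritical marks (U+0300 to U+036F)
        Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
        // Remove the marks, map đ/Đ (which NFD does not decompose) to d/D,
        // then drop any other non-ASCII characters
        return pattern.matcher(normalized).replaceAll("")
                .replace('\u0111', 'd').replace('\u0110', 'D')
                .replaceAll("[^\\p{ASCII}]", "");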
    }
}

Detailed explanation:

  1. `import java.text.Normalizer;` imports the `Normalizer` class, which performs Unicode normalization on strings.
  2. `String textWithDiacritics = "Chào mừng bạn đến với Java!";` declares a sample string containing accented letters.
  3. `Normalizer.normalize(text, Normalizer.Form.NFD)` converts the string to NFD (canonical decomposition), which splits each accented letter into a base letter followed by one or more combining marks.
  4. `Pattern.compile("\\p{InCombiningDiacriticalMarks}+")` creates a regular expression that matches runs of characters from the Combining Diacritical Marks block (U+0300 to U+036F).
  5. `pattern.matcher(normalized).replaceAll("")` removes all of those combining marks, leaving only the base letters.
  6. `replaceAll("[^\\p{ASCII}]", "")` removes any remaining non-ASCII characters. Letters with no canonical decomposition, such as đ (U+0111), are left intact by NFD, so this step deletes them unless they are mapped to ASCII letters beforehand.
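
To make the normalization step more concrete, here is a minimal sketch that prints the code points of a string before and after NFD. The class name `NfdDemo` and the helper `codePoints` are hypothetical names chosen for this illustration and are not part of the article's code.

```java
import java.text.Normalizer;
import java.util.stream.Collectors;

public class NfdDemo {
    // Hypothetical helper: renders each code point of the string as U+XXXX
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String composed = "\u00E0"; // "à" stored as one precomposed code point
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);

        System.out.println(codePoints(composed));   // U+00E0
        System.out.println(codePoints(decomposed)); // U+0061 U+0300 (a + combining grave)

        // "đ" has no canonical decomposition, so NFD returns it unchanged
        String dStroke = Normalizer.normalize("\u0111", Normalizer.Form.NFD);
        System.out.println(codePoints(dStroke));    // U+0111
    }
}
```

After NFD, the accent is a separate character (U+0300) that the regular expression can match, while đ stays a single letter that the regular expression cannot touch.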

System Requirements:

  • Java version 8 or higher

How to install Java:

Download Java from the official Oracle website and follow the installation instructions.

Tips:

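  • This method is not limited to Vietnamese. It works for any language whose accented letters decompose under NFD, such as French, Spanish, or German.
  • Test with representative input strings. Letters that have no decomposition, such as ł or ø, are not converted to a base letter, and the non-ASCII filter deletes them.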
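
As a rough way to check inputs from several languages, the sketch below strips only the combining marks (without the final ASCII filter) and flags results that still contain non-ASCII letters. The class `DiacriticsCheck` and the helpers `stripMarks` and `isAscii` are hypothetical names used for illustration; `stripMarks` is a simplified variant of the article's method, not a copy of it.

```java
import java.text.Normalizer;

public class DiacriticsCheck {
    // Hypothetical helper: NFD-normalize, then delete combining marks only (no ASCII filter)
    static String stripMarks(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Hypothetical helper: true when every character is in the ASCII range
    static boolean isAscii(String text) {
        return text.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        // French, Spanish, German, Polish, and Vietnamese samples,
        // written with Unicode escapes so the source file encoding does not matter
        String[] samples = {
            "caf\u00E9", "Espa\u00F1a", "\u00DCber", "\u0141\u00F3d\u017A", "\u0111\u1EBFn"
        };
        for (String sample : samples) {
            String stripped = stripMarks(sample);
            String note = isAscii(stripped) ? "" : "  (non-ASCII letters remain)";
            System.out.println(sample + " -> " + stripped + note);
        }
    }
}
```

The first three samples come out as plain ASCII (`cafe`, `Espana`, `Uber`). The Polish and Vietnamese samples are flagged because Ł and đ survive the mark removal and need explicit handling.
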
Tags: Unicode, Java

