Remove duplicates from a string

Given a string of characters. Write code which will remove duplicate characters from the string:

Input: "social.ritambhara.in"
Output: "social.rtmbhn"

Solution:

There can be multiple solutions to this problem each having its own benefits and limitations. lets see some of the ways of removing the duplicates:

Method-1:

Bruite-Force Method (Remove if the character is already present before in the string)

For each character
 - check wither this character is seen earlier in the string or not
    - If Yes, remove the character (duplicate)
Code:

I have not written the code to sort the string (Use quick sort, because the string is of random characters)

    /** Function to remove the duplicate characters from the given string
     */
    void removeDuplicates(char* str)
    {
        if(str == NULL || str[0] == '\0' || str[1] == '\0')
            return;
        for(int i=1; str[i]!= '\0'; i++)
        {
            bool dup = false;
            for(int j=0; j<i; j++)
                if(str[j] == str[i])
                {
                    dup = true;
                    break;
                }
            if(dup)    // Duplicate. Remove the char
            {
                for(int j=i; str[j] != '\0'; j++)
                    str[j] = str[j+1];
                // This is important, else one character will be skipped
                i--;
            }
        }
    }

Time Complexity: In the worst case this algorithm will take O(n2) time Extra Space:     Constant....  O(1)

Obviously this is not a good method. Lets move ahead and discuss better methods:

Next: Method-2: Using Sorting

Method-2: Using Sorting

Sort the string
- Compare each character with the previous one
      and remove it if it is same (duplicate).

Original String : social.ritambhara.in After Sorting     : ..aaaabchiiilmnorrst

Removing the blue characters.

There can be two methods to remove adjacent duplicates,

1. remove one duplicate at a time.

If the input string is "aaaabbccccc" and we are removing the duplicate character 'a'. Then in the first iteration it will become "aaabbccccc" (One character removed) In second iteration it will become  "aabbccccc" and so on...

2. Find total number of duplicates and remove them all in one go.

In "aaaabbccccc". a is repeated 4 times, so 3 duplicates. Hence shift all the characters starting from first 'b' 3 positions left.

Final String : .abchilmnorst

The  problem with this approach is that the relative positions of characters in the original string will get changed.

Time complexity: O(n.lg(n)) - Sorting will take O(n.lg(n)) time. Extra space required: constant.. O(1)

Code:
    /** Function to remove the duplicate characters from the given string
     */
    void removeAdjescentDuplicates(char* str)
    {
        if(str == NULL || str[0] == '\0' || str[1] == '\0')
            return;
        int i=0; // will always point to the first character in the set
        while(str[i]!= '\0')
        {
            int j=i+1;
            while(str[j] != '\0' && str[j] == str[i])
                j++;
            // If end of the string is reached, then stop here.
            if(str[j] == '\0')
            {
                str[i+1] = '\0';
                return;
            }
            // No duplicates
            if(j == i+1)
            {
                i++;
                continue;
            }
            int numDuplicates = j-i-1;
            j = i;
            do
            {
                j++;
                str[j] = str[j+numDuplicates];
            }while(str[j+numDuplicates] != '\0');
            i++;
        }
    }

This method also takes a lot of time. In the next section we discuss a method which will take the least time, but uses some extra memory.

  Next: Metod-3: Using Hashing

Method-3: Using Hashing In this case we will use a separate array (which will act as hash table) to store the number of occurrences of a character in the string.

Step-1: Populate the hash (array) to store the number of times a character appears in the array.
        If original string is "social.ritambhara.in", then the hash array will have
    arr['s'] = 1
    arr['o'] = 1
    arr['c'] = 1
    arr['i'] = 3
    arr['a'] = 4
    arr['l'] = 1
    arr['.'] = 2
    arr['r'] = 2
    arr['t'] = 1
    arr['m'] = 1
    arr['b'] = 1
    arr['h'] = 1
Step-2: Now traverse the array from backward direction and for each character
   - remove the character if its hash count is > 1.
   - Decrease the corresponding hash count.
Code:
    void removeDuplicatesUsingHash(char* str)
    {
        int hash[256] = {0};
        // Populating the hash.
        for(int i=0; str[i] != '\0'; i++)
            hash[str[i]]++;
        // traversing the array from back and
        // removing the characters if its hash value is > 1
        for(int i = strlen(str)-1; i>0; i--)
        {
            // Need to do this here, because str[i] may change in loop below.
            hash[str[i]]--;
            if(hash[str[i]] > 0)
            {
                // Removing the character.
                int j=i;
                for(; str[j] != '\0'; j++)
                    str[j] = str[j+1];
            }
        }
    }
Next: Java Code.. Java Code

Thanks Arun Tomar for providing the code.

Below is the solution with Java Collection perspective. Please see to it if it fine according to the best case solutions.

    public class Ritamtest
    {
        private static String name = "social.ritambhara.in";
        public static void main(String[] arun) {
            removeDuplicate(name);
        }
        private static void removeDuplicate(String str) {
            Set<Character> set = new TreeSet<Character>();
            for (int i = 0; i < str.length(); i++) {
                if (!(set.contains(str.charAt(i)))) {
                    set.add(str.charAt(i));
                }
            }
            for (char cha : set) {
                System.out.println("Set values : " + cha);
            }
        }
    }
Please let me know your feedback / suggestions / comments.

0 Comments

Leave a comment