The Anagram Problem: Simplified

When given two strings, the goal is simple: determine whether they are anagrams of each other. An anagram is defined as a string that, when its letters are rearranged, forms another string. Both strings must have the same characters in the same frequency, regardless of the order.

Let’s walk through a clean, structured approach to solving this problem efficiently and then test it properly.

A Real-World Analogy: Organizing a Toolbox

Imagine you have two toolboxes with identical sets of tools. One toolbox, however, has all the tools randomly placed, while the other has them organized by type. The task is to confirm that both boxes contain the exact same tools, even if they’re arranged differently. The challenge is simple: you don’t care about the order, just that the contents match.

This is exactly what we need to do with the strings: check if they contain the same characters in equal amounts, regardless of their order.

Approach 1: Sorting and Comparing

A straightforward solution is to sort both strings and compare them directly. If they are equal, the strings are anagrams.

public bool bIsAnagram(string sSource, string sTarget)
{
    // Step 1: Early exit if lengths differ
    if (sSource.Length ! = sTarget.Length)
    {
      // Different sizes, different strings
        return false; 
    }

    // Step 2: Sort both strings and compare
    char[] acharSource = sSource.ToCharArray();
    char[] acharTarget = sTarget.ToCharArray();

    Array.Sort(acharSource);
    Array.Sort(acharTarget);

    // Step 3: Check if sorted strings are identical
    for (int iIndex = 0; iIndex < acharSource.Length; iIndex++)
    {
        if (acharSource[iIndex] ! = acharTarget[iIndex])
        {
          // Characters differ, not an anagram
            return false;
        }
    }

    // They're anagrams
    return true; 
}

Note: Instead of manually comparing the sorted arrays with a for loop, you can simplify the comparison by using the SequenceEqual method from the LINQ library. This method checks if two sequences (like arrays) are equal element-by-element.

Here’s how you can replace the loop with SequenceEqual:

return acharSource.SequenceEqual(acharTarget);

You can find more information on the SequenceEqual method in the Microsoft Docs. Using SequenceEqual can make your code more readable and concise.

  • Time Complexity: O(n log n) due to sorting.
  • Space Complexity: O(n) as we store the sorted characters.

Sorting is a simple and intuitive solution, but it can become inefficient as string length increases due to the sorting step.

Approach 2: Counting Characters

Sorting works, but we can do better. A more efficient solution is to count how many times each character appears in both strings. If the counts match, then the strings are anagrams.

public bool bIsAnagram(string sSource, string sTarget)
{
    if (sSource.Length ! = sTarget.Length)
    {
      // Different sizes, not an anagram
      return false;
    }

    // Step 2: Array to count character frequencies
    // For lowercase English letters
    int[] aiCharCounts = new int[26];

    // Step 3: Count character appearances in both strings
    for (int iIndex = 0; iIndex < sSource.Length; iIndex++)
    {
        // Increment for sSource
        aiCharCounts[sSource[iIndex] - 'a']++;
        // Decrement for sTarget
        aiCharCounts[sTarget[iIndex] - 'a']--;
    }

    // Step 4: If counts are balanced, it's an anagram
    foreach (int iCount in aiCharCounts)
    {
        if (iCount ! = 0)
        {
            // Mismatched counts, not an anagram
            return false;
        }
    }

    // Counts match, they're anagrams
    return true;
}

  • Time Complexity: O(n), faster than sorting since we’re just counting characters.
  • Space Complexity: O(1), as the array size is fixed at 26 (for lowercase letters).

This approach avoids sorting and directly compares the “ingredients” (characters) of both strings.

Key Concept: Character-to-Index Conversion

One of the core ideas behind the counting method is using a small, fixed-size array (length 26, for each lowercase letter) to track the frequency of characters in both strings. This is made efficient by mapping each letter to a specific index in the array. Here’s how it works:

In C#, every character is represented by an ASCII value. The lowercase letters “a” through “z” are sequential in the ASCII table, starting from 97 for “a”, 98 for “b”, and so on. By subtracting “a” from any letter, we can easily map it to an index:

  • ‘a’ – ‘a’ = 0 (index for ‘a’)
  • ‘b’ – ‘a’ = 1 (index for ‘b’)
  • ‘c’ – ‘a’ = 2 (index for ‘c’)

Using this conversion, we can efficiently count characters in each string by incrementing or decrementing the corresponding index in the array.

Let’s say s = “abc” and t = “cab”:

We start with an array of size 26, initialized to zeros. This array will track the difference in character counts between s and t.

store = [0, 0, 0, 0, …, 0] // (26 zeros for 26 letters)

Each index in this array represents a letter, where:

  • store[0] corresponds to “a”
  • store[1] corresponds to “b”
  • store[2] corresponds to “c”
  • and so on, up to store[25] for “z”.

We go through each character of s and t at the same time.

Let’s go character by character:

  1. First characters: s[0] = ‘a’ and t[0] = ‘c’
    • For s[0] = ‘a’, we increment store[0] (because ‘a’ is at index 0).
    • For t[0] = ‘c’, we decrement store[2] (because ‘c’ is at index 2).
      Now the array looks like this:
      store = [1, 0, -1, 0, 0, 0, …, 0]
  2. Second characters: s[1] = ‘b’ and t[1] = ‘a’
    • For s[1] = ‘b’, we increment store[1] (because ‘b’ is at index 1).
    • For t[1] = ‘a’, we decrement store[0] (because ‘a’ is at index 0). Now, store[0] goes back to 0.
      Now the array looks like this:
      store = [0, 1, -1, 0, 0, 0, …, 0]
  3. Third characters: s[2] = ‘c’ and t[2] = ‘b’
    • For s[2] = ‘c’, we increment store[2] (because ‘c’ is at index 2). This brings store[2] back to 0.
    • For t[2] = ‘b’, we decrement store[1] (because ‘b’ is at index 1). Now, store[1] goes back to 0.
      Now the array looks like this:
      store = [0, 0, 0, 0, 0, 0, …, 0]

After processing all the characters, if the array contains only zeros, it means both strings have the same characters in the same frequency. In this case, s = “abc” and t = “cab” are anagrams.

Edge Cases – What Could Go Wrong?

  1. Empty Strings: Two empty strings are technically anagrams (since they have no characters).
  2. Different Lengths: If the lengths are different, the strings cannot be anagrams.
  3. Single Characters: “a” and “a” are anagrams, but “a” and “b” are not.

Testing the Solution in Visual Studio

Let’s go through the steps to test the solution in Visual Studio.

  1. Open Visual Studio and create a new Console App project using C#.
  2. Name your project, for example, AnagramChecker.
  1. In the Program.cs file, replace the default code with the bIsAnagram methods we’ve written above.
  2. In the Main method, write test cases to validate the function.
class Program
{
    static void Main(string[] args)
    {
        // Test cases
        string sSource1 = "racecar";
        string sTarget1 = "carrace";

        string sSource2 = "hello";
        string tTarget2 = "world";

        Console.WriteLine($"Are '{sSource1}' and '{sTarget1}' anagrams?{bIsAnagram(sSource1, sTarget1)}");
        Console.WriteLine($"Are '{sSource2}' and '{tTarget2}' anagrams? {bIsAnagram(sSource2, tTarget2)}");
    }

    // Paste IsAnagram method here
    public static bool bIsAnagram(string sSource, string sTarget)
    {
        if (sSource.Length ! = sTarget.Length)
        {
            // Different sizes, not an anagram
            return false;
        }

        // Step 2: Array to count character frequencies
        // For lowercase English letters
        int[] aiCharCounts = new int[26];

        // Step 3: Count character appearances in both strings
        for (int iIndex = 0; iIndex < sSource.Length; iIndex++)
        {
            // Increment for sSource
            aiCharCounts[sSource[iIndex] - 'a']++;
            // Decrement for sTarget
            aiCharCounts[sTarget[iIndex] - 'a']--;
        }

        // Step 4: If counts are balanced, it's an anagram
        foreach (int iCount in aiCharCounts)
        {
            if (iCount ! = 0)
            {
                // Mismatched counts, not an anagram
                return false;
            }
        }

        // Counts match, they're anagrams
        return true;
    }
}

  1. Press F5 to build and run the program.
  2. Check the console output for the results of the test cases.
Console-App-Results-Anagram

You can add more test cases to ensure your solution handles all edge cases properly.

Note: Try adding more test cases, like empty strings or strings with different lengths, to ensure your solution handles all scenarios correctly.

To see how each method performs with larger datasets, you can add benchmarking code using Stopwatch – by adding it to your Main method:

    static void Main(string[] args)
    {
        // Test cases
        string sSource1 = "racecar";
        string sTarget1 = "carrace";

        string sSource2 = "hello";
        string tTarget2 = "world";

        Stopwatch stopwatch = new Stopwatch();

        stopwatch.Start();
        Console.WriteLine($"Are '{sSource1}' and '{sTarget1}' anagrams? {bIsAnagram(sSource1, sTarget1)}");
        stopwatch.Stop();
        Console.WriteLine($"Execution Time: {stopwatch.ElapsedMilliseconds}ms");

        stopwatch.Start();
        Console.WriteLine($"Are '{sSource2}' and '{tTarget2}' anagrams? {bIsAnagram(sSource2, tTarget2)}");
        stopwatch.Stop();
        Console.WriteLine($"Execution Time: {stopwatch.ElapsedMilliseconds}ms");
    }

You will get a result like the following:

Console-App-Results-Anagram

Run the benchmark for both methods with large inputs to compare their performance.

Comparison of Approaches

ApproachTime ComplexitySpace ComplexityProsCons
Sorting MethodO(n log n)O(n)Simple and easy to implementSlow for long strings due to sorting
Counting MethodO(n)O(1)Fast, efficient for large inputsSlightly more complex to implement

Final Thoughts – Finding the Right Approach

In solving the “Is Anagram” problem, it’s important to balance simplicity with performance. The sorting approach is easy to understand but not as efficient as the counting method, which is better suited for larger strings.

As always, understanding why these approaches work is more valuable than memorizing the solutions. Whether you’re sorting characters or counting frequencies, focus on the problem-solving mindset.

Challenge for You

Now that you’ve seen how to solve the anagram problem, try implementing a version that handles case sensitivity or special characters (like punctuation or spaces), or see if you can improve the performance even more!

Bonus: Benchmarking

If you’re curious about how the different methods perform under pressure, try running them on longer strings and measuring execution time. Here’s a hint: the counting method should outperform the sorting one on large inputs!

Similar Posts