Longest substring with 2 unique chars Algorithm

A few days ago my friend asked me a question, How can we find the longest substring which contains 2 unique characters in linear time without additional memory?

Yeah, we spoke about algorithms 🙂. I suggested to use 2 cursors like start and end and just get substring by these cursors. Generally idea is right, but how can we manage these cursors ? This is quite hard. Maybe not so hard but need some time to think about it.

OK, what do we have ?

1)  we have input string like next abcc;

2)  we should not use additional memory, it means we can store some indexes (like start and end cursors), maybe some additional points, but cannot clone input string or use some data – structure to manage string, I mean trie tree and/or so on.

3) complexity should be linear

OK,  we have all information about algorithm and can start development. Yes, firstly I’ve provided these implementation after that he told me about algorithm Kadane.

Yes, we should modify it, but idea is great, ok lets go to implementation:

import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;
 
/** * * @author */
public class LongestSubstring {
    public String subString(String data) {
 
        if (data == null || data.isEmpty()) {
            return "";
        }
        if (data.length() <= 2) {
            return data;
        }
 
        Set<Character> chars = new LinkedHashSet<Character>();
        String maxString = "";
 
        int start = 0;
        int end;
 
        for (end = 0; end < data.length(); end++) {
            chars.add(data.charAt(end));
            if (chars.size() > 2) // next letter in string
            {
                Character removed = removeFirst(chars);
                String tempString = data.substring(start, end);
                maxString = chooseMaxString(maxString, tempString);
                start = moveStart(start, removed, tempString);
            }
        }
        if (chars.size() <= 2) {
            maxString = chooseMaxString(maxString, data.substring(start, end));
        }
        return maxString;
    }
 
    private String chooseMaxString(String maxString, String tempString) {
        if (maxString.length() < tempString.length()) {
            return tempString;
        }
        return maxString;
    }
 
    private int moveStart(int start, Character removed, String data) {
        int result = data.length() - 2;
        Character lastChar = data.charAt(data.length() - 1);
        Character prevChar = data.charAt(result);
        while (prevChar.equals(lastChar)) {
            prevChar = data.charAt(--result);
        }
        return data.indexOf(removed) < 0 ? start + result : start + result + 1 ; // +1 because we took last char
    }
 
    private Character removeFirst(Set<Character> chars) {
        Character character = null;
        Iterator<Character> iterator = chars.iterator();
        if (iterator.hasNext()) {
            character = iterator.next();
            iterator.remove();
        }
        return character;
    }
}

I am using next variables:

        Set<Character> chars = new LinkedHashSet<Character>();
        String maxString = "";
 
        int start = 0;
        int end;

Instead of Set we can use two Characters, but it’s more readable use Set and store just 2 chars in it.

Start and End these are two cursors I’ve written above and maxString is result string of the our algorithm.

All steps in implementation are easy but I want to draw your attention to method moveStart. This implementation contains simple trick 🙂. But I think all of you can understand it !!!

OK, testing code (maybe there are some problems in code I’ve tested on a few test data)

        LongestSubstring longestSubstring = new LongestSubstring();
        assertEquals("", longestSubstring.subString(null));
        assertEquals("", longestSubstring.subString(""));
        assertEquals("bcbbbbcccb", longestSubstring.subString("abcbbbbcccbdddadacb"));
        assertEquals("a", longestSubstring.subString("a"));
        assertEquals("ab", longestSubstring.subString("ab"));
        assertEquals("ab", longestSubstring.subString("abc"));
        assertEquals("bcc", longestSubstring.subString("abcc"));
        assertEquals("aaaaaaaaaaaaaaab", longestSubstring.subString("aaaaaaaaaaaaaaabcccccc"));
        assertEquals("aaaaaaaaaaaaaaa", longestSubstring.subString("aaaaaaaaaaaaaaa"));

Looks like working in linear time 🙂

Thanks.

Leave a Reply

Your email address will not be published. Required fields are marked *