Trie

네이버나 구글에서 검색 시 찾고자 하는 검색어의 일부분만 쳐도 추천하는 검색어가 자동완성되는 기능을 많이 접해보셨을 겁니다. 이때 사용하는 알고리즘이 Trie 로서 문자열 길이만큼의 시간복잡도가 소요되어 빠르고 간단합니다. Trie 알고리즘은 Figure 1과 같이 노드를 이용한 트리 형태로 구성되고 트리의 각 노드는 해당하는 문자와 사전형태로 이루어진 자식 노드, 문자열의 끝을 알리는 플래그로 구성됩니다. Hashable한 딕셔너리가 사용되므로 ($O(1)$) 주어진 prefix 길이만큼의 시간복잡도로 prefix를 공유하는 모든 문자열을 파악할 수 있습니다.

Implementation

먼저 트리를 구성할 노드를 정의합니다. 인자로 해당 노드의 문자 (char), 플래그로 사용할 전체문자열 (data)를 받고 자식 노드를 담을 딕셔너리를 생성합니다.

class Node(object):
    def __init__(self, char, data=None):
        self.char = char
        self.data = data
        self.children = dict()

본격적으로 Trie 알고리즘을 구현하면 1) 생성 시에 루트노드를 정의하고, 2) 전체 문자열을 담을 insert 함수와 3) 주어진 prefix로부터 prefix로 시작하는 모든 입력 문자열을 검색할 starts_with 함수가 필요합니다. 먼저 생성자와 insert 함수를 다음과 같이 구현합니다.

class Trie(object):
    def __init__(self):
        self.root = Node(None)

    def insert(self, string):
        current_node = self.root

        for char in string:
            if char not in current_node.children:
                current_node.children[char] = Node(char=char)

            current_node = current_node.children[char]

        ## Last character
        current_node.data = string

생성자 __init__ 함수에는 아무 것도 들어있지 않은 루트 노드를 생성합니다.
insert 함수의 string 인자는 Trie 구조에 담을 전체 문자열로 문자열의 각 문자 순으로 반복문을 수행하면서 자식 노드에 해당 문자가 없으면 노드를 새롭게 생성하고 있다면 해당 자식 노드로 넘어갑니다.
반복문을 다 수행했다면 마지막 노드에 전체 문자열의 끝임을 알리기 위한 플래그를 설정합니다. 여기서는 전체 문자열을 그대로 넣었습니다.

다음으로 특정 prefix가 들어왔을 때 prefix로 시작하는 모든 문자열을 반환하는 starts_with 함수입니다.

    def starts_with(self, prefix):
        current_node = self.root
        words = []

        for p in prefix:
            if p in current_node.children:
                current_node = current_node.children[p]
            else:
                return words

        ## If prefix is a complete word itself
        if current_node.data is not None:
            if len(current_node.children) == 0:
                return [current_node.data]
            else:
                words.append(current_node.data)

        stack = list(current_node.children.values())

        while stack:
            current_node = stack.pop()
            if current_node.data is not None:
                words.append(current_node.data)

            stack.extend(list(current_node.children.values()))

        return words

먼저 prefix로 주어진 부분 문자열을 반복하면서 해당 문자에 맞는 자식 노드로 지속적으로 넘어갑니다. 자식 노드가 존재하지 않는다면 prefix로 시작하는 문자열이 Trie에 존재하지 않는 것이므로 빈 리스트를 리턴합니다.
만약 현재 노드의 플래그가 True이고 (Node.data 속성이 플래그로 사용됩니다.) 자식 노드가 존재하지 않는다면 인자로 주어진 prefix만이 해당되므로 그것을 그대로 리턴하고 자식 노드가 있다면 words 리스트에 prefix를 추가합니다.
다음으로 자식 노드들로 이루어진 스택을 하나 구성하고 스택에서 하나씩 노드를 추출하면서 플래그가 True이면 words 리스트에 그 값을 추가하고 False라면 해당 노드의 자식 노드들을 스택에 추가합니다.

마지막으로는 Trie를 구성할 문자열 리스트를 insert하는 add_data 함수를 만들어주고 결과를 확인합니다.

    def add_data(self, data: list):
        for s in data:
            self.insert(s)

trie = Trie()
trie_list = ['france', 'frodo', 'fire', 'firefox', 'free', 'freedom']
trie.add_data(trie_list)
print(trie.starts_with('free'))
print(trie.starts_with('fr'))

'Theory > Algorithms' 카테고리의 다른 글

몬테카를로 시뮬레이션 (1) - 파이 계산하기 (0)	2022.02.25
Cholesky Decomposition (촐레스키 분해) - 파이썬 구현 (0)	2022.02.20
Sparse Matrix (희소행렬) 구현하기 (0)	2021.07.05
위상 정렬 (Topological Sort) (0)	2021.03.05
자료구조 - 서로소 집합 (0)	2021.03.04

홍러닝

Trie

Implementation

'Theory > Algorithms' 카테고리의 다른 글

티스토리툴바

Trie

Implementation

'Theory > Algorithms' 카테고리의 다른 글

'Theory/Algorithms' Related Articles

티스토리툴바