NLPA Review of Basic Data Structures and Algorithms


1. language classes

1.1. by machine

1.1.1. finite state automata

1.1.2. pushdown automata

1.1.3. linear bounded automata

1.1.4. turing machines

1.2. by language

1.2.1. regular

1.2.2. context free

1.2.3. context sensitive

1.2.4. phrase structure

1.2.5. (uncomputable)

1.3. by algorithm

1.3.1. recursive descent parsing

1.3.1.1. top-down parser

1.3.1.2. predictive (no backtracking): requires LL(k) grammars; context free, examines only the next k tokens, no left recursion

1.3.1.3. with backup (backtracking): may not terminate at all, and may take exponential time
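As a concrete illustration, a predictive recursive descent parser for a toy expression grammar might be sketched as follows (the grammar `expr -> term ('+' term)*`, `term -> NUM | '(' expr ')'` and all function names are invented for illustration, not taken from the course):

```python
import re

# Sketch of a predictive (LL(1)) recursive descent parser that also
# evaluates the expression. Each grammar rule becomes one function,
# and the parser decides what to do by looking only at the next token.

def tokenize(s):
    # multi-digit numbers, parentheses, and '+'
    return re.findall(r"\d+|[()+]", s)

def parse_expr(tokens, i):
    # expr -> term ('+' term)*
    value, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] == "+":   # one-token lookahead
        rhs, i = parse_term(tokens, i + 1)
        value += rhs
    return value, i

def parse_term(tokens, i):
    # term -> NUM | '(' expr ')'
    if tokens[i] == "(":
        value, i = parse_expr(tokens, i + 1)
        assert tokens[i] == ")", "expected closing parenthesis"
        return value, i + 1
    return int(tokens[i]), i + 1

def evaluate(s):
    tokens = tokenize(s)
    value, i = parse_expr(tokens, 0)
    assert i == len(tokens), "trailing input"
    return value
```

Note that the grammar is written without left recursion (`expr -> expr '+' term` would make `parse_expr` call itself forever), which is exactly the restriction LL(k) grammars impose.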

1.3.2. LR parsers

1.3.2.1. bottom-up parsers

1.3.2.2. handle the deterministic context-free languages

1.3.2.3. run in linear time

2. data structures

2.1. comparison-based

2.1.1. binary trees

2.1.1.1. unbalanced binary trees: worst case O(n) operations

2.1.1.2. worst case O(log n) operations: red-black trees, AVL trees, B-trees

2.1.1.3. amortized O(log n) operations: splay trees, scapegoat trees

2.1.1.4. randomized O(log n) operations: treaps

2.1.2. skiplists

2.1.2.1. create sparser and sparser lists by randomly selecting subsets of elements

2.1.2.2. to search, go along the sparsest list until you've gone too far, then go down to the next level

2.1.2.3. alternatively, a non-randomized, amortized version of this structure can be built
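The search-and-descend procedure can be sketched as follows (a minimal randomized skiplist; the cap of 8 levels and the promotion probability of 1/2 are illustrative choices):

```python
import random

class SkipNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level   # one successor pointer per level

class SkipList:
    MAX_LEVEL = 8

    def __init__(self):
        self.head = SkipNode(None, self.MAX_LEVEL)  # sentinel, in every level

    def _random_level(self):
        # each element is promoted to the next (sparser) level with prob. 1/2
        level = 1
        while level < self.MAX_LEVEL and random.random() < 0.5:
            level += 1
        return level

    def insert(self, key):
        # remember, per level, the last node before the insertion point
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for lvl in range(self.MAX_LEVEL - 1, -1, -1):
            while node.forward[lvl] and node.forward[lvl].key < key:
                node = node.forward[lvl]
            update[lvl] = node
        new = SkipNode(key, self._random_level())
        for lvl in range(len(new.forward)):
            new.forward[lvl] = update[lvl].forward[lvl]
            update[lvl].forward[lvl] = new

    def contains(self, key):
        node = self.head
        # start at the sparsest level; drop down once we'd overshoot
        for lvl in range(self.MAX_LEVEL - 1, -1, -1):
            while node.forward[lvl] and node.forward[lvl].key < key:
                node = node.forward[lvl]
        node = node.forward[0]
        return node is not None and node.key == key
```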

2.2. hashing

2.2.1. hash tables

2.2.1.1. chaining: hash buckets contain lists of elements

2.2.1.2. open addressing: hash buckets contain the elements themselves (or an empty indicator); collisions are handled by probing other buckets

2.2.1.3. issues: resizing, collision handling, iteration
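A minimal sketch of the chaining variant, assuming Python's built-in hash() as the hash function (the class and method names are invented for illustration, and resizing is omitted):

```python
class ChainedHashTable:
    def __init__(self, num_buckets=8):
        # each bucket is a list of (key, value) pairs
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # collision just extends the chain

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default
```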

2.2.2. hash functions

2.2.2.1. desired properties: deterministic; output should be uniform across its range for typical input distributions

2.2.2.2. general issues: hashing numbers, hashing strings

2.2.2.3. related notions: cryptographic hashing, universal hashing, perfect hash functions
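As one concrete example for strings, a simple polynomial hash might be sketched as follows (the base and modulus are illustrative choices, not prescribed values):

```python
def string_hash(s, base=31, modulus=2**31 - 1):
    # polynomial hash: deterministic, and for typical string distributions
    # the output spreads reasonably uniformly over [0, modulus)
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % modulus
    return h
```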

2.2.3. Bloom filters

2.2.3.1. similar to hashing, but hash entries are just 1 bit each, indicating whether that bucket has ever been used

2.2.3.2. if the Bloom filter returns false, the key has definitely not been added

2.2.3.3. if the Bloom filter returns true, the key may have been added, or we may have a false positive

2.2.3.4. repeat with multiple bits per key (multiple hash functions) to reduce the false-positive probability
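The scheme above can be sketched as follows (deriving the multiple hash functions by salting SHA-256 is an illustrative choice, as are the sizes):

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.bits = [False] * num_bits
        self.num_hashes = num_hashes

    def _positions(self, key):
        # derive num_hashes independent-ish bit positions by salting one hash
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % len(self.bits)

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False -> the key was definitely never added
        # True  -> the key was added, or all its bits collide (false positive)
        return all(self.bits[pos] for pos in self._positions(key))
```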

2.3. alphabet-based

2.3.1. tries

2.3.2. radix tree

2.3.3. patricia tree

2.3.4. suffix tree

2.3.5. suffix array

3. algorithms

3.1. sorting

3.1.1. quicksort

3.1.2. mergesort

3.1.3. radix sort

3.2. Boyer-Moore

3.2.1. string search

3.2.2. UNIX tool: fgrep

3.2.3. problem

3.2.3.1. find a string in a text (perhaps repeatedly)

3.2.3.2. input: query and text; output: position of first match

3.2.3.3. can spend time and memory on preprocessing; Boyer-Moore preprocesses the query, not the text

3.2.4. brute force solution

3.2.4.1. try to match at position 0

3.2.4.2. return if the query matches there

3.2.4.3. otherwise advance to the next position and repeat
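The brute force approach can be sketched as:

```python
def brute_force_search(query, text):
    # try every starting position in turn;
    # worst case O(len(text) * len(query)) comparisons
    for start in range(len(text) - len(query) + 1):
        if text[start:start + len(query)] == query:
            return start          # position of first match
    return -1                     # no match
```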

3.2.5. Boyer-Moore idea: when we look at one character in the text, it gives us additional information about positions where the string can't match

3.2.6. NB: you don't need to know the details of the algorithm, but you should know the name, what it does, and what the idea is

3.3. Hunt-McIlroy

3.3.1. longest common subsequence

3.3.2. UNIX tool: diff

3.3.3. problem given two sequences, find the longest common subsequence (not substring)
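The straightforward dynamic-programming solution (not Hunt-McIlroy's algorithm itself, which adds optimizations for the typical diff workload) can be sketched as:

```python
def lcs_length(a, b):
    # dp[i][j] = length of the longest common subsequence of a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1   # extend a common subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]
```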

3.3.4. NB: you don't need to know the details of the algorithm, but you should know the name and what it does

3.4. Levenshtein distance

3.4.1. minimum number of edits necessary to transform one string into another

3.4.2. UNIX tool: agrep

3.4.3. problem: given two input strings, find the minimum number of insertions, deletions, and substitutions needed to transform one into the other

3.4.4. solution: dynamic programming

3.4.5. (already covered in previous classes; you need to know this algorithm well: it is the prototypical dynamic programming algorithm)
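Since this is the prototypical dynamic programming algorithm, a sketch of it may be useful as a reminder (the function name is illustrative):

```python
def levenshtein(a, b):
    # dp[i][j] = minimum number of edits to turn a[:i] into b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i                # delete all i characters of a[:i]
    for j in range(len(b) + 1):
        dp[0][j] = j                # insert all j characters of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(a)][len(b)]
```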