Coding Interview University
原先我為了成為一個軟體工程師而建立這份簡單的讀書主題清單(To-do list), 但這份 To-do list 隨著時間而膨脹成這個樣子。 做完這份 To-do list 上的每個目標後,我成為了 Amazon 的工程師! 你或許不需要像我讀一樣多。但是,每個讓你成為一位稱職工程師所需要的知識都在這裡了。
我每天讀 8~12 小時的書,這樣持續了好幾個月。這是我的故事:為什麼我為了 Google 面試而讀了 8 個月
在這份 To-do list 內的主題會讓你擁有足夠的知識去面對幾乎每家軟體公司的專業面試, 這些公司包含了科技巨獸,例如 Amazon、Facebook、Google,或者是 Microsoft。
祝你好運!
翻譯:
正在翻譯的項目:
What is it?
這是我為了從一個網頁開發者(自學,並且沒有任何與資工、電腦科學有關的學位),成為一個大公司軟體工程師,持續好幾個月的讀書計畫。
這是為了那些新手軟體工程師,或者是那些想要轉換跑道,從軟體/網頁開發者轉為軟體工程師(需要資工、電腦科學的知識)的人。
請注意就算你有多年的軟體/網頁開發經驗,那些著名的大型軟體公司,像是 Google、Amazon、Facebook,或是 Microsoft 事實上把軟體/網頁開發(Software/Web Development)與軟體工程(Software Engineering)視為不同,而後者需要的是電腦科學/資訊工程的知識。
如果你想成為一個可靠的工程師或者是 Operation Engineer,閱讀並且學習更多這份清單中的 The Optional List(裡面包含網路與資訊安全的知識)。
目錄
- What is it?
- Why use it?
- How to use it
- 不要覺得自己不夠聰明
- 關於影片資源
- 面試過程&面試準備
- 面試時專精一種程式語言
- 書單
- 在你開始之前
- 這份清單沒有包含的內容
- 先備知識
- 每日計畫
- 演算法複雜度 / Big-O / 漸進分析
- 資料結構
- 更多
- 樹狀結構(Tree)
- Trees-重點與背景知識
- 二元搜尋樹
- Heap / Priority Queue / Binary Heap
- 平衡搜尋樹(基本概念,非細節)
- 遍歷:前序、中序、後序、BFS、DFS
- 排序
- 選擇排序
- 插入排序
- 堆積排序
- 快速排序
- 合併排序
- 圖
- 有向圖
- 無向圖
- adjacency matrix
- adjacency list
- 遍歷: 廣度優先(BFS), 深度優先(DFS)
- 更多知識
- 系統設計、可擴充性、資料處理 (如果你有四年以上的經驗)
- 總複習
- 解題練習
- 解題練習/挑戰
- 面試前夕
- 你的履歷
- 想想面試時可能的狀況
- 一旦你得到工作
---------------- 以下皆為選讀 ----------------
額外資源
- 選修書籍
- 額外學習
- 編譯器
- Emacs and vi(m)
- Unix 命令列工具
- 資訊理論
- 同位位元 & 漢明碼
- Entropy
- 密碼學
- 壓縮
- 電腦安全
- 垃圾回收
- 平行計算
- 通訊、序列化以及佇列系統
- A*搜尋演算法
- 快速傅立葉轉換
- 布隆過濾器
- HyperLogLog
- Locality-Sensitive Hashing
- van Emde Boas Trees
- Augmented Data Structures
- 平衡樹
- AVL 樹
- 伸縮樹 Splay tree
- 紅黑樹
- 2-3 搜尋樹
- 2-3-4 樹(aka 2-4 樹)
- N-ary (K-ary, M-ary)樹
- B 樹
- k-D 樹
- Skip lists
- Network Flows
- Disjoint Sets & Union Find
- 快速處理的數學
- Treap
- 線性程式設計
- 幾何、Convex hull
- 離散數學
- 機器學習
- 某些主題的額外知識
- 影片系列
- 電腦科學課程
- 論文
Why use it?
當我開始這項計畫的時候,我不知道 Stack 與 Heap 的差別,不知道時間複雜度(Big-O),不知道樹狀結構(Tree),也不知道如何遍歷一個圖(Graph)。過去如果我需要寫一個排序演算法(Sorting Algorithm),那個 code 一定是個災難。我過去都用程式語言中內建的資料結構(Data Structure),對於資料結構裡面的實作方法跟原理我完全沒有任何的概念。除非我的程式碰到了”out of memory”的錯誤我才會去找解決方法,否則我從不特別去花費心思管理程式中的記憶體配置。雖然我有用過多維陣列(Multidimensional Arrays)跟關聯陣列(Associative Arrays),但我從來沒有自己時做過資料結構。
這是個遠大的計畫,或許要花上你數個月的時間。如果你對其中大部分的東西已經很熟悉的話,那麼執行這項計畫所花費的時間將減少許多。
How to use it
下面每項是大綱,你需要從上到下的去理解這些大綱。
我用了 Github-flavored markdown 語法,其中包含了可以確定完成進度的任務清單。
建立一個新的 Branch 以使用 Github-flavored markdown 的勾選功能。只要在[]中打 x,像是: [x]
Fork一個branch,並且跟隨以下的指令
git clone git@github.com:<your_github_username>/coding-interview-university.git
git checkout -b progress
git remote add jwasham https://github.com/jwasham/coding-interview-university
git fetch --all
在你完成了一些目標後,在框框中打x
git add .
git commit -m "Marked x"
git rebase jwasham/main
git push --set-upstream origin progress
git push --force
Don’t feel you aren’t smart enough(不要覺得自己不夠聰明)
- 大多數成功的軟體工程師都非常聰明,但他們都有一種覺得自己不夠聰明的不安全感。
- The myth of the Genius Programmer(天才 Programmer 的迷思)
- It’s Dangerous to Go Alone: Battling the Invisible Monsters in Tech(不要單打獨鬥:面對科技中的隱形怪物)
About Video Resources(關於影片資源)
有些影片要註冊 Coursera 或者 Edx 的課程後才能觀看,也就是所謂的 MOOCs。有時候某些課程需要等待好幾個月才能註冊,這期間你無法觀看這些課程的影片。
我非常喜歡那些大學的線上課程。感謝你們幫忙加入一些免費、可隨時觀看的公開資源,像是那些線上課程的YouTube影片。
Interview Process & General Interview Prep(面試過程&面試準備)
- ABC: Always Be Coding
- Whiteboarding
- Demystifying Tech Recruiting
- 如何錄取 Big Tech(Google, Amazon, Facebook, Apple):
- Coding 面試解密:
- Facebook Coding 面試解密:
- 準備課程:
- Software Engineer Interview Unleashed (paid course):
- 從前 Google 面試官身上學習如何充實自己,讓自己能夠應付軟體工程師的面試。
- Python for Data Structures, Algorithms, and Interviews (paid course):
- Python 面試準備課程,其中包含了資料結構、演算法、模擬面試等等。
- Intro to Data Structures and Algorithms using Python (Udacity free course):
- Python 免費資料結構及演算法課程。
- Data Structures and Algorithms Nanodegree! (Udacity paid Nanodegree):
- 超過 100 種實際的資料結構及演算法練習。名師指導讓你準備好面試以及工作的實際情況。
- Software Engineer Interview Unleashed (paid course):
面試時專精一種程式語言(Pick One Language for the Interview)
在面試的 coding 階段,你可以選擇任何一個你擅長的程式語言。但多數大公司僅有以下選擇:
- C++
- Java
- Python
你也可以選擇以下的程式語言,但可能會有某些限制:
- JavaScript
- Ruby
我之前寫過一篇關於在面試時選擇程式語言的文章:Pick One Language for the Coding Interview
你需要非常熟練這個程式語言,並且對他非常了解。
閱讀更多有關程式語言的選擇:
- http://www.byte-by-byte.com/choose-the-right-language-for-your-coding-interview/
- http://blog.codingforinterviews.com/best-programming-language-jobs/
因為我正在學習 C、C++以及 Python,所以下面會出現一些有關於這些程式語言的資源。
書單(Book List)
為了節省你的時間,以下是已經縮減過的書單。
面試準備(Interview Prep)
- Programming Interviews Exposed: Coding Your Way Through the Interview, 4th Edition
- 附有解答 in C++ and Java
- 內含很棒的 coding 面試解密
- 不會很困難,大多問題都比面試中的還簡單(從我讀過的)
- Cracking the Coding Interview, 6th Edition
- 附有解答 in Java
如果你有額外的時間(If you have tons of extra time):
選擇以下其中一個:
- Elements of Programming Interviews (C++ version)
- Elements of Programming Interviews in Python
- 程式面試精華 (Java 版)
程式語言精進
面試時你需要選擇一種程式語言(詳如上述)
以下是一些我對程式語言的建議。這邊沒有所有種類程式語言的資源,所以歡迎補充。
如果你讀過以下其中一本,你應該已經具備了所有解決 coding 問題所需要的資料結構與演算法的知識。除非你想要複習,否則你可以跳過這個計畫中所有的教學影片。
C++
我沒讀過這兩本書,但他們頗受好評。作者是 Sedgewick,他超讚的!
- Algorithms in C++, Parts 1-4: Fundamentals, Data Structure, Sorting, Searching
- Algorithms in C++ Part 5: Graph Algorithms
如果你有更好的 C++書籍,請告訴我。我正在蒐集全面性的資源。
Java
- Algorithms (Sedgewick and Wayne)
- 在 Coursera 平台上有影片、書籍內容、(以及 Sedgewick!)
或者:
- Data Structures and Algorithms in Java
- 作者:Goodrich、Tamassia、Goldwasser
- 被作為 UC Berkeley 資工系入門課程的補充教材
- 看看下面我對這本書的 Python 版的書評。兩本書都包含了相同的主題。
Python
- Data Structures and Algorithms in Python
- 作者:Goodrich、Tamassia、Goldwasser
- 我超愛這本書。他包含了所有東西。
- 很 Python 的 Code!
- 我的書評: https://startupnextdoor.com/book-report-data-structures-and-algorithms-in-python/
在你開始之前
這份清單隨著時間越來越大。當然,這也同時代表我越來越難以掌握他的整體內容。
以下是一些清單內的錯誤,希望能讓你避免這些錯誤,並且有更好的學習體驗。
1. 你沒辦法記住所有事情
我看了數小時的影片,同時也寫下了大量的筆記。但過了幾個月後,大部分的東西都消失的無影無蹤。我花了三天重新看過我的筆記,並做了小字卡幫助我複習他們。
請閱讀以下的文章以免重蹈覆轍:
Retaining Computer Science Knowledge.
有人推薦給我的課程(但我還沒看過:( ): Learning how to Learn
2. 使用小字卡
為了解決剛剛提到的遺忘問題,我自己寫了一個小字卡網站。網站上可以新增兩種小字卡,一般的以及程式碼。 每一種類的小字卡都有不同的格式。
這個小字卡網站在製作時便是以行動裝置優先的方式設計的,好處是無論我在何處,我都可以在我的手機與平板上複習。
製作屬於自己的免費小字卡:
- Flashcards site repo
- My flash cards database (old - 1200 cards):
- My flash cards database (new - 1800 cards):
我的小字卡資料庫中包含了組合語言、Python 的小知識、機器學習以及統計。這些內容已經超出了原本他的預設。
關於小字卡:當你第一次知道答案後,別馬上把那張小字卡標記為已知。反覆複習這張小字卡,直到每次都能答對後才是真正學會了這個問題。反覆的動作會讓這個知識深深地烙印在你的腦海內。
這裡有個替代我小字卡的網站Anki,很多人向我推薦過他。這個網站用同一個字卡重複出現的方式讓你牢牢地記住他。 這個網站非常容易使用,支援多平台,並且有雲端同步功能。在 iOS 平台上收費 25 美金,其他平台免費。
這是我用 Anki 這個網站裡的格式所儲存的小字卡資料庫: https://ankiweb.net/shared/info/25173560 (感謝 @xiewenya)
3. 學習資料結構與演算法的同時,也要做一些 Coding 面試中常出現的問題
把你學過的東西應用在解題上面,否則你很快就會忘了他們。這是一個過來人的經驗談。一旦你自認學會了一個主題,像是 Linked List 之類的,打開任何一本 Coding 面試問題書籍,做一些裡面有關 Linked List 的問題。接著繼續讀後面的主題。然後,再回頭反覆做有關 Linked List、遞迴或者其他任何東西(原文為 Recursion,非 Recursive)的題目。但切記一定在讀這些資料結構、演算法的同時,也要實際去寫一些有關這些東西的題目。公司錄取你是為了能有即戰力能夠上戰場,而非一個紙上談兵的人。這邊我覺得還不錯的書籍和網站。更多: Coding Question Practice
4. 複習,複習,再複習
我自己寫了一些有關於 ASCII Code、網路 OSI 模型、Big-O(時間複雜度)等等的小抄。我有空的時候就會把他們拿出來看一看複習一下。
打 Code 累了的話就休息半個小時,並且複習你的小字卡。
5. 專注
能夠干擾你,浪費你寶貴時間的東西很多。因此,專注集中精神實在很難。放點純音樂能幫上一些忙。
這份清單沒有包含的內容
以下為普遍但沒有包含在這份清單內的技術:
- SQL
- Javascript
- HTML、CSS,以及其他前後端的技術
每日計畫
每個主題所花費的時間都不盡相同,有些只要一天,有些需要花上數天。有些主題只有單純的知識而無包含實作。
每天我選擇下面其中一個主題,看跟該主題相關的影片,再用下面的程式語言實作:
- C - 用使用了 struct *或者其他東西當作參數的 struct 以及函數
- C++ - 不要使用內建的東西
- C++ - 用 C++內建的東西,像是 STL 的 Linked List,std::list。
- Python - 使用內建的東西(為了練習 Python)
- 寫一些測試來驗證自己寫的東西是正確的,像是用 assert()等簡單的方法。
- 你也可以用 Java 來練習,上面只是我自己的方法。
你不需要學會所有的程式語言,你只需要專精在某個程式語言 one language for the interview.
為什麼要這樣寫 Code?
- 練習,練習,再練習,直到我對他產生厭惡感,並且能輕鬆無誤地寫出那些 Code。(有些東西需要特別記住,像是在邊界的時候會出現問題(edge cases),或者一些小細節)
- 全部自己來(像是手動分配/釋放記憶體,不要依賴語言中的 garbage collection 的功能(除了 Python 或者 Java))
- 利用語言中內建的東西及工具,之後在實際工作的時候才能得心應手(畢竟我不想在工作時手刻一個 Linked List)。
我沒有時間做每個主題中的每個東西,但我會盡力而為。
下面是我自己寫的程式碼:
你不需要記住每個演算法裡面的內容。
試試看把程式碼寫在白板或者紙上而不是電腦上。接著用一些測資來測試他。最後才用電腦來驗證。
先備知識
- 學習 C
- C 語言無所不在。在你學習的過程中,幾乎任何一本書、課程,或者影片中你都能看到他的身影。
- C Programming Language, Vol 2
- 這本書還滿輕薄的,但他能讓你有初步對於 C 語言的認識。看著這本書並且練習,你能更快地掌握 C 語言。理解 C 語言能讓你更了解程式的運作以及內部記憶體配置。
- answers to questions
- 一個程式在電腦中是如何運作的:
演算法複雜度(Algorithmic complexity) / Big-O / 漸進分析(Asymptotic analysis)
- 沒有任何東西能實作
- 這個主題有許多影片,看到你真正了解他為止。你可以隨時回來複習他。
- 如果這些課程太過數學的話,你可以去看看最下面離散數學的影片,他能讓你更了解這些數學背後的來源以及原理。
- Harvard CS50 - Asymptotic Notation (video)
- Big O Notations (general quick tutorial) (video)
- Big O Notation (and Omega and Theta) - best mathematical explanation (video)
- Skiena:
- A Gentle Introduction to Algorithm Complexity Analysis
- Orders of Growth (video)
- Asymptotics (video)
- UC Berkeley Big O (video)
- UC Berkeley Big Omega (video)
- Amortized Analysis (video)
- Illustrating “Big O” (video)
- TopCoder (includes recurrence relations and master theorem):
- Cheat sheet
- [Review] Big-O notation in 5 minutes (video)
資料結構
-
陣列
- 實作一個可以自動調整大小的陣列(動態陣列 vector)
- (動態)陣列背後原理:
- 實作動態陣列(可變、可動態調整大小的陣列)
- 練習在程式中用陣列以及指標,透過計算指標而存取該內容,而不是直接用索引。
- 直接動態生成一個新的陣列
- 可以生成一個 int 型別的陣列,但不要使用語言提供的功能
- 從 16,或者更大的數開始寫,像是 2 的次方 - 16、32、64、128。
- size() - 陣列中元素個數
- capacity() - 陣列能存的最大元素個數
- is_empty()
- at(index) - 傳回該索引值的元素,附有邊界檢查(boundary check)
- push(item)
- insert(index, item) - 把元素插入該索引值,把原本在該索引值的元素往右邊移動。
- prepend(item) - 可以把元素插入索引值為 0 的地方。
- pop() - 移除陣列中最後一個元素,並回傳該元素的值。
- delete(index) - 刪除在該索引值的元素,並且把右邊剩下元素全部往左移。
- remove(item) - 從陣列中尋找該數值,並且移除他(就算陣列中數個地方都有這個數值)。
- find(item) - 從陣列中尋找該數值,並且傳回最前面找到該數值的索引值,如果沒有則傳回-1。
- resize(nex_capacity) // private function
- 當陣列已經用盡了所有容量後,把陣列的容量*2。
- 如果移除掉一個元素後,陣列實際大小是最大容量的 1/4,則把陣列容量減半。
- 時間複雜度
- O(1) 在陣列末端插入/刪除元素
- O(n) 在任何地方插入/刪除元素
- 空間複雜度
- 在記憶體中的存放位置是連續的,這種儲存方式有助於存取的性能。
- 所需空間 = (陣列容量,>=n) * 元素所需大小,但就算結果為 2n,實際上仍算成 O(n)
-
Linked Lists
- Linked Lists 背後原理:
- C Code (video) - 沒有完整的 code,裡面只包含了用 struct 實作節點的方式以及其記憶體配置。
- Linked List vs 陣列:
- why you should avoid linked lists (video)
- 小心!: 你需要一些關於指標的指標(Pointer to pointer)的知識: (當你回傳一個指標到函式,這個動作可能會改變指標所指向的地址) 這個頁面僅提供基本對於指標的指標的認識。我不推薦這個遍歷 linked list 的方式,因為他用的方式太過神奇,所以可讀性以及維護性並不好。
- 實作 Linked list (我做了有末端指標(tail pointer)的版本以及無末端指標的版本):
- size() - 回傳 linked list 裡面的元素個數
- empty() - 回傳型態:bool,如果 linked list 為空,回傳 true
- value_at(index) - 回傳索引值為 index 的元素的數值,第一個元素索引值為 0,以此類推
- push_front(value) - 從 linked list 的起始點加入新的元素
- pop_front() - 移除第一個元素,並且回傳該元素的數值
- push_back(value) - 在 linked list 末端加入新元素
- pop_back() - 移除最後一個元素,並且回傳該元素的數值
- front() - 回傳第一個元素的數值
- back() - 回傳最後一個元素的數值
- insert(index, value) - 把新元素插入到該索引值,而新元素指向原本在該索引值的元素。
- erase(index) - 刪除該索引值的元素(節點)
- value_n_from_end(n) - 回傳從末端開始計算的第 n 個元素的數值
- reverse() - 反轉該 linked list
- remove_value(value) - 刪除第一個為該數值的元素(意即 7 2 2 1,要刪除 2 的話,只刪除 index:1 的那個 2)
- 雙向 linked List
- 背後原理(影片)
- 不需實作
-
Stack(堆疊)
- Stacks(影片)
- 使用 Stacks 先進後出(Last-In First-Out)(影片)
- [Review] Stacks in 3 minutes (video)
- 無須實作,可以用陣列實作,但這樣太過簡單了。
-
Queue(佇列)
- 使用 Queues(先進先出)First-In First-Out(影片)
- Queue(影片)
- Circular buffer/FIFO
- Priority Queues(影片)
- [Review] Queues in 3 minutes (video)
- 使用 linked list 實作,包含末端指標(tail pointer):
- enqueue(value) - 在 queue 末端加入元素
- dequeue() - 刪除當時 queue 中最早進入的元素(意即 queue 中第一個元素),並且回傳該元素的值。
- empty()
- full()
-
[ ] 複雜度:
- enqueue: O(1) (平均情況,無論對於用 linked list 或陣列實作的方法)
- dequeue: O(1) (linked list 與陣列)
- empty: O(1) (linked list 與陣列)
-
Hash table(雜湊表)
-
影片:
-
線上開放式課程:
-
實作雜湊表(用陣列以及線性探測(linear probing))
- hash(k, m) - m:雜湊表的大小
- add(key, value) - 如果 key 已經存在,則更新該 key 的 value
- exists(key)
- get(key)
- remove(key)
-
更多
-
二分搜尋法(Binary Search)
- 二分搜尋法(影片)
- 二分搜尋法(影片)
- 細節
- [Review] Binary search in 4 minutes (video)
- 實作:
- 二分搜尋法 (對已經排列好的數列)
- 用遞迴(recursion)的方法實作二分搜尋法
-
位元運算(Bitwise operations)
- Bits cheat sheet - 你應該能背出一些 2 的指數(從 2^1 到 2^16 以及 2^32)
- 實際了解如何用下列的位元運算子來操作每個位元: &, |, ^, ~, >>, <<
- 一補數與二補數
- count set bits
- swap values:
- 絕對值:
樹狀結構(Tree)
-
Trees-重點與背景知識
- Series: Trees (影片)
- 如何建立一棵樹
- 遍歷
- 操作樹的演算法
- BFS(breadth-first search) and DFS(depth-first search) (影片)
- BFS(廣度優先搜尋)重點:
- 每一層的順序(BFS,用 queue)
- 時間複雜度: O(n)
- 空間複雜度: 最佳: O(1), 最糟: O(n/2)=O(n)
- DFS(深度優先搜尋)重點:
- 時間複雜度: O(n)
- 空間複雜度: 最佳: O(log n) - 平均. 樹的高度 最糟: O(n)
- 中序 (DFS: 左子樹、根、右子樹)
- 後序 (DFS: 左子樹、右子樹、根)
- 前序 (DFS: 根、左子樹、右子樹)
- BFS(廣度優先搜尋)重點:
- [Review] Breadth-first search in 4 minutes (video)
- [Review] Depth-first search in 4 minutes (video)
- [Review] Tree Traversal (playlist) in 11 minutes (video)
-
二元搜尋樹 Binary search trees: BSTs
- Binary Search Tree Review (影片)
- Series (影片)
- starts with symbol table and goes through BST applications
- Introduction (影片)
- MIT (影片)
- C/C++:
- 實作:
- insert // 把數值插入到二元搜尋樹當中
- get_node_count // get count of values stored
- print_values // 把二元搜尋樹中的數值從小到大輸出
- delete_tree
- is_in_tree // 如果給定的數值位於二元搜尋樹當中則回傳 true
- get_height // 回傳該節點內的高度(單一節點的樹高度為 1)
- get_min // 回傳二元搜尋樹當中的最小值
- get_max // 回傳二元搜尋樹當中的最大值
- is_binary_search_tree
- delete_value
- get_successor // 回傳二元搜尋樹當中大小在給定數值後一位的數值,如果沒有則回傳-1
-
Heap / Priority Queue / Binary Heap
- 一般將此資料結構以樹的方式視覺化,但實際上是以線性的方式儲存(陣列、linked list)
- Heap
- Introduction (影片)
- Naive Implementations (影片)
- Binary Trees (影片)
- Tree Height Remark (影片)
- Basic Operations (影片)
- Complete Binary Trees (影片)
- Pseudocode (影片)
- Heap Sort - jumps to start (影片)
- Heap Sort (影片)
- Building a heap (影片)
- MIT: Heaps and Heap Sort (影片)
- CS 61B Lecture 24: Priority Queues (影片)
- Linear Time BuildHeap (max-heap)
- [Review] Heap (playlist) in 13 minutes (video)
- 實作 max heap:
- insert
- sift_up - needed for insert
- get_max - returns the max item, without removing it
- get_size() - return number of elements stored
- is_empty() - returns true if heap contains no elements
- extract_max - returns the max item, removing it
- sift_down - needed for extract_max
- remove(i) - removes item at index x
- heapify - create a heap from an array of elements, needed for heap_sort
- heap_sort() - take an unsorted array and turn it into a sorted array in-place using a max heap
- 重點: using a min heap instead would save operations, but double the space needed (cannot do in-place).
排序
-
重點:
- 實作排序,並且了解該排序法在最佳/最糟/平均情況下的複雜度為何:
- 不要用泡沫排序法(bubble sort),那個太糟了 - O(n^2),除非 n<=16
- 排序演算法的穩定性(“快速排序法穩定嗎?“)
- 哪個演算法可以用在 linked list 上?那些可以用在陣列上?或者兩個都可以?
- 我不推薦對一個 linked list 進行排序,但合併排序是可行的.
- 對 linked list 進行合併排序
- 實作排序,並且了解該排序法在最佳/最糟/平均情況下的複雜度為何:
-
有關於堆積排序,可以看看上面有關於堆積的介紹。堆積排序很棒,但不太穩定。
-
UC Berkeley:
-
合併排序程式碼:
-
快速排序程式碼:
-
實作:
- 合併排序: 平均與最糟都是 O(n log n)
- 快速排序: 平均 O(n log n)
- 選擇排序與插入排序平均與最糟的時間複雜度都是 O(n^2)。
- 有關堆積排序,請見上方關於堆積的介紹。
-
非必要,但我建議看一看:
這裡有15 種排序演算法的影片,如果你想對排序演算法有更多的了解,看看Additional Detail on Some Subjects裡的「排序」這個部分
圖
圖在電腦科學中可以用來表示、處理很多問題,所以這個部分就像樹以及排序一樣篇幅很長。
-
重點:
- 有 4 種基本表示圖的方式:
- 物件與指標(objects and pointers)
- adjacency matrix
- adjacency list
- adjacency map
- 請務必了解每種圖的表示法與每種表示法的優點及缺點。
- 廣度優先搜尋(BFS)與深度優先搜尋(DFS) - 知道他們的時間複雜度,他們的優缺點,以及如何實作他們。
- 如果出現一個問題,請優先想想看有沒有辦法用圖的方式解決,如果沒辦法再想想其他方法。
- 有 4 種基本表示圖的方式:
-
MIT(影片):
-
Skiena Lectures - 很棒的入門介紹:
- CSE373 2012 - Lecture 11 - Graph Data Structures (影片)
- CSE373 2012 - Lecture 12 - Breadth-First Search (影片)
- CSE373 2012 - Lecture 13 - Graph Algorithms (影片)
- CSE373 2012 - Lecture 14 - Graph Algorithms (con’t) (影片)
- CSE373 2012 - Lecture 15 - Graph Algorithms (con’t 2) (影片)
- CSE373 2012 - Lecture 16 - Graph Algorithms (con’t 3) (影片)
-
圖(複習以及進階知識):
- 6.006 Single-Source Shortest Paths Problem (影片)
- 6.006 Dijkstra (影片)
- 6.006 Bellman-Ford (影片)
- 6.006 Speeding Up Dijkstra (影片)
- Aduni: Graph Algorithms I - Topological Sorting, Minimum Spanning Trees, Prim’s Algorithm - Lecture 6 (影片)
- Aduni: Graph Algorithms II - DFS, BFS, Kruskal’s Algorithm, Union Find Data Structure - Lecture 7 (影片)
- Aduni: Graph Algorithms III: Shortest Path - Lecture 8 (影片)
- Aduni: Graph Alg. IV: Intro to geometric algorithms - Lecture 9 (影片)
-
CS 61B 2014 (starting at 58:09) (影片) - CS 61B 2014: Weighted graphs (影片)
- Greedy Algorithms: Minimum Spanning Tree (影片)
- Strongly Connected Components Kosaraju’s Algorithm Graph Algorithm (影片)
- [Review] Shortest Path Algorithms (playlist) in 16 minutes (video)
- [Review] Minimum Spanning Trees (playlist) in 4 minutes (video)
-
完整 Coursera 課程:
-
我將實作:
- 深度優先搜尋搭配 adjacency list(遞迴 recursive)
- 深度優先搜尋搭配 adjacency list(疊代+stack)
- 深度優先搜尋搭配 adjacency matrix (遞迴 recursive)
- 深度優先搜尋搭配 adjacency matrix(疊代+stack)
- 廣度優先搜尋搭配 adjacency list
- 廣度優先搜尋搭配 adjacency matrix
- single-source shortest path (Dijkstra)
- minimum spanning tree
- DFS-based algorithms (看看上面 Aduni 的影片):
- check for cycle (needed for topological sort, since we’ll check for cycle before starting)
- topological sort
- count connected components in a graph
- list strongly connected components
- check for bipartite graph
更多知識
-
遞迴 Recursion
- Stanford 課程-遞迴 recursion 與回溯法 backtracking:
- 什麼時候適合用這些
- tail recursion 沒有比較好?
-
動態規劃
- 在你的面試中或許沒有任何動態規劃的問題,但能夠知道一個題目可以使用動態規劃來解決是很重要的。
- 動態規劃很難,因為動態規劃的題目通常都要有遞迴關係。要想到他的解法有時需要天外飛來一筆的想法。
- 我建議你可以多看一些動態規劃的題目,這可以讓你對這類型的問題以及他的長相有更多的理解。
- 影片:
- Skiena 的影片有點難跟上,他有時候用白板,寫的字又很小。
- Skiena: CSE373 2012 - Lecture 19 - Introduction to Dynamic Programming (影片)
- Skiena: CSE373 2012 - Lecture 20 - Edit Distance (影片)
- Skiena: CSE373 2012 - Lecture 21 - Dynamic Programming Examples (影片)
- Skiena: CSE373 2012 - Lecture 22 - Applications of Dynamic Programming (影片)
- Simonson: Dynamic Programming 0 (starts at 59:18) (影片)
- Simonson: Dynamic Programming I - Lecture 11 (影片)
- Simonson: Dynamic programming II - Lecture 12 (影片)
- List of individual DP problems (每一部都很短): Dynamic Programming (v 影片 ideo)
- Yale 課程筆記:
- Coursera:
-
物件導向程式設計
- Optional: UML 2.0 Series (影片)
- 重要!物件導向程式設計原則: SOLID Principles (影片)
-
設計模式
-
- 學學這些設計模式:
- strategy
- singleton
- adapter
- prototype
- decorator
- visitor
- factory, abstract factory
- facade
- observer
- proxy
- delegate
- command
- state
- memento
- iterator
- composite
- flyweight
- Chapter 6 (Part 1) - Patterns (影片)
- Chapter 6 (Part 2) - Abstraction-Occurrence, General Hierarchy, Player-Role, Singleton, Observer, Delegation (影片)
- Chapter 6 (Part 3) - Adapter, Facade, Immutable, Read-Only Interface, Proxy (影片)
- Series of videos (27 部影片)
- Head First Design Patterns
- 我知道”Design Patterns: Elements of Reusable Object-Oriented Software”才是這方面的經典書籍,但深入淺出系列對於物件導向的初學者可能是更好的選擇。
- Handy reference: 101 Design Patterns & Tips for Developers
- Design patterns for humans
- 學學這些設計模式:
-
組合數&機率
-
NP、NP-Complete 以及近似演算法
- 了解一些著名的 NP-Complete 問題,像是旅行推銷員問題(traveling salesman problem)和背包問題(knapsack problem),並要能夠在面試官問類似這些經典範例的問題時辨認出他們。
- 了解 NP-complete 的方法。
- 計算複雜性理論(影片))
- Simonson:
- Skiena:
- 複雜性: P、NP、NP-completeness、Reductions(影片)
- 複雜性: 近似演算法(影片)
- 複雜性: Fixed-Parameter Algorithms(影片)
- Peter Norvig 在影片中提到旅行推銷員問題的近似最佳解:
- 在演算法導論(Introduction to Algorithms)第 1048~1140 頁
-
快取(cache)
-
程序與執行緒(Processes&Threads)
-
Computer Science 162 - 作業系統 (25 部影片): - 程序與執行緒在第 1~11 部影片中 - 作業系統以及系統程式設計(影片)
- 程序與執行緒有什麼差別?
- 涵蓋:
- 探討程序、執行緒與並行性
- 程序與執行緒的差別
- 程序(processes)
- 執行緒(threads)
- 鎖(locks)
- 互斥鎖(mutexes)
- 號誌(semaphores)
- 監視器(monitors)
- 他們如何運作?
- 死鎖(deadlock)
- 活鎖(livelock)
- CPU 活動、中斷、上下文交換(context switching)
- 現代 cpu 並行性由多核處理器實現
- Paging, segmentation and virtual memory(影片)
- 中斷 Interrupt(影片)
- 程序所需資源 (記憶體): code, 靜態儲存, stack, heap,檔案描述符(file descriptors)以及 i/o)
- 執行序所需資源 (跟上述相同(除了 stack), 其他的執行序都各自擁有 pc、stack 計數器、暫存器以及 stack 在相同的程序中)
- Forking 只有複製寫入(唯讀),直到新的程序被寫入到記憶體中 u,然後他再複製一個新的
- 上下文交換
- 上下文交換如何被作業系統以及硬體所啟動
- 探討程序、執行緒與並行性
- C++執行序 (包含 10 部影片的系列)
- Python 並行性 (影片)):
-
Testing
- 涵蓋:
- 單元測試如何運作
- 什麼是模擬對象(mock object)
- 整合測試(integration testing)是什麼
- 什麼是依賴注入(dependency injection)
- 敏捷軟體測試-James Bach(影片))
- James Bach 軟體測試的公開課程(影片))
- Steve Freeman - 測試導向軟體開發(影片))
- 依賴注入(Dependency injection):
- 如何寫測試
- 涵蓋:
-
排程 Scheduling
- 作業系統中排程如何運作?
- 可以從作業系統的課程影片中學習
-
字串搜尋演算法以及操作
如果你想知道更多有關這個主題的知識,可以去看某些主題的額外知識中的”String Matching”
-
字典樹(Tries)
- 請注意字典樹有許多種類。有些有前綴,有些沒有,而有些在追蹤路徑時使用字串而非位元。
- 我有看過這些程式碼,但沒有實作
- Sedgewick-字典樹(3 部影片)
- Note:資料結構以及 coding 技巧
- 簡短的課程影片:
- 字典樹:一個被忽視的資料結構
- TopCoder-使用字典樹
- Stanford 線上課程(實際使用範例))(影片)
- MIT 進階資料結構-字串(影片中間有點困難)(影片)
-
浮點數
-
Unicode
-
端序(Endianness)
- 大/小端序
- 大端序 Vs 小端序(影片)
- 由裡入內的大端序與小端序(影片)
- 對於核心開發非常具有技術性。如果大多數的內容聽不懂也沒關係。
- 前半部就已經足夠了
-
網路
- 以下為如果你有網路相關經驗,或是想成為一個可靠的工程師需要知道的知識
- 知道這些有益無害,多多益善!
- Khan Academy
- UDP and TCP: Comparison of Transport Protocols(影片)
- TCP/IP and the OSI Model Explained!(影片)
- Packet Transmission across the Internet. Networking & TCP/IP tutorial.(影片)
- HTTP(影片)
- SSL and HTTPS(影片)
- SSL/TLS(影片)
- HTTP 2.0(影片)
- Video Series (21 videos)(影片)
- Subnetting Demystified - Part 5 CIDR Notation(影片)
- Sockets:
系統設計、可擴充性、資料處理
如果你已經擁有了 4 年以上的程式經驗,那你可以來看看有關系統設計的問題
-
可擴充性與系統設計的範圍非常廣大,裡面還包含了許多主題,因為在設計一個具有擴充性的軟體/硬體時有許多事情需要考量。你可以花點時間看看這些。
-
考量:
- 可擴充性
- 將大量資料抽取出而成一個值
- 將一個資料轉換成另一個
- 處理大量的資料
- 系統設計
- 功能集
- 介面
- class 階層
- 在某些限制下設計一個系統
- 簡單與強健的系統
- 權衡得失
- 效能評估與最佳化
- 可擴充性
-
從這裡開始: The System Design Primer
-
How Do I Prepare To Answer Design Questions In A Technical Inverview?
-
System Design Interview - 這個裡面包含許多資源。把裡面的文章與例子都看過一遍,裡面有些我把他特別放在下面。
-
共識演算法(Consensus Algorithms):
-
可擴充性:
- 不用把這些全讀過,挑幾個你有興趣的即可。
- Great overview(影片)
- 簡單系列:
- Scalable Web Architecture and Distributed Systems
- Fallacies of Distributed Computing Explained
- Pragmatic Programming Techniques
- Jeff Dean - Building Software Systems At Google and Lessons Learned (影片)
- Introduction to Architecting Systems for Scale
- Scaling mobile games to a global audience using App Engine and Cloud Datastore (影片)
- How Google Does Planet-Scale Engineering for Planet-Scale Infra (影片)
- The Importance of Algorithms
- Sharding
- Scale at Facebook (2012), “Building for a Billion Users” (影片)
- Engineering for the Long Game - Astrid Atkinson Keynote(影片)
- 7 Years Of YouTube Scalability Lessons In 30 Minutes
- How PayPal Scaled To Billions Of Transactions Daily Using Just 8VMs
- How to Remove Duplicates in Large Datasets
- A look inside Etsy’s scale and engineering culture with Jon Cowie (video)
- What Led Amazon to its Own Microservices Architecture
- To Compress Or Not To Compress, That Was Uber’s Question
- Asyncio Tarantool Queue, Get In The Queue
- When Should Approximate Query Processing Be Used?
- Google’s Transition From Single Datacenter, To Failover, To A Native Multihomed Architecture
- Spanner
- Machine Learning Driven Programming: A New Programming For A New World
- The Image Optimization Technology That Serves Millions Of Requests Per Day
- A Patreon Architecture Short
- Tinder: How Does One Of The Largest Recommendation Engines Decide Who You’ll See Next?
- Design Of A Modern Cache
- Live Video Streaming At Facebook Scale
- A Beginner’s Guide To Scaling To 11 Million+ Users On Amazon’s AWS
- How Does The Use Of Docker Effect Latency?
- A 360 Degree View Of The Entire Netflix Stack
- Latency Is Everywhere And It Costs You Sales - How To Crush It
- Serverless (very long, just need the gist)
- What Powers Instagram: Hundreds of Instances, Dozens of Technologies
- Cinchcast Architecture - Producing 1,500 Hours Of Audio Every Day
- Justin.Tv’s Live Video Broadcasting Architecture
- Playfish’s Social Gaming Architecture - 50 Million Monthly Users And Growing
- TripAdvisor Architecture - 40M Visitors, 200M Dynamic Page Views, 30TB Data
- PlentyOfFish Architecture
- Salesforce Architecture - How They Handle 1.3 Billion Transactions A Day
- ESPN’s Architecture At Scale - Operating At 100,000 Duh Nuh Nuhs Per Second
- 看看下方的”通訊、序列化以及佇列系統”,裡面有一些你可以參考的東西。
- Twitter:
- 更多請看看下面影片系列中的”Mining Massive Datasets”
-
練習系統設計的過程:以下是在紙上練習的一些方法,每個都有他們如何運用在現實中的說明文件:
- 複習: 系統設計入門
- HiredInTech 系統設計
- 提示
- 流程:
- 了解問題與其範圍:
- 在面試官的幫助下定義使用情況
- 提供些額外功能的建議
- 移除面試官認為是超出範圍的東西
- 默認高可用度是必須的,並增加一些使用情況
- 設想限制:
- 想想看每個月會有多少請求
- 想想看每秒會有多少請求(他們可能自願或者讓你自己計算)
- 評估讀寫比率
- 評估時請保持 80/20 法則
- 每秒有多少資料被寫入?
- 5 年內總共需要的儲存空間
- 每秒有多少資料被讀取?
- 抽象設計:
- 層(服務、資料、快取)
- 基礎:讀取平衡、通訊
- 初步概覽驅動整個服務的關鍵演算法
- 考慮會遇到的瓶頸以及解決方案
- 了解問題與其範圍:
- 練習:
總複習
這部分我放了一些簡短的影片,觀看這些影片可以快速的複習一些重要的觀念。
如果你想時常複習,那真是太棒了!
- 2-3 分鐘快速複習影片系列(23 個影片)
- 2-5 分鐘快速複習影片系列-Michael Sambol (40 個影片)
- Sedgewick Videos - Algorithms I
- Sedgewick Videos - Algorithms II
解題練習
現在你已經知道上面所有有關電腦科學的主題了,該是時候做些解題的練習了。
解題練習不能死記題目的解法
為什麼你需要練習解題:
- 快速識別問題,以及如何應用正確的資料結構及演算法。
- 蒐集問題的需求
- 模擬面試時用你的方法闡述問題
- 試著不要在電腦上寫程式,而是 在白板上或紙上
- 想出該問題的時間與空間複雜度
- 測試你的解法
這裡有個很棒的入門教學,內容是如何在面試中有條不紊,並且有互動溝通地解決問題。這種能力可以從面試書籍中獲得,但我覺得這個也超讚的:Algorithm design canvas
家裡沒有白板嗎?這很合理。但我是個奇怪的人,家裡有個大白板。沒有白板的話,可以去美術社買個大的繪圖板。你可以坐在沙發上練習。這是我的「沙發白板」。我在照片中放了一枝筆當作比例尺。如果你用筆的話,你將會希望你可以擦拭他,因為他很快就會變髒了。通常我都用鉛筆與橡皮擦。
補充:
閱讀並解題(按照以下順序):
- Programming Interviews Exposed: Secrets to Landing Your Next Job, 2nd Edition
- 附有 C、C++、Java 的解答
- Cracking the Coding Interview, 6th Edition
- 附有 Java 的解答
看看上方的書單
解題練習/挑戰
學了一些東西之後,可以開始試試每天解一些題目,越多越好!
Coding 面試題目影片:
- IDeserve (88 部影片)
- Tushar Roy (5 個播放清單)
- 對於解法的練習非常有幫助
- Nick White - LeetCode Solutions (187 部影片)
- 這些是我最近很喜歡的影片,你可以在短時間內看完他們
- 對於解法以及程式碼有很棒的解釋
解題網站:
- LeetCode
- TopCoder
- Project Euler (math-focused)
- Codewars
- HackerEarth
- HackerRank
- Codility
- InterviewCake
- Geeks for Geeks
- InterviewBit
- Sphere Online Judge (spoj)
- Codechef
解題 repository:
更多面試:
- Gainlo.co: Mock interviewers from big companies - I used this and it helped me relax for the phone screen and on-site interview.
- Pramp: Mock interviews from/with peers - 點對點面試練習
- Refdash: Mock interviews and expedited interviews - 同樣能藉由跳過跟多家科技公司的面試,幫助你快速步上軌道,
面試前夕
- Coding 面試解密-第二集(影片):
你的履歷
- 看看 Coding 面試解密中的履歷準備。
想想面試時可能的狀況
一些我預想的問題(我或許已經知道答案,但想知道他們的意見或是團隊的觀點):
- 你的團隊規模多大?
- 你的開發週期大概是怎樣?敏捷 agile、瀑布式 waterfall、sprint?
- 截止日前趕工是常態嗎?或是這之中有彈性?
- 在你的團隊中是怎麼做決定的?
- 每週你們開幾次會?
- 你覺得你的工作環境能幫助你專注嗎?
- 你目前在做哪個專案?
- 你喜歡這個專案的哪個部份?
- 工作生活如何?
- 工作與生活如何取得平衡?
一旦你得到工作
恭喜!!!
繼續學習
活到老,學到老。
*****************************************************************************************************
*****************************************************************************************************
下面的東西都是額外的。
讀這些東西,可以更了解電腦科學的概念,
並且能讓自己對任何軟體工程的工作做更好的準備。
如此一來,你將會成為一個更全面的軟體工程師。
*****************************************************************************************************
*****************************************************************************************************
選修書籍
你可以從以下的書單挑選你有興趣的主題來研讀
-
The Unix Programming Environment
- 老,但卻很棒
-
The Linux Command Line: A Complete Introduction
- 現代選擇
-
- 設計模式的入門簡介
-
Design Patterns: Elements of Reusable Object-Oriented Software
- 也被稱為”四人幫”(“Gang of Four(GOF)“)
- 經典設計模式書籍
-
Algorithm Design Manual (Skiena)
- 作為複習以及問題辨別
- 這本書中演算法的部分難度已經超過面試會出現的
- 本書分為兩個部分:
- 資料結構及演算法課本
- 優點:
- 跟其他演算法課本一樣是個很棒的複習素材
- 內含關於作者以往解決工業及學術上問題的經驗的故事
- 含 C 語言程式碼範例
- 缺點:
- 某些地方跟”Introduction to Algorithms”一樣艱深,但在某些主題,“Introduction to Algorithms”或許是更好的選擇。
- 第 7、8、9 章有點難以消化,因為某些地方並沒有解釋得很清楚,或者根本上我就是個學渣
- 別會錯意了,我很喜歡 Skiena 的教學方法以及他的風格。
- 優點:
- 演算法目錄:
- 這個部分是買這本書的最大誘因
- 我即將著手進行這部分,一旦完成這部分我會再更新上來
- 可以在 kindle 上租
- 資料結構及演算法課本
- 解答:
- 勘誤表
-
Write Great Code: Volume 1: Understanding the Machine
- 這本書出版於 2004 年,某些程度上他有點過時了,但對於初步理解電腦是個很棒的資源
- 作者發明了高階組合語言 HLA,所以提到,並且舉了一些 HLA 的例子。裡面沒有用到很多,但都是很棒的組合語言的例子。
- 這些章節很值得一讀,讓你具備極佳的基礎:
- 第 2 章 - 數值表示法
- 第 3 章 - 二進位運算和位元運算
- 第 4 章 - 浮點數表示法
- 第 5 章 - 字元表示法
- 第 6 章 - 記憶體組織和存取
- 第 7 章 - 複合資料型別和記憶體
- 第 9 章 - CPU 架構
- 第 10 章 - 指令集架構
- 第 11 章 - 記憶體架構與組織
-
- 重要: 讀這本書 CP 值不高。這本書作為複習演算法和資料結構還算滿不錯的,但它裡面沒有教你怎麼實作這些東西。你必須要能夠自己寫出很棒、有效率的解法。
- 也被稱為 CLR,或是 CLRS,因為 Stein 是後來才加入作者群的。
-
Computer Architecture, Sixth Edition: A Quantitative Approach
- 更豐富、更新(2017 年),但篇幅較長。
-
- 前幾章提供了一些解決 coding 問題的精妙絕倫的方法(有些很舊,甚至還用磁帶),但那些只是導論。這是本程式設計和架構的指南。
額外學習
我把他們加了進來為了讓你成為更全方位的軟體工程師,並且留意一些科技以及演算法,讓你的資料庫中有更多素材。
-
編譯器
-
Emacs and vi(m)
- 讓你自己熟悉 unix 的文字編輯器
- vi(m):
- emacs:
-
Unix 命令列(command line)工具
-
資訊理論 (影片)
- Khan Academy
- 更多有關馬可夫鏈:
- 進一步的可以去看看下方的 MIT 6.050J Information 和 Entropy 系列
-
同位位元 & 漢明碼 (影片)
-
Entropy
- 同樣看看下方的影片
- 務必先看過資訊理論的影片
- Information Theory, Claude Shannon, Entropy, Redundancy, Data Compression & Bits (影片)
-
密碼學
- 同樣看看下方的影片
- 務必先看過資訊理論的影片
- Khan Academy Series
- Cryptography: Hash Functions
- Cryptography: Encryption
-
壓縮
-
電腦安全
-
垃圾回收(GC)
-
平行計算
-
通訊、序列化以及佇列系統(Messaging, Serialization, and Queueing Systems)
-
A*搜尋演算法
-
快速傅立葉轉換
-
布隆過濾器(Bloom filter)
- 給定布隆過濾器 m 個位元以及 k 個雜湊函式,插入以及 membership testing 都是 O(k)
- 布隆過濾器 (影片))
- 布隆過濾器 | Mining of Massive Datasets | Stanford University (影片)
- 教學
- 如何寫一個布隆過濾器的 APP
-
HyperLogLog
-
Locality-Sensitive Hashing
- 用來判定文件的相似度
- 跟 MD5/SHA 正好相反,他們是用來判定兩個文件/字串是否完全相同
- Simhashing (hopefully) made simple
-
van Emde Boas Trees
-
Augmented Data Structures
-
平衡樹
-
至少要了解一種平衡樹(並且理解怎麼實作他):
-
“在平衡樹中,AVL 和 2/3 樹已經有點過時了,而紅黑樹則比較流行。伸展樹(Splay tree)是一個特別有趣,能夠自我伸展的資料結構,他利用了旋轉 rotation 來移動任何 accessed key 到樹的根部。” - Skiena
-
在這麼多平衡樹之中,我選擇實作伸縮樹。從我曾聽過的看來,在面試時你並不需要會實作平衡搜尋樹。我看過了很多紅黑樹的程式碼。
- 伸縮樹:插入、搜尋、刪除函式 如果你最後選擇實作紅黑樹,試試看以下這些:
- 搜尋和插入函式,跳過刪除
-
我想學習更多有關 B-Tree 的東西,因為在大量資料的時候他運用得十分廣泛
-
AVL 樹
- 實際上: 從我所知,這些樹在實際上並沒有用到太多,但這些是他們可能會出現的地方: AVL 樹是另一個能夠在 O(log n)的時間內搜尋、插入和刪除的資料結構。AVL 樹比起紅黑樹更加嚴格的兩邊平衡,而這也導致了比較慢的插入和刪除,但比較快速的拿取。這對於那些只要建立一次,之後就只要取用的資料結構非常有吸引力,像是字典(或是程式字典,像是直譯器或組譯器的 opcodes)
- MIT AVL Trees / AVL Sort (影片)
- AVL Trees (影片)
- AVL Tree Implementation (影片)
- Split And Merge
-
伸縮樹 Splay tree
- 實際上: 伸縮樹運用在快取、記憶體分配、路由、垃圾回收、資料壓縮、ropes(在長字串時取代 string)、Windows NT(在虛擬記憶體、網路以及檔案系統)等等。
- CS 61B: Splay Trees (影片)
- MIT Lecture: 伸縮樹:
- 非常數學,但務必看看最後 10 分鐘
- 影片
-
紅黑樹
- 這些是 2-3 樹的解釋(請看下方)
- 實際上: 紅黑樹插入、刪除以及搜尋的時間複雜度在最糟情況下仍有一定程度的保證表現。 這不只讓他們在時間要求極高的應用(像是即時應用)有著很高的價值,在建構其他資料結構中也有著極高的價值。 舉例來說,許多在計算幾何中的資料結構都是由紅黑樹所構成的,而在 Linux 中用到的 Completely Fair Scheduler 也用到了紅黑樹。 在 Java8 中,Collection HashMap 也從原本用 Linked List 實作,儲存特定元素的 hashcode,改為用紅黑樹實作。
- Aduni - Algorithms - Lecture 4 (link jumps to starting point) (影片)
- Aduni - Algorithms - Lecture 5 (影片)
- Red-Black Tree
- An Introduction To Binary Search And Red Black Tree
- [Review] Red-Black Trees (playlist) in 30 minutes (video)
-
2-3 搜尋樹
- 實際上: 2-3 樹在犧牲了搜尋速度下換來了相對快速的插入速度(因為高度比起 AVL 樹來的高)
- 2-3 樹極少被用到,因為實作他需要用到不同的節點。因此,紅黑樹的出現機率較高。
- 23-Tree Intuition and Definition (影片)
- Binary View of 23-Tree
- 2-3 Trees (student recitation) (影片)
-
2-3-4 樹 (aka 2-4 樹)
- 實際上: 每個 2-4 樹都有相對應,有著相同資料順序的紅黑樹。2-4 樹的插入以及刪除跟紅黑樹中的 color-flipping 以及 rotation 是相同的。這讓 2-4 樹成為了了解紅黑樹背後的 邏輯的重要工具,而這也是為什麼很多演算法入門課本都會在介紹紅黑樹前介紹 2-4 樹,即使2-4 樹在實際上並不常用到。
- CS 61B Lecture 26: Balanced Search Trees (影片)
- Bottom Up 234-Trees (影片)
- Top Down 234-Trees (影片)
-
N-ary (K-ary, M-ary) trees
- 重點: N 或 K 指的是 branching factor(最大的分支數)
- 二元樹屬於 2-ary tree,branching factor 為 2
- 2-3 樹屬於 3-ary
- K-Ary Tree
-
B 樹
- 有趣的小知識: B 代表什麼是個謎題,或許代表 Boeing、Balanced,或 Bayer(co-inventor)
- 實際上: B 樹廣泛的運用在資料庫中。多數現代資料系統使用 B 樹(或是其變異體)。除了在資料庫方面的應用,B 樹也用在資料系統中,使得我們能夠 隨機存取特定檔案中的任意部分。最基礎的問題是把 file block 的 i address 轉變為 disk block(或是轉變為 cylinder head sector)的 address。
- B-Tree
- B-Tree Datastructure
- Introduction to B-Trees (影片)
- B-Tree Definition and Insertion (影片)
- B-Tree Deletion (影片)
- MIT 6.851 - Memory Hierarchy Models (影片) - 涵蓋了 cache-oblivious B 樹,非常有趣的資料結構 - 前 37 分鐘非常技術性,可以跳過(B 代表 block 大小、cache line 大小)
- [Review] B-Trees (playlist) in 26 minutes (video)
-
-
k-D 樹
- 是找矩形中或是更高維度物體中的點的數量的好方法
- 對於 k-nearest neighbors 很有幫助
- Kd Trees (影片)
- kNN K-d tree algorithm (影片)
-
Skip lists
- “這些資料結構某種程度上有些邪門” - Skiena
- Randomization: Skip Lists (影片)
- For animations and a little more detail
-
Network Flows
-
Disjoint Sets & Union Find
-
快速處理(Fast Processing)的數學
-
Treap
- 二元搜尋樹以及 heap 的結合
- Treap
- Data Structures: Treaps explained (影片)
- Applications in set operations
-
線性程式設計 Linear Programming (影片)
-
幾何、Convex hull (影片)
-
離散數學
- 觀看下方影片
-
機器學習
- 為什麼要學機器學習?
- Google’s Cloud Machine learning tools (影片)
- Google Developers’ Machine Learning Recipes (Scikit Learn & Tensorflow) (影片)
- Tensorflow (影片)
- Tensorflow Tutorials
- Practical Guide to implementing Neural Networks in Python (使用 Theano)
- 課程:
- Great starter course: Machine Learning - 只有影片 - see videos 12-18 for a review of linear algebra (14 and 15 are duplicates)
- Neural Networks and Deep Learning Learning
- Google’s Deep Learning Nanodegree
- Google/Kaggle Machine Learning Engineer Nanodegree
- Self-Driving Car Engineer Nanodegree
- 資源:
某些主題的額外知識
我為了強化某些已經在上面呈現的內容,所以才增加這些東西。但因為上面已經有太多內容了,所以不想把這些放在上面。
You want to get hired in this century, right?
-
SOLID
- Bob Martin SOLID Principles of Object Oriented and Agile Design (video)
- S - Single Responsibility Principle | Single responsibility to each Object
- O - Open/Closed Principal | On production level Objects are ready for extension but not for modification
- L - Liskov Substitution Principal | Base Class and Derived class follow ‘IS A’ principal
- I - Interface segregation principle | clients 不該被強迫去實作他們不會用到的介面
- D -Dependency Inversion principle | 減少對物件的 composition 的依賴
-
Union-Find
-
More Dynamic Programming (影片)
- 6.006: Dynamic Programming I: Fibonacci, Shortest Paths
- 6.006: Dynamic Programming II: Text Justification, Blackjack
- 6.006: DP III: Parenthesization, Edit Distance, Knapsack
- 6.006: DP IV: Guitar Fingering, Tetris, Super Mario Bros.
- 6.046: Dynamic Programming & Advanced DP
- 6.046: Dynamic Programming: All-Pairs Shortest Paths
- 6.046: Dynamic Programming (student recitation)
-
Advanced Graph Processing (影片)
-
MIT Probability (過於數學,進度緩慢,但這對於數學的東西卻是必要之惡) (影片):
-
String Matching
- Rabin-Karp (影片):
- Knuth-Morris-Pratt (KMP):
- Boyer–Moore string search algorithm
- Coursera: Algorithms on Strings
- 開始的時候很不錯,但過了 KMP 的部分後就變得過於困難
- 很好的解釋了 tries(字典樹)
- 可以跳過
-
排序
- Stanford 的排序課程:
- Shai Simonson, Aduni.org:
- Steven Skiena 的排序課程:
影片系列
坐下來享受一下”Netflix 和技巧” :P
-
Excellent - MIT Calculus Revisited: Single Variable Calculus
-
Computer Science 70, 001 - Spring 2015 - Discrete Mathematics and Probability Theory
-
CSE373 - Analysis of Algorithms (25 部影片)
-
UC Berkeley CS 152: Computer Architecture and Engineering (20 部影片) -
MIT 6.042J: Mathematics for Computer Science, Fall 2010 (25 部影片)
電腦科學課程
論文
- Love classic papers?
- 1978: Communicating Sequential Processes
- 2003: The Google File System
- replaced by Colossus in 2012
- 2004: MapReduce: Simplified Data Processing on Large Clusters
- mostly replaced by Cloud Dataflow?
- 2006: Bigtable: A Distributed Storage System for Structured Data
- 2006: The Chubby Lock Service for Loosely-Coupled Distributed Systems
- 2007: Dynamo: Amazon’s Highly Available Key-value Store
- The Dynamo paper kicked off the NoSQL revolution
- 2007: What Every Programmer Should Know About Memory (very long, and the author encourages skipping of some sections)
- 2010: Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
- 2010: Dremel: Interactive Analysis of Web-Scale Datasets
- 2012: Google’s Colossus
- 論文暫不提供
- 2012: AddressSanitizer: A Fast Address Sanity Checker:
- 2013: Spanner: Google’s Globally-Distributed Database:
- 2014: Machine Learning: The High-Interest Credit Card of Technical Debt
- 2015: Continuous Pipelines at Google
- 2015: High-Availability at Massive Scale: Building Google’s Data Infrastructure for Ads
- 2015: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
- 2015: How Developers Search for Code: A Case Study
- 2016: Borg, Omega, and Kubernetes
LICENSE
Coding Interview University
I originally created this as a short to-do list of study topics for becoming a software engineer, but it grew to the large list you see today. After going through this study plan, I got hired as a Software Development Engineer at Amazon! You probably won’t have to study as much as I did. Anyway, everything you need is here.
I studied about 8-12 hours a day, for several months. This is my story: Why I studied full-time for 8 months for a Google interview
Please Note: You won’t need to study as much as I did. I wasted a lot of time on things I didn’t need to know. More info about that below. I’ll help you get there without wasting your precious time.
The items listed here will prepare you well for a technical interview at just about any software company, including the giants: Amazon, Facebook, Google, and Microsoft.
Best of luck to you!
Translations:
Translations in progress:
What is it?
This is my multi-month study plan for becoming a software engineer for a large company.
Required:
- A little experience with coding (variables, loops, methods/functions, etc)
- Patience
- Time
Note this is a study plan for software engineering, not frontend engineering or fullstack development. There are really super roadmaps and coursework for those career paths elsewhere (see https://roadmap.sh/ for more info).
There is a lot to learn in a university Computer Science program, but only knowing about 75% is good enough for an interview, so that’s what I cover here. For a complete CS self-taught program, the resources for my study plan have been included in Kamran Ahmed’s Computer Science Roadmap: https://roadmap.sh/computer-science
Table of Contents
The Study Plan
- What is it?
- Why use it?
- How to use it
- Don’t feel you aren’t smart enough
- A Note About Video Resources
- Choose a Programming Language
- Books for Data Structures and Algorithms
- Interview Prep Books
- Don’t Make My Mistakes
- What you Won’t See Covered
- The Daily Plan
- Coding Question Practice
- Coding Problems
Topics of Study
- Algorithmic complexity / Big-O / Asymptotic analysis
- Data Structures
- More Knowledge
- Trees
- Trees - Intro
- Binary search trees: BSTs
- Heap / Priority Queue / Binary Heap
- balanced search trees (general concept, not details)
- traversals: preorder, inorder, postorder, BFS, DFS
- Sorting
- selection
- insertion
- heapsort
- quicksort
- merge sort
- Graphs
- directed
- undirected
- adjacency matrix
- adjacency list
- traversals: BFS, DFS
- Even More Knowledge
- Final Review
Getting the Job
- Update Your Resume
- Find a Job
- Interview Process & General Interview Prep
- Be thinking of for when the interview comes
- Have questions for the interviewer
- Once You’ve Got The Job
---------------- Everything below this point is optional ----------------
Optional Extra Topics & Resources
- Additional Books
- System Design, Scalability, Data Handling (if you have 4+ years experience)
- Additional Learning
- Compilers
- Emacs and vi(m)
- Unix command line tools
- Information theory
- Parity & Hamming Code
- Entropy
- Cryptography
- Compression
- Computer Security
- Garbage collection
- Parallel Programming
- Messaging, Serialization, and Queueing Systems
- A*
- Fast Fourier Transform
- Bloom Filter
- HyperLogLog
- Locality-Sensitive Hashing
- van Emde Boas Trees
- Augmented Data Structures
- Balanced search trees
- AVL trees
- Splay trees
- Red/black trees
- 2-3 search trees
- 2-3-4 Trees (aka 2-4 trees)
- N-ary (K-ary, M-ary) trees
- B-Trees
- k-D Trees
- Skip lists
- Network Flows
- Disjoint Sets & Union Find
- Math for Fast Processing
- Treap
- Linear Programming
- Geometry, Convex hull
- Discrete math
- Additional Detail on Some Subjects
- Video Series
- Computer Science Courses
- Papers
Why use it?
If you want to work as a software engineer for a large company, these are the things you have to know.
If you missed out on getting a degree in computer science, like I did, this will catch you up and save four years of your life.
When I started this project, I didn’t know a stack from a heap, didn’t know Big-O anything, or anything about trees, or how to traverse a graph. If I had to code a sorting algorithm, I can tell ya it would have been terrible. Every data structure I had ever used was built into the language, and I didn’t know how they worked under the hood at all. I never had to manage memory unless a process I was running would give an “out of memory” error, and then I’d have to find a workaround. I used a few multidimensional arrays in my life and thousands of associative arrays, but I never created data structures from scratch.
It’s a long plan. It may take you months. If you are familiar with a lot of this already it will take you a lot less time.
How to use it
Everything below is an outline, and you should tackle the items in order from top to bottom.
I’m using GitHub’s special markdown flavor, including tasks lists to track progress.
If you don’t want to use git
On this page, click the Code button near the top, then click “Download ZIP”. Unzip the file and you can work with the text files.
If you’re open in a code editor that understands markdown, you’ll see everything formatted nicely.
If you’re comfortable with git
Create a new branch so you can check items like this, just put an x in the brackets: [x]
-
Fork the GitHub repo:
https://github.com/jwasham/coding-interview-university
by clicking on the Fork button. -
Clone to your local repo:
git clone https://github.com/<YOUR_GITHUB_USERNAME>/coding-interview-university.git cd coding-interview-university git remote add upstream https://github.com/jwasham/coding-interview-university.git git remote set-url --push upstream DISABLE # so that you don't push your personal progress back to the original repo
-
Mark all boxes with X after you completed your changes:
git commit -am "Marked personal progress" git pull upstream main # keep your fork up-to-date with changes from the original repo git push # just pushes to your fork
Don’t feel you aren’t smart enough
- Successful software engineers are smart, but many have an insecurity that they aren’t smart enough.
- Following videos may help you overcome this insecurity:
A Note About Video Resources
Some videos are available only by enrolling in a Coursera or EdX class. These are called MOOCs. Sometimes the classes are not in session so you have to wait a couple of months, so you have no access.
It would be great to replace the online course resources with free and always-available public sources, such as YouTube videos (preferably university lectures), so that you people can study these anytime, not just when a specific online course is in session.
Choose a Programming Language
You’ll need to choose a programming language for the coding interviews you do, but you’ll also need to find a language that you can use to study computer science concepts.
Preferably the language would be the same, so that you only need to be proficient in one.
For this Study Plan
When I did the study plan, I used 2 languages for most of it: C and Python
- C: Very low level. Allows you to deal with pointers and memory allocation/deallocation, so you feel the data structures
and algorithms in your bones. In higher level languages like Python or Java, these are hidden from you. In day to day work, that’s terrific,
but when you’re learning how these low-level data structures are built, it’s great to feel close to the metal.
- C is everywhere. You’ll see examples in books, lectures, videos, everywhere while you’re studying.
- The C Programming Language, 2nd Edition
- This is a short book, but it will give you a great handle on the C language and if you practice it a little you’ll quickly get proficient. Understanding C helps you understand how programs and memory work.
- You don’t need to go super deep in the book (or even finish it). Just get to where you’re comfortable reading and writing in C.
- Answers to questions in the book
- Python: Modern and very expressive, I learned it because it’s just super useful and also allows me to write less code in an interview.
This is my preference. You do what you like, of course.
You may not need it, but here are some sites for learning a new language:
For your Coding Interview
You can use a language you are comfortable in to do the coding part of the interview, but for large companies, these are solid choices:
- C++
- Java
- Python
You could also use these, but read around first. There may be caveats:
- JavaScript
- Ruby
Here is an article I wrote about choosing a language for the interview: Pick One Language for the Coding Interview. This is the original article my post was based on: Choosing a Programming Language for Interviews
You need to be very comfortable in the language and be knowledgeable.
Read more about choices:
See language-specific resources here
Books for Data Structures and Algorithms
This book will form your foundation for computer science.
Just choose one, in a language that you will be comfortable with. You’ll be doing a lot of reading and coding.
C
- Algorithms in C, Parts 1-5 (Bundle), 3rd Edition
- Fundamentals, Data Structures, Sorting, Searching, and Graph Algorithms
Python
- Data Structures and Algorithms in Python
- by Goodrich, Tamassia, Goldwasser
- I loved this book. It covered everything and more.
- Pythonic code
- my glowing book report: https://startupnextdoor.com/book-report-data-structures-and-algorithms-in-python/
Java
Your choice:
- Goodrich, Tamassia, Goldwasser
- Sedgewick and Wayne:
- Algorithms
- Free Coursera course that covers the book (taught by the authors!):
C++
Your choice:
- Goodrich, Tamassia, and Mount
- Sedgewick and Wayne
Interview Prep Books
You don’t need to buy a bunch of these. Honestly “Cracking the Coding Interview” is probably enough, but I bought more to give myself more practice. But I always do too much.
I bought both of these. They gave me plenty of practice.
- Programming Interviews Exposed: Coding Your Way Through the Interview, 4th Edition
- Answers in C++ and Java
- This is a good warm-up for Cracking the Coding Interview
- Not too difficult. Most problems may be easier than what you’ll see in an interview (from what I’ve read)
- Cracking the Coding Interview, 6th Edition
- answers in Java
If you have tons of extra time:
Choose one:
- Elements of Programming Interviews (C++ version)
- Elements of Programming Interviews in Python
- Elements of Programming Interviews (Java version) - Companion Project - Method Stub and Test Cases for Every Problem in the Book
Don’t Make My Mistakes
This list grew over many months, and yes, it got out of hand.
Here are some mistakes I made so you’ll have a better experience. And you’ll save months of time.
1. You Won’t Remember it All
I watched hours of videos and took copious notes, and months later there was much I didn’t remember. I spent 3 days going through my notes and making flashcards, so I could review. I didn’t need all of that knowledge.
Please, read so you won’t make my mistakes:
Retaining Computer Science Knowledge.
2. Use Flashcards
To solve the problem, I made a little flashcards site where I could add flashcards of 2 types: general and code. Each card has different formatting. I made a mobile-first website, so I could review on my phone or tablet, wherever I am.
Make your own for free:
I DON’T RECOMMEND using my flashcards. There are too many and most of them are trivia that you don’t need.
But if you don’t want to listen to me, here you go:
Keep in mind I went overboard and have cards covering everything from assembly language and Python trivia to machine learning and statistics. It’s way too much for what’s required.
Note on flashcards: The first time you recognize you know the answer, don’t mark it as known. You have to see the same card and answer it several times correctly before you really know it. Repetition will put that knowledge deeper in your brain.
An alternative to using my flashcard site is Anki, which has been recommended to me numerous times. It uses a repetition system to help you remember. It’s user-friendly, available on all platforms and has a cloud sync system. It costs $25 on iOS but is free on other platforms.
My flashcard database in Anki format: https://ankiweb.net/shared/info/25173560 (thanks @xiewenya).
Some students have mentioned formatting issues with white space that can be fixed by doing the following: open deck, edit card, click cards, select the “styling” radio button, add the member “white-space: pre;” to the card class.
3. Do Coding Interview Questions While You’re Learning
THIS IS VERY IMPORTANT.
Start doing coding interview questions while you’re learning data structures and algorithms.
You need to apply what you’re learning to solving problems, or you’ll forget. I made this mistake.
Once you’ve learned a topic, and feel somewhat comfortable with it, for example, linked lists:
- Open one of the coding interview books (or coding problem websites, listed below)
- Do 2 or 3 questions regarding linked lists.
- Move on to the next learning topic.
- Later, go back and do another 2 or 3 linked list problems.
- Do this with each new topic you learn.
Keep doing problems while you’re learning all this stuff, not after.
You’re not being hired for knowledge, but how you apply the knowledge.
There are many resources for this, listed below. Keep going.
4. Focus
There are a lot of distractions that can take up valuable time. Focus and concentration are hard. Turn on some music without lyrics and you’ll be able to focus pretty well.
What you won’t see covered
These are prevalent technologies but not part of this study plan:
- Javascript
- HTML, CSS, and other front-end technologies
- SQL
The Daily Plan
This course goes over a lot of subjects. Each will probably take you a few days, or maybe even a week or more. It depends on your schedule.
Each day, take the next subject in the list, watch some videos about that subject, and then write an implementation of that data structure or algorithm in the language you chose for this course.
You can see my code here:
You don’t need to memorize every algorithm. You just need to be able to understand it enough to be able to write your own implementation.
Coding Question Practice
Why is this here? I'm not ready to interview.
Why you need to practice doing programming problems:
- Problem recognition, and where the right data structures and algorithms fit in
- Gathering requirements for the problem
- Talking your way through the problem like you will in the interview
- Coding on a whiteboard or paper, not a computer
- Coming up with time and space complexity for your solutions (see Big-O below)
- Testing your solutions
There is a great intro for methodical, communicative problem solving in an interview. You’ll get this from the programming interview books, too, but I found this outstanding: Algorithm design canvas
Write code on a whiteboard or paper, not a computer. Test with some sample inputs. Then type it and test it out on a computer.
If you don’t have a whiteboard at home, pick up a large drawing pad from an art store. You can sit on the couch and practice. This is my “sofa whiteboard”. I added the pen in the photo just for scale. If you use a pen, you’ll wish you could erase. Gets messy quick. I use a pencil and eraser.
Coding question practice is not about memorizing answers to programming problems.
Coding Problems
Don’t forget your key coding interview books here.
Solving Problems:
Coding Interview Question Videos:
- IDeserve (88 videos)
- Tushar Roy (5 playlists)
- Super for walkthroughs of problem solutions
- Nick White - LeetCode Solutions (187 Videos)
- Good explanations of solution and the code
- You can watch several in a short time
- FisherCoder - LeetCode Solutions
Challenge/Practice sites:
- LeetCode
- My favorite coding problem site. It’s worth the subscription money for the 1-2 months you’ll likely be preparing.
- See Nick White and FisherCoder Videos above for code walk-throughs.
- HackerRank
- TopCoder
- Codeforces
- Codility
- Geeks for Geeks
- AlgoExpert
- Created by Google engineers, this is also an excellent resource to hone your skills.
- Project Euler
- very math focused, and not really suited for coding interviews
Let’s Get Started
Alright, enough talk, let’s learn!
But don’t forget to do coding problems from above while you learn!
Algorithmic complexity / Big-O / Asymptotic analysis
- Nothing to implement here, you’re just watching videos and taking notes! Yay!
- There are a lot of videos here. Just watch enough until you understand it. You can always come back and review.
- Don’t worry if you don’t understand all the math behind it.
- You just need to understand how to express the complexity of an algorithm in terms of Big-O.
- Harvard CS50 - Asymptotic Notation (video)
- Big O Notations (general quick tutorial) (video)
- Big O Notation (and Omega and Theta) - best mathematical explanation (video)
- Skiena (video)
- UC Berkeley Big O (video)
- Amortized Analysis (video)
- TopCoder (includes recurrence relations and master theorem):
- Cheat sheet
- [Review] Big-O notation in 5 minutes (video)
Well, that’s about enough of that.
When you go through “Cracking the Coding Interview”, there is a chapter on this, and at the end there is a quiz to see if you can identify the runtime complexity of different algorithms. It’s a super review and test.
Data Structures
-
Arrays
- About Arrays:
- Implement a vector (mutable array with automatic resizing):
- Practice coding using arrays and pointers, and pointer math to jump to an index instead of using indexing.
- New raw data array with allocated memory
- can allocate int array under the hood, just not use its features
- start with 16, or if starting number is greater, use power of 2 - 16, 32, 64, 128
- size() - number of items
- capacity() - number of items it can hold
- is_empty()
- at(index) - returns item at given index, blows up if index out of bounds
- push(item)
- insert(index, item) - inserts item at index, shifts that index’s value and trailing elements to the right
- prepend(item) - can use insert above at index 0
- pop() - remove from end, return value
- delete(index) - delete item at index, shifting all trailing elements left
- remove(item) - looks for value and removes index holding it (even if in multiple places)
- find(item) - looks for value and returns first index with that value, -1 if not found
- resize(new_capacity) // private function
- when you reach capacity, resize to double the size
- when popping an item, if size is 1/4 of capacity, resize to half
- Time
- O(1) to add/remove at end (amortized for allocations for more space), index, or update
- O(n) to insert/remove elsewhere
- Space
- contiguous in memory, so proximity helps performance
- space needed = (array capacity, which is >= n) * size of item, but even if 2n, still O(n)
-
Linked Lists
- Description:
- C Code (video) - not the whole video, just portions about Node struct and memory allocation
- Linked List vs Arrays:
- Why you should avoid linked lists (video)
- Gotcha: you need pointer to pointer knowledge: (for when you pass a pointer to a function that may change the address where that pointer points) This page is just to get a grasp on ptr to ptr. I don’t recommend this list traversal style. Readability and maintainability suffer due to cleverness.
- Implement (I did with tail pointer & without):
- size() - returns number of data elements in list
- empty() - bool returns true if empty
- value_at(index) - returns the value of the nth item (starting at 0 for first)
- push_front(value) - adds an item to the front of the list
- pop_front() - remove front item and return its value
- push_back(value) - adds an item at the end
- pop_back() - removes end item and returns its value
- front() - get value of front item
- back() - get value of end item
- insert(index, value) - insert value at index, so current item at that index is pointed to by new item at index
- erase(index) - removes node at given index
- value_n_from_end(n) - returns the value of the node at nth position from the end of the list
- reverse() - reverses the list
- remove_value(value) - removes the first item in the list with this value
- Doubly-linked List
- Description (video)
- No need to implement
-
Stack
- Stacks (video)
- [Review] Stacks in 3 minutes (video)
- Will not implement. Implementing with array is trivial
-
Queue
- Queue (video)
- Circular buffer/FIFO
- [Review] Queues in 3 minutes (video)
- Implement using linked-list, with tail pointer:
- enqueue(value) - adds value at position at tail
- dequeue() - returns value and removes least recently added element (front)
- empty()
- Implement using fixed-sized array:
- enqueue(value) - adds item at end of available storage
- dequeue() - returns value and removes least recently added element
- empty()
- full()
- Cost:
- a bad implementation using linked list where you enqueue at head and dequeue at tail would be O(n) because you’d need the next to last element, causing a full traversal each dequeue
- enqueue: O(1) (amortized, linked list and array [probing])
- dequeue: O(1) (linked list and array)
- empty: O(1) (linked list and array)
-
Hash table
-
Videos:
- Hashing with Chaining (video)
- Table Doubling, Karp-Rabin (video)
- Open Addressing, Cryptographic Hashing (video)
- PyCon 2010: The Mighty Dictionary (video)
- PyCon 2017: The Dictionary Even Mightier (video)
- (Advanced) Randomization: Universal & Perfect Hashing (video)
- (Advanced) Perfect hashing (video)
- [Review] Hash tables in 4 minutes (video)
-
Online Courses:
-
Implement with array using linear probing
- hash(k, m) - m is size of hash table
- add(key, value) - if key already exists, update value
- exists(key)
- get(key)
- remove(key)
-
More Knowledge
-
Binary search
- Binary Search (video)
- Binary Search (video)
- detail
- blueprint
- [Review] Binary search in 4 minutes (video)
- Implement:
- binary search (on sorted array of integers)
- binary search using recursion
-
Bitwise operations
- Bits cheat sheet - you should know many of the powers of 2 from (2^1 to 2^16 and 2^32)
- Get a really good understanding of manipulating bits with: &, |, ^, ~, >>, <<
- 2s and 1s complement
- Count set bits
- Swap values:
- Absolute value:
Trees
-
Trees - Intro
- Intro to Trees (video)
- Tree Traversal (video)
- BFS(breadth-first search) and DFS(depth-first search) (video)
- BFS notes:
- level order (BFS, using queue)
- time complexity: O(n)
- space complexity: best: O(1), worst: O(n/2)=O(n)
- DFS notes:
- time complexity: O(n)
- space complexity: best: O(log n) - avg. height of tree worst: O(n)
- inorder (DFS: left, self, right)
- postorder (DFS: left, right, self)
- preorder (DFS: self, left, right)
- BFS notes:
- [Review] Breadth-first search in 4 minutes (video)
- [Review] Depth-first search in 4 minutes (video)
- [Review] Tree Traversal (playlist) in 11 minutes (video)
-
Binary search trees: BSTs
- Binary Search Tree Review (video)
- Introduction (video)
- MIT (video)
- C/C++:
- Binary search tree - Implementation in C/C++ (video)
- BST implementation - memory allocation in stack and heap (video)
- Find min and max element in a binary search tree (video)
- Find height of a binary tree (video)
- Binary tree traversal - breadth-first and depth-first strategies (video)
- Binary tree: Level Order Traversal (video)
- Binary tree traversal: Preorder, Inorder, Postorder (video)
- Check if a binary tree is binary search tree or not (video)
- Delete a node from Binary Search Tree (video)
- Inorder Successor in a binary search tree (video)
- Implement:
- insert // insert value into tree
- get_node_count // get count of values stored
- print_values // prints the values in the tree, from min to max
- delete_tree
- is_in_tree // returns true if given value exists in the tree
- get_height // returns the height in nodes (single node’s height is 1)
- get_min // returns the minimum value stored in the tree
- get_max // returns the maximum value stored in the tree
- is_binary_search_tree
- delete_value
- get_successor // returns next-highest value in tree after given value, -1 if none
-
Heap / Priority Queue / Binary Heap
- visualized as a tree, but is usually linear in storage (array, linked list)
- Heap
- Introduction (video)
- Binary Trees (video)
- Tree Height Remark (video)
- Basic Operations (video)
- Complete Binary Trees (video)
- Pseudocode (video)
- Heap Sort - jumps to start (video)
- Heap Sort (video)
- Building a heap (video)
- MIT: Heaps and Heap Sort (video)
- CS 61B Lecture 24: Priority Queues (video)
- Linear Time BuildHeap (max-heap)
- [Review] Heap (playlist) in 13 minutes (video)
- Implement a max-heap:
- insert
- sift_up - needed for insert
- get_max - returns the max item, without removing it
- get_size() - return number of elements stored
- is_empty() - returns true if heap contains no elements
- extract_max - returns the max item, removing it
- sift_down - needed for extract_max
- remove(x) - removes item at index x
- heapify - create a heap from an array of elements, needed for heap_sort
- heap_sort() - take an unsorted array and turn it into a sorted array in-place using a max heap or min heap
Sorting
-
Notes:
- Implement sorts & know best case/worst case, average complexity of each:
- no bubble sort - it’s terrible - O(n^2), except when n <= 16
- Stability in sorting algorithms (“Is Quicksort stable?“)
- Which algorithms can be used on linked lists? Which on arrays? Which on both?
- I wouldn’t recommend sorting a linked list, but merge sort is doable.
- Merge Sort For Linked List
- Implement sorts & know best case/worst case, average complexity of each:
-
For heapsort, see Heap data structure above. Heap sort is great, but not stable
-
UC Berkeley:
-
Merge sort code:
-
Quick sort code:
-
Implement:
- Mergesort: O(n log n) average and worst case
- Quicksort O(n log n) average case
- Selection sort and insertion sort are both O(n^2) average and worst case
- For heapsort, see Heap data structure above
-
Not required, but I recommended them:
As a summary, here is a visual representation of 15 sorting algorithms. If you need more detail on this subject, see “Sorting” section in Additional Detail on Some Subjects
Graphs
Graphs can be used to represent many problems in computer science, so this section is long, like trees and sorting were.
-
Notes:
- There are 4 basic ways to represent a graph in memory:
- objects and pointers
- adjacency matrix
- adjacency list
- adjacency map
- Familiarize yourself with each representation and its pros & cons
- BFS and DFS - know their computational complexity, their trade offs, and how to implement them in real code
- When asked a question, look for a graph-based solution first, then move on if none
- There are 4 basic ways to represent a graph in memory:
-
MIT(videos):
-
Skiena Lectures - great intro:
- CSE373 2020 - Lecture 10 - Graph Data Structures (video)
- CSE373 2020 - Lecture 11 - Graph Traversal (video)
- CSE373 2020 - Lecture 12 - Depth First Search (video)
- CSE373 2020 - Lecture 13 - Minimum Spanning Trees (video)
- CSE373 2020 - Lecture 14 - Minimum Spanning Trees (con’t) (video)
- CSE373 2020 - Lecture 15 - Graph Algorithms (con’t 2) (video)
-
Graphs (review and more):
- 6.006 Single-Source Shortest Paths Problem (video)
- 6.006 Dijkstra (video)
- 6.006 Bellman-Ford (video)
- 6.006 Speeding Up Dijkstra (video)
- Aduni: Graph Algorithms I - Topological Sorting, Minimum Spanning Trees, Prim’s Algorithm - Lecture 6 (video)
- Aduni: Graph Algorithms II - DFS, BFS, Kruskal’s Algorithm, Union Find Data Structure - Lecture 7 (video)
- Aduni: Graph Algorithms III: Shortest Path - Lecture 8 (video)
- Aduni: Graph Alg. IV: Intro to geometric algorithms - Lecture 9 (video)
- CS 61B 2014: Weighted graphs (video)
- Greedy Algorithms: Minimum Spanning Tree (video)
- Strongly Connected Components Kosaraju’s Algorithm Graph Algorithm (video)
- [Review] Shortest Path Algorithms (playlist) in 16 minutes (video)
- [Review] Minimum Spanning Trees (playlist) in 4 minutes (video)
-
Full Coursera Course:
-
I’ll implement:
- DFS with adjacency list (recursive)
- DFS with adjacency list (iterative with stack)
- DFS with adjacency matrix (recursive)
- DFS with adjacency matrix (iterative with stack)
- BFS with adjacency list
- BFS with adjacency matrix
- single-source shortest path (Dijkstra)
- minimum spanning tree
- DFS-based algorithms (see Aduni videos above):
- check for cycle (needed for topological sort, since we’ll check for cycle before starting)
- topological sort
- count connected components in a graph
- list strongly connected components
- check for bipartite graph
Even More Knowledge
-
Recursion
- Stanford lectures on recursion & backtracking:
- When it is appropriate to use it?
- How is tail recursion better than not?
- 5 Simple Steps for Solving Any Recursive Problem(video)
-
Dynamic Programming
- You probably won’t see any dynamic programming problems in your interview, but it’s worth being able to recognize a problem as being a candidate for dynamic programming.
- This subject can be pretty difficult, as each DP soluble problem must be defined as a recursion relation, and coming up with it can be tricky.
- I suggest looking at many examples of DP problems until you have a solid understanding of the pattern involved.
- Videos:
- Skiena: CSE373 2020 - Lecture 19 - Introduction to Dynamic Programming (video)
- Skiena: CSE373 2020 - Lecture 20 - Edit Distance (video)
- Skiena: CSE373 2020 - Lecture 20 - Edit Distance (continued) (video)
- Skiena: CSE373 2020 - Lecture 21 - Dynamic Programming (video)
- Skiena: CSE373 2020 - Lecture 22 - Dynamic Programming and Review (video)
- Simonson: Dynamic Programming 0 (starts at 59:18) (video)
- Simonson: Dynamic Programming I - Lecture 11 (video)
- Simonson: Dynamic programming II - Lecture 12 (video)
- List of individual DP problems (each is short): Dynamic Programming (video)
- Yale Lecture notes:
- Coursera:
-
Design patterns
- Quick UML review (video)
- Learn these patterns:
- strategy
- singleton
- adapter
- prototype
- decorator
- visitor
- factory, abstract factory
- facade
- observer
- proxy
- delegate
- command
- state
- memento
- iterator
- composite
- flyweight
- Series of videos (27 videos)
- Book: Head First Design Patterns
- I know the canonical book is “Design Patterns: Elements of Reusable Object-Oriented Software”, but Head First is great for beginners to OO.
- Handy reference: 101 Design Patterns & Tips for Developers
-
Combinatorics (n choose k) & Probability
- Math Skills: How to find Factorial, Permutation and Combination (Choose) (video)
- Make School: Probability (video)
- Make School: More Probability and Markov Chains (video)
- Khan Academy:
- Course layout:
- Just the videos - 41 (each are simple and each are short):
-
NP, NP-Complete and Approximation Algorithms
- Know about the most famous classes of NP-complete problems, such as traveling salesman and the knapsack problem, and be able to recognize them when an interviewer asks you them in disguise.
- Know what NP-complete means.
- Computational Complexity (video)
- Simonson:
- Skiena:
- Complexity: P, NP, NP-completeness, Reductions (video)
- Complexity: Approximation Algorithms (video)
- Complexity: Fixed-Parameter Algorithms (video)
- Peter Norvig discusses near-optimal solutions to traveling salesman problem:
- Pages 1048 - 1140 in CLRS if you have it.
-
How computers process a program
-
Caches
-
Processes and Threads
- Computer Science 162 - Operating Systems (25 videos):
- for processes and threads see videos 1-11
- Operating Systems and System Programming (video)
- What Is The Difference Between A Process And A Thread?
- Covers:
- Processes, Threads, Concurrency issues
- Difference between processes and threads
- Processes
- Threads
- Locks
- Mutexes
- Semaphores
- Monitors
- How they work?
- Deadlock
- Livelock
- CPU activity, interrupts, context switching
- Modern concurrency constructs with multicore processors
- Paging, segmentation and virtual memory (video)
- Interrupts (video)
- Process resource needs (memory: code, static storage, stack, heap, and also file descriptors, i/o)
- Thread resource needs (shares above (minus stack) with other threads in the same process but each has its own pc, stack counter, registers, and stack)
- Forking is really copy on write (read-only) until the new process writes to memory, then it does a full copy.
- Context switching
- How context switching is initiated by the operating system and underlying hardware?
- Processes, Threads, Concurrency issues
- threads in C++ (series - 10 videos)
- CS 377 Spring ‘14: Operating Systems from University of Massachusetts
- concurrency in Python (videos):
- Computer Science 162 - Operating Systems (25 videos):
-
Testing
- To cover:
- how unit testing works
- what are mock objects
- what is integration testing
- what is dependency injection
- Agile Software Testing with James Bach (video)
- Open Lecture by James Bach on Software Testing (video)
- Steve Freeman - Test-Driven Development (that’s not what we meant) (video)
- Dependency injection:
- How to write tests
- To cover:
-
String searching & manipulations
- Sedgewick - Suffix Arrays (video)
- Sedgewick - Substring Search (videos)
- Search pattern in text (video)
If you need more detail on this subject, see “String Matching” section in Additional Detail on Some Subjects.
-
Tries
- Note there are different kinds of tries. Some have prefixes, some don’t, and some use string instead of bits to track the path
- I read through code, but will not implement
- Sedgewick - Tries (3 videos)
- Notes on Data Structures and Programming Techniques
- Short course videos:
- The Trie: A Neglected Data Structure
- TopCoder - Using Tries
- Stanford Lecture (real world use case) (video)
- MIT, Advanced Data Structures, Strings (can get pretty obscure about halfway through) (video)
-
Floating Point Numbers
-
Unicode
-
Endianness
- Big And Little Endian
- Big Endian Vs Little Endian (video)
- Big And Little Endian Inside/Out (video)
- Very technical talk for kernel devs. Don’t worry if most is over your head.
- The first half is enough.
-
Networking
- If you have networking experience or want to be a reliability engineer or operations engineer, expect questions
- Otherwise, this is just good to know
- Khan Academy
- UDP and TCP: Comparison of Transport Protocols (video)
- TCP/IP and the OSI Model Explained! (video)
- Packet Transmission across the Internet. Networking & TCP/IP tutorial. (video)
- HTTP (video)
- SSL and HTTPS (video)
- SSL/TLS (video)
- HTTP 2.0 (video)
- Video Series (21 videos) (video)
- Subnetting Demystified - Part 5 CIDR Notation (video)
- Sockets:
Final Review
This section will have shorter videos that you can watch pretty quickly to review most of the important concepts.
It's nice if you want a refresher often.
- Series of 2-3 minutes short subject videos (23 videos)
- Series of 2-5 minutes short subject videos - Michael Sambol (40 videos):
- Sedgewick Videos - Algorithms I
- Sedgewick Videos - Algorithms II
Update Your Resume
- See Resume prep information in the books: “Cracking The Coding Interview” and “Programming Interviews Exposed”
- “This Is What A GOOD Resume Should Look Like” by Gayle McDowell (author of Cracking the Coding Interview),
- Note by the author: “This is for a US-focused resume. CVs for India and other countries have different expectations, although many of the points will be the same.”
- “Step-by-step resume guide” by Tech Interview Handbook
- Detailed guide on how to set up your resume from scratch, write effective resume content, optimize it, and test your resume
- Detailed guide on how to set up your resume from scratch, write effective resume content, optimize it, and test your resume
Interview Process & General Interview Prep
- How to Pass the Engineering Interview in 2021
- Demystifying Tech Recruiting
- How to Get a Job at the Big 4:
- Cracking The Coding Interview Set 1:
- Cracking the Facebook Coding Interview:
- Prep Courses:
- Python for Data Structures, Algorithms, and Interviews (paid course):
- A Python centric interview prep course which covers data structures, algorithms, mock interviews and much more.
- Intro to Data Structures and Algorithms using Python (Udacity free course):
- A free Python centric data structures and algorithms course.
- Data Structures and Algorithms Nanodegree! (Udacity paid Nanodegree):
- Get hands-on practice with over 100 data structures and algorithm exercises and guidance from a dedicated mentor to help prepare you for interviews and on-the-job scenarios.
- Grokking the Behavioral Interview (Educative free course):
- Many times, it’s not your technical competency that holds you back from landing your dream job, it’s how you perform on the behavioral interview.
- AlgoMonster (paid course with free content):
- The crash course for LeetCode. Covers all the patterns condensed from thousands of questions.
- Python for Data Structures, Algorithms, and Interviews (paid course):
Mock Interviews:
- Gainlo.co: Mock interviewers from big companies - I used this and it helped me relax for the phone screen and on-site interview
- Pramp: Mock interviews from/with peers - peer-to-peer model of practice interviews
- interviewing.io: Practice mock interview with senior engineers - anonymous algorithmic/systems design interviews with senior engineers from FAANG anonymously
- Meetapro: Mock interviews with top FAANG interviewers - an Airbnb-style mock interview/coaching platform.
Be thinking of for when the interview comes
Think of about 20 interview questions you’ll get, along with the lines of the items below. Have at least one answer for each. Have a story, not just data, about something you accomplished.
- Why do you want this job?
- What’s a tough problem you’ve solved?
- Biggest challenges faced?
- Best/worst designs seen?
- Ideas for improving an existing product
- How do you work best, as an individual and as part of a team?
- Which of your skills or experiences would be assets in the role and why?
- What did you most enjoy at [job x / project y]?
- What was the biggest challenge you faced at [job x / project y]?
- What was the hardest bug you faced at [job x / project y]?
- What did you learn at [job x / project y]?
- What would you have done better at [job x / project y]?
Have questions for the interviewer
Some of mine (I already may know the answers, but want their opinion or team perspective):
- How large is your team?
- What does your dev cycle look like? Do you do waterfall/sprints/agile?
- Are rushes to deadlines common? Or is there flexibility?
- How are decisions made in your team?
- How many meetings do you have per week?
- Do you feel your work environment helps you concentrate?
- What are you working on?
- What do you like about it?
- What is the work life like?
- How is the work/life balance?
Once You’ve Got The Job
Congratulations!
Keep learning.
You’re never really done.
*****************************************************************************************************
*****************************************************************************************************
Everything below this point is optional. It is NOT needed for an entry-level interview.
However, by studying these, you'll get greater exposure to more CS concepts, and will be better prepared for
any software engineering job. You'll be a much more well-rounded software engineer.
*****************************************************************************************************
*****************************************************************************************************
Additional Books
These are here so you can dive into a topic you find interesting.
- The Unix Programming Environment
- An oldie but a goodie
- The Linux Command Line: A Complete Introduction
- A modern option
- TCP/IP Illustrated Series
- Head First Design Patterns
- A gentle introduction to design patterns
- Design Patterns: Elements of Reusable Object-Oriented Software
- AKA the “Gang Of Four” book, or GOF
- The canonical design patterns book
- Algorithm Design Manual (Skiena)
- As a review and problem recognition
- The algorithm catalog portion is well beyond the scope of difficulty you’ll get in an interview
- This book has 2 parts:
- Class textbook on data structures and algorithms
- Pros:
- Is a good review as any algorithms textbook would be
- Nice stories from his experiences solving problems in industry and academia
- Code examples in C
- Cons:
- Can be as dense or impenetrable as CLRS, and in some cases, CLRS may be a better alternative for some subjects
- Chapters 7, 8, 9 can be painful to try to follow, as some items are not explained well or require more brain than I have
- Don’t get me wrong: I like Skiena, his teaching style, and mannerisms, but I may not be Stony Brook material
- Pros:
- Algorithm catalog:
- This is the real reason you buy this book.
- This book is better as an algorithm reference, and not something you read cover to cover.
- Class textbook on data structures and algorithms
- Can rent it on Kindle
- Answers:
- Errata
- Algorithm (Jeff Erickson)
- Write Great Code: Volume 1: Understanding the Machine
- The book was published in 2004, and is somewhat outdated, but it’s a terrific resource for understanding a computer in brief
- The author invented HLA, so take mentions and examples in HLA with a grain of salt. Not widely used, but decent examples of what assembly looks like
- These chapters are worth the read to give you a nice foundation:
- Chapter 2 - Numeric Representation
- Chapter 3 - Binary Arithmetic and Bit Operations
- Chapter 4 - Floating-Point Representation
- Chapter 5 - Character Representation
- Chapter 6 - Memory Organization and Access
- Chapter 7 - Composite Data Types and Memory Objects
- Chapter 9 - CPU Architecture
- Chapter 10 - Instruction Set Architecture
- Chapter 11 - Memory Architecture and Organization
- Introduction to Algorithms
- Important: Reading this book will only have limited value. This book is a great review of algorithms and data structures, but won’t teach you how to write good code. You have to be able to code a decent solution efficiently
- AKA CLR, sometimes CLRS, because Stein was late to the game
- Computer Architecture, Sixth Edition: A Quantitative Approach
- For a richer, more up-to-date (2017), but longer treatment
System Design, Scalability, Data Handling
You can expect system design questions if you have 4+ years of experience.
- Scalability and System Design are very large topics with many topics and resources, since there is a lot to consider when designing a software/hardware system that can scale. Expect to spend quite a bit of time on this
- Considerations:
- Scalability
- Distill large data sets to single values
- Transform one data set to another
- Handling obscenely large amounts of data
- System design
- features sets
- interfaces
- class hierarchies
- designing a system under certain constraints
- simplicity and robustness
- tradeoffs
- performance analysis and optimization
- Scalability
- START HERE: The System Design Primer
- System Design from HiredInTech
- How Do I Prepare To Answer Design Questions In A Technical Interview?
- 8 Things You Need to Know Before a System Design Interview
- Database Normalization - 1NF, 2NF, 3NF and 4NF (video)
- System Design Interview - There are a lot of resources in this one. Look through the articles and examples. I put some of them below
- How to ace a systems design interview
- Numbers Everyone Should Know
- How long does it take to make a context switch?
- Transactions Across Datacenters (video)
- A plain English introduction to CAP Theorem
- MIT 6.824: Distributed Systems, Spring 2020 (20 videos)
- Consensus Algorithms:
- Consistent Hashing
- NoSQL Patterns
- Scalability:
- You don’t need all of these. Just pick a few that interest you.
- Great overview (video)
- Short series:
- Scalable Web Architecture and Distributed Systems
- Fallacies of Distributed Computing Explained
- Jeff Dean - Building Software Systems At Google and Lessons Learned (video)
- Introduction to Architecting Systems for Scale
- Scaling mobile games to a global audience using App Engine and Cloud Datastore (video)
- How Google Does Planet-Scale Engineering for Planet-Scale Infra (video)
- The Importance of Algorithms
- Sharding
- Engineering for the Long Game - Astrid Atkinson Keynote(video)
- 7 Years Of YouTube Scalability Lessons In 30 Minutes
- How PayPal Scaled To Billions Of Transactions Daily Using Just 8VMs
- How to Remove Duplicates in Large Datasets
- A look inside Etsy’s scale and engineering culture with Jon Cowie (video)
- What Led Amazon to its Own Microservices Architecture
- To Compress Or Not To Compress, That Was Uber’s Question
- When Should Approximate Query Processing Be Used?
- Google’s Transition From Single Datacenter, To Failover, To A Native Multihomed Architecture
- The Image Optimization Technology That Serves Millions Of Requests Per Day
- A Patreon Architecture Short
- Tinder: How Does One Of The Largest Recommendation Engines Decide Who You’ll See Next?
- Design Of A Modern Cache
- Live Video Streaming At Facebook Scale
- A Beginner’s Guide To Scaling To 11 Million+ Users On Amazon’s AWS
- A 360 Degree View Of The Entire Netflix Stack
- Latency Is Everywhere And It Costs You Sales - How To Crush It
- What Powers Instagram: Hundreds of Instances, Dozens of Technologies
- Salesforce Architecture - How They Handle 1.3 Billion Transactions A Day
- ESPN’s Architecture At Scale - Operating At 100,000 Duh Nuh Nuhs Per Second
- See “Messaging, Serialization, and Queueing Systems” way below for info on some of the technologies that can glue services together
- Twitter:
- For even more, see “Mining Massive Datasets” video series in the Video Series section
- Practicing the system design process: Here are some ideas to try working through on paper, each with some documentation on how it was handled in the real world:
- review: The System Design Primer
- System Design from HiredInTech
- cheat sheet
- flow:
- Understand the problem and scope:
- Define the use cases, with interviewer’s help
- Suggest additional features
- Remove items that interviewer deems out of scope
- Assume high availability is required, add as a use case
- Think about constraints:
- Ask how many requests per month
- Ask how many requests per second (they may volunteer it or make you do the math)
- Estimate reads vs. writes percentage
- Keep 80/20 rule in mind when estimating
- How much data written per second
- Total storage required over 5 years
- How much data read per second
- Abstract design:
- Layers (service, data, caching)
- Infrastructure: load balancing, messaging
- Rough overview of any key algorithm that drives the service
- Consider bottlenecks and determine solutions
- Understand the problem and scope:
- Exercises:
Additional Learning
I added them to help you become a well-rounded software engineer, and to be aware of certain
technologies and algorithms, so you'll have a bigger toolbox.
-
Compilers
-
Emacs and vi(m)
- Familiarize yourself with a unix-based code editor
- vi(m):
- emacs:
- The Absolute Beginner’s Guide to Emacs (video by David Wilson)
- The Absolute Beginner’s Guide to Emacs (notes by David Wilson)
-
Unix command line tools
-
Information theory (videos)
- Khan Academy
- More about Markov processes:
- See more in MIT 6.050J Information and Entropy series below
-
Parity & Hamming Code (videos)
- Intro
- Parity
- Hamming Code:
- Error Checking
-
Entropy
- Also see videos below
- Make sure to watch information theory videos first
- Information Theory, Claude Shannon, Entropy, Redundancy, Data Compression & Bits (video)
-
Cryptography
- Also see videos below
- Make sure to watch information theory videos first
- Khan Academy Series
- Cryptography: Hash Functions
- Cryptography: Encryption
-
Compression
- Make sure to watch information theory videos first
- Computerphile (videos):
- Compressor Head videos
- (optional) Google Developers Live: GZIP is not enough!
-
Computer Security
-
Garbage collection
-
Parallel Programming
-
Messaging, Serialization, and Queueing Systems
-
A*
-
Fast Fourier Transform
-
Bloom Filter
- Given a Bloom filter with m bits and k hashing functions, both insertion and membership testing are O(k)
- Bloom Filters (video)
- Bloom Filters | Mining of Massive Datasets | Stanford University (video)
- Tutorial
- How To Write A Bloom Filter App
-
HyperLogLog
-
Locality-Sensitive Hashing
- Used to determine the similarity of documents
- The opposite of MD5 or SHA which are used to determine if 2 documents/strings are exactly the same
- Simhashing (hopefully) made simple
-
van Emde Boas Trees
-
Augmented Data Structures
-
Balanced search trees
-
Know at least one type of balanced binary tree (and know how it’s implemented):
-
“Among balanced search trees, AVL and 2/3 trees are now passé, and red-black trees seem to be more popular. A particularly interesting self-organizing data structure is the splay tree, which uses rotations to move any accessed key to the root.” - Skiena
-
Of these, I chose to implement a splay tree. From what I’ve read, you won’t implement a balanced search tree in your interview. But I wanted exposure to coding one up and let’s face it, splay trees are the bee’s knees. I did read a lot of red-black tree code
- Splay tree: insert, search, delete functions If you end up implementing red/black tree try just these:
- Search and insertion functions, skipping delete
-
I want to learn more about B-Tree since it’s used so widely with very large data sets
-
AVL trees
- In practice: From what I can tell, these aren’t used much in practice, but I could see where they would be: The AVL tree is another structure supporting O(log n) search, insertion, and removal. It is more rigidly balanced than red–black trees, leading to slower insertion and removal but faster retrieval. This makes it attractive for data structures that may be built once and loaded without reconstruction, such as language dictionaries (or program dictionaries, such as the opcodes of an assembler or interpreter)
- MIT AVL Trees / AVL Sort (video)
- AVL Trees (video)
- AVL Tree Implementation (video)
- Split And Merge
-
Splay trees
- In practice: Splay trees are typically used in the implementation of caches, memory allocators, routers, garbage collectors, data compression, ropes (replacement of string used for long text strings), in Windows NT (in the virtual memory, networking and file system code) etc
- CS 61B: Splay Trees (video)
- MIT Lecture: Splay Trees:
- Gets very mathy, but watch the last 10 minutes for sure.
- Video
-
Red/black trees
- These are a translation of a 2-3 tree (see below).
- In practice: Red–black trees offer worst-case guarantees for insertion time, deletion time, and search time. Not only does this make them valuable in time-sensitive applications such as real-time applications, but it makes them valuable building blocks in other data structures which provide worst-case guarantees; for example, many data structures used in computational geometry can be based on red–black trees, and the Completely Fair Scheduler used in current Linux kernels uses red–black trees. In the version 8 of Java, the Collection HashMap has been modified such that instead of using a LinkedList to store identical elements with poor hashcodes, a Red-Black tree is used
- Aduni - Algorithms - Lecture 4 (link jumps to starting point) (video)
- Aduni - Algorithms - Lecture 5 (video)
- Red-Black Tree
- An Introduction To Binary Search And Red Black Tree
- [Review] Red-Black Trees (playlist) in 30 minutes (video)
-
2-3 search trees
- In practice: 2-3 trees have faster inserts at the expense of slower searches (since height is more compared to AVL trees).
- You would use 2-3 tree very rarely because its implementation involves different types of nodes. Instead, people use Red Black trees.
- 23-Tree Intuition and Definition (video)
- Binary View of 23-Tree
- 2-3 Trees (student recitation) (video)
-
2-3-4 Trees (aka 2-4 trees)
- In practice: For every 2-4 tree, there are corresponding red–black trees with data elements in the same order. The insertion and deletion operations on 2-4 trees are also equivalent to color-flipping and rotations in red–black trees. This makes 2-4 trees an important tool for understanding the logic behind red–black trees, and this is why many introductory algorithm texts introduce 2-4 trees just before red–black trees, even though 2-4 trees are not often used in practice.
- CS 61B Lecture 26: Balanced Search Trees (video)
- Bottom Up 234-Trees (video)
- Top Down 234-Trees (video)
-
N-ary (K-ary, M-ary) trees
- note: the N or K is the branching factor (max branches)
- binary trees are a 2-ary tree, with branching factor = 2
- 2-3 trees are 3-ary
- K-Ary Tree
-
B-Trees
- Fun fact: it’s a mystery, but the B could stand for Boeing, Balanced, or Bayer (co-inventor).
- In Practice: B-Trees are widely used in databases. Most modern filesystems use B-trees (or Variants). In addition to its use in databases, the B-tree is also used in filesystems to allow quick random access to an arbitrary block in a particular file. The basic problem is turning the file block i address into a disk block (or perhaps to a cylinder-head-sector) address
- B-Tree
- B-Tree Datastructure
- Introduction to B-Trees (video)
- B-Tree Definition and Insertion (video)
- B-Tree Deletion (video)
- MIT 6.851 - Memory Hierarchy Models (video) - covers cache-oblivious B-Trees, very interesting data structures - the first 37 minutes are very technical, may be skipped (B is block size, cache line size)
- [Review] B-Trees (playlist) in 26 minutes (video)
-
-
k-D Trees
- Great for finding number of points in a rectangle or higher dimension object
- A good fit for k-nearest neighbors
- kNN K-d tree algorithm (video)
-
Skip lists
- “These are somewhat of a cult data structure” - Skiena
- Randomization: Skip Lists (video)
- For animations and a little more detail
-
Network Flows
-
Disjoint Sets & Union Find
-
Math for Fast Processing
-
Treap
- Combination of a binary search tree and a heap
- Treap
- Data Structures: Treaps explained (video)
- Applications in set operations
-
Linear Programming (videos)
-
Geometry, Convex hull (videos)
-
Discrete math
Additional Detail on Some Subjects
I added these to reinforce some ideas already presented above, but didn't want to include them
above because it's just too much. It's easy to overdo it on a subject.
You want to get hired in this century, right?
-
SOLID
- Bob Martin SOLID Principles of Object Oriented and Agile Design (video)
- S - Single Responsibility Principle | Single responsibility to each Object
- O - Open/Closed Principle | On production level Objects are ready for extension but not for modification
- L - Liskov Substitution Principle | Base Class and Derived class follow ‘IS A’ Principle
- I - Interface segregation principle | clients should not be forced to implement interfaces they don’t use
- D -Dependency Inversion principle | Reduce the dependency In composition of objects.
-
Union-Find
-
More Dynamic Programming (videos)
- 6.006: Dynamic Programming I: Fibonacci, Shortest Paths
- 6.006: Dynamic Programming II: Text Justification, Blackjack
- 6.006: DP III: Parenthesization, Edit Distance, Knapsack
- 6.006: DP IV: Guitar Fingering, Tetris, Super Mario Bros.
- 6.046: Dynamic Programming & Advanced DP
- 6.046: Dynamic Programming: All-Pairs Shortest Paths
- 6.046: Dynamic Programming (student recitation)
-
Advanced Graph Processing (videos)
-
MIT Probability (mathy, and go slowly, which is good for mathy things) (videos):
-
String Matching
- Rabin-Karp (videos):
- Knuth-Morris-Pratt (KMP):
- Boyer–Moore string search algorithm
- Coursera: Algorithms on Strings
- starts off great, but by the time it gets past KMP it gets more complicated than it needs to be
- nice explanation of tries
- can be skipped
-
Sorting
- Stanford lectures on sorting:
- Shai Simonson:
- Steven Skiena lectures on sorting:
-
NAND To Tetris: Build a Modern Computer from First Principles
Video Series
Sit back and enjoy.
-
List of individual Dynamic Programming problems (each is short)
-
Excellent - MIT Calculus Revisited: Single Variable Calculus
-
Skiena lectures from Algorithm Design Manual - CSE373 2020 - Analysis of Algorithms (26 videos)
-
Carnegie Mellon - Computer Architecture Lectures (39 videos)
-
MIT 6.042J: Mathematics for Computer Science, Fall 2010 (25 videos)
Computer Science Courses
Algorithms implementation
Papers
- Love classic papers?
- 1978: Communicating Sequential Processes
- 2003: The Google File System
- replaced by Colossus in 2012
- 2004: MapReduce: Simplified Data Processing on Large Clusters
- mostly replaced by Cloud Dataflow?
- 2006: Bigtable: A Distributed Storage System for Structured Data
- 2006: The Chubby Lock Service for Loosely-Coupled Distributed Systems
- 2007: Dynamo: Amazon’s Highly Available Key-value Store
- The Dynamo paper kicked off the NoSQL revolution
- 2007: What Every Programmer Should Know About Memory (very long, and the author encourages skipping of some sections)
- 2012: AddressSanitizer: A Fast Address Sanity Checker:
- 2013: Spanner: Google’s Globally-Distributed Database:
- 2015: Continuous Pipelines at Google
- 2015: High-Availability at Massive Scale: Building Google’s Data Infrastructure for Ads
- 2015: How Developers Search for Code: A Case Study
- More papers: 1,000 papers