The single Master Node design had been in service for many years, but it had begun to hit bottlenecks; Caffeine now runs on a new architecture with Multiple Master Nodes.
theRegister's article "Google File System II: Dawn of the Multiplying Master Nodes" points this out:
The trouble – at least for applications that require low latency – is that there’s only one master. “One GFS shortcoming that this immediately exposed had to do with the original single-master design,” Quinlan says. “A single point of failure may not have been a disaster for batch-oriented applications, but it was certainly unacceptable for latency-sensitive applications, such as video serving.”
The Register article also notes that, beyond the latency problem, a single Master Node places a hard limit on the File Count:
The other issue is that Google’s single master can handle only a limited number of files. The master node stores the metadata describing the files spread across the chunkservers, and that metadata can’t be any larger than the master’s memory. In other words, there’s a finite number of files a master can accommodate.
With distributed Masters and distributed Slaves, the metadata the Master Nodes can store can grow essentially without limit; and by shrinking Chunks from 64MB down to 1MB, each Slave Node stores smaller files, keeping storage lean enough to meet the next decade's needs.
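The two constraints above can be put on a back-of-envelope basis: metadata must fit in a single master's RAM, and smaller chunks mean many more chunks to track. A minimal sketch, where the 64 GiB of master RAM is an illustrative assumption (the only figure taken from the original GFS design is the roughly 64 bytes of metadata kept per chunk):

```python
# Back-of-envelope arithmetic for the single-master metadata limit.
# MASTER_RAM_BYTES is an assumed figure for illustration; METADATA_PER_CHUNK
# follows the GFS design's claim of under 64 bytes of metadata per chunk.

MASTER_RAM_BYTES = 64 * 2**30   # assume 64 GiB of master RAM for metadata
METADATA_PER_CHUNK = 64         # ~64 bytes of metadata per chunk

def max_chunks(ram_bytes: int, meta_per_chunk: int) -> int:
    """How many chunks' metadata fits in one master's memory."""
    return ram_bytes // meta_per_chunk

def chunks_for_data(total_bytes: int, chunk_size: int) -> int:
    """How many chunks a given data volume requires (ceiling division)."""
    return -(-total_bytes // chunk_size)

one_petabyte = 2**50
print(chunks_for_data(one_petabyte, 64 * 2**20))  # 64 MB chunks
print(chunks_for_data(one_petabyte, 1 * 2**20))   # 1 MB chunks: 64x the metadata
print(max_chunks(MASTER_RAM_BYTES, METADATA_PER_CHUNK))
```

The 64x jump in chunk count when chunks shrink from 64 MB to 1 MB is exactly why the metadata can no longer live on one machine and has to be spread across multiple masters.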
If distributed Masters are that much better, why did Google settle on a single Master in the first place? Sean Quinlan, the engineer who led GFS, explains that a single Master simply made the design easier:
The decision to go with a single master was actually one of the very first decisions, mostly just to simplify the overall design problem. That is, building a distributed master right from the outset was deemed too difficult and would take too much time. Also, by going with the single-master approach, the engineers were able to simplify a lot of problems. Having a central place to control replication and garbage collection and many other activities was definitely simpler than handling it all on a distributed basis. So the decision was made to centralize that in one machine.
So the real significance of this Caffeine update lies in the architectural change more than in any change to the data. If the move to distributed Master Nodes succeeds, there will be far more room to play with afterwards: larger data volumes, higher relevance, more ranking signals .... and then the changes in the data will gradually follow.
For a deeper look at the next-generation GFS, see the ACM article "GFS: Evolution on Fast-forward".