Sonny不讀不行: Scala Parallel Programming筆記 10

Sequential Collection Hierarchy

Parallel Collection Hierarchy

注意中間的generic collection系列是用在當collection未知是sequential or parallel的時候。

舉例來說，如果要找某個array中最大的palindrome:

由於xs是一個GenSeq[Int]，所以只要是這個trait的subclass都能不管是否真的parallel或是sequential而去呼叫以下：

.par conversion time

把一個collection轉換成parallel collection是需要時間的，一個100萬element的list需要128 ms，而一個100萬elements的vector只需要0.0015 ms，這是由於list並非parallelizable collection，所以scala把list轉換成最接近形式的parallel collection: vector。

以下是parallel collections:

所以挑選一個parallel collection要很注意！不要選錯。

小心使用thread-unsafe collection

以下是一個錯誤的範例：

這邊有個讓我發現我觀念錯誤的地方就是，其實不是把0 until 1000給拆成總共加起來是1000個的數份collection，而是語意會達成sequential的語意，也就是應該會拆成{parallel set a} x {parallel set b} 的組合（但是這樣應該會重複計算？不過parallelism會彌補），要不然帶入setA和setB都是disjoint parallel set的話，就變成{parallel set a} x {parallel set b} 組合的其中幾種，而且結果會是不確定的。所以不是我想的那樣，但是這個寫法還是錯！

錯誤在於mutable.Set是not thread-safe，所以每次+=都會update同一個object（同時也驗證上面講的{parallel set a} x {parallel set b} ），所以會錯。

其中一種解決方法是改用thread-safe collection取代mutable.Set:

另一個方法是使用combinator，避免mutation side effects:

這邊所謂的combinator是指"filter" ，丟入b(_)，課堂中講這是似乎有Int=>Boolean的signature，所以是一個function object，代表把set b整個丟入去filter。
其實光是a.filter(b(_))即可，但是如果要efficient，就可以用比較小的set去當filter parameter。

不要在mutable collection traverse時 update / read

這個就算在sequential case也成立。
請看以下錯誤範例：

這邊錯誤(1)在traverse graph.par時，graph(k)被寫入graph(v)，我們不能在traverse的過程中還update。(2) graph(v) 這是一個read operation，但是collection卻正在被(1) update！因為mutable.Map不是一個thread-safe collection!!!!

TrieMap collection

上面的問題可以用一個concurrent TrieMap collection來解決：因為TrieMap提供了atomic snapshot功能，能把某個state確切snapshot起來（在collection.par開始時），所以之後的traverse都是用這個snapshot，而之後的update都不會被看見。

所以這邊有三個graph instance:
1. 在graph.par (被convert成parallel collection)時，一個紅色的snapshot被“隱性自動的”建立了，此snapshot在traverse過程中都一直被用到，所以graph(k) = previous(v)不會影響此紅色snapshot

2. 藍色的instance就是這個TrieMap本身，是mutable，真正的update在這個collection

3. 粉容色的instance是一個我們自己建立的snapshot，這是用來保存update之前的狀態，因為此program有需要用到。

這樣有了1，我們就不會在traverse時候遇到內容被update會打亂traverse state的問題
有了3，我們就不會遇到去讀正在update的graph裡面的某個value，可能state被打亂的問題。

Sonny不讀不行

code

2017年7月31日星期一

Scala Parallel Programming筆記 10 - Scala parallel collections

Sequential Collection Hierarchy

Parallel Collection Hierarchy

.par conversion time

小心使用thread-unsafe collection

不要在mutable collection traverse時 update / read

TrieMap collection

沒有留言:

張貼留言

code

2017年7月31日 星期一

Scala Parallel Programming筆記 10 - Scala parallel collections

Sequential Collection Hierarchy

Parallel Collection Hierarchy

.par conversion time

小心使用thread-unsafe collection

不要在mutable collection traverse時 update / read

TrieMap collection

沒有留言:

張貼留言

2017年7月31日星期一