Break up the neural network into one network for each
This should be unnecessary, but it helps me debug the values. Also
add a counter for the number of unique states that are traversed during
a session.
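The commit message mentions adding a counter for the number of unique states traversed during a session. A minimal sketch of how such a counter might look (the `UniqueStateCounter` object and its method names are hypothetical, not taken from the diff):

```scala
// Hypothetical sketch: collect every board state seen during a session in a
// Set and report its size. A board state is represented the same way as the
// stateValues keys in ticTacToe.scala: a List of cell markers.
object UniqueStateCounter {
  private val seen = scala.collection.mutable.Set[List[String]]()

  // Record a state; the Set silently ignores duplicates.
  def record(state: List[String]): Unit = seen += state

  // Number of distinct states traversed so far.
  def uniqueStates: Int = seen.size
}
```

Because the `Set` deduplicates, recording the same board twice only counts once.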
ticTacToe.scala (26 additions, 19 deletions)
```diff
@@ -31,17 +31,17 @@ case class InvalidCall(message: String) extends Exception(message)
 object Parameters {
   // Tabular Parameters
   val tabularAlpha = 0.1
-  val tabularNumberTrainEpisodes = 20000
+  val tabularNumberTrainEpisodes = 50000

   // Both
-  val epsilon = 0.1
+  val epsilon = 0.2
   val numberTestEpisodes = 20000

   // Neural Net Parameters
-  val neuralNumberTrainEpisodes = 100000
-  val neuralValueLearningAlpha = 0.1 // The learning rate used by the value update function
-  val neuralNetAlpha = 0.1 // The learning rate in the neural net itself
+  val neuralNumberTrainEpisodes = 200000
+  val neuralNetAlpha = 0.5 // The learning rate in the neural net itself
   val neuralGamma = 0.99 // discount rate
-  val neuralInitialBias = 0.15 // This is in the range [0, f(n)] where n is the number of input neurons and f(n) = 1/sqrt(n). See here: http://neuralnetworksanddeeplearning.com/chap3.html#weight_initialization
-  val neuralNumberHiddenNeurons = 26
+  val neuralInitialBias = 0.33 // This is in the range [0, f(n)] where n is the number of input neurons and f(n) = 1/sqrt(n). See here: http://neuralnetworksanddeeplearning.com/chap3.html#weight_initialization
+  val neuralNumberHiddenNeurons = 40
+  val neuralValueLearningAlpha = 1.0 / neuralNumberHiddenNeurons // The learning rate used by the value update function
```
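A notable change here is that the value-update learning rate is no longer a fixed 0.1 but scales as `1.0 / neuralNumberHiddenNeurons`. A hedged sketch of how these parameters could feed a TD(0)-style update (the `targetValue` helper is hypothetical; the parameter names and values come from the diff):

```scala
// Sketch under assumed semantics: move the old value estimate toward
// reward + gamma * nextValue, stepping by the value-learning rate.
object ValueUpdateSketch {
  val neuralNumberHiddenNeurons = 40
  val neuralValueLearningAlpha = 1.0 / neuralNumberHiddenNeurons // 0.025
  val neuralGamma = 0.99 // discount rate, as in the diff

  // Hypothetical TD(0)-style update rule.
  def targetValue(oldValue: Double, reward: Double, nextValue: Double): Double =
    oldValue + neuralValueLearningAlpha * (reward + neuralGamma * nextValue - oldValue)
}
```

With 40 hidden neurons the effective step size becomes 0.025, smaller than the old fixed 0.1, so per-update corrections shrink as the network widens.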
Elsewhere in ticTacToe.scala:

```scala
val stateValues = Map[List[String], Map[Int, Double]]() // The state-value function is stored in a map whose keys are environment states of the Tic-tac-toe board and whose values map each possible action in that state to its value. A possible action is any space that is not currently occupied.

debugPrint(s"Updated player ${name}'s neural net for ${previousStateFeatureVector.mkString(", ")} with reward ${reward} and targetValue ${targetValue}")
```
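The `stateValues` structure above stores the tabular state-value function. A minimal sketch of an incremental tabular update against such a nested map, using `tabularAlpha` from the diff (the `TabularValues` object and its `update` method are hypothetical; mutable maps are used here for brevity):

```scala
import scala.collection.mutable

// Hedged sketch: keys are board states, values map each legal action index
// to its estimated value; entries are created lazily with a 0.0 default.
object TabularValues {
  val tabularAlpha = 0.1
  val stateValues = mutable.Map[List[String], mutable.Map[Int, Double]]()

  // Move the stored estimate a fraction tabularAlpha toward the target.
  def update(state: List[String], action: Int, target: Double): Double = {
    val actions = stateValues.getOrElseUpdate(state, mutable.Map[Int, Double]())
    val old = actions.getOrElse(action, 0.0)
    val updated = old + tabularAlpha * (target - old)
    actions(action) = updated
    updated
  }
}
```

Repeated updates toward the same target converge geometrically: starting from 0.0 with target 1.0, successive estimates are 0.1, 0.19, 0.271, and so on.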