Original poster | Posted on 2024-6-7 15:07
This post was last edited by katawong on 2024-6-7 15:09
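In post #7 I shared some real training data. It showed that the AI could reach the maximum score of 1499, but not consistently. So how much has it improved after a further period of training? Its latest results are listed below.
As the log shows, its design ability has improved considerably: only 5 of the first 20 episodes reach 1499, while 41 of the 44 episodes from 81 to 124 do. The data: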
episode: 1/15000, policy_loss: 0.4161, value_loss: 215.2850, reward_loss: 5.3234, policy_entropy: 0.6239, score: 212.0000
episode: 2/15000, policy_loss: 0.7824, value_loss: 536.4194, reward_loss: 24.4589, policy_entropy: 0.3556, score: 115.0000
episode: 3/15000, policy_loss: 0.4982, value_loss: 777.7183, reward_loss: 12.4306, policy_entropy: 0.4113, score: 157.0000
episode: 4/15000, policy_loss: 0.5999, value_loss: 491.3422, reward_loss: 18.5668, policy_entropy: 0.5290, score: 123.0000
episode: 5/15000, policy_loss: 0.4910, value_loss: 525.6333, reward_loss: 25.0361, policy_entropy: 0.5887, score: 178.0000
episode: 6/15000, policy_loss: 0.6155, value_loss: 566.3329, reward_loss: 15.6904, policy_entropy: 0.5261, score: 965.0000
episode: 7/15000, policy_loss: 0.5150, value_loss: 453.9730, reward_loss: 8.0561, policy_entropy: 0.5793, score: 439.0000
episode: 8/15000, policy_loss: 0.5349, value_loss: 589.5212, reward_loss: 10.4976, policy_entropy: 0.5743, score: 118.0000
episode: 9/15000, policy_loss: 0.5593, value_loss: 538.7057, reward_loss: 8.4509, policy_entropy: 0.6550, score: 148.0000
episode: 10/15000, policy_loss: 0.4949, value_loss: 513.8514, reward_loss: 11.3358, policy_entropy: 0.6725, score: 173.0000
episode: 11/15000, policy_loss: 0.4489, value_loss: 333.5057, reward_loss: 5.8847, policy_entropy: 0.6871, score: 1499.0000
episode: 12/15000, policy_loss: 0.4858, value_loss: 380.6137, reward_loss: 13.5092, policy_entropy: 0.4660, score: 673.0000
episode: 13/15000, policy_loss: 0.4853, value_loss: 326.1999, reward_loss: 10.7604, policy_entropy: 0.4214, score: 1499.0000
episode: 14/15000, policy_loss: 0.4360, value_loss: 251.4935, reward_loss: 8.3748, policy_entropy: 0.4258, score: 1499.0000
episode: 15/15000, policy_loss: 0.4405, value_loss: 266.2397, reward_loss: 6.9854, policy_entropy: 0.5010, score: 250.0000
episode: 16/15000, policy_loss: 0.4612, value_loss: 329.6295, reward_loss: 12.1463, policy_entropy: 0.5029, score: 137.0000
episode: 17/15000, policy_loss: 0.4328, value_loss: 246.7722, reward_loss: 9.5955, policy_entropy: 0.4708, score: 146.0000
episode: 18/15000, policy_loss: 0.4205, value_loss: 363.3236, reward_loss: 17.4401, policy_entropy: 0.4099, score: 221.0000
episode: 19/15000, policy_loss: 0.4038, value_loss: 286.7342, reward_loss: 12.3858, policy_entropy: 0.4490, score: 1499.0000
episode: 20/15000, policy_loss: 0.4126, value_loss: 302.7286, reward_loss: 13.7845, policy_entropy: 0.4567, score: 1499.0000
saving...
********** save weights ************
saved
episode: 21/15000, policy_loss: 0.4221, value_loss: 268.0316, reward_loss: 12.9250, policy_entropy: 0.3954, score: 1499.0000
episode: 22/15000, policy_loss: 0.4103, value_loss: 164.1839, reward_loss: 4.9871, policy_entropy: 0.5047, score: 1499.0000
episode: 23/15000, policy_loss: 0.4098, value_loss: 176.3108, reward_loss: 6.6844, policy_entropy: 0.4693, score: 185.0000
episode: 24/15000, policy_loss: 0.4451, value_loss: 335.0486, reward_loss: 17.4868, policy_entropy: 0.6215, score: 187.0000
episode: 25/15000, policy_loss: 0.4455, value_loss: 317.3674, reward_loss: 16.0549, policy_entropy: 0.5449, score: 177.0000
episode: 26/15000, policy_loss: 0.4370, value_loss: 323.3865, reward_loss: 16.7364, policy_entropy: 0.4715, score: 240.0000
episode: 27/15000, policy_loss: 0.4674, value_loss: 212.3390, reward_loss: 9.8769, policy_entropy: 0.4446, score: 1499.0000
episode: 28/15000, policy_loss: 0.4824, value_loss: 258.5034, reward_loss: 13.2846, policy_entropy: 0.5335, score: 1499.0000
episode: 29/15000, policy_loss: 0.4589, value_loss: 256.1392, reward_loss: 12.9572, policy_entropy: 0.5453, score: 235.0000
episode: 30/15000, policy_loss: 0.4912, value_loss: 281.6791, reward_loss: 15.7824, policy_entropy: 0.5648, score: 172.0000
episode: 31/15000, policy_loss: 0.5186, value_loss: 311.5533, reward_loss: 17.0442, policy_entropy: 0.4889, score: 120.0000
episode: 32/15000, policy_loss: 0.4747, value_loss: 226.9470, reward_loss: 11.5003, policy_entropy: 0.5536, score: 178.0000
episode: 33/15000, policy_loss: 0.4693, value_loss: 282.7120, reward_loss: 16.1569, policy_entropy: 0.5670, score: 1499.0000
episode: 34/15000, policy_loss: 0.5162, value_loss: 245.2859, reward_loss: 11.9941, policy_entropy: 0.4875, score: 1499.0000
episode: 35/15000, policy_loss: 0.5049, value_loss: 228.7045, reward_loss: 13.8588, policy_entropy: 0.5310, score: 1499.0000
episode: 36/15000, policy_loss: 0.4621, value_loss: 168.3383, reward_loss: 8.6423, policy_entropy: 0.5155, score: 1499.0000
episode: 37/15000, policy_loss: 0.4796, value_loss: 207.0338, reward_loss: 10.7040, policy_entropy: 0.6197, score: 137.0000
episode: 38/15000, policy_loss: 0.5167, value_loss: 269.0144, reward_loss: 14.4384, policy_entropy: 0.5073, score: 110.0000
episode: 39/15000, policy_loss: 0.4945, value_loss: 197.1236, reward_loss: 8.9764, policy_entropy: 0.5106, score: 102.0000
episode: 40/15000, policy_loss: 0.4572, value_loss: 227.5951, reward_loss: 11.4231, policy_entropy: 0.5298, score: 170.0000
saving...
********** save weights ************
saved
episode: 41/15000, policy_loss: 0.4880, value_loss: 235.6346, reward_loss: 11.7058, policy_entropy: 0.4567, score: 1499.0000
episode: 42/15000, policy_loss: 0.5100, value_loss: 185.6799, reward_loss: 9.7845, policy_entropy: 0.5248, score: 1499.0000
episode: 43/15000, policy_loss: 0.4811, value_loss: 242.0638, reward_loss: 14.2436, policy_entropy: 0.4731, score: 1499.0000
episode: 44/15000, policy_loss: 0.4640, value_loss: 298.4590, reward_loss: 19.5121, policy_entropy: 0.5439, score: 163.0000
episode: 45/15000, policy_loss: 0.5090, value_loss: 315.2503, reward_loss: 18.7176, policy_entropy: 0.5334, score: 124.0000
episode: 46/15000, policy_loss: 0.4924, value_loss: 181.2119, reward_loss: 9.5249, policy_entropy: 0.5446, score: 138.0000
episode: 47/15000, policy_loss: 0.4690, value_loss: 248.3943, reward_loss: 13.5997, policy_entropy: 0.5809, score: 178.0000
episode: 48/15000, policy_loss: 0.4698, value_loss: 196.4921, reward_loss: 8.8744, policy_entropy: 0.5634, score: 1499.0000
episode: 49/15000, policy_loss: 0.4823, value_loss: 276.5391, reward_loss: 16.1656, policy_entropy: 0.5306, score: 1499.0000
episode: 50/15000, policy_loss: 0.4626, value_loss: 203.7857, reward_loss: 10.1539, policy_entropy: 0.5711, score: 1499.0000
episode: 51/15000, policy_loss: 0.4529, value_loss: 225.3459, reward_loss: 13.7119, policy_entropy: 0.6220, score: 194.0000
episode: 52/15000, policy_loss: 0.4617, value_loss: 279.8626, reward_loss: 18.1389, policy_entropy: 0.4605, score: 151.0000
episode: 53/15000, policy_loss: 0.4510, value_loss: 286.3064, reward_loss: 16.6993, policy_entropy: 0.4454, score: 201.0000
episode: 54/15000, policy_loss: 0.4298, value_loss: 183.6426, reward_loss: 9.8382, policy_entropy: 0.5466, score: 1499.0000
episode: 55/15000, policy_loss: 0.4484, value_loss: 169.9896, reward_loss: 9.6564, policy_entropy: 0.5182, score: 1499.0000
episode: 56/15000, policy_loss: 0.4559, value_loss: 277.7924, reward_loss: 16.5925, policy_entropy: 0.5096, score: 1499.0000
episode: 57/15000, policy_loss: 0.4329, value_loss: 201.3654, reward_loss: 14.3267, policy_entropy: 0.5282, score: 1499.0000
episode: 58/15000, policy_loss: 0.4371, value_loss: 173.4138, reward_loss: 8.8108, policy_entropy: 0.5297, score: 189.0000
episode: 59/15000, policy_loss: 0.4652, value_loss: 191.4496, reward_loss: 9.5502, policy_entropy: 0.4893, score: 170.0000
episode: 60/15000, policy_loss: 0.4681, value_loss: 180.4527, reward_loss: 11.0071, policy_entropy: 0.5424, score: 154.0000
saving...
********** save weights ************
saved
episode: 61/15000, policy_loss: 0.4479, value_loss: 144.0589, reward_loss: 7.3609, policy_entropy: 0.4894, score: 1499.0000
episode: 62/15000, policy_loss: 0.4661, value_loss: 229.0062, reward_loss: 13.8491, policy_entropy: 0.5207, score: 1499.0000
episode: 63/15000, policy_loss: 0.4832, value_loss: 206.1293, reward_loss: 12.6785, policy_entropy: 0.5008, score: 1499.0000
episode: 64/15000, policy_loss: 0.4740, value_loss: 185.5161, reward_loss: 9.8251, policy_entropy: 0.5723, score: 1499.0000
episode: 65/15000, policy_loss: 0.4698, value_loss: 252.1371, reward_loss: 15.6446, policy_entropy: 0.5422, score: 382.0000
episode: 66/15000, policy_loss: 0.4815, value_loss: 144.9785, reward_loss: 9.5499, policy_entropy: 0.5144, score: 229.0000
episode: 67/15000, policy_loss: 0.4814, value_loss: 208.4067, reward_loss: 11.8008, policy_entropy: 0.4968, score: 1499.0000
episode: 68/15000, policy_loss: 0.4669, value_loss: 193.6922, reward_loss: 11.3720, policy_entropy: 0.5751, score: 1499.0000
episode: 69/15000, policy_loss: 0.4610, value_loss: 114.4614, reward_loss: 5.4580, policy_entropy: 0.5195, score: 262.0000
episode: 70/15000, policy_loss: 0.4583, value_loss: 132.0404, reward_loss: 6.2988, policy_entropy: 0.5308, score: 1499.0000
episode: 71/15000, policy_loss: 0.4570, value_loss: 214.1931, reward_loss: 12.5732, policy_entropy: 0.5076, score: 1499.0000
episode: 72/15000, policy_loss: 0.4554, value_loss: 188.7498, reward_loss: 9.9031, policy_entropy: 0.5062, score: 1499.0000
episode: 73/15000, policy_loss: 0.4540, value_loss: 146.6611, reward_loss: 7.9460, policy_entropy: 0.5047, score: 1499.0000
episode: 74/15000, policy_loss: 0.4553, value_loss: 240.3613, reward_loss: 13.0326, policy_entropy: 0.4990, score: 194.0000
episode: 75/15000, policy_loss: 0.4634, value_loss: 223.2887, reward_loss: 14.6363, policy_entropy: 0.5375, score: 160.0000
episode: 76/15000, policy_loss: 0.4532, value_loss: 229.6967, reward_loss: 12.6051, policy_entropy: 0.4485, score: 203.0000
episode: 77/15000, policy_loss: 0.4566, value_loss: 164.9831, reward_loss: 9.1544, policy_entropy: 0.5072, score: 222.0000
episode: 78/15000, policy_loss: 0.4538, value_loss: 223.0203, reward_loss: 13.0986, policy_entropy: 0.5476, score: 234.0000
episode: 79/15000, policy_loss: 0.4399, value_loss: 154.3588, reward_loss: 8.4952, policy_entropy: 0.5343, score: 1499.0000
episode: 80/15000, policy_loss: 0.4416, value_loss: 162.0921, reward_loss: 9.9977, policy_entropy: 0.5164, score: 1499.0000
saving...
********** save weights ************
saved
episode: 81/15000, policy_loss: 0.4396, value_loss: 209.8708, reward_loss: 12.2546, policy_entropy: 0.5011, score: 1499.0000
episode: 82/15000, policy_loss: 0.4384, value_loss: 146.2521, reward_loss: 6.6164, policy_entropy: 0.4671, score: 1499.0000
episode: 83/15000, policy_loss: 0.4351, value_loss: 135.2223, reward_loss: 6.6867, policy_entropy: 0.5104, score: 1499.0000
episode: 84/15000, policy_loss: 0.4361, value_loss: 132.8678, reward_loss: 7.1288, policy_entropy: 0.5776, score: 1499.0000
episode: 85/15000, policy_loss: 0.4281, value_loss: 205.0150, reward_loss: 11.6280, policy_entropy: 0.5348, score: 1499.0000
episode: 86/15000, policy_loss: 0.4405, value_loss: 137.1094, reward_loss: 6.7496, policy_entropy: 0.5450, score: 1499.0000
episode: 87/15000, policy_loss: 0.4248, value_loss: 192.6676, reward_loss: 12.1025, policy_entropy: 0.5006, score: 1499.0000
episode: 88/15000, policy_loss: 0.4192, value_loss: 204.3442, reward_loss: 12.0603, policy_entropy: 0.4659, score: 1499.0000
episode: 89/15000, policy_loss: 0.4206, value_loss: 151.4276, reward_loss: 8.0763, policy_entropy: 0.5052, score: 1499.0000
episode: 90/15000, policy_loss: 0.4099, value_loss: 101.7047, reward_loss: 5.5445, policy_entropy: 0.5272, score: 1499.0000
episode: 91/15000, policy_loss: 0.4040, value_loss: 81.9400, reward_loss: 4.1511, policy_entropy: 0.4570, score: 1499.0000
episode: 92/15000, policy_loss: 0.4072, value_loss: 130.3055, reward_loss: 6.9052, policy_entropy: 0.5051, score: 1499.0000
episode: 93/15000, policy_loss: 0.4011, value_loss: 144.9068, reward_loss: 7.7640, policy_entropy: 0.6181, score: 1499.0000
episode: 94/15000, policy_loss: 0.4216, value_loss: 143.0369, reward_loss: 8.1342, policy_entropy: 0.4576, score: 1499.0000
episode: 95/15000, policy_loss: 0.4272, value_loss: 106.0717, reward_loss: 5.1978, policy_entropy: 0.4924, score: 1499.0000
episode: 96/15000, policy_loss: 0.4083, value_loss: 211.5314, reward_loss: 13.0955, policy_entropy: 0.3815, score: 1499.0000
episode: 97/15000, policy_loss: 0.4120, value_loss: 84.5841, reward_loss: 3.8120, policy_entropy: 0.4969, score: 1499.0000
episode: 98/15000, policy_loss: 0.4336, value_loss: 145.2616, reward_loss: 8.1135, policy_entropy: 0.4316, score: 732.0000
episode: 99/15000, policy_loss: 0.4232, value_loss: 195.7559, reward_loss: 10.8406, policy_entropy: 0.4542, score: 1499.0000
episode: 100/15000, policy_loss: 0.4142, value_loss: 188.2957, reward_loss: 9.8935, policy_entropy: 0.5051, score: 1499.0000
saving...
********** save weights ************
saved
episode: 101/15000, policy_loss: 0.4179, value_loss: 90.9623, reward_loss: 5.7983, policy_entropy: 0.4633, score: 1499.0000
episode: 102/15000, policy_loss: 0.4113, value_loss: 133.3348, reward_loss: 6.9438, policy_entropy: 0.5085, score: 1499.0000
episode: 103/15000, policy_loss: 0.4132, value_loss: 139.1751, reward_loss: 8.5288, policy_entropy: 0.4697, score: 1499.0000
episode: 104/15000, policy_loss: 0.4300, value_loss: 130.9949, reward_loss: 7.1526, policy_entropy: 0.5330, score: 1499.0000
episode: 105/15000, policy_loss: 0.4149, value_loss: 136.3931, reward_loss: 7.4422, policy_entropy: 0.5242, score: 1499.0000
episode: 106/15000, policy_loss: 0.4432, value_loss: 168.4173, reward_loss: 8.4631, policy_entropy: 0.4652, score: 239.0000
episode: 107/15000, policy_loss: 0.4539, value_loss: 162.3065, reward_loss: 9.4388, policy_entropy: 0.6536, score: 1499.0000
episode: 108/15000, policy_loss: 0.4042, value_loss: 109.3871, reward_loss: 4.4598, policy_entropy: 0.5160, score: 1499.0000
episode: 109/15000, policy_loss: 0.4453, value_loss: 137.5493, reward_loss: 7.7495, policy_entropy: 0.4238, score: 1499.0000
episode: 110/15000, policy_loss: 0.4619, value_loss: 151.0615, reward_loss: 7.7821, policy_entropy: 0.4258, score: 1499.0000
episode: 111/15000, policy_loss: 0.4091, value_loss: 158.0819, reward_loss: 8.4372, policy_entropy: 0.4191, score: 1499.0000
episode: 112/15000, policy_loss: 0.4124, value_loss: 135.9255, reward_loss: 6.8434, policy_entropy: 0.4593, score: 1499.0000
episode: 113/15000, policy_loss: 0.4487, value_loss: 205.1536, reward_loss: 12.3879, policy_entropy: 0.4244, score: 225.0000
episode: 114/15000, policy_loss: 0.4136, value_loss: 113.9060, reward_loss: 5.7282, policy_entropy: 0.4505, score: 1499.0000
episode: 115/15000, policy_loss: 0.4007, value_loss: 157.4366, reward_loss: 8.7329, policy_entropy: 0.3948, score: 1499.0000
episode: 116/15000, policy_loss: 0.4028, value_loss: 156.5356, reward_loss: 8.9431, policy_entropy: 0.6158, score: 1499.0000
episode: 117/15000, policy_loss: 0.3941, value_loss: 111.8174, reward_loss: 5.8977, policy_entropy: 0.5187, score: 1499.0000
episode: 118/15000, policy_loss: 0.3956, value_loss: 127.4793, reward_loss: 6.4444, policy_entropy: 0.3781, score: 1499.0000
episode: 119/15000, policy_loss: 0.3852, value_loss: 56.8715, reward_loss: 2.0524, policy_entropy: 0.3738, score: 1499.0000
episode: 120/15000, policy_loss: 0.3904, value_loss: 163.3692, reward_loss: 9.6969, policy_entropy: 0.4368, score: 1499.0000
saving...
********** save weights ************
saved
episode: 121/15000, policy_loss: 0.3907, value_loss: 139.1056, reward_loss: 8.3761, policy_entropy: 0.4708, score: 1499.0000
episode: 122/15000, policy_loss: 0.3772, value_loss: 130.9609, reward_loss: 8.2894, policy_entropy: 0.4824, score: 1499.0000
episode: 123/15000, policy_loss: 0.3987, value_loss: 161.7403, reward_loss: 10.1152, policy_entropy: 0.5091, score: 1499.0000
episode: 124/15000, policy_loss: 0.3969, value_loss: 124.7367, reward_loss: 7.3211, policy_entropy: 0.4296, score: 1499.0000
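To judge stability without counting by eye, the log can be summarized per block of episodes. The following is only a minimal sketch, not the training program's own code: it assumes the log lines keep the exact format shown above, and the names `parse_scores`, `max_score_rate`, and `MAX_SCORE` are hypothetical helpers introduced for illustration.

```python
import re

# Matches one training-log line in the format shown above, e.g.
# "episode: 11/15000, policy_loss: 0.4489, ..., score: 1499.0000".
# Lines such as "saving..." or "saved" do not match and are skipped.
LINE_RE = re.compile(r"episode:\s*(\d+)/\d+,.*?score:\s*([\d.]+)")

MAX_SCORE = 1499.0  # maximum score mentioned in the post (assumed fixed)


def parse_scores(log_text):
    """Return a list of (episode, score) pairs found in the log text."""
    pairs = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            pairs.append((int(m.group(1)), float(m.group(2))))
    return pairs


def max_score_rate(pairs, window=20):
    """Split the episodes into consecutive blocks of `window` entries and
    return (first_episode, last_episode, fraction reaching MAX_SCORE)."""
    out = []
    for i in range(0, len(pairs), window):
        block = pairs[i:i + window]
        hits = sum(1 for _, score in block if score >= MAX_SCORE)
        out.append((block[0][0], block[-1][0], hits / len(block)))
    return out


# Three lines copied from the log above, plus one non-episode line.
sample = """\
episode: 11/15000, policy_loss: 0.4489, value_loss: 333.5057, reward_loss: 5.8847, policy_entropy: 0.6871, score: 1499.0000
episode: 12/15000, policy_loss: 0.4858, value_loss: 380.6137, reward_loss: 13.5092, policy_entropy: 0.4660, score: 673.0000
saving...
episode: 13/15000, policy_loss: 0.4853, value_loss: 326.1999, reward_loss: 10.7604, policy_entropy: 0.4214, score: 1499.0000
"""

for first, last, rate in max_score_rate(parse_scores(sample), window=2):
    print(f"episodes {first}-{last}: {rate:.0%} at max score")
```

Pasting the full log into `sample` and using the default `window=20` would give one line per checkpoint interval, since the weights are saved every 20 episodes.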