katawong posted on 2024-6-5 10:49

Some technical notes on our motor design platform

This post was last edited by katawong on 2024-6-5 11:02

My company recently co-hosted a friendly motor design challenge with Shenzhen Diman (深圳迪曼), https://bbs.simol.cn/thread-214866-1-1.html. It drew both criticism and encouragement, and we thank everyone for the feedback; criticism and encouragement alike push us forward.

Here I would like to share some of the technology behind our platform, so that we can all improve together.

We did not build a piece of motor design software; we built a motor design engineer.

This engineer is an AI. Like a human engineer, it designs motors by driving motor design software. So far it calls Ansys EM, SPEED, and Motor-CAD, and can carry out electromagnetic design, thermal analysis, and stress calculation.

This AI engineer is named alphaMD; let's call it A for short.

How does A do its design work?

Inside A there are two engineers. One is the design engineer D, whose job is to produce electromagnetic designs. The other is the evaluation engineer T, whose job is to judge how good D's designs are and give them a score.

At the start, both D and T are random: D designs at random and T scores at random.

D keeps learning. Its goal is to make T's score as high as possible, so it must learn how to adjust its designs to earn high marks.

T also keeps learning. Its goal is to make its own scoring as accurate as possible. T only scores; it never tells D what to change to earn more points, it only tells D how many points a given design received.

Eventually T's scoring becomes very precise: it can tell D exactly how many points a design deserves.

And D's design ability improves enormously: every design it produces earns a high score.

In this way D grows into a design expert, and T grows into an evaluation expert.

How good will D eventually become? As its creators, even we do not know, because it keeps learning and keeps growing.
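For readers who prefer to see the idea in code, here is a minimal sketch of the D/T loop in Python with TensorFlow 2. It is not alphaMD's code: true_score() is a toy stand-in for the motor design software, and the network sizes, noise dimension, and optimizer settings are illustrative assumptions. It only shows the division of labour described above: T regresses onto the real evaluation, while D climbs T's predicted score without ever being told how.

# Minimal sketch of the D/T idea; the "true score" is a toy stand-in for the design software.
import tensorflow as tf

DESIGN_DIM = 8   # number of design variables (assumed)
NOISE_DIM = 4    # random seed that lets D propose different designs

def true_score(designs):
    # Stand-in for the design software: score is highest near a hidden optimum.
    target = tf.constant([0.3, -0.5, 0.1, 0.7, -0.2, 0.4, 0.0, -0.6])
    return 1500.0 * tf.exp(-tf.reduce_sum((designs - target) ** 2, axis=1))

# D proposes designs; T predicts the score a design will receive.
D = tf.keras.Sequential([tf.keras.layers.Dense(64, activation="relu", input_shape=(NOISE_DIM,)),
                         tf.keras.layers.Dense(DESIGN_DIM, activation="tanh")])
T = tf.keras.Sequential([tf.keras.layers.Dense(64, activation="relu", input_shape=(DESIGN_DIM,)),
                         tf.keras.layers.Dense(1)])
opt_D = tf.keras.optimizers.Adam(1e-3)
opt_T = tf.keras.optimizers.Adam(1e-3)

for step in range(2000):
    noise = tf.random.normal((32, NOISE_DIM))
    # T learns to score accurately: regress onto the real evaluation.
    with tf.GradientTape() as tape_T:
        designs = D(noise)
        t_loss = tf.reduce_mean((T(designs)[:, 0] - true_score(designs)) ** 2)
    opt_T.apply_gradients(zip(tape_T.gradient(t_loss, T.trainable_variables),
                              T.trainable_variables))
    # D learns to earn a high score from T (T only scores, it never explains).
    with tf.GradientTape() as tape_D:
        d_loss = -tf.reduce_mean(T(D(noise)))
    opt_D.apply_gradients(zip(tape_D.gradient(d_loss, D.trainable_variables),
                              D.trainable_variables))
    if step % 500 == 0:
        print(step, float(tf.reduce_mean(true_score(D(noise)))))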

The chart below is one of its training records.

The vertical axis is its design ability: 200 points is the pass line, and 1500 points is a cap we impose; once a design reaches it we stop searching for anything better, end that episode, and start training again. The blue dots are the scores of individual designs; the solid red line is the average score.

The horizontal axis is the number of training episodes. A large amount of earlier training data, whose average scores were all below 200, is omitted from this chart.

The chart shows that the average score, i.e. the average design ability, crossed the 200-point pass line at some moment, then hit a bottleneck and stayed around 300 for a long time, and then, at around 2500 on the horizontal axis, began to rise again.

In any case, it produced a large number of designs that reach the maximum score of 1500.
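The chart itself is not reproduced in this text. For anyone who wants to draw this kind of record from their own training run, a few lines of matplotlib are enough; the scores below are synthetic placeholders, and only the plotting layout (blue dots, red running mean, 200 and 1500 reference lines) follows the description above.

import numpy as np
import matplotlib.pyplot as plt

# Synthetic scores standing in for a real training record (placeholder data only).
rng = np.random.default_rng(0)
episodes = np.arange(5000)
scores = np.clip(rng.normal(200 + episodes * 0.15, 150), 50, 1500)

window = 100                                    # running-average window
running_mean = np.convolve(scores, np.ones(window) / window, mode="valid")

plt.scatter(episodes, scores, s=2, color="tab:blue", label="score per design")
plt.plot(episodes[window - 1:], running_mean, color="red", label="average score")
plt.axhline(200, linestyle="--", color="gray", label="pass line (200)")
plt.axhline(1500, linestyle=":", color="gray", label="score cap (1500)")
plt.xlabel("training episode")
plt.ylabel("score")
plt.legend()
plt.show()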



katawong posted on 2024-6-5 11:36

For readers without any AI background, text explanations are usually hard to follow. I have made some introductory short videos and put them on my Douyin account; if you are interested, send me a private message and I will share the account with you. Videos 001-013 explain the principles of alphaMD in fairly plain language. Thank you all, and please share your valuable comments.

huanhuangui posted on 2024-6-5 13:17

It's taking our jobs, and you still want us to review it...

katawong posted on 2024-6-5 16:23

huanhuangui posted on 2024-6-5 13:17
It's taking our jobs, and you still want us to review it...

Heh.

When I started working, the design department still had a job title called tracer (描图员). I don't know whether you have heard of it.

Back then we designed motors entirely by hand calculation.

Later the technical department got computers; I remember they were Great Wall 8088 machines, programmed in BASIC, with no hard disk, booted from a floppy. We wrote the calculation programs ourselves, and one run took about five minutes.

Later we got Compaq 486 machines, and one design calculation took less than a second.

Then AutoCAD 10 was introduced, and the tracer's job gradually disappeared.

huanhuangui posted on 2024-6-5 16:36

katawong posted on 2024-6-5 16:23
Heh.

When I started working, the design department still had a job title called tracer (描图员). I don't know whether you have heard of it.


See, it successfully took the tracer's job after all.

katawong posted on 2024-6-5 17:16

This post was last edited by katawong on 2024-6-5 17:19

The AI trend can no longer be stopped. Rather than resist it, join it.
For colleagues interested in AI, here are some ways to get started quickly.
The prerequisite is a working knowledge of calculus, linear algebra, probability theory, neural networks, and programming.
(1) On Bilibili, look for Wang Shusen's (王树森) Deep Reinforcement Learning (《深度强化学习》) series. Once you understand that series, you have the basics of AI.
(2) On Bilibili, look for "World champions teach you reinforcement learning from scratch" (《世界冠军带你从零实践强化学习》). This series is also good, but I suggest only following the theory; do not code along with the videos, the code is too old.
(3) Programming framework: I strongly recommend TensorFlow 2.0 or later. It differs little from PyTorch, but TensorFlow has a quantum module, which allows a seamless connection when you later work on "quantum circuits + AI".

For more systematic study, look for Hung-yi Lee's (李宏毅) courses on machine learning, deep learning, and reinforcement learning; there are plenty online, so I will not list them one by one.
The above is just my personal view, offered to colleagues who want to learn about AI.

katawong posted on 2024-6-5 19:28

This post was last edited by katawong on 2024-6-5 19:32

Below is some real data from the training process.
The last column is the AI's score; our AI simply tries to make this score as high as possible.
You can see very clearly that many designs have already reached the maximum score of 1499 (we artificially cap the score at 1500; once a design reaches it, the episode is terminated).
95% of the designs scored above the 200-point pass line.


episode: 3001/15000, policy_loss: 0.5889, value_loss: 400.4844, reward_loss: 25.3274, policy_entropy: 0.5674, score: 1499.0000
episode: 3002/15000, policy_loss: 0.5774, value_loss: 450.3242, reward_loss: 30.1226, policy_entropy: 0.5580, score: 886.0000
episode: 3003/15000, policy_loss: 0.5662, value_loss: 366.6266, reward_loss: 20.7778, policy_entropy: 0.6137, score: 198.0000
episode: 3004/15000, policy_loss: 0.6122, value_loss: 367.1500, reward_loss: 23.0490, policy_entropy: 0.5312, score: 213.0000
episode: 3005/15000, policy_loss: 0.6194, value_loss: 389.2270, reward_loss: 24.8775, policy_entropy: 0.5752, score: 163.0000
episode: 3006/15000, policy_loss: 0.5738, value_loss: 388.0405, reward_loss: 23.4865, policy_entropy: 0.5354, score: 198.0000
episode: 3007/15000, policy_loss: 0.5531, value_loss: 422.0123, reward_loss: 25.2088, policy_entropy: 0.5582, score: 382.0000
episode: 3008/15000, policy_loss: 0.5890, value_loss: 483.1084, reward_loss: 34.8594, policy_entropy: 0.5401, score: 306.0000
episode: 3009/15000, policy_loss: 0.6478, value_loss: 382.3741, reward_loss: 21.7543, policy_entropy: 0.5351, score: 560.0000
episode: 3010/15000, policy_loss: 0.6406, value_loss: 379.4818, reward_loss: 24.1996, policy_entropy: 0.5805, score: 333.0000
episode: 3011/15000, policy_loss: 0.5831, value_loss: 403.4811, reward_loss: 24.2433, policy_entropy: 0.5053, score: 1499.0000
episode: 3012/15000, policy_loss: 0.5621, value_loss: 365.0352, reward_loss: 21.5038, policy_entropy: 0.6209, score: 216.0000
episode: 3013/15000, policy_loss: 0.6248, value_loss: 382.5297, reward_loss: 21.2794, policy_entropy: 0.4618, score: 177.0000
episode: 3014/15000, policy_loss: 0.6934, value_loss: 360.6250, reward_loss: 25.6778, policy_entropy: 0.5919, score: 131.0000
episode: 3015/15000, policy_loss: 0.6645, value_loss: 418.4210, reward_loss: 26.0287, policy_entropy: 0.5436, score: 171.0000
episode: 3016/15000, policy_loss: 0.5989, value_loss: 416.5288, reward_loss: 28.6917, policy_entropy: 0.6702, score: 204.0000
episode: 3017/15000, policy_loss: 0.5557, value_loss: 452.5594, reward_loss: 28.1444, policy_entropy: 0.6915, score: 1499.0000
episode: 3018/15000, policy_loss: 0.6013, value_loss: 414.3475, reward_loss: 27.7788, policy_entropy: 0.5458, score: 1499.0000
episode: 3019/15000, policy_loss: 0.6668, value_loss: 358.4592, reward_loss: 21.8769, policy_entropy: 0.4503, score: 504.0000
episode: 3020/15000, policy_loss: 0.6756, value_loss: 418.0124, reward_loss: 27.4253, policy_entropy: 0.4468, score: 89.0000
saving...
********** save weights ************
saved
episode: 3021/15000, policy_loss: 0.6416, value_loss: 405.3523, reward_loss: 22.4845, policy_entropy: 0.6673, score: 192.0000
episode: 3022/15000, policy_loss: 0.5622, value_loss: 484.0711, reward_loss: 30.2956, policy_entropy: 0.5616, score: 785.0000
episode: 3023/15000, policy_loss: 0.6074, value_loss: 453.8478, reward_loss: 28.7955, policy_entropy: 0.5946, score: 246.0000
episode: 3024/15000, policy_loss: 0.7206, value_loss: 381.4115, reward_loss: 19.5020, policy_entropy: 0.4755, score: 172.0000
episode: 3025/15000, policy_loss: 0.7822, value_loss: 359.0367, reward_loss: 22.6848, policy_entropy: 0.5977, score: 166.0000
episode: 3026/15000, policy_loss: 0.6798, value_loss: 390.9240, reward_loss: 21.8696, policy_entropy: 0.4692, score: 203.0000
episode: 3027/15000, policy_loss: 0.5614, value_loss: 426.5884, reward_loss: 25.3384, policy_entropy: 0.6695, score: 866.0000
episode: 3028/15000, policy_loss: 0.6252, value_loss: 400.8597, reward_loss: 23.8455, policy_entropy: 0.5034, score: 1499.0000
episode: 3029/15000, policy_loss: 0.7626, value_loss: 410.4993, reward_loss: 27.9803, policy_entropy: 0.3412, score: 117.0000
episode: 3030/15000, policy_loss: 0.7921, value_loss: 399.9077, reward_loss: 27.2595, policy_entropy: 0.3460, score: 100.0000
episode: 3031/15000, policy_loss: 0.6812, value_loss: 349.7978, reward_loss: 19.1662, policy_entropy: 0.6817, score: 356.0000
episode: 3032/15000, policy_loss: 0.5560, value_loss: 330.2840, reward_loss: 19.4452, policy_entropy: 0.4999, score: 1499.0000
episode: 3033/15000, policy_loss: 0.6104, value_loss: 383.2799, reward_loss: 21.9814, policy_entropy: 0.2742, score: 249.0000
episode: 3034/15000, policy_loss: 0.7216, value_loss: 392.9241, reward_loss: 23.9855, policy_entropy: 0.6846, score: 149.0000
episode: 3035/15000, policy_loss: 0.7650, value_loss: 450.2687, reward_loss: 29.6649, policy_entropy: 0.3273, score: 163.0000
episode: 3036/15000, policy_loss: 0.6531, value_loss: 483.7832, reward_loss: 34.7097, policy_entropy: 0.6098, score: 208.0000
episode: 3037/15000, policy_loss: 0.5452, value_loss: 352.3396, reward_loss: 14.6076, policy_entropy: 0.4677, score: 325.0000
episode: 3038/15000, policy_loss: 0.6130, value_loss: 466.1357, reward_loss: 31.2627, policy_entropy: 0.5313, score: 424.0000
episode: 3039/15000, policy_loss: 0.7076, value_loss: 448.2494, reward_loss: 27.8226, policy_entropy: 0.4738, score: 300.0000
episode: 3040/15000, policy_loss: 0.6485, value_loss: 345.8099, reward_loss: 18.0412, policy_entropy: 0.4459, score: 227.0000
saving...
********** save weights ************
saved
episode: 3041/15000, policy_loss: 0.5850, value_loss: 450.7605, reward_loss: 28.3210, policy_entropy: 0.4083, score: 386.0000
episode: 3042/15000, policy_loss: 0.5506, value_loss: 350.1736, reward_loss: 14.1603, policy_entropy: 0.4408, score: 254.0000
episode: 3043/15000, policy_loss: 0.5669, value_loss: 332.9909, reward_loss: 18.1965, policy_entropy: 0.4495, score: 183.0000
episode: 3044/15000, policy_loss: 0.5711, value_loss: 410.7605, reward_loss: 26.3451, policy_entropy: 0.6474, score: 181.0000
episode: 3045/15000, policy_loss: 0.5700, value_loss: 414.6470, reward_loss: 26.3620, policy_entropy: 0.6480, score: 190.0000
episode: 3046/15000, policy_loss: 0.5711, value_loss: 420.7523, reward_loss: 27.6770, policy_entropy: 0.5627, score: 200.0000
episode: 3047/15000, policy_loss: 0.5593, value_loss: 379.0274, reward_loss: 22.5677, policy_entropy: 0.5471, score: 194.0000
episode: 3048/15000, policy_loss: 0.5592, value_loss: 495.3597, reward_loss: 33.5175, policy_entropy: 0.5880, score: 781.0000
episode: 3049/15000, policy_loss: 0.5574, value_loss: 376.4867, reward_loss: 24.1480, policy_entropy: 0.4933, score: 642.0000
episode: 3050/15000, policy_loss: 0.5639, value_loss: 418.0936, reward_loss: 26.5566, policy_entropy: 0.5535, score: 399.0000
episode: 3051/15000, policy_loss: 0.5527, value_loss: 399.0091, reward_loss: 22.8227, policy_entropy: 0.5625, score: 1047.0000
episode: 3052/15000, policy_loss: 0.5485, value_loss: 397.8051, reward_loss: 26.9163, policy_entropy: 0.4968, score: 353.0000
episode: 3053/15000, policy_loss: 0.5571, value_loss: 383.1962, reward_loss: 22.3722, policy_entropy: 0.5422, score: 405.0000
episode: 3054/15000, policy_loss: 0.5573, value_loss: 440.2970, reward_loss: 30.2657, policy_entropy: 0.4528, score: 733.0000
episode: 3055/15000, policy_loss: 0.5620, value_loss: 370.8466, reward_loss: 18.2645, policy_entropy: 0.5177, score: 1077.0000
episode: 3056/15000, policy_loss: 0.5447, value_loss: 411.5377, reward_loss: 25.6230, policy_entropy: 0.5227, score: 389.0000
episode: 3057/15000, policy_loss: 0.5503, value_loss: 423.1735, reward_loss: 25.9506, policy_entropy: 0.4764, score: 1499.0000
episode: 3058/15000, policy_loss: 0.5659, value_loss: 474.4288, reward_loss: 36.0015, policy_entropy: 0.5636, score: 1499.0000
episode: 3059/15000, policy_loss: 0.5664, value_loss: 444.2757, reward_loss: 30.1305, policy_entropy: 0.5215, score: 1499.0000
episode: 3060/15000, policy_loss: 0.5525, value_loss: 430.8563, reward_loss: 26.8061, policy_entropy: 0.5026, score: 354.0000
saving...
********** save weights ************
saved
episode: 3061/15000, policy_loss: 0.5499, value_loss: 393.0177, reward_loss: 21.4153, policy_entropy: 0.5425, score: 532.0000
episode: 3062/15000, policy_loss: 0.5792, value_loss: 472.7725, reward_loss: 32.7278, policy_entropy: 0.5405, score: 225.0000
episode: 3063/15000, policy_loss: 0.5913, value_loss: 442.6436, reward_loss: 27.9917, policy_entropy: 0.4920, score: 216.0000
episode: 3064/15000, policy_loss: 0.5778, value_loss: 420.1801, reward_loss: 26.8551, policy_entropy: 0.5056, score: 206.0000
episode: 3065/15000, policy_loss: 0.5507, value_loss: 405.6987, reward_loss: 20.2358, policy_entropy: 0.5042, score: 369.0000
episode: 3066/15000, policy_loss: 0.5480, value_loss: 399.6070, reward_loss: 21.9495, policy_entropy: 0.4232, score: 1111.0000
episode: 3067/15000, policy_loss: 0.5800, value_loss: 371.2757, reward_loss: 20.7878, policy_entropy: 0.5029, score: 370.0000
episode: 3068/15000, policy_loss: 0.5948, value_loss: 414.4673, reward_loss: 24.9255, policy_entropy: 0.4848, score: 513.0000
episode: 3069/15000, policy_loss: 0.5856, value_loss: 369.4235, reward_loss: 23.0812, policy_entropy: 0.5035, score: 907.0000
episode: 3070/15000, policy_loss: 0.5464, value_loss: 355.9464, reward_loss: 22.2776, policy_entropy: 0.4904, score: 1499.0000
episode: 3071/15000, policy_loss: 0.5867, value_loss: 396.2905, reward_loss: 27.0339, policy_entropy: 0.5564, score: 260.0000
episode: 3072/15000, policy_loss: 0.6323, value_loss: 363.5394, reward_loss: 21.6746, policy_entropy: 0.3115, score: 232.0000
episode: 3073/15000, policy_loss: 0.5935, value_loss: 387.5013, reward_loss: 19.4194, policy_entropy: 0.3350, score: 782.0000
episode: 3074/15000, policy_loss: 0.5366, value_loss: 293.5663, reward_loss: 14.6844, policy_entropy: 0.5376, score: 1499.0000
episode: 3075/15000, policy_loss: 0.5665, value_loss: 362.8860, reward_loss: 21.9905, policy_entropy: 0.4699, score: 939.0000
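A side note for anyone who wants to collect statistics from such a log: figures like the fraction of designs above the 200-point pass line and the number of designs that hit the 1499 cap can be extracted with a few lines of Python. The sketch below assumes the log format printed above; the file name training_log.txt is a placeholder.

import re

def summarize_log(path="training_log.txt"):
    # Pull the score out of every "episode: .../..., ..., score: X" line.
    scores = []
    pattern = re.compile(r"episode:\s*\d+/\d+.*score:\s*([0-9.]+)")
    with open(path, encoding="utf-8") as f:
        for line in f:
            m = pattern.search(line)
            if m:
                scores.append(float(m.group(1)))
    if not scores:
        print("no episode lines found")
        return
    passed = sum(s >= 200 for s in scores)
    capped = sum(s >= 1499 for s in scores)
    print(f"{len(scores)} designs, "
          f"{100 * passed / len(scores):.1f}% at or above the 200-point pass line, "
          f"{capped} reached the 1499 cap")

summarize_log()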

NJNTJSB posted on 2024-6-6 08:48

Sounds plausible; following with interest.

峰哥学MOTOR posted on 2024-6-6 09:03

The arrival of AI is indeed unstoppable. For motor design engineers, what should we do now to keep up with the AI trend: learn to use AI tools, or learn the underlying logic of AI? How should we prepare, especially those of us without a programming background?

katawong posted on 2024-6-6 09:27

This post was last edited by katawong on 2024-6-6 09:38

峰哥学MOTOR posted on 2024-6-6 09:03
The arrival of AI is indeed unstoppable. For motor design engineers, what should we do now to keep up with the AI trend: learn to use AI tools or ...

Thank you for your reply.

In my experience, since we come from engineering backgrounds, our strength lies in practical application.

What we can do is get to know AI and understand its basic principles and algorithms. We do not need to invent new algorithms, but we should follow the latest ones and think about how to combine them with our own work.

Once we can write our own AI and apply it to our design work, we can observe how the AI behaves and deepen our own understanding of the subject.

A simple example.

When we used AI to design YE4 motors, we found that the rotor slot shape has a very large influence on eff, cos, Tmax, Tst and Ist. With a poorly designed slot shape, it is hard to meet all of these targets at once, even by repeatedly lengthening the core; with a well designed slot shape, the requirements are actually easy to meet.

In this way, we motor engineers use AI to improve our own understanding of motors.

Besides, AI is genuinely new knowledge for most people.

Everyone starts from the same line, so it comes down to who learns faster.

I graduated in electrical machines; the programming languages I learned at university were Fortran and Pascal. After graduation, to work more efficiently, I taught myself C, C++, VC++, VB, MATLAB, Python, neural networks, genetic algorithms, machine learning, deep learning, quantum circuit programming, and so on, plus AutoCAD, SolidWorks, CATIA, Creo, and microcontrollers. All self-taught.

Then I think about how to apply them to my own product design work, how to raise my efficiency, how to do secondary development, and so on.

Of course, your English needs to improve, because the newest techniques are written in English, and so are the help files of many software packages.

Programming is very easy nowadays. Code for many tasks can be found online; failing that, state your requirements and let ChatGPT write it. Other similar large language models include Gemini, Claude, Llama 3, Phi-3, Tongyi Qianwen, ChatGLM, and many more.

katawong posted on 2024-6-6 10:55

The screenshot below shows the large language models installed on my local machine, for reference. When you cannot reach the public internet, you can use a local model to write code:

katawong posted on 2024-6-6 10:58

The screenshot below shows a piece of AI code written with a local large language model: I asked it to write an implementation of the DQN algorithm.
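The screenshot itself is not reproduced in this text. For reference, below is a minimal DQN sketch in TensorFlow 2 of the kind such a prompt typically produces; the gymnasium CartPole-v1 environment, network sizes, and all hyperparameters are my own illustrative choices, not the code from the screenshot.

import random
from collections import deque

import gymnasium as gym
import numpy as np
import tensorflow as tf

env = gym.make("CartPole-v1")
n_obs, n_act = env.observation_space.shape[0], env.action_space.n

def build_q_net():
    # Small fully connected network mapping a state to one Q-value per action.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(n_obs,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(n_act),
    ])

q_net, target_net = build_q_net(), build_q_net()
target_net.set_weights(q_net.get_weights())
optimizer = tf.keras.optimizers.Adam(1e-3)
buffer = deque(maxlen=10000)                    # replay buffer
gamma, epsilon, batch_size = 0.99, 0.1, 64

def train_step():
    batch = random.sample(buffer, batch_size)
    s, a, r, s2, done = map(np.array, zip(*batch))
    # Bellman target: r + gamma * max_a' Q_target(s', a') for non-terminal steps.
    q_next = tf.reduce_max(target_net(s2.astype(np.float32)), axis=1).numpy()
    target = (r + gamma * q_next * (1.0 - done)).astype(np.float32)
    with tf.GradientTape() as tape:
        q = tf.reduce_sum(q_net(s.astype(np.float32)) * tf.one_hot(a, n_act), axis=1)
        loss = tf.reduce_mean((q - target) ** 2)
    grads = tape.gradient(loss, q_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_net.trainable_variables))

for episode in range(200):
    state, _ = env.reset()
    episode_return, done = 0.0, False
    while not done:
        if random.random() < epsilon:           # epsilon-greedy exploration
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_net(state[None].astype(np.float32))[0]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        buffer.append((state, action, reward, next_state, float(done)))
        state, episode_return = next_state, episode_return + reward
        if len(buffer) >= batch_size:
            train_step()
    if episode % 10 == 0:
        target_net.set_weights(q_net.get_weights())   # periodic target-network update
        print(f"episode {episode}, return {episode_return}")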

katawong posted on 2024-6-6 11:07

The code above is actually quite simple. But what if you cannot read it? Easy: ask the same local model for a detailed explanation. If you can reach the public internet, you can ask ChatGPT to write the code and explain it in detail, or paste code you do not understand into ChatGPT and ask it to analyse and explain it.

dazhong posted on 2024-6-6 11:28

Something new!

huanhuangui posted on 2024-6-6 15:22

A question, please: when the AI optimizes a motor design, it also takes inputs such as the slot shape, dimensions and other parameters, computes the motor's performance (torque, speed, efficiency, and so on), and then uses the AI algorithm to find the best design, right? And the mapping from motor dimensions to performance comes from an equivalent-circuit program or from finite element calculation? So whether the circuit or finite element calculation is fast and accurate directly affects the AI optimization.

katawong posted on 2024-6-6 16:25

huanhuangui posted on 2024-6-6 15:22
A question, please: when the AI optimizes a motor design, it also takes inputs such as the slot shape, dimensions and other parameters, computes the motor's performance (torque, speed ...

Thank you for the reply.
You are exactly right.
The AI we built is an engineer, and it operates just as a person does.
However a human engineer designs a motor's electromagnetic scheme, that is how it designs.
The accuracy of the electromagnetic design depends on the calculation accuracy of the motor design software.

Like a person, the AI gradually improves its design ability through a large number of designs and keeps growing; at some point it will surpass humans, and after that it will pull far ahead, never to be caught again.

Because of the limits of today's computing power, at this stage we still use the equivalent-circuit (analytical) method. In principle the finite element method would work too; there is no difference in the training principle. But with current computing power one finite element run takes tens of minutes or even several hours, which is an unacceptable cost for large-scale training, so we have not yet trained with the finite element method.
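To make the structure concrete, here is a minimal sketch of how such a design loop can be wrapped as an environment. analytical_performance() is a dummy placeholder standing in for the call into the design software (SPEED, Motor-CAD or Ansys EM); in the real platform that single call is exactly where the accuracy and speed of the calculation enter, and the reward here is only a placeholder (a worked-out reward appears in a later post).

import numpy as np

def analytical_performance(design):
    # Placeholder mapping from design variables to [eff, cos, Ist, Tst, Tmax].
    # In the real platform this would drive the motor design software.
    return np.tanh(np.abs(design[:5]))

class MotorDesignEnv:
    def __init__(self, initial_design):
        self.initial_design = np.array(initial_design, dtype=float)

    def reset(self):
        self.design = self.initial_design.copy()
        self.state = analytical_performance(self.design)
        return self.state

    def step(self, action):
        # action: relative adjustments in [-1, 1], one per design variable.
        self.design = self.design * (1.0 + np.asarray(action, dtype=float))
        self.state = analytical_performance(self.design)
        reward = float(np.sum(self.state))      # placeholder reward
        return self.state, reward

env = MotorDesignEnv(initial_design=np.ones(8))
state = env.reset()
next_state, reward = env.step(0.01 * np.ones(8))
print(state, next_state, reward)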

峰哥学MOTOR posted on 2024-6-7 08:56

katawong posted on 2024-6-6 09:27
Thank you for your reply.

In my experience, since we come from engineering backgrounds, our strength lies in practical application.


Thank you for such a sincere and thorough reply; you are really impressive.

katawong posted on 2024-6-7 10:24

峰哥学MOTOR posted on 2024-6-7 08:56
Thank you for such a sincere and thorough reply; you are really impressive.

Thanks for the encouragement.

As the saying goes, a single blossoming branch does not make spring; spring fills the garden only when all the flowers bloom.

Let's all improve together.

katawong posted on 2024-6-7 15:07

This post was last edited by katawong on 2024-6-7 15:09

In post #7 I showed some real data. In that data you can see that although the AI can reach the maximum score of 1499, it is not very consistent. So after a further period of training, how much has it improved? Its latest performance is listed below. The data are as follows:

As you can see, its design ability has improved enormously.

episode: 1/15000, policy_loss: 0.4161, value_loss: 215.2850, reward_loss: 5.3234, policy_entropy: 0.6239, score: 212.0000
episode: 2/15000, policy_loss: 0.7824, value_loss: 536.4194, reward_loss: 24.4589, policy_entropy: 0.3556, score: 115.0000
episode: 3/15000, policy_loss: 0.4982, value_loss: 777.7183, reward_loss: 12.4306, policy_entropy: 0.4113, score: 157.0000
episode: 4/15000, policy_loss: 0.5999, value_loss: 491.3422, reward_loss: 18.5668, policy_entropy: 0.5290, score: 123.0000
episode: 5/15000, policy_loss: 0.4910, value_loss: 525.6333, reward_loss: 25.0361, policy_entropy: 0.5887, score: 178.0000
episode: 6/15000, policy_loss: 0.6155, value_loss: 566.3329, reward_loss: 15.6904, policy_entropy: 0.5261, score: 965.0000
episode: 7/15000, policy_loss: 0.5150, value_loss: 453.9730, reward_loss: 8.0561, policy_entropy: 0.5793, score: 439.0000
episode: 8/15000, policy_loss: 0.5349, value_loss: 589.5212, reward_loss: 10.4976, policy_entropy: 0.5743, score: 118.0000
episode: 9/15000, policy_loss: 0.5593, value_loss: 538.7057, reward_loss: 8.4509, policy_entropy: 0.6550, score: 148.0000
episode: 10/15000, policy_loss: 0.4949, value_loss: 513.8514, reward_loss: 11.3358, policy_entropy: 0.6725, score: 173.0000
episode: 11/15000, policy_loss: 0.4489, value_loss: 333.5057, reward_loss: 5.8847, policy_entropy: 0.6871, score: 1499.0000
episode: 12/15000, policy_loss: 0.4858, value_loss: 380.6137, reward_loss: 13.5092, policy_entropy: 0.4660, score: 673.0000
episode: 13/15000, policy_loss: 0.4853, value_loss: 326.1999, reward_loss: 10.7604, policy_entropy: 0.4214, score: 1499.0000
episode: 14/15000, policy_loss: 0.4360, value_loss: 251.4935, reward_loss: 8.3748, policy_entropy: 0.4258, score: 1499.0000
episode: 15/15000, policy_loss: 0.4405, value_loss: 266.2397, reward_loss: 6.9854, policy_entropy: 0.5010, score: 250.0000
episode: 16/15000, policy_loss: 0.4612, value_loss: 329.6295, reward_loss: 12.1463, policy_entropy: 0.5029, score: 137.0000
episode: 17/15000, policy_loss: 0.4328, value_loss: 246.7722, reward_loss: 9.5955, policy_entropy: 0.4708, score: 146.0000
episode: 18/15000, policy_loss: 0.4205, value_loss: 363.3236, reward_loss: 17.4401, policy_entropy: 0.4099, score: 221.0000
episode: 19/15000, policy_loss: 0.4038, value_loss: 286.7342, reward_loss: 12.3858, policy_entropy: 0.4490, score: 1499.0000
episode: 20/15000, policy_loss: 0.4126, value_loss: 302.7286, reward_loss: 13.7845, policy_entropy: 0.4567, score: 1499.0000
saving...
********** save weights ************
saved
episode: 21/15000, policy_loss: 0.4221, value_loss: 268.0316, reward_loss: 12.9250, policy_entropy: 0.3954, score: 1499.0000
episode: 22/15000, policy_loss: 0.4103, value_loss: 164.1839, reward_loss: 4.9871, policy_entropy: 0.5047, score: 1499.0000
episode: 23/15000, policy_loss: 0.4098, value_loss: 176.3108, reward_loss: 6.6844, policy_entropy: 0.4693, score: 185.0000
episode: 24/15000, policy_loss: 0.4451, value_loss: 335.0486, reward_loss: 17.4868, policy_entropy: 0.6215, score: 187.0000
episode: 25/15000, policy_loss: 0.4455, value_loss: 317.3674, reward_loss: 16.0549, policy_entropy: 0.5449, score: 177.0000
episode: 26/15000, policy_loss: 0.4370, value_loss: 323.3865, reward_loss: 16.7364, policy_entropy: 0.4715, score: 240.0000
episode: 27/15000, policy_loss: 0.4674, value_loss: 212.3390, reward_loss: 9.8769, policy_entropy: 0.4446, score: 1499.0000
episode: 28/15000, policy_loss: 0.4824, value_loss: 258.5034, reward_loss: 13.2846, policy_entropy: 0.5335, score: 1499.0000
episode: 29/15000, policy_loss: 0.4589, value_loss: 256.1392, reward_loss: 12.9572, policy_entropy: 0.5453, score: 235.0000
episode: 30/15000, policy_loss: 0.4912, value_loss: 281.6791, reward_loss: 15.7824, policy_entropy: 0.5648, score: 172.0000
episode: 31/15000, policy_loss: 0.5186, value_loss: 311.5533, reward_loss: 17.0442, policy_entropy: 0.4889, score: 120.0000
episode: 32/15000, policy_loss: 0.4747, value_loss: 226.9470, reward_loss: 11.5003, policy_entropy: 0.5536, score: 178.0000
episode: 33/15000, policy_loss: 0.4693, value_loss: 282.7120, reward_loss: 16.1569, policy_entropy: 0.5670, score: 1499.0000
episode: 34/15000, policy_loss: 0.5162, value_loss: 245.2859, reward_loss: 11.9941, policy_entropy: 0.4875, score: 1499.0000
episode: 35/15000, policy_loss: 0.5049, value_loss: 228.7045, reward_loss: 13.8588, policy_entropy: 0.5310, score: 1499.0000
episode: 36/15000, policy_loss: 0.4621, value_loss: 168.3383, reward_loss: 8.6423, policy_entropy: 0.5155, score: 1499.0000
episode: 37/15000, policy_loss: 0.4796, value_loss: 207.0338, reward_loss: 10.7040, policy_entropy: 0.6197, score: 137.0000
episode: 38/15000, policy_loss: 0.5167, value_loss: 269.0144, reward_loss: 14.4384, policy_entropy: 0.5073, score: 110.0000
episode: 39/15000, policy_loss: 0.4945, value_loss: 197.1236, reward_loss: 8.9764, policy_entropy: 0.5106, score: 102.0000
episode: 40/15000, policy_loss: 0.4572, value_loss: 227.5951, reward_loss: 11.4231, policy_entropy: 0.5298, score: 170.0000
saving...
********** save weights ************
saved
episode: 41/15000, policy_loss: 0.4880, value_loss: 235.6346, reward_loss: 11.7058, policy_entropy: 0.4567, score: 1499.0000
episode: 42/15000, policy_loss: 0.5100, value_loss: 185.6799, reward_loss: 9.7845, policy_entropy: 0.5248, score: 1499.0000
episode: 43/15000, policy_loss: 0.4811, value_loss: 242.0638, reward_loss: 14.2436, policy_entropy: 0.4731, score: 1499.0000
episode: 44/15000, policy_loss: 0.4640, value_loss: 298.4590, reward_loss: 19.5121, policy_entropy: 0.5439, score: 163.0000
episode: 45/15000, policy_loss: 0.5090, value_loss: 315.2503, reward_loss: 18.7176, policy_entropy: 0.5334, score: 124.0000
episode: 46/15000, policy_loss: 0.4924, value_loss: 181.2119, reward_loss: 9.5249, policy_entropy: 0.5446, score: 138.0000
episode: 47/15000, policy_loss: 0.4690, value_loss: 248.3943, reward_loss: 13.5997, policy_entropy: 0.5809, score: 178.0000
episode: 48/15000, policy_loss: 0.4698, value_loss: 196.4921, reward_loss: 8.8744, policy_entropy: 0.5634, score: 1499.0000
episode: 49/15000, policy_loss: 0.4823, value_loss: 276.5391, reward_loss: 16.1656, policy_entropy: 0.5306, score: 1499.0000
episode: 50/15000, policy_loss: 0.4626, value_loss: 203.7857, reward_loss: 10.1539, policy_entropy: 0.5711, score: 1499.0000
episode: 51/15000, policy_loss: 0.4529, value_loss: 225.3459, reward_loss: 13.7119, policy_entropy: 0.6220, score: 194.0000
episode: 52/15000, policy_loss: 0.4617, value_loss: 279.8626, reward_loss: 18.1389, policy_entropy: 0.4605, score: 151.0000
episode: 53/15000, policy_loss: 0.4510, value_loss: 286.3064, reward_loss: 16.6993, policy_entropy: 0.4454, score: 201.0000
episode: 54/15000, policy_loss: 0.4298, value_loss: 183.6426, reward_loss: 9.8382, policy_entropy: 0.5466, score: 1499.0000
episode: 55/15000, policy_loss: 0.4484, value_loss: 169.9896, reward_loss: 9.6564, policy_entropy: 0.5182, score: 1499.0000
episode: 56/15000, policy_loss: 0.4559, value_loss: 277.7924, reward_loss: 16.5925, policy_entropy: 0.5096, score: 1499.0000
episode: 57/15000, policy_loss: 0.4329, value_loss: 201.3654, reward_loss: 14.3267, policy_entropy: 0.5282, score: 1499.0000
episode: 58/15000, policy_loss: 0.4371, value_loss: 173.4138, reward_loss: 8.8108, policy_entropy: 0.5297, score: 189.0000
episode: 59/15000, policy_loss: 0.4652, value_loss: 191.4496, reward_loss: 9.5502, policy_entropy: 0.4893, score: 170.0000
episode: 60/15000, policy_loss: 0.4681, value_loss: 180.4527, reward_loss: 11.0071, policy_entropy: 0.5424, score: 154.0000
saving...
********** save weights ************
saved
episode: 61/15000, policy_loss: 0.4479, value_loss: 144.0589, reward_loss: 7.3609, policy_entropy: 0.4894, score: 1499.0000
episode: 62/15000, policy_loss: 0.4661, value_loss: 229.0062, reward_loss: 13.8491, policy_entropy: 0.5207, score: 1499.0000
episode: 63/15000, policy_loss: 0.4832, value_loss: 206.1293, reward_loss: 12.6785, policy_entropy: 0.5008, score: 1499.0000
episode: 64/15000, policy_loss: 0.4740, value_loss: 185.5161, reward_loss: 9.8251, policy_entropy: 0.5723, score: 1499.0000
episode: 65/15000, policy_loss: 0.4698, value_loss: 252.1371, reward_loss: 15.6446, policy_entropy: 0.5422, score: 382.0000
episode: 66/15000, policy_loss: 0.4815, value_loss: 144.9785, reward_loss: 9.5499, policy_entropy: 0.5144, score: 229.0000
episode: 67/15000, policy_loss: 0.4814, value_loss: 208.4067, reward_loss: 11.8008, policy_entropy: 0.4968, score: 1499.0000
episode: 68/15000, policy_loss: 0.4669, value_loss: 193.6922, reward_loss: 11.3720, policy_entropy: 0.5751, score: 1499.0000
episode: 69/15000, policy_loss: 0.4610, value_loss: 114.4614, reward_loss: 5.4580, policy_entropy: 0.5195, score: 262.0000
episode: 70/15000, policy_loss: 0.4583, value_loss: 132.0404, reward_loss: 6.2988, policy_entropy: 0.5308, score: 1499.0000
episode: 71/15000, policy_loss: 0.4570, value_loss: 214.1931, reward_loss: 12.5732, policy_entropy: 0.5076, score: 1499.0000
episode: 72/15000, policy_loss: 0.4554, value_loss: 188.7498, reward_loss: 9.9031, policy_entropy: 0.5062, score: 1499.0000
episode: 73/15000, policy_loss: 0.4540, value_loss: 146.6611, reward_loss: 7.9460, policy_entropy: 0.5047, score: 1499.0000
episode: 74/15000, policy_loss: 0.4553, value_loss: 240.3613, reward_loss: 13.0326, policy_entropy: 0.4990, score: 194.0000
episode: 75/15000, policy_loss: 0.4634, value_loss: 223.2887, reward_loss: 14.6363, policy_entropy: 0.5375, score: 160.0000
episode: 76/15000, policy_loss: 0.4532, value_loss: 229.6967, reward_loss: 12.6051, policy_entropy: 0.4485, score: 203.0000
episode: 77/15000, policy_loss: 0.4566, value_loss: 164.9831, reward_loss: 9.1544, policy_entropy: 0.5072, score: 222.0000
episode: 78/15000, policy_loss: 0.4538, value_loss: 223.0203, reward_loss: 13.0986, policy_entropy: 0.5476, score: 234.0000
episode: 79/15000, policy_loss: 0.4399, value_loss: 154.3588, reward_loss: 8.4952, policy_entropy: 0.5343, score: 1499.0000
episode: 80/15000, policy_loss: 0.4416, value_loss: 162.0921, reward_loss: 9.9977, policy_entropy: 0.5164, score: 1499.0000
saving...
********** save weights ************
saved
episode: 81/15000, policy_loss: 0.4396, value_loss: 209.8708, reward_loss: 12.2546, policy_entropy: 0.5011, score: 1499.0000
episode: 82/15000, policy_loss: 0.4384, value_loss: 146.2521, reward_loss: 6.6164, policy_entropy: 0.4671, score: 1499.0000
episode: 83/15000, policy_loss: 0.4351, value_loss: 135.2223, reward_loss: 6.6867, policy_entropy: 0.5104, score: 1499.0000
episode: 84/15000, policy_loss: 0.4361, value_loss: 132.8678, reward_loss: 7.1288, policy_entropy: 0.5776, score: 1499.0000
episode: 85/15000, policy_loss: 0.4281, value_loss: 205.0150, reward_loss: 11.6280, policy_entropy: 0.5348, score: 1499.0000
episode: 86/15000, policy_loss: 0.4405, value_loss: 137.1094, reward_loss: 6.7496, policy_entropy: 0.5450, score: 1499.0000
episode: 87/15000, policy_loss: 0.4248, value_loss: 192.6676, reward_loss: 12.1025, policy_entropy: 0.5006, score: 1499.0000
episode: 88/15000, policy_loss: 0.4192, value_loss: 204.3442, reward_loss: 12.0603, policy_entropy: 0.4659, score: 1499.0000
episode: 89/15000, policy_loss: 0.4206, value_loss: 151.4276, reward_loss: 8.0763, policy_entropy: 0.5052, score: 1499.0000
episode: 90/15000, policy_loss: 0.4099, value_loss: 101.7047, reward_loss: 5.5445, policy_entropy: 0.5272, score: 1499.0000
episode: 91/15000, policy_loss: 0.4040, value_loss: 81.9400, reward_loss: 4.1511, policy_entropy: 0.4570, score: 1499.0000
episode: 92/15000, policy_loss: 0.4072, value_loss: 130.3055, reward_loss: 6.9052, policy_entropy: 0.5051, score: 1499.0000
episode: 93/15000, policy_loss: 0.4011, value_loss: 144.9068, reward_loss: 7.7640, policy_entropy: 0.6181, score: 1499.0000
episode: 94/15000, policy_loss: 0.4216, value_loss: 143.0369, reward_loss: 8.1342, policy_entropy: 0.4576, score: 1499.0000
episode: 95/15000, policy_loss: 0.4272, value_loss: 106.0717, reward_loss: 5.1978, policy_entropy: 0.4924, score: 1499.0000
episode: 96/15000, policy_loss: 0.4083, value_loss: 211.5314, reward_loss: 13.0955, policy_entropy: 0.3815, score: 1499.0000
episode: 97/15000, policy_loss: 0.4120, value_loss: 84.5841, reward_loss: 3.8120, policy_entropy: 0.4969, score: 1499.0000
episode: 98/15000, policy_loss: 0.4336, value_loss: 145.2616, reward_loss: 8.1135, policy_entropy: 0.4316, score: 732.0000
episode: 99/15000, policy_loss: 0.4232, value_loss: 195.7559, reward_loss: 10.8406, policy_entropy: 0.4542, score: 1499.0000
episode: 100/15000, policy_loss: 0.4142, value_loss: 188.2957, reward_loss: 9.8935, policy_entropy: 0.5051, score: 1499.0000
saving...
********** save weights ************
saved
episode: 101/15000, policy_loss: 0.4179, value_loss: 90.9623, reward_loss: 5.7983, policy_entropy: 0.4633, score: 1499.0000
episode: 102/15000, policy_loss: 0.4113, value_loss: 133.3348, reward_loss: 6.9438, policy_entropy: 0.5085, score: 1499.0000
episode: 103/15000, policy_loss: 0.4132, value_loss: 139.1751, reward_loss: 8.5288, policy_entropy: 0.4697, score: 1499.0000
episode: 104/15000, policy_loss: 0.4300, value_loss: 130.9949, reward_loss: 7.1526, policy_entropy: 0.5330, score: 1499.0000
episode: 105/15000, policy_loss: 0.4149, value_loss: 136.3931, reward_loss: 7.4422, policy_entropy: 0.5242, score: 1499.0000
episode: 106/15000, policy_loss: 0.4432, value_loss: 168.4173, reward_loss: 8.4631, policy_entropy: 0.4652, score: 239.0000
episode: 107/15000, policy_loss: 0.4539, value_loss: 162.3065, reward_loss: 9.4388, policy_entropy: 0.6536, score: 1499.0000
episode: 108/15000, policy_loss: 0.4042, value_loss: 109.3871, reward_loss: 4.4598, policy_entropy: 0.5160, score: 1499.0000
episode: 109/15000, policy_loss: 0.4453, value_loss: 137.5493, reward_loss: 7.7495, policy_entropy: 0.4238, score: 1499.0000
episode: 110/15000, policy_loss: 0.4619, value_loss: 151.0615, reward_loss: 7.7821, policy_entropy: 0.4258, score: 1499.0000
episode: 111/15000, policy_loss: 0.4091, value_loss: 158.0819, reward_loss: 8.4372, policy_entropy: 0.4191, score: 1499.0000
episode: 112/15000, policy_loss: 0.4124, value_loss: 135.9255, reward_loss: 6.8434, policy_entropy: 0.4593, score: 1499.0000
episode: 113/15000, policy_loss: 0.4487, value_loss: 205.1536, reward_loss: 12.3879, policy_entropy: 0.4244, score: 225.0000
episode: 114/15000, policy_loss: 0.4136, value_loss: 113.9060, reward_loss: 5.7282, policy_entropy: 0.4505, score: 1499.0000
episode: 115/15000, policy_loss: 0.4007, value_loss: 157.4366, reward_loss: 8.7329, policy_entropy: 0.3948, score: 1499.0000
episode: 116/15000, policy_loss: 0.4028, value_loss: 156.5356, reward_loss: 8.9431, policy_entropy: 0.6158, score: 1499.0000
episode: 117/15000, policy_loss: 0.3941, value_loss: 111.8174, reward_loss: 5.8977, policy_entropy: 0.5187, score: 1499.0000
episode: 118/15000, policy_loss: 0.3956, value_loss: 127.4793, reward_loss: 6.4444, policy_entropy: 0.3781, score: 1499.0000
episode: 119/15000, policy_loss: 0.3852, value_loss: 56.8715, reward_loss: 2.0524, policy_entropy: 0.3738, score: 1499.0000
episode: 120/15000, policy_loss: 0.3904, value_loss: 163.3692, reward_loss: 9.6969, policy_entropy: 0.4368, score: 1499.0000
saving...
********** save weights ************
saved
episode: 121/15000, policy_loss: 0.3907, value_loss: 139.1056, reward_loss: 8.3761, policy_entropy: 0.4708, score: 1499.0000
episode: 122/15000, policy_loss: 0.3772, value_loss: 130.9609, reward_loss: 8.2894, policy_entropy: 0.4824, score: 1499.0000
episode: 123/15000, policy_loss: 0.3987, value_loss: 161.7403, reward_loss: 10.1152, policy_entropy: 0.5091, score: 1499.0000
episode: 124/15000, policy_loss: 0.3969, value_loss: 124.7367, reward_loss: 7.3211, policy_entropy: 0.4296, score: 1499.0000

katawong posted on 2024-6-24 22:18

This post was last edited by katawong on 2024-6-25 08:07

On state, next state, action, reward, return, state-action pair, action value, and state value

These are all basic concepts in AI. Getting them straight is very important for understanding how the code is organised.

Here, using motor design as the setting, let me offer a starting point; please do not hesitate to point out any mistakes.

State, written state, is the set of the motor's current parameters. For example, for an induction motor we can take the following performance parameters as the state list: state = [efficiency, power factor, starting current ratio, starting torque ratio, maximum torque ratio].
Next state, written next_state, is the state after an action is taken. For example, after the core length, slot shape, or number of turns is changed, the state changes: next_state = [efficiency, power factor, starting current ratio, starting torque ratio, maximum torque ratio].
Action, written action, is a set of parameters specifying how the design variables are adjusted. For discrete actions each entry takes a value in [-1, 0, 1], meaning decrease, keep, or increase; for continuous actions each entry is a value between -1 and 1, e.g. 0.5 means increase that variable by 50% and -0.2 means decrease it by 20%. For example, for the slot dimensions, the conductors per slot, and the core length one can define action = [H0, ..., Ns, Lfe]; when it takes the value [0.01, 0.02, 0.01, -0.01, 0.01, 0.01, -1, -0.02], this means that H0 increases by 1%, Ns decreases by one turn, and Lfe decreases by 2%.
Reward, written reward, is a value the engineer has to design. For example, we can set five sub-rewards: an efficiency reward, a power factor reward, a starting current ratio reward, a starting torque ratio reward, and a maximum torque ratio reward. Each of them is 0 when its requirement is met and -100 when it is not (the exact value is up to you).
We can also set a cost reward, whose value is the negative of the cost; for example, if the current design costs 9900, its cost reward is -9900.
We now define the total reward as the sum of these six sub-rewards. Clearly, if every technical requirement is met the total reward is -9900; if the total reward is -10000, then one of the technical requirements must be unsatisfied.
Return, written return, is the expectation of the rewards from now on. It plays an important role in the Bellman equation and is a key condition for the design to converge.
State-action pair, written (s, a), means taking action a in the current state s.
Action value (state-action value), written Q(s, a), is the expected return: it says how much cumulative reward can be obtained by taking action a in the current state, i.e. how good action a is in that state.
State value, written V(s), is the expectation of Q(s, a); it measures how good the current state is. In other words, the state value tells you whether the current design is good or bad and how many points it deserves, and that score is given by V(s).
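To make the arithmetic concrete, here is the reward written out in Python. The structure (0 or -100 per requirement, plus the negative cost) follows the description above; the limit values, the direction of each requirement, and the cost figure are my own placeholder assumptions.

def total_reward(state, cost, limits):
    # state = [eff, cos_phi, Ist, Tst, Tmax]; limits holds the required values (placeholders).
    eff, cos_phi, ist, tst, tmax = state
    penalty = 0.0
    penalty += 0.0 if eff >= limits["eff_min"] else -100.0      # efficiency reward
    penalty += 0.0 if cos_phi >= limits["cos_min"] else -100.0  # power factor reward
    penalty += 0.0 if ist <= limits["ist_max"] else -100.0      # starting current ratio reward
    penalty += 0.0 if tst >= limits["tst_min"] else -100.0      # starting torque ratio reward
    penalty += 0.0 if tmax >= limits["tmax_min"] else -100.0    # maximum torque ratio reward
    return penalty - cost                                       # cost reward = -cost

limits = {"eff_min": 0.92, "cos_min": 0.86, "ist_max": 7.0,
          "tst_min": 2.0, "tmax_min": 2.3}
# All requirements met with cost 9900 -> total reward -9900;
# one requirement missed -> -10000, exactly as in the example above.
print(total_reward([0.925, 0.87, 6.8, 2.1, 2.4], cost=9900, limits=limits))
print(total_reward([0.915, 0.87, 6.8, 2.1, 2.4], cost=9900, limits=limits))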


