In computational networks, the activation function of a node defines the output of that node given an input or a set of inputs. A standard computer-chip circuit can be seen as a digital network of activation functions that can be "ON" (1) or "OFF" (0) depending on the input. This is similar to the behavior of the linear perceptron in neural networks. However, only nonlinear activation functions allow such networks to compute nontrivial problems using a small number of nodes. In artificial neural networks, this function is also called the transfer function.
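To make this definition concrete, here is a minimal Python sketch of a node that applies an activation function to the weighted sum of its inputs. The helper names `sigmoid`, `relu`, and `node_output` are our own, for illustration only, not from any particular library:

```python
import numpy as np

def sigmoid(x):
    # Logistic activation: squashes any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Rectified linear unit: 0 for negative inputs, identity otherwise.
    return np.maximum(0.0, x)

def node_output(weights, inputs, bias, activation):
    # A node's output is its activation function applied to the
    # weighted sum of its inputs plus a bias term.
    return activation(np.dot(weights, inputs) + bias)

w = np.array([0.5, -0.3])
x = np.array([1.0, 2.0])
print(node_output(w, x, 0.1, sigmoid))  # 0.5 (weighted sum is exactly 0)
print(node_output(w, x, 0.1, relu))     # 0.0
```

Swapping `sigmoid` for `relu` changes only the nonlinearity; the weighted-sum structure of the node stays the same, which is why the functions below are tabulated independently of any network architecture.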
Univariate activation functions
| Name | Equation | Derivative | Range | Continuity[a] | Monotonic | Derivative monotonic | Approximates identity near the origin |
|---|---|---|---|---|---|---|---|
| Identity | $f(x)=x$ | $f'(x)=1$ | $(-\infty,\infty)$ | $C^{\infty}$ | Yes | Yes | Yes |
| Binary step | $f(x)=\begin{cases}0&\text{for }x<0\\1&\text{for }x\geq 0\end{cases}$ | $f'(x)=\begin{cases}0&\text{for }x\neq 0\\\text{undefined}&\text{for }x=0\end{cases}$ | $\{0,1\}$ | $C^{-1}$ | Yes | No | No |
| Logistic (a sigmoid function) | $f(x)=\sigma(x)=\frac{1}{1+e^{-x}}$[2] | $f'(x)=f(x)(1-f(x))$ | $(0,1)$ | $C^{\infty}$ | Yes | No | No |
| Hyperbolic tangent (tanh) | $f(x)=\tanh(x)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$ | $f'(x)=1-f(x)^{2}$ | $(-1,1)$ | $C^{\infty}$ | Yes | No | Yes |
| Arctangent | $f(x)=\tan^{-1}(x)$ | $f'(x)=\frac{1}{x^{2}+1}$ | $\left(-\frac{\pi}{2},\frac{\pi}{2}\right)$ | $C^{\infty}$ | Yes | No | Yes |
| Softsign[1][2] | $f(x)=\frac{x}{1+\lvert x\rvert}$ | $f'(x)=\frac{1}{(1+\lvert x\rvert)^{2}}$ | $(-1,1)$ | $C^{1}$ | Yes | No | Yes |
| Inverse square root unit (ISRU)[3] | $f(x)=\frac{x}{\sqrt{1+\alpha x^{2}}}$ | $f'(x)=\left(\frac{1}{\sqrt{1+\alpha x^{2}}}\right)^{3}$ | $\left(-\frac{1}{\sqrt{\alpha}},\frac{1}{\sqrt{\alpha}}\right)$ | $C^{\infty}$ | Yes | No | Yes |
| Rectified linear unit (ReLU) | $f(x)=\begin{cases}0&\text{for }x<0\\x&\text{for }x\geq 0\end{cases}$ | $f'(x)=\begin{cases}0&\text{for }x<0\\1&\text{for }x\geq 0\end{cases}$ | $[0,\infty)$ | $C^{0}$ | Yes | Yes | No |
| Leaky ReLU | $f(x)=\begin{cases}0.01x&\text{for }x<0\\x&\text{for }x\geq 0\end{cases}$ | $f'(x)=\begin{cases}0.01&\text{for }x<0\\1&\text{for }x\geq 0\end{cases}$ | $(-\infty,\infty)$ | $C^{0}$ | Yes | Yes | No |
| Parametric ReLU (PReLU)[4] | $f(\alpha,x)=\begin{cases}\alpha x&\text{for }x<0\\x&\text{for }x\geq 0\end{cases}$ | $f'(\alpha,x)=\begin{cases}\alpha&\text{for }x<0\\1&\text{for }x\geq 0\end{cases}$ | $(-\infty,\infty)$ | $C^{0}$ | Yes iff $\alpha\geq 0$ | Yes | Yes iff $\alpha=1$ |
| Randomized leaky ReLU (RReLU)[5] | $f(\alpha,x)=\begin{cases}\alpha x&\text{for }x<0\\x&\text{for }x\geq 0\end{cases}$[b] | $f'(\alpha,x)=\begin{cases}\alpha&\text{for }x<0\\1&\text{for }x\geq 0\end{cases}$ | $(-\infty,\infty)$ | $C^{0}$ | Yes | Yes | No |
| Exponential linear unit (ELU)[6] | $f(\alpha,x)=\begin{cases}\alpha(e^{x}-1)&\text{for }x<0\\x&\text{for }x\geq 0\end{cases}$ | $f'(\alpha,x)=\begin{cases}f(\alpha,x)+\alpha&\text{for }x<0\\1&\text{for }x\geq 0\end{cases}$ | $(-\alpha,\infty)$ | $C^{1}$ when $\alpha=1$, $C^{0}$ otherwise | Yes iff $\alpha\geq 0$ | Yes iff $0\leq\alpha\leq 1$ | Yes iff $\alpha=1$ |
| Scaled exponential linear unit (SELU)[7] | $f(\alpha,x)=\lambda\begin{cases}\alpha(e^{x}-1)&\text{for }x<0\\x&\text{for }x\geq 0\end{cases}$ with $\lambda=1.0507$ and $\alpha=1.67326$ | $f'(\alpha,x)=\lambda\begin{cases}\alpha e^{x}&\text{for }x<0\\1&\text{for }x\geq 0\end{cases}$ | $(-\lambda\alpha,\infty)$ | $C^{0}$ | Yes | No | No |
| S-shaped rectified linear unit (SReLU)[8] | $f_{t_{l},a_{l},t_{r},a_{r}}(x)=\begin{cases}t_{l}+a_{l}(x-t_{l})&\text{for }x\leq t_{l}\\x&\text{for }t_{l}<x<t_{r}\\t_{r}+a_{r}(x-t_{r})&\text{for }x\geq t_{r}\end{cases}$, where $t_{l},a_{l},t_{r},a_{r}$ are parameters | $f'_{t_{l},a_{l},t_{r},a_{r}}(x)=\begin{cases}a_{l}&\text{for }x\leq t_{l}\\1&\text{for }t_{l}<x<t_{r}\\a_{r}&\text{for }x\geq t_{r}\end{cases}$ | $(-\infty,\infty)$ | $C^{0}$ | No | No | No |
| Inverse square root linear unit (ISRLU)[3] | $f(x)=\begin{cases}\frac{x}{\sqrt{1+\alpha x^{2}}}&\text{for }x<0\\x&\text{for }x\geq 0\end{cases}$ | $f'(x)=\begin{cases}\left(\frac{1}{\sqrt{1+\alpha x^{2}}}\right)^{3}&\text{for }x<0\\1&\text{for }x\geq 0\end{cases}$ | $\left(-\frac{1}{\sqrt{\alpha}},\infty\right)$ | $C^{2}$ | Yes | Yes | Yes |
| Adaptive piecewise linear (APL)[9] | $f(x)=\max(0,x)+\sum_{s=1}^{S}a_{i}^{s}\max(0,-x+b_{i}^{s})$ | $f'(x)=H(x)-\sum_{s=1}^{S}a_{i}^{s}H(-x+b_{i}^{s})$[c] | $(-\infty,\infty)$ | $C^{0}$ | No | No | No |
| SoftPlus[10] | $f(x)=\ln(1+e^{x})$ | $f'(x)=\frac{1}{1+e^{-x}}$ | $(0,\infty)$ | $C^{\infty}$ | Yes | Yes | No |
| Bent identity | $f(x)=\frac{\sqrt{x^{2}+1}-1}{2}+x$ | $f'(x)=\frac{x}{2\sqrt{x^{2}+1}}+1$ | $(-\infty,\infty)$ | $C^{\infty}$ | Yes | Yes | Yes |
| Sigmoid-weighted linear unit (SiLU)[11] (also known as Swish[12]) | $f(x)=x\cdot\sigma(x)$[d] | $f'(x)=f(x)+\sigma(x)(1-f(x))$[d] | $[\approx-0.28,\infty)$ | $C^{\infty}$ | No | No | No |
| Soft exponential[13] | $f(\alpha,x)=\begin{cases}-\frac{\ln(1-\alpha(x+\alpha))}{\alpha}&\text{for }\alpha<0\\x&\text{for }\alpha=0\\\frac{e^{\alpha x}-1}{\alpha}+\alpha&\text{for }\alpha>0\end{cases}$ | $f'(\alpha,x)=\begin{cases}\frac{1}{1-\alpha(\alpha+x)}&\text{for }\alpha<0\\e^{\alpha x}&\text{for }\alpha\geq 0\end{cases}$ | $(-\infty,\infty)$ | $C^{\infty}$ | Yes | Yes | Yes iff $\alpha=0$ |
| Sinusoid | $f(x)=\sin(x)$ | $f'(x)=\cos(x)$ | $[-1,1]$ | $C^{\infty}$ | No | No | Yes |
| Sinc | $f(x)=\begin{cases}1&\text{for }x=0\\\frac{\sin(x)}{x}&\text{for }x\neq 0\end{cases}$ | $f'(x)=\begin{cases}0&\text{for }x=0\\\frac{\cos(x)}{x}-\frac{\sin(x)}{x^{2}}&\text{for }x\neq 0\end{cases}$ | $[\approx-0.217234,1]$ | $C^{\infty}$ | No | No | No |
| Gaussian | $f(x)=e^{-x^{2}}$ | $f'(x)=-2xe^{-x^{2}}$ | $(0,1]$ | $C^{\infty}$ | No | No | No |

Notes

a. A function is $C^{0}$ if it is continuous. It is $C^{n}$ ($n\geq 1$) if it is $n$ times differentiable and its $n$-th derivative is continuous, and $C^{\infty}$ (smooth) if it is $C^{n}$ for every $n$.
b. Here $\alpha$ is a random variable drawn from a uniform distribution at training time and fixed to the expected value of that distribution at test time.
c. Here $H$ is the Heaviside step function.
d. Here $\sigma$ is the logistic function.
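As a sanity check on the Equation and Derivative columns of the table above, the following minimal Python sketch compares a few of the tabulated closed-form derivatives against a central finite difference. The helper names (`tanh_pair`, `softplus_pair`, `elu_pair`, `numeric_derivative`) are our own, for illustration only:

```python
import numpy as np

def tanh_pair(x):
    f = np.tanh(x)
    return f, 1.0 - f ** 2                      # f'(x) = 1 - f(x)^2

def softplus_pair(x):
    return np.log1p(np.exp(x)), 1.0 / (1.0 + np.exp(-x))

def elu_pair(x, alpha=1.0):
    f = np.where(x < 0, alpha * (np.exp(x) - 1.0), x)
    return f, np.where(x < 0, f + alpha, 1.0)   # f'(a, x) = f(a, x) + a for x < 0

def numeric_derivative(fn, x, h=1e-6):
    # Central difference: (f(x + h) - f(x - h)) / (2h).
    return (fn(x + h)[0] - fn(x - h)[0]) / (2.0 * h)

for name, pair in [("tanh", tanh_pair), ("softplus", softplus_pair), ("elu", elu_pair)]:
    x = np.linspace(-3.0, 3.0, 7)
    analytic = pair(x)[1]
    numeric = numeric_derivative(pair, x)
    assert np.allclose(analytic, numeric, atol=1e-5), name
    print(name, "matches the finite-difference check")
```

The same pattern extends to any row of the table whose derivative has a closed form.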
Multivariate activation functions

| Name | Equation | Derivatives | Range | Smoothness |
|---|---|---|---|---|
| Softmax | $f_{i}(\vec{x})=\frac{e^{x_{i}}}{\sum_{j=1}^{J}e^{x_{j}}}$ for $i=1,\dots,J$ | $\frac{\partial f_{i}(\vec{x})}{\partial x_{j}}=f_{i}(\vec{x})(\delta_{ij}-f_{j}(\vec{x}))$[e] | $(0,1)$ | $C^{\infty}$ |
| Maxout[14] | $f(\vec{x})=\max_{i}x_{i}$ | $\frac{\partial f}{\partial x_{j}}=\begin{cases}1&\text{for }j=\operatorname{argmax}_{i}x_{i}\\0&\text{for }j\neq\operatorname{argmax}_{i}x_{i}\end{cases}$ | $(-\infty,\infty)$ | $C^{0}$ |

Notes

e. Here $\delta$ is the Kronecker delta.
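A short Python sketch of the softmax row, again with our own helper names: `softmax` uses the standard max-shift trick for numerical stability, and `softmax_jacobian` builds the full matrix of partial derivatives from the tabulated formula $\partial f_{i}/\partial x_{j}=f_{i}(\delta_{ij}-f_{j})$:

```python
import numpy as np

def softmax(x):
    # Shifting by max(x) avoids overflow in exp(); the result is unchanged
    # because softmax is invariant under a constant shift of its inputs.
    z = np.exp(x - np.max(x))
    return z / z.sum()

def softmax_jacobian(x):
    # Tabulated derivative: d f_i / d x_j = f_i * (delta_ij - f_j).
    f = softmax(x)
    return np.diag(f) - np.outer(f, f)

x = np.array([1.0, 2.0, 3.0])
print(softmax(x))           # entries lie in (0, 1) and sum to 1
print(softmax_jacobian(x))  # each row sums to (approximately) 0
```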
See also

- Logistic function
- Rectifier (rectified linear unit)
- Softmax function
- Artificial neural network
- Deep learning

References

1. Bergstra, James; Desjardins, Guillaume; Lamblin, Pascal; Bengio, Yoshua. "Quadratic polynomials learn better image features". Technical Report 1337, Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, 2009. Archived from the original on 2018-09-25.
2. Glorot, Xavier; Bengio, Yoshua. "Understanding the difficulty of training deep feedforward neural networks" (PDF). International Conference on Artificial Intelligence and Statistics (AISTATS'10), Society for Artificial Intelligence and Statistics, 2010. Archived (PDF) from the original on 2017-04-01.
3. Carlile, Brad; Delamarter, Guy; Kinney, Paul; Marti, Akiko; Whitney, Brian. "Improving Deep Learning by Inverse Square Root Linear Units (ISRLUs)". 2017-11-09. arXiv:1710.09967 [cs.LG].
4. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification". 2015-02-06. arXiv:1502.01852 [cs.CV].
5. Xu, Bing; Wang, Naiyan; Chen, Tianqi; Li, Mu. "Empirical Evaluation of Rectified Activations in Convolutional Network". 2015-05-04. arXiv:1505.00853 [cs.LG].
6. Clevert, Djork-Arné; Unterthiner, Thomas; Hochreiter, Sepp. "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)". 2015-11-23. arXiv:1511.07289 [cs.LG].
7. Klambauer, Günter; Unterthiner, Thomas; Mayr, Andreas; Hochreiter, Sepp. "Self-Normalizing Neural Networks". 2017-06-08. arXiv:1706.02515 [cs.LG].
8. Jin, Xiaojie; Xu, Chunyan; Feng, Jiashi; Wei, Yunchao; Xiong, Junjun; Yan, Shuicheng. "Deep Learning with S-shaped Rectified Linear Activation Units". 2015-12-22. arXiv:1512.07030 [cs.CV].
9. Agostinelli, Forest; Hoffman, Matthew; Sadowski, Peter; Baldi, Pierre. "Learning Activation Functions to Improve Deep Neural Networks". 2014-12-21. arXiv:1412.6830 [cs.NE].
10. Glorot, Xavier; Bordes, Antoine; Bengio, Yoshua. "Deep sparse rectifier neural networks" (PDF). International Conference on Artificial Intelligence and Statistics, 2011. Archived (PDF) from the original on 2018-06-19.
11. "Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning". Retrieved 2018-06-13. Archived from the original on 2018-06-13.
12. "Searching for Activation Functions". Retrieved 2018-06-13. Archived from the original on 2018-06-13.
13. Godfrey, Luke B.; Gashler, Michael S. "A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks". 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (KDIR), 2016: 481–486. arXiv:1602.01321. Bibcode:2016arXiv160201321G.
14. Goodfellow, Ian J.; Warde-Farley, David; Mirza, Mehdi; Courville, Aaron; Bengio, Yoshua. "Maxout Networks". JMLR WCP 28 (3), 2013: 1319–1327. arXiv:1302.4389. Bibcode:2013arXiv1302.4389G.