In this visualization, we train a neural network to learn the arctangent function. The network uses sine activations. More specifically, the network is of the form discriminator=Chain(Dense(1,m,sin),Dense(m,m,sin),Dense(m,m,sin),Dense(m,1)) where m=40.
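The Chain/Dense notation above matches Flux.jl's layer constructors, so a self-contained sketch of this architecture would look like the following (the variable name N is my own, and Dense(1 => m, sin) is the pair form of Dense(1, m, sin)):

using Flux   # assuming the Chain/Dense definition above refers to Flux.jl

m = 40
# Three hidden layers with sine activations, mapping a scalar input to a scalar output.
N = Chain(Dense(1 => m, sin), Dense(m => m, sin), Dense(m => m, sin), Dense(m => 1))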
For the visualization, we show the graph of y=N(x) during the course of training.
The neural network N is trained to minimize the expected value of (N(x)-atan(x))^2 using gradient descent, where x is selected uniformly at random from the interval colored blue in the visualization. The blue interval grows in length whenever the network N is performing well and shrinks otherwise.
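A rough sketch of such a training loop is given below, reusing the model N from the sketch above. The batch size, learning rate, loss threshold, and the exact grow/shrink rule for the interval are my own guesses rather than the settings used in the video:

using Flux, Statistics

function train_on_growing_interval!(N; steps = 100_000, batch = 256, threshold = 1f-3)
    opt_state = Flux.setup(Descent(1f-2), N)                   # plain gradient descent; learning rate is a guess
    L = 1f0                                                    # half-length of the blue interval [-L, L]
    for _ in 1:steps
        x = (rand(Float32, 1, batch) .- 0.5f0) .* (2f0 * L)    # x sampled uniformly from [-L, L]
        y = atan.(x)
        loss, grads = Flux.withgradient(N) do model
            mean((model(x) .- y) .^ 2)                         # Monte Carlo estimate of E[(N(x) - atan(x))^2]
        end
        Flux.update!(opt_state, N, grads[1])
        # Assumed rule: lengthen the interval when the loss is small, shorten it otherwise.
        L = loss < threshold ? 1.001f0 * L : 0.999f0 * L
    end
    return N
end

Calling train_on_growing_interval!(N) then trains the model N defined earlier.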
In the visualization, we see that the neural network at first learns the atan function surprisingly well, but the training then suddenly fails and never fully recovers.
The notion of a neural network is not my own.
There are several differences between this network and the neural networks that one would encounter in practice for real-world problems. For example, this network maps the real numbers to the real numbers, which is uncommon among neural networks, and it has a periodic activation function, which means that it computes an almost periodic function. But there are a few things that we can take away from this visualization. While the sine function is not a commonly used activation function, it may have a couple of advantages. A sine layer easily maps long intervals of the real number line into a high-dimensional space, and this embedding allows the subsequent layers of the network to interpret the real numbers more easily. For example, in the original paper on transformers, the authors hand-constructed an embedding that maps scalar positions into a higher-dimensional space using the sine and cosine functions, as sketched below.
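For reference, that embedding (the sinusoidal positional encoding from "Attention Is All You Need") sends a scalar position pos to a d-dimensional vector with entries sin(pos/10000^(2i/d)) and cos(pos/10000^(2i/d)). A small sketch, with a function name of my own choosing:

function sinusoidal_embedding(pos::Real, d::Int)
    emb = zeros(Float32, d)
    for i in 0:(d ÷ 2 - 1)
        freq = 1f0 / 10000f0^(2f0 * i / d)     # frequency decreases as the index i grows
        emb[2i + 1] = sin(pos * freq)          # sine and cosine pairs, written 1-indexed for Julia
        emb[2i + 2] = cos(pos * freq)
    end
    return emb
end

For instance, sinusoidal_embedding(3.0, 40) returns a 40-dimensional vector that later layers can process, much like the output of the first sine layer of the network above.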
We observe that throughout the training, the network is nearly symmetrical but not completely symmetrical. This is because the biases were initialized to zero and the network is solving a symmetric problem, so the network finds it easier to keep the biases near zero than to learn atan(x) for negative x in a different way than it learns atan(x) for positive x.
Unless otherwise stated, all algorithms featured on this channel are my own. You can go to [ Link ] to support my research on machine learning algorithms. I am also available to consult on the use of safe and interpretable AI for your business. I am designing machine learning algorithms for AI safety such as LSRDRs. In particular, my algorithms are designed to be more predictable and understandable to humans than other machine learning algorithms, and my algorithms can be used to interpret more complex AI systems such as neural networks. With more understandable AI, we can ensure that AI systems will be used responsibly and that we will avoid catastrophic AI scenarios. There is currently nobody else who is working on LSRDRs, so your support will ensure a unique approach to AI safety.