Once in a while, we write articles intended to help programmers do their work more effectively. Nowadays, more and more programmers do not really write code. Instead, they configure neural networks and hope for the best. Since I have several years of experience working with such networks and managing such programmers, I decided to share some thoughts. I hope they help you.
The rise of AI
I will not go into the history of the first neural networks, the accelerated development of GPUs, or the revolution triggered by AlexNet. This is something you either already know or do not care about. The important point is the abundance of jobs and projects dealing with various aspects of deep neural networks. Neural networks outperform any other technology we have, yet they tend to be somewhat unpredictable and mind-boggling.
Very few people actually write neural networks from scratch, but adapting a network to specific scenarios is a complex task. We need to collect and label tons of data, simulate many situations with data augmentation or adversarial networks, and implement and analyze countless ideas. If something goes wrong, the debugger usually will not help, and we need to use our wits and other resources to understand what happened. Then we need to fine-tune our networks and weights: removing unnecessary layers, optimizing performance, retraining selected layers. If any of this sounds strange, you probably need an online course and some hands-on training. Being a scientist or a programmer is a good starting point, but neural networks require a lot of dedicated training and experience.
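To make the fine-tuning step a bit more concrete, here is a minimal sketch of the common transfer-learning recipe: freeze a pretrained backbone and retrain only the final layer. It assumes a recent torchvision with a ResNet-18 backbone and a hypothetical five-class problem; it is an illustration, not a prescription for your specific network.

    # Minimal fine-tuning sketch (illustrative only): adapt a pretrained
    # torchvision ResNet-18 to a hypothetical 5-class problem by freezing
    # the backbone and retraining only the final layer.
    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    # Freeze all pretrained layers so only the new head is trained.
    for param in model.parameters():
        param.requires_grad = False

    # Replace the classification head for our own label set.
    num_classes = 5  # hypothetical number of classes in our data
    model.fc = nn.Linear(model.fc.in_features, num_classes)

    # Optimize only the parameters that still require gradients.
    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-3
    )
    criterion = nn.CrossEntropyLoss()

Retraining only the head is usually the first experiment; unfreezing deeper layers with a smaller learning rate is the typical next step when that is not enough.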
Why programming is different
This article began with a couple of programmers asking: wait a minute, how can your course help me in my job? Most knowledge workers need to read and write texts: protocols, descriptions, the latest discoveries and so on. Programmers usually do not write documents in the conventional sense; they write code, which has its own structure and logic. The development environment takes care of many aspects for us: the code is colored for easy visualization, class names serve as keywords, and meaningful variable and function names mean we do not have to visualize too much. So we use our skills creatively to achieve additional goals. The focus is less on memorizing things and more on visualizing the logical connections and navigating complex structures. With neural networks we do not have long code listings and super-useful development environments, so we need to use the same skills as everybody else, and this can be confusing.
Skimming and scanning
Most people use speedreading to get the latest industry news and to organize the information. Similar skills allow us to navigate our environment and evaluate the situation very fast. When working with neural networks, this is what we do. First, we read a lot of related research to find useful ideas. Then we analyze the databases we have to find patterns that will help us train the neural network. While most reading requires very high memorization, when working with neural networks we optimize our skimming and scanning skills. We analyze the literature looking for a new idea. If there is nothing new, there is no reason to remember it. Then we scan inputs and outputs looking for patterns. If there is no pattern, there is no reason to remember it. In a way, this is just the opposite of what we usually teach. We teach memory first and then build reading skills that are adequate for our memory. Here we use visual analysis first and hope that memory and understanding will catch up.
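As a small example of what "scanning the data for patterns" can look like in practice, the sketch below runs a quick pass over a labeled dataset. The file name labels.csv and the column names are hypothetical placeholders for your own data.

    # Quick "scanning" pass over a labeled dataset (illustrative sketch;
    # the file name and column names are hypothetical).
    import pandas as pd

    df = pd.read_csv("labels.csv")  # e.g. columns: image_path, label

    # Class balance: a heavily skewed distribution is a pattern worth
    # remembering before training.
    print(df["label"].value_counts(normalize=True))

    # Missing or duplicated entries are another pattern to catch early.
    print("missing:", df.isna().sum().sum())
    print("duplicates:", df.duplicated().sum())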
Trying new things
Systematic creativity is a large part of the work with neural networks. We generate ideas, try them and see what works. This is not blind trial and error. If we do not have good intuition and a plan, we will be losing time. We should understand why the network behaves the way it does, and we should also understand how other people improve their networks. Then we are able to apply the same ideas to our own networks. Moreover, we usually start with the data, and if the network does not learn, we try to find what is blocking our data from training the network, or why the network chooses to overfit a specific situation. The creativity and visualization exercises really help, since we can visualize in color each layer of the network in our head and understand how it behaved with each input. Furthermore, we can systematically list various attributes of our network and address each attribute in a particular situation. Eventually, we develop an intuitive understanding of the network we work with, and when we try new things we choose the things most likely to succeed.
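Visualizing "each layer in our head" has a practical counterpart: inspecting the activations a network produces for a given input. Below is a minimal sketch using PyTorch forward hooks on a small made-up model; in real work you would plot or histogram these tensors rather than just print their statistics.

    # Illustrative sketch: capture each layer's activations for a single
    # input with forward hooks, so we can inspect how the network behaved
    # layer by layer. The model and input shape here are hypothetical.
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
        nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
    )

    activations = {}

    def make_hook(name):
        def hook(module, inputs, output):
            activations[name] = output.detach()
        return hook

    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            module.register_forward_hook(make_hook(name))

    x = torch.randn(1, 3, 32, 32)  # one dummy input image
    model(x)

    for name, act in activations.items():
        print(name, tuple(act.shape), float(act.abs().mean()))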
Crossing tech chasms
The neural network field is experiencing an explosion of technological tools and approaches. About half of the time we need to take things from one technology and reapply them in another. To do that we need to know both technologies well enough to apply them and to cross the chasm between them. Here we need much more than reading very fast. We need to actually understand and remember the differences very well, as well as try each of the relevant technologies hands-on. This is both the classical learning skill and the classical programming skill of actually writing and modifying the relevant code, integrating various systems and configuring the relevant software. While this is not the most common task for people working with neural networks, this is the task where traditional knowledge and experience shine. Younger people tend to be very good with the other skills, yet often struggle to adapt technologies between various programming implementations.
Optimizing the computational aspects of the network requires a different sort of programming skill. For example, we may need to port the network from PyTorch to Caffe, or from Caffe to TensorRT, while at the same time deploying the network on small computers for development, large cloud solutions for training, and embedded devices for inference. Clearly, this requires a lot of technical knowledge and an understanding of the limitations of each platform.
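One common way to bridge such chasms is to export the model to an exchange format that the target runtime understands. The sketch below exports a PyTorch model to ONNX, which TensorRT can then consume; the model, file name and input shape are placeholders, not a prescription for your pipeline.

    # Illustrative sketch of one common porting step: export a PyTorch
    # model to ONNX, a format that TensorRT (and other runtimes) can load.
    # The model and input size here are placeholders.
    import torch
    from torchvision import models

    model = models.resnet18(weights=None).eval()
    dummy_input = torch.randn(1, 3, 224, 224)

    torch.onnx.export(
        model,
        dummy_input,
        "resnet18.onnx",
        input_names=["input"],
        output_names=["output"],
        dynamic_axes={"input": {0: "batch"}},  # allow variable batch size
    )
    # The resulting resnet18.onnx can then be fed to TensorRT's parser
    # (for example via trtexec --onnx=resnet18.onnx) to build an engine.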
Creating new solutions
Very few people actually create entirely new neural networks or processing architectures optimized for neural networks. Many people try, but very few succeed. Typically, the creation of an entirely new neural network is a team effort, where each team member knows his job extremely well. Some people are responsible for collecting and organizing huge amounts of data. Others improve the representation of the data. Very few people actually modify the network to represent the data in the most appropriate form. Then there are people who post-process the network outputs and feed the information back. Yet more people are responsible for the computational aspects of training and inference. For the network to be great, everyone involved in its construction must do a great job. Building an entirely new modern network is simply a job too big for one person. Moreover, an entirely new network may result from the ideas of every person involved in its construction, not just the person responsible for the specific network architecture. For the teamwork to work, each member of the team should not only understand his own job but also, to some extent, the jobs of the other team members.
Competitive advantage
To have a competitive advantage in training neural networks, we need to combine many qualities. We need to be systematically creative to find new ideas. Our skimming and scanning should be blazingly fast. The experience we have as programmers should help us cross technical chasms and optimize the computational aspects of our networks. Young people are often excellent at trying new ideas, while more experienced mentors know how to integrate full solutions. Either way, everybody involved in neural network training should be well educated and trained, and should be willing to learn constantly as the technology changes.