FACEBOOK’S COMPUTER VISION SYSTEM SUPERVISES ITS OWN LEARNING PROCESS
The extent to which AI systems have made life easier in almost every field needs no special mention. Healthcare, defence, transportation: name any sector and the positive impact is plain to see. Whether it is assisting doctors during surgery, controlling traffic, serving you at restaurants, teaching online or helping with daily chores, AI has got you covered. However, no matter how much AI promises to automate, human involvement cannot be eliminated entirely. Simply put, AI has little meaning unless humans are involved. Ultimately, it is human intelligence that forms the base of machine intelligence.
No doubt, self-supervised learning (SSL) methods have transformed natural language processing (NLP). These methods let a model learn from raw, unlabelled text, which gives AI some much-needed common sense. That in turn makes it easier to build additional capabilities on top of the model.
Now a company reports success in applying these self-supervised learning methods to computer vision training at billion-parameter scale. Facebook's AI research division (FAIR) is behind this remarkable innovation.
Facebook AI researchers wrote in a blog post that they have developed SEER (SElf-supERvised), a billion-parameter self-supervised computer vision model. The model can learn from any random group of images on the internet. And there's more to it: all of this is done without the curation and labelling that go into most computer vision training.
One of the key challenges is making sure that when the system is shown two different images, it produces different vectors, or 'embeddings', for them. The traditional way to do this is to randomly pick millions of pairs of images that you know are different, run them through the network, and train it to push their embeddings apart. Needless to say, this is resource-intensive and time-consuming.
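The pairwise approach described above can be sketched roughly as follows. This is a minimal illustration using a generic margin-based contrastive loss, not the loss used in SEER; the function names, the `margin` parameter and the use of NumPy arrays as stand-ins for network outputs are all assumptions made for the example.

```python
import numpy as np


def contrastive_pair_loss(emb_a, emb_b, is_different, margin=1.0):
    """Generic margin-based contrastive loss over a batch of embedding pairs.

    emb_a, emb_b: arrays of shape (batch, dim), stand-ins for network outputs.
    is_different: 1 where the two images are known to differ, 0 otherwise.
    margin: hypothetical hyperparameter; different pairs are pushed at least
            this far apart.
    """
    dist = np.linalg.norm(emb_a - emb_b, axis=1)
    is_different = np.asarray(is_different, dtype=float)
    # Pairs that should match are penalised for being far apart.
    same_term = (1.0 - is_different) * dist ** 2
    # Pairs that should differ are penalised only while closer than the margin.
    diff_term = is_different * np.maximum(0.0, margin - dist) ** 2
    return float(np.mean(same_term + diff_term))


def count_pairs(num_images):
    """Number of distinct image pairs that could be drawn from a dataset."""
    return num_images * (num_images - 1) // 2
```

The pair count grows quadratically with the dataset: 1,000 images already give 499,500 possible pairs, which is why sampling and comparing pairs becomes so costly at internet scale.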
Applying the SSL techniques used in NLP to computer vision brings some additional challenges. Semantic concepts in language can easily be broken into words and discrete phrases. With images, that is not the case: the algorithm has to decide which pixel belongs to which concept. The fact that the same concept varies greatly between images adds to the complexity.
For the method to yield fruitful results, two things are needed: an algorithm flexible enough to learn from large numbers of unannotated images, and a convolutional network capable of sorting through the algorithmically generated data. Facebook's answer to the first is online clustering, which rapidly groups images with similar visual concepts and leverages their similarities. The convolutional network comes from RegNets, a family of convolutional networks that can scale to a very large number of parameters while being optimized for the available computing resources.
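To make the idea of online clustering concrete, here is a minimal sketch that groups embeddings batch by batch without ever holding the whole dataset in memory. It is a plain online k-means, used here as a stand-in and not Facebook's actual algorithm; the class name `OnlineClusterer` and its methods are hypothetical, and the embeddings are assumed to come from some image network.

```python
import numpy as np


class OnlineClusterer:
    """Illustrative online k-means: centroids are updated as batches arrive."""

    def __init__(self, initial_centroids):
        self.centroids = np.array(initial_centroids, dtype=float)
        self.counts = np.zeros(len(self.centroids), dtype=int)

    def assign(self, embeddings):
        """Return the index of the nearest centroid for each embedding."""
        embeddings = np.asarray(embeddings, dtype=float)
        diffs = embeddings[:, None, :] - self.centroids[None, :, :]
        return np.argmin((diffs ** 2).sum(axis=2), axis=1)

    def update(self, embeddings):
        """Assign a batch to clusters, then move each centroid toward its members."""
        embeddings = np.asarray(embeddings, dtype=float)
        labels = self.assign(embeddings)
        for x, k in zip(embeddings, labels):
            self.counts[k] += 1
            # Running mean of every embedding assigned to cluster k so far.
            self.centroids[k] += (x - self.centroids[k]) / self.counts[k]
        return labels
```

Calling `update` on each incoming batch returns a cluster label per image, so images with similar embeddings end up sharing a label that can then serve as an automatically generated training signal.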
Facebook's new system has shown impressive results: SEER outperformed the other networks it was compared against.
The results are strong enough to suggest that unsupervised learning methods can be applied to computer vision applications. This paves the way for more flexible systems, and Facebook, along with other social media platforms, could be better equipped to deal with banned content.