The era of big data poses a major challenge for data analysis, processing, and learning, because information becomes available in very large volumes. Images, video, audio, text, and other metadata are therefore difficult to explore and analyze. To take advantage of such data, machine learning approaches, and deep learning approaches in particular, are developed to analyze either individual modalities or the whole multimedia spectrum. Such approaches are becoming increasingly feasible, moving from hypotheses to tangible systems and applications in fields such as computer vision, economics, and medicine. In this context, this thesis presents a literature review of the methods that underpin deep learning implementations, covering the basic elements of popular neural networks and going into further detail on advanced regularization and optimization techniques for building, implementing, and configuring neural network models. Furthermore, it presents contributions to several niche fields. It starts with methods related to the medical domain, involving tuberculosis type detection, multi-drug resistance prediction, and lipreading. These are followed by methods related to surveillance, involving person re-identification and person search, and by methods related to machine learning, involving ensemble learning for prediction in various configurations, from binary classification and regression to multi-label classification, which achieve state-of-the-art results in the literature. Next come methods related to fintech, involving stock trend classification. Finally, the thesis presents a series of common evaluation frameworks that are currently the reference in the literature for interestingness prediction, violence detection, and face identification, alongside insights, observations, and recommendations concerning the development of systems that predict the relevant representations for each of the datasets.
This work was disseminated in 5 book chapters, 3 journal articles, and 14 conference papers. An abstract of the work can be found [here].