The Data-Driven Artificial Intelligence (D2AI) research area is devoted to handling Big Data, using advanced data science and machine learning, to tackle a variety of grand challenges in artificial intelligence.
Specific research strands include:
- Advanced machine learning: development of new methods and techniques, such as multi-task learning, transfer learning, and reinforcement learning, for creating models from both big and small data;
- Machine learning in embedded systems: development of new methodologies to miniaturize and accelerate machine learning algorithms so that they can run on mobile devices, considering scenarios such as intelligent Internet of Things and smart sensing;
- Computer vision (-> Visual Computing and Learning Lab): development of deep learning models to address complex computer vision problems, such as activity recognition in video, anomaly detection in images and volumetric data, and the development of algorithms for the semi-automatic creation of artificial neural networks for the extraction of information from images and video;
- Databases and analytics (-> Database Systems Group): development of new techniques for managing, processing, and analysing multi-dimensional and temporal data, considering complex scenarios such as smart systems, industrial IoT, and predictive maintenance.
In the context of complex/smart systems, ML models are more robust and adaptive than physical models
- Building AI models based on data, and using them for optimization, control, and decision-making
- Using big data, data science, and machine learning to "gain insights" about the system
- Using ML to enhance, enrich, and transform data, so as to more effectively train algorithms and models
Machine intelligence must start at the data source, and then integrate with cloud intelligence
- Data science and ML face distinctive conditions in the context of Intelligent-IoT
- Distributed rather than centralized learning
- Data are particularly noisy, unstructured, and uncurated
- Much of the data is missing, incorrect, or irrelevant
- ML is real-time and starts at the very edge:
- Learning while sensing
- Edge ML for data quality enhancement
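As an illustration of "learning while sensing", the sketch below keeps running statistics of a sensor stream in constant memory and flags readings that deviate strongly from what has been seen so far. It is a minimal example under stated assumptions, not the group's method: the `RunningStats` class, the `filter_stream` function, and the `threshold` and `warmup` parameters are all hypothetical names chosen for this example, and Welford's online algorithm stands in for whatever model an actual edge device would run.

```python
import math


class RunningStats:
    """Welford's online mean/variance: O(1) memory, one pass over the stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Population standard deviation of the readings seen so far.
        return math.sqrt(self.m2 / self.n) if self.n > 0 else 0.0


def filter_stream(readings, threshold=3.0, warmup=5):
    """Yield (reading, suspicious) pairs while learning from the stream.

    A reading is suspicious when it lies more than `threshold` standard
    deviations from the running mean. Suspicious readings are not used to
    update the statistics. `threshold` and `warmup` are illustrative values.
    """
    stats = RunningStats()
    for x in readings:
        suspicious = (
            stats.n >= warmup
            and stats.std > 0
            and abs(x - stats.mean) > threshold * stats.std
        )
        if not suspicious:
            stats.update(x)
        yield x, suspicious
```

A device could forward only the readings that are not flagged, or forward the flags themselves as events, which reduces the amount of low-quality data sent to the cloud.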
HW systems (edge-ML)
- The science of miniature ML
- Embedded ML for real data quality enhancement
- Edge ML for data series imputation, cleaning, processing, characterization, event/anomaly detection, etc.
- Edge-Cloud ML
- Acceleration of ANNs for cloud ML processes
- Smart networks
Data (fusion)
- Sensor and IoT multi-variate streams
- Spatio-temporal data series
- Sensor/Satellite data fusion (hyper-spectral, synthetic aperture radar)
- Ice and underwater imaging
- Subjective/objective data fusion (smart city)
Models (data-driven)
- Urban mobility
- Concurrent sensing and learning
- Intelligent protocols for energy-efficiency
- Object detection
- Hydro-climatic
- Ice classification
- Oceanic eddy detection
Machine Learning techniques
- Sparse least squares Support Vector Regression
- Bayesian inference
- Scalable Clustering
- Graph Mining (frequent subgraphs, heaviest k–subgraph)
- Feature Selection
- Outlier Detection
Multi-Task Deep Learning
Data Science Tools
- KNIME is a popular, open-source and free Data Science and ML platform
- Contributions to:
- first KNIME release in 2006
- adoption for Data Science teaching and training since 2008
- KNIME Certification in Data Science in 2019
- Applications to Prediction of Alzheimer’s Disease from Brain Imaging
Distributed Computing / Big data
- Big Data Mining
- Parallel and Distributed Clustering
- Rule-based Classification for data streams
- Fast retrieval of weather analogues in the ECMWF multi-petabyte archive
- Distributed Data Mining and Distributed Computing
- Epidemic Computing for fully-decentralised data mining
- Decentralised consensus
- Data Mining in Cloud/Edge Computing
- IOT- & Blockchain-Enabled Security Framework for New Generation Critical Cyber-Physical Systems In Finance Sector
- Distributed multi-dimensional scaling
- Non-Euclidean network coordinates
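To make the idea of epidemic (gossip) computing and decentralised consensus concrete, the following sketch simulates pairwise gossip averaging: every node repeatedly exchanges its value with a random peer and both adopt the average, so all nodes converge to the global mean without any coordinator. This is a textbook illustration, not one of the protocols listed above. The function name `gossip_average`, the fully connected network, and the `rounds` and `seed` parameters are assumptions made for the example.

```python
import random


def gossip_average(values, rounds=50, seed=0):
    """Simulate push-pull gossip averaging on a fully connected network.

    In each round every node contacts one random peer; both replace their
    local value with the pair's average. The sum of all values is preserved,
    so the common value they approach is the global mean.
    Requires at least two nodes.
    """
    rng = random.Random(seed)
    state = list(values)
    n = len(state)
    if n < 2:
        raise ValueError("gossip needs at least two nodes")
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # shift so that the peer is never node i itself
            mean = (state[i] + state[j]) / 2.0
            state[i] = state[j] = mean
    return state
```

Because each exchange only involves two nodes, the same pattern extends to aggregates other than the mean (counts, sums, model parameters), which is what makes it attractive for fully decentralised data mining.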
Computer Vision
- Representation Learning for Video Understanding
- Action Recognition from Video and Sensor Streams
- Text-Video Retrieval and Video Question Answering
- Full-body Human Tracking and Object Detection
- Anomaly Detection in Image and Volumetric Data
- Vision-based Quality Control of Produced Parts
- Neural Digital Twin of Produced Parts for Inspection
- Anomaly Classification and Segmentation in CT Scans
- Neural Architecture Search (NAS)
- AutoML for Image and Video Understanding Tasks
Query processing of interval-timestamped historical data in RDBMSs
- Efficient algorithms and indexing structures for join and aggregation
- First RDBMS with support for all temporal operators
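A temporal join over interval-timestamped data pairs rows that share a key and whose validity periods overlap, and it timestamps each result with the intersection of the two periods. The sketch below shows these semantics with a naive nested loop. It is only meant to illustrate what the operator computes; it is not the efficient algorithms or index structures mentioned above. The `Row` type, the half-open `[start, end)` convention, and the `temporal_join` name are assumptions for this example.

```python
from typing import NamedTuple


class Row(NamedTuple):
    key: str
    value: str
    start: int  # first time point at which the row is valid
    end: int    # first time point at which it is no longer valid (exclusive)


def temporal_join(left, right):
    """Join rows on equal key and overlapping validity periods.

    Each result carries the intersection of the two input periods.
    Naive O(len(left) * len(right)) nested loop, for illustration only.
    """
    out = []
    for l in left:
        for r in right:
            if l.key != r.key:
                continue
            start = max(l.start, r.start)
            end = min(l.end, r.end)
            if start < end:  # the periods share at least one time point
                out.append((l.key, l.value, r.value, start, end))
    return out
```

For instance, joining a role that is valid over `[1, 10)` with an assignment that is valid over `[5, 15)` for the same key yields one result row valid over `[5, 10)`.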
Algorithms for processing time series data
- Imputation of missing values and anomaly detection
- Motif discovery
- Correlation analysis
- Predictive analytics
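As a baseline illustration of two of the tasks above, the sketch below imputes missing values by linear interpolation and flags anomalies with a global z-score. Both are deliberately simple stand-ins chosen for this example, not the group's algorithms. The function names, the use of `None` to mark missing values, and the default `threshold` are assumptions.

```python
import statistics


def impute_linear(series):
    """Fill None gaps by linear interpolation between observed neighbours.

    Gaps at the start or end of the series are filled with the nearest
    observed value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled


def zscore_anomalies(series, threshold=3.0):
    """Return indices whose distance from the mean exceeds threshold * std."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return []
    return [i for i, v in enumerate(series) if abs(v - mean) / std > threshold]
```

Interpolation of this kind ignores seasonality and correlations between series, and a global z-score misses anomalies that are only unusual in their local context, which is where more specialised methods come in.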