Stream Mining & Analytics (SMA)

Data streams are open-ended collections of data with no finite length. Classical data mining methods, developed for static, fixed-size datasets, cannot be applied directly to data streams. Research in this area has flourished over roughly the last 15 years, but several open research issues remain. One of these is the detection of concept drift. In most data streams, the underlying stochastic distribution that generates the data changes over time, and such changes must be detected so that models can be kept up to date with the data in real time. The other major challenge comes from data streaming in from Big Data repositories. New scalable methods are needed that can mine data arriving at ultra-high rates without sacrificing model accuracy for the sake of higher throughput. Sample projects from the research group include:

  • Change detection in unsupervised data streams, where class labels are either irrelevant or unavailable.
  • Understanding the evolution of a stream in both time and space.
  • Capturing recurrent patterns, or motifs, in stream data.
  • New frameworks for data stream mining that scale to data arrival rates of terabytes per second.
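To make the idea of drift detection on an unlabelled stream more concrete, the following is a minimal sketch of the Page-Hinkley test, a classical sequential change detector. It is not a description of the group's own methods. It processes one observation at a time in constant memory and needs no class labels. The names `PageHinkley` and `detect_changes`, and the parameter values `delta` and `threshold`, are hypothetical choices for illustration only.

```python
class PageHinkley:
    """One-sided Page-Hinkley test that flags an increase in the mean of a numeric stream.

    Hypothetical illustration: `delta` (tolerated change magnitude) and `threshold`
    (alarm level, often written as lambda) are example values, not tuned settings.
    """

    def __init__(self, delta: float = 0.005, threshold: float = 50.0) -> None:
        self.delta = delta
        self.threshold = threshold
        self.reset()

    def reset(self) -> None:
        # Forget all history; called at start-up and after every alarm.
        self.n = 0
        self.mean = 0.0        # running mean of the observations seen so far
        self.cumulative = 0.0  # m_t: cumulative deviation from the running mean
        self.minimum = 0.0     # M_t: smallest m_t observed so far

    def update(self, x: float) -> bool:
        """Consume one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n  # incremental mean, O(1) memory
        self.cumulative += x - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        if self.cumulative - self.minimum > self.threshold:
            self.reset()  # start monitoring the new concept from scratch
            return True
        return False


def detect_changes(stream, detector):
    """Return the indices at which `detector` raised an alarm."""
    return [i for i, x in enumerate(stream) if detector.update(x)]


if __name__ == "__main__":
    # Synthetic, unlabelled stream: the mean jumps from 0.0 to 5.0 at index 500.
    stream = [0.0] * 500 + [5.0] * 500
    print(detect_changes(stream, PageHinkley()))
```

On this synthetic stream the single alarm fires a few observations after the shift at index 500. This version only detects increases in the mean; a symmetric variant tracking the running maximum would detect decreases. Raising `threshold` reduces false alarms at the cost of a longer detection delay.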