Course Description. Every
generation has considered one or the other data set they had as being
"massive". Today, we (are positing to) have truly massive datasets.
This is primarily because
we now collect data via automatic datafeeds from the internet, from satellites,
from the financial industry,
using sensors, etc. which challenges the transmitting (T), computing
(C) and storage (S) capacity of our systems.
Dealing with this challenge to TCS calls for fundamental ideas and
techniques from many areas, and a
new mindset to embed them all into novel systems. This course will explore
all aspects of meeting this
challenge. We are looking for students from any (or all!) of
the following areas:
Statistics: What statistical
analysis is feasible and useful on massive data?
Algorithms: How to design
new algorithms that work within the resource constraints?
Complexity Theory: Develop
a theory of what can be processed on massive streams?
Probability: Develop
the theory of random variables and sampling that underpin many of the algorithms.
Harmonic Analysis, Mathematical
Approximation Theory: Fundamental questions in these areas arise.
Databases: How to
extend the current database technology to deal with massive streams?
Networking: What
networking issues can be resolved using massive data analysis on traffic
logs?
Sensors: How
to build a massive information gathering infrastructure using sensors?
Hardware: How to
build specialized hardware to process massive data streams?
Programming Languages:
What special programming language support is needed?
Course Organization: We meet once a week
for 2 hours. Tentatively it will be on Thursdays, 6---8PM. First few
weeks will be lectures on the basics. In second half of the course, you
will present research papers or projects or
perspectives from each of the different fields, drawing from whatever
is your area of expertise or interest. There is
scope to explore only programming projects, only mathematical papers,
or whatever.
Lecture 1:
pdf.
Lecture 2:
pdf.
Lecture 3:
pdf.
Lecture 4:
pdf.
Lecture 5:
pdf.
Code fragments
for generating values from stable distributions (Lecture 4)