Monday, October 09, 2017

Amazon Docs, Theory Research kindles some fire.

I read through whitepapers to keep up with how companies see themselves and their products in a technical arena.

I was checking out AWS whitepapers where, among docs that limn the core strategic view of Amazon Web Services and its products, sits a research paper: theory research on random cut forests, anomaly detection, and streaming algorithms!

Here is the paper by Nina, Sudipto, and others, remarkably nestled within the core whitepapers for the entire AWS universe. This research must have kindled some fire, deep within Amazon.
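For the curious, here is a toy sketch of the random-cut idea (my own illustration, not the paper's algorithm or the AWS implementation): recursively cut the data along a dimension chosen with probability proportional to its range, and call points that get isolated after only a few cuts anomalous. The paper scores points more carefully and works on streams; the toy below just flags an obvious outlier.

```python
import random

def build_tree(points, depth=0):
    """Recursively cut the point set with random axis-aligned cuts; a leaf records its depth."""
    if len(points) <= 1 or depth >= 20:
        return ("leaf", depth)
    dims = len(points[0])
    while True:
        lows = [min(p[i] for p in points) for i in range(dims)]
        highs = [max(p[i] for p in points) for i in range(dims)]
        ranges = [hi - lo for lo, hi in zip(lows, highs)]
        total = sum(ranges)
        if total == 0:  # all points identical: stop cutting
            return ("leaf", depth)
        # Pick the cut dimension with probability proportional to its range.
        r, dim = random.uniform(0, total), 0
        for i, w in enumerate(ranges):
            if r <= w:
                dim = i
                break
            r -= w
        cut = random.uniform(lows[dim], highs[dim])
        left = [p for p in points if p[dim] <= cut]
        right = [p for p in points if p[dim] > cut]
        if left and right:  # retry degenerate cuts that separate nothing
            return ("node", dim, cut,
                    build_tree(left, depth + 1), build_tree(right, depth + 1))

def isolation_depth(tree, p):
    """Depth at which p lands in a leaf: shallow = isolated quickly = anomalous."""
    if tree[0] == "leaf":
        return tree[1]
    _, dim, cut, left, right = tree
    return isolation_depth(left if p[dim] <= cut else right, p)

# A Gaussian cluster plus one far-away outlier.
random.seed(0)
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)] + [(10.0, 10.0)]
forest = [build_tree(data) for _ in range(50)]

def score(p):  # average isolation depth over the forest; smaller = more anomalous
    return sum(isolation_depth(t, p) for t in forest) / len(forest)
```

Under this toy scoring, the outlier at (10, 10) is isolated after far fewer cuts than a point inside the cluster.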


Sunday, October 08, 2017

Planar Matching in NC

There are problems that I carry in my bones, even if I am not actively working on them. Every once in a while, something you are working on reminds you of these problems and you try to renew the attack; other times they simply simmer in your psyche.

One such problem is perfect matching in NC. I teach the Mulmuley-Vazirani-Vazirani result that perfect matching is in RNC, and even recently went back to the open problem of producing an NC solution when I was looking at some MapReduce variants.
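The magic in MVV is the isolation lemma: give each of the m edges an independent uniform weight from {1, ..., 2m}, and with probability at least 1/2 the minimum-weight perfect matching is unique. A toy empirical check of the lemma on K_{3,3} (my own illustration, not from the paper):

```python
import itertools
import random

def perfect_matchings(n, edges):
    # Perfect matchings of a bipartite graph on n + n vertices,
    # given as a set of (left, right) edge pairs.
    for perm in itertools.permutations(range(n)):
        pm = [(i, perm[i]) for i in range(n)]
        if all(e in edges for e in pm):
            yield pm

n = 3
edges = {(i, j) for i in range(n) for j in range(n)}  # K_{3,3}: 6 perfect matchings
m = len(edges)
random.seed(1)
trials, unique = 2000, 0
for _ in range(trials):
    # Isolation lemma weights: independent uniform in {1, ..., 2m}.
    w = {e: random.randint(1, 2 * m) for e in edges}
    totals = sorted(sum(w[e] for e in pm) for pm in perfect_matchings(n, edges))
    if len(totals) == 1 or totals[0] < totals[1]:
        unique += 1

print(unique / trials)  # the lemma guarantees >= 1/2; in practice much higher
```

The fraction of trials with a unique minimum-weight matching comfortably exceeds the lemma's 1/2 guarantee on this small graph.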

Vijay and Nina have an arXiv paper showing that perfect matching in planar graphs is in NC, and I am looking forward to reading it.


Wednesday, September 20, 2017

Online Advertising as Influence Channel

Several years ago I pointed out that online advertising was a very cheap, simple way to reach a selective population in most places in the world. Imagine you want to reach folks in Indonesia who can, say, weave, in South Africa interested in chemicals, or in Brazil interested in hacking cellphone chips. For a few hundred dollars, you can set up an ad campaign through many intermediaries and have it running in a few days. Now substitute "enemies" for the countries and whatever you like for "weave, chemicals, chips": I thought this channel was far more effective and efficient than running old-world spy operations. You can snoop and influence, in a highly targeted way, without a personal presence or sophisticated electronic and broadcasting devices.

In the past year or so, the scenario above has played out in crucial ways.

Politics and governments aside, I am finally glad to see online ads get attention from media and researchers as a potential influence channel. Over dinner I have asked folks: "Imagine you can spend a few hundred dollars, target **a specific person** in the world, and govern the messages they see online. What could one do?"

Monday, September 18, 2017

Sunsetting ICORE

When friends remember me, I am happy to go to distant lands. I managed to travel to Israel to give a talk at the I-CORE day. I-CORE is the Israeli Centers of Research Excellence program, and the day marked the end of the funding period. Israel has great talent, so any money the govt puts in finds great use; in this case, it seems to have filled a severe need, providing much-needed support for postdocs.

I talked about Heavy Hitters (HHs), a bit of the classical Count-Min stuff but also much less sculpted stuff like high-dimensional HHs, H-influence, and other topics. I find HHs interesting because heavy hitters is not top-k: it is a distinct concept, one that represents what is (often the only thing) possible within resource constraints. Robi asked a great question: can this be formalized?
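For readers who have not met Count-Min: keep d rows of w counters, hash each item to one counter per row, and answer a frequency query with the minimum of its d counters. The estimate never undercounts, and it overshoots by at most e*N/w (N the stream length) with probability at least 1 - e^(-d). A minimal textbook version in Python (not the talk's code):

```python
import random

class CountMin:
    """Count-Min sketch: d rows of w counters, one hash per row.

    query() never underestimates a count; the overestimate is at most
    e*N/w with probability >= 1 - e**(-d), where N is the stream length.
    """
    P = 2**61 - 1  # large prime for the pairwise-independent hash family

    def __init__(self, w, d, seed=0):
        rng = random.Random(seed)
        self.w, self.d = w, d
        self.hashes = [(rng.randrange(1, self.P), rng.randrange(self.P))
                       for _ in range(d)]
        self.table = [[0] * w for _ in range(d)]

    def _col(self, i, x):
        a, b = self.hashes[i]
        return (a * hash(x) + b) % self.P % self.w

    def update(self, x, c=1):
        for i in range(self.d):
            self.table[i][self._col(i, x)] += c

    def query(self, x):
        return min(self.table[i][self._col(i, x)] for i in range(self.d))

# A toy stream: one heavy hitter among many light items.
cm = CountMin(w=500, d=5)
for _ in range(1000):
    cm.update("heavy")
for j in range(5000):
    cm.update(("light", j % 500))
```

On this stream, query("heavy") returns its true count of 1000 plus a small overshoot, while each light item stays near its true count of 10.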

There were many good talks. I managed to catch a bit of the talks by Rotem Oshman, Michael Shapira, Shahar Dobzinski, and others. I also managed to catch Yuval Ishai and Eylon Yogev talk secure multiparty computation and complexity theory, respectively, in Hebrew. Finally, Bernhard Haeupler gave an excellent talk on using shortcuts to break natural bottlenecks in message-passing algorithms. The talk introduced the audience to principled theory methods (I was happy to see the Leighton-Maggs-Rao O(Congestion+Dilation) result reenter the psyche) for attacking the worst-case performance of message-passing algorithms.

ps: Thanks to Moni for mentioning Mossel's paradox with dice/6 over dinner; he said there was an elegant solution, and my neurons stayed awake figuring it out.
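For anyone who missed the puzzle, its usual statement (as I understand it): roll a fair die repeatedly until a 6 appears; conditioned on every roll being even, what is the expected number of rolls? The naive guess is 3; the elegant answer is 3/2, and a quick simulation (my own sketch) agrees:

```python
import random

# Roll a fair die until a 6 appears; keep only those experiments in which
# every roll (including the final 6) was even, and average the number of
# rolls over the kept experiments.
random.seed(42)
kept, total_rolls = 0, 0
for _ in range(400_000):
    rolls = []
    while True:
        r = random.randint(1, 6)
        rolls.append(r)
        if r == 6:
            break
    if all(x % 2 == 0 for x in rolls):
        kept += 1
        total_rolls += len(rolls)

print(total_rolls / kept)  # close to 3/2, not the naive guess of 3
```

The trick: conditioning on an all-even run makes the run length geometric with success probability 2/3, since an odd roll kills the run rather than prolonging it, and a geometric(2/3) has mean 3/2.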


Sunday, September 10, 2017

AI + Marketing Workshop

We are running a workshop, trying to bring together the AI and Marketing Sciences communities. The AI community of course includes algorithms, optimization, and game theory; Marketing Sciences of course includes online advertising, auctions, and other topics. But the goal is to bring in more than these topics and to forge a common venue, getting each of these (sub)communities to stretch a bit to meet the others.

The workshop is co-located with AAAI 2018, and the submission deadline is Oct 29, 2017.



Sunday, August 27, 2017

Data Science and Social Good

Bloomberg folks are organizing a day of activities on Data for Good on Sept 24, 2017, FYI.


(Sublinear) Time Meets Space

Maybe this post should be called "property testing meets streaming". For more than a decade now, we have managed to put together meetings that had both property testing folks (with methods running in time sublinear in the input size) and streaming folks (with methods using space sublinear in the input size), and at many of these meetings researchers debated the nuances and results of the two communities (the usual: algorithms vs complexity, for-each vs for-all, yada yada yada). Obviously each community has learned from the other, but there were not many results that formally related the underlying concerns.

Recently, Christian, Pan, Morteza and I showed: ".. for bounded degree graphs, any property that is constant-query testable in the adjacency list model can be tested with constant space in a single-pass in random order streams." This is a human-sized hop towards establishing formal connections.  Hope more hops and leaps will ensue. 


Some Insta Humor

The left photo shows what NY restaurants have to deal with, artists painting amazing pieces over lunch; the right photo shows the height of human aspiration, found on a sidewalk in the Mission, SF.

Friday, August 18, 2017

Heavy Hitters Continued

Recently I revisited the classical heavy hitters problem on streams (I think I can call it "classical"; I remember the meeting where we chose to call overwhelmingly large items heavy hitters, in the shadow of the baseball scandals of the early 2000s), but now looked at the high-dimensional version, where each stream item is a d-dimensional vector.

There is an inherent d^2 space bound, and we manage to circumvent it by using a graphical model (in our case, Naive Bayes) on the dimensions. This is in general a fruitful direction, I think: using a dependency model on the dimensions to half-circle around the lower bounds we get with high-dimensional analyses. Our paper is on arXiv, joint work with Brano and Hoa.
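To give the flavor of the dependency-model trick, here is a toy of my own (exact marginal counters stand in for the per-dimension sketches, and the stream is hypothetical): keep one frequency table per dimension, O(d * alphabet) space instead of alphabet^d for the joint, and estimate the frequency of a vector v as n times the product of its marginal frequencies over n, which is exact when the dimensions are independent.

```python
from collections import Counter

# Stream of d-dimensional items; store only per-dimension marginal counts
# (O(d * alphabet) space) instead of the full joint (alphabet^d space).
d = 3
marginals = [Counter() for _ in range(d)]
n = 0
stream = [(1, 1, 1)] * 60 + [(2, 3, 1)] * 20 + [(4, 2, 2)] * 20
for v in stream:
    n += 1
    for i, x in enumerate(v):
        marginals[i][x] += 1

def nb_estimate(v):
    """Naive Bayes frequency estimate: n * prod_i (marginal_i(v_i) / n)."""
    est = float(n)
    for i, x in enumerate(v):
        est *= marginals[i][x] / n
    return est

print(nb_estimate((1, 1, 1)))  # ~28.8: the heavy vector (true count 60) scores highest
print(nb_estimate((4, 2, 2)))  # ~0.8: a lighter vector scores far lower
```

The estimates are only as good as the independence assumption (here the heavy vector still scores far above the rest); the actual paper couples a dependency model like this with small-space sketches per dimension and proves guarantees under the model.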

This work was motivated by a collaboration with Adobe that needs to analyze high-dimensional web/ad analytics data. That is covered in the Ad Exchanger article, as part of the coverage of the university funding program that Anil Kamath spearheads at Adobe (he does a superb job of eliciting very specific projects drawn from a wide set of areas).