Monday, March 20, 2017

Self II

Here is how the world sees me.
  • There is a coffee place in SFO, tucked away where people don't often make it. I pick up coffee there when I fly in from EWR, and they give me a discount because "you are a taxi driver". 
  • I went to a WeWork location in NYC to meet a friend, and as I walked in, the receptionist said, "talk to that person, she knows who needs the handyman in the building", and I got to go through the entry process of a fixer-upper to get into the building. 

Self I

People seem to like it when I poke at myself: 

In a recent conversation, we discussed Dad Jeans, described precisely here, but more as a state of mind. A parent needs to think several steps ahead on behalf of a kid who can swerve from disengaged to insightful, be prepared for spills, and be ready to be out the door the instant the kids unexpectedly want the playground in winter. So Dad Jeans is the choice of wear: it communicates that you are unable to be anything or be anywhere else, and that this is beyond your control. 

Being me, I have to find my own way to express that state of mind, so these days I am doing Dad Hair: baggy, ready to follow me instantly, and unable to be anything else. :)



Monday, March 06, 2017

CS Divisions

Thanks to a recommendation from Marc Donner, from my old Google days, who now runs Uber NYC, I am reading Sapiens by Yuval Harari. The author tries to explain the history of humans succinctly, and succeeds by taking an insightful view of anthropology, sociology, behavioral theory, and, of course, science and religion too. One of the interesting parts for me was the need humans felt to divide people into categories (think commoner/noble, castes, etc). Alas, with division into categories comes an imposed order among them and fights to invert that order. The author argues that this imagined order among humans keeps societies stable when it works, and unstable when it doesn't.

I have always been suspicious of divisions. In CS, folks divide areas of research, but these are not islands. In any area of research (say AI, social networks, robotics, the brain, whatever), there are (a) theoretical foundations and optimizations, (b) new systems research into the hardware and software needed to program them, compile them into executables, and execute them efficiently, and (c) new data and UI systems to use, analyze, report, mine, and troubleshoot, and so on. A great piece of research will include conceptual breakthroughs, no more of a cacophony of math symbols than is needed, the potential for pretty plots, and a storyline for the NY Times about societal impact. Most individuals' research doesn't hit all these marks, and it doesn't have to; we rely on the accumulation of research to hit them all. Any research area would be less engaging without ALL of these elements, and no order among them is needed. 

Extreme Streaming

I am making my way back into researching streaming problems.

One of the directions I am focusing on: how to use not the polylog memory that is standard in streaming algorithms, but even less, say O(1) memory. My coauthors and I have such algorithms for estimating the H-index on streams (to appear in PODS 2017; it will be on arXiv soon) and for estimating heavy hitters in a stream-of-streams model (to appear in SDN 2017).

I was sort of pushed into this model the way I like to find problems in general. If you look at modern applications, there are some real constraints. For example, in SDNs (Software-Defined Networks), there are memory pipelines that packets percolate through; each memory stage can be thought of as a row of standard sketches, and then one needs to compute something on top of these row estimates, using only memory that fits into a single packet header. Another example is that streaming analyses are done for a very large number of groups (say, for each source IP address or internet user), and in that case, polylog memory per group is already far too much.

I call these extreme streaming problems, inspired by Extreme Classification in Machine Learning, which studies ML problems with a very large number of labels. I think there is more to mill here.
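
To give a flavor of this memory regime, here is a minimal illustrative sketch. It is not the H-index or heavy-hitters algorithm from the papers above, just the classic Morris approximate counter, which summarizes a stream with a single tiny counter rather than polylog words of state.

import random

class MorrisCounter:
    """Classic Morris approximate counter: keeps one small integer c
    (roughly log log n bits) and estimates the stream length as 2^c - 1."""

    def __init__(self):
        self.c = 0  # the only state: an exponent

    def increment(self):
        # Bump the exponent with probability 2^{-c}, so c grows
        # roughly like log2 of the true count.
        if random.random() < 2.0 ** (-self.c):
            self.c += 1

    def estimate(self):
        # 2^c - 1 is an unbiased estimate of the number of increments.
        return 2 ** self.c - 1

# Hypothetical usage: approximately count a stream of a million items.
counter = MorrisCounter()
for _ in range(1_000_000):
    counter.increment()
print(counter.estimate())

A single counter has high variance; the standard fix is to average several independent copies, which is exactly where the tension with the "fits in a packet header" budget shows up.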


Sunday, February 26, 2017

Sundries

A bit longer than a tweet:

  • I was late for my meeting because, while I was stopped at a traffic light at some NYC corner, a couple asked, "Hey, you from around here? Can you recommend a restaurant?", and I responded. Within a 3-block radius of any street intersection in Manhattan, there are enough good restaurants to keep one talking for a while. 
  • I entered some word into my iPhone while writing an email and it autocorrected to "art". I must have done, or at least talked about, art sometime in the past. :)
  • I was at a pitch, and the corporation trying to convince someone to be a customer of its "critical" service said, "We want to be the ONE throat you choke, if you have a problem." 

Sunday, February 19, 2017

Exciting new book

My long-term collaborator, thinker, and theory researcher Ramesh Hariharan has put together a book that sounds fascinating: Genomic Quirks: The Search for Spelling Errors.

"This is a book of real stories about the search for genomic spelling errors that have stark consequences -- infants who pass away mysteriously, siblings with misplaced organs, a family with several instances of vision loss, sisters whose hearts fail in the prime of their youth, a boy whose blood can’t carry enough oxygen, a baby with cancer in the eye, a middle-aged patient battling cancer, and the author’s own color blindness. The search in each case proves to be a detective quest that connects the world of medical practice with that of molecular biology, traversing the world of computer algorithms along the way."


Friday, February 03, 2017

9th annual NYCE meeting

The NY-area CS+Econ meeting, 2017 edition, will be held May 19th at NYU. More info:

--
The New York Computer Science and Economics Day is an annual meeting of researchers in the NY area working on the interplay between computer science and economics. As usual, NYCE will feature invited talks by leading researchers -- this year's keynote speakers include David Rothschild (Microsoft), Emin Gun Sirer (Cornell), and Assaf Zeevi (Columbia) -- as well as contributed talks and a poster session.

We are soliciting contributed short talks and posters. Topics of interest to the NYCE community include (but are not limited to) the economics of Internet activity such as search, user-generated content, or social networks; the economics of Internet advertising and marketing; the design and analysis of electronic markets; algorithmic game theory; mechanism design; and other subfields of algorithmic economics.  We welcome posters and short talks on theoretical, modeling, algorithmic, and empirical work.

Please see the website for registration and the call for papers.

Organizers: Amy Greenwald (Brown University), Ilan Lobel (NYU Stern), Renato Paes Leme (Google Research) and  James Wright (Microsoft Research)
----


Experiments with Reviewing

This is not an experiment with truth but an experiment with reviewing truth. The WSDM PCs experimented with double-blind reviewing. Here is a PDF of the preliminary analysis.


Sunday, January 22, 2017

Job/Intern Announcements

The following job/intern announcements may be of interest to folks.

1. NYT: The Data Science Group at The New York Times is expanding, and we are hiring in data scientist / machine learning roles (http://bit.ly/nyt-datasci). The group focuses on developing and deploying machine learning solutions to meet newsroom and business challenges throughout the company. These challenges include prediction and prescription problems (e.g., supervised learning, targeting), resulting in a variety of internal data products: e.g., webapps, APIs, and slackbots. For public-facing examples of machine learning at The New York Times, see the URLs below for an interview [1], a talk [2], a news article [3], or a blog post [4].

[1] http://www.columbia.edu/itc/applied/wiggins/DSatW-wiggins.pdf
[2] http://www.youtube.com/watch?v=jy_4tljIFqY
[3] http://www.niemanlab.org/2015/08/the-new-york-times-built-a-slack-bot-to-help-decide-which-stories-to-post-to-social-media/
[4] http://bit.ly/AlexCTM

2. Adobe:  Full-Time Positions in Big Data Experience Lab (BEL) at Adobe Research, San Jose, CA

Big Data Experience Lab (BEL) at Adobe Research (https://research.adobe.com/about-the-labs/bigdata-experience-lab/) in San Jose is looking for full-time researchers to define and execute next-generation machine learning and AI research for digital marketing applications and services. Adobe Marketing Cloud (http://www.adobe.com/marketing-cloud.html) is one of the largest data collection platforms in the world, managing approximately 35 petabytes of customer data and processing one trillion transactions per quarter. But it's not just the quantity of data -- it's the quality of the work that makes this an amazing time to be at Adobe Research. BEL has an excellent publication record, with dozens of papers at top-tier machine learning and AI conferences and journals in recent years. Join us to turn data into impact as you analyze unique problems, draw inferences, test theories, and see your theories come to life in solutions that help our customers rack up business successes. If you're interested in problems related to finding information hidden in large data sets, then Adobe Research is your opportunity to make a huge impact on the academic community as well as our customers, who represent the top 10,000 biggest web and mobile businesses.

We accept applications throughout the year. The application should include a brief description of the applicant's research interests and past experience, plus a CV that contains degrees, GPAs, relevant publications, the names and contact information of references, and other relevant documents. To apply, please send your application to adoberesearchjobs@adobe.com.

3. Adobe: Machine Learning Internship at Adobe Research, San Jose, CA

The Machine Learning Group in the Big Data Experience Lab (BEL) at Adobe Research (https://research.adobe.com/about-the-labs/bigdata-experience-lab/) in San Jose is looking for interns to work on a range of problems in machine learning, deep learning, digital marketing, and analytics. Our interns will have the opportunity to work on real-world, terabyte-scale problems in Adobe Marketing Cloud (http://www.adobe.com/marketing-cloud.html). The interns will be supervised by researchers in the group who have an excellent publication record, with dozens of papers at top-tier machine learning and AI conferences and journals in recent years. The internship will be in San Jose, California, in the heart of Silicon Valley. The duration of the internship is 12 weeks, and it can start any time from April 1, 2017.

The successful candidate will be mentored by and work closely with one or more of the following Adobe researchers:
- Yasin Abbasi Yadkori (http://webdocs.cs.ualberta.ca/~abbasiya)
- Branislav Kveton (http://www.bkveton.com)

The deadline for the application is January 31, 2017.  To apply, please send your application to machine-learning-internships@adobe.com.


Tuesday, January 17, 2017

Ballet and Gender Roles, A Discussion

Can ballet express a modernist view of the sexes and gender roles? This is a much-needed discussion, and the NYT steps up. Good to see my friend Amar make cameos in the pictures.