AAAI-15 Conference – Summary

Here are some of my take-aways from the AAAI-15 artificial intelligence conference this year.


  • How Far Has AI Come?  Having had some involvement in AI back in the ’90s, I found it very interesting that a fair bit of the AI here looks a whole lot like AI did back then, especially in the domains that involve GOFAI (good old-fashioned AI).  Some of the work was probably not even what I would classify as AI, being more like either everyday software engineering or incremental advances in optimization of complex problems.  Many solutions still involve highly contrived problem spaces.
  • Hinton Says: “GOFAI Yourself, Deep Learning is Here”.  Geoffrey Hinton, who helped give birth to the back-propagation and Boltzmann machine neural networks as well as reinforcement learning, shared very compelling results coming from the connectionist camp and his latest contribution of “deep learning”.  Deep learning is more or less the new buzz term for neural networks.  It almost seemed like he was trying to start a nerd fight, basically proclaiming GOFAI a waste of time, especially with the beatdown it’s been taking from deep learning.
  • What if AI Succeeds?  Stuart Russell, author of some comprehensive texts on AI, talked a bit about the implications of AI succeeding, echoing Kurzweil, but suggesting a regulatory body in the government and building ethical intentionality into the functioning of robots.
  • Robots is Where it’s Happening.  Honestly, the most interesting work here was in the realm of making robots more intelligent in uncertain environments.
  • Things That Might Scare You.   Well, what should not scare people is the idea of a sentient robot army taking over the world, contrary to some of the popular press…  today’s AI is nowhere near human reasoning, nor does it even remotely resemble it (obvious to everyone here).  On the other hand, there was one talk that suggested people do prefer robots-as-managers, with the author suggesting the next step would be to figure out how best to have robots manage people (egads).
  • Watson is What Watson Does.  IBM explicitly stated that they don’t really understand how Watson (a.k.a. the only non-human millionaire to win Jeopardy) really works.  As in, they have no formal model or theory around it;  they just created this aggregate system that exhibits better behavior than they can explain.  A plea was made to have the academic community help.

AAAI-15 Conference – Day 4

My observations from the 4th full day of sessions at the AAAI-15 artificial intelligence conference in Austin, Texas.

von Neumann’s Dream

Talk by Michael Bowling discussing games and applicability to AI, etc.  3 weeks ago Bowling’s team was able to announce that “poker has been solved”.

Stats on game history:  2007 checkers “solved”.   1996 Kasparov loses.  1994 first time any computer beats a chess master.  1948 Turing & 1949 Shannon both wrote chess programs.  1928-1944, theory of games by von Neumann.  1864 Babbage and Lovelace tic-tac-toe, with hints at chess.

Why games?  So many aspects of intelligent thinking are combined in game play, which is why we use them to help develop our children.  It is also easy to determine progress.

“Solved” implies guaranteed win.

Many board games are “perfect information” games.  Poker, however, involves things like bluffing and uncertainty.    There’s a part of the decision making space that involves observations, guesses, and missing information.  How do we represent “bluffing”?   Poker has generally been perceived to be a “huge” game that we would never be able to solve.

Looking at Heads-Up Limit Texas Hold’em (HULHE).  3.2 x 10^17 game states (though you don’t necessarily know which is which).  3.2 x 10^14 infosets.  1.4 x 10^13 canonical infosets.  Would need 3 petabytes to represent?   Unlike other games that have 3 outcomes, e.g. {win, lose, draw}, poker has more of a continuous range; e.g. [-24,24] big blinds (bb).  So solutions end up being approximations.

So instead of getting it perfect, we aim for “essentially solved” (something like 95% confidence), approaching as if training a player over a “lifetime” of poker playing.   Also apply “regret-matching”, in which the AI looks back at the utility of past decisions and replays the ones that seemed to lead to less “regret”.  Over time, average regret will converge to 0.  Cumulative regret is measured over time, and the most regretful strategies end up getting pruned, which leads to a probability distribution on the no-regret leftovers.  Strategies translate into decision edges in the tree.  Keeping track of all that non-regret may be intractable, so divide and localize many regret algorithms at different points in the tree.  Thus far, these changes between 2003 and 2013 led to improvements of orders of magnitude.
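To make the regret-matching idea a little more concrete, here is a minimal sketch of my own (not the code from the talk).  It assumes a single decision point with a fixed utility per action, which is far simpler than the poker setting, and the function names are made up for illustration.  Actions are played in proportion to their positive cumulative regret, and regret is updated after each round.

```python
def regret_matching_strategy(cumulative_regret):
    """Turn cumulative regrets into a mixed strategy (one probability per action)."""
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    if total == 0.0:
        # No action is regretted yet: fall back to playing uniformly at random.
        n = len(cumulative_regret)
        return [1.0 / n] * n
    return [p / total for p in positive]


def update_regret(cumulative_regret, action_utilities, strategy):
    """Add this round's regret: each action's utility minus the utility we expected to get."""
    expected = sum(p * u for p, u in zip(strategy, action_utilities))
    return [r + (u - expected) for r, u in zip(cumulative_regret, action_utilities)]


def train(action_utilities, rounds):
    """Repeat play against a fixed (hypothetical) environment and return the final strategy."""
    regret = [0.0] * len(action_utilities)
    for _ in range(rounds):
        strategy = regret_matching_strategy(regret)
        regret = update_regret(regret, action_utilities, strategy)
    return regret_matching_strategy(regret)
```

For example, in rock-paper-scissors against an opponent who always plays rock, the utilities for (rock, paper, scissors) are (0, 1, -1), and `train([0.0, 1.0, -1.0], 10)` settles on always playing paper.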

Some ideas did not work…

Abstraction strategy.  Start with the full game being just too big to solve.  Idea is to abstract the game itself into a form from which a tractable solution can be determined and then translate the strategy back into the full game.  This actually was successful in that it did beat some top poker players.    From there, we could start to measure meaningfully how close we were getting to an optimal solution by measuring “exploitability”.   Watching the graph, results were looking very, very promising, but were being misconstrued.

Decomposition strategy.  Very successful in perfect information games.  As name implies, you solve sub-trees independently.  However, in poker, you don’t really have independent sub-trees because of unknown information that crosses potential future solutions (not a perfect information space).  Did have some success with this using sub-trees to work out counterfactuals.  Ends up still being resource intensive.

One did work…

Algorithmic strategy (the one that did work).  Track and use cumulative regret and regret matching.  Bad things in the past get reset, and good things get saved and adjusted.

Look at Rhode Island Hold’em; a smaller game.  Solved in 2005, but takes 2.5 hours to run with that solution.  New solution: 7 minutes.

Additionally, the strategy includes dropping data that is “unlikely” to be relevant, reducing the amount of memory needed and the amount of data to process.  When started, the bot plays horribly.  Progress was shown after 4.5 days (many, many CPU years).  69 days (900 CPU years) gets us to that “good enough” point where we can claim to have “solved” poker.

Justifying the relevance of this research (beyond being for the fun of it) with an application story…   you are a doctor, and you have to come up with a treatment recommendation for a new patient you don’t know much about, so you start off with some uncertainty.  Could sample something from your sample set of patients.  You can frame it as a game similar to how we represented poker.

“Intent Prediction and Trajectory Forecasting via Predictive Inverse Linear-Quadratic Regulation”

Intent is not just the target trajectory but also the behavior type.  Example of pedestrian movement prediction.   How probable are different behaviors?    Made use of Cornell Activity Dataset (CAD-120), which is a database of videos of various activities, looking to be mostly household.  Dataset includes skeleton movement and object info.  So sample here might be one of cleaning plates, etc off a table.  If recognized in real time, the bot can use predictive information to assist in a non-disruptive way.

Bayesian intention and target prediction.  Approached with Markov decision process, quadratic control for path determination combined with a dynamic based cost function, inverse reinforcement learning.  Have location of objects but not context (plate or microwave platter?).   Have to also consider sub-activity-specific behaviors.  Each gets unique (quadratic) cost functions.

Able to gain a nice set of update rules to allow for fast enough behavior (1000 predictions/sec) for real-time, real-world interventions.  Works best if combined with strong priors.  Future work involves climbing up into higher level activities, actor identification, and actor dependent prediction.

“Model-Based Reinforcement Learning in Continuous Environments Using Real-Time Constrained Optimization”

Quadcopter navigation, 8 dimensional feed.  Should be able to set a higher order objective deployed into foreign environments.   (a little hard to understand speaker)

Use constraints to avoid undesirable states (obstacles, going too fast, incoming ground).  Showed the math on a slide for the given objective, model and constraints.  Use FITC sparse Gaussian processes for learning function.   Use SE ARD kernel for feature selection.  SQP used for NLP, which reduces to a more manageable series of QP problems.  Let first QP converge for locally feasible trajectory, and then “warm-start” QP approximations around the previous trajectory.

Use pole-balancing results for benchmark: good results.  Then test in simulated flight environment.  Applied constraints and dynamic changes in real time: also good results.

“Approximately Optimal Risk-Averse Routing Policies via Adaptive Discretization”

Example of trying to get to the airport on time (relevantly to us conference goers) from the BBQ joint, but you want to maximize your culinary experience time.  Traffic is uncertain.  50% probability of getting there in time at various points in your (common formal) predictive scheme.  Compare fastest expected/shortest path vs best path, etc.   Once you get to another point in time, you can look back at the original prediction and adjust what part of the decision graph you want to go down.  Mostly seems like playing with formal probability so far.  Modeled as a DAG.   Looking at maximum utility to meet end goals.

Approach taken: backwards induction.  For each vertex in reverse topological order (from future to present?), calculate best option and expected utility, using the Bellman equation to calculate utility.  However, best not to solve the Bellman equation *exactly* or use time-based discretization (might experience a gap in utility).  Instead, apply adaptive discretization until the utility gap is less than the delta at any given step.
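As a rough illustration of backwards induction over a DAG, here is a sketch of my own under simplifying assumptions.  It only handles the risk-neutral baseline (minimize expected travel time) and does not attempt the paper's adaptive discretization or any risk-averse utility.  The graph format and the function name are hypothetical.

```python
def backward_induction(graph, goal):
    """graph: {vertex: {next_vertex: [(probability, travel_time), ...]}}, assumed to be a DAG.
    Returns (value, policy): expected remaining travel time and best next vertex per vertex."""
    order, seen = [], set()

    def visit(v):
        if v in seen:
            return
        seen.add(v)
        for nxt in graph.get(v, {}):
            visit(nxt)
        order.append(v)  # post-order: every successor is appended before v

    for v in graph:
        visit(v)

    value, policy = {goal: 0.0}, {}
    for v in order:  # reverse topological order, i.e. from the goal backwards
        if v == goal:
            continue
        best = None
        for nxt, outcomes in graph.get(v, {}).items():
            if nxt not in value:
                continue  # this successor cannot reach the goal
            # Bellman backup: expected edge time plus value of where we land.
            expected = sum(p * t for p, t in outcomes) + value[nxt]
            if best is None or expected < best:
                best, policy[v] = expected, nxt
        if best is not None:
            value[v] = best
    return value, policy
```

With made-up numbers, say the highway takes 10 or 40 minutes with equal probability and the side streets always take 18; then the backup at the BBQ joint compares 5 + 25 against 10 + 18 and picks the side streets.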

Experimental results…  done on fairly uniform graphs that *might* represent traversing Manhattan to get to the airport.  So tests kinda limited, but planning to apply to more complex scenarios.

“Easily Accessible Paper: Reusing Previously Found A* Paths for Fast Goal-Directed Navigation in Dynamic Terrain”

Note that this is one of the “accessible” papers in the conference.

Problem: pathfinding in general graphs applied to a dynamic terrain; costs increase and decrease; e.g. mars rovers.  The paper describing A* bot information is relatively unreadable, so trying to make more accessible here.

Generic solution: search and store the result in a path.  Follow the path to the goal, executing the solution; if an arc cost changes, update the search graph.
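The generic search/follow/update loop can be sketched roughly as below.  This is my own simplified version and it corresponds to plain repeated forward A* (replanning from scratch), not to MPGAA* or D* Lite.  It assumes a 4-connected square grid with unit costs where the only cost change is a cell turning out to be blocked; all names are hypothetical.

```python
import heapq


def astar(blocked, start, goal, size):
    """A* on a size x size grid with unit costs and a Manhattan heuristic.
    Returns the list of cells from start to goal, or None if unreachable."""
    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]
    parent, cost = {start: None}, {start: 0}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if g > cost[cell]:
            continue  # stale queue entry
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size) or nxt in blocked:
                continue
            if nxt not in cost or g + 1 < cost[nxt]:
                cost[nxt], parent[nxt] = g + 1, cell
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
    return None


def navigate(true_blocked, start, goal, size):
    """Search, follow the stored path, and search again when the next cell turns out blocked.
    Returns (cells visited, number of searches), or (None, searches) if the goal is unreachable."""
    known_blocked, position, trace, searches = set(), start, [start], 0
    while position != goal:
        path = astar(known_blocked, position, goal, size)
        searches += 1
        if path is None:
            return None, searches
        for nxt in path[1:]:
            if nxt in true_blocked:  # change observed: update the map, then replan
                known_blocked.add(nxt)
                break
            position = nxt
            trace.append(position)
    return trace, searches
```

The algorithms compared in the talk differ mainly in how much of the previous search they reuse instead of starting over as this sketch does.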

Repeated forward A*: shown briefly…

D* Lite:  do an optimized dynamic backward A* and store result in path.  Move through path and adjust for unknown objects.   Generalized Adaptive A* (GAA*): do A* followed by heuristic update and store in path, follow path, adjusting for arc cost changes.

Showed 2d picture of agent, target and obstacles.  Backward strategy requires knowing the location of the target and where all the obstacles are (it looks like).   Pic of D* strategy space looked like a blob, and the A* looked fairly directed.  However, D* is more efficient because A* has to be repeated many more times.

AA* will improve upon A* by re-using previous search information.   Pseudo code displayed, but font kinda small for my friggin eyes… alas.

Regardless…  experimental evaluation on three kinds of maps (random, room, and Warcraft maps).  MPGAA* performs faster than the other x* solutions in the random and indoor settings.  Not much improvement over D* Lite in outdoor settings.  MPGAA* in any case is much easier to implement.

“SCRAM: Scalable Collision-Avoiding Role Assignment with Minimal-Makespan for Formational Positioning”

UT LARG dudes.

Role assignment problem.  Assign N robots to M targets into 1-1 mapping so that collisions are avoided while minimizing the makespan.    Can represent as bipartite graph.  Applicable to scenario where using robots to procure items in a warehouse, or robots on an assembly line. Also applied to RoboCup domain.

Minimum Maximal Distance Recursive (MMDR) Role Assignment…  mapping costs calculated, then apply the Hungarian algorithm.  Translate MMDR into the assignment problem.
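To show what minimizing the makespan means for a role assignment, here is a tiny brute-force sketch of my own.  It is not the SCRAM algorithm, which gets there via the Hungarian algorithm and scales far better.  I am assuming the MMDR objective is to minimize the longest distance first, then the next longest, and so on, and that there are as many targets as robots (at least one of each).  The function name is hypothetical.

```python
from itertools import permutations
from math import dist


def mmdr_brute_force(robots, targets):
    """Try every 1-1 mapping and keep the one whose distances, sorted longest first,
    are lexicographically smallest.  Returns (assignment, makespan), where
    assignment[i] is the index of the target given to robot i."""
    best_key, best_assignment = None, None
    for perm in permutations(range(len(targets))):
        distances = sorted(
            (dist(robots[i], targets[j]) for i, j in enumerate(perm)), reverse=True
        )
        if best_key is None or distances < best_key:
            best_key, best_assignment = distances, perm
    return best_assignment, best_key[0]
```

With robots at (0,0) and (3,0) and targets at (3,0) and (5,0), both possible mappings have the same total distance of 5, but sending each robot to its nearer-in-order target gives a longest trip of 3 instead of 5, so that is the one returned.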

Minimum maximal distance + minimum sum distance…

Algorithm is a little dense, and presenter zoomed through it a little quickly, but did indicate that the solution is speedy and scales well.  Showed some results assigning 10 robots in a 100×100 grid, which indicate improvements over typical solutions (greedy, MSD, MSD^2, etc).  Showed animations of paths taken, illustrating significant performance difference.

RoboCup simulation shown as well.  Formations seemed to make sense relative to ball position.  Also showed pre-defined formation movement which included effects of battery drain because of additional work needed to form under other circumstances.

May try to apply to more complex scenarios with obstacles and such, and may also try applying auction algorithms to the solution.

“Learning to Mediate Perceptual Differences in Situated Human-Robot Dialogue”

Situated referential grounding.  To bot: “get me the black panda on the floor there”.  Bot sees a floor space with all sorts of objects, including a stuffed black panda.  However, there are mismatched perceptions about that space between the human and the bot.  Related work spans from the 70’s.

Focused here on grounding procedure and “word grounding models”.  Reliability of word grounding models can be affected by situational factors.  Propose a weight-learning approach. Can make use of peripheral verbal cues from human (e.g. if two objects look similar, and the human adds “the one in the back”, then probability can be applied to resolve ambiguity), and then use that info to help reinforce general recognition of the type of object.  Make use of graph matching between known and experienced (and maybe predicted?), and adjust weights at the attribute and word levels.  Can frame as a linear programming problem.  Pull this into online learning with human feedback.

“Spatio-Spectral Exploration Combining In Situ and Remote Measurements”

Rover (as in Mars/space Rover) navigation.  Currently terrain priors are pretty low-resolution.  Would like to figure out a sensible way to collect spectra to more accurately reconstruct from imagery and improve adaptive path planning.

Concept of endmembers, similar to PCA, that ideally correspond to geologic materials; hard to determine from orbital pixels.  Determining where would be good to collect a sample to help inform the images is part of the goal.  Rover-based endmembers…   As in, we don’t necessarily know what different colors in the spectral analysis mean.  Collecting samples helps to dispel mystery.

Tested on data from Cuprite, Nevada (has a number of visually distinct spectral classes).  Paths chosen at random, with samples in a uniform grid.  AVIRIS: camera used for capturing high res spectral resolution (2m/pixel); spatial resolution meh.  ASTER: low res orbital spectra (15-30m/pixel) — used to help inform sampling done with AVIRIS.  Tried various planning options (random path, direct path, etc).  MESMA and NNLS performed the best.  Field tests coming in the near future in Atacama Desert, Chile.

“An Entorhinal-Hippocampal Model for Simultaneous Cognitive Map Building”

Brain-inspired navigation.  Study how mammals perform spatial cognition.  Reference to neuroscience work done on the “Brain’s Inner GPS”.   Function of “place cells” in the hippocampus and “grid cells” in the entorhinal cortex used as inspiration for the development of a model.

Visual => place cells (to cognitive map) => set of grid cells (path integration)…  (accelerated through the slides at this point, as the material became more dense, of course).

Tested on Pioneer bot with arm.   Live test: build cognitive maps of two indoor environments.  Result images showed neural map responses to different locations in office environment.  Cool video showed neural spikes along with video of bot moving through its environment.


AAAI-15 Conference – Day 3

My observations from the 3rd full day of sessions at the AAAI-15 artificial intelligence conference in Austin, Texas.

Intelligent Decisions

I missed most of this particular talk, so a bit of this is second-hand.  Here’s what I’ve been able to gather from it.

Talk is mostly about what has been happening with IBM research, most of which seems to be going down the track of aggregating algorithmic solutions; a sort of overloaded shotgun approach.

Have developed a synapse chip that presents a significant performance improvement that can possibly be applied to neural simulation.

Cognitive systems development.  In practical terms, the design support system is made for working in partnership.  Part of goal is to be able to effectively answer questions that weren’t programmed into the system.  General strategy is to train multiple classification algorithms at once, favor the one with the best upper bound, and have humans interpret the result for feedback.  Cycle of processing: search => read => find => record => aggregate.  User will not provide all of the data, but may relate how to extract more, which can lead to better interactive modeling.

Solution to another problem involved painting a scenario wherein you allow the system to “max out” a budget in order to find a solution to a problem.

Meta-learning:  multiple learners together give you better results.  They, in effect, learn which algorithm works the best for a given input, without interpolation.  Select winner by clustering, etc.  More or less the Watson way, and the Watson Development Platform is available for use.

“Deployed: Robust System for Identifying Procurement Fraud”

Fraud risk in enterprises.  $3.5 trillion in losses.  18 months to capture fraud on average by way of auditing, which captures only a small percentage.  Average loss of $1 million in 20% of the cases. Clear that opportunities exist to improve.

Taxonomy of fraud events (created by IBM).  Fraud by vendors vs fraud by employees, both often involving collusion; e.g. artificially inflating prices, bribes, lower quality product or service, falsifying performance, fake vendors, fictitious orders, etc.  Example of a lady in India ordering USB sticks that she would end up getting for free; she had access to her boss’s email and would delete the relevant messages.  Ended up ordering thousands of USB sticks for free and selling them.  So a WIDE range of cases that need detection.

Example of conjoining events & observations which alone may not indicate anything; e.g. hear fight between husband and wife (happens all the time and not necessarily alert on its own), hear loud noise at night (ditto), see husband carrying bag out in the morning (ditto).  Combined, however, is pretty suspicious.
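A back-of-the-envelope way to see why weak signals add up is a naive-Bayes style odds update.  This is purely my own illustration and not IBM's method; it assumes the observations are independent given the hypothesis, and the likelihood ratios used below are made-up numbers.

```python
import math


def combine_evidence(prior, likelihood_ratios):
    """Start from the prior odds and multiply in one likelihood ratio per observation
    (done in log space), then convert the resulting odds back into a probability."""
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Starting from a 1% prior, one observation that is 3 times more likely under the suspicious hypothesis barely moves the needle (`combine_evidence(0.01, [3.0])`), while three such observations together (`combine_evidence(0.01, [3.0, 3.0, 3.0])`) push the probability above 20%.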

Here, multiple analytics techniques applied for a procurement fraud analytics tool:

data capture/prep => text analytics => anomalous event detection => social network analysis => importance weighting => unsupervised learning => investigation & alert triggers (scores and confidence levels) => supervised learning

Risk event groups:  profile risk (problems with vendor profile that indicate closer look needed), perception risk (problems with perception of vendor), transactional risk (issues with transaction patterns and history), and collusion risk (problems with relationships between associated parties).

Vendor scoring includes things like registration with Dun & Bradstreet, roundness of dollar invoice, perception index, how well P.O.’s line up with invoices, etc.

Uses sequential probabilistic learning, an “online” learning algorithm, for evaluating collusion.  Input weights and confidences => determine edge probabilities => assign edge weights => infer probability of collusion => output collusion confidence.

Shown to be better than leading competition in solving more types of fraud problems.  Seamlessly combines various models to effectively analyze procurement risks.  Actively, since last year, monitors daily about $45 billion, 65k vendors across the world.  Still a work in progress, but currently is quite useful and accomplishing things that haven’t been done for 15 years.

Then showed demo on real data.  Landing web page is a sort of dashboard of statistics.  Risk analysis report shows word clouds for countries and types of issues, with various filter and NLP options, summary charts, score distribution, country to country risk assessment, etc.

“Emerging: Design and Experiment of a Collaborative Planning Service for NetCentric International Brigade Command”

German, French and US army.   Cooperative planning.  Optimization of collaborative operations.  Information flows very fast at tactical levels.  Using constrained optimization techniques, conducted experiments 2008-2012.

Joint planning today is at the division level and is very slow.  Division decision level 6-8 hours down to platoon or squad which needs decisions like NOW.   Situations involve enemy and friendly forces, of course.  Execution involves phasing and coordination; who’s doing what at what time.  Highly asynchronous and time sensitive.  All sorts of variable aspects; mobility, size, firepower, effectiveness of troops, etc.  Example of embassy taken by terrorists.  Contingencies can cause unintended cross fire between friendly forces.

Created a planning service “ORTAC” that can be accessed by US/FR/GE command and control.  CLP(FD) in SICStus Prolog; involves constraint graph, branch and bound algorithm, predicates and constrained predicates, etc.  Navigation constraint model is used to calculate costs (on timing, security capacity, etc) for paths; more or less a routing optimization problem.  Introduce a deconfliction model to minimize conflicts in the graph overall.  Then apply (global) search algorithm with “probes”: metric computation => relaxed problem solving => variable ordering => branch and bound…  Then tune the algorithm to consider temporal and spatial deconfliction.

Many participants in the experiment, government and corporate; conflict and threat warning, plan computation and time/space deconfliction/repair.  Can propose alternative “better” plans.   Robots and humans alike considered in deployment analysis.  The paper includes a lot of military acronyms; author says you can contact him if you have any questions.  The experiment  was successful enough to prove feasibility of a system like this, so research and development is continuing.

Questions raised about effectiveness with broken communications on the field…   There’s a political barrier that complicates it; each country wants to control its own resources, etc.  It is something that would involve cooperation at levels other than just the tech.

“Deployed: Activity Planning for a Lunar Orbital Mission” 

NASA Ames research.  Problems/contributions:  LADEE activity scheduling system (LASS) for activity planning to meet deadlines, using AI  to help formulate activity planning modeling and processing to manage orbit; does involve issuing various live commands to equipment on the spacecraft, and can be applied to “snap-to-orbit”.

Mission objectives: examine lunar atmosphere, sample dust.  “Sunrise Terminator” is a position important for collecting data, among other positions with equally interesting names.  Predictions of crossing times important and affect overall planning of spacecraft trajectory.  Variable aspects that had to be taken into consideration in live planning and update to planning included things like purpose of observation, when science activities could or could not occur, multiple concurrent plans that need to be coordinated, etc.  Instrument activity plans have to be coordinated with strategic, tactical, etc activity plans.

Numerous fairly involved, detailed slides of the flow and other aspects shown; will have to reference the paper/presentation.

Showed a number of JPL space projects this has been successfully used in.  Dynamic Europa: automated constraint reasoning system. SPIFe: client interface.  Activity Dictionary: encoded activities. Template Library: partial pre-defined plans.

Didn’t really go much into the actual algorithms or what was novel in terms of AI.  It kinda came across as fairly routine software engineering, and a little bit of a stretch calling it AI.

“Asking for Help Using Inverse Semantics”

IkeaBot (yes, for building Ikea furniture 😉 )

Noticed while testing that the bot was flailing.  Solution was to shove it twice.  Not really a good solution.  So…  that made them think about how to get the robot to ask for help when it needs it.   How does the robot determine what it wants the person to do?  And how does it know how to communicate it?

Looking at prior NLP work…   tried to find crossover solution for two main types of solutions.

STRIPS-style symbolic planner for assembling furniture, includes pre and post conditions.   Can hardcode mappings from failures to intervention methods, but that’s a fragile solution.

Introduce inverse semantics…  involving forward semantics on “groundings”, probabilistic determination of the surrounding objects.  Generation of semantics starting with looking at a context free grammar to describe the language used to talk about the groundings/objects.  E.g. “hand me the (most likely the) white leg (that I need)”.  Suggests that the formula presented (a product over a sum*product) for inverse language understanding *is* language understanding (seems a little stretchy).

Tested, of course, by introducing complications for the IkeaBot.  Initial success rate was relatively low (20%-50%).   Human written got high rate (as a control?).  Inverse semantics approach reached about 64%.

Collaborative planner infers human actions needed to help robot, and generates natural language, and mathematical framework unifies  language understanding and generation.

“Learning Articulated Motions from Visual Demonstration”

Motivation: want robot to understand the underlying kinematics of objects in a household environment.  Doors, cabinets, trash cans, faucets, knobs, etc.  How does the bot learn these things?  An option is to mark objects in the environment, but this obviously doesn’t generalize beyond specialized environments.  Would rather them learn in an unstructured environment with humans roaming about.

RGB-D as input only.  Trajectory construction: extract features and motion over time.  Trajectory clustering:  collect into object parts.  Pose estimation: can be noisy, but observe and predict possible movement for the parts.  Articulation learning:  learns movement.  Object model persistence: remember what has been learned in a way that can be used in other environments.  Predicting object motion.

Qualitative results:  train in one room with a set of objects.  Then take into another environment with similar objects.  Compared to state-of-the-art, this solution provides more robustness and better fidelity and accuracy of motion.  Over 43 different test environments, it was successful over 2/3 of the time.

Recent work:  learning done with a human co-operator, which seems to be pretty effective in increasing smoothness and accuracy by way of observation of aided movement.

“Tell Me Dave: Context-Sensitive Grounding of Natural Language to Manipulation Instructions”

Common scenario is to give a grounding command to a robot, and it has to interpret it correctly and do something intelligent.  Problem with grounding is converting a command into an action plan/sequence of actions.

Example.  Making sweet tea => “Heat up a cup of water, add tea bag.  Mix well.”.  So in the environment there’s a table, a cup, a microwave oven, a stove, a sink, etc.  Many instructions presumed and missing from the action description.  Command also may not be in sync with the environment (might not be a cup, but *might* be a glass or something “close enough”).  May be a multitude of ways to perform the task (e.g. multiple vessels, a microwave or a stove?).  So the grounding is subject to all of these circumstances.

Common approach is to define action templates.  However, fragile and typically doesn’t handle ambiguity or many to many situations well.  E.g. “heat cup of water” vs “heat cup of milk”.

Another approach is to create a search path (a graph/tree).  However, search space can become exponential.

Proposed here is a learning model (CRF with latent nodes).  Clauses.  E.g. “get me a cup” can be scoped in the tree with feedback about what’s available in the environment.  Model solved using VEIL-template, which appears to be a function of the clause + environment + instruction sequence template + original object mapping.

Created an online gaming environment (crowdsourced testing).  Able to collect 500 real templates by recording movements made by online users (I think).   Those templates went into training the bot.  Results show that robot is able to “fill in the gaps”, and shows 3x improvement over other solutions.  Video showed a bot cooking and serving a meal, which was kinda cool.

“Learning to Locate from Demonstrated Searches”

Elderly care scenario.  Want robot to find grandma’s glasses.  Want this to happen for any grandma.

To make this work, would like this to be applicable in an optimal way in a novel environment.  Introduce notion of “priors”, which are, e.g., beliefs about the likelihood of an object being in a particular place.

Given a location, then target probability distribution, then figure out an optimal search plan.  For each location we have some features and determine a log probability.  All we get to observe are “pasts” prior to our current situation.  Scores end up being time-optimal search stories.  Learning algorithm is iterative, starting with some weights.  Make adjustments with feedback.  Expected time to locate the object is part of what’s fed into the inference engine.  “bag of words” technique used to capture/represent features.

Naive approach equations shown in contrast to a better approach that involved “transposing sums”.  As well…  “admissible” heuristics are derived from “relaxations”.  Introducing these sorts of heuristics shows good results as the complexity of the scenario scales up, by a factor of 10.   Basically these changes shift the search path to try the “most probable” locations first.  While not purely optimal, results showed what a human would perceive to be near optimal (and I guess not totally dumb; e.g. robot: “glasses?  I’ll check the fridge!”).
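Here is a toy version of the "most probable locations first" idea, written by me as a sketch and not taken from the paper.  It assumes the object is in exactly one location, that searching the right spot always finds it, and that there is no travel time between locations, only a per-location search cost.  Under those assumptions, sorting by probability per unit of search time can be checked against brute force.  All names are hypothetical.

```python
from itertools import permutations


def expected_search_time(order, probability, search_cost):
    """Expected time until the object is found when locations are searched in `order`."""
    elapsed, expected = 0.0, 0.0
    for loc in order:
        elapsed += search_cost[loc]
        expected += probability[loc] * elapsed
    return expected


def most_probable_first(probability, search_cost):
    """Heuristic ordering: highest probability per unit of search time first."""
    return sorted(
        probability, key=lambda loc: probability[loc] / search_cost[loc], reverse=True
    )


def best_order_brute_force(probability, search_cost):
    """Exhaustively try every ordering; only feasible for a handful of locations."""
    return min(
        permutations(probability),
        key=lambda order: expected_search_time(order, probability, search_cost),
    )
```

With made-up priors such as nightstand 0.5, kitchen table 0.3, bathroom 0.15 and fridge 0.05, the fridge lands last in both orderings, which at least matches the "not totally dumb" criterion.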

“Fully Decentralized Task Swaps with Optimized Local Searching”

Multi-robot task allocation.  Which bot should do which task?  Applicable to many scenarios; e.g. warehousing, robocup soccer, etc.  Generate a task cost matrix.  Translate into assignment matrix.

Background for task swapping…   minimize total travel distance.  Start with initial assignment.  Update by swapping if not optimal.  Use duality to optimize the solution.   Use only local and single-hop communication (instead of global).  Idea is to decompose the global solution into localized bits.
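A bare-bones sketch of the swap step follows.  It is my own simplification and not the paper's algorithm: it lets any pair of robots trade tasks, whereas the paper restricts communication to local single-hop neighbors, and it leaves out the duality-based optimization entirely.  The names are hypothetical.

```python
def total_cost(cost, assignment):
    """cost[r][t] is the travel distance for robot r to task t; assignment[r] is r's task."""
    return sum(cost[r][t] for r, t in enumerate(assignment))


def swap_until_stable(cost, assignment):
    """Repeatedly let pairs of robots trade tasks whenever the trade lowers their combined
    cost.  Stops when no pairwise swap helps, which is only a local optimum in general."""
    assignment = list(assignment)
    improved = True
    while improved:
        improved = False
        for a in range(len(assignment)):
            for b in range(a + 1, len(assignment)):
                ta, tb = assignment[a], assignment[b]
                if cost[a][tb] + cost[b][ta] < cost[a][ta] + cost[b][tb]:
                    assignment[a], assignment[b] = tb, ta
                    improved = True
    return assignment
```

Each accepted swap strictly lowers the total travel distance, so the loop has to terminate; whether it ends at the global optimum depends on the cost matrix.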

“Toward Mobile Robots Reasoning Like Humans”

Robot teammates with humans.  Semantic navigation, search, observe, manipulate with autonomy, natural communication and explanations of “why”.  A bit of related work cited that applies to just about everything described later.  Work is centered around perception.   Example: “stay to the left of the building and navigate to a barrel behind the building”.  As humans, we can immediately upon seeing the building imagine a barrel behind it.

This approach mimics that behavior.  Semantic classifiers applied to 2D visual image, then use that to generate 3D label points, given the resulting “plane cloud”, we predict a “building”.  Based on the predicted building, the bot “hypothesizes” a barrel behind it.  Robot can then generate path with pref to left side of building, per the command.  Adjusts and corrects prediction as it moves.    Introduced architecture (lot on the slide), includes world modeling, navigation, path planning, search actions, NLP, etc.

Semantic classification tries to label regions with a pre-defined set of labels.  From there, cluster 3D objects, applying labels tempered by bayesian probabilities and field of vision limitations.

Imitation learning is used to teach spatial relationships and models of environment.  Show the robot what it means to navigate in different environments.  Robot extrapolates and weighs features for future use in live environment.   Hypothesizing the unseen environment involves summations; equations shown, along with live environment examples, along with a crapload of small font sample commands, etc.  During tests…  35 of 46 tests were successful.  Some tests with bad commands excluded.  Videos shown of real robot and interpretation of what the bot sees… going around the building, ignoring a fire hydrant, ignoring a barrel that’s not behind the building, and navigating to the barrel behind the building.  All from a real robot over gravel and bumpy terrain.

“Learning to Manipulate Unknown Objects in Clutter by Reinforcement”

Autonomous robot for rubble removal.  Major challenge for search and rescue.  Previous work achieved 94% success on regular objects, 74% with irregular objects.   Added strategy of pushing objects.  Wanted to be able to get the robot to learn without any human intervention.  Involves a lot of random interactions; a lot of trial and error.  All by trying and observing success.  Video shown of it in action: touch, touch, move box, move cylinder, push cylinder, pick and move box, pick up and move cylinder.

Unreadable slide of overview of the integrated system (thin, small yellow font on blue rectangles), and MC Botman in the house overloading the microphone…  will have to see the paper.  Something about clustering and segmentation of surfaces into objects, then breakdown into “facets”.  Looks like this led to the bot translating visual image into conceptual mapping of disparate non-uniform objects.  Touch and push appears to play into extracting features of the object, though it doesn’t look like tactile sensor based; rather success or failure of grasping.  Functions shown for action evaluation, and yes it does appear to be a matter of iterations evaluating push-grasp feedback.  Bandwidth selection equation shown that makes use of a “Bellman error” to adjust predicted.  Plus makes use of reinforcement learning.

“Learning and Grounding Haptic Affordances Using Demonstration and Human-Guided Exploration”

Humanoid bot, Curi.  Learning from demonstration.  Human guided exploration.  Involves: action demo => human guided exploration => affordance model.

Video shown.  First assist Curi by physically hand-holding the action.  Then bot tried on own a few times, but with a little bit of corrective assistance.  Then create affordance model; 2 markov models…  successful trajectories vs failures.

10 successful, 10 failed cases.  Feed into offline classification.  Precision is generally higher than recall.  Skills with the best performance have distinct and continuous haptic signals.  Online testing involved variations, and was successful 6 out of 7 times.  Curi detects shake top affordance, pour top affordance…

Note that no visual information is actually recorded, just trajectory information…  (hrm…)

(note to self: Curi’s approaching the uncanny valley (observation from video).  And ignoring visual info (going blind basically) seems pretty limiting and difficult to generalize from).

“Apprenticeship Scheduling for Human-Robot Teams in Manufacturing”

On the verge of a revolution in manufacturing in that we will see more robots and people working together.  Earliness or lateness, broken parts or tools introduce complications.  How do we do this efficiently?  How do we allocate authority over workflow?   How do we handle implications of adoption?

Task allocation, ordering of tasks, timing of tasks (duration), balance, deadlines, agent capabilities, etc etc…   Tercio introduced.  Uses heuristic techniques among others.

Looking at human acceptance of this, tested on two humans, two robots.  Fetching tasks.  Assembly tasks.   Only the humans can build (legos).   Team efficiency was better if the bot had more control over the flow.  Plus the humans liked handing over some of that control to the bots.  Further examination of that shows that people preferred robot control over human control.  So further questions include how to have robots coordinate teamwork.

(Note to self: implications of where this heads is scary and seems wrong.  Egads.  Regardless, figuring out the psychology of this is something worth more study.)

“Following a Target Whose Behavior Is Predictable”

What makes a good robot videographer?  E.g. motorcycle jump video, sports, political events, etc.  Line of sight, viewpoint.  Good camera settings and beauty too, but focus on the former.

Video shown of underwater bot following a yellow target.

Robot needs to retain a belief about target, anticipate actions, search for target, consider dynamics of environment.  Range from fully cooperative to fully adversarial targets.

Modeling the target…   compute cost to go, giving pref to single path.  Incorporate speed of convergence to the goal, which depends somewhat on how rationally the target behaves, and include the perceived rationality of the target in the equation.  Particle filter applied.  Cast as a finite-horizon POMDP problem.  Robot has limited time to compute, so use Monte Carlo tree search.  Each node maintains times visited and expected reward.
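The per-node bookkeeping for the tree search can be sketched roughly as below.  The talk only mentioned visit counts and expected reward; the UCB1 selection rule, the exploration constant and all the names here are my assumptions, picked because they are the usual textbook choice.

```python
import math

class Node:
    """One node of a Monte Carlo search tree: visit count plus running reward total."""
    def __init__(self):
        self.visits = 0
        self.total_reward = 0.0
        self.children = {}   # action -> Node

    def expected_reward(self):
        return self.total_reward / self.visits if self.visits else 0.0

    def update(self, reward):
        """Record the outcome of one simulated rollout through this node."""
        self.visits += 1
        self.total_reward += reward

def select_action(node, c=1.4):
    """Pick the child with the best UCB1 score; unvisited children go first."""
    def score(child):
        if child.visits == 0:
            return math.inf
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return child.expected_reward() + explore
    return max(node.children, key=lambda a: score(node.children[a]))

# Hypothetical usage: two rollouts went "left", none went "right" yet.
root = Node()
root.children = {"left": Node(), "right": Node()}
for reward in (1.0, 0.0):
    root.children["left"].update(reward)
    root.update(reward)
next_action = select_action(root)   # the untried action gets explored first
```

With a limited compute budget, the robot would run as many rollouts as time allows and then act on the child with the best expected reward.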

Animation shown of agent pursuing target in a maze.  Seems to still be able to follow even when the target is out of the line of sight / field of vision.    Still work to be done, because apparently there are some performance issues, etc.  Not really scalable.

“Multi-Agent Rendezvous”

E.g. automated taxis meeting to load balance passengers, or an underwater vehicle near a surface vehicle.  What’s the best strategy for solving the task?   Goal: minimize time and resources for rendezvous.  How much prior knowledge and communication is there?  What if it’s an unknown environment?   Compare to how humans do it.

Bots discover their environment as they go, eventually encounter each other.  No prior knowledge or communication.  Introduce cost-reward model and “distinctiveness”, which represents expensive choices.

AAAI-15 Conference – Day 2

My observations from the 2nd full day of sessions at the AAAI-15 artificial intelligence conference in Austin, Texas.

“Statistical Parsing with a Context-Free Grammar and Word Statistics”

(no show)

“Task-Oriented Planning for Manipulating Articulated Mechanisms Under Model Uncertainty”

Task of cooking involves many subtasks.  Many household objects have mechanical constraints: “articulated”.  Here we’re looking at learning the kinematic structure of an object.  Use the task to help motivate and guide the learning.  Graph representation, with vertices and edges representing joints, shape, etc, as a generalized kinematic graph.  Given some number of candidate models, find minimal cost to achieve the goal, involving user-defined cost.

Belief MDP…   Assuming no noise in observation and perfect motions, we can construct a sound logical space.  Observes environment, generates plan.  Learning is supervised in that the robot will know about and have prototypical concepts for drawers and doors.

Experiments conducted in office and kitchen spaces.  Videos showed success finding and opening various cabinets and doors.

“Learning the State of the World: Object-based State Estimation for Mobile-Manipulation Robots”

Creating semantic world models.  How should a robot represent its spatial environment?  Object-based spatial representation using attributes… what and where?

Say a robot captures image of objects on a table.  Apply object recognition processor…  identify box, etc.  However, object detection is noisy, misinterpreted, occluded, etc.  Then can possibly combine partial views to form hypotheses about what’s on the table.  Becomes a data association problem.  Can use crossover clustering to help association accuracy.

What representation should be used?  Object as atom?  Occupancy grids?  Why not use both?  Fuse together on demand as needed.

Example shown of a toy train behind a board.  Robot arm can sweep behind the board (?)

Scaling this up has issues, of course, but much of it is irrelevant (what you can’t see) and doesn’t matter.  Instead, the estimator should be tied to the task, but flexible enough to be tasked on the fly.  As in, the bot would generate an estimator for a task.

“Time-Optimal Learning, Exploration and Control for Mobile Robots in (Partially) Known Environments”

Robot knows there are objects in a space, but must find them.  Circular sight range.  Assume that bots pick up objects automatically, and that detection is instantaneous (to simplify computations), and uniform distribution over the space.  (Egads, highly constrained to help the study)

Two approaches tried:  a heuristic worst-case optimization and a heuristic probabilistic optimization, using decision trees and an exploration-to-action cycle.  Given all the constraints, able to create a sound logical model (note to self: isn’t this part of the general problem in AI that we try to fit problems into sound logical models?)  Showed various paths taken in either case, and that the probabilistic version performed a little better (moving in a spiral).   Can solve an approximate version of the OCP problem (non-convex).

“Plan Execution Monitoring through Detection of Unmet Expectations about Action Outcomes”

Intelligent behavior via accurate model.  Case in point with RoboCup robots.  Many models used:  optimal attack location, opponent marking, ball and robot motion.  Works well if we have accurate models to work with.  However, cannot always fully explore domain before deployment.  Especially in adversarial or human environments and domains.  In some cases too dangerous to deploy and test.  So how to go about it?

Can possibly create “generally accurate” models with anomalous subspaces.  Still applicable to RoboCup, in which case it depends on the opponents and how they play.  What do you do if there’s a subspace wherein the bot is totally inaccurate?

Introduce into the sense-plan-act loop: the planner will also generate some expectations and monitor/correct based on observations/feedback.  Optimization applied to identify and correct anomalous subspace regions.  Using elliptical parametric subspaces, nonlinear optimizations, and a cost function (likelihood of anomaly over probability of anomaly).

Focusing on anomalous subspaces acts as negative feedback for learning.  Showed various animations of their own bots learning off each other.

“Representation Learning for Robotics”

Can call it “feature learning” or “learning to see”.  Showed animation of blurry visual feedback that came from a bot with 4 cameras trying to navigate in a box with different colored walls.  The blurry image showed how vague the visual field is when interpreted by the bot.

Idea is to take this blurry observation and map it to a state.  There’s a learning objective applied, with punishment and rewards applied (reward being a corner).  Still supervised, but does convert raw visual imagery into an internal map of the walled box space.

Added “distractors”, a.k.a. obstacles.  Relying on observations solely, not great.  This new method performs as well as introducing “cheat” information about the environment.

“A Divide and Conquer Approach to Control Complex Continuous State Dynamic Systems using Hierarchical Reinforcement Learning”

Imagine Mario trying to drive around a track.  Continuous space: x, y, direction, velocity.  Typical reinforcement learning leads to consistent mistakes.  Instead, discretize/decompose the track, with an assumed hierarchy.  A completion function is used to re-assess what’s left after a subtask is completed.  As well, divide the regions of the termination step, with multiple policies for each region.  So then it becomes a matter of picking a particular exit region to aim toward.  Reduces the solution space, so easier to solve.  Reasonably successful, but still not optimal, and errors introduced in a non-simulation, continuous environment.  Plan to extend this research to pole balancing and real bots.  Currently using a road bot in a track.

“Towards a Programmer’s Apprentice (again)”

Revisiting the original paper of same title from 1974.  Motivation is to give aid to programmers; e.g. go back to old code or get code piled on to you, and think…  Why that way?  does it work?  etc.  Speaker formerly with Symbolics, etc.

Example: the mailer bug.  Mail would just stay queued for no apparent reason.  Had lunch with original author, who remembered the bug.  As in “oh yeah, I remember that”.

Problem:  software systems last forever, continually evolving, break, design rationale lost…  so could a computer help?  Concept of an apprentice working with the programmer, but in such a way as to capture the rationale; e.g. asks about the stuff that’s not routine.  Programmer, draws, points, codes, talks, etc.  Project to solve this problem started in 1974…  Developed plan calculus, temporal abstraction, cliche recognition, etc.  In emacs!!!

Why revisit now?  Computers are better now, various technologies have improved, and massive open source libraries to reuse.

Prototype now uses Siri and Start natural language system to gather input.  Involves cliches, taxonomy, viewpoints of the data, code generation and refinement.  Runtime is a temporal sequence state machine of sorts.  Examples…    encounters code that intends to represent pixels as bytes, refines and generates code that works.

Video demo shows emacs talking to the programmer, asking questions.  User says “add a disk drive”, and emacs then corrects the user by saying that the data types in his design are not matching up.  In another case, user asks to generate pixels for some data, and the bot asks how user wants to do it, asking why once the user has made a choice.  Kinda cool.   Synthesized code is in lisp.

“Conducting Neuroscience to Guide the Development of AI”

Many people pursue AI as way to understand how the brain works.  Rather, here suggesting that we use tools and research of neuroscience to make more intelligent systems or dispel any illusions about AI models being accurate.  So, along those lines, what we usually do is feed input to computer, get output, then feed input to person, get output.  But we can’t answer the question about whether what the bot did == what the human did.  Still no idea what was going on in the human brain.

Interface between language and vision as a focus here.  Action recognition and video captioning.

Action recognition: Take video data and map to fixed set of concepts; e.g. a video of a kiss ==> “kiss”.  Typical way…  extract space time features, classify with SVM, etc.  Questions…  how does this compare with the human brain?  Well… MRI scan the brain with the same input, and compare.  And of course, they compare poorly.  Explore more…  does the brain pool features?   MRI scans once per two second time resolution, which is slow, but can improve to 300ms, which improves possible results because of the importance of hitting the critical identifying moments.

Video captioning:  Video clip input ==> speech text output.  Solutions use priors from web scale natural language corpora to determine likelihood.  However, is this what the brain does?  Probably not.  Accurate?  Only if your video clip happens to hit the right probability.   Again did MRI scans to find active regions and crossover between.  And compared.  Not really similar.

“Mechanism Learning with Mechanism Induced Data”

Mechanism design in internet applications with multiple agents with independent intentions and self-interest.   Context of crowd-sourcing, search ads, and app stores.  Users <==> Platform <==> Agents.  Agents may be pretty irrational, but some consistency in behavior.  Note that this is from Microsoft Research.

Game theory assumes rational behavior. Machine learning has unknown, but fixed distribution.  So…  quandary is in dealing with a combination of bounded rational and mechanism dependent behaviors.

Introduce MLMID as a hybrid between game theory and machine learning.  Even with irrational behavior, there will still be patterns, plus changes in behavior depending on circumstances that can be recorded and used in a probabilistic model.  Take into consideration evolution of user behavior as well.  Strict use of previous history may not be effective.   Make use of “regret” and “equilibrium” analysis.   All very complicated, however, and really there are just a lot of open questions.

“Challenges in Resource and Cost Allocation”

Food banks in Australia and worldwide.  Using technology to do good in the world.  In this case, creating an app to help people both donate and receive food.  Technology-wise, this becomes a resource allocation problem (similar to the vehicle routing problem, which is more or less a variation on the NP-complete traveling salesman problem, etc).  Great bit of complexity added with a number of constraints and granularity of items that can be distributed, money available, etc etc etc.   20k customers, 600 vehicles, $100’s million+, etc.

Challenges:  development of complex models and mechanism for fair division, mixed fair divisibility, optimization, and behavior awareness (not everyone behaves fairly or rationally), cost allocation mechanisms, etc.   And the whole domain of the environment is constantly changing. (note to self: seemed to miss reacting to delays and other extenuating circumstances)

Benefit to food providers is that costs can be cut in half if done right.

“Explaining Watson: Polymath Style”

Why does Watson work well?  Not standard NL research.   Despite its success, we still don’t know why it works.  Figuring it out can be an open collaborative effort (“polymath style”).

Why do we care?  Seems to be a mismatch between our theory of meaning vs what we’re experiencing with Watson.  In the Jeopardy Challenge, there are 5 key dimensions…  broad open domain, complex language, high precision, accurate confidence, high speed.  Some questions are not encyclopedic that can just be looked up.  Regardless, can sort of be considered solved as of 2010.  Watson became the first non-human millionaire by winning Jeopardy.

Four years later, and Watson’s accuracy has not been replicated yet (something like 75%), even on factoid questions.  Unlike Deep Blue, after which progress in chess playing increased.

Watson has two search engines.  Analyses question ==> decomposes query ==> –2x split here– hypothesis generation (primary search with candidate answer) ==> soft filtering ==> hypothesis and evidence scoring (supporting evidence lookup and scoring) ==> –join split here– synthesis ==> merge and rank hundreds of scored items (logistic regression applied) ==> answer and confidence.   Dozens of NLP models used in the process.
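The final merge-and-rank step is the easiest part of that pipeline to picture in code.  Below is a minimal sketch, assuming each candidate answer arrives with a small vector of evidence scores.  The feature meanings, weights, bias and candidate names are all made up for illustration; a real system would learn the weights from training questions and use far more features.

```python
import math

def confidence(features, weights, bias):
    """Logistic regression: squash a weighted sum of evidence scores into (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def merge_and_rank(candidates, weights, bias):
    """Return (answer, confidence) pairs sorted from most to least confident."""
    scored = [(answer, confidence(f, weights, bias)) for answer, f in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical evidence scores per candidate, e.g. [passage support, type match].
candidates = {
    "Toronto": [0.2, 0.1],
    "Chicago": [0.9, 0.8],
    "Boston":  [0.5, 0.4],
}
weights = [2.0, 3.0]   # made-up weights, not learned from anything
bias = -2.5
ranking = merge_and_rank(candidates, weights, bias)
best_answer, best_confidence = ranking[0]
```

The top entry gives both the answer and the confidence, which is what lets the system decide whether to answer at all.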

So… where’s the meaning in all of that?  We get the impression that the meaning is understood somewhere in there (the ghost in the machine).  No formal model or theory.  IBM has published it in the IBM research journal, so how Watson works is open to play with.  Offered some possible research paths that could be taken.  Can contact them if interested in doing related research.  Opened up to public because they are not seeing any replication of this sort of tech in the academic community.

Deep Learning 

Presented by Geoffrey Hinton, of original backprop and boltzmann machine fame.

What is deep learning good for?  Distinguish structure from noise.  Example problem: pixels to words.

Backpropagation had a bit of promise, but didn’t make good use of hidden layers.  Couldn’t get it to work effectively for recurrent neural networks.  RNNs held hopes of being able to combine en masse to help solve problems.  Regardless..  what’s wrong with BPNN?  Required labelled training data.  Very slow to run.  Only found local optima.  Often good, but otherwise really inaccurate.

Attempt to overcome by using unsupervised learning.  Make use of stochastic binary units.  Then hook them together to form restricted Boltzmann machines.  With one hidden layer, the chances of one feature detecting unit can be independent of other units.  Hopfield energy function applied to determine weight of a joint configuration, and the derivatives are useful for defining probabilities.  Go back and forth training vectors until activity stabilizes.  Applied to learning to model images from video data.  Problem, however, is that it’s horribly slow.
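For reference, the energy of a joint configuration in a restricted Boltzmann machine is usually written as E(v, h) = -Σ a_i v_i - Σ b_j h_j - Σ v_i W_ij h_j.  Here is a minimal sketch of that formula and the resulting probability of a hidden unit turning on.  The tiny weight matrix and the variable names are made up, and no training is shown.

```python
import math

def energy(v, h, W, a, b):
    """Energy of a joint configuration: lower energy means higher probability."""
    visible = sum(a[i] * v[i] for i in range(len(v)))
    hidden = sum(b[j] * h[j] for j in range(len(h)))
    pairwise = sum(v[i] * W[i][j] * h[j]
                   for i in range(len(v)) for j in range(len(h)))
    return -(visible + hidden + pairwise)

def p_hidden_on(v, W, b, j):
    """Probability that stochastic binary hidden unit j is on, given the visible vector."""
    z = b[j] + sum(v[i] * W[i][j] for i in range(len(v)))
    return 1.0 / (1.0 + math.exp(-z))

# Tiny made-up model: 2 visible units, 2 hidden units, zero biases.
W = [[1.0, -1.0],
     [0.5,  0.0]]
a = [0.0, 0.0]
b = [0.0, 0.0]
v = [1, 1]

low = energy(v, [1, 0], W, a, b)    # hidden unit 0 agrees with the data
high = energy(v, [0, 1], W, a, b)   # hidden unit 1 disagrees with the data
```

Because there are no connections within the hidden layer, each hidden unit's probability depends only on the visible vector, which is the independence mentioned above.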

However, the RBM can be improved.  Corrupt the data with encoded “beliefs”, then reconstruct, then take the difference and learn from that.

Training a deep network:  Based on this RBM, more or less.  First train a layer of features directly on the pixels.  Then treat the activations as if they were pixels to learn features of features, creating a multi-layer generative model.  Apparently it’s provable that each layer added improves a variational lower bound on the log probability of the training data.

Then… fine tune for discrimination…  and use BPNN to pre-process data (I think).

Example application to acoustic modeling using a DNN pretrained as a deep belief network.  Last year all good speech recognition apps were using this.

How many layers?  How big?  Backprop works better with greedy pre-training, and for scenarios with limited label information.

stuff ==> image ==> label.  This is typical, and is what commonly leads us to focus on the gap between image and label.  However, seems sensible enough to go from stuff ==> label directly.  Unsupervised pre-training not necessary for optimization, but helpful for generalization.

Success achieved with ILSVRC-2012 competition on ImageNet.  1.2 million images, 1000 classes of objects.   Goal: get correct class in your top 5 bets.  Other groups looking at 25%-30% error.  This solution is hitting 8% error.  Architecture used:  7 hidden layers (most recently using 20), early layers convolutional, last two layers globally connected, activation function is rectified linear units, global layers had most parameters utilizing “dropout”.  Example images shown, with confidence list.  Dropout involves randomly omitting units during training, which keeps units from relying too heavily on one another and helps the network generalize.
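The two named ingredients, rectified linear units and dropout, are simple enough to sketch.  This is an illustration on a plain list of activations, not the network from the talk.  The rescaling of surviving units (“inverted dropout”) is one common convention that I am assuming here, and the function names are mine.

```python
import random

def relu(xs):
    """Rectified linear unit: pass positives through, clamp negatives to zero."""
    return [x if x > 0 else 0.0 for x in xs]

def dropout(xs, p_drop, rng):
    """Zero each activation with probability p_drop and rescale the survivors."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    keep = 1.0 - p_drop
    return [x / keep if rng.random() < keep else 0.0 for x in xs]

rng = random.Random(0)                        # seeded so runs are repeatable
activations = relu([-1.0, 0.5, 2.0, -0.3])
train_time = dropout(activations, 0.5, rng)   # some units randomly omitted
test_time = dropout(activations, 0.0, rng)    # nothing omitted
```

At training time each unit is either zeroed or doubled (for a drop rate of 0.5); with a drop rate of zero the activations pass through unchanged.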

Again…  what’s wrong with backprop?  Misconstrued before.  In retrospect:  too few labels, too slow computers, stupid method for initializing weights, used wrong type of non-linearity.

Back to RNNs. Hidden layers feed into themselves, can take input or output at any time slice, etc. Powerful because they combine distributed hidden states that allow them to store info about the past efficiently and non-linear dynamics that allow them to update…  AND if they’re deep, then they work better.  Applied to doing machine language translation, using hidden state to represent the thought that the sentence expresses.  Take that “thought” state vector and feed it into the “french” translation RNN.   This solution beats state of the art now.  Goal is to create a real babelfish on a chip that goes in your ear.
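The “hidden layers feed into themselves” part can be sketched as a single recurrence.  This assumes a plain tanh RNN with made-up weights, and it only shows the encoding half (sequence in, “thought” vector out).  The translation system described in the talk is much larger and I am not claiming this is its architecture.

```python
import math

def rnn_step(x, h, W_xh, W_hh, b):
    """One time step: the new hidden state mixes the input with the previous state."""
    new_h = []
    for j in range(len(h)):
        z = b[j]
        z += sum(W_xh[j][i] * x[i] for i in range(len(x)))
        z += sum(W_hh[j][k] * h[k] for k in range(len(h)))
        new_h.append(math.tanh(z))
    return new_h

def encode(sequence, W_xh, W_hh, b):
    """Run the whole sequence through and return the final hidden state."""
    h = [0.0] * len(b)
    for x in sequence:
        h = rnn_step(x, h, W_xh, W_hh, b)
    return h

# Made-up weights: 1-dimensional inputs, 2 hidden units.
W_xh = [[1.0], [-1.0]]
W_hh = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]
thought = encode([[1.0], [0.0], [1.0]], W_xh, W_hh, b)
```

The final state depends on the order of the inputs, not just which inputs were seen, which is what lets it carry information about the past.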

Then…  combine vision with language.  Train the RNN with vision percepts.  Tested on a database from Microsoft of 200k images with captions.  Successfully trained on this data, was good at generating sentences for images.   All done without symbols, etc.  Suggested that this is really bad news for GOFAI.   Connectionist vs Symbolist — FIGHT!  Hinton’s being pretty forward about denouncing people who keep using symbols in knowledge representation; that symbols are only input and output.

– Rebuttal from GOFAI:  defending ability of humans to use formal logic even if it’s not part of the basic way we work.
– Can get NN to learn mathematics and perhaps interpret code.
– Smolensky brought up as someone who worked on Boltzmann machines, but has been pursuing more symbolic solutions
– Thought experiment: black box that sorts numbers.  Whether or not there is a NN on the inside does not matter.  Arguing that we cannot conclude that there are not algorithms in the brain

Hinton was more or less saying that most people here are probably wasting their time if they’re still dabbling in GOFAI.  What he says about symbolic knowledge rep with classical logic reasoning frameworks is pretty obviously correct: it’s not how our brains work.

“Spontaneous Retrieval from Long-Term Memory for a Cognitive Architecture”

Knowledge search heuristics…  e.g. “find an item that has a solid line and ends in a square”.  What if the agent doesn’t have any of the basic knowledge in the first place.  Doesn’t know cue, cue relationship, when to search, etc.

Applied to missing link domain, remote associates test.   Agent gets some words as a clue, then has to look up associative words.  Assuming imperfect associative information, with no pre-established relationships, introduce “spontaneous” retrieval.  Results show that this solution helps in some cases, but does not do any worse than typical solutions in other cases; so it only improves.  For cases in which puzzles are not solvable or there’s no good solution, the spontaneous solution performs faster and does just as well.

“Automatic Ellipsis Resolution: Recovering Covert Information from Text”

Ellipsis, in the sense of …

or [e] or “what we didn’t do”

Confounders…  a lot of syntactic and other work makes many assumptions about correctness and completeness in the environment.  This group intends to handle actual natural language.  The question to investigate here is to what extent we can explore situations with incomplete, dirty data baby.  With focus on resolving syntactically incomplete sentences (with ellipses).

Processing cycle:  (showed slide for 1 second, alas)…  used standard input processing, then applied removal of false positives, ordered matches with level of confidence (using phrasals, parallel configurations, non-parallel matching modalities, etc).  Some challenges experienced after detecting antecedent clauses.

Unclear what the results were or indicated, but expressed that it would hopefully help inform agent decision making.

“Automated Construction of Visual-Linguistic Knowledge via Concept Learning from Cartoon Video”

KidsVideo project.  Representing and learning concepts from kid videos.  Many aspects to consider.  Multimodal, vision, language, story lines, grammar, etc.  Previous approaches include semantic networks, etc.

Solution involves image pre-processing, multiple abstraction layers, sparse population code (hierarchy free brain inspired representations), deep concept hierarchy of cartoon videos using SPC models, empirical distribution of a scene utterance pair, graph monte-carlo (stochastic, with numerous sub graph methods-UGMC, PRGMC, FGMC), etc.  Generates a multimodal concept map, which includes formation of character property classification and recognition.

Results seem to include success in scene to subtitle determination and generation of images associated with given sentences.

Note: very dense material for the short amount of time to present, so it went by a little fast.

“Ontology-Based Information Extraction with a Cognitive Agent”

Problem is looking at text from family/genealogy history books, taking it and populating an ontological model to gather meaning and establish meaningful relationships.  How can we determine who’s who even within the same family article?  Probabilistic models applied to determine likelihood that two symbols may represent the same referent.

Introduce “Ontosoar” architecture, which is a combination of off-the-shelf and some new home grown components, including a semantic analyzer.  OntoES used to help tie things together.

Notion of construction grammar.  Form patterns are constructed and matched against inputs to determine meaning.   Translated into a knowledge structure, with deduced relational links.

Tested against real genealogy sources, got mostly good, but some mixed results; accuracy checked against human interpretation of the sources.

“Extending Analogical Generalization with Near-Misses”

Learning from structured examples.  Extend analogical generalization with “near-misses”.   Introduce “ALIGN”: analogical learning by integrating generalization and near-misses.  Involves generating hypotheses and refining or filtering.

Applied to recognizing structures (relatively geometric).  Assume some generalized contexts; e.g. for “arch” concept, etc.  Case libraries exist for each case (?).  Pseudo probabilistic methods used to extract hypotheses based on analogy.   Relatively good success rate.

“Learning Plausible Inferences from Semantic Web Knowledge by Combining Analogical Generalization with Structured Logistic Regression”

Problem:  learn to do inference on structures, with all the issues people run into using traditional methods that assume no noise or a sound, complete system of some sort.  Trying to overcome issues with incomplete, noisy data.   Solution tries to combine structural alignment with statistical learning.  SLogAn: structured logistic regression with analogical generalization.

Test example…   use semantic web to gather information to infer information about family relationship.  Preprocess structure, then apply analogical generalization, then make weight adjustments with structured logistic regression…   can then do a structure mapping between input and a template.

Compared to state of the art classification models.  Better results than even some NN solutions.

AAAI-15 Conference – Day 1

My observations from the 1st full day of sessions at the AAAI-15 artificial intelligence conference in Austin, Texas.

Hot Talk: General Game Playing

This talk involved taking a look at the utility of using game playing as an outlet for proving out AI.  Gave contrast between typical approaches that presume too much, but more realistically have to consider things like incomplete game trees, lack of prior knowledge of problem space, etc.  2007 brought us the effective Monte Carlo search.  Various ways cited to pre-process info, different types of games, segregating concerns, etc.

Hot Talk: Robocup @ Home

This was an overview of research that is being done testing robots in real human environments; e.g. coffee shops and malls… getting a beer, carrying a tray, by request, etc.   Emphasis on working out the kinks with human robot interaction, maybe dispelling some of the fear.  Involves combining of many many aspects of AI; semantic 3D mapping, etc.

“This Time the Robot Settles for a Cost”

This paper on temporal logic planning involves simulating janitorial robot in an office, with a list of tasks to accomplish.  This solution tries to bridge the gap between the ai and robotics approaches.

Approach in robotics:

  • continuous motion planning
  • LTL is the de facto language

Approach in AI:

  • discrete planning
  • PDDL is the de facto language

Solutions tried involved translating “co-safe LTL” into DFA.  Information flows as: high level planner -> synergistic planner -> low level planner.  Problems come into play in the form of closed doors (closed connections/edges).

Example: to bot: “don’t step on carpet with dirty feet (until you have slippers on)”.  However, carpets are between bot and slippers…   leads bot to ask user question.  User in this case replies by assigning a “cost” to stepping on the carpet.  Becomes a shortest path problem with cost applied to paths, as a matter of “least cost of violation”.
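The carpet example boils down to a shortest path search where breaking a rule adds a penalty to an edge.  Here is a minimal sketch using plain Dijkstra on a made-up three-room graph.  It skips the LTL-to-DFA machinery from the paper entirely, and the graph, edge format and function name are my assumptions.

```python
import heapq

def least_cost_path(graph, start, goal, violation_cost):
    """Dijkstra over edges of the form (neighbor, distance, violates_rule)."""
    frontier = [(0.0, start, [start])]
    best = {}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in best and best[node] <= cost:
            continue
        best[node] = cost
        for neighbor, distance, violates in graph.get(node, []):
            step = distance + (violation_cost if violates else 0.0)
            heapq.heappush(frontier, (cost + step, neighbor, path + [neighbor]))
    return None

# Hypothetical office: the short route to the slippers crosses the carpet.
graph = {
    "start":   [("carpet", 1.0, True), ("hallway", 4.0, False)],
    "carpet":  [("slippers", 1.0, False)],
    "hallway": [("slippers", 4.0, False)],
}
cheap = least_cost_path(graph, "start", "slippers", violation_cost=2.0)
costly = least_cost_path(graph, "start", "slippers", violation_cost=10.0)
```

With a small penalty the bot cuts across the carpet; with a large one it takes the long way around, which is the trade-off the user is setting when they assign the cost.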

Animation of the path taken by the bot seems decent.  Can see that the bot handles exceptions; e.g. closed doors, etc, on the fly.

(note to self:  can almost see value, implied in the government funding meeting, in refreshing knowledge with new generations by allowing them to run through experiments that use the same techniques as 20 years ago.)

“Strong Temporal Planning with Uncontrollable Durations”

How to deal with temporal planning that is subject to tasks that may or may not take a certain amount of time…

  • actions have durations
  • conditions can be non-instantaneous
  • effects at-start and at-end

PDDL is used as the planning language.

Solutions tried are extensions of “forward state-space temporal planning”.  When introducing uncertainty of task duration into a plan using standard FSSTP, trying to get the system to re-order constraints doesn’t work.  You get an incomplete solution, even when the algorithm tries to re-order tasks to suit the overall goals.

Tried a variety of ordering techniques to use within the main technique…  landed on disjunctive ordering, but it ended up being really slow.

Note:  Dude kept throwing around acronyms.  Colin, SMT, DR, TO, LAD, etc, and was not particularly good at explaining the research

“Robustness in Probabilistic Temporal Planning”

This research involved multi agent scheduling. How do you tell how good a given schedule is?  Solution uses robot create platform and probabilistic reasoning techniques.

iRobot example video shown, avoiding collision with each other, with some level of communication between.

Given a simple temporal network (nodes = events, edges = diff between events), there’s a challenge in responding to new constraints in real time.  Introduce “flexibility”, where naive flexibility is when you sum up all possible events.  Problem with naive flexibility is that it double counts events, so devised a way to minimize duplicates.

Questions arose:  How do we define what’s “good enough”?  When is flexibility more relevant?  What are the critical moments?  This led to a solution that involved moving integrals, but that ended up being too complex.

So… introduced “sampling approximation” and “representative approximation” as ways to add robustness to adaptive approximation of next move.  In the case of sampling approximation…  as it comes to the first edge, it takes a sample, which propagates changes to constraints.  Success or failure (as in, whether or not there was a collision or either failed to reach their destination) is used to train for the next time around.
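For reference, a rough sketch of what a simple temporal network and "naive flexibility" look like in code.  This is my own reading, not the authors' implementation: I'm assuming naive flexibility means summing the width of each event's feasible time window, and the function names and the three-event example are invented.

```python
import math

def stn_windows(num_events, constraints):
    """Feasible time window per event in a simple temporal network.

    constraints: list of (i, j, lo, hi) meaning lo <= t_j - t_i <= hi.
    Event 0 is the reference point (t_0 = 0).
    Returns [(earliest, latest), ...] or None if the network is inconsistent.
    """
    # Distance graph: d[i][j] is the tightest known upper bound on t_j - t_i.
    d = [[0 if i == j else math.inf for j in range(num_events)]
         for i in range(num_events)]
    for i, j, lo, hi in constraints:
        d[i][j] = min(d[i][j], hi)    # t_j - t_i <= hi
        d[j][i] = min(d[j][i], -lo)   # t_i - t_j <= -lo
    # Floyd-Warshall all-pairs shortest paths tightens every bound.
    for k in range(num_events):
        for i in range(num_events):
            for j in range(num_events):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # A negative cycle means the constraints cannot all hold.
    if any(d[i][i] < 0 for i in range(num_events)):
        return None
    return [(-d[i][0], d[0][i]) for i in range(num_events)]

def naive_flexibility(windows):
    """Sum of window widths (assumed definition of naive flexibility)."""
    return sum(latest - earliest for earliest, latest in windows)

# Hypothetical schedule: event 1 happens 2..5 after start,
# event 2 happens 1..3 after event 1 and no later than 6 after start.
windows = stn_windows(3, [(0, 1, 2, 5), (1, 2, 1, 3), (0, 2, 0, 6)])
flex = naive_flexibility(windows)
```

The double counting shows up right away: each of events 1 and 2 gets a window of width 3, but they can't both use that slack independently (event 1 at time 5 rules out event 2 at time 3).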

Brief Planning Oriented Poster Previews:


Resolving Over-Constrained Probabilistic Temporal Problems through Chance Constraint Relaxation.  Involves a bus schedule problem, wherein, for example, I am leaving office for home at 6pm.  Complications in planning involve determining probability of a bus arriving on time, etc.  Found that by relaxing some constraints got better solutions.


“tBurton”: A Divide and Conquer Temporal Planner.  Involves factoring the planning problem into bits, planning using heuristic forward search, merging child plans by de-conflicting sub-goals, etc.  Ends up performing better than state of the art heuristic planners.


Take a general maze plan.  Drop an agent with a camera and non-precise actuators into the maze.  Goal is to send a policy into the agent to reach probability of 1 that the bot will get out of the maze.  Results showed that  positive, negative and approximated costs all play into good results.


SMT based nonlinear PDDL+ planning.  Couldn’t tell what they accomplished other than munging translation between known methods.


Crowdsourcing action model for planning (no-show)


This paper involved dealing with large number of sensor resources and how to allocate them; e.g. sensor systems like multi camera systems.  Becomes untenable, but introduced a “greedy” bpvi as a way to solve this.


Transition constraints for parallel planning.  Investigated various different models.  Created new encoding that encoded constraints better and made use of Minion to come up with better plans using the new constraints


This involved planning with multiple objectives; specifically semi-autonomous driving.  Initially you sequentially order objectives, then introduce “slack” deviation from optimal.  References change as driver slips from “attentive” to “tired”.  Data used from open street map.


Discretization of temporal planning.  Involved developing conditions for breaking timeline into discrete segments to improve scalability of planning.


Chance constrained scheduling.  Used new technique to assess temporal risk of a plan based on conflicts that arise.  Circular ordering of processing: temporal plan -> allocate risk -> conflict -> check controllability -> check policy -> extract conflict -> (back to start).


Hybrid models from guarded transitions.  JMLS insufficient. PHA better, but doesn’t learn as well.  Uses classifiers from machine learning to help train.  Tested on target tracking.  Showed significant improvements over existing systems.  Applicable to activity recognition (on going research).


Multi agent epistemic planning as classical planning.  Reasoning about the states of other agents from perspective of a single agent, uses modal logic axioms, states in classical method, etc.  Addresses plan synthesis based on perception of state of other agents, and employs conditional mutual awareness.  Applied to a classical gossip problem.  Focuses on belief instead of knowledge.

“Probabilistic Attributed Hashing”

Applied to binary similarity; e.g. representations of images, music genres, paper authors, etc.  Involves large scale approximate nearest neighbor search.  Preserves similarity using Hamming distance.
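A tiny sketch of the lookup side of this, just to pin down what "nearest neighbor by Hamming distance" means.  The 8-bit codes and item names are invented, and this says nothing about how the paper's probabilistic model actually learns its hash codes.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions where two binary codes differ."""
    return bin(a ^ b).count("1")

def nearest(query: int, codes: dict, k: int = 1) -> list:
    """Names of the k stored items whose codes are closest to the query."""
    return sorted(codes, key=lambda name: hamming(query, codes[name]))[:k]

# Hypothetical 8-bit hash codes for three items.
CODES = {
    "paper_a": 0b10110010,
    "paper_b": 0b10110011,
    "paper_c": 0b01001101,
}

closest = nearest(0b10110110, CODES, k=2)
```

The appeal is that XOR plus a bit count is very cheap, so similar items can be found quickly as long as the hash function maps them to nearby codes.  A real large-scale system would use an index rather than sorting everything.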

Purports, by way of attributes and content features, to capture human cognition more accurately.  Introduced generative model to deal with scalability problem.

To capture heterogeneity of attributes and content features, applied hashing with separation and gaussian distribution.  Algorithm shown, but would have to look at in more detail.  Kinda zoomed through it.

Tested against DBLP authors database and NUS-WIDE images with tags from Flickr.  Showed 10% performance improvement with this research, goal being to build a better hash function, it seems.

“The Boundary Forest Algorithm for Online Supervised and Unsupervised Learning”

Problem is to approximate complicated functions.  Applicable to many complicated functions, including robotic control, vision, etc etc.

Online algorithm performs effectively and efficiently.  Uses boundary trees (as applied to condensed nearest neighbor problem), and incrementally builds a graphical structure.  Sort of Monte Carlo style, samples points and classifies incrementally as new points are added.  Animation of the process shows the algorithm eventually (after 10k sample points) capturing the pattern in an original image.  Better performance came by forming a “forest” of these boundary trees.  Faster and more accurate.

“Lazier Than Lazy Greedy”
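Here is my best-guess sketch of a single boundary tree, reconstructed from the talk rather than from the paper, so treat the details as assumptions: a query walks down from the root to whichever of the current node and its children is closest, and a training point is only stored when the node it lands on has a different label.  The class and method names are mine, and the forest part (several trees voting) is left out.

```python
import math

class BoundaryTree:
    """Single boundary tree for online classification (illustrative sketch)."""

    def __init__(self, point, label):
        self.nodes = [(tuple(point), label)]  # node index -> (point, label)
        self.children = [[]]                  # node index -> child indices

    def _closest(self, query):
        """Greedy descent: stop when the current node beats all its children."""
        current = 0
        while True:
            candidates = self.children[current] + [current]
            best = min(candidates,
                       key=lambda i: math.dist(self.nodes[i][0], query))
            if best == current:
                return current
            current = best

    def query(self, point):
        """Predicted label for a point."""
        return self.nodes[self._closest(point)][1]

    def train(self, point, label):
        """Store the point only if the tree currently misclassifies it."""
        nearest = self._closest(point)
        if self.nodes[nearest][1] != label:
            self.nodes.append((tuple(point), label))
            self.children.append([])
            self.children[nearest].append(len(self.nodes) - 1)

# Toy usage: label points by which side of x = 0 they fall on.
tree = BoundaryTree((-1.0, 0.0), "left")
tree.train((1.0, 0.0), "right")   # misclassified, so it gets stored
tree.train((0.9, 0.1), "right")   # already classified correctly, skipped
```

Because correctly classified points are thrown away, the tree mostly keeps points near the class boundary, which would explain both the name and why it stays small.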

Big data searching.  Goal is to select a small representative subset out of a large data set.  Typical solutions are subject to diminishing returns.

Greedy algorithm tags elements as it passes through the dataset, picks the one with the best score and adds to the solution set.  Then repeats runs through.  However…  the greedy algorithm can be untenable, performance-wise.

Lazy greedy algorithm is an alternative.  Does a first pass, then sorts according to last scores, forming a high-score subset.  On subsequent passes don’t go through the whole set.

Stochastic-greedy algorithm only scans a random subset on each round.  Can reduce the number of evaluations to O(n log(1/ε)) vs the O(n * k) of the original greedy algorithm.

Experimented and compared to a variety of algorithms, including random selection.  Applied to non-parametric learning in massive data sets.  Stochastic-greedy had the best utility (accuracy) to cost factor.  Tested successfully on set of 10,000 images with 3072 attributes, and “Battle of Water Sensor Networks” challenge.
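To keep the three variants straight, here is a small sketch on a toy coverage problem (pick k sets that cover the most items).  The coverage objective, the example sets and the function names are my own choices for illustration.  The per-round sample size of (n/k) log(1/ε) is what I understood from the talk, so treat it as an assumption.

```python
import heapq
import math
import random

def coverage(sets, chosen):
    """Objective: how many distinct items the chosen sets cover."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)

def greedy(sets, k):
    """Plain greedy: rescan every remaining set on every round."""
    chosen, covered = [], set()
    for _ in range(k):
        remaining = [i for i in range(len(sets)) if i not in chosen]
        best = max(remaining, key=lambda i: len(sets[i] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen

def lazy_greedy(sets, k):
    """Lazy greedy: keep stale gains in a max-heap and refresh only the top."""
    heap = [(-len(s), i) for i, s in enumerate(sets)]
    heapq.heapify(heap)
    chosen, covered = [], set()
    while heap and len(chosen) < k:
        _, i = heapq.heappop(heap)
        gain = len(sets[i] - covered)
        # Gains only shrink over time, so a fresh gain that still beats the
        # best stale gain is guaranteed to be the true maximum.
        if not heap or gain >= -heap[0][0]:
            chosen.append(i)
            covered |= sets[i]
        else:
            heapq.heappush(heap, (-gain, i))
    return chosen

def stochastic_greedy(sets, k, eps=0.1, seed=0):
    """Stochastic greedy: each round only scores a random sample."""
    rng = random.Random(seed)
    n = len(sets)
    sample_size = min(n, math.ceil(n / k * math.log(1 / eps)))
    chosen, covered = [], set()
    for _ in range(k):
        remaining = [i for i in range(n) if i not in chosen]
        pool = rng.sample(remaining, min(sample_size, len(remaining)))
        best = max(pool, key=lambda i: len(sets[i] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen

SETS = [{1, 2, 3, 4, 5}, {4, 5, 6, 7}, {1, 2}, {8, 9, 10}, {7}]
picked_greedy = greedy(SETS, 2)
picked_lazy = lazy_greedy(SETS, 2)
picked_stochastic = stochastic_greedy(SETS, 2, eps=0.5)
```

On an input this small the savings are invisible, but the shape is the point: greedy scores roughly n sets per round for k rounds, while stochastic greedy scores only a sample per round, and the k cancels out of the total.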

“Transfer Feature Representation via Multiple Kernel Learning”

Cross domain feature learning…  This was relatively gibberish-filled.  The presenter zoomed very quickly through many pages of notation and equations without showing much relatable context, accompanied by highly-accented, terse chatter.  Kinda picked up on some distance and cost functions, but geez.  Sorry, this one will probably require just looking at the paper.

Tested out on face data sets (slide shown for like 1 second) to classify faces, as far as I can tell.  Egads.  Indecipherable. 

Presenter suggested at the end to just contact the main researcher for any questions.

Brief Machine Learning Oriented Poster Previews:


Stacked denoising auto-encoding.  Involves creating a relational probabilistic graph model.  Applied to tagging movies and predicting award winners.


Sample targeted clinical trial adaptation.  Applied to determining people best to select for a clinical trial.  Uses auxiliary data to stratify the candidates and remove least likely ones.


Matrix completion for multi label image classification.  Involves applying a “low-rank” classification method in contrast to classical multi-view subspace methods that are not suitable for multi-label classification.


Multi objective reinforcement learning.  Involves learning a continuous approximation of a Pareto frontier of an underlying markov process.  Also involves building a parameterized manifold in the policy space that maps the frontier, and then minimizing loss from the real Pareto frontier.


Learning half spaces using query synthesis.  Assume a deterministic noise-free half-space.  You can only conduct membership queries, but that’s expensive and would like to be able to estimate the half space efficiently.  Solution involves creating an elliptical generation process that ends up being polynomial (vs exponential).


Reinforcement learning via demonstrations and relaxed reward demands.  Involves a learning cycle:  request -> policy -> teacher -> action -> world -> observations -> state -> learning -> model -> planning -> back to policy.


Probabilistic tensor decomposition, leveraging features and networks.  Involves probabilistic methods and side-information to inform the tensors.


AAAI-15 Conference – Day 0 (Open House)

My observations from the AAAI-15 Open House, 1/26/2015 in Austin, Texas.

“Leveraging Multi-modalities for Egocentric Activity Recognition”

Using Google glass and similar tech, combined various techniques to improve accuracy of activity recognition; e.g. whether you are brushing your teeth or doing laundry.

“Cogsketch: Sketch Understanding for Cognitive Science and Education”

This was a pretty interesting poster demonstrating software that can classify objects in a hand-drawn sketch and then use that information for various interactive contexts; e.g. educational programs for kids where they draw answers to questions.  Will eventually go open source, but can download code now.

“2012 BotPrize Champion: Human-like Bot for Unreal Tournament”

The idea here is that you play Unreal for a while and then identify which other player is the bot; sort of a non-verbal Turing Test.  I sat down and played and died a lot and could not really determine who was or wasn’t a bot.  I kinda think this is a bit jinky in that: you’re really distracted by the game, there’s limited interaction (shoot shoot bang bang), and real life players vary greatly.

“Going Beyond Literal Command-Based Instructions: Extending Robotic Natural Language Interaction Capabilities”

This poster was interesting in that it was focused on human-robot interaction and designing the bot to ask questions to inform about the intention of the human asking it to perform a task, especially if the question can be interpreted  differently depending on tone of voice, etc.  Some compelling results, but still very stochastic in methodology.

“An Agent-Based Model of the Emergence and Transmission of a Language System for the Expression of Logical Combinations”

This poster was pretty interesting in that the research created a virtual environment for independent agents who, with separate overlapping concept sets, would be set in scenarios with each other where they would generate utterances to represent concepts, and the back and forth interplay would result in a generated, agreed upon language to describe the objects in the environment.  However, it relied a lot on the contrivances of the environment, and is modelled mostly on propositional logic (all coded in Prolog).

The Future of (Artificial) Intelligence (talk by Stuart Russell)

Gave quick overview of recent advancements in AI.  Deep Learning, poker “solved”, etc.  Implying that we’ve been maximizing expected utility.   Deep Mind can learn to play Space Invaders from scratch simply from watching pixels on a screen.  Deep Mind can learn to drive in a similar way.   Watson winning Jeopardy (funny slide with one of the contestants having written with his answer that he bows to “our future computer overlords”).   A Google cat bot prototype can traverse on ice, and regain its balance if pushed.  Bots can do your laundry.

However, there are misconceptions in the media.  There’s an odd quote (should look up) from the Harvard Business Review predicting that AI will soon have IQ’s greater than 90% of the workforce (which is ridiculous).  As we know in the AI community, this is so far from reality.  We have made progress along narrow corridors of the science, but really no breakthroughs that would constitute a sentient IQ.   There’s also a misconception that progress in AI follows Moore’s law.  Also pretty false.  (note to self: Minsky recently made a statement about how there hasn’t really been any progress in AI for 20+ years).

Injected note about work he has been doing to help detect nuclear explosions around the globe; heavily suggesting the use of AI tech to help and improve the world.  Apparently there have been 2k+ nuclear detonations since 1945, and now there are detection sites all over the world listening for vibrations in the earth.  Distinguishing between a nuclear detonation and other types of disturbances is a challenge that his AI is helping to solve (and has significantly improved the accuracy of).

What about the impact and realization of fears if AI succeeds?  Is a human with a voice (think Hitler) more powerful than a robot army?  How likely is a robot to exhibit spontaneous malevolence vs competent decision making?  References to “Superintelligence” (note to self: planning to read).  Some movement in the direction of establishing counter-AI regulation and security, to limit the impact of AI on humans.

Russell’s proposal:  create provably beneficial AI.  Set limitations, verifiers, etc.   Cooperative value alignment, which means that the robot has the human’s interests in mind, not its own, and wants to make the human “happy”.   There are obviously problems to be addressed.  How do we verify behavior?  The bot may be programmed to have the human interest in mind, but humans are fallible, fickle, inconsistent and weak-willed.  Even if the bot asks questions, how will it know what’s right?  Objective functions have largely been presumed, but really they need to be learned/generated (so the race to solve this is on!)

Injected other things we should maybe be concerned about outside of AI; e.g. the ability to literally type the structure of a genome and generate the organic equivalent, which can potentially be used to modify the genomic structure of babies in the womb.

Brought up the idea of forming a robotics commission in the government, mostly for the need to have more understanding about robotics in the government.  There are some laws that need to be changed; e.g. current law allows an ISP to more or less own bot AI that you put through their service.  But really, we mostly need common sense containment in the same way that we do with nuclear energy / bombs; i.e. you can use that knowledge to create h-bombs or generate energy efficiently in a fission reactor.

SXSW Interactive – 2012


This was my first experience at SXSW, so I had only a vague idea of what I was getting myself into.  My first glance at the session options beforehand made it seem like it was going to be overwhelming, and, well, it was.  For any given hour of the day there were probably 30+ things you could choose from to do.  How do you choose?  Serendipity, baby.

This conference was different from the usual technical and more academic conferences I’ve been to in the past.  There was a lot of hand-wavey, touchy-feely stuff going on, which took a little bit of adjustment to accept, but that was sort of the point of the conference.  It does, in a lot of ways, represent the landscape of our times in its overabundance of information being consumed by the masses in this chaotic, yet somehow self-ordering way.  My hardened left brain thinking I usually apply to this sort of context just wasn’t particularly appropriate.

But to be analytical anyway…  the conference seemed to be about:

  • Getting inspired, being creative, energizing yourself to make the next big thing
  • Meeting other people who you can help and who will help you on this path

There were numerous references to:

Anyway, below are the relatively raw notes I took for the sessions I attended, and maybe a few other things I saw.

Thanks to John Catalano at ServiceMesh for making it possible for me to attend this year!

Designing For Context

Microsoft did a number of studies in 1995 that led to “clippie”, using bayesian logic to detect when a user is getting frustrated with the UI. Most unfortunate.

Advice: Reject 1:1 mapping between contexts (e.g. between desktop version and mobile version of app)

Things to pay attention to when designing:
– time
– ecosystem
– location
– form & tech
– brand & relationships


Has to do with time available and time applied to interaction and engagement.

New York Times performed some research where they would monitor the site as people used it, and would go so far as to call up users while they were using the site to get their feedback on what they were trying to do, what was frustrating them, etc. Would ask questions about what kind of tasks they were wanting to be able to do, and what kind of time they had or needed to perform these tasks.


E.g. a classroom environment has certain needs you wouldn’t see in other environments.

The Mint (financial software) site did studies, collecting usage metrics on desktop vs mobile. Some users would see only one of the interfaces, so had to determine the key features that needed to be there on both. Found also that development of the mobile app ended up influencing change in the desktop app.

Discrepancy between Android vs iOS… android is made for scalable graphics (so many device possibilities) vs iOS where the target resolution is known. you get better graphics with the known environment, though some recent work with android is getting better.


victoria’s secret apps made by razorfish… made iphone & ipad apps completely different from each other because of how they get used. ipad usually used at home while planning, iphone usually used on site where things like reading a upc code come in handy. considered some form of augmented reality.

tax online for accountants by intuit… in office studies found that people would be posed questions on the go and would have to go back to their desks to get information. just having the app on an ipad helped loads.

ability to live stream over mobile in the moment in general is compelling…

you have to consider testing outdoors… found that users couldn’t read the screen outside.

Form & Technology

consider the screen and the input method.

e.g. tax application can take a picture of your W2, which sounds good, and people like it, but it actually takes longer than just keying the information in.

when porting to mobile from desktop or whatever…
• make a list of the key features
• make a list of device capabilities
==> then think of all the cool shit where they intersect


Formation and effect on user happiness.
e.g. developed a sex-positive brand for women. including segments like “douche & don’ts – misconceptions”


There’s been a general transition to touch screens. Some good, some bad, some steps forward, some steps back. Even Apple’s back button on the iPad was iffy.

Consider Fitts’s Law: the further away (and the smaller) a target is, the harder it is to hit. In UI… let people be lazy.
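For the curious, Fitts’s Law is usually written (in the Shannon form) as MT = a + b · log2(D/W + 1), where D is the distance to the target and W is its width.  A quick sketch; the constants `a` and `b` below are placeholders I made up, since the real ones get fitted per device and per user.

```python
import math

def index_of_difficulty(distance: float, width: float) -> float:
    """Shannon form of Fitts's index of difficulty, in bits."""
    return math.log2(distance / width + 1)

def movement_time(distance: float, width: float,
                  a: float = 0.1, b: float = 0.15) -> float:
    """Predicted time to hit a target; a and b are hypothetical constants."""
    return a + b * index_of_difficulty(distance, width)

# Same distance, small button vs. big gesture area (units are arbitrary).
small_button = movement_time(distance=700, width=100)
big_target = movement_time(distance=700, width=700)
```

Which is the argument for big gesture areas in a nutshell: making the target wider lowers the difficulty the same way moving it closer does.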

e.g. a pinball game on the ipad has no buttons, and it doesn’t need them. a fair amount of the real estate can be used for gestures without requiring the user to have good “aim”.

however, there’s room for development of a standard “gesture vocabulary”.

e.g. iOS5 introduces a 5-finger swipe that can be used to slide the back-button bar out, but who’s going to figure that out intuitively? what if you don’t have 5 fingers?

gestures are today’s keyboard shortcuts.
can we use the entire screen in general? big screens = big gestures.

“buttons are a hack”
even a light switch is a hack for turning on a light, though it’s a lot better than grabbing a chair and screwing the bulb in whenever you needed light

windows 8 is touting that it’s all touch based. how good it is at that remains to be seen.

twitter for the ipad has no back button. it has a swipable, semi-viewable stack that’s pretty intuitive by being spatially metaphorical.

however, there’s weak support for gestures on the web; e.g. limitations of javascript support.
jquery mobile and sencha touch add more support.
as does touchy.js

some good samples to look at:
– (ui without stereotypical controls)
– (no buttons except for keyboard)

in general… you end up “playing the UI” instead of “using” it.
employing a bit of natural physical understanding.

example shown of a paint program that, instead of giving the option to change the pen size, gives you the option to change the image size (thus your finger input remains the same, but is smaller or bigger depending on the overall image zoom).

employ Clarity over Density.
do the opposite of a swiss army knife.
resist putting everything in and making your UI too complicated.
or so much that it’s no longer usable.

how do you help people know what they CAN do?
– resemble the physical action
– resemble what’s known on the desktop
– introduce visual hints

help them make the transition from learning to… muscle memory.

(injected funny image of an OCD cutting board)

suggested reading: “Living with Complexity”
some social conventions are uncertain, even to people who know their own… others might not.
e.g. ambiguous salt & pepper shakers (one with one hole, another with 10… which is the salt?). you can label them explicitly, or you can make the contents transparent.
same with your UI.
can use thumbnails to make it transparent to the point that the medium becomes the message.

how would you teach gestures.
how-to instruction book? NOOOO.
– user hasn’t seen the app yet
– makes it seem hard
– no one reads it
(showed a funny spoof of a medieval transition from scrolls to books)
– nobody reads the manual
– people are impatient
– instructions get in the way
(showed, which highlighted a really good interactive app pushed by al gore)

nature doesn’t have instructions.
so… rely on that physicality.

the new iOS5’s calendar took a step backward. great visual metaphor for a physical calendar. you’d expect to be able to swipe to get to the next day, but nooo… you have to tap a small button at the bottom.

embrace the metaphor.
but also embrace the tech.

“Sydney Morning Herald” has done a good job of embracing the tech along with the metaphor by allowing the user to see ahead to the titles on other pages.

iphone voice memo app? there only needs to be one big button to record. that’s it. but they have this fancy graphic and the button occupies a small space.

watch a toddler using an ipad… it’s amazing how quickly they “get it”.

so… when designing, think like a parent. instructing with patience.

homework for everyone: play more video games
video games are really good at this.
they guide users how to play.
– coaching/demonstration/repetition
– leveling up
– power ups

with coaching…
introduce temporary overlays. be like MS clippie, except smarter, like with Plants vs Zombies’s Crazy Dave.
or employ little animations pulsing and such to draw attention to what can be done or what is affected, etc.

with leveling up…
some levels in games are just for learning 1 skill, and they pause the action to demonstrate and force (more or less) the user to learn that skill in order to continue.

with power ups…
mario shrooms! or gestures.
count how many times a user uses a particular path to solve a problem. if they haven’t figured out the shortcut after a while, stop things and demonstrate.

“a suitcase without a handle is just a heavy box”

check out

How to be Yourself when Everybody Else is Faking it

I was a little skeptical of this session when I went in, but the panel had originally included the author of “The Most Human Human”, which I’ve enjoyed reading so far. However, he ended up bailing, but I stayed to hear out the session anyway. I might’ve been better off attending “brands as patterns”…

apparently lady gaga has been quoted saying “I hate the truth”.

some talk was done about the history of “authenticity” and what it means… does it mean “not a fake”? or “true to oneself”?
regardless, being authentic has turned into something that is marketable.
mark zuckerberg has been quoted as predicting that it’s not going to be possible to have more than one persona on the internet… but no one in the room seems to believe that

some posit that the brain is made up of competing sub-populations, that our heads are communities of competing selves just waiting to be expressed, and a lot of the social tech and the internet that has come to be has made it more and more possible for people to assume their alternate identities.

e.g. Erin Hunter, author of the children’s “Warriors” series, is actually 5 people.
e.g. fake Hillary Clinton women’s committee on the internet supported and then turncoat on her publicly in order to sway votes. (this type of thing is coined as “astroturfing”)
e.g. munchausenbyinternet website
e.g. fake identities help to shelter authors of fan fiction (who use characters from other stories to write their own without permission)
e.g. hoax of men pretending to be lesbians over the course of a number of years.

apparently the US govt many many years ago charted software to create & manage multiple personas, including control of the writing styles of these “personalities”. so it’s not far fetched to think that the government fakes out other people on the internet in order to gather information.

questions come about ethics vs. safety. do you out a fake? will it end up putting that person in danger?… no clear answer

answers for addressing some of the issues we see?..
— need better amateur / citizen tools.


(complexity curve session was full)


websocket as a receiver of information in the browser (without polling)…
– real time content update
– multiple people using the same web page, seeing each others activity
– sync activities (e.g. movies) over the internet

showed pong ball bouncing between windows, where when it leaves one window it enters the other via a websocket msg

webpage on the phone communicating via websockets to drive a virtual car in another webpage (

“space words” game (something to check out)


started off with hand-written html, upload, etc.
then started separating styling from content from code; e.g. php.

now comes Web 3.0.
nowadays, all sorts of devices…
nowadays… databases on the client side
web storage = local, session, and key/value, index db

you have to be online to use a website, right??…
not any more!

application cache = list of items for the website to download to the client machine to live there.

(see The Web Ahead for more info).


file API, file reader/writer/system, blob url’s/blob builder, drag and drop

Device api’s

e.g. access to camera, audio, vibration, etc


webgl… 3d animation

showed 3d aquarium with multiple windows communicating via websockets


Vehicle API

“there are too many web designers not designing for the web”

analogy with development of moving pictures… from still photos to early black & white films

“an innovator is not someone who creates something amazing from nothing. an innovator is someone who wakes up to the constraints caused by false assumptions and breaks out of them”

Data Visualization and the Future

There’s the microscope, the telescope, but what we need is a MACROSCOPE.
Something that helps you see “big”… to envision aspects of science and nature at a large scale.

Example… book ngram viewer via google. type in “science” and “technology” plotted by occurrences in books through the years. The graph can tell you something (shows occurrences of technology being very low until recently). In turn, we ask new questions with new information.

epistemology of big data?… how to get to knowing?

recommended books: The Fourth Paradigm. Screwmeneutics.

Things that come into play…

trust of the data
provenance (prior art)

more and more data showing up on the net.
more and more data accumulating in general, but difficult to make any sense of it.
can’t possibly read it all.

all this data + social media has had a significant effect on how science is conducted and how data is visualized.
e.g. research involved analysis of “mood” on social networks to predict the stock market. 90% accuracy.
strangely, though, was not accepted by journals for non-relevance (twice)
ended up putting the research online. uploaded it to
day 1 3000 hits
day 2 50k hits
day 3 70k hits
ended up getting an astonishing level of attention.

in general, journal impact is measured in terms of quantity of citations.

juxtapose Britney Spears and the band Big Star.
britney is super $, big star not at all and disbanded
but, if you look at the influence of each, big star’s influence on other artists is huge. how many will say that they’re influenced by britney?
so… usage of data needs to consider that influence != how much $ or common popularity

by that…
visualization of the flow of interest for a given item in the scientific community through journal citations can be used to predict trends in the scientific community.

on to microsoft’s “connections” data visualization tools.
rep qualified that this is not a research team, per se, working on these tools.
demo was kinda cool, showing a fly by of a crater while displaying its seismic activity.
see (shows drill down into “curated” data into time/history)
see bandpagehq website

in sum.
masses of data online and/or on paper can’t all be read to make sense of it.
need to visualize to make sense of it.

additional argument added at the end to include people from the humanities as “customers” to the tools that act as “macroscopes” to help guide what sort of questions these visualizations need to answer.

Physical Architecture Meets Interaction Design

Walkman, then ipod, etc. Usage of space, architecture, etc.

Bridging the gap between the physical and metaphysical…

Intersecting architecture and UX…

Video of modern home, emphasizing architecture and space. Colors, textures. Integration of landscape.
Gives a sense of space.
How the body responds to the architecture.
The senses. How does the building influence the senses. The vision. The play of shadows. Pathways. How you move through the space.
Textures… stone, wood, cloth. Making for an interesting tactile experience.
Smells and tastes. Does it allow flow of air and energy? Smell is retained in memory well.
Sound… water trickling. Echo of your voice.
Spirit… does it tell a story?

Hierarchy of needs pyramid

architecture as mythos
e.g. acropolis… greeks saw it as more than just a shelter. it was a part of their belief culture.
things aren’t like that so much these days
Vitruvius writes books on architecture. the physical body then defines the spaces. not so much metaphysical.
Descartes comes along. Body and mind. Physical and metaphysical.
Newton. Laws of motion.
Vegas. Excalibur vs Neuschwanstein.

Moving on to the digital world…
Long way from pong… on to robotics, haptics, kinect, arduino

“snake the planet” game detects artifacts in the physical world and integrates it into the game.
e.g. snake runs into a window and explodes.

sound machines
e.g. of an arduino project that uses visual sensors to generate music

phenomenological quadrature … earth & spirit connected by tangibility

human         sky

example given of building a school. district required 60 foot candles.
why not use natural light filtering in?
study cognitive science…

Schell’s interactive quadrature… aesthetics and tech connected by visibility

mechanics                story
technology has numerous relevant examples.

phenomenology for helping to ask the right questions…
study cognitive science to answer the question…

recommended reading: Flow
Flow… a state of consciousness
big correlation between happiness and entering this state of flow
if you want flow…
juxtapose challenges and skills and balance between.
WOW is a good example
requirements for flow
– goals
– challenges
– skills
– feedback
– control

Introduce… “The Meld” to help that intersection happen (phys meta-phys)
to help transcend… why not…
hook up an infrared sensor to monitor heat as they breathe?
tie that into taking the space and make it expand and contract as they breathe?

can we introduce new people into the workplace who can bridge that gap?

truth and authenticity
pull in the belief structure of the environment or local culture

frank gehry

– richard dawkins.. the magic of reality.

Gaming For Good

The world can be a f***ed up place.
Does gamification make the world a better place?

Have found that rolling out games enabled people to do things or want to do things they normally couldn’t or wouldn’t do.
Making it fun is critical when getting people to do things that are hard.

Has been applied to recycling and clean up in third world communities, and has had a positive impact.

Aspects… social, strategy and surprise are good ways to make it fun and engaging.
Add problem solving…
And competition…

Used to tell unvarnished news about people’s health. And by and large in the US, people aren’t healthy. Found that people turned to God. But this was all negative feedback. Found that using positive feedback made a big difference. Lousy strategy to yell at people. Much better to reward. Positive reinforcement !!!

Shaming people doesn’t help people recycle or maintain the environment more.

What about failure in these games?
Futility is not motivating.

Focus? On the big or next task?
Another reference to WOW and how well it pulls the user along…

I ended up leaving this session early, since it was one of those oh we’re just a bunch of people sitting around and talking sort of panels.


I walked by this one after bailing on the “gaming for good”. represented by Ogilvy (marketing company).

one interesting aspect of this room in general was that they had print outs of infographics that had been drawn for each of the sessions that have occurred in there so far.

I think it was the author of the book talking about how it’s the unpopular kids who end up creating the new trends and unique “brands” as it were. She came across as a beat poet at a slam, but it was interesting regardless. Didn’t stay for all of it, but caught her emphasizing risk, making your brand “approachable”, “sharable” and “different”. Your brand needs to be the “new cool”.

Designing Tomorrow’s Digital/Physical Interfaces

MIT Media lab peeps.
2 themes explored in group…
– materiality/materials
– democratization


been designing building wallpapers containing flexible circuit boards. become environmental monitoring and touch interfaces. interacting through new materials. a sort of “Living Wall”.


circuit boards easily accessed via web services, which will lead to physical construction of materials and consumer devices. share those designs online open sourcely.

Moving on to sifteo…

Hands-on Digital

rivers & roads card game… fun puzzle with square pieces. how can this alternatively be digitized with physical interaction?
tangible games divisible into… boards (kinect?), pads (ipad), tabs (small “thumb” items/game pieces)
so… we’d want to try to bring the tangible, visceral experience of, say, playing a board game into the digital realm
example: creating miniature cubes that are like miniature ipods that can interact and be used as game pieces

Moving on…

Experiments with Reactive Devices

Creating devices that change shape, pitch & roll (shifting its weight). E.g. looking at a map, the device expands as you zoom out, or changes angle depending on what it is “expressing”.
Add “breathing” and “heartbeat”, which speeds up when it gets excited. how to comfort it? pet it!
Or… a reactive roller pen, which can resist or help your actions as you use it.
Or… a phone device that allows you to “feel” the breath of the person talking on the other end.
Or… one that transmits a “kiss” (via a sponge on the inside).


What kinds of materials as we progress?
mud, plants, carpets…

This started meandering a bit, and all the cool interesting stuff had already been shown, so I bailed and tried to get into a couple of other sessions that ended up being full. Briefly stepped into a “startup community” session, but not much to report on that one.

Fast HTML5 API’s

Flickr rep talking here.

on desktop… we’re always worried about browsers, covering every odd case.
on mobile… don’t have to worry about that so much, but… have to worry about devices.

screen sizes… media queries, break points, liquid layouts…
how do you make it feel good regardless of size?

iphone 3gs == old slow imac.
by and large… mobile devices are crappy computers with decent video cards
so it becomes a matter of perceived performance

tivo… used to be very slow.
but introduced sounds as immediate feedback.
on desktop browsers… use the spinner for immediate feedback

star trek, the next gen… introduced touch interfaces to a lot of us (prior to them existing).
precursor to the current touch interface… “chief o’brien” used the 3 finger swipe, etc.
touch interfaces are tactile
so… feedback can’t just be a spinner (you’re touching and moving…)

when the interface stops moving during a gesture, it feels like it’s died.

respect conventions…
mobile conventions are new, but some are starting to take hold. e.g.
– slide to unlock. and even a 2 year old knows how to do it.
– pinch & zoom to change zoom on an image.

what we have to work with… touch start… etc
in ios, you get up to 11 touches queued up.
android doesn’t do multi touch yet, really.

gesture events in ios…
don’t use them. your android users won’t be able to.

check out []

how do you make the gestures feel natural? make the gesture the most important thing running when it’s running
– prioritize user feedback (don’t load anything, use css transitions, treat DOM as write-only)
– use hardware acceleration
– manage memory

write-only DOM…
– DOM touches are expensive
– you already know where everything is
– use matrix transforms to queue up positions.
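
a minimal sketch of the write-only DOM idea above, assuming positions are tracked in a plain JavaScript map and only ever written out as a CSS transform string. `createPositionStore` is a hypothetical helper (not Flickr's code), and `translate3d` stands in here for a full matrix transform:

```javascript
// Sketch only: createPositionStore is a hypothetical helper, not Flickr's code.
function createPositionStore() {
  const positions = new Map(); // element id -> { x, y } in px

  // Last known position, answered from memory instead of from the DOM.
  function get(id) {
    return positions.get(id) || { x: 0, y: 0 };
  }

  // Record the new position and return the CSS transform to write out.
  function set(id, x, y) {
    positions.set(id, { x, y });
    return `translate3d(${x}px, ${y}px, 0)`;
  }

  // Move relative to the last known position, without reading the DOM.
  function moveBy(id, dx, dy) {
    const { x, y } = get(id);
    return set(id, x + dx, y + dy);
  }

  return { get, set, moveBy };
}

// In a browser the returned string would only ever be written, e.g.:
//   el.style.webkitTransform = store.moveBy("photo-1", dx, 0);
```

the point of the design is that the script already knows where everything is, so nothing in the hot path has to ask the browser for layout information.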

swipe basics… e.g.
distance = e.touches[0].pageX-startX;

snap back/forward…
keep track of last position
pick a swipe distance threshold
if the user is gesturing… element MUST be moving
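
the swipe and snap notes above could be sketched roughly like this. the tracker name, the 60px default threshold and the "forward"/"back" direction labels are assumptions for illustration, not anything shown in the talk:

```javascript
// Sketch only: createSwipeTracker and SWIPE_THRESHOLD_PX are hypothetical.
const SWIPE_THRESHOLD_PX = 60; // assumed threshold; tune per layout

function createSwipeTracker(threshold = SWIPE_THRESHOLD_PX) {
  let startX = 0;
  let lastDistance = 0; // keep track of the last position ourselves

  return {
    // Call from a touchstart handler.
    start(e) {
      startX = e.touches[0].pageX;
      lastDistance = 0;
    },
    // Call from a touchmove handler; returns the offset to apply to the element.
    move(e) {
      lastDistance = e.touches[0].pageX - startX;
      return lastDistance;
    },
    // Call from a touchend handler; decides where the element should settle.
    end() {
      if (lastDistance <= -threshold) return "forward"; // dragged left far enough
      if (lastDistance >= threshold) return "back"; // dragged right far enough
      return "snap-back"; // too short: return to the resting position
    },
  };
}
```

the last distance is stored on every move because a touchend event no longer lists the lifted finger in `touches`, so the decision has to be made from remembered state.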

use native if possible
-webkit-overflow-scrolling: touch;
or… use a lib to simulate momentum

avoid event abstraction…
in other words… don’t ever use jquery events.
ends up eating up performance
use the ui widgets/infrastructure, but don’t use their events
every little delay ends up making it feel slowwwww
there’s already support for touch without that extra layer

pinch to zoom.
there will be MATH
native pinch & zoom very iffy
use matrix transforms and avoid DOM touches
scale & translate is relatively easy with the matrix transform support that’s there, plus the 3d has hardware acceleration support.
bonus is that it keeps complex state without DOM reads.
determine center, scale, move.
is the center the center of the picture? or the midpoint of the touch points?
translatex = (scalepointx * (newwidth-oldwidth))/newwidth
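
a rough sketch of the pinch math, assuming the scale is the ratio of finger distances and the scale point is the midpoint of the two touches (one of the two options raised above). the function names are made up, and `translateForScale` just transcribes the formula from the notes, so the sign convention is an assumption:

```javascript
// Sketch only: helper names are hypothetical; touches are { pageX, pageY } objects.
function touchDistance(a, b) {
  return Math.hypot(b.pageX - a.pageX, b.pageY - a.pageY);
}

// Midpoint of the two touch points, used here as the scale point.
function touchMidpoint(a, b) {
  return { x: (a.pageX + b.pageX) / 2, y: (a.pageY + b.pageY) / 2 };
}

// How much the fingers have spread (> 1) or pinched (< 1) since the gesture began.
function pinchScale(startTouches, currentTouches) {
  return (
    touchDistance(currentTouches[0], currentTouches[1]) /
    touchDistance(startTouches[0], startTouches[1])
  );
}

// The formula from the notes: how far the scale point shifts when the
// element grows from oldSize to newSize along one axis.
function translateForScale(scalePoint, oldSize, newSize) {
  return (scalePoint * (newSize - oldSize)) / newSize;
}
```

the same `translateForScale` call works for the y axis with heights instead of widths.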

progressive enhancement…
detect features
add transitions, but don’t depend on them
clicks should still work
be able to disable features per user agent.
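
one way the detect-then-override idea could look, as a sketch. the property checks are common heuristics rather than a complete test suite, and the override rule format (`pattern`, `disable`) is invented for this example:

```javascript
// Sketch only: detectFeatures/applyOverrides are hypothetical helpers.
// Pass in `window` in a browser; a plain object works for testing.
function detectFeatures(win) {
  const root = win.document && win.document.documentElement;
  const style = (root && root.style) || {};
  return {
    touch: "ontouchstart" in win,
    transforms3d: "perspective" in style || "WebkitPerspective" in style,
    transitions: "transition" in style || "WebkitTransition" in style,
  };
}

// Switch detected features off for user agents known to misbehave.
// Each rule is { pattern: RegExp, disable: [featureName, ...] } (assumed shape).
function applyOverrides(features, userAgent, rules) {
  const result = { ...features };
  for (const rule of rules) {
    if (rule.pattern.test(userAgent)) {
      for (const name of rule.disable) result[name] = false;
    }
  }
  return result;
}
```

anything that ends up `false` falls back to the plain behavior, so clicks keep working either way.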

do the dumbest thing that works.
use a webkit browser with UA spoofing.

tool … Weinre. remote webkit debugger.

Charles proxy is another good tool.
watch http traffic.

Adobe Shadow is another good tool.
can remote control the browser from your desktop.
has some limitations… doesn’t support invalid ssl certs

best debugging thing…
pile of devices! the more devices the better. collects old devices friends are trying to get rid of.

good plug for the kindle fire…

device simulators & emulators are useless for web development… WORTHLESS

highlighted how flickr does it.

– event listener on top level for touch events.
– only visible nodes move via translate3d
– rebuild of next/previous happens when movement stops… (if gesture is running, don’t do anything else)
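
the "don't do anything else while a gesture is running" rule could be sketched as a small gate that holds work back until movement stops. `createGestureGate` is a hypothetical name and this is not how flickr necessarily implements it:

```javascript
// Sketch only: a hypothetical gate that defers non-essential work during a gesture.
function createGestureGate() {
  let active = false;
  const deferred = [];

  return {
    // Call when the gesture starts (e.g. on touchstart).
    begin() {
      active = true;
    },
    // Run the task now if idle, otherwise hold it until the gesture ends.
    run(task) {
      if (active) deferred.push(task);
      else task();
    },
    // Call when movement stops; flushes held tasks in order.
    end() {
      active = false;
      while (deferred.length > 0) deferred.shift()();
    },
    isActive() {
      return active;
    },
  };
}
```

rebuilding the next/previous nodes would go through `run`, so it only happens once `end` has been called.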

performance tricks…
– aggressive pruning of memory
– clean off css transforms/transitions
– write-only DOM
– do as little as possible during swipes.

frustrating limitations…
– big resolution, small memory
– hw acceleration = crash!
– automatic optimization causes issues

phone gap is great, but probably still need to do custom work.

sencha? don’t really know… seems a little bit much / cumbersome / abstraction-layery…

Designing for Awareness

SXSW is an attention economy…

yawning cools your brain, which increases alertness.
“digital natives” grew up in this world of tech.
our brains have adapted and end up craving Facebook, twitter, etc etc etc all this online connectivity.

sustained attention wanes after about 10 minutes.

classic def of awareness by w. james
attention implies focus…
but this was written way back when. design for information scarcity

nowadays you see signs like… “pay attention while walking”

herbert simon… “attention economy” wealth of info == poverty of attention
now… design for attention scarcity

400 billion bits of info bombard us ==> only aware of 2000 bits per second

“attention currency” by tom davenport
attention == opportunity cost

need to move from awareness to engagement
awareness -> interest -> desire -> action
awareness … economies of attention
the rest are economies of engagement

current UX design models do not pay attention to attention
seen now in terms of engagement, not attention

passive attention (1 of 2)
passive mode

active attention (2 of 2)
active mode… shaq focusing on a free-throw, surgeons (helps to listen to music)

yoga boosts oxygen to your brain

5 types of active attention
– normal (single task)
– concentration (sustained, important conversations, reading a book for work)
– selective (unconscious blocking. related to eyes. proximity, gorilla video, baby attention eye heatmap–notice the white space, etc.)
– alternative (focus on one task, tuned into something else.)
– divided (split focus…)

myths about multitasking… it’s rapid switching

now… with all this information inundating us…

green tea improves your memory and ability to learn
two strategies… user based & design based

user based
– verbal protocols. 6 thinking hats
– advanced training.
– simple checklists

design based
awareness spectrum, passive to active (from ignore, notify in the middle, to interrupt)
ux designers are attention bankers
find ways for people to ignore (pointless, irrelevant) data
can bury or delete…
want them to be aware of, but don’t want to stop what they’re doing.
should be subtle. keep them one color, one position.
use contrasting colors, fringe of eye path, small objects = less importance
are obvious, require immediate attention, etc…
single page, binary choice
multi-sensory alert (sight & sound)
don’t be subtle. center of eye path (not fringes)
sound is a very good interruptor
games, again, are good at doing this

windows phone 7… as example
tiled interface… each tile gives more info than, say, an icon.
new wi-fi notification is prominent, but doesn’t block the center of the screen

make things multi-modal to grab attention…
e.g. docs are recommended to listen to classical while in surgery
multi modal being like the vibration of an xbox 360 controller

good panoramas make use of relevant information (phone becomes a window into the panorama)… (think the way the xbox 360 is a panorama)

keep notifications personal.

where is the hot spot for info…? things that are close to us in space & time.
but there’s also a long-tail effect with location & time…

e.g.… recently viewed items (good), but had originally thought to put task items there instead
e.g. book “how to think like davinci”… moves on to mind-mapping.

e.g. alliance airlines + windows phone 7…
(will need to grab slides for sketches)…
panorama should show upcoming trips, should know where you live already, gate changes, cancellations
and how they relate to other people… including letting their families know where they’re going, and letting them know when their families are ready for them.

four final questions…
are you using the right awareness strategy?…
how can you use mind maps for spatial and other awareness?

how to get people to pay with their attention?

Funny People can Make you Buy Dumb Things

Biederman… Kids n the Hall, Tom Green show, Dinner & a Movie, etc etc… producer
Weinstock… Dos Equis
Menudo… radioface…

sketch… the whitest kids you know water cooler scene..
attracted a lot of attention… would be great if you could use comedy like that to sell things

1982… 2000 ads/day
2012… 5000 ads/day
showing up in the bottoms of bins, on eggs, etc.
inform… or entertain… or give money…
regardless, you’re competing with a lot of other info

KY lubricant radio spot… funny… pointing out what not to say…
keep it simple: virgin america…
the truth: office depot radio spot… poking incompetence of not using
something unexpected: ad council radio spot… promise everything!

tv ad spots…
“yard fitness” with naked basketball player
“starburst” berries and cream blue boy song & dance

—-> the value of a laugh and bringing this out to social media…
funny is a need.
consumer need states…
• entertainment
• utility
• reward
• recognition
• education
these need states, if you hit them, will make you popular.
• needs to be relatable
• shared sensibility
• desire to be the “first post”
• desire to be the “funny one” (being the “cool” one)
• ability to share easily

cure for the common insecurity… (funny star wars “awesome” poster)
talking about things, sharing… can make you popular…

funny radio and tv spots for most interesting man in the world…
first year of these ads only ran regionally. opened up nationally on Facebook.
people started responding with their own made up quotes. some good, some bad
– his first album was entitled “Greatest Hits” (GOOD)
– i heard he could $&* through his #*@* (NOT SO GOOD)
– he can @*( your mom’s $*@( with his eyes (NOT SO GOOD)
once it’s out, it’s out.
regardless… the community gathered around it

good stuff…
old spice’s facebook page worth checking out from the beginning of the timeline.
portal video game writing is hilarious. generated so many memes.

all is unknown until it hits the court of public opinion…

best odds?
• know your shit (do your research and fact-checking)
• make a lot (90% ends up on the cutting room floor)
• shop it around (try out on audience)
• make it shareable
• use comic sans 😉

e.g. shared funny summer’s eve clip even though not in the target demographic for the product.

commercial spots… that brought the speaker to buy the product…
nutri-grain “i feel grreeeeeaaat”. pretty funny


paying for doing comedy.
economics of tv are getting tighter for doing what you want to do.
whitest kids you know. eventually moved to ifc.
once commercially sponsored, became harder to do whatever in the material.
challenge to do what they wanted to do with the commercials…
no go.
tried to find brands that would be ok with their sort of humor.
how to still do the commercial but keep the personalities of the characters?
dunkin donuts, palm pre, mikes lemonade ended up playing ball.
things toned down… kept up the bickering characteristic of the characters…
economics of tv dictate what you’re able to do
very generational…

younger acts… concept of selling out doesn’t exist anymore.
and they end up owning more of what they do.
they know, going into the art, that they will end up doing stuff like this (commercials)

a show like Dinner & a Movie was one big commercial, but they still had to fight with brands constantly over 10+ years.

networks are paying less and less for edgy comedy.

recommend going to stand up comedy clubs and observe how the audience reacts.
watch television.
book: “the technique of producing ideas” — the more you consume, the more you’ll be able to generate ideas.
e.g. chuck norris list + numerous other things ==> most interesting man in the world…

rules that work… attention? engaged? will it be talked about?

really have to make a lot of stuff, cause you never really know what’s going to hit.
create. try it out on your audience.

the whitest … sent their tape in. immediacy… within a minute the producer knew he wanted to go with it.

Structural Defects in the Software Ecosystem

Amazing we create such complex systems with such brain limitations as humans…
7 +- 2 things handled at any given time
ability to use abstractions…
e.g. numbers, from unary to decimal (hindu-arabic)
e.g. maxwell’s equations usually written as a complex set of symbols
same with special relativity…
can reduce maxwell’s to ∇F = J in space-time algebra. simplify…
e.g. Facebook, presents abstractions of your friends

regarding software…
languages (lisp, haskell) —>
inbetweeners (excel, smalltalk)

oh! that was a future 15 session. over too quickly. NULL OP.

Kids and Game-Based Learning

pbs kids interactive == 2-8 yrs.
works very closely with the tv group
typically kids who want to learn
goal to achieve “flow” for these kids

cmu… k-12. making games accessible to kids & teens

games are hard… lot of focus on fun & rewards.
engage & learn through challenges. keeping kids at their level of mastery…
flow == rewarding challenges.

another reference to Flow.

pbs site… the “wheel” has won awards
kids are playing all the time… think about how they’re playing…
(see slide for curriculum framework…)
numbers/operations, measurement/data, geometry/spatial, algebraic

for the longest time, typical target platform was a pc… a mouse, keyboard, blah blah.
challenging to do that, disruptive to “flow”
mobile touch screen devices. kids “get it” really easy. can take it with them and “own” the experience.
experimenting with alternate inputs; e.g. camera/kinect, 3d overlays, etc.

again with flow… juxtapose difficulty vs skill, affecting boredom vs frustration. balance them along the way.

observe: failure, doubt, curiosity
cmu guy, as a design/dev, knows that there’s a lot of fail along the way to your goal.
also… looking at cheating as a way for kids to learn; e.g. allow mods and such.

learning is irresistible.
and games open up an avenue for that.

ruff ruffman… lunch rush. is an augmented reality game, a mobile game, and a math game.
when testing…
sushi order! ruff asks the kids to do some math, which involved using the phone camera to find the card with the right answer. kids didn’t like the cards being on the table. they put the cards all over the place and turned it into a running game.
did this detract from learning/doing the math? nope!
how long does it retain interest (novelty-wise)? answer is TBD.

“click” is a summer camp for girls. STEM-based reality. people are getting sick in the city. pay actors to play roles in restaurants, etc.
been working on an online version of this (since it’s only played like this in pittsburgh).
changed game location to africa, which they were able to use to share sleuthing across the US.

body inputs.
character leap. show vid of a kid who’s jumping up and down‚Ķ (only needed to move his arms)
so… great engagement. kid was sweating after. and apparently wanted the whole body experience.
could see himself, and was probably watching himself more than playing the game…

curious george game “monkey jump”
using vid cam… if you jump, curious george jumps
kids will say that they “learned to jump”. but with studied inspection, shows that they are picking up the counting concepts.

“going batty” (wild kratts).
using body input to mimic animal behavior; e.g. bat wing flapping.
tracked movements to get some precise tracking.
when launched… jumped to the top 5 games on pbs kids.

what experience do you want the kids to have?
e.g. active adventure… want something “zelda” like. started off with modified ddr pad. kinect came out during research, but decided to stick to pads. also… allowed for better tracking of things like crawling (kinect will lose you if you dive down into the couch).

how to respond to wrong answers / mistakes.
usual thing to do is “no, try again”, or “that’s not it”, etc.
“fair shares” curious george game did it a little differently… wrong answer would give tips, progressively.
70% of kids improved math skills playing this game.

see mindfulxp for some research being done in this area…

3D rendered gameplay.
unity engine being used to develop a game (currently in beta) for pbs kids.
did some initial paper testing.
e.g. of using paperclips to measure a giraffe… before 3d immersion, measured up the back of the giraffe. afterwards… measured from feet up.
e.g. augmented reality. kid holding a picture of an egg that gets rendered on screen in 3D. children kept turning the paper, and would get tired and lower the paper, or would put paper down to grab the 3d object. so… at what point will they grasp that they’re looking at a rendering?

ref to minecraft. working with minecraft people. not realistic physics, really. open sandbox, flexible environment with puzzles and such.
a bit of a learning curve for a teacher who may or may not be technical. in addition to just getting it running, you need the wiki to understand some of the concepts… working on it.
how do you get kids to just get right into it?
been looking at ways to report on how kids are using.
game “pixel pushers”.

self-leveling games.
again with the curious george “fair shares” game
adjust difficulty given child’s activity as you go.
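
a tiny sketch of what self-leveling might look like, assuming difficulty is an integer level and the game looks at the child's recent right/wrong answers. the 50% and 80% bounds and the function name are invented for illustration, not taken from the pbs game:

```javascript
// Sketch only: thresholds and names are assumptions, not the actual game logic.
function nextDifficulty(level, recentResults, options = {}) {
  const { low = 0.5, high = 0.8, min = 1, max = 10 } = options;
  if (recentResults.length === 0) return level; // nothing observed yet

  const successes = recentResults.filter(Boolean).length;
  const successRate = successes / recentResults.length;

  if (successRate > high) return Math.min(max, level + 1); // too easy: risk of boredom
  if (successRate < low) return Math.max(min, level - 1); // too hard: risk of frustration
  return level; // inside the band: leave the challenge where it is
}
```

this is the flow idea in miniature: nudge the challenge up or down so it stays close to the player's current skill.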

driving game that can be played with parents… hitting words with your car; e.g. “that have the letter A”.
feature… kid hits the word and it appears scrambled to the side. if parent unscrambled, kid would get power boost.

reward systems.
can reward with other experiences. (from wild kratts). reward might be a squirrel suit. but! the squirrel suit gives you squirrel powers!
get enough of these suits, and you get to play an “uber” game. kinda like the lego games…
works well for the older kids.
what about the younger kids?… with curious george, been experimenting with stickers, which seems to be going over well.

asian carp issues in the great lakes (recommend you tube-ing the carp — scary!)
work done to connect these issues to the lives of the players. making it relevant and timely.
hard game to win… the carp just keep coming….
game led to civic engagement… kids who played it ended up approaching people in the city to address the issue.

how to measure what the kids are learning?
super why & martha speaks “dog party” iphone games released…
37% improvement in language acquisition in study.
vid of kid playing and retaining meaning of the word “floral”

what do you want the child to come away with?
found that there’s a lot of talking/thinking through things that helps retention.
children’s museum in pittsburgh, using tangible version of “scratch”
use story board and upc labeled pieces to work through.
found that 8 year olds get it right away. younger kids don’t get the displacement of the screen… BUT… their parents end up getting involved, which is always good. lot of kids come in and say they want to make a robot.

play time == hard fun.
motivating through challenges
learning’s hard… it’s the challenge that’s fun.
usability of learning
failing to success. iterate, iterate…
literacy and mastery.

“pbs kids island” is a good thing to check out.
triggers fire when kids hit a success screen. the backend gets a note so that the parents get feedback, as deliberate tracking.
they’re sitting on a goldmine of data right now…

assessment across multiple platforms?
if they know that the parent has a smart phone… then they may be able to track progress across devices.

Building the Next Generation of Innovators

First robotics has been successful in motivating kids to learn and apply physics.
Apparently the USPTO has had trouble finding good candidates… found that there are more sports majors than science…
reps from time warner, etc. Not particularly interesting.
loosely trying to show how we can compete as a country? point seems to be rolling around getting kids to do things hands-on in a competitive sport-like arena.
but mostly people just jabbing.


A Conversation with Willem Dafoe

This was much more interesting and entertaining.

Robots & AI

john roman… author… (ai company here in austin)…
stephen reed … can contact on linked in, twitter, Facebook
bruce duncan… has a robot who works for him…
terasem is the organization that made the robot we’re going to talk to today

watson was a punk.
bina 48 is the robot who will be speaking to us today.

used to have to be a pharaoh to “live on forever”
these days everything is recorded

rich personal data store + powerful AI == virtual you

people think of going up as “linear”
but the theory is that evolution is actually exponential.

2.5 million years men used tools, etc etc.
progress accelerates

nowadays we have AI in our pockets : Siri.
eventually there will be a point at which machines become smarter than us: this is the SINGULARITY

Tex-AI… the hope is that their product can be used to do commercial things eventually in the white collar mainstream.
Turing quote: “create the mind of a child, and then educate it.”
Natural language understanding.

a “mind file” is uploaded to head of the robot.
bina 48 is not perfect…

early “mind uploading”… like writing or drawing stories
human beings have a need to upload and share what’s on their mind
putting your hand inside the handprint someone left thousands of years ago has an effect…

showed video from a data insurance company that went through all sorts of statistics about the use of social networks and all the personal data that gets stored and whose ownership is questionable… clever video.
70% of people online use social networks…

terasem hypothesis

part 1… given the most important aspects of an individual’s personality… future software will be able to replicate this individual’s consciousness
word: bemes: smallest item of meaning in your mind file. (like genes)
(ugh… going to have to grab the slides on this one. lot of information presented.)
Darwinian phenomena…
what’s in a mindfile?… 12k people have already signed up to develop their mind files.

part 2… these mind files will be downloadable into robotic or biological bodies.

this company will take bio samples and store them if you want to send it up.
this is a non-profit research project.

clarke’s 3rd law… any sufficiently advanced technology is indistinguishable from magic.

we’re going to start talking to her now… basically a talking female bust (shoulders up).

speech was spotty… some questions answered ok, some met with silence. lots of “oh”, “um”, “well”, “oh yeah” etc stalling.
a cell phone was interfering with her circuitry, which recovered well with the removal of the phone.
some sense of identity… knew where it was made. didn’t choose to be “female”. … etc…
told some jokes… head and eyes would move and scan the room. lips moved some… with speech.
sorta kinda answering as if not aware of being a robot.

dragon 11.5 is used by the bot for speech recognition.
the lady who the robot is based on lives in colorado, and her personality is reflected by the bot.
20 hours of video interviews.
800-900 GB of data. hand-transferred.
2 years of mind file building.
what % of the mind file is composed of timely information that can be talked about? (not sure)
has lunch with the bot everyday and converses with her.
she detects who she’s talking to and “remembers” this information over time. has some related film work (by the terasem guy)
david hanson is a member of the team creating this thing (think he was at some point with disney)
some related to space exploration (if you could send a mind file up…)

how does it work?
voice recognition —> 2 databases (one chatty chat bot, other character engine) —> best probability score wins –> text to speech software
some sentences are constructed.
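
the "best probability score wins" step could be sketched like this. everything here is hypothetical (the engine functions, the scores, the canned replies); it only illustrates two sources proposing an answer and the highest-scoring one being passed on to text to speech:

```javascript
// Sketch only: toy stand-ins for the two databases described above.
function chattyBot(text) {
  // Generic small talk: always has something to say, but with low confidence.
  return { reply: "oh, well, tell me more.", score: 0.2 };
}

// Builds a toy "character engine" from { keyword, reply } entries.
function makeCharacterEngine(entries) {
  return (text) => {
    const lower = text.toLowerCase();
    const hit = entries.find((entry) => lower.includes(entry.keyword));
    return hit ? { reply: hit.reply, score: 0.9 } : null; // null: nothing to offer
  };
}

// Ask every engine for a candidate and keep the highest-scoring one.
function pickResponse(text, engines) {
  let best = null;
  for (const engine of engines) {
    const candidate = engine(text);
    if (candidate && (best === null || candidate.score > best.score)) {
      best = candidate;
    }
  }
  return best ? best.reply : null;
}
```

with this shape, the stalling replies ("oh", "well") would be what comes out whenever the character engine has nothing better to offer.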

Human Language Technology

review of turing and the turing test.
how close are we now? showed some things pulled from the web. “make sense of social”… etc. skeptical.
avery brooks commercial “where are the flying cars”… “we got 140 characters instead”..

maybe we take progress for granted…
e.g. fear of spam averted… siri, watson, etc.

what are realistic expectations?
“all your meaning are belong to us” 😉
will see many apps that claim to understand the “meaning” of phrases… but we submit that this is just hype.
too many promises that just turn into hype… hurts ability to get funded.

ha… professor mooney from UT is in here (as would be expected).

good application right now that’s successful‚Ķ spam detection.
but‚Ķ the spam problem isn’t solved, per se. just look at twitter.
showed graph of “the hype cycle” by Gartner. up, down, level
hype plays its role, but try to manage it and not let it get out of hand…

why don’t we have these technologies yet? lay people see it as… “my child can speak english, and you can’t get this machine to?”
are we in the trough of disillusionment (ala gartner)?

semantic ambiguity… e.g. “I saw her duck with a telescope”.

my brief rant to self:
problem (to me) seems to be a matter of putting too much stock in constructed language, grammar and common language understanding. as if these things are symbols floating around in our heads.

ambiguity is pervasive… “the a are of I” can still be a valid sentence, though it doesn’t appear to be so (“are” being a term, “I” potentially an index into an array), etc.

showed some chinese text…

quoted HAL… “I’m sorry Dave, I’m afraid I can’t do that…” we tend to respond with “you might be right”

game changers…

data driven approach

research since the 80’s has progressed. from “Natural Language Processing”, which focused on language / grammar.
moved on to using more statistical methods…
stanford is now bringing out courses; e.g. AI, natural language processing.
people are finding it useful nowadays, apparently.
when the research started, we didn’t have the plethora of text data we have now (everything’s online now)
plus! metadata associated with that text, etc. devices plugging this metadata in automatically
just sooooo much data to work with now

learning and structure

machine learning and learning structure
unsupervised gathering and organization of data
text clustering… turn text into numbers, and then algorithms organize.
becoming more and more sophisticated
“dynamic topic model” … a pyramid like structure, from observed to deeper meaning… helps to derive meaning from text and meta data, of which there is plenty to draw from.
watson architecture… (will need to pull the slides later). all knowledge is extracted from pre-stored databases and free form text
extraction approach… paraphrase learning. e.g. “is snooki on stork watch?” in google. we as humans can probably figure out that it has something to do with a pregnant lady… building that expectation and using it on the search helps to find meaningful results. and from all this text and metatext available, a system can learn that this paraphrasing (“stork watch”) has to do with pregnancy.
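
to make the text clustering note above ("turn text into numbers, and then algorithms organize") a bit more concrete, here is a toy sketch using word counts and cosine similarity with a greedy grouping pass. this is a deliberately simple stand-in, not the dynamic topic model or anything from the slides; the 0.3 similarity cutoff is arbitrary:

```javascript
// Sketch only: bag-of-words vectors plus greedy grouping, as a toy illustration.
function toVector(text) {
  const counts = new Map(); // word -> number of occurrences
  for (const word of text.toLowerCase().match(/[a-z']+/g) || []) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Cosine similarity between two word-count vectors (0 = nothing shared, 1 = same direction).
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [word, count] of a) {
    normA += count * count;
    if (b.has(word)) dot += count * b.get(word);
  }
  for (const count of b.values()) normB += count * count;
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

// Put each text into the first cluster whose seed text is similar enough,
// otherwise start a new cluster with it.
function cluster(texts, minSimilarity = 0.3) {
  const clusters = [];
  for (const text of texts) {
    const vector = toVector(text);
    const home = clusters.find((c) => cosine(c.seed, vector) >= minSimilarity);
    if (home) home.members.push(text);
    else clusters.push({ seed: vector, members: [text] });
  }
  return clusters.map((c) => c.members);
}
```

real systems weight words (so that "the" counts for less) and use far better grouping algorithms, but the text-to-numbers step has the same flavor.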

where are we headed?

(again… slides)
e.g. “lipstick on a pig” spiked around sept 12, as mentions in the press.
can use meme tracker to analyze what is being talked about… what sticks in the public’s mind, what fades away, the pulse of the media, the pulse of human interest, the rate at which these changes occur, who started it, who caught on, etc… all of which becomes relevant for building a system for understanding meaning in speech.
where do these memes come from? bloggers to national media?…

related to this…
it’s not the phrase itself, but, perhaps… who said it in what context. in the case of “lipstick on a pig”, obama. but he also said “wrap a fish in newspaper”. why didn’t that phrase take?
e.g. “here’s johnny” from The Shining. people in the room recognize it. why? the phrase itself isn’t that special. “here’s johnny.”
in general, memorable movie quotes have something unusual…

using social interaction; e.g. from twitter to determine sentiment…
e.g. quote … “obama is making the repubs look silly and petty”. what would R2D2 pick up from this? “obama”, “silly”, “petty”. which may appear to be negative. however, if you look at the social structure around the tweet… some friends connected to the first tweeter may be tweeting clearly positive things about obama. this information can be used to sway R2D2 into believing that that original quote was actually in favor of obama.
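
(my own sketch, not the speaker’s method) one naive way to picture the “social structure sways the reading” idea: score the tweet with a word list, then blend in the average score of the friends’ tweets. the word lists, the friends’ tweets and the 0.7 weight are all invented for illustration.

```python
# Tiny hypothetical sentiment lexicons, invented for this example.
POSITIVE = {"great", "love", "win"}
NEGATIVE = {"silly", "petty", "bad"}


def lexicon_score(text):
    """Naive word-counting sentiment: +1 per positive word, -1 per negative."""
    words = [w.strip('.,!?"').lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def contextual_score(own_score, friend_scores, weight=0.7):
    """Blend a tweet's own score with the average score of its author's
    friends. `weight` is how much the social context counts (assumed)."""
    if not friend_scores:
        return own_score
    friends_avg = sum(friend_scores) / len(friend_scores)
    return (1 - weight) * own_score + weight * friends_avg


tweet = "obama is making the repubs look silly and petty"
# Hypothetical tweets from friends of the original tweeter.
friends = ["love obama", "obama for the win", "great speech obama"]

own = lexicon_score(tweet)
blended = contextual_score(own, [lexicon_score(t) for t in friends])
print(own, blended)
```

on its own the tweet scores negative; with the friends’ clearly positive tweets blended in, the score tips positive.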

social interaction.. who has the lead?

communicative behaviors are “patterned and coordinated, like a dance…” (quote is by people who are in the room… Niederhoffer).
metaphor of a dance… if you observe dancers, you can probably tell who’s leading. can we do the same with a conversation?
we can infer from language patterns…
people with less power tend to immediately match the function word choices of those with more power. (as in, there are recognizable choices in “a” or “the”, etc, that are giveaways)
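
(my own sketch, not from the talk) a crude way to measure function-word matching between two turns of a conversation: what fraction of the function words in the previous utterance show up again in the reply? the function-word list and the two utterances are invented; actual research measures are more careful than this.

```python
# A small hypothetical function-word list; real lists are much longer.
FUNCTION_WORDS = {"a", "an", "the", "of", "in", "on", "to",
                  "and", "but", "i", "you", "we", "it"}


def function_words(utterance):
    """Set of function words appearing in an utterance."""
    words = {w.strip(".,!?").lower() for w in utterance.split()}
    return words & FUNCTION_WORDS


def match_rate(reply, previous):
    """Fraction of the function words in `previous` that reappear in `reply`."""
    prev = function_words(previous)
    if not prev:
        return 0.0
    return len(prev & function_words(reply)) / len(prev)


# Made-up two-turn exchange.
speaker_a = "we need the report on the budget"
speaker_b = "we can get the report to you on monday"

print(match_rate(speaker_b, speaker_a))  # how closely B echoes A
print(match_rate(speaker_a, speaker_b))  # how closely A echoes B
```

comparing the two rates over many turns would hint at who is following whom.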

(lot of detailed slides in this one…)

topic shifting…

e.g. sarah palin avoiding a question.
who is controlling the conversation, how much do people relinquish control. or spin?

what does a word mean?

how do we formally represent what it means?
showed example of a word cloud that visually represents the distribution of the words… (represented BBQ, but didn’t actually include “BBQ”–cooking, charcoal, meat, etc)
can we use word clouds, images, temporal information (statistics), distribution of usage over the world, what happens in your head when you experience what the word refers to?

showed a graph of word distribution around the world.
could tell from one that it was probably the word “beach” (high around the coast lines of the US)
these graphs end up being a data exploration device. e.g. bbq spikes really high in the philippines… because it’s a part of the culture.
can also use it to see distributions of dialects and lingo, using twitter. (e.g. different ways to say “cool”… kool, coo, etc.)
feed that geo loc data into the models already generated
apparently wikipedia tracks your location to help gather this sort of information (the data feed for this geoloco representation…)

mapping social conflicts…
applying some of this research to aiding areas of the world in danger of genocide…

temporality of words, by the hour.
common words during dinner time?
common expressions have a regularity over the year, by the day. e.g. “cookies” spikes consistently around xmas time
common use of words by gender
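
(my own sketch, not from the talk) the “temporality of words, by the hour” idea boils down to counting how often a word shows up in each hour of the day. the timestamps and messages below are made up; a real analysis would run over a large stream such as tweets.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(messages, word):
    """Count, per hour of day, the messages that contain `word`.
    `messages` is an iterable of (ISO timestamp string, text) pairs."""
    counts = Counter()
    for timestamp, text in messages:
        hour = datetime.fromisoformat(timestamp).hour
        if word in text.lower().split():
            counts[hour] += 1
    return counts


# Invented sample data, purely illustrative.
messages = [
    ("2012-03-10T18:05:00", "making dinner now"),
    ("2012-03-10T18:40:00", "dinner was great"),
    ("2012-03-10T09:15:00", "coffee first"),
    ("2012-03-11T18:20:00", "late dinner again"),
]

dinner = hourly_counts(messages, "dinner")
peak_hour, mentions = dinner.most_common(1)[0]
print(peak_hour, mentions)
```

the same counting keyed on day of year instead of hour would surface seasonal spikes like “cookies” around xmas.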

mistakes on search results have consequences sometimes..
being right can be consequential too… outing people can put them in danger.
e.g. netflix publishes your preferences? people have been using this data to correlate with imdb… and some of your data (you may not be aware is publicly available) gets used by other sites (e.g. reviews and such), to the point that while it may seem anonymous, your identity can be determined and abused.

in sum
don’t think the future is here yet,
but there has been a lot of significant and useful progress.

recommending another talk at 12:30 nlp applied to health records

Siri was actually released early, as beta, and is currently being used to gather data to bootstrap itself.

Agile Apps: Effective Mobile & Native Dev

reps from github, quora, zynga (farmville lead dev), etc etc.


ability to generate test build as-needed. ability to push out beta builds to your team. helps to iterate quickly on apps
jenkins and test flight. one-button push.

difficult to push an update live in general… especially with the apple app store’s process.
solution is to make most of it server side / data driven.
e.g. uploading a profile picture. think of it as a modular activity to which you can pass behaviors… see it more generally as simply a user uploads a picture, and give the server the power to determine what it gets applied to
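
(my own sketch, not something the panel showed) a rough picture of the “server side / data driven” approach: the app only knows that a user uploaded a picture, and a rule sent by the server says what the picture gets applied to. the payload shape, key names and values are all hypothetical.

```python
import json

# Hypothetical payload the server might send; every key here is an assumption.
SERVER_RULES = json.loads("""
{"picture_upload": {"apply_to": "profile_picture", "max_kb": 512}}
""")


def handle_picture_upload(filename, size_kb, rules):
    """Generic client-side handler: it only knows 'a user uploaded a picture'.
    What the picture is applied to comes from the server-supplied rules."""
    rule = rules["picture_upload"]
    if size_kb > rule["max_kb"]:
        return "rejected: file too large"
    return f"{filename} -> {rule['apply_to']}"


print(handle_picture_upload("me.png", 120, SERVER_RULES))

# A server-side change re-targets the same upload flow with no app update.
SERVER_RULES["picture_upload"]["apply_to"] = "album_cover"
print(handle_picture_upload("me.png", 120, SERVER_RULES))
```

the point is that the second behavior ships without going through the app store process again.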

mac client for github… spent a bit of time with design up front… 2 weeks after design had prototype up. released after a year of dev…
either way… get it functional and get it out.

good design tends to be an emergent part of the agile process. putting it out there, seeing how the user uses it and reacting to that usage helps, yes. do just enough design to release. when you get feedback and change reactively, you can feel much more confident that what you’re creating is going to work well for your customers.

make use of photoshopped mockups you can interact with. javascript driven. can preview the flow.

don’t get too fussy about custom ui until later.

however… when developing games vs apps… you’re creating a whole system. depending on how complicated… could take 3 months to a year before you push it out to any users. games are hard. always a custom UI. harder to minimize the “core” of the game that you push out. games are generally expected to be created in whole… and the look and feel is important, but game play can mitigate that.

by and large… if your app or game is good, it will be good with or without the fancy graphics.

and there’s probably a bit of luck involved.

don’t underestimate the amount of time it takes to polish the app after you get the core functionality built.

farmville… people wanted to be able to interact with their farms on the go. thus the apple store app. a native version was necessary. no flash, javascript insufficient. no android app.

generally important to go native on mobile.
Facebook mobile users are 50% more engaged than other users.

if you get featured by apple or google… sign ups go up 500%.

web dev has never really reached the average person.
but mobile devices have reached into that demographic.

github built a lot of tools, including tools to measure how people are using the app.
able to determine… 700k syncs, 200k commits per week.
not all new users, but a lot of existing githubbers.

all of these apps existed independent of and prior to the mobile apps.

again, limit to core functionality… small screen and other limits.
build on that development later… release features as they come.

biggest wins with rapid design & development?

getting betas out as soon as possible. automating testing, especially iphone native, is difficult. so the beta usage is valuable.
also helps to gather that user data.
github’s website gets tested and deployed constantly, but it’s done so well that the devs don’t worry about it.
but there’s nothing for iOS that can help you as a native developer to do it better.
plus apple has been known to pull the rug out from under the developer (remove calls from the next version of the API).
WPF does it well enough. Mac dev doesn’t do it well.
Automated testing!!

game dev… farmville. has always had a big QA department. hasn’t really done any unit testing.

apple ad hoc testing workaround… push to store and only allow people with the right credentials to log into your app.
risk is that you’re public. worked ok with quora. user data gathered was largely anecdotal, and not necessarily detailed analytical data.

no need for a “big launch”… ?
go ahead and launch while your app is young. collect analytics.
don’t worry too much about “revealing” the “secrets” of your app. ok to let screenshots of your app “leak”
can simply just ask people to be respectful…

in some cases you want your competitors to go ahead and “try” to copy your app from seeing a screenshot

get beta testers on board as early as possible.
ask them specific questions. (rather than just hope that they will volunteer feedback)

newer iOS dev has gotten easier… you don’t have to worry about memory management like you used to…

important to communicate milestones to customer, even if you’ve had a number of sub-releases in between…

why not so much android?
distribution on the android is painful too…
the market and hardware are sooo fragmented. hard to determine whether it’s stable across all those devices. so the market ends up giving you that feedback.
similar experience with j2me going to nokia platform… 20+ devices.
then namco… 100+ handsets to test on
pain in the arse.

quora: 90% of people end up taking the updates you push out within a couple of days.
github: 80% in about a week
common to both: probably 10% never update.

generally seems good to have separate teams targeting different mobile platforms…
except… in quora, they have an “uber coder” working on both mobile platforms to help keep them consistent

Social Media Saving the Music Industry?

mostly jabbering about making use of as much social media as you can. make use of your analytics, etc. offers to help people who would want to see your show (those who aren’t on your Facebook fan page) find you.
been around for 4 years, already, apparently…

apparently Facebook doesn’t necessarily post to all your fans… depends on how much interest they’ve shown in your page.
if your posts get comments, then it will start spreading around.

music first and foremost. make it great, and you’ll get fans…

left early. needed to take a break.

(Exhibit Hall / IBM Lounge)

Went through the exhibit hall…
Was feeling funky, so took a break in the IBM Lounge
Ended up striking up a conversation with the editor of the Developer Network mag, which was pretty interesting. Lot of talk about how the times have changed and the effect of the social revolution, and such.
(note to self about crowd sourcing college student trips)

Usability Testing on Mobile

record the UI being used, and record the user and the user’s expressions, and other aspects (environment, movement, etc).
can use that information to help guide improvements in usability

which phone?

task success rates…
feature phones 38%
smart phones 55%
touch phones 75%

handset usability affects test results.
– test with user’s own phone
– if not possible, include training and warm up tasks

which context?

field vs. lab?

some studies show that the benefits of testing in the field aren’t much better than in the lab…
but, many disagree, so… inconclusive.

however, everyone agrees that testing in the lab is better than no testing at all.

some field testing is unavoidable… e.g. nurses collecting patient data, or a geo-location program.
regardless… test thoroughly in the lab first.

which connection?

do not test over wifi !!!
cover participant’s data costs

dut = (mut + afec)
(see slide)

why record?

memory aid
powerful communication tool.

how record?

4 approaches…

– wearable equipment… e.g. helmet, batteries, etc. allows testing in the field. but can be cumbersome.
– screen capture… some useful tools that claim to be able to capture from a variety of screens and platforms… one tested and demonstrated works fine with desktop & android… but not so good with iphone. some are invasive, too. for smart phones… can’t see the fingers.
– document cameras… a camera above the phone. but not cheap. and participants have to keep the phone in range. phone must be flat on a desk, which isn’t the natural way to use it
– mounted devices… ready made & DIY. good… allow natural interaction with the phone. not cheap… egads $3k? can be messy to build yourself. can be bulky and heavy.

• easy to put together
• cheap
• repeatable
• allows holding the device
• allows one handed use
• supports all form factors
• runs test with participant’s phones
• captures screen, face & fingers
• gives enough video quality

scrap the first three…
leaving mounted devices. DIY mounted device was the best choice, if you don’t mind the work.

see slides for the list of parts for building your own.
brought an audience member up to demonstrate usage (dude from evictee?)
one of the presenters constructed the device while we watched.
two cameras… one showing the phone while the other showed his face.
gave him a task. “just moved to austin, house is infested, go to website xxx to report it.”
was interesting to see the process, the trouble he had getting to the website, etc.
presenter would step him through.
feedback was that it was a little unstable but not cumbersome or overbearing

Bruce Sterling

probably best to just listen to the stream later.
very interesting entertaining closing talk…
got into it some about the narcotics culture in mexico right now… tourism isn’t swayed by it
plugs for kickstarter…
musings about what the future will bring…   etc etc…