Kaggle competitions
Kaggle got its start in 2010 by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and artificial intelligence education. Its key people were Anthony Goldbloom and Jeremy Howard. Nicholas Gruen was the founding chair, succeeded by Max Levchin. Equity raised in 2011 valued the company at $25 million. On 8 March 2017, Google announced that it was acquiring Kaggle.[1][2]
source:
https://www.kaggle.com/competitions
https://en.wikipedia.org/wiki/Kaggle
https://www.kaggle.com/learn
https://zh.wikipedia.org/wiki/Kaggle
<---------------------------------------------------------------------------->
Mesin Belajar
Thursday, July 13, 2017
My Curated List of AI and Machine Learning Resources from Around the Web
source:
► https://mesin-belajar.blogspot.com/2017/07/my-curated-list-of-ai-and-machine.html
<---------------------------------------------------------------------------->
Surveying relevant emerging technologies for the Army of the future
Lessons from forecast II
contract no. MDA903-86-C-0059
Surveying relevant emerging technologies for the Army of the future
July 1988
1. united states. army--forecasting.
2. technology and state--united states.
UA23.5.S88 1988
355'.00973
p.4 (pdf 26)
Identifying future technology directions using enabling technologies
The Army's effort to identify future technology directions in Army 21 began with a listing of enabling technologies, described in the Army 21 document as those technologies that are pervasive in nature and contribute to operational improvements for many elements of the total force.2 The list of such technologies includes:
• Integrated circuits
• Artificial intelligence
• Materials
• Robotics
• Biotechnology
• Simulation
• Reliability
• Space technology
• Logistics support technology
The last three enabling technologies were added in later drafts of the Army 21 report. This is the only case we could identify in which a later version of the Army 21 document contained more that was relevant to our purposes than earlier versions.
<-------------------------------------------------------------------------->
• "Data Science", which is the automatic (or semi-automatic) extraction of knowledge from data.;── Yann LeCun (self.MachineLearning).
• the goal of extracting information from data.;── Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman.
• ... and thus discover something about data that will be seen in the future.;── Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman.
• All algorithms for analysis of data are designed to produce a useful summary of the data, from which decisions are made.;── Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman, Mining of massive datasets, 2010.; http://infolab.stanford.edu/~ullman/mmds/book.pdf
► http://infolab.stanford.edu/~ullman/mmds/book.pdf
(( checked: Thur May 13, 2021 [up] ))
(( checked: Tue Feb 1, 2022 [up] ))
<-------------------------------------------------------------------------->
The Department of Defense’s Looming AI Winter
Marc Losito and John Anderson
May 10, 2021
Commentary
frenzy for AI
► http://www.foreignaffairs.com/articles/united-states/2021-04-06/perils-overhyping-artificial-intelligence
development of AI
► https://warontherocks.com/2020/06/the-humble-task-of-implementation-is-the-key-to-ai-dominance/
employ AI
► https://www.ai.mil/docs/2020_DoD_AI_Training_and_Education_Strategy_and_Infographic_10_27_20.pdf
facial recognition
► https://www.historyofinformation.com/detail.php?id=2126
language translation
► https://www.semanticscholar.org/paper/Machine-Translation%3A-Past%2C-Present%2C-Future-Ward/ff333f66c54d827b7ed4a1d0bf690e2ec3570e39
target detection
► https://ai.stanford.edu/~nilsson/QAI/qai.pdf
“a direct and apparent relationship to a specific military function.”
The United Kingdom’s 1973 Lighthill Report highlighted the “pronounced feeling of disappointment” for AI research over the preceding 25 years.
► http://www.chilton-computing.org.uk/inf/literature/reports/lighthill_report/p001.htm
“Fifth Generation Project”
► https://www.nytimes.com/1992/06/05/business/fifth-generation-became-japan-s-lost-generation.html
In October 1983, Defense Advanced Research Projects Agency launched the Strategic Computing Initiative
► https://warontherocks.com/2020/05/cautionary-tale-on-ambitious-feats-of-ai-the-strategic-computing-program/
1985 Packard Commission
► https://assets.documentcloud.org/documents/2695411/Packard-Commission.pdf
In 1993, the Strategic Computing Initiative closed up shop
► https://aiws.net/the-history-of-ai/aiws-house/this-week-in-the-history-of-ai-at-aiws-net-darpa-ends-the-strategic-computing-initiative-in-1993/
The Four Horsemen of the Department of Defense’s Looming AI Winter
• a lack of AI expertise,
٠ AI expertise (lacking)
• too much AI bureaucracy,
٠ bureaucracy (too much)
• the challenge of democratizing AI to the user level, and
٠ get AI to the user (tight feedback loop)
٠ let real world problems (situation) drive the use cases
· practical scenarios about product usefulness
• detective case file
• unsolved case files
· Use cases can be valuable tools for understanding a specific system's ability to meet the needs of end users. When designing software or a system, enhance your development efforts by thinking through practical scenarios about product usefulness.
· Simply put, a use case is a description of all the ways an end user wants to "use" a system. These "uses" are like requests of the system, and use cases describe what that system does in response to such requests. In other words, use cases describe the conversation between a system and its user(s), known as actors.
٠ hire a milk shake business story
· Theodore Levitt: "People don't want to buy a quarter-inch drill. They want a quarter-inch hole."
· https://duckduckgo.com/?q=hire+a+milk+shake+business+story&t=ffsb&ia=web
· if the military hire an AI system, what is the job?
· what job is the field commander trying to address?
٠ does it address the pain point(s)
٠ lesson learned (if failure, what did we learn from this)
· after action review (AAR - internal - redacted for public release)
· after action review (AAR - consolidated - is there a pattern)
• how are we shooting ourselves in the foot, in some ways, each time?
• an old-fashioned approach to AI integration.
٠ integration (AI: liken to a tool, in the kit)
Domain Expertise: The Education Challenge
► https://warontherocks.com/2020/01/the-ai-literacy-gap-hobbling-american-officialdom/
AI Smartcards
► https://smallwarsjournal.com/jrnl/art/commanders-ai-smartcard-artificial-intelligence-commanders-business
slick briefings full of drive-by tech-splanations of AI capability
► https://www.urbandictionary.com/define.php?term=techsplain
Department of Defense bureaucracies have a way of calcifying around programs and people, not actual capabilities.
► https://www.gao.gov/assets/gao-18-592.pdf
shrink centralized groups like the Joint AI Center and the Army Futures Command AI Task Force and instead send their authorities, decision-making power, and resources down to operational units.
► https://www.c4isrnet.com/artificial-intelligence/2020/06/10/the-army-ai-task-force-takes-on-two-key-projects/
► https://www.ai.mil/
AI centers of excellence should be creating processes, policies, and resourcing to facilitate a constant feedback loop between user and developer, with no one in between to garble the message. They should exist as idea-centric organizations at the service level that cut across the warfighting functions, providing the tools, lessons learned, resources, and expertise to help commanders to operationalize AI. The bureaucracy should connect intent, constraints and restraints, and resources — and then get out of the way.
The Department of Defense’s managing philosophy with respect to AI bureaucracy should mirror the computer programming concept called the “self-deleting executable.” It is a string of code designed to allow a program to delete itself.
► https://www.codeproject.com/Articles/17052/Self-Deleting-Executables
think in terms of creating an organization and then setting an egg timer for it to expand and contract, responsive to a direct user-developer feedback loop. The AI bureaucracy only survives as a platform to connect users with development talent, contracting, and computing power. It should shrink, not expand, over time.
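([ a minimal sketch of the “self-deleting executable” idea, assuming a Unix-like OS where a running script can unlink its own file; the file name and function are made up for illustration: ])

# sketch_self_delete.py
import os
import sys

def do_the_one_job():
    # stand-in for the organization's single purpose:
    # connect users with development talent, contracting, and computing power
    print("job done")

if __name__ == "__main__":
    do_the_one_job()
    # once the job is done, the program removes its own file on disk --
    # the organizational analogy: the bureaucracy dissolves after serving its purpose
    try:
        os.remove(os.path.abspath(sys.argv[0]))
    except OSError as exc:
        print(f"could not self-delete: {exc}")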
The Team Room Challenge
team room — where planning turns to action — and ask battlefield operators, “What is one thing you can do better today than yesterday because of AI?” the responses are not encouraging.
get AI to the warfighters on the ground who actually employ these capabilities. AI only works for the military when it’s unleashed against a real problem defined by warfighters, not bureaucrats.
ensures users drive capability development from the field, not the Pentagon.
“field to learn”
► https://mwi.usma.edu/big-data-at-war-special-operations-forces-project-maven-and-twenty-first-century-warfare/
what the department’s leaders have been saying about AI over the past few years
► https://www.c4isrnet.com/intel-geoint/2018/02/17/ai-makes-mattis-question-fundamental-beliefs-about-war/
AI’s potential impact
► https://www.executivegov.com/2021/03/lt-gen-michael-groen-says-dod-must-avoid-ai-obsolescence/
how much money is being thrown at program after program, AI still hasn’t made it into the team room to empower actual warfighters.
► https://www.defenseone.com/ideas/2021/03/solid-plan-compete-china-artificial-intelligence/172360/
Embed AI developers down to warfighting components to ensure that use cases are driven by users and firmly understood by developers. This will guarantee a tight feedback loop to ensure that AI capabilities are hyper-relevant to real, viable needs and that development doesn’t stop when an obscure, ill-conceived requirements list developed at a separate command is deemed “complete.”
the department should start thinking about cases where AI will actually increase the warfighting advantage. If the Department of Defense’s AI capabilities fail this time it will not be because of flaws with the actual technology, but because the use cases were defined in the halls of the Pentagon rather than in the team room by the actual service members.
Integration: The Iceberg Challenge
Department of Defense’s initial glimpse of AI led it to believe it could develop or buy AI on the cheap
► https://www.defense.gov/Explore/News/Article/Article/1254719/project-maven-to-deploy-computer-algorithms-to-war-zone-by-years-end/
integrate it with existing programs of record as a bolt-on
► https://breakingdefense.com/2021/04/army-artillerys-ai-gets-live-fire-exercises-in-europe-pacific/
the Air Force experienced some initial success with integrating AI into the Distributed Common Ground System, but then realized the one-by-one “AI decision aids” are “single point demos”
one-by-one “AI decision aids” are “single point demos”
► https://breakingdefense.com/2021/03/some-quick-wins-but-air-force-struggles-with-ai/
purchased commercial off-the-shelf tools without going through the hassle of dealing with the bureaucracy.
► https://www.wired.com/2012/11/no-spy-software-scandal-here-army-claims/
The department’s development and acquisition machine — programs of record, program executive offices, requirements specialists, acquisition professionals — never got the message of how truly rotten and eroded the warfighter’s underlying digital foundation had become.
the same programs that have failed in the past now assume “we too can do this,” rolling out buzzwords like “AI-ready” or “AI-enabled” — and asking for more money to build their own AI on the cheap, anchoring their cost estimates and requirements on the same failed legacy programs.
► https://www.ai.mil/blog_06_18_20-a_roadmap_to_getting_ai_ready.html
► https://www.c4isrnet.com/artificial-intelligence/2020/09/25/the-army-just-conducted-a-massive-test-of-its-battlefield-artificial-intelligence-in-the-desert/
it’s difficult to distinguish between “drive-by AI” and real, enduring capabilities. The Department of Defense is falling for the former. Drive-by AI is best characterized as one-off pilot projects without a robust AI pipeline — data, data labeling, computational power, algorithm development, test and evaluation — integrated on legacy tools that don’t allow users to continuously feed back existing and new use cases for the AI.
Former Secretary of Defense Mark Esper, in remarks at the National Security Commission on AI
► https://www.defense.gov/Newsroom/Transcripts/Transcript/Article/2011960/remarks-by-secretary-esper-at-national-security-commission-on-artificial-intell/
The numbers around legacy IT systems in the U.S. government are shocking. The Government Accountability Office estimates that 80 percent of a $100 billion annual government-wide IT budget supports legacy tools, including a critical Department of Defense “maintenance system that supports wartime readiness, among other things.”
Government Accountability Office estimates
► https://www.gao.gov/assets/gao-21-524t.pdf
Lloyd Austin, Secretary of Defense
initial guidance to the Department of Defense on March 4, 2021, telling the workforce, “Where necessary, we will divest of legacy systems and programs that no longer meet our security needs.”
► https://media.defense.gov/2021/Mar/04/2002593656/-1/-1/0/SECRETARY-LLOYD-J-AUSTIN-III-MESSAGE-TO-THE-FORCE.PDF
Marc Losito
► http://www.linkedin.com/in/mlositompp22
John Anderson
► http://www.linkedin.com/in/john-anderson-a7a50738/
source:
► https://warontherocks.com/2021/05/the-department-of-defenses-looming-ai-winter/
<---------------------------------------------------------------------------->
The Humble Task of Implementation is the Key to AI Dominance
Matthew Cook
June 29, 2020
Commentary
Qianlong Emperor in a letter to King George III, 1793
► https://china.usc.edu/emperor-qianlong-letter-george-iii-1793
The reality is that the data is either mired in legacy weapon systems, with proprietary code restricting access to would-be developers, or it is not thoroughly cleaned, organized, and wrangled (transforming the data format to another useable format) for a specific use case.
► https://warontherocks.com/2020/02/the-input-output-problem-managing-the-militarys-big-data-in-the-age-of-ai/
replicate the DevSecOps ecosystems commercial developers use so there is no seam between the government solution and the best commercial AI tools.
► https://warontherocks.com/2020/02/the-abcs-of-ai-enabled-intelligence-analysis/
We were not allowed to bring in the latest versions of vendor-preferred tools like PyTorch, which was vital to our model development. Instead, we were forced to make our vendors use an older version, since that was the only information assurance-approved tool suite available. By developing an ecosystem with updated, industry-standard containers, the Defense Department can architect various sandboxes or domains where developers can have access to the tools they need to develop models.
► https://pytorch.org/
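([ a minimal, self-contained PyTorch sketch of the kind of model development described above; the data is random and the network is illustrative, not a real Defense Department use case: ])

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

X = torch.randn(64, 10)          # 64 samples, 10 features (synthetic)
y = torch.randint(0, 2, (64,))   # binary labels (synthetic)

for _ in range(100):             # a few training steps
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print("final training loss:", float(loss))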
the team was inspired by the Strangler pattern philosophy that corporations like Google and now the U.S. Air Force recognize as a best practice. The Strangler pattern can be a useful approach for the Defense Department because it edges out an old system while replacing it with a new system.
► https://www.michielrook.nl/2016/11/strangler-pattern-practice/
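([ a minimal sketch of the Strangler pattern in Python; the class and feature names are hypothetical, not taken from the cited article: ])

class LegacySystem:
    def handle(self, request: str) -> str:
        return f"legacy result for {request}"

class NewSystem:
    def handle(self, request: str) -> str:
        return f"new result for {request}"

class StranglerFacade:
    """Routes traffic feature by feature to the new system,
    gradually 'strangling' the legacy system."""
    def __init__(self):
        self.legacy = LegacySystem()
        self.new = NewSystem()
        self.migrated = {"target_detection"}  # features already moved over

    def handle(self, feature: str, request: str) -> str:
        system = self.new if feature in self.migrated else self.legacy
        return system.handle(request)

facade = StranglerFacade()
print(facade.handle("target_detection", "image_001"))  # served by the new system
print(facade.handle("logistics_report", "q3"))         # still served by legacy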
The Answer to AI Implementation: Joint Common Foundation
sluggish pace of system upgrades in the Defense Department
► https://www.defensenews.com/pentagon/2017/12/08/heres-how-ellen-lord-will-reduce-acquisition-time-by-50-percent/
cloud computing experts
► https://warontherocks.com/2020/02/how-to-actually-recruit-talent-for-the-ai-challenge/
highlight its unique mission while simultaneously creating a familiar development environment for the cloud engineers.
► https://www.nextgov.com/emerging-tech/2020/06/3-ways-hire-more-tech-talent/166239/
Today, the defense acquisition system is so difficult to navigate that it makes little business sense for the best AI companies to enter the defense market.
► https://www.defenseone.com/technology/2018/07/us-air-force-wants-more-commercial-companies-working-ai-projects/149803/
should favor a licensing or royalty approach to AI development
► https://www.dau.edu/cop/mosa/DAU%20Sponsored%20Documents/Data%20Rights%20Focus%20Sheet%20final.pdf
techniques are rapidly developed and fielded
► http://arxiv.aiindex.org/search (had problem w/ the URL; Thur May 13, 2021)
► https://aiindex.stanford.edu/
When it comes to AI, we should consider the lessons learned regarding data management and parameter tuning — inherent in the skillset of the AI developer’s workforce — as the true value proposition to the government.
► https://cloud.google.com/ai-platform/training/docs/hyperparameter-tuning-overview
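([ a minimal sketch of hyperparameter tuning with scikit-learn's grid search, to make the "parameter tuning" skill concrete; the data is synthetic and the parameter grid is arbitrary: ])

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))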
leverage acquisition tools like the Commercial Solutions Opening, Other Transaction Authority
► https://warontherocks.com/2020/03/what-can-the-picatinny-rail-teach-us-about-artificial-intelligence/
Acquisition Pathway Interim Policy and Procedures
► https://www.acq.osd.mil/ae/assets/docs/USA002825-19%20Signed%20Memo%20(Software).pdf
Many say that the Defense Department must move faster.
► https://www.govexec.com/feature/slow-and-steady-losing-defense-acquisition-race/
new technology frontiers
► https://www.wired.com/story/why-china-can-do-ai-more-quickly-and-effectively-than-the-us/
shift defense acquisition to be more rapid, affordable, modular, and operationally relevant
► https://insights.sei.cmu.edu/sei_blog/2019/04/-the-organizational-impact-of-a-modular-product-line-architecture-in-dod-acquisition-third-in-a-seri.html
Maj. Matthew Cook is an active duty Air Force acquisition and intelligence officer currently stationed at the Joint Artificial Intelligence Center, Washington, D.C.
source:
► https://warontherocks.com/2020/06/the-humble-task-of-implementation-is-the-key-to-ai-dominance/
<---------------------------------------------------------------------------->
The Input-Output Problem: Managing the Military’s Big Data in the Age of AI
David Zelaya and Nicholas Keeley
February 13, 2020
Special Series - AI and National Security
Editor’s Note: This article was submitted in response to the call for ideas issued by the co-chairs of the National Security Commission on Artificial Intelligence, Eric Schmidt and Robert Work. It addresses the third question (part b.) on how data should be collected, stored, protected, and shared.
In terms of process, military organizations are bound by constraints derived from the time it takes to formally request support from the various military branches. As outlined in Joint Publication 3-60, the joint targeting process pulls data from across the joint force as inputs to create a list of agreed-upon targets that each of the different components thinks is important (e.g., a priority target list). That list is made of targets that commanders seek to affect in some way with missiles, intelligence, or myriad other tools. The targeting process is generally set to a time horizon of 96 hours from execution of an operation. This 96-hour time horizon is derived from time requirements levied against all the branches of the joint force by the air tasking cycle. The air tasking cycle plays a central role in the targeting process because it aligns and synchronizes a large portion of joint assets against the priority target list.
Joint Publication 3-60, the joint targeting process
► https://www.justsecurity.org/wp-content/uploads/2015/06/Joint_Chiefs-Joint_Targeting_20130131.pdf
As a result of the demand for unified command across all military branches, the air tasking cycle and joint targeting process bind all tiers of the joint force from the joint task force commander (a four-star general or admiral at the highest levels) down to the battalion commander (in the case of the Army) to a 96-hour data processing limit. If an organization plans sequentially, it takes too long and misses the window to request joint support. If an organization plans in parallel, it is most certainly cutting corners, making assumptions and increasing the risk of poorly allocating scarce resources. In either case, the inability to process requests and data undercuts the ability of the unit to accomplish its objective.
Technical “Building Blocks” of Autonomous Systems
Artificial intelligence is uniquely positioned to address the dilemma presented by the input-output problem. Its greatest strength is its superior ability to handle vast amounts of data that would overwhelm human military staffs.
Defense Science Board’s study suggests that “given the limitations of human abilities to rapidly process the vast amounts of data available today, autonomous systems are now required to find trends and analyze patterns.”
► https://www.hsdl.org/?view&did=794641
(autonomy at rest)
► https://www.hsdl.org/?view&did=794641
(autonomy in motion)
► https://www.hsdl.org/?view&did=794641
basic process
While several variations of artificial intelligence do exist, autonomous systems all follow the same basic process: They collect data, process data, and generate an action. The two critical building blocks for this discussion are the data collection and data processing mechanisms.
► https://www.bcg.com/publications/2017/technology-digital-strategy-putting-artificial-intelligence-work.aspx
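([ a minimal sketch of the collect-data / process-data / generate-action loop described above; the functions and thresholds are illustrative, not from the BCG article: ])

from typing import List

def collect_data() -> List[float]:
    # in a real system this would read from sensors or message feeds
    return [0.2, 0.9, 0.4]

def process_data(readings: List[float]) -> float:
    # reduce raw inputs to a single decision signal
    return sum(readings) / len(readings)

def generate_action(signal: float) -> str:
    return "alert operator" if signal > 0.5 else "continue monitoring"

signal = process_data(collect_data())
print(generate_action(signal))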
Data Collection Capacity
The most fundamental requirement of a functioning autonomous system is abundantly available data. These data pools — regardless of whether they are shoe purchases or missile strikes — drive the machine-learning process. Assimilating large amounts of labeled data allows programmers to “train” their systems to find minute correlations between inputs and resultant outputs. Examples include speech recognition (think Apple’s Siri or Amazon’s Alexa), natural language processing (think chatbots), or machine vision (think Google’s self-driving car). The exact quantity and type of data required naturally depends on the complexity of the task at hand. However, the magnitude tends to be large, as evidenced by multiple contemporary examples like Google’s self-driving car, which alone requires approximately 1 GB of sensor data per second, derived from thousands of video streams and images.
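([ a back-of-the-envelope check on the quoted figure: at roughly 1 GB of sensor data per second, a single vehicle generates terabytes per day; the duty cycle below is a made-up assumption: ])

gb_per_second = 1
seconds_per_hour = 3600
hours_driven_per_day = 8  # hypothetical duty cycle

gb_per_day = gb_per_second * seconds_per_hour * hours_driven_per_day
print(f"{gb_per_day:,} GB/day (~{gb_per_day / 1024:.1f} TB/day)")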
<-------------------------------------------------------------------------->
Kai-Fu Lee, AI superpowers: China, Silicon Valley and the new world order, 2018
pp.6-10
A brief history of deep learning
Machine learning ── the umbrella term for the field that includes deep learning ── is a history-altering technology, but one that is lucky to have survived a tumultuous half-century of research. Ever since its inception, artificial intelligence has undergone a number of boom-and-bust cycles. Periods of great promise have been followed by “AI winters”, when a disappointing lack of practical results led to major cuts in funding. Understanding what makes the arrival of deep learning different requires a quick recap of how we got here.
Back in the mid-1950s, the pioneers of artificial intelligence set themselves an impossibly lofty but well-defined mission: to recreate human intelligence in a machine. That striking combination of the clarity of the goal and the complexity of the task would draw in some of the greatest minds in the emerging field of computer science: Marvin Minsky, John McCarthy, and Herbert Simon.
As a wide-eyed computer science undergrad at Columbia University in the early 1980s, all of this seized my imagination. I was born in Taiwan in the early 1960s but moved to Tennessee at the age of 11 and finished middle and high school there. After four years at Columbia in New York, I knew that I wanted to dig deeper into AI. When applying for computer science Ph.D. programs in 1983, I even wrote this somewhat grandiose description of the field in my statement of purpose: “Artificial intelligence is the elucidation of the human learning process, the quantification of the human thinking process, the explication of human behavior, and the understanding of what makes intelligence possible. It is men's final step to understand themselves, and I hope to take part in this new, but promising science.”
That essay helped me get into the top-ranked computer science department of Carnegie Mellon University, a hotbed for cutting-edge AI research. It also displayed my naïveté about the field, both overestimating our power to understand ourselves and underestimating the power of AI to produce superhuman intelligence in narrow spheres.
By the time I began my Ph.D., the field of artificial intelligence had forked into two camps: the “rule-based” approach and the “neural networks” approach. Researchers in the rule-based camp (also sometimes called “symbolic systems” or “expert systems”) attempted to teach computers to think by encoding a series of logical rules: If X, then Y. This approach worked well for simple and well-defined games (“toy problems”) but fell apart when the universe of possible choices or moves expanded. To make the software more applicable to real-world problems, the rule-based camp tried interviewing experts in the problems being tackled and then encoding their wisdom into the program's decision-making (hence the “expert systems” moniker).
The “neural networks” camp, however, took a different approach. Instead of trying to teach the computer the rules that had been mastered by a human brain, these practitioners tried to reconstruct the human brain itself. Given that the tangled webs of neurons in animal brains were the only thing capable of intelligence as we knew it, the researchers figured they'd go straight to the source. This approach mimics the brain's underlying architecture, constructing layers of artificial neurons that can receive and transmit information in a structure akin to our networks of biological neurons. Unlike the rule-based approach, builders of neural networks generally do not give the networks rules to follow in making decisions. They simply feed lots and lots of examples of a given phenomenon ── pictures, chess games, sounds ── into the neural networks and let the networks themselves identify patterns within the data. In other words, the less human interference, the better.
Differences between the two approaches can be seen in how they might approach a simple problem, identifying whether there is a cat in a picture. The rule-based approach would attempt to lay down “if-then” rules to help the program make a decision: “If there are two triangular shapes on top of a circular shape, then there is probably a cat in the picture”. The neural network approach would instead feed the program millions of sample photos labeled “cat” or “no cat”, letting the program figure out for itself what features in the millions of images were most closely correlated to the “cat” label.
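([ a toy contrast of the two approaches described above; the features, data, and labels are fabricated, and a tiny scikit-learn network stands in for a real deep neural network: ])

# rule-based: hand-written "if-then" logic over extracted shape features
def rule_based_cat(features: dict) -> bool:
    # e.g. {"triangles_on_top_of_circle": 2}
    return features.get("triangles_on_top_of_circle", 0) >= 2

# neural-network style: learn from labeled examples instead of rules
import numpy as np
from sklearn.neural_network import MLPClassifier

# hypothetical image-feature vectors labeled 1 = "cat", 0 = "no cat"
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]])
y = np.array([1, 1, 1, 0, 0, 0])

model = MLPClassifier(hidden_layer_sizes=(4,), max_iter=2000, random_state=0)
model.fit(X, y)
print(rule_based_cat({"triangles_on_top_of_circle": 2}))  # True
print(model.predict([[0.85, 0.15]]))                      # should print [1] ("cat")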
During the 1950s and 1960s, early versions of artificial neural networks yielded promising results and plenty of hype. But then in 1969, researchers from the rule-based camp pushed back, convincing many in the field that neural networks were unreliable and limited in their use. The neural networks approach quickly went out of fashion, and AI plunged into one of its first “winters” during the 1970s.
Over the subsequent decades, neural networks enjoyed brief stints of prominence, followed by near-total abandonment. In 1988, I used a technique akin to neural networks (Hidden Markov Models) to create Sphinx, the world's first speaker-independent program for recognizing continuous speech. That achievement landed me a profile in the New York Times. But it wasn't enough to save neural networks from once again falling out of favor, as AI reentered a prolonged ice age for most of the 1990s.
What ultimately resuscitated the field of neural networks ── and sparked the AI renaissance we are living through today ── were changes to two of the key raw ingredients that neural networks feed on, along with one major technical breakthrough. Neural networks require large amounts of two things: computing power and data. The data “trains” the program to recognize patterns by giving it many examples, and the computing power lets the program parse those examples at high speeds.
Both data and computing power were in short supply at the dawn of the field in the 1950s. But in the intervening decades, all that has changed. Today, your smartphone holds millions of times more processing power than the leading cutting-edge computers that NASA used to send Neil Armstrong to the moon in 1969. And the internet has led to an explosion of all kinds of digital data: text, images, videos, clicks, purchases, Tweets, and so on. Taken together, all of this has given researchers copious amounts of rich data on which to train their networks, as well as plenty of cheap computing power for that training.
But the networks themselves were still severely limited in what they could do. Accurate results to complex problems required many layers of artificial neurons, but researchers hadn't found a way to efficiently train those layers as they were added. Deep learning's big technical breakthrough finally arrived in the mid-2000s, when leading researcher Geoffrey Hinton discovered a way to efficiently train those new layers in neural networks. The result was like giving steroids to the old neural networks, multiplying their power to perform tasks such as speech and object recognition.
Soon, these juiced-up neural networks ── now rebranded as “deep learning” ── could outperform older models at a variety of tasks. But years of ingrained prejudice against the neural networks approach led many AI researchers to overlook this “fringe” group that claimed outstanding results. The turning point came in 2012, when a neural network built by Hinton's team demolished the competition in an international computer vision contest.
After decades spent on the margins of AI research, neural networks hit the mainstream overnight, this time in the form of deep learning. That breakthrough promised to thaw the ice from the latest AI winter, and for the first time truly bring AI's power to bear on a range of real-world problems. Researchers, futurists, and tech CEOs all began buzzing about the massive potential of the field to decipher human speech, translate documents, recognize images, predict consumer behavior, identify fraud, make lending decisions, help robots “see”, and even drive a car.
p.10
So how does deep learning do this? Fundamentally, these algorithms use massive amounts of data from a specific domain to make a decision that optimizes for a desired outcome. It does this by training itself to recognize deeply buried patterns and correlations connecting many data points to the desired outcome. This pattern-finding process is easier when the data is labeled with that desired outcome ─ “cat” versus “no cat”; “clicked” versus “didn't click”; “won game” versus “lost game”. It can then draw on its extensive knowledge of these correlations ─ many of which are invisible or irrelevant to human observers ─ to make better decisions than a human could.
Doing this requires massive amounts of relevant data, a strong algorithm, a narrow domain, and a concrete goal. If you're short any one of these, things fall apart. Too little data? The algorithm doesn't have enough examples to uncover meaningful correlations. Too broad a goal? The algorithm lacks clear benchmarks to shoot for in optimization.
Deep learning is what's known as “narrow AI” ─ intelligence that takes data from one specific domain and applies it to optimizing one specific outcome. While impressive, it is still a far cry from “general AI”, the all-purpose technology that can do everything a human can.
Deep learning's most natural application is in fields like insurance and making loans. Relevant data on borrowers is abundant (credit score, income, recent credit-card usage), and the goal to optimize for is clear (minimize default rates).
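([ a toy sketch of the lending use case: labeled borrower data and a clear outcome to optimize (default vs. repaid); the numbers are fabricated for illustration: ])

import numpy as np
from sklearn.linear_model import LogisticRegression

# columns: credit score, income, recent credit-card usage (all scaled 0-1)
X = np.array([[0.9, 0.8, 0.2], [0.4, 0.3, 0.9], [0.7, 0.6, 0.4], [0.2, 0.2, 0.8]])
y = np.array([0, 1, 0, 1])  # 1 = defaulted, 0 = repaid

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[0.5, 0.5, 0.5]])[0, 1])  # estimated default probability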
pp.10-11
Taken one step further, deep learning will power self-driving cars by helping them to “see” the world around them ─ recognize patterns in the camera's pixels (red octagons), figure out what they correlate to (stop signs), and use that information to make decisions (apply pressure to the brake to slowly stop) that optimize for your desired outcome (deliver me safely home in minimal time).
p.11
deep learning
to recognize a pattern,
optimize for a specific outcome,
make a decision
can be applied to so many different kinds of everyday problems.
p.11
People are so excited about deep learning precisely because its core power ─ its ability to recognize a pattern, optimize for a specific outcome, make a decision ─ can be applied to so many different kinds of everyday problems.
p.110
the fact that internet users are automatically labeling data as they browse.
p.110
traditional companies have also been automatically labeling huge quantities of data for decades. For instance, insurance companies have been covering accidents and catching fraud, banks have been issuing loans and documenting repayment rates, and hospitals have been keeping records of diagnoses and survival rates.
p.110
Business AI mines these databases for hidden correlations that often escape the naked eye and human brain.
p.110
historic decisions and outcomes within an organization and
uses labeled data to train an algorithm that can outperform even the most experienced human practitioners.
p.110
strong features
humans normally make predictions on the basis of strong features, a handful of data points that are highly correlated to a specific outcome, often in a clear cause-and-effect relationship. For example, in predicting the likelihood of someone contracting diabetes, a person's weight and body mass index are strong features.
p.111
weak features
weak features: peripheral data points that might appear unrelated to the outcome but contain some predictive power when combined across tens of millions of examples.
These subtle correlations are often impossible for any human to explain in terms of cause and effect: why do borrowers who take out loans on Wednesday repay those loans faster?
p.111
Optimizations like this work well in industries with large amounts of structured data on meaningful business outcomes. In this case, “structured” refers to data that has been categorized, labeled, and made searchable. Prime examples of well-structured corporate data sets include historic stock prices, credit-card usage, and mortgage defaults.
(AI superpowers: China, Silicon Valley and the new world order / Kai-Fu Lee.;
Boston: Houghton Mifflin Harcourt, 2018; includes bibliographical references and index; subjects: artificial intelligence ── economic aspects ── china.| artificial intelligence ── economic aspects ── united states.; HC79.155 (ebook)
HC79.155 L435 2018 (print); 338.4; https://lccn.loc.gov/2018-17250; 2018, )
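([ a toy illustration of the strong vs. weak features described in the excerpt above, on synthetic data; the only point is that many faint signals, combined across enough examples, can add predictive power beyond the strong features alone: ])

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000
strong = rng.normal(size=(n, 2))                 # e.g. weight, body mass index
weak = rng.normal(size=(n, 50)) * 0.1            # fifty faint, peripheral signals
signal = strong @ [1.0, 0.8] + weak.sum(axis=1)  # outcome depends on both
y = (signal + rng.normal(scale=0.5, size=n) > 0).astype(int)

acc_strong = cross_val_score(LogisticRegression(max_iter=1000), strong, y, cv=5).mean()
acc_all = cross_val_score(LogisticRegression(max_iter=1000),
                          np.hstack([strong, weak]), y, cv=5).mean()
print(f"strong features only:   {acc_strong:.3f}")
print(f"strong + weak features: {acc_all:.3f}")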
____________________________________
• most medical doctors are trained to look for strong features when making a diagnosis, because ...
• if you can overcome the Garbage In, Garbage Out (GIGO) problem, a machine learning (also referred to as Artificial Intelligence [AI] in mainstream articles) algorithm that has been trained to detect a specific type of cancer would look at strong and weak features ...
Kai-Fu Lee, AI superpowers: China, Silicon Valley and the new world order, 2018
pp.190-191
My first doctor classified the disease as stage IV, the cancer's most advanced stage. On average, patients with 4th-stage lymphoma of my type have around a 50 percent shot at surviving the next five years. I wanted to get a second opinion before beginning treatment, and a friend of mine arranged for me to consult his family doctor, the top hematology practitioner in Taiwan.
It would be a week before I could see that doctor, and in the meantime I continued to conduct my own research on the disease.
p.190
But as a trained scientist whose life hung in the balance, I couldn't help trying to better understand the disease and quantify my chances of survival.
p.190
lymphoma: possible causes, cutting-edge treatment, and long-term survival rates. Through my reading, I came to understand how doctors classify the various stages of lymphoma.
pp.190-191
Medical textbooks use the concept of “stages” to describe how advanced cancerous tumors are, with later stages generally corresponding to lower survival rates. In lymphoma, the stage has traditionally been assigned on the basis of a few straightforward characteristics: Has the cancer affected more than one lymph node? Are the cancerous lymph nodes both above and below the diaphragm (the bottom of the rib cage)? Is the cancer found in organs outside the lymphatic system or in the patient's bone marrow? Traditionally, each answer of “yes” to one of the above questions bumps the diagnosis up a stage. The fact that my lymphoma had affected over twenty sites, had spread above and below my diaphragm, and had entered an organ outside the lymphatic system meant that I was automatically categorized as a stage IV patient.
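([ a toy encoding of the traditional "each yes bumps the stage" heuristic described above; simplified and illustrative only, real staging criteria are more involved: ])

def lymphoma_stage(multiple_nodes: bool,
                   both_sides_of_diaphragm: bool,
                   outside_lymphatic_system_or_marrow: bool) -> int:
    stage = 1
    stage += int(multiple_nodes)
    stage += int(both_sides_of_diaphragm)
    stage += int(outside_lymphatic_system_or_marrow)
    return stage

# the author's case: over twenty sites, above and below the diaphragm,
# and an organ outside the lymphatic system -> stage IV
print(lymphoma_stage(True, True, True))  # 4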
p.191
But what I didn't know at the time of diagnosis was that this crude method of staging has more to do with what medical students can memorize than what modern medicine can cure.
p.191
Ranking stages based on such simple characteristics of a complex disease is a classic example of the human need to base decisions on “strong features”. Humans are extremely limited in their ability to discern correlations between variables, so we look for guidance in a handful of the most obvious signifiers. In making bank loans, for example, these “strong features” include the borrower's income, the value of the home, and the credit score. In lymphoma staging, they simply include the number and location of the tumors.
p.191
These so-called strong features really don't represent the most accurate tools for making a nuanced prognosis, but they're simple enough for a medical system in which knowledge must be passed down, stored, and retrieved in the brains of human doctors.
p.191
Medical research has since identified dozens of other characteristics of lymphoma cases that make for better predictors of five-year survival in patients. But memorizing the complex correlations and precise probabilities of all these predictors is more than even the best medical students can handle. As a result, most doctors don't usually incorporate these other predictors into their own staging decisions.
p.191
In the depths of my own research, I found a research paper that did quantify the predictive power of these alternate metrics. The paper is from a team of researchers at the University of Modena and Reggio Emilia in Italy, and it analyzed fifteen (15) different variables, identifying the five (5) features that, considered together, most strongly correlated to five-year survival.
pp.191-192
These features included some traditional measures (such as bone marrow involvement) but also less intuitive measures (are any tumors over 6 cm in diameter? Are hemoglobin levels below 12 grams per deciliter? Is the patient over 60?). The paper then provides average survival rates based on how many of those features a patient exhibited.
p.192
this new decision rubric still seemed far from rigorous.
But it also showed that the standard staging metrics were very poor predictors of outcomes and had been created largely to give medical students something they could easily memorize and regurgitate on their tests. The new rubric was far more data-driven, and I leaped at the chance to quantify my own illness by it.
p.192
my age, diameter of largest involved node, bone-marrow involvement, β2-microglobulin status, and hemoglobin levels. Of the five features most strongly correlated to early death, it appeared that I exhibited only one.
my risk factors and survival rate.
(AI superpowers: China, Silicon Valley and the new world order / Kai-Fu Lee.;
Boston: Houghton Mifflin Harcourt, 2018; includes bibliographical references and index; subjects: artificial intelligence ── economic aspects ── china.| artificial intelligence ── economic aspects ── united states.; HC79.155 (ebook)
HC79.155 L435 2018 (print); 338.4; https://lccn.loc.gov/2018-17250; 2018, )
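([ a toy sketch of the data-driven rubric described in the excerpt above: count how many of the five risk features a patient exhibits; the thresholds are taken loosely from the text, not from the original paper: ])

def count_risk_factors(age: int, largest_node_cm: float, bone_marrow_involved: bool,
                       beta2_microglobulin_elevated: bool, hemoglobin_g_dl: float) -> int:
    factors = [
        age > 60,
        largest_node_cm > 6,
        bone_marrow_involved,
        beta2_microglobulin_elevated,
        hemoglobin_g_dl < 12,
    ]
    return sum(factors)

# example: a hypothetical patient exhibiting only one of the five features
print(count_risk_factors(age=55, largest_node_cm=7, bone_marrow_involved=False,
                         beta2_microglobulin_elevated=False, hemoglobin_g_dl=13.5))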
____________________________________
HBR's 10 must reads on emotional intelligence.
1. emotional intelligence.
2. work──psychological aspects.
BF576.H394 2015
152.4─dc23
2015
What makes a Leader? 1
by Daniel Goleman
pp.8-9
Scientific inquiry strongly suggests that there is a genetic component to emotional intelligence. Psychological and developmental research indicates that nurture plays a role as well. How much of each perhaps will never be known, but research and practice clearly demonstrate that emotional intelligence can be learned.
One thing is certain: Emotional intelligence increases with age. There is an old-fashioned word for the phenomenon: maturity. Yet even with maturity, some people still need training to enhance their emotional intelligence.
Unfortunately, far too many training programs that intend to build leadership skills ── including emotional intelligence ── are a waste of time and money.
The problem is simple: They focus on the wrong part of the brain.
Emotional intelligence is born largely in the neurotransmitters of the brain's limbic system, which governs feelings, impulses, and drives. Research indicates that the limbic system learns best through motivation, extended practice, and feedback. Compare this with the kind of learning that goes on in the neocortex, which governs analytical and technical ability.
The neocortex grasps concepts and logic. It is the part of the brain that figures out how to use a computer or make a sales call by reading a book. Not surprisingly ── but mistakenly ── it is also the part of the brain targeted by most training programs aimed at enhancing emotional intelligence. When such programs take, in effect, a neocortical approach, my research with the Consortium for Research on Emotional Intelligence in Organizations has shown they can even have a negative impact on people's job performance.
To enhance emotional intelligence, organizations must refocus their training to include the limbic system. They must help people break old behavioral habits and establish new ones. That not only takes much more time than conventional training programs, it also requires an individualized approach.
p.9
It's important to emphasize that building one's emotional intelligence cannot ── and will not ── happen without sincere desire and concerted effort. A brief seminar won't help; nor can one buy a how-to manual. It is much harder to learn to empathize ── to internalize empathy as a natural response to people ── than it is to become adept at regression analysis. But it can be done. “Nothing great was ever achieved without enthusiasm”, wrote Ralph Waldo Emerson. If your goal is to become a real leader, these words can serve as a guidepost in your efforts to develop high emotional intelligence.
• strong features and
• weak features
• for human primates, the tendency is that the brain concentrates on a few strong features (while remaining subconsciously blind to the weighted value of each strong feature in the basket; in other words, we find it difficult to attach a magnitude to a strong feature. If you can, think vector := (direction, force); meaning, the feature has at least two components, a pair, not one value).
• second, the human primate has difficulty considering multiple weak features.
Why good leaders make bad decisions 59
by Andrew Campbell, Jo Whitehead, and Sydney Finkelstein
pp.59-60
Decision making lies at the heart of our personal and professional lives. Every day we make decisions. Some are small, domestic, and innocuous. Others are more important, affecting people's lives, livelihoods, and well-being. Inevitably, we make mistakes along the way. The daunting reality is that enormously important decisions made by intelligent, responsible people with the best information and intentions are sometimes hopelessly flawed.
Consider Jürgen Schrempp, CEO of Daimler-Benz. He led the merger of Chrysler and Daimler against internal opposition. Nine years later, Daimler was forced to virtually give Chrysler away in a private equity deal. Steve Russell, chief executive of Boots, the UK drugstore chain, launched a health care strategy designed to differentiate the stores from competitors and grow through new health care services such as dentistry. It turned out, though, that Boots managers did not have the skills needed to succeed in health care services, and many of these markets offered little profit potential. The strategy contributed to Russell's early departure from the top job. Brigadier General Matthew Broderick, chief of the Homeland Security Operations Center, who was responsible for alerting President Bush and other senior government officials if Hurricane Katrina breached the levees in New Orleans, went home on Monday, August 29, 2005, after reporting that they seemed to be holding, despite multiple reports of breaches.
p.60
All these executives were highly qualified for their jobs, and yet they made decisions that soon seemed clearly wrong. Why? And more important, how can we avoid making similar mistakes? This is the topic we've been exploring for the past four (4) years, and the journey has taken us deep into a field called decision neuroscience. We began by assembling a database of 83 decisions that we felt were flawed at the time they were made. From our analysis of these cases, we concluded that flawed decisions start with errors of judgment made by influential individuals. Hence we needed to understand how these errors of judgment occur.
p.60
To put all this in context, however, we first need to understand just how the human brain forms its judgments.
pp.60-64
How the Brain Trips Up
We depend primarily on two hardwired processes for decision making. Our brains assess what's going on using pattern recognition, and we react to that information ── or ignore it ── because of emotional tags that are stored in our memories. Both of these processes are normally reliable; they are part of our evolutionary advantage. But in certain circumstances, both can let us down.
Pattern recognition is a complex process that integrates information from as many as 30 different parts of the brain. Faced with a new situation, we make assumptions based on prior experiences and judgments. Thus a chess master can assess a chess game and choose a high-quality move in as little as six seconds by drawing on patterns he or she has seen before.
([ see Daniel Kahneman writing, and, talk on youtube.com about system 1, and, system 2 thinking/ decision making process ])
([ Malcolm Gladwell, Gary Klein, and an author whose name I do not recall, in their writings, explain the contrast between an expert who has studied a topic deeply and intensely over a period of time, and a novice who is learning and processing the subject for the first time, still working on getting their sea legs, with too few experiential examples and stories on the subject to develop a feel for pattern recognition; ... ])
([ see Gary Klein, Sources of power : how people make decision, 1998 ])
pp.60-61
But pattern recognition can also mislead us.
p.61
When we're dealing with seemingly familiar situations, our brains can cause us to think we understand them when we don't.
p.61
What happened to Matthew Broderick during Hurricane Katrina is instructive. Broderick had been involved in operations centers in Vietnam and in other military engagements, and he had led the Homeland Security Operations Center during previous hurricanes. These experiences had taught him that early reports surrounding a major event are often false: It's better to wait for the “ground truth” from a reliable source before acting. Unfortunately, he had no experience with a hurricane hitting a city built below sea level.
p.61
By late on August 29, some 12 hours after Katrina hit New Orleans, Broderick had received 17 reports of major flooding and levee breaches. But he also had gotten conflicting information. The Army Corps of Engineers had reported that it had no evidence of levee breaches, and a late afternoon CNN report from Bourbon Street in the French Quarter had shown city dwellers partying and claiming they had dodged the bullet. Broderick's pattern-recognition process told him that these contrary reports were the ground truth he was looking for. So before going home for the night, he issued a situation report stating that the levees had not been breached, although he did add that further assessment would be needed the next day.
p.61
Idea in Brief
• Leaders make decisions largely through unconscious processes that neuroscientists call pattern recognition and emotional tagging. These processes usually make for quick, effective decisions, but they can be distorted by self-interest, emotional attachments, or misleading memories.
• Managers need to find systematic ways to recognize the source of bias ── what the authors call “red flag conditions” ── and then design safeguards that introduce more analysis, greater debate, or stronger governance.
• By using the approach described in this article, companies will avoid many flawed decisions that are caused by the way our brains operate.
pp.62-63
Emotional tagging is the process by which emotional information attaches itself to the thoughts and experiences stored in our memories. This emotional information tells us whether to pay attention to something or not, and it tells us what sort of action we should be contemplating (immediate or postponed, fight or flight [or freeze] [or play dead]). When the parts of the brain controlling emotions are damaged, we can see how important emotional tagging is: Neurological research shows that we become slow and incompetent decision makers even though we can retain the capacity for objective analysis.
Like pattern recognition, emotional tagging helps us reach sensible decisions most of the time. But it, too, can mislead us. Take the case of Wang Laboratories, the top company in the word-processing industry in the early 1980s.
p.63
These feelings made him reject a software platform linked to an IBM product even though the platform was provided by a third party, Microsoft.
p.63
Why doesn't the brain pick up on such errors and correct them? The most obvious reason is that much of the mental work we do is unconscious. This makes it hard to check the data and logic we use when we make a decision. Typically, we spot bugs in our personal software only when we see the results of our errors in judgment. Matthew Broderick found out that his ground-truth rule of thumb was an inappropriate response to Hurricane Katrina only after it was too late. An Wang found out that his preference for proprietary software was flawed only after Wang's personal computer failed in the market.
p.63
Compounding the problem of high levels of unconscious thinking is the lack of checks and balances in our decision making. Our brains do not naturally follow the classical textbook model: Lay out the options, define the objectives, and assess each option against each objective. Instead, we analyze the situation using pattern recognition and arrive at a decision to act or not by using emotional tags. The two processes happen almost instantaneously.
pp.63-64
Indeed, as the research of psychologist Gary Klein shows, our brains leap to conclusions and are reluctant to consider alternatives. Moreover, we are particularly bad at revisiting our initial assessment of a situation ── our initial frame.
p.64
An exercise we frequently run at Ashridge Business School shows how hard it is to challenge the initial frame.
____________________________________
Joshua Cooper Ramo (author), The seventh sense (book), 2016
pp.276-80
Pattie Maes
p.276
When I first met her, in the 1990s, she was in charge of much of the work on artificial intelligence (AI) at MIT's Media Lab, Danny Hillis's old home.
p.276
She introduced me to a puzzle of her field that has stayed on my mind in the years since. It is called the disappearing AI problem.
p.276
Back in the 1990s, ..., Maes and her team were tinkering with what was known as computer-aided prediction.
pp.276-277
Maes intended to design a computer that could ask, for instance, what movie stars you like. “Robert Redford”, you'd type. And then the machine would spit back some films you might enjoy. The Paul Newman classic Cool Hand Luke, for instance.
p.277
And, well, you had liked that film. This seemed magic, just the sort of data-meets-human question that showcased a machine learning and thinking. An honest artificial intelligence. Maes hoped to design a computer that could predict what movies or music or books you or I might enjoy. (And, of course, buy.)
p.277
A recommendation engine.
p.277
But to confidently bridge your knowledge of a friend's taste and the nearly endless library of movies and songs and books? Beyond human capacity. It seemed an ideal job for a thoughtful machine.
The traditional approach to such a problem was to devise a formula that would mimic your friend. What are his hobbies? What areas interest him? What cheers him up? Then you'd program a machine to jump just as deep into movies and music and books, to break them down by plot and type of character to see what might fit your friend's interests.
p.277
But after years building programs that tried ── and failed ── to tackle the recommendation problem in this fashion, the MIT group changed tack.
p.277
Instead of teaching a machine to understand you (or Tolstoy), they simply began compiling data about what movies and music and books people liked. Then they looked for patterns. People were not, they discovered, all that unique.
p.277
Pretty much everyone who liked Redford in Downhill Racer loved Newman in The Hustler. Anyone who enjoyed Radiohead's Kid A could be directed safely to Sigur Rós's Ágaetis Byrjun.
pp.277-278
Maes and her team found themselves, as a result, less focused on the mechanics of making a machine think than on devising formulas to organize, store, and probe data.
p.278
What had begun as a problem of artificial intelligence became, in the end, a puzzle of mathematics.
p.278
The mystery of human thought, that great, unknowable sea of chemicals and instinct and experience that would have let you place your finger on just the song to open the heart of your date, had been unlocked by data. Here was the disappearing AI problem. A puzzle that looked like it needed computer intelligence demanded, in the end, merely math. The AI had disappeared.
p.278
Many problems that once seemed to demand the miracle of thought really only needed data.
p.279
You and I might be able to spot patterns in movie habits, given enough time, but as more complex problems emerge, as a world of a trillion connected points becomes a sea of data to examine, there is no chance we'll match the machines.
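([ a minimal sketch of the pattern-matching approach Maes's team landed on: item-to-item similarity computed directly from users' ratings, with no model of "taste" at all; the titles and ratings are made up: ])

import numpy as np

titles = ["Downhill Racer", "The Hustler", "Cool Hand Luke", "Kid A"]
# rows = users, columns = items; 0 means "not rated"
ratings = np.array([
    [5, 5, 4, 0],
    [4, 5, 5, 0],
    [0, 1, 0, 5],
    [5, 4, 0, 1],
], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# which title co-occurs most strongly with "Downhill Racer"?
target = ratings[:, 0]
scores = [(titles[j], cosine(target, ratings[:, j])) for j in range(1, len(titles))]
print(max(scores, key=lambda s: s[1]))  # expected: "The Hustler"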
pp.282-283
• predictive learning (AI systems design) and
• representation learning (AI systems design)
The AI systems designer Roger Grosse has named two paths to this sort of wired sensibility: predictive learning and representation learning. That first approach is what Maes's movie machine pursued. The computer is simply checking what it encounters against a database. It teaches itself to predict based on what has been seen before. This sort of knowledge begins with massive amounts of data and then hunts for patterns, tests their reliability, and improves by mapping quirks and similarities.
p.283
Google engineers have a device that can gaze into a human eye and spot signs of impending optical failure. Is the machine smarter than your ophthalmologist? Hard to know, but let's just say this: It has seen, studied, and compared millions of eyes to find patterns that nearly perfectly predict a diagnosis. It can review in seconds more cases than your doctor will see in a lifetime ── let alone recall and compare at submillimeter accuracy. Fast, thorough predictive algorithms make what might once have been regarded as AI disappear. The machine isn't all that wise; it just knows a lot.
p.283
On the other path, the one of representation learning, the machine uses a self-sketched image of the world, a “representation”. Say you wanted a computer to identify restaurants with outdoor seating. A predictive system might be told, Look for pictures in which a third of the pixels are sky colored. You can see how such a primitive approach might be limited. But a representation-based program would use a neural network to examine thousands of photos ── such a collection is called “training data” ── of restaurant patios. It would develop its own sense of what makes these images special: sunlight glinting off glasses, sky reflected in silverware. It would assemble, bit by bit, an accurate feeling for the features of an outdoor dining space. And over time, it could aspire to near-perfect fidelity.
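([ a rough, hedged illustration of the two paths: a nearest-neighbor model that simply checks new inputs against stored examples (the predictive flavor) versus a small neural network whose hidden layer learns its own internal features (the representation flavor); synthetic data, and the analogy is loose, not exact: ])

from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)       # memorize and compare
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=3000,
                    random_state=0).fit(X, y)             # learn a representation

print("kNN training accuracy:", round(knn.score(X, y), 3))
print("MLP training accuracy:", round(mlp.score(X, y), 3))
print("learned hidden-layer weight shape:", mlp.coefs_[0].shape)  # (2, 16)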
p.284
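A minimal sketch of the representation-learning path, assuming PyTorch is available; the tiny network below learns its own features from labeled examples instead of being handed a pixel rule (random tensors stand in for real patio photos):

```python
import torch
import torch.nn as nn

# A tiny convolutional network: it learns its own "representation" of what
# outdoor-seating photos look like, rather than following a hand-written rule.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 1),              # assumes 64x64 input images
)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in "training data": random tensors in place of real patio photos.
images = torch.randn(8, 3, 64, 64)
labels = torch.randint(0, 2, (8, 1)).float()  # 1 = outdoor seating, 0 = not

for _ in range(5):                            # a few illustrative training steps
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```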
Faces, disease markers, obscure sounds
p.284
Today, basic versions of representational AI can study a map and name the most important roads. They can predict cracks in computer networks days before a fault. Representation-based programs take longer to train, as you might expect. But these training times are getting shorter. And though representational AIs are harder to program ── and they demand almost unimaginable amounts of computing power ── they produce a subtle, lively kind of insight.
p.284
A machine with a prediction-based understanding of classical music can listen to a clip of a symphony and name it. One with a representation-based understanding of, say, Mozart's forty-one symphonies can write you an extremely convincing forty-second symphony ── or, if you wish, an even earlier First Symphony, based on what it knows about Mozart's evolution as a composer. It can do it again and again. In seconds.
Joshua Cooper Ramo, The seventh sense: power, fortune, and survival in the age of networks, 2016.
____________________________________
Brad Stone, Amazon unbound: Jeff Bezos and the invention of a global empire, 2021
• The math suggested they would need to roughly double the scale of their data collection efforts to achieve each successive 3 percent [ 3% ] increase in Alexa's accuracy., p.37, Brad Stone, Amazon unbound: Jeff Bezos and the invention of a global empire, 2021.
p.23
The initiative was originally designated inside Lab126 as Project D. It would come to be known as the Amazon Echo, and by the name of its virtual assistant, Alexa.
p.24, p.45
Project D, internally also called ‘Doppler’, launched as the Amazon Echo with its virtual assistant, Alexa
January 4, 2011, first email from Bezos on Project D, p.24
November 6, 2014, product launch, p.45
([
within a four-year time horizon Amazon developed a voice-enabled user interface, inside a real-world working product,
- developed far-field speech recognition
- refined the speech output (so it speaks and sounds like a natural voice)
- built out the back-office technical infrastructure
- developed the plan to gather enough data for the far-field speech recognition
- the heavy lifting of the speech recognition and other technical data processing happens at the data center
• The math suggested they would need to roughly double the scale of their data collection efforts to achieve each successive 3 percent [3%] increase in Alexa's accuracy., p.37, Brad Stone, Amazon unbound: Jeff Bezos and the invention of a global empire, 2021.
])
p.462 Index
Amazon Alexa, 26-38
AMPED and, 43-44
beta testers
Bezos's sketch for,
bug in,
as Doppler project, 26-38, 40, 42-47
Evi and, 34-36
Fire tablet and, 44
language-specific version of, 60
launch of, 44-46
name of, 32
Skills Kit, 44-46
social cue recognition in, 34-35
speech recognition in,
voice of, 27-30
voice service, 47
see also Amazon Echo
far-field speech recognition, 27-28
p.24
Greg Hart
([ in late 2010, Greg Hart pointed out to Jeff Bezos that speech recognition technology was finally getting good at dictation and search; he did this by showing Jeff Google's voice search on an Android phone; ])
speech recognition 2010
Google's voice search, Android phone
technology was finally getting good at dictation and search
p.24
Hart remembered talking to Bezos about speech recognition one day in late 2010 at Seattle's Blue Moon Burgers. Over lunch, Hart demonstrated his enthusiasm for Google's voice search on his Android phone by saying, “pizza near me”, and then showing Bezos the list of links to nearby pizza joints that popped up on-screen. “Jeff was a little skeptical about the use of it on phones, because he thought it might be socially awkward”, Hart remembered. But they discussed how the technology was finally getting good at dictation and search.
p.24
January 4, 2011
Greg Hart,
Ian Freed, device vice president,
Steve Kessel
Amazon's HQ, Day 1 North building
p.25
voice-activated cloud computer
speaker, microphone, a mute button
Fiona, the Kindle building
p.26
One early recruit, Al Lindsay,
Al Lindsay, who in a previous job had written some of the original code for telco US West's voice-activated directory assistance. Lindsay spent his first three weeks on the project on vacation at his cottage in Canada, writing a six-page narrative that envisioned how outside developers might program their own voice-enabled apps that could run on the device.
p.26
internal recruit,
John Thimsen, director of engineering
p.26
To speed up development
Hart and his crew started looking for startups to acquire.
p.27
Yap, a twenty-person startup based in Charlotte, North Carolina, automatically translated human speech such as voicemails into text, without relying on a secret workforce of human transcribers
p.27
though much of Yap's technology would be discarded, its engineers would help develop the technology to convert what customers said into a computer-readable format.
p.27
industry conference in Florence, Italy
Amazon's newfound interest in speech technology
p.27
Jeff Adams, Yap's VP of research
two-decade veteran of the speech industry
pp.27-28
after the meeting, Adams delicately told Hart and Lindsay that their goals were unrealistic. Most experts believed that true “far-field speech recognition” ── comprehending speech from up to 32 feet away, often amid crosstalk and background noise ── was beyond the realm of established computer science, since sound bounces off surfaces like walls and ceilings, producing echoes that confuse computers.
“They basically told me, ‘We don't care. Hire more people. Take as long as it takes. Solve the problem,’” recalled Adams. “They were unflappable.”
p.28
Polish startup Ivona generated computer-synthesized speech that resembled a human voice.
Ivona was founded in 2001 by Lukasz Osowski, a computer science student at the Gdansk University of Technology. Osowski had the notion that so-called “text-to-speech”, or TTS, could read digital texts aloud in a natural voice and help the visually impaired in Poland appreciate the written word.
Michael Kaszczuk
He took recordings of an actor's voice, selected fragments of words, called diphones, and then blended or “concatenated” them together in different combinations to approximate natural-sounding words and sentences that the actor might never have uttered.
p.28
While students, they paid a popular Polish actor named Jacek Labijak to record hours of speech to create a database of sounds. The result was their first product, Spiker, which quickly became the top-selling computer voice in Poland.
Over the next few years, it was used widely in subways, elevators, and for robocall campaigns.
p.29
annual Blizzard Challenge, a competition for the most natural computer voice, organized by Carnegie Mellon University.
p.29
The Gdansk R&D center was put in charge of crafting Doppler's voice.
p.29
the team considered lists of characteristics they wanted in a single personality, such as trustworthiness, empathy, and warmth, and determined those traits were more commonly associated with a female voice.
pp.29-30
The Atlanta-area voice-over studio GM Voices, the same outfit that had helped turn recordings from a voice actress named Susan Bennett into Apple's agent, Siri.
p.30
To create synthetic personalities, GM Voices gave female voice actors hundreds of hours of text to read, from entire books to random articles, a mind-numbing process that could stretch on for months.
p.30
voice artist behind Alexa
professional voice-over community: Boulder-based singer and voice actress Nina Rolle.
warm timbre of Alexa's voice
Nina Rolle (Boulder-based singer and voice actress)
p.32
Bezos also suggested “Alexa”, an homage to the ancient library of Alexandria, regarded as the capital of knowledge.
p.32
[ seven omnidirectional microphones ] at the top
a cylinder elongated to create separation between the array of seven omnidirectional microphones at the top and the speakers at the bottom, with some 1,400 holes punched in the metal tubing to push out air and sound.
p.34
In 2012, inspired by Siri's debut, Tunstall-Pedoe pivoted and introduced the Evi app for the Apple and Android app stores. Users could ask it questions by typing or speaking. Instead of searching the web for answers like Siri, or returning a set of links, like Google's voice search, Evi evaluated the question and tried to offer an immediate answer. The app was downloaded over 250,000 times in its first week and almost crashed the company's servers.
p.34
Evi employed a programming technique called knowledge graphs, or large databases of ontologies, which connect concepts and categories in related domains. If, for example, a user asked Evi, “What is the population of Cleveland?” the software interpreted that question and knew to turn to an accompanying source of demographic data. Wired described the technique as a “giant treelike structure” of logical connections to useful facts.
Putting Evi's knowledge base inside Alexa helped with the kind of informal but culturally common chitchat called phatic speech.
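A toy illustration of the knowledge-graph idea ── typed relations between entities, with a simple question pattern routed to the right fact. The entity names, relations, and population figure below are illustrative, not Evi's actual data:

```python
# A toy "knowledge graph": entities linked by typed relations, plus a tiny
# question pattern that routes "population of X" queries to the right facts.
graph = {
    ("Cleveland", "is_a"): "city",
    ("Cleveland", "located_in"): "Ohio",
    ("Cleveland", "population"): 372_624,
    ("Ohio", "is_a"): "US state",
}

def answer(question: str):
    q = question.lower().rstrip("?")
    if q.startswith("what is the population of "):
        entity = question.rstrip("?").rsplit(" ", 1)[-1]
        return graph.get((entity, "population"), "unknown")
    return "unknown"

print(answer("What is the population of Cleveland?"))   # 372624
```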
p.35
Integrating Evi's technology helped Alexa respond to factual queries, such as requests to name the planets in the solar system, and it gave the impression that Alexa was smart. But was it? Proponents of another method of natural language understanding, called deep learning, believed that Evi's knowledge graphs wouldn't give Alexa the kind of authentic intelligence that would satisfy Bezos's dream of a versatile assistant that could talk to users and answer any question.
p.35
In the deep learning method, machines were fed large amounts of data about how people converse and what responses proved satisfying, and then were programmed to train themselves to predict the best answers.
p.35
The chief proponent of this approach was an Indian-born engineer named Rohit Prasad. “He was a critical hire”, said engineering director John Thimsen. “Much of the success of the project is due to the team he assembled and the research they did on far-field speech recognition.”
p.35
BBN Technologies (later acquired by Raytheon)
Cambridge, Massachusetts-based defense contractor
At BBN, he [Rohit Prasad] worked on one of the first in-car speech recognition systems and automated directory assistance services for telephone companies.
p.37
For years, Google also collected speech data from a toll-free directory assistance line, 800-GOOG-411.
p.37
Hart, Prasad, and their team created graphs that projected how Alexa would improve as data collection progressed. The math suggested they would need to roughly double the scale of their data collection efforts to achieve each successive 3 percent increase in Alexa's accuracy.
• The math suggested they would need to roughly double the scale of their data collection efforts to achieve each successive 3 percent increase in Alexa's accuracy., p.37, Brad Stone, Amazon unbound: Jeff Bezos and the invention of a global empire, 2021.
p.37
“How will we even know when this product is good?”
early 2013
Hart, Prasad, and their team created graphs that projected how Alexa would improve as data collection progressed. The math suggested they would need to roughly double the scale of their data collection efforts to achieve each successive 3 percent [3%] increase in Alexa's accuracy.
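A quick back-of-the-envelope reading of that doubling rule: if each doubling of collected speech data buys roughly 3 percentage points of accuracy, accuracy grows with the logarithm of the data volume. The starting accuracy and hour counts below are invented, purely to show the shape of the curve:

```python
import math

# Hypothetical starting point (illustrative only, not Amazon's real figures).
base_accuracy = 0.70       # 70% accuracy at the initial data volume
base_hours = 1_000         # initial hours of far-field speech data
gain_per_doubling = 0.03   # ~3 percentage points per doubling, per the book

def projected_accuracy(hours: float) -> float:
    """Accuracy grows ~3 points for every doubling of collected data."""
    doublings = math.log2(hours / base_hours)
    return base_accuracy + gain_per_doubling * doublings

for hours in (1_000, 2_000, 4_000, 8_000, 16_000):
    print(f"{hours:>6} hours -> {projected_accuracy(hours):.0%}")
# 1,000 -> 70%, 2,000 -> 73%, 4,000 -> 76%, 8,000 -> 79%, 16,000 -> 82%
```

The logarithmic shape is what makes Bezos's later quip about cutting the timeline from forty years to twenty read as a multiplier on the data-collection rate rather than on the accuracy itself.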
p.38
“First tell me what would be a magical product, then tell me how to get there.”
p.38
Bezos's technical advisor at the time, Dilip Kumar,
p.38
they would need thousands more hours of complex, far-field voice commands.
p.38
Bezos apparently factored in the request to increase the number of speech scientists and did the calculation in his head in a few seconds.
“Let me get this straight. You are telling me that for your big request to make this product successful, instead of it taking forty years, it will only take us twenty?”
p.42
the resulting program, conceived by Rohit Prasad and speech scientist Janet Slifka over a few days in the spring of 2013
p.42
Rohit Prasad and speech scientist Janet Slifka
spring of 2013
p.42
answer a question that later vexed speech experts ──
how did Amazon come out of nowhere to leapfrog Google and Apple in the race to build a speech-enabled virtual assistant?
pp.42-43
internally the program was called AMPED
Amazon contracted with an Australian data collection firm, Appen, and went on the road with Alexa, in disguise.
p.43
Appen rented homes and apartments, initially in Boston, and then Amazon littered several rooms with all kinds of “decoy” devices: pedestal microphones, Xbox gaming consoles, televisions, and tablets. There were also some twenty Alexa devices planted around the rooms at different heights, each shrouded in an acoustic fabric that hid them from view but allowed sound to pass through.
p.43
Appen then contracted with a temp agency, and a stream of contract workers filtered through the properties, eight hours a day, six days a week, reading scripts from an iPad with canned lines and open-ended requests.
p.43
The speakers were turned off, so that Alexa didn't make a peep, but the seven microphones on each device captured everything and streamed the audio to Amazon's servers. Then another army of workers manually reviewed the recordings and annotated the transcripts, classifying queries that might stump a machine,
p.43
so that next time, Alexa would know.
p.43
The Boston test showed promise, so Amazon expanded the program, renting more homes and apartments in Seattle and ten other cities over the next six months to capture the voices and speech patterns of thousands more paid volunteers. It was a mushroom-cloud explosion of data about device placement, acoustic environments, background noise, regional accents, and all the gloriously random ways a human being might phrase a simple request to hear the weather, for example, or play a Justin
p.44
by 2012
multimillion-dollar cost.
p.44
By 2014, it had increased its store of speech data by a factor of ten thousand and largely closed the gap with rivals like Apple and Google.
p.47
over the next few months, Amazon would roll out the Alexa Skills Kit, which allowed other companies to build voice-enabled apps for the Echo, and Alexa Voice Service, which let the makers of products like lightbulbs and alarm clocks integrate Alexa into their own devices.
p.47
a smaller, cheaper version of Echo, the hockey puck-sized Echo Dot,
a portable version with batteries, the Amazon Tap.
Echo
Echo dot
Amazon Tap (a portable, battery-powered version of Echo)
p.24
January 4, 2011
p.45
November 6, 2014
Brad Stone, Amazon unbound: Jeff Bezos and the invention of a global empire, 2021
____________________________________
• modeling of abnormal distributions was a problem largely unsolved in mathematics., pp.104-105, Sebastian Mallaby., More money than god : hedge funds and the making of a new elite, 2010.
• Even in the early 1960s, a maverick mathematician named Benoit Mandelbrot argued that the tails of the distribution might be fatter than the normal bell curve assumed; and Eugene Fama, the father of efficient-market theory, who got to know Mandelbrot at the time, conducted tests on stock-price changes that confirmed Mandelbrot's assertion., pp.104-105, Sebastian Mallaby., More money than god : hedge funds and the making of a new elite, 2010.
• The trouble with [Benoit] Mandelbrot's insight was that it was too awkward to live with; it rendered the statistical tools of financial economics useless, since the modeling of abnormal distributions was a problem largely unsolved in mathematics., pp.104-105, Sebastian Mallaby., More money than god : hedge funds and the making of a new elite, 2010.
• the ... hypothesis did not apply to moments of crisis., p.106, Sebastian Mallaby., More money than god : hedge funds and the making of a new elite, 2010.
• [models] do not work in crisis; rather, the models stop working.
- Self-organized criticality [SOC] might provide a possible perspective and thinking tool kit in moments of crisis
- ...
- Per Bak, How nature works, 1996 (book)
- see 1987 Per Bak, Chao Tang, and Kurt Wiesenfeld
Sebastian Mallaby., More money than god : hedge funds and the making of a new elite, 2010.
• modeling of abnormal distributions was a problem largely unsolved in mathematics.
• Even in the early 1960s, a maverick mathematician named Benoit Mandelbrot argued that the tails of the distribution might be fatter than the normal bell curve assumed; and Eugene Fama, the father of efficient-market theory, who got to know Mandelbrot at the time, conducted tests on stock-price changes that confirmed Mandelbrot's assertion.
• The trouble with [Benoit] Mandelbrot's insight was that it was too awkward to live with; it rendered the statistical tools of financial economics useless, since the modeling of abnormal distributions was a problem largely unsolved in mathematics.
• the ... hypothesis did not apply to moments of crisis.
pp.104-105
The efficient-market hypothesis had always been based on a precarious assumption: that price changes conformed to a “normal” probability distribution ── the one represented by the familiar bell curve, in which numbers at and near the median crop up frequently while numbers in the tails of the distribution are rare to the point of vanishing. Even in the early 1960s, a maverick mathematician named Benoit Mandelbrot argued that the tails of the distribution might be fatter than the normal bell curve assumed; and Eugene Fama, the father of efficient-market theory, who got to know Mandelbrot at the time, conducted tests on stock-price changes that confirmed Mandelbrot's assertion. If price changes had been normally distributed, jumps greater than five standard deviations should have shown up in daily price data about once every 7,000 years. Instead, they cropped up about once every three to four years.
Having made this discovery, Fama and his colleagues buried it. The trouble with Mandelbrot's insight was that it was too awkward to live with; it rendered the statistical tools of financial economics useless, since the modeling of abnormal distributions was a problem largely unsolved in mathematics.
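The once-every-7,000-years figure can be checked directly against the normal distribution. A small sketch, assuming SciPy is available and roughly 252 trading days per year:

```python
from scipy.stats import norm

# Probability of a daily move beyond 5 standard deviations (either direction).
p_five_sigma = 2 * norm.sf(5)
trading_days_per_year = 252

expected_years_between = 1 / (p_five_sigma * trading_days_per_year)
print(f"P(|z| > 5) = {p_five_sigma:.2e}")                     # ~5.7e-07
print(f"expected gap = {expected_years_between:,.0f} years")  # ~6,900 years
```

Against an observed frequency of once every three to four years, that is the fat-tail gap Mandelbrot was pointing at.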
p.105
Paul Cootner complained that “Mandelbrot, like Prime Minister Churchill before him, promises us not utopia but blood, sweat, toil and tears. If he is right, almost all of our statistical tools are obsolete ── least squares, spectral analysis, workable maximum-likelihood solutions, all our established sample theory, closed distribution functions. Almost without exception, past econometric work is meaningless.”66
p.105
To prevent itself from toppling into this intellectual abyss, the economics profession kept its eyes trained the other way, especially since the mathematics of normal distributions was generating stunning breakthroughs.
p.105
In 1973 a trio of economists produced a revolutionary method for valuing options, and a thrilling new financial industry was born. Mandelbrot's objections were brushed off.
p.105
The crash of 1987 forced the economics profession to reexamine that assertion.
p.105
To put that probability into perspective, it meant that an event such as the crash would not be anticipated to occur even if the stock market were to remain open for twenty billion years, the upper end of the expected duration of the universe,
p.106
As well as challenging the statistical foundation of financial economists' thinking, Black Monday forced a reconsideration of their institutional assumptions.
p.106
In the chaos of the market meltdown, brokers' phone lines were jammed with calls from panicking sellers; it was hard to get through and place an order.
p.106
And, most important, the sheer weight of selling made it too risky to go against the trend. When the whole world is selling, it doesn't matter whether sophisticated hedge funds believe that prices have fallen too far. Buying is crazy.
At a minimum, it seemed, the efficient-market hypothesis did not apply to moments of crisis.
pp.106-107
But the crash raised a further question too: If markets were efficient, why had the equity bubble inflated in the first place? Again, the answer seemed to lie partly in the institutional obstacles faced by speculators. In the summer of 1987, investors could see plainly that stocks were selling for higher multiples of corporate earnings than they had historically; but if the market was determined to value them that way, it would cost money to buck it.
(More money than god : hedge funds and the making of a new elite / Sebastian Mallaby., 1. hedge funds., 2. investment advisors., HG4530.M249 2010, 332.64'524──dc22, 2010, )
____________________________________
• [models] do not work in crisis; rather, the models stop working.
- Self-organized criticality [SOC] might provide a possible perspective and thinking tool kit in moments of crisis
- ...
- Per Bak, How nature works, 1996 (book)
- see 1987 Per Bak, Chao Tang, and Kurt Wiesenfeld
Nassim Nicholas Taleb, Fooled by Randomness, 2nd edition, hardcover, 2004 [ ]
ergodicity, 57-58, 96, 156-57, 254
p.96
On average, animals will be fit, but not every single one of them, and not at all times.
Just as an animal could have survived because its sample path was lucky, the “best” operators may have survived because of overfitness to a sample path ── a sample path that was free of the evolutionary rare event.
One vicious attribute is that the longer these animals can go without encountering the rare event, the more vulnerable they will be to it.
We said that should one extend time to infinity, then, by ergodicity, that event will happen with certainty ── the species will be wiped out!
For evolution means fitness to one and only one time series, not the average of all the possible environments.
(Taleb, Nassim (2004)., Fooled by Randomness, 2nd edition, hardcover)
(Fooled by Randomness: the hidden role of chance in life and in the markets / Nassim Nicholas Taleb, 1. investments, 2. chance, 3. random variables, 123.3 Taleb, )
<-------------------------------------------------------------------------->
pp.439-440 (pdf page: 457/513)
Many algorithms are today classified as “machine learning”. These algorithms share, with the other algorithms studied in this book, the goal of extracting information from data. All algorithms for analysis of data are designed to produce a useful summary of the data, from which decisions are made. Among many examples, the frequent-itemset analysis that we did in Chapter 6 produces information like association rules, which can then be used for planning a sales strategy or for many other purposes.
However, algorithms called “machine learning” not only summarize our data; they are perceived as learning a model or classifier from the data, and thus discover something about data that will be seen in the future. For instance, the clustering algorithms discussed in Chapter 7 produce clusters that not only tell us something about the data being analyzed (the training set), but they allow us to classify future data into one of the clusters that result from the clustering algorithm. Thus, machine-learning enthusiasts often speak of clustering with the neologism “unsupervised learning”; the term unsupervised refers to the fact that the input data does not tell the clustering algorithm what the clusters should be. In supervised machine learning, which is the subject of this chapter, the available data includes information about the correct way to classify at least some of the data. The data classified already is called the training set.
In this chapter, we do not attempt to cover all the different approaches to machine learning. We concentrate on methods that are suitable for very large data and that have the potential for parallel implementation. We consider the classical “perceptron” approach to learning a data classifier, where a hyperplane that separates two classes is sought. Then, we look at more modern techniques involving support-vector machines. Similar to perceptrons, these methods look for hyperplanes that best divide the classes, so that few, if any, members of the training set lie close to the hyperplane. We end with a discussion of nearest-neighbor techniques, where data is classified according to the class(es) of their nearest neighbors in some space.
(Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman, copyright © 2010, 2011, 2012, 2013, 2014, )
filename: Mining of massive datasets.pdf
Chapter 12
Large-scale machine learning
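For reference, a minimal NumPy sketch of the classical perceptron mentioned in the excerpt ── it searches for a hyperplane w·x + b that separates two classes; the toy data is invented:

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Classical perceptron: find a hyperplane w.x + b separating two classes.
    X: (n_samples, n_features); y: labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified -> nudge the hyperplane
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Tiny linearly separable toy set.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))    # [ 1.  1. -1. -1.]
```

Support-vector machines, discussed later in the same chapter, keep the same hyperplane picture but additionally maximize the margin between the classes.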
<-------------------------------------------------------------------------->
Marc Losito and John Anderson
May 10, 2021
Commentary
frenzy for AI
► http://www.foreignaffairs.com/articles/united-states/2021-04-06/perils-overhyping-artificial-intelligence
development of AI
► https://warontherocks.com/2020/06/the-humble-task-of-implementation-is-the-key-to-ai-dominance/
employ AI
► https://www.ai.mil/docs/2020_DoD_AI_Training_and_Education_Strategy_and_Infographic_10_27_20.pdf
facial recognition
► https://www.historyofinformation.com/detail.php?id=2126
language translation
► https://www.semanticscholar.org/paper/Machine-Translation%3A-Past%2C-Present%2C-Future-Ward/ff333f66c54d827b7ed4a1d0bf690e2ec3570e39
target detection
► https://ai.stanford.edu/~nilsson/QAI/qai.pdf
“a direct and apparent relationship to a specific military function.”
The United Kingdom’s 1973 Lighthill Report highlighted the “pronounced feeling of disappointment” for AI research over the preceding 25 years.
► http://www.chilton-computing.org.uk/inf/literature/reports/lighthill_report/p001.htm
“Fifth Generation Project”
► https://www.nytimes.com/1992/06/05/business/fifth-generation-became-japan-s-lost-generation.html
In October 1983, Defense Advanced Research Projects Agency launched the Strategic Computing Initiative
► https://warontherocks.com/2020/05/cautionary-tale-on-ambitious-feats-of-ai-the-strategic-computing-program/
1985 Packard Commission
► https://assets.documentcloud.org/documents/2695411/Packard-Commission.pdf
In 1993, the Strategic Computing Initiative closed up shop
► https://aiws.net/the-history-of-ai/aiws-house/this-week-in-the-history-of-ai-at-aiws-net-darpa-ends-the-strategic-computing-initiative-in-1993/
The Four Horsemen of the Department of Defense’s Looming AI Winter
• a lack of AI expertise,
٠ AI expertise (lacking)
• too much AI bureaucracy,
٠ bureaucracy (too much)
• the challenge of democratizing AI to the user level, and
٠ get AI to the user (tight feedback loop)
٠ let real world problems (situation) drive the use cases
· practical scenarios about product usefulness
• detective case file
• unsolved case files
· Use cases can be valuable tools for understanding a specific system's ability to meet the needs of end users. When designing software or a system, enhance your development efforts by thinking through practical scenarios about product usefulness.
· Simply put, a use case is a description of all the ways an end-user wants to "use" a system. These "uses" are like requests of the system, and use cases describe what that system does in response to such requests. In other words, use cases describe the conversation between a system and its user(s), known as actors.
٠ hire a milk shake business story
· Theodore Levitt: "People don't want to buy a quarter-inch drill. They want a quarter-inch hole."
· https://duckduckgo.com/?q=hire+a+milk+shake+business+story&t=ffsb&ia=web
· if the military hire an AI system, what is the job?
· what job is the field commander trying to address?
٠ does it address the pain point(s)
٠ lesson learned (if failure, what did we learn from this)
· after action review (AAR - internal - redacted for public release)
· after action review (AAR - consolidated - is there a pattern)
• how are we shooting ourselves in the foot ── in some way ── each time?
• an old-fashioned approach to AI integration.
٠ integration (AI: liken to a tool, in the kit)
Domain Expertise: The Education Challenge
► https://warontherocks.com/2020/01/the-ai-literacy-gap-hobbling-american-officialdom/
AI Smartcards
► https://smallwarsjournal.com/jrnl/art/commanders-ai-smartcard-artificial-intelligence-commanders-business
slick briefings full of drive-by tech-splanations of AI capability
► https://www.urbandictionary.com/define.php?term=techsplain
Department of Defense bureaucracies have a way of calcifying around programs and people, not actual capabilities.
► https://www.gao.gov/assets/gao-18-592.pdf
shrink centralized groups like the Joint AI Center and the Army Futures Command AI Task Force and instead send their authorities, decision-making power, and resources down to operational units.
► https://www.c4isrnet.com/artificial-intelligence/2020/06/10/the-army-ai-task-force-takes-on-two-key-projects/
► https://www.ai.mil/
AI centers of excellence should be creating processes, policies, and resourcing to facilitate a constant feedback loop between user and developer, with no one in between to garble the message. They should exist as idea-centric organizations at the service level that cut across the warfighting functions, providing the tools, lessons learned, resources, and expertise to help commanders to operationalize AI. The bureaucracy should connect intent, constraints and restraints, and resources — and then get out of the way.
The Department of Defense’s managing philosophy with respect to AI bureaucracy should mirror the computer programming concept called the “self-deleting executable.” It is a string of code designed to allow a program to delete itself.
► https://www.codeproject.com/Articles/17052/Self-Deleting-Executables
think in terms of creating an organization and then setting an egg timer for it to expand and contract, responsive to a direct user-developer feedback loop. The AI bureaucracy only survives as a platform to connect users with development talent, contracting, and computing power. It should shrink, not expand, over time.
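For illustration, a minimal Python sketch of the “self-deleting executable” idea ── a program that removes its own file once its job is done. This assumes a POSIX filesystem, where a running script's file can be unlinked; it is a metaphor sketch, not the article's code:

```python
import os
import sys

def main():
    print("Doing my one job, then removing myself from disk.")

if __name__ == "__main__":
    main()
    try:
        # Delete this script's own file after the work is finished.
        os.remove(os.path.abspath(sys.argv[0]))
    except OSError as exc:
        print(f"Could not self-delete: {exc}")
```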
The Team Room Challenge
team room — where planning turns to action — and ask battlefield operators, “What is one thing you can do better today than yesterday because of AI?” the responses are not encouraging.
get AI to the warfighters on the ground who actually employ these capabilities. AI only works for the military when it’s unleashed against a real problem defined by warfighters, not bureaucrats.
ensures users drive capability development from the field, not the Pentagon.
“field to learn”
► https://mwi.usma.edu/big-data-at-war-special-operations-forces-project-maven-and-twenty-first-century-warfare/
what the department’s leaders have been saying about AI over the past few years
► https://www.c4isrnet.com/intel-geoint/2018/02/17/ai-makes-mattis-question-fundamental-beliefs-about-war/
AI’s potential impact
► https://www.executivegov.com/2021/03/lt-gen-michael-groen-says-dod-must-avoid-ai-obsolescence/
how much money is being thrown at program after program, AI still hasn’t made it into the team room to empower actual warfighters.
► https://www.defenseone.com/ideas/2021/03/solid-plan-compete-china-artificial-intelligence/172360/
Embed AI developers down to warfighting components to ensure that use cases are driven by users and firmly understood by developers. This will guarantee a tight feedback loop to ensure that AI capabilities are hyper-relevant to real, viable needs and that development doesn’t stop when an obscure, ill-conceived requirements list developed at a separate command is deemed “complete.”
the department should start thinking about cases where AI will actually increase the warfighting advantage. If the Department of Defense’s AI capabilities fail this time it will not be because of flaws with the actual technology, but because the use cases were defined in the halls of the Pentagon rather than in the team room by the actual service members.
Integration: The Iceberg Challenge
Department of Defense’s initial glimpse of AI led it to believe it could develop or buy AI on the cheap
► https://www.defense.gov/Explore/News/Article/Article/1254719/project-maven-to-deploy-computer-algorithms-to-war-zone-by-years-end/
integrate it with existing programs of record as a bolt-on
► https://breakingdefense.com/2021/04/army-artillerys-ai-gets-live-fire-exercises-in-europe-pacific/
the Air Force experienced some initial success with integrating AI into the Distributed Common Ground System, but then realized the one-by-one “AI decision aids” are “single point demos”
one-by-one “AI decision aids” are “single point demos”
► https://breakingdefense.com/2021/03/some-quick-wins-but-air-force-struggles-with-ai/
purchased commercial off-the-shelf tools without going through the hassle of dealing with the bureaucracy.
► https://www.wired.com/2012/11/no-spy-software-scandal-here-army-claims/
The department’s development and acquisition machine — programs of record, program executive offices, requirements specialists, acquisition professionals — never got the message of how truly rotten and eroded the warfighter’s underlying digital foundation had become.
the same programs that have failed in the past now assume “we too can do this,” rolling out buzzwords like “AI-ready” or “AI-enabled” — and asking for more money to build their own AI on the cheap, anchoring their cost estimates and requirements on the same failed legacy programs.
► https://www.ai.mil/blog_06_18_20-a_roadmap_to_getting_ai_ready.html
► https://www.c4isrnet.com/artificial-intelligence/2020/09/25/the-army-just-conducted-a-massive-test-of-its-battlefield-artificial-intelligence-in-the-desert/
it’s difficult to distinguish between “drive-by AI” and real, enduring capabilities. The Department of Defense is falling for the former. Drive-by AI is best characterized as one-off pilot projects without a robust AI pipeline — data, data labeling, computational power, algorithm development, test and evaluation — integrated on legacy tools that don’t allow users to continuously feed back existing and new use cases for the AI.
Former Secretary of Defense Mark Esper, in remarks on the National Security Commission on AI
► https://www.defense.gov/Newsroom/Transcripts/Transcript/Article/2011960/remarks-by-secretary-esper-at-national-security-commission-on-artificial-intell/
The numbers around legacy IT systems in the U.S. government are shocking. The Government Accountability Office estimates that 80 percent of a $100 billion annual government-wide IT budget supports legacy tools, including a critical Department of Defense “maintenance system that supports wartime readiness, among other things.”
Government Accountability Office estimates
► https://www.gao.gov/assets/gao-21-524t.pdf
Lloyd Austin, Secretary of Defense
initial guidance to the Department of Defense on March 4, 2021, telling the workforce, “Where necessary, we will divest of legacy systems and programs that no longer meet our security needs.”
► https://media.defense.gov/2021/Mar/04/2002593656/-1/-1/0/SECRETARY-LLOYD-J-AUSTIN-III-MESSAGE-TO-THE-FORCE.PDF
Marc Losito
► http://www.linkedin.com/in/mlositompp22
John Anderson
► http://www.linkedin.com/in/john-anderson-a7a50738/
source:
► https://warontherocks.com/2021/05/the-department-of-defenses-looming-ai-winter/
<---------------------------------------------------------------------------->
<---------------------------------------------------------------------------->
The Humble Task of Implementation is the Key to AI Dominance
Matthew Cook
June 29, 2020
Commentary
Qianlong Emperor in a letter to King George III, 1793
► https://china.usc.edu/emperor-qianlong-letter-george-iii-1793
The reality is that the data is either mired in legacy weapon systems, with proprietary code restricting access to would-be developers, or it is not thoroughly cleaned, organized, and wrangled (transforming the data format to another useable format) for a specific use case.
► https://warontherocks.com/2020/02/the-input-output-problem-managing-the-militarys-big-data-in-the-age-of-ai/
replicate the DevSecOps ecosystems commercial developers use so there is no seam between the government solution and the best commercial AI tools.
► https://warontherocks.com/2020/02/the-abcs-of-ai-enabled-intelligence-analysis/
We were not allowed to bring in the latest versions of vendor-preferred tools like PyTorch, which was vital to our model development. Instead, we were forced to make our vendors use an older version, since that was the only information assurance-approved tool suite available. By developing an ecosystem with updated, industry-standard containers, the Defense Department can architect various sandboxes or domains where developers can have access to the tools they need to develop models.
► https://pytorch.org/
The team was inspired by the Strangler pattern philosophy, which corporations like Google and now the U.S. Air Force recognize as a best practice. The Strangler pattern can be a useful approach for the Defense Department because it edges out an old system while replacing it with a new one.
► https://www.michielrook.nl/2016/11/strangler-pattern-practice/
The Answer to AI Implementation: Joint Common Foundation
sluggish pace of system upgrades in the Defense Department
► https://www.defensenews.com/pentagon/2017/12/08/heres-how-ellen-lord-will-reduce-acquisition-time-by-50-percent/
cloud computing experts
► https://warontherocks.com/2020/02/how-to-actually-recruit-talent-for-the-ai-challenge/
highlight its unique mission while simultaneously creating a familiar development environment for the cloud engineers.
► https://www.nextgov.com/emerging-tech/2020/06/3-ways-hire-more-tech-talent/166239/
Today, the defense acquisition system is so difficult to navigate that it makes little business sense for the best AI companies to enter the defense market.
► https://www.defenseone.com/technology/2018/07/us-air-force-wants-more-commercial-companies-working-ai-projects/149803/
should favor a licensing or royalty approach to AI development
► https://www.dau.edu/cop/mosa/DAU%20Sponsored%20Documents/Data%20Rights%20Focus%20Sheet%20final.pdf
techniques are rapidly developed and fielded
► http://arxiv.aiindex.org/search (had problem w/ the URL; Thur May 13, 2021)
► https://aiindex.stanford.edu/
When it comes to AI, we should consider the lessons learned regarding data management and parameter tuning — inherent in the skillset of the AI developer’s workforce — as the true value proposition to the government.
► https://cloud.google.com/ai-platform/training/docs/hyperparameter-tuning-overview
leverage acquisition tools like the Commercial Solutions Opening, Other Transaction Authority
► https://warontherocks.com/2020/03/what-can-the-picatinny-rail-teach-us-about-artificial-intelligence/
Acquisition Pathway Interim Policy and Procedures
► https://www.acq.osd.mil/ae/assets/docs/USA002825-19%20Signed%20Memo%20(Software).pdf
Many say that the Defense Department must move faster.
► https://www.govexec.com/feature/slow-and-steady-losing-defense-acquisition-race/
new technology frontiers
► https://www.wired.com/story/why-china-can-do-ai-more-quickly-and-effectively-than-the-us/
shift defense acquisition to be more rapid, affordable, modular, and operationally relevant
► https://insights.sei.cmu.edu/sei_blog/2019/04/-the-organizational-impact-of-a-modular-product-line-architecture-in-dod-acquisition-third-in-a-seri.html
Maj. Matthew Cook is an active duty Air Force acquisition and intelligence officer currently stationed at the Joint Artificial Intelligence Center, Washington, D.C.
source:
development of AI
► https://warontherocks.com/2020/06/the-humble-task-of-implementation-is-the-key-to-ai-dominance/
<---------------------------------------------------------------------------->
The Input-Output Problem: Managing the Military’s Big Data in the Age of AI
David Zelaya and Nicholas Keeley
February 13, 2020
Special Series - AI and National Security
Editor’s Note: This article was submitted in response to the call for ideas issued by the co-chairs of the National Security Commission on Artificial Intelligence, Eric Schmidt and Robert Work. It addresses the third question (part b.) on how data should be collected, stored, protected, and shared.
In terms of process, military organizations are bound by constraints derived from the time it takes to formally request support from the various military branches. As outlined in Joint Publication 3-60, the joint targeting process pulls data from across the joint force as inputs to create a list of agreed upon targets each of the different components thinks is important (e.g., a priority target list). That list is made of targets that commanders seek to affect in some way with either missiles, intelligence, or myriad other tools. The targeting process is generally set to a time horizon of 96 hours from execution of an operation. This 96-hour time horizon is derived from time requirements levied against all the branches of the joint force by the air tasking cycle. The air tasking cycle plays a central role in the targeting process because it aligns and synchronizes a large portion of joint assets against the priority target list.
Joint Publication 3-60, the joint targeting process
► https://www.justsecurity.org/wp-content/uploads/2015/06/Joint_Chiefs-Joint_Targeting_20130131.pdf
As a result of the demand for unified command across all military branches, the air tasking cycle and joint targeting process bind all tiers of the joint force from the joint task force commander (a four-star general or admiral at the highest levels) down to the battalion commander (in the case of the Army) to a 96-hour data processing limit. If an organization plans sequentially, it takes too long and misses the window to request joint support. If an organization plans in parallel, it is most certainly cutting corners, making assumptions and increasing the risk of poorly allocating scarce resources. In either case, the inability to process requests and data undercuts the ability of the unit to accomplish its objective.
Technical “Building Blocks” of Autonomous Systems
Artificial intelligence is uniquely positioned to address the dilemma presented by the input-output problem. Its greatest strength is its superior ability to handle vast amounts of data that would overwhelm human military staffs.
Defense Science Board’s study suggests that “given the limitations of human abilities to rapidly process the vast amounts of data available today, autonomous systems are now required to find trends and analyze patterns.”
► https://www.hsdl.org/?view&did=794641
(autonomy at rest)
► https://www.hsdl.org/?view&did=794641
(autonomy in motion)
► https://www.hsdl.org/?view&did=794641
basic process
While several variations of artificial intelligence do exist, autonomous systems all follow the same basic process: They collect data, process data, and generate an action. The two critical building blocks for this discussion are the data collection and data processing mechanisms.
► https://www.bcg.com/publications/2017/technology-digital-strategy-putting-artificial-intelligence-work.aspx
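A minimal sketch of that collect → process → act loop; every function body below is an invented placeholder, not a description of any real system:

```python
import time

def collect_data():
    """Stand-in sensor/report feed (illustrative placeholder)."""
    return {"reports": 3, "priority": "high"}

def process_data(raw):
    """Stand-in analysis step: turn raw inputs into a decision-ready summary."""
    return "escalate" if raw["priority"] == "high" else "monitor"

def generate_action(decision):
    print(f"action: {decision}")

# The basic autonomous-system cycle: collect -> process -> act, repeated.
for _ in range(3):
    generate_action(process_data(collect_data()))
    time.sleep(0.1)
```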
Data Collection Capacity
The most fundamental requirement of a functioning autonomous system is abundantly available data. These data pools — regardless of whether they are shoe purchases or missile strikes — drive the machine-learning process. Assimilating large amounts of labeled data allows programmers to “train” their systems to find minute correlations between inputs and resultant outputs. Examples include speech recognition (think Apple’s Siri or Amazon’s Alexa), natural language processing (think chatbots), or machine vision (think Google’s self-driving car). The exact quantity and type of data required naturally depends on the complexity of the task at hand. However, the magnitude tends to be robust, evidenced by multiple contemporary examples like Google’s self-driving car, which alone requires approximately 1 GB of sensorial data per second, derived from thousands of video streams and images.
barrier to entry
► https://www.bcg.com/en-us/publications/2017/technology-digital-strategy-putting-artificial-intelligence-work.aspx
Military planning methodologies
Military planning methodologies mirror the basic process
► https://armypubs.army.mil/epubs/DR_pubs/DR_a/pdf/web/ARN18323_ADP%205-0%20FINAL%20WEB.pdf
Boston Consulting Group claims “operational practices and processes are naturally suited for AI [artificial intelligence]. They often have similar routines or steps, generate a wealth of data, and produce measurable outputs.”
► https://www.bcg.com/en-us/publications/2017/technology-digital-strategy-putting-artificial-intelligence-work.aspx
Data Processing Speed
identify fraudulent transactions within 10 milliseconds
► https://www.hsdl.org/?view&did=794641
IBM autonomous system providing oncologists with recommended
► https://www.hsdl.org/?view&did=794641
According to the Defense Science Board, the joint air tasking process is currently “heavily manual,” with up to 40–50 people involved in target data input, mission planning, and resource allocation planning.
► https://www.hsdl.org/?view&did=794641
Think Faster or Buy More Time
“autonomy at rest” — the softer and less elegant applications of artificial intelligence that revolve around data compilation, analysis, and a solution in the virtual realm — may be a more important contribution to the future of the U.S. military.
► https://www.hsdl.org/?view&did=794641
The technical foundations of artificial intelligence suggest applications in the latter category possess far more innovative potential — particularly in military decision-making processes.
David Zelaya is a U.S. Army officer commissioned from the University of Maryland’s ROTC program with a B.A. in Government and Politics. He is currently serving a tour in the Indo-Pacific at Fort Shafter, Hawaii.
Nicholas Keeley is a U.S. Army officer commissioned from Princeton University’s ROTC program with a B.A. in East Asian Studies. He is currently serving a tour in the Indo-Pacific and is stationed at Schofield Barracks, Hawaii.
source:
► https://warontherocks.com/2020/02/the-input-output-problem-managing-the-militarys-big-data-in-the-age-of-ai/
<---------------------------------------------------------------------------->
The ABCs of AI-Enabled Intelligence Analysis
Iain J. Cruickshank
February 14, 2020
Special Series - AI and National Security
Editor’s Note: This article was submitted in response to the call for ideas issued by the co-chairs of the National Security Commission on Artificial Intelligence, Eric Schmidt and Robert Work. It addresses the third question (part a.) on the types of artificial intelligence research the national security community should be doing.
increasingly useful for intelligence.
► https://ndupress.ndu.edu/Media/News/Article/621113/defense-intelligence-analysis-in-the-age-of-big-data/
This change cannot simply be the acquisition of some new analysis software or implementation of a new policy, but rather must be more comprehensive changes across all military intelligence organizations. To meet the new realities of the information environment, and by corollary the new realities of intelligence analysis, the whole of military intelligence needs to modernize in three areas.
data-centric systems
► http://tdan.com/the-data-centric-revolution-data-centric-vs-data-driven/20288
Joint Artificial Intelligence Center
► https://www.defense.gov/Explore/News/Article/Article/2002304/chief-information-officer-touts-technological-progress/
Special Operations Command
► https://www.nationaldefensemagazine.org/articles/2018/7/10/algorithmic-warfare-special-operations-command-exploring-ai-tech
acquisition of some new analysis software
object identification in imagery
► https://techcrunch.com/2019/03/27/palantir-army-contract-dcgs-a/
Why does this matter? Imagine having a computer program that does a great job at detecting military vehicles from satellite imagery. Then the organization’s mission changes to counter-insurgency, and it now needs a program to detect individuals carrying weapons on foot.
(e.g. none of the algorithms or computational tools can be modified for different data scenarios).
There are two key concepts to any data-centric system: First, analysis tools and applications should change with the data, and second, data should be easily accessible. Analysts must be able to configure the tools and algorithms of the systems to meet the realities of the battlefield, and data access should be as seamless as possible.
Within a data-centric context, the use of machine learning algorithms has led to breakthroughs in nearly every analysis endeavor, from fraud detection to image identification.
nearly all analysis software products in use today — including advanced systems like Palantir or Analyst Notebook — are closed systems that do not allow analysts to code custom algorithms, use the latest machine-learning algorithms, use the latest research in “explainable AI,” or even allow analysts to provide feedback to the software’s algorithms.
“explainable AI”
► https://www.darpa.mil/program/explainable-artificial-intelligence
there will never be a particular algorithm or set of data that will always work to produce the best battlefield intelligence. (In fact, this is a direct result of the foundational “No Free Lunch theorem.”)
“No Free Lunch theorem.”
► https://medium.com/@LeonFedden/the-no-free-lunch-theorem-62ae2c3ed10c
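For reference, a loose and simplified statement of the No Free Lunch result being invoked (Wolpert–Macready; off-training-set error, averaged uniformly over all possible target functions f):

```latex
% No Free Lunch (simplified): averaged uniformly over every possible target
% function f, any two learning algorithms A_1 and A_2 have the same expected
% off-training-set error -- no single algorithm wins on all problems.
\[
  \frac{1}{|\mathcal{F}|}\sum_{f \in \mathcal{F}}
    \mathbb{E}\big[\mathrm{err}_{\mathrm{OTS}}(A_1, f)\big]
  \;=\;
  \frac{1}{|\mathcal{F}|}\sum_{f \in \mathcal{F}}
    \mathbb{E}\big[\mathrm{err}_{\mathrm{OTS}}(A_2, f)\big]
\]
```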
proposals
► https://admin.govexec.com/media/scan-to-me_from_10.3.5.20_2018-06-08_155522.pdf
Information Storage
Managing Analysts
regular expressions
► https://www.regular-expressions.info/quickstart.html
Adversarial machine learning
► https://openai.com/blog/adversarial-example-research/
surveillance and machine learning for intelligence purposes
► https://www.wired.com/story/inside-chinas-massive-surveillance-operation/?verso=true
concealment
► https://www.theverge.com/2017/4/12/15271874/ai-adversarial-images-fooling-attacks-artificial-intelligence
deception
► https://www.theverge.com/2017/11/2/16597276/google-ai-image-attacks-adversarial-turtle-rifle-3d-printed
Generative Adversarial Networks
► https://towardsdatascience.com/catch-me-if-you-can-a-simple-english-explanation-of-gans-or-dueling-neural-nets-319a273434db
Capt. Iain J. Cruickshank is a Ph.D. candidate in societal computing at Carnegie Mellon University as a National Science Foundation graduate research fellow. His previous assignments include company commander for D Company, 781st Military Intelligence Battalion (Cyber), and sub-element lead for planning and analysis and production on a national mission team in the Cyber National Mission Force.
This article was produced in conjunction with the Defense Entrepreneur Forum’s Gutenberg Writer’s Collaborative.
Gutenberg Writer’s Collaborative.
► https://defenseentrepreneurs.org/gutenberg/
source:
replicate the DevSecOps ecosystems commercial developers use so there is no seam between the government solution and the best commercial AI tools.
► https://warontherocks.com/2020/02/the-abcs-of-ai-enabled-intelligence-analysis/
<---------------------------------------------------------------------------->
The Data-Centric Revolution: Data-Centric vs. Data-Driven
Published: September 21, 2016, 12:40 am
Author Dave McComb
In this column, I am making the case for Data Centric architectures for enterprises. There is a huge economic advantage to converting to the data centric approach, but curiously few companies are making the transition.
One reason may be the confusion of Data Centric with Data Driven, and the belief that you are already on the road to data centric nirvana, when in fact you are nowhere near it.
[ Data Centric ]
Data centric refers to an architecture where data is the primary and permanent asset, and applications come and go. In the data centric architecture, the data model precedes the implementation of any given application and will be around and valid long after it is gone.
Many people may think this is what happens now or what should happen. But it very rarely happens this way. Businesses want functionality, and they purchase or build application systems. Each application system has its own data model, and its code is inextricably tied with this data model. It is extremely difficult to change the data model of an implemented application system, as there may be millions of lines of code dependent on the existing model.
Of course, this application is only one of hundreds or thousands of such systems in an enterprise. Each application on its own has hundreds to thousands of tables and tens of thousands of attributes. These applications are very partially and very unstably “interfaced” to one another through some middleware that periodically schleps data from one database to another.
The data centric approach turns all this on its head. There is a data model—a semantic data model (but more on that will be in a subsequent white paper)—and each bit of application functionality reads and writes through the shared model. If there is application functionality that calculates suggested reorder quantities for widgets, it will make its suggestion, and add it to the shared database, using the common core terms. Any other system can access the suggestions and know what they mean. If the reordering functionality goes away tomorrow, the suggestions will still be there.
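A minimal sketch of the reorder example just described, with a shared store that outlives any one application; the table and field names are invented for illustration:

```python
# One shared model in common-core terms; applications read and write through it.
shared_store = {
    "inventory": [
        {"item": "widget", "on_hand": 40, "reorder_point": 100},
        {"item": "gadget", "on_hand": 500, "reorder_point": 100},
    ],
    "reorder_suggestions": [],   # persists even if the suggesting app goes away
}

def suggest_reorders(store):
    """One piece of application functionality; it reads and writes only
    through the shared model, using the common core terms."""
    for row in store["inventory"]:
        if row["on_hand"] < row["reorder_point"]:
            store["reorder_suggestions"].append(
                {"item": row["item"],
                 "suggested_qty": row["reorder_point"] - row["on_hand"]}
            )

suggest_reorders(shared_store)
print(shared_store["reorder_suggestions"])
# [{'item': 'widget', 'suggested_qty': 60}]
```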
[ Data Driven ]
Many companies now claim to be data driven; far more than that claim to be data centric.
But they aren’t the same thing.
In Creating a Data-Driven Organization, Carl Anderson starts off saying, “Data-drivenness is about building tools, abilities, and, most crucially, a culture that acts on data.” This is a very good book, and I think it echoes what most people think of when they think “data driven.” It’s about acquiring and analyzing data to make better decisions.
As our appetites to acquire and analyze more data intensified, “Big Data” emerged.
But acquiring more data isn’t going to make you data centric, and may even make you less data centric. If each dataset you acquire has a different data model, and you just plop them down in a data lake without any attempt to harmonize them, you are getting less and less data centric—even as you become more data driven.
We understand why data lakes are popular now. The traditional data warehouse environment relied on complex ETL (extract, transform, and load) routines to scrub the data and get it all to conform to a predesigned data warehouse schema.
But this process is slow. It is not untypical for it to take weeks or months for a new data source to be incorporated into the data warehouse environment. The biggest problem is, until the data is normalized and cleansed, it’s unavailable for analytics. This is good for canned analytics, but for exploratory analytics, this is a problem. You would like to analyze the data to determine whether it will be worth the effort of normalizing, but by the time you can run your analytics, you have already invested the cost of conforming it.
The data lake approach says, “just put all your data in the lake, roughly in the format it was, and the data scientists will take it from here.” This is great for the initial exploration, but the explosion in data sources means that, over time, the data lake will be overrun with inconsistent variation and unnecessary variety.
[ Adding Data-Centricity to Your Data Driven Organization ]
It is possible to get the best of both worlds. You can become a data centric / data driven organization.
The key is to have a core model of the concepts in your organization. This core model, or enterprise ontology, can become the organizing principle for your firm.
Let’s say you’re a healthcare enterprise. You’ve acquired data about physicians in addition to data about doctors and nurses, not to mention data about people’s residential addresses. With a core model in place, you will learn over time that physicians, doctors, and nurses are people (really!!), and in addition to attributes they may have related to their person-ness (such as residing at a physical address), they have attributes about their specialties, etc.
The key is that you don’t have to do all this mapping up front, and you can use the model and the data to help you understand what you don’t yet know. You gradually get the data in the data lake cataloged in a way that makes it easier for your analysts to use. You catalog it in such a way that your applications can access it, if need be (as they will be sharing the same core model).
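To make this concrete, here is a minimal Python sketch (invented records and field names, not from the article) of mapping two differently shaped source records onto a shared Person concept from a core model, so analysts and applications can query one consistent shape instead of one schema per source:

# Sketch: harmonizing two hypothetical source records against a shared core concept.
physician_record = {"npi": "1234567890", "phys_name": "A. Smith", "specialty": "Cardiology"}
nurse_record     = {"staff_id": "N-0042", "full_name": "B. Jones", "ward": "ICU",
                    "home_addr": "12 Elm St"}

def to_core_person(record, role):
    """Map a source-specific record onto the core Person concept."""
    name_keys = ("phys_name", "full_name", "name")
    name = next((record[k] for k in name_keys if k in record), None)
    return {
        "type": "Person",                         # shared concept from the core model
        "role": role,                             # physician, nurse, ...
        "name": name,
        "residentialAddress": record.get("home_addr"),  # person-level attribute
        "specialty": record.get("specialty"),           # role-level attribute
        "sourceRecord": dict(record),                   # keep provenance for the catalog
    }

people = [to_core_person(physician_record, "physician"),
          to_core_person(nurse_record, "nurse")]
print(people)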
Skillfully executed, the meaning of the data in your data lake can grow right along with the lake itself and become an asset, rather than a liability.
[ Summary ]
Data centric and data driven are not synonyms. In fact, unchecked ambitions in acquiring and analyzing data sets could easily make your organization less data centric, as you drown in your data lake.
Luckily, the data centric approach has a life preserver: use the shared model of your data centric architecture as a way to organize and interpret the data you are acquiring in an agile way.
Dave McComb is President of Semantic Arts, Inc. a Fort Collins, Colorado based consulting firm, specializing in Enterprise Architecture and the application of Semantic Technology to Business Systems. He is the author of Semantics in Business Systems, and program chair for the annual Semantic Technology Conference.
source:
► https://www.textise.net/showText.aspx?strURL=https://tdan.com/the-data-centric-revolution-data-centric-vs-data-driven/20288
<---------------------------------------------------------------------------->
5 Lessons I Have Learned From Data Science In Real Working Experience
The underrated yet important lessons…
[Image: Admond Lee]
Admond Lee
Sep 2, 2018·8 min read
It has been a while since I last posted on Medium. Having been in Data Science for almost half a year, I’ve made a lot of mistakes and learned from them along the way… the hard way.
There’s no failure, only feedback.
And the real world is a feedback mechanism.
And YES you’re right, the learning journey wasn’t easy. Just keep grinding. LEARN and IMPROVE.
Through my learning experience, I have finally come to realize that there are a few common pitfalls that most beginners in Data Science (like me) will probably encounter. If you do, I hope the 5 biggest lessons that I have learned from these pitfalls will guide you through your journey. Let’s get started!
1. Business Domain Knowledge
To be honest, this lesson hit me hard right in my face when I first started as I did not put much emphasis on the importance of domain knowledge. Instead, I spent too much time on improving my technical knowledge (building a sophisticated model without really understanding the business needs).
Without understanding the business thoroughly, chances are your model will not add any value to the company, as it simply doesn’t serve the purpose, regardless of how accurate it is.
The most common technique to improve model accuracy is grid search, which hunts for the best parameters for the model. However, only by understanding the business needs and adding relevant features to train your model can you boost its performance significantly. Feature engineering is still very important, and grid search is just the final touch to improve your model.
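A hedged scikit-learn sketch of that point (synthetic data, not from the post): grid search only tunes hyperparameters over whatever features you already have, so it is the final touch rather than the main lever.

# Sketch: grid search tunes hyperparameters over a fixed feature set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

param_grid = {"n_estimators": [100, 300], "max_depth": [3, 6, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))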
As always, be really interested in your company’s business because your job is to help them solve their problems — through DATA. Ask yourself if you’re really passionate about what they’re doing and show your empathy at work.
Always know what you’re talking about
Understanding the business itself is not sufficient until you can articulate your ideas and present them to colleagues and stakeholders in terms they can understand in the business context.
In other words, NEVER use strange (or self-defined) words that are unfamiliar to stakeholders, as this only gives rise to misunderstanding between you and them.
Even if your findings are correct and your insights impactful, your credibility will be questioned and your findings will be nothing but a debatable subject.
Before you show how data can be used to solve business problems, I’d suggest first showing that you understand the business as a whole (including technical terms that are commonly used in your day-to-day work) and then identifying a problem statement that can be answered with the available data.
2. Detail-Oriented Mindset and Workflow
Be like a Detective. Carry out your investigation with laser focus on details. This is particularly important during the process of data cleaning and transformation. Data in real life is messy and you must have the capability to pick up signals from the ocean of noise before you get overwhelmed.
Therefore, having a detail-oriented mindset and workflow is of paramount importance to being successful in Data Science. Without a meticulous mindset or a well-structured workflow, you might lose your direction while exploring your data.
You may be diligently performing Exploratory Data Analysis (EDA) for some time without reaching any insights. Or you may keep training your model with different parameters in the hope of seeing some improvement. Or perhaps you may be celebrating the completion of an arduous data cleaning process when the data is in fact not clean enough to feed into your model. I’d been through this aimless process, only to realize that I did not have a well-structured workflow and my mind was simply hoping for the best to happen.
Hoping for the best to happen simply left me no control of what I was doing. The system was disordered and I knew something was wrong.
I stepped back to look at a bigger picture on what I’d been doing; I reorganized my thoughts and workflow, trying to make everything standardized and systematic. And it worked!
3. Design and Logic of Experiment
A systematic workflow gives a macroscopic view of your whole data science prototyping system (from data cleaning to interpreting model results etc.); an experiment is an integral part of the workflow that includes your logic for hypothesis testing as well as model building process.
Normal machine learning problems (Kaggle competition etc.) are straightforward as you can just get training data and start building your model.
However, things get complicated in real world in terms of framing your logic and designing an experiment to test your assumption and evaluate your model with suitable success metrics.
At the end of an experiment, every claim or conclusion should always be supported by facts and data. NEVER conclude something without verifying its validity.
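A hedged sketch (synthetic data, scikit-learn; not from the post) of one way such an experiment can be framed: state the success metric up front, then check whether a candidate feature actually moves it before accepting any claim.

# Sketch: does an added feature improve the chosen success metric?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
base = rng.normal(size=(n, 3))               # existing features
signal = rng.normal(size=n)                  # candidate feature
y = (signal + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_without = base
X_with = np.column_stack([base, signal])

score_without = cross_val_score(LogisticRegression(), X_without, y, cv=5).mean()
score_with = cross_val_score(LogisticRegression(), X_with, y, cv=5).mean()
print(f"without feature: {score_without:.3f}  with feature: {score_with:.3f}")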
4. Communication Skills
If there’s only one takeaway from this post, I hope you can always strive to improve your communication skills. It doesn’t matter if you are a beginner, intermediate or an expert in Data Science.
Promise me one thing — that you’ll share your thoughts with others while attentively listening to their opinions at the same time. Be receptive to criticism and feedback.
Speak the language of the business and communicate with colleagues, managers and other stakeholders in the terms that they understand. This resonates with the first lesson — Business Domain Knowledge. Failure to grasp the language of the business renders your communication with team members less effective, as people may have a hard time understanding your words from their point of view.
As a result, time gets wasted; people get frustrated; your credibility and relationship with them would likely get affected. What a lose-lose situation!
Even worse, a lack of communication skills makes it hard for business stakeholders to understand your analysis results. Always communicate your ideas, approach, results and insights in a simple manner despite the complexity behind them. Simply put, if you speak a business language to business people, they feel more comfortable and empowered, and they are much more willing to invest their time in the process, leading to more active participation in the conversation to understand your analysis. This also leads to the importance of the last lesson — Storytelling.
5. Storytelling
If it wasn’t obvious by now, Data Science isn’t just about crunching data and building models to showcase results to stakeholders. Even when your model performs well enough to meet the business needs, your end goal should be to deliver your results to stakeholders through compelling data storytelling that can answer some of the following questions (depending on your project goals):
Why do we have to analyze it?
What insights can we obtain from the results?
What decisions/action plans can we make out of it?
The art of storytelling is simple and complex at the same time. It’s often overlooked in data-driven analysis that sometimes even the best model performance and results would end up being useless due to lousy storytelling and presentation. What a waste!
Imagine you are the stakeholder: what makes storytelling compelling and convincing?
Let’s sit back and relax. Now imagine a data scientist showing you a highly accurate model prediction for a business problem without further explanation. You might think: Impressive! The model is doing a great job… So what is next? And then?
Do you get what I’m trying to portray here? There is a definite gap between model results and action plans. Stakeholders wouldn’t know what to do even though you just show them a highly accurate model prediction. We have to bridge the gap by thinking from their perspective to answer their questions and concerns instead of solely meeting the business objectives to ultimately lead to action plans.
There are many ways of bridging the gap and I’ll briefly highlight two approaches that can provide illuminating insights and guide stakeholders to their action plans.
Set a benchmark for comparison
It is insufficient to claim that a model performance is good without having something to compare with. In other words, a benchmark is needed as the baseline so that we know if the model does a great job or the other way round.
Without this benchmark, it is practically meaningless to claim that a model performs well as there is still a question left unanswered: How good is considered good enough? Why should I believe your results?
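One common way to establish such a benchmark (my illustration, not the author's prescription) is a naive baseline such as scikit-learn's DummyClassifier; any serious model should clearly beat it on the chosen metric.

# Sketch: compare the model against a naive benchmark so "good" has a reference point.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

print("baseline accuracy:", accuracy_score(y_te, baseline.predict(X_te)))
print("model accuracy:   ", accuracy_score(y_te, model.predict(X_te)))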
Risk management
This is especially important as it will decide if your model will be pushed into production. It means that you have to show the BEST and WORST case scenarios from the model performance.
This is where risk management comes in, because stakeholders want to know the model’s limitations: where it works and where it fails. They want to know how much risk the company has to bear when the model is pushed into production, which can eventually affect their final action plans.
Therefore, understanding the importance of risk management will not only make your results more compelling, but also increase stakeholders’ confidence in you and your outcome substantially (since you have helped the company to manage and minimize risk).
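As one possible way (not prescribed by the author) to present best- and worst-case performance, a bootstrap over the test set gives a plausible range for the metric rather than a single optimistic number; the data below is invented for illustration.

# Sketch: bootstrap the test-set metric to show a best/worst-case range.
import numpy as np
from sklearn.metrics import accuracy_score

def bootstrap_range(y_true, y_pred, n_boot=2000, seed=0):
    rng = np.random.default_rng(seed)
    idx = np.arange(len(y_true))
    scores = []
    for _ in range(n_boot):
        sample = rng.choice(idx, size=len(idx), replace=True)
        scores.append(accuracy_score(y_true[sample], y_pred[sample]))
    return np.percentile(scores, [2.5, 97.5])   # "worst" and "best" plausible cases

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0] * 20)
y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 1, 1, 0] * 20)
low, high = bootstrap_range(y_true, y_pred)
print(f"accuracy likely between {low:.2f} and {high:.2f}")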
Thank you for reading. I hope that you find these 5 lessons useful for your learning journey. I would be very excited to learn about the lessons from your own experience, so please feel free to share them with me and leave your comments below! 😃
As always, I still have so much more to learn and would love to hear from you on how I can improve my content (technical or non-technical) on Medium.
If you have any questions just add me and let’s chat on LinkedIn or visit my personal website for more:
About the Author
Admond Lee is now on a mission to make data science accessible to everyone. He helps companies and digital marketing agencies achieve marketing ROI with actionable insights through an innovative data-driven approach.
With his expertise in advanced social analytics and machine learning, Admond aims to bridge the gap between digital marketing and data science.
Check out his website if you want to learn more about Admond’s story, his data science services, and how he can help you in the marketing space.
source:
► https://www.textise.net/showText.aspx?strURL=https://towardsdatascience.com/5-lessons-i-have-learned-from-data-science-in-real-working-experience-3532c1b41fd7?gi=c435e0c6077d
<---------------------------------------------------------------------------->
Why The Army’s New Palantir Contract Won’t Fix Battlefield Intelligence
By Capt. Iain J. Cruickshank, U.S. Army
August 08, 2018
The Long March
[Image: Why The Army’s New Palantir Contract Won’t Fix Battlefield Intelligence]
The U.S. Army recently announced that Palantir won the contract to build the new battlefield intelligence platform.
Palantir has a great reputation for use on the battlefield, especially for counter-IED functions, and has attained an almost legendary status among some analysts and communities in the Army. When compared to the Distributed Common Ground System – Army (DCGS-A), its success is not surprising; most users of DCGS-A would agree that it is problematic.
٠ Distributed Common Ground System – Army (DCGS-A)
In particular, Palantir has a much friendlier user interface than DCGS-A, and its Gotham system is excellent at linking reports or other pieces of intelligence together. But Palantir’s Gotham system, the model for the new battlefield intelligence system, is susceptible to quickly becoming the next DCGS-A. Without some important changes, Palantir’s software will not satisfy battlefield intelligence needs and will be doomed to repeat the failures of its predecessor.
There are two critical areas that Palantir must address to develop a superior battlefield intelligence platform: inter-operability and customization of analysis by an end-user.
First, Palantir must be interoperable with different analysis suites. Real battlefield intelligence problems require a variety of approaches, both qualitative and quantitative, and will require different tools for appropriate analysis. When performing intelligence analysis—such as an All-Source Analyst will do in a battlefield environment—various tools need to be integrated. Analysts use a plethora of tools such as Analyst Notebook for creating link diagrams, ORA for network analysis, ArcGIS for creating maps, Anaconda for data science, and Palantir Gotham for finding linked reports. Many of these tools have specialized algorithms and analysis formats that cannot be replicated easily in one comprehensive format; ORA’s socio-cognitive maps cannot easily be replicated, and few systems can create good link diagram visualizations like Analyst Notebook. But one analysis suite will simply not provide the functionality to perform all of the analyses needed for battlefield intelligence.
____________________________________
Analysts use a plethora of tools:
٠ Analyst Notebook for creating link diagrams (link diagram visualizations),
٠ ORA for network analysis (socio-cognitive maps),
٠ ArcGIS for creating maps,
٠ Anaconda for data science, and
٠ Palantir Gotham for finding linked reports
Many of these tools have specialized algorithms and analysis formats that cannot be replicated easily in one comprehensive format; ORA’s socio-cognitive maps cannot easily be replicated, and few systems can create good link diagram visualizations like Analyst Notebook.
But one analysis suite will simply not provide the functionality to perform all of the analyses needed for battlefield intelligence.
____________________________________
Palantir must be able to integrate data with different systems. This means that it must both select and export data into easily ingestible formats like JSON and csv, and load in and parse the native formats from different analysis suites. A closed system that is not easily interoperable with other systems will not provide the functionality needed by battlefield analysts to attack real intelligence problems. For Palantir to be more successful than DCGS-A, it must have a high degree of interoperability with many analysis suites.
٠ ingestible formats: like JSON and csv
٠ load in and parse the native formats from different analysis suites
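For illustration only (invented field names, not an actual Palantir capability), a small pandas sketch of the kind of open-format round trip the author is asking for:

# Sketch: round-tripping analyst data through open, easily ingestible formats.
import pandas as pd

reports = pd.DataFrame([
    {"report_id": "R-001", "lat": 34.5, "lon": 69.2, "topic": "IED"},
    {"report_id": "R-002", "lat": 34.6, "lon": 69.1, "topic": "logistics"},
])

reports.to_csv("reports.csv", index=False)           # export for ArcGIS, ORA, etc.
reports.to_json("reports.json", orient="records")    # export for web or Gotham-style tools

round_trip = pd.read_json("reports.json")            # load back in for further analysis
print(round_trip.head())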
Secondly, and perhaps more importantly, if Palantir is to succeed where DCGS-A did not, it must allow for analysts to write their own code with its data. Humanity, and with it, warfare, is becoming increasingly digitized. As a result, information is exploding and the days where one analyst or group of analysts could read through all of the information on a particular area of interest and become an expert in a reasonable amount of time are waning. As intelligence analysis increasingly incorporates machine learning, Big Data, and interactive visualizations, intelligence will demand systems that can incorporate these advances in technology and methodology (see here, here, or here for just a few examples).
Many of the best tools for machine learning and advanced visualization are open source and written in coding languages like Python, R, and Julia. So, an analyst equipped with the right data will need to not only pull in these tools easily to support their analyses but also customize their tools to the particular intelligence data that is relevant to their battlefield environment and do all of this at massive scale and with varied types of information.
٠ tools for machine learning and advanced visualization are open source and written in coding languages like Python, R, and Julia.
٠ coding languages like Python, R, and Julia.
Some of the best uses of Big Data and machine learning in industry result from applying numerous methods to data, which can only be done in an open framework where an analyst is not restricted to provided or existing tools and methods. For the Palantir system to be successful for battlefield intelligence, it must, at a minimum, have an application programming interface for analysts to programmatically and easily pull and submit data with the Palantir system.
Ideally, Palantir should feature a full interface that has all of the important programming languages and their associated packages already in it, such that an analyst can build code and query data in the same environment (see Kaggle for an example of this type of environment). Ultimately, as the sheer quantity of information available on any given battlefield continues to increase (and it will do so for the foreseeable future), any successful battlefield intelligence system must be able to fully leverage this information using Big Data and machine learning. To do so, Palantir has to support an analyst coding and programmatically querying data.
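To make the idea concrete, here is a deliberately hypothetical sketch of what programmatic pull and submit through a documented REST interface could look like; the URL, endpoints, fields, and token are all invented and are not Palantir's actual API.

# Sketch (hypothetical endpoints, NOT Palantir's real API): programmatic pull/submit.
import requests

BASE = "https://intel-platform.example/api/v1"       # invented URL
HEADERS = {"Authorization": "Bearer <token>"}        # placeholder credential

# Pull reports matching a query into the analyst's own Python environment.
resp = requests.get(f"{BASE}/reports",
                    params={"region": "RC-East", "limit": 100},
                    headers=HEADERS, timeout=30)
reports = resp.json()

# Push back derived results (e.g., a network-analysis score per entity).
scores = [{"entity_id": r["id"], "centrality": 0.0} for r in reports]
requests.post(f"{BASE}/analysis-results", json=scores, headers=HEADERS, timeout=30)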
The decision to award Palantir the contract for creating the Army’s new battlefield intelligence platform is a step in the right direction. Palantir’s Gotham program is generally better than DCGS-A. However, Palantir is in danger of making the same fundamental mistake that DCGS-A made: trying to create an intelligence analysis system to rule them all. Fundamentally, battlefield intelligence is both a science and an art. So, any successful battlefield system must fundamentally be interoperable and customizable to empower creativity. The change in battlefield intelligence systems brought by Palantir presents the military intelligence community a great opportunity to modernize and ensure dominance in battlefield intelligence both now and in the future. We can’t afford to waste the opportunity by allowing Palantir to make the same mistakes as DCGS-A.
Captain Iain J. Cruickshank is a USMA Graduate, class of 2010, and is currently a PhD Candidate in Societal Computing at Carnegie Mellon University as National Science Foundation Graduate Research Fellow. His previous assignments include Company Commander for D Company, 781st Military Intelligence Battalion (Cyber), and Sub-Element lead for planning and analysis and production on a National Mission Team in the Cyber National Mission Force. The views expressed in this article do not officially represent the views of the U.S. Army, the U.S. military or the United States Government, and are the views of the author only.
source:
► https://www.textise.net/showText.aspx?strURL=https://taskandpurpose.com/thelongmarch/army-palantir-contract-problems/
<---------------------------------------------------------------------------->
Courses
٠ python
٠ intro to machine learning
٠ intermediate machine learning
٠ pandas
٠ data visualization
٠ feature engineering
٠ data cleaning
٠ intro to SQL
٠ advanced SQL
٠ intro to AI ethics
٠ intro to deep learning
٠ computer vision
٠ geospatial analysis
٠ machine learning explainability
٠ microchallenges
٠ natural language processing
٠ intro to game AI and reinforcement learning
Datasets
Public API
efficient GPU usage tips
tensor processing units (TPUs)
٠ Financial tweets • David Wallach • CSV dataset
٠ Face detection in images • DataTurks • JSON dataset
٠ Star Trek scripts • Gary Broughton • JSON dataset
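A minimal sketch of the Public API item above, using the official kaggle command-line client from Python; it assumes the kaggle package is installed and an API token is configured in ~/.kaggle/kaggle.json, and the dataset slug below is a guess that may not be exact.

# Sketch: fetching a dataset and listing competitions via the Kaggle public API (CLI).
import subprocess

subprocess.run(["kaggle", "datasets", "download",
                "-d", "davidwallach/financial-tweets",   # illustrative slug, may differ
                "--unzip", "-p", "data/"], check=True)
subprocess.run(["kaggle", "competitions", "list"], check=True)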
source:
► https://www.kaggle.com/learn
<---------------------------------------------------------------------------->
„Machine learning is a mathematical technique for training computer systems to make accurate predictions from a large corpus of training data, with a degree of accuracy that in some domains can mimic human cognition.“
—— Maciej Ceglowski,
May 7, 2019,
US Senate Committee on Banking, Housing, and Urban Affairs
on Privacy Rights and Data Collection in a Digital Economy
<< long read - scroll down to skip this section >>
Maciej Ceglowski's Senate testimony on Privacy Rights and Data Collection in a Digital Economy
May 7, 2019,
Senate Committee on Banking, Housing, and Urban Affairs
Privacy Rights and Data Collection in a Digital Economy (Senate hearing)
privacy
pinboard
regulation
gdpr
long read
► https://idlewords.com/talks/senate_testimony.2019.5.htm
Consent in a world of inference
For example, imagine that an algorithm could inspect your online purchasing history and, with high confidence, infer that you suffer from an anxiety disorder. Ordinarily, this kind of sensitive medical information would be protected by HIPAA, but is the inference similarly protected? What if the algorithm is only reasonably certain? What if the algorithm knows that you’re healthy now, but will suffer from such a disorder in the future?
The question is not hypothetical—a 2017 study showed that a machine learning algorithm examining photos posted to the image-sharing site Instagram was able to detect signs of depression before it was diagnosed in the subjects, and outperformed medical doctors on the task.
Addendum: Machine Learning and Privacy
Machine learning is a mathematical technique for training computer systems to make accurate predictions from a large corpus of training data, with a degree of accuracy that in some domains can mimic human cognition.
For example, machine learning algorithms trained on a sufficiently large data set can learn to identify objects in photographs with a high degree of accuracy, transcribe spoken language to text, translate texts between languages, or flag anomalous behavior on a surveillance videotape.
The mathematical techniques underpinning machine learning, like convolutional neural networks (CNN), have been well-known since before the revolution in machine learning that took place beginning in 2012. What enabled the key breakthrough in machine learning was the arrival of truly large collections of data, along with concomitant [accompanies or is collaterally connected with] computing power, allowing these techniques to finally demonstrate their full potential.
It takes data sets of millions or billions of items, along with considerable computing power, to get adequate results from a machine learning algorithm. Before the advent of the surveillance economy, we simply did not realize the power of these techniques when applied at scale.
Because machine learning has a voracious appetite for data and computing power, it contributes both to the centralizing tendency that has consolidated the tech industry, and to the pressure companies face to maximize the collection of user data.
Machine learning models pose some unique problems in privacy regulation because of the way they can obscure the links between the data used to train them and their ultimate behavior.
A key feature of machine learning is that it occurs in separable phases. An initial training phase consists of running a learning algorithm on a large collection of labeled data (a time and computation-intensive process). This model can then be deployed in an exploitation phase, which requires far fewer resources.
Once the training phase is complete, the data used to train the model is no longer required and can conceivably be thrown away.
The two phases of training and exploitation can occur far away from each other both in space and time. The legal status of models trained on personal data under privacy laws like the GDPR, or whether data transfer laws apply to moving a trained model across jurisdictions, is not clear.
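A minimal scikit-learn sketch (my illustration, not from the testimony) of the two separable phases: train and serialize once, then exploit the saved model later, elsewhere, without the training data.

# Sketch: training phase vs. exploitation phase (synthetic data, invented file name).
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# --- training phase (data- and compute-heavy) ---
X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)
joblib.dump(model, "model.joblib")        # the training data can now be discarded

# --- exploitation phase (cheap, can run far away in space and time) ---
deployed = joblib.load("model.joblib")
print(deployed.predict(X[:5]))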
Inspecting a trained model reveals nothing about the data that went into it. To a human inspecting it, the model consists of millions and millions of numeric weights that have no obvious meaning, or relationship to human categories of thought. One cannot examine an image recognition model, for example, and point to the numbers that encode ‘apple’.
The training process behaves as a kind of one-way function. It is not possible to run a trained model backwards to reconstruct the input data; nor is it possible to “untrain” a model so that it will forget a specific part of its input.
Machine learning algorithms are best understood as inference engines. They find structure and excel at making inferences from data that can sometimes be surprising even to people familiar with the technology. This ability to see patterns that humans don’t notice has led to interest in using machine learning algorithms in medical diagnosis, evaluating insurance risk, assigning credit scores, stock trading, and other fields that currently rely on expert human analysis.
The opacity of machine learning models, combined with this capacity for inference, also make them an ideal technology for circumventing legal protections on data use. In this spirit, I have previously referred to machine learning as “money laundering for bias”. Whatever latent biases are in the training data, whether or not they are apparent to humans, and whether or not attempts are made to remove them from the data set, will be reflected in the behavior of the model.
A final feature of machine learning is that it is curiously vulnerable to adversarial inputs. For example, an image classifier that correctly identifies a picture of a horse might reclassify the same image as an apple, sailboat or any other object of an attacker’s choosing if they can manipulate even one pixel in the image. Changes in input data not noticeable to a human observer will be sufficient to persuade the model. Recent research suggests that this property is an inherent and ineradicable feature of any machine learning system that uses current approaches.
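For illustration (not part of the testimony), a fast gradient sign perturbation in PyTorch is one standard way such adversarial inputs are constructed; the toy model, input, and epsilon value here are invented, and a real attack would target a trained classifier.

# Sketch: fast-gradient-sign perturbation of an input image (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # toy classifier
x = torch.rand(1, 1, 28, 28)                                   # stand-in image
label = torch.tensor([3])

x_adv = x.clone().detach().requires_grad_(True)
loss = F.cross_entropy(model(x_adv), label)
loss.backward()

epsilon = 0.05                                                 # small, hard-to-notice step
x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()
print(model(x).argmax(1), model(x_adv).argmax(1))              # the prediction may flip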
In brief, machine learning is effective, has an enormous appetite for data, requires large computational resources, makes decisions that resist analysis, excels at finding latent structure in data, obscures the link between source data and outcomes, defies many human intuitions, and is readily fooled by a knowledgeable adversary.
—Maciej Ceglowski, 2019
source:
► https://idlewords.com/talks/senate_testimony.2019.5.htm
https://tildes.net/~tech
<-------------------------------------------------------------------------->
October 16, 2015
System that replaces human intuition with algorithms outperforms human teams
by Larry Hardesty, Massachusetts Institute of Technology
http://phys.org/news/2015-10-human-intuition-algorithms-outperforms-teams.html
http://groups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/DSAA_DSM_2015.pdf
Big-data analysis consists of searching for buried patterns that have some kind of predictive power. But choosing which "features" of the data to analyze usually requires some human intuition.
"What we observed from our experience solving a number of data science problems for industry is that one of the very critical steps is called feature engineering," Veeramachaneni says. "The first thing you have to do is identify what variables to extract from the database or compose, and for that, you have to come up with a lot of ideas."
source:
► http://phys.org/news/2015-10-human-intuition-algorithms-outperforms-teams.html
<-------------------------------------------------------------------------->
October 16, 2015
System that replaces human intuition with algorithms outperforms human teams
by Larry Hardesty, Massachusetts Institute of Technology
[Image: System that replaces human intuition with algorithms outperforms human teams]
Big-data analysis consists of searching for buried patterns that have some kind of predictive power. But choosing which "features" of the data to analyze usually requires some human intuition. In a database containing, say, the beginning and end dates of various sales promotions and weekly profits, the crucial data may not be the dates themselves but the spans between them, or not the total profits but the averages across those spans.
MIT researchers aim to take the human element out of big-data analysis, with a new system that not only searches for patterns but designs the feature set, too. To test the first prototype of their system, they enrolled it in three data science competitions, in which it competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating in the three competitions, the researchers' "Data Science Machine" finished ahead of 615.
In two of the three competitions, the predictions made by the Data Science Machine were 94 percent and 96 percent as accurate as the winning submissions. In the third, the figure was a more modest 87 percent. But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.
"We view the Data Science Machine as a natural complement to human intelligence," says Max Kanter, whose MIT master's thesis in computer science is the basis of the Data Science Machine. "There's so much data out there to be analyzed. And right now it's just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving."
Between the lines
Kanter and his thesis advisor, Kalyan Veeramachaneni, a research scientist at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), describe the Data Science Machine in a paper that Kanter will present next week at the IEEE International Conference on Data Science and Advanced Analytics.
Veeramachaneni co-leads the Anyscale Learning for All group at CSAIL, which applies machine-learning techniques to practical problems in big-data analysis, such as determining the power-generation capacity of wind-farm sites or predicting which students are at risk for dropping out of online courses.
"What we observed from our experience solving a number of data science problems for industry is that one of the very critical steps is called feature engineering," Veeramachaneni says. "The first thing you have to do is identify what variables to extract from the database or compose, and for that, you have to come up with a lot of ideas."
In predicting dropout, for instance, two crucial indicators proved to be how long before a deadline a student begins working on a problem set and how much time the student spends on the course website relative to his or her classmates. MIT's online-learning platform MITx doesn't record either of those statistics, but it does collect data from which they can be inferred.
Featured composition
Kanter and Veeramachaneni use a couple of tricks to manufacture candidate features for data analyses. One is to exploit structural relationships inherent in database design. Databases typically store different types of data in different tables, indicating the correlations between them using numerical identifiers. The Data Science Machine tracks these correlations, using them as a cue to feature construction.
For instance, one table might list retail items and their costs; another might list items included in individual customers' purchases. The Data Science Machine would begin by importing costs from the first table into the second. Then, taking its cue from the association of several different items in the second table with the same purchase number, it would execute a suite of operations to generate candidate features: total cost per order, average cost per order, minimum cost per order, and so on. As numerical identifiers proliferate across tables, the Data Science Machine layers operations on top of each other, finding minima of averages, averages of sums, and so on.
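A small pandas sketch (toy tables invented for illustration, not the MIT system's code) of the kind of candidate features described here: follow the identifier linking the two tables, then aggregate per order.

# Sketch: generating candidate features by joining on a key and aggregating.
import pandas as pd

items = pd.DataFrame({"item_id": [1, 2, 3], "cost": [2.5, 10.0, 4.0]})
purchases = pd.DataFrame({"order_id": [100, 100, 101, 102, 102, 102],
                          "item_id":  [1,   2,   3,   1,   1,   3]})

# Import costs from the first table into the second, then aggregate per order.
joined = purchases.merge(items, on="item_id")
candidate_features = joined.groupby("order_id")["cost"].agg(
    total_cost="sum", average_cost="mean", minimum_cost="min")
print(candidate_features)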
It also looks for so-called categorical data, which appear to be restricted to a limited range of values, such as days of the week or brand names. It then generates further feature candidates by dividing up existing features across categories.
Once it's produced an array of candidates, it reduces their number by identifying those whose values seem to be correlated. Then it starts testing its reduced set of features on sample data, recombining them in different ways to optimize the accuracy of the predictions they yield.
"The Data Science Machine is one of those unbelievable projects where applying cutting-edge research to solve practical problems opens an entirely new way of looking at the problem," says Margo Seltzer, a professor of computer science at Harvard University who was not involved in the work. "I think what they've done is going to become the standard quickly—very quickly."
source:
► https://www.textise.net/showText.aspx?strURL=https://phys.org/news/2015-10-human-intuition-algorithms-outperforms-teams.html
<---------------------------------------------------------------------------->
If you only have time to read one paper on Deep Learning, read this paper.
A few quotes:
"This rather naive way of performing machine translation has quickly become competitive with the state-of-the-art, and raises serious doubts about whether understanding a sentence requires anything like the internal symbolic expressions that are manipulated by using inference rules. It is more compatible with the view that everyday reasoning involves many simultaneous analogies that each contribute plausibility to a conclusion"
"The issue of representation lies at the heart of the debate between the logic-inspired and the neural-network-inspired paradigms for cognition. In the logic-inspired paradigm, an instance of a symbol is something for which the only property is that it is either identical or non-identical to other symbol instances. It has no internal structure that is relevant to its use; and to reason with symbols, they must be bound to the variables in judiciously chosen rules of inference. By contrast, neural networks just use big activity vectors, big weight matrices and scalar non-linearities to perform the type of fast ‘intuitive’ inference that underpins effortless commonsense reasoning."
"Problems such as image and speech recognition require the input–output function to be insensitive to irrelevant variations of the input, such as variations in position, orientation or illumination of an object, or variations in the pitch or accent of speech, while being very sensitive to particular minute variations (for example, the difference between a white wolf and a breed of wolf-like white dog called a Samoyed). At the pixel level, images of two Samoyeds in different poses and in different environments may be very different from each other, whereas two images of a Samoyed and a wolf in the same position and on similar backgrounds may be very similar to each other. A linear classifier, or any other ‘shallow’ classifier operating on raw pixels could not possibly distinguish the latter two, while putting the former two in the same category.... The conventional option is to hand design good feature extractors, which requires a considerable amount of engineering skill and domain expertise. But this can all be avoided if good features can be learned automatically using a general-purpose learning procedure. This is the key advantage of deep learning."
"Deep neural networks exploit the property that many natural signals are compositional hierarchies, in which higher-level features are obtained by composing lower-level ones. In images, local combinations of edges form motifs, motifs assemble into parts, and parts form objects. Similar hierarchies exist in speech and text from sounds to phones, phonemes, syllables, words and sentences. The pooling allows representations to vary very little when elements in the previous layer vary in position and appearance"
source:
► https://news.ycombinator.com/item?id=9613810
<---------------------------------------------------------------------------->
AI / Deep Learning
Nov 03, 2015
Deep Learning in a Nutshell: Core Concepts
By Tim Dettmers
Machine Learning
In machine learning we (1) take some data, (2) train a model on that data, and (3) use the trained model to make predictions on new data. The process of training a model can be seen as a learning process where the model is exposed to new, unfamiliar data step by step. At each step, the model makes predictions and gets feedback about how accurate its generated predictions were. This feedback, which is provided in terms of an error according to some measure (for example distance from the correct solution), is used to correct the errors made in prediction.
The learning process is often a game of back-and-forth in the parameter space: if you tweak a parameter of the model to get one prediction right, the model may shift in such a way that it gets a previously correct prediction wrong. It may take many iterations to train a model with good predictive performance. This iterative predict-and-adjust process continues until the predictions of the model no longer improve.
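The predict-and-adjust loop described above can be sketched in a few lines of Python; the example below is a generic gradient-descent loop on synthetic data, written for illustration and not taken from the article.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

w = np.zeros(3)           # model parameters
lr = 0.1                  # learning rate
for step in range(200):
    pred = X @ w                      # (1) make predictions on the data
    error = pred - y                  # (2) feedback: distance from the answer
    grad = X.T @ error / len(y)
    w -= lr * grad                    # (3) adjust parameters to reduce the error

print(w)  # should end up close to true_w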
Feature Engineering
Feature engineering is the art of extracting useful patterns from data that will make it easier for Machine Learning models to distinguish between classes. For example, you might take the number of greenish vs. bluish pixels as an indicator of whether a land or water animal is in some picture. This feature is helpful for a machine learning model because it limits the number of classes that need to be considered for a good classification.
Feature engineering is the most important skill when you want to achieve good results for most prediction tasks. However, it is difficult to learn and master, since different data sets and different kinds of data require different feature engineering approaches. Only crude guidelines exist, which makes feature engineering more of an art than a science. Features that are usable for one data set often are not usable for other data sets (for example, if the next image data set contains only land animals). The difficulty of feature engineering and the effort involved is the main reason to seek algorithms that can learn features; that is, algorithms that automatically engineer features.
While many tasks can be automated by Feature Learning (like object and speech recognition), feature engineering remains the single most effective technique to do well in difficult tasks (like most tasks in Kaggle machine learning competitions).
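As a small illustration of the hand-engineered feature mentioned above (greenish vs. bluish pixels), the following Python sketch computes such a feature for a stand-in random image; the channel-dominance rule is an assumption made for the example, not a prescription from the blog post.
import numpy as np

# Stand-in RGB image (H x W x 3); in practice this would be a real photo.
image = np.random.default_rng(1).integers(0, 256, size=(64, 64, 3))
r, g, b = image[..., 0], image[..., 1], image[..., 2]

greenish = np.sum((g > r) & (g > b))   # pixels dominated by the green channel
bluish   = np.sum((b > r) & (b > g))   # pixels dominated by the blue channel

# One scalar feature per image, usable as a land-vs-water indicator.
green_blue_ratio = greenish / max(bluish, 1)
print(green_blue_ratio)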
.... ... .....
About
Tim Dettmers is a masters student in informatics at the University of Lugano where he works on deep learning research. Before that he studied applied mathematics and worked for three years as a software engineer in the automation industry. He runs a blog about deep learning and takes part in Kaggle data science competitions where he has reached a world rank of 63.
source:
https://developer.nvidia.com/blog/deep-learning-nutshell-core-concepts/
<---------------------------------------------------------------------------->
AMA: Yann LeCun
Every new technology has potential benefits and potential dangers. As with nuclear technology and biotech in decades past, societies will have to come up with guidelines and safety measures to prevent misuses of AI.
One hope is that AI will transform communication between people, and between people and machines. AI will facilitate and mediate our interactions with the digital world and with each other. It could help people access information and protect their privacy. Beyond that, AI will drive our cars and reduce traffic accidents, help our doctors make medical decisions, and do all kinds of other things.
But it will have a profound impact on society, and we have to prepare for it. We need to think about ethical questions surrounding AI and establish rules and guidelines (e.g. for privacy protection). That said, AI will not happen one day out of the blue. It will be progressive, and it will give us time to think about the right way to deal with it.
It's important to keep in mind that the arrival of AI will not be any more or any less disruptive than the arrival of indoor plumbing, vaccines, the car, air travel, the television, the computer, the internet, etc.
source:
https://www.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun/chifet0
<---------------------------------------------------------------------------->
“ ... the arrival of AI will not be any more or any less disruptive than the arrival of indoor plumbing, vaccines, the car, air travel, the television, the computer, the internet, etc.”, Yann LeCun (self.MachineLearning), http://www.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun
<-------------------------------------------------------------------------->
AMA: Yann LeCun (self.MachineLearning)
http://www.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun
https://i.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun/?limit=50
https://i.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun/?limit=100
https://i.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun/?limit=200
https://i.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun/?limit=282
read, learn from on-line material, try things for yourself. As Feynman said: don't read everything about a topic before starting to work on it. Think about the problem for yourself, figure out what's important, then read the literature. This will allow you to interpret the literature and tell what's good from what's bad.
Many deep learning approaches can be seen as factor graphs. I posted about this in the past. ( https://plus.google.com/104362980539466846301/posts/51gWtf7X3Ee )
others greatly underestimated the difficulty of reducing these conceptual ideas to practice.
The DeepMind video-game player that trains itself with reinforcement learning uses Q-learning (a very classical algorithm for RL) on top of a convolutional network (a now very classical method for image recognition). One of the authors is Koray Kavukcuoglu who is a former student of mine.
My job as the department head was to make sure people like Vladimir could work on their research with minimal friction and distraction.
Learning complex/hierarchical/non-linear features/representations/metrics cannot be done with kernel methods as it can be done with deep architectures.
People use Facebook because it helps them communicate with other people.
Integrating deep learning (or representation learning) with reasoning and making unsupervised learning actually work are two big challenges for the next several years.
I believe there is a role to play for specialized hardware for embedded applications. Once every self-driving car or maintenance robot comes with an embedded perception system, it will make sense to build FPGAs, ASICs or have hardware support for running convolutional nets or other models.
"Data Science", which is the automatic (or semi-automatic) extraction of knowledge from data.
The amount of data generated by our digital world is growing exponentially with high rate (at the same rate our hard-drives and communication networks are increasing their capacity).
This means that now or in the near future most of the knowledge in the world will be extracted by machine and reside in machines. It's inevitable. An entire industry is building itself around this, and a new academic discipline is emerging.
Otherwise, the order in which we learn things would not matter. Obviously, the order in which we learn things does matter (that's why pedagogy exists). The famous developmental psychologist Jean Piaget established that children learn simple concepts before learning more complex/abstract ones on top of them.
There are four main uses for unsupervised learning: (1) learning features (or representations); (2) visualization/exploration; (3) compression; (4) synthesis.
Only (1) is interesting to me (the other uses are interesting too, just not on my own radar screen).
representing data (mostly natural signals like audio and images).
These are people who have worked on wavelet transforms, sparse coding and sparse modeling, compressive sensing, manifold learning, numerical optimization, scientific computing, large-scale linear algebra, fast transform (FFT, Fast Multipole methods). This community has a lot to say about how to represent data in high-dimensional spaces.
The questions become: how well does my method work on this particular problem, and how large is the set of problems on which it works well.
It's important to keep in mind that the arrival of AI will not be any more or any less disruptive than the arrival of indoor plumbing, vaccines, the car, air travel, the television, the computer, the internet, etc.
establishing causal relationships is a hugely important problem in data science. There are huge applications in healthcare, social policy....
The barrier to entry was very high, and it was very difficult to get state-of-the-art performance with brand new methods.
There has to be a process by which innovative ideas can be allowed to germinate and develop, and not be shut down before they get a chance to produce good results.
Alex Graves (from Deep Mind) has quite a few nice papers on applying neural networks to [time series prediction] though most of his work is focused on classification rather than forecasting.
In the early days of aviation, some people (like Clément Ader) tried to copy birds and bats a little too closely (without understanding the principles of lift, drag, and stability) while others (like the Wright Brothers and Santos-Dumont) had a more systematic engineering approach (building a wind tunnel, testing airfoils, building full-scale gliders....). Both were somewhat inspired by nature, but to different degrees. My problem with sticking too close to nature is that it's like "cargo-cult" science. A bird biologist will tell you how important the micro-structure of feathers is to bird flight. You will think that you need to reproduce feathers in their most minute details to build flying machines. In reality, flight relies on the Bernoulli principle: pushing an angled plate (preferably shaped like an airfoil) through air creates lift. I don't use neural nets because they look like the brain. I use them because they are a convenient way to construct parameterized non-linear functions with good properties. But I did get inspiration from the architecture of the visual cortex to build convolutional nets.
source:
► https://i.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun/?limit=100
► https://i.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun/?limit=200
► https://i.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun/?limit=282
<-------------------------------------------------------------------------->
http://www.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun
<snippet TEXT from AMA with Yann LeCun>
AMA: Yann LeCun (self.MachineLearning)
My name is Yann LeCun. I am the Director of Facebook AI Research and a professor at New York University.
***** ***** *****
don't get fooled by people who claim to have a solution to Artificial General Intelligence, who claim to have AI systems that work "just like the human brain", or who claim to have figured out how the brain works (well, except if it's Geoff Hinton making the claim). Ask them what error rate they get on MNIST or ImageNet.
***** ***** *****
AI "died" about four times in five decades because of hype: people made wild claims (often to impress potential investors or funding agencies) and could not deliver. Backlash ensued. It happened twice with neural nets already: once in the late 60's and again in the mid-90's.
***** ***** *****
Asimov's book "I, Robot" is all about the conflict between hard-wired rules and intelligent decision making.
***** ***** *****
I believe there is a role to play for specialized hardware for embedded applications. Once every self-driving car or maintenance robot comes with an embedded perception system, it will make sense to build FPGAs, ASICs or have hardware support for running convolutional nets or other models. There is a lot to be gained with specialized hardware in terms of Joules/operation. We have done some work in that direction at my NYU lab with the NeuFlow architecture. I don't really believe in the use of specialized hardware for large-scale training. Everyone in the deep learning business is using GPUs for training. Perhaps alternatives to GPUs, like Intel's Xeon Phi, will become viable in the near future. But those are relatively mainstream technologies.
I like the joke about Big Data that compares it to teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.
Seriously, I don't like the phrase "Big Data". I prefer "Data Science", which is the automatic (or semi-automatic) extraction of knowledge from data. That is here to stay, it's not a fad. The amount of data generated by our digital world is growing exponentially with high rate (at the same rate our hard-drives and communication networks are increasing their capacity). But the amount of human brain power in the world is not increasing nearly as fast. This means that now or in the near future most of the knowledge in the world will be extracted by machine and reside in machines. It's inevitable. An entire industry is building itself around this, and a new academic discipline is emerging.
***** ***** *****
It's not at all clear whether the brain minimizes some sort of objective function. However, if it does, I can guarantee that this function is non convex. Otherwise, the order in which we learn things would not matter. Obviously, the order in which we learn things does matter (that's why pedagogy exists). The famous developmental psychologist Jean Piaget established that children learn simple concepts before learning more complex/abstract ones on top of them. We don't really know what "algorithm" or what "objective function" or even what principle the brain uses.
Unsupervised learning is about discovering the internal structure of the data, discovering mutual dependencies between input variables, and disentangling the independent explanatory factors of variations. Generally, unsupervised learning is a means to an end.
There are four main uses for unsupervised learning: (1) learning features (or representations); (2) visualization/exploration; (3) compression; (4) synthesis. Only (1) is interesting to me (the other uses are interesting too, just not on my own radar screen).
***** ***** *****
Speech is one of those domains where we have access to ridiculously large amounts of data and a very large number of categories. So, it's very favorable for supervised learning.
These are folks who have long been interested in representing data (mostly natural signals like audio and images). These are people who have worked on wavelet transforms, sparse coding and sparse modeling, compressive sensing, manifold learning, numerical optimization, scientific computing, large-scale linear algebra, fast transform (FFT, Fast Multipole methods). This community has a lot to say about how to represent data in high-dimensional spaces.
***** ***** *****
I do believe in getting inspiration from the brain, but I don't believe at all in copying and reproducing the detailed functions of neurons in the hope that AI will simply emerge from large simulations. In the early days of aviation, some people (like Clément Ader) tried to copy birds and bats a little too closely (without understanding the principles of lift, drag, and stability) while others (like the Wright Brothers and Santos-Dumont) had a more systematic engineering approach (building a wind tunnel, testing airfoils, building full-scale gliders....). Both were somewhat inspired by nature, but to different degrees. My problem with sticking too close to nature is that it's like "cargo-cult" science. A bird biologist will tell you how important the micro-structure of feathers is to bird flight. You will think that you need to reproduce feathers in their most minute details to build flying machines. In reality, flight relies on the Bernoulli principle: pushing an angled plate (preferably shaped like an airfoil) through air creates lift. I don't use neural nets because they look like the brain. I use them because they are a convenient way to construct parameterized non-linear functions with good properties. But I did get inspiration from the architecture of the visual cortex to build convolutional nets.
source:
► https://i.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun/?limit=100
► https://i.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun/?limit=200
► https://i.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun/?limit=282
<-------------------------------------------------------------------------->
key ideas
data science/ extraction of knowledge from data //
the order in which we learn things does matter //
learn simple concept before learning more complex/abstract ones //
covariance/ structural homology/
homology/ the existence of shared ancestry between a pair of structures //
ontology/ where to put things/ ways to organize things //
The catastrophes are the surprises: all else is mere repetition //
morphology -
1. the branch of biology that deals with the form and
structure of animals and plants
3. any scientific study of form and structure,
as in physical geography
4. form and structure, as of an organism, regarded as a whole //
• "Data Science", which is the automatic (or semi-automatic) extraction of knowledge from data.;── Yann LeCun (self.MachineLearning).
• the goal of extracting information from data.;── Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman.
• ... and thus discover something about data that will be seen in the future.;── Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman.
• All algorithms for analysis of data are designed to produce a useful summary of the data, from which decisions are made.;── Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman, Mining of massive datasets, 2010.'; http://infolab.stanford.edu/~ullman/mmds/book.pdf
http://infolab.stanford.edu/~ullman/mmds/book.pdf
(( last checked: Thur May 13, 2021 [up] ))
data science/ DS/ machine learning/ ML/ unsupervised feature learning/
unsupervised learning/ computer science/ CS/ science fiction/ SF/ sci-fi/
fantasy/ fa/ fiction/ fi/ Finland/ fi/ reinforcement learning/ RL/
deep learning/ DL/ artificial intelligence/ AI/ expert systems/
representation learning/ RL/
Natural Language / NL/
natural language processing / NLP/
Natural Language Speech Recognition/ NLSR/
Speaker-Independent Natural Language Speech Recognition/ SI-NLSR or NLSR/
<-------------------------------------------------------------------------->
٠ intro to game AI and reinforcement learning
Datasets
Public API
efficient GPU usage tips
tensor processing units (TPUs)
٠ Financial tweets • David Wallach • CSV dataset
٠ Face detection in images • DataTurks • JSON dataset
٠ Star Trek scripts • Gary Broughton • JSON dataset
source:
► https://www.kaggle.com/learn
<---------------------------------------------------------------------------->
„Machine learning is a mathematical technique for training computer systems to make accurate predictions from a large corpus of training data, with a degree of accuracy that in some domains can mimic human cognition.“
—— Maciej Ceglowski,
May 7, 2019,
US Senate Committee on Banking, Housing, and Urban Affairs
on Privacy Rights and Data Collection in a Digital Economy
<< long read - scroll down to skip this section >>
Maciej Ceglowski's Senate testimony on Privacy Rights and Data Collection in a Digital Economy
May 7, 2019,
Senate Committee on Banking, Housing, and Urban Affairs
Privacy Rights and Data Collection in a Digital Economy (Senate hearing)
privacy
pinboard
regulation
gdpr
long read
► https://idlewords.com/talks/senate_testimony.2019.5.htm
Consent in a world of inference
For example, imagine that an algorithm could inspect your online purchasing history and, with high confidence, infer that you suffer from an anxiety disorder. Ordinarily, this kind of sensitive medical information would be protected by HIPAA, but is the inference similarly protected? What if the algorithm is only reasonably certain? What if the algorithm knows that you’re healthy now, but will suffer from such a disorder in the future?
The question is not hypothetical—a 2017 study showed that a machine learning algorithm examining photos posted to the image-sharing site Instagram was able to detect signs of depression before it was diagnosed in the subjects, and outperformed medical doctors on the task.
Addendum: Machine Learning and Privacy
Machine learning is a mathematical technique for training computer systems to make accurate predictions from a large corpus of training data, with a degree of accuracy that in some domains can mimic human cognition.
For example, machine learning algorithms trained on a sufficiently large data set can learn to identify objects in photographs with a high degree of accuracy, transcribe spoken language to text, translate texts between languages, or flag anomalous behavior on a surveillance videotape.
The mathematical techniques underpinning machine learning, like convolutional neural networks (CNN), have been well-known since before the revolution in machine learning that took place beginning in 2012. What enabled the key breakthrough in machine learning was the arrival of truly large collections of data, along with concomitant [accompanies or is collaterally connected with] computing power, allowing these techniques to finally demonstrate their full potential.
It takes data sets of millions or billions of items, along with considerable computing power, to get adequate results from a machine learning algorithm. Before the advent of the surveillance economy, we simply did not realize the power of these techniques when applied at scale.
Because machine learning has a voracious appetite for data and computing power, it contributes both to the centralizing tendency that has consolidated the tech industry, and to the pressure companies face to maximize the collection of user data.
Machine learning models pose some unique problems in privacy regulation because of the way they can obscure the links between the data used to train them and their ultimate behavior.
A key feature of machine learning is that it occurs in separable phases. An initial training phase consists of running a learning algorithm on a large collection of labeled data (a time and computation-intensive process). This model can then be deployed in an exploitation phase, which requires far fewer resources.
Once the training phase is complete, the data used to train the model is no longer required and can conceivably be thrown away.
The two phases of training and exploitation can occur far away from each other both in space and time. The legal status of models trained on personal data under privacy laws like the GDPR, or whether data transfer laws apply to moving a trained model across jurisdictions, is not clear.
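The two separable phases described above can be illustrated with a short Python sketch; scikit-learn and joblib are assumed here purely for illustration, and the file name is invented. Training needs the data and the compute; exploitation only needs the small saved model.
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# --- Training phase (data- and computation-intensive) ---
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)
joblib.dump(model, "model.joblib")    # the training data could now be discarded

# --- Exploitation phase (can run later, elsewhere, without the training data) ---
deployed = joblib.load("model.joblib")
print(deployed.predict(X[:5]))        # inference from the trained weights alone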
Inspecting a trained model reveals nothing about the data that went into it. To a human inspecting it, the model consists of millions and millions of numeric weights that have no obvious meaning, or relationship to human categories of thought. One cannot examine an image recognition model, for example, and point to the numbers that encode ‘apple’.
The training process behaves as a kind of one-way function. It is not possible to run a trained model backwards to reconstruct the input data; nor is it possible to “untrain” a model so that it will forget a specific part of its input.
Machine learning algorithms are best understood as inference engines. They find structure and excel at making inferences from data that can sometimes be surprising even to people familiar with the technology. This ability to see patterns that humans don’t notice has led to interest in using machine learning algorithms in medical diagnosis, evaluating insurance risk, assigning credit scores, stock trading, and other fields that currently rely on expert human analysis.
The opacity of machine learning models, combined with this capacity for inference, also make them an ideal technology for circumventing legal protections on data use. In this spirit, I have previously referred to machine learning as “money laundering for bias”. Whatever latent biases are in the training data, whether or not they are apparent to humans, and whether or not attempts are made to remove them from the data set, will be reflected in the behavior of the model.
A final feature of machine learning is that it is curiously vulnerable to adversarial inputs. For example, an image classifier that correctly identifies a picture of a horse might reclassify the same image as an apple, sailboat or any other object of an attacker’s choosing if they can manipulate even one pixel in the image. Changes in input data not noticeable to a human observer will be sufficient to persuade the model. Recent research suggests that this property is an inherent and ineradicable feature of any machine learning system that uses current approaches.
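As a rough illustration of the adversarial-input phenomenon described above, the following Python sketch applies the well-known fast-gradient-sign construction to a toy logistic-regression scorer. It is a generic textbook example, not the specific one-pixel attacks referenced in the testimony, and all weights and inputs are stand-ins.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.normal(size=100)       # stand-in for trained model weights
b = 0.0
x = 0.05 * w                   # an input the model scores confidently as class 1
y = 1.0                        # its true label

p = sigmoid(w @ x + b)                  # confident, correct prediction
grad_x = (p - y) * w                    # gradient of the cross-entropy loss w.r.t. x
eps = 0.1
x_adv = x + eps * np.sign(grad_x)       # small, structured change to every feature

print("score on original input: ", p)
print("score on perturbed input:", sigmoid(w @ x_adv + b))  # drops sharply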
In brief, machine learning is effective, has an enormous appetite for data, requires large computational resources, makes decisions that resist analysis, excels at finding latent structure in data, obscures the link between source data and outcomes, defies many human intuitions, and is readily fooled by a knowledgeable adversary.
—Maciej Ceglowski, 2019
source:
► https://idlewords.com/talks/senate_testimony.2019.5.htm
https://tildes.net/~tech
<-------------------------------------------------------------------------->
Technology
Computer Sciences
October 16, 2015
System that replaces human intuition with algorithms outperforms human teams
by Larry Hardesty, Massachusetts Institute of Technology
http://phys.org/news/2015-10-human-intuition-algorithms-outperforms-teams.html
http://groups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/DSAA_DSM_2015.pdf
Big-data analysis consists of searching for buried patterns that have some kind of predictive power. But choosing which "features" of the data to analyze usually requires some human intuition.
"What we observed from our experience solving a number of data science problems for industry is that one of the very critical steps is called feature engineering," Veeramachaneni says. "The first thing you have to do is identify what variables to extract from the database or compose, and for that, you have to come up with a lot of ideas."
source:
► http://phys.org/news/2015-10-human-intuition-algorithms-outperforms-teams.html
<-------------------------------------------------------------------------->
October 16, 2015
System that replaces human intuition with algorithms outperforms human teams
by Larry Hardesty, Massachusetts Institute of Technology
[Image: System that replaces human intuition with algorithms outperforms human teams]
Big-data analysis consists of searching for buried patterns that have some kind of predictive power. But choosing which "features" of the data to analyze usually requires some human intuition. In a database containing, say, the beginning and end dates of various sales promotions and weekly profits, the crucial data may not be the dates themselves but the spans between them, or not the total profits but the averages across those spans.
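As a small illustration of the derived features suggested above (the spans between dates, and averages across those spans), the following pandas sketch computes them from a hypothetical promotions table; the column names and numbers are invented for the example.
import pandas as pd

promos = pd.DataFrame({
    "start": pd.to_datetime(["2015-01-05", "2015-03-02"]),
    "end":   pd.to_datetime(["2015-01-19", "2015-03-30"]),
    "total_profit": [42_000, 96_000],
})

# The crucial features may not be the dates or totals themselves,
# but the span of each promotion and the average weekly profit across it.
promos["span_weeks"] = (promos["end"] - promos["start"]).dt.days / 7
promos["avg_weekly_profit"] = promos["total_profit"] / promos["span_weeks"]
print(promos[["span_weeks", "avg_weekly_profit"]])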
MIT researchers aim to take the human element out of big-data analysis, with a new system that not only searches for patterns but designs the feature set, too. To test the first prototype of their system, they enrolled it in three data science competitions, in which it competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating in the three competitions, the researchers' "Data Science Machine" finished ahead of 615.
In two of the three competitions, the predictions made by the Data Science Machine were 94 percent and 96 percent as accurate as the winning submissions. In the third, the figure was a more modest 87 percent. But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.
"We view the Data Science Machine as a natural complement to human intelligence," says Max Kanter, whose MIT master's thesis in computer science is the basis of the Data Science Machine. "There's so much data out there to be analyzed. And right now it's just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving."
Between the lines
Kanter and his thesis advisor, Kalyan Veeramachaneni, a research scientist at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), describe the Data Science Machine in a paper that Kanter will present next week at the IEEE International Conference on Data Science and Advanced Analytics.
Veeramachaneni co-leads the Anyscale Learning for All group at CSAIL, which applies machine-learning techniques to practical problems in big-data analysis, such as determining the power-generation capacity of wind-farm sites or predicting which students are at risk for dropping out of online courses.
"What we observed from our experience solving a number of data science problems for industry is that one of the very critical steps is called feature engineering," Veeramachaneni says. "The first thing you have to do is identify what variables to extract from the database or compose, and for that, you have to come up with a lot of ideas."
In predicting dropout, for instance, two crucial indicators proved to be how long before a deadline a student begins working on a problem set and how much time the student spends on the course website relative to his or her classmates. MIT's online-learning platform MITx doesn't record either of those statistics, but it does collect data from which they can be inferred.
Featured composition
Kanter and Veeramachaneni use a couple of tricks to manufacture candidate features for data analyses. One is to exploit structural relationships inherent in database design. Databases typically store different types of data in different tables, indicating the correlations between them using numerical identifiers. The Data Science Machine tracks these correlations, using them as a cue to feature construction.
For instance, one table might list retail items and their costs; another might list items included in individual customers' purchases. The Data Science Machine would begin by importing costs from the first table into the second. Then, taking its cue from the association of several different items in the second table with the same purchase number, it would execute a suite of operations to generate candidate features: total cost per order, average cost per order, minimum cost per order, and so on. As numerical identifiers proliferate across tables, the Data Science Machine layers operations on top of each other, finding minima of averages, averages of sums, and so on.
It also looks for so-called categorical data, which appear to be restricted to a limited range of values, such as days of the week or brand names. It then generates further feature candidates by dividing up existing features across categories.
Once it's produced an array of candidates, it reduces their number by identifying those whose values seem to be correlated. Then it starts testing its reduced set of features on sample data, recombining them in different ways to optimize the accuracy of the predictions they yield.
"The Data Science Machine is one of those unbelievable projects where applying cutting-edge research to solve practical problems opens an entirely new way of looking at the problem," says Margo Seltzer, a professor of computer science at Harvard University who was not involved in the work. "I think what they've done is going to become the standard quickly—very quickly."
source:
► https://www.textise.net/showText.aspx?strURL=https://phys.org/news/2015-10-human-intuition-algorithms-outperforms-teams.html
<---------------------------------------------------------------------------->