|Institutions||Weizmann Institute of Science|
|Thesis||Algorithmic Program Debugging (1982)|
|Doctoral advisor||Dana Angluin|
|Doctoral students||Aviv Regev, Yaakov Benenson, Tom Ran, Tuval Ben Yehezkel|
Ehud Shapiro (Hebrew: אהוד שפירא; born 1955) is a multi-disciplinary scientist, artist, entrepreneur and a Professor of Computer Science and Biology at the Weizmann Institute of Science. With international reputation, he made fundamental contributions to many scientific disciplines. Ehud was also an Internet pioneer, a successful Internet entrepreneur, and a pioneer and proponent of E-democracy. Ehud is the founder of the Ba Rock Band and conceived its original artistic program. He is a winner of two ERC (European Research Council) Advanced Grants.
Education and Professional background
Born in Jerusalem in 1955, the guiding light for Ehud Shapiro's scientific endeavors was the philosophy of science of Karl Popper, with which he became acquainted through a high-school project supervised by Moshe Kroy from the Department of Philosophy, Tel Aviv University. In 1979 Shaprio completed his undergraduate studies in Tel Aviv University in Mathematics and Philosophy with distinction. Shapiro's PhD work with Dana Angluin in Computer Science at Yale university attempted to provide an algorithmic interpretation to Popper's philosophical approach to scientific discovery, resulting in both a computer system for the inference of logical theories from facts and a methodology for program debugging, developed using the programming language Prolog. His thesis, "Algorithmic Program Debugging", was published by MIT Press as a 1982 ACM Distinguished Dissertation, followed in 1986 by "The Art of Prolog", a textbook co-authored with Leon Sterling.
Coming to the Department of Computer Science and Applied Mathematics at the Weizmann Institute of Science in 1982 as a post-doctoral fellow, Shapiro was inspired by the Japanese Fifth Generation Computer Systems project to invent a high-level programming language for parallel and distributed computer systems, named Concurrent Prolog. A two-volume book on Concurrent Prolog and related work was published by MIT Press in 1987. Shapiro's work had a decisive influence on the strategic direction of the Japanese national project, and he cooperated closely with the project throughout its 10-years duration.
In 1993, Shapiro took leave of absence from his tenured position at Weizmann to found Ubique Ltd. (and serve as its CEO), an Israeli Internet software pioneer. Building on Concurrent Prolog, Ubique developed "Virtual Places", a precursor to today's broadly-used Instant Messaging systems. Ubique was sold to America Online in 1995, and following a management buy out in 1997 was sold again to IBM in 1998, where it continues to develop SameTime, IBM's leading Instant Messaging product based on Ubique's technology.
Preparing for return to academia, Shapiro ventured into self-study of molecular biology. Shapiro attempted to build a computer from biological molecules, guided by a vision of "A Doctor in a Cell": A biomolecular computer that operates inside the living body, programmed with medical knowledge to diagnose diseases and produce the requisite drugs. Lacking experience in molecular biology, Shapiro realized his first design for a molecular computer as a LEGO-like mechanical device built using 3D stereolithography, which was patented upon his return to Weizmann in 1998. During the last decade and a half, Shapiro’s lab have designed and successfully implemented various molecular computing devices.
In 2004, Prof. Shapiro also designed an effective method of synthesizing error-free DNA molecules from error-prone building blocks. In 2011, Prof. Shapiro founded the CADMAD consortium: The CADMAD technological platform aims to deliver a revolution in DNA processing analogous to the revolution text editing underwent with the introduction of electronic text editors.
In 2005, Prof. Shapiro presented a vision of the next grand challenge in Human biology: To uncover the Human cell lineage tree. Inside all of us is a cell lineage tree – the history of how our body grows from a single cell (the fertilized egg) to 100 trillion cells. The biological and biomedical impact of such a success could be of a magnitude similar, if not larger than that of the Human Genome Project. In his TEDxTel-Aviv talk "Uncovering The Human Cell Lineage Tree – The next grand scientific challenge" Prof. Shapiro described the system and results obtained with it so far, and a proposal for a FET Flagship project "Human Cell Lineage Flagship initiative" for uncovering the Human cell lineage tree in health and disease.
Inductive Logic Programming
The philosopher of science Karl Popper suggested that all scientific theories are by nature conjectures and inherently fallible, and that refutation to old theory is the paramount process of scientific discovery. According to Popper’s Philosophy the Growth of Scientific Knowledge is based upon Conjectures and Refutations. Prof. Shapiro’s doctoral studies with Prof. Dana Angluin attempted to provide an algorithmic interpretation to Karl Popper's approach to scientific discovery – in particular for automating the "Conjectures and Refutations" method – making bold conjectures and then performing experiments that seek to refute them. Prof. Shapiro generalized this into the "Contradiction Backtracing Algorithm" – an algorithm for backtracking contradictions. This algorithm is applicable whenever a contradiction occurs between some conjectured theory and the facts. By testing a finite number of ground atoms for their truth in the model the algorithm can trace back a source for this contradiction, namely a false hypothesis, and can demonstrate its falsity by constructing a counterexample to it. The "Contradiction Backtracing Algorithm" is relevant both to the philosophical discussion on the refutability of scientific theories and in the aid for the debugging of logic programs. Prof. Shapiro laid the theoretical foundation for inductive logic programming and built its first implementation (Model Inference System): a Prolog program that inductively inferred logic programs from positive and negative examples. Inductive logic programming has nowadays bloomed as a subfield of artificial intelligence and machine learning which uses logic programming as a uniform representation for examples, background knowledge and hypotheses. Recent work in this area, combining logic programming, learning and probability, has given rise to the new field of statistical relational learning.
Algorithmic program debugging
Program debugging is an unavoidable part of software development. Until the 1980s the craft of program debugging, practiced by every programmer, was without any theoretical foundation. In the early 1980s, systematic and principled approaches to program debugging were developed. In general, a bug occurs when a programmer has a specific intention regarding what the program should do, yet the program actually written exhibits a different behavior than intended in a particular case. One way of organizing the debugging process is to automate it (at least partially) via an algorithmic debugging technique. The idea of algorithmic debugging is to have a tool that guides the programmer along the debugging process interactively: It does so by asking the programmer about possible bug sources. Algorithmic debugging was first developed by Ehud Shapiro during his PhD research at Yale University, as introduced in his PhD thesis, selected as a 1982 ACM Distinguished Dissertation. Shapiro implemented the method of algorithmic debugging in Prolog (a general purpose logic programming language) for the debugging of logic programs. In case of logic programs, the intended behavior of the program is a model (a set of simple true statements) and bugs are manifested as program incompleteness (inability to prove a true statement) or incorrectness (ability to prove a false statement). The algorithm would identify a false statement in the program and provide a counter-example to it or a missing true statement that it or its generalization should be added to the program. A method to handle non-termination was also developed.
The Fifth Generation Computer Systems project
The Fifth Generation Computer Systems project (FGCS) was an initiative by Japan's Ministry of International Trade and Industry, begun in 1982, to create a computer using massively parallel computing/processing. It was to be the result of a massive government/industry research project in Japan during the 1980s. It aimed to create an "epoch-making computer" with-supercomputer-like performance and to provide a platform for future developments in artificial intelligence. In 1982, during a visit to the ICOT, Ehud Shapiro invented Concurrent Prolog, a novel concurrent programming language that integrated logic programming and concurrent programming. Concurrent Prolog is a logic programming language designed for concurrent programming and parallel execution. It is a process oriented language, which embodies dataflow synchronization and guarded-command indeterminacy as its basic control mechanisms. Shapiro described the language in a Report marked as ICOT Technical Report 003, which presented a Concurrent Prolog interpreter written in Prolog. Shapiro's work on Concurrent Prolog inspired a change in the direction of the FGCS from focusing on parallel implementation of Prolog to the focus on concurrent logic programming as the software foundation for the project. It also inspired the concurrent logic programming language Guarded Horn Clauses (GHC) by Ueda, which was the basis of KL1, the programming language that was finally designed and implemented by the FGCS project as its core programming language.
In 1993, Prof. Shapiro took a leave of absence from the Weizmann Institute to found and serve as CEO of Ubique Ltd., an Israeli Internet software pioneer. Ubique was a software company that developed instant messaging and collaboration products. The company's first product, Virtual Places 1.0, integrated in one product instant messaging, voice-over-IP and browser-based social networking on top of Unix-based workstations. These ideas and technologies – integrated into one product – were novel and revolutionary and perhaps ahead of their time. Ubique, was sold to America Online in 1995, bought back by its management in 1997, and sold again to IBM in 1998.
Molecular programming languages
In the beginning of the 21st century, scientific progress has successfully managed to consolidate knowledge of the 'sequence' and 'structure' branches of molecular cell biology in an accessible manner. For example, the DNA-as-string abstraction captured the primary sequence of nucleotides without including higher and lower-order biochemical properties. This abstraction allows the application of a battery of string algorithms, as well as enabling the practical development of databases and common repositories.
As molecular circuits are the information processing devices of cells and organisms, they have been the subject of research of biologists for many decades. Prior to the advent of computational biology tools, biologists were unable to have access to large amounts of data and their analyses. The mountains of knowledge about the function, activity and interaction of molecular systems in cells remained fragmented. Moreover, these past studies that have identified and connected a few components or interactions one at a time, required decades of serial work.
In a seminal paper published in 2002 in Nature magazine "Cellular abstractions: Cells as computation" Prof. Shapiro raised the question: Why can’t the study of biomolecular systems make a similar computational leap? Both sequence and structure research have adopted good abstractions: ‘DNA-as-string’ and ‘protein-as-three-dimensional-labelled-graph’, respectively. He believed that computer science could provide the much-needed abstraction for biomolecular systems. Together with his Ph.D. student Aviv Regev he used advanced computer science concepts to investigate the ‘molecule-as-computation’ abstraction, in which a system of interacting molecular entities is described and modelled by a system of interacting computational entities. He developed Abstract computer languages for the specification and study of systems of interacting computations, in order to represent biomolecular systems, including regulatory, metabolic and signalling pathways, as well as multicellular processes such as immune responses. These "molecular programming languages" enabled simulation of the behavior of biomolecular systems, as well as development of knowledge bases supporting qualitative and quantitative reasoning on these systems’ properties.
The groundbreaking work (that initially used the π-calculus, a process calculus) was later taken over by IBM Cambridge in the UK (Luca Cardelli) that developed SPiM (Stochastic Pi Calculus Machine). In the last decade the field has flourished with a vast variety of applications. More recently, the field even evolved to a synthesis of two different fields – molecular computing and molecular programming. The combination of the two exhibits how different mathematical formalisms (such as Chemical Reaction Networks) can serve as 'programming languages' and various molecular architectures (such as DNA molecules architecture) can in principle implement any behavior that can be mathematically expressed by the formalism being used.
Doctor in a cell
By combining computer science and molecular biology, researchers have been able to work on a programmable biological computer that in the future may navigate within the human body, diagnosing diseases and administering treatments. This is what Professor Ehud Shapiro from the Weizmann Institute termed a "Doctor in a cell".
His group designed a tiny computer made entirely of biological molecules which was successfully programmed – in a test tube – to identify molecular changes in the body that indicate the presence of certain cancers. The computer was then able to diagnose the specific type of cancer, and to react by producing a drug molecule that interfered with the cancer cells’ activities, causing them to self-destruct. For this work was a member of the 2004 "Scientific American 50" as Research Leader in Nanotechnology.
In 2009, Shapiro and PhD student Tom Ran presented the prototype of an autonomous programmable molecular system, based on the manipulation of DNA strands, which is capable of performing simple logical deductions. This prototype is the first simple programming language implemented on molecular-scale. Introduced into the body, this system has immense potential to accurately target specific cell types and administer the appropriate treatment, as it can perform millions of calculations at the same time and 'think' logically.
Prof Shapiro’s team aims to make these computers perform highly complex actions and answer complicated questions, following a logical model first proposed by Aristotle over 2000 years ago. The biomolecular computers are extremely small: three trillion computers can fit into a single drop of water. If the computers were given the rule 'All men are mortal and the fact 'Socrates is a man', they would answer 'Socrates is mortal'. Multiple rules and facts were tested by the team and the biomolecular computers answered them correctly each time.
The team has also found a way to make these microscopic computing devices 'user-friendly' by creating a compiler – a program for bridging between a high-level computer programming language and DNA computing code. They sought to develop a hybrid in silico/in vitro system that supports the creation and execution of molecular logic programs in a similar way to electronic computers, enabling anyone who knows how to operate an electronic computer, with absolutely no background in molecular biology, to operate a biomolecular computer.
In 2012, Prof. Ehud Shapiro and Dr. Tom Ran have succeeded in creating a genetic device that operates independently in bacterial cells. The device has been programmed to identify certain parameters and mount an appropriate response. The device searches for transcription factors – proteins that control the expression of genes in the cell. A malfunction of these molecules can disrupt gene expression. In cancer cells, for example, the transcription factors regulating cell growth and division do not function properly, leading to increased cell division and the formation of a tumor. The device, composed of a DNA sequence inserted into a bacterium, performs a "roll call" of transcription factors. If the results match pre-programmed parameters, it responds by creating a protein that emits a green light – supplying a visible sign of a "positive" diagnosis. In follow-up research, the scientists plan to replace the light-emitting protein with one that will affect the cell's fate, for example, a protein that can cause the cell to commit suicide. In this manner, the device will cause only "positively" diagnosed cells to self-destruct. Following the success of the study in bacterial cells, the researchers are planning to test ways of recruiting such bacteria as an efficient system to be conveniently inserted into the human body for medical purposes (which shouldn't be problematic given our natural microbiome; recent research reveals there are already 10 times more bacterial cells in the human body than human cells, that share our body space in a symbiotic fashion). Yet another research goal is to operate a similar system inside human cells, which are much more complex than bacteria.
Prof. Shapiro designed an effective method of synthesizing error-free DNA molecules from error-prone building blocks. DNA programming is the DNA-counterpart of computer programming. The basic computer programming cycle is to modify an existing program, test the modified program, and iterate until the desired behavior is obtained. Similarly, the DNA programming cycle is to modify a DNA molecule, test its resulting behavior, and iterate until the goal (which is either understanding the behavior or improving it) is achieved. One key difference between the two is that unlike computer programming, our understanding of DNA as programming language is very far from being perfect, and therefore trial and error are the norm rather than the exception in DNA-based research and development. Hence DNA programming is more efficient if multiple variants of a DNA program, also called a DNA library, are created and tested in parallel, rather than creating and testing just one program at a time. Hence the basic DNA programming cycle, when operating in full steam, takes the best DNA programs from the previous cycle, uses them as a basis for creating a new set of DNA programs, tests them, and iterates until the goal is achieved.
Furthermore, Polymerase Chain Reaction (PCR) is the DNA-equivalent of Gutenberg's movable type printing, both allowing large-scale replication of a piece of text. De novo DNA synthesis is the DNA-equivalent of mechanical typesetting; both ease the setting of text for replication. What is the DNA-equivalent of the word processor? Word processing was rapidly adopted as a replacement for the typewriter when users had discovered its revolutionary advantages in document creation, editing, formatting and saving. While the electronic representation of text in computers allows the processing of text within a simple unified framework, DNA processing – the creation of variations and combinations of existing DNA – is performed by biology labs daily using a plethora of unrelated manual labor-intensive methods. As a result, so far no universal method for DNA processing has been proposed and, consequently, no engineering discipline that further utilizes the processed DNA has emerged. Prof. Shapiro founded the CADMAD consortium: The CADMAD technological platform aims to deliver a revolution in DNA processing analogous to the revolution text editing underwent with the introduction of electronic text editors. The biotechnology revolution has, to a large extent, been held back by its notoriously prolonged R&D cycle compared to the computer programming cycle. A CAD/CAM technology for DNA which will bring word processor ease to DNA processing and thus support rapid DNA programming will revolutionize biotechnology by shortening the R&D cycle of DNA-based applications. This can only be accomplished by concerting the development of complex, multi-layered technologies which integrate expertise from fields as varied as algorithmics, software engineering, biotechnology, robotics and chemistry. These are only now starting to emerge as feasible.
Human cell lineage tree
In 2005, Prof. Shapiro presented a vision of the next grand challenge in Human biology: To uncover the Human cell lineage Tree. Inside all of us is a cell lineage tree – the history of how our body grows from a single cell (the fertilized egg) to 100 trillion cells. The biological and biomedical impact of such a success could be of a magnitude similar, if not larger than that of the Human Genome Project.
Every human being starts as a single cell – the fusion of an egg and a sperm – and progresses via cell division and cell death through development, birth, growth, and aging. Human health depends on maintaining a proper process of cell division, renewal and death, and humanity's most severe diseases, notably cancer, auto-immune diseases, diabetes, neuro-degenerative and cardiovascular disorders, and the multitude of inherited rare diseases are all the result of specific aberrations in this process.
The history of a person's cells, from conception until any particular moment in time, can be captured by a mathematical entity called a cell lineage tree. The root of the tree represents the fertilized egg, the leaves of the tree represent the person's extant cells, and branches in the tree capture every single cell division in the person's history.
Science knows precisely the cell lineage tree of only one organism – a worm called Caenorhabditis elegans that reaches its full size of 1 millimeter and 1,000 cells in 36 hours. By comparison, a newborn mouse, weighing only a few grams, has about 1 billion cells. An average person has about 100 trillion cells. Understanding the structure and dynamics of the human cell lineage tree in development, growth, renewal, aging, and disease is a central and urgent quest of biology and medicine. The challenge of uncovering the Human Cell Lineage Tree is reminiscent, both in nature and in scope, to the challenge faced by the Human Genome Project at its inception and, in fact, its results will decisively contribute to the functional translation and ultimate understanding of the genome sequence. A technological leap of a magnitude similar to the one that occurred during the Human Genome Project is required for the success of the human cell lineage project, and the biological and biomedical impact of such a success could be of a magnitude similar, if not larger than that of the Human Genome Project.
Central open problems in biology and medicine are in effect questions about the human cell lineage tree: its structure and its dynamics in development, growth, renewal, aging, and disease. Consequently, knowing the Human Cell Lineage Tree would resolve these problems and entail a leapfrog advance in human knowledge and health.
Many central questions in biology and medicine that are actually specific questions about the Human cell lineage tree, in health and disease:
- Which cancer cells initiate relapse after chemotherapy?
- Which cancer cells can metastasize?
- Do insulin-producing beta cells renew in healthy adults?
- Do eggs renew in adult females?
- Which cells renew in healthy and in unhealthy adult brain?
Knowing the Human cell lineage tree would answer all these questions and more. Fortunately, our cell lineage tree is implicitly encoded in our cells’ genomes via mutations that accumulate when body cells divide. Theoretically, it could be reconstructed with high precision by sequencing every cell in our body, at a prohibitive cost. Practically, analyzing only highly-mutable fragments of the genome is sufficient for cell lineage reconstruction. Shapiro's lab has developed a proof-of-concept multidisciplinary method and system for cell lineage analysis from somatic mutations.
In his TEDxTel-Aviv talk "Uncovering The Human Cell Lineage Tree – The next grand scientific challenge" Prof. Shapiro described the system and results obtained with it so far, and a proposal for a FET Flagship project "Human Cell Lineage Flagship initiative" for uncovering the Human cell lineage tree in health and disease.
Ehud initiated in 2012 and led the "open party" (later "open community") project within the Public Knowledge Workshop, which aimed to provide foundations for the operation of an e-party espousing direct democracy via the internet. He further extended his concepts of e-democracy in his Davos 2016 WEF lecture and Financial Times Opinion article.
- Ehud Shapiro's talk in TEDxTel-Aviv: Uncovering The Human Cell Lineage Tree – The next grand scientific challenge
- Ehud Shapiro at the Mathematics Genealogy Project
- http://www.wisdom.weizmann.ac.il/~udi/ Ehud Shapiro at the Weizmann Institute
- https://www.youtube.com/watch?v=GgS9myPsGUw From biomolecular computing to internet democracy | Ehud Shapiro at Davos World Economic Forum
- http://www.ft.com/intl/cms/s/0/bf4644e6-ef75-11e5-9f20-c3a047354386.html#axzz44IIDymk6 Financial Times
- Shapiro, Ehud Y. (1983). Algorithmic program debugging. Cambridge, Mass: MIT Press. ISBN 0-262-19218-7.
- Shapiro, Ehud Y.; Sterling, Leon (1994). The Art of Prolog: advanced programming techniques. Cambridge, Mass: MIT Press. ISBN 0-262-69163-9.
- "Ehud Shapiro: Uncovering The Human Cell Lineage Tree". tedxtelaviv.com. Archived from the original on 2014-04-07.
- "The Human cell Lineage Flagship Initiative". lineage-flagship.eu.
- Popper, Karl (2004). Conjectures and refutations : the growth of scientific knowledge (Reprinted. ed.). London: Routledge. ISBN 0-415-28594-1.
- Silva, Josep. "A survey on algorithmic debugging strategies." Advances in Engineering Software 42.11 (2011): 976-991/
- Zeller, Andreas. Why programs fail: a guide to systematic debugging. Elsevier, 2009./
- Shapiro, Ehud Y. (1983). Algorithmic program debugging. Cambridge, Mass: MIT Press. ISBN 0-262-19218-7
- Clocksin, William F., Christopher S. Mellish, and W. F. Clocksin. Programming in PROLOG. Vol. 4. Berlin etc.: Springer, 1987.
- Shapiro E. A subset of Concurrent Prolog and its interpreter, ICOT Technical Report TR-003, Institute for New Generation Computer Technology, Tokyo, 1983. Also in Concurrent Prolog: Collected Papers, E. Shapiro (ed.), MIT Press, 1987, Chapter 2.
- Regev, Aviv, and Ehud Shapiro. "Cellular abstractions: Cells as computation." Nature 419.6905 (2002): 343-343.
- "Archived copy". Archived from the original on 2014-01-08. Retrieved 2014-05-04.CS1 maint: archived copy as title (link)
- Chen, Yuan-Jyue, Neil Dalchau, Niranjan Srinivas, Andrew Phillips, Luca Cardelli, David Soloveichik, and Georg Seelig. "Programmable chemical controllers made from DNA." Nature nanotechnology 8, no. 10 (2013): 755-762
- "The 2004 Scientific American 50 Award: Research Leaders". Scientific American. 2004-11-11. Retrieved 2007-03-26.
- Tom Ran, Shai Kaplan & Ehud Shapiro, (2009), Molecular implementation of simple logic programs, Nature Nanotechnology, August, 2009.
- Tom Ran, Yehonatan Douek, Lilach Milo, Ehud Shapiro. A programmable NOR-based device for transcription profile analysis. Scientific Reports, 2012.
- Linshiz, G., Yehezkel, T. B., Kaplan, S., Gronau, I., Ravid, S., Adar, R., & Shapiro, E. (2008). Recursive construction of perfect DNA molecules from imperfect oligonucleotides. Molecular Systems Biology, 4(1).