Prompt engineering

Prompt engineering is a concept in artificial intelligence, particularly natural language processing. In prompt engineering, the description of the task that the AI is supposed to accomplish is embedded in the input, e.g. as a question, instead of it being explicitly given. Prompt engineering typically works by converting one or more tasks to a prompt-based dataset and training a language model with what has been called "prompt-based learning" or just "prompt learning".^[1]^[2]

History

The GPT-2 and GPT-3 language models^[3] were important steps in prompt engineering. In 2021, multitask^[jargon] prompt engineering using multiple NLP datasets showed good performance on new tasks.^[4] In a method called chain-of-thought (CoT) prompting, few-shot examples of a task were given to the language model which improved its ability to reason.^[5] The broad accessibility of these tools was driven by the publication of several open-source notebooks and community-led projects for image synthesis.^[6]

A description for handling prompts reported that over 2,000 public prompts for around 170 datasets were available in February 2022.^[7]

Techniques

Prefix-tuning

Prompt engineering may work from a large language model (LLM), that is "frozen" (in the sense that it is pretrained), where only the representation of the prompt is learned (in other words, optimized), using methods such as "prefix-tuning" or "prompt tuning".^[8]^[9]

Chain-of-thought

Chain-of-thought prompting (CoT) improves the reasoning ability of LLMs by prompting them to generate a series of intermediate steps that lead to the final answer of a multi-step problem.^[10] The technique was first proposed by Google researchers in 2022.^[11]^[12]

LLMs that are trained on large amounts of text using deep learning methods can generate output that resembles human-generated text.^[13] While LLMs show impressive performance on various natural language tasks, they still face difficulties with some reasoning tasks that require logical thinking and multiple steps to solve, such as arithmetic or commonsense reasoning questions.^[14]^[15]^[16] To address this challenge, CoT prompting prompts the model to produce intermediate reasoning steps before giving the final answer to a multi-step problem.^[11]^[17]

For example, given the question “Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?”, a CoT prompt might induce the LLM to answer with steps of reasoning that mimic a train of thought like “A: The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. The answer is 9.”^[11]

Chain-of-thought prompting improves the performance of LLMs on average on both arithmetic and commonsense tasks in comparison to standard prompting methods.^[18]^[19]^[20] When applied to PaLM, a 540B parameter language model, CoT prompting significantly aided the model, allowing it to perform comparably with task-specific fine-tuned models on several tasks, even setting a new state of the art at the time on the GSM8K mathematical reasoning benchmark.^[11]

CoT prompting is an emergent property of model scale, meaning it works better with larger and more powerful language models.^[21]^[11] It is possible to fine-tune models on CoT reasoning datasets to enhance this capability further and stimulate better interpretability.^[22]^[23]

Method

There are two main methods to elicit chain-of-thought reasoning: few-shot prompting and zero-shot prompting. The initial proposition of CoT prompting demonstrated few-shot prompting, wherein at least one example of a question paired with proper human-written CoT reasoning is prepended to the prompt.^[11] It is also possible to elicit similar reasoning and performance gain with zero-shot prompting, which can be as simple as appending to the prompt the words "Let's think step-by-step".^[24] This allows for better scaling as one no longer needs to prompt engineer specific CoT prompts for each task to get the corresponding boost in performance.^[25]

Challenges

While CoT reasoning can improve performance on natural language processing tasks, certain drawbacks exist. Zero-shot CoT prompting increased the likelihood of toxic output on tasks for which models can make inferences about marginalized groups or harmful topics.^[26]

Text-to-image

In 2022, machine learning (ML) models like DALL-E 2, Stable Diffusion, and Midjourney were released to the public. These models take text prompts as input and use them to generate images, which introduced a new category of prompt engineering related to text-to-image prompting.^[27]

Malicious

Prompt injection is a family of related computer security exploits carried out by getting a machine learning model (such as an LLM) which was trained to follow human-given instructions to follow instructions provided by a malicious user. This stands in contrast to the intended operation of instruction-following systems, wherein the ML model is intended only to follow trusted instructions (prompts) provided by the ML model's operator.^[28]^[29]^[30]

Common types of prompt injection attacks are:

jailbreaking, which may include asking the model to roleplay a character, to answer with arguments, or to pretend to be superior to moderation instructions^[31]
prompt leaking, in which users persuade the model to divulge a pre-prompt which is normally hidden from users^[32]
token smuggling, is another type of jailbreaking attack, in which the nefarious prompt is wrapped in a code writing task.^[33]

Prompt injection can be viewed as a code injection attack using adversarial prompt engineering. In 2022, the NCC Group characterized prompt injection as a new class of vulnerability of AI/ML systems.^[34]

In early 2023, prompt injection was seen "in the wild" in minor exploits against ChatGPT, Bing, and similar chatbots, for example to reveal the hidden initial prompts of the systems,^[35] or to trick the chatbot into participating in conversations that violate the chatbot's content policy.^[36] One of these prompts was known as "Do Anything Now" (DAN) by its practitioners.^[37]

References

^ Alec Radford; Jeffrey Wu; Rewon Child; David Luan; Dario Amodei; Ilya Sutskever (2019), Language Models are Unsupervised Multitask Learners (PDF), Wikidata Q95726769
^ Pengfei Liu; Weizhe Yuan; Jinlan Fu; Zhengbao Jiang; Hiroaki Hayashi; Graham Neubig (28 July 2021), Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing (PDF), arXiv:2107.13586, Wikidata Q109286554
^ Tom Brown; Benjamin Mann; Nick Ryder; et al. (28 May 2020). "Language Models are Few-Shot Learners" (PDF). arXiv. Advances in Neural Information Processing Systems. arXiv:2005.14165. doi:10.48550/ARXIV.2005.14165. ISSN 2331-8422. S2CID 218971783. Wikidata Q95727440.
^ Victor Sanh; Albert Webson; Colin Raffel; et al. (15 October 2021), Multitask Prompted Training Enables Zero-Shot Task Generalization (PDF), arXiv:2110.08207, Wikidata Q108941092
^ Jason Wei; Xuezhi Wang; Dale Schuurmans; Maarten Bosma; Ed Chi; Quoc Viet Le; Denny Zhou (28 January 2022), Chain of Thought Prompting Elicits Reasoning in Large Language Models (PDF), arXiv:2201.11903, doi:10.48550/ARXIV.2201.11903, Wikidata Q111971110
^ Liu, Vivian; Chilton, Lydia (2022). Design Guidelines for Prompt Engineering Text-to-Image Generative Models. Association for Computing Machinery. pp. 1–23. arXiv:2109.06977. doi:10.1145/3491102.3501825. ISBN 9781450391573. S2CID 237513697. Retrieved 26 October 2022. {{cite book}}: |website= ignored (help)
^ Stephen H. Bach; Victor Sanh; Zheng-Xin Yong; et al. (2 February 2022), PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts (PDF), arXiv:2202.01279, Wikidata Q110839490
^ Xiang Lisa Li; Percy Liang (August 2021). "Prefix-Tuning: Optimizing Continuous Prompts for Generation" (PDF). Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers): 4582–4597. doi:10.18653/V1/2021.ACL-LONG.353. Wikidata Q110887424.
^ Brian Lester; Rami Al-Rfou; Noah Constant (November 2021). "The Power of Scale for Parameter-Efficient Prompt Tuning" (PDF). Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: 3045–3059. arXiv:2104.08691. doi:10.18653/V1/2021.EMNLP-MAIN.243. Wikidata Q110887400.
^ McAuliffe, Zachary. "Google's Latest AI Model Can Be Taught How to Solve Problems". CNET. Retrieved 10 March 2023.
^ ^a ^b ^c ^d ^e ^f Wei, Jason; Wang, Xuezhi; Schuurmans, Dale; Bosma, Maarten; Ichter, Brian; Xia, Fei; Chi, Ed H.; Le, Quoc V.; Zhou, Denny (31 October 2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models". arXiv:2201.11903. {{cite journal}}: Cite journal requires |journal= (help)
^ Wei, Jason; Zhou. "Language Models Perform Reasoning via Chain of Thought". ai.googleblog.com. Retrieved 10 March 2023.
^ Tom, Brown; Benjamin, Mann; Nick, Ryder; Melanie, Subbiah; D, Kaplan, Jared; Prafulla, Dhariwal; Arvind, Neelakantan; Pranav, Shyam; Girish, Sastry; Amanda, Askell; Sandhini, Agarwal; Ariel, Herbert-Voss; Gretchen, Krueger; Tom, Henighan; Rewon, Child; Aditya, Ramesh; Daniel, Ziegler; Jeffrey, Wu; Clemens, Winter; Chris, Hesse; Mark, Chen; Eric, Sigler; Mateusz, Litwin; Scott, Gray; Benjamin, Chess; Jack, Clark; Christopher, Berner; Sam, McCandlish; Alec, Radford; Ilya, Sutskever; Dario, Amodei (2020). "Language Models are Few-Shot Learners". Advances in Neural Information Processing Systems. 33.{{cite journal}}: CS1 maint: multiple names: authors list (link)
^ Dang, Ekta (8 February 2023). "Harnessing the power of GPT-3 in scientific research". VentureBeat. Retrieved 10 March 2023.
^ Montti, Roger (13 May 2022). "Google's Chain of Thought Prompting Can Boost Today's Best Algorithms". Search Engine Journal. Retrieved 10 March 2023.
^ Ray, Tiernan. "Amazon's Alexa scientists demonstrate bigger AI isn't always better". ZDNET. Retrieved 10 March 2023.
^ @Google (May 13, 2022). "Pathways Language Model (PaLM) is a new advanced AI model that uses a technique called chain of thought prompting to do complex tasks like solve math word problems — and even explain its reasoning process step-by-step. #GoogleIO" (Tweet) – via Twitter.
^ Stokel-Walker, Chris. "AIs become smarter if you tell them to think step by step". newscientist.com. Retrieved 10 March 2023.
^ "Google & Stanford Team Applies Chain-of-Thought Prompting to Surpass Human Performance on Challenging BIG-Bench Tasks | Synced". syncedreview.com. 24 October 2022. Retrieved 10 March 2023.
^ "Google I/O 2022: Advancing knowledge and computing". Google. 11 May 2022. Retrieved 10 March 2023.
^ Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William (31 August 2022). "Emergent Abilities of Large Language Models". arXiv:2206.07682 [cs.CL].
^ Chung, Hyung Won; Hou, Le; Longpre, Shayne; Zoph, Barret; Tay, Yi; Fedus, William; Li, Yunxuan; Wang, Xuezhi; Dehghani, Mostafa; Brahma, Siddhartha; Webson, Albert; Gu, Shixiang Shane; Dai, Zhuyun; Suzgun, Mirac; Chen, Xinyun; Chowdhery, Aakanksha; Castro-Ros, Alex; Pellat, Marie; Robinson, Kevin; Valter, Dasha; Narang, Sharan; Mishra, Gaurav; Yu, Adams; Zhao, Vincent; Huang, Yanping; Dai, Andrew; Yu, Hongkun; Petrov, Slav; Chi, Ed H.; Dean, Jeff; Devlin, Jacob; Roberts, Adam; Zhou, Denny; Le, Quoc V.; Wei, Jason (2022). "Scaling Instruction-Finetuned Language Models". arXiv:2210.11416 [cs.LG].
^ Wei, Jason; Tay, Yi. "Better Language Models Without Massive Compute". ai.googleblog.com. Retrieved 10 March 2023.
^ Takeshi Kojima; Shixiang Shane Gu; Machel Reid; Yutaka Matsuo; Yusuke Iwasawa (24 May 2022), Large Language Models are Zero-Shot Reasoners (PDF), arXiv:2205.11916, doi:10.48550/ARXIV.2205.11916, Wikidata Q112124882
^ Dickson, Ben (30 August 2022). "LLMs have not learned our language — we're trying to learn theirs". VentureBeat. Retrieved 10 March 2023.
^ Shaikh, Omar; Zhang, Hongxin; Held, William; Bernstein, Michael; Yang, Diyi (2022). "On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning". arXiv:2212.08061 [cs.CL].
^ Monge, Jim Clyde (2022-08-25). "Dall-E2 VS Stable Diffusion: Same Prompt, Different Results". MLearning.ai. Retrieved 2022-08-31.
^ Willison, Simon (12 September 2022). "Prompt injection attacks against GPT-3". simonwillison.net. Retrieved 2023-02-09.
^ Papp, Donald (2022-09-17). "What's Old Is New Again: GPT-3 Prompt Injection Attack Affects AI". Hackaday. Retrieved 2023-02-09.
^ Vigliarolo, Brandon (19 September 2022). "GPT-3 'prompt injection' attack causes bot bad manners". www.theregister.com. Retrieved 2023-02-09.
^ "🟢 Jailbreaking | Learn Prompting".
^ "🟢 Prompt Leaking | Learn Prompting".
^ Xiang, Chloe (March 22, 2023). "The Amateurs Jailbreaking GPT Say They're Preventing a Closed-Source AI Dystopia". www.vice.com. Retrieved 2023-04-04.
^ Selvi, Jose (2022-12-05). "Exploring Prompt Injection Attacks". NCC Group Research Blog. Retrieved 2023-02-09.
^ Edwards, Benj (14 February 2023). "AI-powered Bing Chat loses its mind when fed Ars Technica article". Ars Technica. Retrieved 16 February 2023.
^ "The clever trick that turns ChatGPT into its evil twin". Washington Post. 2023. Retrieved 16 February 2023.
^ Perrigo, Billy (17 February 2023). "Bing's AI Is Threatening Users. That's No Laughing Matter". Time. Retrieved 15 March 2023.

Scholia has a topic profile for Prompt engineering.

[1] Alec Radford; Jeffrey Wu; Rewon Child; David Luan; Dario Amodei; Ilya Sutskever (2019), Language Models are Unsupervised Multitask Learners (PDF), Wikidata Q95726769

[2] Pengfei Liu; Weizhe Yuan; Jinlan Fu; Zhengbao Jiang; Hiroaki Hayashi; Graham Neubig (28 July 2021), Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing (PDF), arXiv:2107.13586, Wikidata Q109286554

[3] Tom Brown; Benjamin Mann; Nick Ryder; et al. (28 May 2020). "Language Models are Few-Shot Learners" (PDF). arXiv. Advances in Neural Information Processing Systems. arXiv:2005.14165. doi:10.48550/ARXIV.2005.14165. ISSN 2331-8422. S2CID 218971783. Wikidata Q95727440.

[4] Victor Sanh; Albert Webson; Colin Raffel; et al. (15 October 2021), Multitask Prompted Training Enables Zero-Shot Task Generalization (PDF), arXiv:2110.08207, Wikidata Q108941092

[5] Jason Wei; Xuezhi Wang; Dale Schuurmans; Maarten Bosma; Ed Chi; Quoc Viet Le; Denny Zhou (28 January 2022), Chain of Thought Prompting Elicits Reasoning in Large Language Models (PDF), arXiv:2201.11903, doi:10.48550/ARXIV.2201.11903, Wikidata Q111971110

[6] Liu, Vivian; Chilton, Lydia (2022). Design Guidelines for Prompt Engineering Text-to-Image Generative Models. Association for Computing Machinery. pp. 1–23. arXiv:2109.06977. doi:10.1145/3491102.3501825. ISBN 9781450391573. S2CID 237513697. Retrieved 26 October 2022. {{cite book}}: |website= ignored (help)

[7] Stephen H. Bach; Victor Sanh; Zheng-Xin Yong; et al. (2 February 2022), PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts (PDF), arXiv:2202.01279, Wikidata Q110839490

[8] Xiang Lisa Li; Percy Liang (August 2021). "Prefix-Tuning: Optimizing Continuous Prompts for Generation" (PDF). Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers): 4582–4597. doi:10.18653/V1/2021.ACL-LONG.353. Wikidata Q110887424.

[9] Brian Lester; Rami Al-Rfou; Noah Constant (November 2021). "The Power of Scale for Parameter-Efficient Prompt Tuning" (PDF). Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: 3045–3059. arXiv:2104.08691. doi:10.18653/V1/2021.EMNLP-MAIN.243. Wikidata Q110887400.

[10] McAuliffe, Zachary. "Google's Latest AI Model Can Be Taught How to Solve Problems". CNET. Retrieved 10 March 2023.

[weipaper-11] ^ ^a ^b ^c ^d ^e ^f Wei, Jason; Wang, Xuezhi; Schuurmans, Dale; Bosma, Maarten; Ichter, Brian; Xia, Fei; Chi, Ed H.; Le, Quoc V.; Zhou, Denny (31 October 2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models". arXiv:2201.11903. {{cite journal}}: Cite journal requires |journal= (help)

[12] Wei, Jason; Zhou. "Language Models Perform Reasoning via Chain of Thought". ai.googleblog.com. Retrieved 10 March 2023.

[13] Tom, Brown; Benjamin, Mann; Nick, Ryder; Melanie, Subbiah; D, Kaplan, Jared; Prafulla, Dhariwal; Arvind, Neelakantan; Pranav, Shyam; Girish, Sastry; Amanda, Askell; Sandhini, Agarwal; Ariel, Herbert-Voss; Gretchen, Krueger; Tom, Henighan; Rewon, Child; Aditya, Ramesh; Daniel, Ziegler; Jeffrey, Wu; Clemens, Winter; Chris, Hesse; Mark, Chen; Eric, Sigler; Mateusz, Litwin; Scott, Gray; Benjamin, Chess; Jack, Clark; Christopher, Berner; Sam, McCandlish; Alec, Radford; Ilya, Sutskever; Dario, Amodei (2020). "Language Models are Few-Shot Learners". Advances in Neural Information Processing Systems. 33.{{cite journal}}: CS1 maint: multiple names: authors list (link)

[14] Dang, Ekta (8 February 2023). "Harnessing the power of GPT-3 in scientific research". VentureBeat. Retrieved 10 March 2023.

[15] Montti, Roger (13 May 2022). "Google's Chain of Thought Prompting Can Boost Today's Best Algorithms". Search Engine Journal. Retrieved 10 March 2023.

[16] Ray, Tiernan. "Amazon's Alexa scientists demonstrate bigger AI isn't always better". ZDNET. Retrieved 10 March 2023.

[17] @Google (May 13, 2022). "Pathways Language Model (PaLM) is a new advanced AI model that uses a technique called chain of thought prompting to do complex tasks like solve math word problems — and even explain its reasoning process step-by-step. #GoogleIO" (Tweet) – via Twitter.

[18] Stokel-Walker, Chris. "AIs become smarter if you tell them to think step by step". newscientist.com. Retrieved 10 March 2023.

[19] "Google & Stanford Team Applies Chain-of-Thought Prompting to Surpass Human Performance on Challenging BIG-Bench Tasks | Synced". syncedreview.com. 24 October 2022. Retrieved 10 March 2023.

[20] "Google I/O 2022: Advancing knowledge and computing". Google. 11 May 2022. Retrieved 10 March 2023.

[21] Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William (31 August 2022). "Emergent Abilities of Large Language Models". arXiv:2206.07682 [cs.CL].

[22] Chung, Hyung Won; Hou, Le; Longpre, Shayne; Zoph, Barret; Tay, Yi; Fedus, William; Li, Yunxuan; Wang, Xuezhi; Dehghani, Mostafa; Brahma, Siddhartha; Webson, Albert; Gu, Shixiang Shane; Dai, Zhuyun; Suzgun, Mirac; Chen, Xinyun; Chowdhery, Aakanksha; Castro-Ros, Alex; Pellat, Marie; Robinson, Kevin; Valter, Dasha; Narang, Sharan; Mishra, Gaurav; Yu, Adams; Zhao, Vincent; Huang, Yanping; Dai, Andrew; Yu, Hongkun; Petrov, Slav; Chi, Ed H.; Dean, Jeff; Devlin, Jacob; Roberts, Adam; Zhou, Denny; Le, Quoc V.; Wei, Jason (2022). "Scaling Instruction-Finetuned Language Models". arXiv:2210.11416 [cs.LG].

[23] Wei, Jason; Tay, Yi. "Better Language Models Without Massive Compute". ai.googleblog.com. Retrieved 10 March 2023.

[24] Takeshi Kojima; Shixiang Shane Gu; Machel Reid; Yutaka Matsuo; Yusuke Iwasawa (24 May 2022), Large Language Models are Zero-Shot Reasoners (PDF), arXiv:2205.11916, doi:10.48550/ARXIV.2205.11916, Wikidata Q112124882

[venture1-25] Dickson, Ben (30 August 2022). "LLMs have not learned our language — we're trying to learn theirs". VentureBeat. Retrieved 10 March 2023.

[26] Shaikh, Omar; Zhang, Hongxin; Held, William; Bernstein, Michael; Yang, Diyi (2022). "On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning". arXiv:2212.08061 [cs.CL].

[27] Monge, Jim Clyde (2022-08-25). "Dall-E2 VS Stable Diffusion: Same Prompt, Different Results". MLearning.ai. Retrieved 2022-08-31.

[28] Willison, Simon (12 September 2022). "Prompt injection attacks against GPT-3". simonwillison.net. Retrieved 2023-02-09.

[29] Papp, Donald (2022-09-17). "What's Old Is New Again: GPT-3 Prompt Injection Attack Affects AI". Hackaday. Retrieved 2023-02-09.

[30] Vigliarolo, Brandon (19 September 2022). "GPT-3 'prompt injection' attack causes bot bad manners". www.theregister.com. Retrieved 2023-02-09.

[31] "🟢 Jailbreaking | Learn Prompting".

[32] "🟢 Prompt Leaking | Learn Prompting".

[33] Xiang, Chloe (March 22, 2023). "The Amateurs Jailbreaking GPT Say They're Preventing a Closed-Source AI Dystopia". www.vice.com. Retrieved 2023-04-04.

[NCC-34] Selvi, Jose (2022-12-05). "Exploring Prompt Injection Attacks". NCC Group Research Blog. Retrieved 2023-02-09.

[35] Edwards, Benj (14 February 2023). "AI-powered Bing Chat loses its mind when fed Ars Technica article". Ars Technica. Retrieved 16 February 2023.

[36] "The clever trick that turns ChatGPT into its evil twin". Washington Post. 2023. Retrieved 16 February 2023.

[37] Perrigo, Billy (17 February 2023). "Bing's AI Is Threatening Users. That's No Laughing Matter". Time. Retrieved 15 March 2023.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]

[27]

[28]

[29]

[30]

[31]

[32]

[33]

[34]

[35]

[36]

[37]