Original Paper Link : https://www.kaggle.com/whitepaper-prompt-engineering
A quick summary of the Prompt Engineering white paper put together by Google 🙌
My takeaway after reading: with LLMs & prompting, much of it still has to be learned empirically, through hands-on experience 😇
Prompt Engineering
- an LLM takes sequential text as input and then predicts what the following token should be
- next-token prediction is based on the relationships among the previous tokens and what the LLM has seen during its training
- Prompt Engineering is the process of designing high-quality prompts that guide LLMs to produce accurate outputs
- tinkering to find the best prompt
- optimizing prompt length
- evaluating prompt writing style and structure
LLM Output Configuration
- Output restriction & Sampling controls
- limiting output length just causes the LLM to stop predicting more tokens once the limit is reached
- LLMs predict probabilities for what the next token could be, with each token in the LLM's vocab getting a probability
- so temperature, top-K, and top-P are the most common sampling settings to configure
- Temperature : controls the degree of randomness in token selection
- Top-K and Top-P
- top-k : sampling selects the K most likely tokens from the model's predicted distribution; a higher top-k means more creative and varied output. ~similar effect to temperature
- top-k=1 is equivalent to greedy decoding
- top-p : (nucleus sampling) selects from the smallest set of tokens whose cumulative probability does not exceed P
💡LLM Decoding Strategies
- Greedy decoding
- Beam search
- Top-k & Top-p
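To make the three knobs concrete, here is a toy numpy sketch (illustrative only, not tied to any real model API) of how temperature, top-K, and top-P reshape a next-token distribution before sampling:

```python
# Toy sketch of temperature / top-K / top-P (nucleus) sampling over a
# next-token distribution. Pure numpy, not any particular LLM's API.
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    # Temperature rescales the logits: <1 sharpens, >1 flattens.
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    if top_k is not None:
        # Keep only the K most likely tokens; top_k=1 is greedy decoding.
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()

    if top_p is not None:
        # Nucleus sampling: keep the smallest set of tokens whose
        # cumulative probability reaches top_p, zero out the rest.
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask / mask.sum()

    return int(np.random.choice(len(probs), p=probs))

# Token 2 is most likely; a lower temperature makes it dominate even more.
print(sample_next_token([1.0, 2.0, 4.0, 0.5], temperature=0.5, top_k=3, top_p=0.9))
```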
System, Contextual and Role Prompting
- System Prompting : sets the overall context and purpose for the language model. ➡️ Providing an additional task to the system (e.g. return a specific structure, like JSON), keeping output safe and non-toxic.
- Instructs the 'big picture' of what the model should be doing
- Defines the model's fundamental capabilities and overarching purpose
- system prompts can be useful for generating output that meets specific requirements
- ex. translating a language, classifying a review, etc.
- Contextual Prompting : provides specific details or background information relevant to the current conversation or task
- provides immediate, task-specific information to guide the response
- highly specific to the current task or input, which is dynamic
- Role Prompting : assigns a specific character or identity for the language model to adopt
- frames the model's output style and voice. (layer of specificity and personality)
- helps the model generate more relevant and informative output
- once the model has been assigned a role, you can then give it prompts that are specific to that role
- it gives a blueprint of the tone, style and focused expertise you're looking for to improve the quality, relevance, and effectiveness of your output
- styles : confrontational, descriptive, direct, formal, humorous, influential, informal, inspirational, persuasive
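A small illustrative sketch of how the three layers can be composed into one request; the prompt strings and the call_llm helper are invented for this example:

```python
# Illustrative composition of system + role + contextual prompts.
def call_llm(prompt: str) -> str: ...  # hypothetical: plug in your model API

system_prompt = ("You are a helpful travel assistant. Always return your "
                 "answer as valid JSON with keys 'place' and 'reason'.")
role_prompt = "Act as an enthusiastic local tour guide."        # style / voice
contextual_prompt = ("Context: the user is visiting Amsterdam for one day "
                     "and is mostly interested in museums.")    # task-specific, dynamic
question = "Suggest one place to visit."

response = call_llm("\n".join([system_prompt, role_prompt,
                               contextual_prompt, question]))
# e.g. {"place": "Rijksmuseum", "reason": "..."}
```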
Step-back Prompting
- prompting the LLM to first consider a general question related to the specific task at hand, and feeding the answer to that general question into a subsequent prompt for the specific task
- encourage LLMs to think critically and apply their knowledge in new and creative ways
# Comparison
# Goal : write a storyline for a level of a first-person shooter video game.
1. Write a one paragraph storyline for a new level of a first-person shooter video game that is challenging and engaging.
2. Based on popular first-person shooter action games, what are 5 fictional key settings that contribute to a challenging and engaging level storyline in a first-person shooter video game?
✅ As in prompt 2, first have the model pick the key settings ➡️ then feed them back in as a new prompt to get a more concrete output (sketched below)
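A minimal sketch of that two-step chain, assuming a hypothetical call_llm helper in place of a real model API:

```python
# Step-back pattern as two chained calls; prompts mirror the comparison above.
def call_llm(prompt: str) -> str: ...  # hypothetical: plug in your model API

# Step 1: the general ("stepped back") question.
settings = call_llm(
    "Based on popular first-person shooter action games, what are 5 fictional "
    "key settings that contribute to a challenging and engaging level storyline?"
)

# Step 2: feed the general answer back in as context for the specific task.
storyline = call_llm(
    f"Context: {settings}\n"
    "Take one of the settings and write a one-paragraph storyline for a new "
    "level of a first-person shooter video game that is challenging and engaging."
)
```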
Chain of Thought (CoT)
- technique for improving the reasoning capabilities of LLMs by generating intermediate reasoning steps
- CoT appears to improve robustness when moving between different LLM versions
- Disadvantages
- takes more output tokens
- more money & take longer
- CoT Prompting can be very powerful when combined with single-shot or few-shot examples
- Use cases
- code generation
- creating synthetic data
- With this kind of prompt, the model generates intermediate reasoning steps the way a human would when solving a problem
- CoT works with simple greedy decoding
- For CoT prompting, set the temperature to 0 (see the sketch below)
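A minimal sketch of CoT with greedy decoding, again with a hypothetical call_llm helper:

```python
# Minimal CoT sketch: append a reasoning trigger and decode greedily.
def call_llm(prompt: str, temperature: float = 0.0) -> str: ...  # hypothetical

question = ("When I was 3 years old, my partner was 3 times my age. "
            "Now I am 20 years old. How old is my partner?")
answer = call_llm(question + "\nLet's think step by step.", temperature=0.0)
# The trigger elicits intermediate steps (partner was 9 when I was 3,
# a 6-year gap, so 26 now) instead of a bare, often wrong, final number.
```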
Self-Consistency
- self-consistency combines sampling and majority voting to generate diverse reasoning paths and select the most consistent answer
- Steps
- generating diverse reasoning paths : the LLM is given the same prompt multiple times. A high temperature setting encourages the model to generate different reasoning paths and perspectives on the problem ➡️ first raise the temperature to produce diverse reasoning paths, then run a kind of majority vote over them to pick the final answer! (the majority voting is done in code afterwards, not in the prompt)
- extract the answer from each generated response
- choose the most common answer
- Use case
- email classification system (which is important or not important)
- use the LLM to produce multiple reasoning paths, but do the majority voting in code (not in the LLM), as in the sketch below
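A sketch of that flow, with hypothetical call_llm and extract_answer helpers (extract_answer would be task-specific, e.g. pulling IMPORTANT / NOT IMPORTANT out of the reply):

```python
# Self-consistency sketch: sample several reasoning paths at a high
# temperature, then majority-vote over the extracted answers in code.
from collections import Counter

def call_llm(prompt: str, temperature: float = 0.9) -> str: ...  # hypothetical
def extract_answer(response: str) -> str: ...                    # task-specific

def self_consistent_answer(prompt: str, n_samples: int = 5) -> str:
    answers = [extract_answer(call_llm(prompt, temperature=0.9))
               for _ in range(n_samples)]
    # The vote happens here, in plain code, not inside the LLM.
    return Counter(answers).most_common(1)[0][0]
```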
ToT (Tree of Thoughts)
- ToT generalizes the concept of CoT prompting because it allows the LLM to explore multiple different reasoning paths simultaneously
- ToT well-suited for complex tasks that require exploration
- It works by maintaining a tree of thoughts
- each thought represents a coherent language sequence that serves as an intermediate step toward solving a problem
- the model can explore different reasoning paths by branching out from different nodes (see the sketch below)
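A highly simplified sketch of the idea as a beam search over thought chains; propose_thoughts and score_thought stand in for LLM-backed helpers and are invented here:

```python
# Tree-of-Thoughts sketch: breadth-first expansion of partial thought
# chains, scored and pruned at every depth.
def propose_thoughts(problem: str, path: str, n: int) -> list[str]: ...  # hypothetical
def score_thought(problem: str, path: str) -> float: ...                 # hypothetical

def tree_of_thoughts(problem: str, depth=3, branching=3, beam=2) -> str:
    frontier = [""]  # each entry is a partial chain of intermediate thoughts
    for _ in range(depth):
        candidates = [path + "\n" + thought
                      for path in frontier
                      for thought in propose_thoughts(problem, path, branching)]
        # Keep only the most promising branches; the rest are pruned.
        candidates.sort(key=lambda p: score_thought(problem, p), reverse=True)
        frontier = candidates[:beam]
    return frontier[0]
```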
ReAct (Reason & Act)
- (Agent Prompt) : Enable LLMs to solve complex tasks using NL reasoning combined with external tools (search, code interpreter etc.)
- ReAct prompting mimics how humans operate in the world
- works by combining reasoning and acting into a thought-action loop
- the LLM uses the observation to update its reasoning and generate a new plan of action; this process continues until the LLM reaches a solution to the problem
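A bare-bones sketch of that thought-action loop; all three helpers are hypothetical (frameworks such as LangChain implement this pattern for real):

```python
# ReAct loop: the model alternates Thought/Action lines, the harness runs
# the action with an external tool and feeds the Observation back in.
def call_llm(prompt: str) -> str: ...          # hypothetical model API
def parse_action(step: str) -> str | None: ... # e.g. extracts Search["..."]
def run_tool(action: str) -> str: ...          # e.g. a web search

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(transcript + "Thought:")  # reason, then pick an action
        transcript += "Thought:" + step + "\n"
        action = parse_action(step)
        if action is None:                        # model gave a final answer
            return step
        transcript += f"Observation: {run_tool(action)}\n"
    return transcript
```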
Automatic Prompt Engineering
- Method
- write the prompt which will generate the output variants
- evaluate all of the instruction candidates by scoring them with a chosen metric (e.g. BLEU or ROUGE)
- select the instruction candidate with the highest evaluation score. This candidate will be the final prompt you can use in your software application or chatbot. You can also tweak the selected prompt and evaluate again
ex. we have a band merchandise t-shirt webshop, and to train a chatbot we need various ways to order : "One Metallica t-shirt size S". Generate 10 variants that keep the same meaning.
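A sketch of those three steps, assuming a hypothetical call_llm helper and using NLTK's sentence_bleu as the scoring metric:

```python
# APE sketch: generate variants with the model, score each against the
# seed phrasing with BLEU, and keep the best one.
from nltk.translate.bleu_score import sentence_bleu

def call_llm(prompt: str) -> str: ...  # hypothetical: plug in your model API

seed = "One Metallica t-shirt size S"
candidates = call_llm(
    "We have a band merchandise t-shirt webshop. To train a chatbot we need "
    f'various ways to order: "{seed}". Generate 10 variants with the same meaning.'
).splitlines()

reference = [seed.lower().split()]
scored = [(sentence_bleu(reference, c.lower().split()), c) for c in candidates]
best_score, best_variant = max(scored)  # highest-scoring candidate wins
```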
Extras
- Provide (one or few shot) examples within a prompt
- giving the model a reference point or target to aim for, improving the accuracy, style, and tone of its response to better match your expectations
- prompts should be concise, clear, and easy to understand for both you and the model
- Describe the actions with the verbs below
Act, Analyze, Categorize, Classify, Contrast, Compare, Create, Describe, Define, Evaluate, Extract, Find, Generate, Identify, List, Measure, Organize, Parse, Pick, Predict, Provide, Rank, Recommend, Return, Retrieve, Rewrite, Select, Show, Sort, Summarize, Translate, Write.
- Be specific about the desired output
- concise instruction might not guide the LLM enough or could be too generic
- Use instructions over constraints
- instructions : desired format, style or content of the response (guides the model on what the model should do or produce)
- constraint : limits what the model should not do or should avoid
- to prevent the model from generating harmful or biased content or when a strict output format or style is needed
- If possible, use positive instructions : instead of telling the model what not to do, tell it what to do instead
Control the max token length
: Set a max token limit in the model configuration, or explicitly request a specific length in the prompt
JSON
- structured nature of JSON ➡️ requires significantly more tokens than plain text
- problematic when the generation is abruptly cut off due to token limits
- ex. missing crucial closing braces or brackets
- working with schemas (like json), can give the LLM a clear blueprint of the data it should expect, helping it focus its attention on the relevant information and reducing the risk of misinterpreting the input
- schemas can help establish relationships between different pieces of data and even make the LLM "time-aware" by including date or timestamp fields with specific formats
{
  "type": "object",
  "properties": {
    "name": { "type": "string", "description": "Product name" },
    "category": { "type": "string", "description": "Product category" },
    "price": { "type": "number", "format": "float", "description": "Product price" },
    "features": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Key features of the product"
    },
    "release_date": { "type": "string", "format": "date", "description": "Date the product was released" }
  }
}
💡 The json-repair library can be used to clean up model outputs that fail to come back as valid JSON
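A minimal example, assuming the json-repair package is installed (pip install json-repair):

```python
# Salvage almost-valid JSON, e.g. a response that was cut off before the
# closing brackets due to a token limit.
from json_repair import repair_json

broken = '{"name": "Pixel", "features": ["camera", "battery"'
print(repair_json(broken))  # e.g. '{"name": "Pixel", "features": ["camera", "battery"]}'
```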