Original Paper Link : https://www.kaggle.com/whitepaper-prompt-engineering
A quick summary of the Prompt Engineering white paper put together by Google 🙌
My takeaway after reading: with LLMs & prompting, much of it still has to be learned empirically, through hands-on experience 😇
Prompt Engineering
- an LLM takes sequential text as input and then predicts what the following token should be
- next-token prediction is based on the relationships among the previous tokens and what the LLM has seen during its training
- Prompt Engineering is the process of designing high-quality prompts that guide LLMs to produce accurate outputs
- tinkering to find the best prompt
- optimizing prompt length
- evaluating prompt writing style and structure
LLM Output Configuration
- Output restriction & Sampling controls
- limiting output length just causes the LLM to stop predicting more tokens once the limit is reached
- LLMs predict probabilities for what the next token could be, with each token in the LLM's vocab getting a probability
- so temperature, top-K, and top-P are the most common sampling settings to configure
- Temperature : controls the degree of randomness in token selection
- Top-K and Top-P
- top-k : sampling selects the K most likely tokens from the model's predicted distribution; a higher top-k means more creative and varied output. ~similar effect to temperature
- top-k=1 is equivalent to greedy decoding
- top-p : (nucleus sampling) selects from the smallest set of tokens whose cumulative probability does not exceed P
💡LLM Decoding Strategies
- Greedy decoding
- Beam search
- Top-k & Top-p
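To make the three knobs concrete, here is a toy numpy sketch (illustrative only, not tied to any real model API) of how temperature, top-K, and top-P reshape a next-token distribution before sampling:

```python
# Toy sketch of temperature / top-K / top-P (nucleus) sampling over a
# next-token distribution. Pure numpy, not any particular LLM's API.
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    # Temperature rescales the logits: <1 sharpens, >1 flattens.
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    if top_k is not None:
        # Keep only the K most likely tokens; top_k=1 is greedy decoding.
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()

    if top_p is not None:
        # Nucleus sampling: keep the smallest set of tokens whose
        # cumulative probability reaches top_p, zero out the rest.
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask / mask.sum()

    return int(np.random.choice(len(probs), p=probs))

# Token 2 is most likely; a lower temperature makes it dominate even more.
print(sample_next_token([1.0, 2.0, 4.0, 0.5], temperature=0.5, top_k=3, top_p=0.9))
```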
System, Contextual and Role Prompting
- System Prompting : sets the overall context and purpose for the language model. ➡️ Providing an additional task to the system (e.g. return a specific structure, like JSON), keeping output safe and non-toxic.
- Instructs the 'big picture' of what the model should be doing
- Defines the model's fundamental capabilities and overarching purpose
- system prompts can be useful for generating output that meets specific requirements
- ex. translating a language, classifying a review, etc.
- Contextual Prompting : provides specific details or background information relevant to the current conversation or task
- provides immediate, task-specific information to guide the response
- highly specific to the current task or input, which is dynamic
- Role Prompting : assigns a specific character or identity for the language model to adopt
- frames the model's output style and voice. (layer of specificity and personality)
- helps the model generate more relevant and informative output
- once the model has been assigned a role, you can then give it prompts that are specific to that role
- it gives a blueprint of the tone, style and focused expertise you're looking for to improve the quality, relevance, and effectiveness of your output
- styles : confrontational, descriptive, direct, formal, humorous, influential, informal, inspirational, persuasive
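A small illustrative sketch of how the three layers can be composed into one request; the prompt strings and the call_llm helper are invented for this example:

```python
# Illustrative composition of system + role + contextual prompts.
def call_llm(prompt: str) -> str: ...  # hypothetical: plug in your model API

system_prompt = ("You are a helpful travel assistant. Always return your "
                 "answer as valid JSON with keys 'place' and 'reason'.")
role_prompt = "Act as an enthusiastic local tour guide."        # style / voice
contextual_prompt = ("Context: the user is visiting Amsterdam for one day "
                     "and is mostly interested in museums.")    # task-specific, dynamic
question = "Suggest one place to visit."

response = call_llm("\n".join([system_prompt, role_prompt,
                               contextual_prompt, question]))
# e.g. {"place": "Rijksmuseum", "reason": "..."}
```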
Step-back Prompting
- prompting the LLM to first consider a general question related to the specific task at hand, and feeding the answer to that general question into a subsequent prompt for the specific task
- encourage LLMs to think critically and apply their knowledge in new and creative ways
# Comparison
# Goal : write a storyline for a level of a first-person shooter video game.
1. Write a one paragraph storyline for a new level of a first-person shooter video game that is challenging and engaging.
2. Based on popular first-person shooter action games, what are 5 fictional key settings that contribute to a challenging and engaging level storyline in a first-person shooter video game?
✅ As in prompt 2, first have the model pick the key settings ➡️ then feed them back in as a new prompt to get a more concrete output (sketched below)
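A minimal sketch of that two-step chain, assuming a hypothetical call_llm helper in place of a real model API:

```python
# Step-back pattern as two chained calls; prompts mirror the comparison above.
def call_llm(prompt: str) -> str: ...  # hypothetical: plug in your model API

# Step 1: the general ("stepped back") question.
settings = call_llm(
    "Based on popular first-person shooter action games, what are 5 fictional "
    "key settings that contribute to a challenging and engaging level storyline?"
)

# Step 2: feed the general answer back in as context for the specific task.
storyline = call_llm(
    f"Context: {settings}\n"
    "Take one of the settings and write a one-paragraph storyline for a new "
    "level of a first-person shooter video game that is challenging and engaging."
)
```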
Chain of Thought (CoT)
- technique for improving the reasoning capabilities of LLMs by generating intermediate reasoning steps
- CoT appears to improve robustness when moving between different LLM versions
- Disadvantages
- takes more output tokens
- more money & take longer
- CoT Prompting can be very powerful when combined with single-shot or few-shot examples
- Use cases
- code generation
- creating synthetic data
- With this kind of prompt, the model generates intermediate reasoning steps the way a human would when solving a problem
- CoT works with simple greedy decoding
- For CoT prompting, set the temperature to 0 (see the sketch below)
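A minimal sketch of CoT with greedy decoding, again with a hypothetical call_llm helper:

```python
# Minimal CoT sketch: append a reasoning trigger and decode greedily.
def call_llm(prompt: str, temperature: float = 0.0) -> str: ...  # hypothetical

question = ("When I was 3 years old, my partner was 3 times my age. "
            "Now I am 20 years old. How old is my partner?")
answer = call_llm(question + "\nLet's think step by step.", temperature=0.0)
# The trigger elicits intermediate steps (partner was 9 when I was 3,
# a 6-year gap, so 26 now) instead of a bare, often wrong, final number.
```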
Self-Consistency
- self-consistency combines sampling and majority voting to generate diverse reasoning paths and select the most consistent answer
- Steps
- generating diverse reasoning paths : the LLM is given the same prompt multiple times. A high temperature setting encourages the model to generate different reasoning paths and perspectives on the problem ➡️ first raise the temperature to produce diverse reasoning paths, then run a kind of majority vote over them to pick the final answer! (the majority voting is done in code afterwards, not in the prompt)
- extract the answer from each generated response
- choose the most common answer
- Use case
- email classification system (which is important or not important)
- use the LLM to produce multiple reasoning paths, but do the majority voting in code (not in the LLM), as in the sketch below
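A sketch of that flow, with hypothetical call_llm and extract_answer helpers (extract_answer would be task-specific, e.g. pulling IMPORTANT / NOT IMPORTANT out of the reply):

```python
# Self-consistency sketch: sample several reasoning paths at a high
# temperature, then majority-vote over the extracted answers in code.
from collections import Counter

def call_llm(prompt: str, temperature: float = 0.9) -> str: ...  # hypothetical
def extract_answer(response: str) -> str: ...                    # task-specific

def self_consistent_answer(prompt: str, n_samples: int = 5) -> str:
    answers = [extract_answer(call_llm(prompt, temperature=0.9))
               for _ in range(n_samples)]
    # The vote happens here, in plain code, not inside the LLM.
    return Counter(answers).most_common(1)[0][0]
```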
ToT (Tree of Thoughts)
- ToT generalizes the concept of CoT prompting because it allows the LLM to explore multiple different reasoning paths simultaneously
- ToT well-suited for complex tasks that require exploration
- It works by maintaining a tree of thoughts
- each thought represents a coherent language sequence that serves as an intermediate step toward solving a problem
- the model can explore different reasoning paths by branching out from different nodes (see the sketch below)
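A highly simplified sketch of the idea as a beam search over thought chains; propose_thoughts and score_thought stand in for LLM-backed helpers and are invented here:

```python
# Tree-of-Thoughts sketch: breadth-first expansion of partial thought
# chains, scored and pruned at every depth.
def propose_thoughts(problem: str, path: str, n: int) -> list[str]: ...  # hypothetical
def score_thought(problem: str, path: str) -> float: ...                 # hypothetical

def tree_of_thoughts(problem: str, depth=3, branching=3, beam=2) -> str:
    frontier = [""]  # each entry is a partial chain of intermediate thoughts
    for _ in range(depth):
        candidates = [path + "\n" + thought
                      for path in frontier
                      for thought in propose_thoughts(problem, path, branching)]
        # Keep only the most promising branches; the rest are pruned.
        candidates.sort(key=lambda p: score_thought(problem, p), reverse=True)
        frontier = candidates[:beam]
    return frontier[0]
```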
ReAct (Reason & Act)
- (Agent Prompt) : Enable LLMs to solve complex tasks using NL reasoning combined with external tools (search, code interpreter etc.)
- ReAct prompting mimics how humans operate in the world
- works by combining reasoning and acting into a thought-action loop
- the LLM uses the observation to update its reasoning and generate a new plan of action; this process continues until the LLM reaches a solution to the problem
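A bare-bones sketch of that thought-action loop; all three helpers are hypothetical (frameworks such as LangChain implement this pattern for real):

```python
# ReAct loop: the model alternates Thought/Action lines, the harness runs
# the action with an external tool and feeds the Observation back in.
def call_llm(prompt: str) -> str: ...          # hypothetical model API
def parse_action(step: str) -> str | None: ... # e.g. extracts Search["..."]
def run_tool(action: str) -> str: ...          # e.g. a web search

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(transcript + "Thought:")  # reason, then pick an action
        transcript += "Thought:" + step + "\n"
        action = parse_action(step)
        if action is None:                        # model gave a final answer
            return step
        transcript += f"Observation: {run_tool(action)}\n"
    return transcript
```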
Automatic Prompt Engineering
- Method
- write the prompt which will generate the output variants
- evaluate all of the instruction candidates by scoring them with a chosen metric (e.g. BLEU or ROUGE)
- select the instruction candidate with the highest evaluation score. This candidate will be the final prompt you can use in your software application or chatbot. You can also tweak the selected prompt and evaluate again
ex. we have a band merchandise t-shirt webshop, and to train a chatbot we need various ways to order : "One Metallica t-shirt size S". Generate 10 variants that keep the same meaning.
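A sketch of those three steps, assuming a hypothetical call_llm helper and using NLTK's sentence_bleu as the scoring metric:

```python
# APE sketch: generate variants with the model, score each against the
# seed phrasing with BLEU, and keep the best one.
from nltk.translate.bleu_score import sentence_bleu

def call_llm(prompt: str) -> str: ...  # hypothetical: plug in your model API

seed = "One Metallica t-shirt size S"
candidates = call_llm(
    "We have a band merchandise t-shirt webshop. To train a chatbot we need "
    f'various ways to order: "{seed}". Generate 10 variants with the same meaning.'
).splitlines()

reference = [seed.lower().split()]
scored = [(sentence_bleu(reference, c.lower().split()), c) for c in candidates]
best_score, best_variant = max(scored)  # highest-scoring candidate wins
```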
Extras
- Provide (one or few shot) examples within a prompt
- giving the model a reference point or target to aim for, improving the accuracy, style, and tone of its response to better match your expectations
- prompts should be concise, clear, and easy to understand for both you and the model
- Describe the actions with the verbs below
Act, Analyze, Categorize, Classify, Contrast, Compare, Create, Describe, Define, Evaluate, Extract, Find, Generate, Identify, List, Measure, Organize, Parse, Pick, Predict, Provide, Rank, Recommend, Return, Retrieve, Rewrite, Select, Show, Sort, Summarize, Translate, Write.
- Be specific about the desired output
- concise instruction might not guide the LLM enough or could be too generic
- Use instructions over constraints
- instructions : desired format, style or content of the response (guides the model on what the model should do or produce)
- constraint : limits what the model should not do or should avoid
- to prevent the model from generating harmful or biased content or when a strict output format or style is needed
- If possible, use positive instructions : instead of telling the model what not to do, tell it what to do instead
Control the max token length
: Set a max token limit in the model configuration, or explicitly request a specific length in the prompt
JSON
- structured nature of JSON ➡️ requires significantly more tokens than plain text
- problematic when the generation is abruptly cut off due to token limits
- ex. missing crucial closing braces or brackets
- working with schemas (like json), can give the LLM a clear blueprint of the data it should expect, helping it focus its attention on the relevant information and reducing the risk of misinterpreting the input
- schemas can help establish relationships between different pieces of data and even make the LLM "time-aware" by including date or timestamp fields with specific formats
{
  "type": "object",
  "properties": {
    "name": { "type": "string", "description": "Product name" },
    "category": { "type": "string", "description": "Product category" },
    "price": { "type": "number", "format": "float", "description": "Product price" },
    "features": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Key features of the product"
    },
    "release_date": { "type": "string", "format": "date", "description": "Date the product was released" }
  }
}
💡 The json-repair library can be used to clean up model outputs that fail to come back as valid JSON
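A minimal example, assuming the json-repair package is installed (pip install json-repair):

```python
# Salvage almost-valid JSON, e.g. a response that was cut off before the
# closing brackets due to a token limit.
from json_repair import repair_json

broken = '{"name": "Pixel", "features": ["camera", "battery"'
print(repair_json(broken))  # e.g. '{"name": "Pixel", "features": ["camera", "battery"]}'
```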