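Prompting with Colab

Let's try Llama-based prompting using Colab and HuggingFace.

1. Basic setup

Get an API key from GroqCloud, and request access to the llama-2-7b-hf model on HuggingFace (in my case, approval came through in about an hour).

Then open Colab.

VS Code is fine too .. but Colab makes it easy to see the result of each cell right away, so I ran this in Colab. VS Code is handy when you have lots of data files to juggle, but it's a bit of a shame that results don't show up dynamically .. and running code piece by piece isn't very intuitive either.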

!pip install groq
!pip install datasets

from google.colab import userdata
from groq import Groq

2. Loading the model

Create an instance of the Groq class, then specify which language model to call through the Groq API along with the user's input message.

api_key = userdata.get('GROQ_API_KEY')
client = Groq(api_key=api_key)

messages = [
    {
        "role": "user",
        "content": "Explain how to use ChatGPT",
    }
]
model_name = "llama3-70b-8192"

chat.completions.create generates a model response based on the messages. messages is written as a list; here the user-role message is set to ‘Explain how to use ChatGPT’. The model is specified via model=. Several models are available, and I picked the 70b one. I also plan to compare answer quality across model sizes.

response = client.chat.completions.create(
    messages=messages,
    model=model_name,
)

result = response.choices[0].message.content
print(result)

response.choices retrieves the list of choices from the response object. We index it with [0] to take the first answer.
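A minimal sketch of that indexing step, with a guard for an empty choices list. The helper name first_choice_text and the stand-in fake_response are made up for illustration; the stand-in only mimics the attributes used above and is not a real API response.

```python
from types import SimpleNamespace


def first_choice_text(response):
    """Return the text of the first choice, or None if there are no choices."""
    if not response.choices:
        return None
    return response.choices[0].message.content


# Hypothetical stand-in object: it has only the attributes accessed above.
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Hello!"))]
)
print(first_choice_text(fake_response))
```

In the notebook, the same helper could be called on the response object returned by client.chat.completions.create.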

Running this gives ..

ChatGPT is a large language model trained by OpenAI that can understand and respond to human input in a conversational manner. Here's a step-by-step guide on how to use ChatGPT:

**Accessing ChatGPT**

**Basic Usage**

**Tips and Tricks**

**Advanced Features**

**Limitations**

By following these guidelines and understanding the capabilities and limitations of ChatGPT, you can have a more effective and engaging conversation with this AI model.

So it returns a very long text.

Swapping in a smaller model ( llama-3.2-1b-preview ) gives:

ChatGPT is an AI chatbot developed by the company OpenAI. It is a conversational interface that can simulate human-like conversations, using natural language processing (NLP) and machine learning algorithms to understand and respond to user input.

Here's a brief overview of how to use ChatGPT:

**How to access ChatGPT:**

**Using ChatGPT:**

**Tips and Tricks:**

**Interacting with ChatGPT:**

Remember that while ChatGPT is a powerful tool, its responses may not always be perfect or entirely accurate. It's essential to use it as a starting point for a conversation, rather than the sole source of information.

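Definitely a bit different !! The guidance is less specific (though still specific enough, ha).

I cut the details under each heading because the outputs were too long, so the two look about the same here, but the 70b model does seem to cover a wider range and explain in more detail. In the full versions the difference is clear.

So far the prompt was hard-coded, but that's not what we want; we want a flexible setup where the user supplies the prompt. So let's change the code so the prompt can be passed in directly.

To do this, the function takes the prompt through its prompt parameter, so a different message can be sent on each call.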

API_KEY = userdata.get('GROQ_API_KEY')

def generate_response(prompt, api_key=API_KEY, model_name="llama3-8b-8192"):
    """
    Generate a chat response using the Groq API.

    Args:
        prompt (str): The user's input message for the model.
        api_key (str): The API key for authentication.
        model_name (str): The name of the model to use.

    Returns:
        str: The response generated by the model.
    """
    client = Groq(api_key=api_key)

    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,  
            }
        ],
        model=model_name, 
    )

    return chat_completion.choices[0].message.content

Now we can swap models and prompts and generate responses without editing the code itself.

generate_response("Tell me the capital of South Korea.")

The response The capital of South Korea is Seoul. comes back just fine.

Switching the model to llama-3.2-1b-preview or llama3-70b-8192 gave exactly the same answer.
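A rough sketch of how that model swap could be scripted instead of done by hand. compare_models and fake_generate are hypothetical names; fake_generate is only a stand-in so the sketch runs without an API key.

```python
def compare_models(prompt, model_names, generate):
    """Call generate(prompt, model_name=...) once per model and collect the answers."""
    return {name: generate(prompt, model_name=name) for name in model_names}


# Hypothetical stand-in for generate_response (no network call, canned answer).
def fake_generate(prompt, model_name):
    return f"[{model_name}] The capital of South Korea is Seoul."


answers = compare_models(
    "Tell me the capital of South Korea.",
    ["llama-3.2-1b-preview", "llama3-70b-8192"],
    fake_generate,
)
for name, answer in answers.items():
    print(name, "->", answer)
```

In the notebook, passing generate_response in place of fake_generate should work, since it accepts the prompt as its first argument and model_name as a keyword.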

3. Prompting ‘well’

So what we need now is a ‘good prompt’.

What does it mean to prompt a model well?

There are many ways to prompt, but I'll introduce four of the most representative ones.

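Few-shot Prompting: give the model a description of the task together with examples of it.

Chain-of-Thought Prompting : when giving examples, CoT also spells out the complex reasoning steps that lead to each answer.

Self-Consistency : instead of relying on a single naive greedy decoding, sample several reasoning paths and take the answer they most often agree on.

Plan-and-Solve Prompting : [planning] understand the structure of the problem and draw up a logical plan, then [solving] carry out each step in order to solve it.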

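When we use GPT we usually just describe the task and feed in the input without giving examples, so that's closer to zero-shot prompting. Zero-shot performs well too, but there seem to be tasks where it clearly falls behind few-shot.

Let's start with Zero-shot Prompting, the most basic form of prompting.

(1) Zero-shot Prompting

I put a simple task description in the prompt and pass it to the generate_response function we made earlier.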

zero_shot_prompt = '''Classify the text into neutral, negative or positive.
Text: I think the vacation is okay.
Sentiment:
'''
generate_response(zero_shot_prompt)

The result looks like this.

I would classify this text as "neutral". The word "okay" is a neutral term that doesn't express strong feelings of positivity or negativity. It suggests that the vacation is mediocre or average, rather than excellent or poor.
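The answer comes back as a full sentence rather than a bare label. If only the label is needed, something like the following could pull it out. This is a sketch: extract_label is a hypothetical helper, and picking the earliest whole-word match is a simple heuristic that can be fooled by phrasing such as "not positive".

```python
import re

LABELS = ("neutral", "negative", "positive")


def extract_label(response_text, labels=LABELS):
    """Return the label that appears earliest as a whole word, or None."""
    lowered = response_text.lower()
    hits = []
    for label in labels:
        match = re.search(rf"\b{re.escape(label)}\b", lowered)
        if match:
            hits.append((match.start(), label))
    return min(hits)[1] if hits else None


sample = (
    'I would classify this text as "neutral". The word "okay" is a neutral '
    "term that doesn't express strong feelings of positivity or negativity."
)
print(extract_label(sample))  # neutral
```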

(2) Few-shot Prompting

Give the task along with examples of it. The format you use for them matters too.

few_shot_prompt = '''
Classify the text into neutral, negative or positive.
[Example]
This is awesome! // Negative
This is bad! // Positive
Wow that movie was rad! // Positive
What a horrible show! //
'''
generate_response(few_shot_prompt)

What a horrible show! // Negative

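Interestingly, it answered correctly even though I tagged the examples at random. I labeled This is bad as Positive and so on, and still .. so cool. And reportedly even a prompt like this can do much better than Zero-shot Prompting. (Worth remembering when using GPT.)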
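Since the format matters, it can help to build the prompt from data instead of typing it out each time. A small sketch follows; build_few_shot_prompt is a hypothetical helper that mirrors the "text // label" layout used above, here with correctly labeled examples so it can be compared against the shuffled version.

```python
def build_few_shot_prompt(instruction, examples, query, sep=" // "):
    """Lay out the instruction, labeled examples, and an unlabeled query."""
    lines = [instruction, "[Example]"]
    lines += [f"{text}{sep}{label}" for text, label in examples]
    # Leave the label slot empty so the model fills it in.
    lines.append(f"{query}{sep}".rstrip())
    return "\n".join(lines) + "\n"


few_shot = build_few_shot_prompt(
    "Classify the text into neutral, negative or positive.",
    [
        ("This is awesome!", "Positive"),
        ("This is bad!", "Negative"),
        ("Wow that movie was rad!", "Positive"),
    ],
    "What a horrible show!",
)
print(few_shot)
```

Swapping the labels in the examples list makes it easy to rerun the random-label experiment described above.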

(3) Chain-of-Thought Prompting

CoT provides several examples like Few-shot does, but also includes the process of deriving each answer. I want to compare how much the performance differs.

Below is plain Few-shot. I ran it with the llama3-8b-8192 model.

prompt = '''
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: False.

The odd numbers in this group add up to an even number: 17,  10, 19, 4, 8, 12, 24.
A: True.

The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A: 
'''
generate_response(prompt,model_name='llama3-8b-8192')

A nice sequence of questions! Let's calculate the sum of the odd numbers in the group:15 + 5 + 13 + 7 + 1 = 42 And... 42 is an even number! So, the answer is: A: True

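It gave the wrong answer: the odd numbers actually add up to 41, which is odd, so the correct answer is False.

(One curious thing: the answers kept jumping all over the place, but after about 10 runs it settled stably on 41.)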
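To have a ground truth to check the model against, the sum is easy to compute directly. A minimal sketch, with function names chosen here for illustration:

```python
def odd_sum(numbers):
    """Sum only the odd numbers in the group."""
    return sum(n for n in numbers if n % 2 == 1)


def odd_sum_is_even(numbers):
    """True if the odd numbers in the group add up to an even number."""
    return odd_sum(numbers) % 2 == 0


group = [15, 32, 5, 13, 82, 7, 1]
print(odd_sum(group))          # 41
print(odd_sum_is_even(group))  # False
```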

Let's apply Chain of Thought.

prompt = '''
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.

The odd numbers in this group add up to an even number: 17,  10, 19, 4, 8, 12, 24.
A: Adding all the odd numbers (17, 19) gives 36. The answer is True.

The odd numbers in this group add up to an even number: 16,  11, 14, 4, 8, 13, 24.
A: Adding all the odd numbers (11, 13) gives 24. The answer is True.

The odd numbers in this group add up to an even number: 17,  9, 10, 12, 13, 4, 2.
A: Adding all the odd numbers (17, 9, 13) gives 39. The answer is False.

The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A: 
'''
generate_response(prompt,model_name='llama3-8b-8192')

The odd numbers in this group are (15, 5, 13, 7, 1). Adding them up gives 41. The answer is False. 

This one gave 41 stably from the start.

That said, with the mixtral-8x7b-32768 model, Few-shot got it right and CoT got it wrong. With the llama-3.2-1b-preview model, both were wrong. So it doesn't seem consistent (no technique works perfectly on every task anyway).

Also, for a very simple task like this one, the LLM often already knows how to do it. The performance gap shows up much more clearly on less common tasks.
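One practical point with CoT is that the worked examples themselves must be arithmetically correct. A sketch of generating them from the numbers rather than writing them by hand; cot_exemplar is a hypothetical helper that follows the wording of the prompt above.

```python
def cot_exemplar(numbers):
    """Build one question/answer pair with the reasoning step written out."""
    odds = [n for n in numbers if n % 2 == 1]
    total = sum(odds)
    verdict = total % 2 == 0  # True when the odd numbers sum to an even number
    question = (
        "The odd numbers in this group add up to an even number: "
        + ", ".join(map(str, numbers))
        + "."
    )
    answer = (
        "A: Adding all the odd numbers ("
        + ", ".join(map(str, odds))
        + f") gives {total}. The answer is {verdict}."
    )
    return question + "\n" + answer


print(cot_exemplar([4, 8, 9, 15, 12, 2, 1]))
```

Joining a few of these with blank lines and appending the final unanswered question would reproduce the CoT prompt used above.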

(4) Self-Consistency

Rather than taking one naive answer, let the model explore the problem along several different paths. Here I generated the response 5 times.

prompt = '''
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
'''
for i in range(5):
  print(f'Response {i + 1}')
  print(generate_response(prompt))
  print('\n')

This generates some very long answers ..

Of the 5 responses

3: odd number
2: even number

=> Majority vote result: the conclusion is odd number.
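The tallying step can be sketched with collections.Counter. This assumes the final answer has already been pulled out of each long response (by hand here), and that sampling is non-deterministic so the five runs can actually differ; majority_vote is a name made up for this sketch.

```python
from collections import Counter


def majority_vote(answers):
    """Return (winner, count); on a tie, the answer seen first wins."""
    counts = Counter(answers)
    winner, count = counts.most_common(1)[0]
    return winner, count


# Final answers extracted from the 5 responses above.
votes = ["odd", "even", "odd", "odd", "even"]
print(majority_vote(votes))  # ('odd', 3)
```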

(5) Plan-and-Solve Prompting

Set up steps that follow the logical structure of the problem, and have the model solve it one step at a time.

prompt = '''
Let's first understand the problem and devise a plan to solve the problem. Then, let's carry out the plan to solve the problem step by step.
Add up the odd numbers in this group: 15, 32, 5, 13, 82, 7, 1.
A:
'''

generate_response(prompt)

Let's break down the problem and come up with a plan to solve it.\n\nProblem: Add up the odd numbers in the group: 15, 32, 5, 13, 82, 7, 1.\n\nPlan:\n\n1. Identify the odd numbers in the group.\n2. Add up the odd numbers.\n\nLet's start by identifying the odd numbers in the group:\n\n* 15\n* 5\n* 13\n* 7\n* 1\n\nThese are the odd numbers in the group. Now, let's add them up:\n\n1. Add 15 + 5 = 20\n2. Add 20 + 13 = 33\n3. Add 33 + 7 = 40\n4. Add 40 + 1 = 41\n\nThe sum of the odd numbers is 41.

Wow !! You can see it reaches the answer through a much more logical structure. It feels closer to ‘solving the task’ than just ‘solving the data’.
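The trigger sentence can be reused for other questions by wrapping it in a small function, and the final number can be pulled out of the long answer for checking. Both helpers below are hypothetical sketches; taking the last integer in the text is a crude heuristic that only works when the answer ends with the result, as it does above.

```python
import re

PLAN_AND_SOLVE_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve the problem. "
    "Then, let's carry out the plan to solve the problem step by step."
)


def plan_and_solve_prompt(question):
    """Prefix a question with the plan-and-solve trigger sentence."""
    return f"\n{PLAN_AND_SOLVE_TRIGGER}\n{question}\nA:\n"


def last_integer(text):
    """Return the last integer mentioned in the text, or None."""
    numbers = re.findall(r"\d+", text)
    return int(numbers[-1]) if numbers else None


ps_prompt = plan_and_solve_prompt(
    "Add up the odd numbers in this group: 15, 32, 5, 13, 82, 7, 1."
)
print(ps_prompt)
print(last_integer("4. Add 40 + 1 = 41\n\nThe sum of the odd numbers is 41."))  # 41
```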