It can not even answer this question: 一公斤的棉花和一公斤的铁，哪一个更重？

#16

by lucasjin - opened Nov 27, 2023

Discussion

lucasjin

Nov 27, 2023

It can not even answer this question: 一公斤的棉花和一公斤的铁，哪一个更重？

Curia

Nov 27, 2023

Which template are you using, and have you considered to ask the model in English instead?

Curia

Nov 27, 2023

Welp, I tested multiple way to massage the model in English, and it seems to insist density somehow matter in the question of weight, thus iron is heavier.
Model doesn't give correct answer unless I ask in very specific manner and template, but then I start asking the same question to a few other model, same old issue.
It is what it is I suppose, LLM truthfulness is always problematic when the internet (which presumably made up orca 2 dataset) can't make up their mind of this simple question to begin with, I blame dumb human.

pandora-s

Nov 27, 2023

Well, that's what happens when AI is trained with human data. it will always have some problems, even more remarquable when dealing with dilemas and commun misconceptions. However, it also depends on the models, we are talking about a 13B model here, it's a powerfull one but we cannot expect it to have greater reasonning than humans. Even the most powerfull ones like llama 70B that are capable of answering your question correctly from what I tried still lack a lot in reasonning, but we are getting there ! To be honnest, I am still surprised how well these models with 13B and even 7B params are doing, and cannot wait to see these models being even more optimised.

MCnus

Nov 27, 2023

•

edited Nov 27, 2023

I got a correct answer. slightly restructured. And not in first try.

The model is from https://huggingface.co/TheBloke/Orca-2-13B-GGUF
orca-2-13b.Q6_K.gguf

pandora-s

Nov 27, 2023

Yeah, after playing around I also managed to get sometimes interesting answers, tho it feels like gambling. Well, let's hope this "gambling game" will have better and better chances of gettin us a win.

ntphu

Nov 29, 2023

This comment has been hidden

acrastt

Dec 1, 2023

Yeah, after playing around I also managed to get sometimes interesting answers, tho it feels like gambling. Well, let's hope this "gambling game" will have better and better chances of gettin us a win.

LLMs should be deterministic unless you sample.

Wei-ge

Dec 1, 2023

@lucasjin Works for me off the bat.

pandora-s

Dec 2, 2023

Yeah, after playing around I also managed to get sometimes interesting answers, tho it feels like gambling. Well, let's hope this "gambling game" will have better and better chances of gettin us a win.

LLMs should be deterministic unless you sample.

Or... unless you change the prompt like i did? You can always add some randomness to it, like having a previous conversation with it or rephrase the prompt.

JoyJosh887

Jan 11, 2024

I am not able to run a model properly. I am on RTX4000, and it takes a lot of time to process a single answer. Do we have any solution for that

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment