I gave GPT-4 a simple real-world question about how much alcohol volume there is in a certain weight (I think 16 grams) of a 40% ABV drink (the rest being water) and it gave complete nonsense answers on some attempts, and straight up refused to answer on others.
So I guess it still comes down to how often things appear in the training data.
(the real answer is roughly 6.99 ml, weighing about 5.52 grams)
After some follow-up prodding, it acknowledged it was wrong and eventually provided a different answer (6.74 ml), which was also wrong. With more follow-ups or additional prompting tricks, it might eventually get there, but someone would have to tell it first that it's wrong.
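For reference, the arithmetic can be checked directly. The sketch below assumes ideal mixing (no volume contraction when ethanol and water are combined, which isn't exactly true for a real spirit) and textbook densities of roughly 0.789 g/ml for ethanol and 1.0 g/ml for water:

```python
# Sanity check: alcohol volume in 16 g of a 40% ABV ethanol/water mix.
# Assumes ideal mixing and densities of 0.789 g/ml (ethanol), 1.0 g/ml (water).

ETHANOL_DENSITY = 0.789  # g/ml
WATER_DENSITY = 1.0      # g/ml

def alcohol_volume(total_mass_g: float, abv: float) -> tuple[float, float]:
    """Return (alcohol volume in ml, alcohol mass in g) for a drink of
    the given total mass and alcohol-by-volume fraction."""
    # Each ml of drink contains `abv` ml ethanol and `1 - abv` ml water,
    # so the mixture's density is the volume-weighted average:
    mix_density = abv * ETHANOL_DENSITY + (1 - abv) * WATER_DENSITY
    total_volume_ml = total_mass_g / mix_density
    vol_ml = abv * total_volume_ml
    return vol_ml, vol_ml * ETHANOL_DENSITY

vol_ml, mass_g = alcohol_volume(16, 0.40)
print(f"{vol_ml:.2f} ml, {mass_g:.2f} g")  # prints "6.99 ml, 5.52 g"
```

That agrees with the ~6.99 ml / ~5.52 g figure above.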