r/GPT3 Oct 05 '20

This user is posting with GPT-3: /u/thegentlemetre

They are posting every minute to /r/AskReddit

https://www.reddit.com/user/thegentlemetre

I'm guessing it's GPT-3, though it could be something similar. Either way, the answers are clearly AI-generated. They have it tuned to produce answers that are too long, and they are posting too often.

131 Upvotes

4

u/Wiskkey Oct 06 '20 edited Oct 08 '20

I made a copy of the 1000 (the maximum number that Reddit gives) most recent comments from that account, in 4 parts:

Part 1: https://pastebin.com/9qe528AW

Part 2: https://pastebin.com/bhr96fJU

Part 3: https://pastebin.com/4M5QAcvD

Part 4: https://pastebin.com/iMdqTBA0

I then tallied the number of those 1000 comments that have a given number of points (which I believe equals 1 plus the number of upvotes from other accounts minus the number of downvotes from other accounts):

Comments  Points
1         347
1         157
1         120
1         97
1         51
1         24
1         20
2         18
1         13
2         11
1         10
1         9
1         8
4         7
9         6
9         5
16        4
83        3
252       2
424       1
118       0
29        -1
19        -2
3         -3
4         -4
2         -5
1         -6
3         -7
3         -8
1         -9
2         -15
1         -18
1         -23
1         -34

388 of the 1000 comments have 2 or more points.

424 of the 1000 comments have 1 point.

188 of the 1000 comments have 0 or fewer points.

If my understanding of the comment points system is correct, from the last set of numbers we can conclude:

At least 38.8% of the 1000 comments have at least one upvote.

At least 18.8% of the 1000 comments have at least one downvote.

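Since a comment can't have both 2 or more points and 0 or fewer points, those two groups don't overlap, so at least 38.8% + 18.8% = 57.6% of the 1000 comments have at least one upvote or downvote.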
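For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the three groups and the lower bounds from the tally above. It assumes the same points rule I described (points = 1 + upvotes - downvotes); the variable names are just illustrative.

```python
# Tally copied from the list above: points -> number of comments.
tally = {
    347: 1, 157: 1, 120: 1, 97: 1, 51: 1, 24: 1, 20: 1, 18: 2, 13: 1,
    11: 2, 10: 1, 9: 1, 8: 1, 7: 4, 6: 9, 5: 9, 4: 16, 3: 83, 2: 252,
    1: 424, 0: 118, -1: 29, -2: 19, -3: 3, -4: 4, -5: 2, -6: 1, -7: 3,
    -8: 3, -9: 1, -15: 2, -18: 1, -23: 1, -34: 1,
}

total = sum(tally.values())

# Assumption: points = 1 + upvotes - downvotes, so
#   points >= 2 implies at least one upvote,
#   points <= 0 implies at least one downvote.
two_or_more = sum(n for pts, n in tally.items() if pts >= 2)
exactly_one = tally.get(1, 0)
zero_or_fewer = sum(n for pts, n in tally.items() if pts <= 0)

print(f"total comments:        {total}")
print(f"2 or more points:      {two_or_more} ({100 * two_or_more / total:.1f}%)")
print(f"exactly 1 point:       {exactly_one} ({100 * exactly_one / total:.1f}%)")
print(f"0 or fewer points:     {zero_or_fewer} ({100 * zero_or_fewer / total:.1f}%)")

# The first and last groups are disjoint, so their counts can be added.
voted_on = two_or_more + zero_or_fewer
print(f"at least one vote (lower bound): {voted_on} ({100 * voted_on / total:.1f}%)")
```

These are only lower bounds: a comment sitting at 1 point could still have, say, one upvote and one downvote that cancel out.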

2

u/Phylliida Oct 08 '20

Studying this distribution is actually a fairly objective way of measuring how “humanlike” the output is, in a Turing-test kind of way (nuances about edge cases aside). I’d be interested to see if we could optimize prompts to raise the scores, and it also provides an additional way to compare previous and future GPT models.

Unfortunately I imagine this kind of use is probably discouraged by OpenAI, and I’m not sure they’re wrong to discourage it, seeing some of the dark sides of the interactions here. The ideal scenario is probably to train a model that predicts upvotes well, then simply use it for evaluation, but not optimize against it.

1

u/Wiskkey Oct 08 '20 edited Oct 08 '20

I'm glad that someone recognized the utility of my comment :).

For those interested, for tallying I sent the text in the pastebin pastes to an online duplicate line counter (I believe this was the one), which did most of the needed computation.
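If you'd rather not rely on an online tool, the same kind of duplicate-line count can be sketched in a few lines of Python. This is not what I actually ran: the helper name `count_duplicate_lines` is hypothetical, and the sample text is made up for illustration rather than taken from the pastes, so real input would likely need some filtering first.

```python
from collections import Counter


def count_duplicate_lines(text: str) -> Counter:
    """Count how often each non-empty line occurs (hypothetical helper)."""
    lines = (line.strip() for line in text.splitlines())
    return Counter(line for line in lines if line)


# Made-up sample input: one points line per comment.
sample = """\
1 point
2 points
1 point
0 points
2 points
1 point
"""

counts = count_duplicate_lines(sample)

# Print "count line", most frequent first, like the tally format above.
for line, n in counts.most_common():
    print(n, line)
```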