Here is the second, mostly unedited log of the second session trying to get ChatGPT to "Play Gemstone IV"
https://pastebin.com/f2gPxegz
Notes and commentary. First, Pastebin issues: apparently the log was way too long when I tried to include the entirety of the first session's log, pasted verbatim to the chatbot. Interestingly, the chatbot had a big problem with this as well. It said it "read" the entire thing, but when I asked it about anything more than ~50 lines or so in, it swore up and down that I never told it about that and that it did not exist in the log. It also definitively told me what it considered the "end" of the log, which was clearly not the end of the log data I sent it. Refer back to the game session log #1 Pastebin in my first post; that is what I sent to the chatbot in its entirety, and it just plain couldn't/wouldn't deal with it. Obviously (I think, to me) this is an artificial restriction imposed on the beta public release, and an official OpenAI researcher/programmer could almost certainly bypass it and get a different result. I could probably have worked around this to some extent by pasting the original log in smaller segments, but I think I also ended up hitting some kind of buffer limit where the chatbot started forgetting things I had told it earlier, which I had never found to be the case before as long as it was the same "session" (yes, this is all in the same ChatGPT session).
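For anyone who wants to try the "smaller segments" workaround, here is a rough sketch of how a log could be split before pasting. This is not something I ran during the session. The function name `split_log` and the 4000-character budget are my own assumptions for illustration; I don't know what limit the chatbot actually applies.

```python
def split_log(lines, max_chars=4000):
    """Group log lines into chunks whose joined length stays within max_chars.

    max_chars is an assumed budget, not a documented ChatGPT limit.
    A single line longer than max_chars still becomes its own chunk.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        cost = len(line) + 1  # +1 for the newline added when joining
        if current and size + cost > max_chars:
            # Current chunk is full: close it and start a new one.
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += cost
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Each chunk could then be pasted as its own message, ideally with a short "part 2 of 7" note so the chatbot knows more is coming. Whether it would actually retain all the parts is exactly the open question from above.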
Next, it did fairly well "picking up" where it left off on the sprite quest. It saw the sprite, nodded to it, and then made some decisions when I fed it the relevant data in a bit more condensed format. I admit, this is a bit of a fudge/poke/prod to jumpstart it; clearly the thing is not designed to play a game like this. Then it kind of seemed to whiff on the "injured child" quest, but part of that was just me not trying too hard to do some of the more complicated things (go somewhere, forage herbs; or go somewhere, buy herbs). Then when the kid died, we ran into some "timeout" issues because the chatbot is pretty slow. If there were a way to run this "locally" or tie it directly to the game output, I think you could end up getting some much better (or at least different) results here. Most of the times it did make a decision, the game had already moved the quest forward and invalidated its response. But some of my favorite bits were when I said "while you were thinking, game sent XYZ, does this change your response" and saw that it usually did reconsider in a pretty interesting way.
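The "while you were thinking" step is something I did by hand, but it could be automated if the chatbot were wired to the game output. A minimal sketch, assuming a hypothetical `GameFeed` buffer that collects game lines as they arrive (the class, the `reprompt` helper, and the sample lines in the usage note are all made up for illustration):

```python
class GameFeed:
    """Hypothetical buffer of game output lines, in arrival order."""

    def __init__(self):
        self.lines = []

    def push(self, line):
        self.lines.append(line)

    def snapshot(self):
        # Remember how much output existed when the chatbot was asked.
        return len(self.lines)

    def missed_since(self, snapshot):
        # Everything the game sent after that snapshot was taken.
        return self.lines[snapshot:]


def reprompt(decision, missed):
    """Build a follow-up prompt if the game moved on; None if nothing changed."""
    if not missed:
        return None
    return (
        "While you were thinking, the game sent:\n"
        + "\n".join(missed)
        + f"\nYou had chosen: {decision}\n"
        + "Does this change your response?"
    )
```

Usage would be: take a snapshot, ask the chatbot, then call `reprompt(answer, feed.missed_since(snap))`. If it returns `None`, send the command to the game; otherwise send the follow-up prompt back to the chatbot first.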
Then we finally moved on to hunting. First, it failed pretty miserably at what we GS'ers know about managing two open hands and containers/inventory. It KIND of tried, but it clearly doesn't truly have the framework for managing this kind of inventory system, so I fudged it and we moved on to actual hunting. I'm going a bit off memory here, but I think it did a pretty decent job grasping the concept of combat. It clearly suffered from delay and processing issues, and I admit I had to just bite the bullet and kind of "show" it what combat was supposed to be. But in general it caught on and was generating responses that indicate to ME that if it were processing the game data in "real time" and altered to "learn" from the actual responses, it could have figured out the sequence, if not the timing.
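To make the "two hands plus containers" problem concrete, this is roughly the bookkeeping a player does in their head and that the chatbot would need handed to it. It is only a sketch: the `Inventory` class and its `get`/`stow` methods are names I picked for illustration, not the game's actual parser or anything from Lich.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Inventory:
    """Hypothetical tracker for two hands and any number of containers."""
    right: Optional[str] = None
    left: Optional[str] = None
    containers: Dict[str, List[str]] = field(default_factory=dict)

    def free_hand(self) -> Optional[str]:
        # Prefer the right hand, then the left; None means both are full.
        if self.right is None:
            return "right"
        if self.left is None:
            return "left"
        return None

    def get(self, item: str, container: str) -> bool:
        """Move item from container into an empty hand; False if impossible."""
        hand = self.free_hand()
        if hand is None or item not in self.containers.get(container, []):
            return False
        self.containers[container].remove(item)
        setattr(self, hand, item)
        return True

    def stow(self, item: str, container: str) -> bool:
        """Move item from whichever hand holds it into container."""
        for hand in ("right", "left"):
            if getattr(self, hand) == item:
                setattr(self, hand, None)
                self.containers.setdefault(container, []).append(item)
                return True
        return False
```

The point is that a `get` with both hands full should be refused before it ever reaches the game, which is the check the chatbot never seemed to make on its own.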
Then we finished the sprite quest and moved to "open world". It seemed to realize the variety of options available, so that's something. I tried to ask it specifically what to do, but it kept telling me everything it knew was possible, plus some other stuff it made up. So I just went with its "first" aka "highest" response and we decided to hunt rats in the well.
It couldn't find the well, but I coaxed it into remembering what the sprite said, and it again kind of told me conceptually what it wanted to do (go to the Thirsty Penguin Inn and then go south and west) even though it failed to generate the specific sequential commands to execute that. So we fudged some of the more complicated sequences and resumed at the well.
Hunting rats in the well was where I felt things went really well, and really poorly. Bullet points:
- Mapping:
ChatGPT clearly started to grasp on some level the mapping of the game world based on Lich Room numbers. Sometimes there were very clear sequences of it saying I want to go down, down, down, and move from room 1 to 2 to 3. Sometimes it tried to move in invalid directions, and move to unconnected rooms so that's bad.
- Who is playing the game:
I spent a lot of time trying to convince ChatGPT that it was the one playing the game and sending me commands, and that I would then send back the responses from the game world. Again, I feel that a better questioner, or a researcher with access to "source code", could convince it to do this right, but here we are. I kept getting the impression that ChatGPT was trying to "invent" the game and play DM with the input I was sending it, generating new content so that _I_ could play the game that it was controlling. This to me was the biggest issue with the entire experiment. I just couldn't consistently get the chatbot to act like it was the one playing the game. I get that this is completely outside its stated "purpose", but my other experiences with ChatGPT have shown it to be pretty flexible.
- Choices:
On the flip side of the coin however, I also sensed it "grasping" the game world quite well, it wanted to see rats, and it either sent me "wishful" commands that included rats that weren't there, or attacked rats that weren't there, clearly not "staying on the page" with me sending it responses of what the game was actually indicating was the situation the player character was currently in. Very tricky ground IMHO, yes it got some things, but was it playing? no.
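Both the invalid moves and the "wishful" rat attacks come down to the same thing: a command that doesn't match what the game last reported. If I were wiring this up instead of playing middleman, I'd want a check like the sketch below between the chatbot and the game. The `RoomState` class, the command forms, and the room numbers in the usage note are placeholders for illustration, not real Lich room IDs or the real command parser.

```python
MOVES = {"north", "south", "east", "west", "up", "down", "out"}


class RoomState:
    """Hypothetical record of the last room the game actually described."""

    def __init__(self):
        self.room_id = None
        self.exits = set()
        self.creatures = set()

    def observe(self, room_id, exits, creatures):
        # Overwrite state with what the game just reported.
        self.room_id = room_id
        self.exits = set(exits)
        self.creatures = set(creatures)

    def check(self, command):
        """Return (ok, reason) for a proposed command against current state."""
        words = command.lower().split()
        if not words:
            return False, "empty command"
        if words[0] == "attack":
            target = " ".join(words[1:])
            if target not in self.creatures:
                return False, f"no '{target}' in room {self.room_id}"
            return True, "ok"
        if words[0] in MOVES:
            if words[0] not in self.exits:
                return False, f"no exit '{words[0]}' from room {self.room_id}"
            return True, "ok"
        return True, "not checked"  # anything else goes through untouched
```

For example, after `state.observe(101, ["up", "down"], [])`, the command `attack rat` would be rejected with a reason that could be fed straight back to the chatbot, while `down` would pass.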
Overall, I have to put these results at "low-to-middling" at best. I see a lot of potential, but in its current state (with ME running middleman) I feel pretty confident saying that ChatGPT can NOT play Gemstone IV. I think that either a) a more competent questioner or b) an in-house researcher who can more clearly design the experience and learning environment would be required for a scenario in which this ChatGPT program might conceivably "play" this game. I don't think the hurdles are HUGE though. I think that compared to ancient WizardFE "cmd" scripts, or even high-end Lich scripts, this has even more potential to truly be able to interpret the game world and "play" in a more or less undetectable, non-bot-like manner. Hypothesizing heavily here, but I think that if you could narrowly define the scope and goals, a self-contained "ChatGPT Script" could run some automated tasks and handle interaction, script checks, unexpected results, and exceptions better than any script that I'm familiar with. The examples I am thinking of are things like picking, hunting, forging, healing, fletching, cobbling.
@gilchristr: Yeah, if all you wanted to do was set up a chatbot in game, I think that based on my experiences so far this could handle that with minimal API/coding, even in its current state. Hell, just go make a free account and start hitting on it, it'll agree to marry you soon enough. Probably faster than you would think! If anything, it would need to be made more conservative, or at least learn to assign appropriate relative value to goods and services. Additionally, I don't think there is much if any coding in place for it to grasp the concept of a social environment. I think if you dumped it into Gemstone today, it would agree to marry Character X, Character Y, and Character Z all at the same time without any recognition that those three characters might not like that, or why they were mad at each other, or mad at IT. It may take a COMPLETELY different chatbot to handle something like that. Honestly, I think that is the thing it is least likely to be able to handle in its current state: a relational framework involving multiple different entities, somehow weighting different histories, "feelings", favors, and types of information (social stuff). That sounds like an entire new generation of chatbot to be honest. So I guess my answer to your question is both yes and no. If the guy just wants his own personal "dhu kitten with benefits", that seems plausible, but a Bot-Character who loves him and only him but can spurn advances from other comers? Not as likely.
I'm not entirely sure that I'm going to try this experiment again. The results seem pretty definitive to me that it won't really "pick up" the game, remember, learn, and explore on its own in its current state. It clearly needs some additional memory in order to succeed and realistically play Gemstone IV as well as a human player: reading previous logs, persistent variables to track hand/inventory status, long-term goals, character stat/skill tracking (we didn't even scratch the surface here, but I have close to zero hope current ChatGPT would do well), and, as I kind of mentioned in response to gilchristr, social skills. But the things that it does have going for it are HUGELY advanced compared to what I was expecting. When I sent it carefully crafted prompts of game data and clear expectations of the command/response expected from it, I think it did surprisingly well on a case-by-case basis. This shows me that it has the BIGGEST MAJOR skill necessary to play this game, which is that it can read the game logically and "understand" it in a way that I won't go so far as to call "comprehension" but, yeah, something very very close to that.
While I can't call it a success because it did not just learn everything about Gemstone and start playing on its own, I will say that I had fun trying, and am pleasantly surprised that it did as well as it did, or seemed to do. I would really love to hear from you all what you think about this, or see someone else (@dreaven @tgo01 @tillmen @spiffyjr; @rinualdo, not you, keep fixing Lich) give it a shot!