Still No Turkish Coffee, Audit Pipelines, and ButcherBox Automation

TJ Miller (00:00)
Welcome back to the Slightly Caffeinated Podcast. I'm TJ Miller. So Chris, what's new in your world,

Chris Gmyr (00:04)
and I'm Chris Gmyr

Yeah, been a couple weeks. Been busy at work and also at home. So we did some traveling for the holiday over Easter. So I ended up taking off last week from Thursday to this past Monday. So extra long weekend. Went down to Georgia to visit some of my wife's family. So I got to see her two sisters and dad and our niece. Kids had a good time. But yeah.

just short weeks, you know, cramming everything in before leaving and then after coming back. So just trying to catch up a little bit and do all the things. So good trip. Had a little bit of coffee here and there at some cafes. Nothing too crazy, but had like a nitro and a red eye and just some regular drip coffee. So nothing too exciting, but just better than hotel coffee.

TJ Miller (00:44)
Yeah, nice.

Chris Gmyr (01:00)
So yeah.

TJ Miller (01:01)
Yeah,

for sure. Hotel coffee is maybe the worst.

Chris Gmyr (01:05)
Yeah,

they had some little Keurigs in the rooms, but I'm like, no, I'm not going to waste my time with this. So we'll just go out and get something on the way over to their house. So, yeah, not too bad. So, yeah. How's everything been for you the last couple of weeks?

TJ Miller (01:08)
Ha!

Yeah, for sure.

Nice.

man, it's been, I feel like it's been a total blur. You know, my mental health has been really bad. I've been just in a super deep depression. It's been really hard to get out of. So I've been kind of struggling with that. But also just really nose to the grindstone at Luma. And I'm working on some really cool stuff there that we can talk about today.

But I've just been really focused on that and trying to work on some mental health and relationship stuff. So we also got a bird. So we're now five dogs, three cats, and a bird. I don't know if we mentioned it. I don't think I mentioned that yet. But yeah, we got a bird. It is very much my wife's bird. But I've been enjoying hanging out with it. I've.

spent some time around birds, but like, I've been floored with how affectionate this little bird is. It's so cute and adorable and will just like snuggle on you. It'll like hop up on your shoulder and like nuzzle your face. It's just the cutest. So yeah, we're, we're bird people now.

Chris Gmyr (02:22)
Cool.

That's awesome. Expanding the zoo, adding different types of animals. Yeah, we've talked about, because our son, for whatever reason, has mentioned a bird multiple times. But it's like, no, dude, you have a hard time just helping feed the dog and remembering that, let alone a bird that could live for 20 years. And we got to deal with them.

TJ Miller (02:27)
Uh-huh.

Yeah, and they're

like a permanent three-year-old. Very smart, needy, troublesome. So it really is, I mean, depending on what kind of bird you get, but they're incredibly intelligent. So it's like, yeah, you've got 20 years of a permanent three-year-old.

Chris Gmyr (02:53)
you

Yep, we

already got a human three year old. We don't need a bird three year old for 20 years.

TJ Miller (03:08)
Yeah, yeah, right.

Geez. So I don't know if anything like super exciting has happened over the gap that we've had outside of like that. Like, I had a fun story that I'll share. Had a concert over the weekend with my wife and we're like, hey, let's go to that restaurant that does the Turkish coffee. And I was like, sweet, let's do it.

And so I put in the name and only one place came up, and it looked like it was maybe a little far away, but we were heading out the door. So I just kind of showed her my phone, like, this is the place, right? She's like, yeah. So we go there and it was definitely not the right place. It took us through Detroit. The concert was in Detroit, like downtown, but it took us a little past the city into, like,

the hood, like random middle of nowhere. Like, and we walk in and it's like, it's just, it's definitely not the right place, but it's the right type of food, right? Like it was, it was like Venezuelan food. So we get in there, like nothing's in English, like, which is.

I feel like is typically a good sign, right? It's going to be very authentic, right? But nothing in there is in English. So we're stumbling our way through, finally figure out what to get, and they're like, this is what we're known for. I'm like, yeah, let's do that then. And it was incredible. It was so good. I think it may have even been better than the place that we were going to go originally. However,

I didn't get my Turkish coffee again. But for just being such a random accident, running into this place, it was just incredible food. So it worked out well, but I thought it was funny because it kind of tied back into the Turkish coffee and all this. I don't know, that was fun. And the concert was incredible. I had a really good time too. So.

Chris Gmyr (04:46)
No.

Nice, that's awesome.

TJ Miller (05:09)
Yeah,

fun times.

Chris Gmyr (05:12)
Very cool. Well, yeah. Sorry that you didn't get your Turkish coffee again, but I guess you'll have to go back and try again.

TJ Miller (05:18)
Yeah, we'll get there eventually. I don't know. I think I've got my like 30 day anniversary coming up for Luma. And I think maybe I'll treat myself and go out for that coffee finally. I don't know. I keep saying that. We'll see.

Chris Gmyr (05:31)
There you go. We'll hold you to it.

We'll see. Nice. yeah. Sorry, the mental health has been not going so well. But yeah, take the time that you need and take care of yourself and do the things that are good and right for you. And just day by day.

TJ Miller (05:49)
Yeah, man, it's just kind of getting back to the basics and focusing on solidifying a solid foundation. So I've been off of socials. I've not really been giving the Prism repo the attention that it needs, but I've just had to take a break. Luckily, I'm starting some new meds today and we'll see how that helps everything out.

But yeah, it kind of gets back to what we were talking about a couple episodes ago, just taking that time to get re-centered, refocused. And I'm starting to feel inspired to work on stuff again, which is really nice. So we'll talk a little bit about that. I've been back on the Iris train working on some feature sets there, and there's lots of cool stuff I've been doing at Luma,

feeding the creative juices still. Hopefully we'll see more of that come soon.

Chris Gmyr (06:40)
Yeah, that's awesome. Well, sweet. We already talked a little bit about coffee, so we can move on to the audit work stuff you've been doing.

TJ Miller (06:49)
Yeah.

So what I'm building at Luma has been super cool. To kind of build up to what I'm building, there's a little bit of information you need. We have these things called eNuggets. You can think about them as a unit of learning. An eNugget could be self-contained, where it's got

complete information that you would need to know coming out of the learning. Or it can be part of a series of eNuggets. that whole series, that's got all the complete information for the track, right? So we primarily focus on the transportation industry, like trucking industry. So you'll have a series that is hazardous.

material handling or something like that, right? So it's a series of many eNuggets that cover all the, like teaching you all the regulations around hazardous material transportation or handling or whatever. So does that all make sense? So like a trucker or somebody would like end up like getting this course assigned to like this series of eNuggets assigned to them. They'd go through all the learning and like you can have

Chris Gmyr (07:51)
Yeah, yeah, makes sense.

TJ Miller (08:03)
quizzes and all sorts of stuff inside of the eNuggets. They're very flexible. So we have those. And one of the first things that they had as an idea for me to work on was a system to take these eNuggets and audit them, do gap analysis to see if we're missing out on opportunities or if they're

you know, regulations have changed and maybe the information's outdated. And then, so that was kind of the first thing, right? So let's do a compliance audit on them and then do some gap analysis and then just spit out a summary of, here's what we think about the individual eNugget or the series of eNuggets. Because you could think about

an eNugget on its own in isolation that's part of a series, that audit might come back saying, it's got a ton of gaps of information. But then when you zoom out and look at the series, well, the rest of those eNuggets in the series, that has all the rest of the information that that eNugget had like gaps in. So there are two processes. We first do the individual eNugget. We do like the audit and gap analysis.

And then we use all of those individual audits and gap analyses to then do an analysis on the whole series. And then that kind of spits out a report as well. So the pipeline kind of looks like this. It's all Laravel and Prism. So what the pipeline first does is take an eNugget and pipe it through Prism.

And what it first does is classify the content of the eNugget into one of three domains. So it's either going to be HR, OSHA, or FMCSA, which is the transportation governing body's standards. So it first does a structured output analysis to classify it, and it can be either one or many domains. I think eNuggets

are typically thought of as one-to-one with a domain, but there are overlaps. So I just let the system look at all the domains. So that's the first structured output pass: it just takes in some prompting and the nugget content and spits out an array of one or many of the domain values.
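For readers following along, here's a minimal sketch of what that classification pass could look like with Prism's structured output. The schema, prompt, and model string are illustrative assumptions, not the actual Luma implementation:

```php
<?php

use Prism\Prism\Enums\Provider;
use Prism\Prism\Prism;
use Prism\Prism\Schema\ArraySchema;
use Prism\Prism\Schema\EnumSchema;
use Prism\Prism\Schema\ObjectSchema;

// Hypothetical sketch: classify an eNugget into one or more regulatory domains.
// $eNugget is assumed to be an Eloquent model with a `content` attribute.
$schema = new ObjectSchema(
    name: 'domain_classification',
    description: 'Regulatory domains covered by this eNugget',
    properties: [
        new ArraySchema(
            name: 'domains',
            description: 'One or more applicable domains',
            items: new EnumSchema(
                name: 'domain',
                description: 'A single regulatory domain',
                options: ['HR', 'OSHA', 'FMCSA'],
            ),
        ),
    ],
    requiredFields: ['domains'],
);

$response = Prism::structured()
    ->using(Provider::Anthropic, 'claude-sonnet-4-20250514') // model name illustrative
    ->withSchema($schema)
    ->withSystemPrompt('Classify this learning content into the regulatory domains it covers.')
    ->withPrompt($eNugget->content)
    ->asStructured();

$domains = $response->structured['domains']; // e.g. ['OSHA', 'FMCSA']
```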

After it's classified, for each domain that's identified, it goes off and first does a deep research phase. So it's using Anthropic's web fetch and web search tools. Kind of an interesting thing here: the end goal is to use structured output to get us a value object, basically. Something that we can, like, we get this deep associative array

of all the information, we can pipe it into a value object, and then we can persist it to the database. So the end goal is the structured output. But one of the things I was most concerned about with this system, and have had issues with, is hallucinations, right? Like hallucinating citation URLs. Because I believe in transparency and human-in-the-loop auditability of all of this stuff,

like having accurate citations saved. So you can then do that mapping of, well, it says this regulation is coming from this URL, this is where that information came from. We need to be able to verify that, because this is regulatory stuff. At first it was just straight to structured output, and it was hallucinating citations, it was hallucinating regulation codes, it was just making all this stuff up.

Even though it did all the valid research, digging a little deeper, Anthropic, their API doesn't support citations and structured output at the same time. So what I had to do is actually break this apart into two steps where it does all the research using text output. So we just get a big blob of text at the end. But what we get is we actually get

the real citations as part of that response through Prism, so we actually have the actual URLs, the actual quoted citation text. We have all of that, and it's real, coming back from Anthropic and their tooling. And then I just take that big blob of text with a very simple prompt and pass it off; everything's using Sonnet except this step,

where we're using Haiku, because basically all we're doing is taking this blob of text and formatting it as structured output. So we can get away with a less powerful, faster, cheaper model to just do that transformation step. It's just a linear data transformation. And then we hold on to the citations from the previous step and persist all of that stuff. So at the end, we come out with this

Chris Gmyr (12:54)
Yep. That's a good call.

TJ Miller (13:07)
audit that's got the summary, a gap analysis, and the mapping of which regulation codes this eNugget or this series covers. So we have those in the database, and we can do reverse searches: given this regulation code, what eNuggets or series do we have available that cover it?
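A rough sketch of the two-step split being described: a text-mode research call against Sonnet, where the provider's real citations come back with the response, followed by a cheap Haiku structured-output pass that only reshapes the blob. The prompts and schema fields are assumptions, and attaching the web search/fetch tools and reading the returned citations are provider-specific details omitted here:

```php
<?php

use Prism\Prism\Enums\Provider;
use Prism\Prism\Prism;
use Prism\Prism\Schema\ArraySchema;
use Prism\Prism\Schema\ObjectSchema;
use Prism\Prism\Schema\StringSchema;

// Step 1: deep research as plain text, so the provider's citations stay real.
// (Anthropic's web search / web fetch tools would be enabled on this request.)
$research = Prism::text()
    ->using(Provider::Anthropic, 'claude-sonnet-4-20250514')
    ->withSystemPrompt('Research current regulations relevant to this content and cite every source.')
    ->withPrompt($eNugget->content)
    ->asText();

// Step 2: a cheaper, faster model only reformats the blob into the audit shape.
$auditSchema = new ObjectSchema(
    name: 'audit',
    description: 'Compliance audit for an eNugget',
    properties: [
        new StringSchema('summary', 'Audit summary'),
        new ArraySchema('gaps', 'Identified content gaps', new StringSchema('gap', 'A single gap')),
        new ArraySchema('regulation_codes', 'Regulation codes covered', new StringSchema('code', 'A regulation code')),
    ],
    requiredFields: ['summary', 'gaps', 'regulation_codes'],
);

$audit = Prism::structured()
    ->using(Provider::Anthropic, 'claude-3-5-haiku-latest')
    ->withSchema($auditSchema)
    ->withPrompt("Format this research as a structured audit:\n\n" . $research->text)
    ->asStructured();

// Persist $audit->structured alongside the citations captured in step 1.
```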

So that was all pretty sweet. System worked really pretty well. The fun thing was I don't really know enough about the industry to know whether it was surfacing accurate results or not, or even asking the question, is this useful information? Sure, it's doing research. It's coming back with a bunch of stuff. We've got the citation so we can audit it.

It's like scoring everything, the system's working well, but I don't know if it's even accurate. So I start handing that off to a few people internally to just sanity check me: is this useful? Is this accurate? Turns out, yeah, it was pretty good.

Chris Gmyr (14:01)
you

TJ Miller (14:20)
Like in that process, I was just like having a lack of confidence in all of that. Like I'm getting the shape of results that I want. I'm getting data that seems correct in there. But I wanted to build a system that goes back and uses an LLM to do an evaluation of that audit because

I don't necessarily want to burn a bunch of people's time looking at these just to be told at the end, well, there were these inaccuracies, these citations didn't go anywhere real, maybe we're dealing with hallucinations again, or just irrelevant stuff, or just kind of getting back trash feedback.

Which I wasn't getting, but I was really afraid of that just because I don't know enough. So I then built a system for each; an individual eNugget and a series are evaluated a little differently, so there's two different steps for that. But I built an LLM-as-judge evaluation system that gives a bunch of context. It has instructional prompting, but the process I built is a couple of steps.

First, it goes out and tries to validate all of the citations that were part of that audit. Is the URL reachable? Was the content quoted accurate? So it kind of does this scoring process on the citations, which I thought was super important just because I was already dealing with hallucinations. But furthermore, I needed to just make sure that the system was pulling

stuff accurately anyway. So it does a whole bunch of citation verification, then it does an additional research phase. So it goes off and does research again, but now it can do much more targeted research because it's actually researching what the audit came out with. And then it spits out two lists: an array of strengths and an array of weaknesses.
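As an illustration of the citation-check half of that eval, a simple reachability-and-quote pass in Laravel might look like this; the scoring and the LLM-as-judge prompting around it are left out, and the citation fields are made up:

```php
<?php

use Illuminate\Support\Facades\Http;

/**
 * Hypothetical sketch: for each citation attached to an audit, check that the
 * URL is actually reachable and that the page still contains the quoted text.
 */
function verifyCitations(array $citations): array
{
    return collect($citations)->map(function (array $citation) {
        $reachable = false;
        $quoteFound = false;

        try {
            $response = Http::timeout(10)->get($citation['url']);
            $reachable = $response->successful();
            $quoteFound = $reachable
                && str_contains(strip_tags($response->body()), $citation['quoted_text']);
        } catch (\Throwable) {
            // Leave both flags false: an unreachable URL counts against the audit.
        }

        return [...$citation, 'reachable' => $reachable, 'quote_found' => $quoteFound];
    })->all();
}
```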

That all gets bundled up. And so I passed that back to Claude a few times: all right, do the audit, run the eval for that audit, then start making some suggestions of what we could do to improve the system based upon these audits. So that also helped me gain some more confidence, but also let me immediately turn around and make some quick improvements to the system when an eval came back like, yeah, this audit was trash.

So I got all that built, and I ran a bunch of eNuggets through it, ran some series through it. But I recently ran a series through it yesterday, and looking at the results, the audit evaluation surfaced some hallucinated regulation codes.

And that really made me uncomfortable because, if anything, I'd rather it be missing information than have hallucinated something, right? I'd rather it not have enough than have, like, bullshit that it made up. So the hallucinations really bother me.

So what I started working on this morning, and I need to sit with it a little bit more because it's now going to make the process exponentially more expensive. What I think I'm going to do, and Claude's crunching on it right now on a laptop behind me, is add an additional step, right? So we have the audit, we have an audit evaluation that gives us a quality report on our audit. And now I'm going to add a revision step that takes the audit and the evaluation and then revises the original audit.

If it did hallucinate a code, that gets resolved because the evaluation caught it. Now we can just automatically go back and make those revisions instead of just sitting there with, great, now we've got this audit. What are we going to do about it? So it's now just got this big improvement step on everything. So I'm hoping to get that done and then rerun a few things. But now that process is going to be like, it's now turning into a kind of big pipeline.
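Chained together, the flow he's describing maps fairly naturally onto something like Laravel's Pipeline; the stage classes here are hypothetical placeholders for the steps discussed above:

```php
<?php

use Illuminate\Support\Facades\Pipeline;

// Hypothetical orchestration: each stage receives the working state and passes it on.
$result = Pipeline::send(['eNugget' => $eNugget])
    ->through([
        ClassifyDomains::class,     // structured output: HR / OSHA / FMCSA
        RunComplianceAudit::class,  // text research with citations, then Haiku formatting
        EvaluateAudit::class,       // LLM-as-judge: citation checks, strengths/weaknesses
        ReviseAudit::class,         // rewrite the audit using what the evaluation caught
    ])
    ->thenReturn();

// $result['audit'] would now reflect the post-revision version.
```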

But I'm hoping that at the end we have really actionable and fairly accurate results, you know, and a really strong audit of everything. One of the big things I've stressed about this process is that we can automate a bunch of this research and gap analysis, and it's definitely found real things, like: since this eNugget was created, this regulation changes in 2026 to X, Y, Z. So we now get to go and update that before it gets stale, which is cool. It's definitely surfacing good, real stuff. But, sorry, before I get sidetracked, where I was heading with that is that one of the big things I've stressed is

human in the loop on all of this. Like, I don't want to build this pipeline that then goes back and like automatically like edits the eNugget or like I...

Chris Gmyr (20:01)
Mm-hmm.

TJ Miller (20:10)
I guess we could even get to a world where it maybe like suggest the changes, but I'm too skeptical of like what an LLM is going to do to just like allow this like fully automated process, right? Like I've stressed to the Luma team, but I'm just like evangelizing overall. Like it's so important to keep human in the loop in these processes and just like surface it in a way that's like, Hey,

Like I said, you could just suggest the changes and have someone review them and approve it. But I think it's so important to have that human in the loop through all of this stuff and transparency throughout the application.

Chris Gmyr (20:46)
Yeah, yeah, I think it'd

be really interesting, in that eval and review process, to have a human either approve or reject or modify the findings of the eval, so someone is actually checking up on it. Maybe not the exact citations, because the LLM should be able to figure out if it's an actual URL and it matches the assumptions for the regulations and such, but just the fact that it may have brought up these five or 10 things: are those actually relevant to the nugget or the series, or what we actually want to put into this thing? And then that would be really valuable human feedback that you can then work into the process and the pipeline later.

TJ Miller (21:36)
Yeah, that's something I've been sort of ruminating on too as a next iteration on this: allowing somebody to have different points of giving feedback. So maybe it's on a citation, like, we don't use this source anymore. Or maybe it scored something really low,

like a low compliance score, but actually it's more nuanced than that. Being able to put feedback on that and then feed it into the next audit, you know, have that feedback just be part of the next audit's prompting so that there's kind of that guidance,

you know, nudging it around certain things so that the next time it goes around, it's a little bit more accurate. So, yeah, offering different feedback points for human input to then have an effect on the end result, for sure.

Chris Gmyr (22:35)
Well, that sounds awesome. And it sounds like it's already finding some gaps that you guys have in those series and nuggets as well. Because like you said, when regulations change, it's hard to go back through a huge catalog of learning resources to be like, OK, what does this actually connect to? What do we need to update, or how much do we need to update? Or it could be a whole new series that you need to implement. So it would be awesome to raise that flag a lot earlier.

TJ Miller (22:52)
Mm-hmm.

Chris Gmyr (23:03)
and often because these things can just eventually run in the background and bring up these things when the stuff changes. So that's cool.

TJ Miller (23:09)
Yeah.

Yeah, so I've got a meeting this Friday with all the stakeholders to just kind of get and gather some more feedback on the system and make some improvements. So I've just been off prototyping this in my own Laravel app. I'm not even in the main line thing. So it's been really cool. And I think that probably the next step is then to start integrating it into the main app.

because this has just been a playground to iterate inside of, but I've been able to iterate super fast that way, so then we can just kind of translate that back. And there's things that don't need to be productionized along the way. Like right now, everything is just artisan commands. Realistically, there's probably a scheduled process, run all this stuff monthly, and then some maybe actions in the UI of like, yep, just run this individual audit because maybe we just updated this eNugget and we want to re-audit it based on our updates.

And then all of that stuff gets queued into jobs. And so there's steps to productionize all of this that still need to take place. But it's been so fun just spinning up a blank Laravel app and just going ham.
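Productionizing that is probably just the standard scheduler-plus-queue wiring he's alluding to; a hedged sketch with an invented command name:

```php
<?php

// routes/console.php (Laravel 11+ style scheduling); names below are hypothetical.
use Illuminate\Support\Facades\Schedule;

// Monthly sweep: queue a re-audit job for every eNugget.
Schedule::command('enuggets:audit-all')->monthly();

// A UI action could dispatch the same queued job for a single eNugget after an edit:
// App\Jobs\AuditENugget::dispatch($eNugget->id);
```

Each per-eNugget audit would presumably be an ordinary `ShouldQueue` job so the Prism calls run off the request cycle.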

Chris Gmyr (24:13)
Yeah, yeah, that's a good feeling.

TJ Miller (24:17)
Yeah. Yeah, it's been, it's been super engaging. It's a ton of fun. It feels like really impactful for the content team. like it's, I'm enjoying the hell out of myself.

Chris Gmyr (24:26)
Yeah, that's awesome. And that's something that I've been thinking about more too, doing more of these evals and rubrics and human-in-the-loop feedback to just improve all of our overall AI systems, because it's been a struggle to get the majority of the team actually embedded in AI and improving the systems instead of just kind of

blindly using what's there. So a lot of my work right now has been just installing the CLAUDE.md file, getting that set up, setting up rules, doing some optimizations to, you know, kind of punt it over to them. But if they don't iterate on it as things change and the apps evolve, then it's going to slowly break down. So I've been thinking about different ways to prompt the user to give feedback on

how the session went. So nothing too solid right now, but mostly like a handoff or an end-of-work skill or prompt or hook or something like that, of like, how did we do? What did you push back on? And maybe we can automatically scan the session and the context to bring up some of those ideas and prompt, like, did we eventually get this over that? Or what's your general feedback on

TJ Miller (25:34)
Mm-hmm.

Chris Gmyr (25:50)
the loop and how it went. And then that works into, maybe we need a new rule or to edit something, or maybe something needs to go in the CLAUDE.md file, or some other tool or system needs to be put into place. So thinking about that too, what you said multiple times, of just keeping that human in the loop and also training people. Because you and I know you just always need to be iterating on this, but as a new tool for the majority of

Like the rest of the team, it's not a natural process for them with this tool. They can do it for code or the things that they're typically comfortable with or responsible for. But this feels like a whole new world for them. So to force prompt them into thinking about it and helping that iteration process is something that I've been thinking about as well. So not like.

TJ Miller (26:35)
Mm-hmm.

Chris Gmyr (26:46)
totally directly related to the audit stuff, just, again, like human in the loop workflows and how we make those AI systems better.

TJ Miller (26:57)
Yeah, yeah, for sure. I think that's something I'm starting to ruminate about too, because that is an initiative that I have at Luma as well. It's going to be kind of introducing and evangelizing and guiding AI-driven development. I don't think that there's a ton of it going on on the team right now. I think there's people dabbling with it,

but I don't think it's, you know, it's not where I'm at, where I don't open an editor anymore. I just Claude everything. So.

I'm thinking a lot about that, you know, how can I, like, what tooling is going to be interesting and useful and everything to go along with our code base. We've got an interesting situation now where we're going to end up with a monorepo of our new Laravel app

and then our legacy application that we're strangling out to the Laravel application. And it's a 10-, 15-year-old PHP app, and it's very much coded like 10- or 15-year-old PHP. You know, the cool thing is, before I started, the team got it up on modern versions of PHP, which is great. But we're going to end up with a monorepo.

And that's kind of the strangler pattern that we're going to have: a shared packages folder that I've kind of advocated for. It takes in value objects and spits value objects out. There are no database connections, minimal dependency on frameworks; just make it super easy. Value objects in, value objects out.

And if you've got SDKs you're putting in there, it's all the same thing: value objects in, value objects out. You make life easy on yourself. Contracts everywhere. And then you can put interfaces in there too and have implementations everywhere. I think that's going to be a real challenge for me, implementing

code fixtures and rules, because now we've got kind of two sets of standards. We've got Laravel, which we're very focused on keeping at a very high level of code quality, very much following the framework's intentions and patterns. And, you know, we've just got really high standards for that new stuff coming in. And then we've got the older PHP application, which is

governed by a totally different set of rules, because it's its own thing, it's old PHP, there's no types for anything, and there's not really many tests either. So there's a totally different set of rules and a totally different set of guidance the LLM is going to need working in there versus in Laravel.

And I don't necessarily want to say like go into the Laravel directory and, you know, launch Claude from there when you're working on the Laravel stuff, because what if you do have cross-cutting concerns or you do need to reference back to the old PHP application to like at least grok the logic of something that was done to then re-implement something logically similar in the Laravel application, just using like newer coding standards and styles, techniques, features.

So yeah, that's a thing that I'm going to be facing as well. So I feel you there.

Chris Gmyr (30:34)
Yeah, and that's a little bit of something that we've been talking about too. And something that I think I'm going to try in a couple repos, especially some of our older ones, is setting up, instead of just the root CLAUDE.md file, sub CLAUDE.md files. So in this directory, it's this pattern or this way of doing things. And you can do the same thing for rules, as long as you have different directory patterns as well.

TJ Miller (30:50)
Yeah.

Chris Gmyr (31:01)
So in this directory, you're using the old, vanilla PHP way of doing things. But in this new directory, in the Laravel directory, it's the Laravel way of doing things. And then in the rules, you can call out specifically the different skills and other tools that you have, like, in this directory, the rule is always use the Laravel best practices skill, type of thing. You know,

TJ Miller (31:24)
Mm-hmm.

Chris Gmyr (31:26)
some things like that. It's not as automated as just running Claude in it. But I'm sure there's some other ways that you can spread out that information and have the LLM do it for you. So yeah, that's something that we were trying to do that's easy to share across the team, but not as in-depth or intrusive as doing RAG or vectorizing the code base and

handling either a shared database or individual local databases to run that, and keeping that updated, and the maintenance of that. So it's something that we've been talking about doing for some of the older apps: just setting up some basic CLAUDE.md files and rules to help split that up instead of just a generic one.

TJ Miller (32:13)
Yeah, so that's kind of the strategy I was looking at: definitely some nested CLAUDE.md files. And what I do, and have been doing for a long time, is codify all of our standards, guidelines, architecture rules; I built skills for all of that. And then basically, I need to sit down, and what I've been

ideating on really hard is skills. The problem I have is skill enforcement. Claude, I feel, is pretty terrible about auto-loading skills, and auto-loading the right skills. So I'm trying to think about some processes and some tooling around really forcing it to use skills. I've tried something before; I don't think it's working as well as I would have liked. But

I think that's going to be a little bit of a challenge here: getting that mapping of, all right, if you're working in the Laravel directory, you need to load the Laravel best practices skill, the Laravel development skill, the Vue best practices skill, the Tailwind CSS skill. And then if you're working in the other directory, you've got to explicitly load these other skills, because that's where all of those things are codified.

So I think it's going to be a challenge. I think we can get there, but I think the really big challenge is going to be, I know that I could probably use a system like that. It's making sure everybody else can use a system like that.

And I think I'm so effective with Claude Code because of the way that I use it and the processes that I use alongside it. And so trying to teach and guide that is,

I'm excited to do it. I'm also kind of stressed about it, but I just want to like, I want to make sure everyone's like effective and having a good experience with it. Like that's, that's what's really important to me.

Chris Gmyr (34:17)
Yeah, 100%. That's been a challenge for us too, but little by little, day by day, like anything else.

TJ Miller (34:23)
Yep. Yeah,

for sure, man. So yeah, let's move on to the next thing, which I think is bringing up a topic we've talked about before. You've had this ButcherBox automated Claude stuff. I'm interested to see what you've been cooking on with that.

Chris Gmyr (34:42)
Yeah, so I know I've brought up trimming down the order that we do monthly and stuff like that. So that's been pretty much the same. Having to take screenshots and de-dupe and say, oh, you should keep these items and get rid of these items to keep within a budget and all that stuff. But the next step is, we use the app Plan to Eat

that has all of our recipes. That's where we have our grocery list. That's where we keep the freezer items for ButcherBox in a separate list. So once I receive the order from ButcherBox, I have to basically copy and paste the descriptions of the items, the quantities, the weights, things like that, all into a list in Plan to Eat. And I've done that manually, basically the same way, for years,

until more recently, and we were talking about it a little offline, you recommended what's called PinchTab. That enables me to set up a CLI with PinchTab; it'll actually log in and authorize as me to both ButcherBox and Plan to Eat, so it can access my accounts. And then I have a prompt

that says, OK, this ButcherBox PinchTab profile, you're gathering the data from over here, and then you're moving it to this other Plan to Eat profile over here. So what it does is it basically logs in as me, pulls the most recent order, and dedupes a bunch of things to make sure that the similar steak cuts or whatever, the chicken wings, all that stuff is deduped,

gets the final totals for things. It'll then produce a summary for me that shows the final tallies and calculations and items and things like that. Then once I approve that, it basically logs into Plan to Eat and it'll put everything in the list for me and submit that, so that it's automatically added, and then we can build recipes and the meal plan from there. So yeah.

TJ Miller (36:47)
That's sick. That's so sick. That is

so sick, dude.

Chris Gmyr (36:51)
So it worked pretty well. And I have another order coming up pretty soon. So I'll see if it's a lot faster this time. But I think once I get it solidified, it should only take maybe like five minutes to run through and validate compared to like me doing all the manual work. And not that it took me like too long, but it's just like another one of those. Like, this seems like a very dumb like manual task that like I can just offload somewhere else.

And even though I'm still like kind of guiding it and need to approve like the finalized list before it goes into plan to eat. I'd rather do that when I'm, you know, reviewing emails or doing something else like while this thing runs in the background and just let me know when you're done and I can check it out and go from there. So it just speeds up like the entire process altogether. So, yeah, probably we can add a little bit like as we go. But first run or two was.

Really good. And it's cool to just see it running. And it opens up Chrome, and you can see everything it's typing into. And then it's verifying everything on the next page. And just super cool to see it run and just me not having to copy and paste that stuff anymore.

TJ Miller (37:45)
Yeah.

That's so sick, dude. I love it. That's just, that's rad. You hear so much about AI being a bubble, and I definitely believe it is, for sure. And you get all these people talking about how it's just not living up to its expectations. But then you hear super cool stuff like this where it's like, no, this is a super legit, very useful thing. You know, so cool.

Chris Gmyr (38:30)
Yeah, yeah, 100%. So yeah, hopefully I'll be able to connect these things together a little bit more. But because there's so much waiting involved, I have to wait for just the pre-order process to check things. And then I have to wait for the order to actually arrive to make sure that I got everything in the box that was in the order. Maybe eventually I'll put some more automation in it. But just being able to open up Claude Code and run a skill

to process ButcherBox into Plan to Eat, type of thing, and just let it handle it, is more than enough for right now. So yeah, just a fun little project to see if I could do it and see if it'll work out. So far so good.

TJ Miller (38:58)
Mm-hmm.

That's so neat, man. I love it. Cool, dude. You want to talk some databases?

Chris Gmyr (39:18)
Yeah, let's talk some databases. This is another really quick one, and it kind of goes back to our theme of why utilize a service or a dependency package, what have you, if you can just build what you need yourself. So one of my clients' sites, I had them using Algolia for searching.

So they have like a whole bunch of resources. It's a Laravel app with Nova and a lot of the resources you could either search like individually or like across sites. So it kind of combined all the resources into a table in Algolia and then you can search across those and it would do all the things that Algolia does really well. And not that it was like super expensive, but like 30 bucks a month for a simple search.

I'm like, ah, this is kind of getting old, and it's something that I can save on my side because I charge them for hosting and things like that. So it's 30 bucks that I can save and keep in my pocket instead of paying for Algolia. I'm like, hmm, I wonder if I could do a little bit of work with Claude to move this over to MySQL and get it close enough. And they don't have a ton of information,

but I wanted it to be fast. I still wanted search logging and a few other things that Algolia does really well. So I basically just wrote up a plan of, here's the things that we utilize in Algolia. Can we actually move this over to MySQL, which we're using on the site anyway, and not have too much overhead? And keep in mind we're using Algolia components for the Vue components for searching and highlighting and things like that. So it basically went out and did a bunch of research, pulled it down.

It made the whole plan of, oh yeah, we could totally do this with some fancy searches in MySQL. I don't remember the exact query types that we were doing, but yeah, it just blasted the plan out pretty quickly. And then it's like, OK, go do it. And 15 minutes later, it's like, oh yeah, you don't need Algolia anymore. And here it is. Just run this artisan command to backfill the search index, and then you're good to go.

set this up on deploy, it'll refresh the indexes and do a few other things like that. And yeah, done. Boom. So just all those little things of like, do we actually need this service or need this dependency if I'm only using this little fraction of it? Just go out and do it, try it, and see what happens. And it's been super solid and haven't missed Algolia. Sorry, Algolia. Loved you for a long time.

I don't have to pay 30 bucks a month for basic search anymore.
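For context, the "fancy searches in MySQL" are likely in the neighborhood of Laravel's built-in full-text support, which compiles down to MATCH ... AGAINST; the table and columns here are assumptions, since the client's actual schema wasn't specified:

```php
<?php

use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Schema;

// Migration side: add a FULLTEXT index over the searchable columns.
Schema::table('search_entries', function (Blueprint $table) {
    $table->fullText(['title', 'body']);
});

// Query side: whereFullText issues a MySQL MATCH ... AGAINST query.
$results = DB::table('search_entries')
    ->whereFullText(['title', 'body'], $term)
    ->limit(25)
    ->get();
```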

TJ Miller (41:54)
No,

I think it's such a good... like, you nailed it. You're paying for the kitchen sink when you're only using a small portion of that functionality, and you're doing something very basic. That's the perfect opportunity to, yeah, just drop the service and do it yourself. You're not taking full advantage of what they have to offer, but you're paying the price for everything.

No, it's great, great justification and I'm stoked that worked out. That's cool.

Chris Gmyr (42:24)
Yeah, yeah, next up, yeah, you wanted to talk about some Iris updates.

TJ Miller (42:29)
Yeah. So I haven't

really done much with Iris lately. I haven't even, I think it's been a month or maybe a month and a half since I've even talked to my Iris instance. But I have been basically so inspired by your implementation of your OpenClaw with Obsidian

that I want to do it too. Like, there's some version of it, right? There's definitely things that I'd like to do. I'd like to start keeping track of my meeting.

I started using Krisp, which was your recommendation, to record meetings. I want to start putting that in some longer-term searchable storage, especially extracting action items. As I've been iterating on stuff, I find that really useful. Definitely with the mental health, I'm finding myself wanting to journal a little bit more. So I'd love a place for that. And I'd love to be able to have access to all that stuff. And, I don't know, I just think there's

a lot of opportunity for that. And I don't want to use OpenClaw. I mean, I have Iris, right? But dealing with a lot of these potentially one-off requests, like, go query for something, I'm trying to figure out what that meeting was I was in last Wednesday, somebody mentioned something that I needed to do and I forgot what it is. I don't necessarily want that polluting the context of, maybe I'm having a really

in-depth conversation about my emotional state or something with Iris, and I really don't want to break that up with this dumb query request and throw all the context off, mess with the summarizations. So, as much as I wanted that single-threaded experience, using it more utilitarian-ly, if that's a word,

Using it more that way, I really wanted to have more isolation in the context. So I'm introducing threads, like the concept of threads into Iris so that you can make like a single thread, yeah, ask that question and have it go off and do things. And so like each thread will have its own conversation summarization process. Memories and truths will be cross-thread because those are just like, that's universal information.
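Iris is TJ's private app, so this is purely a hypothetical shape, but the data-model change he's describing amounts to something like: conversations and summaries hang off a thread, while memories and truths stay global:

```php
<?php

use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

// Hypothetical migration sketch, not Iris's real schema.
Schema::create('threads', function (Blueprint $table) {
    $table->id();
    $table->string('title')->nullable();
    $table->timestamp('last_summarized_at')->nullable(); // per-thread summarization
    $table->timestamps();
});

Schema::table('messages', function (Blueprint $table) {
    $table->foreignId('thread_id')->constrained()->cascadeOnDelete();
});

// memories / truths tables deliberately untouched: they remain cross-thread.
```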

Chris Gmyr (44:43)
Mm-hmm.

TJ Miller (44:43)
And yeah, I tried some new tooling for once. I knew this was going to be a pretty big feature to implement, because the entire app was built off the premise of a single-threaded conversation.

So there was a lot that needed to change to introduce threads, and some edge cases to account for, and a feature that I kind of wanted but needed threads for. So I knew this was going to be kind of big and sweeping. And so I didn't want to just interact with Claude Code and do my normal thing with Claude Code. I wanted a process that was a little bit more handheld, no, less handheld,

where we just chunk through this thing and at least get to a point where I can start doing code review and iterating on it. Matthias had created a Ralph loop project called Chief. I had used it once before, so I was kind of familiar with it. I know there's a couple other ones around right now, like Polyscope, and I don't know if I can mention the other one, but there's another one that one of our friends has built

that I want to play with too. But I liked Chief, and we're in the era of customizing your own software. So there were some things that I really felt were important that I wanted to modify about Chief, in order to have it more closely match what I want as the end result. So I forked Chief. And then what I did is I had Claude go out and do deep research on what makes a good PRD, a product requirements doc.

Go out and research, deep research that. Come back to me with, here's all the information you need on what makes a great PRD. And then, Chief is user-story based, right? So Chief will help you build the PRD and break it up into user stories. And that's how it works: the automated Ralph loop goes off and sets up a fresh Claude context, or OpenCode, or whatever driver you want.

It'll set up a fresh context with that specific story as its task. So I also had Claude go out and do deep research on what makes a high-quality user story. It came back with all this information, I read it, and then I also wanted to build into that prompting not only what is a good PRD, but a PRD template that goes along with that, and for user stories, a user story template that goes along with that.

And then I wanted it to do an iteration of:

Let me tell you what the feature or thing I want to build. Give me an interview about it, which is already kind of part of the process. And then it would like create the PRD. I'm like, no, then after that, I want you to like interview me. I want you to go do research, like go research the code base, go look at existing standards or things that already like affordances that already exist, like go off and do research and then come back and interview me some more. And then generate the PRD. So I also now have a

different PRD template than what Chief is using. So I made those changes. Then I also, because we talked about how I codify coding guidelines and everything in skills, added to Chief a special round of analyzing all the available skills and then reinforcing: go load these skills before you do the work.

I don't know how accurate the skill-loading portion was, but the PRD interview process was very thorough. The research came back with, well, did you think about these edge cases, or, this is the way the existing code base is, how do you want to approach making these changes? So I thought it was a really cool experience through the interview and research stuff after I modified that. And then I set it off.

I haven't hit my daily Claude limit in a long time. And it got about halfway through the list of stories and it was like, you're locked out for three hours. I'm like, oh shit. So we did that. And then after it did its initial iteration, things kind of looked fine. But then I had Claude load all the skills and do a thorough code review of everything. Basically went back to Chief and then said, here's the code review,

honestly not looking great. And then it spit out a bunch of new stories to go back and.

make all the fixes and changes and everything. So this is really my first experience Ralph looping everything. And the outcome is pretty much what I expected, which was that functionally it was pretty close to what I wanted, but the quality, the code quality, wasn't really where I wanted it.

And so that was kind of my assumption about Ralph looping and getting into some of these more autonomous workflows: the loss of quality, which is where I was really trying to get the skills to do something. So I took all of that information, the original PRD, the progress, the code review that happened, and I went back to the Chief repo and I'm like, look, Claude,

you can look at the PRD, the initial iteration of stories seemed fine, but we got through it, and then look at this code review. It had all of these huge gaps and architecture issues and bugs; this was not a great code review. Let's go back and address the prompting or add any additional steps. I actually think I had to add a

review stage so like once it's done with a story it then kicks off a code review.

And so I'm trying to add a few little things in there to like do some cleaning along the way and some like code review in process instead of like this big batch thing at the end. But definitely I'm interested in exploring Ralph loops more, but I'm staying very skeptical because yeah, it.

It functionally mostly worked, but I found bugs immediately. The code review found a ton of issues. So quality is just like, that's the thing I'm looking to nail down a bit better. But I've got a couple other features I want to add to Iris as well now with all of this. So I'm going to keep iterating on.

maybe siccing some Ralph loops on some of these bigger sweeping changes and everything. We'll see.

Chris Gmyr (51:07)
Yeah, I've seen a lot about it, but I haven't tried it myself. But maybe it's time to try and see where it goes. Could open up the next round of automation for me.

TJ Miller (51:18)
Yeah, I mean, I don't mind my current workflow. I just knew that this was going to be a big task, I was going to be battling context, and I knew it was going to take a while to get through. And I didn't want to babysit the process. So I'm like, all right, let's just try a Ralph loop. But it kind of turned out the way I expected, which is why I haven't really used them. But a friend of ours is working on one that looks

very promising. I know he's built some quality control stuff into the process, at its core, which has me very excited about that. And then I want to test my new updates with Chief as well, because we did make a fairly significant amount of updates to the process and things, to try and get better quality control out the other end. And I like that tool too.

It's a really slick TUI, very clean. And I like the process of breaking things down into stories. And it was just built into the tool, generating the PRDs and everything I thought was great. So I'm going to continue to explore and iterate on that a bit. yeah, jury's out on Ralph Loops.

We'll see. If I can get the quality there, maybe I'll be happier. It did do the thing that I reached for it for. It did take the big feature and get it 75% of the way there. So yeah, threads are coming. Very excited about that. And then once threads are done, I'll start iterating on the Obsidian stuff again.

Chris Gmyr (52:48)
Yeah, awesome. Well, yeah, keep us posted. It all sounds awesome, all the updates, and yeah, threads will be a really good quality-of-life improvement for Iris.

TJ Miller (52:50)
Yeah.

Almost everybody that's interacted with Iris is like, why no threads? Like, where's threads? And I'm like, no, like that's kind of like part of the experiment is like a single threaded conversation. And then when I get to start really utilizing it as a utility instead of like just this like.

companion kind of thing, I don't know. The more I want to use it as a utility, the more I'm like, yeah, threads are definitely the way to go. And then the cool thing too is, instead of being interrupted in your conversation by proactive messages, proactive messaging will now just spawn a new thread and mark it as unread. So now you're in a fresh context, you can talk about, you can have a conversation about that proactive message.

But at least it's not gonna be like interrupting your, you know, really in-depth conversation about something with like, hey, don't forget to feed your dog or whatever.

Chris Gmyr (53:52)
Nice. Well, that's awesome. Sweet. Well, yeah, keep us posted on that. And definitely interested to circle back on the RalphLoop stuff when it gets more updates over there too.

TJ Miller (53:54)
Yeah, man.

Yeah, for sure, man. So on that note, you want to wrap up? All right. Thank you all so much for listening to the Slightly Caffeinated podcast. Show notes, including all the links and social channels, are down below and are also available at slightlycaffeinated.fm. If you have questions for us or any content suggestions, go to the Ask a Question page on our site and we'll feature it on an upcoming episode. Thank you all so much for listening. We'll catch you next week.

Chris Gmyr (54:05)
Yeah, let's wrap up.

Creators and Guests

Chris Gmyr (Host)
Husband, dad, & grilling aficionado. Loves Laravel & coffee. Staff Engineer @ Rula | TrianglePHP Co-Organizer

TJ Miller (Host)
Dreamer ⋅ ADHD advocate ⋅ Laravel astronaut ⋅ Building Prism ⋅ Principal at Geocodio ⋅ Thoughts are mine!