Goblin - Build it all, or Build With Us
Project Concept
An advanced scripting engine for AI that surgically targets the pain points of data-informed decision-making.
You can build it all, or build just the parts that matter and operate with us. goblin makes it easy to run what we call plans: graph models of scripts that goblin executes as connected, multi-step workflows.
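To illustrate the idea of a plan, here is a minimal Python sketch. It assumes a plan is a directed acyclic graph whose nodes are scripts and whose edges are data dependencies. The names `run_plan`, `steps`, and `deps` are hypothetical and do not reflect goblin's actual interface. The sketch uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) so that each script runs only after its inputs are ready.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

# Hypothetical model of a plan: each step is a script (a callable) that receives
# the outputs of the steps that already ran. This is not goblin's real API.
Step = Callable[[Dict[str, object]], object]


def run_plan(steps: Dict[str, Step], deps: Dict[str, Set[str]]) -> Dict[str, object]:
    """Run every step after all of its dependencies, collecting outputs by name."""
    # Every step becomes a node; steps without an entry in deps have no dependencies.
    graph = {name: deps.get(name, set()) for name in steps}
    results: Dict[str, object] = {}
    # static_order() yields each node after all of its predecessors and raises
    # graphlib.CycleError if the dependency graph contains a cycle.
    for name in TopologicalSorter(graph).static_order():
        results[name] = steps[name](results)
    return results


# Usage: a toy A/B plan where two model steps share one input step.
# The "sonnet" and "haiku" steps are stand-ins, not real model calls.
steps: Dict[str, Step] = {
    "load": lambda r: ["text a", "text b"],
    "sonnet": lambda r: [f"sonnet:{t}" for t in r["load"]],
    "haiku": lambda r: [f"haiku:{t}" for t in r["load"]],
    "compare": lambda r: len(r["sonnet"]) == len(r["haiku"]),
}
deps = {"sonnet": {"load"}, "haiku": {"load"}, "compare": {"sonnet", "haiku"}}
out = run_plan(steps, deps)
print(out["compare"])  # True
```

Because `sonnet` and `haiku` depend only on `load`, a real scheduler could run them in parallel; this sketch keeps execution sequential for simplicity.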
We used goblin to A/B test Claude Sonnet and Claude Haiku on a real production workload: detecting hate speech in the English-language version of the FrancophonIA hate speech dataset.
Across ~6,000 speech instances, our workflow produced queryable SQL output that identified the better of the two models, while production behavior continued uninterrupted during the analysis.
claude-sonnet-4-20250514: 59.78% accuracy against the dataset labels
claude-3-haiku-20240307: 55.05% accuracy against the dataset labels
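Queryable SQL output of this kind might look like the following minimal sketch, which assumes one row per model prediction stored next to the dataset's gold label. The `predictions` table, its columns, the placeholder model names, and the toy rows are all hypothetical; they are not goblin's output format or the real results above. The sketch uses Python's built-in `sqlite3` module with an in-memory database.

```python
import sqlite3

# Hypothetical schema: one row per (model, instance) prediction alongside the
# dataset's gold label. Table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions (model TEXT, instance_id INTEGER, predicted TEXT, gold TEXT)"
)

rows = [  # toy rows for illustration only, not the real ~6,000-instance results
    ("model-a", 1, "hate", "hate"),
    ("model-a", 2, "not_hate", "hate"),
    ("model-b", 1, "hate", "hate"),
    ("model-b", 2, "hate", "hate"),
]
conn.executemany("INSERT INTO predictions VALUES (?, ?, ?, ?)", rows)

# Accuracy per model: the share of instances where the prediction matches the label.
query = """
SELECT model,
       ROUND(100.0 * AVG(CASE WHEN predicted = gold THEN 1 ELSE 0 END), 2) AS accuracy_pct
FROM predictions
GROUP BY model
ORDER BY accuracy_pct DESC
"""
results = conn.execute(query).fetchall()
for model, accuracy_pct in results:
    print(f"{model}: {accuracy_pct}% accuracy")
```

Keeping raw predictions rather than precomputed scores means the same table can later answer other questions, such as which instances the two models disagree on.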
Using goblin with this workflow, we determined that, for a production use case, switching to the slightly more accurate model (Sonnet) with the same prompt would be over 21,700% more expensive. The workflow engine supports a wide range of custom model comparisons.
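Restated as a cost ratio, with $C_S$ and $C_H$ (our notation, not from the original analysis) denoting the total cost of running the same workload and prompt on Sonnet and Haiku respectively, the cost claim and the accuracy gap from the results above work out as follows:

$$
\frac{C_S - C_H}{C_H} \times 100\% > 21700\%
\quad\Longleftrightarrow\quad
\frac{C_S}{C_H} > 218,
\qquad
\Delta_{\text{acc}} = 59.78\% - 55.05\% = 4.73 \text{ percentage points}
$$

In other words, under this claim Sonnet would cost more than 218 times as much as Haiku for a 4.73-point gain in accuracy.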
Natalie Bridgers @ stream.place says “A product like you are describing would be exceptionally helpful for AI moderation tools present at Streamplace.”
Joel Kaiser @ First Rule says “This scripting engine concept would be exceptionally helpful for our more advanced model fine tuning, we could easily see us using this as a production service immediately.”
Casey O’Malley @ CB Insights says “The tool you’re describing to me would be useful for my job function as a Data Scientist.” (After hearing the original pitch, he even joined the team for the hackathon to help make it happen!)
Entry
Status: Submitted
Last saved: September 07 at 12:49 PM CDT
Team Roster
William Berry, Team Lead
Senior Software Engineer at William S Berry
HB