HoneyHive is a modern AI observability and evaluation platform that enables developers and domain experts to collaboratively build reliable AI applications faster.
Tool to add datapoints to a dataset. Use when you need to append multiple entries with specified input, ground truth, and history mappings.
Tool to retrieve an experiment comparison between two evaluation runs. Use when you need to analyze the differences in metrics, datapoints, and events between two runs.
Tool to compare events between two experiment runs side-by-side. Use when analyzing differences in model behavior, performance metrics, or outputs between evaluation runs. Returns matched event pairs with their respective data from both runs for comparison.
Tool to create multiple datapoints in a single batch operation. Use when you need to bulk-import events into a dataset or create many datapoints at once. Supports filtering by date range, event IDs, or custom criteria. Efficient for migrating large numbers of events to evaluation datasets.
Tool to create multiple model events in a single request. Use when you need to log a batch of event interactions to HoneyHive.
Tool to log a batch of external API calls as tool events. Use when you need to record multiple tool events in one request; call it after gathering all event data.
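
The batch tools above share a "gather first, send once" pattern: collect every event locally, then submit them in one request. The sketch below illustrates only that pattern. The class names (`ToolEvent`, `ToolEventBatch`) and every field name (`event_name`, `inputs`, `outputs`, `duration_ms`, and the top-level `events` key) are hypothetical placeholders, not HoneyHive's actual schema, and no network call is made.

```python
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List


@dataclass
class ToolEvent:
    # Hypothetical fields for illustration; the real event schema may differ.
    event_name: str
    inputs: Dict[str, Any]
    outputs: Dict[str, Any]
    duration_ms: float


@dataclass
class ToolEventBatch:
    # Accumulates events locally so they can be sent in a single request.
    events: List[ToolEvent] = field(default_factory=list)

    def add(self, event: ToolEvent) -> None:
        self.events.append(event)

    def to_payload(self) -> Dict[str, Any]:
        # Assumed request shape: one object holding a list of event dicts.
        return {"events": [asdict(e) for e in self.events]}


# Gather all event data first, then build one payload for one request.
batch = ToolEventBatch()
batch.add(ToolEvent("geocode", {"address": "1 Main St"}, {"lat": 40.0, "lon": -74.0}, 120.5))
batch.add(ToolEvent("weather", {"lat": 40.0, "lon": -74.0}, {"temp_c": 21}, 85.0))
payload = batch.to_payload()
```

Building the payload only after all events are collected keeps the batch complete and avoids one request per event; the resulting dictionary would then be passed to whichever batch tool applies.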