Vibe coding for data science: how to label a dataset with Kimi K2
This post explains how to use Kimi K2 to quickly analyze the categories in a dataset, with no code, using the Hugging Face Hub. It's like vibe coding, but for NLP and data science tasks.
Let's get to it!
(Image Credits: Original design by Chun Te Lee transformed with FLUX.1 Kontext dev)
What do you need to get started?
- Log in or sign up to Hugging Face.
- Find a dataset you'd like to analyze. A good place to find valuable datasets is the datasets page.
- Go to the AISheets app and keep reading!
What's AISheets?
AISheets is a simple tool to transform, analyze, and expand datasets with the help of thousands of open AI models.
How to label the dataset
- Import the dataset into AISheets
- Add a column and write a simple prompt like "Categorize the following text: {{column you want to categorize}}". The idea is to start simple; there's no need for complex prompt engineering. You can reference as many columns as you want to give the model as much context as possible.
- Run the prompt and watch the cells fill up. We recommend stopping the generation after a few rows and inspecting the results. They will likely be imperfect, but this is where vibe coding kicks in!
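To make the `{{column}}` placeholder idea concrete, here is a minimal Python sketch of how a prompt template could be filled in for each row. The `render_prompt` function and the sample row are hypothetical and only illustrate the concept; AISheets handles this for you, and its actual implementation may differ.

```python
import re

# Matches placeholders such as {{question}} or {{ question }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, row: dict) -> str:
    """Replace each {{column}} placeholder with that column's value in the row."""

    def substitute(match):
        column = match.group(1)
        if column not in row:
            raise KeyError(f"Row has no column named {column!r}")
        return str(row[column])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical row with a single "question" column.
template = "Categorize the following text: {{question}}"
row = {"question": "What is the Work-Energy Theorem?"}
print(render_prompt(template, row))
```

Referencing more columns simply means adding more placeholders to the template, which is how extra context reaches the model.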
How to improve the labels with vibe data science
Now you have a few rows labeled purely by AI with an (intentionally) simple prompt. From here, AISheets gives you several options:
- Are some of the labels what you were looking for? If so, tell the model for the next round by using the thumbs-up button on those cells. Even one or two validated cells are very effective at steering the model in the next round. Another option is to manually edit a few cells to adjust them.
- If none of the labels are what you expect, open the column's configuration and tune the prompt by adding more detail about the labels, the format, etc. Re-run the generation and stop it after a few rows. If the labels improve, go back to the first option and validate the good ones.
Once you are satisfied with the validated cells, it's time to label the whole dataset. Click the regenerate button in the column and watch the magic happen!
Here's a video showing the complete process with Kimi K2:
What's next?
This mini tutorial only scratches the surface of what's possible today with AI models for data work. Try the AISheets app and share other use cases and ideas you'd like to explore. We're just getting started and would love to help!
PS: What's going on under the hood
AISheets uses your feedback as few-shot examples, and if you enable search, it also injects chunks of the search results into the context. Think of it as your context engineering companion.
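As a rough illustration of the few-shot idea, the Python sketch below assembles a prompt from an instruction plus a list of validated (input, output) pairs. The section headings follow the layout of the config in this post, but `build_prompt` is a hypothetical helper and the preamble is shortened; this is not AISheets' actual code.

```python
# Shortened version of the system preamble; the wording is taken from the config in this post.
PREAMBLE = (
    "You are a rigorous, intelligent data-processing engine. Generate only the "
    "requested response format, with no explanations following the user instruction."
)


def build_prompt(instruction: str, validated_examples: list) -> str:
    """Assemble a prompt from an instruction and validated (input, output) pairs."""
    sections = [PREAMBLE]
    if validated_examples:
        sections.append("# Examples")
        for example_input, example_output in validated_examples:
            sections.append(
                f"## Example\n### Input\n{example_input}\n### Output\n{example_output}"
            )
    sections.append(f"# User instruction\n{instruction}")
    sections.append("# Your response")
    return "\n\n".join(sections)


# One thumbs-up cell becomes one few-shot example (the question text here is made up).
examples = [("question: What is the Work-Energy Theorem?", "Physics – Mechanics – Energy & Work")]
print(build_prompt("categorize the question:\n{{question}}", examples))
```

Each cell you validate or edit adds one more example block, which is why a handful of thumbs-ups can shift the labels noticeably in the next round.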
Here's the config created under the hood after some iterations:
columns:
  topic:
    modelName: moonshotai/Kimi-K2-Instruct
    modelProvider: groq
    userPrompt: |-
      categorize the question:
      {{question}}
    prompt: "
      You are a rigorous, intelligent data-processing engine. Generate only the
      requested response format, with no explanations following the user
      instruction. You might be provided with positive, accurate examples of how
      the user instruction must be completed.
      # Examples
      The following are correct, accurate example outputs with respect to the
      user instruction:
      ## Example
      ### Input
      question: What is the total work done on an object when it is moved
      upwards against gravity, considering both the change in kinetic energy and
      potential energy? Use the Work-Energy Theorem and the principle of
      conservation of mechanical energy to derive your answer.
      ### Output
      Physics – Mechanics – Energy & Work
      ## Example
      ### Input
      question: Two equal masses, each with a mass similar to that of the sun,
      are separated by a distance of 1 light-year and are devoid of all outside
      forces. They accelerate towards each other due to gravity. As they
      approach each other, their mass increases due to relativistic effects,
      which in turn increases the gravitational force between them. However, as
      they approach the speed of light, their acceleration decreases. What is
      the correct description of their motion, and how do their velocities and
      gravitational forces change as they approach each other? Provide a
      detailed analysis of the problem, including any relevant equations and
      calculations.
      ### Output
      Physics – Relativistic Two-Body Gravitation
      ## Example
      ### Input
      question: What is the minimum number of red squares required to ensure
      that each of $n$ green axis-parallel squares intersects 4 red squares,
      assuming the green squares can be scaled and translated arbitrarily
      without intersecting each other?
      ### Output
      Combinatorial Geometry – Tiling / Packing / Covering
      # User instruction
      categorize the question:
      {{question}}
      # Your response
      \ "
    searchEnabled: false
    columnsReferences:
      - question
And here's the resulting dataset with the config: https://huggingface.co/datasets/dvilasuero/facebook_natural_reasoning_categorized
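Once a column is labeled, a natural next step is to look at how the categories are distributed. The Python sketch below rolls labels up to their top-level category and counts them. It assumes labels use an en dash as the separator, as in the validated examples from the config; real model output may be less consistent, and `top_level_category` is a hypothetical helper.

```python
from collections import Counter


def top_level_category(label: str) -> str:
    """Return the text before the first en-dash separator, e.g. 'Physics'."""
    return label.split("–", 1)[0].strip()


# The three validated labels from the config in this post.
labels = [
    "Physics – Mechanics – Energy & Work",
    "Physics – Relativistic Two-Body Gravitation",
    "Combinatorial Geometry – Tiling / Packing / Covering",
]

counts = Counter(top_level_category(label) for label in labels)
print(counts.most_common())
# [('Physics', 2), ('Combinatorial Geometry', 1)]
```

If the counts show many near-duplicate categories, that is a hint to go back to the column prompt and spell out the list of allowed labels.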