Vibe coding for data science: how to label a dataset with Kimi K2

Community Article Published July 22, 2025

This post explains how to use Kimi K2 to quickly analyze the categories in a dataset, with no code, using only the Hugging Face Hub. It's like vibe coding, but for NLP and data science tasks.

Let's get to it!

Image credits: original design by Chun Te Lee, transformed with FLUX.1 Kontext dev

What do you need to get started?

  1. Log in or sign up to Hugging Face.
  2. Find a dataset you'd like to analyze. A good place to find valuable datasets is the datasets page.
  3. Go to the AISheets app and keep reading!

What's AISheets?

AISheets is a simple tool to transform, analyze, and expand datasets with the help of thousands of open AI models.

How to label the dataset

  1. Import the dataset into AISheets

  2. Add a column and write a simple prompt like "Categorize the following text: {{column you want to categorize}}". The idea is to start simple; there's no need for complex prompt engineering. You can reference as many columns as you want to give the model as much context as possible.

  3. Run the prompt and watch the cells fill up. We recommend stopping the generation early to inspect the results. They will likely be imperfect, but that's when vibe coding kicks in!
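
To make the `{{column}}` placeholder idea concrete, here is a minimal Python sketch of how a prompt template could be filled in for each row. It is only an illustration of the concept, not AISheets' actual implementation: the `render_prompt` helper, the regex, and the sample rows are all hypothetical.

```python
import re

# Matches {{column_name}} placeholders, tolerating spaces inside the braces.
# Assumption: column names contain only letters, digits, and underscores.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, row: dict) -> str:
    """Replace every {{column}} placeholder with that column's value in `row`."""

    def substitute(match):
        column = match.group(1)
        if column not in row:
            raise KeyError(f"row has no column named {column!r}")
        return str(row[column])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical rows standing in for an imported dataset.
template = "Categorize the following text: {{question}}"
rows = [
    {"question": "What is the work done lifting a 2 kg mass by 3 m?"},
    {"question": "How many edges does a complete graph on 5 vertices have?"},
]

# One rendered prompt per row, ready to be sent to a model.
prompts = [render_prompt(template, row) for row in rows]
print(prompts[0])
```

Referencing several columns works the same way: each placeholder is looked up in the row independently, and a placeholder with no matching column raises an error instead of being silently left in the prompt.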

How to improve the labels with vibe data science

You now have a few rows labeled purely by AI with an (intentionally) simple prompt. From here, AISheets gives you several options:

  1. Are some of the labels what you were looking for? If so, tell the model by clicking the thumbs-up button on those cells. Even one or two validated cells are very effective at steering the model in the next round. You can also manually edit a few cells to adjust them.

  2. If none of the labels are what you expect, open the column's configuration and tune the prompt by adding more detail about the labels, the format, etc. Re-run the generation and stop it early. If the results improve, go back to step 1.

Once you are satisfied with the validated cells, it's time to label the whole dataset. Click the regenerate button in the column and watch the magic happen!

Here's a video showing the complete process with Kimi K2:

What's next?

This mini tutorial only scratches the surface of what's possible today with AI models for data work. Try the AISheets app and share other use cases and ideas you'd like to explore. We're just getting started and would love to help!

PS: What's going on under the hood

AISheets uses your feedback as few-shot examples, and if you enable search, it also injects chunks of search results into the context. Think of it as your context engineering companion.

Here's the config created under the hood after some iterations:

columns:
  topic:
    modelName: moonshotai/Kimi-K2-Instruct
    modelProvider: groq
    userPrompt: |-
      categorize the question:

      {{question}}
    prompt: "

      You are a rigorous, intelligent data-processing engine. Generate only the
      requested response format, with no explanations following the user
      instruction. You might be provided with positive, accurate examples of how
      the user instruction must be completed.


      # Examples

      The following are correct, accurate example outputs with respect to the
      user instruction:


      ## Example

      ### Input

      question: What is the total work done on an object when it is moved
      upwards against gravity, considering both the change in kinetic energy and
      potential energy? Use the Work-Energy Theorem and the principle of
      conservation of mechanical energy to derive your answer.

      ### Output

      Physics – Mechanics – Energy & Work

      ## Example

      ### Input

      question: Two equal masses, each with a mass similar to that of the sun,
      are separated by a distance of 1 light-year and are devoid of all outside
      forces. They accelerate towards each other due to gravity. As they
      approach each other, their mass increases due to relativistic effects,
      which in turn increases the gravitational force between them. However, as
      they approach the speed of light, their acceleration decreases. What is
      the correct description of their motion, and how do their velocities and
      gravitational forces change as they approach each other? Provide a
      detailed analysis of the problem, including any relevant equations and
      calculations.

      ### Output

      Physics – Relativistic Two-Body Gravitation

      ## Example

      ### Input

      question: What is the minimum number of red squares required to ensure
      that each of $n$ green axis-parallel squares intersects 4 red squares,
      assuming the green squares can be scaled and translated arbitrarily
      without intersecting each other?

      ### Output

      Combinatorial Geometry – Tiling / Packing / Covering



      # User instruction

      categorize the question:


      {{question}}




      # Your response

      \    "
    searchEnabled: false
    columnsReferences:
      - question
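
The prompt in the config above follows a recognizable layout: a fixed system introduction, one "Example" block per validated cell, then the user instruction. The Python sketch below shows one way such a prompt could be assembled from thumbs-up cells. It is an approximation under assumptions: `build_prompt` is a hypothetical helper, the exact whitespace differs from the real prompt, and AISheets' actual code may work differently.

```python
from __future__ import annotations

# System introduction copied from the config shown above.
SYSTEM_INTRO = (
    "You are a rigorous, intelligent data-processing engine. Generate only the "
    "requested response format, with no explanations following the user "
    "instruction. You might be provided with positive, accurate examples of how "
    "the user instruction must be completed."
)

EXAMPLES_INTRO = (
    "The following are correct, accurate example outputs with respect to the "
    "user instruction:"
)


def build_prompt(instruction: str, examples: list[tuple[str, str]]) -> str:
    """Assemble a few-shot prompt from (input, validated output) pairs."""
    parts = [SYSTEM_INTRO, ""]
    if examples:
        parts += ["# Examples", EXAMPLES_INTRO, ""]
        for input_text, output in examples:
            # Each validated cell becomes one Input/Output example block.
            parts += ["## Example", "### Input", input_text, "### Output", output, ""]
    parts += ["# User instruction", instruction, "", "# Your response", ""]
    return "\n".join(parts)


# One validated cell, taken from the examples in the config above.
validated = [
    (
        "question: What is the minimum number of red squares required to ensure "
        "that each of $n$ green axis-parallel squares intersects 4 red squares, "
        "assuming the green squares can be scaled and translated arbitrarily "
        "without intersecting each other?",
        "Combinatorial Geometry – Tiling / Packing / Covering",
    )
]

prompt = build_prompt("categorize the question:\n\n{{question}}", validated)
print(prompt)
```

With an empty list of examples, the "# Examples" section is omitted entirely, which corresponds to the very first run, before any cell has been validated.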

And here's the resulting dataset with the config: https://huggingface.co/datasets/dvilasuero/facebook_natural_reasoning_categorized
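
Once the labels exist, a quick way to analyze them is to count the top-level categories. The sketch below assumes labels use " – " as a hierarchy separator, as in the examples above (for instance "Physics – Mechanics – Energy & Work"); real model output may be less consistent. The helper names are hypothetical.

```python
from collections import Counter


def top_level(label: str) -> str:
    """Return the part of a label before the first en dash separator."""
    return label.split("–", 1)[0].strip()


def count_top_level(labels) -> Counter:
    """Count how often each top-level category appears, skipping empty labels."""
    return Counter(top_level(label) for label in labels if label and label.strip())


# Labels taken from the validated examples in the config above.
labels = [
    "Physics – Mechanics – Energy & Work",
    "Physics – Relativistic Two-Body Gravitation",
    "Combinatorial Geometry – Tiling / Packing / Covering",
]

counts = count_top_level(labels)
print(counts.most_common())
# → [('Physics', 2), ('Combinatorial Geometry', 1)]
```

To run this on the published dataset, the labels could be loaded with the `datasets` library, for example `load_dataset("dvilasuero/facebook_natural_reasoning_categorized", split="train")["topic"]`. The `train` split name is an assumption, and the `topic` column name is taken from the config above.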

Community

Can we add labels to images?
Like I have 20k images of pdf pages and I want to label tables and images in it.

We can. I tried creating "an image dataset for finetuning" (entered the same as a prompt), and it gave me a small dataset to get started with. This was the result. Very useful I must say.

Edit: Translation is not the best, might have to work on it.
