Update app.py
app.py
CHANGED
@@ -38,6 +38,11 @@ with gr.Blocks() as demo:
         with gr.Column():
             gr.Markdown(
                 """<h1>The Optimal Vocabulary Size Predictor</h1>
+
+                This repo is the official demo space for [Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies](https://huggingface.co/papers/2407.13623). In summary, we show that when scaling up model size, the vocabulary size should increase too, but at a slower rate than the other parameters.
+
+
+
                 This tool is used to predict the optimal vocabulary size given the non-vocabulary parameters. We provide 3 ways for prediction:

                 - **Approach 1: Build the relationship between studied attributes and FLOPs**: Build the relationship between the optimal data points (the points that reach the lowest loss under the same FLOPs budget) and the FLOPs.
@@ -51,7 +56,7 @@ with gr.Blocks() as demo:


     with gr.Row():
-        Nnv = gr.Textbox(label="Non-vocabulary Parameters", value=str(
+        Nnv = gr.Textbox(label="Non-vocabulary Parameters", value=str(7e9))
         flops = gr.Textbox(label="FLOPs", placeholder="Optional (e.g. 7.05e21)")
         output_text = gr.Textbox(label="Prediction")
     with gr.Row():