SivilTaram committed (verified)
Commit 28cb526 · 1 Parent(s): 6e86500

Update app.py

Files changed (1): app.py (+6, -1)
app.py CHANGED
@@ -38,6 +38,11 @@ with gr.Blocks() as demo:
     with gr.Column():
         gr.Markdown(
             """<h1>The Optimal Vocabulary Size Predictor</h1>
+
+            This repo is the official demo space for [Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies](https://huggingface.co/papers/2407.13623). In summary, we show that when scaling up model size, the vocabulary size should also grow, but at a slower rate than the other parameters.
+
+            ![Vocabulary Demo](figure/vocabulary_demo.png)
+
             This tool is used to predict the optimal vocabulary size given the non-vocabulary parameters. We provide 3 ways for prediction:
 
             - **Approach 1: Build the relationship between studied attributes and FLOPs**: Build the relationship between the optimal data points (the points that reach the lowest loss under the same FLOPs budget) and the FLOPs.
@@ -51,7 +56,7 @@
 
 
     with gr.Row():
-        Nnv = gr.Textbox(label="Non-vocabulary Parameters", value=str(7*10**9))
+        Nnv = gr.Textbox(label="Non-vocabulary Parameters", value=str(7e9))
         flops = gr.Textbox(label="FLOPs", placeholder="Optional (e.g. 7.05e21)")
         output_text = gr.Textbox(label="Prediction")
     with gr.Row():
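For context, here is a minimal, self-contained sketch of the Blocks layout this commit edits, runnable with `pip install gradio`. The `predict` handler and the `Predict` button are hypothetical stand-ins for the space's actual prediction code, which lies outside this diff. One side effect of the change worth noting: `str(7e9)` pre-fills the textbox with `7000000000.0` (a float repr), whereas the previous `str(7*10**9)` produced `7000000000`.

```python
# Minimal sketch of the layout touched by this commit.
# Hypothetical: predict() and the Predict button stand in for the space's
# real scaling-law prediction code, which is not shown in this diff.
import gradio as gr


def predict(nnv: str, flops: str) -> str:
    # Placeholder handler: parse the two inputs and echo them back instead
    # of computing the optimal vocabulary size.
    nnv_value = float(nnv)
    flops_value = float(flops) if flops.strip() else None
    return f"Nnv={nnv_value:.3g}, FLOPs={flops_value}"


with gr.Blocks() as demo:
    with gr.Column():
        gr.Markdown("""<h1>The Optimal Vocabulary Size Predictor</h1>""")
    with gr.Row():
        # str(7e9) pre-fills the box with "7000000000.0" (a float repr);
        # the pre-commit str(7 * 10**9) produced "7000000000".
        Nnv = gr.Textbox(label="Non-vocabulary Parameters", value=str(7e9))
        flops = gr.Textbox(label="FLOPs", placeholder="Optional (e.g. 7.05e21)")
        output_text = gr.Textbox(label="Prediction")
    with gr.Row():
        predict_btn = gr.Button("Predict")
    predict_btn.click(predict, inputs=[Nnv, flops], outputs=output_text)

if __name__ == "__main__":
    demo.launch()
```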