Spaces: Running on CPU Upgrade
Commit · ab3ebc8
Parent(s): 3abd747
shorter responses

evaluations/models.py CHANGED (+17 -15)
@@ -10,21 +10,23 @@ system_messages = { "STRICT": """You are a chatbot evaluating github repositorie
     Keep your answers short, and informative.
     Your answer should be a single paragraph.""",
     "PITFALL": """You are a chatbot evaluating github repositories, their python codes and corresponding readme files.
-    You are looking for common pitfalls in the code.
-    [old lines 14-27: the previous, longer pitfall descriptions were removed; their text is not shown in the rendered diff]
+    You are looking for common pitfalls in the code.
+    Keep your answer short and informative.
+    Only report serious flaws. If you don't find any, return an empty string.
+    Answer in a short paragraph, and keep in mind the following common pitfall categories:
+    Pitfall #1: Design flaws with regards to the data collection in the code.
+    Pitfall #2: Dataset shift (e.g. sampling bias, imbalanced populations, imbalanced labels, non-stationary environments).
+    Pitfall #3: Confounders.
+    Pitfall #4: Measurement errors (labelling mistakes, noisy measurements, inappropriate proxies).
+    Pitfall #5: Historical biases in the data used.
+    Pitfall #6: Information leaking between the training and testing data.
+    Pitfall #7: Model-problem mismatch (e.g. over-complicated/simplistic model, computational challenges).
+    Pitfall #8: Overfitting in the code (e.g. high variance, high complexity, low bias).
+    Pitfall #9: Misused metrics in the code (e.g. poor metric selection, poor implementations).
+    Pitfall #10: Black box models in the code (e.g. lack of interpretability, lack of transparency).
+    Pitfall #11: Baseline comparison issues (e.g. if the testing data does not fit the training data).
+    Pitfall #12: Insufficient reporting in the code (e.g. missing hyperparameters, missing evaluation metrics).
+    Pitfall #13: Faulty interpretations of the reported results.""" }
 
 class LocalLLM():
     def __init__(self, model_name):
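The body of LocalLLM is not part of this hunk, so the following is only a minimal sketch, assuming a Hugging Face transformers chat pipeline, of how system_messages["PITFALL"] might be paired with repository text. The evaluate method, the model name, and the trimmed prompt string are illustrative assumptions, not the Space's actual implementation.

# Minimal sketch (assumption): one plausible way a LocalLLM-style wrapper
# could consume the system_messages dict shown in the diff above.
from transformers import pipeline

system_messages = {
    # Prompt shortened here; the full text is in the diff above.
    "PITFALL": "You are a chatbot evaluating github repositories, their python codes and corresponding readme files. ...",
}

class LocalLLM:
    def __init__(self, model_name):
        # A text-generation pipeline with a chat template is one possible backend.
        self.pipe = pipeline("text-generation", model=model_name)

    def evaluate(self, mode, repo_text, max_new_tokens=256):
        # Pair the selected system message with the repository content.
        messages = [
            {"role": "system", "content": system_messages[mode]},
            {"role": "user", "content": repo_text},
        ]
        out = self.pipe(messages, max_new_tokens=max_new_tokens)
        # Chat-format inputs return the conversation; the last message is the reply.
        return out[0]["generated_text"][-1]["content"]

# Hypothetical usage:
# llm = LocalLLM("HuggingFaceTB/SmolLM2-1.7B-Instruct")
# report = llm.evaluate("PITFALL", open("some_repo/train.py").read())

Because the updated prompt asks the model to return an empty string when no serious flaws are found, a caller would typically check for an empty (or whitespace-only) reply before displaying the pitfall report.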