Spaces: Running on CPU Upgrade
Commit · ab3ebc8
Parent(s): 3abd747
shorter responses

evaluations/models.py CHANGED (+17 -15)
@@ -10,21 +10,23 @@ system_messages = { "STRICT": """You are a chatbot evaluating github repositorie
     Keep your answers short, and informative.
     Your answer should be a single paragraph.""",
     "PITFALL": """You are a chatbot evaluating github repositories, their python codes and corresponding readme files.
-    You are looking for common pitfalls in the code.
-    [old lines 14-27: the previous, longer pitfall descriptions were removed; their text is not shown in the rendered diff]
+    You are looking for common pitfalls in the code.
+    Keep your answer short and informative.
+    Only report serious flaws. If you don't find any, return an empty string.
+    Answer in a short paragraph, and keep in mind the following common pitfall categories:
+    Pitfall #1: Design flaws with regards to the data collection in the code.
+    Pitfall #2: Dataset shift (e.g. sampling bias, imbalanced populations, imbalanced labels, non-stationary environments).
+    Pitfall #3: Confounders.
+    Pitfall #4: Measurement errors (labelling mistakes, noisy measurements, inappropriate proxies).
+    Pitfall #5: Historical biases in the data used.
+    Pitfall #6: Information leaking between the training and testing data.
+    Pitfall #7: Model-problem mismatch (e.g. over-complicated/simplistic model, computational challenges).
+    Pitfall #8: Overfitting in the code (e.g. high variance, high complexity, low bias).
+    Pitfall #9: Misused metrics in the code (e.g. poor metric selection, poor implementations).
+    Pitfall #10: Black box models in the code (e.g. lack of interpretability, lack of transparency).
+    Pitfall #11: Baseline comparison issues (e.g. if the testing data does not fit the training data).
+    Pitfall #12: Insufficient reporting in the code (e.g. missing hyperparameters, missing evaluation metrics).
+    Pitfall #13: Faulty interpretations of the reported results.""" }
 
 class LocalLLM():
     def __init__(self, model_name):
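The body of LocalLLM is not part of this hunk, so the following is only a minimal sketch, assuming a Hugging Face transformers chat pipeline, of how system_messages["PITFALL"] might be paired with repository text. The evaluate method, the model name, and the trimmed prompt string are illustrative assumptions, not the Space's actual implementation.

# Minimal sketch (assumption): one plausible way a LocalLLM-style wrapper
# could consume the system_messages dict shown in the diff above.
from transformers import pipeline

system_messages = {
    # Prompt shortened here; the full text is in the diff above.
    "PITFALL": "You are a chatbot evaluating github repositories, their python codes and corresponding readme files. ...",
}

class LocalLLM:
    def __init__(self, model_name):
        # A text-generation pipeline with a chat template is one possible backend.
        self.pipe = pipeline("text-generation", model=model_name)

    def evaluate(self, mode, repo_text, max_new_tokens=256):
        # Pair the selected system message with the repository content.
        messages = [
            {"role": "system", "content": system_messages[mode]},
            {"role": "user", "content": repo_text},
        ]
        out = self.pipe(messages, max_new_tokens=max_new_tokens)
        # Chat-format inputs return the conversation; the last message is the reply.
        return out[0]["generated_text"][-1]["content"]

# Hypothetical usage:
# llm = LocalLLM("HuggingFaceTB/SmolLM2-1.7B-Instruct")
# report = llm.evaluate("PITFALL", open("some_repo/train.py").read())

Because the updated prompt asks the model to return an empty string when no serious flaws are found, a caller would typically check for an empty (or whitespace-only) reply before displaying the pitfall report.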