---
license: apache-2.0
language:
- cy
- en
datasets:
- yahma/alpaca-cleaned
- allenai/MADLAD-400
---
![Mistral-7B-Cymraeg-Welsh](https://huggingface.co/BangorAI/Mistral-7B-Cymraeg-Welsh-v2/resolve/main/draig.jpeg)   
# Mistral-7B-Cymraeg-Welsh-v2  #

This is a bilingual Mistral chat / instruct model trained in both English and Welsh.

The model is based on [BangorAI/mistral-7b-cy-epoch-2](https://huggingface.co/BangorAI/mistral-7b-cy-epoch-2) which is a continual pre-training of the [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) model with Welsh data from the [allenai/MADLAD-400](https://huggingface.co/datasets/allenai/MADLAD-400) dataset for 2 epochs.

The model was then fine-tuned on the [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) dataset in both Welsh and English, also for 2 epochs.

## Demo ##

An online demo of the model can be found at [https://demo.bangor.ai](https://demo.bangor.ai).

This is an experimental LLM, so do not treat any response from the model as authoritative or factually correct. You are responsible for any output you generate.

## Format ##

The LLM uses the Llama-2 format for its prompts:
```
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_message }} [/INST]
```

The language of the system prompt guides which language the LLM responds in.
For example, in English:
```
<s>[INST] <<SYS>>
You are a helpful assistant that responds truthfully, logically and in detail. Answer in English.
<</SYS>>

{{ user_message }} [/INST]

```

Similarly, for responses in Welsh:

```
<s>[INST] <<SYS>>
Rydych chi'n gynorthwydd cymwynasgar sy'n barod i ateb unrhyw gwestiwn yn ffyddlon. Ymatebwch i gwestiwn y defnyddiwr yn llawn a gyda ffeithiau cywir yn y Gymraeg.
<</SYS>>

{{ user_message }} [/INST]

```
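The prompt layout above can be assembled programmatically. The following is a minimal Python sketch, not an official helper for this model: the names `SYSTEM_PROMPTS`, `build_prompt`, and the `include_bos` flag are hypothetical and introduced here only for illustration. The system prompts are copied from the examples above. Depending on your tokenizer settings, the beginning-of-sequence token may already be added automatically, in which case you would likely want `include_bos=False` to avoid a duplicate `<s>`.

```python
# Hypothetical helper for building a Llama-2 style prompt for this model.
# The system prompts are copied verbatim from the examples in this README.
SYSTEM_PROMPTS = {
    "en": (
        "You are a helpful assistant that responds truthfully, "
        "logically and in detail. Answer in English."
    ),
    "cy": (
        "Rydych chi'n gynorthwydd cymwynasgar sy'n barod i ateb unrhyw "
        "gwestiwn yn ffyddlon. Ymatebwch i gwestiwn y defnyddiwr yn llawn "
        "a gyda ffeithiau cywir yn y Gymraeg."
    ),
}


def build_prompt(user_message: str, language: str = "en", include_bos: bool = True) -> str:
    """Return a single-turn prompt in the format shown above.

    language selects the system prompt ("en" or "cy"), which in turn
    steers the language of the model's reply.
    """
    system_prompt = SYSTEM_PROMPTS[language]
    # Assumption: the literal "<s>" is only needed when the tokenizer
    # does not insert the BOS token itself.
    bos = "<s>" if include_bos else ""
    return (
        f"{bos}[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        f"<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )


if __name__ == "__main__":
    # "What is the capital of Wales?" asked in Welsh.
    print(build_prompt("Beth yw prifddinas Cymru?", language="cy"))
```

The resulting string can be passed to whichever inference stack you use; keeping the system prompts in one dictionary makes it easy to switch the reply language per request.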

---


# Mistral-7B-Cymraeg-Welsh-v2  #

Mae hwn yn fodel sgwrsio / cyfarwyddo Mistral dwyieithog wedi'i hyfforddi yn y Gymraeg a'r Saesneg.

Mae'r model yn seiliedig ar [BangorAI/mistral-7b-cy-epoch-2](https://huggingface.co/BangorAI/mistral-7b-cy-epoch-2) sy'n rhaghyfforddiant parhaus o fodel [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) gyda data [allenai/MADLAD-400](https://huggingface.co/datasets/allenai/MADLAD-400) ar gyfer 2 epoch.

Cafodd y model hyfforddiant cywrain pellach gan ddefnyddio'r dataset [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) yn Gymraeg a Saesneg, hefyd am 2 epoch.

## Demo Byw ##
Mae fersiwn o'r model i'w gael yma am sgwrs: [https://demo.bangor.ai](https://demo.bangor.ai). 

LLM arbrofol ydyw, felly peidiwch â chymryd unrhyw ymateb gan y model o ddifri.

## Fformat Sgwrs ##

Mae iaith y "system prompt" yn arwain yr LLM i ymateb yn y Gymraeg neu'r Saesneg.
Er enghraifft, ar gyfer y Gymraeg:
```
<s>[INST] <<SYS>>
Rydych chi'n gynorthwydd cymwynasgar sy'n barod i ateb unrhyw gwestiwn yn ffyddlon. Ymatebwch i gwestiwn y defnyddiwr yn llawn a gyda ffeithiau cywir yn y Gymraeg.
<</SYS>>

{{ user_message }} [/INST]

```

Yn yr un modd, ar gyfer atebion yn Saesneg:
```
<s>[INST] <<SYS>>
You are a helpful assistant that responds truthfully, logically and in detail. Answer in English.
<</SYS>>

{{ user_message }} [/INST]

```

---

*Contains information from [allenai/MADLAD-400](https://huggingface.co/datasets/allenai/MADLAD-400) which is made available
under the ODC Attribution License.*