Research
- J. Tiedemann, 2016, Finding Alternative Translations in a Large Corpus of Movie Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016)
Gain
The closest available approximation to a corpus of natural spoken language.
Links
- Portal
- bre.txt.gz -- Breton corpus.
- 60+ languages available.
- List:
af,ar,bg,bn,br,bs,ca,cs,da,de,el,en,eo,es,et,eu,fa,fi,fr,gl,he,hi,hr,hu,hy,id,is,it,ja,ka,kk,ko,lt,lv,mk,ml,ms,nl,no,pl,pt,pt_br,ro,ru,si,sk,sl,sq,sr,sv,ta,te,th,tl,tr,uk,ur,vi,ze_en,ze_zh,zh_cn,zh_tw
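As an illustration of how a monolingual file such as bre.txt.gz might be used once downloaded, the sketch below reads it and counts sentences and word forms. It assumes the file is UTF-8 text with one sentence per line, which is not confirmed here; the function names `iter_sentences` and `corpus_stats` are hypothetical and the whitespace tokenisation is deliberately naive.

```python
import gzip
from collections import Counter
from typing import Iterator, Tuple


def iter_sentences(path: str) -> Iterator[str]:
    """Yield non-empty lines from a gzip-compressed text file.

    Assumption: the corpus is UTF-8 with one sentence per line.
    """
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            sentence = line.strip()
            if sentence:
                yield sentence


def corpus_stats(path: str) -> Tuple[int, Counter]:
    """Return the sentence count and a frequency table of lowercased tokens.

    Tokens are split on whitespace only, so punctuation stays attached.
    """
    sentences = 0
    tokens: Counter = Counter()
    for sentence in iter_sentences(path):
        sentences += 1
        tokens.update(sentence.lower().split())
    return sentences, tokens
```

Calling `corpus_stats("bre.txt.gz")` on a locally saved copy would return the number of sentences together with a `Counter`, whose `most_common(20)` method lists the most frequent word forms. Reading line by line keeps memory use low for the larger languages.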
There are ready-to-download open licence Wikipedia corpora available.
| Project introduction | Type | Languages (2024) | Portal all | Language specific | Download link | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| OpenSubtitles 2016/2018 | Subtitles; parallel sentences; monolingual sentences | 75 | Portal | br&en | bre (mono) | Source: P. Lison and J. Tiedemann (2016), "OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles", http://stp.lingfil.uu.se/~joerg/paper/opensubs2016.pdf . Licence: unclear, "The corpora is made freely available to the research community on the OPUS website" (Lison and Tiedemann, 2016). |