Introduction
Technology of all sorts has seen growing use by linguists of all orientations. But linguistics split from the humanities rather recently, and we’re not on par even with other social sciences in our use of technology, let alone with STEM disciplines. This shows especially in the software services we use to disseminate our work: for example, we have nothing comparable to arXiv. Still, there is a growing number of utilities we can use, but they are scattered across the web, underutilised, unrecognised, and underappreciated.
One of the most useful types of such services for researchers is the preprint archive: a repository of preprint versions of research papers, made freely available by their authors. Sadly, we don’t have a single, canonical place to go for all linguistics papers. arXiv is a big, exhaustive repository for many STEM fields: if you’re looking for a computer science paper, you can expect to find it there. If you’re looking for a linguistics paper, it’s nowhere near as straightforward.
What follows is a hand-picked list of preprint archives for linguistics papers, made with the hope that it can simplify the process of searching for linguistics preprints. Not all of these are exclusively dedicated to linguistics, though.
I intend to keep this list up-to-date, so don’t hesitate to contact me via e-mail or Mastodon to suggest improvements or additions, or to point out any errors.
The List
- LingBuzz: LingBuzz is not really a preprint archive, but more of a forum for linguistics discussions built around uploaded documents, which may happen to be preprints of journal articles. It is not very accessible: search is rudimentary, there is no citation information, other metadata is scarce, and there are no RSS feeds or e-mail newsletters for following new uploads. The front page is not chronological, and not very predictable. Good for following what’s going on in the field at the moment, but it doesn’t cut it as a preprint archive.
- cs.CL on arXiv: cs.CL is the “Computation and Language” repository on arXiv.org, where you can find computational linguistics papers. Citation exports, PDFs, other metadata, and other file types are available. Great search, including full-text search of documents’ contents. RSS feed available. The main page is chronological. The main downside is its limited scope.
- Rutgers Optimality Archive: «The Rutgers Optimality Archive is a distribution point for research in Optimality Theory and its conceptual affiliates. Posting in ROA is open to all who wish to disseminate their work in OT and related theories of grammar.» Uses Google site search, so I’m not sure it can search full text. The front page seems to be chronological. Some metadata is available, but no citation downloads. Doesn’t seem to have an RSS feed, or any other news aggregator support.
- Semantics Archive: «For exchanging papers of interest to natural language semanticists and philosophers of language.» This archive by the Linguistic Society of America specialises in semantics (if that was not already obvious). The front page has links to browse by author and by submission date; the latter is chronological. The way papers and metadata are made available is very clumsy. There is some metadata in info.txt files, but it is very scarce: not even an abstract. No citation exports. No feeds. But it seems to be active. Full-text search is provided via Google (and thus depends on Google’s indexing). Overall, nice, but left very unpolished.
- Cogprints: A rather general archive meant for cognitive science and related disciplines, including linguistics. The linked section contains fewer than 400 linguistics papers from various linguistic subdisciplines. No search or chronological listing, nor any syndication. It does include detailed metadata and citation exports, though. Not very active, either in general or recently. I couldn’t find where to upload.
- OSF Preprints: From what I can deduce from the terrible documentation and absent explanations, this website seems to be both a free software tool for building your own preprint archive and a meta-index of a bunch of archives that use this software. The project is authored and maintained by the Center for Open Science. Metadata is available. BibTeX and other citation exports are hidden behind a terrible UI: within a searchable list box named ‘get more citations’, inside a ‘Citation’ container, which is collapsed by default. Additional files, like data files, are supported. No syndication. Linguistics papers can be browsed here. It’s not clear whether search is full-text or metadata only. The website is bad UX-wise, but usable; too much fluff and complication.
- SSRN: This apparently used to be a useful repository network, but it has been acquired by Elsevier and slowly closed up (see this article from 2016). At this point it seems virtually useless. Still worth a mention, though, because you’ll encounter it when you search online. They have a Linguistics section and a Cognitive Science one. Some papers say “Not available for download”, and when a download is available, the link doesn’t work, at least for me. Sad… This is excluded from the tabular comparison.
- Social Science Open Access Repository: This is operated by two German institutions and contains papers from the entire field of social sciences, which are nicely categorised. There is a Literature and Linguistics collection, which is 280 papers strong, and there is recent activity. Citation exports and metadata are easily available. Unclear whether search is metadata-based or full-text. Moderation is handled by publishing only pre-, post- and reprints of published articles. Original work can be published without review or moderation, but that is indicated via a ‘not peer reviewed’ tag. Big con: some articles may be put under ‘embargo’, which means there are date restrictions on when you can download a paper. The most recent linguistics paper when I checked was unavailable until June 2021. Sad. Also, no syndication available. German-oriented, but an English version of the website and English-language papers are available.
- Archive ouverte HAL: Belongs to the Centre pour la Communication Scientifique Directe of the Centre national de la recherche scientifique (CNRS) of France. Mainly French content. Has a language/linguistics section. Citation exports and metadata available. Chronological sorting by default. Detailed categorisation. I couldn’t find download links, though. I’ll exclude this from the table because downloads weren’t available and I couldn’t figure out much about it.
- PubMed: Contains many linguistics and related articles. Checks many boxes, but apparently it’s not something you can submit to. Mostly a meta-index with links to full texts. Not included in the table for that reason, but still worth mentioning.
- viXra: «An alternative archive of 34686 e-prints in Science, Mathematics & Other Scholarly Areas serving the whole scientific community[.]» The linguistics section is not really active and is dominated by one poster. Mediocre metadata, no citation exports. No moderation. Uses Google site search. No syndication. Not really useful, but still worth mentioning.
Tabular comparison of the above
The table is sorted based on a simple cumulative grading system plus an alphanumeric tie-breaker:
- Scope: 2 if general purpose, 1 if not general purpose
- Metadata: 2, if available; 1, if scarce; 0, if unavailable
- Citation exports: 2, if available; 0, if not
- Moderation: 2, if available; 1, if not; 0, if unclear
- Chronological Front Page: 2, if available and default; 1, if option available but not default; 0, if no option and not default
- RSS/Atom Feeds: 2, if available; 0, if not
- Search: 3, if full-text available; 2, if metadata only; 1, if full-text unclear or Google site search; 0, if no search
- Recently active: 4, if yes; 1, if not substantial; 0, if no
- Ties: sorted alphabetically, in ascending order
Hover over the numbers in the ‘Points’ column to see the individual grades.
Name & Link | Scope | Metadata | Citation exports | Moderation | Chronological Front Page | RSS/Atom Feeds | Search | Recently active? | Points |
---|---|---|---|---|---|---|---|---|---|
cs.CL | Computational Linguistics | Yes | Yes | Yes | Yes | Yes | Full text | Yes | 16 |
SSOAR | Social Sciences, Literature and Linguistics | Yes | Yes | Not really | Option available | No | Available, unclear if full-text | Yes | 13 |
OSF Preprints | General, General Linguistics | Yes | Yes, behind terrible UI | Unclear, possibly instance-dependent | Yes-ish, when you browse tags | No | Available, unclear if full-text | Yes | 12 |
LingBuzz | General Linguistics | Scarce | No | Unclear | No | No | Metadata | Yes | 9 |
Semantics Archive | Semantics (Linguistic, Philosophical) | Very scarce | No | Unclear, seemingly absent | Yes (behind link) | No | Google site search | Yes | 9 |
ROA | Optimality Theory | Scarce | No | No | Yes | No | Google site search | Yes | 8 |
viXra | General, General Linguistics | Scarce | No | No | Yes | No | Google site search | Not substantial | 8 |
Cogprints | Cognitive Science, General Linguistics | Yes | Yes | Unclear | No | No | No | No | 6 |
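The grading scheme is simple arithmetic, so it can be written down directly. Below is a minimal Python sketch of it, assuming each table cell has already been mapped to one of the rubric’s options. The grade values come from the rubric above; the `Archive` class, the `rank` function and the short label strings (such as `"unclear-or-google"`) are hypothetical names introduced for illustration, and mapping a cell to a label is a judgement call. Reading the viXra row this way gives 2 + 1 + 0 + 1 + 2 + 0 + 1 + 1 = 8 points, as in the table.

```python
from dataclasses import dataclass

# Grade values taken from the rubric above; the label strings are
# hypothetical shorthands for the rubric's wording.
SCOPE = {"general": 2, "specific": 1}
METADATA = {"available": 2, "scarce": 1, "unavailable": 0}
CITATION_EXPORTS = {"available": 2, "unavailable": 0}
MODERATION = {"available": 2, "absent": 1, "unclear": 0}
CHRONOLOGICAL = {"default": 2, "option": 1, "none": 0}
FEEDS = {"available": 2, "unavailable": 0}
SEARCH = {"full-text": 3, "metadata": 2, "unclear-or-google": 1, "none": 0}
ACTIVE = {"yes": 4, "not-substantial": 1, "no": 0}


@dataclass
class Archive:
    """One row of the comparison table (hypothetical structure)."""
    name: str
    scope: str
    metadata: str
    citation_exports: str
    moderation: str
    chronological: str
    feeds: str
    search: str
    active: str

    def points(self) -> int:
        # Cumulative score: the sum of the eight per-criterion grades.
        return (
            SCOPE[self.scope]
            + METADATA[self.metadata]
            + CITATION_EXPORTS[self.citation_exports]
            + MODERATION[self.moderation]
            + CHRONOLOGICAL[self.chronological]
            + FEEDS[self.feeds]
            + SEARCH[self.search]
            + ACTIVE[self.active]
        )


def rank(archives: list[Archive]) -> list[Archive]:
    # Highest score first; ties broken alphabetically (ascending).
    # Case-insensitive comparison is an assumption, not part of the rubric.
    return sorted(archives, key=lambda a: (-a.points(), a.name.casefold()))


# The viXra row, read through the rubric.
vixra = Archive(
    "viXra", scope="general", metadata="scarce", citation_exports="unavailable",
    moderation="absent", chronological="default", feeds="unavailable",
    search="unclear-or-google", active="not-substantial",
)
print(vixra.points())  # 8
```

Sorting on the pair (negated points, name) yields the table’s order in a single pass: higher totals come first, and equal totals fall back to the alphabetical tie-breaker.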
Conclusions
The preprint sources available to linguists are numerous, but not rich. There is no central location for accessing linguistics papers, whether a metasearch engine or a general-purpose, well-organised, feature-complete preprint archive for the field like arXiv. LingBuzz is lively and relevant, but one needs to check the front page manually and remember what one has already seen in order to make use of it. OSF Preprints is promising, but its UX is a failure and there is no specific instance for linguistics. viXra has a section for linguistics, but there isn’t much useful stuff in there. Cogprints also has a general linguistics section, but it’s stagnant and the website itself is hard to use. The rest of the options cater to specific subdisciplines and interdisciplinary practices.
IMHO, creating a serious arXiv clone for Linguistics, Sociology and Cognitive Science is necessary. I don’t know whether it’s feasible to retroactively collect preprints into such a central resource, but even without that, a central place for the most recent work in our discipline, with preprints, metadata and citation exports, decent moderation, and sustainable funding, would make a big difference. LingBuzz is good, but merely a quarter of that, unfortunately.