Here at Delvin, we feel that one of the greatest challenges in learning a foreign language is making the jump from "book learning" to understanding real-life language. This is especially true with a difficult language like Japanese. Many have argued that having large amounts of comprehensible input is a key requirement for language acquisition (Input Hypothesis, Antimoon, AJATT). And it's all the more effective if the input is fun and incremental.
Native-language content alone is not enough to provide this kind of input. For the material to be really useful, learners need accurate translations and word-by-word breakdowns. So what's needed, it seems, is not just better apps but better data. Taking Japanese as an example, there are countless learning apps and sites, but only a handful of high-quality datasets powering them, like the ubiquitous JMdict and Tatoeba. Tatoeba is great, but it has relatively little audio/video, and most of its sentences are not natural, native text but derived translations. What if there were an open dataset designed specifically to provide authentic comprehensible input?
Inspired by this, we've been working on an experiment called Delvin Data. Our idea is to start with an open, structured database of language, built from video clips taken from authentic, immersive sources (e.g. those meant for native speakers). Authentic video makes it fun; structure (especially indexing by word) lets it be customized and incremental. Given that data, what kinds of new apps/sites/decks could be made? We've built one study site using the data (the SRS-based Delvin), and it seems promising to us.
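To make "indexing by words" a little more concrete, here is a rough Python sketch of how word-indexed clips could support incremental study. It is only an illustration, not the actual Delvin Data schema: the `Clip` fields, the `build_word_index` and `next_clips` helpers, and the sample sentences are all hypothetical and made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Clip:
    """One video clip with its transcript (hypothetical schema)."""
    clip_id: str
    text: str                # native-language transcript
    translation: str         # English translation
    words: Tuple[str, ...]   # dictionary forms of the words in the clip


def build_word_index(clips: List[Clip]) -> Dict[str, List[str]]:
    """Map each word to the IDs of the clips that contain it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for clip in clips:
        for word in set(clip.words):  # count each word once per clip
            index[word].append(clip.clip_id)
    return index


def next_clips(clips: List[Clip], known: Set[str]) -> List[Tuple[str, str]]:
    """Return (clip_id, new_word) for clips with exactly one unknown word."""
    result = []
    for clip in clips:
        unknown = set(clip.words) - known
        if len(unknown) == 1:
            result.append((clip.clip_id, next(iter(unknown))))
    return result


# Made-up sample data, for illustration only.
CLIPS = [
    Clip("c1", "芸者を見た。", "I saw a geisha.", ("芸者", "を", "見る")),
    Clip("c2", "猫を見た。", "I saw a cat.", ("猫", "を", "見る")),
    Clip("c3", "芸者と猫がいる。", "There are a geisha and a cat.",
         ("芸者", "と", "猫", "が", "いる")),
]

index = build_word_index(CLIPS)
suggestions = next_clips(CLIPS, known={"を", "見る", "猫"})
print(index["猫"])     # clips that contain 猫
print(suggestions)     # clips that introduce exactly one new word
```

In this sketch, a learner who already knows を, 見る, and 猫 would be offered clip `c1` next, since 芸者 is its only new word. That is one simple way structure could make authentic input incremental.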
We're ready to take the idea further if there's enough interest from the learning community. If you'd like to see this developed further, or want to collaborate, let us know!
#10632 | 芸者 | げいしゃ | geisha, Japanese singing and dancing girl |