How complex is "slightly non-trivial"? Hmm. Here's a recent example:
le se viska be mi cu xamgu
Translation:
The thing(s) I (or we) see is/are/were/will be good.
Probably from context, more colloquially:
What I see is good.
Be advised: not only am I not an expert, I class as a newbie with dangerous knowledge. I have read, understood and worked with the formal machine grammar, and I have written some tools to work with parsed utterances. I do not have, and probably never will have, any command of the vocabulary.
I'm happy to try to answer questions, and will defer those that are beyond me to one of the lists on which I lurk.
Thanks for the sentence, that's certainly nontrivial enough (I was just trying to dodge 2-3-word-sentences like "jesus wept" or "i like milk", which rarely show anything interesting at the grammatical level).
Not-mocking: there's an apparent lack of number ("thing(s)", "I (or we)"). Is it closer to true that:
- the grammatical root of this apparent lack of # is shared between "thing(s)" and "i (or we)"
...or that:
- "thing(s)" is a gloss of some word that's ~ "specific things not specifically specified"
- "I (or we)" is a gloss of some word that's ~ "who I speak for" (or some other deictic term that's ~ first-person but otherwise underspecified)
...or to some other possibility? Additionally: given that it's a designed language, I'm curious about the underlying intent (whatever it is that explains why the answer is what it is, instead of being something else).
(Disclaimer: wiki's lojban articles have resolved a lot of my other questions, but before I asked you I'd only looked at lojban's wiki's articles, which are mostly unhelpful.)
Short answer: the latter is more true. The reason why {mi} is unmarked for number is simply that it’s defined that way.
In Lojban, everything is unmarked for number by default. It’s actually quite rare to see things explicitly marked for number, as it’s usually either irrelevant or obvious from context.
The pronoun {mi} is technically unmarked for number, but is restricted to refer to people that the speaker represents, just as you guessed. For example, it would usually be weird or incorrect to use {mi} to mean “we” in the sense of “me and you”, since representing the very people you are talking to is a rare situation — although theoretically you could come up with examples where it would make sense.
So in practice {mi} is usually singular. On the other hand, {do} (which means “you”) is as often plural as it is singular.
There are other pronouns that mean “me and you”, “me and others”, “you and others”, and “me, you, and others” — respectively, {mi’o}, {mi’a}, {do’o} and {ma’a} — which is another reason why the need for plural {mi} seldom arises.
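As a rough illustration of the pronoun split just described, here is a small Python lookup table. Only the six pronouns and their glosses come from the text above; the function name `pick_pronoun` and the three boolean flags are made up for this sketch, and ASCII apostrophes stand in for the typographic ones.

```python
# Keys are (speaker, listener, others) membership flags (a hypothetical
# encoding); values are the pronouns as glossed in the post above.
PRONOUNS = {
    (True, False, False): "mi",    # the speaker (and whoever the speaker represents)
    (False, True, False): "do",    # you, singular or plural
    (True, True, False): "mi'o",   # me and you
    (True, False, True): "mi'a",   # me and others
    (False, True, True): "do'o",   # you and others
    (True, True, True): "ma'a",    # me, you, and others
}


def pick_pronoun(speaker: bool, listener: bool, others: bool) -> str:
    """Return the pronoun for a combination, or raise if the post lists none."""
    try:
        return PRONOUNS[(speaker, listener, others)]
    except KeyError:
        raise ValueError("no pronoun listed in the post for this combination")
```

For example, `pick_pronoun(True, True, False)` gives `"mi'o"`. A combination the post does not cover, such as "others only", raises an error rather than guessing.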
Thanks. I figured grammatical # as such would be discarded as unnecessary but the notion of plurality raises semantic issues that natural language sidesteps by ambiguity.
Disclaimer/Personal Background (trying to be brief): I've often been told I've got an unusual cognitive style (for lack of better term) and I've often felt very much as if there's an impedance mismatch between how my thoughts are structured and how language operates; in essence, at the word-or-sentence level everything I hear or read is very polyvalent and vague, and only takes on a concrete meaning to me if I get multiple paraphrases of it...it's putting all the variants into superposition and seeing which parts reinforce or cancel that shows me the contour of the actual meaning (which itself is not necessarily ever actually "represented in words" so much as "gets the outline of its semantic boundaries painted").
In the abstract this leaves me with an interest in the idea of something like lojban but very mixed initial reactions: it's possible an artificial language with more-precise meanings would eliminate my need for doing verbal interferometry across multiple paraphrases, but on the other hand I have a lifetime's experience feeling very uncomfortable without tons of redundancy and repetition-with-alteration, which seems to be what lojban is trying to eliminate in its use.
Too much info, I'll stop there.
I do have two more questions if you have time.
#1 is historical: what's the process by which the core sets of things like spatial relationships or tenses or shapes or so on came to be enumerated?
EG: if I were doing a language in this form I'd go through all the languages I could get my hands on and try to get good lists of all their fundamental categories (eg: spatial prepositions and "classifiers", like you have in swahili and chinese (+ languages with heavy chinese contact); cf: http://www.jstor.org/pss/413103 ) and then try to factor them into semantic atoms. I'd consider this approach bottom-up (see what's out there, and then try to simplify and unify them) and contrast it with a more top-down approach (trying to derive a finite set of spatial relations ab initio via pure reasoning); it'd also be a good set of "unit tests" for your final set of core concepts, making sure that none of these words' senses is inexpressible in terms of your base concepts.
How did the lojbanists derive their tenses / spatiotemporal prepositions / etc.? Is there a good "history of the design of lojban" that speaks to this?
Question #2: at a practical level how would you decompose "There are dogs in the kitchen" into lojban?
If I had to break it into predicates it'd probably be the conjunction:
- T ~ whatever containment type you have that is ~ "contains within its spatial bounds -- but not structurally -- for an indeterminate time period"
- COUNT(E) > 0
- ENTITY-COLLECTION-TYPE(E,X), where X ~ "collection treated as collection due to spatiotemporal circumstance and descriptive convenience" (EG: E is an entity collection b/c there are label(s) they all share, namely being instance-of dog and contained-in-the-kitchen in the same way; there's no assertion of any other source of entity-identity beyond the circumstances this utterance is describing; contrast to say "baseball team" or "deck of cards", etc., which are entity-collections with a more-persistent and "intentional" identity)
- forall e in E IS-INSTANCE-OF("instance-of-type IoT",e,"dog")
- "instance-of-type" ~ whatever instance-of you have that is ~ "is a concrete instantiation of an abstract type not otherwise specified" (eg: an actual 'dog', not 'Pomeranian')
- + some temporal modifier to explain like "the described circumstance started before I made this utterance and I do not think it has ceased, yet"
...but I'd assume some of the intended distinctions are usually left implicit or inferred; what's a good lojban decomposition?
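To make the shape of that conjunction concrete, here's a minimal Python sketch of the non-temporal part. Everything in it (the `Entity` class, its `kind` and `location` fields, the `there_are` function) is invented for illustration and says nothing about how lojban itself would carve this up; the temporal modifier is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entity:
    kind: str       # abstract type the entity instantiates, e.g. "dog"
    location: str   # place that currently contains it, e.g. "kitchen"


def there_are(kind: str, place: str, world: List[Entity]) -> bool:
    """True if the ad hoc collection E of `kind` entities in `place` is non-empty."""
    # E exists only by virtue of the shared labels: instance-of `kind`
    # and contained-in `place`. No other source of identity is asserted.
    collection = [e for e in world if e.kind == kind and e.location == place]
    return len(collection) > 0   # COUNT(E) > 0
```

With a toy world of one dog and one cat in the kitchen plus one dog in the garden, `there_are("dog", "kitchen", world)` holds and `there_are("cat", "garden", world)` does not.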
Something similar was done, but it was explicitly recognised that the purpose of lojban was not to generate "the semantic primes of language." Such an exercise is regarded by some linguists as meaningless, and by others as too difficult. Instead, concepts were listed, and from them a "covering set" was extracted. Similarly for tenses, both spatial and temporal.
After the concepts were agreed, each was expressed in each of the (then) six major world languages. The words thus obtained were put through a weighting algorithm to try to find a "word" that had components of each, and that became the lojban word for that concept.
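I don't have the details of the actual algorithm to hand, so the following is only a sketch of the general idea: score each candidate word by how much of each source word it shares, weighted per language, and keep the best one. The similarity measure (longest common subsequence), the function names, and the shape of the inputs are all my assumptions, not the real procedure.

```python
from typing import Dict, Iterable


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def score(candidate: str, sources: Dict[str, float]) -> float:
    """Weighted share of each source word recoverable from the candidate.

    `sources` maps a (non-empty) source-language word to that language's
    weight. Both words and weights are placeholders supplied by the caller.
    """
    return sum(
        weight * lcs_length(candidate, word) / len(word)
        for word, weight in sources.items()
    )


def best_candidate(candidates: Iterable[str], sources: Dict[str, float]) -> str:
    """Pick the candidate with the highest weighted score."""
    return max(candidates, key=lambda c: score(c, sources))
```

You would call `best_candidate` with a list of well-formed candidate words and a dictionary of the six source words with their weights; how the real candidates and weights were chosen is outside what this sketch covers.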
Thanks for the response, I do appreciate it. I wasn't aware of how active lojban still is (and it's much more accessible to get information on thanks to the internet).
I should point out that I'm fairly familiar with the general range of opinion in the linguistics community (as an undergrad I did dual math / linguistics, which made me at that time quite the rara avis, though it's more common now apparently).
Generally I don't give much credence to the idea of semantic primes (at all, not just in some pragmatic sense) but for stuff like spatial relationships + tenses (+ aspect, mood, etc.) it'd seem not an impossible undertaking (do enough reading in linguistic typology and you start seeing enough "repeats" to think such an enumeration might be possible).
After going through a bit of the grammar and the vocab list on wiktionary it seems like you'd have constant problems with synecdoche, which'd bother me (but perhaps only me, and it's not as though natural languages aren't riddled with similar problems).
I've walked away from this with a much clearer idea of the sense in which lojban is attempting to be a logical language, thanks for your time.
It's heartening to see substantial effort put into engineering language; good luck with your efforts.
In lojban, as in Chinese (I believe both Mandarin and Cantonese, although I'm not an expert), number is not explicit and is usually either irrelevant or determined by context. There are mechanisms for specifying number. Ditto tense. Things can be left even more fuzzy, or made more precise.
I think the easiest way to sum up Lojban is this (my understanding based solely on reading ABOUT Lojban, I don't know a single Lojban word... aside from Lojban):
Anything that can be said in any language can be said in Lojban. Anything that can be left unsaid in any language can be left unsaid in Lojban.
Thus while it is not possible to use the verb "to be" in English without expressing a tense, it is in some languages, therefore it is in Lojban. Presumably it is also possible to specify a tense when saying "to be" in Lojban. It can be done in English, therefore it can be done in Lojban.