thought I’d share

Sometimes, I make myself laugh. This is what I just wrote:

Examples 17 through 22b were coded as “reconstructed dialog.” However, the dialog was obviously constructed rather than reconstructed as the interlocutors are styrofoam packing peanuts:

17. They’re all “Pauly, don’t you want to try just one of us?”

One of the questions on my take-home final is a brief study of the usage of the quotative be+all, a close relative of the quotative be+like. We’ve been given a small corpus of 50 examples of this quotative, including some from blogs. Such as this excerpt from this one:

People who eat pennies are stupid. I mean, it’s obvious what’s going to happen. Your body isn’t going to be able to digest a damn penny, okay? Same goes for shards of glass or thumbtacks or pieces of errant plastic or even little lego pieces. I’m past that. I’ve moved on. Matured. But when it comes to these damn packing peanuts they call out to me like they know what I’m thinking. They’re all “Pauly, don’t you want to try just one of us?” and I’m all “No thanks Mr. Packing Peanut, I think you’ll just give me a stomach ache” and they’re all “Oh, c’mon — what’s the worst that could happen?” and I’m all “I could get sick” and they’re all “Sicker than when you got food poisoning from Pizza Hut?” and I’m all “How did you know about that?” and they’re all “We’re packing peanuts, Pauly — we know all.”

Let this be a warning to all of you. Anything you write and publish on the web could turn into data fodder for linguists, or even worse, linguistics students. Bwahahaha.

Oooh. There’s something I could research: the distribution of spellings of evil laughs on the web. There seems to be a bit of variation in the onset (bwahahaha vs muahahaha vs buahahaha vs mwahahaha), not to mention variability in the number of “ha”s. We can get from two “ha”s, as in bwahaha, all the way up to…I’m not sure how many. I got as far as googling


and was amused to get 7180 hits, and this:

Did you mean: mwahahahahahahahahahahahahahaha
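If I ever did chase this down, a first pass might look something like the sketch below. It assumes the laughs fit a simple pattern (one of the four onsets above, then an “a”, then two or more “ha”s); the names `LAUGH` and `tally_laughs` and the sample sentence are made up for illustration, and real web data would surely be messier.

```python
import re
from collections import Counter

# Assumed pattern: onset (bw, mu, bu or mw), then "a", then two or more "ha"s.
LAUGH = re.compile(r"\b(bw|mu|bu|mw)a((?:ha){2,})\b", re.IGNORECASE)

def tally_laughs(text):
    """Count (onset, number of "ha"s) pairs for each evil laugh found in text."""
    counts = Counter()
    for match in LAUGH.finditer(text):
        onset = match.group(1).lower()
        ha_count = len(match.group(2)) // 2  # each "ha" is two letters
        counts[(onset, ha_count)] += 1
    return counts

# Made-up sample text, not real data.
sample = "Bwahahaha. He went mwahaha, then muahahaha, then bwahahaha again."
for (onset, ha_count), n in sorted(tally_laughs(sample).items()):
    print(onset, ha_count, n)
```

Grouping by onset and by number of “ha”s separately would give the two distributions I was wondering about.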

Hey, remember how I said I like sleep, and should be in pretty good shape to get some tonight? Not if I keep screwing around like this. So I’m all “I totally need to get back to work.”

picturing some numbers

Want to see some really amazing photos? Check out this link, sent to me by a friend.

Here are a couple of numbers learned from this site:

  • 60,000 (The number of plastic shopping bags used in the U.S. every 5 seconds)
  • 1,140,000 (The number of brown paper grocery bags used in the U.S. every hour)

The numbers are staggering, but abstract.

Photographer Chris Jordan has taken numbers like these and created large-scale works of art that really show the numbers, giving us a sense of the scale of 60,000 plastic bags.

You really need to see the photos to get a sense of them. So, come on. Click on the link

collecting tokens

The word token has many meanings, having synonyms such as symbol, memento, or representative:

    1. I give you this squid as a token of my affection.
    2. I’ll keep these pants forever as a token of my holiday escapades.
    3. I posted this photo of a duck in the dishwasher as a token of the many pictures I’ve taken of random things.

A token can also be a conventionalized object, such as a metal coin or plastic figure, used in place of money for some transactions or used in some sort of group activity, like a game.

    4. I’m not sure what to do with my old subway tokens now that they’ve started using Charlie Cards.
    5. My old Monopoly game was missing half its tokens.

In my world, though, the most frequent use of the word token is the meaning used in linguistics. (Interestingly, the page, with all its various links and definitions from those various sources, doesn’t even mention linguistics.) In linguistics, a token is an instance of some form that is being studied, an item of a particular category or class. It is commonly discussed in terms of the type-token distinction, which has its roots in philosophical usage:

Type (metaphysics)

A type is a category of being. A human is a type of thing; a cloud is a type of thing (entity); and so on. A particular instance of a type is called a token of that thing; so Socrates was a token of a human being, but is not any longer since he is dead. Likewise, the capital A in this sentence is a token of the first letter of the Latin alphabet.

According to the Stanford Encyclopedia of Philosophy,

The distinction between a type and its tokens is an ontological one between a general sort of thing and its particular concrete instances (to put it in an intuitive and preliminary way).

In linguistics (and in related speech and language research) the term token is used to refer to any single instance of some phenomenon or category that’s under investigation, and type is used for some category of which a token is a member. The type-token distinction is often used when investigating words used in a written text. Imagine, if you will, a short text such as:

I like the word pants. I actually like saying the word pants. It’s one of those words that begs to be repeated. Pants. For example, in a discourse on pants, I would hypothesize that speakers would be less inclined to use pronouns to refer to pants than, say, other entities in the discourse. Even if the word pants had just been mentioned, I would still say “pants.”

The text in the block quote above has 67 words. However, it doesn’t have 67 unique words. It has fewer unique words, or word types. I counted 42 unique words, so 42 types. (Mind you, I’m counting things like “say” and “saying” as different words for these purposes, and ignoring punctuation and capitalization.) If we want to look at a particular word type, oh, let’s say maybe the word pants, we can count 7 instances of that word in the text. That’s 7 tokens of pants.
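For anyone who would rather let a computer do the counting, here is a minimal sketch of the same tally. It follows the rules above (inflected forms count as different words, punctuation and capitalization are ignored), and it assumes a word is a run of letters that may contain an apostrophe, so “It’s” is one word. The names `TEXT` and `tokenize` are just for illustration, and a different tokenization rule would give different totals.

```python
import re
from collections import Counter

TEXT = (
    "I like the word pants. I actually like saying the word pants. "
    "It’s one of those words that begs to be repeated. Pants. "
    "For example, in a discourse on pants, I would hypothesize that "
    "speakers would be less inclined to use pronouns to refer to pants "
    "than, say, other entities in the discourse. Even if the word pants "
    "had just been mentioned, I would still say “pants.”"
)

def tokenize(text):
    """Lowercase the text and return its words, ignoring punctuation."""
    # Assumption: a word is letters, optionally joined by an apostrophe.
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())

tokens = tokenize(TEXT)    # every running word: the tokens
types = Counter(tokens)    # each distinct word with its count: the types

print("word tokens:", len(tokens))
print("word types:", len(types))
print("tokens of pants:", types["pants"])
```

The `Counter` also makes it easy to see which types are most frequent, with `types.most_common(5)`.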

While token is commonly used for a written instance of a word in a text, it can also be used for a larger or smaller unit of speech or language. It could be a spoken production of a sentence, or a production of a single sound segment, like a consonant or a vowel. It could be a gesture. It all depends on what categories, or types, you are looking at.

For example, let’s say I’m studying phonetic characteristics of a vowel in American English, such as [æ], the vowel in words like bad, pat and pants. I would probably want to collect a large number of instances of words spoken aloud that contain that vowel. If I get a recording of someone reading a list of 5 words with [æ], and I have them read that list 3 times, I end up with 15 tokens of [æ] by that speaker. I could also talk about having 15 tokens of words containing [æ], or even 15 tokens of utterances containing [æ]. If I have 4 speakers all reading that same list, 3 times each, I end up with 60 tokens of [æ].
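The arithmetic here is just speakers times repetitions times words, which can be sketched as below. The word list is hypothetical (only bad, pat and pants come from the examples above), and so are the speaker labels.

```python
from itertools import product

# Hypothetical materials: "cat" and "map" are made up to fill out the list.
words = ["bad", "pat", "pants", "cat", "map"]
repetitions = [1, 2, 3]
speakers = ["S1", "S2", "S3", "S4"]  # made-up speaker labels

# One record per token: a single spoken production of one word.
tokens = [
    {"speaker": s, "repetition": r, "word": w}
    for s, r, w in product(speakers, repetitions, words)
]

per_speaker = sum(1 for t in tokens if t["speaker"] == "S1")
print("tokens per speaker:", per_speaker)
print("tokens in all:", len(tokens))
```

Keeping one record per token, rather than just the totals, is what makes it possible to sort the tokens by speaker or by word later on.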

Here’s an example of the use of the word tokens from a phonetics paper* I grabbed off the web (found by googling “tokens of p”, in case you’re wondering):

This includes all /k/ and /p/ tokens produced, not only those in potentially fricatable environments.

(And yes, I do get off on this stuff.)

The article repeatedly mentions tokens of /p/ and tokens of /k/, and how many tokens of each fit some criteria, or follow some pattern.

Now let’s say we wanted to study the use of the word tokens in that text. (So in this case, our type is tokens.) Using a basic text search, I counted 28 instances of the word tokens. That means that the text contains 28 tokens of tokens.

Much of what I do as part of my research, especially for my various jobs, involves collecting, categorizing and otherwise analyzing tokens. I love this part, collecting and working with the data. It’s the thrill of the hunt. Followed by the thrill of the puzzle. Followed by the thrill of the data organization. (What I must learn to love is the thrill of the write…)


*Loakes, D. and McDougall, K. (2004) “Frication of /k/ and /p/ in Australian English: Inter- and Intra-Speaker Variation” in Proceedings of the 10th Australian International Conference on Speech Science & Technology, pp. 171-176.

Nimberpoop, R. (1954) “What’s your deal with the word pants? A study in bizarre philological obsessions.” Sense, Nonsense and Polysemy Quarterly, 3, pp. 4-97.

world health and cool use of animation

Do you know the answers to this set of questions?

In each of the following pairs…

Which country has the highest child mortality?

Sri Lanka or Turkey
Poland or South Korea
Malaysia or Russia
Pakistan or Vietnam
Thailand or South Africa

This is from a pre-test given to Swedish university students by Professor of International Health Hans Rosling of the Karolinska Institute, and the answers are given in a lecture that he presented at the TED 2006 conference*. Jenny of Baggage Carousel 4 offers up and discusses the video of this lecture, where you can go to get the answers to the questions. The answers are about a minute and a half into the video, but don’t stop watching there. It’s an amazing lecture, and fun to watch, and, as my friend Jenny puts it:

even if you couldn’t give a fig about international health and development, rosling shows an amazing, dynamic use of data the likes i’ve never seen before.

I’ve never seen the likes of it either. And as someone who works with a lot of data, I was blown away by the power of the visual data presentation, which uses animated graphics and something called Trendalyzer produced by Gapminder. These aren’t your grandmother’s line charts.

And as someone who does give a fig about international health and development (I’d like to think I give a whole lot of figs about it), and has read books and taken courses relating to these issues (albeit more than a decade ago), I learned a huge amount just from watching that video. Wow.

*(note: I thought it was from the 2007 TED conference, but I see now he talked about something a little different this year. And apparently swallowed a sword.)