Division of Social and Organizational Psychology


The PROTAN software for computer-aided content analysis

Presentation

PROTAN (for PROTocol ANalyzer) is a computer-aided content analysis system. Being aided by the computer means, in the present case, that PROTAN does the many tedious tasks of textual analysis that a human being can do but generally avoids doing, like counting words. Quite often, and without further notice, PROTAN does its job "by default", that is, by assuming that parameters still have the values initially given to the system. For instance, some tasks require as little input as a semicolon; PROTAN draws the rest of the required information from memory. PROTAN never, however, does automatic content analysis.

What kind of text can one analyze with the help of PROTAN?

PROTAN can handle any textual material, such as narratives, clinical interviews, scientific publications, titles or abstracts from scientific journals across their publication years, poetry, advertising blurbs, and many other forms of text. The limitations of PROTAN are those imposed by statistical constraints, by the unavailability of the dictionaries needed to analyze a particular sort of text, or by the lack of hypotheses on the analyst's side.

The text itself must be presented to PROTAN in ASCII format and must not extend beyond column 70. Columns 73 to 80 hold indications of interview, unit, and speaker, whose meaning is whatever the analyst decides.
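The column layout described above can be illustrated with a small Python sketch. It is not part of PROTAN. The split of columns 73 to 80 into three subfields (3 + 3 + 2 characters) is purely an assumption made for the example, since the layout of that field is left to the analyst; the names `Record` and `parse_line` are likewise hypothetical.

```python
from typing import NamedTuple

TEXT_END = 70                # text occupies columns 1-70
ID_START, ID_END = 73, 80    # identifier field occupies columns 73-80


class Record(NamedTuple):
    text: str
    interview: str
    unit: str
    speaker: str


def parse_line(line: str) -> Record:
    """Split one line of up to 80 columns into text and identifier fields."""
    padded = line.rstrip("\n").ljust(ID_END)
    text = padded[:TEXT_END].rstrip()
    ident = padded[ID_START - 1:ID_END]
    # ASSUMPTION: a 3 + 3 + 2 character split of the identifier field.
    # The real meaning of columns 73-80 is decided by the analyst.
    return Record(text, ident[0:3].strip(), ident[3:6].strip(), ident[6:8].strip())


sample = "To be, or not to be".ljust(72) + "001012HA"
record = parse_line(sample)
print(record.text, record.interview, record.unit, record.speaker)
```

Short lines are padded with blanks before slicing, so a line that carries no identifiers simply yields empty identifier fields.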

What does PROTAN do to help one analyze a text?

The aims of the PROTAN software

PROTAN is tuned to two very different tasks, corresponding to two different content-analytic strategies (Weber, 1983). In the first, PROTAN addresses the question of what the text looks like. Is it generally abstract? Does it become more abstract as it proceeds, or less? What is the profile of the main affective connotations (Anderson & McMaster, 1986) of the text? For example, using Whissell's dictionary of affect (Hogenraad, McKenzie, & Martindale, 1997; Whissell et al., 1986), one could show that the general mood in Hamlet follows an inverted U, with the second branch of the inverted U going much lower than the first. Such a finding cuts no ice: we always suspected as much, which is precisely why we picked this classic text. To achieve this first task, PROTAN relies on a series of semantic dictionaries that are part of the system.

The second task to which PROTAN is tuned is to answer the question of what the text is talking about. What are the main themes in it? A theme, like any interest, is never fixed. We usually want to know how the interests in a text come and go. The trick of PROTAN, like that of Iker's WORDS system (Iker & Klein, 1974) from which we got the idea, is to postulate that there is enough information in the relations between words to allow themes to emerge simply by analysing these relations.
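To make "relations between words" concrete, here is a minimal Python sketch that counts how often two words appear in the same segment. This is only one simple way of quantifying such relations and is not claimed to be the procedure used by PROTAN or WORDS; the function name and the toy segments are invented for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(segments):
    """For each unordered word pair, count the segments containing both words."""
    pairs = Counter()
    for segment in segments:
        # Sorting makes every pair appear in one canonical (alphabetical) order.
        words = sorted(set(segment.lower().split()))
        pairs.update(combinations(words, 2))
    return pairs


# Toy data, invented for illustration only.
segments = ["ghost king night", "king crown murder", "ghost night fear"]
counts = cooccurrence(segments)
print(counts[("ghost", "night")])  # 2: both words occur in the first and third segments
```

Words that repeatedly turn up in the same segments are candidates for belonging to a common theme; a real analysis would work on lemmatized text and on far more segments.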

The tools of the PROTAN software

To accomplish its tasks, PROTAN avails itself of three tools: segmentation, lemmatization, and dictionaries.

Segmentation means just what the word suggests. One has to divide the text into as many parts as one feels appropriate. If possible, these segments should be meaningful, e.g. letters, chapters of a book, or acts of a play. One can also divide the text into artificial units, e.g. segments of 700 words each, or one may have reasons to divide the text into 20 equal parts.

One program, CSCUT, takes care of the job of segmenting. This program can be complex, and this step deserves great care, because all further analyses depend on it.
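The two artificial segmentation schemes mentioned above (segments of a fixed number of words, or a fixed number of equal parts) can be sketched in Python as follows. This illustrates the idea only; it says nothing about how CSCUT works internally, and the function names are hypothetical.

```python
def fixed_size_segments(words, size):
    """Cut a word list into consecutive segments of `size` words.

    The last segment may be shorter than `size`.
    """
    return [words[i:i + size] for i in range(0, len(words), size)]


def equal_parts(words, parts):
    """Cut a word list into `parts` segments whose sizes differ by at most one word."""
    base, extra = divmod(len(words), parts)
    segments, start = [], 0
    for k in range(parts):
        # The first `extra` segments each take one of the leftover words.
        end = start + base + (1 if k < extra else 0)
        segments.append(words[start:end])
        start = end
    return segments


words = ["w%d" % i for i in range(1450)]   # stand-in for a tokenized text
print([len(s) for s in fixed_size_segments(words, 700)])
print([len(s) for s in equal_parts(words, 20)])
```

With fixed-size segments the leftover words form a short final segment, which the analyst may prefer to drop or merge; with equal parts every segment has a comparable length, which makes per-segment rates easier to compare.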

Lemmatization is an ungainly term for the operation by which the various inflected forms of words (plurals, conjugations, etc.) are reduced to a simpler form, for example, the infinitive for verbs.
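As an illustration, lemmatization can be sketched in Python as a table lookup that maps inflected forms to a base form. The table below is a tiny invented sample, and nothing here is claimed to reflect how PROTAN lemmatizes.

```python
# ASSUMPTION: a hand-made sample table; a real one would hold many thousands of forms.
LEMMAS = {
    "is": "be", "was": "be", "were": "be",
    "went": "go", "goes": "go",
    "kings": "king", "children": "child",
}


def lemmatize(word, table=LEMMAS):
    """Return the base form of `word`, or the lowercased word if it is not in the table."""
    key = word.lower()
    return table.get(key, key)


print([lemmatize(w) for w in "The kings were brave".split()])
```

Words absent from the table pass through unchanged (apart from lowercasing), so the quality of the result depends entirely on the coverage of the table.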

Dictionaries are systems of categories (great dimensions of the mind) that an analyst may be interested in. PROTAN is equipped with several such dictionaries in different languages. PROTAN is indeed moderately polyglot.
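Applying a dictionary to a segment amounts to counting, per category, the words that the dictionary assigns to it. The Python sketch below shows the principle with an invented two-category dictionary; the entries and category names are assumptions for the example and are not taken from any dictionary shipped with PROTAN.

```python
from collections import Counter

# ASSUMPTION: invented entries and category names, for illustration only.
DICTIONARY = {
    "joy": "pleasant", "love": "pleasant",
    "grief": "unpleasant", "death": "unpleasant",
}


def category_rates(words, dictionary=DICTIONARY):
    """Return, for one segment, each category's share of the segment's words."""
    total = len(words)
    if total == 0:
        return {}
    hits = Counter(dictionary[w] for w in words if w in dictionary)
    return {category: n / total for category, n in hits.items()}


segment = "love and grief and death".split()
print(category_rates(segment))
```

Expressing counts as a share of the segment's length keeps segments of different sizes comparable; computing these rates segment by segment gives the profile of a category across the text.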

Standard Operating Procedures

PROTAN is composed of 30 programs. These programs are modular, which means that each of them has a specific role in a chain. For instance, the program CRWSTRIP, which lemmatizes words, takes its input from the program CSCUT (the one that takes care of segmenting texts) and produces an output (a system file) to be processed by other programs.

All programs produce at least one output, i.e. a listing of results. Occasionally, programs produce several outputs: a listing of results and either a system file ready to be used by the next program or a numeric file to be processed by some statistical package, or both. In our analysis of Hamlet, the output from the comparison between text and dictionary is sent to the SAS statistical package for polynomial analysis. We did not equip PROTAN with statistical software.
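To show what a polynomial analysis of per-segment scores can look like, the following Python sketch fits a quadratic trend by ordinary least squares. It is a stand-in written for this illustration, not the SAS procedure used in the Hamlet study; the function name and the sample scores are invented, and plain least squares ignores any autocorrelation between successive segments.

```python
def quadratic_fit(ys):
    """Least-squares fit of y = a + b*x + c*x**2 for x = 0, 1, ..., len(ys) - 1.

    Solves the 3x3 normal equations with Cramer's rule; needs at least 3 points.
    """
    xs = range(len(ys))
    s = [sum(x ** k for x in xs) for k in range(5)]                   # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]   # sums of y * x^k
    m = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]

    def det3(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    d = det3(m)
    coefficients = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = t[r]      # swap column `col` for the right-hand side
        coefficients.append(det3(replaced) / d)
    return tuple(coefficients)           # (a, b, c)


# Invented per-segment scores that rise and then fall.
scores = [1, 2, 1, -2, -7]
a, b, c = quadratic_fit(scores)
print(c < 0)   # a negative quadratic coefficient corresponds to an inverted U
```

A negative coefficient on the squared term is what an inverted-U profile looks like in such a fit; a statistical package would additionally report whether that coefficient differs reliably from zero.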

A list of programs

Following is a list of programs that are currently part of PROTAN. These are the things that the system can do. Not all these programs are necessary to have a successful run. Many of these programs are for creating or editing dictionaries, or striplists, or for editing the text. For convenience, the list is alphabetical.

Platforms

A distinctive feature of PROTAN is its portability to several platforms: DOS, UNIX, and Macintosh.

There is no installation procedure; the user can install the 30 programs immediately and organize the inputs (texts, strip dictionaries, parameter files) and outputs (listings and punch files) as preferred. Punch files are formatted to be easily exported to most statistical packages.

Technical specifications

There are no minimal computer requirements, but with corpora over 100,000 words, PROTAN will run faster on powerful platforms such as UNIX. PROTAN is written in C. Each program has been tested in several studies that used PROTAN as a support. PROTAN has never been submitted for review in computer software magazines or scientific journals.

Further information

Further information about the PROTAN software, or assistance with it, may be obtained from Robert Hogenraad:

Office:
Dr. Robert Hogenraad
Psychology Department, Catholic University of Louvain
10 place du Cardinal Mercier
B-1348 Louvain-la-Neuve, Belgium
Ph.: ..32-(0)10-47 4411
Fax: ..32-(0)10-47 3774
E-mail: hogenraad@upso.ucl.ac.be

Private:
63 Avenue Constant Montald, B-1200 Brussels, Belgium
Ph. & Fax: ..32-(0)2-763 2012

Documentation and references

User's manual:

Hogenraad, R., Daubies, C., & Bestgen, Y. (1995). Une théorie et une méthode générale d'analyse textuelle assistée par ordinateur. Le système PROTAN (PROTocol ANalyzer) (Version March 2, 1995). Louvain-la-Neuve, Belgium: Psychology Department, Catholic University of Louvain. (In French).

Bibliographic references:

Anderson, C. W., & McMaster, G. E. (1986). Modeling emotional tone in stories using tension levels and categorical states. Computers and the Humanities, 20(1), 3-9.

Bestgen, Y. (1994). Can emotional valence in stories be determined from words? Cognition and Emotion, 8(1), 21-36.

Hogenraad, R. (1991). Retratos de Fernando Pessoa. Revista de Comunicação e Linguagens, 14, 91-110.

Hogenraad, R. (1994). Über den Versuch, das Leben der Wörter zu messen. Inhaltsanalytische Verfahren und Literatur. In A. Barsch, G. Rusch, & R. Viehoff (Eds.), Empirische Literaturwissenschaft in der Diskussion (pp. 306-323). Frankfurt am Main: Suhrkamp.

Hogenraad, R., & Bestgen, Y. (1989). On the thread of discourse: Homogeneity, trends, and rhythms in texts. Empirical Studies of the Arts, 7(1), 1-22.

Hogenraad, R., Bestgen, Y., & Durieux, J. F. (1992). Psychology as literature. Genetic, Social, and General Psychology Monographs, 118(4), 455-478.

Hogenraad, R., Bestgen, Y., & Nysten, J.-L. (1995). Terrorist rhetoric: Texture and architecture. In E. Nissan & K. M. Schmidt (Eds.), From information to knowledge. Conceptual and content analysis by computer (pp. 54-67). Oxford, England: Intellect.

Hogenraad, R., Boulard, R., & McKenzie, D. (1994). Les mots qui ont fait les relations industrielles. Québec: Presses de l'Université Laval.

Hogenraad, R., Boulard, R., & McKenzie, D. P. (in preparation). An assessment of the creativity of industrial relations journals: An integrative view. Journal of Organizational Behavior.

Hogenraad, R., Kaminski, D., & McKenzie, D. P. (1995). Trails of social science: The visibility of scientific change in criminological journals. Social Science Information, 34(4), 663-685.

Hogenraad, R., McKenzie, D. P., & Martindale, C. (1997). The enemy within: Autocorrelation bias in content analysis of narratives. Computers and the Humanities, 30(6), 433-439.

Hogenraad, R., McKenzie, D. P., Morval, J., & Ducharme, F. A. (1995). Paper trails of psychology: The words that made applied behavioral sciences. Journal of Social Behavior and Personality, 10(3), 491-516.

Iker, H. P., & Klein, R. H. (1974). WORDS: A computer system for the analysis of content. Behavior Research Methods & Instrumentation, 6(4), 430-438.

Weber, R. P. (1983). Measurement models for content analysis. Quality and Quantity, 17, 127-149.

Whissell, C., Fournier, M., Pelland, R., Weir, D., & Makarec, K. (1986). A dictionary of affect in language. IV. Reliability, validity, and applications. Perceptual and Motor Skills, 62, 875-888.


Contact for information: Robert Hogenraad (robert.hogenraad@psp.ucl.ac.be)