
Cybernetic methodology of content analysis. Content analysis as a research method. Content analysis - method description

Content analysis (from the English "contents") is a special, rather rigorous method of qualitative and quantitative analysis of the content of documents, used to identify or measure the social facts and trends reflected in these documents. Its distinguishing feature is that it studies documents in their social context.

Content analysis can be used as the main research method (for example, in researching the social orientation of a newspaper); parallel, i.e. in combination with other methods (for example, in the study of the effectiveness of the functioning of the mass media); auxiliary or control (for example, when classifying answers to open-ended questionnaires).

Not all documents can be the object of content analysis. It is necessary that the investigated content makes it possible to set an unambiguous rule for the reliable fixation of the required characteristics (the principle of formalization), as well as that the content elements of interest to the researcher occur with sufficient frequency (the principle of statistical significance). Most often, the objects of research through content analysis are messages from the press, radio, television, mass oral agitation and propaganda, minutes of meetings, letters, orders, instructions, etc., as well as data from free interviews and open-ended questionnaires.

There are three main areas of application of content analysis:

a) identifying what existed before the text and what was reflected in it in one way or another (the text as an indicator of certain aspects of the studied object - the surrounding reality, the author or the addressee);

b) determination of what exists only in the text as such (various characteristics of the form - language, structure and genre of the message, rhythm and tone of speech);

c) identifying what will exist after the text, i.e. after its perception by the addressee (assessment of various effects of exposure).

There are several stages in the development and practical application of content analysis. After the topic, tasks and hypotheses of the research have been formulated, the categories of analysis are determined, i.e. the most general, key concepts corresponding to the research objectives. The system of categories plays the role of the questions in a questionnaire and indicates which answers should be found in the text. In the practice of Soviet content-analytical research, a fairly stable system of categories developed over time, including sign, goals, values, theme, hero, author, genre, etc. Content analysis of mass media messages based on a paradigmatic approach is becoming more and more widespread; under this approach, the studied features of texts (the content of the problem, the reasons for its occurrence, the problem-forming subject, the degree of intensity of the problem, ways of solving it, etc.) are considered as a structure organized in a certain way. The categories of content analysis should be exhaustive (i.e. cover all parts of the content determined by the objectives of the study); mutually exclusive (the same parts should not belong to different categories); reliable (i.e. there should be no disagreement between encoders about which category a given part of the content belongs to); relevant (i.e. correspond to the task at hand and the content under study).

When choosing categories, it is necessary to avoid two extremes: choosing too many and fractional categories that almost repeat the text, and choosing too large categories, because this can lead to oversimplified, superficial analysis. Sometimes it is necessary to take into account the missing elements of the text, which can be significant.

After the categories are formulated, it is necessary to choose the appropriate unit of analysis - a linguistic unit of speech or an element of content, which serve as an indicator of the phenomena of interest to the researcher in the text. Complex types of content analysis usually operate not with one, but simultaneously with several units of analysis.

Units of analysis taken in isolation cannot always be interpreted correctly, so they are considered against the background of broader linguistic or content structures that indicate how the text is divided and within which the presence or absence of units of analysis is established. These broader structures are called contextual units. For example, for the unit of analysis "word", the contextual unit is "sentence".

Finally, it is necessary to establish a unit of counting - a quantitative measure of the relationship between textual and extra-textual phenomena. The most commonly used counting units are time and space (number of lines, area in square centimeters, minutes, broadcast time, etc.), the appearance of signs in the text, and the frequency of their appearance (intensity).

The choice of the sources to be subjected to content analysis is important. The sampling problem involves the choice of the source, the number of messages, the dates of the messages, and the content to be examined. All these parameters of the sample are determined by the objectives and scope of the study. Most often, content analysis is carried out on a one-year sample: if minutes of meetings are studied, 12 sets of minutes are enough (one per month); if media reports are studied, 12-16 issues of a newspaper or days of TV or radio broadcasting. Typically, a sample of mass media messages comprises 200-600 texts.

A prerequisite for content-analytical research is the development of a content analysis table - the main working document with which the research is carried out. The type of table is determined by the stage of the study. Thus, when developing the categorical apparatus, the analyst compiles a table that is a system of coordinated and subordinated categories of analysis. Such a table outwardly resembles a questionnaire: each category (question) presupposes a number of attributes (answers) by which the content of the text is quantified. The questionnaire table can be quite voluminous.

To register units of analysis, another table is compiled - a coding matrix:

Sign    Text
        1     2     3     n     Σn
A       +
B       +     +
C       +     +
...
n
Σn

If the sample size is large enough (over 100 units), then the encoder, as a rule, works with a notebook of matrix sheets. If the sample is relatively small (up to 100 units), then two-dimensional or even multivariate analysis can be carried out. In this case, each text must have its own coding matrix. However, this work is very laborious and painstaking, therefore, with large sample sizes, the comparison of features of interest to the researcher is carried out on a computer.
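The coding matrix can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `coding_matrix`, the sign labels and the sample data are made up for the example, and each text is represented simply by the set of signs the encoder registered in it.

```python
def coding_matrix(coded_texts, signs):
    """Build a presence matrix: one row per sign, one column per text."""
    matrix = {
        sign: [1 if sign in text_signs else 0 for text_signs in coded_texts]
        for sign in signs
    }
    # Row totals: in how many texts each sign was registered.
    row_totals = {sign: sum(row) for sign, row in matrix.items()}
    # Column totals: how many different signs were registered in each text.
    col_totals = [
        sum(matrix[sign][j] for sign in signs) for j in range(len(coded_texts))
    ]
    return matrix, row_totals, col_totals


# Hypothetical coding results for three texts.
coded_texts = [{"A", "B"}, {"B", "C"}, {"C"}]
matrix, row_totals, col_totals = coding_matrix(coded_texts, ["A", "B", "C"])

for sign, row in matrix.items():
    marks = " ".join("+" if cell else "." for cell in row)
    print(sign, marks, row_totals[sign])
```

The row and column totals correspond to the Σn margins of the matrix.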

A table may also be needed at the stage of quantitative data processing. For example, contingency analysis, developed by the American social psychologist Charles Osgood, uses the so-called contingency matrix:

Actual co-occurrence (below the diagonal) / Expected co-occurrence (above the diagonal)

        A       B       C       ...     n       Σn
A       -       0.15    0.02
B       0.05    -       0.06
C       0.08    0.12    -
...                             -
n                                       -
Σn                                              -

With the help of such a matrix, the degree of randomness of the co-occurrence of each classification unit with all the others is revealed. For example, if unit A occurs in 30% of the analyzed texts ($P_A = 0.3$) and unit B in 50% of the texts ($P_B = 0.5$), then the expected frequency of their joint occurrence is $P_{AB} = P_A \cdot P_B = 0.3 \cdot 0.5 = 0.15$. In reality, signs A and B occurred together in only 5% of the texts (AB = 0.05). By comparing the expected and actual co-occurrences of features, it is possible to determine which actual dependencies are not accidental (for example, the table above shows that the joint occurrence of units A and B is random, because the actual co-occurrence is lower than expected, while that of units B and C is not accidental, because the actual co-occurrence is higher than expected). The purposes of using this matrix can differ: to trace the randomness or non-randomness of the co-occurrence of features in order to test a hypothesis, to mark stable and unstable pair combinations of features, which may be significant for characterizing the activity of the sender of the information, etc.
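The comparison of expected and actual co-occurrence described above can be sketched in Python. This is a minimal illustration, not Osgood's own procedure: the function name `contingency` and the sample of 20 coded texts are assumptions, chosen so that the first unit occurs in 30% of the texts, the second in 50%, and both together in 5%.

```python
from itertools import combinations


def contingency(coded_texts, units):
    """Return {(u1, u2): (expected, actual)} co-occurrence shares for each pair."""
    total = len(coded_texts)
    # Share of texts in which each unit occurs.
    share = {u: sum(u in text for text in coded_texts) / total for u in units}
    result = {}
    for first, second in combinations(units, 2):
        expected = share[first] * share[second]
        actual = sum(first in t and second in t for t in coded_texts) / total
        result[(first, second)] = (expected, actual)
    return result


# Hypothetical sample: each text is the set of units registered in it.
coded_texts = [{"A", "B"}] + [{"A"}] * 5 + [{"B"}] * 9 + [set()] * 5

for pair, (expected, actual) in contingency(coded_texts, ["A", "B"]).items():
    print(pair, f"expected={expected:.2f}", f"actual={actual:.2f}")
```

The expected share is the product of the two individual shares, and the actual share is counted directly from the coded texts.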

An important condition of content analysis is the development of instructions for the encoder - a system of rules and explanations for the person who will collect the empirical information by coding (registering) the given units of analysis. The instructions set out the algorithm of the encoder's actions accurately and unambiguously, give operational definitions of the categories and units of analysis and the rules for coding them, provide specific examples from the texts under study, stipulate how to proceed in controversial cases, etc.

Counting procedures in quantitative content analysis are, in general, similar to the standard methods of classification by selected groupings, ranking, and measurement of association. There are also calculation procedures specific to content analysis, for example the formula for the Janis coefficient (c), designed to calculate the ratio of positive and negative (relative to the chosen position) assessments, judgments and arguments. A separate formula is used for the case when the number of positive ratings exceeds the number of negative ones.
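The formula itself is not reproduced in the text above. As a hedged reconstruction, not a quotation from the source, the form in which the Janis coefficient is usually cited is shown below. The symbols are assumptions introduced here: f is the number of positive assessments, n the number of negative ones, r the volume of text directly related to the problem studied, and t the total volume of the analyzed text.

```latex
c = \frac{f^{2} - f\,n}{r\,t} \quad (f > n),
\qquad
c = \frac{f\,n - n^{2}}{r\,t} \quad (f < n)
```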

© Sociology: encyclopedia. M., 2003

The term "content analysis" first came into use in the late 19th and early 20th centuries in American journalism (see the works of B. Matthew, A. Tenney, D. Speid, D. Whipkins). At the origins of the methodology of content analysis were the American sociologist H. Lasswell and the French journalist J. Kaiser.

Thus, in the early 60s H. Lasswell attempted a political science analysis of the media based on formal criteria. He introduced into scientific circulation an abstract unit, the "word". The aim of Lasswell's work was to obtain a properly sociological result from material atypical for sociology: the texts of printed publications. The researcher did a great deal of work, but since qualitative assessments were not adequately correlated with quantitative methods in Lasswell's approach, the results of his work were difficult to verify.

In the early 60s, J. Kaiser developed an original method of statistical analysis of periodicals. It was based on treating the text array as an information system. Kaiser thereby formulated the theoretical basis for the subsequent spread of sociological methods to the study of all narrative sources, including epigraphic and epistolary material. In his work, attention was focused on the external form of the organization of the material: its location, table of contents, design, etc. Kaiser developed a whole range of research procedures that ensure complete formalization both of a single newspaper issue and of a set of similar periodicals. In this way he formulated a system that makes it possible to record the development of trends in media publications.

The "Kaiser direction" of content analysis methodology was developed further in the works of E. Morin. Morin introduced into scientific circulation the term "unit of information" - a semantic block whose content answers the question "What is this about?" This made it possible to study any form of organization of textual material, at the terminological level as well as at the level of a phrase, paragraph, article, or even whole books. Thus, Morin abandoned the criterion of homogeneity previously used in the statistical processing of narratives, proposing instead the idea of "semantic groups", which were to be counted on a thematic basis. In addition, Morin developed the concept of the "tone" of the material, which was determined sociometrically: "positive", "negative" or "neutral" information.

An important contribution to the development of content analysis was made by Russian and Estonian sociologists, especially A.N. Alekseev, Yu. Vooglaid, P. Vihalemm, B.A. Grushin, M. Lauristin and others.

The method of content analysis is traditionally used to study the content of a text. The word "content" here refers to what a document contains. A document is understood not only as an official text (such as an instruction or a law), but as everything written or spoken, everything that has become a communication. Thus, according to V.A. Yadov, any information recorded in printed or handwritten text, on magnetic tape, or on photographic or film stock is called documentary.

Books, newspaper or magazine articles, advertisements, television appearances, films and videos, photographs, slogans, labels, drawings, other works of art, as well as, of course, official documents are subject to content analysis. Currently, due to the active use of electronic means of communication, electronic documents are also analyzed.

There are various definitions of content analysis; some of them differ in how they treat the quantitative and qualitative aspects of the method. Broadly, there are two points of view on content analysis:

Content analysis is an independent method, distinct from the ordinary analysis of the content of documents.

The fundamental difference between these methods of analysis lies in the pronounced rigor, formalization and systematization of content analysis. It is aimed at developing a quantitative description of the semantic and symbolic content of the document, at fixing its objective features and counting them.

L.N. Fedotova highlights the following characteristics of the method: complexity, thoroughness, meticulousness, and labor intensity.

A supporter of this point of view, V.A. Yadov defines content analysis as the translation of mass textual (or tape-recorded) information into quantitative indicators, with subsequent statistical processing.

The second view takes into account both types of analysis.

Content analysis includes both quantitative and qualitative analysis of the text.

The first complements the second, and their combination deepens the understanding of the meaning of any text. Content analysis allows you to discover in a document that which escapes the superficial view of its traditional study, but which has important social meaning.

So, from the point of view of the kind of analysis carried out, there are two types: quantitative and qualitative content analysis. In quantitative content analysis, the frequency with which particular units appear in the text is analyzed, be they mentions of topics or company names. The definitions of qualitative content analysis are rather vague; they most often say that in qualitative analysis conclusions are drawn from the presence of a fact in the text. In essence, this is interpretation of the content of the text, which is common in history and philology. For this reason, it is more accurate to call qualitative content analysis interpretive.

In the Western research tradition, content analysis is unambiguously viewed as a quantitative method. There is no doubt that quantitative content analysis has a wider scope and reliability than qualitative one. One of the most significant reasons is the objective nature of the quantitative indicators, while the interpretation is almost always subjective. However, the interpretation of the results of quantitative analysis also has subjective elements.

According to a number of sociologists (Markoff, Shapiro, Weitman, etc.), content analysis could be called “textual encoding”, since it involves obtaining quantitative information about the content of a document based on its encoding.

Thus, quantitative content analysis is primarily interested in the frequency with which certain characteristics (variables) of the content occur in the text.

Qualitative content analysis allows conclusions to be drawn even from the mere presence or absence of a particular content characteristic.

To the question: "in what cases should one not resort to quantitative analysis?", V.A. Yadov answers: if we are dealing with unique documents, where the main goal of the study is a comprehensive meaningful interpretation of the material.

Qualitative data differ from quantitative data in that the content of the former carries a meaning that directly characterizes their carrier, while quantitative data indicate the scale, volume and intensity of the characteristics of the phenomenon under study. Qualitative data make it possible to reveal the meaning of a social phenomenon; quantitative data show how often it happens or how intensely it is represented in social reality. Qualitative data indicate the subject of research; quantitative data show how strongly it is manifested in the object. Continuing this line of reasoning, we can conclude that some data are oriented more toward forming a judgment about a social phenomenon, and others toward assessing the significance of that judgment or testing it. These differences in the nature of the two types of data led to so-called qualitative research (research based on the collection and analysis of qualitative data) being associated more with the stage of generating or constructing a theory, and quantitative research with its verification.

The fact that qualitative methods are assigned a secondary role significantly narrows their possibilities, according to B. Glaser and A. Strauss, who put forward "grounded theory". The authors place their qualitative research method, "grounded theory", between the content analysis approach and an approach that offers only some tentative ideas and hypotheses. Classical content analysis proposes the following model: first a coding scheme is set, and then the data are systematically collected, evaluated and analyzed according to predetermined, unchanging scales that are uniform for all the data and make it possible to give qualitative (verbal) data a quantifiable form.

Glaser and Strauss' method involves constant comparison and regrouping of data. The goal of the constant comparison method, which combines coding and analysis, is to generate theory more systematically than the second approach allows, using expanded coding and analytic procedures.

The comparative method is used at every stage of the analytical process of building a grounded theory. It includes the following procedures: coding, identification of key categories, theoretical selection and formation of a theoretical sample, theoretical saturation and integration of the theory.

Stages of content analysis

Determination of tasks, theoretical basis and object of research, development of a categorical apparatus, a set of relevant qualitative and quantitative units.

Drawing up a coding instruction.

Pilot coding of texts.

The encoding of the entire array of the studied texts.

Statistical processing of the obtained quantitative data.

Interpretation of the obtained data based on the objectives and the theoretical context of the research.

Content analysis consists of a number of stages: selection of materials, selection of a unit of analysis, counting of units and, finally, interpretation of the results. From a purely methodological point of view, the selection of materials is preliminary in nature. After the topic is determined, a potential range of sources in which the information of interest may be found is identified. From these sources, those containing information significant for the research are then selected. The selected materials are analyzed further. The classical descriptions of the method stipulate that, given a large volume of more or less homogeneous sources, it is permissible to analyze not the entire array of information but only part of it.

Describing the content analysis procedure, several stages can be distinguished, namely:

1st stage of research: Determination of tasks, theoretical basis and object of research, development of a categorical apparatus, a set of relevant qualitative and quantitative units.

This stage is directly related to the preparation of the research program. It has the character of a qualitative analysis, which prepares the translation of the semantic content of a text into a digital expression for its subsequent quantitative analysis. For these purposes, on the basis of the tasks and theoretical context, the object of research is selected and specific units of analysis are determined.

2nd stage: Drawing up a coding instruction.

At this stage, the categories and subcategories of content analysis are correlated with specific content elements of the text, i.e. indicators of the selected research categories are sought in the text. Either a dictionary of category indicators is compiled, or a detailed description of the categories in terms of the studied texts is given. All categories and subcategories of the content-analytical research are coded, i.e. they are given certain numeric or alphabetic designations, which together form the code of the study. All this is included in the coding instruction. It also includes the designation of the sign of the information, usually defined as a "positive", "negative" or "neutral" attitude, coded as +, -, 0 respectively.

The compilation of a coding instruction is very important, since, in essence, the main provisions of the research methodology find their concrete expression in it. In addition to the definitions of categories, subcategories and other units of analysis, the coding instruction includes coding rules, stipulates how to handle controversial cases, etc. When a specific code is compiled, each category is given a subcategory "other", which covers those indicators of the category that are not included in the selected subcategories but are nevertheless its referents and therefore should be recorded in the frequency (and volume) of its mentions. The subcategory "other" is needed because it is impossible, and often unnecessary, to provide for all subcategories in advance.
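A dictionary of category indicators of the kind described at this stage can be sketched as a small data structure. Everything in the sketch is hypothetical: the category "theme", the codes T1 and T2, the indicator words and the function name `find_codes` are invented for illustration, and real coding instructions rely on the encoder's judgment rather than on simple word matching.

```python
import re

# Hypothetical fragment of a coding instruction:
# category -> subcategory code -> indicator words.
CODEBOOK = {
    "theme": {
        "T1": ["economy", "prices"],
        "T2": ["education", "school"],
    },
}

# Sign of the information, coded as in the text: +, -, 0.
SIGN_CODES = {"positive": "+", "negative": "-", "neutral": "0"}


def find_codes(text, codebook):
    """Return the subcategory codes whose indicator words occur in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    codes = []
    for subcategories in codebook.values():
        for code, indicators in subcategories.items():
            if words & set(indicators):
                codes.append(code)
    return codes


print(find_codes("Prices rose while school budgets shrank.", CODEBOOK))
```

Fragments whose indicators match none of the listed subcategories would be the ones an encoder assigns to the subcategory "other".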

3rd stage: Pilot coding of texts.

At this stage, part of the text array under investigation is coded in order to test the methodology described in the coding instruction. Text coding is the procedure of directly translating qualitative, semantic units (categories, subcategories), by finding their indicators in the text, into quantitative units, i.e. translating texts into symbols - codes (numbers or letters that designate certain subcategories in the coding instruction). Such pilot coding makes it possible to check the reliability of the technique, i.e. to test it for validity (compliance with the objectives and theoretical concepts of the research) and stability (reproducibility of results).

The completeness of the set of selected semantic units is substantiated as follows: all semantic units are selected from the first analyzed text; from the second text, the same units plus those not encountered before; from the third document, the units encountered in the two previous ones plus additional ones; and so on. After 3-5 consecutive texts have been studied in which there is not a single new unit that was not recorded in the previous documents, it can be assumed that the "field" of semantic units in the material under study has been exhausted.
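The exhaustion check described above can be sketched in Python. This is a minimal sketch under stated assumptions: the function name `saturation_point`, the `window` parameter (the number of consecutive texts that must add nothing new) and the sample data are introduced here for illustration.

```python
def saturation_point(texts_units, window=3):
    """Find the text after which `window` consecutive texts added no new units.

    Returns (index, seen): the 1-based index of the text that completes the run
    (or None if the run never occurs) and the set of units recorded so far.
    """
    seen = set()
    streak = 0
    for index, units in enumerate(texts_units, start=1):
        new_units = set(units) - seen
        seen |= new_units
        # A text with new units resets the run; otherwise the run grows.
        streak = 0 if new_units else streak + 1
        if streak == window:
            return index, seen
    return None, seen


# Hypothetical semantic units found in six consecutive texts.
texts_units = [{"a", "b"}, {"b", "c"}, {"a"}, {"c"}, {"b"}, {"d"}]
index, seen = saturation_point(texts_units, window=3)
print(index, sorted(seen))
```

With `window=3` the run of texts without new units is completed at the fifth text, so the sixth text is never examined; a larger window makes the check stricter.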

The stability of the data is determined by re-coding the same documents by the same encoder ("stability over time") or by different encoders using a single instruction ("stability among analysts").
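The text does not name a specific measure of stability. One simple option, shown here purely as an assumption, is the share of units that two codings assign to the same code; the function name `percent_agreement` and the sample codings are hypothetical.

```python
def percent_agreement(first_coding, second_coding):
    """Share of units that received the same code in both codings."""
    if len(first_coding) != len(second_coding):
        raise ValueError("both codings must cover the same units")
    if not first_coding:
        raise ValueError("codings must not be empty")
    matches = sum(a == b for a, b in zip(first_coding, second_coding))
    return matches / len(first_coding)


# Hypothetical codes assigned to the same five text fragments
# by two encoders (or by one encoder at two points in time).
first = ["T1", "T2", "T1", "other", "T2"]
second = ["T1", "T2", "T2", "other", "T2"]
print(percent_agreement(first, second))
```

Simple agreement does not correct for matches that occur by chance, so it is best read as a rough first check.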

4th stage: The encoding of the entire array of the studied texts.

The process of quantification is in progress, i.e. translation into digital expression of the entire aggregate of the studied texts. Registration of the frequency (and volume) of mentioning categories and subcategories of content analysis can be done either in pre-prepared tables, or on separate cards and punched cards.

5th stage: Statistical processing of the obtained quantitative data.

This processing is carried out manually or on a computer. Quite often both methods are used in combination. There are special computer programs that help to carry out the analysis more quickly, such as Content Analysis 1.6, WINMAX, ATLAS.ti, NUDIST, as well as AQUAD, CAQDAS, ETHNOGRAPH.


Statistical processing of digital material obtained in the encoding process does not actually differ in its methods from statistical processing of data obtained in other types of socio-psychological research. Percentage and frequency distributions, various correlation coefficients, etc. are commonly used. At the same time, special methods of quantitative data processing are also used (see the formula for the "share" of semantic categories in the total volume of the text, proposed by A.N. Alekseev).

6th stage: Interpretation of the obtained data based on the objectives and the theoretical context of the research.

At this, the last stage of the study, as well as at the first, associated with the preparation of the program, the qualitative aspect of the content analysis is especially vivid, in contrast to the quantitative aspect, which prevails at the intermediate stages. For an adequate interpretation of the results and their correlation with the data obtained using other methods, it is especially important to take into account the broader theoretical and social context.

The formalization, systematization and rigor of content analysis are manifested in the following. Before directly analyzing the text of a document, the researcher determines the categories of analysis, i.e. the key concepts (semantic units) present in the text that correspond to the definitions and empirical indicators fixed in the research program. In doing so, it is desirable to avoid extremes. If overly general (abstract) concepts are adopted as categories of analysis, the analysis will be superficial and will not penetrate the content of the text. If the categories of analysis are extremely specific, there will be too many of them, which will lead not to an analysis of the text but to its abbreviated repetition (a synopsis). It is necessary to find a middle ground and try to ensure that the categories of analysis are: a) appropriate, i.e. correspond to the research problems to be solved; b) exhaustive, i.e. fully reflect the meaning of the basic concepts of the study; c) mutually exclusive (the same content should not be included in different categories in the same volume); d) reliable, i.e. such as not to cause disagreement among researchers as to what should be attributed to one or another category in the process of analyzing a document.

Units of content analysis

After the system of analysis categories has been defined, the corresponding unit of text analysis is selected.

N.N. Bogomolova and T.G. Stefanenko propose dividing the units of content analysis into two large groups:

qualitative

quantitative.

Qualitative units of content analysis answer the question of WHAT should be counted in the text, and quantitative units answer the question HOW should be counted.

Qualitative units are taken to include categories and their referents in the text (indicators). It should be noted that various terms are used to denote the units of content analysis; only the main unit, the category, is recognized by all authors. This considerable inconsistency in terminology makes it harder, to a certain extent, to understand the procedure of the method.

Categories can be subdivided into smaller qualitative units - subcategories. Category indicators are those elements of the text, those units of content, that serve as referents, qualitative features of the corresponding categories and subcategories. Depending on the specifics of the study, category indicators can be expressed in the form of individual words, phrases, judgments, topics, etc.

The following can be taken as a unit of analysis: a) a word; b) a sentence; c) a theme; d) an idea; e) an author; f) a character; g) a social situation; h) a part of the text united by something that corresponds to the meaning of the category of analysis.

When content analysis is the only method of collecting information, researchers operate not with one but with several units of analysis at once.

When the simplest unit of analysis, the word, is used, it is very easy to lose the context of the mention. Direct counting of the number of mentions gives the so-called "simple frequencies". However, such an indicator is not suitable for comparing, for example, the number of mentions across texts, because it is not standardized. It becomes necessary to use "relative frequencies", i.e. the number of mentions per unit of text (total number of words in the publications, per thousand words, number of sentences, paragraphs, publications, etc.).
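The difference between simple and relative frequencies can be shown with a short Python sketch. The function name `frequencies`, the choice of 1000 words as the unit of text and the two artificial texts are assumptions made for the example.

```python
import re


def frequencies(text, indicator, per=1000):
    """Return (simple, relative): raw mentions and mentions per `per` words."""
    words = re.findall(r"\w+", text.lower())
    simple = words.count(indicator.lower())
    relative = simple * per / len(words) if words else 0.0
    return simple, relative


# Two artificial texts: the longer one mentions the word more often in
# absolute terms, but less often relative to its length.
short_text = "reform " * 2 + "word " * 98
long_text = "reform " * 4 + "word " * 996

print(frequencies(short_text, "reform"))
print(frequencies(long_text, "reform"))
```

By simple frequency the longer text leads (4 mentions against 2), while per thousand words the shorter text mentions the word five times as often.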

Quantitative units of content analysis are units of count and units of context.

Context units designate the segment of the text within which the frequency of mention of the corresponding categories and subcategories is determined. A unit of context can be a sentence, an article, an answer to a questionnaire, an interview, etc. Then the unit of counting is set, i.e. a quantitative measure of the unit of analysis that makes it possible to register the frequency (regularity) with which an attribute of an analysis category appears in the text. Counting units can be the number of certain words or their combinations, the number of lines, printed characters, pages, paragraphs or author's sheets, the area of the text expressed in physical spatial quantities, and much more.

N.N. Bogomolova and T.G. Stefanenko distinguish two types of counting of the frequency of references to categories and subcategories in quantification: a) continuous, terminological; b) segmental, typological.

In a continuous count, all occurrences of indicators of a given category or subcategory are recorded and then counted. In a segmental, thematic counting of category references, only the first occurrence of a given category in a context unit is recorded, and repeated references to this category in this context unit are not taken into account.
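The two ways of counting can be compared in a short Python sketch. The function name `count_mentions`, the indicator words and the sample sentences are hypothetical; the sentence is taken here as the context unit.

```python
import re


def count_mentions(context_units, indicators):
    """Return (continuous, segmental) counts of a category's indicators.

    continuous: every occurrence of an indicator is counted;
    segmental: each context unit counts at most once.
    """
    indicator_set = {word.lower() for word in indicators}
    continuous = 0
    segmental = 0
    for unit in context_units:
        words = re.findall(r"\w+", unit.lower())
        hits = sum(1 for word in words if word in indicator_set)
        continuous += hits
        segmental += 1 if hits else 0
    return continuous, segmental


# Hypothetical context units (sentences) and category indicators.
sentences = [
    "Prices rose and prices will rise again.",
    "The school opened.",
    "Inflation pushed prices up.",
]
print(count_mentions(sentences, ["prices", "inflation"]))
```

The continuous count registers all four mentions, while the segmental count registers only the two sentences in which the category appears at all.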

The unit of counting can also be volume - the physical extent or area of the text filled with semantic units. The volume of references to categories can be measured in various ways: by counting the number of lines, printed characters, square centimeters of the area devoted to a given category, etc.

The coding system should be based on at least one (or several) of the following four characteristics of the text content: frequency, directionality, intensity and space. As noted above, the frequency and volume of occupied space are most often measured. In a research project based on content analysis, a researcher can measure one or all four characteristics. Let us explain what each of them is.

Frequency. It's just fixing and counting whether something is happening or not, and if so, how often. For example, how many older people appear on television in one week? What is their share among all the characters? Or what is the proportion of these programs among the rest?

Directionality. We are talking about indicating the direction of messages within the content of a certain continuum (their positive or negative, supporting or refuting nature). For example, a researcher can develop a list of ways to show the situations in which older persons operate. These methods can be positive (for example, a friendly, wise, balanced person), or negative (for example, obscene, dumb, narcissistic).

Intensity. It is the strength or power of a message in a given direction. For example, the negative characteristic of forgetfulness can be mitigated (forgot to take the keys when leaving home; did not immediately remember the name of a person whom he had not seen for several years) or exaggerated (does not remember his name, does not recognize his children).

Space. The researcher can fix the size of the message or quantify the space it occupies. Written space is measured by counting words, sentences, paragraphs, or the space given to a message on a page (for example, in square inches or centimeters). To measure video and audio texts, you can use quantitative characteristics of time. For example, a character may be present for a few seconds or appear periodically in each scene of a two-hour program.
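As a rough sketch, one coded mention can carry all four characteristics at once. The field names, the three-point intensity scale and the sample data below are assumptions made for illustration; space is measured here as screen time in seconds.

```python
from dataclasses import dataclass

@dataclass
class Mention:
    category: str    # what is being coded, e.g. "older person"
    direction: int   # +1 positive, 0 neutral, -1 negative
    intensity: int   # hypothetical scale: 1 = mild ... 3 = strong
    seconds: float   # space, measured as screen time

def summarize(mentions, category):
    """Aggregate frequency, direction, intensity and space for one category."""
    selected = [m for m in mentions if m.category == category]
    frequency = len(selected)
    if frequency == 0:
        return {"frequency": 0, "net_direction": 0,
                "mean_intensity": 0.0, "space_seconds": 0.0}
    return {
        "frequency": frequency,
        "net_direction": sum(m.direction for m in selected),
        "mean_intensity": sum(m.intensity for m in selected) / frequency,
        "space_seconds": sum(m.seconds for m in selected),
    }

coded = [
    Mention("older person", +1, 1, 12.0),
    Mention("older person", -1, 3, 40.0),
    Mention("older person", -1, 1, 8.0),
    Mention("teenager", +1, 2, 30.0),
]
summary = summarize(coded, "older person")
print(summary)
```

A study that measures only one or two of the characteristics would simply leave the other fields out of the record.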

In general terms, the calculation procedures for content analysis are similar to the standard methods of classification by selected grouping, ranking and scale measurement. To calculate the results of content analysis, specially developed formulas are also used.

A.N. Alekseev proposed a formula for assessing the "specific weight" of semantic categories in the total volume of the text, which indicates how intensely a given topic is presented in the text. The formula relates the following quantities:

Ucs - the "specific weight" of a given semantic unit;

Kgl - the number of cases in which the semantic unit turned out to be the main one;

KW - the number of cases in which the same unit turned out to be of secondary importance;

E - the total number of analyzed texts (documents).

A special method developed for the needs of content analysis is C. Osgood's method of analyzing the dependence of elements for calculating the joint occurrence of various elements in a text. The procedure of this technique is that after calculating the joint occurrence of units of analysis, a square matrix of possible and actual joint occurrences of these units in the text is calculated.
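The joint-occurrence calculation can be sketched as follows. This is not Osgood's procedure as published: it assumes that the "possible" joint occurrence is the value expected by chance if two units appeared independently, and it stores only one half of the square matrix as pairs. The unit names and texts are hypothetical.

```python
from itertools import combinations

def cooccurrence_tables(units_per_text):
    """units_per_text: list of sets, the units of analysis found in each text."""
    n_texts = len(units_per_text)
    units = sorted(set().union(*units_per_text))
    # Share of texts in which each unit appears.
    share = {u: sum(u in found for found in units_per_text) / n_texts
             for u in units}
    actual, expected = {}, {}
    for a, b in combinations(units, 2):
        # Actual share of texts that contain both units.
        actual[(a, b)] = sum(a in found and b in found
                             for found in units_per_text) / n_texts
        # Assumption: chance expectation under independent appearance.
        expected[(a, b)] = share[a] * share[b]
    return units, actual, expected

texts = [
    {"youth", "success"},
    {"youth", "success"},
    {"youth", "danger"},
    {"danger"},
]
units, actual, expected = cooccurrence_tables(texts)
for pair in actual:
    print(pair, "actual:", actual[pair], "expected:", expected[pair])
```

Pairs whose actual value is clearly above the expected one ("success" and "youth" in this sample) are candidates for an association between the units; pairs clearly below it suggest that the units avoid each other.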

Introduction

In the 21st century, people are flooded with information: the number of television programs has grown, and their format has changed. The finished product, broadcast on air, influences the audience and shapes its social attitudes and opinions. It is therefore important to study how the "screen picture" relates to real life, that is, to what extent a given program (project) corresponds to what is happening in society, so that the TV content can subsequently be changed and adjusted.

The object of the research is the study of the method of content analysis, the subject is the study of its characteristics and application.

The purpose of the course work is to define the method of "content analysis", to study its structure, to identify its features and the ways it is applied in the study of a TV program, and to apply it in practice.

To achieve this goal, it is necessary to solve the following tasks:

  1. Reveal the essence of the concept of "content analysis".
  2. Consider the components of the "content analysis" method and identify its main functions and application algorithm.
  3. Apply the method of "content analysis" (using the TV show "Battle of psychics" as an example).

The methodological basis of the study was the scientific works of domestic and foreign scientists in the field of sociology, political science and communication theory. As research methods, the work used: the method of analysis and synthesis, observation, statistical and mathematical methods.

The concept of "content analysis"

In sociology, four main methods are used to collect primary data: surveys (questionnaires and interviews), document analysis (qualitative and quantitative "content analysis"), observation (non-participant and participant), and experiment (controlled and uncontrolled). Despite the functional differences between these methods, they share a common internal structure, which can be divided into three groups of elements: normative, instrumental and procedural.

In this chapter, we will look at what a content analysis method is. The emergence of the term "content analysis", which denoted statistically accurate measurements of the content of mass media, refers to the first attempts of this kind of research in American journalism in the late 19th and early 20th centuries.

In the early works of researchers of the American press, qualitative analysis and interpretation of content prevailed. They therefore paid special attention to classifying press materials by form of presentation and, mainly, by topic, and then compared the volume of materials in each category. Specialists in American journalism worked not only on various classifications but also on the general theoretical foundations of the analysis procedure itself, yet their categories were too broad. The full-scale use of content analysis as an objective and systematic quantitative description of the explicit content of texts required greater accuracy. Harold Lasswell was the first to solve this problem: he identified the repetition of individual parts and elements in texts on the basis of rigorous mathematical calculations.

Lasswell paid most attention to the frequency with which certain "symbols" are used: the more often a word occurs, the more significant the information associated with it. He took into account only the "dictionary meaning" of the word, which made the result strict (objective). Treating purely quantitative characteristics as the main criterion, however, limited the method, since it did not take into account rare topics or "symbols". Content analysis in the form given by Lasswell has become a common method for studying the newspaper and magazine press.

An important contribution to the development of content analysis procedures was made by both Russian and Estonian sociologists, especially A.N. Alekseev, Yu. Vooglaid, P. Vihalemm, B.A. Grushin, T.M. Dridze, M. Lauristin.

The method of content analysis seems simple, but the real problem for a researcher is in the selection of semantic units that are significant specifically for the studied content area, and in the correct categorization of the selected texts. In order for the selection of semantic units (categories) to be "correct", and in order for the categorization to be correct (bringing text units under one of the available categories), both expert evidence and a more or less deep preliminary substantive study of the investigated area are needed. ...

B. Berelson characterizes this method as "a research method by which an objective, systematic and quantitative analysis of the explicit content of the text is achieved." Objectivity means that the categories used in the analysis should be defined so precisely that different people applying them to the same text obtain the same result. It also means that all terms and categories containing an explicit evaluative element must be excluded, since they are highly subjective and their meaning changes with the situation and the times. Systematicity means that the text, or the part of it chosen for analysis, should be selected on a formal basis, without regard to the personal interest and predisposition of the researcher. The results of the analysis should be expressed in mathematical form.

The famous sociologist V.A. Yadov defines content analysis as “a translation into quantitative indicators of mass textual information with its subsequent statistical processing.” M.K. Gorshkov writes that “the desire to avoid subjectivity as far as possible, the need for sociological study and generalization of a large amount of information, and the orientation towards the use of modern computer technology in processing the content of texts led to the formation of a method of formalized, qualitative and quantitative study of documents: content analysis. In this method, the content of a text is defined as the set of information and assessments it contains, united into a whole by a single concept or design. Formalized analysis of documents deals with text, but is focused primarily on the study of the underlying reality.”

J. B. Manheim and R. C. Rich define content analysis as “a formalized method that is effective when it is necessary to ensure high accuracy of indicators and to explore extensive unsystematic material. This method is used when there is an opportunity to examine a material source of information (a newspaper, magazine, book, audio or video recording, minutes, the transcript of a meeting, an advertising poster), and it involves the systematic processing, evaluation and interpretation of the form and content.”

V.I. Dobrenkov and A.I. Kravchenko write that “content analysis is a quantitative analysis of texts and text arrays with the aim of subsequent meaningful interpretation of the identified numerical patterns. This type of non-survey research is also called document analysis. Documents (texts) in content analysis include books, book chapters, essays, interviews, discussions, newspaper headlines and articles, historical documents, diary entries, speeches and advertising texts.”

As a sociological method, content analysis is not used by itself, but as part of a large research project, for which a scientific program has been drawn up, where the goals and objectives, the problem and the object, the theoretical model and the object of research, are put forward. Content analysis allows you to discover in a document that which escapes the superficial glance during its traditional study. It allows you to fit the content of a document into a social context, to comprehend it both as a manifestation and as an assessment of social life.

The advantages of content analysis include:

The ability to accurately record externally indistinguishable indicators in bulk empirical data;

Ability to identify hidden trends and patterns;

The possibility of analyzing events and situations after the fact;

Relative objectivity of procedures and reliability of results;

The absence of manifestations of the effect of the influence of the researcher on the behavior of the subjects.

Along with this, content analysis has some limitations:

The nature of information is largely determined by the intentions of its author and the specifics of the forms of presentation. The researcher may therefore mistake fiction for fact, or miss significant data because they are insufficiently expressed in the processed material;

Distortions of information can also arise through the fault of the researcher, who is unable, for example, to adequately highlight the categories of analysis or take into account all the available options for their verbal expression.

In the terminological dictionary of television, “content analysis of TV programs is a method for studying the topics and content of TV programs. Its task is to reveal how reality is reflected on the screen and to what extent the models created by television correspond to what is happening in society. The system of categories: external signs of the shown reality; contextual signs of the shown reality; character, subject of the image. As a result, the processing system yields data of different kinds: operational - a description of the content of programs of a certain period, for example, one week; comparative - a description of the content of one period that compares the content of TV programs with newspaper articles and with radio programs. Ultimately, content analysis helps to improve broadcasting efficiency.”

In content analysis of TV programs, it is important to establish how the program uses the means of forming and disseminating images, as well as the means of distracting attention by creating new informational pretexts. These include: giving an event unrelated to the main event of the TV story a sensational, emotional coloring; changing the tone of the text or context through the "last word" of the presenter (reporter); creating a double standard or dosing information by omitting or excluding it; and giving an event a positive (negative) coloring through a positive (negative) plot line and a positive (negative) ending (the "sweet" or "poisonous sandwich" method). Another means is inflating the details and complicating the information (when the point of view or position of the monitored object is set out as academically and in as much detail as possible, using complex scientific terms, which makes that point of view or position incomprehensible). Researchers pay attention both to the means of manipulating the semantic structure of the utterance (the selection of words that indirectly evoke one emotion or another) and to the means of manipulating color and light.

American sociologist B. Turner speaks about the advantages of content analysis, characterizing it as an unobtrusive method (there is no interaction with what is being studied that could distort the results) and an indirect one (conclusions are based on what is not directly observed), which gives an idea of objects that the researcher does not observe directly.

Tasks, functions and basic procedures of content analysis.

Estonian sociologist M. Lauristin groups the tasks, subject and object of content analysis in the study of mass communications into four aspects:

The reflection of reality;

The implementation of the goals of the communicator and of the social institution that he represents;

The needs of the mass communication audience that it satisfies;

The interaction between the communicator and the audience.

A system of indicators is then developed for each of these aspects. For example, for the first aspect (reflection of reality), the tasks are to reconstruct events and phenomena and to establish the patterns by which the means of mass communication reflect reality. The object of analysis here is the content of messages, their subject matter and semantic meanings, and the subject is the picture of the world presented by the means of mass communication. In the last aspect (interaction), the tasks are to predict the effectiveness of the information impact, its social effect, and the communication relations between various audience groups. The object of analysis is the language and structure of the text (what is communicated) and the characteristics of the source of the message and of its addressee.

The subject of content analysis can be copies of books, posters or leaflets, newspaper issues, films, public speeches, television and radio broadcasts, public and personal documents, journalistic interviews, and answers to open-ended questionnaires.

By function, content analysis is divided into:

Exploratory (search), aimed at testing the hypothesis put forward and identifying unknown trends;

Control, associated with a more accurate definition of content that is already (more or less) known.

By nature, it is divided into:

Directed, when the researcher knows exactly what to measure;

Undirected, when the researcher acts intuitively, without systematizing the object of research in advance.

The content analysis algorithm also includes determining the modality and tone of the text. The modality of the text is the expression in the text of the author's attitude to what is communicated: his concept, point of view, position and value orientations, formulated in order to communicate them to the reader. Positive, neutral and negative modality are distinguished.

Tonality is a category that reflects the psychological attitude of the author and belongs to the conceptual field of subjective modality. Materials, parts (blocks) of newspaper materials, plots and parts of plots are classified as having neutral, positive or negative tonality. Their tonality is determined by:

The presence of value judgments and evaluative vocabulary (epithets, derogatory, diminutive nouns);

By intonation (in writing, conveyed by punctuation marks, especially exclamation and question marks);

The use of metaphors, comparisons, proverbs, symbols, describing a person, object or phenomenon;

According to the wording of the main message (titles);

By context;

According to the correspondence of the verbal text and the video sequence (photos, illustrations);

According to the correspondence between synchron and video sequence;

According to the layout of part (blocks) of the plot, parts (alternation) of their tonality;

By direct or indirect assessment of the environment of the subjects of content analysis;

By direct or indirect assessment of the subjects of content analysis by other subjects of content analysis;

By direct or indirect assessment of the subjects of content analysis by journalists;

By direct or indirect assessment of the subjects of content analysis by other actors, attested in the plot.

Conducting content analysis requires the preliminary development of a number of research tools. According to S.I. Grigorieva and Yu. There should be five of them:

Content analysis classifier;

Analysis results protocol (content analysis form);

Registration card or code matrix;

Instructions for the researcher directly involved in the registration and coding of account units;

Directory (list) of analyzed documents.

The authors call a classifier of content analysis a general table that brings together all categories (and subcategories) of analysis and units of analysis. Its main purpose is to very clearly record the units in which each category used in the study is expressed. The classifier can be likened to a sociological questionnaire, where the categories of analysis play the role of questions, and the units of analysis play the role of answers.

The protocol (form) of the content analysis contains information about the document (its author, publication time, volume) and the results of its analysis (the number of times particular units of analysis are used in it and the resulting conclusions regarding the categories of analysis).

The protocols are filled in in coded form, so that all the information about the document fits compactly and the analysis results can be compared conveniently.

The registration card is a coding matrix in which the number of counting units characterizing the units of analysis is recorded. The content analysis protocol for each specific document is filled in by totaling the data of all registration cards related to that document.
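How the classifier, the registration cards and the protocol fit together can be shown with a minimal sketch. All category names, units and sample fragments are hypothetical, one registration card is assumed per text fragment, and matching is plain substring search, so "crime" would also match "crimes".

```python
from collections import Counter

# Classifier: categories of analysis -> units of analysis (hypothetical names).
classifier = {
    "economy": ["privatization", "financial system"],
    "law": ["human rights", "crime"],
}

def fill_registration_card(fragment):
    """Tally how often each unit of analysis occurs in one text fragment."""
    text = fragment.lower()
    card = Counter()
    for units in classifier.values():
        for unit in units:
            card[unit] += text.count(unit)
    return card

def build_protocol(author, date, fragments):
    """Total all registration cards of one document and roll up by category."""
    totals = Counter()
    for fragment in fragments:
        totals.update(fill_registration_card(fragment))
    by_category = {category: sum(totals[u] for u in units)
                   for category, units in classifier.items()}
    return {"author": author, "date": date,
            "units": dict(totals), "categories": by_category}

fragments = [
    "Privatization of the financial system continues.",
    "Human rights groups say crime is falling; privatization is blamed.",
]
protocol = build_protocol("Hypothetical author", "2020-01-15", fragments)
print(protocol)
```

The protocol keeps the document's identifying data next to the counts, which is what makes protocols of different documents comparable.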

The stages of developing and conducting content analysis begin with formulating the topic, tasks and hypotheses of the research and defining the categories of analysis - the most general, key concepts corresponding to the research tasks.

Categories of analysis are semantic units denoting empirical features of textual information; they result from operationalizing the basic theoretical concepts of the research. Certain requirements are imposed on the categories of analysis: they must express the theoretical concepts of the research, have corresponding indicators (semantic units) in the text, and allow the characteristics that make up these categories to be registered unambiguously.

When choosing categories for content analysis, two extremes should be avoided: choosing categories so numerous and fine-grained that they almost repeat the text, and choosing categories so broad that the analysis becomes oversimplified and superficial. It is also necessary to take into account elements missing from the text, which may be significant for content analysis. The categories should be:

Relevant, i.e. correspond to the solution of research problems;

Exhaustive, i.e. fully reflect the meaning of the basic concepts of the study;

Mutually exclusive (the same content should not be included in different categories in the same amount);

Reliable, i.e. such that would not cause disagreement among researchers as to what should be attributed to one or another category in the process of analyzing a document.
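Two of these requirements, mutual exclusivity and exhaustiveness, lend themselves to a simple mechanical check; relevance and reliability need human judgment. The sketch below assumes that each category is described by a set of indicators, and all names in it are hypothetical.

```python
def overlapping_indicators(categories):
    """Indicators assigned to more than one category (not mutually exclusive)."""
    owners = {}
    for name, indicators in categories.items():
        for indicator in indicators:
            owners.setdefault(indicator, []).append(name)
    return {ind: names for ind, names in owners.items() if len(names) > 1}

def uncovered_indicators(categories, observed):
    """Observed indicators that no category covers (not exhaustive)."""
    covered = set().union(*categories.values())
    return sorted(set(observed) - covered)

categories = {
    "economy": {"privatization", "crime"},
    "law": {"human rights", "crime"},
}
observed = ["privatization", "humanism", "crime"]

overlaps = overlapping_indicators(categories)
gaps = uncovered_indicators(categories, observed)
print(overlaps)
print(gaps)
```

Here "crime" is flagged because it sits in two categories at once, and "humanism" is flagged because it was met in the text but belongs to none.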

Currently, four content analysis methodologies are distinguished:

Grammar (linguistic) (by paragraph size, phrase length, word order in a sentence, metric composition);

Semantic (sociological) (according to expert estimates of the content);

Documentary (cybernetic) (according to the parameters of language, text and document as a message (descriptors and their load, compactness, information density, aspect, flow, physical and information volumes, information capacity and information content));

Citation (analysis of bibliographic references in scientific literature).

Content analysis is carried out in several stages. The researcher must draw up a work plan, determine the sources of information, then highlight the units of analysis and draw up a coding form, which is filled out while working with texts.

The first stage involves the choice of units of analysis: it is necessary to select the appropriate ones that serve as an indicator of the phenomena of interest to the researcher in the text, which depend on the research program, object, subject, goal, objectives and hypotheses of the research. The main semantic unit can be a social idea, a socially significant topic, reflected in operational concepts. In the text, it is expressed in different ways - a word, a combination of words, a description. The goal is to find indicators that indicate the presence of a topic in the document that is significant for analysis and that reveal the content of textual information.

The units of analysis are:

Concepts expressed in separate terms. For example, from the field of economics: "forms of ownership", "privatization", "financial system", "money circulation"; from politics: "ruling circles" and "opposition", "democracy", "international cooperation"; moral or legal symbols: "human rights", "humanism", "activity", "crime"; scientific: "model", "system", "outer space";

Topics expressed in whole semantic paragraphs, parts of texts, articles, radio broadcasts. By topic, you can even more fully present the content of the document. Plots from personal documents, for example, letters about oneself or about one's loved ones, in industrial and political affairs, and about art, are just as revealing. All this is evidence of a certain orientation of views, interests, value orientations and norms of activity;

The names of historical figures, politicians, prominent scientists and artists, organizers of production, leaders of movements and parties, and the names of public institutions and organizations. These characteristics may indicate the influence on public opinion of individuals or of the social institutions, communities and groups they represent. The importance of a particular scientific concept is judged by the number of references to individual authors: if the number of references increases or decreases, this indicates an increase or decrease in the authority of the concept. From the frequency of references to social movements or their leaders, one can infer the influence of these movements;

A holistic social event, an official document, fact, work or incident carries a specific semantic load and can also be taken as a unit of analysis;

The meaning of appeals to a potential addressee-user of the advertised products, or to a citizen as a possible supporter of a political, other movement. Commercial advertising contains appeals to age cohorts (“youth chooses”), the social stratum, activating different needs of the individual (health, social status), aimed at motivating to avoid danger or to achieve success.

The second stage is associated with the choice of units of measurement (counting), i.e. quantitative measure of units of analysis (indicators of units of analysis), which allows you to register the frequency (regularity) of the appearance of the attribute of the category of analysis in the text. The unit of account can be taken:

The frequency of occurrence of the feature of the analysis category;

The amount of attention given to the category of analysis in the content of the text. To establish the amount of attention, the number of printed characters or paragraphs, or the area of the text expressed in physical spatial units, can be taken into account.

The units of account may or may not be the same as the units of analysis. When analyzing the press, the counting unit is often the physical length or area of the texts (in square centimeters) filled with semantic units: the number of lines, paragraphs or characters; for radio and television, the duration of the broadcast; for tape recordings, the length of tape. The advantage of this unit of account is the speed with which the coder can work.

The content analysis procedure includes applying standard rules for identifying units of analysis (counting, observation) of the same type in the studied text, and calculating the frequency of occurrence of these units in the sample (the number of documents subject to counting), both in absolute values (number of times) and in relative values (percentages). Mathematical and statistical counting methods are an obligatory part of such a procedure, since content analysis rests on counting the occurrences of certain components in the analyzed information array, supplemented by identifying statistical relationships and analyzing structural relationships between them, and by assigning them certain quantitative and qualitative characteristics.
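The absolute and relative values mentioned above can be computed as in the following sketch. It assumes that each document has already been reduced to the list of units of analysis coded in it; the units and documents are hypothetical.

```python
from collections import Counter

def frequencies(coded_documents):
    """Return absolute counts and percentage shares of each unit in the sample."""
    absolute = Counter(unit for doc in coded_documents for unit in doc)
    total = sum(absolute.values())
    if total == 0:
        return absolute, {}
    relative = {unit: 100 * n / total for unit, n in absolute.items()}
    return absolute, relative

sample = [
    ["democracy", "opposition", "democracy"],
    ["opposition"],
    ["democracy", "crime", "democracy", "opposition"],
]
absolute, relative = frequencies(sample)
for unit, n in absolute.most_common():
    print(f"{unit}: {n} times, {relative[unit]:.1f}%")
```

Relative values make samples of different sizes comparable, which absolute counts alone do not.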

The third stage is the preparation of the toolkit, that is, of the coding form. Each of the selected units is assigned a specific code - a numerical designation. It can be a single digit if there are few key units, but it can also be two-digit or three-digit. All assigned codes are entered into a special log used by the coders.

The coding form is a mandatory tool for the formalized analysis of documents. It is drawn up in accordance with the scheme of operational concepts, contains the units of analysis and all elements of the description of the problem situation, and establishes an unambiguous correspondence between the lexicon of the text and the codes on which the computational operations are performed.
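One possible way to build such a correspondence is sketched below. The numbering scheme (category number times 100 plus the unit's number within the category) is an assumption made for illustration and works only for fewer than 100 units per category; the units themselves are hypothetical.

```python
def build_codebook(units_by_category):
    """Assign each unit a three-digit code: category number, then unit number."""
    codebook = {}
    for cat_no, (category, units) in enumerate(units_by_category.items(), start=1):
        for unit_no, unit in enumerate(units, start=1):
            codebook[unit] = cat_no * 100 + unit_no
    return codebook

def encode(text, codebook):
    """List the codes of the units found in the text, in order of first appearance."""
    lowered = text.lower()
    hits = [(lowered.find(unit), code)
            for unit, code in codebook.items() if unit in lowered]
    return [code for _, code in sorted(hits)]

units_by_category = {
    "economy": ["privatization", "money circulation"],
    "politics": ["opposition", "democracy"],
}
codebook = build_codebook(units_by_category)
codes = encode("The opposition criticized privatization.", codebook)
print(codebook)
print(codes)
```

Because every unit receives exactly one code and no code is shared, later calculations can operate on the codes instead of the words.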

In general, the calculation procedures for content analysis are similar to the standard methods of classification by selected groupings, ranking. To calculate the results of content analysis, specially developed formulas are also used. So, A.N. Alekseev proposed a formula for assessing the "share" of semantic categories in the total volume of the text. The formula indicates the level of intensity presented in the text of a certain topic (or argumentation, ways of addressing the reader).

Statistical calculations of the intelligibility of the text (terms, sentences) and of its interest for the reader are also used, as are more complex techniques for studying the relationship between distributions of semantic units. Content analysis data can be processed in a statistical package such as SPSS.

A widely used tool that makes it possible to check the reliability and accuracy of information and at the same time examine the content of documents is internal and external analysis. External analysis is the study of the circumstances in which the document emerged, its historical and social context. Internal analysis is the study of the content of the document: everything that the text of the source attests and the objective processes and phenomena that it reports.

The reliability of information received by content analysis is ensured in the following ways:

Substantiation of the completeness of the set of identified semantic units by the "snowball" method. First, all semantic units are selected from the first analyzed text; then, from the second, the same units plus additional ones not previously encountered; from the third document, again those encountered in the two previous ones plus additional ones. Once further texts yield no unit that was not already recorded in previous documents, it can be assumed that the “field” of semantic units in the array under study has been exhausted;

Control over the validity of the content of semantic units with the help of judges. Experts in this field discuss how the proposed quality units meet the objectives;

Justification by an independent criterion. For example, the data of the content analysis of diaries or essays of students in order to identify their professional inclinations are selectively checked by surveys or by observation data, or by a test for a known group;

Data stability is determined by encoding the same text by different encoders based on a single instruction.
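Two of these checks can be sketched in code: the "snowball" test for exhaustion of the field of semantic units, and the comparison of two coders working on the same text. The stopping rule (two texts in a row without a new unit) is an assumption, and Cohen's kappa is a standard chance-corrected agreement measure added here for illustration; the source does not name it. All sample data are hypothetical.

```python
from collections import Counter

def texts_until_saturation(units_per_text, quiet_run=2):
    """Number of texts read when `quiet_run` texts in a row added no new unit."""
    seen, quiet = set(), 0
    for index, units in enumerate(units_per_text, start=1):
        new_units = set(units) - seen
        seen |= new_units
        quiet = 0 if new_units else quiet + 1
        if quiet == quiet_run:
            return index, seen
    return None, seen  # the field of units was not exhausted

def percent_agreement(codes_a, codes_b):
    """Share of fragments that both coders placed in the same category."""
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)

def cohen_kappa(codes_a, codes_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if chance == 1:
        return 1.0  # both coders used a single identical category throughout
    return (observed - chance) / (1 - chance)

found = [{"theme", "hero"}, {"theme", "genre"}, {"hero"},
         {"genre", "theme"}, {"author"}]
stop_at, field = texts_until_saturation(found)

coder_a = ["pos", "pos", "neg", "neu", "neg", "pos", "neu", "neg", "pos", "neg"]
coder_b = ["pos", "neg", "neg", "neu", "neg", "pos", "neu", "pos", "pos", "neg"]
agreement = percent_agreement(coder_a, coder_b)
kappa = cohen_kappa(coder_a, coder_b)
print(stop_at, sorted(field))
print(agreement, kappa)
```

In the sample, the fifth text would still have added a new unit, which shows why a short run without new units is only a working assumption of exhaustion and not a proof.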

There are several stages in the development and practical application of content analysis. After the topic, tasks and hypotheses of the research have been formulated, the categories of analysis are determined, i.e. the most general, key concepts corresponding to the research objectives. The system of categories plays the role of the questions in a questionnaire and indicates which answers should be found in the text. In the practice of Soviet content-analytical research, a fairly stable system of categories developed, including sign, goals, values, theme, hero, author, genre, etc. Content analysis of mass media messages based on a paradigmatic approach is becoming more and more widespread; under this approach, the studied features of texts (the content of the problem, the reasons for its occurrence, the problem-forming subject, the degree of intensity of the problem, ways of solving it, etc.) are considered as a structure organized in a certain way. The categories of content analysis should be exhaustive (i.e., cover all parts of the content determined by the objectives of the study); mutually exclusive (the same parts should not belong to different categories); reliable (i.e. there should be no disagreement between coders about which category a given part of the content belongs to); relevant (i.e., correspond to the task at hand and the content under study).

When choosing categories, it is necessary to avoid two extremes: choosing too many and fractional categories that almost repeat the text, and choosing too large categories, because this can lead to oversimplified, superficial analysis. Sometimes it is necessary to take into account the missing elements of the text, which can be significant.

After the categories are formulated, it is necessary to choose the appropriate unit of analysis - a linguistic unit of speech or an element of content that serves as an indicator, in the text, of the phenomena of interest to the researcher. Complex types of content analysis usually operate not with one but with several units of analysis simultaneously.

Units of analysis taken in isolation cannot always be interpreted correctly, so they are considered against the background of broader linguistic or content structures, called contextual units, which indicate how the text is divided and within which the presence or absence of units of analysis is recorded. For example, for the unit of analysis “word”, the contextual unit is “sentence”.

Finally, it is necessary to establish a counting unit - a quantitative measure of the relationship between textual and extra-textual phenomena. The most commonly used counting units are time and space (number of lines, area in square centimeters, minutes of broadcast time, etc.), the appearance of a feature in the text, and the frequency of its appearance (intensity).

The choice of the sources to be subjected to content analysis is important. The sampling problem involves choosing the source, the number of messages, the dates of the messages, and the content to be examined. All these parameters of the sample are determined by the objectives and scope of the study. Most often, content analysis is carried out on a one-year sample: for a study of the minutes of meetings, 12 sets of minutes are enough (one per month); for a study of media reports, 12-16 issues of a newspaper or days of TV or radio broadcasting. Typically, a sample of mass media messages comprises 200-600 texts.

A prerequisite for content research is the development of a content analysis table - the main working document with which the research is carried out. The type of table is determined by the stage of the study. Thus, when developing the categorical apparatus, the analyst compiles a table that is a system of coordinated and subordinated categories of analysis. Such a table outwardly resembles a questionnaire: each category (question) presupposes a number of attributes (answers) by which the content of the text is quantified. The questionnaire table can be quite voluminous.

To register units of analysis, another table is compiled - a coding matrix:

If the sample size is large enough (over 100 units), then the coder, as a rule, works with a notebook of matrix sheets. If the sample is relatively small (up to 100 units), then two-dimensional or even multivariate analysis can be carried out. In this case, each text must have its own coding matrix. However, this work is very laborious and painstaking; therefore, with large sample sizes, the comparison of the features of interest to the researcher is carried out on a computer.

Sometimes a table may also be necessary at the stage of quantitative data processing. For example, the contingency analysis developed by the American psychologist Charles Osgood uses a so-called contingency matrix:

With the help of such a matrix, one can measure how far the co-occurrence of each classification unit with all the others departs from chance. For example, if unit A is found in 30% of the analyzed texts ($P_A = 0.3$) and unit B in 50% of the texts ($P_B = 0.5$), then the expected frequency of their joint occurrence is $P_{AB} = P_A \cdot P_B = 0.3 \cdot 0.5 = 0.15$. In reality, units A and B were found together in only 5% of the texts (observed $P_{AB} = 0.05$). By comparing the expected and observed co-occurrences of features, it is possible to determine which dependencies are not random (for example, in the matrix the joint appearance of units A and B is treated as random, since the observed co-occurrence is lower than expected, whereas that of units B and C is not random, since the observed co-occurrence is higher than expected). The matrix can serve different purposes: to trace whether the co-occurrence of features is random or not when testing a hypothesis, to identify stable and unstable pair combinations of features, which may be significant for characterizing the activity of the sender of the information, etc.
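
The comparison of expected and observed co-occurrence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each text is represented as the set of units found in it, the function name `contingency_table` is hypothetical, and the sample data are constructed to reproduce the figures from the example (A in 30% of texts, B in 50%, both together in 5%).

```python
from itertools import combinations


def contingency_table(texts):
    """Compare expected and observed co-occurrence for every pair of units.

    `texts` is a list of sets; each set holds the units found in one text.
    Returns (share, rows): share[u] is the proportion of texts containing u,
    rows[(a, b)] is the pair (expected, observed) of joint proportions.
    """
    n = len(texts)
    units = sorted(set().union(*texts))
    share = {u: sum(u in t for t in texts) / n for u in units}
    rows = {}
    for a, b in combinations(units, 2):
        expected = share[a] * share[b]          # co-occurrence expected by chance
        observed = sum(a in t and b in t for t in texts) / n
        rows[(a, b)] = (expected, observed)
    return share, rows


# Constructed sample of 20 texts: A in 6, B in 10, A and B together in 1.
example_texts = [{"A", "B"}] + [{"A"}] * 5 + [{"B"}] * 9 + [set()] * 5

share, rows = contingency_table(example_texts)
for (a, b), (expected, observed) in rows.items():
    print(f"{a}-{b}: expected {expected:.2f}, observed {observed:.2f}")
```

Whether a given gap between the expected and observed values is large enough to be meaningful is a separate statistical question that this sketch does not address.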

An important condition of content analysis is the development of instructions for the coder - a system of rules and explanations for the person who will collect the empirical information by coding (registering) the specified units of analysis. The instructions set out the algorithm of the coder's actions precisely and unambiguously, give operational definitions of the categories and units of analysis and the rules for coding them, provide specific examples from the texts under study, stipulate how to proceed in disputed cases, etc.

The counting procedure in quantitative content analysis is, in general, similar to the standard methods of classification by selected groupings, ranking, and measurement of association. There are also calculation procedures specific to content analysis, for example the formula for the Janis coefficient (c), designed to calculate the ratio of positive and negative (relative to the chosen position) assessments, judgments and arguments. In the case when the number of positive ratings exceeds the number of negative ones, the Janis coefficient is calculated by the formula

$$c = \frac{f^2 - f n}{d\,t}$$

where f is the number of positive ratings; n is the number of negative ratings; d is the volume of the text content directly related to the problem being studied; t is the total volume of the analyzed text.

In the case when the number of positive ratings is less than the number of negative ones, the Janis coefficient is found by the formula

$$c = \frac{f n - n^2}{d\,t}$$

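There are also simpler measures. The specific weight of a particular category can be calculated using the formula

$$\text{specific weight of a category} = \frac{\text{number of units of analysis recording the category}}{\text{total number of units of analysis}}$$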
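
A minimal Python sketch of these calculations follows. It assumes the commonly cited form of the Janis coefficient, (f² - f·n) / (d·t) when positive ratings predominate and (f·n - n²) / (d·t) when negative ratings predominate, and it assumes that the specific weight of a category is its share of all units of analysis. The function and parameter names, and the numbers in the example, are illustrative.

```python
def janis_coefficient(positive, negative, relevant_volume, total_volume):
    """Janis coefficient under the assumed formula.

    positive, negative -- numbers of positive and negative ratings (f and n)
    relevant_volume    -- volume of content directly related to the problem (d)
    total_volume       -- total volume of the analyzed text (t)
    """
    if relevant_volume <= 0 or total_volume <= 0:
        raise ValueError("volumes must be positive")
    denominator = relevant_volume * total_volume
    if positive > negative:
        return (positive ** 2 - positive * negative) / denominator
    if positive < negative:
        return (positive * negative - negative ** 2) / denominator
    return 0.0  # equal numbers of positive and negative ratings


def category_share(units_in_category, total_units):
    """Specific weight of a category as a share of all units of analysis."""
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    return units_in_category / total_units


# Hypothetical text: 12 positive and 4 negative ratings,
# 20 relevant units of content out of 40 in total.
print(janis_coefficient(12, 4, 20, 40))
print(category_share(30, 120))
```

Under this form the coefficient is positive when positive ratings predominate, negative when negative ratings predominate, and zero when they balance.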

Content analysis

This is the first article on my blog about content analysis, and it provides an overview of the content analysis method. Translation from English is mine. Happy reading.

Bernard R Berelson (1912-1979)

Content analysis is a product of the electronic age. Although content analysis was already carried out regularly in the 1940s, it became a more frequently used and more trusted method from the mid-1950s, when researchers began to work not with individual words but with thematic and semantic structures, and became interested in the connections between meanings [correlations] rather than in the mere presence of words in text arrays.

Areas of use of content analysis

Because content analysis can be used to study the content and form of any text, array of texts or other recorded communication, the method is used in a wide range of fields: marketing and media studies, literature and rhetoric, ethnography and cultural studies, gender and age studies, sociology and political science, psychology and the cognitive sciences, and other areas of research. Content analysis is also closely related to socio- and psycholinguistics, and it plays an integral role in the development of artificial intelligence. The following list, based on Berelson's work, describes further uses of content analysis:

  • Provides an opportunity to understand international differences in communication
  • Determines the presence of propaganda materials
  • Identifies intentions and trends in individual or group communication
  • Describes behavioral responses within communications
  • Determines the psychological and emotional background of individuals and groups

Types of content analysis

There are two main types of content analysis: conceptual [in Russian-language materials it is customarily called quantitative, even though the two terms are not semantically equivalent] and correlation (also called relational) analysis. Conceptual analysis establishes the presence and frequency of given concepts [counting units] in a text. Correlation analysis establishes the links between individual counting units within a text.

Conceptual content analysis

Traditionally, content analysis was understood only in its conceptual form. In conceptual analysis, a concept [counting unit] is chosen and the text is studied by counting how often it occurs. Since counting units can appear both explicitly and implicitly, the implicit forms must be clearly defined and fixed before counting begins. To limit subjectivity in deciding what counts as a counting unit, it is customary to use special content analysis dictionaries [thesauri].

As with many other methods, conceptual content analysis begins with identifying the key research questions and drawing a sample. Once selected for analysis, the text must be coded within the system of categories established by the researcher. Coding reduces the volume of material, which is the central idea of content analysis. Dividing the text array into separate, thematically coherent units of information that correspond to the categorical apparatus makes it possible to identify particular characteristics of the material and to analyze and interpret them.

An example of conceptual analysis is the study of a text by counting the occurrences of codes from a content analysis dictionary. The researcher might ask, for example, how often the text contains words that support a particular position and how often it contains words that refute it. Here the researcher is interested only in counting these words, not in identifying the semantic and thematic connections between them, which is the task of correlation analysis. In conceptual analysis, the researcher examines only the presence of items relevant to the research questions, that is, determines whether the text mostly confirms or mostly refutes a particular hypothesis or hypotheses.
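
The counting step of conceptual analysis can be illustrated with a short Python sketch. It assumes a hypothetical content analysis dictionary that maps each category to a set of lowercase indicator words, and it handles only explicit occurrences in English text; the dictionary entries and the sample sentence are invented for the illustration.

```python
import re
from collections import Counter


def count_codes(text, dictionary):
    """Count how many words in `text` belong to each dictionary category."""
    # Very simple tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for category, indicators in dictionary.items():
        counts[category] = sum(1 for word in words if word in indicators)
    return counts


# Hypothetical dictionary: words taken to support or refute a position.
example_dictionary = {
    "supports": {"agree", "support", "benefit"},
    "refutes": {"oppose", "reject", "harm"},
}

example_text = (
    "Most speakers support the reform and agree it will benefit towns; "
    "two reject it."
)

code_counts = count_codes(example_text, example_dictionary)
print(code_counts)
```

A real dictionary would also need to list inflected forms and the agreed implicit expressions of each concept, which this sketch leaves out.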

Correlation content analysis

As mentioned above, correlation content analysis builds on the principles of conceptual content analysis but studies the relationships between counting units (concepts, positions). As with other types of research, the approach starts from the definition of a sample and of categories of analysis, operationalized in a content analysis dictionary, which determines the further course of the research. For correlation content analysis, the researcher must decide which types of positions (counting units) will be used in the study. Studies have been carried out with only a few such concepts and with more than 500 concept categories. Obviously, too many categories can distort the results, since the complexity of the analysis grows with the number of categories and counting units. The same is true of categorical apparatuses and dictionaries that are too small, which yield unreliable and potentially incorrect results. Thus, when creating dictionaries and categorical apparatuses, it is important to rely on the features of the analyzed array and on the specific measurement tasks.

There are many methods for conducting correlation content analysis, which accounts for the flexibility and popularity of the approach. Researchers can develop their own procedures in accordance with the objectives of a particular study. When a procedure has sufficiently proven its effectiveness and objectivity, it can be adopted and disseminated among other researchers. Correlation content analysis is now well supported by computers, which automate the calculations; nevertheless, like many other research methods, it remains very time-consuming. Probably the most demanding feature of the method is the need to maintain statistical rigor while preserving the richness of the material, expressed in individual details that call for a qualitative approach.
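
One of the simplest ways to look at links between concepts is to count how often two concepts occur in the same sentence. The Python sketch below shows only this one variant, not a general procedure: the sentence is assumed to be the contextual unit, the dictionary of concepts is hypothetical, and the function name is illustrative.

```python
import re
from collections import Counter
from itertools import combinations


def concept_cooccurrence(text, dictionary):
    """Count, for each pair of concepts, the sentences in which both occur."""
    pair_counts = Counter()
    # Naive sentence split on ., ! and ?
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = set(re.findall(r"[a-z']+", sentence))
        # Concepts with at least one indicator word in this sentence.
        present = sorted(
            concept
            for concept, indicators in dictionary.items()
            if words & indicators
        )
        for pair in combinations(present, 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical dictionary of three concepts.
example_concepts = {
    "economy": {"prices", "wages"},
    "government": {"minister", "parliament"},
    "protest": {"strike", "rally"},
}

example_text = (
    "The minister blamed prices. Wages fell and a strike began. "
    "Parliament debated the rally. The minister promised higher wages."
)

pairs = concept_cooccurrence(example_text, example_concepts)
for pair, count in sorted(pairs.items()):
    print(pair, count)
```

Such counts only show that concepts appear together; the strength, direction and meaning of the link still have to be established by the researcher's chosen procedure.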

Reliability and verification issues

The issues of reliability and verification are also relevant to this method. The reliability of content analysis results rests on three properties: stability, i.e. the homogeneity of the research process and the ability of coders and interpreters to handle the data in the same way throughout the study; reproducibility, i.e. the ability of a group of coders to classify the material uniformly according to a given categorical apparatus; and accuracy, i.e. the degree to which the classification of the material statistically corresponds to the specified categories.
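
Reproducibility is usually assessed by having two coders code the same material and comparing the results. The text does not name a particular statistic, so the Python sketch below uses two standard ones as an assumption: simple percent agreement and Cohen's kappa, which corrects agreement for chance. The coder data are invented.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of units to which both coders assigned the same category."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of units")
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: agreement between two coders corrected for chance."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Agreement expected if each coder assigned categories independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes given by two coders to the same ten text units.
coder_a = ["pos", "pos", "neg", "neg", "pos", "neu", "neg", "pos", "neu", "pos"]
coder_b = ["pos", "neg", "neg", "neg", "pos", "neu", "neg", "pos", "pos", "pos"]

print(f"agreement: {percent_agreement(coder_a, coder_b):.2f}")
print(f"kappa:     {cohens_kappa(coder_a, coder_b):.2f}")
```

The same functions can be applied to one coder's codes from two different sessions to get a rough check of stability.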

The key problem of conceptual content analysis is that its conclusions can be contested, which is a consequence of the procedures of the method itself. The main question is what volume and level of meaning in the texts is objectively available for identification; in other words, do the results follow solely from the instruments applied, or were they shaped by other factors? By contrast, in the exact sciences a value such as the number 99 hardly admits competing interpretations. Objective results can be obtained only from core materials and text arrays [thematically representative and relevant to the topic being measured], yet even then the question of objectivity, and of whether the results can be verified and substantiated, remains open.

The generalizations and conclusions of researchers depend largely on how the researcher defines the meaning of a particular category, and on the reliability of the categorical apparatus itself. The researcher must define precisely the categories and counting units that will make it possible to measure the object under study objectively. Likewise, an objective system of rules and instructions for the research must be drawn up as accurately as possible. Developing rules that allow all coders and interpreters to follow the same standards and to code the material in the same way is vital to the success of conceptual content analysis. Reproducibility [an objective choice of analysis tools that would be selected identically in a similar study] and accuracy, not only of the categories of analysis and counting units but also of the key approaches to the analysis of the material, make it possible to obtain more correct and reliable results.

One of the first works on content analysis: B. Berelson "Formation of political preferences in the presidential elections"

Benefits of content analysis

Content analysis has a number of significant advantages over other methods. Among them, it is worth highlighting:

  • Studies communication directly through the analysis of texts, which lets the researcher work with the primary means of communication in society
  • Works with both qualitative and quantitative data
  • Can provide valuable historical and cultural information about different periods, based on text analysis alone
  • Yields information that stays close in form to the text itself, although the degree of closeness varies depending on the toolkit used
  • Can be used to analyze material needed for developing certain systems
  • Is an "unobtrusive" method of analyzing communications [the participants experience no discomfort during the analysis, since the method does not directly interfere with the communication]
  • Offers a comprehensive and in-depth approach to the study of models of human thought and language
  • If used correctly, is regarded as objective (based on real facts, as opposed to discourse analysis)

Disadvantages of content analysis

Content analysis also has a number of disadvantages, both theoretical and applied:

  • Can be very time-consuming
  • Is prone to error, especially when correlation analysis aimed at uncovering deeper data is used
  • Often lacks a theoretical basis, or disregards theoretical guidelines in order to reach results important for the study
  • Is inherently reductive, that is, it tends to discard weakly expressed information, especially when texts with complex content are analyzed
  • Often tends to oversimplify results, since it relies on simple word counts
  • Often ignores the context of the counting units (words), or discounts the significance of the words that follow
  • Can be difficult to computerize and automate

The original article is located at the following address: http://www.gslis.utexas.edu/~palmquis/courses/content.html

(translation by Alexey Ryumin)
