Spanish Newspaper Editions

Newspapers are the tangible record of the history lived by a society, and their importance lies in the way they observe, describe and record the facts. How do they record reality? How have editorial lines and political trends marked these records?



This project aims to be a first step towards a critical analysis of these questions. It is also an invitation to read newspapers in a different way and to continue the research into, and analysis of, the richness contained in historical newspapers.

What influences newspaper style?

OUR PROJECT

The Spanish Newspaper Project analyzes 100 editions (complete and semi-complete) of 16 historical Spanish newspapers with different political and editorial lines. Our dataset and metadata were built from material available in the Digital Newspaper Library of the National Library of Spain. The analysis covers articles published in these newspapers between 1890 and 1940. The selection is subjective and based on historical issues published during this period.

DATA

DESCRIPTION

In general, the scanned texts were in acceptable reading condition when we collected them, but we had to address two problems before starting the analysis:


1. There were extra spaces in some words, and

2. Certain characters were not recognized by the OCR.


  • 100 editions of historical Spanish newspapers
  • 16 newspapers
  • 4 types of format
  • 7 different ideologies
  • 3 regions
  • 2 types of audience

2,316,825 RAW TOKENS ANALYZED

To fix the words with extra spaces, we wrote a function that tries to recompose those words by removing the spurious spaces, and that also converts all tokens to lowercase.
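Our function is not reproduced here; the following is only a minimal sketch of one way such a step could work. It assumes a reference vocabulary of valid lowercase Spanish word forms (the `vocab` set and the name `rejoin_split_words` are hypothetical). After lowercasing, adjacent fragments are merged greedily whenever their concatenation is a known word and at least one fragment is not a word on its own.

```python
def rejoin_split_words(tokens, vocab, max_parts=4):
    """Greedily merge OCR fragments such as ["perió", "dico"] into "periódico".

    tokens    -- list of raw whitespace-separated tokens
    vocab     -- set of known lowercase word forms (hypothetical reference list)
    max_parts -- longest run of adjacent fragments to try merging
    """
    tokens = [t.lower() for t in tokens]  # lowercase everything first
    result = []
    i = 0
    while i < len(tokens):
        merged = None
        # Try the longest run of fragments first, down to pairs.
        for n in range(min(max_parts, len(tokens) - i), 1, -1):
            parts = tokens[i:i + n]
            candidate = "".join(parts)
            # Merge only if the result is a word and the pieces are not all words.
            if candidate in vocab and not all(p in vocab for p in parts):
                merged = (candidate, n)
                break
        if merged:
            result.append(merged[0])
            i += merged[1]
        else:
            result.append(tokens[i])
            i += 1
    return result


vocab = {"el", "periódico", "publicó", "la", "noticia"}
print(rejoin_split_words(["El", "perió", "dico", "publicó", "la", "noti", "cia"], vocab))
# → ['el', 'periódico', 'publicó', 'la', 'noticia']
```

Requiring at least one non-word fragment is a conservative choice: it avoids gluing together two genuine words that merely happen to form a longer word.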


To address the OCR issues, we used spaCy's Spanish model "es_core_news_md" to remove unrecognized text.

2,000,333 TOKENS RECOGNIZED BY SPACY

330,434 UNIQUE RAW WORDS

65,876 UNIQUE WORDS
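The OCR filtering step and the four counts above can be sketched as follows. We do not state here exactly which spaCy attribute decided whether a token was "recognized", so the sketch delegates that decision to a predicate `is_recognized`, which is an assumption. With spaCy installed, one possible option is to load the model with `spacy.load("es_core_news_md")` and pass `nlp.vocab.has_vector`, treating words without a word vector as unrecognized; the toy vocabulary below only stands in for that lookup so the example runs without the model. We also assume that "unique raw words" means distinct raw tokens and "unique words" distinct recognized tokens.

```python
from typing import Callable, Iterable


def filter_and_count(raw_tokens: Iterable[str],
                     is_recognized: Callable[[str], bool]) -> dict:
    """Drop unrecognized tokens and report corpus-size statistics.

    is_recognized -- predicate deciding whether a token is a real word
                     (an assumption; e.g. a spaCy vocabulary lookup)
    """
    raw = list(raw_tokens)
    kept = [tok for tok in raw if is_recognized(tok)]
    return {
        "raw_tokens": len(raw),
        "recognized_tokens": len(kept),
        "unique_raw_words": len(set(raw)),
        "unique_words": len(set(kept)),
        "tokens": kept,
    }


if __name__ == "__main__":
    # Toy stand-in for a spaCy vocabulary lookup.
    known = {"el", "rey", "visitó", "madrid", "ayer"}
    sample = ["el", "rey", "visitó", "m@drid", "madrid", "el", "ayer", "ñ;"]
    stats = filter_and_count(sample, lambda t: t in known)
    print({k: v for k, v in stats.items() if k != "tokens"})
    # → {'raw_tokens': 8, 'recognized_tokens': 6, 'unique_raw_words': 7, 'unique_words': 5}
```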


Explore, analyze, and visualize the data

It is important to keep in mind that we selected a small dataset to allow faster analysis and iteration. It is therefore not representative, and we cannot draw conclusions about general developments in the Spanish press during this period.

This is an exploration of our dataset.

STYLO

Stylistic analysis of newspaper edition text

To analyse our corpus of historical Spanish newspapers, the team created a separate subcorpus for each feature we wanted to analyse. The visualisations need to be as clear as possible, and colouring plays an essential role in that: with a subcorpus per feature, all newspapers sharing a value, e.g. a primarily adult audience, can be plotted in the same colour.
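As far as we understand stylo's conventions, it colours texts on its plots according to the part of the file name before the first underscore, and by default reads texts from a subfolder called `corpus`. A subcorpus for one feature can therefore be built by copying the edition files under new names prefixed with that feature's value (e.g. `adult_ElSol_1920.txt`). The sketch below assumes a metadata mapping from each file name to its feature values; the directory layout, the file names and the `metadata` structure are hypothetical.

```python
import shutil
from pathlib import Path


def build_subcorpus(src_dir, dest_root, metadata, feature):
    """Copy every edition into dest_root/<feature>/corpus, prefixing each file
    name with the edition's value for `feature` so stylo colours by that value.

    metadata -- hypothetical mapping: file name -> {feature name: value}
    """
    out_dir = Path(dest_root) / feature / "corpus"
    out_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.txt")):
        value = metadata[path.name][feature]
        # Underscores or spaces in the value would break the class prefix.
        label = str(value).replace("_", "-").replace(" ", "-")
        shutil.copy(path, out_dir / f"{label}_{path.name}")
    return out_dir
```

Running stylo from `dest_root/<feature>` would then pick up the renamed files, one colour per feature value.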


With this in mind, the analysis started by loading the plain-text files and selecting Spanish as the language among the stylo() parameters. What makes stylo particularly suited to fast exploratory analysis is that it lets the user choose from a range of preset settings according to the complexity of the corpus. In the features section, we chose to count words and left the n-gram size at 1, so that single words were the features.


The Most Frequent Words (MFW) parameter was set to 500. To exclude rare words, the culling parameter was set to min = max = 20, meaning that a word has to appear in at least 20% of the texts to be included. Pronouns were also excluded. In the statistics section, Multidimensional Scaling (MDS) proved the best exploratory method for this corpus, and Eder's Delta was chosen as the distance measure to obtain the most precise similarities between texts, since it is designed for highly inflected languages such as Spanish. No sampling was performed.
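Stylo computes all of this internally. The Python sketch below only illustrates the logic of the settings just described: relative word frequencies, culling at 20%, pronoun removal, the 500 MFW cut-off and Eder's Delta. The pronoun list is a short hypothetical sample (stylo ships its own list), and both the order of culling before the MFW cut-off and the exact Delta weighting are our assumptions. For the latter we follow Eder's formulation as we understand it: the absolute difference of z-scores for the word ranked i among n MFWs is weighted by (n - i + 1)/n, so frequent words count more.

```python
import math
from collections import Counter

# Hypothetical sample; stylo's own Spanish pronoun list is longer.
PRONOUNS = {"yo", "tú", "él", "ella", "nosotros", "vosotros", "ellos", "ellas",
            "me", "te", "se", "nos", "os", "le", "les"}


def build_features(texts, mfw=500, culling=0.20, exclude=PRONOUNS):
    """Return (feature words, relative-frequency table) for tokenized texts.

    texts   -- dict: text name -> list of lowercase tokens (word 1-grams)
    culling -- minimum share of texts a word must appear in
    """
    counts = {name: Counter(toks) for name, toks in texts.items()}
    n_texts = len(texts)
    total, doc_freq = Counter(), Counter()
    for c in counts.values():
        total.update(c)
        doc_freq.update(c.keys())  # one count per text containing the word
    # Culling and pronoun removal.
    kept = [w for w in total
            if doc_freq[w] / n_texts >= culling and w not in exclude]
    # Most frequent words first; ties broken alphabetically for determinism.
    words = sorted(kept, key=lambda w: (-total[w], w))[:mfw]
    table = {}
    for name, c in counts.items():
        size = sum(c.values()) or 1  # avoid dividing by zero for empty texts
        table[name] = [c[w] / size for w in words]
    return words, table


def eder_delta(table):
    """Pairwise Eder's Delta on z-scored frequencies (rank-weighted L1 distance)."""
    names = list(table)
    n = len(next(iter(table.values())))
    cols = list(zip(*table.values()))
    means = [sum(col) / len(col) for col in cols]
    # Sample standard deviation; constant columns get 1.0 to avoid division by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in col) / (len(col) - 1)) or 1.0
           for col, m in zip(cols, means)]
    z = {name: [(v - m) / s for v, m, s in zip(table[name], means, sds)]
         for name in names}
    weights = [(n - i + 1) / n for i in range(1, n + 1)]
    return {(a, b): sum(w * abs(x - y) for w, x, y in zip(weights, z[a], z[b]))
            for a in names for b in names if a < b}
```

A call such as `eder_delta(build_features(texts)[1])` would give the pairwise distances that an MDS plot or a network would then be built from.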


NEWSPAPERS

The first visualisation shows all the newspaper editions that make up the corpus. For the most part, editions cluster by newspaper, and the majority are very similar in writing style. There are, however, a few outliers.

AUDIENCE

The corpus was previously divided into two categories: youth and adult audiences.


From an audience point of view, the adult-targeted newspapers form a clear cluster, while the editions aimed at youth audiences seem to be split into two groups. Adult newspapers clearly outnumber youth newspapers in the corpus, so the main target audience was not people aged between 15 and 24.

FORMAT

The corpus was divided into four categories according to publication frequency: daily, biweekly, weekly, and monthly. Format matters for the analysis because it usually reflects how well established a newspaper is. As we can see, most newspapers published daily editions, and very few published weekly or monthly ones. A non-daily schedule could mean either that a newspaper had a niche audience, for example young readers, or that it covered a limited distribution area.

HEADQUARTERS (HQ) - REGION

From the two visualisations, we observe that most national newspapers had their headquarters in Madrid or Barcelona, while the regional newspapers were based in either Sevilla or Santander.

IDEOLOGY

This plot shows that, in general, newspapers of the same ideology tend to be positioned close to each other. It is also interesting that some editions of unknown ideology were placed in the same cluster as socialist and regionalist editions.



A network analysis based on Stylo results

In Gephi, we laid out the stylistic data generated by Stylo using the ForceAtlas 2 algorithm. Nodes were sized according to their degree and edges according to their weight. Nodes were then coloured according to the feature relevant to each analysis.
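Stylo can export network data for Gephi itself, and the sketch below does not reproduce its exact procedure. It only illustrates the general idea under our own assumptions: starting from a dict of pairwise stylistic distances, each edition is linked to its k nearest neighbours (here with weights 1, 1/2, 1/3, ... by neighbour rank, a hypothetical choice), weights for the same pair are summed, and the result is written as a CSV with `Source`, `Target`, `Weight` and `Type` columns, which Gephi's spreadsheet import recognizes. Degree, used for node size, is counted as the number of distinct neighbours.

```python
import csv
import io
from collections import defaultdict


def nearest_neighbour_edges(dist, k=3):
    """Link each text to its k nearest neighbours with undirected weighted edges.

    dist -- dict mapping (a, b) pairs (one order per pair) to a distance
    Returns {(a, b): weight} with a < b; weight is 1/rank, summed when
    both ends choose each other.
    """
    neighbours = defaultdict(list)
    for (a, b), d in dist.items():
        neighbours[a].append((d, b))
        neighbours[b].append((d, a))
    edges = defaultdict(float)
    for node, cands in neighbours.items():
        for rank, (_, other) in enumerate(sorted(cands)[:k], start=1):
            edges[tuple(sorted((node, other)))] += 1 / rank
    return dict(edges)


def degrees(edges):
    """Number of distinct neighbours per node (used for node size in Gephi)."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def to_gephi_csv(edges):
    """Serialize edges in the Source,Target,Weight,Type layout Gephi can import."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight", "Type"])
    for (a, b), w in sorted(edges.items()):
        writer.writerow([a, b, round(w, 4), "Undirected"])
    return buf.getvalue()
```

The returned string could be saved as a `.csv` file and imported as an edges table; the ForceAtlas 2 layout and the degree-based sizing are then applied inside Gephi.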



GEPHI

NEWSPAPERS

Editions of the same newspaper cluster together fairly strongly, though to varying degrees. For example, all editions of Vida Socialista are very close to each other, while some editions of La Dinastía are spread out fairly widely. This could reflect actual stylistic differences, but OCR and token-recognition errors are probably the more likely cause.


The size of each node represents its degree. Editions of El Sol and El Imparcial have the highest degrees, meaning that they are stylistically similar to the largest number of other editions. This fits with descriptions of both as influential newspapers in this period.


FORMAT

The main cluster is clearly formed by newspapers with a daily publication cycle. Only a few non-daily newspapers are even close to it.


IDEOLOGY

The centre of the main cluster is formed by Liberal, Republican and Conservative newspapers. Anarchist, Socialist and Carlist newspapers are on the margins, especially El Cruzado Español at the top right. Interestingly, the other Carlist newspaper, CEDA, is located rather close to the Anarchist and Socialist ones. There are also two non-political newspapers, shown here as “nan”, which are the ones aimed at youth audiences. Both are completely separate from the main cluster.


HEADQUARTERS (HQ)

In this case, the centre of the main cluster is formed by newspapers based in Madrid, the Spanish capital. It is surrounded by a ring of publications based in Barcelona, Spain’s second city and the capital of Catalonia. One newspaper based in Santander, Cantabria, is well connected to the main cluster, while the single Andalusian newspaper, headquartered in Sevilla, is completely separate from the rest.




One possible explanation is that Barcelona was important enough for its newspapers to still be written for a national readership, yet distinct enough in location for them to be recognisably different from the Madrid ones. Santander’s regional style may not have exerted much influence, whereas the famously strong Andalusian accent might point to a stronger regional influence.



AUDIENCE

As in the ideology analysis, the two newspapers aimed at a youth audience are completely separate from the main cluster.




PUBLICATION YEAR

Editions in the main cluster are ordered by year of publication, from earlier on the left to later on the right.




CONCLUSIONS OF THE PROJECT

What influences newspaper style?

  • The main cluster of our network was made up of daily newspapers based in Madrid, with a middle-of-the-road political ideology and a national adult audience. The closer an edition is to the centre of the main cluster, and the higher its degree, the more closely it matches this profile. Clusters outside the main one were made up of newspapers that differ from this profile in one or several features.


  • For ideology, there does not seem to be a strong stylistic difference between middle-of-the-road ideologies. Newspapers with more extreme ideologies tend to be located at the edges of the main cluster or completely separate from it.


  • Target audience seems to influence style particularly strongly, as both youth newspapers are completely separate from the main cluster.


  • Editions in the main cluster are also ordered by publication year, suggesting that time has a strong underlying influence on style.



PROJECT TEAM


Lisa Raulli

S5352029

Data wrangling & Website management


Maximilian Henning

S5305403

Data wrangling & Gephi analysis


Catalina Cruceanu

S5367530

Stylo analysis


Maria Pilar Uribe Silva

S5341191

General Coordinator, Data wrangling & Website management