A Century of Cinematography: The Quest
A network analysis of actors and movies


Since what is considered the first commercial cinematographic projection, given by the Lumière brothers (Les Frères Lumière) in Paris in December 1895, the movie industry has grown into a business worth several tens of billions of dollars. Dozens of studios have been created, producing an increasing number of movies each year. Today, several thousand movies are released worldwide every year, and they represent a non-negligible cultural vector. Movies are an important instrument in the soft power toolbox. Thus, we came to ask ourselves this burning question: what is the influence of the movie industry around the world?

Of course, that is too grand a question to be answered with the tap of a finger. Nonetheless, we can focus on specific aspects of the movie industry, such as its actors. Here is our attempt to collect some nuggets of information about the world of movies and the actors in it.

How?

At our disposal, we have an existing dataset concerning only actors and movies: the CMU Movie Summary Corpus (citation). We will try to make the most of it to learn a little about the movie industry through the lens of its actors!

With this information, we uncover an underlying structure of the movie world. We draw a sort of Facebook of actors, where two actors are friends if they played in one or more movies together.
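The "friends if they shared a movie" rule can be sketched in a few lines of Python. This is a minimal illustration, not our actual pipeline: it assumes the casts have already been extracted into a dictionary mapping a movie identifier to a list of actor names (the `casts` format and the `costar_edges` helper are hypothetical). It also counts how many movies each pair shares, which could serve as an edge weight; whether links should be weighted is a modelling choice the text above does not settle.

```python
from collections import Counter
from itertools import combinations


def costar_edges(casts):
    """Count how many movies each pair of actors shares.

    `casts` maps a movie identifier to an iterable of actor names
    (hypothetical input format). Each key of the returned Counter is an
    (actor_a, actor_b) pair with actor_a < actor_b; its value is the
    number of movies in which both actors appear.
    """
    pair_counts = Counter()
    for actors in casts.values():
        # set() drops duplicate cast entries; sorted() gives every pair a
        # canonical order so (A, B) and (B, A) are the same link
        for pair in combinations(sorted(set(actors)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Toy casts, made up for illustration
casts = {
    "movie_1": ["Alice", "Bob", "Carol"],
    "movie_2": ["Alice", "Bob"],
    "movie_3": ["Dave"],  # a single-actor cast creates no link
}
edges = costar_edges(casts)
actors = {name for cast in casts.values() for name in cast}
print(len(actors), len(edges))  # 4 3
print(edges[("Alice", "Bob")])  # 2
```

Sorting the deduplicated cast means each pair of actors is stored under a single key, so a link is never double-counted within one movie.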

Once we have our network, we can cluster actors into communities of strongly related individuals using the Louvain algorithm. The computed communities can then be characterized to understand who gets access to the wider communities, how interconnected the communities are, and many other fascinating questions. The initial corpus gave us information on actors' dates of birth, the movies they played in, and the genres, languages and countries of those movies. To better understand the main communities, we decided to scrape additional information such as actors' nationalities and occupations. However, as scraping is very time-consuming, we only applied it to the 20 most populated communities.
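A minimal sketch of the clustering step, assuming the network is held in a networkx graph. `louvain_communities` is networkx's own Louvain implementation (available from networkx 2.8); the `top_communities` helper, the fixed `seed` and the cut-off `k=20` are illustrative choices, not necessarily what our pipeline used. Louvain is randomized, so fixing the seed keeps the partition reproducible between runs. The toy graph is two five-actor cliques joined by a single link.

```python
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import louvain_communities


def top_communities(graph, k=20, seed=42):
    """Run Louvain on `graph` and return its k largest communities.

    Edges without a "weight" attribute count as weight 1. `seed` fixes
    the random node ordering so repeated runs give the same partition.
    """
    communities = louvain_communities(graph, weight="weight", seed=seed)
    # Each community is a set of nodes; order them by size, largest first
    return sorted(communities, key=len, reverse=True)[:k]


# Toy network: two 5-actor cliques joined by a single co-starring link
g = nx.Graph()
g.add_edges_from(combinations(range(5), 2))
g.add_edges_from(combinations(range(5, 10), 2))
g.add_edge(4, 5)

for community in top_communities(g):
    print(sorted(community))
```

Keeping only the first `k` communities after sorting by size is what restricts a costly follow-up step, such as scraping, to the most populated groups.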

📽️ Let’s start our journey to explore the results!
Our quest: identifying and naming as many communities as we can

The Network and Communities

An overview

After epic battles with Python, we finally outsmarted the beast and managed to compute our network and communities. Here it is!

As the network is large, the graph is somewhat heavy to render correctly… you may need to drag a node to force a better rendering.

You can move around the graph with your mouse (a trackpad isn’t recommended if you want to keep your nerves). Each node color represents a community, and you will find more information by hovering your mouse over a node.

This graph only shows the first 20 communities, but here is a graph of every node. There, you can also select which community to show, and color nodes by nationality or by actor’s year of birth (you definitely want to see that, even if it takes a while to load).

The network data includes 8 427 actors.

As is usual in most real-world networks, the network is very sparse: only about 0.07% of all possible links between actors are present (25 865 edges out of roughly 35.5 million possible actor pairs).

Number of nodes: 8427
Number of edges: 25865
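For reference, the density of a simple undirected graph with n nodes and m edges is 2m / (n(n - 1)), that is, the share of actor pairs that are actually linked. A small Python check on the node and edge counts listed above (the helper name is ours):

```python
def undirected_density(n_nodes, n_edges):
    """Share of possible links present in a simple undirected graph.

    A graph with n nodes has n * (n - 1) / 2 possible pairs, so the
    density is 2m / (n(n - 1)).
    """
    # n * (n - 1) is always even, so integer division is exact here
    possible_pairs = n_nodes * (n_nodes - 1) // 2
    return n_edges / possible_pairs


# Node and edge counts reported above
density = undirected_density(8427, 25865)
print(f"{density:.3%}")  # 0.073%
```

networkx's `nx.density` computes the same quantity directly from an undirected graph object.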

It stars 646 communities in total, ranging in size from 2 to 1 080 actors. The community size distribution follows a power law, and only the 14 largest communities contain more than a hundred actors.
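Summary figures of this kind can be computed from a community assignment in a few lines. This sketch assumes a hypothetical `membership` dictionary mapping each actor to a community id (for instance, derived from the Louvain output); the default threshold of 100 mirrors the "more than a hundred actors" cut. A roughly straight line when plotting the size frequencies on log-log axes is only a visual hint of a power law, not a formal fit.

```python
from collections import Counter


def community_size_summary(membership, threshold=100):
    """Summarize community sizes from an actor -> community-id mapping."""
    sizes = Counter(membership.values())  # community id -> number of actors
    ordered = sorted(sizes.values(), reverse=True)
    return {
        "n_communities": len(sizes),
        "largest": ordered[0],
        "smallest": ordered[-1],
        "n_above_threshold": sum(1 for s in ordered if s > threshold),
        # size -> how many communities have that size; on log-log axes a
        # roughly straight line hints at a power-law distribution
        "size_frequency": dict(Counter(ordered)),
    }


# Toy membership, made up for illustration
membership = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 2, "g": 2}
print(community_size_summary(membership, threshold=2))
```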