labs:namethatmovie

Differences

This shows you the differences between two versions of the page.

labs:namethatmovie [2017/04/11 10:02]
brunnegi
— (current)
Line 1: Line 1:
-====== Python and SQLAlchemy ====== 
- 
-Exact contents and description TBD.  
- 
-Preliminary contents: 
- 
-  * Introduction to Python 
-  * Learn how to use the SQLAlchemy Python toolkit (SQLAlchemy is an Object Relational Mapper (ORM) for Python)
-  * Use SQLAlchemy to manipulate MySQL databases 
-  * Use numpy, pandas, matplotlib, etc. to do some data analysis
- 
- 
-===== Exercise 1 - Query the IMDB Database ===== 
- 
-We've collected some information about movies from various sources and compiled it into a database with a number of tables. You can find the SQLite file [[http://10.0.0.1/download/moviedb.sqlite|here]]. Familiarize yourself with the [[http://pc-10129.ethz.ch/sqlquery/schema|schema]] and the contents, then answer the following questions. If you need help and cannot google a solution, feel free to ask the assistants.
-  - The creator of the database was sloppy and accidentally entered some movies twice. How can you find out which ones? Remove them from the database! Make sure to also remove the rows in other tables that depend on them through foreign key constraints.
-  - How many of the movies you crawled in the first exercise are already in the IMDB db? Which are missing? 
-  - How many of the actors you crawled in the first exercise are already in the IMDB db? Do you notice a problem? 
-  - What is the most frequently occurring movie keyword? 
-  - Choose 5 keywords and find out how many movies match with each. 
-  - Display the top 5 most frequent movie keywords and how often they occur. 
-  - List all directors who have produced more than 10 movies yet still retain an average tomatometer score of over 90% over all their movies combined. 
-  - List all movies that reference 'The Matrix'.
-  - List all the keywords that describe 'The Matrix'​. 
-  - Which keywords do the movies 'The Matrix'​ and 'The Matrix Reloaded'​ have in common? 
-  - List all actors who appear in 'The Matrix', along with the total number of movies each of them appears in and the average tomatometer score of those movies.
-  - Which actor or actress who has starred in at least 3 movies since 2000 has the best overall average tomatometer score, and which has the worst?
-  - List all occurring tomatometer values, how many times each appears in the movie table, and the percentage of all table rows that each accounts for.
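
As a starting point for the duplicate and keyword questions above, here is a minimal sketch using Python's built-in ''sqlite3'' module. The table and column names (''movie'', ''keyword'', ''movie_keyword'') and the sample rows are assumptions made up for illustration; check the actual schema before adapting the queries. It also assumes that two rows with the same title and year are duplicates, which may be too strict or too loose for the real data.

```python
import sqlite3

# Hypothetical schema for illustration only; the real moviedb.sqlite
# may use different table and column names.
SCHEMA = """
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT);
CREATE TABLE movie_keyword (
    movie_id INTEGER REFERENCES movie(id),
    keyword_id INTEGER REFERENCES keyword(id)
);
"""


def build_demo_db():
    """Create a small in-memory database with one accidental duplicate."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO movie VALUES (?, ?, ?)",
        [
            (1, "The Matrix", 1999),
            (2, "The Matrix", 1999),  # duplicate entry
            (3, "The Matrix Reloaded", 2003),
        ],
    )
    conn.executemany(
        "INSERT INTO keyword VALUES (?, ?)",
        [(1, "hacker"), (2, "sequel"), (3, "dystopia")],
    )
    conn.executemany(
        "INSERT INTO movie_keyword VALUES (?, ?)",
        [(1, 1), (1, 3), (2, 1), (3, 1), (3, 2)],
    )
    conn.commit()
    return conn


def find_duplicates(conn):
    """Return (title, year, copies) for movies entered more than once."""
    return conn.execute(
        "SELECT title, year, COUNT(*) AS copies FROM movie "
        "GROUP BY title, year HAVING COUNT(*) > 1 ORDER BY title"
    ).fetchall()


def remove_duplicates(conn):
    """Keep the lowest id per (title, year); delete the other copies
    and the movie_keyword rows that reference them."""
    keep = "SELECT MIN(id) FROM movie GROUP BY title, year"
    # Delete dependent rows first, while the duplicates still exist.
    conn.execute(f"DELETE FROM movie_keyword WHERE movie_id NOT IN ({keep})")
    conn.execute(f"DELETE FROM movie WHERE id NOT IN ({keep})")
    conn.commit()


def top_keywords(conn, limit=5):
    """Return the most frequent keywords as (keyword, count) pairs."""
    return conn.execute(
        "SELECT k.keyword, COUNT(*) AS n FROM movie_keyword mk "
        "JOIN keyword k ON k.id = mk.keyword_id "
        "GROUP BY k.keyword ORDER BY n DESC, k.keyword LIMIT ?",
        (limit,),
    ).fetchall()
```

To try it, call ''conn = build_demo_db()'', then ''find_duplicates(conn)'', ''remove_duplicates(conn)'' and ''top_keywords(conn)''. Against the real file you would replace ''build_demo_db()'' with ''sqlite3.connect("moviedb.sqlite")'' and work on a copy, since the deletes are permanent.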
- 
-===== Exercise 2 - Plot some things ===== 
-In this exercise you will learn how to plot some things. You may use any software to accomplish the following tasks. If you're unsure what to do or need help and cannot google a solution, feel free to ask the assistants. 
-  - Use a plot to decide upon the truth of the following statement: "The movie title length is a direct indicator of the tomatometer score of the movie" 
-  - Plot Robert De Niro's career trajectory visualizing the year on the x-axis and the average tomatometer score on the y-axis. 
-  - Plot the average tomatometer score over all movies by year. 
-  - Plot the average tomatometer score per year of the top five actors and the top five actresses. (Rank them according to their overall average tomatometer score, and add a legend to the plot with their names.)
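
One possible way to approach the "average tomatometer score by year" task is sketched below. The function names, the output file name and the sample ''(year, score)'' pairs are all made up for illustration; in practice the rows would come from a query against the movie table. The aggregation uses only the standard library, and the plotting helper assumes matplotlib is installed.

```python
from collections import defaultdict


def average_score_by_year(rows):
    """Return a sorted list of (year, mean score) from (year, score) pairs.

    Rows with a missing year or score are skipped.
    """
    totals = defaultdict(lambda: [0.0, 0])  # year -> [sum of scores, count]
    for year, score in rows:
        if year is None or score is None:
            continue
        totals[year][0] += score
        totals[year][1] += 1
    return [(year, s / n) for year, (s, n) in sorted(totals.items())]


def plot_average_by_year(averages, path="tomatometer_by_year.png"):
    """Save a line plot of the yearly averages to a PNG file (needs matplotlib)."""
    # Imported here so the aggregation above works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display required
    import matplotlib.pyplot as plt

    years = [year for year, _ in averages]
    means = [mean for _, mean in averages]
    fig, ax = plt.subplots()
    ax.plot(years, means, marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel("Average tomatometer score (%)")
    ax.set_title("Average tomatometer score by year")
    fig.savefig(path)
    plt.close(fig)


# Made-up sample data, not taken from the real database.
sample = [(1999, 88), (1999, 72), (2003, 73), (2003, None), (2010, 90)]
averages = average_score_by_year(sample)
```

Calling ''plot_average_by_year(averages)'' then writes the figure to disk. The same aggregation could be done with a ''GROUP BY'' in SQL or with pandas; the plain-Python version is shown only because it has no dependencies.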
-===== Bonus - Crawl Again ===== 
- 
-Scrape any information from a website of your choosing. 
  
labs/namethatmovie.1491897732.txt.gz · Last modified: 2020/08/31 21:05 (external edit)