Movie IMDb Rating Predictor

EECS 349, Machine Learning

Zhilin Chen

What is our task and motivation?

     Movies, originally invented in the 1890s for entertainment, have become an indispensable part of human culture. There are multiple criteria for defining a “good” movie: whether it is profitable, whether it is popular, or whether it introduces new filmmaking techniques. In this project, I adopt the IMDb (Internet Movie Database) rating as the measure of whether a movie is “good”, because it fairly reflects how the public evaluates the movie.

     Our task is to determine which attributes influence a movie's IMDb rating, and how. For example, how do a movie's genre, director, stars, or production company affect its rating? This is both interesting and useful. A businessperson could use such a predictor to estimate in advance whether a movie will be popular and profitable. In addition, the predictor could show us, and scholars in related fields, how public preferences in movies change over time.

What is IMDb and IMDb ratings?

    The Internet Movie Database (abbreviated IMDb) is an online database (www.imdb.com) of information related to films, television programs, and video games, including cast, production crew, fictional characters, biographies, plot summaries, trivia, and reviews. Actors and crew can post their own résumés and upload photos of themselves for a yearly fee. U.S. users can view over 6,000 movies and television shows from CBS, Sony, and various independent filmmakers.

    In the IMDb voting system, each registered user can cast a vote (from 1 to 10) on every released title in the database. Users can vote as many times as they want, but each new vote overwrites the previous one, so the result is one vote per title per user.
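
    The overwrite rule described above can be illustrated with a minimal sketch. The store, the function name `cast_vote`, and the user and title identifiers are all hypothetical and chosen only for illustration; this is not how IMDb implements its system.

```python
from typing import Dict, Tuple

# Hypothetical in-memory store, keyed by (user_id, title_id).
# Keying on the pair is what enforces "one vote per title per user".
votes: Dict[Tuple[str, str], int] = {}


def cast_vote(user_id: str, title_id: str, score: int) -> None:
    """Record a vote from 1 to 10; a repeated vote replaces the earlier one."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    votes[(user_id, title_id)] = score


# Made-up users and title, for illustration only.
cast_vote("alice", "title_1", 7)
cast_vote("alice", "title_1", 9)  # overwrites alice's earlier vote of 7
cast_vote("bob", "title_1", 6)

print(votes[("alice", "title_1")])  # alice's latest vote
print(len(votes))                   # number of (user, title) pairs stored
```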

    IMDb takes all the individual votes cast by registered users and uses them to calculate a single rating. This rating is not the arithmetic mean of the votes (although the mean and median are displayed in the vote breakdown); IMDb displays a weighted vote average rather than a raw average. Various filters are applied to the raw data to eliminate or reduce attempts at "vote stuffing" by individuals more interested in changing the current rating of a movie or TV show than in giving their true opinion of it.

    Although the raw mean and median are shown under the detailed vote breakdown graph on the ratings pages, the user rating displayed on a film's or show's page is a weighted average. To avoid leaving the scheme open to abuse, IMDb does not disclose the exact methods used.
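
    To make the difference between a raw average and a weighted average concrete, here is a small sketch. Since IMDb does not disclose its method, the votes and the weights below are entirely made up, and `weighted_average` is a hypothetical helper; the sketch only shows how down-weighting suspicious votes can pull the displayed rating away from the raw mean.

```python
from statistics import mean, median
from typing import Sequence


def weighted_average(votes: Sequence[float], weights: Sequence[float]) -> float:
    """Return sum(vote * weight) / sum(weight)."""
    if len(votes) != len(weights):
        raise ValueError("votes and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(votes, weights)) / total_weight


# Made-up data: three suspicious 10s followed by four ordinary votes.
votes = [10, 10, 10, 7, 6, 8, 7]
# Hypothetical weights: the suspicious votes count for a tenth of a normal vote.
weights = [0.1, 0.1, 0.1, 1.0, 1.0, 1.0, 1.0]

raw_mean = mean(votes)
raw_median = median(votes)
weighted = weighted_average(votes, weights)

print(f"raw mean: {raw_mean:.2f}")
print(f"raw median: {raw_median}")
print(f"weighted average: {weighted:.2f}")
```

    With these made-up numbers the weighted average comes out lower than the raw mean, because the three inflated votes contribute far less to the total.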

    In short, an IMDb rating reflects HOW MUCH AUDIENCES LIKE A MOVIE.

What is our goal and how does it make sense?

    Each movie is produced jointly by a director, writers, actors, and others. These people (with their professional skills, their particular understanding of movies and audiences, and any chemistry among them) and the content (basic features such as genres, language, and country) together may determine the quality of a movie, measured here by its IMDb rating.

    Therefore, our goal is to predict the IMDb rating of a movie from a given set of attributes.
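
    As an illustration of what "a given set of attributes" might look like as input to a learner, the sketch below represents a movie as a record and turns its categorical attributes into a 0/1 feature vector. The `Movie` class, its fields, the helper names, and the sample movies are all assumptions made for this example; the project's actual data format and learning method are not specified here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Movie:
    """Hypothetical record holding a few of the attributes mentioned above."""
    director: str
    genres: List[str]
    language: str
    country: str


def movie_tokens(movie: Movie) -> List[str]:
    """Describe a movie as 'field=value' tokens, one per attribute value."""
    tokens = [
        f"director={movie.director}",
        f"language={movie.language}",
        f"country={movie.country}",
    ]
    tokens += [f"genre={genre}" for genre in movie.genres]
    return tokens


def build_vocabulary(movies: List[Movie]) -> List[str]:
    """Collect every distinct token, sorted so the feature order is stable."""
    vocabulary = set()
    for movie in movies:
        vocabulary.update(movie_tokens(movie))
    return sorted(vocabulary)


def encode(movie: Movie, vocabulary: List[str]) -> List[int]:
    """Return a 0/1 vector with a 1 wherever the movie has that token."""
    present = set(movie_tokens(movie))
    return [1 if token in present else 0 for token in vocabulary]


# Made-up movies, for illustration only.
movies = [
    Movie("Director A", ["Drama", "Crime"], "English", "USA"),
    Movie("Director B", ["Comedy"], "French", "France"),
]
vocabulary = build_vocabulary(movies)
features = [encode(movie, vocabulary) for movie in movies]

for token in vocabulary:
    print(token)
print(features[0])
```

    A vector like this, paired with the movie's known IMDb rating, is the kind of training example a rating predictor could learn from.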

Why is it important?

    In general, this task would help:

        1) give businesspeople who want to make profitable movies a guideline for choosing their staff.

        2) give us insight into how audience tastes change over the years.