It's done in a Jupyter notebook, and it's a beginners' class. I have attached all the files required for the project.
It's due today.
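For context, the Question 1 cell in the notebook fails with a `FileNotFoundError` because it opens `neighborhood_groups.csv`, which is not one of the project files; the neighborhood groups are a column of `airbnb.csv`. Below is a minimal sketch of the intended pattern, not a definitive solution. It assumes `cell(row_idx, col_name)` behaves roughly as described in the notebook (returning `None` for empty entries); the body of `cell` and the helper `process_rows` are hypothetical stand-ins, and a few rows from the dataset preview (columns trimmed) are parsed from a string so the sketch runs without the real file.

```python
import csv
import io

# A few rows taken from the dataset preview in the notebook (columns trimmed).
SAMPLE_CSV = """room_id,name,neighborhood_group,neighborhood,price
2539,Clean & quiet apt home by the park,Brooklyn,Kensington,149
2595,Skylit Midtown Castle,Manhattan,Midtown,225
3831,Cozy Entire Floor of Brownstone,Brooklyn,Clinton Hill,89
20227428,Douglaston Apartment Room A,Queens,Little Neck,45
"""

def process_rows(text):
    """Hypothetical stand-in for process_csv: parse CSV text into a list of rows."""
    return list(csv.reader(io.StringIO(text)))

csv_data = process_rows(SAMPLE_CSV)
csv_header = csv_data[0]   # first row holds the column names
csv_rows = csv_data[1:]    # remaining rows hold the data

def cell(row_idx, col_name):
    """Assumed shape of the Lab-P6 helper: value at (row, column), None if empty."""
    col_idx = csv_header.index(col_name)
    val = csv_rows[row_idx][col_idx]
    return None if val == "" else val

# Collect each neighborhood group once, reading data only through cell().
neighborhood_groups = []
for row_idx in range(len(csv_rows)):
    group = cell(row_idx, "neighborhood_group")
    if group is not None and group not in neighborhood_groups:
        neighborhood_groups.append(group)

print(neighborhood_groups)  # ['Brooklyn', 'Manhattan', 'Queens']
```

In the real notebook, `process_csv("airbnb.csv")` would replace `process_rows(SAMPLE_CSV)`, and the result should contain the five groups the grader expects.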
{
“cells”: [
{
“cell_type”: “code”,
“execution_count”: 10,
“id”: “cec7bf82”,
“metadata”: {
“cell_type”: “code”,
“deletable”: false,
“editable”: false
},
“outputs”: [],
“source”: [
"# import and initialize otter\n",
"import otter\n",
"grader = otter.Notebook(\"p6.ipynb\")"
]
},
{
“cell_type”: “code”,
“execution_count”: 11,
“id”: “66c622bb”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“outputs”: [],
“source”: [
“import p6_test”
]
},
{
“cell_type”: “markdown”,
“id”: “c3ab6456”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“# Project 6: Airbnb”
]
},
{
“cell_type”: “markdown”,
“id”: “350b4db3”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“## Learning Objectives:\n”,
“\n”,
“In this project, you will demonstrate how to:\n”,
“\n”,
“* access and utilize data in CSV files,\n”,
“* process real world datasets,\n”,
"* use string methods and sorting functions/methods to order data.\n",
“\n”,
“Please go through [Lab-P6](https://git.doit.wisc.edu/cdis/cs/courses/cs220/cs220-s23-projects/-/tree/main/lab-p6) before working on this project. The lab introduces some useful techniques related to this project.”
]
},
{
“cell_type”: “markdown”,
“id”: “8480d85c”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“## Testing your code:\n”,
“\n”,
“Along with this notebook, you must have downloaded the file `p6_test.py`. If you are curious about how we test your code, you can explore this file, and specifically the value of the variable `expected_json`, to understand the expected answers to the questions.”
]
},
{
“cell_type”: “markdown”,
“id”: “3e3a06d9”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“## Project Description:\n”,
“\n”,
"Data Science can help us understand user behavior on online platform services. This project is about the rooms listed on Airbnb. Since 2008, guests and hosts have used Airbnb to expand on traveling possibilities and present a more unique, personalized way of experiencing the world. `airbnb.csv` has data on nearly 50,000 Airbnb listings in New York City, NY, from the year 2019. This file includes a lot of information about the hosts, geographical availability of the listings, and other necessary metrics to make predictions and draw conclusions. You will be using various string manipulation methods that come with Python as well as creating some of your own functions to solve the problems posed. Happy coding!"
]
},
{
“cell_type”: “markdown”,
“id”: “232e134c”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“## Dataset:\n”,
“\n”,
“A small portion of the dataset `airbnb.csv` you will be working with for this project is reproduced here:”
]
},
{
“cell_type”: “markdown”,
“id”: “36b9b9a2”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“room_id|name|host_id|host_name|neighborhood_group|neighborhood|latitude|longitude|room_type|price|minimum_nights|number_of_reviews|last_review|reviews_per_month|calculated_host_listings_count|availability_365\n”,
"------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|\n",
“2539|Clean & quiet apt home by the park|2787|John|Brooklyn|Kensington|40.64749000000001|-73.97237|Private room|149|1|9|2018-10-19|0.21|6|365\n”,
“2595|Skylit Midtown Castle|2845|Jennifer|Manhattan|Midtown|40.75362|-73.98376999999998|Entire home/apt|225|1|45|2019-05-21|0.38|2|355\n”,
“3647|THE VILLAGE OF HARLEM….NEW YORK !|4632|Elisabeth|Manhattan|Harlem|40.80902|-73.9419|Private room|150|3|0|||1|365\n”,
“3831|Cozy Entire Floor of Brownstone|4869|LisaRoxanne|Brooklyn|Clinton Hill|40.68514|-73.95976|Entire home/apt|89|1|270|2019-07-05|4.64|1|194\n”,
“5022|Entire Apt: Spacious Studio/Loft by central park|7192|Laura|Manhattan|East Harlem|40.79851|-73.94399|Entire home/apt|80|10|9|2018-11-19|0.1|1|0\n”,
“5099|Large Cozy 1 BR Apartment In Midtown East|7322|Chris|Manhattan|Murray Hill|40.74767|-73.975|Entire home/apt|200|3|74|2019-06-22|0.59|1|129”
]
},
{
“cell_type”: “markdown”,
“id”: “09a9e49a”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“You can find more details on the dataset in [Lab-P6](https://git.doit.wisc.edu/cdis/cs/courses/cs220/cs220-s23-projects/-/tree/main/lab-p6).”
]
},
{
“cell_type”: “markdown”,
“id”: “94a8f0e3”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“## Questions and Functions:\n”,
“\n”,
“Let us start by importing all the modules we will need for this project.”
]
},
{
“cell_type”: “code”,
“execution_count”: 35,
“id”: “96a3f4b1”,
“metadata”: {
“tags”: []
},
“outputs”: [
{
“data”: {
“text/plain”: [
“[‘ABS_TOL’,\n”,
” ‘MAX_FILE_SIZE’,\n”,
” ‘PASS’,\n”,
” ‘REL_TOL’,\n”,
” ‘TEXT_FORMAT’,\n”,
” ‘TEXT_FORMAT_DICT’,\n”,
” ‘TEXT_FORMAT_LIST_DICTS_ORDERED’,\n”,
” ‘TEXT_FORMAT_NAMEDTUPLE’,\n”,
” ‘TEXT_FORMAT_ORDERED_LIST’,\n”,
” ‘TEXT_FORMAT_ORDERED_LIST_NAMEDTUPLE’,\n”,
” ‘TEXT_FORMAT_SPECIAL_ORDERED_LIST’,\n”,
” ‘TEXT_FORMAT_UNORDERED_LIST’,\n”,
” ‘__builtins__’,\n”,
” ‘__cached__’,\n”,
” ‘__doc__’,\n”,
” ‘__file__’,\n”,
” ‘__loader__’,\n”,
” ‘__name__’,\n”,
” ‘__package__’,\n”,
” ‘__spec__’,\n”,
” ‘check’,\n”,
” ‘check_cell_text’,\n”,
” ‘check_file_size’,\n”,
” ‘dict_compare’,\n”,
” ‘expected_json’,\n”,
” ‘json’,\n”,
” ‘list_compare_helper’,\n”,
” ‘list_compare_ordered’,\n”,
” ‘list_compare_special’,\n”,
” ‘list_compare_special_init’,\n”,
” ‘list_compare_unordered’,\n”,
” ‘math’,\n”,
” ‘os’,\n”,
” ‘simple_compare’,\n”,
” ‘special_ordered_json’]”
]
},
“execution_count”: 35,
“metadata”: {},
“output_type”: “execute_result”
}
],
“source”: [
"# it is considered a good coding practice to place all import statements at the top of the notebook\n",
"# please place all your import statements in this cell if you need to import any more modules for this project\n",
"import csv\n",
"dir(p6_test)"
]
},
{
“cell_type”: “markdown”,
“id”: “85ada17e”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“#### Now, copy and paste the `process_csv` and `cell` functions from your Lab-P6 notebook to the cell below.\n”,
“\n”,
"You are expected to call the `process_csv` function correctly, and read the data in `airbnb.csv`. After reading the file, define the `csv_header` and `csv_rows` variables as in Lab-P6, and define the `cell` function.\n",
“\n”,
“**Important:** You **must** only use the `cell` function to extract data from the dataset. If you extract any data without explicitly using this function, you will **lose points** during manual review. It is recommended but **optional** that you use the `cell_v2` function defined in Lab-P6. However, you **must** rename the function to `cell` in this notebook. ”
]
},
{
“cell_type”: “code”,
“execution_count”: 13,
“id”: “3d5e2ddf”,
“metadata”: {
“tags”: []
},
“outputs”: [],
“source”: [
"def process_csv(filename):\n",
"    # read the whole CSV file into a list of rows (each row is a list of strings)\n",
"    example_file = open(filename, encoding=\"utf-8\")\n",
"    example_reader = csv.reader(example_file)\n",
"    example_data = list(example_reader)\n",
"    example_file.close()\n",
"    return example_data"
]
},
{
“cell_type”: “markdown”,
“id”: “c7d05d92”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“**Question 1:** What **unique** neighborhood groups (`neighborhood_group`) are included in the dataset?\n”,
“\n”,
“Your output **must** be a *list* which stores all the **unique** neighborhood groups (i.e., without any duplicates). The order **does not** matter.”
]
},
{
“cell_type”: “code”,
“execution_count”: 32,
“id”: “1c13a292”,
“metadata”: {
“tags”: []
},
“outputs”: [
{
“ename”: “FileNotFoundError”,
“evalue”: “[Errno 2] No such file or directory: ‘neighborhood_groups.csv'”,
“output_type”: “error”,
“traceback”: [
“\u001b[0;31m—————————————————————————\u001b[0m”,
“\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)”,
“\u001b[0;32m/var/folders/sh/57yplwld3pz8gy3ltbs50p1r0000gn/T/ipykernel_70886/3694406519.py\u001b[0m in \u001b[0;36m
“\u001b[0;32m/var/folders/sh/57yplwld3pz8gy3ltbs50p1r0000gn/T/ipykernel_70886/1297077488.py\u001b[0m in \u001b[0;36mprocess_csv\u001b[0;34m(filename)\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mprocess_csv\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilename\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m—-> 2\u001b[0;31m \u001b[0mexample_file\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilename\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mencoding\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\”utf-8\”\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0mexample_reader\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcsv\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreader\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mexample_file\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mexample_data\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mexample_reader\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mexample_file\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mclose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n”,
“\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: ‘neighborhood_groups.csv'”
]
}
],
“source”: [
“# compute and store the answer in the variable ‘neighborhood_groups’, then display it\n”,
“neighborhood_groups = set(process_csv(\”neighborhood_groups.csv\”))\n”,
“neighborhood_groups \n”
]
},
{
“cell_type”: “code”,
“execution_count”: 31,
“id”: “6caaea45”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“outputs”: [
{
“data”: {
“text/plain”: [
“q1 results:\n”,
” q1 – 1 result:\n”,
” ❌ Test case failed\n”,
” Trying:\n”,
” p6_test.check(\”q1\”, neighborhood_groups)\n”,
” Expecting:\n”,
” True\n”,
” **********************************************************************\n”,
” Line 1, in q1 0\n”,
” Failed example:\n”,
” p6_test.check(\”q1\”, neighborhood_groups)\n”,
” Expected:\n”,
” True\n”,
” Got:\n”,
” ERROR: in the list, found unexpected ‘_’ (found 19 entries in list, but expected 5)”
]
},
“execution_count”: 31,
“metadata”: {},
“output_type”: “execute_result”
}
],
“source”: [
“grader.check(\”q1\”)”
]
},
{
“cell_type”: “markdown”,
“id”: “7a12711e”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“**Question 2:** What is the **average** `price` of all rooms in the dataset?”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “583a7631”,
“metadata”: {
“tags”: []
},
“outputs”: [],
“source”: [
“# compute and store the answer in the variable ‘avg_price’, then display it”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “2e2a9f90”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“outputs”: [],
“source”: [
“grader.check(\”q2\”)”
]
},
{
“cell_type”: “markdown”,
“id”: “a9e14251”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“**Question 3:** How many rooms are in the `neighborhood` of *SoHo*?”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “4ac30f8f”,
“metadata”: {
“tags”: []
},
“outputs”: [],
“source”: [
“# compute and store the answer in the variable ‘count_soho’, then display it\n”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “a2980991”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“outputs”: [],
“source”: [
“grader.check(\”q3\”)”
]
},
{
“cell_type”: “markdown”,
“id”: “937e80b5”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“### Function 1: `find_room_names(phrase)`\n”,
“\n”,
"We require you to complete the function below and use it to answer Questions 4 to 6\n",
“(this is a **requirement**, and you will **lose points** if you do not implement this function). You can review string methods from the [lecture on Feb 24](https://git.doit.wisc.edu/cdis/cs/courses/cs220/cs220-lecture-material/-/blob/main/s23/Common_to_all_lectures/13_Strings/13_Strings )”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “722a84ab”,
“metadata”: {
“tags”: []
},
“outputs”: [],
“source”: [
"def find_room_names(phrase):\n",
"    \"\"\"\n",
"    find_room_names(phrase) returns a list of all the room names that CONTAIN the \n",
"    substring (case insensitive match) `phrase`.\n",
"    \"\"\"\n",
"    pass # replace with your code \n",
"    # TODO: create an empty list\n",
"    # TODO: ignore rooms that do not have data entry for name, as indicated by a value of None\n",
"    # TODO: check if the room name string contains phrase (case insensitive match)\n",
"    # TODO: if so, add these room names to the list (the room names should be as in the dataset)\n",
"    # TODO: return your list of room names\n",
"    ..."
]
},
{
“cell_type”: “markdown”,
“id”: “6d12cb62”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“**Question 4:** Find all room names that contain the string `\”free wifi\”`.\n”,
” \n”,
“Your output **must** be a *list*. The order **does not** matter. You **must** call the `find_room_names` function to answer this question.”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “8ade7385”,
“metadata”: {
“tags”: []
},
“outputs”: [],
“source”: [
“# compute and store the answer in the variable ‘rooms_free_wifi’, then display it\n”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “91fea5bb”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“outputs”: [],
“source”: [
“grader.check(\”q4\”)”
]
},
{
“cell_type”: “markdown”,
“id”: “8b015854”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“**Question 5:** Find all room names that contain **either** `\”cinema\”` **or** `\”film\”`.\n”,
“\n”,
“Your output **must** be a *list*. The order **does not** matter, but if a room’s `name` contains **both** `\”cinema\”` and `\”film\”`, then the room must be included **only once** in your list. You **must** call the `find_room_names` function to answer this question.”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “bb37417f”,
“metadata”: {
“tags”: []
},
“outputs”: [],
“source”: [
“# compute and store the answer in the variable ‘rooms_contain_cinema_film’, then display it\n”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “00f66b87”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“outputs”: [],
“source”: [
“grader.check(\”q5\”)”
]
},
{
“cell_type”: “markdown”,
“id”: “3e334c28”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“**Question 6:** Find the **longest** room `name` that contains the word `\”fun\”`.\n”,
“\n”,
“There is a **unique** such room with the longest `name`, so you **do not** have to worry about breaking ties. You **must** call the `find_room_names` function to answer this question. You **must** initialize the variable `funnest_room` to be `None`.”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “98bb93e2”,
“metadata”: {
“tags”: []
},
“outputs”: [],
“source”: [
“# compute and store the answer in the variable ‘funnest_room’, then display it\n”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “e3137e78”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“outputs”: [],
“source”: [
“grader.check(\”q6\”)”
]
},
{
“cell_type”: “markdown”,
“id”: “98819e95”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“**Question 7:** Find the names (`name`) of all the rooms which have `price` *0* and have **more than** *90* reviews (`number_of_reviews`).\n”,
“\n”,
“Your output **must** be a *list*. The names **must** be sorted in **ascending (alphabetical) order**.”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “5f452682”,
“metadata”: {
“tags”: []
},
“outputs”: [],
“source”: [
“# compute and store the answer in the variable ‘no_cost_rooms’, then display it\n”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “2946c170”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“outputs”: [],
“source”: [
“grader.check(\”q7\”)”
]
},
{
“cell_type”: “markdown”,
“id”: “d510f8a1”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“**Question 8:** What neighborhoods (`neighborhood`) are the rooms that have `price` greater than *9999* located in?\n”,
“\n”,
“Your output **must** be a *list* of **unique** neighborhoods (i.e., without any duplicates). The names **must** be sorted in **descending (reverse-alphabetical) order**.”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “8b518213”,
“metadata”: {
“tags”: []
},
“outputs”: [],
“source”: [
“# compute and store the answer in the variable ‘pricey_neighborhoods’, then display it\n”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “bdba4596”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“outputs”: [],
“source”: [
“grader.check(\”q8\”)”
]
},
{
“cell_type”: “markdown”,
“id”: “2fa18d66”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“**Question 9:** How many rooms received their `last_review` **in or before** *2015*?\n”,
“\n”,
“You should **ignore** rooms for which the `last_review` data is missing.\n”,
“\n”,
“**Hint:** You can find the date of the last review in the `last_review` column. \n”,
“You can review the get_year function from [lab-p5](https://git.doit.wisc.edu/cdis/cs/courses/cs220/cs220-s23-projects/-/tree/main/lab-p5)”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “fe337eab”,
“metadata”: {
“tags”: []
},
“outputs”: [],
“source”: [
“# compute and store the answer in the variable ‘last_review_before_2015’, then display it\n”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “e4692707”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“outputs”: [],
“source”: [
“grader.check(\”q9\”)”
]
},
{
“cell_type”: “markdown”,
“id”: “6dbff213”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“### Function 2: `avg_price_per_room_type(room_type, neighborhood)`\n”,
“\n”,
“We require you to complete the below function to answer the next several questions (this is a **requirement**, and you will **lose points** if you do not implement this function).”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “243e812a”,
“metadata”: {
“tags”: []
},
“outputs”: [],
“source”: [
"def avg_price_per_room_type(room_type, neighborhood):\n",
"    '''\n",
"    avg_price_per_room_type(room_type, neighborhood) returns the average price of \n",
"    rooms of the type `room_type` in the given `neighborhood`; if there are no\n",
"    rooms of the type `room_type` in the given `neighborhood`, it returns `None`\n",
"    '''\n",
"    pass # replace with your code "
]
},
{
“cell_type”: “markdown”,
“id”: “34620d59”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
"**Question 10:** What is the **average** `price` of a *Private room* (`room_type`) in the `neighborhood` *Little Neck*?\n",
“\n”,
“You **must** call the `avg_price_per_room_type` function to answer this question.\n”,
“\n”,
“**Hint:** To help you debug your code in case you run into any bugs, we have reproduced in the cell below, **all** the rows in the dataset from the `neighborhood` *Little Neck*. If you run into bugs with `avg_price_per_room_type`, it is recommended that you go through your code and verify that it does what it is supposed to, for this tiny dataset.”
]
},
{
“cell_type”: “markdown”,
“id”: “b4248fc0”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“room_id|name|host_id|host_name|neighborhood_group|neighborhood|latitude|longitude|room_type|price|minimum_nights|number_of_reviews|last_review|reviews_per_month|calculated_host_listings_count|availability_365\n”,
"------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|\n",
“20227428|Douglaston Apartment Room A|18996093|Leonard|Queens|Little Neck|40.75794000000001|-73.72955999999998|Private room|45|1|12|2019-06-22|0.55|5|133\n”,
“21025083|Douglaston (apt 2) Room one\\n(Largest room)|18996093|Leonard|Queens|Little Neck|40.75777|-73.72949|Private room|50|1|6|2018-12-16|0.31|5|94\n”,
“30325639|Cozy shared studio in a safe neighborhood|21495656|Ramy|Queens|Little Neck|40.76212|-73.71928|Shared room|32|3|1|2018-12-04|0.14|1|88\n”,
“31553066|Near major transportation|41090359|Abi|Queens|Little Neck|40.77122|-73.738|Private room|100|1|0|||1|88\n”,
“35515780|30-min to Manhattan Quiet Big House in Great Neck|31859704|Vincent|Queens|Little Neck|40.77444000000001|-73.73373000000002|Entire home/apt|149|3|0|||1|3”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “b441c819”,
“metadata”: {
“tags”: []
},
“outputs”: [],
“source”: [
“# compute and store the answer in the variable ‘pvt_room_little_neck’, then display it\n”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “98922e5e”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“outputs”: [],
“source”: [
“grader.check(\”q10\”)”
]
},
{
“cell_type”: “markdown”,
“id”: “2766870f”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
"**Question 11:** On average, how much **more** expensive (`price`) is an *Entire home/apt* (`room_type`) than a *Private room* (`room_type`) in the `neighborhood` *Astoria*?\n",
“\n”,
“You **must** call the `avg_price_per_room_type` function to answer this question.”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “588ee84b”,
“metadata”: {
“tags”: []
},
“outputs”: [],
“source”: [
“# compute and store the answer in the variable ‘home_pvt_room_astoria_diff’, then display it\n”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “82cabe88”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“outputs”: [],
“source”: [
“grader.check(\”q11\”)”
]
},
{
“cell_type”: “markdown”,
“id”: “e3836b5a”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
“### Function 3: `find_prices_within(lat_min, lat_max, long_min, long_max)` \n”,
“\n”,
“We require you to complete the below function to answer the next several questions (this is a **requirement**, and you will **lose points** if you do not implement this function).”
]
},
{
“cell_type”: “code”,
“execution_count”: null,
“id”: “ca1be9c1”,
“metadata”: {
“tags”: []
},
“outputs”: [],
“source”: [
"def find_prices_within(lat_min, lat_max, long_min, long_max):\n",
"    \"\"\"\n",
"    find_prices_within(lat_min, lat_max, long_min, long_max) returns an unordered \n",
"    list of prices of all the rooms within the geographical location between and including\n",
"    the latitudes lat_min and lat_max and longitudes long_min and long_max.\n",
"    \"\"\"\n",
"    pass # replace with your code"
]
},
{
“cell_type”: “markdown”,
“id”: “7cb93af3”,
“metadata”: {
“deletable”: false,
“editable”: false
},
“source”: [
"**Question 12:** What is the **lowest** `price` room near *NYU* (`40.729 <= latitude <= 40.73, -74.01 <= longitude <= -74.00`)?\n",
"\n",
"You **must** call the `find_prices_within` function to answer this question."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89e9a9f8",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# compute and store the answer in the variable 'min_price_nyu', then display it\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c7bc48e6",
"metadata": {
"deletable": false,
"editable": false
},
"outputs": [],
"source": [
"grader.check(\"q12\")"
]
},
{
"cell_type": "markdown",
"id": "35092860",
"metadata": {
"deletable": false,
"editable": false
},
"source": [
"### Function 4: `median(items)` \n",
"\n",
"We require you to complete the below function to answer the next several questions (this is a **requirement**, and you will **lose points** if you do not implement this function).\n",
"You may copy/paste this function from your Lab-P6 notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "41a321b1",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def median(items):\n",
" \"\"\"\n",
" median(items) returns the median of the list `items`\n",
" \"\"\"\n",
" pass # replace with your code\n",
" # you may copy/paste this function from your Lab-P6 notebook"
]
},
{
"cell_type": "markdown",
"id": "aecac920",
"metadata": {
"deletable": false,
"editable": false
},
"source": [
"**Question 13:** What is the **median** `price` of the rooms near *Columbia University* (`40.79 <= latitude <= 40.80, -73.96 <= longitude <= -73.95`)?\n",
"\n",
"You **must** call the `find_prices_within` function to answer this question."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1167d39b",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# compute and store the answer in the variable 'median_price_columbia', then display it\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4b0108fa",
"metadata": {
"deletable": false,
"editable": false
},
"outputs": [],
"source": [
"grader.check(\"q13\")"
]
},
{
"cell_type": "markdown",
"id": "e0f6fdc4",
"metadata": {
"deletable": false,
"editable": false
},
"source": [
"**Question 14:** What **percentage** of rooms near *Rockefeller Center* (`40.749 <= latitude <= 40.75, -73.98 <= longitude <= -73.97`) have a `price` **more than** *100*?\n",
"\n",
"Your answer **must** be a *float* value between *0* and *100*. You **must** call the `find_prices_within` function to answer this question."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4ff94931",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# compute and store the answer in the variable 'pct_price_over_hundred', then display it\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "54ed36d0",
"metadata": {
"deletable": false,
"editable": false
},
"outputs": [],
"source": [
"grader.check(\"q14\")"
]
},
{
"cell_type": "markdown",
"id": "fd02b61a",
"metadata": {
"deletable": false,
"editable": false
},
"source": [
"### Function 5: `avg_review_avail_ratio(neighborhood)` \n",
"\n",
"We require you to complete the below function to answer the next several questions (this is a **requirement**, and you will **lose points** if you do not implement this function). \n",
"\n",
"$\\text{Ratio of number of reviews and availability =} \\frac{\\texttt{number_of_reviews}}{\\texttt{availability_365}}$\n",
" \n",
"In this function, we want to compute the **average ratio** of `number_of_reviews` and `availability_365` in a `neighborhood`.\n",
"\n",
"You **must** **ignore** rooms that have `availability_365` data of 0. \n",
"You **must** also **ignore** rooms for which the ratio cannot be computed due to **missing data** (i.e., either the numerator or denominator is **missing**).\n",
"\n",
"**For example**, let's consider a sample dataset which only has two rooms, and we want to compute the average ratio of `number_of_reviews` and `availability_365` in *Jamaica*:\n",
"\n",
"`name`|`number_of_reviews`| `availability_365`|`neighborhood`\n",
"------|------|------|------|\n",
"room_one| 4 | 200| Jamaica\n",
"room_two| 200| 20 |Jamaica\n",
"\n",
"\n",
"1. Compute the ratio for each room in the `neighborhood` *Jamaica*: \n",
"\n",
" review-availability ratio for room_one = 4 / 200 = 0.02.\n",
" \n",
" review-availability ratio for room_two = 200 / 20 = 10. \n",
" \n",
"2. Calculate the average of the two ratios:\n",
" $\\texttt{average_review_availability} \\text{ in Jamaica} = \\frac{0.02 + 10}{2} = 5.01.$\n",
" \n",
"**Hints:**\n",
"1. The denominator is the availability of a room (`availability_365` column). The numerator is the number of reviews of a room (`number_of_reviews` column). \n",
"2. Be careful! You need to compute the ratio for each room in the given neighborhood, then take the average of those ratios. Simply dividing the sum of reviews by the sum of availability will calculate the wrong answer."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7c6c9749",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def avg_review_avail_ratio(neighborhood):\n",
"    \"\"\"\n",
"    avg_review_avail_ratio(neighborhood) returns the average of the ratios of \n",
"    number of reviews to availability of all rooms in the `neighborhood`;\n",
"    if there are no rooms in the `neighborhood` for which the ratio can\n",
"    be computed, then the function returns `None`\n",
"    \"\"\"\n",
"    pass # replace with your code\n",
"    # TODO: you should **ignore** rooms that have `availability_365` data of 0\n",
"    # TODO: you should **ignore** rooms for which the ratio cannot be computed due to missing data\n",
"    # Hint: the numerator is the number of reviews of a room (`number_of_reviews` column)\n",
"    # Hint: the denominator is the availability of a room (`availability_365` column)\n",
"    # Hint: note that you need to compute the average of the ratios, **not** the ratio of the averages.\n",
"    #       you must compute the ratio for each room in the `neighborhood`, then take the average of those ratios.\n",
"    #       simply dividing the sum of reviews by the sum of availability will calculate the wrong answer."
]
},
{
"cell_type": "markdown",
"id": "0ceca136",
"metadata": {
"deletable": false,
"editable": false
},
"source": [
"**Question 15:** What is the **average of the ratios** of the `number_of_reviews` to `availability_365` in the `neighborhood` *Bushwick*?\n",
"\n",
"You **must** call the `avg_review_avail_ratio` function to answer this question."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a8cf34c9",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# compute and store the answer in the variable 'bushwick_avg_ratio', then display it\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bc43cf68",
"metadata": {
"deletable": false,
"editable": false
},
"outputs": [],
"source": [
"grader.check(\"q15\")"
]
},
{
"cell_type": "markdown",
"id": "defaf37e",
"metadata": {
"deletable": false,
"editable": false
},
"source": [
"**Question 16:** What is the **average of the ratios** of the `number_of_reviews` to `availability_365` in the `neighborhood` *Manhattan Beach*?\n",
"\n",
"You **must** call the `avg_review_avail_ratio` function to answer this question."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d8ddc0e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# compute and store the answer in the variable 'manhattan_beach_avg_ratio', then display it\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "13ba034f",
"metadata": {
"deletable": false,
"editable": false
},
"outputs": [],
"source": [
"grader.check(\"q16\")"
]
},
{
"cell_type": "markdown",
"id": "744122c7",
"metadata": {
"deletable": false,
"editable": false
},
"source": [
"**Question 17:** Which `neighborhood` in the `neighborhood_group` *Staten Island* has the **highest average of ratios** of the `number_of_reviews` to `availability_365`?\n",
"\n",
"You **must** **ignore** any `neighborhood` for which the average ratio **cannot be computed**.\n",
"\n",
"**Clarification:** Do not worry if this cell takes around 10 seconds to run; that is expected. If it takes much longer (i.e., more than 30 seconds), you **must** optimize your code. Attend office hours if you are unable to get your code to run faster.\n",
"\n",
"**Hint:** You do not need to compute the average of ratios for the **same** `neighborhood` more than once. Make a list of the **unique** neighborhoods in *Staten Island* first, and then find the **highest average of ratios** among those `neighborhoods`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "624f1d31",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# compute and store the answer in the variable 'max_nbhd_staten_island', then display it\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef9fb72d",
"metadata": {
"deletable": false,
"editable": false
},
"outputs": [],
"source": [
"grader.check(\"q17\")"
]
},
{
"cell_type": "markdown",
"id": "d38c664b",
"metadata": {
"deletable": false,
"editable": false
},
"source": [
"### Function 6: `find_good_rooms(room_type, neighborhood, number_of_reviews_threshold)`\n",
"\n",
"We require you to complete the below function to answer the next several questions (this is a **requirement**, and you will **lose points** if you do not implement this function). \n",
"\n",
"Price, location, room type and number of reviews are metrics that people look into when they book a room in Airbnb. \n",
"In this function, we want to return a **list** of all the room names (`name`) of the given `room_type` from the given `neighborhood` who have received **at least** `number_of_reviews_threshold` many reviews (`number_of_reviews`), and are **cheaper** than the **average** priced (`price`) room of the given `room_type` from that `neighborhood`.\n",
"\n",
"The order of the **list** does **not** matter. You **must** **ignore** any rooms for which the `price`, `room_type`, `neighborhood`, or `number_of_reviews` data is **missing**. If the average `price` of rooms of the given `room_type` from the given `neighborhood` **cannot be computed** due to missing data, then the function **must** return `None`.\n",
"\n",
"**For example**, let's consider the following small dataset: \n",
"\n",
"`name`| `price` |`number_of_reviews`|`room_type`|`neighborhood`\n",
"------|------|------|------|------|\n",
"room_one| 65 | 165| Private room |Jamaica\n",
"room_two| 50 |200| Private room |Jamaica\n",
"room_three| 80| 120| Private room |Jamaica\n",
"room_four| 300| 300| Private room |Jamaica\n",
"room_five| 450| 240| Private room |Jamaica\n",
"room_six| 180| 150| Private room |Jamaica\n",
"\n",
"In this small dataset, we want to find the list of all room names in *Jamaica* of `room_type` *Private room* with **at least** *150* reviews that have a price **lower** than the **average** price of `room_type` *Private room* in *Jamaica*. \n",
" \n",
"1. The **average** `price` of a *Private room* in the `neighborhood` *Jamaica* is:\n",
"$\\frac{65 + 50 + 80 + 300 + 450 + 180}{6} = 187.5.$ \n",
"\n",
"2. We can see that there are *4* rooms (`room_one`, `room_two`, `room_three`, and `room_six`) with a `price` **lower** than the **average**, *187.5*. Of these rooms, *3* of them (`room_one`, `room_two`, and `room_six`) also have `number_of_reviews` **at least** *150*.\n",
"\n",
"3. So, the output should be the **list** `[\"room_one\", \"room_two\", \"room_six\"]`.\n",
"\n",
"The `find_good_rooms` function definition **must** invoke the function `avg_price_per_room_type`. **We'll manually deduct points** if you don't use `avg_price_per_room_type`. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "99284b30",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def find_good_rooms(room_type, neighborhood, number_of_reviews_threshold=150):\n",
" \"\"\"\n",
" find_good_rooms(room_type, neighborhood, number_of_reviews_threshold)\n",
" returns a list of room `names` having at least the given `number_of_reviews` \n",
" that also have a price that is lower than the average price of rooms\n",
" of the same `room_type` from the same `neighborhood`\n",
" \"\"\" \n",
" pass # replace with your code\n",
" # TODO: use 'avg_price_per_room_type' to find the average `price` of rooms\n",
" # of the given `room_type` from the given `neighborhood`\n",
" # TODO: create an empty list\n",
" # TODO: add the names of all the rooms of the given `room_type` from the\n",
" # the given `neighborhood` with `price` lower than the average\n",
" # and `number_of_reviews` at least the threshold\n",
" # TODO: return the list"
]
},
{
"cell_type": "markdown",
"id": "1264dc54",
"metadata": {
"deletable": false,
"editable": false
},
"source": [
"**Question 18:** Find a **list** of all the *Entire home/apt* type rooms (`room_type`) in the *Chinatown* `neighborhood` with at least *100* reviews (`number_of_reviews`) that are cheaper than average.\n",
"\n",
"Your answer **must** be a **list**. The order does **not** matter. You **must** call the `find_good_rooms` function to answer this question."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f1ca2816",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# compute and store the answer in the variable 'good_chinatown_rooms', then display it\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7433079b",
"metadata": {
"deletable": false,
"editable": false
},
"outputs": [],
"source": [
"grader.check(\"q18\")"
]
},
{
"cell_type": "markdown",
"id": "a8f6d90f",
"metadata": {
"deletable": false,
"editable": false
},
"source": [
"**Question 19:** Find a **list** of all the *Private room* type rooms (`room_type`) in the *Harlem* `neighborhood` with $\\geq 300$ and $< 500$ reviews (`number_of_reviews`) that are cheaper than average.\n",
"\n",
"Your answer **must** be a **list**. The order does **not** matter. You **must** call the `find_good_rooms` function to answer this question.\n",
"\n",
"**Hint**: Call the `find_good_rooms` function twice with the two different `number_of_reviews_threshold` values, and use these two lists to compute the answer."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0f8d1637",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# compute and store the answer in the variable 'decent_harlem_rooms', then display it\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4473e299",
"metadata": {
"deletable": false,
"editable": false
},
"outputs": [],
"source": [
"grader.check(\"q19\")"
]
},
{
"cell_type": "markdown",
"id": "73681939",
"metadata": {
"deletable": false,
"editable": false
},
"source": [
"**Question 20:** On a trip to NYC, you need to stay for *3* days in *Queens*, and then *4* days in *Brooklyn*. What is the **minimum** amount of money you need to spend on this trip?\n",
"\n",
"Note that:\n",
"1. The `price` of each room is for one day, and you'll only stay in one room at each location.\n",
"2. The total cost = (lowest price in *Queens*) * 3 + (lowest price in *Brooklyn*) * 4.\n",
"3. You'll need to **skip** those rooms that don't have enough availability, for example, you **must** ignore rooms in *Queens* whose availability is **less than** *3*.\n",
"4. You'll need to **skip** those rooms for which you don't meet the required number of `minimum_nights`, for example, you **must** ignore rooms in *Brooklyn* whose `minimum_nights` is **greater than** *4*.\n",
"5. You **must** skip all rooms with any of the relevant data missing.\n",
"\n",
"\n",
"**Hint:** You might want to define a helper function to compute the **minimum** daily `price` of a room in a given `neighborhood_group` among rooms whose `availability_365` is at least the number of days one will be staying in that neighborhood group, and the `minimum_nights` is at most the number of the number of days one will be staying in that neighborhood group."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f99d0a08",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# compute and store the answer in the variable 'min_cost_trip', then display it\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d80ed37a",
"metadata": {
"deletable": false,
"editable": false
},
"outputs": [],
"source": [
"grader.check(\"q20\")"
]
},
{
"cell_type": "markdown",
"id": "91e0ce42",
"metadata": {
"deletable": false,
"editable": false
},
"source": [
"## Submission\n",
"It is recommended that at this stage, you Restart and Run all Cells in your notebook.\n",
"That will automatically save your work and generate a zip file for you to submit.\n",
"\n",
"**SUBMISSION INSTRUCTIONS**:\n",
"1. **Upload** the zipfile to Gradescope.\n",
"2. Check **Gradescope otter** results as soon as the auto-grader execution gets completed. Don't worry about the score showing up as -/100.0. You only need to check that the test cases passed."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d571556",
"metadata": {
"cell_type": "code"
},
"outputs": [],
"source": [
"# running this cell will create a new save checkpoint for your notebook\n",
"from IPython.display import display, Javascript\n",
"display(Javascript('IPython.notebook.save_checkpoint();'))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "023947a7",
"metadata": {
"cell_type": "code",
"deletable": false,
"editable": false
},
"outputs": [],
"source": [
"!jupytext --to py p6.ipynb"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6cc6260d",
"metadata": {
"cell_type": "code",
"deletable": false,
"editable": false
},
"outputs": [],
"source": [
"p6_test.check_file_size(\"p6.ipynb\")\n",
"grader.export(pdf=False, run_tests=True, files=[\"p6.py\"])"
]
},
{
"cell_type": "markdown",
"id": "1d42dabf",
"metadata": {
"deletable": false,
"editable": false
},
"source": [
" "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
},
"otter": {
"OK_FORMAT": true,
"tests": {
"q1": {
"name": "q1",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\”q1\”, neighborhood_groups)\nTrue”,
“hidden”: false,
“locked”: false
}
],
“scored”: true,
“setup”: “”,
“teardown”: “”,
“type”: “doctest”
}
]
},
"q10": {
"name": "q10",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q10\", pvt_room_little_neck)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
},
"q11": {
"name": "q11",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q11\", home_pvt_room_astoria_diff)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
},
"q12": {
"name": "q12",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q12\", min_price_nyu)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
},
"q13": {
"name": "q13",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q13\", median_price_columbia)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
},
"q14": {
"name": "q14",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q14\", pct_price_over_hundred)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
},
"q15": {
"name": "q15",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q15\", bushwick_avg_ratio)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
},
"q16": {
"name": "q16",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q16\", manhattan_beach_avg_ratio)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
},
"q17": {
"name": "q17",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q17\", max_nbhd_staten_island)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
},
"q18": {
"name": "q18",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q18\", good_chinatown_rooms)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
},
"q19": {
"name": "q19",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q19\", decent_harlem_rooms)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
},
"q2": {
"name": "q2",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q2\", avg_price)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
},
"q20": {
"name": "q20",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q20\", min_cost_trip)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
},
"q3": {
"name": "q3",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q3\", count_soho)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
},
"q4": {
"name": "q4",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q4\", rooms_free_wifi)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
},
"q5": {
"name": "q5",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q5\", rooms_contain_cinema_film)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
},
"q6": {
"name": "q6",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q6\", funnest_room)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
},
"q7": {
"name": "q7",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q7\", no_cost_rooms)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
},
"q8": {
"name": "q8",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q8\", pricey_neighborhoods)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
},
"q9": {
"name": "q9",
"points": 5,
"suites": [
{
"cases": [
{
"code": ">>> p6_test.check(\"q9\", last_review_before_2015)\nTrue",
"hidden": false,
"locked": false
}
],
"scored": true,
"setup": "",
"teardown": "",
"type": "doctest"
}
]
}
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}
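The worked example for Function 6 in the notebook above (the six *Private room* listings in *Jamaica*) can be checked with a minimal sketch like the one below. It is not the required solution: the list of dicts `rooms`, the helper `avg_price`, and the name `find_good_rooms_sketch` are all hypothetical stand-ins, since the real notebook reads `airbnb.csv` and must call its own `avg_price_per_room_type`, whose signature is not shown in this part of the post.

```python
# Hypothetical in-memory copy of the small example table from the notebook.
rooms = [
    {"name": "room_one", "price": 65, "number_of_reviews": 165,
     "room_type": "Private room", "neighborhood": "Jamaica"},
    {"name": "room_two", "price": 50, "number_of_reviews": 200,
     "room_type": "Private room", "neighborhood": "Jamaica"},
    {"name": "room_three", "price": 80, "number_of_reviews": 120,
     "room_type": "Private room", "neighborhood": "Jamaica"},
    {"name": "room_four", "price": 300, "number_of_reviews": 300,
     "room_type": "Private room", "neighborhood": "Jamaica"},
    {"name": "room_five", "price": 450, "number_of_reviews": 240,
     "room_type": "Private room", "neighborhood": "Jamaica"},
    {"name": "room_six", "price": 180, "number_of_reviews": 150,
     "room_type": "Private room", "neighborhood": "Jamaica"},
]


def avg_price(rows, room_type, neighborhood):
    """Assumed stand-in for the notebook's avg_price_per_room_type."""
    prices = []
    for row in rows:
        if row["room_type"] != room_type or row["neighborhood"] != neighborhood:
            continue
        if row["price"] is None:  # ignore rooms with a missing price
            continue
        prices.append(row["price"])
    if len(prices) == 0:
        return None  # the average cannot be computed
    return sum(prices) / len(prices)


def find_good_rooms_sketch(rows, room_type, neighborhood, number_of_reviews_threshold=150):
    """Names of matching rooms that are cheaper than average and have enough reviews."""
    average = avg_price(rows, room_type, neighborhood)
    if average is None:
        return None
    good_rooms = []
    for row in rows:
        if row["room_type"] != room_type or row["neighborhood"] != neighborhood:
            continue
        if row["price"] is None or row["number_of_reviews"] is None:
            continue
        if row["price"] < average and row["number_of_reviews"] >= number_of_reviews_threshold:
            good_rooms.append(row["name"])
    return good_rooms


at_least_150 = find_good_rooms_sketch(rooms, "Private room", "Jamaica", 150)
at_least_200 = find_good_rooms_sketch(rooms, "Private room", "Jamaica", 200)
# Rooms with >= 150 and < 200 reviews, in the spirit of the Question 19 hint.
between_150_and_200 = [name for name in at_least_150 if name not in at_least_200]
print(at_least_150)
print(between_150_and_200)
```

On this small table the average price is 187.5, so the first list matches the one given in the notebook's example. The last three lines apply the two-call idea from the Question 19 hint to the same table, using thresholds 150 and 200 purely for illustration.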
#!/usr/bin/python
import os, json, math

MAX_FILE_SIZE = 500 # units - KB
REL_TOL = 6e-04 # relative tolerance for floats
ABS_TOL = 15e-03 # absolute tolerance for floats

PASS = "PASS"

TEXT_FORMAT = "text" # question type when expected answer is a str, int, float, or bool
TEXT_FORMAT_NAMEDTUPLE = "text namedtuple" # question type when expected answer is a namedtuple
TEXT_FORMAT_UNORDERED_LIST = "text list_unordered" # question type when the expected answer is a list where the order does *not* matter
TEXT_FORMAT_ORDERED_LIST = "text list_ordered" # question type when the expected answer is a list where the order does matter
TEXT_FORMAT_ORDERED_LIST_NAMEDTUPLE = "text list_ordered namedtuple" # question type when the expected answer is a list of namedtuples where the order does matter
TEXT_FORMAT_SPECIAL_ORDERED_LIST = "text list_special_ordered" # question type when the expected answer is a list where order does matter, but with possible ties. Elements are ordered according to values in special_ordered_json (with ties allowed)
TEXT_FORMAT_DICT = "text dict" # question type when the expected answer is a dictionary
TEXT_FORMAT_LIST_DICTS_ORDERED = "text list_dicts_ordered" # question type when the expected answer is a list of dicts where the order does matter
expected_json = {"1": (TEXT_FORMAT_UNORDERED_LIST, ['Brooklyn', 'Manhattan', 'Queens', 'Staten Island', 'Bronx']),
                 "2": (TEXT_FORMAT, 152.7206871868289),
                 "3": (TEXT_FORMAT, 358),
                 "4": (TEXT_FORMAT_UNORDERED_LIST, ['One Bedroom Mini studio – Free WIFI',
                                                    'Great Chelsea Location, Couch/2nd bed, Free WiFi',
                                                    'Private 2 BR APT: Free WIFI & JACUZZI',
                                                    'PRIVATE 1BR APT: Free WIFI & DIRECT TV',
                                                    'Landmark 1 Bedroom has 2 beds, Free WiFi',
                                                    'Modern and Safe Place,Free Wifi',
                                                    'Newly renovated 2 bedroom with FREE WIFI',
                                                    '*NO GUEST SERVICE FEE* Beekman Tower Studio with Queen Bed & Free Wifi',
                                                    '*NO GUEST SERVICE FEE* Beekman Tower One Bedroom Suite with Queen Bed & Free Wifi',
                                                    'Sunny Hudson Yards/ Chelsea Studio, Free WiFi',
                                                    'Private Bedroom in MANHATTAN (Free Wifi)',
                                                    'J- LUXURY SHARED ROOM, AC FREE WIFI+CABLE GARDEN',
                                                    'J- *LUXURY SHARED ROOM AC FREE WIFI CABLE, GARDEN',
                                                    'J- **LUXURY SHARED ROOM 2PPL FREE WIFI+CABLE+AC',
                                                    '5min walk to L train – Free WiFi & Cleaning',
                                                    'J- HOTEL STYLE SHARE ROOM FOR 2PPL FREE WIFI CABLE',
                                                    'Explore NYC From Our Private Studio w/Free Wifi',
                                                    'Staten Island – Free Wifi, Parking Space, Near NYC',
                                                    'BIG BEDROOM CLOSE TO LA GUARDIA AIRPORT FREE WIFI',
                                                    'J- COZY ROOM FOR 1 FEMALE FREE WIFI & COFFEE']),
                 "5": (TEXT_FORMAT_UNORDERED_LIST, ['HUGE LUX 2FLOOR 2 BDRMSOHO LOFTw/HOME CINEMA',
                                                    'Cinema Studio on Duplex Apt.',
                                                    'Cool apartment in Brooklyn with free cinema & gym',
                                                    'Cinema + gym included with room',
                                                    'TV-PHOTO-FILM-CINEMA-ART GALLERY-MUSIC STUDIO-LOFT',
                                                    'Premium Chelsea 1BR w/ Gym, W/D, Doorman, Sundeck, Cinema, by Blueground',
                                                    'Stunning Chelsea 1BR w/ Gym, W/D, Doorman, Sundeck, Cinema, by Blueground',
                                                    'Sunny private room featured in film',
                                                    "Downtown Filmmaker's Loft by WTC",
                                                    'Film Location',
                                                    'Brooklyn townhouse for filming',
                                                    'WoodyAllen FilmSet-Like Digs (Apt)',
                                                    'WoodyAllen FilmSet-Like Digs (Room)',
                                                    'Film / photography location in unique apartment',
                                                    'The Otheroom Bar/Event/Filming Space -read details',
                                                    'Victorian Film location',
                                                    'Modern Townhouse for Photo, Film & Daytime Events',
                                                    'Shoot. Film. Sleep. Unique Loft Space in Brooklyn.',
                                                    'Clean music/film themed bedroom',
                                                    'Music Recording Mixing Film Photography Art']),
                 "6": (TEXT_FORMAT, 'Homey 1BR in Fun, Central West Village w/ Doorman by Blueground'),
                 "7": (TEXT_FORMAT_ORDERED_LIST, ['Contemporary bedroom in brownstone with nice view',
                                                  'Cozy yet spacious private brownstone bedroom',
                                                  'Spacious comfortable master bedroom with nice view']),
                 "8": (TEXT_FORMAT_ORDERED_LIST, ['Upper West Side', 'Greenpoint', 'Astoria']),
                 "9": (TEXT_FORMAT, 1672),
                 "10": (TEXT_FORMAT, 65.0),
                 "11": (TEXT_FORMAT, 46.02133741379495),
                 "12": (TEXT_FORMAT, 75),
                 "13": (TEXT_FORMAT, 100),
                 "14": (TEXT_FORMAT, 93.10344827586206),
                 "15": (TEXT_FORMAT, 0.9856466156705752),
                 "16": (TEXT_FORMAT, 0.27323293295076073),
                 "17": (TEXT_FORMAT, 'Arden Heights'),
                 "18": (TEXT_FORMAT_UNORDERED_LIST, ['Cute & Cozy Lower East Side 1 bdrm',
                                                     'QT STUDIO FOR ROMANTIC COUPLES',
                                                     'Lower East Side 1bedroom apt in NYC',
                                                     'Two Bridges District Chinatown NYC',
                                                     'Fun LES 1br, close to everything!',
                                                     'Amazing Downtown With Rooftop',
                                                     'Sunny Central Location!',
                                                     'Cozy Apartment in Lower Manhattan',
                                                     'Gorgeous Loft in the LES/Chinatown!',
                                                     'Spacious 2 Bedroom Lower East Side',
                                                     'Lovely 2-bedroom 1bath in Chinatown & Little Italy']),
                 "19": (TEXT_FORMAT_UNORDERED_LIST, ['PRIVATE Room on Historic Sugar Hill',
                                                     'Bright Room With A Great River View',
                                                     '1 Pvt. Room in Upper West Manhattan']),
                 "20": (TEXT_FORMAT, 30)}

special_ordered_json = {}

def check_cell_text(qnum, actual):
    format, expected = expected_json[qnum[1:]]
    try:
        if format == TEXT_FORMAT:
            return simple_compare(expected, actual)
        elif format in [TEXT_FORMAT_ORDERED_LIST, TEXT_FORMAT_LIST_DICTS_ORDERED]:
            return list_compare_ordered(expected, actual)
        elif format == TEXT_FORMAT_UNORDERED_LIST:
            return list_compare_unordered(expected, actual)
        elif format == TEXT_FORMAT_SPECIAL_ORDERED_LIST:
            return list_compare_special(expected, actual, special_ordered_json[qnum[1:]])
        elif format == TEXT_FORMAT_DICT:
            return dict_compare(expected, actual)
        else:
            if expected != actual:
                return "expected %s but found %s " % (repr(expected), repr(actual))
    except:
        if expected != actual:
            return "expected %s" % (repr(expected))
    return PASS

def simple_compare(expected, actual, complete_msg=True):
    msg = PASS
    if type(expected) == type:
        if expected != actual:
            if type(actual) == type:
                msg = "expected %s but found %s" % (expected.__name__, actual.__name__)
            else:
                msg = "expected %s but found %s" % (expected.__name__, repr(actual))
    elif type(expected) != type(actual) and not (type(expected) in [float, int] and type(actual) in [float, int]):
        msg = "expected to find type %s but found type %s" % (type(expected).__name__, type(actual).__name__)
    elif type(expected) == float:
        if not math.isclose(actual, expected, rel_tol=REL_TOL, abs_tol=ABS_TOL):
            msg = "expected %s" % (repr(expected))
            if complete_msg:
                msg = msg + " but found %s" % (repr(actual))
    else:
        if expected != actual:
            msg = "expected %s" % (repr(expected))
            if complete_msg:
                msg = msg + " but found %s" % (repr(actual))
    return msg

def list_compare_ordered(expected, actual, obj="list"):
    msg = PASS
    if type(expected) != type(actual):
        msg = "expected to find type %s but found type %s" % (type(expected).__name__, type(actual).__name__)
        return msg
    for i in range(len(expected)):
        if i >= len(actual):
            msg = "expected missing %s in %s" % (repr(expected[i]), obj)
            break
        if type(expected[i]) in [int, float, bool, str]:
            val = simple_compare(expected[i], actual[i])
        elif type(expected[i]) in [list]:
            val = list_compare_ordered(expected[i], actual[i], "sub" + obj)
        elif type(expected[i]) in [dict]:
            val = dict_compare(expected[i], actual[i])
        # obfuscate1() is not defined anywhere in this file as pasted, so this
        # branch raises NameError if it is ever reached
        elif type(expected[i]).__name__ == obfuscate1():
            val = simple_compare(expected[i], actual[i])
        if val != PASS:
            msg = "at index %d of the %s, " % (i, obj) + val
            break
    if len(actual) > len(expected) and msg == PASS:
        msg = "found unexpected %s in %s" % (repr(actual[len(expected)]), obj)
    if len(expected) != len(actual):
        msg = msg + " (found %d entries in %s, but expected %d)" % (len(actual), obj, len(expected))
    if len(expected) > 0 and type(expected[0]) in [int, float, bool, str]:
        if msg != PASS and list_compare_unordered(expected, actual, obj) == PASS:
            try:
                msg = msg + " (list may not be ordered as required)"
            except:
                pass
    return msg

def list_compare_helper(larger, smaller):
    msg = PASS
    j = 0
    for i in range(len(larger)):
        if i == len(smaller):
            msg = "expected %s" % (repr(larger[i]))
            break
        found = False
        while not found:
            if j == len(smaller):
                val = simple_compare(larger[i], smaller[j - 1], False)
                break
            val = simple_compare(larger[i], smaller[j], False)
            j += 1
            if val == PASS:
                found = True
                break
        if not found:
            msg = val
            break
    return msg

def list_compare_unordered(expected, actual, obj="list"):
    msg = PASS
    if type(expected) != type(actual):
        msg = "expected to find type %s but found type %s" % (type(expected).__name__, type(actual).__name__)
        return msg
    try:
        sort_expected = sorted(expected)
        sort_actual = sorted(actual)
    except:
        # two placeholders, so pass exactly two values (the original passed `obj` twice)
        msg = "unexpected datatype found in %s; expected entries of type %s" % (obj, type(expected[0]).__name__)
        return msg
    if len(actual) == 0 and len(expected) > 0:
        # repr() keeps this working when the expected entries are not strings
        msg = "in the %s, missing " % (obj) + repr(expected[0])
    elif len(actual) > 0 and len(expected) > 0:
        val = simple_compare(sort_expected[0], sort_actual[0])
        if val.startswith("expected to find type"):
            msg = "in the %s, " % (obj) + simple_compare(sort_expected[0], sort_actual[0])
        else:
            if len(expected) > len(actual):
                msg = "in the %s, missing " % (obj) + list_compare_helper(sort_expected, sort_actual)
            elif len(expected) < len(actual):
                msg = "in the %s, found un" % (obj) + list_compare_helper(sort_actual, sort_expected)
            if len(expected) != len(actual):
                msg = msg + " (found %d entries in %s, but expected %d)" % (len(actual), obj, len(expected))
                return msg
            else:
                val = list_compare_helper(sort_expected, sort_actual)
                if val != PASS:
                    msg = "in the %s, missing " % (obj) + val + ", but found un" + list_compare_helper(sort_actual,
                                                                                                       sort_expected)
    return msg

def list_compare_special_init(expected, special_order):
    real_expected = []
    for i in range(len(expected)):
        if real_expected == [] or special_order[i-1] != special_order[i]:
            real_expected.append([])
        real_expected[-1].append(expected[i])
    return real_expected

def list_compare_special(expected, actual, special_order):
    expected = list_compare_special_init(expected, special_order)
    msg = PASS
    expected_list = []
    for expected_item in expected:
        expected_list.extend(expected_item)
    val = list_compare_unordered(expected_list, actual)
    if val != PASS:
        msg = val
    else:
        i = 0
        for expected_item in expected:
            j = len(expected_item)
            actual_item = actual[i: i + j]
            val = list_compare_unordered(expected_item, actual_item)
            if val != PASS:
                if j == 1:
                    msg = "at index %d " % (i) + val
                else:
                    msg = "between indices %d and %d " % (i, i + j - 1) + val
                msg = msg + " (list may not be ordered as required)"
                break
            i += j
    return msg

def dict_compare(expected, actual, obj="dict"):
    msg = PASS
    if type(expected) != type(actual):
        msg = "expected to find type %s but found type %s" % (type(expected).__name__, type(actual).__name__)
        return msg
    try:
        expected_keys = sorted(list(expected.keys()))
        actual_keys = sorted(list(actual.keys()))
    except:
        msg = "unexpected datatype found in keys of dict; expect a dict with keys of type %s" % (
            type(expected_keys[0]).__name__)
        return msg
    val = list_compare_unordered(expected_keys, actual_keys, "dict")
    if val != PASS:
        msg = "bad keys in %s: " % (obj) + val
    if msg == PASS:
        for key in expected:
            if expected[key] == None or type(expected[key]) in [int, float, bool, str]:
                val = simple_compare(expected[key], actual[key])
            elif type(expected[key]) in [list]:
                val = list_compare_ordered(expected[key], actual[key], "value")
            elif type(expected[key]) in [dict]:
                val = dict_compare(expected[key], actual[key], "sub" + obj)
            if val != PASS:
                msg = "incorrect val for key %s in %s: " % (repr(key), obj) + val
    return msg

def check(qnum, actual):
    msg = check_cell_text(qnum, actual)
    if msg == PASS:
        return True
    print("ERROR: " + msg)

def check_file_size(path):
    size = os.path.getsize(path)
    assert size < MAX_FILE_SIZE * 10**3, "Your file is too big to be processed by Gradescope; please delete unnecessary output cells so your file size is < %s KB" % MAX_FILE_SIZE
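
For the float answers (for example q2, q15 and q16), `simple_compare` above does not require an exact match; it uses `math.isclose` with `REL_TOL` and `ABS_TOL`. The sketch below only illustrates how that tolerance check behaves. The helper name `floats_match` and the sample "actual" values are made up for illustration; the two tolerances and the two expected values are copied from the test file.

```python
import math

REL_TOL = 6e-04   # relative tolerance, copied from p6_test.py
ABS_TOL = 15e-03  # absolute tolerance, copied from p6_test.py


def floats_match(expected, actual):
    """Hypothetical helper mirroring the float branch of simple_compare."""
    # math.isclose passes when |actual - expected| is within the larger of
    # rel_tol * max(|actual|, |expected|) and abs_tol.
    return math.isclose(actual, expected, rel_tol=REL_TOL, abs_tol=ABS_TOL)


expected_q2 = 152.7206871868289     # expected answer for q2
expected_q16 = 0.27323293295076073  # expected answer for q16

print(floats_match(expected_q2, 152.72))   # small rounding difference
print(floats_match(expected_q2, 152.0))    # off by about 0.72
print(floats_match(expected_q16, 0.28))    # within the absolute tolerance
print(floats_match(expected_q16, 0.30))    # outside both tolerances
```

For small answers such as the q16 ratio, the absolute tolerance of 0.015 is the one that matters, so rounding the displayed value is unlikely to fail the check, but a genuinely different average will.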