XX (company name) Python Pandas Test (30 mins)

In this task, you'll be analysing listings data from our Shopee Platform.

You may use the installed PyCharm IDE, Sublime, or another Windows-native text editor. Please save your Python source code on the desktop. You may use the internet for help.

The dataset is stored in the Test_Pandas.xlsx file. It contains listing information posted on Shopee. Each listing corresponds to one row in the dataset.

The dataset has 12 columns and 464,433 rows.

Here are the brief descriptions of each column:

Itemid - a unique ID of the product

Shopid - a unique ID of the shop

item_name - product title

item_description - detailed product description

item_variation - the variations of a product (e.g. different colours or sizes), in a format like {variation 1 name: variation 1 price, variation 2 name: variation 2 price}

price - the price at which the item is sold

stock - how many units are left in stock

category - the category the product belongs to

cb_option - 1 indicates the product is sold by a cross border shop

is_preferred - 1 indicates the product is sold by a preferred shop

sold_count - how many products have been sold

item_creation_date - when the product was uploaded by the seller

 

  1. Use a pandas function to read in the Test_Pandas.xlsx file: (10 marks)
    1. Assign the result to a variable named “data”
    2. Assign all column names to a variable named “columns”
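One possible way to approach the reading task above is sketched here. The helper name load_listings is hypothetical, and the sketch assumes the workbook sits in the working directory and that an Excel engine such as openpyxl is installed.

```python
import pandas as pd


def load_listings(path="Test_Pandas.xlsx"):
    """Read the workbook and return the DataFrame plus its column names.

    load_listings is a hypothetical helper, not part of the original test.
    """
    # pd.read_excel reads the first sheet by default; reading .xlsx files
    # needs an Excel engine such as openpyxl to be installed.
    data = pd.read_excel(path)
    # Plain Python list of the column labels.
    columns = data.columns.tolist()
    return data, columns
```

Calling `data, columns = load_listings()` from the folder that holds the workbook should give the two variables the task asks for. With roughly 460,000 rows the read may take a while.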

 

  2. Use pandas functions to find: (20 marks)
    1. How many unique shops are in the dataset?
    2. How many unique preferred and cross-border shops are in the dataset?
    3. How many products have zero sold count?
    4. How many products were created in the year 2018?
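The four counts above could be computed roughly as follows. This is only a sketch on a made-up six-row frame: the lowercase column names are an assumption, and "preferred and cross-border" is read here as shops that are both at once (dropping one condition from the mask gives the count for a single flag).

```python
import pandas as pd

# Hypothetical miniature of the dataset; all values are invented.
data = pd.DataFrame({
    "itemid": [1, 2, 3, 4, 5, 6],
    "shopid": [10, 10, 20, 30, 30, 40],
    "is_preferred": [1, 1, 0, 1, 1, 0],
    "cb_option": [1, 1, 1, 0, 0, 0],
    "sold_count": [0, 5, 0, 2, 0, 7],
    "item_creation_date": pd.to_datetime([
        "2018-01-05", "2017-12-31", "2018-06-30",
        "2019-02-01", "2018-11-11", "2016-03-03",
    ]),
})

# 1. Number of distinct shops.
unique_shops = data["shopid"].nunique()

# 2. Distinct shops that are both preferred and cross-border (assumed reading).
both_flags = (data["is_preferred"] == 1) & (data["cb_option"] == 1)
unique_preferred_cb_shops = data.loc[both_flags, "shopid"].nunique()

# 3. Listings with nothing sold.
zero_sold = (data["sold_count"] == 0).sum()

# 4. Listings created in 2018; to_datetime guards against dates read as text.
created = pd.to_datetime(data["item_creation_date"])
created_2018 = (created.dt.year == 2018).sum()

print(unique_shops, unique_preferred_cb_shops, zero_sold, created_2018)
```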

 

  3. Use pandas functions to find: (10 marks)
    1. The shopid of the top 3 preferred shops with the largest number of unique products
    2. The top 3 categories with the largest number of unique cross-border products
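A filter, group, count-distinct, then take-the-largest pattern is one way to handle both rankings above. The sketch below uses invented data and assumes lowercase column names; ties are not treated specially (nlargest keeps the first occurrences).

```python
import pandas as pd

# Hypothetical sample; values are invented for illustration.
data = pd.DataFrame({
    "itemid": [1, 2, 3, 4, 5, 6, 7, 8],
    "shopid": [10, 10, 10, 20, 20, 30, 40, 40],
    "category": ["A", "A", "B", "B", "C", "C", "A", "D"],
    "is_preferred": [1, 1, 1, 1, 1, 1, 0, 0],
    "cb_option": [1, 1, 0, 1, 1, 1, 1, 0],
})

# Preferred shops ranked by the number of distinct itemid values.
top_preferred_shops = (
    data[data["is_preferred"] == 1]
    .groupby("shopid")["itemid"]
    .nunique()
    .nlargest(3)
)

# Categories ranked by the number of distinct cross-border itemid values.
top_cb_categories = (
    data[data["cb_option"] == 1]
    .groupby("category")["itemid"]
    .nunique()
    .nlargest(3)
)

print(top_preferred_shops.index.tolist())
print(top_cb_categories.index.tolist())
```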

 

  4. Find the top 3 shopid values with the highest revenue (Assumption: the product price has not changed.) (15 marks)
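Under the stated assumption that prices never changed, revenue per listing can be approximated as price times sold_count and then summed per shop. A minimal sketch on invented data, assuming lowercase column names:

```python
import pandas as pd

# Hypothetical sample; values are invented for illustration.
data = pd.DataFrame({
    "shopid": [10, 10, 20, 30, 40],
    "price": [5.0, 10.0, 100.0, 1.0, 20.0],
    "sold_count": [2, 3, 1, 50, 0],
})

# Revenue per listing, summed per shop, then the three largest totals.
top_revenue_shops = (
    data.assign(revenue=data["price"] * data["sold_count"])
    .groupby("shopid")["revenue"]
    .sum()
    .nlargest(3)
)

print(top_revenue_shops)
```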

 

  5. Find the number of products that have more than 3 variations (do not include products with 3 or fewer variations) (15 marks)
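Counting variations depends on how item_variation is actually stored, which the task sheet only describes loosely. The sketch below assumes each cell is a JSON object string such as '{"S": 10, "M": 10}'; if the real cells are formatted differently, the parsing step would need to change. The helper count_variations is hypothetical.

```python
import json

import pandas as pd


def count_variations(raw):
    """Return how many variations one item_variation cell encodes.

    Assumes the cell is a JSON object string; anything else counts as 0.
    """
    if not isinstance(raw, str) or not raw.strip():
        return 0  # missing or empty cell
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return 0  # unparseable text is treated as "no variations" here
    return len(parsed) if isinstance(parsed, dict) else 0


# Hypothetical sample; values are invented for illustration.
data = pd.DataFrame({
    "itemid": [1, 2, 3, 4],
    "item_variation": [
        '{"S": 10, "M": 10, "L": 12, "XL": 12}',  # four variations
        '{"red": 5, "blue": 5, "green": 5}',      # exactly three
        "{}",                                     # none
        None,                                     # missing
    ],
})

variation_counts = data["item_variation"].apply(count_variations)
# Strictly more than 3, so the three-variation listing is excluded.
more_than_three = (variation_counts > 3).sum()

print(more_than_three)
```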

 

  6. Use pandas functions to identify duplicated listings within each shop (if listings A and B in shop S have exactly the same product title, detailed product description, and price, both A and B are considered duplicated listings) (30 marks)
    1. Mark duplicated listings with True and all others with False, and store the result in a new column named “is_duplicated”
    2. Find duplicated listings that have a sold count of less than 2 and store the result in a new Excel file named “duplicated_listings.xlsx”
    3. Find the shopid of the preferred shop that has the largest number of duplicated listings
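The duplicate task above maps fairly directly onto DataFrame.duplicated with keep=False, which flags every member of a duplicate group rather than only the later copies. The sketch below runs on invented data with assumed lowercase column names; the Excel export is left commented out because it needs a writer engine such as openpyxl.

```python
import pandas as pd

# Hypothetical sample; values are invented for illustration.
data = pd.DataFrame({
    "itemid": [1, 2, 3, 4, 5, 6, 7, 8],
    "shopid": [10, 10, 10, 20, 20, 30, 30, 30],
    "item_name": ["mug", "mug", "cup", "mug", "mug", "hat", "hat", "hat"],
    "item_description": [
        "blue mug", "blue mug", "tea cup", "blue mug",
        "red mug", "wool hat", "wool hat", "wool hat",
    ],
    "price": [5.0, 5.0, 3.0, 5.0, 5.0, 9.0, 9.0, 9.0],
    "sold_count": [0, 4, 1, 0, 0, 1, 0, 3],
    "is_preferred": [1, 1, 1, 0, 0, 1, 1, 1],
})

# Same shop, title, description and price means a duplicated listing.
key = ["shopid", "item_name", "item_description", "price"]
# keep=False marks every row of a duplicate group, not just the repeats.
data["is_duplicated"] = data.duplicated(subset=key, keep=False)

# Duplicated listings with a sold count below 2.
low_sold_duplicates = data[data["is_duplicated"] & (data["sold_count"] < 2)]
# low_sold_duplicates.to_excel("duplicated_listings.xlsx", index=False)

# Preferred shop with the most duplicated listings.
preferred_dupes = data[data["is_duplicated"] & (data["is_preferred"] == 1)]
top_preferred_shop = preferred_dupes["shopid"].value_counts().idxmax()

print(top_preferred_shop)
```

Including shopid in the key is what restricts the comparison to listings inside the same shop; identical listings in two different shops are not flagged.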

 

 
