Python powers most data analytics workflows thanks to its readability, versatility, and rich ecosystem of libraries like Pandas, NumPy, Matplotlib, SciPy, and scikit-learn. Employers frequently assess candidates on their proficiency with Python’s core constructs, data manipulation, visualization, and algorithmic problem-solving. This article compiles 60 carefully crafted Python coding interview questions and answers categorized by Beginner, Intermediate, and Advanced levels, catering to freshers and seasoned data analysts alike. Each of these questions comes with detailed, explanatory answers that demonstrate both conceptual clarity and applied understanding.
Answer: Python is a versatile, high-level programming language known for its simplicity and readability. It’s widely used in data analytics due to powerful libraries such as Pandas, NumPy, Matplotlib, and Seaborn. Python enables quick prototyping and integrates easily with other technologies and databases, making it a go-to language for data analysts.
Answer: You can install libraries using pip:
pip install pandas numpy
To manage environments and dependencies, use venv or conda:
python -m venv env
source env/bin/activate # Linux/macOS
env\Scripts\activate # Windows
This ensures isolated environments and avoids dependency conflicts.
Answer: The key data types in Python include int, float, str, bool, list, tuple, dict, and set.
These types let you structure and manipulate data effectively.
Answer: Here’s the basic difference:
Answer: A Pandas Series is a one-dimensional labeled array, while a Pandas DataFrame is a two-dimensional labeled data structure with columns. Use a Series for single-column data and a DataFrame for tabular data.
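A minimal illustration of the difference (the column names and values here are made up):

```python
import pandas as pd

# A Series: one-dimensional, labeled values
s = pd.Series([10, 20, 30], name="age")

# A DataFrame: two-dimensional, labeled rows and columns
df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})

# Selecting a single column from a DataFrame returns a Series
ages = df["age"]
```
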
Answer: Here’s how to read a CSV file using Python Pandas:
import pandas as pd
df = pd.read_csv("data.csv")
You can also customize the delimiter, header row, and column names using parameters such as sep, header, and names.
Answer: The type() function returns the data type of a variable:
type(42) # int
type("abc") # str
Answer: These statements are used for decision-making. Example:
if x > 0:
    print("Positive")
elif x < 0:
    print("Negative")
else:
    print("Zero")
Answer: Use isnull() to identify missing values and dropna() or fillna() to handle them.
df.dropna()
df.fillna(0)
Answer: List comprehension offers a concise way to create lists. For example:
squares = [x**2 for x in range(5)]
Answer: We can filter rows by using Boolean indexing:
df[df['age'] > 30]
Answer: == compares values while ‘is’ compares object identity.
x == y # value
x is y # same object in memory
Answer: len() returns the number of elements in an object.
len([1, 2, 3]) # 3
Answer: We can sort data in Pandas using the sort_values() function:
df.sort_values(by='column_name')
Answer: A dictionary is a collection of key-value pairs. It’s useful for fast lookups and flexible data mapping. Here’s an example:
d = {"name": "Alice", "age": 30}
Answer: The append() method adds its argument to the list as a single element, while extend() iterates over its argument and adds each item individually.
lst = [1, 2, 3]
lst.append([4, 5]) # [1, 2, 3, [4, 5]]
lst = [1, 2, 3]
lst.extend([4, 5]) # [1, 2, 3, 4, 5]
Answer: We can convert a column to datetime by using the pd.to_datetime() function:
df['date'] = pd.to_datetime(df['date'])
Answer: The ‘in’ operator checks whether a value is present in a sequence or collection, such as a substring in a string or an element in a list.
"a" in "data" # True
Answer: In Python, ‘break’ exits the loop and ‘continue’ skips to the next iteration. Meanwhile, ‘pass’ is simply a placeholder that does nothing.
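A short sketch showing all three in one loop (the values are illustrative):

```python
results = []
for n in range(10):
    if n == 5:
        break        # exit the loop entirely once n reaches 5
    if n % 2 == 0:
        continue     # skip even numbers, move to the next iteration
    results.append(n)

def todo():
    pass  # placeholder: syntactically valid, does nothing

# results holds only the odd numbers below 5: [1, 3]
```
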
Answer: Python uses indentation to define code blocks. Incorrect indentation leads to an IndentationError.
Answer: loc[] is label-based and accesses rows/columns by their name, while iloc[] is integer-location-based and accesses rows/columns by position.
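A quick sketch of the difference (the index labels and column name are made up):

```python
import pandas as pd

df = pd.DataFrame({"score": [90, 85, 78]}, index=["a", "b", "c"])

by_label = df.loc["b", "score"]    # label-based: row "b", column "score"
by_position = df.iloc[1, 0]        # position-based: second row, first column
```

Both expressions reach the same cell here because row "b" is at position 1.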
Answer: A shallow copy creates a new object but inserts references to the same objects, while a deep copy creates an entirely independent copy of all nested elements. We use copy.deepcopy() for deep copies.
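The difference is easiest to see with nested lists (values are illustrative):

```python
import copy

original = [[1, 2], [3, 4]]

shallow = copy.copy(original)      # new outer list, shared inner lists
deep = copy.deepcopy(original)     # fully independent copy of everything

original[0].append(99)

# the shallow copy sees the mutation; the deep copy does not
```
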
Answer: The groupby() function splits the data into groups based on some criteria, applies a function (like mean, sum, etc.), and then combines the result. It’s useful for aggregation and transformation operations.
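A minimal split-apply-combine example (the column names and numbers are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "dept": ["sales", "sales", "hr"],
    "salary": [100, 200, 150],
})

# split by department, apply mean, combine into one Series
means = df.groupby("dept")["salary"].mean()
```
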
Answer: Here’s the difference between the three functions:
Answer: Broadcasting allows arithmetic operations between arrays of different shapes by automatically expanding the smaller array.
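A small sketch: adding a shape-(3,) row vector to a shape-(2, 3) matrix stretches the row across both matrix rows.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
row = np.array([10, 20, 30])          # shape (3,)

# the row is broadcast across each row of the matrix
result = matrix + row
```
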
Answer: Python uses reference counting and a garbage collector to manage memory. When an object’s reference count drops to zero, it is automatically garbage collected.
Answer: Use df.duplicated() to identify duplicates and df.drop_duplicates() to remove them. You can also restrict the check to specific columns with the subset parameter.
Answer: We can do it by using the apply() method:
df['col'] = df['col'].apply(lambda x: x * 2)
Answer: Here’s how each of these functions is used:
Answer: Vectorization allows you to perform operations on entire arrays without writing loops, making the code faster and more efficient.
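A small sketch contrasting a vectorized operation with its loop equivalent (the prices are made up):

```python
import numpy as np

prices = np.array([100.0, 200.0, 300.0])

# vectorized: one operation over the whole array, no Python loop
discounted = prices * 0.9

# loop equivalent: same result, but much slower on large arrays
looped = np.array([p * 0.9 for p in prices])
```
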
Answer: Use resample() to change the frequency of time-series data. For example:
df.resample('M').mean()
This resamples the data to monthly averages (the DataFrame must have a datetime index).
Answer: The any() function returns True if at least one element is True, whereas all() returns True only if all elements are True.
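A quick illustration (note that 0 counts as falsy):

```python
values = [0, 1, 2]

has_truthy = any(values)   # True: at least one element is truthy
all_truthy = all(values)   # False: 0 is falsy
```
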
Answer: We can change the data type of a column by using the astype() function:
df['col'] = df['col'].astype('float')
Answer: Pandas supports CSV, Excel, JSON, HTML, SQL, HDF5, Feather, and Parquet file formats.
Answer: A lambda function is an anonymous, one-liner function defined using the lambda keyword:
square = lambda x: x ** 2
Answer: The zip() function combines two iterables element-wise, while enumerate() returns an index-element pair, which is useful in loops.
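A short sketch of both (the names and ages are made up):

```python
names = ["Alice", "Bob"]
ages = [30, 25]

pairs = list(zip(names, ages))     # pairs up elements from both lists
indexed = list(enumerate(names))   # attaches an index to each element
```
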
Answer: In Python, exceptions are errors that occur during the execution of a program. Unlike syntax errors, exceptions are raised when a syntactically correct program encounters an issue during runtime. For example, dividing by zero, accessing a non-existent file, or referencing an undefined variable.
You can use the ‘try-except’ block for handling Python exceptions. You can also use ‘finally’ for cleaning up the code and ‘raise’ to throw custom exceptions.
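A minimal sketch of try/except/finally (the function name is illustrative):

```python
def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None          # handle the runtime error gracefully
    finally:
        pass                 # cleanup code would go here (e.g., closing resources)

result = safe_divide(10, 2)   # normal path
fallback = safe_divide(1, 0)  # exception path
```
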
Answer: In Python, *args allows a function to accept a variable number of positional arguments, whereas **kwargs allows a variable number of keyword arguments.
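A small sketch (the function name and keyword are made up): inside the function, args is a tuple and kwargs is a dict.

```python
def summarize(*args, **kwargs):
    # args collects positional arguments into a tuple,
    # kwargs collects keyword arguments into a dict
    return sum(args), kwargs.get("label", "none")

total, label = summarize(1, 2, 3, label="scores")
```
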
Answer: In Pandas, a column should ideally contain a single data type (e.g., all integers, all strings). However, mixed types can creep in due to messy data sources or incorrect parsing (e.g., some rows have numbers, others have strings or nulls). Pandas assigns the column an object dtype in such cases, which reduces performance and can break type-specific operations (like .mean() or .str.contains()).
To resolve this, convert the column explicitly, for example with pd.to_numeric(df['col'], errors='coerce') or astype(), after cleaning or removing invalid values.
Handling mixed types ensures your code runs without unexpected type errors and performs optimally during analysis.
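A minimal sketch of coercing a mixed-type column to numeric (the column name and values are made up):

```python
import pandas as pd

# a column parsed with mixed types ends up as object dtype
df = pd.DataFrame({"col": ["1", "2", "bad", None]})

# coerce invalid entries to NaN, yielding a clean numeric column
df["col"] = pd.to_numeric(df["col"], errors="coerce")
```
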
Q40. Explain the difference between value_counts() and groupby().count() in Pandas. When should you use each?
Answer: Both value_counts() and groupby().count() help in summarizing data, but they serve different use cases:
Use value_counts() when you’re analyzing a single column’s frequency.
Use groupby().count() when you’re summarizing multiple fields across groups.
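A side-by-side sketch (the column names and values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["NY", "NY", "LA"],
    "sales": [1, 2, 3],
})

freq = df["city"].value_counts()          # frequency of values in one column
per_group = df.groupby("city").count()    # non-null counts per column, per group
```
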
Answer: Decorators allow you to wrap a function with another function to extend its behavior. Common use cases include logging, caching, and access control.
def log_decorator(func):
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@log_decorator
def say_hello():
    print("Hello!")
Answer: Generators use yield instead of return. They return an iterator and generate values lazily, saving memory.
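A minimal generator sketch (the function name is illustrative):

```python
def squares(n):
    # yields one value at a time instead of building a full list in memory
    for i in range(n):
        yield i ** 2

gen = squares(4)
first = next(gen)    # values are produced lazily, on demand
rest = list(gen)     # consuming the rest of the iterator
```
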
Answer: I use cProfile, timeit, and line_profiler to profile my code. I optimize it by reducing complexity, using vectorized operations, and caching results.
Answer: They manage resources like file streams. Example:
with open('file.txt') as f:
    data = f.read()
It ensures the file is closed after usage, even if an error occurs.
Answer: The two main ways of handling missing data are the dropna() and fillna() functions. dropna() is used when data is missing randomly and dropping rows doesn’t affect overall trends, while fillna() is useful for replacing missing values with a constant or interpolating based on adjacent values.
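A short sketch of both strategies on a Series with one missing value (the numbers are made up):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

constant = s.fillna(0)    # replace the gap with a constant
interp = s.interpolate()  # fill based on adjacent values
```
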
Answer: Python uses reference counting and a cyclic garbage collector to manage memory. Objects with zero references are collected.
Answer: Multithreading is useful for I/O-bound tasks and is affected by the GIL. Multiprocessing is best for CPU-bound tasks and runs on separate cores.
Answer: Broadcasting allows NumPy to operate efficiently on arrays of different shapes without copying data, reducing memory use and speeding up computation.
Answer: Best Python coding practices include following PEP 8 style guidelines, using meaningful variable names, writing modular and well-documented code, handling exceptions explicitly, and using virtual environments and version control.
Answer: I use chunksize in read_csv(), Dask for parallel processing, or load subsets of data iteratively.
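A minimal chunked-reading sketch; here a small file named data.csv is created just to stand in for a genuinely large one:

```python
import pandas as pd

# create a small stand-in file (illustration only)
pd.DataFrame({"x": range(10)}).to_csv("data.csv", index=False)

total = 0
# process 4 rows at a time instead of loading the whole file into memory
for chunk in pd.read_csv("data.csv", chunksize=4):
    total += chunk["x"].sum()
```
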
Answer: I deal with imbalanced datasets by using oversampling (e.g., SMOTE), undersampling, and algorithms that accept class weights.
Answer: .loc[] is label-based, while .iloc[] is position-based. .ix[] was deprecated and later removed from Pandas, so it should not be used.
Answer: Some of the most common pitfalls I’ve come across are:
Answer: I use pickle for Python objects and json for interoperability.
import pickle
with open('file.pkl', 'wb') as f:
    pickle.dump(obj, f)
with open('file.pkl', 'rb') as f:
    obj = pickle.load(f)
Answer: I use LabelEncoder, OneHotEncoder, or pd.get_dummies() depending on algorithm compatibility.
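As a sketch, one-hot encoding with pd.get_dummies() (the column name and categories are made up):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red"]})

# one binary column per category
dummies = pd.get_dummies(df["color"], prefix="color")
```
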
Answer: map() applies a function or mapping, whereas replace() substitutes values.
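A small side-by-side sketch (the labels are made up); note that map() turns unmatched values into NaN, while replace() leaves them untouched:

```python
import pandas as pd

s = pd.Series(["low", "high", "low"])

mapped = s.map({"low": 0, "high": 1})   # applies the mapping to every element
replaced = s.replace({"low": "L"})      # substitutes only the listed values
```
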
Answer: To design an ETL pipeline in Python, I typically follow three key steps: extracting data from sources (files, APIs, databases), transforming it (cleaning, joining, aggregating, typically with Pandas), and loading it into a target store such as a database or data warehouse.
For automation and monitoring, I prefer using Airflow or simple scripts with logging and exception handling to ensure the pipeline is robust and scalable.
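The steps above can be sketched as three small functions; the file names, column, and cleanup logic here are all hypothetical stand-ins:

```python
import pandas as pd

def extract(path):
    # read raw data from a source file
    return pd.read_csv(path)

def transform(df):
    # example cleanup: drop missing rows, normalize column names
    return df.dropna().rename(columns=str.lower)

def load(df, path):
    # write the cleaned result to its destination
    df.to_csv(path, index=False)

# demo with a tiny file standing in for a real source
pd.DataFrame({"X": [1, None, 3]}).to_csv("raw.csv", index=False)
result = transform(extract("raw.csv"))
load(result, "clean.csv")
```

In practice each stage would carry logging and exception handling so failures surface in monitoring.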
Answer: I use the logging module:
import logging
logging.basicConfig(level=logging.INFO)
logging.info("Script started")
Answer: Comparing the two, NumPy is faster and more efficient for pure numerical data. Pandas is more flexible and readable for labeled tabular data.
Answer: I define custom exception classes to raise errors with domain-specific meaning.
class CustomError(Exception):
    pass
Mastering Python is essential for any aspiring or practicing data analyst. With its wide-ranging capabilities from data wrangling and visualization to statistical modeling and automation, Python continues to be a foundational tool in the data analytics domain. Interviewers are not just testing your coding proficiency, but also your ability to apply Python concepts to real-world data problems.
These 60 questions can help you build a strong foundation in Python programming and confidently navigate technical data analyst interviews. While practicing these questions, focus not just on writing correct code but also on explaining your thought process clearly. Employers often value clarity, problem-solving strategy, and your ability to communicate insights as much as technical accuracy. So make sure you answer the questions with clarity and confidence.
Good luck – and happy coding!