The dataset credit.csv contains information about credit card debt for several hundred customers. The response is balance (average credit card debt for each individual), and there are several quantitative predictors: age, cards (number of credit cards), education (years of education), income (in thousands of dollars), limit (credit limit), and rating (credit rating). You need to carry out exploratory data analysis and build a linear regression model, along with detailed model diagnostics.
see attached file
Answer ( 1 )
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import seaborn as sns
# Load the data
data = pd.read_csv('credit.csv')
# Display the first few rows of the dataset
print(data.head())
# Statistical summary of the dataset
print(data.describe())
# Check for missing values
print(data.isnull().sum())
# Pairwise scatter plots (with histograms on the diagonal) for the numeric columns
sns.pairplot(data)
plt.show()
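# Define predictors and response (column names are assumed to be capitalized in the CSV;
# Income is included because the question lists it among the predictors)
X = data[['Income', 'Limit', 'Rating', 'Cards', 'Age', 'Education']]
y = data['Balance']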
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize a Linear Regression model
model = LinearRegression()
# Fit the model on training data
model.fit(X_train, y_train)
# Predict on test set
y_pred = model.predict(X_test)
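# Calculate model evaluation metrics on the held-out test set
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
# RMSE is in the same units as Balance, which makes it easier to interpret
print("Root Mean Squared Error:", mse ** 0.5)
# R^2 on the test set (LinearRegression.score returns the coefficient of determination)
print("Test R^2:", model.score(X_test, y_test))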
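The question also asks for model diagnostics, which the script above stops short of. Below is a rough sketch of some common checks using only NumPy: residuals, R² and adjusted R², leverage, studentized residuals, and variance inflation factors (VIF). The helper name `ols_diagnostics` and the synthetic data are my own assumptions for illustration and are not taken from credit.csv; the synthetic "limit" and "rating" columns are deliberately built to be nearly collinear so the VIF check has something to flag.

```python
import numpy as np

def ols_diagnostics(X, y):
    """Fit OLS with an intercept and return basic diagnostic quantities.

    Hypothetical helper for illustration; X is (n, p), y is (n,).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Design matrix with a leading column of ones for the intercept
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    resid = y - fitted

    # Goodness of fit
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

    # Leverage = diagonal of the hat matrix, obtained from a reduced QR factorization
    Q, _ = np.linalg.qr(A)
    leverage = (Q ** 2).sum(axis=1)

    # Internally studentized residuals
    sigma2 = ss_res / (n - p - 1)
    studentized = resid / np.sqrt(sigma2 * (1.0 - leverage))

    # VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing column j
    # on the remaining predictors; equivalently SS_tot_j / SS_res_j
    vif = np.empty(p)
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        r = X[:, j] - others @ coef
        vif[j] = ((X[:, j] - X[:, j].mean()) ** 2).sum() / (r @ r)

    return {
        "coef": beta, "fitted": fitted, "residuals": resid,
        "r2": r2, "adj_r2": adj_r2, "leverage": leverage,
        "studentized": studentized, "vif": vif,
    }

# Synthetic stand-in data (assumption: NOT the real credit.csv)
rng = np.random.default_rng(0)
n_obs = 200
limit = rng.normal(5000, 2000, n_obs)
rating = 0.07 * limit + rng.normal(0, 10, n_obs)   # nearly collinear with limit
age = rng.normal(50, 15, n_obs)
balance = 0.2 * limit - 2.0 * age + rng.normal(0, 100, n_obs)

diag = ols_diagnostics(np.column_stack([limit, rating, age]), balance)
print("R^2:", diag["r2"], "adjusted R^2:", diag["adj_r2"])
print("VIF (limit, rating, age):", diag["vif"])
print("Observations with |studentized residual| > 3:",
      int((np.abs(diag["studentized"]) > 3).sum()))
```

To use this on the real data, you could pass `X_train.to_numpy()` and `y_train.to_numpy()` instead of the synthetic arrays. Plotting `diag["residuals"]` against `diag["fitted"]` with `plt.scatter` is a quick way to look for non-linearity or non-constant variance, and a VIF well above 10 for a predictor is a common rule-of-thumb sign of multicollinearity.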