Data Analysis

로그앤 2023. 8. 15. 17:10

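1. REGRESSION: a method for finding the relationship and trend between a continuous independent variable and a dependent variable --> Least Squares (find the model that minimizes the sum of squared errors)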

X: [1, 2, 3, 4, 5]
Y: [2, 4, 5, 4, 5]

1. FIND DEVIATION OF X and Y
   deviation of x = X - x̄, deviation of y = Y - ȳ (here x̄ = 3, ȳ = 4)
   deviation of x: [-2, -1, 0, 1, 2]
   deviation of y: [-2, 0, 1, 0, 1]
2. CALCULATE PRODUCT OF DEV
   deviation of x * deviation of y
   product of deviations: [4, 0, 0, 0, 2]
3. CALCULATE SQUARED DEV OF X
   deviation of x * deviation of x
   squared deviation of x: [4, 1, 0, 1, 4]
4. CALCULATE SLOPE (β₁)
   β₁ = Σ(product of deviations) / Σ(squared deviation of x)
   Sum of product of deviations = 4 + 0 + 0 + 0 + 2 = 6
   Sum of squared deviation of x = 4 + 1 + 0 + 1 + 4 = 10
   β₁ = 6 / 10 = 0.6
5. Calculate Intercept β₀
   β₀ = ȳ - β₁ * x̄
   β₀ = 4 - 0.6 * 3 = 2.2
6. Write Regression Equation
   Y = β₀ + β₁ * X
   Y = 2.2 + 0.6 * X
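
The six steps above can be sketched in plain Python. This is a minimal illustration, not a library implementation; the function name least_squares_fit is made up for this example.

```python
def least_squares_fit(xs, ys):
    """Fit Y = b0 + b1 * X using the deviation-based steps above."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Step 1: deviations from the means
    dev_x = [x - x_mean for x in xs]
    dev_y = [y - y_mean for y in ys]

    # Steps 2 and 3: products of deviations and squared deviations of x
    sum_products = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_sq_x = sum(dx * dx for dx in dev_x)

    # Steps 4 and 5: slope and intercept
    slope = sum_products / sum_sq_x
    intercept = y_mean - slope * x_mean
    return slope, intercept


X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
slope, intercept = least_squares_fit(X, Y)
print(f"Y = {intercept:.1f} + {slope:.1f} * X")  # Y = 2.2 + 0.6 * X
```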

2. VARIANCE (one variable) & COVARIANCE (several variables): how far a random variable lies from its expected value (mean)

VARIANCE (σ²) = Σ (xᵢ - μ)² / (n - 1)

COV(X, Y) = Σ (xᵢ - X̄)(yᵢ - Ȳ) / (n - 1)
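
As a rough sketch, both formulas translate directly into plain Python. The helper names sample_variance and sample_covariance are made up for this example, and the data is the X and Y from the regression section. One thing the code makes visible: the regression slope β₁ equals COV(X, Y) divided by the variance of X, because the (n - 1) terms cancel.

```python
def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_covariance(xs, ys):
    """Sum of products of paired deviations, divided by (n - 1)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)


X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

var_x = sample_variance(X)        # 10 / 4
cov_xy = sample_covariance(X, Y)  # 6 / 4
slope = cov_xy / var_x            # same slope as the least-squares steps
print(var_x, cov_xy, slope)
```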

1. COVARIANCE MATRIX

Step 1: Start with the Data:

X: [2, 3, 5, 7, 10]
Y: [6, 9, 12, 15, 18]

Step 2: Calculate the Means (μx and μy):

μx = (2 + 3 + 5 + 7 + 10) / 5 = 5.4
μy = (6 + 9 + 12 + 15 + 18) / 5 = 12

Step 3: Calculate the Covariance:

Cov(X, Y)
= Σ((xi - μx) * (yi - μy)) / (n - 1)
= (20.4 + 7.2 + 0 + 4.8 + 27.6) / (5 - 1)
= 60 / 4
= 15

Step 4: Calculate the Variances:

Cov(X, X)
= Σ((xi - μx)^2) / (n - 1)
= ((-3.4)^2 + (-2.4)^2 + (-0.4)^2 + 1.6^2 + 4.6^2) / (5 - 1)
= 41.2 / 4
= 10.3

Cov(Y, Y)
= Σ((yi - μy)^2) / (n - 1)
= ((-6)^2 + (-3)^2 + 0^2 + 3^2 + 6^2) / (5 - 1)
= 90 / 4
= 22.5

Step 5: Assemble the Covariance Matrix:

[ Cov(X, X)  Cov(X, Y) ]   [ 10.3  15   ]
[ Cov(Y, X)  Cov(Y, Y) ] = [ 15    22.5 ]
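
One way to cross-check the hand calculation is NumPy's np.cov, which divides by (n - 1) by default and returns the full matrix with the variances on the diagonal. This is only a sketch using the X and Y samples from Step 1; the variable name cov_matrix is chosen for this example.

```python
import numpy as np

X = np.array([2, 3, 5, 7, 10])
Y = np.array([6, 9, 12, 15, 18])

# Rows and columns are ordered (X, Y):
# [[Cov(X, X), Cov(X, Y)],
#  [Cov(Y, X), Cov(Y, Y)]]
cov_matrix = np.cov(X, Y)
print(cov_matrix)
```

The matrix is symmetric because Cov(X, Y) and Cov(Y, X) are the same sum of products.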

3. Correlation Coefficient: Normalized Covariance

ρ(X, Y) = Cov(X, Y) / (σX * σY)
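
A small sketch of the formula in plain Python, reusing the X and Y samples from the covariance matrix example. The helper sample_covariance is a made-up name, and the standard deviations are taken as the square roots of the (n - 1) variances, which is an assumption about which σ the note intends.

```python
import math


def sample_covariance(xs, ys):
    """Sum of products of paired deviations, divided by (n - 1)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)


X = [2, 3, 5, 7, 10]
Y = [6, 9, 12, 15, 18]

# Cov(X, X) is the variance of X, so its square root is the standard deviation
sigma_x = math.sqrt(sample_covariance(X, X))
sigma_y = math.sqrt(sample_covariance(Y, Y))

# Dividing by both standard deviations rescales the covariance into [-1, 1]
rho = sample_covariance(X, Y) / (sigma_x * sigma_y)
print(round(rho, 4))
```

Because X and Y rise together almost linearly, rho comes out close to 1.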

4. Z-score (Normalization): Z = (X - μ) / σ
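
A minimal sketch of the Z-score formula using only the standard library. The function name z_scores is made up for this example, and using the population standard deviation (statistics.pstdev) for σ is an assumption; swap in statistics.stdev for the (n - 1) version.

```python
import statistics


def z_scores(values):
    """Return (x - mean) / std for every x in values."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # assumption: population standard deviation
    return [(v - mu) / sigma for v in values]


X = [2, 3, 5, 7, 10]
z = z_scores(X)
print([round(v, 3) for v in z])
```

After the transformation the values have mean 0 and standard deviation 1, so variables measured on different scales become comparable.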

1. Return the number of sales per month

sales['Month'].value_counts()

france_states = sales.loc[sales['Country'] == 'France', 'State'].value_counts()

i. FILTER Country == France

ii. SELECT THE State COLUMN

iii. COUNT ROWS PER STATE
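
To try the two lookups above without the real data set, here is a sketch on a tiny hypothetical DataFrame. Only the column names (Month, Country, State) come from the snippets above; every row is invented for illustration.

```python
import pandas as pd

# Hypothetical rows, made up for this example
sales = pd.DataFrame({
    "Month": ["January", "January", "February", "March", "March", "March"],
    "Country": ["France", "France", "Germany", "France", "Canada", "France"],
    "State": ["Paris", "Nord", "Hessen", "Paris", "Ontario", "Nord"],
})

# Number of rows (sales) per month, most frequent first
sales_per_month = sales["Month"].value_counts()

# i. keep rows where Country == 'France'
# ii. select the State column
# iii. count how many rows fall in each state
france_states = sales.loc[sales["Country"] == "France", "State"].value_counts()

print(sales_per_month)
print(france_states)
```

Passing both a row mask and a column label to .loc does the filtering and the column selection in one step, so value_counts only ever sees the French rows.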

sales.loc[(sales['Customer_Gender'] == 'M') & (sales['Revenue'] == 500)].shape[0]

i. FILTER Customer_Gender == M && Revenue == 500

ii. C