What statistical test should I use in this situation?

I’m exploring the relationship between the amount a sales rep smiles and the outcome of an online sales meeting. The smiling data is expressed as a percentile based on our database, while the meeting outcome is categorized as either a “Win” or “Loss.”

To gain initial insights, I’ll create bar plots. However, I’m unsure which statistical test would best determine if there’s a non-random association between smiling and meeting outcomes. I could bin the smiling data and perform a Chi-square test, but this might reduce the data’s granularity. Logistic regression or point-biserial correlation could work, though I’m uncertain if the relationship is linear, as both excessive and minimal smiling might negatively impact outcomes.

Additionally, to assess whether being in the top 5% of smiles enhances meeting success, could I conduct a one-tailed t-test comparing the win rates between the top 5% and the rest?

Hey, in this case I would suggest logistic regression. It is a good choice because it models a binary outcome (Win/Loss) as a function of a continuous predictor (smiling percentile). Note that standard logistic regression does assume a linear relationship between the predictor and the log-odds of winning. To accommodate the possibility that both too much and too little smiling hurt outcomes, add a squared smiling term (or a spline), which lets the fitted win probability rise and then fall.
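
A minimal sketch of logistic regression with a squared term, written in pure Python (Newton-Raphson) so it runs without extra packages; in practice you would more likely use a library such as statsmodels (`Logit`). The data below is simulated with an inverted U built in and is hypothetical, as are the helper names. A clearly negative coefficient on the squared term, with a large Wald z, is the signal that mid-range smiling wins most often.

```python
import math
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def score_and_information(X, y, beta):
    """Gradient of the log-likelihood and the Fisher information matrix at beta."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for row, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, row))
        mu = 1.0 / (1.0 + math.exp(-eta))  # predicted P(Win)
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * row[j]
            for k in range(p):
                info[j][k] += w * row[j] * row[k]
    return grad, info


def fit_logistic(X, y, iters=25):
    """Newton-Raphson fit; returns (coefficients, standard errors)."""
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        grad, info = score_and_information(X, y, beta)
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    _, info = score_and_information(X, y, beta)
    p = len(beta)
    # Standard errors: square roots of the diagonal of the inverse information matrix.
    se = [math.sqrt(solve(info, [1.0 if i == j else 0.0 for i in range(p)])[j]) for j in range(p)]
    return beta, se


# Hypothetical simulated data with an inverted U (not real meeting data).
random.seed(0)
pct, won = [], []
for _ in range(5000):
    s = random.uniform(0, 100)
    z = (s - 50) / 50  # centre and scale the percentile to [-1, 1]
    p_win = 1 / (1 + math.exp(-(0.5 - 2.0 * z * z)))
    pct.append(s)
    won.append(1 if random.random() < p_win else 0)

# Design matrix: intercept, linear term, squared term.
X = [[1.0, (s - 50) / 50, ((s - 50) / 50) ** 2] for s in pct]
beta, se = fit_logistic(X, won)
print("coefficients (intercept, linear, squared):", [round(b, 3) for b in beta])
print("Wald z for the squared term:", round(beta[2] / se[2], 2))
```

Centring the percentile before squaring keeps the linear and squared terms from being highly correlated, which makes the coefficients easier to read.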

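Hello, I would recommend the Chi-square test. If you choose to bin the data into categories (e.g., 0-10%, 10-20%, etc.), a Chi-square test can determine whether there is a significant association between the binned smiling data and the binary outcome. Keep in mind that binning discards some information, and each bin needs enough meetings for the expected counts to be reasonably large (a common rule of thumb is at least 5 per cell).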
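
A minimal sketch of the binned Chi-square test of independence in pure Python, assuming equal-width bins of 10 percentile points; `scipy.stats.chi2_contingency` would be the usual library route. The p-value uses the exact series for the chi-square upper tail at integer degrees of freedom. The data is simulated and hypothetical, and `chi_square_binned` is an illustrative name.

```python
import math
import random


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from Q(1, y) for even df or Q(1/2, y) for odd df, then apply
    # the recurrence Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1).
    if df % 2 == 0:
        q, s = math.exp(-y), 1.0
    else:
        q, s = math.erfc(math.sqrt(y)), 0.5
    while s < df / 2.0:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


def chi_square_binned(percentiles, outcomes, n_bins=10):
    """Chi-square test of independence between binned percentile and Win (1) / Loss (0).

    Assumes every bin holds at least one meeting (otherwise expected counts are zero)."""
    width = 100 / n_bins
    table = [[0, 0] for _ in range(n_bins)]  # rows = bins, columns = [Loss, Win]
    for pct, w in zip(percentiles, outcomes):
        table[min(int(pct // width), n_bins - 1)][w] += 1
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(2)]
    stat = 0.0
    for i in range(n_bins):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    df = (n_bins - 1) * (2 - 1)
    return table, stat, df, chi2_sf(stat, df)


# Hypothetical simulated data (not real meeting data).
random.seed(1)
pct, won = [], []
for _ in range(1000):
    s = random.uniform(0, 100)
    z = (s - 50) / 50
    pct.append(s)
    won.append(1 if random.random() < 1 / (1 + math.exp(-(0.5 - 2.0 * z * z))) else 0)

table, stat, df, p = chi_square_binned(pct, won)
print(f"chi-square = {stat:.1f}, df = {df}, p = {p:.2g}")
```

Note that this test only says the win rate differs somewhere across the bins; it does not say in which direction, so it pairs well with the bar plots.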

Hey, point-biserial correlation is another option, since it measures the strength and direction of the association between a continuous variable and a binary variable. However, it only captures a linear trend, so an inverted-U pattern (low win rates at both extremes of smiling) can produce a correlation near zero even when the association is strong.
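
A minimal sketch showing that the point-biserial correlation is simply Pearson's r with the outcome coded 0/1 (`scipy.stats.pointbiserialr` also returns a p-value). The tiny dataset is hypothetical and constructed to show the limitation above: the win rate clearly peaks in the middle, yet r comes out as zero.

```python
import math


def point_biserial(x, y):
    """Point-biserial correlation: Pearson's r between continuous x and a 0/1 outcome y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: win rate is 0/2, 1/2, 2/2, 1/2, 0/2 across five smiling levels,
# a clear inverted U, yet the linear correlation is zero.
smile = [10, 30, 50, 70, 90, 10, 30, 50, 70, 90]
win = [0, 1, 1, 1, 0, 0, 0, 1, 0, 0]
print("point-biserial r:", round(point_biserial(smile, win), 6))
```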

Hey, for your first question, I suggest flipping the order of what you’re asking. A single test of association does not establish direction: it tells you whether the two variables are related, not which one drives the other. For example, rephrase your question as “Do won deals involve more smiling?” Framed this way, you can use a two-sample t-test with the null hypothesis that the mean smiling percentile is the same for won and lost deals. Smiling scores might not be normally distributed, but with reasonably large groups the t-test is fairly robust to that. A significant result would indicate more smiling in won deals; however, it cannot tell you whether smiling influences the sale or whether sales reps smile more when they sense a deal is going well.
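
A minimal sketch of this comparison, assuming Welch’s unequal-variance form of the two-sample t-test (my choice here, since it does not require equal variances in the two groups). It is pure Python: the p-value comes from integrating the t density numerically with Simpson’s rule. In practice `scipy.stats.ttest_ind(won, lost, equal_var=False)` does the same job. The `won`/`lost` lists are hypothetical smiling percentiles.

```python
import math


def t_sf(t, df):
    """P(T > t) for Student's t with df degrees of freedom (t >= 0), via Simpson's rule."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    steps = 2 * max(1000, int(50 * t))  # even number of panels, each at most 0.01 wide
    h = t / steps
    area = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return min(0.5, max(0.0, 0.5 - area * h / 3))


def welch_t_test(a, b):
    """Two-sided Welch two-sample t-test (unequal variances); returns (t, df, p)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df, 2 * t_sf(abs(t), df)


# Hypothetical smiling percentiles for won and lost meetings.
won = [72, 65, 80, 58, 90, 77, 69, 84]
lost = [40, 55, 62, 35, 48, 70, 52, 45]
t, df, p = welch_t_test(won, lost)
print(f"t = {t:.2f}, df = {df:.1f}, two-sided p = {p:.4f}")
```

If you fixed the direction in advance (“won deals smile more”), the one-sided p-value is `t_sf(t, df)` for a positive `t`.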

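For your second question, you’ve split smiling into two groups: the top 5% and the remaining 95%. Since your outcome is also categorical (won or lost), this is a 2x2 table, so a Fisher’s exact test or a Chi-squared test (if you have a large sample size) would be appropriate. Fisher’s exact test can be run one-sided if your hypothesis is specifically that the top 5% win more often.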
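
A minimal sketch of the one-sided Fisher’s exact test using only the standard library’s `math.comb`; `scipy.stats.fisher_exact(table, alternative='greater')` is the library equivalent. The function name and the counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table

                    Win   Loss
        top 5%       a     b
        other 95%    c     d

    Returns P(at least `a` wins in the top-5% row) under the null of no
    association, using the hypergeometric distribution with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)


# Hypothetical counts (not real data): 30 of 50 top-5% meetings won vs 400 of 950 others.
p = fisher_exact_greater(30, 20, 400, 550)
print(f"one-sided p = {p:.4f}")
```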

Hello, here are a few steps you could take to address your issue:

  1. Perform Exploratory Data Analysis (EDA): Start by creating plots to visualize the relationship between the explanatory variable and the outcome. This will give you a clearer understanding of how they are connected.
  2. Address Non-Linearity: If the relationship is not linear, you can still apply regression by transforming the features, for example by adding a squared smiling term or using splines. This lets a model such as logistic regression capture curved relationships, like an inverted U.
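
For step 1, a minimal sketch of the numbers behind a bar plot of win rate by smiling decile, printed as text bars so it needs no plotting library. The function name, the 10-bin choice and the data are hypothetical. A hump in the middle bins would be the visual cue that step 2 (for example, a squared term) is worth adding.

```python
def win_rate_by_bin(percentiles, outcomes, n_bins=10):
    """Win rate per equal-width percentile bin: list of (lower, upper, n, win_rate)."""
    width = 100 / n_bins
    counts = [0] * n_bins
    wins = [0] * n_bins
    for pct, won in zip(percentiles, outcomes):
        i = min(int(pct // width), n_bins - 1)  # percentile 100 goes in the top bin
        counts[i] += 1
        wins[i] += won
    return [
        (i * width, (i + 1) * width, counts[i], wins[i] / counts[i] if counts[i] else float("nan"))
        for i in range(n_bins)
    ]


# Hypothetical data; empty bins show as nan.
pct = [5, 8, 45, 52, 55, 95, 99]
won = [0, 1, 1, 1, 0, 0, 0]
for lo, hi, n, rate in win_rate_by_bin(pct, won):
    bar = "#" * round(rate * 20) if n else ""
    print(f"{lo:3.0f}-{hi:<3.0f} n={n:2d} {rate:4.2f} {bar}")
```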