Story update 5/31/2012: Joel Smith and his wife Silvana are the proud parents of a new baby girl, Juliana Garcia Smith, born May 23, 2012.
My wife and I have been expecting a baby girl soon—very soon, in fact, as in “Will this be published before the baby is born?” soon. The due date given was May 19, but we stat geeks know that a point estimate just isn’t good enough. We want probability intervals that reflect the uncertainty in the data.
I found a chart that lets me know the number of babies born to “spontaneous labor” by each week of pregnancy, but I’m interested in more precision than just the week. I converted the data to days instead of weeks (for example, week 40 starts on day 280 and runs through day 286), and here it is:
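The week-to-day conversion described above can be sketched in a few lines of Python. This is only an illustration: the function name `week_to_day_range` is my own, and it assumes every week label covers seven whole days starting at day 7 × week, as in the week 40 example.

```python
from typing import Tuple


def week_to_day_range(week: int) -> Tuple[int, int]:
    """Return the (first_day, last_day) covered by a week-of-pregnancy label.

    Assumption: week w starts on day 7 * w and runs for seven days,
    which matches the example of week 40 covering days 280 through 286.
    """
    first_day = 7 * week
    last_day = first_day + 6
    return first_day, last_day


# Print the day ranges for a few weeks around the due date.
for week in (38, 39, 40, 41):
    print(week, week_to_day_range(week))
```

Each pair gives the start and end of the range in which a birth was recorded, which is the form an interval-based analysis needs.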
One of the more overlooked areas of statistics is reliability, a field that was originally intended to estimate when parts would fail but has much broader applications than that. If you ever have ranges of values in which your data fall, rather than exact values, reliability is the tool for you.
Data such as these, where we have a count of occurrences within a range of values (common when things are reported in whole days and weeks) but know the real distribution is on a continuous scale, are known as “arbitrarily censored data.” In this case I’d like to know the odds of the baby being born by certain days, which requires me to first find the distribution of the data. To do this I go to “Stat > Reliability/Survival > Distribution Analysis (Arbitrary Censoring) > Distribution ID Plot” in Minitab software and complete the dialog like this:
Based on the excellent fit on the probability plot and high correlation coefficient, I’m going to use the three-parameter Weibull distribution for my analysis:
So from our original categorized data, we now have a continuous distribution to work with.
To learn some more about what this distribution means for when to expect a baby, I go to “Stat > Reliability/Survival > Distribution Analysis (Arbitrary Censoring) > Parametric Distribution Analysis” and complete the dialog like this:
I also click on “Graphs” and choose to show a “cumulative failure plot.”
First I look at the “Characteristics of Distribution” table from the session window:
So what can I learn from this table?
• The mean (aka mean time to failure or MTTF)—remember these terms were created for part failures and not childbirth—tells me that on average, babies are born almost exactly at 280 days or 40 weeks.
• The median is about 280.5 days, so about 50 percent of children are born by half a day past 40 weeks.
• The first quartile and third quartile tell me by which day 25 percent and 75 percent of babies are born, respectively, so the “middle 50 percent” of babies are born between days 274.5 and 286. For my wife and me, day 280 is May 19, which means we have a 50-percent chance of the baby being born between Sunday the 13th and Friday the 25th.
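To turn days of pregnancy into calendar dates, here is a minimal Python sketch. It assumes, as stated above, that day 280 falls on May 19, 2012. The helper name `day_to_date` is hypothetical, and treating a fractional day as the calendar date it falls within is my own choice.

```python
import math
from datetime import date, timedelta

DUE_DATE = date(2012, 5, 19)  # day 280, per the due date in the post
DUE_DAY = 280


def day_to_date(day: float) -> date:
    """Map a day of pregnancy to the calendar date it falls on.

    Assumption: fractional days (such as 274.5) are assigned to the
    calendar date that contains them, so 274.5 belongs to day 274.
    """
    offset = math.floor(day) - DUE_DAY
    return DUE_DATE + timedelta(days=offset)


# First and third quartiles quoted in the bullet list above.
for quartile_day in (274.5, 286):
    d = day_to_date(quartile_day)
    print(quartile_day, d.isoformat(), d.strftime("%A"))
```

Run with the two quartiles, this reproduces the Sunday-to-Friday window quoted above.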
Minitab software gives a “table of percentiles” in the session window, but I prefer to use the cumulative failure plot. Again, I want to remind readers that the term “failure” is associated with a part or product failing, not a child being born.
The graph above plots the day of pregnancy on the x-axis and the percentage of babies born by that day on the y-axis. So for a given day, such as day 285, we can find the corresponding point on the line and read that a little more than 70 percent of babies are born by that day (our first child was five days late). We could also look at the cumulative probabilities of days 280 and 281 (47.47% and 52.14%, respectively) and subtract the first from the second to find that the odds of the baby being born at some point on the due date are about 4.67 percent.
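The single-day calculation is just a difference of two cumulative probabilities, and a short Python sketch makes that explicit. The three-parameter Weibull CDF below uses the standard shape, scale, and threshold form. The fitted parameter values are not reproduced in this text, so the numbers passed to it are placeholders of my own, not the values from the analysis. Only the 47.47% and 52.14% figures come from the plot.

```python
import math


def weibull3_cdf(t: float, shape: float, scale: float, threshold: float) -> float:
    """Cumulative probability for a three-parameter Weibull distribution."""
    if t <= threshold:
        return 0.0
    return 1.0 - math.exp(-(((t - threshold) / scale) ** shape))


def prob_on_day(day: int, cdf) -> float:
    """Probability that the event falls within one day: F(day + 1) - F(day)."""
    return cdf(day + 1) - cdf(day)


# Using the two cumulative probabilities read from the plot in the post.
p_due_date = 0.5214 - 0.4747
print(f"Born on the due date: {100 * p_due_date:.2f}%")

# Same calculation from a CDF; these parameters are PLACEHOLDERS, not the fit.
example_cdf = lambda t: weibull3_cdf(t, shape=3.0, scale=20.0, threshold=262.0)
print(f"Placeholder-model probability for day 280: {prob_on_day(280, example_cdf):.4f}")
```

With the real fitted parameters in place of the placeholders, `prob_on_day` would give the odds for any single day rather than only the two read off the plot.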
Now that we have used what was originally very categorized data to form a continuous distribution, we could answer many questions more precisely, such as:
• When should a relative arrive on a seven-day stay to have the greatest chance of being there for the birth? (May 17)
• What are the odds of the baby being born on a weekend? (27.7%)
• What are the odds of the baby being born on her great-grandmother’s birthday, which is May 14? (3.4%)
• What should we name her? (Actually parametric distribution analysis can’t answer that, but it’s still a pretty great tool.)
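Questions like the first two can be answered from any cumulative distribution function by differencing it over the days of interest. The Python sketch below shows one way to do that. The function names are hypothetical, the Weibull parameters are placeholders rather than the fitted values (which are not reproduced in this text), and its output will therefore not match the answers quoted above.

```python
import math
from datetime import date, timedelta

DUE_DATE = date(2012, 5, 19)  # day 280, per the post
DUE_DAY = 280


def placeholder_cdf(t: float) -> float:
    """Three-parameter Weibull CDF with PLACEHOLDER parameters (not the fit)."""
    shape, scale, threshold = 3.0, 20.0, 262.0
    if t <= threshold:
        return 0.0
    return 1.0 - math.exp(-(((t - threshold) / scale) ** shape))


def window_probability(cdf, arrival_day: int, stay: int = 7) -> float:
    """Chance the birth falls within a stay that begins on arrival_day."""
    return cdf(arrival_day + stay) - cdf(arrival_day)


def best_arrival_day(cdf, candidate_days, stay: int = 7) -> int:
    """Arrival day whose stay window captures the most probability."""
    return max(candidate_days, key=lambda d: window_probability(cdf, d, stay))


def weekend_probability(cdf, first_day: int, last_day: int) -> float:
    """Sum the single-day probabilities for Saturdays and Sundays."""
    total = 0.0
    for day in range(first_day, last_day + 1):
        calendar_date = DUE_DATE + timedelta(days=day - DUE_DAY)
        if calendar_date.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            total += cdf(day + 1) - cdf(day)
    return total


best = best_arrival_day(placeholder_cdf, range(260, 300))
print("Best arrival:", DUE_DATE + timedelta(days=best - DUE_DAY))
print("Weekend odds:", round(weekend_probability(placeholder_cdf, 260, 300), 3))
```

The birthday question is the same single-day difference, evaluated at the day of pregnancy that corresponds to May 14.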