An Empirical Distribution Function for Sampling with Incomplete Information

Abstract
For $i = 1, 2, \cdots, n$, let $N_i$ independent trials be made of an event with probability $p_i$, and suppose that the probabilities $p_i$ are known to satisfy the inequalities $p_1 \geqq p_2 \geqq \cdots \geqq p_n$. Let $a_i$ denote the number of successes in the $i$-th set of trials, and $p^\ast_i$ the ratio $a_i/N_i$ $(i = 1, 2, \cdots, n)$. Then the maximum likelihood estimates $\bar{p}_1, \cdots, \bar{p}_n$ of the numbers $p_1, \cdots, p_n$ may be found in the following way. If $p^\ast_1 \geqq p^\ast_2 \geqq \cdots \geqq p^\ast_n \geqq 0$, then $\bar{p}_i = p^\ast_i$, $i = 1, 2, \cdots, n$. If $p^\ast_k < p^\ast_{k+1}$ for some $k$ $(k = 1, 2, \cdots, n - 1)$, then $\bar{p}_k = \bar{p}_{k+1}$; the ratios $p^\ast_k = a_k/N_k$ and $p^\ast_{k+1} = a_{k+1}/N_{k+1}$ are then replaced in the sequence $p^\ast_1, p^\ast_2, \cdots, p^\ast_n$ by the single ratio $(a_k + a_{k+1}) / (N_k + N_{k+1})$, giving an ordered set of only $n - 1$ ratios. This procedure is repeated until the resulting ordered set of ratios is monotone non-increasing. Then for each $i$, $\bar{p}_i$ is equal to that one of the final set of ratios to which the original ratio $a_i/N_i$ contributed. This method of calculating $\bar{p}_1, \cdots, \bar{p}_n$ depends on a grouping of observations that might well appeal to an investigator on purely intuitive grounds; it is of interest that it also yields the maximum likelihood estimates of the desired probabilities. Particular examples of this situation are found in bio-assay [3] and in the proximity fuze problem discussed by M. Friedman ([1], Chapter 11). The last section is devoted to a consistency property of the maximum likelihood estimators.
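The pooling procedure described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper: the function name `pooled_estimates` and its list arguments are assumptions introduced here, every $N_i$ is assumed to be a positive integer, and adjacent groups are merged in a single left-to-right pass, which is one convenient order in which to carry out the repeated replacement.

```python
from fractions import Fraction


def pooled_estimates(successes, trials):
    """Return estimates of p_1 >= p_2 >= ... >= p_n by pooling adjacent groups.

    successes[i] plays the role of a_i and trials[i] the role of N_i.
    Hypothetical helper written for illustration only.
    """
    if len(successes) != len(trials):
        raise ValueError("successes and trials must have the same length")
    if any(n <= 0 for n in trials):
        raise ValueError("every N_i must be a positive integer")

    # Each block holds [pooled successes, pooled trials, number of original groups].
    blocks = []
    for a, n in zip(successes, trials):
        blocks.append([a, n, 1])
        # Pool the last two blocks while the earlier ratio is strictly smaller
        # than the later one, i.e. while the non-increasing order is violated.
        # Cross-multiplication compares a1/n1 < a2/n2 without division.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] < blocks[-1][0] * blocks[-2][1]):
            a2, n2, count2 = blocks.pop()
            blocks[-1][0] += a2
            blocks[-1][1] += n2
            blocks[-1][2] += count2

    # Every original group receives the ratio of the block it contributed to.
    estimates = []
    for a, n, count in blocks:
        estimates.extend([Fraction(a, n)] * count)
    return estimates
```

For example, with $a = (8, 3, 5, 1)$ and $N_i = 10$ throughout, the observed ratios $0.8, 0.3, 0.5, 0.1$ violate the ordering only at $k = 2$, so the second and third groups are pooled into $8/20$ and the call `pooled_estimates([8, 3, 5, 1], [10, 10, 10, 10])` yields $4/5, 2/5, 2/5, 1/10$. Exact fractions are used so that ties between ratios are not disturbed by floating-point rounding.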