UNITED STATES DISTRICT COURT FOR THE SOUTHERN DISTRICT OF NEW YORK
January 10, 1975
James C. Jones et al., Plaintiffs
v. New York City Human Resources Administration et al., Defendants. Dorothy Williams et al., Plaintiffs v. New York City Human Resources Administration et al., Defendants
The opinion of the court was delivered by: LASKER
The Human Resources Administration (HRA), a "super-agency" of the City of New York, was created in 1966 in order to coordinate and administer the varied city programs dealing with poverty and social services. Plaintiffs in these two consolidated actions challenge five civil service examinations for positions in the Human Resources Specialist (HRS) Series. They claim that the examinations had a discriminatory impact on Blacks and Hispanics and are not job-related. The named plaintiffs and the class they seek to represent are Black and Hispanic persons who took and failed one or more of the five examinations challenged here. They seek (1) a declaration of the unconstitutionality of the examinations; (2) an injunction against appointments from the lists based on the results of the examinations; (3) an injunction requiring the creation of constitutionally adequate selection procedures for the positions in question and (4) an injunction requiring the permanent appointment of those presently serving as provisional employees to the positions they now occupy. Suit is brought under 42 U.S.C. §§ 1981 and 1983. Jurisdiction is based on 28 U.S.C. § 1343(3) and (4), and the Fifth and Fourteenth Amendments.
The Jones plaintiffs challenge examinations Nos. 1631 and 2013, the promotional and open competitive examinations for the position of Supervising Human Resources Specialist (Sup. HRS). The Williams plaintiffs attack the constitutionality of the open competitive examination (No. 1097) for Human Resources Specialist (HRS) and both the promotional and open competitive examinations for Senior Human Resources Specialist (Sr. HRS) (Nos. 1626 and 1099). By earlier orders the city has been preliminarily enjoined from making appointments based on any of the examinations.
Trial of the issues in Jones has been completed. By stipulation, the parties have supplemented the record developed in Jones to enable the court to decide the merits of Williams.
Cases of this type, and these suits in particular, involve a prodigious amount of factual matter. Accordingly, we have so far as possible restricted the text of this opinion to substantive discussion, and made extensive use of footnotes for other material.
The present suits follow in the wake of several recent cases in this Circuit involving civil service examinations alleged to have a disparate impact on minority applicants. See, e.g., Vulcan Society v. Civil Service Commission, (hereafter "Vulcan"), 490 F.2d 387 (2d Cir. 1973), aff'g 360 F. Supp. 1265 (S.D.N.Y. 1973); Bridgeport Guardians, Inc. v. Bridgeport Civil Service Commission, ("Bridgeport Guardians"), 482 F.2d 1333 (2d Cir. 1973) aff'g in part and rev'g in part, 354 F. Supp. 778 (D. Conn. 1973); Chance v. Board of Examiners, ("Chance"), 458 F.2d 1167 (2d Cir. 1972) aff'g 330 F. Supp. 203 (S.D.N.Y. 1971); Kirkland v. N.Y. State Dep't of Correctional Services, ("Kirkland"), 374 F. Supp. 1361 (S.D.N.Y. 1974).
The ground rules established in those decisions require plaintiffs to make a prima facie showing that the examinations have a "racially disproportionate impact," Vulcan, 490 F.2d at 391, Chance, 458 F.2d at 1175-1176; see also Castro v. Beecher, ("Castro"), 459 F.2d 725, 732 (1st Cir. 1972). Upon such a showing the burden shifts to the defendants to establish that the challenged examinations are job-related, Vulcan, 490 F.2d at 391. If it is demonstrated that disparate examination performance results from the candidates' relative qualifications for the job, rather than their race, the examinations are constitutionally adequate, in spite of their racially disparate impact. Griggs v. Duke Power Co., 401 U.S. 424, 28 L. Ed. 2d 158, 91 S. Ct. 849 (1971), Chance, 330 F. Supp. at 214. The burden on defendants is "a heavy one," Chance, 458 F.2d at 1176, Guardians, 482 F.2d at 1337, but is discharged if they "come forward with convincing facts establishing a fit between the qualification and the job." Vulcan, 490 F.2d at 393 quoting with approval Castro, 459 F.2d at 732. The defendants are not required to prove that no alternative methods of selection were available to them; the critical question is whether the challenged procedure is constitutionally sound, not whether a better one could have been devised. Castro, 459 F.2d at 733, Vulcan, 490 F.2d at 393.
A. As in earlier suits, plaintiffs base their prima facie case on statistics provided by defendants as to the race of passing and failing candidates. However, as to three of the five examinations in question, the data is incomplete because HRA does not keep records of the race of candidates who were not HRA employees at the time they took an examination. Neither side suggested or undertook, and the court did not order, a survey to determine the race of those not identified in HRA's records.
Accordingly, as to 2013 the ethnicity of only 51% of the candidates is known; for 1097 and 1099 the figures are 54% and 60% respectively. The available statistics are set forth in the chart below:
Challenged Exam No. 1631 (Sup. HRS) (Prom.)
             Pass   Fail   Total   % Passing
Blacks         12     57      69      17%
Whites         28     24      52      54%
Hispanics       3     13      16      19%
Unknown         0      1       1
Total          43     95     138
Challenged Exam No. 2013 (Sup. HRS) (OC)
             Pass   Fail   Total   % Passing
Blacks         39    208     247      16%
Whites        125    108     233      54%
Hispanics       3     17      20      15%
Others          5      3       8      63%
Subtotal      172    336     508
Unknown       183    303     486      38%
Total         355    639     994
Challenged Exam No. 1626 (Sr. HRS) (Prom.)
             Passed   Failed   Total   % Passing
Blacks           11       51      62      18%
Whites           30        4      34      88%
Hispanic          3        5       8      37%
Other             0        2       2
Total            44       62     106
Challenged Exam No. 1099 (Sr. HRS) (OC)
             Pass   Fail   Total   % Passing
Blacks         56    165     221      26%
Whites        101     54     155      65%
Hispanic        8     22      30      27%
Subtotal      165    241     406
Unknown        90    187     277      32%
Total         255    428     683
Challenged Exam No. 1097 (HRS) (OC)
             Pass   Fail   Total   % Passing
Blacks         55    120     175      31%
Whites         59     56     115      51%
Hispanics       7     29      36      19%
Other           1      1       2
Subtotal      122    206     328
Unknown        78    200     278      28%
Total         200    406     606
Putting aside for the moment the question of the representativeness of the available data for Examinations No. 2013, 1099 and 1097, the existing figures for all five examinations clearly indicate a disparity between the passing rates of white and minority candidates in excess of the 1.5 to 1 ratio which Chance held sufficient to establish a prima facie case. 330 F. Supp. at 210.
As to No. 1631, for which complete data is available, whites passed at a rate of approximately three times that of Blacks and Hispanics (54% to 17% and 19% respectively). On Examination No. 1626, for which the data is also complete, whites passed at a rate of about five times that of Blacks and 2.4 times the rate of Hispanics (88% to 18% and 37%, respectively).
The available figures for No. 1099 indicate that whites passed at 2.5 times the rate of Blacks and Hispanics (65% to 26% and 27%, respectively). As to No. 2013, whites passed at over three times the rate of Blacks and Hispanics (54% to 16% and 15%, respectively). Whites passed No. 1097 at a rate of 1.7 times that of Blacks and 2.7 times that of Hispanics (51% to 31% and 19%, respectively). In sum, the figures for all five examinations indicate a disparate impact in favor of white candidates in excess of the 1.5 to 1 ratio that carried the day for plaintiffs in Chance.
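The ratio comparison above can be checked with a short calculation. What follows is a minimal sketch in Python and is not part of the record. The pass and fail counts are copied from the charts for candidates of known race; the names EXAMS, CHANCE_RATIO, pass_rate and white_to_group_ratios are illustrative assumptions. Because it works from unrounded counts, its ratios may differ slightly from figures derived from the rounded percentages.

```python
# (passed, failed) counts for candidates of known race, copied from the charts.
EXAMS = {
    "1631": {"Blacks": (12, 57), "Whites": (28, 24), "Hispanics": (3, 13)},
    "2013": {"Blacks": (39, 208), "Whites": (125, 108), "Hispanics": (3, 17)},
    "1626": {"Blacks": (11, 51), "Whites": (30, 4), "Hispanics": (3, 5)},
    "1099": {"Blacks": (56, 165), "Whites": (101, 54), "Hispanics": (8, 22)},
    "1097": {"Blacks": (55, 120), "Whites": (59, 56), "Hispanics": (7, 29)},
}

# The 1.5 to 1 ratio the opinion attributes to Chance.
CHANCE_RATIO = 1.5


def pass_rate(passed, failed):
    """Fraction of candidates in a group who passed."""
    return passed / (passed + failed)


def white_to_group_ratios(exam):
    """Ratio of the white pass rate to each other group's pass rate."""
    white = pass_rate(*exam["Whites"])
    return {
        group: white / pass_rate(*counts)
        for group, counts in exam.items()
        if group != "Whites"
    }


for number, exam in EXAMS.items():
    for group, ratio in white_to_group_ratios(exam).items():
        flag = "exceeds" if ratio > CHANCE_RATIO else "does not exceed"
        print(f"Exam {number}: white rate is {ratio:.1f}x that of {group} ({flag} 1.5)")
```

On these counts every white-to-minority ratio is above 1.5, which is consistent with the conclusion stated in the text.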
Not surprisingly, defendants' most vigorously pressed objection to plaintiffs' prima facie case is the incompleteness of the data for Nos. 2013, 1099 and 1097. They argue that in the absence of complete and reliable data as to the race and passing rate of all, or substantially all, candidates on these exams, plaintiffs have failed to establish a prima facie case.
Defendants make the related argument that, even assuming that the complete figures for Nos. 1631 and 1626 show substantial disparate impact as to those exams, the inconclusive nature of the statistics for the other three tests and plaintiffs' failure to challenge two additional exams in the HRS series whose results are inconclusive as to impact indicate that the class did not fare significantly worse than whites on the HRS series as a whole, which defendants claim is the proper standard. For the reasons stated below, we find that neither argument has merit and that plaintiffs have established a prima facie case as to all five exams.
B. Although neither side produced a statistical expert at trial, experts for each of the parties have submitted affidavits as to the significance of the statistics in the record.
Plaintiffs' expert, Richard S. Barrett, is a nationally recognized expert in the field of testing. His affidavit sets forth certain computations using the Chi-Square Test, a generally accepted means of analyzing statistics of the type used in lawsuits such as this one. See Chance, 458 F.2d at 1173, 330 F. Supp. at 212. The purpose of the Chi-Square Test, as described by Barrett, is to determine whether a differential pass rate for two or more groups arises from a real difference in the performance of the groups, or from random differences arising from chance variation in the sample. (Barrett affidavit, dated November 4, 1974, Paragraph 4.) In this case, the Chi-Square Test attempts to determine whether the lower passing rates for Blacks and Hispanics resulted from mere chance, or from a factor related to race.
Barrett's computations, which are based on the complete statistics for Nos. 1631 and 1626, and on the available statistics for Nos. 2013, 1099 and 1097, are set forth below:
Examination   Comparison           Chi-Square
1631          Black v. White          17.81
              Minority v. White       19.63
2013          Black v. White          76.40
              Minority v. White       80.42
1626          Black v. White          44.60
              Minority v. White       43.65
1099          Black v. White          59.40
              Minority v. White       62.49
1097          Black v. White          11.49
              Minority v. White       15.32
Barrett states that a Chi-Square of 6.64 will occur less than one time in 100 as the result of chance, and that conventional statistical tables do not include values as large as those shown in the chart "because their occurrence as chance events is too small to be taken seriously." (Barrett affidavit, Paragraph 6) Accordingly, as to Nos. 1631 and 1626, for which complete statistics are available, it is readily apparent that plaintiffs have established disproportionate impact not resulting from chance.
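For readers who wish to see how a figure of this kind is computed, the following is a minimal sketch in Python and is not part of the record. The affidavit as described here does not say which variant of the test Barrett used. The sketch assumes the ordinary Pearson Chi-Square for a two-by-two pass/fail table without a continuity correction, and assumes that "Minority" means Blacks and Hispanics combined. The function name pearson_chi_square is illustrative. Under those assumptions the counts in the charts for Nos. 1631 and 1626 yield values that agree with the reported figures to two decimal places.

```python
def pearson_chi_square(a_pass, a_fail, b_pass, b_fail):
    """Pearson Chi-Square for a 2x2 pass/fail table, no continuity correction.

    Uses the shortcut n * (ad - bc)^2 / (product of the four marginal totals),
    which is algebraically the same as summing (observed - expected)^2 / expected.
    """
    n = a_pass + a_fail + b_pass + b_fail
    row_a = a_pass + a_fail
    row_b = b_pass + b_fail
    col_pass = a_pass + b_pass
    col_fail = a_fail + b_fail
    cross = a_pass * b_fail - a_fail * b_pass
    return n * cross ** 2 / (row_a * row_b * col_pass * col_fail)


# Value Barrett cites as occurring by chance less than one time in 100.
CRITICAL_ONE_PERCENT = 6.64

comparisons = {
    "1631 Black v. White": pearson_chi_square(12, 57, 28, 24),
    # ASSUMPTION: "Minority" = Blacks + Hispanics from the same chart.
    "1631 Minority v. White": pearson_chi_square(12 + 3, 57 + 13, 28, 24),
    "1626 Black v. White": pearson_chi_square(11, 51, 30, 4),
    "1626 Minority v. White": pearson_chi_square(11 + 3, 51 + 5, 30, 4),
}

for label, value in comparisons.items():
    verdict = "above" if value > CRITICAL_ONE_PERCENT else "below"
    print(f"{label}: {value:.2f} ({verdict} {CRITICAL_ONE_PERCENT})")
```

The same function can be applied to the known-race counts for Nos. 2013, 1099 and 1097, subject to the limits on those data discussed in the text.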
The question that is unresolved by the Chi-Square analysis set forth above is whether the data for the entire group of candidates on Nos. 2013, 1097 and 1099 would show the same results as Barrett calculated on the basis of the known candidates on those exams. On this question Barrett states:
"Strictly speaking such a determination can be made only if there is reason to believe that those whose identity is not known are a random sample of the total group. There is, of course, no way to make this determination. However, the size of the Chi-Square statistics reported above [which were computed on the basis of the known group only] is so great that those whose race or ethnicity is unknown would have to differ in an unrealistically large degree from those whose identity is known to lead to the conclusion that the tests are free from adverse impact." (Barrett affidavit, Paragraph 7).
Although we recognize that in cases such as this, we may walk through statistical mine fields, Barrett's conclusions do accord with common sense. On No. 2013, for example, for which the ethnicity of 51% of candidates is known (the HRA population), whites passed at over 3 times the rate of Blacks and Hispanics. We find it distinctly improbable that minority group members in the non-HRA (unknown) group would outperform non-HRA whites on the same examination to the extraordinary degree necessary to bring the overall passing rates for minorities and whites into rough parity. This conclusion is buttressed by Barrett's observation that Nos. 2013, 1097 and 1099 are "made up of items of the type on which Blacks and Hispanics generally do more poorly than whites." (Barrett affidavit, Paragraph 8.) Cf. Griggs, 401 U.S. at 430. We reach the same conclusion as to No. 1097, which whites passed at a rate of 1.7 times that of Blacks and 2.7 times that of Hispanics. The ethnicity and pass-fail results of 54% of the candidates are known. Consequently, minorities in the non-HRA group would have to outscore non-HRA whites substantially on that examination to negate the strong showing of adverse impact. The same conclusion applies to Examination No. 1099, which whites passed at a rate of 2.5 times that of minority candidates, and as to which the ethnicity of 60% of the candidates is known.
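The degree of outperformance that parity would require can be illustrated with a rough calculation. The following is a hypothetical sketch in Python and is not part of the record. The racial make-up of the unknown group is not known, so the sketch takes an assumed minority share as an input. It also treats every unknown candidate as either minority or white and leaves out the small "Others" row. The function name parity_requirement and the 50% share used in the example are assumptions made only for illustration.

```python
def parity_requirement(known_minority, known_white, unknown, minority_share):
    """Pass rates the unknown minority and white candidates would each need
    for the overall minority and white pass rates to come out equal.

    The first three arguments are (passed, total) pairs.
    minority_share is a HYPOTHETICAL fraction of the unknown group assumed
    to be minority; the record does not disclose it.
    """
    min_pass, min_total = known_minority
    white_pass, white_total = known_white
    unk_pass, unk_total = unknown

    unk_minority = minority_share * unk_total
    unk_white = unk_total - unk_minority

    # At parity, both groups pass at the combined overall rate.
    parity_rate = (min_pass + white_pass + unk_pass) / (
        min_total + white_total + unk_total
    )

    # Passers the unknown minority candidates would have to supply.
    needed_minority = parity_rate * (min_total + unk_minority) - min_pass
    # The number of unknown passers is fixed, so whites get the remainder.
    needed_white = unk_pass - needed_minority

    return needed_minority / unk_minority, needed_white / unk_white


# Exam No. 2013: Blacks and Hispanics combined (42 of 267 passed),
# whites (125 of 233 passed), race unknown (183 of 486 passed).
minority_rate, white_rate = parity_requirement(
    (42, 267), (125, 233), (183, 486), minority_share=0.5  # assumed share
)
print(f"Unknown minorities would need a {minority_rate:.0%} pass rate")
print(f"Unknown whites would need a {white_rate:.0%} pass rate")
```

Under the assumed 50% share, unknown minority candidates would have to pass at about 57% while unknown whites passed at about 18%, roughly the reverse of the pattern among the known candidates. Other assumed shares give different figures, and the sketch does not establish the actual composition of the unknown group.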
In sum, we find that the data of record meets the standard to establish a prima facie case as articulated by Judge Friendly in Vulcan:
"It may well be that the cited figures and other more peripheral data relied on by the district judge did not prove a racially disproportionate impact with complete mathematical certainty. But there is no requirement that they should. 'Certainty generally is illusion, and repose is not the destiny of man.' We must not forget the limited office of the finding that black and Hispanic candidates did significantly worse in the examination than others. That does not at all decide the case; it simply places on the defendants a burden of justification which they should not be unwilling to assume." 490 F.2d at 393.
The affidavit of defendants' statistical expert, Gus W. Grammas, is not inconsistent with our conclusions as to 2013, 1099 and 1097.
It states, and we agree, that neither the precise racial make-up nor the pass-fail rates of the non-HRA groups in Nos. 2013, 1097 and 1099 can be statistically inferred from the data about the HRA groups whose ethnicity and pass-fail rates are known because the known group (HRA employees) is not a random or representative sample of the unknown (non-HRA) employees. (Grammas affidavit, dated November 7, 1974, Paragraphs 6-7, 20-23.) But that fact is not inconsistent with our conclusion. Strictly speaking, the precise racial make-up of the unknown groups in Nos. 2013, 1099 and 1097 is irrelevant; the issue rather is whether there is any realistic likelihood that non-HRA minority candidates -- however many or few -- fared well enough in comparison to non-HRA whites to offset the startling imbalance in favor of whites among the known (HRA) candidates. We conclude there is no such likelihood.
Defendants' second attack on plaintiffs' prima facie case can be disposed of more easily. They claim that, notwithstanding plaintiffs' prima facie showing as to the five examinations challenged in this lawsuit, they should not be permitted to choose among the exams in the HRS series, challenging only those in which minorities performed worst. Neither the facts nor the law support defendants' argument.
Of the nine examinations in the HRS series, five are challenged here. Plaintiffs do not challenge the four other exams in the series; however, the results for three of these are of record: the Sr. HRS (MDT) open competitive exam (No. 1094), the HRS promotional exam (No. 1625), and the HRS (MDT) open competitive (No. 1095). The statistics for these are indicated in the chart below:
Sr. HRS (MDT) Open Competitive Exam No. 1094
             Passed   Failed   Total   % Passing
Blacks           18       41      59      31%
Whites            5       10      15      33%
Hispanics         7       15      22      32%
Other             0        1       1
Subtotal         30       67      97
Unknown          18       78      96      19%
Total            48      145     193
HRS Promotional Exam No. 1625
             Passed   Failed   Total   % Passing
Blacks           13       36      49      27%
Whites            1        4       5      20%
Hispanics         0        4       4
Total            14       44      58
HRS (MDT) Open Competitive Exam No. 1095
                 Passed   Failed   Total
Black                12       44      56
White                 8        9      17
Hispanic              7       16      23
Subtotal             27       69      96
No ethnic info       12       74      86
Total                39      143     182
As to 1094 and 1625, it is evident that, although the results suggest roughly equal passing rates, the samples are too small to be valuable. As to No. 1094, if only one more white had passed, the passing rate for whites would rise from 33% to 40%; if two more whites had passed the rate would be 47%, as compared with a 31% rate for Blacks. These figures (47% to 31%) compare favorably with the 1.5 to 1 ratio in Chance. As to No. 1625, if only one more white had passed, the rate would be 40% for whites, as compared with 27% and 0% for Blacks and Hispanics, respectively. The available figures show that as to 1095, only 17 whites took the exam as compared to 56 Blacks and 23 Hispanics, but in any event whites passed at over two times the rate of Blacks and 1.5 times the rate of Hispanics (47% to 21% and 30%).
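The small-sample point above is simple arithmetic and can be reproduced as follows. This is a minimal sketch in Python, not part of the record; the counts come from the charts for Nos. 1094 and 1625, and the function name rate_with_extra_passers is an illustrative assumption.

```python
def rate_with_extra_passers(passed, total, extra):
    """Pass rate if `extra` additional candidates in the group had passed."""
    return (passed + extra) / total


# Exam No. 1094: 5 of 15 whites passed.
for extra in (0, 1, 2):
    rate = rate_with_extra_passers(5, 15, extra)
    print(f"No. 1094, whites, {extra} more passing: {rate:.0%}")

# Exam No. 1625: 1 of 5 whites passed.
for extra in (0, 1):
    rate = rate_with_extra_passers(1, 5, extra)
    print(f"No. 1625, whites, {extra} more passing: {rate:.0%}")
```

With groups this small, a single candidate moves the white pass rate by several percentage points on No. 1094 and by twenty points on No. 1625.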
Comparison of the aggregate available figures for the five exams under challenge with the aggregate figures for all eight examinations demonstrates the shaky factual basis for defendants' argument.
AGGREGATE RESULTS ON 5 EXAMS
             Passed   Failed   Total   % Passing
Blacks          173      601     774      22%
Whites          343      246     589      58%
Hispanics        24       86     110      22%
As the chart indicates, whites passed at nearly 3 times the rate of minorities when the challenged exams are considered in the aggregate.
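The aggregate chart for the five challenged examinations can be rebuilt from the per-examination charts. The following is a minimal sketch in Python, not part of the record; the counts are copied from the charts for candidates of known race, and the names CHALLENGED and aggregate are illustrative assumptions.

```python
# (passed, failed) counts for candidates of known race on the challenged exams.
CHALLENGED = {
    "1631": {"Blacks": (12, 57), "Whites": (28, 24), "Hispanics": (3, 13)},
    "2013": {"Blacks": (39, 208), "Whites": (125, 108), "Hispanics": (3, 17)},
    "1626": {"Blacks": (11, 51), "Whites": (30, 4), "Hispanics": (3, 5)},
    "1099": {"Blacks": (56, 165), "Whites": (101, 54), "Hispanics": (8, 22)},
    "1097": {"Blacks": (55, 120), "Whites": (59, 56), "Hispanics": (7, 29)},
}


def aggregate(exams):
    """Sum passed and failed counts per group across all exams."""
    totals = {}
    for groups in exams.values():
        for group, (passed, failed) in groups.items():
            prev_passed, prev_failed = totals.get(group, (0, 0))
            totals[group] = (prev_passed + passed, prev_failed + failed)
    return totals


for group, (passed, failed) in aggregate(CHALLENGED).items():
    total = passed + failed
    print(f"{group}: {passed} passed, {failed} failed, {total} total, "
          f"{passed / total:.0%} passing")
```

The summed counts match the passed, failed and total columns of the five-exam chart.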
When the results for the three examinations not challenged by plaintiffs are added into the aggregate computation, the overall pass rates are not significantly altered:
AGGREGATE RESULTS ON EIGHT EXAMS
             Passed   Failed   Total   % Passing
Blacks          216      732     948      23%
Whites          357      269     626      59%
Hispanics        38      121     159      24%
We regard these figures as sufficient proof that plaintiffs' class performed significantly worse than whites and that the disparity is not the result of chance.
In any event, defendants' argument that the plaintiffs should not be permitted to challenge only those exams whose results show disparate impact is invalid as a matter of law. In Vulcan, defendants challenged plaintiffs' statistical case because it was based on a single examination, which they claimed was insufficient to be meaningful. In rejecting the argument Judge Weinfeld observed:
"The consequence of relying upon one examination is only that any finding of discrimination and the relief to be granted will necessarily be restricted to the scope of the proof. The evidence presented was more than adequate to support a finding of discriminatory impact." 360 F. Supp. at 1271.
The observation applies with equal force in the case at hand.
As noted earlier, defendants have the burden of justifying the use of the challenged examinations by proving that they are job-related, Vulcan, 490 F.2d at 391, and that the differential impact indicated by the statistics results from variance in qualifications for the job, rather than race. Griggs v. Duke Power Co., supra, 401 U.S. at 430-431 (1971), Chance, 330 F. Supp. at 214. This burden is discharged if the city "[comes] forward with convincing facts establishing a fit between the qualification and the job." Vulcan, 490 F.2d at 393, quoting Castro, 459 F.2d at 732; see also Guardians, 482 F.2d at 1337, Chance, 458 F.2d at 1176.
A. Case law in this Circuit recognizes three methods for validating an examination as job-related: criterion-related validation, construct validation and content validation. See, e.g., Vulcan, 490 F.2d at 394-96; Guardians, 482 F.2d at 1337-1338 and 354 F. Supp. at 788-789; Kirkland, 374 F. Supp. at 1370-1372. Criterion-related validation is a process by which relative performance on an examination is compared with relative performance on the job, either by "pretesting" a group of current employees or by subsequent on-the-job evaluation of successful candidates. See Vulcan, 360 F. Supp. at 1273. This method is considered more effective than other validation methods because it clearly establishes the degree of correlation between successful examination performance and successful job performance. Guardians, 482 F.2d at 1337 and 354 F. Supp. at 788. However, no case in this Circuit has held that a showing of criterion validity is required for defendants to satisfy their burden of proving job-relatedness, if the test can be shown to have been validated by another method. See Vulcan, 490 F.2d at 395.
The second recognized method of validation is construct validation, which involves "identification of the general mental and psychological traits believed necessary to successful performance of the job in question," Vulcan, 490 F.2d at 395, and the construction of an examination which tests for these qualities. Defendants do not contend that they validated the examinations by either criterion validation or construct validation.
Consequently, defendants' proof on the issue of job-relatedness hinges on whether the examinations are "content valid." Judge Weinfeld described this method in Vulcan:
"An examination has content validity if the content of the examination matches the content of the job. For a test to be content valid, the aptitudes and skills required for successful examination performance must be those aptitudes and skills required for successful job performance. It is essential that the examination test these attributes both in proportion to their relative importance on the job and at the level of difficulty demanded by the job." 360 F. Supp. at 1274. See also, Vulcan, 490 F.2d at 395; Guardians, 482 F.2d at 1338; Kirkland, 374 F. Supp. at 1372.
Cases in this Circuit have recognized the difficulties of applying sophisticated, and unfamiliar, principles of psychometrics to jobs about which the trier of fact has only superficial knowledge, and have dealt with the problem on a pragmatic basis. Judge Friendly's approving description of the approach Judge Weinfeld used in Vulcan sets the tone:
"Instead of burying himself in a question-by-question analysis of Exam 0159 to determine if the test had construct or content validity, the judge noted that it was critical to each of the validation schemes that the examination be carefully prepared with a keen awareness of the need to design questions to test for particular traits or abilities that had been determined to be relevant to the job. As we read his opinion, the judge developed a sort of sliding scale for evaluating the examination, wherein the poorer the quality of the test preparation, the greater must be the showing that the examination was properly job-related, and vice versa. This was the point he made in saying that a showing of poor preparation of an examination entails the need of 'the most convincing testimony as to job-relatedness.' The judge's approach makes excellent sense to us. If an examination has been badly prepared, the chance that it will turn out to be job-related is small. Per contra, careful preparation gives ground for an inference, rebuttable to be sure, that success has been achieved. A principle of this sort is useful in lessening the burden of judicial examination-reading and the risk that a court will fall into error in umpiring a battle of experts who speak a language it does not fully understand. See Chance, supra, 458 F.2d at 1173." 490 F.2d at 395-396.
B. The initial step in the construction of a content-valid examination is the "job analysis." Its purpose is to identify the knowledge, skills and abilities required for performance of the job. Such an analysis involves the isolation of the qualities most critical to job performance, an evaluation or weighing of their importance relative to one another, and a determination of the level of competence required as to each of them. Vulcan, 360 F. Supp. at 1274, Kirkland, 374 F. Supp. at 1373. Obviously, the adequacy of the job analysis is crucial to a content-valid examination; unless the analysis accurately describes the "content" of the job, the content of the examination based on it is likely to be seriously distorted.
Accordingly, for defendants to sustain their burden of proof as to the content validity of the examinations in issue, they must show,
"not only that the knowledge, skills and abilities tested for . . . coincide with some of the knowledge, skills and abilities required successfully to perform on the job, but also that 1) the attributes selected for examination are critical and not merely peripherally related to successful job performance; 2) the various portions of the examination are accurately weighted to reflect the relative importance to the job of the attributes for which they test; and 3) the level of difficulty of the exam matches the level of difficulty for the job." Kirkland, 374 F. Supp. at 1372.
Leonard Rosenberg prepared the job analysis for the five challenged examinations. He has been employed since 1956 in the Department of Personnel in New York City, and has had varied experience in the personnel field, primarily in the area of classification of civil service titles. Since 1970 he has been assigned to the Bureau of Examinations, where he is responsible for all personnel matters relating to the HRS Series of Titles. Although his earlier work had involved a large number of "desk audits" to determine whether a particular city employee was performing duties appropriate to his title (Tr. 177), the job analysis for the challenged examinations was the first he had undertaken for purposes of exam construction. (Tr. 230)
Rosenberg's job analysis for the title of Sup. HRS was based on a series of visits to various HRA agencies and work locations during the period October 1-7, 1971. At the time of the visits, there were about 180 Sup. HRS provisionals scattered throughout HRA. (Tr. 374) However, to safeguard against leakage of information relating to the forthcoming exams, and pursuant to city policy, Rosenberg advised HRA officials that he wished to confer only with permanent Sup. HRS's. (Tr. 193) At the time of the audit there were seven permanent Sup. HRS incumbents, of whom Rosenberg interviewed four (Tr. 226). Beyond that he spoke to several HRA employees in higher titles and observed an unspecified number of HRA employees as they went about their work. (Tr. 224)
Defendants' Exhibit F is the two-page written job analysis which Rosenberg prepared on the basis of his visits.
Rosenberg also prepared a one-page test plan (Defendants' Exhibit G1) based on the job analysis. The test plan lists eight areas to be covered on the Sup. HRS examination which are substantially identical to the eight knowledges and skills identified in the job analysis.
For the reasons discussed below, we find that the job analysis and test plan prepared by the city fall short of professional standards as delineated by the testimony and applicable case law. First, the evidence establishes that Rosenberg's visits and interviews at work locations of HRA did not cover the full spectrum of tasks performed by those in the title of Sup. HRS. It is undisputed that Rosenberg did not interview people in most of the sub-agencies of HRA including, for example, the Agency for Child Development and the Youth Services Administration (Tr. 235, 381; see Defendants' Exhibit F). Consequently, the job analysis cannot -- and on the face of it does not -- purport to be a complete profile of the job title. Without question, the city guidelines which prevented Rosenberg from interviewing provisionals in the course of his visits to HRA made a thorough job analysis nearly impossible. Of close to 200 employees in the title of Sup. HRS (about 180 provisionals and 7 permanent incumbents), the city's policy authorized Rosenberg to speak only to the seven permanent incumbents, and, in fact, he spoke to only four of these.
The evidence establishes that a sample of four employees in the Sup. HRS title is insufficient to provide a full view of the job of Sup. HRS. All the witnesses agreed that it is difficult to imagine job titles broader than those in the HRS Series. (See, e.g., Tr. 275, 381-382, 393, 421, 425, 497-498, 501-502) Employees holding the generic title Sup. HRS may do jobs ranging from payroll and purchasing of supplies to public relations or program planning (Defendants' Exhibit F) in twenty different kinds of programs (Tr. 381). Moreover, although there are a number of small clusters of Sup. HRS's who do approximately the same kind of work, there are no large sub-groups capable of easy categorization. (Tr. 382) In view of the variety of HRA activities, and work tasks associated with them, an insufficient interview sample would seriously distort the overall picture.
Defendants called two witnesses as to the adequacy of the job analysis. Everett Williams is a psychologist employed by the Educational Testing Service in Princeton, New Jersey. Mildred Katzell is a psychologist specializing in the field of measurement and evaluation. Their conclusion that the job analysis was professionally adequate (Tr. 409, 498) must be viewed in light of their criticism of the small sample and the restrictive city policy which caused it. Katzell conceded that "it might have been desirable to have a larger sampling of the total gamut of the types of positions that are circumscribed by this title." (Tr. 433-34) Williams testified that interviewing four out of seven permanent Sup. HRS's was "very adequate in terms of a sample percentage," but that "you typically would want to have more observation points if there are these wide differences [in tasks performed], usually between 10% to 25% of the total class." (Tr. 498-499) Williams made it clear that his opinion that the job analysis was adequate might change if the city's restrictions on interviewing provisionals were lifted. (Tr. 499-500) But the professional and legal inadequacy of a job analysis is not cured simply because there is an extrinsic explanation for it, such as the city's policy here. In view of the wide variety of tasks performed by those in the title of Sup. HRS and the large number and varied type of sub-agencies within HRA, it is reasonable to assume that an adequate sample would approach the upper end of the 10-25% spectrum mentioned by Williams (Tr. 581-582). Accordingly, we regard the 2% sample used in the job analysis for Sup. HRS (four of a group of about 180) as critically insufficient.
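The gap between the sample actually used and the range Williams described can be put in numbers. The following is a minimal sketch in Python, not part of the record. It uses the figure of "about 180" given in the text as the size of the group, which is an approximation, and the names TITLE_POPULATION, INTERVIEWED and interviews_needed are illustrative assumptions.

```python
import math

TITLE_POPULATION = 180  # "about 180" per the text; an approximation
INTERVIEWED = 4         # permanent incumbents actually interviewed


def interviews_needed(population, percent):
    """Smallest whole number of interviews covering `percent` of the group."""
    return math.ceil(population * percent / 100)


sample_percent = INTERVIEWED / TITLE_POPULATION * 100
low = interviews_needed(TITLE_POPULATION, 10)
high = interviews_needed(TITLE_POPULATION, 25)

print(f"Sample used: {INTERVIEWED} of {TITLE_POPULATION} ({sample_percent:.1f}%)")
print(f"A 10% to 25% sample would mean {low} to {high} interviews")
```

On the approximate figure of 180, the range Williams mentioned corresponds to between 18 and 45 interviews, against the four that were conducted.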
Defendants argue that Rosenberg's prior experience with HRA matters, his interviews with employees in titles higher than Sup. HRS and his observations of many other employees whom he did not interview cure any deficiency in the sample. We disagree. Assuming that Rosenberg's prior experience in HRA matters gave him a general knowledge of the Sup. HRS title, his private knowledge about HRA, however extensive, cannot have been of value to persons constructing the examination unless committed to writing in the job analysis. See Kirkland, 374 F. Supp. at 1373-1374. But that is not the case here; on its face, the written job analysis purports to be based only on information gathered from Rosenberg's visits to HRA pursuant to his assignment to prepare the particular examinations in issue.
In any event, the value of Rosenberg's prior experience in personnel matters relating to HRA ought not be overestimated since, as noted earlier, it occurred primarily in the area of classification of job titles (Tr. 176-179, 218-222). Such work demands substantially different methods than those required for a thorough job analysis to be used as the foundation of an examination (Tr. 230-231, 570-571, 577-579). Moreover, although the job analyses Rosenberg prepared for the three job titles in issue were the first he had done for an examination (Tr. 230), he spent only seven days to prepare the three job analyses. (Tr. 224) Rosenberg himself testified that a thorough job audit normally requires from a few days to two weeks (Tr. 229). Finally, although it was intended that Rosenberg actually prepare the examinations themselves, he was reassigned to another position and the task fell to Helene Willingham. Although Rosenberg may have known considerably more about the job of Sup. HRS than the written job analysis discloses, Willingham never secured the benefit of his knowledge. On the contrary, as it turned out Rosenberg took no part in the construction of the exam and did not review it before it was administered to insure that it matched the job profile (Tr. 201, 205, 210, 247).
Nor do the supplemental interviews of four employees in higher titles cure the inadequacy of the sample. Plaintiffs' expert, Felix Lopez, testified that such interviews could not substitute for the perceptions of those holding the job to be tested, and that an adequate sampling of both categories of employees was necessary to a proper job analysis. (Tr. 574-575)
The relatively casual approach which characterizes the sampling of the Sup. HRS population is evident in the written job analysis itself. A critical step in any job analysis is the largely inferential one of breaking down an observed task into a set of component skills, abilities and knowledge (Tr. 521-522; 575-576) or, as Williams put it, a "going from some observation to a verbal description which is understandable to some set of people who will be involved in the act of putting together the [test]." (Tr. 525)
Lopez testified that he would not be able to construct a content-valid test on the basis of the job analysis and test plan. (Tr. 585) His criticisms were sensible and persuasive. First, the descriptions of both "typical tasks" and "knowledge and skills" are too ambiguous and unrefined to give any real idea about what the job involves and what is required to perform it. (Tr. 570, 572, 576, 580, 585-586; see also 518-519) The job analysis does not indicate what level of proficiency is required as to each skill, a critical defect. (Tr. 576-577, 580, 588) It does not explain how or why the skills in the test plan were weighted as they were, also a serious defect (Tr. 585-588); or even suggest -- apart from the "examples of typical tasks" -- that the jobs held by those in the title of Sup. HRS can be different from one another and require different abilities (Tr. 469-71).
Neither Rosenberg nor defendants' experts satisfactorily refuted the existence of these defects in the job analysis. Rosenberg testified that in order to determine what knowledge or skills were essential to performance in the job, he "analyzed each and every one of the functions, activities, jobs, duties, and determined that they fit into certain common areas . . . and as a result came up with the eight categories that became, in effect, the test plan." (Tr. 199-200) These were weighted, Rosenberg testified, according to "the incidence of the performance of specific types of duties and the essential importance, the criticality of the types of decisions that would impinge on knowledge or lack of knowledge in the specific areas." (Tr. 201-202)
Although Rosenberg correctly stated the general procedure to be followed, he did not detail how it was applied to the job analysis for Sup. HRS (but see Tr. 257, 407-408, 587-588).
However, it is evident that no matter how well he applied the procedures, they could not have resulted in a thorough analysis. A determination that "each and every one of the functions, activities, jobs, duties" of a Sup. HRS fits into eight common areas is necessarily flawed where, as here, much of the spectrum of tasks remains uninvestigated. Similarly, a weighting of the relative importance of "knowledge and skills" which is based on the relative frequency of their occurrence requires a sufficient sample to insure reliable measurement. Yet, as the written job analysis frankly acknowledges, Rosenberg listed only "examples of typical tasks".
C. Helene Willingham prepared the examinations for the title of Sup. HRS. She has been with the Examining Division of the Department of Personnel since 1959 and has prepared over 200 examinations, most of them in the social services area (Tr. 259-60). In preparing the examination for Sup. HRS, she consulted the notice of examination, and the job analysis and test plan. However, the fact that she supplemented these with her own knowledge about the job title and ideas about what ought to go into the exam (Tr. 280) and consulted the director of the Training Staff of HRA as to functions and skills involved in the Sup. HRS title (Tr. 279-280) suggests that she did not find Rosenberg's work sufficiently complete. For example, she felt that the area of the test plan relating to "machine, equipment and supply purchase, usage and management" was weighted too heavily because "it might have been too particular and too many people wouldn't know anything about it." (Tr. 291-292)
The raw material for the individual questions or "items" on the examination came from several sources. Willingham consulted "all sorts of HRA procedural material and newsletters" distributed to component agencies of HRA, various professional journals and government publications she thought relevant, and newspaper clippings and various texts. (Tr. 264-65) She also made extensive use of the training materials used in a course given by HRA to prepare candidates for the Sup. HRS exam. (Tr. 273-274)
On reviewing the test plan and job audit, Willingham determined that some questions ought to be constructed by her staff and others ought to be referred to experts in particular fields. (Tr. 270) Accordingly, she invited two outside experts to submit questions dealing with community organization and community relations, supervision and current events, of which nine or ten were used on the examination. (Tr. 272-273) There is no indication that Willingham prescribed any requirements as to the level of proficiency or areas of concentration the questions should test. Indeed, it would have been difficult for her to do so, since neither the job analysis nor test plan provides any basis whatever for such refinements.
Indeed, a similar problem exists with regard to the approximately seventy questions prepared by Willingham and her staff. Although she stated that "it was certainly possible to decide on certain critical knowledges" based on Rosenberg's materials (Tr. 291), Willingham did not explain how she did so, or how she decided such matters as the degree of difficulty of the questions. However, that is beside the point, since the fact that the job analysis and test plan needed refinement by Willingham suggests that they were inadequate to begin with.
Defendants' experts were unenthusiastic in their appraisal of the examination. Although Williams testified that the test was "reasonably well put together," both he and Katzell expressed reservations about the quality of the item construction. Katzell observed that it was "quite evident" that many of the myriad rules governing the proper phrasing of questions and multiple choice options on the test were violated (Tr. 411-412, 503).
Lopez concluded the test was "poorly constructed" (Tr. 600). In particular he noted that many questions appear to have more than one correct answer -- even to an expert in the field; while others suggest the proper answer to a test-wise candidate who may not in fact "know" the answer. Indeed, the record suggests, if it does not establish, that the exam favored those with formal education, although only minimum educational requirements were imposed on candidates. (Tr. 590-597; 600-605; 621-622, see also Tr. 297-298, 312-316, 550-553, Barrett affidavit, Paragraph 8)
D. The evidence as to the inadequate manner and method of preparation of the job analysis and the examination creates the "rebuttable inference" that the examination is not job-related. Vulcan, 490 F.2d at 395-396. Although in cases of this type the primary emphasis is on the validity of the methods used in creating the examination rather than the independent validity of the end product, Kirkland, 374 F. Supp. at 1373, the opinion testimony as to the content-validity of the exam itself confirms our conclusion that defendants have not shown the examination to be job-related.
Harold Yourman, Director of Labor Relations at HRA, has been with the Agency since 1967. Although he observed that the exam "delves into the agency, HRA, [and] covers the full spectrum of HRA" (Tr. 362-363, 395) and is generally related to the position (Tr. 363), he expressed reservations about the substantial number of questions on supervision (Tr. 341-342, 391, 395) and conceded that the exam was not directly related to his earlier duties as a provisional Sup. HRS (Tr. 395-396).
Katzell is concededly not well acquainted with the content of the job (Tr. 427) and her conclusion that the test "appears to have content-validity" (Tr. 479-80, 474-477) must be viewed in that light. She observed, as is obvious, that the questions on reading comprehension, vocabulary and graph interpretation related to those areas on the test plan (Tr. 427). However, these were the very areas that Willingham considered eliminating from the test, because of their possibly discriminatory bias (Tr. 298, see also Tr. 292-300) and which Lopez particularly criticized (Tr. 608-612). As to other areas of the test, such as that dealing with knowledge of the constituent agencies of HRA, Katzell testified that it would be "desirable" or "appropriate" to have such knowledge (Tr. 475-477) but did not suggest it was critical.
Like Katzell, Williams stopped short of stating that the test was content-valid, observing only that the procedures used to construct the test were consistent with content-validity (Tr. 504) and, somewhat tautologically, that successful performance on the test certifies that a candidate possesses the particular knowledge being tested for (Tr. 546-549; see also Tr. 247, 620). Indeed, none of the witnesses was willing to say that the test was useful for selecting those who were likely to perform well on the job, which as we view the matter is the only reason for administering it. (See Tr. 247-248, 390, 449, 546, 569-70, 590, 605, 612, 620-621.)
E. As noted above, direct testimony regarding the manner of preparation and job-relatedness of the five examinations under challenge was for the most part limited to the two examinations (promotional and open competitive) for the position of Sup. HRS under attack in Jones; the parties stipulated that the same testimony would be given as to the examinations involved in Williams. Although our findings as to the Sup. HRS exam require a finding that the other three exams (the open competitive exam for HRS and Senior HRS and promotional exam for Senior HRS) are not job-related, a comparison of the job analyses, test plans and examinations viewed as a group fortifies this conclusion.
For example, the job analyses and test plans for HRS and Senior HRS (Plaintiffs' Exhibits 8 and 9) identify "knowledge and skills" and "areas to be covered" that are almost identical to those listed in the analysis and plan for Sup. HRS; the relative weights assigned to the areas of the test are also substantially identical for all three titles. (Tr. 616-619) Not surprisingly, therefore, the tests based on these documents were very similar; Willingham, who prepared them, stated that the exams for HRS and Sr. HRS had forty questions in common (of a total of eighty), as did those for Sr. HRS and Sup. HRS. However, she sought to make the other forty questions on each test somewhat more difficult than those on the next lower level (Tr. 318-320).
The problem with Willingham's approach is that the job analyses and test plans provide no basis for rational differentiation between the three levels to be tested. (Tr. 618-620, 662-663) The fact that the materials prepared for the three titles do not distinguish to any appreciable extent between the nature of the jobs or the level of competence needed to perform them confirms our conclusion that the examinations were not carefully prepared and, consequently, not job-related.
F. Although what we have said so far decides the case, it is necessary to comment further on certain factors which set the present suit somewhat apart from other cases of this type and which, as defendants view the matter, support a finding of job-relatedness.
In Chance, Vulcan, Bridgeport Guardians and Kirkland, the public agencies involved either had prepared no job analysis at all, or pieced one together from pre-existing documents of doubtful value for purposes of exam preparation. Moreover, with the possible exception of Chance, which involved supervisory positions in the New York City school system, the cases deal with positions (policeman, fireman and correction sergeant) whose component skills and tasks are relatively easy to define. This combination of factors somewhat simplified the determination as to job-relatedness in earlier cases.
The present suit does not readily fit into the mold established in earlier decisions. It is evident that the "job" of Supervising HRS is not a job in the same relatively restrictive sense as the job of policeman or fireman. Indeed, as Rosenberg acknowledged, jobs performed by individuals in the title of Supervising HRS may have nothing in common with each other except salary and general level of responsibility (Tr. 243). Not surprisingly, therefore, defendants argue that the exams in issue pass constitutional muster even though they are not demonstrably related to a definable "job". They contend that because HRA cannot in fact predict the type of work to which an individual might be assigned, the examinations were designed to test mastery of skills which Rosenberg found to be basic to all jobs performed by Supervising HRS's (see Tr. 199, 202, 240-243, 251-252, 275, 546).
The weakness of this argument is that defendants have not established either that there is in fact such a core of skills common to all jobs within the extraordinarily diffuse titles in issue (see Tr. 393-395, 421-422, 470-471) or that Rosenberg successfully identified them. Indeed, the evidence suggests the contrary. To cite the most obvious example, the examination for Sup. HRS involved twenty to twenty-five questions (of a total of eighty) relating to supervisory skill but, as noted earlier, only 60% to 65% of those in the title actually have supervisory responsibility (Tr. 391, 478).
Moreover, the ten questions on the promotional exam for Sup. HRS relating to the internal organization of HRA were understandably attacked as peripheral to the duties of many individuals in the title (Tr. 474); it is difficult to see how such questions can be considered essential to all those in the title in view of the fact that the open competitive exam for Sup. HRS omitted these very questions in favor of more general questions dealing with "Functions of Relevant Public and Private Agencies" (Tr. 278-279).
Indeed, if defendants are correct that the examinations tested skills common to all jobs within the title and were job-related, it is nearly past understanding why substantial numbers of provisionals at all three levels failed the examination; and why the overall pass rates for the open competitive exams for Sup. HRS and HRS were higher than on the promotional exams for the same titles. (See Tr. 248, 327-328, 392-393, 450-457, 615-616) Many of the provisionals who failed the exams in issue had been in their jobs for two years or more and, significantly, the only evidence in the record indicates that they were highly effective performers.
Despite the fact that the existence of large numbers of provisionals who had taken the tests provided a unique opportunity for a concurrent validation study (Tr. 529 ff.), defendants have come forward with no evidence to suggest that provisionals were doing an inadequate job.
Plaintiffs seek and are entitled to declaratory and injunctive relief. Accordingly, Examinations 2013, 1631, 1097, 1099 and 1626 are declared unconstitutional and defendants are enjoined from making appointments from eligible lists based on their results, and from terminating the provisional appointments of those in plaintiffs' proposed class to their respective positions solely because they failed the examinations.
In addition, plaintiffs seek affirmative relief (1) requiring defendants to appoint an unspecified number of members of plaintiff class to the three positions "based on their experience, education and qualifications," including evaluation of their performance as provisionals; (2) directing defendants to develop and administer either written examinations in accordance with the EEOC guidelines, or some other selection process which is non-discriminatory and job related; (3) establishing a temporary procedure for selection to the three positions while new permanent procedures are developed or, alternatively, (4) directing the permanent appointment of the present provisionals to the jobs they now hold.
Although the invalidation of the five examinations in issue authorizes the court to fashion appropriate affirmative relief, see Louisiana v. United States, 380 U.S. 145, 154, 13 L. Ed. 2d 709, 85 S. Ct. 817 (1965), Guardians, 482 F.2d at 1340, the proper course is to defer decision as to the nature and extent of affirmative relief to enable defendants to respond to proposals set forth in plaintiffs' post-trial brief. Accordingly, defendants are directed to file a memorandum on these issues within ten days of the filing of this Opinion, with plaintiffs to submit any reply within one week thereafter.
There remains the matter of plaintiffs' motion for a class action determination in both Williams and Jones. Plaintiffs' proposed class is composed of Blacks and Hispanics who took and failed one or more of the five challenged examinations; or who took and passed an examination but scored too low to be initially appointed. Defendants have no objection to the grant of class status if the class is limited to those who failed an exam. However, although plaintiffs have satisfied the requirements of Rule 23, there is no need to designate a class; plaintiffs have requested only declaratory and injunctive relief, which will in any event benefit all members of the proposed class. See Vulcan, 360 F. Supp. at 1266-1267, note 1; Bridgeport Guardians, 354 F. Supp. at 783; 3B Moore, Federal Practice ¶ 23.10-1 at 2768 (2d Ed. 1969). Accordingly, the motion is denied.
Plaintiffs' request for an award of reasonable attorneys' fees is denied. Although counsel fees were awarded in Kirkland, 374 F. Supp. at 1380-1382, they are not appropriate in the present suit. Kirkland involved an examination for the position of correction sergeant, whose preparation did not present the uniquely difficult problems involved in testing for the titles in issue here. Moreover, while in Kirkland there was an almost complete failure of proof on the issue of job-relatedness, we are impressed in the present case by the sincere efforts of Rosenberg and Willingham to construct tests in accordance with the stringent legal standards applicable in this Circuit, however inadequate the examinations proved to be.