UNITED STATES DISTRICT COURT FOR THE SOUTHERN DISTRICT OF NEW YORK
April 3, 1995
UNITED STATES OF AMERICA, Plaintiff, against ROBERTA and EILEEN STARZECPYZEL, Defendants.
The opinion of the court was delivered by: LAWRENCE M. MCKENNA
MEMORANDUM AND ORDER
A Grand Jury has charged Defendants with conspiring to steal paintings, sculpture, silver and jewelry from Ethel Brownstone, defendant Roberta Starzecpyzel's aunt. (S1 93 Cr. 553.) The indictment charges, inter alia, that the Starzecpyzels removed over 100 items of artwork from Brownstone's apartment, delivered them to Sotheby's and Christie's auction houses, authorized the sale of the items, and directed the auction houses to forward the sale proceeds to Swiss bank accounts. The Defendants have also been charged with Interstate and Foreign Transportation of Stolen Moneys, Mail Fraud, Laundering of Monetary Instruments, and Tax Evasion.
On December 12, 1994, Defendants moved the Court, pursuant to Fed. R. Evid. 702 and 403, to exclude all expert witness testimony and other evidence relating to the alleged forgery of Ethel Brownstone's signatures on two documents dated June 3, 1985 and March 11, 1986. These writings had been examined by Gus Lesnevich, a forensic document examiner ("FDE"), who concluded that the challenged signatures were not genuine. Defendants argued that:
this alleged expertise [of forensic document examination] has never been validated as credible scientific or technical knowledge and does not comport with the requirements of evidentiary reliability articulated by the Supreme Court in Daubert v. Merrell Dow Pharmaceuticals, Inc., 125 L. Ed. 2d 469, U.S. , 113 S. Ct. 2786 (1993).
(Defs.' Mem. at 1-2.) In the alternative, Defendants requested a Daubert hearing on this issue pursuant to Fed. R. Evid. 104(a) and 702.
The Court granted Defendants' request for a Daubert hearing, which was held from February 28 through March 2, 1995. The government offered the testimony of Mary Wenderoth Kelly, an FDE employed by the City of Cleveland Police Forensic Laboratory, who currently serves as vice-president of the American Board of Forensic Document Examiners. Defendants offered the testimony of George Edward Stelmach, Professor of Exercise Science and Psychology at Arizona State University, and Michael J. Saks, Professor of Law and Psychology at the University of Iowa.
While the Court originally considered Daubert to be controlling as to the admissibility of the forensic testimony at issue -- relating to the comparison of a large body of genuine writings to claimed forgeries -- the Court now concludes that Daubert, which focuses on the "junk science" problem, is largely irrelevant to the challenged testimony. The Daubert hearing established that forensic document examination, which clothes itself with the trappings of science, does not rest on carefully articulated postulates, does not employ rigorous methodology, and has not convincingly documented the accuracy of its determinations. The Court might well have concluded that forensic document examination constitutes precisely the sort of junk science that Daubert addressed.
Yet, as distinguished from such discredited ventures as hedonic damage expertise, or the Bendectin plaintiffs' statistical machinations, forensic document examination does involve true expertise, which may prove helpful to a fact-finder. FDE expertise is not properly characterized as scientific, but as practical in character. In a nutshell, over a period of years, FDEs gradually acquire the skill of identifying similarities and differences between groups of handwriting exemplars. Such expertise is similar to that developed by a harbor pilot who has repeatedly navigated a particular waterway. The Court therefore treats forensic document expertise under the "technical, or other specialized knowledge" branch of Rule 702, which is apparently not governed by Daubert.
Such experts, who acquire their skills through practical training, apprenticeships, and long years of practice, are generally not expected to be able to articulate and justify the theoretical bases underlying their practice, to expose their techniques to a larger community of practitioners through peer-reviewed publication, or to subject those techniques to extensive testing. Although Daubert standards do not apply to such "skilled" witnesses, trial courts need not certify every individual accomplished at a particular task as an expert. The Federal Rules of Evidence have long imposed a "gatekeeping" function on trial judges to ensure relevance and helpfulness to the fact-finder before admitting expert witness testimony. Finding this standard satisfied for the proffered testimony, the Court denies Defendants' motion to exclude the testimony.
FDE testimony, while acceptable under Rule 702, does suffer from a substantial problem of prejudice, which is the subject of Fed. R. Evid. 403. The problem arises from the likely perception by jurors that FDEs are scientists, which would suggest far greater precision and reliability than was established by the Daubert hearing. This perception might arise from several sources, such as the appearance of the words "scientific" and "laboratory" in much of the relevant literature, and the overly precise manner in which FDEs describe their level of confidence in their opinions as to whether questioned writings are genuine. The Court has determined that the problem of prejudice can be sufficiently diminished through procedural safeguards, including a pre-testimony jury instruction, such that FDE testimony need not be excluded pursuant to Rule 403.
I. The Daubert Reliability Standard
Daubert was concerned with the standard for admitting expert scientific testimony in a federal trial. The Supreme Court held that the "general acceptance" test established 70 years ago in Frye v. United States, 54 App. D.C. 46, 47, 293 F. 1013, 1014 (1923), was superseded by Rule 702 of the Federal Rules of Evidence, which states:
If scientific, technical, or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education, may testify thereto in the form of an opinion or otherwise.
The Court first observed the "liberal thrust" of the Federal Rules of Evidence, shown, for example, by Rule 402, which states that "all relevant evidence is admissible, except as otherwise provided." 113 S. Ct. at 2793-94; see also Beech Aircraft Corp. v. Rainey, 488 U.S. 153, 169, 102 L. Ed. 2d 445, 109 S. Ct. 439 (1988) (noting the "general approach [of the Federal Rules] of relaxing the traditional barriers to 'opinion' testimony"). The Daubert Court then noted that nothing in Rule 702 establishes general acceptance in the relevant scientific community as an absolute prerequisite to admissibility.
Having rejected Frye, the Court looked to the language of Rule 702 itself to provide a new standard for admissibility. The Court focused primarily on the words "scientific" and "knowledge" which respectively imply "a grounding in the methods and procedures of science" and "more than subjective belief or unsupported speculation." 113 S. Ct. at 2795.
The Court wisely observed that scientific testimony need not be established to a certainty, as "arguably, there are no certainties in science." Id. The Court also cited with approval the brief of the American Association for the Advancement of Science and the National Academy of Sciences as Amici Curiae 7-8:
Science is not an encyclopedic body of knowledge about the universe. Instead it represents a process for proposing and refining theoretical explanations about the world that are subject to further testing and refinement.
The Court offered district courts guidance as to how to respond to a proffer of expert scientific testimony. Rule 104(a) of the Federal Rules of Evidence provides the starting point:
Preliminary questions concerning the qualification of a person to be a witness, the existence of a privilege, or the admissibility of evidence shall be determined by the court . . . . In making its determination it is not bound by the rules of evidence except those with respect to privileges.
Construing Rule 104(a) with Rule 702, the Court assigned trial judges the gatekeeping task of determining "at the outset, whether the expert was proposing to testify to (1) scientific knowledge that (2) will assist the trier of fact to understand or determine a fact in issue. This entails a preliminary assessment of whether the reasoning or methodology underlying the testimony is scientifically valid." 113 S. Ct. at 2796.
The Court expressed confidence in district judges to undertake this review, and declined to set out a "definitive checklist or test," providing only "general observations" as to the inquiry. Id. The first factor the Court identified was whether the scientific knowledge "can be (and has been) tested" -- or testability. "Scientific methodology today is based on generating hypotheses and testing them to see if they can be falsified; indeed, this methodology is what distinguishes science from other fields of human inquiry." Id. (quoting Green, Expert Witnesses and Sufficiency of Evidence in Toxic Substances Litigation, 86 Nw. U.L. Rev. 643, 645 (1992)). A closely related factor was whether particular scientific techniques had known or potential error rates. Id. (citing United States v. Smith, 869 F.2d 348, 353-54 (7th Cir. 1989) (reviewing studies of the error rate of spectrographic voice identification methods)).
The Court also noted that whether a theory or technique had been subjected to peer review and publication was significant, for these mechanisms increased the likelihood that methodological flaws would be detected. Trial courts should consider, however, that incorrect theories are sometimes published, while correct theories may be too new, too controversial or of too limited interest to be published.
The final factor identified by the Court was "general acceptance." While Frye had been rejected as too inflexible, a widely known technique able to attract only minimal support might "properly be viewed with skepticism." 113 S. Ct. at 2797.
While the Court characterized this inquiry as a flexible one, it noted that, given the lack of certainties in science, the Daubert hearing should focus "solely on principles and methodology, not on the conclusions that they generate." Id. Finally, the Court observed that judges should be mindful of other applicable rules of evidence, particularly Rule 403, which permits the exclusion of relevant evidence "if its probative value is substantially outweighed by the danger of unfair prejudice, confusion of the issues, or misleading the jury." Id. at 2798.
The Supreme Court did not address such procedural aspects of Daubert hearings as which party has the burden of proof. Professor Berger says that "certainly . . . Daubert requires the proponent to bear the burden of demonstrating the technique's capacity to produce a reliable result." Federal Judicial Center, Reference Manual on Scientific Evidence 76 (1995). Berger cites to no authority for this proposition, however, and Daubert itself provides no support for it. Another commentator argues that "the proponent has the burden of establishing the preliminary facts, but then the burden shifts to the opposing party to show that the evidence should be excluded." Arvin Maskin, The Impact of Daubert on the Admissibility of Scientific Evidence: The Supreme Court Catches up with a Decade of Jurisprudence, 15 Cardozo L. Rev. 1929, 1936 (1994).
In the spirit of Daubert, this Court concludes that the inquiry conducted pursuant to Rule 104(a) represents primarily a legal question for the court, rather than a factual one for the parties. That is, while the proponent of the evidence may have the burden to come forward with factual evidence of such matters as the training received by FDEs, the question as to whether those facts suggest sufficient reliability is a purely legal question. See Paul C. Giannelli, Daubert: Interpreting the Federal Rules of Evidence, 15 Cardozo L. Rev. 1999, 2011 (1994) ("The court must make an independent assessment" of reliability. (emphasis supplied)).
Believing that Daubert controlled the admissibility of the challenged testimony, the Court scheduled an in limine Daubert hearing. See, e.g., United States v. Williams, 583 F.2d 1194 (2d Cir. 1978) (pretrial hearing established spectrographic voice analysis admissible as identification evidence), cert. denied, 439 U.S. 1117, 59 L. Ed. 2d 77, 99 S. Ct. 1025 (1979); Government of the Virgin Islands v. Penn, 838 F. Supp. 1054, 1073-74 (D.V.I. 1993) (in limine Daubert hearing established DNA profiling test as sufficiently reliable).
II. The Daubert Hearing
The hearing commenced with the testimony of Mary Wenderoth Kelly, an FDE with more than a decade of experience, currently employed by the City of Cleveland Police Forensic Laboratory. Ms. Kelly holds a Bachelor of Science degree in general ethics from Ohio State University and a juris doctorate from Cleveland-Marshall College of Law. She currently serves as the vice-president of the American Board of Forensic Document Examiners (the "Board"), which certifies document examiners. Kelly described her duties on the Board:
My primary duties involve the testing committee. I am chairperson of the testing committee and oversee the tests that are administered to the different candidates. I also sit as the professional review committee chairperson thus handling any kind of professional complaints that would be [lodged].
(Transcript of Daubert Hearing ("Tr.") at 18.) Ms. Kelly also serves as a director for the American Society of Questioned Document Examiners and is a fellow in the questioned document section of the American Academy of Forensic Sciences.
Kelly first described the recommended two-year training program for FDEs. (Id. at 25.) She also described the process by which approximately 200 FDEs have achieved certification, and retained it through the acquisition of continuing education credits. (Id. at 280-81.)
Kelly testified that two basic principles underlie the field of forensic document examination. The first principle is that "no two people write exactly the same way." (Id. at 31.) For brevity the Court will refer to the differences between the handwriting of distinct writers as "inter-writer" differences. The second principle is that "no person will write exactly the same way when repeating." (Id. at 34.) These differences, referred to by FDEs, and hereinafter by the Court, as "natural variation," arise from the fact that human beings "lack the machine-like precision" to be able to reproduce even their signature the same way each time. (Id.)
As a matter of elementary logic, the fundamental issue in determining the reliability of forensic document examination must be the ability of FDEs to distinguish natural variation from inter-writer differences. Of course, where someone attempts to mimic the handwriting of another, there is a suppression of "ordinary" inter-writer differences, which makes the FDE's task more difficult. By characterizing natural variation and inter-writer differences as "principles underlying the field," Ms. Kelly may have suggested that these phenomena constitute solutions to the problem of forgery detection. It is clear, however, that the tension between these two sources of handwriting variation constitutes the problem itself, rather than a solution to the problem.
If forensic document examination does rely on an underlying principle, logic dictates that the principle must embody the notion that inter-writer differences, even when intentionally suppressed, can be distinguished from natural variation. How FDEs might accomplish this was unclear to the Court before the hearing, and largely remains so after the hearing.
One can speculate that while natural variation involves primarily quantitative differences (such as small changes in the "slant" of successive "t's" produced by a single individual), inter-writer variations involve qualitative differences (such as loops produced in a clockwise, rather than counter-clockwise, fashion).
Both the Defendants and the Court sought elucidation on this issue, but met with little success:
Q: [Osborn, Questioned Documents 147 (2d ed. 1929), states that] a slight but persistent difference [in] slant in two writings of considerable length may be evidence that the writings are by two different writers, while a pronounced difference might be the result of intended disguise. What would be the difference between a slight but persistent difference and a profound difference in slant?
Ms. Kelly: I think what he is getting to, once again, is that a slight but persistent difference in two writings might indicate it's two different writers. In order to make that evaluation, you would have to have sufficient known writing of both writers. . . .
Q: So slight in that context would be a comparative notion?
Ms. Kelly: It would seem like it, yes.
. . . .
Q: So a slight difference in slant would be?
Ms. Kelly: He is using it as a comparative between two different writers . . . .
Q: What degree of measure might be slight?
Ms. Kelly: . . . I wouldn't quantify it.
(Tr. at 241-43.)
Court: Do these natural variations ever occur to a degree that's equal to [inter-writer] differences so that a document examiner would not be able to distinguish between a [natural] variation and a difference as I have defined those terms?
Ms. Kelly: I think, your Honor, the only time that might occur is if they were not provided with an adequate amount of known writing. If you only have a small amount of the writing, you can't properly evaluate those characteristics to determine if in fact they are differences or variations of a given writer . . . .
. . . .
Court: Do you know or are you aware of any quantitative, by which I mean numerical, standards which would form a document examiner's conclusion that a given feature in a questioned document is . . . not by the author of the known samples?
To make that precise, let me give you an example. Suppose, let us say, that if we look at a known writing and a reasonably fair sample of known writing by a particular writer, that that writer's small L is always slanted between say 8 to 12 degrees to the right, you find that in an adequate sample of known writing, is there some numerical standard which a document examiner would apply to determine what degree of slant in the questioned document would lead to the conclusion that the questioned document is by a different author? Would you need 14 [degrees], 17, 19, is the real question.
Ms. Kelly: Is there a way of measuring?
Court: Is there some numerical standard that document examiners use?
Ms. Kelly: I would say as a general rule, no, your Honor; there is no numerical measurement of that kind of slant.
(Id. at 277-79.)
Q: A major difference between two samples of writing is different than a minor difference?
Ms. Kelly: I think I can agree that a major difference is more important than a minor difference.
Q: Is there a standard that document examiners use to distinguish what a minor difference would be from a major difference in two samples of handwriting?
Ms. Kelly: I think that gets back to the definition that I read as to what is significant, what a significant feature would be. And if that difference rises to something that becomes significant, then it would lead you to the conclusion that they were two different writers. If it's something that is only a minor difference, then I don't think you could reach that conclusion. Is that what you're getting at?
(Id. at 145.)
Ms. Kelly, who may not have accepted, or perhaps may not have understood as posed to her, the problem of distinguishing inter-writer differences from natural variation, repeatedly insisted that with an adequate number of handwriting samples, FDEs, through unspecified methods, could reliably do what they purport to do.
Q: How is the notion of sufficient quantity judged?
Ms. Kelly: It's based on the training and experience of the particular examiner.
Q: So there are no standard measures of sufficient quantity?
Ms. Kelly: There is no way to measure that, no.
(Id. at 202-03.)
While the Court does not at all find Ms. Kelly to have been evasive, her responses were far from satisfactory in establishing the scientific reliability of FDEs' efforts:
Q: What would the appropriate weight of [an individual handwriting] characteristic be [for purposes of making an identification]?
Ms. Kelly: How individual that feature might be . . . . The divergence from the standard copy book form of maybe that particular letter or some particular habit that is very individual.
Q: How is the individuality of a feature determined?
Ms. Kelly: Once again, it goes to how much that might diverge from the standard copy book form of a given letter formation and it really is based on the training and experience of the examiner in evaluating that characteristic.
Q: So then it's a subjective determination of individuality?
Ms. Kelly: There is a subjective component, but once again there are objective components to how you have to proceed.
Q: What might one of those objective components be?
Ms. Kelly: Once again, it's the amount of known writing that is necessary, as to being able to establish the range of variation of a given writer, sufficient quantity of questioned writing, the appropriate weight and the equipment that might be used.
Q: But there is no numerical or quantifiable standard for the amount of known writing or the sufficient quantity of known writing?
Ms. Kelly: No. Once again, there is no standard measurement because every writer is very individual and very unique. Some people have a lot of individuality present in their writing and other people do not. So it's kind of evaluated for each writer.
(Id. at 204-05.)
Ms. Kelly's testimony established that there are few published studies supporting even the "two underlying principles" of forensic document analysis, much less the claim that FDEs can reliably detect forgeries. Ms. Kelly relied on a study that concerned the uniqueness of each person's handwriting. R.J. Muehlberger et al., A Statistical Examination of Selected Handwriting Characteristics, 22 J. Forensic Sci. 206 (1977). Muehlberger concluded, however:
The survey is admittedly modest because of the size of the total population studied (200 individual writers). . . . This study was undertaken as an attempt to consider the possibilities for a standardized statistical approach to handwriting identification problems.
Id. at 215 (emphasis supplied).
Muehlberger also bemoaned the lack of statistical data concerning the occurrence of handwriting characteristics and noted that "while document examiners tend to assign probative values to specific handwriting characteristics and their combinations, judgments are often based almost entirely on the examiner's experience and power of recall." Id. at 206.
Ms. Kelly observed that, since Muehlberger was written in the 1970's, "I think that there [have] been a lot of studies done on the various aspects of handwriting characteristics that have since supplemented the work that was done then." (Id. at 112.) When asked to name a single such study, Ms. Kelly stated "I'm sorry, right now I can't think. I would have to go through some of the bibliographies that I brought with me." (Id.) Kelly was frequently unable to cite to studies supporting such critical propositions:
Q. Is there sufficient statistical data for document examiners to make the kinds of handwriting identifications they do?
Ms. Kelly: I believe that the principles that underlie handwriting analysis have sufficiently been proved and tested so that, yes, they can be used and applied for the purpose of handwriting identification.
Q. And the basis for your conclusion is?
Ms. Kelly: Is the studies that have been done and the papers that have been published and the work that has been done in the area.
Q. But you can't name us a particular study that would support that?
Ms. Kelly: I'm sorry, not off the top of my head I can't.
(Id. at 114.) Ms. Kelly's inability to cite such studies, given her high standing within the FDE community and the substantial period of time that the government had to prepare both its case and its witness, leads to an inference that there are few useful scientific studies relevant to forensic document examination.
Ms. Kelly did, however, discuss a handful of articles that address the ability of FDEs to perform identifications, and the smaller number of articles that compare the relative skills of FDEs and lay examiners. The latter articles include Moshe Kam et al., Proficiency of Document Examiners in Writer Identification, 39 J. Forensic Sci. 5 (1994) and Henry T.F. Rhodes, Statistical Approaches to the Identification of Handwriting (unpublished manuscript). Ms. Kelly conceded that both Kam and Rhodes believed that additional studies were required before conclusions could be drawn as to such ability. (Tr. at 115-17, 229.) Kelly also admitted that the problems attacked by FDEs did not lend themselves to rigorous (quantifiable) analysis:
Q: Wasn't Rhodes trying to examine if there is a classification system that could be used for handwriting?
Ms. Kelly: It appears that that's what he was trying to do.
Q: And wouldn't you agree that such a classification system would help provide a reliable criterion for dividing handwriting into standardized identifiable units?
Ms. Kelly: If it were possible, but he also concedes that it's not very possible.
Q: And you agree with that conclusion?
Ms. Kelly: It is -- once again, my conclusion is that it is difficult to somehow quantify and weigh and measure something that is done as dynamic [sic] as handwriting characteristics.
(Id. at 120-21 (emphasis supplied).)
Ms. Kelly's testimony was followed by that of Defendants' first witness, George Edward Stelmach, Professor of Exercise Science and Psychology at Arizona State University. Professor Stelmach holds a doctorate in motor control and learning from the University of California, Berkeley. He currently has a $2 million grant from the National Institutes of Health, half of which relates to the study of handwriting. Professor Stelmach's 30-page curriculum vitae contains a long list of scientific honors (e.g., Senior Fulbright Research Scholar, visiting professorships, fellowships), editorial positions on scientific journals, and 170 publications.
Professor Stelmach explained to the Court that he considers himself a scientist because he publishes technical papers, supervises doctoral and postdoctoral students, competes for grants, and tries to "participate in most types of scientific communities where there is review of hypotheses, where there is review of data, where there is scrutiny of methodology, and where conclusions are drawn." (Id. at 297.) Professor Stelmach stated that he employs the scientific method in his work, which includes "peer review, scrutiny of methodology, scrutiny of error rates, and conclusions and interpretations." (Id. at 298.) Stelmach then explained in some detail why the determination of error rates through "validation studies" is crucial to scientific advancement. (Id. at 300-03.)
The bulk of Professor Stelmach's testimony concerned his method for studying handwriting as a dynamic process, rather than as a static one. That is, instead of looking only at the end product of handwriting, an ink trace on a piece of paper, Stelmach, with the aid of a computer and a "digitizing tablet," measures the process of handwriting. Stelmach's technique enables him to determine the force, speed, and acceleration associated with the formation of each handwritten character. This approach has little direct connection to the work of FDEs, who are limited to the examination of static traces. Nevertheless, Stelmach and other scientists working along these lines are taking the first steps towards creating a science of handwriting analysis. Their dynamically-oriented approaches may someday result in accurate methods for analyzing static traces. Stelmach compared his research to the efforts of FDEs:
As I see it, the debate should be on whether or not there are definable criteria. And as I look at the literature and I see what has been published . . . there is very little indication at any level that suggests that the methodology is open and visible, that the method has produced an error rate that is acceptable, that they can in fact validate and predict what they do.
(Tr. at 347.)
Stelmach was also critical of the work of Kam et al.:
What I found particularly distasteful from scientific terms . . . is they tried to debrief the particular investigators, and they found terminology such as, well, first of all, I did a global inspection and then I looked at the documents, and then I did a letter analysis.
But they never state what they do. They never address the issue of independence. They never address the issue of size variability. So they conclude by saying that the examiners have difficulty verbalizing what they do. It seems to me that that is not an acceptable level of science that I would like to see in this particular field.
(Id. at 353 (emphasis supplied).)
Defendants' second witness was Michael Saks, Professor of Psychology and of Law at the University of Iowa. Saks holds a Ph.D. in social psychology with an emphasis on research methodology and statistics. Among his many other professional activities and honors, Professor Saks has taught appellate judges about research methodology and statistics in the summer LL.M. program at the University of Virginia Law School. Professor Saks has also worked for the National Center for State Courts, where he looked at problems that lawyers and courts had in using forensic science.
Professor Saks testified at some length on the research of Kam, Rhodes and others, and spoke of his own publications in this area. Saks' testimony established that there is no strong statistical evidence supporting, or disproving, the "two fundamental principles" or the reliability of forensic document examination.
One particularly interesting portion of Professor Saks' testimony concerned a study of signatures of individuals named John Harris on voter registration cards. John J. Harris, How Much do People Write Alike?, 48 J. Crim. Law & Criminology 637 (1958). Professor Saks observed:
[According to Mr. Harris, himself an FDE,] so many of these signatures lacked individuality and looked alike that they were not worth photographing.
So . . . this is not necessarily an absolute proof there are many people who write alike, but it raises such doubts about such a fundamental tenet that one would have expected document examiners, first of all, to run out and start doing lots of studies to either confirm or refute this finding.
To the best of my knowledge there are none, and the existence of this study would raise so many doubts that one would think the document examiners would feel much more circumspectly about their confidence . . . .
(Tr. at 426.)
Saks also addressed the issue of the testability of the principles and methods employed by FDEs:
Yesterday there was some disagreement about whether document examiners were able to judge the speed of a writing by looking at the static trace. Well, if one wanted to answer that question empirical research would be a very, in principle a very simple way to do it.
You could take 100 people and have them write something, and time how long it takes them to write it, a word, Professor Stelmach's apparatus would be a ridiculously easy way to do that, have them sit down, write a word, take the static traces, give them to document examiners and ask them to estimate the time or to rank order these and then you could compare the known length of time that was required to write these words to the amount of time that the document examiners estimated. And then one would have the answer.
. . . Empirical studies are the root to answering empirical questions. And otherwise one is just guessing. In fact, there is research on research which shows that the sloppier the research technique or no technique, no real research at all leads to the most enthusiastic beliefs and highest confidence on the part of the people who do any technique . . . . And the more rigorously designed the research is . . . the less enthusiastic the conclusions are.
(Id. at 399-400.)
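The experiment Professor Saks describes can be scored with a rank correlation between the measured writing times and the order in which an examiner ranks the static traces. The sketch below is illustrative only: the function names and the data are hypothetical, and no such study or scoring method appears in the record.

```python
from typing import Sequence


def ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(known: Sequence[float], estimated: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    if len(known) != len(estimated) or len(known) < 2:
        raise ValueError("need two equal-length samples of at least two items")
    r_known, r_est = ranks(known), ranks(estimated)
    n = len(r_known)
    mean_k, mean_e = sum(r_known) / n, sum(r_est) / n
    cov = sum((a - mean_k) * (b - mean_e) for a, b in zip(r_known, r_est))
    var_k = sum((a - mean_k) ** 2 for a in r_known)
    var_e = sum((b - mean_e) ** 2 for b in r_est)
    if var_k == 0 or var_e == 0:
        raise ValueError("rank correlation is undefined when one list is constant")
    return cov / (var_k * var_e) ** 0.5


# Hypothetical data, invented for illustration only.
known_seconds = [1.2, 2.5, 0.9, 3.1, 1.8]  # measured time to write each sample
examiner_rank = [2, 4, 1, 5, 3]            # examiner's ordering, 1 = fastest
rho = spearman(known_seconds, examiner_rank)
```

A value of rho near 1 would indicate that the examiner's ordering tracks the measured times; a value near 0 would indicate that it does not. With the invented data above, the two orderings agree exactly.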
III. Forensic Document Expertise -- Inadmissible as "Scientific . . . Knowledge"
Were the Court to apply Daubert to the proffered FDE testimony, it would have to be excluded. This conclusion derives from a straightforward analysis of the suggested Daubert factors -- testability and known error rate, peer review and publication, and general acceptance -- in light of the evidence adduced at the Daubert hearing. The Court analyzes each factor in turn.
The Daubert Court distinguished "testability," the amenability of a field of knowledge to empirical testing, from "known or potential rate of error," which it recommended in the case of evaluating a "particular scientific technique." 113 S. Ct. at 2797. It does not appear that any aspect of forensic document examination relevant to forgery detection is inherently untestable. While the testing performed to date has been criticized as employing sample sizes too small for statistically reliable conclusions, e.g., Muehlberger and Harris, or for questionable methodology, e.g., Kam, the Court is not aware of any substantial argument that proper validation testing cannot be conducted. Ms. Kelly testified as to larger scale studies in progress, such as a 5,000 sample study being conducted by the U.S. Postal Laboratory in Memphis, Tennessee, and another study being conducted by the Immigration and Naturalization Service Laboratory in Washington, D.C. (Tr. at 78.) These studies, being incomplete, cannot speak directly to the issue of reliability, but they do support the Court's view that handwriting analysis is amenable to testing.
Looking to the closely related issue of known error rates, the data collected thus far is best characterized as sparse, inconclusive and highly disputed, particularly as regards the relative abilities of forensic document examiners and lay examiners. One such study, conducted by Oliver Galbraith, Craig S. Galbraith and Nanette G. Galbraith, is entitled The Principle of the 'Drunkard's Search' As a Proxy for Scientific Analysis: The Misuse of Handwriting Test Data in a Law Journal Article, 1 Int'l J. Forensic Document Examiners 7 (1995). Ms. Kelly characterized the Galbraith study as demonstrating that handwriting examiners perform substantially better than lay persons. (Tr. at 289.) However, Professor Saks observed:
The document examiners reached the correct answer 52 percent of the time. The lay persons reached a correct answer 50 percent of the time. The Galbraiths then took the document examiners' answers and the lay people's "inconclusives," and they fiddled with them . . . . And then came up with a more impressive difference.
(Tr. at 415.) Professor Saks also directed the Court's attention to Table 3 of the Galbraith article, in which, on two of six identification tasks, FDEs did not even exceed chance in their responses.
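Whether a gap such as 52 percent against 50 percent can be distinguished from chance depends on how many answers each group gave, a figure not stated in the passage above. The sketch below computes a standard pooled two-proportion z statistic for several group sizes. The two rates come from Professor Saks' testimony; the group sizes are hypothetical, and nothing here reproduces the Galbraith analysis.

```python
import math


def two_proportion_z(p1: float, p2: float, n1: int, n2: int) -> float:
    """z statistic for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err


# 0.52 and 0.50 are the rates quoted in the testimony. The group sizes below
# are hypothetical, chosen only to show how the statistic scales with n.
for n in (100, 1000, 10000):
    z = two_proportion_z(0.52, 0.50, n, n)
    print(n, round(z, 2))
```

By the usual convention, an absolute z value above roughly 1.96 is treated as significant at the five percent level in a two-sided test. Under these assumed sizes, only the largest group clears that threshold, which illustrates why a two-point difference is hard to credit without a large sample.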
Ms. Kelly had testified that in the course of preparing a certification exam she informally administered the test to ten lay persons. Each lay person failed the exam, while actual candidates achieved a 90% pass rate. (Tr. at 290.) Ms. Kelly did not publish the results of her experiment, and, in the additional light of concerns about the motivation of lay participants, the Court did not find this testimony to be very persuasive.
Certainly an unknown error rate does not necessarily imply a large error rate. However, if testing is possible, it must be conducted if forensic document examination is to carry the imprimatur of "science."
The next Daubert factor is peer review and publication. FDEs publish in several journals, including the Journal of Forensic Sciences, the International Journal of Forensic Sciences, and the International Journal of Forensic Document Examiners. Only a handful of articles in these journals were brought to the Court's attention that speak to issues of the reliability of forensic document examination.
In scrutinizing these articles, the Court found them to be significantly different from scholarly articles in such fields as medicine or physics, in their lack of critical scholarship. In the context of making a somewhat different point, Professor Saks described an example of critical medical scholarship:
There are a whole series of surgical techniques, for example, many of which have been employed without empirical testing and there is a book entitled Costs, Risk and Benefits of Surgery, which brought together all of the empirical research that was available on a fairly wide array of surgical techniques.
And what it discovered was that about a third of the surgical techniques worked and provided a benefit to patients. About a third of them were a waste of effort; and about a third of them actually did more harm than good.
(Tr. at 397.) That a medical researcher feels free to conduct and publish an analysis critical of established medical practices appears to the Court to be the essence of good science. It is the manner in which scientists discover, if not absolute truth, then at least increasingly correct truths. By comparison, FDEs conduct no such critical self-examination.
While the literature of forensic document examination may technically satisfy Daubert's publication factor, it fails to meet the expectations of the Daubert Court -- that a competitive, unbiased community of practitioners and academics would generate increasingly valid science. Given the uncritical nature of the literature, its scrutiny during peer review, on which there was no testimony, is of little significance.
The final Daubert factor is "general acceptance" by the "relevant scientific community." 113 S. Ct. at 2797. The Daubert Court did not suggest that acceptance by a legal, rather than a scientific community, would suffice.
FDEs certainly find "general acceptance" within their own community, but this community is devoid of financially disinterested parties, such as academics. Were the community expanded to include other forensic sciences, such as medical examination and forensic psychiatry, there is no indication that these additional practitioners would concern themselves with the reliability of forensic document examination.
A logical choice for a relevant scientific community would seem to be a collection of such mainstream sciences as pattern recognition (within the field of computer science) and motor control (within the field of medicine). It appears to the Court that such scientists are either unfamiliar with forensic document examination, or are critical of the field. Professor Stelmach, whom the Court views as a mainstream scientist, expressed his strong conviction that FDEs are not scientists, and have not demonstrated an ability to do what they claim to do. The government, on the other hand, produced no evidence of mainstream scientific support for forensic document examination.
The Court also found relevant the apparent stagnation of research within the FDE community.
Even the most recent texts and training materials in the field cite to Osborn's Questioned Documents, published in 1910 and 1929, with disturbing frequency. Michael J. Saks & Jonathan J. Koehler, What DNA 'Fingerprinting' Can Teach the Law About the Rest of Forensic Science, 13 Cardozo L. Rev. 363 (1991) ("[A] handful of seminal works . . . some of them generations old . . . remain the principle works to which contemporary analysts turn"). More than half a century later, one would have expected scientists to have either rigorously confirmed the methods described in these texts, to have rejected them as unreliable, or to have replaced them with substantially improved methods. By way of explanation, Defendants note that "unlike DNA profiling technology or epidemiology, this discipline has no counterpart in industry or academic with an economic incentive to study and refine its scientific basis." (Defs.' Mot. at 29.)
In sum, the testimony at the Daubert hearing firmly established that forensic document examination, despite the existence of a certification program, professional journals and other trappings of science, cannot, after Daubert, be regarded as "scientific . . . knowledge." Fed. R. Evid. 702.
IV. Daubert's Reach: The Scientific Branch of Rule 702
Daubert involved the question of whether prenatal ingestion of the prescription drug Bendectin was a risk factor for certain human birth defects. An expert witness for the defendant manufacturer reviewed the extensive published literature on the causation issue, and testified that Bendectin was not a risk factor. The district court granted the defendant's motion for summary judgment, rejecting the testimony of plaintiffs' equally well-credentialed experts that they had established, through statistical "reanalysis" of the existing literature, a significant causal link between the drug and birth defects. The trial court grounded its rejection of the latter testimony on the failure of plaintiffs' experts to publish their results or to subject them to any form of peer review. Daubert v. Merrell Dow Pharmaceuticals, Inc., 727 F. Supp. 570, 575 (S.D. Cal. 1989).
Daubert turned on medical and statistical issues in the area of disease causation. Given the universal perception of medicine as being a science (albeit not a "hard" science) and given the unchallenged scientific credentials of the expert witnesses on both sides, the Supreme Court appropriately focused on the "scientific . . . knowledge" branch of Fed. R. Evid. 702. Daubert may therefore be viewed as establishing reliability standards for expert testimony in fields whose scientific character is undisputed. This characterization is not as narrow as it may first appear, since it includes such diverse scenarios as the legitimate scientist seeking to testify as to cutting-edge science, as well as the venal professional witness seeking to introduce purely speculative scientific testimony. In either case, the context for the expert testimony is properly characterized as scientific.
An example of testimony appropriate for a Daubert analysis might relate to the cancer risk posed by weak electric or magnetic fields ("EMF"). See generally Kenneth R. Foster et al., Phantom Risk: Scientific Inference and The Law ch. 3 (1993).
Expert testimony by both physicists and medical researchers may be appropriate in this area, but each expert is likely to lack even rudimentary knowledge of the complementary discipline. Should the physicist offer a (medical) theory of cancer causation, or the medical researcher seek to testify as to the basic physical properties of EMF, increased judicial scrutiny of the proffered testimony is called for. In requiring each expert to satisfy such requirements as peer-reviewed publication in the appropriate literature, the Daubert Court exacted a reasonable price for the admission of testimony often viewed as being outcome-determinative -- that scientists risk the critical evaluation, and possible disproof, of their work by other scientists.
While trial courts must always be concerned with the reliability of expert witness testimony, it is unclear whether Daubert provides, or was intended to provide, useful guidance for nonscientific expert testimony.
In the long term, we may discover that Daubert addressed the easier part of the problem of expert testimony; it was a relatively straightforward matter for the Daubert Court to deduce objective reliability standards from the nature of the process of experimental Newtonian science. However, although ultimately all types of expert knowledge are inferences from underlying experience, the epistemology of nonscientific expert knowledge is quite different from that of scientific propositions. . . . The development of objective validation standards for nonscientific opinion is likely to prove to be a more difficult task than the formulation of such tests for scientific testimony.
Edward J. Imwinkelried, The Next Step After Daubert: Developing a Similarly Epistemological Approach to Ensuring the Reliability of Nonscientific Expert Testimony, 15 Cardozo L. Rev. 2271, 2294 (1994).
The question before this Court, a question that does not appear to be definitively resolved, is whether Daubert reaches past the scientific branch of Rule 702 to impose reliability requirements on "technical, or other specialized knowledge" testimony.
The Daubert Court noted that "our discussion is limited to the scientific context because that is the nature of the expertise offered here." 113 S. Ct. at 2795 n.8. "Our discussion" may refer to the entire opinion, or to the particular factors, such as peer review, the Court believed useful in evaluating scientific testimony.
Certainly the Daubert factors themselves, such as peer review and publication, are irrelevant for many categories of expert testimony. For example, the real estate valuation expert is neither expected nor required to publish his "valuation methodology" for scrutiny and criticism by the larger community of real estate experts.
Looking to the Court's repeated citations to Rule 702 generally, one might speculate that Daubert imposes a new reliability standard for all expert testimony, limiting its detailed guidance to the scientific evidence before it. The opinion itself provides little support for this view. The Daubert Court derives the gatekeeping task of the trial judge from a reevaluation of Frye (itself limited to scientific testimony) and from consideration of the word "scientific" in Rule 702. The essence of Daubert's "reliability" standard lies within the Court's citation to philosopher of science Karl Popper's statement that "the criterion of the scientific status of a theory is its falsifiability, or refutability, or testability." 113 S. Ct. at 2797 (quoting Karl Popper, Conjectures and Refutations: The Growth of Scientific Knowledge 37 (5th ed. 1989)). Imwinkelried notes:
the Court did not foreclose the possibility of applying its validation test to nonscientific testimony as well as expert evidence. However a moment's reflection demonstrates that this possibility is not a lively option. Neither the essential test enunciated in Daubert, nor the factors listed by the Court are applicable to nonscientific opinion. The Daubert test is grounded in the scientific process and directs the judge to evaluate the quality of the testing supporting the scientific conclusion.
15 Cardozo L. Rev. at 2283 (footnote omitted).
The Second Circuit has interpreted Daubert to apply specifically to scientific testimony. In Iacobelli Constr. v. County of Monroe, 32 F.3d 19 (2d Cir. 1994), the plaintiff sought to introduce the affidavits of a geotechnical consultant and an underground-construction consultant. Relying on Daubert, the district judge disqualified the affidavits from consideration. The Second Circuit reversed, observing that:
Daubert sought to clarify the standard for evaluating "scientific knowledge" for purposes of admission under Fed. R. Evid. 702. . . . The affidavits of [plaintiff's experts] do not present the kind of "junk science" problem that Daubert meant to address. See Tamarin v. Adam Caterers, Inc., 13 F.3d 51, 53 (2d Cir. 1993) (Daubert "specifically dealt with the admissibility of scientific evidence").
32 F.3d at 25; see also Hawthorne Partners v. AT&T Technologies, Inc., 1993 WL 502384 (N.D. Ill. Aug. 11, 1993) (Daubert inapplicable to expert testimony concerning commercial real estate appraisal).
The Court therefore finds no support for the proposition that Daubert extends past the "scientific" branch of Rule 702 to other forms of expert testimony. In other words, Daubert does not impose any new standard, other than what is found in the text of the Federal Rules of Evidence, for the admissibility of the testimony of nonscientific experts such as harbor pilots or real estate appraisers.
At the Daubert hearing, the government declined to present forensic document examination as a hard science:
Ms. Likwornik: It's the government's position . . . that this testimony is something of a hybrid. It is based on scientific principles. It is a forensic science, but I think as the court will see from the testimony, it's not the same as a hard science, such as chemistry, for example, analyzing narcotics to see if they are narcotics, which would have an objective test. This is a field that has objective principles but handwriting is more subjective because it's variable . . . .
(Tr. at 4.) Similarly, Gus Lesnevich, the FDE whose testimony is at least the principal subject of the present motion, has, in a previous appearance before this Court, characterized his field as an "art based on scientific principles." (Ex. A to Govt.'s Mot. (United States v. Roges, 91 Cr. 0612, Tr. at 1407).)
Rule 702, which governs the admissibility of all expert testimony, does not speak of "hybrid" knowledge. It classifies knowledge as being either scientific, technical, or specialized in nature. Were the Court to accept the government's characterization of FDE expertise as being at least partially scientific, then, as discussed supra, Daubert would have mandated the rejection of some or all of the testimony as insufficiently reliable.
The Court is not persuaded, however, that FDE expertise is properly characterized as a hybrid. It could equally well be said that the harbor pilot's testimony is a hybrid, in its reliance upon underlying principles of fluid dynamics and ocean science. In fact, the pilot's navigational expertise derives not from a mastery of underlying principles of mathematics and physics, but wholly from his or her practical training and experience. Faced with a difficult maneuver, the wise passenger would prefer a pilot who had devoted less time to the pursuit of abstract mathematical knowledge, and more time to controlling the vessel. A wise fact-finder would similarly prefer the testimony of a harbor pilot with the relevant practical experience.
As with the harbor pilot, it was apparent from the Daubert hearing that while scientific principles may relate to aspects of handwriting analysis, they have little or nothing to do with the day-to-day tasks performed by FDEs. Specifically, while principles of "chemistry, physics [and] mathematics" are implicated in the work of FDEs, 1994 Annual Book of ASTM Standards, vol. 14.02, at 248 ("Standard Descriptions of Scope of Work Relating to Forensic Document Examiners"), this attenuated relationship does not transform the FDE into a scientist.
Q: So how in particular is handwriting influenced by each of those components [mental, mechanical and physical] that you say make up the document examiner's work?
Ms. Kelly: I can't pretend to be a neurologist or a medical doctor or anything. I don't know the underlying theory of . . . how our muscular ability might be different. I can't pretend to have the medical knowledge to explain all that.
(Tr. at 69-70.)
Rule 702's respect for non-academic skills is reflected in the word "technical," which is defined as "practical knowledge especially of a mechanical or scientific subject." Webster's Third New International Dictionary 2348 (1986). The Court concludes that FDE expertise, constituting "technical, or other specialized knowledge," is therefore outside the scope of Daubert, except to the extent that Daubert dictates that Fed. R. Evid. 702 governs the admissibility of particular fields of expert testimony.
V. The Admissibility of Nonscientific Expert Testimony
Rule 702 permits expert testimony that is nonscientific in character, so long as it "assists the trier of fact":
Whether the situation is a proper one for the use of expert testimony is to be determined on the basis of assisting the trier. "There is no more certain test for determining when experts may be used than the common sense inquiry whether the untrained layman would be qualified to determine intelligently and to the best possible degree the particular issue without enlightenment from those having a specialized understanding of the subject involved in the dispute." Ladd, Expert Testimony, 5 Vand. L. Rev. 414, 418 (1952). When opinions are excluded, it is because they are unhelpful and therefore superfluous and a waste of time. 7 Wigmore § 1918.
The rule is broadly phrased. The fields of knowledge which may be drawn upon are not limited merely to the "scientific" and "technical" but extend to all "specialized" knowledge. Similarly, the expert is viewed, not in a narrow sense, but as a person qualified by "knowledge, skill, experience, training, or education." Thus within the scope of the rule are not only experts in the strictest sense of the word, e.g. physicians, physicists, and architects, but also the large group sometimes called "skilled" witnesses, such as bankers or landowners testifying to land values.
Fed. R. Evid. 702 advisory committee's note (emphasis supplied); see also Michael H. Graham, Federal Practice and Procedure § 6641, at 244 (1992) ("The local carpenter and auto mechanic are illustrative of admissible skilled witness testimony."); Christophersen v. Allied-Signal Corp., 939 F.2d 1106, 1110 (5th Cir. 1991) ("The Advisory Committee Note accompanying Rule 702 reads the broad language of the rule to permit expert testimony . . . by so-called skilled witnesses, whose experiences permit them to testify with authority on a given topic."), cert. denied, 503 U.S. 912, 112 S. Ct. 1280, 117 L. Ed. 2d 506 (1992).
The testimony of skilled witnesses of quite diverse experiences has been held admissible in this and other circuits. See, e.g., United States v. Locascio, 6 F.3d 924 (2d Cir. 1993), cert. denied, 114 S. Ct. 1645 (1994) (testimony of organized crime and drug sale expert admissible under Rule 702); United States v. Daccarett, 6 F.3d 37 (2d Cir. 1993), cert. denied, 114 S. Ct. 1294 (1994) (testimony of money laundering expert); McKinney v. De Bord, 507 F.2d 501 (9th Cir. 1974) (testimony of pipefitter as to strength of a clamp, despite lack of metallurgic training); Pampillonia v. Concord Line, A/S, 536 F.2d 476 (2d Cir. 1976) (testimony of ship captain as to proper use of grease during the lashing down of cargo); United States v. Diaz, 25 F.3d 392 (6th Cir. 1994) (testimony of dog trainer).
The fact that Daubert does not apply to nonscientific expertise does not suggest that judges are without an obligation to evaluate proffered expert testimony for reliability. This obligation derives from Rule 702 and from Rule 104(a), which provides that "preliminary questions concerning the qualification of a person to be a witness . . . shall be determined by the court."
The Daubert Court also noted the utility of other rules of evidence in dealing with expert testimony, such as Rule 703 and Rule 706, which "allows the court at its discretion to procure the assistance of an expert of its own choosing." 113 S. Ct. at 2797-98.
Within Rule 702 itself, there are several requirements imposed on "technical, or other specialized knowledge" witnesses. First, of course, the witness must possess such a relevant form of "knowledge." Second, the knowledge must "assist the trier of fact." Finally, the witness must be "qualified as an expert." Each of these clauses presents a possible barrier to expert testimony. The latter clause, for example, could bar the testimony of weekend sailors professing expertise as harbor pilots, since they would be unable to point to relevant training or experience.
Imwinkelried argues that courts have largely failed to fulfill their obligation to monitor nonscientific expert testimony for reliability:
In effect, the courts have adopted a laissez-faire attitude toward the reliability of the propositions underlying nonscientific expert testimony.
Most courts insist on a twofold foundational showing that the witness qualifies as an expert and that the testimony will be helpful to the trier of fact; but beyond that minimal showing, these courts tend to uncritically accept a nonscientific expert's claim that the propositions he or she proposes testifying to is reliable. Some jurisdictions go further and require that the proposition the expert proposes vouching for relates to a developed, recognized field of knowledge. However, even that requirement does not guarantee the reliability of the specific proposition in question. Courts accord "considerable weight to any assurance by" the expert "that the underlying data [supporting the proposition] are indeed adequate, in terms of both quality and quantity." For the most part, the courts have failed to develop their own objective reliability standards for propositions of nonscientific expert evidence.
15 Cardozo L. Rev. at 2280-81 (citations omitted).
Whatever the historical practice may have been with regard to evaluating nonscientific testimony, this Court concludes that adequate guidance can be found within Rule 702 to conduct a meaningful inquiry into the reliability of the expertise claimed by FDEs.
VI. Forensic Document Expertise -- Admissible as "Technical, or Other Specialized Knowledge"
The Court considered only the reliability of the particular expertise claimed here -- that given a large number of genuine signatures, an FDE might be able to determine whether particular questioned signatures were genuine ("forgery detection"). The Court did not evaluate other forms of FDE testimony, such as the more difficult task of identifying which of several individuals executed a known forgery ("forger identification"):
To determine whether or not a signature is genuine is a very different problem from that of determining who actually wrote a forged signature. It is not often that the writer will put enough of his own writing qualities into it to identify himself. From this meager evidence it is of course just as presumptuous to say that the suspected writer did not write it.
Albert Osborn, Questioned Documents 384 (2d ed. 1929).
Presented with a forgery detection problem, FDEs conduct a two stage analysis. First, FDEs scrutinize the genuine and challenged exemplars and identify "significant" similarities and differences. Second, based on their training and experience, FDEs combine their first stage observations and draw inferences as to the genuineness of the questioned signatures.
Defendants have made no showing that either stage of this analysis is likely to be faulty, e.g., that no detection of significant differences between writings can be performed. Defendants have simply challenged FDEs to meet a scientific level of proof that such skill exists.
In particular, the first stage of an FDE's analysis is subject to juror confirmation or rejection, or to challenge by an opposing expert. In United States v. Buck, 1987 U.S. Dist. LEXIS 9913 (S.D.N.Y. Oct. 28, 1987), Judge Haight observed that, unlike other forms of expert testimony, jurors can, to some extent, perform the task that is being performed by FDEs, namely visual comparisons between handwriting exemplars. "The ability of jurors to perform the crucial visual comparisons relied upon by handwriting experts cuts against the danger of undue prejudice from the mystique attached to 'experts.'" Id. at *9.
In my experience, expert testimony on handwriting comparison aids the jurors by focusing their attention on the minute similarities and dissimilarities between exemplars that lay jurors might otherwise miss. It is largely in the location of these similarities and dissimilarities that the professional document examiner has an advantage over a lay juror. In that advantage lies the expert's ability to assist the lay juror.
Id. at *10.
Although Ms. Kelly was unable to explain to the Court's satisfaction precisely how "significant" similarities or differences were identified, the Court has no doubt that such identifications can be performed, in some cases by cursory examination. Figure 1 shows two signatures with many identifiable differences, such as the ornamentation of each "B" and the curvature of the initial stroke of each "M." Without additional known genuine exemplars, the lay examiner might correctly conclude that one of the signatures was a forgery. While an FDE might come to the same conclusion, he or she would first have considered the possibility that both signatures were genuine, the differences arising from such sources as natural variation, the passage of time, and purposeful alteration (e.g., elaborate signatures used when signing checks). Ms. Kelly's testimony clearly established that FDEs are aware that forgery detection requires an adequate quantity of genuine writings to eliminate such possibilities. (Tr. at 34, 123-24, 130-31, 140.)
[SEE SIGNATURES IN ORIGINAL]
Figure 1. The top signature is genuine, the bottom signature a simulated (non-traced) forgery. Wilson R. Harrison, Suspect Documents: Their Scientific Examination 398 (1958).
Since jurors can confirm the FDE's first stage efforts, it might appear that the jury does not require the assistance of an expert witness. See Michael H. Graham, Federal Practice and Procedure § 6644, at 264 (1992) (at common law, expert testimony was excluded for matters "within the common knowledge and experience of jurors"). No such limitation is imposed under the federal rules, however. See 3 Weinstein & Berger, Weinstein's Evidence § 702, at 702-15 to 702-16 (1989) ("Even when jurors are well equipped to make judgments on the basis of their common knowledge and experience, experts may have specialized knowledge to bring to bear on the same issue which would be helpful.")
FDEs are likely to be helpful to jurors with regard to first stage efforts for two reasons. First, while jurors may have the ability to locate significant similarities and differences between sets of writings, it is not clear that they will, or should, take the time to conduct such comparisons de novo during a trial. Defendants, who have argued that lay examiners are poorly motivated, supra note 9, appear to concede this point. Second, the Court does not doubt that, as with most tasks, skill increases with experience, so FDEs are likely to do a better job than even highly motivated jurors.
In a report dated February 21, 1995, FDE Gus Lesnevich outlined the basis for his conclusions as to the genuineness of the questioned "Ethel Brownstone" signatures. His report describes the comparison of 224 genuine signatures dated between November 1, 1979 and December 15, 1986, with two questioned signatures dated June 3, 1985 and March 11, 1986. Mr. Lesnevich asserts, for example, that "in Two Hundred Nineteen (219) of the Two Hundred Twenty-Four (224) known signatures, the Capital Letter, 'E', leans to the right. Only five (5) signatures contain a Capital Letter, 'E', with a slant similar to that found in the questioned signatures." (Feb. 21 Letter to U.S. Attorney's Office at 3.) Similarly, Mr. Lesnevich asserts that in all 224 known signatures the "l" in "Ethel" is lower than the capital letter "B" in "Brownstone." (Id. at 4.)
Observation of such points of comparison requires the careful comparison of 224 known signatures to the questioned documents -- perhaps an inappropriate task for jurors who must concern themselves with the entirety of a criminal trial. Since first stage observations are easily confirmed or rejected by jurors, the Court can conceive of no substantial argument for excluding this testimony.
In the second stage of their analysis, FDEs combine their first stage results and draw inferences as to the genuineness of questioned signatures. The Daubert hearing established that this process relies heavily on the training and experience of the FDE, rather than on articulable objective standards. Once again, Defendants presented no evidence, beyond the bald assertions of Professors Stelmach and Saks, that FDEs cannot reliably perform this task. Defendants have simply challenged the FDE community to prove that this task can be done reliably. Such a demonstration of proof, which may be appropriate for a scientific expert witness, has never been imposed on "skilled" experts.
The Court is persuaded that the second stage of a forgery detection analysis can be performed with sufficient reliability to merit admission. Even in as limited a class of writings as signatures, there are a large number of possible points of comparison. In Mr. Lesnevich's report of February 21, he details eight assertedly significant differences. As a matter of simple logic, the greater the number of points of comparison, the greater the certainty of the determination as to genuineness. While FDEs have not quantified the minimum number of points of comparison that would ensure a particular level of reliability, the Court concludes that conclusions as to genuineness can be reliably drawn, where, for example, one FDE documents a large number of significant differences, and an opposing FDE can neither rebut those differences, nor present a large number of countervailing "significant similarities."
Defendants criticized the second stage effort, as well as the first, for the absence of objective standards. (Tr. at 201-03.) By way of comparison, objective standards are employed in fingerprint analysis. See Schleicher v. Wyrick, 529 F.2d 906, 909 (8th Cir. 1976) ("The International Association of Identification Officers considered . . . that eight to twelve [fingerprint] characteristics as points of comparison are sufficient to be a valid basis for drawing a conclusion."). Fingerprint analysis, however, is a far simpler task than handwriting analysis. Fingerprint patterns contain no "natural variation," are unaffected by such factors as disease, intoxication and the passage of time, and do not easily permit purposeful disguise. To demonstrate that handwriting comparisons may not be amenable to objective numerical standards, Ms. Kelly observed that "some people have a lot of individuality present in their writing and other people do not." (Tr. at 205.) Where such individuality is present, a single significant difference may suffice to indicate a forgery. (Id. at 98.)
FDEs also lack objective standards in regard to the number of exemplars required for an accurate determination as to genuineness. (Id. at 89.) However, both Ms. Kelly's testimony (Id. at 41, 86, 124, 140) and the FDE literature demonstrate a strong concern that a sufficient number of exemplars be scrutinized so as to produce statistically reliable results.
In sum, the Court finds that experienced individuals may be able to detect "significant" similarities and differences between sets of handwriting exemplars. Where several points of comparison have been established, reliable determinations of genuineness are possible. As Figure 1 demonstrates, points of comparison are often more easily established than described. FDEs can devote greater time and concentration to the task of detecting points of comparison than can jurors. FDEs also appreciate, if they cannot articulate, the relevant statistical considerations required for reliable inferences. The Court therefore finds sufficient indicia of reliability to sustain the admissibility of FDE expertise as nonscientific expert testimony.
While jurors can visually confirm an FDE's first stage efforts, no ready verification of the second stage results, the opinion as to genuineness, is possible. If jurors choose to accept the FDE's opinion, they will be relying, at least in part, on the skill and integrity of the practitioner. Such reliance may suggest that a higher standard for reliability is called for. For this reason, and in light of the evidence adduced at the Daubert hearing, the Court considered limiting the scope of FDE testimony to first stage evidence.
On careful reflection, no such limitation is required. Rule 702 permits qualified experts to offer their opinions. To the extent that experts possess knowledge not "within the common knowledge and experience of jurors," Federal Practice and Procedure § 6644, at 264, reasonable reliance on the expert, rather than formal proof by the expert, will often inform the fact-finder. In any event, jurors are likely to divine an FDE's opinion as to genuineness from the thrust of the first stage testimony. By permitting direct testimony as to that opinion, jurors may receive the benefit of a thorough cross-examination as to the basis for the opinion. As the Daubert Court emphasized, traditional mechanisms of "vigorous cross-examination, presentation of contrary evidence, and careful instruction on the burden of proof" are the appropriate means of attacking "shaky but admissible evidence." 113 S. Ct. at 2798.
Finally, in determining that forensic document examination constitutes admissible nonscientific expert testimony, the Court was guided by other forms of evidence held admissible, despite substantial concerns as to reliability:
While handwriting analysis may not be as scientifically accurate as fingerprint identification, it is, on the whole, probably no less reliable than eyewitness identification which is often made after a quick glance at a human face.
United States v. Acosta, 369 F.2d 41, 42 (4th Cir. 1966), cert. denied, 386 U.S. 921, 17 L. Ed. 2d 792, 87 S. Ct. 886 (1967). Indeed, it was anticipated that informal handwriting "expertise" would likely be admissible:
[Rule 702] recognizes that the possession by a witness of 'specialized knowledge' of a nonscientific or technical type may serve as a sufficient qualification in certain areas. For example, while a bank teller may not be as expert a handwriting analyst as the forensic 'questioned document examiner,' he may have an acceptable degree of expertise in relation to the knowledge possessed by the average man on the jury.
Proposed Federal Rules of Evidence with Supreme Court Advisory Committee's Notes 17 (1974) (emphasis supplied).
The Court therefore finds the proffered expert testimony admissible under Rule 702 as nonscientific, or "skilled" testimony. The remaining problem, which is a substantial one, is the possible prejudice deriving from the perception by jurors that this forensic testimony meets scientific standards of reliability.
VII. The Problem of Prejudice
Fed. R. Evid. 403 states that relevant evidence may be excluded "if its probative value is substantially outweighed by the danger of unfair prejudice, confusion of the issues, or misleading the jury." Expert testimony often raises concerns of prejudice.
Expert evidence can be both powerful and quite misleading because of the difficulty in evaluating it. Because of this risk, the judge in weighing possible prejudice against probative force under Rule 403 of the present rules exercises more control over experts than over lay witnesses.
113 S. Ct. at 2798 (quoting Jack B. Weinstein, Rule 702 of the Federal Rules of Evidence is Sound; It Should Not Be Amended, 138 F.R.D. 631, 632 (1991)).
Expert witnesses generally fall into two categories: scientific experts, such as physicists, doctors, and engineers; and skilled experts, usually in such clearly nonscientific fields as accounting and real estate valuation. With regard to scientific experts, a major rationale for Frye, and now Daubert, is that scientific testimony may carry an "aura of infallibility." 1 Charles T. McCormick et al., McCormick on Evidence § 203, at 608-09 (3d ed. 1984); see also John W. Strong, Language and Logic in Expert Testimony: Limiting Expert Testimony by Restrictions of Function, Reliability, and Form, 71 Or. L. Rev. 349, 361 (1992) ("There is virtual unanimity among courts and commentators that evidence perceived by jurors to be 'scientific' in nature will have particularly persuasive effect."). Skilled experts generally present less of a problem, as, with all due respect, accountants are unlikely bearers of an aura of infallibility.
FDEs do not fit comfortably into either of these categories. While the Court has determined that FDE expertise is purely practical in nature, jurors might well view FDEs as experimental scientists, placing more weight on their testimony than was justified by the government's showing at the Daubert hearing.
Jurors are likely to hold this view for two reasons. First, FDEs may discuss their years of training, employment, and practice in terms that suggest that they are scientists. For example, an FDE may have studied or been employed at a government crime "laboratory," may have academic training in a scientific field such as chemistry, and may refer to forensic texts whose titles contain the word "science" or "scientific."
Second, during their second stage testimony, FDEs may testify as to precise levels of confidence in their opinions as to genuineness. Ms. Kelly brought this problem to the Court's attention when she explained that the American Board of Forensic Document Examiners has adopted a standardized terminology for conclusions as to handwriting identification. (Tr. at 41-42.) The standardized terminology consists of a nine level scale of probabilities:
1. identification (definite conclusion of identity)
2. strong probability
3. probable
4. indications (evidence to suggest)
5. no conclusion (indeterminable)
6. indications did not
7. probably did not
8. strong probability did not
9. elimination
Thomas McAlexander et al., The Standardization of Handwriting Opinion Terminology, 36 Journal of Forensic Sciences 311 (1991).
The Court finds that FDEs, as a group, meet the minimum standards to qualify as nonscientific expert witnesses, by providing jurors with a helpful practical skill derived from their training and experience. No showing has been made, however, that FDEs can combine their first stage observations into such accurate conclusions as would justify a nine level scale.
Overly fine distinctions are inappropriate even in scientific efforts amenable to precise quantification. Lucius Tuttle & John Satterly, The Theory of Measurements 142 (1946) ("In any kind of careful scientific work a number is never written with too many [digits], which would appear to give it an unwarranted degree of accuracy."). Such distinctions are certainly improper in forensic document examination, where it is conceded that conclusions are drawn, in large part, on subjective criteria. (Tr. at 4, 95, 121, 191, 205.) See also Ellen, supra note 28, at 66 (recommending a scale of no more than five or six levels, given the limited accuracy of forensic determinations).
While the Court does not take the problem of prejudice lightly, it is also important not to overreact to it. As Imwinkelried observes:
The rub is that there is little or no hard evidence indicating that purportedly scientific testimony has the supposed prejudicial effect. It seems plausible to assume that laypersons will find that scientific testimony -- for instance, evidence generated by "'sophisticated instruments capable of precise,' objective measurement" -- carries more weight during jury deliberation. Further, there is some anecdotal support for the assumption. However, given the research data currently available, it would be dishonest to make any purportedly scientific claim about the impact of scientific or nonscientific testimony on lay jurors.
15 Cardozo L. Rev. at 2286 (citations omitted); see also Edward J. Imwinkelried, The Standard for Admitting Scientific Evidence: A Critique from the Perspective of Juror Psychology, 28 Vill. L. Rev. 554, 566-68 (1982) (reviewing studies showing that jurors are not unduly influenced by polygraphy and sound spectrography evidence); Michael S. Jacobs, Testing the Assumptions Underlying the Debate About Scientific Evidence: A Closer Look at Juror "Incompetence" and Scientific "Objectivity", 25 Conn. L. Rev. 1083 (1993) (recent studies establish that juries are capable of deciding very complex cases involving scientific and technical matters); Joe S. Cecil et al., Citizen Comprehension of Difficult Issues: Lessons from Civil Jury Trials, 50 Am. U. L. Rev. 727, 764 (1991) (same); Elizabeth Loftus, Psychological Aspects of Courtroom Testimony, 347 Annals of the New York Academy of Sciences 27, 34 (1980) (lay jurors are more willing to convict on the basis of lay testimony, such as eyewitness testimony, than on the highest caliber scientific evidence, including fingerprints.)
The Daubert Court, responding to concerns that the abandonment of Frye would result in "a 'free-for-all' in which befuddled juries are confounded by absurd and irrational pseudoscientific assertions," criticized the defendant for being "overly pessimistic about the capabilities of the jury, and of the adversary system generally." 113 S. Ct. at 2798; see also United States v. Jakobetz, 955 F.2d 786, 797 (2d Cir.) (While DNA evidence presents special challenges, "the jury is intelligent enough, aided by counsel, to ignore what is unhelpful in its deliberations." (quoting 3 Weinstein & Berger, Weinstein's Evidence § 702, at 702-30 (1989))), cert. denied, 121 L. Ed. 2d 63, 113 S. Ct. 104 (1992).
Balancing the probative value of FDE testimony against the danger of unfair prejudice, the Court, with due respect for the abilities of jurors, concludes that the prejudice problem does not require the exclusion of the proffered testimony. Certain protections are called for, however. First, the jury will be instructed, in advance of any forensic document testimony, that FDEs offer practical, rather than scientific expertise. A similar expert witness instruction will appear in the jury charge. A draft initial instruction appears in Appendix I to this Memorandum and Order.
Counsel may propose modifications to this instruction no later than April 11, 1995 at 9:30 a.m.
Second, the Court will consider restricting the testimony of FDEs as regards their degree of certainty in determining the genuineness of a signature. In particular, the Court may exclude testimony relating to the existence of the nine level scale, and any expert's selection of a level within the scale. This question will be discussed with counsel prior to any testimony by an FDE. Rule 403 has been invoked to exclude scientific testimony as well as statistical testimony that, in the words of the Second Circuit, can easily become "an item of prejudicial overweight." Marx & Co. v. Diners' Club, 550 F.2d 505, 511 (2d Cir.), cert. denied, 434 U.S. 861, 54 L. Ed. 2d 134, 98 S. Ct. 188 (1977); see also 1 Weinstein & Berger, Weinstein's Evidence, § 403 at 403-68 to 403-70 (collecting cases).
Finally, at trial, Defendants will be permitted to attack the reliability of forensic document examination, as they did at the Daubert hearing, to attack the expertise of each testifying FDE, to introduce the testimony of their own FDE, or to employ any combination of such approaches.
Dated: April 3, 1995
New York, New York
LAWRENCE M. McKENNA
Draft Initial Jury Instruction
You are about to hear the testimony of a forensic document examiner, who claims special qualification in the field of handwriting comparison, including the detection of forgeries.
Witnesses are usually permitted to testify only as to matters within their direct experience, such as what they saw or what they did on a particular occasion. Witnesses are not generally allowed to express their opinions. However, some individuals are permitted to offer their opinions because they have acquired a skill, through their training, education or experience, that few members of the general public possess. Such witnesses are frequently referred to as "experts" or "expert witnesses."
For example, in a lawsuit relating to a collision between vessels in a harbor, jurors might find it helpful to hear the opinions of one or more witnesses who have no direct connection to the lawsuit, but have spent years piloting vessels in that harbor. No one would regard the harbor pilot as having "scientific" knowledge of piloting. Nor does referring to the harbor pilot as an "expert" or an "expert witness" suggest anything more than knowledge or skill, acquired through years of experience, that may prove useful to you as jurors.
Just because a witness is allowed to offer opinion testimony does not mean that you must accept his or her opinion. As with any other witness, it is up to you to decide whether you believe this testimony and wish to rely upon it. Part of that decision will depend on your judgment about whether the witness's training and experience are sufficient for the witness to give the opinion that you heard. You may also consider such factors as the information provided to the witness, and the reasoning and judgment the witness employed in coming to the conclusion that he or she testified to.
Forensic document examiners, as a group, may develop skills not possessed by members of the general public, skills that may give rise to opinions useful to you in your deliberations. A forensic document examiner may spend a substantial amount of time looking at handwriting samples, in many cases focusing on signatures. In the course of their work, forensic document examiners may have acquired skill in identifying significant similarities and differences between real and forged writings.
The Court has studied the nature of the skill claimed by forensic document examiners, and finds it to be closer to a practical skill, such as piloting a vessel, than to a scientific skill, such as that which might be developed by a chemist or a physicist. That is, although forensic document examiners may work in "laboratories," and may rely on textbooks with titles like "The Scientific Examination of Documents," forensic document examiners are not scientists -- they are more like artisans, that is, skilled craftsmen.
The determination that a forensic document examiner is not a scientist does not suggest that this testimony is somehow inadequate, but it does suggest that his or her opinion may be less precise, less demonstrably accurate, than, say, the opinion of a chemist who testifies as to the results of a standard blood test.
In sum, the Court is convinced that forensic document examiners may be of assistance to you. However, their skill is practical in nature, and despite anything you may hear or have heard, it does not have the demonstrable certainty that some sciences have.
You may accept a forensic document examiner's testimony in whole, or you may reject it in whole. If you find that the field of forensic document examination is not sufficiently reliable, or that the particular document examiner is not sufficiently reliable, you are free to reject the testimony in whole. You may also accept the testimony in part, finding, as one possible example, that while the forensic document examiner has found significant similarities and differences between various handwriting samples, his or her conclusion as to the genuineness of a particular writing is in error, or is inconclusive. In any event, you should not substitute the forensic document examiner's opinion for your own reason, judgment, or common sense. I am not in any way suggesting what you should do. The determination of the facts in this case rests solely with you.