April 3, 1995


The opinion of the court was delivered by: LAWRENCE M. MCKENNA


 A Grand Jury has charged Defendants with conspiring to steal paintings, sculpture, silver and jewelry from Ethel Brownstone, defendant Roberta Starzecpyzel's aunt. (S1 93 Cr. 553.) The indictment charges, inter alia, that the Starzecpyzels removed over 100 items of artwork from Brownstone's apartment, delivered them to Sotheby's and Christie's auction houses, authorized the sale of the items, and directed the auction houses to forward the sale proceeds to Swiss bank accounts. The Defendants have also been charged with Interstate and Foreign Transportation of Stolen Moneys, Mail Fraud, Laundering of Monetary Instruments, and Tax Evasion.

 On December 12, 1994, Defendants moved the Court, pursuant to Fed. R. Evid. 702 and 403, to exclude all expert witness testimony and other evidence relating to the alleged forgery of Ethel Brownstone's signatures on two documents dated June 3, 1985 and March 11, 1986. These writings had been examined by Gus Lesnevich, a forensic document examiner ("FDE"), who concluded that the challenged signatures were not genuine. Defendants argued that:

this alleged expertise [of forensic document examination] has never been validated as credible scientific or technical knowledge and does not comport with the requirements of evidentiary reliability articulated by the Supreme Court in Daubert v. Merrell Dow Pharmaceuticals, Inc., 125 L. Ed. 2d 469, ___ U.S. ___, 113 S. Ct. 2786 (1993).

 (Defs.' Mem. at 1-2.) In the alternative, Defendants requested a Daubert hearing on this issue pursuant to Fed. R. Evid. 104(a) and 702.

 The Court granted Defendants' request for a Daubert hearing, which was held from February 28 through March 2, 1995. The government offered the testimony of Mary Wenderoth Kelly, an FDE employed by the City of Cleveland Police Forensic Laboratory, who currently serves as vice-president of the American Board of Forensic Document Examiners. Defendants offered the testimony of George Edward Stelmach, Professor of Exercise Science and Psychology at Arizona State University, and Michael J. Saks, Professor of Law and Psychology at the University of Iowa.

 While the Court originally considered Daubert to be controlling as to the admissibility of the forensic testimony at issue -- relating to the comparison of a large body of genuine writings to claimed forgeries -- the Court now concludes that Daubert, which focuses on the "junk science" problem, is largely irrelevant to the challenged testimony. The Daubert hearing established that forensic document examination, which clothes itself with the trappings of science, does not rest on carefully articulated postulates, does not employ rigorous methodology, and has not convincingly documented the accuracy of its determinations. The Court might well have concluded that forensic document examination constitutes precisely the sort of junk science that Daubert addressed.

 Yet, as distinguished from such discredited ventures as hedonic damage expertise, *fn1 clinical ecology, *fn2 trauma-cancer expertise *fn3 or the Bendectin plaintiffs' statistical machinations, *fn4 forensic document examination does involve true expertise, which may prove helpful to a fact-finder. FDE expertise is not properly characterized as scientific, but as practical in character. In a nutshell, over a period of years, FDEs gradually acquire the skill of identifying similarities and differences between groups of handwriting exemplars. Such expertise is similar to that developed by a harbor pilot who has repeatedly navigated a particular waterway. The Court therefore treats forensic document expertise under the "technical, or other specialized knowledge" branch of Rule 702, which is apparently not governed by Daubert.

 Such experts, who acquire their skills through practical training, apprenticeships, and long years of practice, are generally not expected to be able to articulate and justify the theoretical bases underlying their practice, to expose their techniques to a larger community of practitioners through peer-reviewed publication, or to subject those techniques to extensive testing. Although Daubert standards do not apply to such "skilled" witnesses, trial courts need not certify every individual accomplished at a particular task as an expert. The Federal Rules of Evidence have long imposed a "gatekeeping" function on trial judges to ensure relevance and helpfulness to the fact-finder before admitting expert witness testimony. Because the Court finds this standard satisfied for the proffered testimony, Defendants' motion to exclude the testimony is denied.

 FDE testimony, while acceptable under Rule 702, does suffer from a substantial problem of prejudice, which is the subject of Fed. R. Evid. 403. The problem arises from the likely perception by jurors that FDEs are scientists, which would suggest far greater precision and reliability than was established by the Daubert hearing. This perception might arise from several sources, such as the appearance of the words "scientific" and "laboratory" in much of the relevant literature, and the overly precise manner in which FDEs describe their level of confidence in their opinions as to whether questioned writings are genuine. The Court has determined that the problem of prejudice can be sufficiently diminished with the use of procedural safeguards, including a pre-testimony jury instruction, that FDE testimony need not be excluded pursuant to Rule 403.

 I. The Daubert Reliability Standard

 Daubert was concerned with the standard for admitting expert scientific testimony in a federal trial. The Supreme Court held that the "general acceptance" test established 70 years ago in Frye v. United States, 54 App. D.C. 46, 47, 293 F. 1013, 1014 (1923), was superseded by Rule 702 of the Federal Rules of Evidence, which states:

If scientific, technical, or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education, may testify thereto in the form of an opinion or otherwise.

 The Court first observed the "liberal thrust" of the Federal Rules of Evidence, shown, for example, by Rule 402, which states that "all relevant evidence is admissible, except as otherwise provided." 113 S. Ct. at 2793-94; see also Beech Aircraft Corp. v. Rainey, 488 U.S. 153, 169, 102 L. Ed. 2d 445, 109 S. Ct. 439 (1988) (noting the "general approach [of the Federal Rules] of relaxing the traditional barriers to 'opinion' testimony"). The Daubert Court then noted that nothing in Rule 702 establishes general acceptance in the relevant scientific community as an absolute prerequisite to admissibility.

 Having rejected Frye, the Court looked to the language of Rule 702 itself to provide a new standard for admissibility. The Court focused primarily on the words "scientific" and "knowledge" which respectively imply "a grounding in the methods and procedures of science" and "more than subjective belief or unsupported speculation." 113 S. Ct. at 2795.

 The Court wisely observed that scientific testimony need not be established to a certainty, as "arguably, there are no certainties in science." Id. The Court also cited with approval the brief of the American Association for the Advancement of Science and the National Academy of Sciences as Amici Curiae 7-8:

Science is not an encyclopedic body of knowledge about the universe. Instead it represents a process for proposing and refining theoretical explanations about the world that are subject to further testing and refinement.

 The Court offered district courts guidance as to how to respond to a proffer of expert scientific testimony. Rule 104(a) of the Federal Rules of Evidence provides the starting point:

Preliminary questions concerning the qualification of a person to be a witness, the existence of a privilege, or the admissibility of evidence shall be determined by the court . . . . In making its determination it is not bound by the rules of evidence except those with respect to privileges.

 Construing Rule 104(a) with Rule 702, the Court assigned trial judges the gatekeeping task of determining "at the outset, whether the expert was proposing to testify to (1) scientific knowledge that (2) will assist the trier of fact to understand or determine a fact in issue. This entails a preliminary assessment of whether the reasoning or methodology underlying the testimony is scientifically valid." 113 S. Ct. at 2796.

 The Court expressed confidence in district judges to undertake this review, and declined to set out a "definitive checklist or test," providing only "general observations" as to the inquiry. Id. The first factor the Court identified was whether the scientific knowledge "can be (and has been) tested" -- or testability. "Scientific methodology today is based on generating hypotheses and testing them to see if they can be falsified; indeed, this methodology is what distinguishes science from other fields of human inquiry." Id. (quoting Green, Expert Witnesses and Sufficiency of Evidence in Toxic Substances Litigation, 86 Nw. U.L. Rev. 643, 645 (1992)). A closely related factor was whether particular scientific techniques had known or potential error rates. Id. (citing United States v. Smith, 869 F.2d 348, 353-54 (7th Cir. 1989) (reviewing studies of the error rate of spectrographic voice identification methods)).

 The Court also noted that whether a theory or technique had been subjected to peer review and publication was significant, for these mechanisms increased the likelihood that methodological flaws would be detected. Trial courts should consider, however, that incorrect theories are sometimes published, while correct theories may be too new, too controversial or of too limited interest to be published.

 The final factor identified by the Court was "general acceptance." While Frye had been rejected as too inflexible, a widely known technique able to attract only minimal support might "properly be viewed with skepticism." 113 S. Ct. at 2797.

 While the Court characterized this inquiry as a flexible one, it noted that, given the lack of certainties in science, the Daubert hearing should focus "solely on principles and methodology, not on the conclusions that they generate." Id. Finally, the Court observed that judges should be mindful of other applicable rules of evidence, particularly Rule 403, which permits the exclusion of relevant evidence "if its probative value is substantially outweighed by the danger of unfair prejudice, confusion of the issues, or misleading the jury." Id. at 2798.

 In the spirit of Daubert, this Court concludes that the inquiry conducted pursuant to Rule 104(a) represents primarily a legal question for the court, rather than a factual one for the parties. That is, while the proponent of the evidence may have the burden to come forward with factual evidence of such matters as the training received by FDEs, the question as to whether those facts suggest sufficient reliability is a purely legal question. See Paul C. Giannelli, Daubert: Interpreting the Federal Rules of Evidence, 15 Cardozo L. Rev. 1999, 2011 (1994) ("The court must make an independent assessment" of reliability. (emphasis supplied)).

 Believing that Daubert controlled the admissibility of the challenged testimony, the Court scheduled an in limine Daubert hearing. See, e.g., United States v. Williams, 583 F.2d 1194 (2d Cir. 1978) (pretrial hearing established spectrographic voice analysis admissible as identification evidence), cert. denied, 439 U.S. 1117, 59 L. Ed. 2d 77, 99 S. Ct. 1025 (1979); Government of the Virgin Islands v. Penn, 838 F. Supp. 1054, 1073-74 (D.V.I. 1993) (in limine Daubert hearing established DNA profiling test as sufficiently reliable).

 II. The Daubert Hearing

 The hearing commenced with the testimony of Mary Wenderoth Kelly, an FDE with more than a decade of experience, currently employed by the City of Cleveland Police Forensic Laboratory. Ms. Kelly holds a Bachelor of Science degree in general ethics from Ohio State University and a juris doctorate from Cleveland-Marshall College of Law. She currently serves as the vice-president of the American Board of Forensic Document Examiners (the "Board"), which certifies document examiners. Kelly described her duties on the Board:

My primary duties involve the testing committee. I am chairperson of the testing committee and oversee the tests that are administered to the different candidates. I also sit as the professional review committee chairperson thus handling any kind of professional complaints that would be [lodged].

 (Transcript of Daubert Hearing ("Tr.") at 18.) Ms. Kelly also serves as a director for the American Society of Questioned Document Examiners and is a fellow in the questioned document section of the American Academy of Forensic Sciences.

 Kelly first described the recommended two-year training program for FDEs. (Id. at 25.) She also described the process by which approximately 200 FDEs have achieved certification, and retained it through the acquisition of continuing education credits. (Id. at 280-81.)

 Kelly testified that two basic principles underlie the field of forensic document examination. The first principle is that "no two people write exactly the same way." (Id. at 31.) For brevity the Court will refer to the differences between the handwriting of distinct writers as "inter-writer" differences. The second principle is that "no person will write exactly the same way when repeating." (Id. at 34.) These differences, referred to by FDEs, and hereinafter by the Court, as "natural variation," arise from the fact that human beings "lack the machine-like precision" to be able to reproduce even their signature the same way each time. (Id.)

 If forensic document examination does rely on an underlying principle, logic dictates that the principle must embody the notion that inter-writer differences, even when intentionally suppressed, can be distinguished from natural variation. How FDEs might accomplish this was unclear to the Court before the hearing, and largely remains so after the hearing.

 One can speculate that while natural variation involves primarily quantitative differences (such as small changes in the "slant" of successive "t's" produced by a single individual), inter-writer variations involve qualitative differences (such as loops produced in a clockwise, rather than counter-clockwise, fashion). *fn5 Both the Defendants and the Court sought elucidation on this issue, but met with little success:

Q: [Osborn, Questioned Documents 147 (2d ed. 1929), states that] a slight but persistent difference [in] slant in two writings of considerable length may be evidence that the writings are by two different writers, while a pronounced difference might be the result of intended disguise. What would be the difference between a slight but persistent difference and a profound difference in slant?
Ms. Kelly: I think what he is getting to, once again, is that a slight but persistent difference in two writings might indicate it's two different writers. In order to make that evaluation, you would have to have sufficient known writing of both writers. . . .
Q: So slight in that context would be a comparative notion?
Ms. Kelly: It would seem like it, yes.
. . . .
Q: So a slight difference in slant would be?
Ms. Kelly: He is using it as a comparative between two different writers . . . .
Q: What degree of measure might be slight?
Ms. Kelly: . . . I wouldn't quantify it.

 (Tr. at 241-43.)

Court: Do these natural variations ever occur to a degree that's equal to [inter-writer] differences so that a document examiner would not be able to distinguish between a [natural] variation and a difference as I have defined those terms?
Ms. Kelly: I think, your Honor, the only time that might occur is if they were not provided with an adequate amount of known writing. If you only have a small amount of the writing, you can't properly evaluate those characteristics to determine if in fact they are differences or variations of a given writer . . . .
. . . .
Court: Do you know or are you aware of any quantitative, by which I mean numerical, standards which would form a document examiner's conclusion that a given feature in a questioned document is . . . not by the author of the known samples?
To make that precise, let me give you an example. Suppose, let us say, that if we look at a known writing and a reasonably fair sample of known writing by a particular writer, that that writer's small L is always slanted between say 8 to 12 degrees to the right, you find that in an adequate sample of known writing, is there some numerical standard which a document examiner would apply to determine what [] degree of slant in the questioned document would lead to the conclusion that the questioned document is by a different author? Would you need 14 [degrees], 17, 19, is the real question.
Ms. Kelly: Is there a way of measuring?
Court: Is there some numerical standard that document examiners use?
Ms. Kelly: I would say as a general rule, no, your Honor; there is no numerical measurement of that kind of slant.

 (Id. at 277-79.)

Q: A major difference between two samples of writing is different than a minor difference?
Ms. Kelly: I think I can agree that a major difference is more important than a minor difference.
Q: Is there a standard that document examiners use to distinguish what a minor difference would be from a major difference in two samples of handwriting?
Ms. Kelly: I think that gets back to the definition that I read as to what is significant, what a significant feature would be. And if that difference rises to something that becomes significant, then it would lead you to the conclusion that they were two different writers. If it's something that is only a minor difference, then I don't think you could reach that conclusion. Is that what you're getting at?

 (Id. at 145.)

 Ms. Kelly, who may not have accepted, or perhaps understood as posed to her, the problem of distinguishing inter-writer differences from natural variation, repeatedly insisted that with an adequate number of handwriting samples, FDEs, through unspecified methods, could reliably do what they purport to do.

Q: How is the notion of sufficient quantity judged?
Ms. Kelly: It's based on the training and experience of the particular examiner.
Q: So there are no standard measures of sufficient quantity?
Ms. Kelly: There is no way to measure that, no.

 (Id. at 202-03.)

 While the Court does not at all find Ms. Kelly to be evasive, her responses were far from satisfactory in establishing the scientific reliability of FDEs' efforts:

Q: What would the appropriate weight of [an individual handwriting] characteristic be [for purposes of making an identification]?
Ms. Kelly: How individual that feature might be . . . . The divergence from the standard copy book form of maybe that particular letter or some particular habit that is very individual.
Q: How is the individuality of a feature determined?
Ms. Kelly: Once again, it goes to how much that might diverge from the standard copy book form of a given letter formation and it really is based on the training and experience of the examiner in evaluating that characteristic.
Q: So then it's a subjective determination of individuality?
Ms. Kelly: There is a subjective component, but once again there are objective components to ...

