UNITED STATES DISTRICT COURT SOUTHERN DISTRICT OF NEW YORK
June 11, 1984
THE PROCTER & GAMBLE COMPANY, Plaintiff, against CHESEBROUGH-POND'S INC., Defendant; CHESEBROUGH-POND'S INC., Plaintiff, against THE PROCTER & GAMBLE COMPANY and BENTON & BOWLES, INC., Defendants.
The opinion of the court was delivered by: GOETTEL
Advertising is a pervasive part of modern life. We are confronted with it wherever we turn. Most of us do not take advertising too seriously.
Those who do are the advertisers themselves, particularly when their competitors make comparative claims of superiority which may influence consumer purchases.
In these actions, two of the nation's largest consumer product manufacturers, The Procter & Gamble Company ("P&G") and Chesebrough-Pond's Inc. ("Chesebrough"), have sued one another. Each alleges that the other's comparative advertising claims concerning certain products are false.
Filed just a few business hours apart, the two actions have been consolidated, and evidentiary hearings lasting more than seven days have been held to consider the parties' cross-motions for preliminary injunction.
The following opinion constitutes the Court's findings of fact and conclusions of law on these cross-motions.
The Skin Lotions
The products involved are skin lotions, which are referred to by the parties as "hand and body lotions" to distinguish them from facial lotions. They are consumer products sold in stores throughout the country and shipped in interstate commerce. No prescription is necessary for their purchase and no restriction has been placed on their usage.
Two of these skin lotions figure prominently in this litigation. Chesebrough's product, Vaseline Intensive Care Lotion ("VICL"), has a leading sixteen percent share of the skin lotion market, and P&G's product, Wondra, commands just over five percent of the market.
Recently, a "new and improved" formulation of the latter product ("New Wondra" or "Wondra V") has been marketed and extensively advertised.
These and the other skin lotions are designed to counter dry, rough skin, a condition which has a number of causes. Experts consider one of the causes to be genetic. Another is exposure to water, wind, and sun. A third is contact with detergents, particularly those for clothing and dishes.
To counter dry, rough skin, the lotions work on the basis of one or both of two methods. The first is occlusivity, whereby an impermeable layer prevents the loss of water from the skin. Petroleum jelly, lanolin (from sheep's oil), and other relatively greasy substances are effective in creating such an impermeable layer. The second method relies upon the use of a humectant, a chemical that permeates to the stratum corneum of the skin and there attracts and holds water. The most commonly used humectant is glycerin, an ingredient of skin lotion for more than fifty years.
Indeed, the claimed improvement in New Wondra was the addition of more glycerin.
Glycerin and the effective occlusive agents tend to be greasy, however, and the consumers who use these products, overwhelmingly women, customarily reject products that look or feel greasy. The manufacturer's task, therefore, has been to create a product that contains the effective agents of occlusive or humectant products yet rubs into the skin easily and leaves no greasy coat.
In the latter respect, the products are more cosmetic than medicinal.
Although there are approximately 100 brands of skin lotion besides Chesebrough's VICL and P&G's Wondra, only VICL has a substantial position in the market. Each of the other Chesebrough skin lotions -- Intensive Care Extra Strength ("VICL Extra Strength"), Intensive Care Herbal and Aloe, and Vaseline's Dermatology Formula Lotion ("VDL") -- has a relatively small market share, so that the four Chesebrough products together command only about 25% of the market.
VICL is Chesebrough's most heavily promoted product. In its advertising, Chesebrough claims that "no leading lotion beats" VICL. Following the reformulation of Wondra (known as Wondra V within the company, but called New Wondra for advertising purposes) and the completion of certain consumer tests discussed more fully below, P&G began intensive advertising of New Wondra in the summer of 1983. These advertisements proclaim that New Wondra is more effective because of its additional glycerin and that clinical tests have established that New Wondra relieves dry skin better than any other leading lotion.
Thus, with P&G claiming that its product is superior to all other lotions and Chesebrough claiming that nobody's product is better than VICL, we have a situation in which at least one of the advertising claims must logically be wrong. Looking to the Court to determine which claim is misleading, each party contends that it is being injured by the other's advertising and that because the injury cannot be concretely measured, the harm is irreparable.
Consumer Product Testing
Both companies have conducted extensive tests to support their advertising claims. The evidence presented at the hearings in this matter concerned primarily the propriety and accuracy of the testing methods employed by each company. The methods range from small-scale tests, often done with expert panels who give subjective responses to the use of the product, to large-scale clinical tests in which numerous consumers participate. In between are various other types of tests, including one known as the Kligman regression, which is named after a doctor at the University of Pennsylvania who has been active in attempting to make testing methods more scientific.
While it is generally agreed that a fairly large-scale clinical test offers the best opportunity for demonstrating differences in efficacy, maintaining controlled laboratory conditions when conducting such tests is very difficult. Questions also arise in determining who should be included in the test population. Should it be composed primarily of adult women, who are the major users of the product, or should it be drawn from the population at large? Should the participants be lotion users and persons with an existing problem (such as "dishpan hands")? In this regard, it seems fairly obvious that participants in the test must start with some degree of rough, dry skin if there is to be any possibility of improvement. Also, because the tests used are relatively crude, it is generally agreed that those with fairly extensive problems make better subjects because the differences in improvement are more likely to be measurable. Nonetheless, it is also essential to exclude people who have rough, dry skin that is caused by skin diseases, rather than by the more common, everyday causes. Finally, the parties also agree that large-scale clinical tests are best conducted on a "double blind" basis (meaning that neither the subject nor the grader knows which product has been applied), and that, where the subjects are using different products, an attempt should be made to classify the subjects by initial skin condition, age, and detergent exposure, so that every product is tested against a relatively random sample.
What the real controversy in this case concerns is the means used to evaluate the condition of the skin at the start of the testing and at the various grading points thereafter. No reliable mechanical or electrical tests for accomplishing this exist. The evaluation, therefore, must be made visually.
Toward this end, the manufacturers employ "graders." These are persons who are trained and schooled in evaluating skin condition by sight and touch. In their grading, they assign numerical values to the condition of the skin, using various scales ("parametric systems"), some of which set forth verbal descriptions of the equivalent skin condition at various points on the scale. The graders need not be dermatologists or even doctors, but a dermatologist is necessary for the initial screening in order to eliminate potential subjects who have specific skin diseases.
Both companies employed dermatologists as graders in their large-scale clinical tests. Having heard considerable testimony concerning the grading process, the Court is convinced that a skilled grader can determine with reasonable accuracy the relative condition of a subject's skin and compare it to that of other people. The Court also concludes, however, that the numerical designations do not necessarily correspond to the verbal descriptions on the scale. Moreover, although the graders are internally consistent, which is to say that one of them will give approximately the same score to all skin having the same condition, that score will not necessarily be the same as that which is given by another grader. Thus, for example, what one grader may consistently rate as 3.5 another grader may consistently rate as 4.5.
This weakness in the grading system creates a problem with any statistical analysis that is done on the scores. Those who believe that it is improper to apply a parametric system to the condition of skin find little basis for statistical analysis.
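The grader offset described above can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the function `graded_score`, the fixed offsets, and the "true" condition value are hypothetical, and the only figures taken from the record are the 3.5 and 4.5 ratings used as the Court's example.

```python
def graded_score(true_condition, grader_offset):
    """Score a grader records, assuming a constant personal offset (hypothetical model)."""
    return true_condition + grader_offset

# Assumption: the same skin condition is seen by two internally consistent graders.
same_skin = 3.5
grader_one = graded_score(same_skin, 0.0)   # rates it 3.5
grader_two = graded_score(same_skin, 1.0)   # consistently rates the same skin 4.5

# Comparing scores across graders shows a one-point gap with no difference in the skin.
gap_across_graders = grader_two - grader_one

# Comparing two scores from the same grader removes the offset.
gap_same_grader = graded_score(same_skin, 1.0) - graded_score(same_skin, 1.0)

print(gap_across_graders, gap_same_grader)
```

On these assumptions, a comparison that mixes scores from different graders can show a numerical gap that reflects the graders and not the products.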
In addition, even allowing for the difficulty in a numerical grading system, and the subjectiveness of the grading, it is inherently difficult to detect comparative differences between two or more effective products unless a placebo is also tested. Yet, if a placebo is incorporated into the tests, it increases the number of subjects required and thus makes the testing even more expensive.
Beyond this, a major dispute between the parties is whether testing should be done on a controlled clinical basis or on an ad libitum basis. Chesebrough contends that, in conducting studies designed to compare the effectiveness of skin lotions, it is important to control as many variables as possible to ensure that the study reflects the effects of the products tested instead of the effects of unknown or uncontrolled variables. Chesebrough continues by arguing that an ad libitum test, which allows subjects to use as much of the product as often as they wish, amounts to nothing more than a consumer performance test, and that the better scores for New Wondra are due to the fact that more of the product was applied. P&G contends that ad libitum testing is superior because it mirrors the actual use of skin lotions, for which there is no prescribed dosage. Although P&G acknowledges that the instructions to the subjects to apply the product wherever and whenever they would normally use a lotion might have resulted in its product being used more frequently or in larger doses than Chesebrough's, P&G argues that this result indicates the effectiveness of the product because it is designed to encourage extensive consumer use.
It is apparent from the testimony that, if the intent is to determine the abstract efficacy of one product as compared to another, controlled clinical tests, such as would be used with a prescription drug, are most appropriate. However, if the product is a non-prescription product that is in part a cosmetic, and if the intent is to induce the consumer to use large amounts of the effective ingredients, then the "effectiveness" of the product (in terms of its ultimate goal) may be validly tested in an ad libitum manner. It must be noted, however, that there is a substantial school of scientific thought that holds that, while such tests may be appropriate for product efficiency, they are not suitable for comparative advertising.
P&G's advertising claims are based primarily on two clinical studies: SC-207 and SC-215. SC-207 was conducted in Tucson, Arizona, in November of 1981. The test compared the efficacy of four products: New Wondra, VICL, Wondra GD (the Wondra formula being marketed at the time), and a placebo. SC-215 was conducted in Chicago Heights, Illinois, in January and February of 1983. Six products were tested: New Wondra, Jergen's Extra Dry and Soft Sense Extra Moisturizing (the two most popular lotions after VICL), Wondra E (the formula of Wondra being marketed at the time), and Chesebrough's VICL Extra Strength and VDL.
The test design of SC-207 and SC-215 was the same as that used in earlier comparative efficacy tests of earlier formulas of Wondra. In four of those tests, the prior Wondra formulations had been found to be either less effective than or only equally effective as the other leading products on the market, including VICL. For three of these tests and for SC-207 and SC-215, P&G employed a dermatologist, Dr. Frank Dunlap, to grade the subjects' hand conditions. In all of these tests, P&G used what is known as a parallel design test, in which a large number of subjects are selected and then divided into as many treatment groups as there are products to be tested. Those in a treatment group use only one product.
The subjects in SC-207 and SC-215 were selected randomly to create a representative group of users of hand and body lotions. To insure that some dry skin was present, subjects were required to have a certain minimum combined score for the dorsal (back) surfaces of the left and right hands.
The subjects were then classified according to age, initial skin condition, and dishwashing frequency, and were randomly assigned to the products to be compared.
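The classification and random assignment described above can be sketched in a few lines of Python. This is a minimal illustration under assumed inputs, not P&G's actual protocol: the subject records, the stratum codes, the product labels, and the function name `assign_products` are all hypothetical, and dealing shuffled subjects out in rotation within each stratum is only one common way to keep treatment groups balanced.

```python
import random
from collections import defaultdict

def assign_products(subjects, products, seed=0):
    """Randomly assign subjects to products within each stratum.

    subjects: list of dicts with hypothetical keys 'id', 'age_band',
              'skin', and 'dishwashing' (the three classification factors).
    Returns a dict mapping subject id -> product label.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in subjects:
        strata[(s["age_band"], s["skin"], s["dishwashing"])].append(s["id"])

    assignment = {}
    for key in sorted(strata):
        ids = strata[key]
        rng.shuffle(ids)
        order = products[:]
        rng.shuffle(order)
        # Deal the shuffled subjects out in rotation so that each product
        # receives a near-equal share of every stratum.
        for i, sid in enumerate(ids):
            assignment[sid] = order[i % len(order)]
    return assignment

# Hypothetical data: 64 subjects falling evenly into 8 strata, 4 placeholder products.
products = ["A", "B", "C", "D"]
subjects = [
    {"id": n, "age_band": n % 2, "skin": (n // 2) % 2, "dishwashing": (n // 4) % 2}
    for n in range(64)
]
groups = assign_products(subjects, products)
```

Because the rotation runs separately inside each stratum, every treatment group in this sketch ends up with the same mix of ages, initial skin conditions, and dishwashing frequencies, which is the point of classifying before assigning.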
Both SC-207 and SC-215 were properly "double-blinded." The subjects and the examiner did not know which products they were using. All of the products were placed in identical containers labeled only with random subject numbers and the words "skin lotion." The type of container was similar to the type most commonly used for hand and body lotions.
Analysis of the scores given by the dermatologist in SC-207 showed New Wondra to be better than either VICL, the old Wondra formulation (GD), or the placebo. The superiority was sufficiently demonstrated to be considered statistically significant.
Analysis of the scores in SC-215 showed New Wondra to be better than the other five products tested but not necessarily at a statistically significant level. When P&G went further, however, and analyzed subsections of the SC-215 data, it found that if only those with demonstrably rough, dry skin were compared, New Wondra's superiority over all the competing products was again demonstrated at a statistically significant level. Consequently, the advertising claims were qualified to refer only to the treatment of dry, rough skin.
Chesebrough has numerous criticisms of the P&G tests. Its primary challenge is that VICL was not included in the second, large clinical test, even though there had been a change in the formula of New Wondra during the period between the two tests. Chesebrough claims that P&G has thus failed to establish the efficacy of New Wondra as compared with VICL. In response, P&G argues that there was no change in the effective ingredients in the Wondra formula during this two-year period. Indeed, just over three percent of the old non-active ingredients were removed and just under two percent new ingredients were added. (The difference of between one and two percent was made up by adding more water.)
Chesebrough points out, however, that because the product is four-fifths water, fifteen percent of the non-water ingredients have been eliminated and ten percent new ingredients added. Although these changes were not made to the active ingredients, they could have affected the physical characteristics of the emulsion, which, in turn, may have affected the amount of product used and the way in which the consumer applied it. P&G ran several limited tests to determine whether the product's effectiveness had been altered and concluded that there had been no change.
Chesebrough argues, however, that without a valid clinical comparison equivalency cannot be established. Having heard extensive testimony on the point, the Court can only say that, although the changes could have affected how the product is applied (for better or for worse), neither side has actually demonstrated the importance or lack of importance of the changes.
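The arithmetic behind Chesebrough's fifteen and ten percent figures can be sketched as follows. The sketch assumes round values of exactly three and two percent in place of "just over three" and "just under two," and the function name `share_of_non_water` is hypothetical.

```python
from fractions import Fraction

def share_of_non_water(share_of_total, water_fraction):
    """Convert a share of the whole lotion into a share of its non-water part."""
    return share_of_total / (1 - water_fraction)

water = Fraction(4, 5)            # "four-fifths water"
removed_total = Fraction(3, 100)  # assumed round figure for "just over three percent"
added_total = Fraction(2, 100)    # assumed round figure for "just under two percent"

removed_non_water = share_of_non_water(removed_total, water)
added_non_water = share_of_non_water(added_total, water)

print(f"removed: {float(removed_non_water):.0%} of non-water ingredients")
print(f"added:   {float(added_non_water):.0%} of non-water ingredients")
```

With only one-fifth of the lotion consisting of anything other than water, each percentage of the whole formula is multiplied by five when restated as a percentage of the non-water ingredients.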
An additional argument made by P&G is that, because SC-215 demonstrated New Wondra's superiority over Chesebrough's VICL Extra Strength and VDL, and because both of those products are touted by Chesebrough as being superior to VICL, it follows that New Wondra must also be better than VICL. The problem with this argument is that, while the evidence established that both VICL Extra Strength and VDL have more effective ingredients than VICL, it also demonstrates that these two products are thicker and perhaps greasier than VICL and may not invite the same amount of usage. Under the ad libitum conditions that prevailed in SC-215, the amount of usage of these products could differ from that of VICL and did differ from that of New Wondra. It cannot be said, therefore, that, simply because these two products have more intrinsic efficacy, they should perform more effectively than VICL in an ad libitum test.
Another serious challenge offered by Chesebrough is that P&G's first parametric statistical analysis of the total test population in SC-215 did not show to a statistically significant degree that New Wondra was better than the other tested lotions, and that subsequent reliance upon analyses made of subsets of the total population was not justified. P&G argues that such reliance was proper and that it has narrowed its advertising campaign to conform with the subset involved: those with very rough, dry skin. Chesebrough's response is that this qualified advertising claim does not conform with the verbal equivalents on the 0-5 scale, which do not mention roughness for any scores below 2.5.
Indeed, it does appear that the only basis for P&G's having chosen 1.5 as the minimum indication of roughness was that it was the mid-point score for all subjects tested. In other words, starting with a somewhat select population in the first place (women who use skin lotions), New Wondra proved to be a superior product to a statistically significant degree only with that half that had the worst problems. Of course, as noted earlier, considering the limitations of this type of testing, product efficacy can best be demonstrated by studying those who have significant problems and can, therefore, manifest the most change.
Chesebrough is also very critical of the manner in which the statistical analysis was done. As provided in the protocols, the data from SC-207 and SC-215 were subjected to an analysis of covariance. This is a parametric analysis that is used in studies involving grading scales of various sorts. Chesebrough claims that use of this analysis was inappropriate because it could find no documents showing that the assumptions justifying the use of this analysis had been met. There was testimony, however, that appropriate tests had been performed and that they indicated the appropriateness of this analysis.
Chesebrough also criticizes the failure to use weather statistics as a co-variable in the statistical analyses. Unquestionably, weather has a pronounced effect upon the subject's skin. Cold, dry weather tends to make the skin lose its natural moisture, whereas warm, moist weather does not. The flaw in this criticism, however, is that the weather was the same for all subjects using every product. Consequently, while weather might affect skin condition of the subjects independently of the products used (and, indeed, the tests reflect this), the overall effect should be negligible because of the random selection of the subjects.
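The reasoning in the preceding paragraph can be illustrated with a small sketch. It rests on a simplifying assumption that is not in the record, namely that weather shifts every subject's score by the same amount; the scores, the size of the shift, and the function name `group_difference` are hypothetical.

```python
from statistics import mean

def group_difference(scores_a, scores_b):
    """Difference between the mean scores of two treatment groups."""
    return mean(scores_a) - mean(scores_b)

# Hypothetical improvement scores for two treatment groups.
product_a = [1.0, 1.5, 2.0]
product_b = [0.5, 1.0, 1.5]

# Assumption: the weather affects every subject equally, whatever product is used.
weather_shift = -0.5

baseline = group_difference(product_a, product_b)
shifted = group_difference(
    [x + weather_shift for x in product_a],
    [x + weather_shift for x in product_b],
)

print(baseline, shifted)
```

Under that assumption the weather moves both group means by the same amount, so the difference between the products is unchanged even though every individual score moves.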
Chesebrough further argues that the results of the subset may be statistically significant but they are not "clinically significant." By that it means that the differences are not great enough to be visually or tangibly noticeable. The problem with this argument is that there are no established criteria to make such a determination, and P&G is not claiming that the tests showed a "clinically significant" difference.
Finally, of course, Chesebrough argues against the use of any ad libitum tests whatsoever. That point has already been discussed, however. See supra pp. 7-9.
From the foregoing, we can conclude that the P&G tests were far from perfect and are subject to various infirmities. We cannot conclude, however, that they were worthless. The question of whether advertising based on such tests is illegal is considered later.
Chesebrough's claims of parity for VICL were made before New Wondra became commercially available in quantity in November of 1983, because P&G had carefully guarded the secrecy of its new ad campaign. Consequently, Chesebrough had to run its tests hastily. It did not withdraw its ads while awaiting the results of these tests -- two small tests and a third large-scale clinical test, which were conducted by independent testers using different methodologies to compare the effectiveness of New Wondra with that of VICL.
Very little evidence was introduced concerning the first two small-scale tests.
Although they purported to show no significant differences between the products, these two tests were probably not of sufficient size to detect any such differences.
Chesebrough's large-scale clinical test was conducted by a consumer product testing service in Killington, Vermont, in December of 1983. The test took place over ten days and required seventy-three subjects to apply New Wondra to one hand and VICL to the other. The subjects were instructed to apply the lotion twice a day. On days one, three, five, eight, and ten, grading was done by Dr. Donald I. McIntyre, a dermatologist of some experience. He used a numerical scale with whole number intervals from one to nine, with the higher numbers indicating increased dryness, roughness, etc. The participants were ski instructors and other employees of a ski lodge, most of whom were in their twenties and had substantial skin problems. In order to be selected for the test, subjects had to have grades of seven on the nine-point scale and at least one of the following attributes: scaling, peeling or flaking, erythema, or cracking or fissures. Analysis of the results of the grading revealed no statistically significant difference between the products.
P&G criticizes several aspects of Chesebrough's tests, however. The two small-scale tests are dismissed out of hand for reasons already described. As for the Killington clinical test, P&G makes a number of points. Although the subjects were graded and double-blinded as in P&G tests, P&G contends that it was not appropriate to use New Wondra on one hand and regular VICL on the other.
The instructions that were given were long and somewhat complicated. There was a substantial risk that subjects would forget and put the wrong product on the wrong hand. Also, there were substantial possibilities of contamination since New Wondra was applied on the New Wondra test hand by the VICL test hand and vice versa, a risk that was exacerbated by the fact that the self-applications were usually unsupervised. In addition, there is the fact that the subjects, because of their severe skin conditions, their ages, and their occupations, were not representative users of skin lotions. Furthermore, the grading was done by showing the right and left hands of each subject to the grader consecutively, which increased the chance of a bias for parity.
To compound this grading problem, the nine-point grading scale in the Chesebrough protocol did not have specific verbal descriptions for any grade other than grades 1 and 9. There were no intermediate descriptions. In addition, Dr. McIntyre used one grading scale on the first day and another on subsequent grading days.
With the first he separated the nine numbers into three groups to help him determine who would qualify for the study. Then he used the second nine-point scale, the one described above, to determine the improvement of the hands as the study progressed. This use of two different scales may have resulted in the application of different criteria to the initial scores than to subsequent scores. Not surprisingly, some uncertainty in grading was shown in the tabulated results. For example, there was a dramatic but unexplained improvement in the condition of all the hands graded by Dr. McIntyre between day eight and day ten, far greater than what had previously occurred and inexplicable on any basis other than the grader's inconsistency.
Another weakness in the execution of the Killington study was that the treatment period was too short. Indeed, the Killington test was far shorter than any of the other tests conducted by Chesebrough or P&G. Concluded before the Christmas holidays, the test did not permit a determination of whether the dramatic changes registered between days eight and ten were real and would continue.
Since this was a controlled test, the participants were each given a measured amount of lotion, which was considered to be a minimum dosage. This, of course, made the Killington test substantially different from the ad libitum tests conducted by P&G.
Interestingly, although P&G criticizes the Killington test in many respects, the majority of the participants found in favor of P&G's New Wondra on each day in the non-parametric portion of the test. This was the part of the test in which the users were simply asked to give their own subjective evaluation of the overall effectiveness of each product, the relative softness of the skin on each hand, and the relative degree of relief of tautness. The participants' preference for New Wondra was deemed statistically significant for all but the last couple of evaluations.
The Court concludes that Chesebrough's tests were more questionable than P&G's. They have been used, however, to support a lesser advertising claim of parity, not one of superiority. Moreover, Chesebrough's conclusion that "nothing beats Intensive Care" derives not merely from the Killington test but also from numerous other studies, including those made by various respectable, outside consultants. These studies have consistently shown that there is no clinically significant difference between any of the products in their ability to relieve dry skin.
Of course, whether differences are "significant" depends upon how fine a line is being drawn. Thus, in the final analysis, this case becomes little more than a dispute over testing methods, with neither side able to show fraud, deception, or bad faith on the part of its competitor.
Section 43(a) of the Lanham Trade-Mark Act, 15 U.S.C. § 1125(a) (1982) (the "Lanham Act"), was not addressed primarily to advertising. Indeed, for several decades after its passage, it was rarely invoked with respect to advertising, and even then almost never to challenge comparative claims.
When the Lanham Act was used to challenge comparative advertising, the general response of the courts was to restrict its application. See, e.g., Bernard Food Industries, Inc. v. Dietene Co., 415 F.2d 1279, 1283-84 (7th Cir. 1969) (defendant's false comparative advertising claim found not to constitute a section 43(a) violation), cert. denied, 397 U.S. 912, 25 L. Ed. 2d 92, 90 S. Ct. 911 (1970). Only ten years ago, this circuit held that it was not a violation of the Lanham Act to sell water-damaged goods as if they were first quality goods as long as no affirmative misrepresentations as to their quality were made. Alfred Dunhill Ltd. v. Interstate Cigar Co., 499 F.2d 232, 237-38 (2d Cir. 1974). As a result, we find law review articles as late as 1976 bemoaning the failure of the judiciary to apply the Lanham Act to comparative advertising and exhorting the courts "to fashion a comprehensive set of remedies for comparative advertising abuses." Note, The Law of Comparative Advertising: How Much Worse is "Better" Than "Great", 76 Colum. L. Rev. 80, 112 (1976).
That challenge was taken up in this circuit six years ago in American Home Products Corp. v. Johnson & Johnson, 577 F.2d 160 (2d Cir. 1978). In that case, Judge Oakes asserted flatly: "That section 43(a) of the Lanham Act encompasses more than literal falsehoods cannot be questioned." Id. at 165. The court held that advertising claims made by the manufacturers of Anacin for relief of pain and inflammation were, in light of the consumers' interpretation of other claims, ultimately false and enjoinable under the Lanham Act. Id. at 169-70.
The next step in the development of the law concerning comparative advertising was Vidal Sassoon, Inc. v. Bristol-Myers Co., 661 F.2d 272 (2d Cir. 1981). There, Judge Kaufman began by acknowledging that "[o]ne of the most delicate tasks a court faces is the application of the legislative mandate of a prior generation to novel circumstances created by a culture grown more complex." Id. at 273. The issue before the court was whether the Lanham Act's prohibition against false advertising included misrepresentations regarding the results and methods of tests purporting to reflect consumer preferences. The court noted that
[n]othing in the history of either the 1920 Act or the 1946 amendments speaks to consumer testing. This silence is hardly surprising, given that the growth in consumer testing for comparative advertising claims occurred only with the advent of television and increasing sophistication of marketing techniques during the 1950's and 1960's.
Id. at 277. The court went on to conclude: "We are therefore reluctant to accord the language of § 43(a) a cramped construction, lest rapid advances in advertising and marketing methods outpace technical revisions in statutory language and finally defeat the clear purpose of Congress in protecting the consumer." Id. The court further noted that, while the Lanham Act literally applied only to misrepresentations concerning the "inherent quality or characteristic" of a product, if the intent and effect of an advertisement was to lead consumers into believing that a product was comparatively superior, then the statement of superiority amounted to a representation concerning the product's inherent quality. Id. at 278. Consequently, the court concluded:
In a case like this, where many of the qualities of a product (such as "body") are not susceptible to objective measurement, it is difficult to see how the manufacturer can advertise its product's "quality" more effectively than through the dissemination of the results of consumer preference studies. In such instances, the medium of the consumer test truly becomes the message of inherent superiority. We do not hold that every misrepresentation concerning consumer test results or methodology can result in liability pursuant to § 43(a). But where depictions of consumer test results or methodology are so significantly misleading that the reasonably intelligent consumer would be deceived about the product's inherent quality or characteristics, an action under § 43(a) may lie.
The Second Circuit's concern for the gullible consumer was further demonstrated in the case of Coca-Cola Co. v. Tropicana Products, Inc., 690 F.2d 312 (2d Cir. 1982). There the court enjoined advertisements stating that Tropicana's product was "pasteurized juice as it comes from the orange," since the court was apparently concerned that consumers might believe that oranges contain pasteurized juice. Id. at 318.
In the instant actions, the parties attempt to go a significant step further by attacking advertisements that are not obviously false but that rest upon tests whose efficacy is questioned. Essentially, the Court is being called upon to evaluate the standards for conducting tests intended to form the basis of comparative advertising claims.
In theory at least, a respectable argument can be made for the wisdom of creating such standards. The Court, however, listened for more than seven days to the testimony of more than a dozen expert witnesses -- statisticians, dermatologists, chemists, and physicists -- and found that much of their testimony was incomprehensible.
Indeed, it is doubtful that there are many, if any, trial judges who could fully comprehend the testimony. Courts generally lack the expertise of the Federal Trade Commission when it comes to evaluating advertising practices. American Home Products Corp., supra, 577 F.2d at 172 n.27. Although it is, of course, improper to explicitly misrepresent the results of tests or the manner in which they are carried out, an advertiser is not required to disclose all aspects of his test findings, provided the non-disclosure does not render the advertising misleading. See, e.g., FTC v. Sterling Drug, Inc., 317 F.2d 669, 675-76 (2d Cir. 1963). Not every misrepresentation concerning consumer test methodology results in Lanham Act liability -- only those so significantly misleading that consumers would be deceived about a product's inherent quality or characteristics. Vidal Sassoon, supra, 661 F.2d at 278.
Here, we are confronted with somewhat inconsistent product claims based on tests that were conducted in apparent good faith but with somewhat differing results. The difference in the results, in turn, was partially caused by the different test protocols that the parties chose. As a consequence, neither of the parties has successfully proven that the other has chosen tests and conducted them in such a manner as to mislead the public. Courts are not always able to determine whether an advertising claim is true or false, see, e.g., American Home Products Corp. v. Johnson & Johnson, 436 F. Supp. 785, 795 (S.D.N.Y. 1977), aff'd, 577 F.2d 160 (2d Cir. 1978), and where this occurs, the only possible conclusion is that the moving party has failed to prove by a preponderance of the evidence that the advertising claim is false. Such is the case here.
Moreover, the Court is most skeptical of the parties' contention that it is, or should be, the duty of a court in a case such as this to determine the winner and enjoin the loser. Only by making policy for the testing of consumer goods could a court take such a step. While there are those who believe that for every wrong there must be a remedy and that courts should intervene where the executive branch and the legislature have not, there are substantial constitutional objections to judicial policy-making under our form of government. Wilkey, Activism by the Branch of Last Resort: Of the Seizure of Abandoned Swords and Purses at 12 (National Legal Center for the Public Interest 1984). Judge Wilkey notes that courts have no facilities for holding public hearings to gather the information and facts on which public policy decisions should be based, and that judges are not adequately trained to make such policy decisions. Id. at 12-14. As his colleague on the D.C. Circuit, Judge Robert Bork, has noted, the proliferation of cases like these is changing the nature of courts from that of a judicial body to that of a bureaucratic model. R. Bork, Dealing with the Overload in Article III Courts, address delivered at the National Conference on the Causes of Popular Dissatisfaction with the Administration of Justice, 70 F.R.D. 231, 233-34 (1976).
One does not have to oppose judicial activism to recognize that the role that these parties ask the judiciary to play exceeds that which the judiciary has the power to accept under our form of government. We are dealing with rough tests that have no certifiable standards and that rest upon nothing more than subjective evaluations of skin conditions. The conditions being evaluated are not of serious import, and the products being evaluated are far from the most effective available to achieve the results desired. The parties are sparring to obtain commercial advantage over what is at most a cosmetological distinction. Thus, if any injunctive relief were called for, it would be an order requiring both parties to remove from their advertisements any implication that their products are the most effective available, for they are really nothing more than the most acceptable adaptations for female users.
Finding that neither party has shown a likelihood of success on the merits of its claim, both parties' motions for preliminary injunction are denied.