UNITED STATES DISTRICT COURT SOUTHERN DISTRICT OF NEW YORK
July 1, 2008
VIACOM INTERNATIONAL INC., ET AL., PLAINTIFFS,
v.
YOUTUBE INC., YOUTUBE LLC, AND GOOGLE INC., DEFENDANTS.
THE FOOTBALL ASSOCIATION PREMIER LEAGUE LIMITED, ET AL., ON BEHALF OF THEMSELVES AND ALL OTHERS SIMILARLY SITUATED, PLAINTIFFS,
v.
YOUTUBE INC., YOUTUBE LLC, AND GOOGLE INC., DEFENDANTS.
The opinion of the court was delivered by: Louis L. Stanton, U.S.D.J.
OPINION AND ORDER
Plaintiffs in these related lawsuits (the "Viacom action" and the "Premier League class action") claim to own the copyrights in specified television programs, motion pictures, music recordings, and other entertainment programs. They allege violations of the Copyright Act of 1976 (17 U.S.C. § 101 et seq.) by defendants YouTube*fn1 and Google Inc., who own and operate the video-sharing website known as "YouTube.com". Plaintiffs claim, as set forth in Viacom's First Amended Complaint ¶¶ 30-31, that:
Defendants encourage individuals to upload videos to the YouTube site, where YouTube makes them available for immediate viewing by members of the public free of charge. Although YouTube touts itself as a service for sharing home videos, the well-known reality of YouTube's business is far different. YouTube has filled its library with entire episodes and movies and significant segments of popular copyrighted programming from Plaintiffs and other copyright owners, that neither YouTube nor the users who submit the works are licensed to use in this manner. Because YouTube users contribute pirated copyrighted works to YouTube by the thousands, including those owned by Plaintiffs, the videos "deliver[ed]" by YouTube include a vast unauthorized collection of Plaintiffs' copyrighted audiovisual works. YouTube's use of this content directly competes with uses that Plaintiffs have authorized and for which Plaintiffs receive valuable compensation. . . . .
When a user uploads a video, YouTube copies the video in its own software format, adds it to its own servers, and makes it available for viewing on its own website. A user who wants to view a video goes to the YouTube site . . . enters search terms into a search and indexing function provided by YouTube for this purpose on its site, and receives a list of thumbnails of videos in the YouTube library matching those terms . . . and the user can select and view a video from the list of matches by clicking on the thumbnail created and supplied by YouTube for this purpose. YouTube then publicly performs the chosen video by sending streaming video content from YouTube's servers to the user's computer, where it can be viewed by the user. Simultaneously, a copy of the chosen video is downloaded from the YouTube website to the user's computer. . . . Thus, the YouTube conduct that forms the basis of this Complaint is not simply providing storage space, conduits, or other facilities to users who create their own websites with infringing materials. To the contrary, YouTube itself commits the infringing duplication, distribution, public performance, and public display of Plaintiffs' copyrighted works, and that infringement occurs on YouTube's own website, which is operated and controlled by Defendants, not users. (Viacom's brackets).
Plaintiffs allege that YouTube and Google induced those infringements and are directly, vicariously, or contributorily liable for them, and they seek damages of at least $1 billion (in the Viacom action) and injunctions barring such conduct in the future.
Among other defenses, YouTube and Google claim the protection afforded by the Digital Millennium Copyright Act of 1998 ("DMCA") (17 U.S.C. §§ 512(c)-(d), (i)-(j)), which among other things limits the terms of injunctions, and bars copyright-damage awards, against an online service provider who: (1) performs a qualified storage or search function for internet users; (2) lacks actual or imputed knowledge of the infringing activity; (3) receives no financial benefit directly from such activity in a case where he has the right and ability to control it; (4) acts promptly to remove or disable access to the material when his designated agent is notified that it is infringing; (5) adopts, reasonably implements and publicizes a policy of terminating repeat infringers; and (6) accommodates and does not interfere with standard technical measures used by copyright owners to identify or protect copyrighted works.
Plaintiffs move jointly pursuant to Fed. R. Civ. P. 37 to compel YouTube and Google to produce certain electronically stored information and documents, including a critical trade secret: the computer source code which controls both the YouTube.com search function and Google's internet search tool "Google.com". YouTube and Google cross-move pursuant to Fed. R. Civ. P. 26(c) for a protective order barring disclosure of that search code, which they contend is responsible for Google's growth "from its founding in 1998 to a multi-national presence with more than 16,000 employees and a market valuation of roughly $150 billion" (Singhal Decl. ¶¶ 3, 11), and cannot be disclosed without risking the loss of the business.
1. Search Code
The search code is the product of over a thousand person-years of work. Singhal Decl. ¶ 9. There is no dispute that its secrecy is of enormous commercial value. Someone with access to it could readily perceive its basic design principles, and cause catastrophic competitive harm to Google by sharing them with others who might create their own programs without making the same investment. Id. ¶ 12.

Plaintiffs seek production of the search code to support their claim that "Defendants have purposefully designed or modified the tool to facilitate the location of infringing content." Pls.' Reply 10. However, the predicate for that proposition is that the "tool" treats infringing material differently from innocent material, and plaintiffs offer no evidence that the search function can discriminate between infringing and non-infringing videos.
YouTube and Google maintain that "no source code in existence today can distinguish between infringing and non-infringing video clips -- certainly not without the active participation of rights holders" (Defs.' Cross-Mot. Reply 11), and Google engineer Amitabh Singhal declares under penalty of perjury that:
The search function employed on the YouTube website was not, in any manner, designed or modified to facilitate the location of allegedly infringing materials. The purpose of the YouTube search engine is to allow users to find videos they are looking for by entering text-based search terms. In some instances, the search service suggests search terms when there appears to be a misspelling entered by the user and attempts to distinguish between search terms with multiple meanings. Those functions are automated algorithms that run across Google's services and were not designed to make allegedly infringing video clips more prominent in search results than non-infringing video clips. Indeed, Google has never sought to increase the rank or visibility of allegedly infringing material over non-infringing material when developing its search services.
Singhal Reply Decl. ¶ 2.
Plaintiffs argue that the best way to determine whether those denials are true is to compel production and examination of the search code. Nevertheless, YouTube and Google should not be made to place this vital asset in hazard merely to allay speculation. A plausible showing that YouTube and Google's denials are false, and that the search function can and has been used to discriminate in favor of infringing content, should be required before disclosure of so valuable and vulnerable an asset is compelled.
Nor do plaintiffs offer evidence supporting their conjecture that the YouTube.com search function might be adaptable into a program which filters out infringing videos. Plaintiffs wish to "demonstrate what Defendants have not done but could have" to prevent infringements, Pls.' Reply 12 (plaintiffs' italics), but there may be other ways to show that filtering technology is feasible*fn2 and reasonably could have been put in place.
Finally, the protections set forth in the stipulated confidentiality order are careful and extensive, but nevertheless not as safe as nondisclosure. There is no occasion to rely on them without a proper preliminary showing justifying production of the search code.
Therefore, the cross-motion for a protective order is granted and the motion to compel production of the search code is denied.
2. Video ID Code
Plaintiffs also move to compel production of another undisputed trade secret, the computer source code for the newly invented "Video ID" program. Using that program, copyright owners may furnish YouTube with video reference samples, which YouTube will use to search for and locate video clips in its library which have characteristics sufficiently matching those of the samples as to suggest infringement. That program's source code is the product of "approximately 50,000 man hours of engineering time and millions of dollars of research and development costs", and maintaining its confidentiality is essential to prevent others from creating competing programs without any equivalent investment, and to bar users who wish to post infringing content onto YouTube.com from learning ways to trick the Video ID program and thus "escape detection." Salem Decl. ¶¶ 8-12.
Plaintiffs claim that they need production of the Video ID source code to demonstrate what defendants "could be doing -- but are not -- to control infringement" with the Video ID program (Pls.' Reply 6). However, plaintiffs can learn how the Video ID program works from use and observation of its operation (Salem Decl. ¶ 13), and examination of pending patent applications, documentation and white papers regarding Video ID (id.), all of which are available to them (see Defs.' Opp. 7). If there is a way to write a program that can identify and thus control infringing videos, plaintiffs are free to demonstrate it, with or without reference to the way the Video ID program works. But the question is what infringement detection operations are possible, not how the Video ID source code makes it operate as it does. The notion that examination of the source code might suggest how to make a better method of infringement detection is speculative. Considered against its value and secrecy, plaintiffs have not made a sufficient showing of need for its disclosure.
Therefore, the motion to compel production of the Video ID code is denied.
3. Removed Videos
Plaintiffs seek copies of all videos that were once available for public viewing on YouTube.com but later removed for any reason, or such subsets as plaintiffs designate (Pls.' Reply 41). Plaintiffs claim that their direct access to the removed videos is essential to identify which (if any) infringe their alleged copyrights. Defendants store the removed videos on computer hard drives, and plaintiffs offer to supply the hard drives needed to receive copies of them (id. 41).
Defendants concede that "Plaintiffs should have some type of access to removed videos in order to identify alleged infringements" (Defs.' Opp. 27), but propose to make plaintiffs identify and specify the videos plaintiffs select as probable infringers by use of data such as their titles and topics and a search program (which defendants have furnished) that gives plaintiffs the capacity both to run searches against that data and to view "snapshots" taken from each removed video. That would relieve defendants of producing all of the millions of removed videos, a process which would require a total of about five person-weeks of labor without unexpected glitches, as well as the dedication of expensive computer equipment and network bandwidth. Do Decl. ¶¶ 5-7.
However, it appears that the burden of producing a program for production of all of the removed videos should be roughly equivalent to, or at least not significantly greater than, that of producing a program to create and copy a list of specific videos selected by plaintiffs (see Davis Decl. ¶ 21).
While the total number of removed videos is intimidating (millions, according to defendants), the burden of inspection and selection, leading to the ultimate identification of individual "works-in-suit", is on the plaintiffs who say they can handle it electronically.
Under the circumstances, the motion to compel production of copies of all removed videos is granted.
4. Video-Related Data from the Logging Database
Defendants' "Logging" database contains, for each instance a video is watched, the unique "login ID" of the user who watched it, the time when the user started to watch the video, the internet protocol address that other devices connected to the internet use to identify the user's computer ("IP address"), and the identifier for the video. Do Sept. 12, 2007 Dep. 154:8-21 (Kohlmann Decl. Ex. B); Do Decl. ¶ 16. That database (which is stored on live computer hard drives) is the only existing record of how often each video has been viewed during various time periods. Its data can "recreate the number of views for any particular day of a video." Do Dep. 211:16-21.
Plaintiffs seek all data from the Logging database concerning each time a YouTube video has been viewed on the YouTube website or through embedding on a third-party website. Pls.' Mot. 19.
They need the data to compare the attractiveness of allegedly infringing videos with that of non-infringing videos. A markedly higher proportion of infringing-video watching may bear on plaintiffs' vicarious liability claim,*fn3 and defendants' substantial non-infringing use defense.*fn4
Defendants argue generally that plaintiffs' request is unduly burdensome because producing the enormous amount of information in the Logging database (about 12 terabytes of data) "would be expensive and time-consuming, particularly in light of the need to examine the contents for privileged and work product material." Defs.' Opp. 22.
But defendants do not specifically refute that "There is no need to engage in a detailed privilege review of the logging database, since it simply records the numbers of views for each video uploaded to the YouTube website, and the videos watched by each user" (Pls.' Reply 45). While the Logging database is large, all of its contents can be copied onto a few "over-the-shelf" four-terabyte hard drives (Davis Decl. ¶ 22). Plaintiffs' need for the data outweighs the unquantified and unsubstantiated cost of producing that information.
Defendants argue that the data should not be disclosed because of the users' privacy concerns, saying that "Plaintiffs would likely be able to determine the viewing and video uploading habits of YouTube's users based on the user's login ID and the user's IP address" (Do Decl. ¶ 16).
But defendants cite no authority barring them from disclosing such information in civil discovery proceedings,*fn5 and their privacy concerns are speculative. Defendants do not refute that the "login ID is an anonymous pseudonym that users create for themselves when they sign up with YouTube" which without more "cannot identify specific individuals" (Pls.' Reply 44), and Google has elsewhere stated:
We . . . are strong supporters of the idea that data protection laws should apply to any data that could identify you. The reality is though that in most cases, an IP address without additional information cannot.
Google Software Engineer Alma Whitten, Are IP addresses personal?, GOOGLE PUBLIC POLICY BLOG (Feb. 22, 2008), http://googlepublicpolicy.blogspot.com/2008/02/are-ip-addresses-personal.html (Wilkens Decl. Ex. M).
Therefore, the motion to compel production of all data from the Logging database concerning each time a YouTube video has been viewed on the YouTube website or through embedding on a third-party website is granted.
5. Video-Related Data from the User and Mono Databases
Defendants' "User" and "Mono" databases contain information about each video available in YouTube's collection, including its user-supplied title and keywords, public comments from others about it, whether it has been flagged as inappropriate by others (for copyright infringement or for other improprieties such as obscenity) and the reason it was flagged, whether an administrative action was taken in response to a complaint about it, whether the user who posted it was terminated for copyright infringement, and the username of the user who posted it. Defendants store the User and Mono databases on computer hard drives, and have agreed to produce specified data from them which concern the removed videos and those publicly available videos which plaintiffs identify as infringing "works-in-suit". Plaintiffs now seek production of, "for the rest of the videos, all of the data fields Defendants have agreed to provide for works-in-suit." Pls.' Mot. 16.
Plaintiffs give a variety of reasons for requesting data for the complete universe of videos available on YouTube: to identify alleged infringements that are not yet works-in-suit; to find evidence (especially in the public comments)*fn6 that defendants knew or should have known about infringing activity; and to determine "the proportion or extent of Defendants' control over the YouTube website -- such as what percentage of videos have been restricted, reviewed and/or flagged by the Defendants for any reason" (Pls.' Reply 47-48), which they argue is relevant (among other things) to show that defendants have an ability to control infringements. Plaintiffs contend that only direct access to the electronic data would give them "the ability to quickly search, sort and analyze millions of pieces of information." Pls.' Reply 45.
Defendants contend that plaintiffs' request is overbroad because it encompasses almost all of the data in the User and Mono databases, which contain information about millions of non-infringing videos (Defs.' Opp. 18), and have no data reflecting "any review of a flagged video, or disciplinary actions taken by YouTube on a video flagged by a user as inappropriate" for "the substantial majority of the videos" (Do Decl. ¶ 15). Defendants argue that plaintiffs' request is unduly burdensome, and that they have fully accommodated plaintiffs' need to identify potential infringements by giving plaintiffs access to use a search program "which allows users to search for and watch any video currently available on YouTube." Defs.' Opp. 17, 21.
No sufficiently compelling need is shown to justify the analysis of "millions of pieces of information" sought by this request, at least until the other disclosures have been utilized, and found to be so insufficient that this almost unlimited field should be further explored.
Therefore, the motion to compel production of all those data fields which defendants have agreed to produce for works-in-suit, for all videos that have been posted to the YouTube website is denied.
6. Database Schemas
Plaintiffs seek the schemas for the "Google Advertising" and "Google Video Content" databases.*fn7 A schema is an electronic index that shows how the data in a database are organized by listing the database's fields and tables, but not its underlying data.
A. Google Advertising Schema
Google earns most of its revenue from fees it charges advertisers to display advertisements on Google.com (the "AdWords" program) or on third party websites that participate in its "AdSense" program. Huchital Decl. ¶¶ 1-7. Google stores data about each of the billions of advertising transactions made in connection with those programs in the Google Advertising database. Id. The schema for that database "constitutes commercially sensitive information regarding Google's advertising business", the disclosure of which would permit others to profit without equivalent investment from the "years of refinement and thousands of person hours" of work Google spent selecting the numerous data points it tracks in connection with its advertising programs. Id. ¶¶ 8-10. Only trivial percentages of the fields and tables in the database "possibly relate to advertising revenue generated from advertisements run on YouTube" (id. ¶ 7), and defendants have "already agreed to provide Plaintiffs with the small amount of YouTube-related data contained in the Google Advertising database" (Defs.' Opp. 25).
Plaintiffs argue that the schema is relevant to "show what Defendants could have or should have known about the extent to which their advertising revenues were associated with infringing content, and the extent to which Defendants had the ability to control, block or prevent advertising from being associated with infringing videos." Pls.' Reply 50 (italics in original).
However, given that plaintiffs have already been promised the only relevant data in the database, they do not need its confidential schema (Huchital Decl. ¶ 8), which "itself provides a detailed to roadmap to how Google runs its advertising business" (id. ¶ 9), to show whether defendants were on notice that their advertising revenues were associated with infringing videos, or whether defendants declined to exercise their claimed ability to prevent such associations.
Therefore, the motion for production of the Google Advertising schema is denied.
B. Google Video Schema
By plaintiffs' description the Google Video Content database stores "information Defendants collect regarding videos on the Google Video website, which is a video-sharing website, similar to YouTube, that is operated by Defendant Google." Pls.' Mot. 22. The Google Video website has its own video library, but searches for videos on it will also access YouTube videos. See Pls.' Reply 51.
Plaintiffs argue that the schema for that database will reveal "The extent to which Defendants are aware of and can control infringements on Google Video" which "is in turn relevant to whether Defendants had 'reason to know' of infringements, or had the ability to control infringements, on YouTube, which they also own and which features similar content." Id. 52 (plaintiffs' italics). That is a sufficiently plausible showing of the schema's relevance to require its disclosure, there being no assertion that it is confidential or unduly burdensome to produce.
Therefore, the motion to compel production of the Google Video schema is granted.
7. Private Videos and Related Data
YouTube.com users may override the website's default setting (which makes newly added videos available to the public) by electing to mark as "private" the videos they post to the website. Plaintiffs move to compel production of copies of all those private videos, which can only be viewed by others authorized by the user who posted each of them, as well as specified data related to them.
Defendants are prohibited by the Electronic Communications Privacy Act ("ECPA") (18 U.S.C. § 2510 et seq.) from disclosing to plaintiffs the private videos and the data which reveal their contents because ECPA § 2702(a)(2) requires that entities such as YouTube who provide "remote computing service to the public shall not knowingly divulge to any person or entity the contents" of any electronic communication stored on behalf of their subscribers,*fn8 and ECPA § 2702 contains no exception for disclosure of such communications pursuant to civil discovery requests. See In re Subpoena Duces Tecum to AOL, LLC, No. 1:07mc34, ___ F. Supp. 2d ___, 2008 WL 1956266, *4 (E.D.Va. Apr. 18, 2008).
But the ECPA does not bar disclosure of non-content data about the private videos (e.g., the number of times each video has been viewed on YouTube.com or made accessible on a third-party website through an 'embedded' link to the video). Plaintiffs argue that such data are relevant to show whether videos designated private are in fact shared with numerous members of the public and therefore not protected by the ECPA, and to then obtain discovery on their claim (supported by evidence)*fn12 that users abuse YouTube's privacy feature "to share infringing videos with any interested member of the public while evading detection by content owners" (Pls.' Reply 62). It is not clear from this record whether plaintiffs' interpretation of the ECPA is correct, but their view is colorable, as the statute's legislative history states that "a subscriber who places a communication on a computer 'electronic bulletin board,' with a reasonable basis for knowing that such communications are freely made available to the public, should be considered to have given consent to the disclosure or use of the communication." H.R. Rep. No. 99-647, at 66 (1986). Plaintiffs need the requested non-content data so that they can properly argue their construction of the ECPA on the merits and have an opportunity to obtain discovery of allegedly infringing private videos claimed to be public.
Therefore, the motion to compel is denied at this time, except to the extent it seeks production of specified non-content data about such videos.
That ruling is unaltered by plaintiffs' contention that defendants disclose private videos "to third party content owners as part of their regular business dealings" (Pls.' Reply 57), as supposedly shown by a clause in the Content Identification and Management Agreement between Viacom and Google which bars Viacom from disclosing to any third party private videos it receives during the process of resolving copyright infringement claims against such videos (see Wilkens Decl. Ex. T, ¶ 4). The record shows that defendants do not disclose to content owners any private videos processed for potentially infringing the owners' copyrights unless defendants receive the express consent of the users who designated the videos as private (Salem Sur-Reply Decl. ¶¶ 1-5), and that the clause plaintiffs rely upon merely requires content owners to maintain the confidentiality of such consensually divulged private videos (id.).
For the reasons set forth above:
(1) The cross-motion for a protective order barring disclosure of the source code for the YouTube.com search function is granted, and the motion to compel production of that search code is denied;
(2) The motion to compel production of the source code for the Video ID program is denied;
(3) The motion to compel production of all removed videos is granted;
(4) The motion to compel production of all data from the Logging database concerning each time a YouTube video has been viewed on the YouTube website or through embedding on a third-party website is granted;
(5) The motion to compel production of those data fields which defendants have agreed to produce for works-in-suit, for all videos that have been posted to the YouTube website is denied;
(6) The motion to compel production of the schema for the Google Advertising database is denied;
(7) The motion to compel production of the schema for the Google Video Content database is granted; and
(8) The motion to compel production of the private videos and data related to them is denied at this time except to the extent it seeks production of specified non-content data about such videos.