          MapReduce

          Data Preparation

          Taking web log data as an example, you can either use the sample data provided by Baidu AI Cloud directly or construct your own input data as described below:

          • To use the sample data provided by Baidu AI Cloud, use the path for your region:

            • In the “North China - Beijing” region, the sample data is stored at BOS://datamart-bj/web-log-10k/ and is available only to BMR clusters in the North China region.
            • In the “South China - Guangzhou” region, the sample data is stored at BOS://datamart-gz/web-log-10k/ and is available only to BMR clusters in the South China region.
          • To construct your own input data, follow the format below and upload the data to BOS (for more information, see the Baidu Object Storage (BOS) Start Guide) or to local HDFS.

            Web access logs generated by Nginx use the following format:

            $remote_addr - [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_cookie" $remote_user "$http_user_agent" $request_time $host $msec

            For example:

            10.81.78.220 - [04/Oct/2015:21:31:22 +0800] "GET /u2bmp.html?dm=37no6.com/003&ac=1510042131161237772&v=y88j6-1.0&rnd=1510042131161237772&ext_y88j6_tid=003&ext_y88j6_uid=1510042131161237772 HTTP/1.1" 200 54 "-" "-" 9CA13069CB4D7B836DC0B8F8FD06F8AF "ImgoTV-iphone/4.5.3.150815 CFNetwork/672.1.13 Darwin/14.0.0" 0.004 test.com.org 1443965482.737
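
            Before uploading your own data, it can help to check that each line matches the format above. The following is a minimal sketch in Python, not part of the BMR tooling: the names LOG_PATTERN, parse_log_line, and SAMPLE are hypothetical, and the regular expression assumes that quoted fields contain no unescaped double quotes and that fields are separated by one or more whitespace characters.

```python
import re

# Fields in the order given by the Nginx format above. Quoted fields are
# assumed (for this sketch) to contain no unescaped double quotes.
_FIELDS = [
    r'(?P<remote_addr>\S+)',
    r'-',
    r'\[(?P<time_local>[^\]]+)\]',
    r'"(?P<request>[^"]*)"',
    r'(?P<status>\d{3})',
    r'(?P<body_bytes_sent>\d+)',
    r'"(?P<http_referer>[^"]*)"',
    r'"(?P<http_cookie>[^"]*)"',
    r'(?P<remote_user>\S+)',
    r'"(?P<http_user_agent>[^"]*)"',
    r'(?P<request_time>\S+)',
    r'(?P<host>\S+)',
    r'(?P<msec>\S+)',
]

# Join the fields with \s+ so that runs of spaces between fields are tolerated.
LOG_PATTERN = re.compile(r'\s+'.join(_FIELDS))

# The example line from this page, split across string literals for readability.
SAMPLE = (
    '10.81.78.220 - [04/Oct/2015:21:31:22 +0800] '
    '"GET /u2bmp.html?dm=37no6.com/003&ac=1510042131161237772&v=y88j6-1.0'
    '&rnd=1510042131161237772&ext_y88j6_tid=003'
    '&ext_y88j6_uid=1510042131161237772 HTTP/1.1" '
    '200 54 "-" "-" 9CA13069CB4D7B836DC0B8F8FD06F8AF '
    '"ImgoTV-iphone/4.5.3.150815 CFNetwork/672.1.13 Darwin/14.0.0" '
    '0.004 test.com.org 1443965482.737'
)


def parse_log_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the two integer fields; the rest are kept as strings.
    fields["status"] = int(fields["status"])
    fields["body_bytes_sent"] = int(fields["body_bytes_sent"])
    return fields


if __name__ == "__main__":
    record = parse_log_line(SAMPLE)
    print(record["host"], record["status"], record["request_time"])
```

            Lines for which parse_log_line returns None do not match the expected format and are worth inspecting before the data is uploaded to BOS or HDFS.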