Selecting Files

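The BOS Java SDK provides the SelectObject API, which runs an SQL statement against a specified object in a bucket and returns only the selected content; see Select Object for details. The supported object types are currently CSV (including TSV and other CSV-like files), JSON, and Parquet: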

For complete sample code, see the Select File Demo.

  • Selecting CSV files
  • Selecting JSON files
  • Selecting Parquet files
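Each example below passes one of "csv", "json" or "parquet" to withSelectType. If an application has to choose this value at runtime, one minimal approach is to infer it from the suffix of the object key. The class SelectTypes and its suffix table are hypothetical and not part of the BOS SDK; mapping .tsv to "csv" relies on the statement above that TSV and similar files are handled as CSV (such files still need a matching field delimiter, a tab for TSV), and the .jsonl mapping is an assumption of this sketch.

```java
import java.util.Locale;

// Hypothetical helper, not part of the BOS SDK: guesses the value to pass to
// SelectObjectRequest.withSelectType(...) from the suffix of an object key.
final class SelectTypes {
    private SelectTypes() {}

    static String fromKey(String key) {
        String lower = key.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".csv") || lower.endsWith(".tsv")) {
            return "csv"; // CSV-like files (e.g. TSV) use the CSV select type
        }
        if (lower.endsWith(".json") || lower.endsWith(".jsonl")) {
            return "json"; // the .jsonl mapping is an assumption of this sketch
        }
        if (lower.endsWith(".parquet")) {
            return "parquet";
        }
        throw new IllegalArgumentException("cannot infer select type from key: " + key);
    }

    public static void main(String[] args) {
        System.out.println(fromKey("logs/2024/data.TSV")); // csv
        System.out.println(fromKey("students.parquet"));   // parquet
    }
}
```

Keys such as test-csv in the examples carry no suffix, so this helper throws for them; in that case set the select type explicitly, as the examples do.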

Selecting CSV files

To select content from a CSV file with the Java SDK, refer to the following code:

// Upload a sample CSV file (client is assumed to be an initialized BosClient)
final String csvContent = "header1,header2,header3\r\n" +
                          "1,2,3.4\r\n" +
                          "a,b,c\r\n" +
                          "\"d\",\"e\",\"f\"\r\n" +
                          "true,false,true\r\n" +
                          "2006-01-02 15:04:06,2006-01-02 16:04:06,2006-01-02 17:04:06";
client.putObject("bucketName", "test-csv", new ByteArrayInputStream(csvContent.getBytes()));

SelectObjectRequest request = new SelectObjectRequest("bucketName", "test-csv")
        .withSelectType("csv")
        .withExpression("select * from BosObject limit 3")
        .withInputSerialization(new InputSerialization()
                .withCompressionType("NONE")
                .withFileHeaderInfo("NONE")
                .withRecordDelimiter("\r\n")
                .withFieldDelimiter(",")
                .withQuoteCharacter("\"")
                .withCommentCharacter("#"))
        .withOutputSerialization(new OutputSerialization()
                .withOutputHeader(false)
                .withQuoteFields("ALWAYS")
                .withRecordDelimiter("\n")
                .withFieldDelimiter(",")
                .withQuoteCharacter("\""))
        .withRequestProgress(false);
SelectObjectResponse response = client.selectObject(request);

// Print the returned records
SelectObjectResponse.Messages messages = response.getMessages();
while (messages.hasNext()) {
    SelectObjectResponse.CommonMessage message = messages.next();
    if (message.Type.equals("Records")) {
        for (String record: message.getRecords()) {
            System.out.println(record);
        }
    }
}

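Output of the CSV select. Because withFileHeaderInfo("NONE") is set, the header line is treated as an ordinary record, so limit 3 returns it together with the first two data rows: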

"header1","header2","header3"
"1","2","3.4"
"a","b","c"

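With QuoteFields set to ALWAYS, every field of the returned records is wrapped in the quote character, so a caller usually strips the quotes before using the values. Below is a minimal sketch of such a parser, assuming the output settings of this example (field delimiter ',' and quote character '"') and assuming that an embedded quote character is escaped by doubling it, as in RFC 4180. The class QuotedRecordParser is hypothetical; a dedicated CSV library is the more robust choice in production.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits one output record such as "1","2","3.4" into
// its field values. Assumes an embedded quote is written as two quotes ("").
final class QuotedRecordParser {
    private QuotedRecordParser() {}

    static List<String> parse(String record, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < record.length() && record.charAt(i + 1) == quote) {
                        current.append(quote); // doubled quote becomes one literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote of the field
                    }
                } else {
                    current.append(c);
                }
            } else if (c == quote) {
                inQuotes = true; // opening quote of the field
            } else if (c == delimiter) {
                fields.add(current.toString()); // field finished
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field has no trailing delimiter
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("\"1\",\"2\",\"3.4\"", ',', '"')); // [1, 2, 3.4]
    }
}
```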
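Note:

  • On Unix/Linux, each line ends with <LF> only, i.e. "\n";
  • On Windows, each line ends with <CR><LF>, i.e. "\r\n";
  • On macOS, each line ends with <LF>, i.e. "\n"; only classic Mac OS (version 9 and earlier) used '\r'.
  • Set recordDelimiter to match the line endings actually used in the file.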
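Choosing recordDelimiter can be automated with a simple heuristic: look at the first line break in a sample of the object and return the matching delimiter. This sketch assumes you already have such a sample as a String (for example, the first few KB of the object downloaded separately) and that its first line break is representative of the whole file; the class RecordDelimiters is hypothetical.

```java
// Hypothetical helper: suggests a recordDelimiter value from a text sample,
// based only on the first line break found in the sample.
final class RecordDelimiters {
    private RecordDelimiters() {}

    static String detect(String sample) {
        for (int i = 0; i < sample.length(); i++) {
            char c = sample.charAt(i);
            if (c == '\n') {
                return "\n"; // Unix/Linux and modern macOS
            }
            if (c == '\r') {
                // Note: a sample truncated right after '\r' is reported as "\r".
                boolean crlf = i + 1 < sample.length() && sample.charAt(i + 1) == '\n';
                return crlf ? "\r\n" : "\r"; // Windows, or classic Mac OS
            }
        }
        return "\n"; // no line break at all: a single record, fall back to "\n"
    }

    public static void main(String[] args) {
        String csv = "header1,header2,header3\r\n1,2,3.4\r\n";
        System.out.println(detect(csv).equals("\r\n")); // true
    }
}
```

Because the heuristic only inspects the first line break, a quoted field containing a line break before the first real record boundary could mislead it; in that case set the delimiter explicitly.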

Selecting JSON files

To select content from a JSON file with the Java SDK, refer to the following code:

// Upload a sample JSON file (client is assumed to be an initialized BosClient)
final String jsonContent = "{\n" +
        "\t\"name\": \"Smith\",\n" +
        "\t\"age\": 16,\n" +
        "\t\"org\": null\n" +
        "}\n" +
        "{\n" +
        "\t\"name\": \"charles\",\n" +
        "\t\"age\": 27,\n" +
        "\t\"org\": \"baidu\"\n" +
        "}\n" +
        "{\n" +
        "\t\"name\": \"jack\",\n" +
        "\t\"age\": 35,\n" +
        "\t\"org\": \"bos\"\n" +
        "}";
client.putObject("bucketName", "test-json", new ByteArrayInputStream(jsonContent.getBytes()));

SelectObjectRequest request = new SelectObjectRequest("bucketName", "test-json")
        .withSelectType("json")
        .withExpression("select * from BosObject where age > 20")
        .withInputSerialization(new InputSerialization()
                .withCompressionType("NONE")
                .withJsonType("LINES"))
        .withOutputSerialization(new OutputSerialization()
                .withRecordDelimiter("\n"))
        .withRequestProgress(false);
SelectObjectResponse response = client.selectObject(request);

// Print the returned records
SelectObjectResponse.Messages messages = response.getMessages();
while (messages.hasNext()) {
    SelectObjectResponse.CommonMessage message = messages.next();
    if (message.Type.equals("Records")) {
        for (String record: message.getRecords()) {
            System.out.println(record);
        }
    }
}

Output of the JSON select (only the records with age greater than 20 are returned):

{"name":"charles","age":27,"org":"baidu"}
{"name":"jack","age":35,"org":"bos"}
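The example above spreads each JSON object over several lines. If you generate test data yourself, you might instead write one compact object per line, the conventional JSON Lines layout, which also matches the format of the returned records. The class JsonLines is hypothetical and only escapes '"' and '\', which is enough for the sample values but is not a general JSON encoder; a real application should use a JSON library.

```java
// Hypothetical helper: builds JSON Lines content (one compact object per line)
// for the name/age/org sample records. Only '"' and '\' are escaped.
final class JsonLines {
    private JsonLines() {}

    static String quote(String s) {
        if (s == null) {
            return "null"; // JSON null, as for Smith's "org" field
        }
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String line(String name, int age, String org) {
        return "{\"name\":" + quote(name) + ",\"age\":" + age + ",\"org\":" + quote(org) + "}";
    }

    public static void main(String[] args) {
        String content = String.join("\n",
                line("Smith", 16, null),
                line("charles", 27, "baidu"),
                line("jack", 35, "bos"));
        System.out.println(content);
    }
}
```

The resulting string could then be uploaded with client.putObject in the same way as jsonContent above.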

Selecting Parquet files

To select content from a Parquet file with the Java SDK, refer to the following code:

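/*
Assumes a Parquet object named test-parquet already exists in bucketName
(this example does not upload it). Its decoded contents, shown as JSON: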
{"Name":"StudentName","Age":20,"Id":0,"Weight":50,"Sex":true,"Day":19240,"Scores":{"computer":80,"math":90,"physics":90}}
{"Name":"StudentName","Age":21,"Id":1,"Weight":50.1,"Sex":false,"Day":19240,"Scores":{"computer":81,"math":91,"physics":91}}
{"Name":"StudentName","Age":22,"Id":2,"Weight":50.2,"Sex":true,"Day":19240,"Scores":{"computer":82,"math":92,"physics":92}}
{"Name":"StudentName","Age":23,"Id":3,"Weight":50.3,"Sex":false,"Day":19240,"Scores":{"computer":83,"math":93,"physics":90}}
{"Name":"StudentName","Age":24,"Id":4,"Weight":50.4,"Sex":true,"Day":19240,"Scores":{"computer":84,"math":94,"physics":91}}
{"Name":"StudentName","Age":20,"Id":5,"Weight":50.5,"Sex":false,"Day":19240,"Scores":{"computer":85,"math":90,"physics":92}}
{"Name":"StudentName","Age":21,"Id":6,"Weight":50.6,"Sex":true,"Day":19240,"Scores":{"computer":86,"math":91,"physics":90}}
{"Name":"StudentName","Age":22,"Id":7,"Weight":50.7,"Sex":false,"Day":19240,"Scores":{"computer":87,"math":92,"physics":91}}
{"Name":"StudentName","Age":23,"Id":8,"Weight":50.8,"Sex":true,"Day":19240,"Scores":{"computer":88,"math":93,"physics":92}}
{"Name":"StudentName","Age":24,"Id":9,"Weight":50.9,"Sex":false,"Day":19240,"Scores":{"computer":89,"math":94,"physics":90}}
*/

SelectObjectRequest request = new SelectObjectRequest("bucketName", "test-parquet")
        .withSelectType("parquet")
        .withExpression("select * from BosObject s where s.Scores.computer > 85")
        .withInputSerialization(new InputSerialization()
                .withCompressionType("NONE"))
        .withOutputSerialization(new OutputSerialization()
                .withRecordDelimiter("\n"))
        .withRequestProgress(false);
SelectObjectResponse response = client.selectObject(request);

// Print the returned records
SelectObjectResponse.Messages messages = response.getMessages();
while (messages.hasNext()) {
    SelectObjectResponse.CommonMessage message = messages.next();
    if (message.Type.equals("Records")) {
        for (String record: message.getRecords()) {
            System.out.println(record);
        }
    }
}

Output of the Parquet select (only the rows whose Scores.computer is greater than 85 are returned):

{"Name":"StudentName","Age":21,"Id":6,"Weight":50.6,"Sex":true,"Day":19240,"Scores":{"computer":86,"math":91,"physics":90}}
{"Name":"StudentName","Age":22,"Id":7,"Weight":50.7,"Sex":false,"Day":19240,"Scores":{"computer":87,"math":92,"physics":91}}
{"Name":"StudentName","Age":23,"Id":8,"Weight":50.8,"Sex":true,"Day":19240,"Scores":{"computer":88,"math":93,"physics":92}}
{"Name":"StudentName","Age":24,"Id":9,"Weight":50.9,"Sex":false,"Day":19240,"Scores":{"computer":89,"math":94,"physics":90}}
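The expression s.Scores.computer uses the table alias s and then walks from the Scores struct into its computer field. The sketch below mimics that lookup locally on rows represented as nested Maps, purely to illustrate how a dotted path addresses a nested field; the class NestedPath and the Map representation are hypothetical, and in practice the filtering is done server-side by BOS.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NestedPath {
    private NestedPath() {}

    // Follows a dotted path such as "Scores.computer" through nested maps.
    // Returns null if a segment is missing or the current value is not a map.
    static Object resolve(Map<String, Object> row, String path) {
        Object current = row;
        for (String segment : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    // Builds a row shaped like the Parquet sample (only Id and Scores.computer).
    static Map<String, Object> row(int id, int computer) {
        Map<String, Object> scores = new LinkedHashMap<>();
        scores.put("computer", computer);
        Map<String, Object> result = new LinkedHashMap<>();
        result.put("Id", id);
        result.put("Scores", scores);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> matched = new ArrayList<>();
        for (int id = 0; id < 10; id++) {
            Map<String, Object> r = row(id, 80 + id); // computer = 80..89, as in the sample
            Object computer = resolve(r, "Scores.computer");
            if (computer instanceof Integer && (Integer) computer > 85) {
                matched.add((Integer) r.get("Id"));
            }
        }
        System.out.println(matched); // [6, 7, 8, 9]
    }
}
```

The matched Ids 6 to 9 correspond to the four rows in the output above.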

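Note: the parameters used to initialize SelectObjectRequest differ considerably between CSV, JSON, and Parquet queries. For the full parameter reference, see the SelectObject API documentation.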
