Select Objects
Last updated: 2023-04-07
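The BOS Java SDK provides the SelectObject API, which runs an SQL statement against a specified object in a bucket and returns only the selected content. For details, see Select Object. Supported object types are CSV (including TSV and other CSV-like files), JSON, and Parquet.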
For complete sample code, see the Select Object Demo.
- Select CSV files
- Select JSON files
- Select Parquet files
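The value passed to withSelectType depends on the object's format; the examples on this page use "csv", "json" and "parquet". The sketch below assumes objects are named with conventional extensions and picks the select type from the key, plus a tab field delimiter for TSV (the tab delimiter is an assumption based on TSV being a CSV-like format). selectTypeFor and fieldDelimiterFor are hypothetical helpers, not SDK methods.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of the BOS SDK): derives the value for
 * SelectObjectRequest.withSelectType(...) from an object key's extension.
 */
public class SelectTypeSketch {

    // Returns "csv", "json" or "parquet", the select types used in this page's examples.
    static String selectTypeFor(String key) {
        String lower = key.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".csv") || lower.endsWith(".tsv")) {
            return "csv"; // TSV is CSV-like; only the field delimiter differs
        } else if (lower.endsWith(".json") || lower.endsWith(".jsonl")) {
            return "json";
        } else if (lower.endsWith(".parquet")) {
            return "parquet";
        }
        throw new IllegalArgumentException("unsupported object type: " + key);
    }

    // Assumed field delimiter for CSV-like input: tab for .tsv, comma otherwise.
    static String fieldDelimiterFor(String key) {
        return key.toLowerCase(Locale.ROOT).endsWith(".tsv") ? "\t" : ",";
    }

    public static void main(String[] args) {
        String[] keys = {"logs/a.csv", "logs/b.tsv", "events.json", "students.parquet"};
        for (String key : keys) {
            System.out.println(key + " -> " + selectTypeFor(key));
        }
    }
}
```

Keying on the extension keeps the request-building code in one place; if your keys carry no extension, pass the type explicitly instead.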
Select CSV Files
The following Java SDK code selects records from a CSV object:
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Assumes `client` is an initialized BosClient and the bucket "bucketName" exists.
// Upload a small CSV object whose lines end with "\r\n"
final String csvContent = "header1,header2,header3\r\n" +
        "1,2,3.4\r\n" +
        "a,b,c\r\n" +
        "\"d\",\"e\",\"f\"\r\n" +
        "true,false,true\r\n" +
        "2006-01-02 15:04:06,2006-01-02 16:04:06,2006-01-02 17:04:06";
client.putObject("bucketName", "test-csv",
        new ByteArrayInputStream(csvContent.getBytes(StandardCharsets.UTF_8)));

// Select the first 3 records; FileHeaderInfo "NONE" treats the header line as an ordinary record
SelectObjectRequest request = new SelectObjectRequest("bucketName", "test-csv")
        .withSelectType("csv")
        .withExpression("select * from BosObject limit 3")
        .withInputSerialization(new InputSerialization()
                .withCompressionType("NONE")
                .withFileHeaderInfo("NONE")
                .withRecordDelimiter("\r\n")   // must match the line endings in csvContent
                .withFieldDelimiter(",")
                .withQuoteCharacter("\"")
                .withCommentCharacter("#"))
        .withOutputSerialization(new OutputSerialization()
                .withOutputHeader(false)
                .withQuoteFields("ALWAYS")     // wrap every output field in quotes
                .withRecordDelimiter("\n")
                .withFieldDelimiter(",")
                .withQuoteCharacter("\""))
        .withRequestProgress(false);
SelectObjectResponse response = client.selectObject(request);

// Print the rows carried by "Records" messages
SelectObjectResponse.Messages messages = response.getMessages();
while (messages.hasNext()) {
    SelectObjectResponse.CommonMessage message = messages.next();
    if ("Records".equals(message.Type)) {
        for (String record : message.getRecords()) {
            System.out.println(record);
        }
    }
}
Output of the CSV example:
"header1","header2","header3"
"1","2","3.4"
"a","b","c"
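With quoteFields set to "ALWAYS", every output field is wrapped in the quote character, which is why even the number 1 appears as "1". The local sketch below reproduces the three lines above from already-parsed fields. QuoteFieldsSketch is hypothetical, and doubling an embedded quote character follows the RFC 4180 convention as an assumption, not as documented BOS behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteFieldsSketch {

    // Mimics quoteFields "ALWAYS": every field is wrapped in quoteChar and
    // fields are joined with fieldDelimiter.
    static String formatRecord(List<String> fields, char fieldDelimiter, char quoteChar) {
        String q = String.valueOf(quoteChar);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(fieldDelimiter);
            }
            // Assumption: an embedded quote character is escaped by doubling it (RFC 4180 style)
            sb.append(quoteChar).append(fields.get(i).replace(q, q + q)).append(quoteChar);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("header1", "header2", "header3"),
                List.of("1", "2", "3.4"),
                List.of("a", "b", "c"));
        List<String> out = new ArrayList<>();
        for (List<String> row : rows) {
            out.add(formatRecord(row, ',', '"'));
        }
        // Records are joined with the output record delimiter "\n"
        System.out.println(String.join("\n", out));
    }
}
```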
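Note:
- On Unix/Linux, each line ends with a line feed only ("\n").
- On Windows, each line ends with a carriage return followed by a line feed ("\r\n").
- On macOS, each line ends with a line feed ("\n"); only classic Mac OS (up to and including Mac OS 9) used a carriage return ("\r").
- Set recordDelimiter to match the line endings actually used in the file.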
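If you don't know in advance how a file's lines end, one option is to inspect a sample of its bytes before building the request. A minimal sketch, assuming you already hold the first few bytes of the object in memory (how you fetch them is outside this sketch); detectRecordDelimiter is a hypothetical helper that returns the first line ending it finds.

```java
import java.nio.charset.StandardCharsets;

public class RecordDelimiterSketch {

    // Hypothetical helper: returns "\r\n", "\n" or "\r" based on the first
    // line ending found in the sample.
    static String detectRecordDelimiter(byte[] sample) {
        for (int i = 0; i < sample.length; i++) {
            if (sample[i] == '\r') {
                // CR followed by LF is a Windows line ending; a lone CR is classic Mac OS
                return (i + 1 < sample.length && sample[i + 1] == '\n') ? "\r\n" : "\r";
            }
            if (sample[i] == '\n') {
                return "\n"; // Unix/Linux and modern macOS
            }
        }
        return "\n"; // no line ending in the sample: fall back to LF (assumption)
    }

    public static void main(String[] args) {
        byte[] windows = "header1,header2\r\n1,2\r\n".getBytes(StandardCharsets.UTF_8);
        byte[] unix = "header1,header2\n1,2\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(detectRecordDelimiter(windows).equals("\r\n"));
        System.out.println(detectRecordDelimiter(unix).equals("\n"));
    }
}
```

A sample cut off immediately after a "\r" is ambiguous, so take one large enough to contain at least one complete line.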
Select JSON Files
The following Java SDK code selects records from a JSON object:
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Assumes `client` is an initialized BosClient and the bucket "bucketName" exists.
// JsonType "LINES": one complete JSON object per line, lines separated by "\n"
final String jsonContent =
        "{\"name\": \"Smith\", \"age\": 16, \"org\": null}\n" +
        "{\"name\": \"charles\", \"age\": 27, \"org\": \"baidu\"}\n" +
        "{\"name\": \"jack\", \"age\": 35, \"org\": \"bos\"}";
client.putObject("bucketName", "test-json",
        new ByteArrayInputStream(jsonContent.getBytes(StandardCharsets.UTF_8)));

// Select the objects whose age field is greater than 20
SelectObjectRequest request = new SelectObjectRequest("bucketName", "test-json")
        .withSelectType("json")
        .withExpression("select * from BosObject where age > 20")
        .withInputSerialization(new InputSerialization()
                .withCompressionType("NONE")
                .withJsonType("LINES"))
        .withOutputSerialization(new OutputSerialization()
                .withRecordDelimiter("\n"))
        .withRequestProgress(false);
SelectObjectResponse response = client.selectObject(request);

// Print the rows carried by "Records" messages
SelectObjectResponse.Messages messages = response.getMessages();
while (messages.hasNext()) {
    SelectObjectResponse.CommonMessage message = messages.next();
    if ("Records".equals(message.Type)) {
        for (String record : message.getRecords()) {
            System.out.println(record);
        }
    }
}
Output of the JSON example:
{"name":"charles","age":27,"org":"baidu"}
{"name":"jack","age":35,"org":"bos"}
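The condition age > 20 keeps the second and third objects, which come back as compact JSON, one object per line. The local sketch below applies the same filter to the same three records to make the query's semantics concrete; it is an illustration only, not how BOS evaluates the query. Person and selectAgeOver are hypothetical names, and the sketch does no JSON string escaping, which is fine for these sample values.

```java
import java.util.ArrayList;
import java.util.List;

public class JsonLinesFilterSketch {

    // Minimal stand-in for one JSON Lines record; not an SDK type.
    static final class Person {
        final String name;
        final int age;
        final String org; // may be null

        Person(String name, int age, String org) {
            this.name = name;
            this.age = age;
            this.org = org;
        }

        // Compact JSON without spaces, matching the select output format shown above
        String toJson() {
            String orgJson = org == null ? "null" : "\"" + org + "\"";
            return "{\"name\":\"" + name + "\",\"age\":" + age + ",\"org\":" + orgJson + "}";
        }
    }

    // Local equivalent of "select * from BosObject where age > <minAgeExclusive>"
    static List<String> selectAgeOver(List<Person> people, int minAgeExclusive) {
        List<String> out = new ArrayList<>();
        for (Person p : people) {
            if (p.age > minAgeExclusive) {
                out.add(p.toJson());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person("Smith", 16, null),
                new Person("charles", 27, "baidu"),
                new Person("jack", 35, "bos"));
        selectAgeOver(people, 20).forEach(System.out::println);
    }
}
```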
Select Parquet Files
The following Java SDK code selects records from a Parquet object:
// Assumes `client` is an initialized BosClient and a Parquet object "test-parquet"
// already exists in "bucketName".
/*
Contents of test-parquet, one row per line, rendered as JSON:
{"Name":"StudentName","Age":20,"Id":0,"Weight":50,"Sex":true,"Day":19240,"Scores":{"computer":80,"math":90,"physics":90}}
{"Name":"StudentName","Age":21,"Id":1,"Weight":50.1,"Sex":false,"Day":19240,"Scores":{"computer":81,"math":91,"physics":91}}
{"Name":"StudentName","Age":22,"Id":2,"Weight":50.2,"Sex":true,"Day":19240,"Scores":{"computer":82,"math":92,"physics":92}}
{"Name":"StudentName","Age":23,"Id":3,"Weight":50.3,"Sex":false,"Day":19240,"Scores":{"computer":83,"math":93,"physics":90}}
{"Name":"StudentName","Age":24,"Id":4,"Weight":50.4,"Sex":true,"Day":19240,"Scores":{"computer":84,"math":94,"physics":91}}
{"Name":"StudentName","Age":20,"Id":5,"Weight":50.5,"Sex":false,"Day":19240,"Scores":{"computer":85,"math":90,"physics":92}}
{"Name":"StudentName","Age":21,"Id":6,"Weight":50.6,"Sex":true,"Day":19240,"Scores":{"computer":86,"math":91,"physics":90}}
{"Name":"StudentName","Age":22,"Id":7,"Weight":50.7,"Sex":false,"Day":19240,"Scores":{"computer":87,"math":92,"physics":91}}
{"Name":"StudentName","Age":23,"Id":8,"Weight":50.8,"Sex":true,"Day":19240,"Scores":{"computer":88,"math":93,"physics":92}}
{"Name":"StudentName","Age":24,"Id":9,"Weight":50.9,"Sex":false,"Day":19240,"Scores":{"computer":89,"math":94,"physics":90}}
*/
SelectObjectRequest request = new SelectObjectRequest("bucketName", "test-parquet")
        .withSelectType("parquet")
        // "s" aliases BosObject; s.Scores.computer reads a field nested inside Scores
        .withExpression("select * from BosObject s where s.Scores.computer > 85")
        .withInputSerialization(new InputSerialization()
                .withCompressionType("NONE"))
        .withOutputSerialization(new OutputSerialization()
                .withRecordDelimiter("\n"))
        .withRequestProgress(false);
SelectObjectResponse response = client.selectObject(request);

// Print the rows carried by "Records" messages
SelectObjectResponse.Messages messages = response.getMessages();
while (messages.hasNext()) {
    SelectObjectResponse.CommonMessage message = messages.next();
    if ("Records".equals(message.Type)) {
        for (String record : message.getRecords()) {
            System.out.println(record);
        }
    }
}
Output of the Parquet example:
{"Name":"StudentName","Age":21,"Id":6,"Weight":50.6,"Sex":true,"Day":19240,"Scores":{"computer":86,"math":91,"physics":90}}
{"Name":"StudentName","Age":22,"Id":7,"Weight":50.7,"Sex":false,"Day":19240,"Scores":{"computer":87,"math":92,"physics":91}}
{"Name":"StudentName","Age":23,"Id":8,"Weight":50.8,"Sex":true,"Day":19240,"Scores":{"computer":88,"math":93,"physics":92}}
{"Name":"StudentName","Age":24,"Id":9,"Weight":50.9,"Sex":false,"Day":19240,"Scores":{"computer":89,"math":94,"physics":90}}
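The expression s.Scores.computer > 85 reads a field nested inside Scores through the table alias s. In the sample data, Scores.computer equals 80 + Id, so only Ids 6 through 9 pass. A local sketch of the same nested-field filter, using plain Java maps as a stand-in for Parquet rows; idsWithComputerAbove is a hypothetical helper and does not reflect how BOS reads Parquet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class NestedFieldFilterSketch {

    // Local equivalent of "where s.Scores.computer > threshold": follow the nested
    // Scores map and return the Id of each matching row.
    static List<Integer> idsWithComputerAbove(List<Map<String, Object>> rows, int threshold) {
        List<Integer> ids = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            @SuppressWarnings("unchecked")
            Map<String, Integer> scores = (Map<String, Integer>) row.get("Scores");
            if (scores.get("computer") > threshold) {
                ids.add((Integer) row.get("Id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int id = 0; id < 10; id++) {
            // In the sample data, Scores.computer is 80 + Id
            rows.add(Map.of("Id", id, "Scores", Map.of("computer", 80 + id)));
        }
        System.out.println(idsWithComputerAbove(rows, 85)); // [6, 7, 8, 9]
    }
}
```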
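Note: the parameters used to initialize SelectObjectRequest differ considerably between CSV, JSON, and Parquet queries. For the full list of settings, see the SelectObject API reference.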