- Serializable Support:
  - Key data types like `ArrayRecord`, `Column`, `TableSchema`, and `TypeInfo` now support serialization and deserialization, enabling caching and inter-process communication.
- Predicate Pushdown:
  - Introduced `Attribute` type predicates to specify column names.
- Tunnel Interface Refactoring:
  - Refactored Tunnel-related interfaces to include seamless retry logic, greatly enhancing stability and robustness.
  - Removed the `TunnelRetryStrategy` and `ConfigurationImpl` classes, which are now replaced by `TunnelRetryHandler` and `Configuration` respectively.
- SQLExecutor Optimization:
  - Improved performance when executing offline SQL jobs through the `SQLExecutor` interface: fetching results now takes one fewer network request per job, which decreases end-to-end latency.
- Decimal Read in Table.read:
  - Fixed an issue where trailing zeroes in the `decimal` type were not as expected in the `Table.read` interface.
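The serialization support listed above matters mostly for caching and for passing schema objects between processes. As a minimal sketch of that round trip, the example below uses plain `java.io` object streams on a hypothetical stand-in class, `ColumnSnapshot`. It is not an SDK type; the assumption is that the SDK types named above implement `java.io.Serializable` and can be passed through the same streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationRoundTrip {
    // Serialize any Serializable value to bytes, e.g. for a cache entry.
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuild an object from bytes produced by toBytes.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ColumnSnapshot original = new ColumnSnapshot("id", "BIGINT");
        ColumnSnapshot copy = (ColumnSnapshot) fromBytes(toBytes(original));
        System.out.println(copy.name + " " + copy.typeName);
    }
}

// Hypothetical stand-in for a serializable SDK type; not part of odps-sdk.
final class ColumnSnapshot implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final String typeName;

    ColumnSnapshot(String name, String typeName) {
        this.name = name;
        this.typeName = typeName;
    }
}
```

The byte array could equally be written to a file or a socket; the in-memory buffer just keeps the sketch self-contained.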
- Added the `getPartitionSpecs` method to the `Table` interface. Compared to the `getPartitions` method, this method does not require fetching detailed partition information, resulting in faster execution.
- Removed the `isPrimaryKey` method from the `Column` class. This method was originally added so that users could mark columns as primary keys when creating a table. In read scenarios it proved misleading: it does not communicate with the server, so it cannot be used to determine whether a column is a primary key. For table creation the design was also flawed: primary keys are ordered and therefore belong at the table level, and a per-column flag ignores that order. The method has therefore been removed in version 0.48.5. For read scenarios, use the `Table.getPrimaryKey()` method to retrieve primary keys. For table creation, use the `withPrimaryKeys` method of `TableCreator` to specify primary keys.
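To see why a per-column flag cannot express an ordered primary key, consider a key declared as `(region, id)` on a table whose columns are stored as `id, region, amount`. The sketch below uses hypothetical stand-in types (`ColumnFlag`, `keysFromFlags`), not SDK classes: recovering the key from per-column flags can only yield column order, while a table-level list keeps the declared order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PrimaryKeyOrder {
    // Hypothetical column model with a per-column flag; not an SDK class.
    static final class ColumnFlag {
        final String name;
        final boolean primaryKey;

        ColumnFlag(String name, boolean primaryKey) {
            this.name = name;
            this.primaryKey = primaryKey;
        }
    }

    // Recover key columns from the flags; the result follows column order,
    // because the flags carry no information about key order.
    static List<String> keysFromFlags(List<ColumnFlag> columns) {
        List<String> keys = new ArrayList<>();
        for (ColumnFlag column : columns) {
            if (column.primaryKey) {
                keys.add(column.name);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<ColumnFlag> columns = Arrays.asList(
                new ColumnFlag("id", true),
                new ColumnFlag("region", true),
                new ColumnFlag("amount", false));
        // Table-level declaration: the order is part of the definition.
        List<String> declaredKey = Arrays.asList("region", "id");

        System.out.println("from flags:  " + keysFromFlags(columns));
        System.out.println("table level: " + declaredKey);
    }
}
```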
- Fixed an issue in the `RecordConverter` where formatting a `Record` of type `String` would throw an exception when the data type was `byte[]`.
- Writing MaxCompute tables through `table-api` now supports the `JSON` and `TIMESTAMP_NTZ` types.
- `odps-sdk-udf` functions continue to be improved.
- When the `Table.read()` interface encounters the `Decimal` type, it currently removes trailing zeros by default (but does not use scientific notation).
- Fixed the problem that `ArrayRecord` did not support the `getBytes` method for the `JSON` type.
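The "no scientific notation" caveat in the `Decimal` note above is worth spelling out, because stripping trailing zeros from a `java.math.BigDecimal` can otherwise produce an exponent. The sketch below shows the general `BigDecimal` behavior only; whether the SDK implements it this way is an assumption, and `DecimalTrailingZeros` is a hypothetical name.

```java
import java.math.BigDecimal;

final class DecimalTrailingZeros {
    // Drop trailing zeros, then render without an exponent.
    static String format(BigDecimal value) {
        return value.stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(format(new BigDecimal("12.3400"))); // 12.34
        System.out.println(format(new BigDecimal("100")));     // 100
        // toString() on the stripped value switches to an exponent:
        System.out.println(new BigDecimal("100").stripTrailingZeros()); // 1E+2
    }
}
```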
- Support for passing `retryStrategy` when building `UpsertSession`.
- The `onFlushFail(String, int)` interface in `UpsertStream.Listener` has been marked as `@Deprecated` in favor of the `onFlushFail(Throwable, int)` interface. This interface will be removed in version 0.50.0.
- Default compression algorithm for Tunnel upsert has been changed to `ODPS_LZ4_FRAME`.
- Fixed an issue where data couldn't be written correctly in Tunnel upsert when the compression algorithm was set to something other than `ZLIB`.
- Fixed a resource leak in `UpsertSession` that could persist for a long time if `close` was not explicitly called by the user.
- Fixed an exception thrown by Tunnel data retrieval interfaces (`preview`, `download`) when encountering invalid `Decimal` types (such as `inf`, `nan`) in tables; will now return `null` to align with the `getResult` interface.
- Fixed an issue where Tunnel upsert relied on the user's local time zone when bucketing primary keys of `DATE` and `DATETIME` types, which could lead to incorrect bucketing and abnormal query results. Users who rely on this feature are strongly recommended to upgrade to version 0.48.2.
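The time zone problem above can be illustrated with `java.time` alone. The SDK's actual bucketing function is not shown in these notes, so the sketch below only demonstrates the underlying effect: converting a calendar date to an instant depends on the zone, so any hash computed over that value would differ between clients, while the epoch-day count does not. `DateBucketingZones` and its methods are hypothetical names.

```java
import java.time.LocalDate;
import java.time.ZoneId;

final class DateBucketingZones {
    // Zone-dependent: midnight of the same calendar date is a different
    // instant in each time zone.
    static long epochMillisAtMidnight(LocalDate date, ZoneId zone) {
        return date.atStartOfDay(zone).toInstant().toEpochMilli();
    }

    // Zone-independent: days since 1970-01-01 in the ISO calendar.
    static long epochDay(LocalDate date) {
        return date.toEpochDay();
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2024, 1, 1);
        long utc = epochMillisAtMidnight(date, ZoneId.of("UTC"));
        long shanghai = epochMillisAtMidnight(date, ZoneId.of("Asia/Shanghai"));

        System.out.println(utc - shanghai);   // 28800000, i.e. 8 hours
        System.out.println(epochDay(date));   // same on every client
    }
}
```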
- `Table` adds a method `getTableLifecycleConfig()` to obtain the lifecycle configuration of hierarchical storage.
- `TableReadSession` now supports predicate pushdown.
Arrow and ANTLR Libraries: Added new includes to the Maven Shade Plugin configuration for better handling and packaging of specific libraries. These includes ensure that certain essential libraries are correctly packaged into the final shaded artifact. The newly included libraries are:
- org.apache.arrow:arrow-format:jar
- org.apache.arrow:arrow-memory-core:jar
- org.apache.arrow:arrow-memory-netty:jar
- org.antlr:ST4:jar
- org.antlr:antlr-runtime:jar
- org.antlr:antlr4:jar
- org.antlr:antlr4-runtime:jar
Shaded Relocation for ANTLR and StringTemplate: The configuration now includes updated relocation rules for org.antlr and org.stringtemplate.v4 packages to prevent potential conflicts with other versions of these libraries that may exist in the classpath. The new shaded patterns are:
- org.stringtemplate.v4 relocated to com.aliyun.odps.thirdparty.org.stringtemplate.v4
- org.antlr relocated to com.aliyun.odps.thirdparty.antlr
- Introduced `odps-sdk-udf` module to allow batch data reading in UDFs for MaxCompute, significantly improving performance in high-volume data scenarios.
- `Table` now supports retrieving `ColumnMaskInfo`, aiding in data desensitization scenarios and relevant information acquisition.
- Support for setting proxies through the use of `odps.getRestClient().setProxy(Proxy)` method.
- Implementation of iterable `RecordReader` and `RecordReader.stream()` method, enabling conversion to a Stream of `Record` objects.
- Added new parameters `upsertConcurrentNum` and `upsertNetworkNum` in `TableAPI RestOptions` for more detailed control for users performing upsert operations via the TableAPI.
- Support for `Builder` pattern in constructing `TableSchema`.
- Support for `toString` method in `ArrayRecord`.
- `UploadSession` now supports configuration of the `GET_BLOCK_ID` parameter to speed up session creation when the client does not need `blockId`.
- Enhanced table creation method using the `builder` pattern (`TableCreator`), making table creation simpler.
- Fixed a bug in `Upsert Session` where the timeout setting was configured incorrectly.
- Fixed the issue where `TimestampWritable` computed one second less when nanoseconds were negative.
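The notes do not describe the root cause of the `TimestampWritable` fix, so the following is not the SDK's code. It is only a sketch of one correct way to normalize a (seconds, nanoseconds) pair whose nanosecond part may be negative, using floor division so that exactly one second is borrowed when needed. `NanosNormalization` and `normalize` are hypothetical names.

```java
final class NanosNormalization {
    static final long NANOS_PER_SECOND = 1_000_000_000L;

    // Returns {seconds, nanos} with nanos in [0, 1_000_000_000).
    static long[] normalize(long seconds, long nanos) {
        // floorDiv rounds toward negative infinity, so -1 ns borrows exactly
        // one second; floorMod returns the matching non-negative remainder.
        long carry = Math.floorDiv(nanos, NANOS_PER_SECOND);
        long rest = Math.floorMod(nanos, NANOS_PER_SECOND);
        return new long[] {Math.addExact(seconds, carry), rest};
    }

    public static void main(String[] args) {
        long[] a = normalize(10, -1);
        System.out.println(a[0] + " s, " + a[1] + " ns"); // 9 s, 999999999 ns

        long[] b = normalize(10, -1_000_000_000L);
        System.out.println(b[0] + " s, " + b[1] + " ns"); // 9 s, 0 ns
    }
}
```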
- Support for new Stream type that enables incremental queries.
- `preview` method added to `TableTunnel` for data preview purposes.
- `OdpsRecordConverter` added for parsing and formatting records.
- Enhancements to the `Projects` class: `create` and `delete` methods are now available, and the `update` method is now public. Operations related to the `group-api` package are now marked as deprecated.
- Improved `Schemas` class to support filtering schemas with `SchemaFilter`, listing schemas, and retrieving detailed schema metadata.
- `DownloadSession` introduces new parameters `disableModifiedCheck` to bypass modification checks and `fetchBlockId` to skip block ID list retrieval.
- `TableWriteSession` supports writing `TIMESTAMP_NTZ` / `JSON` types and adds a new parameter `MaxFieldSize`.
- `TABLE_API` adds `predicate` related classes to support predicate pushdown in the future.
- The implementation of the `read` method in the `Table` class is now replaced with `TableTunnel.preview`, supporting new types in MaxCompute and time types switched to Java 8 time types without timezone.
- The default `MapWritable` implementation switched from `HashMap` to `LinkedHashMap` to ensure order.
- `Column` class now supports creation using the Builder pattern.
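The `MapWritable` change above relies on a standard-library guarantee: `LinkedHashMap` iterates in insertion order, while `HashMap` makes no ordering promise. A minimal sketch with plain `java.util` maps follows; `MapOrder` is a hypothetical name and `MapWritable` itself is not used.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MapOrder {
    // Collect keys in the order the map iterates over them.
    static List<String> keysInIterationOrder(Map<String, Integer> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> linked = new LinkedHashMap<>();
        linked.put("zeta", 1);
        linked.put("alpha", 2);
        linked.put("mid", 3);

        // LinkedHashMap preserves insertion order.
        System.out.println(keysInIterationOrder(linked)); // [zeta, alpha, mid]
    }
}
```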
- `TableReadSession` now introduces new parameters `maxBatchRawSize` and `splitMaxFileNum`.
- `UpsertSession` enhancements:
  - Supports writing partial columns.
  - Allows setting the number of Netty thread pools with the default changed to 1.
  - Enables setting maximum concurrency with the default value changed to 16.
- `TableTunnel` now supports setting `quotaName` option.