chore: cli esm bug and sdk docs (#122)
* fix: cli

* release: publish beta packages

* chore: esm

* release: publish beta packages

* chore: sdk
ycjcl868 authored Feb 20, 2025
1 parent 756ccc4 commit d169e46
Showing 23 changed files with 191 additions and 42 deletions.
6 changes: 1 addition & 5 deletions .changeset/config.json
@@ -2,11 +2,7 @@
   "$schema": "https://unpkg.com/@changesets/[email protected]/schema.json",
   "changelog": "@changesets/cli/changelog",
   "commit": false,
-  "fixed": [
-    [
-      "@ui-tars/*"
-    ]
-  ],
+  "fixed": [],
   "linked": [],
   "access": "restricted",
   "baseBranch": "main",
18 changes: 10 additions & 8 deletions .changeset/pre.json
@@ -3,15 +3,17 @@
   "tag": "beta",
   "initialVersions": {
     "ui-tars-desktop": "0.0.6",
-    "@ui-tars/action-parser": "1.2.0-beta.6",
-    "@ui-tars/cli": "1.2.0-beta.8",
-    "@ui-tars/electron-ipc": "1.2.0-beta.6",
-    "@ui-tars/operator-nut-js": "1.2.0-beta.8",
-    "@ui-tars/sdk": "1.2.0-beta.8",
-    "@ui-tars/shared": "1.2.0-beta.6",
-    "@ui-tars/utio": "1.2.0-beta.6"
+    "@ui-tars/action-parser": "1.2.0-beta.10",
+    "@ui-tars/cli": "1.2.0-beta.10",
+    "@ui-tars/electron-ipc": "1.2.0-beta.10",
+    "@ui-tars/operator-nut-js": "1.2.0-beta.10",
+    "@ui-tars/sdk": "1.2.0-beta.10",
+    "@ui-tars/shared": "1.2.0-beta.10",
+    "@ui-tars/utio": "1.2.0-beta.10"
   },
   "changesets": [
-    "selfish-humans-drive"
+    "selfish-humans-drive",
+    "strange-schools-help",
+    "witty-points-rescue"
   ]
 }
5 changes: 5 additions & 0 deletions .changeset/strange-schools-help.md
@@ -0,0 +1,5 @@
+---
+'@ui-tars/cli': patch
+---
+
+chore: node-fetch
6 changes: 6 additions & 0 deletions .changeset/witty-points-rescue.md
@@ -0,0 +1,6 @@
+---
+'@ui-tars/cli': patch
+'@ui-tars/sdk': patch
+---
+
+update
63 changes: 58 additions & 5 deletions docs/sdk.md
@@ -48,15 +48,27 @@ classDiagram
Operator <|.. MobileOperator
```

## Installation
## Try it out

```bash
npm install @ui-tars/sdk@beta
npx @ui-tars/cli start
```

> Note: Later, we will release the stable version `@ui-tars/sdk@latest`.
Input your UI-TARS Model Service Config(`baseURL`, `apiKey`, `model`), then you can control your computer with CLI.

## Quick Start
```
Need to install the following packages:
Ok to proceed? (y) y
◆ Input your instruction
│ _ Open Chrome
```

<video src="https://github.com/user-attachments/assets/991c6063-d474-40a7-8bbc-f5700d95977a" height="300" />

## Agent Execution Process

```mermaid
flowchart LR
@@ -164,7 +176,46 @@ stateDiagram-v2

 ### Operator Interface

-When implementing a custom operator, you need to implement two core methods:
+When implementing a custom operator, you need to implement two core methods: `screenshot()` and `execute()`.
+
+#### Initialize
+
+`npm init` to create a new operator package, configuration is as follows:
+
+```json
+{
+  "name": "your-operator-tool",
+  "version": "1.0.0",
+  "main": "./dist/index.js",
+  "module": "./dist/index.mjs",
+  "types": "./dist/index.d.ts",
+  "scripts": {
+    "dev": "tsup --watch",
+    "prepare": "npm run build",
+    "build": "tsup",
+    "test": "vitest"
+  },
+  "files": [
+    "dist"
+  ],
+  "publishConfig": {
+    "access": "public",
+    "registry": "https://registry.npmjs.org"
+  },
+  "dependencies": {
+    "jimp": "^1.6.0"
+  },
+  "peerDependencies": {
+    "@ui-tars/sdk": "latest"
+  },
+  "devDependencies": {
+    "@ui-tars/sdk": "latest",
+    "tsup": "^8.3.5",
+    "typescript": "^5.7.2",
+    "vitest": "^3.0.2"
+  }
+}
+```

 #### screenshot()

@@ -268,6 +319,8 @@ const agent = new GUIAgent({
### Planning

You can combine planning/reasoning models (such as OpenAI-o1, DeepSeek-R1) to implement complex GUIAgent logic for planning, reasoning, and execution:

```ts
const guiAgent = new GUIAgent({
// ... other config
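
The docs/sdk.md changes above state that a custom operator implements two core methods, `screenshot()` and `execute()`. The following is a minimal sketch of that contract, not the SDK's actual API: the `Screenshot`, `Action`, and `OperatorLike` types, the `RecordingOperator` class, and the `runStep` helper are all hypothetical local stand-ins, and the real types exported by `@ui-tars/sdk` may use different names and fields.

```typescript
// Hypothetical local stand-ins for the SDK's operator types. The real
// definitions in @ui-tars/sdk may use different names and fields.
interface Screenshot {
  base64: string; // encoded image data (left empty in this sketch)
  width: number;
  height: number;
}

interface Action {
  type: string; // e.g. "click"; illustrative only
  inputs: Record<string, string>;
}

interface OperatorLike {
  screenshot(): Promise<Screenshot>;
  execute(action: Action): Promise<void>;
}

// An in-memory operator: returns a blank screenshot and records every action
// instead of touching a real device.
class RecordingOperator implements OperatorLike {
  readonly executed: Action[] = [];

  async screenshot(): Promise<Screenshot> {
    return { base64: '', width: 1280, height: 720 };
  }

  async execute(action: Action): Promise<void> {
    this.executed.push(action);
  }
}

// One observe-decide-act step: capture the screen, let the caller pick an
// action from it, then hand that action to the operator.
async function runStep(
  operator: OperatorLike,
  decide: (shot: Screenshot) => Action,
): Promise<Action> {
  const shot = await operator.screenshot();
  const action = decide(shot);
  await operator.execute(action);
  return action;
}
```

A recording operator like this can be handy for testing agent logic without a display; a real operator would capture the screen in `screenshot()` and drive input devices in `execute()`.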
6 changes: 6 additions & 0 deletions packages/action-parser/CHANGELOG.md
@@ -1,5 +1,11 @@
 # @ui-tars/action-parser

+## 1.2.0-beta.10
+
+### Patch Changes
+
+- @ui-tars/shared@1.2.0-beta.10
+
 ## 1.2.0-beta.9

 ### Patch Changes
2 changes: 1 addition & 1 deletion packages/action-parser/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@ui-tars/action-parser",
-  "version": "1.2.0-beta.9",
+  "version": "1.2.0-beta.10",
   "description": "Action parser SDK for UI-TARS",
   "repository": {
     "type": "git",
15 changes: 15 additions & 0 deletions packages/cli/CHANGELOG.md
@@ -1,5 +1,20 @@
 # @ui-tars/cli

+## 1.2.0-beta.11
+
+### Patch Changes
+
+- chore: node-fetch
+
+## 1.2.0-beta.10
+
+### Patch Changes
+
+- update
+- Updated dependencies
+- @ui-tars/sdk@1.2.0-beta.10
+- @ui-tars/operator-nut-js@1.2.0-beta.10
+
 ## 1.2.0-beta.9

 ### Patch Changes
7 changes: 5 additions & 2 deletions packages/cli/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@ui-tars/cli",
-  "version": "1.2.0-beta.9",
+  "version": "1.2.0-beta.11",
   "description": "CLI for UI-TARS",
   "repository": {
     "type": "git",
@@ -34,12 +34,15 @@
   "dependencies": {
     "commander": "^13.1.0",
     "jimp": "1.6.0",
+    "js-yaml": "^4.1.0",
     "@clack/prompts": "^0.10.0",
     "@ui-tars/operator-nut-js": "workspace:*",
-    "@ui-tars/sdk": "workspace:*"
+    "@ui-tars/sdk": "workspace:*",
+    "node-fetch": "^2.7.0"
   },
   "devDependencies": {
     "tsup": "^8.3.5",
+    "@types/js-yaml": "^4.0.9",
     "typescript": "^5.7.2",
     "vitest": "^3.0.2"
   }
2 changes: 1 addition & 1 deletion packages/cli/src/cli/commands.ts
@@ -13,7 +13,7 @@ export const run = () => {
   program
     .command('start')
     .description('starting the ui-tars agent...')
-    .option('-p, --presets', 'Model Config Presets')
+    .option('-p, --presets <url>', 'Model Config Presets')
     .action(async (options: CliOptions) => {
       try {
         await start(options);
31 changes: 23 additions & 8 deletions packages/cli/src/cli/start.ts
@@ -2,13 +2,15 @@
  * Copyright (c) 2025 Bytedance, Inc. and its affiliates.
  * SPDX-License-Identifier: Apache-2.0
  */
-import { GUIAgent } from '@ui-tars/sdk';
-import * as p from '@clack/prompts';
-// import { inspect } from 'node:util';
 import fs from 'node:fs';
 import path from 'node:path';
 import os from 'node:os';

+import fetch from 'node-fetch';
+import { GUIAgent } from '@ui-tars/sdk';
+import * as p from '@clack/prompts';
+import yaml from 'js-yaml';
+
 import { NutJSOperator } from '@ui-tars/operator-nut-js';

 export interface CliOptions {
@@ -24,14 +26,27 @@ export const start = async (options: CliOptions) => {
     model: '',
   };

-  try {
-    if (fs.existsSync(CONFIG_PATH)) {
+  if (options.presets) {
+    const response = await fetch(options.presets);
+    if (!response.ok) {
+      throw new Error(`Failed to fetch preset: ${response.status}`);
+    }
+
+    const yamlText = await response.text();
+    const preset = yaml.load(yamlText) as any;
+
+    config.apiKey = preset?.vlmApiKey;
+    config.baseURL = preset?.vlmBaseUrl;
+    config.model = preset?.vlmModelName;
+  } else if (fs.existsSync(CONFIG_PATH)) {
+    try {
       config = JSON.parse(fs.readFileSync(CONFIG_PATH, 'utf-8'));
+    } catch (error) {
+      console.warn('read config file failed', error);
+      return;
     }
-  } catch (error) {
-    console.warn('read config file failed', error);
-    return;
   }
+
   if (!config.baseURL || !config.apiKey || !config.model) {
     const configAnswers = await p.group(
       {
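
The preset branch in start.ts above copies three keys from the parsed YAML (`vlmApiKey`, `vlmBaseUrl`, `vlmModelName`) into the CLI config, and the later `!config.baseURL || !config.apiKey || !config.model` check decides whether to prompt. A minimal sketch of that mapping as pure functions, assuming the preset contains only those three keys; the `Preset` and `ModelConfig` types and the `presetToConfig` and `missingFields` helpers are hypothetical names introduced here, not part of the commit.

```typescript
// Hypothetical shape of a parsed preset: only the three keys that start.ts
// reads are modelled, and each may be absent.
interface Preset {
  vlmApiKey?: string;
  vlmBaseUrl?: string;
  vlmModelName?: string;
}

interface ModelConfig {
  baseURL: string;
  apiKey: string;
  model: string;
}

// Map preset keys onto the CLI config. Unlike the diff, missing keys default
// to '' here so every field stays a string.
function presetToConfig(preset: Preset | null | undefined): ModelConfig {
  return {
    baseURL: preset?.vlmBaseUrl ?? '',
    apiKey: preset?.vlmApiKey ?? '',
    model: preset?.vlmModelName ?? '',
  };
}

// List the fields that would still have to be prompted for.
function missingFields(config: ModelConfig): (keyof ModelConfig)[] {
  return (['baseURL', 'apiKey', 'model'] as const).filter(
    (key) => !config[key],
  );
}
```

Keeping the mapping separate from the fetch and YAML parsing makes it testable without network access, and an incomplete preset degrades to the interactive prompt instead of failing.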
2 changes: 2 additions & 0 deletions packages/electron-ipc/CHANGELOG.md
@@ -1,5 +1,7 @@
 # @ui-tars/electron-ipc

+## 1.2.0-beta.10
+
 ## 1.2.0-beta.9

 ### Patch Changes
2 changes: 1 addition & 1 deletion packages/electron-ipc/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@ui-tars/electron-ipc",
-  "version": "1.2.0-beta.9",
+  "version": "1.2.0-beta.10",
   "description": "Type-safe Electron inter-process communication for UI-TARS",
   "repository": {
     "type": "git",
8 changes: 8 additions & 0 deletions packages/operators/nut-js/CHANGELOG.md
@@ -1,5 +1,13 @@
 # @ui-tars/operator-nut-js

+## 1.2.0-beta.10
+
+### Patch Changes
+
+- Updated dependencies
+- @ui-tars/sdk@1.2.0-beta.10
+- @ui-tars/shared@1.2.0-beta.10
+
 ## 1.2.0-beta.9

 ### Patch Changes
2 changes: 1 addition & 1 deletion packages/operators/nut-js/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@ui-tars/operator-nut-js",
-  "version": "1.2.0-beta.9",
+  "version": "1.2.0-beta.10",
   "description": "Operator Nut JS SDK for UI-TARS",
   "repository": {
     "type": "git",
8 changes: 8 additions & 0 deletions packages/sdk/CHANGELOG.md
@@ -1,5 +1,13 @@
 # @ui-tars/sdk

+## 1.2.0-beta.10
+
+### Patch Changes
+
+- update
+- @ui-tars/action-parser@1.2.0-beta.10
+- @ui-tars/shared@1.2.0-beta.10
+
 ## 1.2.0-beta.9

 ### Patch Changes
26 changes: 19 additions & 7 deletions packages/sdk/README.md
@@ -8,10 +8,10 @@ It provides a flexible framework to create agents that can interact with graphic

 ```mermaid
 classDiagram
-    class GUIAgent {
-        +model
-        +operator
-        +signal
+    class GUIAgent~T extends Operator~ {
+        +model: UITarsModel
+        +operator: T
+        +signal: AbortSignal
         +onData
         +run()
     }
@@ -21,6 +21,7 @@ classDiagram
}
class Operator {
<<interface>>
+screenshot()
+execute()
}
@@ -30,9 +31,21 @@ classDiagram
+execute()
}
class WebOperator {
+screenshot()
+execute()
}
class MobileOperator {
+screenshot()
+execute()
}
GUIAgent --> UITarsModel
GUIAgent --> Operator
Operator <|-- NutJSOperator
GUIAgent ..> Operator
Operator <|.. NutJSOperator
Operator <|.. WebOperator
Operator <|.. MobileOperator
```

## Installation
@@ -55,7 +68,6 @@ flowchart LR
     Prediction --> Agent
     Agent --> Operator
     Operator --> Action[Execute]
-    Action --> Agent
 ```

 ### Basic Usage
2 changes: 1 addition & 1 deletion packages/sdk/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@ui-tars/sdk",
-  "version": "1.2.0-beta.9",
+  "version": "1.2.0-beta.10",
   "description": "A powerful cross-platform(ANY device/platform) toolkit for building GUI automation agents for UI-TARS",
   "repository": {
     "type": "git",
2 changes: 2 additions & 0 deletions packages/shared/CHANGELOG.md
@@ -1,5 +1,7 @@
 # @ui-tars/shared

+## 1.2.0-beta.10
+
 ## 1.2.0-beta.9

 ### Patch Changes
2 changes: 1 addition & 1 deletion packages/shared/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@ui-tars/shared",
-  "version": "1.2.0-beta.9",
+  "version": "1.2.0-beta.10",
   "description": "Shared types for UI-TARS",
   "repository": {
     "type": "git",