Graduate student, School of Computer Science, Carnegie Mellon University (CMU):
Interests: artificial intelligence, computational game theory, equilibrium computation, linear and integer programming, and related topics.
Related papers:
We present Tartanian, a game theory-based player for heads-up no-limit Texas Hold'em poker. Tartanian is built from three components. First, to deal with the virtually infinite strategy space of no-limit poker, we developed a discretized betting model designed to capture the most important strategic choices in the game. Second, we used a potential-aware automated abstraction algorithm to identify strategically similar situations and thereby reduce the size of the game tree. Third, we developed a new technique for automatically generating the source code of an equilibrium-finding algorithm from an XML-based description of the game; the automatically generated program is more efficient than a general-purpose equilibrium-finding algorithm. Finally, we present results from the AAAI-07 Computer Poker Competition, in which we placed second out of ten entrants.
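As a concrete illustration of what a discretized betting model does, the sketch below maps an arbitrary real-game bet to the nearest action in a small discrete action set. The particular bet fractions and the log-scale nearest-action rule are assumptions made for this example, not the model used in the paper.

```python
import math

# Illustrative discrete bet sizes, expressed as fractions of the pot.
# These particular fractions are assumptions for the sketch only.
BET_FRACTIONS = [0.5, 1.0, 2.0]  # half-pot, pot-sized, double-pot

def discretize_bet(bet, pot, stack):
    """Map an observed real-game bet to the nearest abstract bet size.

    Distance is measured on a log scale, so a 1.4-pot bet is treated as
    closer to the pot-sized action than to the double-pot action.
    """
    candidates = [f * pot for f in BET_FRACTIONS] + [stack]  # stack = all-in
    return min(candidates, key=lambda c: abs(math.log(bet) - math.log(c)))

# With a pot of 100 and a stack of 1000, a real bet of 140 maps to the
# pot-sized abstract bet of 100.
print(discretize_bet(140, pot=100, stack=1000))
```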
Finding an equilibrium of an extensive form game of imperfect information is a fundamental problem in computational game theory, but current techniques do not scale to large games. To address this, we introduce the ordered game isomorphism and the related ordered game isomorphic abstraction transformation. For a multi-player sequential game of imperfect information with observable actions and an ordered signal space, we prove that any Nash equilibrium in an abstracted smaller game, obtained by one or more applications of the transformation, can be easily converted into a Nash equilibrium in the original game. We present an algorithm, GameShrink, for abstracting the game using our isomorphism exhaustively. Its complexity is Õ(n²), where n is the number of nodes in a structure we call the signal tree. It is no larger than the game tree, and on nontrivial games it is drastically smaller, so GameShrink has time and space complexity sublinear in the size of the game tree. Using GameShrink, we find an equilibrium to a poker game with 3.1 billion nodes--over four orders of magnitude more than in the largest poker game solved previously. To address even larger games, we introduce approximation methods that do not preserve equilibrium, but nevertheless yield (ex post) provably close-to-optimal strategies.
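The following is only a rough illustration of the merging mechanics, not the ordered game isomorphism test or the Õ(n²) GameShrink algorithm itself: each node of a toy signal tree gets a canonical signature computed bottom-up, and nodes with equal signatures can be collapsed into a single abstract node. The data representation and the unordered-children signature are assumptions made for this sketch.

```python
def signature(node, memo):
    """Return a hashable signature; nodes with equal signatures can be merged.

    `node` is a dict with 'payoff' at leaves and a list under 'children'
    otherwise. Children are treated as unordered, a toy stand-in for treating
    symmetric signals (e.g., deals differing only by suit) as interchangeable.
    """
    key = id(node)
    if key in memo:
        return memo[key]
    if 'children' not in node:                       # leaf of the signal tree
        sig = ('leaf', node['payoff'])
    else:
        child_sigs = tuple(sorted(signature(c, memo) for c in node['children']))
        sig = ('internal', child_sigs)
    memo[key] = sig
    return sig

def abstract_size(root):
    """Number of distinct signatures, i.e., the size of the merged tree."""
    memo = {}
    signature(root, memo)
    return len(set(memo.values()))

# Toy tree: the two internal nodes differ only in the order of their children,
# so they share a signature and the 7-node tree shrinks to 4 abstract nodes.
tree = {'children': [
    {'children': [{'payoff': 1}, {'payoff': -1}]},
    {'children': [{'payoff': -1}, {'payoff': 1}]},
]}
print(abstract_size(tree))   # -> 4
```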
We present a new abstraction algorithm for sequential imperfect information games. While most prior abstraction algorithms employ a myopic expected-value computation as a similarity metric, our algorithm considers a higher-dimensional space consisting of histograms over abstracted classes of states from later stages of the game. This enables our bottom-up abstraction algorithm to automatically take into account potential: a hand can become relatively better (or worse) over time and the strength of different hands can get resolved earlier or later in the game. We further improve the abstraction quality by making multiple passes over the abstraction, enabling the algorithm to narrow the scope of analysis to information that is relevant given abstraction decisions made for earlier parts of the game. We also present a custom indexing scheme based on suit isomorphisms that enables one to work on significantly larger models than before.
We apply the techniques to heads-up limit Texas Hold'em poker. Whereas all prior game theory-based work for Texas Hold'em poker used generic off-the-shelf linear program solvers for the equilibrium analysis of the abstracted game, we make use of a recently developed algorithm based on the excessive gap technique from convex optimization. This paper is, to our knowledge, the first to abstract and game-theoretically analyze all four betting rounds in one run (rather than splitting the game into phases). The resulting player, GS3, beats BluffBot, GS2, Hyperborean, Monash-BPP, Sparbot, Teddy, and Vexbot, each with statistical significance. To our knowledge, those competitors are the best prior programs for the game.
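A minimal sketch of the histogram-based clustering idea described above, assuming each hand is summarized by a histogram over abstract classes of the next round; hands with similar histograms are grouped with a plain k-means pass. The Euclidean distance, cluster count, and toy data are illustrative choices for this sketch, not the paper's.

```python
import numpy as np

def cluster_by_histogram(hists, k, iters=50, seed=0):
    """Group hands whose next-round histograms are similar (simple k-means).

    hists: (num_hands, num_next_round_classes) array; each row sums to 1.
    Returns one cluster label per hand.
    """
    rng = np.random.default_rng(seed)
    centers = hists[rng.choice(len(hists), size=k, replace=False)]
    for _ in range(iters):
        # Euclidean distance between histograms; a transportation-style
        # distance would additionally respect the ordering of the classes.
        dists = np.linalg.norm(hists[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = hists[labels == j].mean(axis=0)
    return labels

# Toy data: six "hands", histograms over three next-round classes. The last
# two rows have high potential: they usually improve by the next round.
hands = np.array([
    [0.8, 0.2, 0.0],
    [0.7, 0.3, 0.0],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.0, 0.2, 0.8],
    [0.1, 0.1, 0.8],
])
print(cluster_by_histogram(hands, k=3))
```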
We present new approximation methods for computing game-theoretic strategies for sequential games of imperfect information. At a high level, we contribute two new ideas. First, we introduce a new state-space abstraction algorithm. In each round of the game, there is a limit to the number of strategically different situations that an equilibrium-finding algorithm can handle. Given this constraint, we use clustering to discover similar positions, and we compute the abstraction via an integer program that minimizes the expected error at each stage of the game. Second, we present a method for computing the leaf payoffs for a truncated version of the game by simulating the actions in the remaining portion of the game. This allows the equilibrium-finding algorithm to take into account the entire game tree while having to explicitly solve only a truncated version. Experiments show that each of our two new techniques improves performance dramatically in Texas Hold'em poker. The techniques lead to a drastic improvement over prior approaches for automatically generating agents, and our agent plays competitively even against the best agents overall.
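As a toy illustration of the second idea, estimating leaf payoffs of the truncated game by simulating the remainder of the game, the sketch below averages Monte Carlo rollouts in which Gaussian noise stands in for the effect of the cards still to come. The noise model and the strength numbers are assumptions for the example, not the simulation used in the paper.

```python
import random

def estimate_leaf_payoff(pot, hero_strength, villain_strength,
                         noise=0.15, samples=1000, seed=0):
    """Crude Monte Carlo estimate of a truncated-game leaf payoff.

    Rather than solving the remaining rounds, simulate them: each sample
    perturbs both hand strengths (a stand-in for the remaining cards) and
    awards the whole pot to the stronger final hand.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        hero = hero_strength + rng.gauss(0, noise)
        villain = villain_strength + rng.gauss(0, noise)
        total += pot if hero > villain else 0.0
    return total / samples   # expected share of the pot for the first player

# A small strength edge translates into a bit more than half the pot.
print(estimate_leaf_payoff(pot=100, hero_strength=0.55, villain_strength=0.50))
```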
We present a game theory-based heads-up Texas Hold'em poker player, GS1. To overcome the computational obstacles stemming from Texas Hold'em's gigantic game tree, the player employs our automated abstraction techniques to reduce the complexity of the strategy computations. Texas Hold'em consists of four betting rounds. Our player solves a large linear program (offline) to compute strategies for the abstracted first and second rounds. After the second betting round, our player updates the probability of each possible hand based on the observed betting actions in the first two rounds as well as the revealed cards. Using these updated probabilities, our player computes in real-time an equilibrium approximation for the last two abstracted rounds. We demonstrate that our player, which incorporates very little poker-specific knowledge, is competitive with leading poker-playing programs which incorporate extensive domain knowledge, as well as with advanced human players.
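A minimal sketch of the real-time belief update described above: the prior over the opponent's possible hands is reweighted by the likelihood of the observed betting under each hand and then renormalized (Bayes' rule). The hand categories and the likelihood table are made up for this example.

```python
def update_hand_beliefs(prior, likelihood, observed_actions):
    """Bayes update: P(hand | actions) is proportional to P(actions | hand) * P(hand).

    prior: dict hand -> prior probability.
    likelihood: dict hand -> dict action_sequence -> probability.
    """
    posterior = {h: prior[h] * likelihood[h].get(observed_actions, 0.0)
                 for h in prior}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()} if total > 0 else prior

# Illustrative numbers only: a raise-heavy line makes strong hands more likely.
prior = {'strong': 0.2, 'medium': 0.5, 'weak': 0.3}
likelihood = {
    'strong': {'raise-raise': 0.60, 'call-call': 0.30},
    'medium': {'raise-raise': 0.20, 'call-call': 0.60},
    'weak':   {'raise-raise': 0.05, 'call-call': 0.40},
}
print(update_hand_beliefs(prior, likelihood, 'raise-raise'))
# 'strong' rises from 0.20 to roughly 0.51
```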
- Software demonstrations
- Andrew Gilpin, Tuomas Sandholm, and Troels Bjerre Sørensen. 2008. GS3 and Tartanian: Game theory-based heads-up limit and no-limit Texas Hold'em poker-playing programs. Seventh International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS'08), Estoril, Portugal. Demonstration Track.
We demonstrate two game theory-based programs for heads-up limit and no-limit Texas Hold'em poker. The first player, GS3, is designed for playing limit Texas Hold'em, in which all bets are a fixed amount. The second player, Tartanian, is designed for the no-limit variant of the game, in which the amount bet can be any amount up to the number of chips the player has. Both GS3 and Tartanian are based on our potential-aware automated abstraction algorithm for identifying strategically similar situations in order to decrease the size of the game tree. In addition, to deal with the virtually infinite strategy space of no-limit poker, Tartanian uses a discretized betting model designed to capture the most important strategic choices in the game. The strategies for both players are computed using our improved version of Nesterov's excessive gap technique specialized for poker.
In this demonstration, participants will be invited to play against both of the players, and to experience first-hand the sophisticated strategies employed by our players.
- Andrew Gilpin and Tuomas Sandholm. 2006. A Texas Hold'em Poker player based on automated abstraction and real-time equilibrium computation. Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS'06), Hakodate, Japan. Demonstration Track.
We demonstrate our game theory-based Texas Hold'em poker player. To overcome the computational difficulties stemming from Texas Hold'em's gigantic game tree, our player uses automated abstraction and real-time equilibrium approximation. Our player solves the first two rounds of the game in a large off-line computation, and solves the last two rounds with a real-time equilibrium approximation. Participants in the demonstration will be able to compete against our player and experience first-hand its cognitive abilities. Although the player does not directly incorporate any poker-specific expert knowledge, its play exhibits bluffing, slow-playing, check-raising, and semi-bluffing, techniques normally associated with human play.
- Andrew Gilpin and Tuomas Sandholm. 2005. Optimal Rhode Island Hold'em Poker. 20th National Conference on Artificial Intelligence (AAAI'05), pp. 1684-1685, Pittsburgh, PA, 2005. Intelligent Systems Demonstration.
Rhode Island Hold'em is a poker card game that has been proposed as a testbed for AI research. This game features many characteristics present in full-scale poker (e.g., Texas Hold'em). Our research advances in equilibrium computation have enabled us to solve for the optimal (equilibrium) strategies for this game. Some features of the equilibrium include poker techniques such as bluffing, slow-playing, check-raising, and semi-bluffing. In this demonstration, participants will compete with our optimal opponent and will experience these strategies firsthand.
- Play against the optimal Rhode Island Hold'em Poker player
- Auctions and exchanges
- Tuomas Sandholm and Andrew Gilpin. 2006. Sequences of Take-It-or-Leave-It Offers: Near-Optimal Auctions Without Full Valuation Revelation. Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS'06), Hakodate, Japan.
We introduce take-it-or-leave-it auctions (TLAs) as an allocation mechanism that allows buyers to retain much of their private valuation information, yet generates close-to-optimal expected utility for the seller. We show that if each buyer receives at most one offer, each buyer's dominant strategy is to act truthfully. In more general TLAs, the buyers' optimal strategies are more intricate, and we derive the perfect Bayesian equilibrium for the game. We develop algorithms for finding the equilibrium and also for optimizing the offers so as to maximize the seller's expected utility. In several example settings we show that the seller's expected utility already is close to optimal for a small number of offers. As the number of buyers increases, the seller's expected utility increases, and becomes increasingly (but not monotonically) more competitive with Myerson's expected utility maximizing auction.
- Early version: AAMAS'03 5th Workshop on Agent Mediated Electronic Commerce (AMEC-V), Melbourne, Australia, 2003.
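As a small numerical illustration of the offer-optimization problem described in the abstract above, take the simplest setting it mentions: two buyers, each receiving at most one offer (so accepting whenever value is at least the price is a dominant strategy), with values assumed uniform on [0, 1] purely for the example.

```python
# Offer price p1 to buyer 1; if rejected (probability p1 under uniform values),
# offer p2 to buyer 2. Expected revenue with truthful one-offer buyers:
#   E[revenue] = p1 * (1 - p1) + p1 * p2 * (1 - p2)
# Grid-search the best pair of offers.
def expected_revenue(p1, p2):
    return p1 * (1 - p1) + p1 * p2 * (1 - p2)

grid = [i / 1000 for i in range(1001)]
best = max((expected_revenue(p1, p2), p1, p2) for p1 in grid for p2 in grid)
print(best)   # roughly (0.391, 0.625, 0.5)
```

The best pair is p1 = 0.625 and p2 = 0.5 with expected revenue about 0.39, while Myerson's optimal auction earns 5/12 (about 0.417) for two uniform buyers, so even two take-it-or-leave-it offers come reasonably close, in line with the abstract's close-to-optimal claim.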
- Andrew Gilpin and Tuomas Sandholm. 2004. Arbitrage in Combinatorial Exchanges. AAMAS'04 6th Workshop on Agent Mediated Electronic Commerce (AMEC-VI), New York, NY, 2004.
Combinatorial exchanges are trading mechanisms that allow agents to specify preferences over bundles of goods. When agents' preferences exhibit complementarity and/or substitutability, this additional expressiveness can lead to more efficient allocations than is possible using traditional exchanges. In the context of combinatorial exchanges, this paper examines arbitrage, a risk-free profit opportunity. We show that some combinatorial exchanges allow agents to perform arbitrage and thus extract a positive payment from the market while contributing nothing, something that is not possible in traditional exchanges (using a single trade). We analyze the extent to which arbitrage is possible and computationally feasible in combinatorial exchanges. We show that the surplus-maximizing combinatorial exchange with free disposal is resistant to arbitrage, but without free disposal arbitrage is possible. For unit-volume-maximizing and trade-volume-maximizing combinatorial exchanges, we show that arbitrage is sometimes possible and we propose an improved combinatorial exchange that achieves the same economic objective but eliminates a particularly undesirable form of arbitrage. We show that the computational complexity of detecting winning arbitraging bids is NP-complete and that the ability for an agent to submit arbitraging bids depends on the type of feedback in the exchange. We also show that a variant of combinatorial exchanges in which arbitrage is impossible becomes susceptible to arbitrage if certain side constraints are placed on the allocation or if an approximating clearing algorithm is used.
- Winner determination and search
- Andrew Gilpin and Tuomas Sandholm. 2007. Information-theoretic approaches to branching in search. International Joint Conference on Artificial Intelligence (IJCAI'07), Hyderabad, India.
Deciding what question to branch on at each node is a key element of search algorithms. We introduce the information-theoretic paradigm for branching question selection. The idea is to drive the search to reduce uncertainty (entropy) in the current subproblem. We present four families of methods that fall within this paradigm. In the first, a variable to branch on is selected based on lookahead. This method performs comparably to strong branching on MIPLIB, and better than strong branching on hard real-world procurement optimization instances on which CPLEX's default strong branching outperforms CPLEX's default branching strategy. The second family combines this idea with strong branching. The third family does not use lookahead, but instead exploits the tie between indicator variables and the other variables they govern. This significantly outperforms the state-of-the-art branching strategies. The fourth family is about branching using carefully constructed linear inequality constraints over sets of variables.
- Early version: AAMAS'06, Hakodate, Japan.
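A minimal no-lookahead sketch of the entropy idea: treat each fractional 0-1 variable's LP-relaxation value as a Bernoulli probability and branch on the most uncertain one. The paper's four families are considerably more sophisticated (lookahead, combinations with strong branching, branching on constructed constraints), so this is only the simplest instance of the paradigm.

```python
import math

def binary_entropy(x):
    """Entropy, in bits, of a Bernoulli variable with success probability x."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def pick_branching_variable(lp_values, tol=1e-6):
    """Pick the fractional 0-1 variable with the highest entropy.

    lp_values: dict variable_name -> value in the LP relaxation.
    Returns None when the relaxation is already integral.
    """
    fractional = {v: x for v, x in lp_values.items() if tol < x < 1 - tol}
    if not fractional:
        return None
    return max(fractional, key=lambda v: binary_entropy(fractional[v]))

# x2, the variable closest to 0.5 and hence the most uncertain, is selected.
print(pick_branching_variable({'x1': 0.9, 'x2': 0.45, 'x3': 0.0, 'x4': 0.2}))
```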
- Tuomas Sandholm, Subhash Suri, Andrew Gilpin, and David Levine. 2005. CABOB: A Fast Optimal Algorithm for Winner Determination in Combinatorial Auctions. Management Science, Special Issue on Electronic Markets, 51(3):374--390.
Combinatorial auctions where bidders can bid on bundles of items can lead to more economically efficient allocations, but determining the winners is NP-complete and inapproximable. We present CABOB, a sophisticated optimal search algorithm for the problem. It uses decomposition techniques, upper and lower bounding (also across components), elaborate and dynamically chosen bid ordering heuristics, and a host of structural observations. CABOB attempts to capture structure in any instance without making assumptions about the instance distribution. Experiments against the fastest prior algorithm, CPLEX 8.0, show that CABOB is often faster, seldom drastically slower, and in many cases drastically faster---especially in cases with structure. CABOB's search runs in linear space and has significantly better anytime performance than CPLEX.
We also uncover interesting aspects of the problem itself. First, problems with short bids--that were hard for the first generation of specialized algorithms--are easy. Second, almost all of the CATS distributions are easy, and the run-time is virtually unaffected by the number of goods. Third, we test several random restart strategies, showing that they do not help on this problem--the run-time distribution does not have a heavy tail.
- Early version: International Joint Conference on Artificial Intelligence (IJCAI'01), Seattle, WA, 2001.
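To make the search concrete, here is a toy depth-first branch-and-bound for winner determination in a single-unit combinatorial auction with free disposal. It branches on bids (accept or reject), prunes with a simple admissible bound, and deliberately omits everything that makes CABOB fast: decomposition across components, LP-based upper and lower bounding, and its dynamic bid-ordering heuristics. The price-per-item ordering below is merely a stand-in for the latter.

```python
def winner_determination(bids):
    """Return (best_value, best_allocation) for a combinatorial auction.

    bids: list of (items, price) pairs, where items is a frozenset of goods.
    Each branch either accepts the next bid (if it conflicts with nothing
    already taken) or rejects it; an optimistic bound prunes hopeless subtrees.
    """
    best = [0.0, []]

    def search(remaining, value, chosen, taken_items):
        if value > best[0]:
            best[0], best[1] = value, list(chosen)
        if not remaining:
            return
        # Optimistic bound: pretend every remaining non-conflicting bid wins.
        bound = value + sum(p for items, p in remaining if not items & taken_items)
        if bound <= best[0]:
            return
        (items, price), rest = remaining[0], remaining[1:]
        if not items & taken_items:                  # branch 1: accept the bid
            search(rest, value + price, chosen + [(items, price)],
                   taken_items | items)
        search(rest, value, chosen, taken_items)     # branch 2: reject the bid

    # Sorting by price per item is a simple stand-in for CABOB's bid ordering.
    ordered = sorted(bids, key=lambda b: b[1] / len(b[0]), reverse=True)
    search(ordered, 0.0, [], frozenset())
    return best[0], best[1]

bids = [
    (frozenset({'a', 'b'}), 10),
    (frozenset({'b', 'c'}), 8),
    (frozenset({'c'}), 5),
    (frozenset({'a'}), 4),
]
print(winner_determination(bids))   # value 15.0: the {a, b} bid plus the {c} bid
```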
- Tuomas Sandholm, Subhash Suri, Andrew Gilpin, and David Levine. 2002. Winner Determination in Combinatorial Auction Generalizations. First International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS'02), pp. 69-76, Bologna, Italy.
Combinatorial markets where bids can be submitted on bundles of items can be economically desirable coordination mechanisms in multiagent systems where the items exhibit complementarity and substitutability. There has been a surge of research on winner determination in combinatorial auctions. In this paper we study a wider range of combinatorial market designs: auctions, reverse auctions, and exchanges, with one or multiple units of each item, with and without free disposal. We first theoretically characterize the complexity of finding a feasible, approximate, or optimal solution. Reverse auctions with free disposal can be approximated (even in the multi-unit case), although auctions cannot. When XOR-constraints between bids are allowed (to express substitutability), the hardness turns the other way around: even finding a feasible solution for a reverse auction or exchange is NP-complete, while in auctions that is trivial. Finally, in all of the cases without free disposal, even finding a feasible solution is NP-complete.
We then ran experiments on known benchmarks as well as ones which we introduced, to study the complexity of the market variants in practice. Cases with free disposal tended to be easier than ones without. On many distributions, reverse auctions with free disposal were easier than auctions with free disposal, as the approximability would suggest, but interestingly, on one of the most realistic distributions they were harder. Single-unit exchanges were easy, but multi-unit exchanges were extremely hard.
- Early version: AGENTS'01 Workshop on Agent-based Approaches to B2B, pp. 35-41, Montreal, Canada, May 28th, 2001.